[jira] [Created] (SPARK-48847) Resubmitting a stage without verifying the stage attempt number may result in an infinite loop
jiang13021 created SPARK-48847:
----------------------------------

             Summary: Resubmitting a stage without verifying the stage attempt number may result in an infinite loop
                 Key: SPARK-48847
                 URL: https://issues.apache.org/jira/browse/SPARK-48847
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 3.5.1, 3.3.2, 3.4.2, 3.2.2
            Reporter: jiang13021

In org.apache.spark.scheduler.DAGScheduler#processShuffleMapStageCompletion:

{code:java}
private def processShuffleMapStageCompletion(shuffleStage: ShuffleMapStage): Unit = {
  // some code ...
  if (!shuffleStage.isAvailable) {
    // Some tasks had failed; let's resubmit this shuffleStage.
    // TODO: Lower-level scheduler should also deal with this
    logInfo(log"Resubmitting ${MDC(STAGE, shuffleStage)} " +
      log"(${MDC(STAGE_NAME, shuffleStage.name)}) " +
      log"because some of its tasks had failed: " +
      log"${MDC(PARTITION_IDS, shuffleStage.findMissingPartitions().mkString(", "))}")
    submitStage(shuffleStage) // resubmit without check
  } else {
    markMapStageJobsAsFinished(shuffleStage)
    submitWaitingChildStages(shuffleStage)
  }
}{code}

The code above shows that the DAGScheduler resubmits the stage directly, without checking whether the stage attempt number has already exceeded maxConsecutiveStageAttempts. The resubmitted stage may fail again, and if it keeps failing the scheduler keeps resubmitting it, resulting in an infinite loop.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
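To make the missing check concrete, below is a minimal sketch of the kind of guard the report is asking for. It assumes the failed-attempt bookkeeping that DAGScheduler already uses for fetch-failure retries (failedAttemptIds, maxConsecutiveStageAttempts, abortStage) could be reused on this path; that reuse is an assumption for illustration, not an agreed design, and the snippet is a fragment of processShuffleMapStageCompletion rather than a standalone program.

{code:java}
// Sketch only (hypothetical fix): cap consecutive resubmissions the same way the
// fetch-failure path does, instead of resubmitting unconditionally.
if (!shuffleStage.isAvailable) {
  if (shuffleStage.failedAttemptIds.size >= maxConsecutiveStageAttempts) {
    // Too many consecutive failed attempts: fail the job instead of looping forever.
    abortStage(shuffleStage,
      s"$shuffleStage has failed the maximum allowable number of times", None)
  } else {
    logInfo(s"Resubmitting $shuffleStage (${shuffleStage.name}) because some of its " +
      s"tasks had failed: ${shuffleStage.findMissingPartitions().mkString(", ")}")
    submitStage(shuffleStage)
  }
} else {
  markMapStageJobsAsFinished(shuffleStage)
  submitWaitingChildStages(shuffleStage)
}
{code}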
[jira] [Resolved] (SPARK-46240) Add ExecutedPlanPrepRules to SparkSessionExtensions
[ https://issues.apache.org/jira/browse/SPARK-46240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiang13021 resolved SPARK-46240.
--------------------------------
    Resolution: Won't Do

As discussed in [https://github.com/apache/spark/pull/44254], a columnar rule is enough for this case.

> Add ExecutedPlanPrepRules to SparkSessionExtensions
> ----------------------------------------------------
>
>                 Key: SPARK-46240
>                 URL: https://issues.apache.org/jira/browse/SPARK-46240
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0, 3.3.0, 3.4.0
>            Reporter: jiang13021
>            Priority: Major
>              Labels: pull-request-available
>
> Some rules (Rule[SparkPlan]) are applied when preparing for the executedPlan.
> However, users do not have the ability to add rules in this context.
> {code:java}
> // org.apache.spark.sql.execution.QueryExecution#preparations
> private[execution] def preparations(
>     sparkSession: SparkSession,
>     adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None,
>     subquery: Boolean): Seq[Rule[SparkPlan]] = {
>   // `AdaptiveSparkPlanExec` is a leaf node. If inserted, all the following rules will be no-op
>   // as the original plan is hidden behind `AdaptiveSparkPlanExec`.
>   adaptiveExecutionRule.toSeq ++
>   Seq(
>     CoalesceBucketsInJoin,
>     PlanDynamicPruningFilters(sparkSession),
>     PlanSubqueries(sparkSession),
>     RemoveRedundantProjects,
>     EnsureRequirements(),
>     // `ReplaceHashWithSortAgg` needs to be added after `EnsureRequirements` to guarantee the
>     // sort order of each node is checked to be valid.
>     ReplaceHashWithSortAgg,
>     // `RemoveRedundantSorts` needs to be added after `EnsureRequirements` to guarantee the same
>     // number of partitions when instantiating PartitioningCollection.
>     RemoveRedundantSorts,
>     DisableUnnecessaryBucketedScan,
>     ApplyColumnarRulesAndInsertTransitions(
>       sparkSession.sessionState.columnarRules, outputsColumnar = false),
>     CollapseCodegenStages()) ++
>   (if (subquery) {
>     Nil
>   } else {
>     Seq(ReuseExchangeAndSubquery)
>   })
> }{code}
> We need to add some "Rule[SparkPlan]"s at this position because currently, all such rules are
> present in AQE, which requires users to use AQE and meet the requirements to enter
> AdaptiveSparkPlanExec. This makes it difficult to implement certain extensions for simple SQLs.
> For example, adding some new datasource filters for external data sources is challenging.
> Modifying DataSourceStrategy directly is not conducive to staying in sync with future
> advancements in the community. Additionally, customizing the Strategy makes it difficult to
> append new functionalities in an incremental manner. If we define AQE rules, they would not be
> effective for the simplest 'SELECT * FROM ... WHERE ...' statements. Therefore, it is necessary
> to introduce a customizable Rule[SparkPlan] between sparkPlan and executedPlan.
> We could add an extension called "ExecutedPlanPrepRule" to SparkSessionExtensions, which would
> allow users to add their own rules.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
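As a concrete illustration of the resolution, a Rule[SparkPlan] can already be registered through the existing columnar-rule hook in SparkSessionExtensions: preColumnarTransitions runs against the physical plan during preparation, before columnar transitions are inserted. A minimal sketch follows; the names MyPrepRule and MyExtensions are made up for illustration, and the rule body is a no-op placeholder.

{code:java}
import org.apache.spark.sql.{SparkSessionExtensions, SparkSessionExtensionsProvider}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.{ColumnarRule, SparkPlan}

// Placeholder Rule[SparkPlan]; a real rule would rewrite the physical plan here.
case class MyPrepRule() extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = plan
}

// Registers the rule through the existing columnar extension point.
class MyExtensions extends SparkSessionExtensionsProvider {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectColumnar { _ =>
      new ColumnarRule {
        // Applied to the physical plan before columnar transitions are inserted.
        override def preColumnarTransitions: Rule[SparkPlan] = MyPrepRule()
      }
    }
  }
}
{code}

It would then be enabled with something like spark.sql.extensions=com.example.MyExtensions (the package name is hypothetical).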
[jira] [Updated] (SPARK-46240) Add ExecutedPlanPrepRules to SparkSessionExtensions
[ https://issues.apache.org/jira/browse/SPARK-46240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiang13021 updated SPARK-46240: --- Description: Some rules (Rule[SparkPlan]) are applied when preparing for the executedPlan. However, users do not have the ability to add rules in this context. {code:java} // org.apache.spark.sql.execution.QueryExecution#preparations private[execution] def preparations( sparkSession: SparkSession, adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None, subquery: Boolean): Seq[Rule[SparkPlan]] = { // `AdaptiveSparkPlanExec` is a leaf node. If inserted, all the following rules will be no-op // as the original plan is hidden behind `AdaptiveSparkPlanExec`. adaptiveExecutionRule.toSeq ++ Seq( CoalesceBucketsInJoin, PlanDynamicPruningFilters(sparkSession), PlanSubqueries(sparkSession), RemoveRedundantProjects, EnsureRequirements(), // `ReplaceHashWithSortAgg` needs to be added after `EnsureRequirements` to guarantee the // sort order of each node is checked to be valid. ReplaceHashWithSortAgg, // `RemoveRedundantSorts` needs to be added after `EnsureRequirements` to guarantee the same // number of partitions when instantiating PartitioningCollection. RemoveRedundantSorts, DisableUnnecessaryBucketedScan, ApplyColumnarRulesAndInsertTransitions( sparkSession.sessionState.columnarRules, outputsColumnar = false), CollapseCodegenStages()) ++ (if (subquery) { Nil } else { Seq(ReuseExchangeAndSubquery) }) }{code} We need to add some "Rule[SparkPlan]"s at this position because currently, all such rules are present in AQE, which requires users to use AQE and meet the requirements to enter AdaptiveSparkPlanExec. This makes it difficult to implement certain extensions for simple SQLs. For example, adding some new datasource filters for external data sources is challenging. Modifying DataSourceStrategy directly is not conducive to staying in sync with future advancements in the community. Additionally, customizing the Strategy makes it difficult to append new functionalities in an incremental manner. If we define AQE rules, they would not be effective for the simplest 'SELECT * FROM ... WHERE ...' statements. Therefore, it is necessary to introduce a customizable Rule[SparkPlan] between sparkPlan and executedPlan. We could add an extension called "ExecutedPlanPrepRule" to SparkSessionExtensions, which would allow users to add their own rules. was: Some rules (Rule[SparkPlan]) are applied when preparing for the executedPlan. However, users do not have the ability to add rules in this context. {code:java} // org.apache.spark.sql.execution.QueryExecution#preparations private[execution] def preparations( sparkSession: SparkSession, adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None, subquery: Boolean): Seq[Rule[SparkPlan]] = { // `AdaptiveSparkPlanExec` is a leaf node. If inserted, all the following rules will be no-op // as the original plan is hidden behind `AdaptiveSparkPlanExec`. adaptiveExecutionRule.toSeq ++ Seq( CoalesceBucketsInJoin, PlanDynamicPruningFilters(sparkSession), PlanSubqueries(sparkSession), RemoveRedundantProjects, EnsureRequirements(), // `ReplaceHashWithSortAgg` needs to be added after `EnsureRequirements` to guarantee the // sort order of each node is checked to be valid. ReplaceHashWithSortAgg, // `RemoveRedundantSorts` needs to be added after `EnsureRequirements` to guarantee the same // number of partitions when instantiating PartitioningCollection. 
RemoveRedundantSorts, DisableUnnecessaryBucketedScan, ApplyColumnarRulesAndInsertTransitions( sparkSession.sessionState.columnarRules, outputsColumnar = false), CollapseCodegenStages()) ++ (if (subquery) { Nil } else { Seq(ReuseExchangeAndSubquery) }) }{code} We could add an extension called "PrepExecutedPlanRule" to SparkSessionExtensions, which would allow users to add their own rules. Summary: Add ExecutedPlanPrepRules to SparkSessionExtensions (was: Add PrepExecutedPlanRule to SparkSessionExtensions) > Add ExecutedPlanPrepRules to SparkSessionExtensions > --- > > Key: SPARK-46240 > URL: https://issues.apache.org/jira/browse/SPARK-46240 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0, 3.3.0, 3.4.0 >Reporter: jiang13021 >Priority: Major > > Some rules (Rule[SparkPlan]) are applied when preparing for the executedPlan. > However, users do not have the ability to add rules in this context. > {code:java} > // org.apache.spark.sql.execution.QueryExecution#preparations > private[execution] def preparations( > sparkSess
[jira] [Created] (SPARK-46240) Add PrepExecutedPlanRule to SparkSessionExtensions
jiang13021 created SPARK-46240:
----------------------------------

             Summary: Add PrepExecutedPlanRule to SparkSessionExtensions
                 Key: SPARK-46240
                 URL: https://issues.apache.org/jira/browse/SPARK-46240
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.4.0, 3.3.0, 3.2.0
            Reporter: jiang13021

Some rules (Rule[SparkPlan]) are applied when preparing for the executedPlan. However, users do not have the ability to add rules in this context.

{code:java}
// org.apache.spark.sql.execution.QueryExecution#preparations
private[execution] def preparations(
    sparkSession: SparkSession,
    adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None,
    subquery: Boolean): Seq[Rule[SparkPlan]] = {
  // `AdaptiveSparkPlanExec` is a leaf node. If inserted, all the following rules will be no-op
  // as the original plan is hidden behind `AdaptiveSparkPlanExec`.
  adaptiveExecutionRule.toSeq ++
  Seq(
    CoalesceBucketsInJoin,
    PlanDynamicPruningFilters(sparkSession),
    PlanSubqueries(sparkSession),
    RemoveRedundantProjects,
    EnsureRequirements(),
    // `ReplaceHashWithSortAgg` needs to be added after `EnsureRequirements` to guarantee the
    // sort order of each node is checked to be valid.
    ReplaceHashWithSortAgg,
    // `RemoveRedundantSorts` needs to be added after `EnsureRequirements` to guarantee the same
    // number of partitions when instantiating PartitioningCollection.
    RemoveRedundantSorts,
    DisableUnnecessaryBucketedScan,
    ApplyColumnarRulesAndInsertTransitions(
      sparkSession.sessionState.columnarRules, outputsColumnar = false),
    CollapseCodegenStages()) ++
  (if (subquery) {
    Nil
  } else {
    Seq(ReuseExchangeAndSubquery)
  })
}{code}

We could add an extension called "PrepExecutedPlanRule" to SparkSessionExtensions, which would allow users to add their own rules.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
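To make the proposal concrete, a usage sketch of the requested extension point is shown below. It is entirely hypothetical: injectPrepExecutedPlanRule does not exist in Spark, and MyExecutedPlanPrepRule is only a placeholder; the snippet illustrates the shape of the API being asked for, not working code.

{code:java}
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan

// Hypothetical user rule that would run between sparkPlan and executedPlan.
case class MyExecutedPlanPrepRule(session: SparkSession) extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = plan // e.g. push extra filters into the scan
}

class MyExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    // Proposed API (does not exist today): QueryExecution#preparations would append
    // these rules alongside the built-in preparation rules.
    extensions.injectPrepExecutedPlanRule(session => MyExecutedPlanPrepRule(session))
  }
}
{code}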
[jira] [Updated] (SPARK-46240) Add PrepExecutedPlanRule to SparkSessionExtensions
[ https://issues.apache.org/jira/browse/SPARK-46240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiang13021 updated SPARK-46240: --- Description: Some rules (Rule[SparkPlan]) are applied when preparing for the executedPlan. However, users do not have the ability to add rules in this context. {code:java} // org.apache.spark.sql.execution.QueryExecution#preparations private[execution] def preparations( sparkSession: SparkSession, adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None, subquery: Boolean): Seq[Rule[SparkPlan]] = { // `AdaptiveSparkPlanExec` is a leaf node. If inserted, all the following rules will be no-op // as the original plan is hidden behind `AdaptiveSparkPlanExec`. adaptiveExecutionRule.toSeq ++ Seq( CoalesceBucketsInJoin, PlanDynamicPruningFilters(sparkSession), PlanSubqueries(sparkSession), RemoveRedundantProjects, EnsureRequirements(), // `ReplaceHashWithSortAgg` needs to be added after `EnsureRequirements` to guarantee the // sort order of each node is checked to be valid. ReplaceHashWithSortAgg, // `RemoveRedundantSorts` needs to be added after `EnsureRequirements` to guarantee the same // number of partitions when instantiating PartitioningCollection. RemoveRedundantSorts, DisableUnnecessaryBucketedScan, ApplyColumnarRulesAndInsertTransitions( sparkSession.sessionState.columnarRules, outputsColumnar = false), CollapseCodegenStages()) ++ (if (subquery) { Nil } else { Seq(ReuseExchangeAndSubquery) }) }{code} We could add an extension called "PrepExecutedPlanRule" to SparkSessionExtensions, which would allow users to add their own rules. was: Some rules (Rule[SparkPlan]) are applied when preparing for the executedPlan. However, users do not have the ability to add rules in this context. {code:java} // org.apache.spark.sql.execution.QueryExecution#preparations private[execution] def preparations( sparkSession: SparkSession, adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None, subquery: Boolean): Seq[Rule[SparkPlan]] = { // `AdaptiveSparkPlanExec` is a leaf node. If inserted, all the following rules will be no-op // as the original plan is hidden behind `AdaptiveSparkPlanExec`. adaptiveExecutionRule.toSeq ++ Seq( CoalesceBucketsInJoin, PlanDynamicPruningFilters(sparkSession), PlanSubqueries(sparkSession), RemoveRedundantProjects, EnsureRequirements(), // `ReplaceHashWithSortAgg` needs to be added after `EnsureRequirements` to guarantee the // sort order of each node is checked to be valid. ReplaceHashWithSortAgg, // `RemoveRedundantSorts` needs to be added after `EnsureRequirements` to guarantee the same // number of partitions when instantiating PartitioningCollection. RemoveRedundantSorts, DisableUnnecessaryBucketedScan, ApplyColumnarRulesAndInsertTransitions( sparkSession.sessionState.columnarRules, outputsColumnar = false), CollapseCodegenStages()) ++ (if (subquery) { Nil } else { Seq(ReuseExchangeAndSubquery) }) }{code} We could add an extension called "PrepExecutedPlanRule" to SparkSessionExtensions, which would allow users to add their own rules. > Add PrepExecutedPlanRule to SparkSessionExtensions > -- > > Key: SPARK-46240 > URL: https://issues.apache.org/jira/browse/SPARK-46240 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0, 3.3.0, 3.4.0 >Reporter: jiang13021 >Priority: Major > > Some rules (Rule[SparkPlan]) are applied when preparing for the executedPlan. > However, users do not have the ability to add rules in this context. 
> {code:java} > // org.apache.spark.sql.execution.QueryExecution#preparations > private[execution] def preparations( > sparkSession: SparkSession, > adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None, > subquery: Boolean): Seq[Rule[SparkPlan]] = { > // `AdaptiveSparkPlanExec` is a leaf node. If inserted, all the following > rules will be no-op > // as the original plan is hidden behind `AdaptiveSparkPlanExec`. > adaptiveExecutionRule.toSeq ++ > Seq( > CoalesceBucketsInJoin, > PlanDynamicPruningFilters(sparkSession), > PlanSubqueries(sparkSession), > RemoveRedundantProjects, > EnsureRequirements(), > // `ReplaceHashWithSortAgg` needs to be added after `EnsureRequirements` > to guarantee the > // sort order of each node is checked to be valid. > ReplaceHashWithSortAgg, > // `RemoveRedundantSorts` needs to be added after `EnsureRequirements` to > guarantee the same > // number of partitions when instantiating PartitioningCollection. > RemoveRedundantSorts, > DisableUnnecessaryBucketedScan, >
[jira] [Created] (SPARK-43218) Support "ESCAPE BY" in SparkScriptTransformationExec
jiang13021 created SPARK-43218:
----------------------------------

             Summary: Support "ESCAPE BY" in SparkScriptTransformationExec
                 Key: SPARK-43218
                 URL: https://issues.apache.org/jira/browse/SPARK-43218
             Project: Spark
          Issue Type: Wish
          Components: SQL
    Affects Versions: 3.4.0, 3.3.0, 3.2.0
            Reporter: jiang13021

If I don't `set spark.sql.catalogImplementation=hive`, I can't use "SELECT TRANSFORM" with "ESCAPE BY". Although HiveScriptTransform also doesn't implement ESCAPE BY, I can use RowFormatSerde to achieve this ability.

In fact, HiveScriptTransform doesn't need to connect to the Hive Metastore. I can use reflection to forcibly call HiveScriptTransformationExec without connecting to the Hive Metastore, and it works properly. Maybe HiveScriptTransform can be made more generic.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
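For reference, the row-format-serde route alluded to above looks roughly like the following when spark.sql.catalogImplementation=hive is set, so that escaping is handled by LazySimpleSerDe. The table and column names and the serde properties (field.delim, escape.delim) are illustrative assumptions; this is a sketch of the workaround, not a recipe verified against every Spark version.

{code:java}
// Rough sketch: SELECT TRANSFORM with a Hive serde row format, relying on
// LazySimpleSerDe's escape handling (requires the Hive catalog).
spark.sql("""
  SELECT TRANSFORM (key, value)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    WITH SERDEPROPERTIES ('field.delim' = '\t', 'escape.delim' = '\\')
    USING 'cat' AS (k STRING, v STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    WITH SERDEPROPERTIES ('field.delim' = '\t')
  FROM mytable
""").show()
{code}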
[jira] [Commented] (SPARK-42552) Get ParseException when run sql: "SELECT 1 UNION SELECT 1;"
[ https://issues.apache.org/jira/browse/SPARK-42552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696395#comment-17696395 ]

jiang13021 commented on SPARK-42552:
------------------------------------

The problem may be in this location: [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala#L126]

When the `PredictionMode` is `SLL`, `AstBuilder` throws `ParseException` instead of `ParseCancellationException`, so the parser does not fall back to `LL` mode. In fact, if we use `LL` mode, the SQL parses correctly.

> Get ParseException when run sql: "SELECT 1 UNION SELECT 1;"
> ------------------------------------------------------------
>
>                 Key: SPARK-42552
>                 URL: https://issues.apache.org/jira/browse/SPARK-42552
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.3
>         Environment: Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_345)
>                      Spark version 3.2.3-SNAPSHOT
>            Reporter: jiang13021
>            Priority: Major
>             Fix For: 3.2.3
>
> When I run sql
> {code:java}
> scala> spark.sql("SELECT 1 UNION SELECT 1;") {code}
> I get ParseException:
> {code:java}
> org.apache.spark.sql.catalyst.parser.ParseException:
> mismatched input 'SELECT' expecting {<EOF>, ';'}(line 1, pos 15)
>
> == SQL ==
> SELECT 1 UNION SELECT 1;
> ---^^^
>
> at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:266)
> at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:127)
> at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51)
> at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:77)
> at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:616)
> at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
> at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:616)
> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
> ... 47 elided
> {code}
> If I run with parentheses, it works well
> {code:java}
> scala> spark.sql("(SELECT 1) UNION (SELECT 1);")
> res4: org.apache.spark.sql.DataFrame = [1: int]{code}
> This should be a bug

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
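For context, the two-stage ANTLR strategy that ParseDriver follows is sketched below in simplified form (not the exact Spark code): only a ParseCancellationException from the fast SLL pass triggers the LL retry, which is why an eagerly thrown ParseException short-circuits the fallback described in the comment. The parser, token stream, and result callback are passed in as stand-ins for the real ones.

{code:java}
import org.antlr.v4.runtime.{CommonTokenStream, Parser}
import org.antlr.v4.runtime.atn.PredictionMode
import org.antlr.v4.runtime.misc.ParseCancellationException

// Simplified sketch of ParseDriver's two-stage parse.
def parseTwoStage[T](parser: Parser, tokenStream: CommonTokenStream)(toResult: => T): T = {
  try {
    parser.getInterpreter.setPredictionMode(PredictionMode.SLL)
    toResult // fast SLL pass; true ambiguities surface as ParseCancellationException
  } catch {
    case _: ParseCancellationException =>
      // Only this exception reaches the fallback. If the visitor throws a
      // ParseException here instead, the LL retry below never runs.
      tokenStream.seek(0)
      parser.reset()
      parser.getInterpreter.setPredictionMode(PredictionMode.LL)
      toResult // slower but fully general LL pass
  }
}
{code}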
[jira] [Updated] (SPARK-42552) Get ParseException when run sql: "SELECT 1 UNION SELECT 1;"
[ https://issues.apache.org/jira/browse/SPARK-42552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiang13021 updated SPARK-42552: --- Priority: Major (was: Minor) > Get ParseException when run sql: "SELECT 1 UNION SELECT 1;" > --- > > Key: SPARK-42552 > URL: https://issues.apache.org/jira/browse/SPARK-42552 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.3 > Environment: Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java > 1.8.0_345) > Spark version 3.2.3-SNAPSHOT >Reporter: jiang13021 >Priority: Major > Fix For: 3.2.3 > > > When I run sql > {code:java} > scala> spark.sql("SELECT 1 UNION SELECT 1;") {code} > I get ParseException: > {code:java} > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input 'SELECT' expecting {, ';'}(line 1, pos 15)== SQL == > SELECT 1 UNION SELECT 1; > ---^^^ at > org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:266) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:127) > at > org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:77) > at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:616) > at > org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111) > at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:616) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613) > ... 47 elided > {code} > If I run with parentheses , it works well > {code:java} > scala> spark.sql("(SELECT 1) UNION (SELECT 1);") > res4: org.apache.spark.sql.DataFrame = [1: int]{code} > This should be a bug > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42553) NonReserved keyword "interval" can't be column name
[ https://issues.apache.org/jira/browse/SPARK-42553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiang13021 updated SPARK-42553: --- Affects Version/s: 3.3.2 3.3.1 3.3.0 > NonReserved keyword "interval" can't be column name > --- > > Key: SPARK-42553 > URL: https://issues.apache.org/jira/browse/SPARK-42553 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.3.1, 3.2.3, 3.3.2 > Environment: Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java > 1.8.0_345) > Spark version 3.2.3-SNAPSHOT >Reporter: jiang13021 >Priority: Major > > INTERVAL is a Non-Reserved keyword in spark. "Non-Reserved keywords" have a > special meaning in particular contexts and can be used as identifiers in > other contexts. So by design, interval can be used as a column name. > {code:java} > scala> spark.sql("select interval from mytable") > org.apache.spark.sql.catalyst.parser.ParseException: > at least one time unit should be given for interval literal(line 1, pos 7)== > SQL == > select interval from mytable > ---^^^ at > org.apache.spark.sql.errors.QueryParsingErrors$.invalidIntervalLiteralError(QueryParsingErrors.scala:196) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$parseIntervalLiteral$1(AstBuilder.scala:2481) > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.parseIntervalLiteral(AstBuilder.scala:2466) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitInterval$1(AstBuilder.scala:2432) > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:2431) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:57) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser$IntervalContext.accept(SqlBaseParser.java:17308) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitChildren(AstBuilder.scala:71) > at > org.apache.spark.sql.catalyst.parser.SqlBaseBaseVisitor.visitIntervalLiteral(SqlBaseBaseVisitor.java:1581) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser$IntervalLiteralContext.accept(SqlBaseParser.java:16929) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitChildren(AstBuilder.scala:71) > at > org.apache.spark.sql.catalyst.parser.SqlBaseBaseVisitor.visitConstantDefault(SqlBaseBaseVisitor.java:1511) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser$ConstantDefaultContext.accept(SqlBaseParser.java:15905) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitChildren(AstBuilder.scala:71) > at > org.apache.spark.sql.catalyst.parser.SqlBaseBaseVisitor.visitValueExpressionDefault(SqlBaseBaseVisitor.java:1392) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser$ValueExpressionDefaultContext.accept(SqlBaseParser.java:15298) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:61) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.expression(AstBuilder.scala:1412) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitPredicated$1(AstBuilder.scala:1548) > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitPredicated(AstBuilder.scala:1547) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitPredicated(AstBuilder.scala:57) > at > 
org.apache.spark.sql.catalyst.parser.SqlBaseParser$PredicatedContext.accept(SqlBaseParser.java:14745) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitChildren(AstBuilder.scala:71) > at > org.apache.spark.sql.catalyst.parser.SqlBaseBaseVisitor.visitExpression(SqlBaseBaseVisitor.java:1343) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser$ExpressionContext.accept(SqlBaseParser.java:14606) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:61) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.expression(AstBuilder.scala:1412) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitNamedExpression$1(AstBuilder.scala:1434) > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitNamedExpression(AstBuilder.scala:1433) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitNamedExpression(AstBuilder.scala:57) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser$NamedExpressionContext.accept(SqlBaseParser.java:14124) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(Ast
[jira] [Created] (SPARK-42553) NonReserved keyword "interval" can't be column name
jiang13021 created SPARK-42553: -- Summary: NonReserved keyword "interval" can't be column name Key: SPARK-42553 URL: https://issues.apache.org/jira/browse/SPARK-42553 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.3 Environment: Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_345) Spark version 3.2.3-SNAPSHOT Reporter: jiang13021 INTERVAL is a Non-Reserved keyword in spark. "Non-Reserved keywords" have a special meaning in particular contexts and can be used as identifiers in other contexts. So by design, interval can be used as a column name. {code:java} scala> spark.sql("select interval from mytable") org.apache.spark.sql.catalyst.parser.ParseException: at least one time unit should be given for interval literal(line 1, pos 7)== SQL == select interval from mytable ---^^^ at org.apache.spark.sql.errors.QueryParsingErrors$.invalidIntervalLiteralError(QueryParsingErrors.scala:196) at org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$parseIntervalLiteral$1(AstBuilder.scala:2481) at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133) at org.apache.spark.sql.catalyst.parser.AstBuilder.parseIntervalLiteral(AstBuilder.scala:2466) at org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitInterval$1(AstBuilder.scala:2432) at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133) at org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:2431) at org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:57) at org.apache.spark.sql.catalyst.parser.SqlBaseParser$IntervalContext.accept(SqlBaseParser.java:17308) at org.apache.spark.sql.catalyst.parser.AstBuilder.visitChildren(AstBuilder.scala:71) at org.apache.spark.sql.catalyst.parser.SqlBaseBaseVisitor.visitIntervalLiteral(SqlBaseBaseVisitor.java:1581) at org.apache.spark.sql.catalyst.parser.SqlBaseParser$IntervalLiteralContext.accept(SqlBaseParser.java:16929) at org.apache.spark.sql.catalyst.parser.AstBuilder.visitChildren(AstBuilder.scala:71) at org.apache.spark.sql.catalyst.parser.SqlBaseBaseVisitor.visitConstantDefault(SqlBaseBaseVisitor.java:1511) at org.apache.spark.sql.catalyst.parser.SqlBaseParser$ConstantDefaultContext.accept(SqlBaseParser.java:15905) at org.apache.spark.sql.catalyst.parser.AstBuilder.visitChildren(AstBuilder.scala:71) at org.apache.spark.sql.catalyst.parser.SqlBaseBaseVisitor.visitValueExpressionDefault(SqlBaseBaseVisitor.java:1392) at org.apache.spark.sql.catalyst.parser.SqlBaseParser$ValueExpressionDefaultContext.accept(SqlBaseParser.java:15298) at org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:61) at org.apache.spark.sql.catalyst.parser.AstBuilder.expression(AstBuilder.scala:1412) at org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitPredicated$1(AstBuilder.scala:1548) at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133) at org.apache.spark.sql.catalyst.parser.AstBuilder.visitPredicated(AstBuilder.scala:1547) at org.apache.spark.sql.catalyst.parser.AstBuilder.visitPredicated(AstBuilder.scala:57) at org.apache.spark.sql.catalyst.parser.SqlBaseParser$PredicatedContext.accept(SqlBaseParser.java:14745) at org.apache.spark.sql.catalyst.parser.AstBuilder.visitChildren(AstBuilder.scala:71) at org.apache.spark.sql.catalyst.parser.SqlBaseBaseVisitor.visitExpression(SqlBaseBaseVisitor.java:1343) at org.apache.spark.sql.catalyst.parser.SqlBaseParser$ExpressionContext.accept(SqlBaseParser.java:14606) at 
org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:61) at org.apache.spark.sql.catalyst.parser.AstBuilder.expression(AstBuilder.scala:1412) at org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitNamedExpression$1(AstBuilder.scala:1434) at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133) at org.apache.spark.sql.catalyst.parser.AstBuilder.visitNamedExpression(AstBuilder.scala:1433) at org.apache.spark.sql.catalyst.parser.AstBuilder.visitNamedExpression(AstBuilder.scala:57) at org.apache.spark.sql.catalyst.parser.SqlBaseParser$NamedExpressionContext.accept(SqlBaseParser.java:14124) at org.apache.spark.sql.catalyst.parser.AstBuilder.typedVisit(AstBuilder.scala:61) at org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitNamedExpressionSeq$2(AstBuilder.scala:628) at scala.collection.immutable.List.map(List.scala:293) at org.apache.spark.sql.catalyst.parser.AstBuilder.visitNamedExpressionSeq(AstBuilder.scala:628) at org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$withSelectQuerySpecification$1(AstBuilder.scala:734) at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(Pars
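A note on the practical workaround implied by the report: quoting the identifier with backticks keeps the parser from treating it as the start of an interval literal, so the column stays usable while the keyword handling is discussed. Minimal illustration, assuming mytable has a column named interval:

{code:java}
// `select interval from mytable` fails as shown above, but a backquoted identifier
// is parsed as a plain column reference rather than an interval literal.
spark.sql("SELECT `interval` FROM mytable").show()
{code}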
[jira] [Updated] (SPARK-42552) Get ParseException when run sql: "SELECT 1 UNION SELECT 1;"
[ https://issues.apache.org/jira/browse/SPARK-42552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiang13021 updated SPARK-42552: --- Summary: Get ParseException when run sql: "SELECT 1 UNION SELECT 1;" (was: Got ParseException when run sql: "SELECT 1 UNION SELECT 1;") > Get ParseException when run sql: "SELECT 1 UNION SELECT 1;" > --- > > Key: SPARK-42552 > URL: https://issues.apache.org/jira/browse/SPARK-42552 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.3 > Environment: Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java > 1.8.0_345) > Spark version 3.2.3-SNAPSHOT >Reporter: jiang13021 >Priority: Minor > Fix For: 3.2.3 > > > When I run sql > {code:java} > scala> spark.sql("SELECT 1 UNION SELECT 1;") {code} > I get ParseException: > {code:java} > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input 'SELECT' expecting {, ';'}(line 1, pos 15)== SQL == > SELECT 1 UNION SELECT 1; > ---^^^ at > org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:266) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:127) > at > org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:77) > at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:616) > at > org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111) > at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:616) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613) > ... 47 elided > {code} > If I run with parentheses , it works well > {code:java} > scala> spark.sql("(SELECT 1) UNION (SELECT 1);") > res4: org.apache.spark.sql.DataFrame = [1: int]{code} > This should be a bug > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42552) Got ParseException when run sql: "SELECT 1 UNION SELECT 1;"
[ https://issues.apache.org/jira/browse/SPARK-42552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiang13021 updated SPARK-42552: --- Description: When I run sql {code:java} scala> spark.sql("SELECT 1 UNION SELECT 1;") {code} I get ParseException: {code:java} org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'SELECT' expecting {, ';'}(line 1, pos 15)== SQL == SELECT 1 UNION SELECT 1; ---^^^ at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:266) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:127) at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:77) at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:616) at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111) at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:616) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613) ... 47 elided {code} If I run with parentheses , it works well {code:java} scala> spark.sql("(SELECT 1) UNION (SELECT 1);") res4: org.apache.spark.sql.DataFrame = [1: int]{code} This should be a bug was: When I run sql {code:java} scala> spark.sql("SELECT 1 UNION SELECT 1;") {code} I get ParseException: {code:java} org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'SELECT' expecting {, ';'}(line 1, pos 15)== SQL == SELECT 1 UNION SELECT 1; ---^^^ at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:266) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:127) at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:77) at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:616) at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111) at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:616) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613) ... 
47 elided {code} If I run with parentheses , it works well {code:java} scala> spark.sql("(SELECT 1) UNION (SELECT 1);") res4: org.apache.spark.sql.DataFrame = [1: int]{code} This should be a bug > Got ParseException when run sql: "SELECT 1 UNION SELECT 1;" > --- > > Key: SPARK-42552 > URL: https://issues.apache.org/jira/browse/SPARK-42552 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.3 > Environment: Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java > 1.8.0_345) > Spark version 3.2.3-SNAPSHOT >Reporter: jiang13021 >Priority: Minor > Fix For: 3.2.3 > > > When I run sql > {code:java} > scala> spark.sql("SELECT 1 UNION SELECT 1;") {code} > I get ParseException: > {code:java} > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input 'SELECT' expecting {, ';'}(line 1, pos 15)== SQL == > SELECT 1 UNION SELECT 1; > ---^^^ at > org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:266) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:127) > at > org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:77) > at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:616) > at > org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111) > at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:616) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613) > ... 47 elided > {code} > If I run with parentheses , it works well > {code:java} > scala> spark.sql("(SELECT 1) UNION (SELECT 1);") > res4: org.apache.spark.sql.DataFrame = [1: int]{code} > This should be a bug > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42552) Got ParseException when run sql: "SELECT 1 UNION SELECT 1;"
jiang13021 created SPARK-42552:
----------------------------------

             Summary: Got ParseException when run sql: "SELECT 1 UNION SELECT 1;"
                 Key: SPARK-42552
                 URL: https://issues.apache.org/jira/browse/SPARK-42552
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.3
         Environment: Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_345)
                      Spark version 3.2.3-SNAPSHOT
            Reporter: jiang13021
             Fix For: 3.2.3

When I run sql

{code:java}
scala> spark.sql("SELECT 1 UNION SELECT 1;") {code}

I get ParseException:

{code:java}
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'SELECT' expecting {<EOF>, ';'}(line 1, pos 15)

== SQL ==
SELECT 1 UNION SELECT 1;
---^^^

at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:266)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:127)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:77)
at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:616)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:616)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
... 47 elided
{code}

If I run with parentheses, it works well:

{code:java}
scala> spark.sql("(SELECT 1) UNION (SELECT 1);")
res4: org.apache.spark.sql.DataFrame = [1: int]{code}

This should be a bug.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org