[jira] [Updated] (SPARK-45383) Missing case for RelationTimeTravel in CheckAnalysis
[ https://issues.apache.org/jira/browse/SPARK-45383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Johnson updated SPARK-45383: - Description: {{CheckAnalysis.checkAnalysis0}} lacks a case for {{{}RelationTimeTravel{}}}, and since the latter is (intentionally) an {{UnresolvedLeafNode}} rather than a {{{}UnaryNode{}}}, the existing checks do not traverse it. Result: Attempting time travel over a non-existing table produces a spark internal error from the [default case|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L818], rather than the expected {{{}AnalysisException{}}}: {code:java} [info] Cause: org.apache.spark.SparkException: [INTERNAL_ERROR] Found the unresolved operator: 'RelationTimeTravel 'UnresolvedRelation [not_exists], [], false, 0 [info] at org.apache.spark.SparkException$.internalError(SparkException.scala:77) [info] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$54(CheckAnalysis.scala:753) {code} Fix should be simple enough: {code:java} case tt: RelationTimeTravel => checkAnalysis0(tt.table) {code} was: {{CheckAnalysis.checkAnalysis0}} lacks a case for {{{}RelationTimeTravel{}}}, and since the latter is (intentionally) an {{UnresolvedLeafNode}} rather than a {{{}UnaryNode{}}}, the existing checks do not traverse it. Result: Attempting time travel over a non-existing table produces a spark internal error from the [default case|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L818], rather than the expected {{{}AnalysisException{}}}: {code:java} [info] Cause: org.apache.spark.SparkException: [INTERNAL_ERROR] Found the unresolved operator: 'RelationTimeTravel 'UnresolvedRelation [not_exists], [], false, 0 [info] at org.apache.spark.SparkException$.internalError(SparkException.scala:77) [info] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$54(CheckAnalysis.scala:753) {code} Solution should be simple enough: {code:java} case tt: RelationTimeTravel => checkAnalysis0(tt.table) {code} > Missing case for RelationTimeTravel in CheckAnalysis > > > Key: SPARK-45383 > URL: https://issues.apache.org/jira/browse/SPARK-45383 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Ryan Johnson >Priority: Major > > {{CheckAnalysis.checkAnalysis0}} lacks a case for {{{}RelationTimeTravel{}}}, > and since the latter is (intentionally) an {{UnresolvedLeafNode}} rather than > a {{{}UnaryNode{}}}, the existing checks do not traverse it. 
> Result: Attempting time travel over a non-existing table produces a spark > internal error from the [default > case|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L818], > rather than the expected {{{}AnalysisException{}}}: > {code:java} > [info] Cause: org.apache.spark.SparkException: [INTERNAL_ERROR] Found the > unresolved operator: 'RelationTimeTravel 'UnresolvedRelation [not_exists], > [], false, 0 > [info] at > org.apache.spark.SparkException$.internalError(SparkException.scala:77) > [info] at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$54(CheckAnalysis.scala:753) > {code} > Fix should be simple enough: > {code:java} > case tt: RelationTimeTravel => > checkAnalysis0(tt.table) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45383) Missing case for RelationTimeTravel in CheckAnalysis
Ryan Johnson created SPARK-45383: Summary: Missing case for RelationTimeTravel in CheckAnalysis Key: SPARK-45383 URL: https://issues.apache.org/jira/browse/SPARK-45383 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.5.0 Reporter: Ryan Johnson {{CheckAnalysis.checkAnalysis0}} lacks a case for {{{}RelationTimeTravel{}}}, and since the latter is (intentionally) an {{UnresolvedLeafNode}} rather than a {{{}UnaryNode{}}}, the existing checks do not traverse it. Result: Attempting time travel over a nonexistent table produces a Spark internal error from the [default case|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L818], rather than the expected {{{}AnalysisException{}}}: {code:java} [info] Cause: org.apache.spark.SparkException: [INTERNAL_ERROR] Found the unresolved operator: 'RelationTimeTravel 'UnresolvedRelation [not_exists], [], false, 0 [info] at org.apache.spark.SparkException$.internalError(SparkException.scala:77) [info] at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$54(CheckAnalysis.scala:753) {code} The fix should be simple enough: {code:java} case tt: RelationTimeTravel => checkAnalysis0(tt.table) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
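For illustration, a minimal repro sketch using only the public SQL API (the table name is taken from the stack trace above; per the description, today this fails with the {{SparkException}} internal error rather than being catchable as an {{AnalysisException}}):
{code:java}
import org.apache.spark.sql.{AnalysisException, SparkSession}

val spark = SparkSession.builder().master("local[1]").getOrCreate()
try {
  // Time travel over a table that does not exist; the parser builds a
  // RelationTimeTravel over an UnresolvedRelation.
  spark.sql("SELECT * FROM not_exists VERSION AS OF 1").collect()
} catch {
  // With the proposed extra case in checkAnalysis0, this is the expected outcome.
  // Without it, a SparkException [INTERNAL_ERROR] propagates instead and this
  // handler never fires.
  case e: AnalysisException => println(s"expected analysis error: ${e.getMessage}")
}
{code}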
[jira] [Updated] (SPARK-45319) SHOW COLUMNS namespace is not catalog-aware
[ https://issues.apache.org/jira/browse/SPARK-45319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Johnson updated SPARK-45319: - Description: Today, the namespace argument of {{SHOW COLUMNS}} command is only partly catalog-aware. [AstBuilder.scala|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L4693] correctly pulls in and applies namespaces with more than one element, but [ResolveSessionCatalog.scala|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala#L316] does not correctly validate a multi-element namespace against the qualified table name: {code:java} case ShowColumns(ResolvedV1TableOrViewIdentifier(ident), ns, output) => val v1TableName = ident val resolver = conf.resolver val db = ns match { case Some(db) if v1TableName.database.exists(!resolver(_, db.head)) => throw QueryCompilationErrors.showColumnsWithConflictDatabasesError(db, v1TableName) case _ => ns.map(_.head) } {code} By always/only checking {{db.head}} against {{{}v1TableName.database{}}}, it will not correctly handle e.g. {code:java} SHOW COLUMNS FOR a.b.table IN a.b -- incorrectly fails {code} There is also some ambiguity in the semantics of the check: {code:java} SHOW COLUMNS FOR a.b.table IN a -- fails today (should it succeed?) SHOW COLUMNS FOR a.b.table IN b -- succeeds today (should it fail?) {code} It might work better to use an actual {{Identifier}} (instead of {{{}TableIdentifier{}}}), and compare its {{namespace}} against the user-provided namespace? Tangentially: It's a arguably a bit strange for the validity check to reside in {{ResolveSessionCatalog}} rule, given that it doesn't actually have anything to do with session catalogs? This seems more an artifact of an implementation that searches specifically for V1 tables, and {{ResolveSessionCatalog}} providing matchers that make that job easier? was: Today, the namespace argument of {{SHOW COLUMNS}} command is only partly catalog-aware. [AstBuilder.scala|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L4693] correctly pulls in and applies namespaces with more than one element, but [ResolveSessionCatalog.scala|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala#L316] does not correctly validate a multi-element namespace against the qualified table name: {code:java} case ShowColumns(ResolvedV1TableOrViewIdentifier(ident), ns, output) => val v1TableName = ident val resolver = conf.resolver val db = ns match { case Some(db) if v1TableName.database.exists(!resolver(_, db.head)) => throw QueryCompilationErrors.showColumnsWithConflictDatabasesError(db, v1TableName) case _ => ns.map(_.head) } {code} By always/only checking {{db.head}} against {{{}v1TableName.database{}}}, it will not correctly handle e.g. {code:java} SHOW COLUMNS FOR a.b.table IN a.b -- incorrectly fails {code} There is also some ambiguity in the semantics of the check: {code:java} SHOW COLUMNS FOR a.b.table IN a -- fails today (should it succeed?) SHOW COLUMNS FOR a.b.table IN b -- succeeds today (should it fail?) {code} A better solution would use an actual {{Identifier}} and compare its {{namespace}} against the user-provided namespace. 
Tangentially: It's arguably a bit strange for the validity check to reside in the {{ResolveSessionCatalog}} rule, given that it doesn't actually have anything to do with session catalogs? This seems more an artifact of an implementation that searches specifically for V1 tables, and {{ResolveSessionCatalog}} providing matchers that make that job easier? > SHOW COLUMNS namespace is not catalog-aware > > > Key: SPARK-45319 > URL: https://issues.apache.org/jira/browse/SPARK-45319 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Ryan Johnson >Priority: Major > > Today, the namespace argument of {{SHOW COLUMNS}} command is only partly > catalog-aware. > [AstBuilder.scala|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L4693] > correctly pulls in and applies namespaces with more than one element, but > [ResolveSessionCatalog.scala|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala#L316] > does not correctly validate a multi-element namespace against the qualified > table name: > {code:java} > case ShowColumns(Reso
[jira] [Updated] (SPARK-45319) SHOW COLUMNS namespace is not catalog-aware
[ https://issues.apache.org/jira/browse/SPARK-45319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Johnson updated SPARK-45319: - Description: Today, the namespace argument of {{SHOW COLUMNS}} command is only partly catalog-aware. [AstBuilder.scala|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L4693] correctly pulls in and applies namespaces with more than one element, but [ResolveSessionCatalog.scala|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala#L316] does not correctly validate a multi-element namespace against the qualified table name: {code:java} case ShowColumns(ResolvedV1TableOrViewIdentifier(ident), ns, output) => val v1TableName = ident val resolver = conf.resolver val db = ns match { case Some(db) if v1TableName.database.exists(!resolver(_, db.head)) => throw QueryCompilationErrors.showColumnsWithConflictDatabasesError(db, v1TableName) case _ => ns.map(_.head) } {code} By always/only checking {{db.head}} against {{{}v1TableName.database{}}}, it will not correctly handle e.g. {code:java} SHOW COLUMNS FOR a.b.table IN a.b -- incorrectly fails {code} There is also ambiguity: {code:java} SHOW COLUMNS FOR a.b.table IN a -- fails today (should it succeed?) SHOW COLUMNS FOR a.b.table IN b -- succeeds today (should it fail?) {code} A better solution would use an actual {{Identifier}} and compare its {{namespace}} against the user-provided namespace. Tangentially: It's a arguably a bit strange for the validity check to reside in {{ResolveSessionCatalog}} rule, given that it doesn't actually have anything to do with session catalogs? This seems more an artifact of an implementation that searches specifically for V1 tables, and {{ResolveSessionCatalog}} providing matchers that make that job easier? was: Today, the namespace argument of {{SHOW COLUMNS}} command is only partly catalog-aware. [AstBuilder.scala|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L4693] correctly pulls in and applies namespaces with more than one element, but [ResolveSessionCatalog.scala|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala#L316] does not correctly validate a multi-element namespace against the qualified table name: {code:java} case ShowColumns(ResolvedV1TableOrViewIdentifier(ident), ns, output) => val v1TableName = ident val resolver = conf.resolver val db = ns match { case Some(db) if v1TableName.database.exists(!resolver(_, db.head)) => throw QueryCompilationErrors.showColumnsWithConflictDatabasesError(db, v1TableName) case _ => ns.map(_.head) } {code} By always/only checking {{db.head}} against {{{}v1TableName.database{}}}, it will not correctly handle e.g. {code:java} SHOW COLUMNS FOR a.b.table IN a.b -- incorrectly fails SHOW COLUMNS FOR a.b.table IN a -- fails today (should it succeed?) SHOW COLUMNS FOR a.b.table IN b -- succeeds today (should it fail?){code} A better solution would use an actual {{Identifier}} and compare its {{namespace}} against the user-provided namespace. Tangentially: It's a arguably a bit strange for the validity check to reside in {{ResolveSessionCatalog}} rule, given that it doesn't actually have anything to do with session catalogs? 
This seems more an artifact of an implementation that searches specifically for V1 tables, and {{ResolveSessionCatalog}} providing matchers that make that job easier? > SHOW COLUMNS namespace is not catalog-aware > > > Key: SPARK-45319 > URL: https://issues.apache.org/jira/browse/SPARK-45319 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Ryan Johnson >Priority: Major > > Today, the namespace argument of {{SHOW COLUMNS}} command is only partly > catalog-aware. > [AstBuilder.scala|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L4693] > correctly pulls in and applies namespaces with more than one element, but > [ResolveSessionCatalog.scala|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala#L316] > does not correctly validate a multi-element namespace against the qualified > table name: > {code:java} > case ShowColumns(ResolvedV1TableOrViewIdentifier(ident), ns, output) => > val v1TableName = ident > val resolver = conf.resolver > val db = ns match { >
[jira] [Updated] (SPARK-45319) SHOW COLUMNS namespace is not catalog-aware
[ https://issues.apache.org/jira/browse/SPARK-45319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Johnson updated SPARK-45319: - Description: Today, the namespace argument of {{SHOW COLUMNS}} command is only partly catalog-aware. [AstBuilder.scala|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L4693] correctly pulls in and applies namespaces with more than one element, but [ResolveSessionCatalog.scala|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala#L316] does not correctly validate a multi-element namespace against the qualified table name: {code:java} case ShowColumns(ResolvedV1TableOrViewIdentifier(ident), ns, output) => val v1TableName = ident val resolver = conf.resolver val db = ns match { case Some(db) if v1TableName.database.exists(!resolver(_, db.head)) => throw QueryCompilationErrors.showColumnsWithConflictDatabasesError(db, v1TableName) case _ => ns.map(_.head) } {code} By always/only checking {{db.head}} against {{{}v1TableName.database{}}}, it will not correctly handle e.g. {code:java} SHOW COLUMNS FOR a.b.table IN a.b -- incorrectly fails {code} There is also some ambiguity in the semantics of the check: {code:java} SHOW COLUMNS FOR a.b.table IN a -- fails today (should it succeed?) SHOW COLUMNS FOR a.b.table IN b -- succeeds today (should it fail?) {code} A better solution would use an actual {{Identifier}} and compare its {{namespace}} against the user-provided namespace. Tangentially: It's a arguably a bit strange for the validity check to reside in {{ResolveSessionCatalog}} rule, given that it doesn't actually have anything to do with session catalogs? This seems more an artifact of an implementation that searches specifically for V1 tables, and {{ResolveSessionCatalog}} providing matchers that make that job easier? was: Today, the namespace argument of {{SHOW COLUMNS}} command is only partly catalog-aware. [AstBuilder.scala|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L4693] correctly pulls in and applies namespaces with more than one element, but [ResolveSessionCatalog.scala|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala#L316] does not correctly validate a multi-element namespace against the qualified table name: {code:java} case ShowColumns(ResolvedV1TableOrViewIdentifier(ident), ns, output) => val v1TableName = ident val resolver = conf.resolver val db = ns match { case Some(db) if v1TableName.database.exists(!resolver(_, db.head)) => throw QueryCompilationErrors.showColumnsWithConflictDatabasesError(db, v1TableName) case _ => ns.map(_.head) } {code} By always/only checking {{db.head}} against {{{}v1TableName.database{}}}, it will not correctly handle e.g. {code:java} SHOW COLUMNS FOR a.b.table IN a.b -- incorrectly fails {code} There is also ambiguity: {code:java} SHOW COLUMNS FOR a.b.table IN a -- fails today (should it succeed?) SHOW COLUMNS FOR a.b.table IN b -- succeeds today (should it fail?) {code} A better solution would use an actual {{Identifier}} and compare its {{namespace}} against the user-provided namespace. Tangentially: It's a arguably a bit strange for the validity check to reside in {{ResolveSessionCatalog}} rule, given that it doesn't actually have anything to do with session catalogs? 
This seems more an artifact of an implementation that searches specifically for V1 tables, and {{ResolveSessionCatalog}} providing matchers that make that job easier? > SHOW COLUMNS namespace is not catalog-aware > > > Key: SPARK-45319 > URL: https://issues.apache.org/jira/browse/SPARK-45319 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Ryan Johnson >Priority: Major > > Today, the namespace argument of {{SHOW COLUMNS}} command is only partly > catalog-aware. > [AstBuilder.scala|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L4693] > correctly pulls in and applies namespaces with more than one element, but > [ResolveSessionCatalog.scala|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala#L316] > does not correctly validate a multi-element namespace against the qualified > table name: > {code:java} > case ShowColumns(ResolvedV1TableOrViewIdentifier(ident), ns, output) => > val v1TableName
[jira] [Updated] (SPARK-45319) SHOW COLUMNS namespace is not catalog-aware
[ https://issues.apache.org/jira/browse/SPARK-45319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Johnson updated SPARK-45319: - Description: Today, the namespace argument of {{SHOW COLUMNS}} command is only partly catalog-aware. [AstBuilder.scala|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L4693] correctly pulls in and applies namespaces with more than one element, but [ResolveSessionCatalog.scala|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala#L316] does not correctly validate a multi-element namespace against the qualified table name: {code:java} case ShowColumns(ResolvedV1TableOrViewIdentifier(ident), ns, output) => val v1TableName = ident val resolver = conf.resolver val db = ns match { case Some(db) if v1TableName.database.exists(!resolver(_, db.head)) => throw QueryCompilationErrors.showColumnsWithConflictDatabasesError(db, v1TableName) case _ => ns.map(_.head) } {code} By always/only checking {{db.head}} against {{{}v1TableName.database{}}}, it will not correctly handle e.g. {code:java} SHOW COLUMNS FOR a.b.table IN a.b -- incorrectly fails SHOW COLUMNS FOR a.b.table IN a -- fails today (should it succeed?) SHOW COLUMNS FOR a.b.table IN b -- succeeds today (should it fail?){code} A better solution would use an actual {{Identifier}} and compare its {{namespace}} against the user-provided namespace. Tangentially: It's a arguably a bit strange for the validity check to reside in {{ResolveSessionCatalog}} rule, given that it doesn't actually have anything to do with session catalogs? This seems more an artifact of an implementation that searches specifically for V1 tables, and {{ResolveSessionCatalog}} providing matchers that make that job easier? was: Today, the namespace argument of {{SHOW COLUMNS}} command is only partly catalog-aware. [AstBuilder.scala|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L4693] correctly pulls in and applies namespaces with more than one element, but [ResolveSessionCatalog.scala|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala#L316] does not correctly validate a multi-element namespace against the qualified table name: {code:java} case ShowColumns(ResolvedV1TableOrViewIdentifier(ident), ns, output) => val v1TableName = ident val resolver = conf.resolver val db = ns match { case Some(db) if v1TableName.database.exists(!resolver(_, db.head)) => throw QueryCompilationErrors.showColumnsWithConflictDatabasesError(db, v1TableName) case _ => ns.map(_.head) } {code} By always/only checking {{db.head}} against {{{}v1TableName.database{}}}, it will not correctly handle e.g. {code:java} SHOW COLUMNS FOR a.b.table IN a -- incorrectly succeeds SHOW COLUMNS FOR a.b.table IN a.b -- incorrectly fails{code} A better solution would use an actual {{Identifier}} and compare its {{namespace}} against the user-provided namespace. Tangentially: It's a arguably a bit strange for the validity check to reside in {{ResolveSessionCatalog}} rule, given that it doesn't actually have anything to do with session catalogs? This seems more an artifact of an implementation that searches specifically for V1 tables, and {{ResolveSessionCatalog}} providing matchers that make that job easier? 
> SHOW COLUMNS namespace is not catalog-aware > > > Key: SPARK-45319 > URL: https://issues.apache.org/jira/browse/SPARK-45319 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Ryan Johnson >Priority: Major > > Today, the namespace argument of {{SHOW COLUMNS}} command is only partly > catalog-aware. > [AstBuilder.scala|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L4693] > correctly pulls in and applies namespaces with more than one element, but > [ResolveSessionCatalog.scala|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala#L316] > does not correctly validate a multi-element namespace against the qualified > table name: > {code:java} > case ShowColumns(ResolvedV1TableOrViewIdentifier(ident), ns, output) => > val v1TableName = ident > val resolver = conf.resolver > val db = ns match { > case Some(db) if v1TableName.database.exists(!resolver(_, db.head)) => > throw > QueryCompilationErrors.showColu
[jira] [Created] (SPARK-45319) SHOW COLUMNS namespace is not catalog-aware
Ryan Johnson created SPARK-45319: Summary: SHOW COLUMNS namespace is not catalog-aware Key: SPARK-45319 URL: https://issues.apache.org/jira/browse/SPARK-45319 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.5.0 Reporter: Ryan Johnson Today, the namespace argument of the {{SHOW COLUMNS}} command is only partly catalog-aware. [AstBuilder.scala|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L4693] correctly pulls in and applies namespaces with more than one element, but [ResolveSessionCatalog.scala|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala#L316] does not correctly validate a multi-element namespace against the qualified table name: {code:java} case ShowColumns(ResolvedV1TableOrViewIdentifier(ident), ns, output) => val v1TableName = ident val resolver = conf.resolver val db = ns match { case Some(db) if v1TableName.database.exists(!resolver(_, db.head)) => throw QueryCompilationErrors.showColumnsWithConflictDatabasesError(db, v1TableName) case _ => ns.map(_.head) } {code} By always/only checking {{db.head}} against {{{}v1TableName.database{}}}, it will not correctly handle e.g. {code:java} SHOW COLUMNS FOR a.b.table IN a -- incorrectly succeeds SHOW COLUMNS FOR a.b.table IN a.b -- incorrectly fails{code} A better solution would use an actual {{Identifier}} and compare its {{namespace}} against the user-provided namespace. Tangentially: It's arguably a bit strange for the validity check to reside in the {{ResolveSessionCatalog}} rule, given that it doesn't actually have anything to do with session catalogs? This seems more an artifact of an implementation that searches specifically for V1 tables, and {{ResolveSessionCatalog}} providing matchers that make that job easier? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
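A hedged sketch of what the {{Identifier}}-based comparison could look like; the helper name and the exact conflict semantics are assumptions, not existing Spark code:
{code:java}
import org.apache.spark.sql.connector.catalog.Identifier

// Compare the user-supplied namespace against the namespace the table actually
// resolved to, element by element, using the session's resolver for case rules.
def namespaceConflicts(
    tableIdent: Identifier,                       // fully resolved, e.g. a.b.table
    userNamespace: Seq[String],                   // from IN/FROM, e.g. Seq("a", "b")
    resolver: (String, String) => Boolean): Boolean = {
  userNamespace.nonEmpty &&
    (userNamespace.length != tableIdent.namespace().length ||
      userNamespace.zip(tableIdent.namespace()).exists { case (u, t) => !resolver(u, t) })
}
{code}
Under this shape, {{IN a.b}} for table {{a.b.table}} matches element by element instead of only against {{db.head}}, and the single-element cases above become an explicit policy decision rather than an accident of the check.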
[jira] [Updated] (SPARK-44081) Simplify PartitionedFileUtil API a little
[ https://issues.apache.org/jira/browse/SPARK-44081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Johnson updated SPARK-44081: - Summary: Simplify PartitionedFileUtil API a little (was: Simplify PartitionedFileUtil API) > Simplify PartitionedFileUtil API a little > - > > Key: SPARK-44081 > URL: https://issues.apache.org/jira/browse/SPARK-44081 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Ryan Johnson >Priority: Major > > Utility methods in {{PartitionedFileUtil}} take both {{file: > FileStatusWithMetadata}} and {{{}filePath: Path{}}}, even tho the latter can > be obtained easily from the former. > This was originally done for performance reasons, to allow callers to pass a > memoized {{Path}} and thus avoid the cost of an additional > {{FileStatus.getPath}} call. > Now that we have {{{}FileStatusWithMetadata{}}}, we no longer need the > redundancy because the new class can capture the path as a {{{}lazy val{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44081) Simplify PartitionedFileUtil API
Ryan Johnson created SPARK-44081: Summary: Simplify PartitionedFileUtil API Key: SPARK-44081 URL: https://issues.apache.org/jira/browse/SPARK-44081 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.5.0 Reporter: Ryan Johnson Utility methods in {{PartitionedFileUtil}} take both {{file: FileStatusWithMetadata}} and {{{}filePath: Path{}}}, even though the latter can be obtained easily from the former. This was originally done for performance reasons, to allow callers to pass a memoized {{Path}} and thus avoid the cost of an additional {{FileStatus.getPath}} call. Now that we have {{{}FileStatusWithMetadata{}}}, we no longer need the redundancy because the new class can capture the path as a {{{}lazy val{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
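A minimal sketch of the idea, with field names assumed rather than copied from the actual class:
{code:java}
import org.apache.hadoop.fs.{FileStatus, Path}

case class FileStatusWithMetadata(
    fileStatus: FileStatus,
    metadata: Map[String, Any] = Map.empty) {
  // Computed at most once, and only if someone actually asks for it, so callers
  // no longer need to thread a separately memoized filePath through the API.
  lazy val getPath: Path = fileStatus.getPath
}
{code}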
[jira] [Created] (SPARK-44071) Define UnresolvedNode trait to reduce redundancy
Ryan Johnson created SPARK-44071: Summary: Define UnresolvedNode trait to reduce redundancy Key: SPARK-44071 URL: https://issues.apache.org/jira/browse/SPARK-44071 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.5.0 Reporter: Ryan Johnson Looking at [unresolved.scala|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala], Spark would benefit from an {{UnresolvedNode}} trait that various {{UnresolvedFoo}} classes could inherit from: {code:java} trait UnresolvedNode extends LogicalPlan { override def output: Seq[Attribute] = Nil override lazy val resolved = false }{code} Today, the code is duplicated in ~20 locations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
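A hedged sketch of how an unresolved operator could shrink once it mixes in the proposed trait ({{UnresolvedFoo}} is a placeholder, not a real node):
{code:java}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// With the trait, `output` and `resolved` come from UnresolvedNode; only the
// node-specific pieces remain.
case class UnresolvedFoo(child: LogicalPlan) extends UnresolvedNode {
  override def children: Seq[LogicalPlan] = child :: Nil
  override protected def withNewChildrenInternal(
      newChildren: IndexedSeq[LogicalPlan]): LogicalPlan = copy(child = newChildren.head)
}
{code}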
[jira] [Created] (SPARK-43226) Define extractors for file-constant metadata columns
Ryan Johnson created SPARK-43226: Summary: Define extractors for file-constant metadata columns Key: SPARK-43226 URL: https://issues.apache.org/jira/browse/SPARK-43226 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.4.0 Reporter: Ryan Johnson File-source constant metadata columns are often derived indirectly from file-level metadata values rather than exposing those values directly. For example, {{_metadata.file_name}} is currently hard-coded in {{FileFormat.updateMetadataInternalRow}} as: {code:java} UTF8String.fromString(filePath.getName){code} We should add support for metadata extractors, functions that map from {{PartitionedFile}} to {{{}Literal{}}}, so that we can express such columns in a generic way instead of hard-coding them. We can't just add them to the metadata map because then they have to be pre-computed even if it turns out the query does not select that field. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
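A hedged sketch of the extractor shape described above; the type alias and the path accessor on {{PartitionedFile}} are assumptions for illustration:
{code:java}
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.execution.datasources.PartitionedFile
import org.apache.spark.sql.types.StringType
import org.apache.spark.unsafe.types.UTF8String

object MetadataExtractors {
  // Evaluated per file, and only when the query actually selects the field,
  // unlike values placed eagerly in the metadata map.
  type FileMetadataExtractor = PartitionedFile => Literal

  // e.g. the existing file_name column, expressed as an extractor instead of
  // being hard-coded in updateMetadataInternalRow (path accessor assumed):
  val fileName: FileMetadataExtractor = file =>
    Literal(UTF8String.fromString(file.toPath.getName), StringType)
}
{code}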
[jira] [Created] (SPARK-43039) Support custom fields in the file source _metadata column
Ryan Johnson created SPARK-43039: Summary: Support custom fields in the file source _metadata column Key: SPARK-43039 URL: https://issues.apache.org/jira/browse/SPARK-43039 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.4.0 Reporter: Ryan Johnson Today, the schema of the file source _metadata column depends on the file format (e.g. parquet file format supports {{{}_metadata.row_index{}}}) but this is hard-wired into the {{FileFormat}} itself. Not only is this an ugly design, it also prevents custom file formats from adding their own fields to the {{_metadata}} column. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
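One possible shape, purely as a hedged illustration (the trait and method names are assumptions; in practice this would presumably hang off {{FileFormat}} itself rather than a separate trait):
{code:java}
import org.apache.spark.sql.types.{LongType, StructField}

trait SupportsCustomMetadataFields {
  /** Extra fields this format wants appended to the _metadata struct. */
  def customMetadataFields: Seq[StructField] = Nil
}

// A hypothetical custom format contributing its own field, instead of the
// field list being hard-wired per format in the base trait:
class MyFileFormat extends SupportsCustomMetadataFields {
  override def customMetadataFields: Seq[StructField] =
    Seq(StructField("row_group", LongType, nullable = false))
}
{code}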
[jira] [Updated] (SPARK-42704) SubqueryAlias should propagate metadata columns its child already selects
[ https://issues.apache.org/jira/browse/SPARK-42704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Johnson updated SPARK-42704: - Description: The `AddMetadataColumns` analyzer rule intends to make resolve available metadata columns, even if the plan already contains projections that did not explicitly mention the metadata column. The `SubqueryAlias` plan node intentionally does not propagate metadata columns automatically from a non-leaf/non-subquery child node, because the following should _not_ work: {code:java} spark.read.table("t").select("a", "b").as("s").select("_metadata"){code} However, today it is too strict in breaks the metadata chain, in case the child node's output already includes the metadata column: {code:java} // expected to work (and does) spark.read.table("t") .select("a", "b").select("_metadata") // by extension, should also work (but does not) spark.read.table("t").select("a", "b", "_metadata").as("s") .select("a", "b").select("_metadata"){code} The solution is for `SubqueryAlias` to always propagate metadata columns that are already in the child's output, thus preserving the `metadataOutput` chain for that column. was: The `AddMetadataColumns` analyzer rule intends to make resolve available metadata columns, even if the plan already contains projections that did not explicitly mention the metadata column. The `SubqueryAlias` plan node intentionally does not propagate metadata columns automatically from a non-leaf/non-subquery child node, because the following should _not_ work: {code:java} spark.read.table("t").select("a", "b").as("s").select("_metadata"){code} However, today it is too strict in breaks the metadata chain, in case the child node's output already includes the metadata column: {code:java} // expected to work spark.read.table("t") .select("a", "b").select("_metadata") // by extension, should also work (but does not) spark.read.table("t").select("a", "b", "_metadata").as("s") .select("a", "b").select("_metadata"){code} The solution is for `SubqueryAlias` to always propagate metadata columns that are already in the child's output, thus preserving the `metadataOutput` chain for that column. > SubqueryAlias should propagate metadata columns its child already selects > -- > > Key: SPARK-42704 > URL: https://issues.apache.org/jira/browse/SPARK-42704 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0 >Reporter: Ryan Johnson >Priority: Major > > The `AddMetadataColumns` analyzer rule intends to make resolve available > metadata columns, even if the plan already contains projections that did not > explicitly mention the metadata column. > The `SubqueryAlias` plan node intentionally does not propagate metadata > columns automatically from a non-leaf/non-subquery child node, because the > following should _not_ work: > > {code:java} > spark.read.table("t").select("a", "b").as("s").select("_metadata"){code} > However, today it is too strict in breaks the metadata chain, in case the > child node's output already includes the metadata column: > > {code:java} > // expected to work (and does) > spark.read.table("t") > .select("a", "b").select("_metadata") > // by extension, should also work (but does not) > spark.read.table("t").select("a", "b", "_metadata").as("s") > .select("a", "b").select("_metadata"){code} > The solution is for `SubqueryAlias` to always propagate metadata columns that > are already in the child's output, thus preserving the `metadataOutput` chain > for that column. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42704) SubqueryAlias should propagate metadata columns its child already selects
Ryan Johnson created SPARK-42704: Summary: SubqueryAlias should propagate metadata columns its child already selects Key: SPARK-42704 URL: https://issues.apache.org/jira/browse/SPARK-42704 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.3.2, 3.4.0 Reporter: Ryan Johnson The `AddMetadataColumns` analyzer rule intends to make available metadata columns resolvable, even if the plan already contains projections that did not explicitly mention the metadata column. The `SubqueryAlias` plan node intentionally does not propagate metadata columns automatically from a non-leaf/non-subquery child node, because the following should _not_ work: {code:java} spark.read.table("t").select("a", "b").as("s").select("_metadata"){code} However, today it is too strict and breaks the metadata chain even when the child node's output already includes the metadata column: {code:java} // expected to work spark.read.table("t") .select("a", "b").select("_metadata") // by extension, should also work (but does not) spark.read.table("t").select("a", "b", "_metadata").as("s") .select("a", "b").select("_metadata"){code} The solution is for `SubqueryAlias` to always propagate metadata columns that are already in the child's output, thus preserving the `metadataOutput` chain for that column. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
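A minimal sketch of the proposed `SubqueryAlias` behavior, ignoring the hidden/qualified-access cases the real node has to handle; `isMetadataCol` is assumed to come from the `MetadataColumnHelper` implicit, and the method is meant to sit inside `SubqueryAlias`:
{code:java}
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.util._   // assumed home of isMetadataCol

// Metadata columns the child already selects are re-exposed through
// metadataOutput under the alias qualifier, so the chain survives `.as("s")`.
override def metadataOutput: Seq[Attribute] = {
  val qualifierList = identifier.qualifier :+ alias
  child.output
    .filter(_.isMetadataCol)            // already selected by the child
    .map(_.withQualifier(qualifierList))
}
{code}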
[jira] [Updated] (SPARK-42683) Automatically rename metadata columns that conflict with data schema columns
[ https://issues.apache.org/jira/browse/SPARK-42683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Johnson updated SPARK-42683: - Target Version/s: 3.5.0 (was: 3.4.0) > Automatically rename metadata columns that conflict with data schema columns > > > Key: SPARK-42683 > URL: https://issues.apache.org/jira/browse/SPARK-42683 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0 >Reporter: Ryan Johnson >Priority: Major > > Today, if a datasource already has a column called `_metadata`, queries > cannot access the file-source metadata column that normally carries that > name. We can address this conflict with two changes to metadata column > handling: > # Automatically rename any metadata column whose name conflicts with a data > schema column > # Add a facility to reliably find metadata columns by their original/logical > name, even if they were renamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42683) Automatically rename metadata columns that conflict with data schema columns
Ryan Johnson created SPARK-42683: Summary: Automatically rename metadata columns that conflict with data schema columns Key: SPARK-42683 URL: https://issues.apache.org/jira/browse/SPARK-42683 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.3.2, 3.4.0 Reporter: Ryan Johnson Today, if a datasource already has a column called `_metadata`, queries cannot access the file-source metadata column that normally carries that name. We can address this conflict with two changes to metadata column handling: # Automatically rename any metadata column whose name conflicts with a data schema column # Add a facility to reliably find metadata columns by their original/logical name, even if they were renamed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
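A hedged sketch of the first change; the actual renaming scheme is an open question, this only illustrates picking a non-conflicting physical name while the original/logical name is remembered for lookup (the second change):
{code:java}
// Keep prepending '_' until the candidate no longer collides with a data column.
def deconflictedName(logicalName: String, dataColumns: Set[String]): String = {
  var candidate = logicalName
  while (dataColumns.contains(candidate)) candidate = "_" + candidate
  candidate
}

// e.g. with a data column literally named "_metadata":
// deconflictedName("_metadata", Set("id", "_metadata")) == "__metadata"
{code}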
[jira] [Created] (SPARK-40141) Task listener overloads no longer needed with JDK 8+
Ryan Johnson created SPARK-40141: Summary: Task listener overloads no longer needed with JDK 8+ Key: SPARK-40141 URL: https://issues.apache.org/jira/browse/SPARK-40141 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.3.0 Reporter: Ryan Johnson TaskContext defines methods for registering completion and failure listeners, and the respective listener types qualify as functional interfaces in JDK 8+. This leads to awkward ambiguous-overload errors against each method's overload that takes a function directly instead of a listener. Now that JDK 8 is the minimum allowed, we can remove the unnecessary overloads, which not only simplifies the code, but also removes a source of frustration since it can be nearly impossible to predict when an ambiguous overload might be triggered. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
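A simplified, hedged illustration of the ambiguity (these are stand-in definitions, not the real {{TaskContext}} signatures):
{code:java}
// With JDK 8+/Scala 2.12+, the listener type is a functional interface, so a
// lambda can satisfy it directly; the extra function-taking overload only
// competes with it during overload resolution.
trait TaskCompletionListener { def onTaskCompletion(ctx: TaskContext): Unit }

class TaskContext {
  def addTaskCompletionListener(listener: TaskCompletionListener): TaskContext = this
  def addTaskCompletionListener[U](f: TaskContext => U): TaskContext = this // overload to remove
}

// A bare lambda now has two plausible targets, which is what produces the
// "ambiguous reference to overloaded definition" errors callers hit today:
//   new TaskContext().addTaskCompletionListener(ctx => println(ctx))
{code}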
[jira] [Created] (SPARK-40106) Task failure handlers should always run if the task failed
Ryan Johnson created SPARK-40106: Summary: Task failure handlers should always run if the task failed Key: SPARK-40106 URL: https://issues.apache.org/jira/browse/SPARK-40106 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.3.0 Reporter: Ryan Johnson Today, if a task body succeeds, but a task completion listener fails, task failure listeners are not called -- even though the task has indeed failed at that point. If a completion listener fails, and failure listeners were not previously invoked, we should invoke them before running the remaining completion listeners. Such a change would increase the utility of task listeners, especially ones intended to assist with task cleanup. To give one arbitrary example, code like this appears at several places in the code (taken from the {{executeTask}} method of FileFormatWriter.scala): {code:java} try { Utils.tryWithSafeFinallyAndFailureCallbacks(block = { // Execute the task to write rows out and commit the task. dataWriter.writeWithIterator(iterator) dataWriter.commit() })(catchBlock = { // If there is an error, abort the task dataWriter.abort() logError(s"Job $jobId aborted.") }, finallyBlock = { dataWriter.close() }) } catch { case e: FetchFailedException => throw e case f: FileAlreadyExistsException if SQLConf.get.fastFailFileFormatOutput => // If any output file to write already exists, it does not make sense to re-run this task. // We throw the exception and let Executor throw ExceptionFailure to abort the job. throw new TaskOutputFileAlreadyExistException(f) case t: Throwable => throw QueryExecutionErrors.taskFailedWhileWritingRowsError(t) }{code} If failure listeners were reliably called, the above idiom could potentially be factored out as two failure listeners plus a completion listener, and reused rather than duplicated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
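A hedged sketch of that factoring; {{dataWriter}}, {{iterator}}, {{jobId}} and {{logError}} are assumed from the surrounding FileFormatWriter code, and the sketch presumes the behavior proposed above (failure listeners reliably firing even when a completion listener is what failed):
{code:java}
import org.apache.spark.TaskContext

val ctx = TaskContext.get()
ctx.addTaskFailureListener { (_, _) =>
  dataWriter.abort()                 // runs on any task failure, including a
  logError(s"Job $jobId aborted.")   // failure raised by a completion listener
}
ctx.addTaskCompletionListener[Unit] { _ =>
  dataWriter.close()                 // always runs, success or failure
}
// The task body itself is left with only the happy path.
dataWriter.writeWithIterator(iterator)
dataWriter.commit()
{code}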