[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-29 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r549936158



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
##
@@ -154,13 +154,54 @@ class DataSourceV2SQLSuite
   Array("Table Properties", "[bar=baz]", "")))
   }
 
-  test("Describe column is not supported for v2 catalog") {
-withTable("testcat.tbl") {
-  spark.sql("CREATE TABLE testcat.tbl (id bigint) USING foo")
-  val ex = intercept[AnalysisException] {
-spark.sql("DESCRIBE testcat.tbl id")
+  test("Describe column for v2 catalog") {
+val t = "testcat.tbl"
+withTable(t) {
+  sql(s"CREATE TABLE $t (id bigint, data string COMMENT 'hello') USING 
foo")
+  val df1 = sql(s"DESCRIBE $t id")
+  assert(df1.schema.map(field => (field.name, field.dataType))
+=== Seq(("info_name", StringType), ("info_value", StringType)))
+  assert(df1.collect === Seq(
+Row("col_name", "id"),
+Row("data_type", "bigint"),
+Row("comment", "NULL")))
+  val df2 = sql(s"DESCRIBE $t data")
+  assert(df2.schema.map(field => (field.name, field.dataType))
+=== Seq(("info_name", StringType), ("info_value", StringType)))
+  assert(df2.collect === Seq(
+Row("col_name", "data"),
+Row("data_type", "string"),
+Row("comment", "hello")))
+
+  assertAnalysisError(
+s"DESCRIBE $t invalid_col",
+"cannot resolve '`invalid_col`' given input columns: 
[testcat.tbl.data, testcat.tbl.id]")

Review comment:
   I think v2 is better. Let's keep it.








[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-29 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r549935749



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
##
@@ -764,13 +765,13 @@ case class DescribeColumnCommand(
 val colName = UnresolvedAttribute(colNameParts).name
 val field = {
   relation.resolve(colNameParts, resolver).getOrElse {
-throw new AnalysisException(s"Column $colName does not exist")
+throw QueryCompilationErrors.columnDoesNotExistError(colName)
   }
 }
 if (!field.isInstanceOf[Attribute]) {
   // If the field is not an attribute after `resolve`, then it's a nested field.
-  throw new AnalysisException(
-s"DESC TABLE COLUMN command does not support nested data types: $colName")
+  throw QueryCompilationErrors.commandNotSupportNestedColumnError(
+"DESC TABLE COLUMN", toPrettySQL(field, removeAlias = true))

Review comment:
   We won't hit this branch anymore, as we fail earlier in 
`ResolveSessionCatalog`. We can simply use 
`assert(field.isInstanceOf[Attribute])`
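
   A minimal sketch of what that simplification could look like inside `DescribeColumnCommand` (illustrative only, not the committed patch; `relation`, `colNameParts`, `resolver`, and `colName` are the values already in scope in the surrounding method, and the assertion message is a placeholder):
   ```
   import org.apache.spark.sql.catalyst.expressions.Attribute
   import org.apache.spark.sql.errors.QueryCompilationErrors

   // Resolve the column as before.
   val field = relation.resolve(colNameParts, resolver).getOrElse {
     throw QueryCompilationErrors.columnDoesNotExistError(colName)
   }
   // Nested fields are now rejected earlier, in ResolveSessionCatalog, so a
   // non-Attribute result here would be a bug rather than a user error.
   assert(field.isInstanceOf[Attribute],
     s"unexpected nested field expression for column $colName: $field")
   ```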








[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-29 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r549935311



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##
@@ -235,8 +237,21 @@ class ResolveSessionCatalog(
 case DescribeRelation(ResolvedV1TableOrViewIdentifier(ident), 
partitionSpec, isExtended) =>
   DescribeTableCommand(ident.asTableIdentifier, partitionSpec, isExtended)
 
-case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), colNameParts, 
isExtended) =>
-  DescribeColumnCommand(ident.asTableIdentifier, colNameParts, isExtended)
+case DescribeColumn(ResolvedViewIdentifier(ident), column: 
UnresolvedAttribute, isExtended) =>
+  // For views, the column will not be resolved by `ResolveReferences` 
because
+  // `ResolvedView` stores only the identifier.
+  DescribeColumnCommand(ident.asTableIdentifier, column.nameParts, 
isExtended)
+
+case DescribeColumn(ResolvedV1TableIdentifier(ident), column, isExtended) 
=>
+  column match {
+case u: UnresolvedAttribute =>
+  throw QueryCompilationErrors.columnDoesNotExistError(u.name)
+case a: Attribute =>
+  DescribeColumnCommand(ident.asTableIdentifier, a.qualifier :+ 
a.name, isExtended)
+case nested =>

Review comment:
   We can match `Alias` directly here, as other expressions won't be 
produced when resolving columns. e.g.
   ```
   case Alias(child, _) => throw ... toPrettySQL(child)
   case other => throw ...("[BUG] unexpected column expression: " + other)
   ```
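
   Spelled out against the cases in this diff, that could look roughly like the sketch below (not the final patch; `column`, `ident`, and `isExtended` are the values bound by the enclosing case, `toPrettySQL` is the existing helper from `org.apache.spark.sql.catalyst.util`, and the exception type and messages in the fallback are placeholders):
   ```
   import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
   import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute}
   import org.apache.spark.sql.catalyst.util.toPrettySQL
   import org.apache.spark.sql.errors.QueryCompilationErrors

   column match {
     case u: UnresolvedAttribute =>
       throw QueryCompilationErrors.columnDoesNotExistError(u.name)
     case a: Attribute =>
       DescribeColumnCommand(ident.asTableIdentifier, a.qualifier :+ a.name, isExtended)
     case Alias(child, _) =>
       // Column resolution wraps nested field extraction (e.g. GetStructField)
       // in an Alias, so this is the "nested column is not supported" case.
       throw QueryCompilationErrors.commandNotSupportNestedColumnError(
         "DESC TABLE COLUMN", toPrettySQL(child))
     case other =>
       // Anything else would indicate a bug in column resolution.
       throw new IllegalStateException(s"[BUG] unexpected column expression: $other")
   }
   ```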








[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-29 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r549933484



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##
@@ -235,8 +237,21 @@ class ResolveSessionCatalog(
 case DescribeRelation(ResolvedV1TableOrViewIdentifier(ident), 
partitionSpec, isExtended) =>
   DescribeTableCommand(ident.asTableIdentifier, partitionSpec, isExtended)
 
-case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), colNameParts, 
isExtended) =>
-  DescribeColumnCommand(ident.asTableIdentifier, colNameParts, isExtended)
+case DescribeColumn(ResolvedViewIdentifier(ident), column: 
UnresolvedAttribute, isExtended) =>
+  // For views, the column will not be resolved by `ResolveReferences` 
because
+  // `ResolvedView` stores only the identifier.
+  DescribeColumnCommand(ident.asTableIdentifier, column.nameParts, 
isExtended)
+
+case DescribeColumn(ResolvedV1TableIdentifier(ident), column, isExtended) 
=>
+  column match {
+case u: UnresolvedAttribute =>
+  throw QueryCompilationErrors.columnDoesNotExistError(u.name)
+case a: Attribute =>
+  DescribeColumnCommand(ident.asTableIdentifier, a.qualifier :+ 
a.name, isExtended)

Review comment:
   How about `ident.asMultipartIdentifier :+ a.name`? Then we don't need to 
add qualifiers to `ResolvedTable`.

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##
@@ -235,8 +237,21 @@ class ResolveSessionCatalog(
 case DescribeRelation(ResolvedV1TableOrViewIdentifier(ident), 
partitionSpec, isExtended) =>
   DescribeTableCommand(ident.asTableIdentifier, partitionSpec, isExtended)
 
-case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), colNameParts, 
isExtended) =>
-  DescribeColumnCommand(ident.asTableIdentifier, colNameParts, isExtended)
+case DescribeColumn(ResolvedViewIdentifier(ident), column: 
UnresolvedAttribute, isExtended) =>
+  // For views, the column will not be resolved by `ResolveReferences` 
because
+  // `ResolvedView` stores only the identifier.
+  DescribeColumnCommand(ident.asTableIdentifier, column.nameParts, 
isExtended)
+
+case DescribeColumn(ResolvedV1TableIdentifier(ident), column, isExtended) 
=>
+  column match {
+case u: UnresolvedAttribute =>
+  throw QueryCompilationErrors.columnDoesNotExistError(u.name)
+case a: Attribute =>
+  DescribeColumnCommand(ident.asTableIdentifier, a.qualifier :+ 
a.name, isExtended)

Review comment:
   How about `ident.asMultipartIdentifier :+ a.name`? Then we don't need to 
add qualifiers to `ResolvedTable.output`.
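
   In code, that amounts to building the column's name parts from the identifier rather than from the attribute's qualifier, e.g. (a sketch of just this case clause; `ident`, `isExtended`, and the identifier implicits are those already in scope in `ResolveSessionCatalog`):
   ```
   case a: Attribute =>
     // Derive the multi-part column name from the resolved table identifier,
     // so ResolvedTable.output can keep plain, unqualified attributes.
     DescribeColumnCommand(
       ident.asTableIdentifier,
       ident.asMultipartIdentifier :+ a.name,
       isExtended)
   ```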








[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-29 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r549933194



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##
@@ -235,8 +237,21 @@ class ResolveSessionCatalog(
 case DescribeRelation(ResolvedV1TableOrViewIdentifier(ident), 
partitionSpec, isExtended) =>
   DescribeTableCommand(ident.asTableIdentifier, partitionSpec, isExtended)
 
-case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), colNameParts, 
isExtended) =>
-  DescribeColumnCommand(ident.asTableIdentifier, colNameParts, isExtended)
+case DescribeColumn(ResolvedViewIdentifier(ident), column: 
UnresolvedAttribute, isExtended) =>
+  // For views, the column will not be resolved by `ResolveReferences` 
because
+  // `ResolvedView` stores only the identifier.
+  DescribeColumnCommand(ident.asTableIdentifier, column.nameParts, 
isExtended)
+
+case DescribeColumn(ResolvedV1TableIdentifier(ident), column, isExtended) 
=>
+  column match {
+case u: UnresolvedAttribute =>
+  throw QueryCompilationErrors.columnDoesNotExistError(u.name)
+case a: Attribute =>
+  DescribeColumnCommand(ident.asTableIdentifier, a.qualifier :+ 
a.name, isExtended)
+case nested =>
+  throw QueryCompilationErrors.commandNotSupportNestedColumnError(
+"DESC TABLE COLUMN", toPrettySQL(nested, removeAlias = true))

Review comment:
   Can we manually remove `Alias` here instead of changing the 
`toPrettySQL` method?
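
   One way to do that, as a sketch (assuming the nested-field expression arrives wrapped in a top-level `Alias` added by column resolution; `prettyColumnName` is a hypothetical local helper):
   ```
   import org.apache.spark.sql.catalyst.expressions.{Alias, Expression}
   import org.apache.spark.sql.catalyst.util.toPrettySQL

   // Strip a single top-level Alias before pretty-printing, so toPrettySQL
   // itself does not need a new removeAlias parameter.
   private def prettyColumnName(e: Expression): String = e match {
     case Alias(child, _) => toPrettySQL(child)
     case other => toPrettySQL(other)
   }
   ```
   The error call site would then stay `commandNotSupportNestedColumnError("DESC TABLE COLUMN", prettyColumnName(nested))` with `toPrettySQL` unchanged.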








[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-28 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r549602461



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##
@@ -235,8 +236,17 @@ class ResolveSessionCatalog(
 case DescribeRelation(ResolvedV1TableOrViewIdentifier(ident), 
partitionSpec, isExtended) =>
   DescribeTableCommand(ident.asTableIdentifier, partitionSpec, isExtended)
 
-case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), colNameParts, 
isExtended) =>
-  DescribeColumnCommand(ident.asTableIdentifier, colNameParts, isExtended)
+case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), column, 
isExtended) =>
+  column match {
+case u: UnresolvedAttribute =>
+  // For views, the column will not be resolved by `ResolveReferences` 
because
+  // `ResolvedView` stores only the identifier.
+  DescribeColumnCommand(ident.asTableIdentifier, u.nameParts, 
isExtended)
+case a: Attribute =>
+  DescribeColumnCommand(ident.asTableIdentifier, a.qualifier :+ 
a.name, isExtended)
+case nested =>
+  throw QueryCompilationErrors.commandNotSupportNestedColumnError("DESC TABLE COLUMN")

Review comment:
   How about we strip the `Alias` and then call `toPrettySQL` which is 
defined in `org.apache.spark.sql.catalyst.util`? 








[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-28 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r549592550



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
##
@@ -272,8 +273,13 @@ class DataSourceV2Strategy(session: SparkSession) extends 
Strategy with Predicat
   }
   DescribeTableExec(desc.output, r.table, isExtended) :: Nil
 
-case DescribeColumn(_: ResolvedTable, _, _) =>
-  throw new AnalysisException("Describing columns is not supported for v2 tables.")
+case desc @ DescribeColumn(_: ResolvedTable, column, isExtended) =>
+  column match {
+case c: Attribute =>
+  DescribeColumnExec(desc.output, c, isExtended) :: Nil
+case _ =>
+  throw QueryCompilationErrors.commandNotSupportNestedColumnError("DESC TABLE COLUMN")

Review comment:
   ditto








[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-28 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r549592416



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##
@@ -235,8 +236,17 @@ class ResolveSessionCatalog(
 case DescribeRelation(ResolvedV1TableOrViewIdentifier(ident), 
partitionSpec, isExtended) =>
   DescribeTableCommand(ident.asTableIdentifier, partitionSpec, isExtended)
 
-case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), colNameParts, 
isExtended) =>
-  DescribeColumnCommand(ident.asTableIdentifier, colNameParts, isExtended)
+case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), column, 
isExtended) =>
+  column match {
+case u: UnresolvedAttribute =>
+  // For views, the column will not be resolved by `ResolveReferences` 
because
+  // `ResolvedView` stores only the identifier.
+  DescribeColumnCommand(ident.asTableIdentifier, u.nameParts, 
isExtended)
+case a: Attribute =>
+  DescribeColumnCommand(ident.asTableIdentifier, a.qualifier :+ 
a.name, isExtended)
+case nested =>
+  throw QueryCompilationErrors.commandNotSupportNestedColumnError("DESC TABLE COLUMN")

Review comment:
   > Construct the original name from GetStructField, GetArrayStructFields, 
etc.
   
   Is it simply `nested.sql`?








[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-28 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r549591975



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
##
@@ -235,8 +236,17 @@ class ResolveSessionCatalog(
 case DescribeRelation(ResolvedV1TableOrViewIdentifier(ident), 
partitionSpec, isExtended) =>
   DescribeTableCommand(ident.asTableIdentifier, partitionSpec, isExtended)
 
-case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), colNameParts, 
isExtended) =>
-  DescribeColumnCommand(ident.asTableIdentifier, colNameParts, isExtended)
+case DescribeColumn(ResolvedV1TableOrViewIdentifier(ident), column, 
isExtended) =>
+  column match {
+case u: UnresolvedAttribute =>
+  // For views, the column will not be resolved by `ResolveReferences` 
because
+  // `ResolvedView` stores only the identifier.
+  DescribeColumnCommand(ident.asTableIdentifier, u.nameParts, 
isExtended)

Review comment:
   It's possible when the column name doesn't exist in the table, and we 
should give a clear error message: `Column $colName does not exist`
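
   A sketch of that case (essentially what the revision quoted further up ends up doing; an `UnresolvedAttribute` surviving to this point means the name was not found in the table's schema):
   ```
   case u: UnresolvedAttribute =>
     // The analyzer could not match the name against the table's columns,
     // so report a missing column instead of silently passing it through.
     throw QueryCompilationErrors.columnDoesNotExistError(u.name)
   ```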








[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-27 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r549236685



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1675,6 +1675,13 @@ class Analyzer(override val catalogManager: 
CatalogManager)
   // Skip the having clause here, this will be handled in 
ResolveAggregateFunctions.
   case h: UnresolvedHaving => h
 
+  case d @ DescribeColumn(rt: ResolvedTable, _, _) =>
+rt.table match {
+  // References for v1 tables are resolved in DescribeColumnCommand.
+  case _: V1Table => d

Review comment:
   We can re-construct the multi-part column name from the resolved `Attribute`: `attr.qualifier :+ attr.name`
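
   As a tiny self-contained sketch (the helper name and qualifier values are illustrative):
   ```
   import org.apache.spark.sql.catalyst.expressions.Attribute

   // Rebuild the user-facing multi-part column name from a resolved attribute,
   // e.g. an attribute named "id" qualified by Seq("testcat", "tbl") yields
   // Seq("testcat", "tbl", "id").
   def columnNameParts(attr: Attribute): Seq[String] = attr.qualifier :+ attr.name
   ```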








[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-27 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r549236194



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1675,6 +1675,13 @@ class Analyzer(override val catalogManager: 
CatalogManager)
   // Skip the having clause here, this will be handled in 
ResolveAggregateFunctions.
   case h: UnresolvedHaving => h
 
+  case d @ DescribeColumn(rt: ResolvedTable, _, _) =>
+rt.table match {
+  // References for v1 tables are resolved in DescribeColumnCommand.
+  case _: V1Table => d

Review comment:
   But it seems like this resolves the column twice?

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1675,6 +1675,13 @@ class Analyzer(override val catalogManager: 
CatalogManager)
   // Skip the having clause here, this will be handled in 
ResolveAggregateFunctions.
   case h: UnresolvedHaving => h
 
+  case d @ DescribeColumn(rt: ResolvedTable, _, _) =>
+rt.table match {
+  // References for v1 tables are resolved in DescribeColumnCommand.
+  case _: V1Table => d

Review comment:
   But it seems fine to resolve the column twice?








[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-27 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r549236035



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1675,6 +1675,13 @@ class Analyzer(override val catalogManager: 
CatalogManager)
   // Skip the having clause here, this will be handled in 
ResolveAggregateFunctions.
   case h: UnresolvedHaving => h
 
+  case d @ DescribeColumn(rt: ResolvedTable, _, _) =>
+rt.table match {
+  // References for v1 tables are resolved in DescribeColumnCommand.
+  case _: V1Table => d

Review comment:
   Then we shouldn't change the v1 command.








[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-25 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r548838750



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2ResolutionPlans.scala
##
@@ -97,9 +98,21 @@ case class ResolvedNamespace(catalog: CatalogPlugin, 
namespace: Seq[String])
 /**
  * A plan containing resolved table.
  */
-case class ResolvedTable(catalog: TableCatalog, identifier: Identifier, table: Table)
-  extends LeafNode {
-  override def output: Seq[Attribute] = Nil
+case class ResolvedTable(
+catalog: TableCatalog,
+identifier: Identifier,
+table: Table,
+override val output: Seq[Attribute])
+  extends LeafNode
+
+object ResolvedTable {
+  def create(
+  catalog: TableCatalog,
+  identifier: Identifier,
+  table: Table): ResolvedTable = {

Review comment:
   This follows `DataSourceV2Relation.create`
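
   For context, by analogy with `DataSourceV2Relation.create` a plausible body for this factory (the actual code is cut off in the quote above, so this is only an assumption) derives the output attributes from the table schema:
   ```
   import org.apache.spark.sql.connector.catalog.{Identifier, Table, TableCatalog}

   object ResolvedTable {
     def create(
         catalog: TableCatalog,
         identifier: Identifier,
         table: Table): ResolvedTable = {
       // Assumption: expose the table's columns as the plan's output, mirroring
       // how DataSourceV2Relation.create derives output from the table schema.
       ResolvedTable(catalog, identifier, table, table.schema.toAttributes)
     }
   }
   ```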








[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-25 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r548830290



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1675,6 +1675,13 @@ class Analyzer(override val catalogManager: 
CatalogManager)
   // Skip the having clause here, this will be handled in 
ResolveAggregateFunctions.
   case h: UnresolvedHaving => h
 
+  case d @ DescribeColumn(rt: ResolvedTable, _, _) =>
+rt.table match {
+  // References for v1 tables are resolved in DescribeColumnCommand.
+  case _: V1Table => d

Review comment:
   Or, if we don't care too much about DDL perf, then I think resolving the column twice is also fine and we don't need this hack either.








[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-25 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r548830073



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1675,6 +1675,13 @@ class Analyzer(override val catalogManager: 
CatalogManager)
   // Skip the having clause here, this will be handled in 
ResolveAggregateFunctions.
   case h: UnresolvedHaving => h
 
+  case d @ DescribeColumn(rt: ResolvedTable, _, _) =>
+rt.table match {
+  // References for v1 tables are resolved in DescribeColumnCommand.
+  case _: V1Table => d

Review comment:
   Shall we change the v1 command to take the resolved `Attribute` directly? Then we don't need the hack here.
   
   This also reminds me that, for all v1 commands, we now look up the table twice: once in the framework, and once inside the v1 command. This is kind of a perf regression: previously we simply looked up the catalog and fell back to the v1 command if it was the session catalog, so the table lookup still happened only once, inside the v1 command. We could also update the v1 commands to take the resolved table directly.








[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-25 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r548829115



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
##
@@ -272,8 +272,14 @@ class DataSourceV2Strategy(session: SparkSession) extends 
Strategy with Predicat
   }
   DescribeTableExec(desc.output, r.table, isExtended) :: Nil
 
-case DescribeColumn(_: ResolvedTable, _, _) =>
-  throw new AnalysisException("Describing columns is not supported for v2 tables.")
+case desc @ DescribeColumn(_: ResolvedTable, column, isExtended) =>
+  column match {
+case c: Attribute =>
+  DescribeColumnExec(desc.output, c, isExtended) :: Nil
+case c: GetStructField =>
+  throw new AnalysisException(
+"DESC TABLE COLUMN command does not support nested data types.")

Review comment:
   `nested data types` -> `nested fields`








[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-23 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r548399246



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala
##
@@ -344,10 +344,11 @@ case class DescribeRelation(
  */
 case class DescribeColumn(
 relation: LogicalPlan,
-colNameParts: Seq[String],
+column: NamedExpression,
 isExtended: Boolean) extends Command {
   override def children: Seq[LogicalPlan] = Seq(relation)
   override def output: Seq[Attribute] = 
DescribeCommandSchema.describeColumnAttributes()
+  override lazy val references: AttributeSet = AttributeSet.empty

Review comment:
   I agree. About `ResolvedView`, I think it will eventually have output as well, once we have a v2 view API. For now, `ResolvedView` always goes to a v1 command, so we don't need its output.








[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-23 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r547946465



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala
##
@@ -344,10 +344,11 @@ case class DescribeRelation(
  */
 case class DescribeColumn(
 relation: LogicalPlan,
-colNameParts: Seq[String],
+column: NamedExpression,
 isExtended: Boolean) extends Command {
   override def children: Seq[LogicalPlan] = Seq(relation)
   override def output: Seq[Attribute] = 
DescribeCommandSchema.describeColumnAttributes()
+  override lazy val references: AttributeSet = AttributeSet.empty

Review comment:
   If we need a hack anyway, I'd prefer we override `inputSet` here, so that we can let the analyzer resolve columns.








[GitHub] [spark] cloud-fan commented on a change in pull request #30881: [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

2020-12-22 Thread GitBox


cloud-fan commented on a change in pull request #30881:
URL: https://github.com/apache/spark/pull/30881#discussion_r547128251



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala
##
@@ -344,10 +344,11 @@ case class DescribeRelation(
  */
 case class DescribeColumn(
 relation: LogicalPlan,
-colNameParts: Seq[String],
+column: NamedExpression,
 isExtended: Boolean) extends Command {
   override def children: Seq[LogicalPlan] = Seq(relation)
   override def output: Seq[Attribute] = 
DescribeCommandSchema.describeColumnAttributes()
+  override lazy val references: AttributeSet = AttributeSet.empty

Review comment:
   What's the problem if `QueryPlan.expressions` picks up this field? Then we can let the analyzer resolve columns, which is even better.




