[GitHub] spark pull request #14363: [SPARK-16731][SQL] use StructType in CatalogTable...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14363 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14363: [SPARK-16731][SQL] use StructType in CatalogTable...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14363#discussion_r72910844 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -78,28 +78,6 @@ object CatalogStorageFormat { } /** - * A column in a table. - */ -case class CatalogColumn( -name: String, -// TODO: make this type-safe; this is left as a string due to issues in converting Hive -// varchars to and from SparkSQL strings. --- End diff -- I am wondering if it is related to the parser that converts hive's type string to spark's struct type? Can you try it out and see there is any issue? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14363: [SPARK-16731][SQL] use StructType in CatalogTable...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14363#discussion_r72735867 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -721,16 +723,16 @@ private[hive] class HiveClientImpl( Utils.classForName(name) .asInstanceOf[Class[_ <: org.apache.hadoop.hive.ql.io.HiveOutputFormat[_, _]]] - private def toHiveColumn(c: CatalogColumn): FieldSchema = { -new FieldSchema(c.name, c.dataType, c.comment.orNull) + private def toHiveColumn(c: StructField): FieldSchema = { +new FieldSchema(c.name, c.dataType.catalogString, c.getComment().orNull) } - private def fromHiveColumn(hc: FieldSchema): CatalogColumn = { -new CatalogColumn( + private def fromHiveColumn(hc: FieldSchema): StructField = { +val f = StructField( name = hc.getName, - dataType = hc.getType, - nullable = true, - comment = Option(hc.getComment)) + dataType = CatalystSqlParser.parseDataType(hc.getType), --- End diff -- So the behaviour change is: previously if a hive table contains type string that we can't parse, we are still able to describe it, but throw an exception if we try to read it. After this PR, we will throw an exception when we try to read its table meta from hive meta store. I think it's ok to break it, but need better error message. what do you think? cc @yhuai @liancheng --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14363: [SPARK-16731][SQL] use StructType in CatalogTable...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14363#discussion_r72735500 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -78,28 +78,6 @@ object CatalogStorageFormat { } /** - * A column in a table. - */ -case class CatalogColumn( -name: String, -// TODO: make this type-safe; this is left as a string due to issues in converting Hive -// varchars to and from SparkSQL strings. --- End diff -- I don't know either... cc @rxin @andrewor14 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14363: [SPARK-16731][SQL] use StructType in CatalogTable...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14363#discussion_r72695248 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -78,28 +78,6 @@ object CatalogStorageFormat { } /** - * A column in a table. - */ -case class CatalogColumn( -name: String, -// TODO: make this type-safe; this is left as a string due to issues in converting Hive -// varchars to and from SparkSQL strings. --- End diff -- Do you know what is the issue when we converting varchars to and from SparkSQL strings? Sorry, I am unable to find the answer. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14363: [SPARK-16731][SQL] use StructType in CatalogTable...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14363#discussion_r72694426 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -721,16 +723,16 @@ private[hive] class HiveClientImpl( Utils.classForName(name) .asInstanceOf[Class[_ <: org.apache.hadoop.hive.ql.io.HiveOutputFormat[_, _]]] - private def toHiveColumn(c: CatalogColumn): FieldSchema = { -new FieldSchema(c.name, c.dataType, c.comment.orNull) + private def toHiveColumn(c: StructField): FieldSchema = { +new FieldSchema(c.name, c.dataType.catalogString, c.getComment().orNull) } - private def fromHiveColumn(hc: FieldSchema): CatalogColumn = { -new CatalogColumn( + private def fromHiveColumn(hc: FieldSchema): StructField = { +val f = StructField( name = hc.getName, - dataType = hc.getType, - nullable = true, - comment = Option(hc.getComment)) + dataType = CatalystSqlParser.parseDataType(hc.getType), --- End diff -- This is the change we have to make if we convert `CatalogColumn` to `StructField`. It sounds like `hc.getType` could return null? or Hive could return some data types we might not recognize. We could hit the exception from Parser, right? That means, the caller of `fromHiveColumn` will also get the exception. `getTableOption` is the caller. I am just wondering if we do not want to see this kind of exception when doing `getTableOption`. Or maybe issue a nicer error message here? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14363: [SPARK-16731][SQL] use StructType in CatalogTable...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/14363 [SPARK-16731][SQL] use StructType in CatalogTable and remove CatalogColumn ## What changes were proposed in this pull request? `StructField` has very similar semantic with `CatalogColumn`, except that `CatalogColumn` use string to express data type. I think it's reasonable to use `StructType` as the `CatalogTable.schema` and remove `CatalogColumn`. ## How was this patch tested? existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark column Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14363.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14363 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org