[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21711 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21711#discussion_r202568645 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -138,17 +138,36 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } /** - * Checks the validity of data column names. Hive metastore disallows the table to use comma in - * data column names. Partition columns do not have such a restriction. Views do not have such - * a restriction. + * Checks the validity of data column names. Hive metastore disallows the table to use some + * special characters (',', ':', and ';') in data column names, including nested column names. + * Partition columns do not have such a restriction. Views do not have such a restriction. */ private def verifyDataSchema( tableName: TableIdentifier, tableType: CatalogTableType, dataSchema: StructType): Unit = { if (tableType != VIEW) { - dataSchema.map(_.name).foreach { colName => -if (colName.contains(",")) { - throw new AnalysisException("Cannot create a table having a column whose name contains " + -s"commas in Hive metastore. Table: $tableName; Column: $colName") + val invalidChars = Seq(",", ":", ";") + def verifyNestedColumnNames(schema: StructType): Unit = schema.foreach { f => +f.dataType match { + case st: StructType => verifyNestedColumnNames(st) + case _ if invalidChars.exists(f.name.contains) => +val errMsg = "Cannot create a table having a nested column whose name contains " + + s"invalid characters (${invalidChars.map(c => s"'$c'").mkString(", ")}) " + --- End diff -- aha, I'll fix, thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21711#discussion_r202567965 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -138,17 +138,36 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } /** - * Checks the validity of data column names. Hive metastore disallows the table to use comma in - * data column names. Partition columns do not have such a restriction. Views do not have such - * a restriction. + * Checks the validity of data column names. Hive metastore disallows the table to use some + * special characters (',', ':', and ';') in data column names, including nested column names. + * Partition columns do not have such a restriction. Views do not have such a restriction. */ private def verifyDataSchema( tableName: TableIdentifier, tableType: CatalogTableType, dataSchema: StructType): Unit = { if (tableType != VIEW) { - dataSchema.map(_.name).foreach { colName => -if (colName.contains(",")) { - throw new AnalysisException("Cannot create a table having a column whose name contains " + -s"commas in Hive metastore. Table: $tableName; Column: $colName") + val invalidChars = Seq(",", ":", ";") + def verifyNestedColumnNames(schema: StructType): Unit = schema.foreach { f => +f.dataType match { + case st: StructType => verifyNestedColumnNames(st) + case _ if invalidChars.exists(f.name.contains) => +val errMsg = "Cannot create a table having a nested column whose name contains " + + s"invalid characters (${invalidChars.map(c => s"'$c'").mkString(", ")}) " + --- End diff -- Normally, in this case, what we do is like: ```Scala val invalidCharsString = invalidChars.map(c => s"'$c'").mkString(", ") val errMsg = "Cannot create a table having a nested column whose name contains " + s"invalid characters ($invalidCharsString) in Hive metastore. Table: $tableName; " + s"Column: ${f.name}" ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21711#discussion_r202537869 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -138,17 +138,36 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } /** - * Checks the validity of data column names. Hive metastore disallows the table to use comma in - * data column names. Partition columns do not have such a restriction. Views do not have such - * a restriction. + * Checks the validity of data column names. Hive metastore disallows the table to use some + * special characters (',', ':', and ';') in data column names, including nested column names. + * Partition columns do not have such a restriction. Views do not have such a restriction. */ private def verifyDataSchema( tableName: TableIdentifier, tableType: CatalogTableType, dataSchema: StructType): Unit = { if (tableType != VIEW) { - dataSchema.map(_.name).foreach { colName => -if (colName.contains(",")) { - throw new AnalysisException("Cannot create a table having a column whose name contains " + -s"commas in Hive metastore. Table: $tableName; Column: $colName") + val invalidChars = Seq(",", ":", ";") + def verifyNestedColumnNames(schema: StructType): Unit = schema.foreach { f => +f.dataType match { + case st: StructType => verifyNestedColumnNames(st) + case _ if invalidChars.exists(f.name.contains) => +val errMsg = "Cannot create a table having a nested column whose name contains " + + s"invalid characters (${invalidChars.map(c => s"'$c'").mkString(", ")}) " + --- End diff -- This is a weird red highlight...the syntax seems to be correct to me (also, the test passed). Anything you know? @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21711#discussion_r202536743 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -138,17 +138,35 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } /** - * Checks the validity of data column names. Hive metastore disallows the table to use comma in - * data column names. Partition columns do not have such a restriction. Views do not have such - * a restriction. + * Checks the validity of data column names. Hive metastore disallows the table to use some + * special characters (',', ':', and ';') in data column names. Partition columns do not have + * such a restriction. Views do not have such a restriction. */ private def verifyDataSchema( tableName: TableIdentifier, tableType: CatalogTableType, dataSchema: StructType): Unit = { if (tableType != VIEW) { - dataSchema.map(_.name).foreach { colName => -if (colName.contains(",")) { - throw new AnalysisException("Cannot create a table having a column whose name contains " + -s"commas in Hive metastore. Table: $tableName; Column: $colName") + val invalidChars = Seq(",", ":", ";") + def verifyNestedColumnNames(schema: StructType): Unit = schema.foreach { f => +f.dataType match { + case st: StructType => verifyNestedColumnNames(st) + case _ if invalidChars.exists(f.name.contains) => +throw new AnalysisException("Cannot create a table having a nested column whose name " + + s"contains invalid characters (${invalidChars.map(c => s"'$c'").mkString(", ")}) " + --- End diff -- oh.. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21711#discussion_r202536673 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala --- @@ -2005,6 +2005,24 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton { } } + test("SPARK-24681 checks if nested column names do not include ',', ':', and ';'") { --- End diff -- ok --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21711#discussion_r202536655 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -138,17 +138,35 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } /** - * Checks the validity of data column names. Hive metastore disallows the table to use comma in - * data column names. Partition columns do not have such a restriction. Views do not have such - * a restriction. + * Checks the validity of data column names. Hive metastore disallows the table to use some + * special characters (',', ':', and ';') in data column names. Partition columns do not have --- End diff -- ok --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21711#discussion_r202531134 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -138,17 +138,35 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } /** - * Checks the validity of data column names. Hive metastore disallows the table to use comma in - * data column names. Partition columns do not have such a restriction. Views do not have such - * a restriction. + * Checks the validity of data column names. Hive metastore disallows the table to use some + * special characters (',', ':', and ';') in data column names. Partition columns do not have + * such a restriction. Views do not have such a restriction. */ private def verifyDataSchema( tableName: TableIdentifier, tableType: CatalogTableType, dataSchema: StructType): Unit = { if (tableType != VIEW) { - dataSchema.map(_.name).foreach { colName => -if (colName.contains(",")) { - throw new AnalysisException("Cannot create a table having a column whose name contains " + -s"commas in Hive metastore. Table: $tableName; Column: $colName") + val invalidChars = Seq(",", ":", ";") + def verifyNestedColumnNames(schema: StructType): Unit = schema.foreach { f => +f.dataType match { + case st: StructType => verifyNestedColumnNames(st) + case _ if invalidChars.exists(f.name.contains) => +throw new AnalysisException("Cannot create a table having a nested column whose name " + + s"contains invalid characters (${invalidChars.map(c => s"'$c'").mkString(", ")}) " + --- End diff -- something wrong, right? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21711#discussion_r202531122 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -138,17 +138,35 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat } /** - * Checks the validity of data column names. Hive metastore disallows the table to use comma in - * data column names. Partition columns do not have such a restriction. Views do not have such - * a restriction. + * Checks the validity of data column names. Hive metastore disallows the table to use some + * special characters (',', ':', and ';') in data column names. Partition columns do not have --- End diff -- > in data column names. -> > in data column names, including nested column names. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21711#discussion_r202531090 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala --- @@ -2005,6 +2005,24 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton { } } + test("SPARK-24681 checks if nested column names do not include ',', ':', and ';'") { --- End diff -- Move it to HiveDDLSuite? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/21711 [SPARK-24681][SQL] Verify nested column names in Hive metastore ## What changes were proposed in this pull request? This pr added code to check if nested column names do not include ',', ':', and ';' because Hive metastore can't handle these characters in nested column names; ref: https://github.com/apache/hive/blob/release-1.2.1/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoUtils.java#L239 ## How was this patch tested? Added tests in `SQLQuerySuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/spark SPARK-24681 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21711.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21711 commit dbc300edb56b6e813c926b061e780378ee564778 Author: Takeshi Yamamuro Date: 2018-07-04T04:07:04Z Fix --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org