GitHub user skambha opened a pull request: https://github.com/apache/spark/pull/19747
[Spark-22431][SQL] Ensure that the datatype in the schema for the table/view metadata is parseable by Spark before persisting it

## What changes were proposed in this pull request?

* JIRA: [SPARK-22431](https://issues.apache.org/jira/browse/SPARK-22431): Creating Permanent view with illegal type

**Description:**
- In Spark SQL it is possible to create a permanent view that uses a nested field with an illegal name.
- For example, if we create the following view: ```create view x as select struct('a' as `$q`, 1 as b) q```
- A simple select fails with the following exception:

```
select * from x;

org.apache.spark.SparkException: Cannot recognize hive type string: struct<$q:string,b:int>
  at org.apache.spark.sql.hive.client.HiveClientImpl$.fromHiveColumn(HiveClientImpl.scala:812)
  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:378)
  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1$$anonfun$apply$11$$anonfun$7.apply(HiveClientImpl.scala:378)
...
```

**Issue/Analysis**: Right now, we can create a view with a schema that Spark cannot read back from the Hive metastore. For more details, please see the discussion of the analysis and the proposed fix options in comment 1 and comment 2 of [SPARK-22431](https://issues.apache.org/jira/browse/SPARK-22431).

**Proposed changes**:
- Fix the Hive table/view codepath to check whether the schema datatype is parseable by Spark before persisting it in the metastore. The change is localized to HiveClientImpl and performs a check similar to the one in fromHiveColumn. This is fail-fast: it avoids the scenario where we write something to the metastore that we are then unable to read back.
- Added new unit tests
- Ran the sql related unit test suites (hive/test, sql/test, catalyst/test): OK

With the fix:

```
create view x as select struct('a' as `$q`, 1 as b) q;
17/11/14 19:16:03 ERROR SparkSQLDriver: Failed in [create view x as select struct('a' as `$q`, 1 as b) q]
org.apache.spark.SparkException: Cannot recognize the data type: struct<$q:string,b:int>
  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$org$apache$spark$sql$hive$client$HiveClientImpl$$verifyColumnDataType$1.apply(HiveClientImpl.scala:907)
  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$org$apache$spark$sql$hive$client$HiveClientImpl$$verifyColumnDataType$1.apply(HiveClientImpl.scala:901)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
```

## How was this patch tested?

- New unit tests have been added.

@hvanhovell, Please review and share your thoughts/comments. Thank you so much.
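
For readers who want a feel for the fail-fast idea under "Proposed changes", here is a minimal, self-contained sketch in Python. It is not the code in this patch and it does not use Spark's parser: `check_type`, `persist_view`, the tiny set of primitive types, and the dict standing in for the metastore are all hypothetical, and the identifier rule (letters, digits, underscore) is an assumption chosen to mirror why `$q` is rejected.

```python
import re

# Assumption: a deliberately small subset of primitive type names.
PRIMITIVES = {"string", "int", "bigint", "double", "boolean"}
# Assumption: unquoted field names may contain only letters, digits and '_'.
IDENT = re.compile(r"[A-Za-z0-9_]+")


class UnparseableTypeError(ValueError):
    """Raised when a type string could not be read back later."""


def split_top_level(body):
    """Split on commas that are not nested inside <...>."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(body[start:i])
            start = i + 1
    parts.append(body[start:])
    return parts


def check_type(type_string):
    """Hypothetical stand-in for the parse check; raises if not parseable."""
    t = type_string.strip()
    if t in PRIMITIVES:
        return
    if t.startswith("array<") and t.endswith(">"):
        check_type(t[len("array<"):-1])
        return
    if t.startswith("struct<") and t.endswith(">"):
        for field in split_top_level(t[len("struct<"):-1]):
            name, sep, field_type = field.partition(":")
            if not sep or not IDENT.fullmatch(name.strip()):
                raise UnparseableTypeError(
                    "Cannot recognize the data type: " + t)
            check_type(field_type)
        return
    raise UnparseableTypeError("Cannot recognize the data type: " + t)


def persist_view(metastore, name, schema):
    """Validate every column type first, then write to the (toy) metastore."""
    for type_string in schema.values():
        check_type(type_string)
    metastore[name] = dict(schema)
```

With this sketch, `persist_view(store, "x", {"q": "struct<$q:string,b:int>"})` raises before anything is written, so the store never holds a schema that the same check would later reject on read.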
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/skambha/spark spark22431

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19747.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19747

----
commit c5824feb40af633ab480b311495ecb7737705c3a
Author: Sunitha Kambhampati <skam...@us.ibm.com>
Date:   2017-11-14T12:38:17Z

    Add check to ensure that the schema col datatype is parseable before persisting to metastore, and add unit tests

commit ce474b7b028bba45c8bd29c31308503626baafbc
Author: Sunitha Kambhampati <skam...@us.ibm.com>
Date:   2017-11-14T16:02:00Z

    Add : in error message

commit d5b553438d8740716e402c0210e3d121a48c2c64
Author: Sunitha Kambhampati <skam...@us.ibm.com>
Date:   2017-11-14T16:07:28Z

    Remove empty line

commit 626703310aa269a9351a2cf7b6ce23f8e4ab095a
Author: Sunitha Kambhampati <skam...@us.ibm.com>
Date:   2017-11-14T16:20:06Z

    remove empty line

----