GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/22287
[SPARK-25135][SQL] FileFormatWriter should respect the schema of Hive

## What changes were proposed in this pull request?

This PR fixes `FileFormatWriter`'s `dataSchema` so that it respects the schema of Hive. Otherwise there are two issues:

1. An exception is thrown (this can be reproduced with the added test case):
   ```scala
   java.util.NoSuchElementException: None.get
     at scala.None$.get(Option.scala:347)
     at scala.None$.get(Option.scala:345)
     at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$3$$anonfun$4.apply(FileFormatWriter.scala:87)
     at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$3$$anonfun$4.apply(FileFormatWriter.scala:87)
   ```
2. The schema of the Hive table is not the same as the schema of the Parquet file.

## How was this patch tested?

- Unit tests verifying that FileFormatWriter respects the schema of Hive.
- Manual tests verifying that the UI issues fixed by [SPARK-22834](https://issues.apache.org/jira/browse/SPARK-22834) are not broken by this change:

![image](https://user-images.githubusercontent.com/5399861/44870021-94ce1700-acc1-11e8-8ef7-d7a8ba3c435d.png)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-25135-view

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22287.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22287

----
commit b54953a8224aa0a7759289a83e876e3bfc166cb6
Author: Yuming Wang <yumwang@...>
Date:   2018-08-30T17:46:02Z

    FileFormatWriter should respect the input query schema in HIVE

----