GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/22932
[SPARK-25102][SQL] Write Spark version to ORC/Parquet file metadata

## What changes were proposed in this pull request?

Currently, Spark writes the Spark version number into Hive table properties with `spark.sql.create.version`.

```
parameters:{
  spark.sql.sources.schema.part.0={
    "type":"struct",
    "fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}}]
  },
  transient_lastDdlTime=1541142761,
  spark.sql.sources.schema.numParts=1,
  spark.sql.create.version=2.4.0
}
```

This PR aims to write the Spark version to ORC/Parquet file metadata with `org.apache.spark.sql.create.version`. This key differs from the Hive table property key `spark.sql.create.version`, but it seems that we cannot change the existing Hive key for backward compatibility.

**ORC (`native` and `hive` implementation)**
```
File Version: 0.12 with ORC_135
...
User Metadata:
  org.apache.spark.sql.create.version=3.0.0-SNAPSHOT
```

**PARQUET**
```
creator: parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)
extra: org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT
extra: org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]}
```

## How was this patch tested?

Pass the Jenkins with newly added test cases.
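For reviewers who want to check a written file by hand, the following is a minimal sketch (not part of this patch) that extracts the recorded version from the text of a metadata dump like the ORC and Parquet examples above. The helper name `find_spark_version` is hypothetical, and the sketch assumes the dump prints the key as either `key=value` or `key = value`, as in those examples.

```python
import re

# Metadata key that this PR writes into ORC/Parquet files.
VERSION_KEY = "org.apache.spark.sql.create.version"

# Accepts "key=value" (ORC user metadata) and "key = value" (Parquet dump).
# The key must start the text or follow whitespace, so a longer key that
# merely ends with the same characters is not matched.
_PATTERN = re.compile(r"(?:^|\s)" + re.escape(VERSION_KEY) + r"\s*=\s*(\S+)")


def find_spark_version(dump_text):
    """Return the Spark version found in a metadata dump, or None if absent.

    Hypothetical helper for illustration; it only parses text and does not
    read ORC or Parquet files itself.
    """
    match = _PATTERN.search(dump_text)
    return match.group(1) if match else None
```

Because the pattern requires the full `org.apache.` prefix, the Hive table property `spark.sql.create.version` is not picked up, which mirrors the distinction between the two keys described above.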
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-25102

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22932.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22932

----
commit 601ccbb4e20a068469839bc71870230cfb6fd7a1
Author: Dongjoon Hyun <dongjoon@...>
Date:   2018-11-03T06:43:48Z

    [SPARK-25102][SQL] Write Spark version to ORC/Parquet file metadata

----