[ https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17074434#comment-17074434 ]
Wenchen Fan commented on SPARK-25102: ------------------------------------- I'd like to propose to backport it to 2.4. It's very important to have version info in the file metadata, to implement backward compatibility. It's unfortunate that we start this too late, but it still helps if Spark 2.4.6 starts to do it, as we will maintain the 2.4 line for a long time. Any thoughts? > Write Spark version to ORC/Parquet file metadata > ------------------------------------------------ > > Key: SPARK-25102 > URL: https://issues.apache.org/jira/browse/SPARK-25102 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.0.0 > Reporter: Zoltan Ivanfi > Assignee: Dongjoon Hyun > Priority: Major > Fix For: 3.0.0 > > > Currently, Spark writes Spark version number into Hive Table properties with > `spark.sql.create.version`. > {code} > parameters:{ > spark.sql.sources.schema.part.0={ > "type":"struct", > "fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}}] > }, > transient_lastDdlTime=1541142761, > spark.sql.sources.schema.numParts=1, > spark.sql.create.version=2.4.0 > } > {code} > This issue aims to write Spark versions to ORC/Parquet file metadata with > `org.apache.spark.sql.create.version`. It's different from Hive Table > property key `spark.sql.create.version`. It seems that we cannot change that > for backward compatibility (even in Apache Spark 3.0) > *ORC* > {code} > User Metadata: > org.apache.spark.sql.create.version=3.0.0-SNAPSHOT > {code} > *PARQUET* > {code} > file: > file:/tmp/p/part-00007-9dc415fe-7773-49ba-9c59-4c151e16009a-c000.snappy.parquet > creator: parquet-mr version 1.10.0 (build > 031a6654009e3b82020012a18434c582bd74c73a) > extra: org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT > extra: org.apache.spark.sql.parquet.row.metadata = > {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]} > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org