GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/22932

    [SPARK-25102][SQL] Write Spark version to ORC/Parquet file metadata

    ## What changes were proposed in this pull request?
    
    Currently, Spark writes its version number into Hive table properties under the `spark.sql.create.version` key.
    ```
    parameters:{
      spark.sql.sources.schema.part.0={
        "type":"struct",
        "fields":[{"name":"a","type":"integer","nullable":true,"metadata":{}}]
      },
      transient_lastDdlTime=1541142761, 
      spark.sql.sources.schema.numParts=1,
      spark.sql.create.version=2.4.0
    }
    ```
    
    This PR writes the Spark version to ORC/Parquet file metadata under the key `org.apache.spark.sql.create.version`. This key differs from the Hive table property key `spark.sql.create.version`, but it seems that the Hive key cannot be changed without breaking backward compatibility.
    
    **ORC (`native` and `hive` implementations)**
    ```
    File Version: 0.12 with ORC_135
    ...
    User Metadata:
      org.apache.spark.sql.create.version=3.0.0-SNAPSHOT
    ```
    
    **PARQUET**
    ```
    creator:     parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a)
    extra:       org.apache.spark.sql.create.version = 3.0.0-SNAPSHOT
    extra:       org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"long","nullable":false,"metadata":{}}]}
    ```
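
As an illustration of how a downstream consumer might use the new key, here is a minimal Python sketch. It assumes the file's key-value metadata has already been decoded into a plain dictionary by whatever reader library is in use; the function name `writer_spark_version` and that dictionary shape are hypothetical and not part of this PR. Only the key `org.apache.spark.sql.create.version` comes from the change itself. The Hive table property key `spark.sql.create.version` is deliberately not consulted here.

```python
import re
from typing import Mapping, Optional, Tuple

# File metadata key written by this PR for ORC/Parquet files.
SPARK_VERSION_KEY = "org.apache.spark.sql.create.version"


def writer_spark_version(
    file_metadata: Mapping[str, str],
) -> Optional[Tuple[int, ...]]:
    """Return (major, minor, patch) of the Spark that wrote the file, or None.

    `file_metadata` is a hypothetical, already-decoded view of the file's
    key-value metadata; how it is obtained depends on the reader library.
    """
    raw = file_metadata.get(SPARK_VERSION_KEY)
    if raw is None:
        # No key: the file was written by another tool or by a Spark
        # build that does not include this change.
        return None
    # Accept suffixed versions such as "3.0.0-SNAPSHOT".
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", raw)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


# Usage with the value shown in the Parquet dump above.
parquet_meta = {"org.apache.spark.sql.create.version": "3.0.0-SNAPSHOT"}
print(writer_spark_version(parquet_meta))  # (3, 0, 0)
print(writer_spark_version({}))            # None
```

Returning `None` when the key is absent keeps older files readable; a caller can then fall back to whatever default behavior it already has.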
    
    ## How was this patch tested?
    
    Passes Jenkins with newly added test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-25102

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22932.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22932
    
----
commit 601ccbb4e20a068469839bc71870230cfb6fd7a1
Author: Dongjoon Hyun <dongjoon@...>
Date:   2018-11-03T06:43:48Z

    [SPARK-25102][SQL] Write Spark version to ORC/Parquet file metadata

----

