[GitHub] spark pull request: [SPARK-11044][SQL] Parquet writer version fixe...

liancheng Thu, 12 Nov 2015 03:17:08 -0800

Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/9060#issuecomment-156077334
  
    You can construct a Parquet file that consists of a single 
dictionary-encoded column using:
    
    ```scala
    val path = "file:///tmp/parquet/dict"
    // 65536 rows but only 4 distinct values, written as a single part file
    sqlContext.range(1 << 16)
      .selectExpr("(id % 4) AS i")
      .coalesce(1)
      .write.mode("overwrite").parquet(path)
    ```
    
    Here are instructions for building and installing the parquet-tools CLI 
tool. Once it is installed, you can inspect the Parquet metadata using:
    
    ```
    $ parquet-meta /tmp/parquet/dict
    
    file:        file:/private/tmp/parquet/dict/part-r-00000-88498608-9eed-4728-b96a-b60bc5ebc2a8.gz.parquet
    creator:     parquet-mr version 1.6.0
    extra:       org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"i","type":"long","nullable":true,"metadata":{}}]}
    
    file schema: root
    ----------------------------------------------------------------------------------------------------------------------------------------------
    i:           OPTIONAL INT64 R:0 D:1
    
    row group 1: RC:65536 TS:16615 OFFSET:4
    ----------------------------------------------------------------------------------------------------------------------------------------------
    i:            INT64 GZIP DO:0 FPO:4 SZ:198/16615/83.91 VC:65536 ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY
    ```
    
    The `ENC:...` part of the last line lists the encodings used by the 
column chunk. `PLAIN_DICTIONARY` there indicates that the column is 
dictionary-encoded.
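
    If you want to check this in a script rather than by eye, the `ENC:` 
field can be pulled out of a column line with plain string handling. This is 
only a rough sketch: `encodingsOf` is a hypothetical helper, not part of 
parquet-tools, and it assumes the column line has the same layout as the 
output shown above.

    ```scala
    // Hypothetical helper: extract the encoding names from one column line
    // of parquet-meta output (assumes a whitespace-separated "ENC:a,b,c" field).
    def encodingsOf(columnLine: String): Set[String] =
      columnLine.split("\\s+")
        .find(_.startsWith("ENC:"))
        .map(_.stripPrefix("ENC:").split(",").toSet)
        .getOrElse(Set.empty[String])

    // The column line copied from the parquet-meta output.
    val line =
      "i:            INT64 GZIP DO:0 FPO:4 SZ:198/16615/83.91 VC:65536 " +
        "ENC:BIT_PACKED,RLE,PLAIN_DICTIONARY"

    val encodings = encodingsOf(line)
    val dictionaryEncoded = encodings.contains("PLAIN_DICTIONARY")
    println(s"encodings = $encodings, dictionary-encoded = $dictionaryEncoded")
    ```

    A line without an `ENC:` field yields an empty set, so the check simply 
reports `false` instead of failing.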


