(parquet-testing) branch master updated: Add a Parquet file with column chunk key-value metadata (#49)

maplefu Sun, 21 Jul 2024 00:44:32 -0700

This is an automated email from the ASF dual-hosted git repository.

maplefu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-testing.git



The following commit(s) were added to refs/heads/master by this push:
     new 9b48ff4  Add a Parquet file with column chunk key-value metadata (#49)
9b48ff4 is described below

commit 9b48ff4f94dc5e89592d46a119884dbb88100884
Author: Chungmin Lee <cmlee...@gmail.com>
AuthorDate: Sun Jul 21 00:43:59 2024 -0700

    Add a Parquet file with column chunk key-value metadata (#49)
    
    * Add a Parquet file with column chunk key-value metadata
    
    This file has a single row group with 0 rows and 1 column. The column
    chunk has key-value metadata, with a key "foo" mapped to a value "bar".
    
    Created with this code:
    
    ```c++
    PARQUET_ASSIGN_OR_THROW(
        auto sink, arrow::io::FileOutputStream::Open(
                       "column-chunk-key-value-metadata.parquet"));
    parquet::ParquetFileWriter::Open(
        sink, std::static_pointer_cast<parquet::schema::GroupNode>(
                  parquet::schema::GroupNode::Make(
                      "schema", parquet::Repetition::REQUIRED,
                      {parquet::schema::PrimitiveNode::Make(
                          "column1", parquet::Repetition::OPTIONAL,
                          parquet::Type::INT32)})))
        ->AppendRowGroup()
        ->NextColumn()
        ->key_value_metadata()
        .Append("foo", "bar");
    ```
    
    * Rename to match the prevalent style
    
    * Make it 2 columns
    
    * Update data/README.md
    
    * Add a KeyValue entry without Value
    
    * Update data/README.md
    
    Co-authored-by: mwish <maplewish...@gmail.com>
    
    * Update README.md
    
    * Update README.md
    
    ---------
    
    Co-authored-by: mwish <maplewish...@gmail.com>
---
 data/README.md                               |   3 ++-
 data/column_chunk_key_value_metadata.parquet | Bin 0 -> 400 bytes
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/data/README.md b/data/README.md
index 2782a93..70bfb21 100644
--- a/data/README.md
+++ b/data/README.md
@@ -51,6 +51,7 @@
 | concatenated_gzip_members.parquet     | 513 UINT64 numbers compressed using 2 concatenated gzip members in a single data page |
 | byte_stream_split.zstd.parquet | Standard normals with `BYTE_STREAM_SPLIT` encoding. See [note](#byte-stream-split) below |
 | incorrect_map_schema.parquet | Contains a Map schema without explicitly required keys, produced by Presto. See [note](#incorrect-map-schema) |
+| column_chunk_key_value_metadata.parquet | two INT32 columns, one with column chunk key-value metadata {"foo": "bar", "thisiskeywithoutvalue": null} note that the second key "thisiskeywithoutvalue", does not have a value, but the value can be mapped to an empty string "" when read depending on the client |
 
 TODO: Document what each file is in the table above.
 
@@ -425,4 +426,4 @@ message hive_schema {
     }
   }
 }
-```
\ No newline at end of file
+```
diff --git a/data/column_chunk_key_value_metadata.parquet b/data/column_chunk_key_value_metadata.parquet
new file mode 100644
index 0000000..bcaf871
Binary files /dev/null and b/data/column_chunk_key_value_metadata.parquet differ
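
The README row in the diff above notes that the key "thisiskeywithoutvalue" has no value, and that some clients may surface it as an empty string "". The following is a minimal standard-library sketch of that distinction. The `KeyValue` struct and the helper functions are hypothetical stand-ins for illustration only; they are not the Parquet or Arrow C++ API.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of one key-value metadata entry.
// The value is optional, so "no value" and "empty value" stay distinct.
struct KeyValue {
  std::string key;
  std::optional<std::string> value;
};

// Builds entries mirroring what the README describes for the test file:
// {"foo": "bar", "thisiskeywithoutvalue": null}.
std::vector<KeyValue> SampleMetadata() {
  std::vector<KeyValue> kvs;
  kvs.push_back(KeyValue{"foo", std::string("bar")});
  kvs.push_back(KeyValue{"thisiskeywithoutvalue", std::nullopt});
  return kvs;
}

// Strict view: reports whether the entry actually carries a value.
bool HasValue(const KeyValue& kv) { return kv.value.has_value(); }

// Lenient view: a missing value collapses to "", which is how the README
// says some clients may present it when reading.
std::string ValueOrEmpty(const KeyValue& kv) { return kv.value.value_or(""); }
```

A reader that only exposes the lenient view cannot tell a missing value from an explicitly empty one, which is presumably why the test file includes such an entry.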
