This is an automated email from the ASF dual-hosted git repository. maplefu pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/parquet-testing.git
The following commit(s) were added to refs/heads/master by this push:
     new 9b48ff4  Add a Parquet file with column chunk key-value metadata (#49)
9b48ff4 is described below

commit 9b48ff4f94dc5e89592d46a119884dbb88100884
Author: Chungmin Lee <cmlee...@gmail.com>
AuthorDate: Sun Jul 21 00:43:59 2024 -0700

    Add a Parquet file with column chunk key-value metadata (#49)

    * Add a Parquet file with column chunk key-value metadata

    This file has a single row group with 0 row and 1 column. The column
    chunk has key-value metadata, with a key "foo" mapped to a value "bar".

    Created with this code:

    ```c++
    PARQUET_ASSIGN_OR_THROW(
        auto sink, arrow::io::FileOutputStream::Open(
                       "column-chunk-key-value-metadata.parquet"));
    parquet::ParquetFileWriter::Open(
        sink, std::static_pointer_cast<parquet::schema::GroupNode>(
                  parquet::schema::GroupNode::Make(
                      "schema", parquet::Repetition::REQUIRED,
                      {parquet::schema::PrimitiveNode::Make(
                          "column1", parquet::Repetition::OPTIONAL,
                          parquet::Type::INT32)})))
        ->AppendRowGroup()
        ->NextColumn()
        ->key_value_metadata()
        .Append("foo", "bar");
    ```

    * Rename to match the prevalent style

    * Make it 2 columns

    * Update data/README.md

    * Add a KeyValue entry without Value

    * Update data/README.md

    Co-authored-by: mwish <maplewish...@gmail.com>

    * Update README.md

    * Update README.md

    ---------

    Co-authored-by: mwish <maplewish...@gmail.com>
---
 data/README.md                               |   3 ++-
 data/column_chunk_key_value_metadata.parquet | Bin 0 -> 400 bytes
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/data/README.md b/data/README.md
index 2782a93..70bfb21 100644
--- a/data/README.md
+++ b/data/README.md
@@ -51,6 +51,7 @@
 | concatenated_gzip_members.parquet | 513 UINT64 numbers compressed using 2 concatenated gzip members in a single data page |
 | byte_stream_split.zstd.parquet | Standard normals with `BYTE_STREAM_SPLIT` encoding. See [note](#byte-stream-split) below |
 | incorrect_map_schema.parquet | Contains a Map schema without explicitly required keys, produced by Presto. See [note](#incorrect-map-schema) |
+| column_chunk_key_value_metadata.parquet | two INT32 columns, one with column chunk key-value metadata {"foo": "bar", "thisiskeywithoutvalue": null} note that the second key "thisiskeywithoutvalue", does not have a value, but the value can be mapped to an empty string "" when read depending on the client |
 
 TODO: Document what each file is in the table above.
 
@@ -425,4 +426,4 @@ message hive_schema {
     }
   }
 }
-```
\ No newline at end of file
+```
diff --git a/data/column_chunk_key_value_metadata.parquet b/data/column_chunk_key_value_metadata.parquet
new file mode 100644
index 0000000..bcaf871
Binary files /dev/null and b/data/column_chunk_key_value_metadata.parquet differ
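
The README row added in this commit notes that the key "thisiskeywithoutvalue" has no value, yet some clients may surface it as an empty string. The following is a minimal, library-free sketch of that distinction. It does not use the Arrow or Parquet C++ API: `KeyValue`, `SampleMetadata`, `ValueOrEmpty`, and `CheckSampleMetadata` are hypothetical names introduced only for illustration, and the only assumption carried over from the format is that a key-value entry has a required key and an optional value.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one key-value metadata entry: the key is
// always present, the value may be absent.
struct KeyValue {
  std::string key;
  std::optional<std::string> value;
};

// The two entries described in the README row for
// column_chunk_key_value_metadata.parquet.
std::vector<KeyValue> SampleMetadata() {
  return {
      {"foo", std::string("bar")},
      {"thisiskeywithoutvalue", std::nullopt},
  };
}

// Lossy view: a reader that stores values as plain strings cannot tell
// "no value" apart from "empty value", so both come back as "".
std::string ValueOrEmpty(const KeyValue& kv) {
  return kv.value.value_or("");
}

// Self-check that the sample behaves as described above.
void CheckSampleMetadata() {
  const std::vector<KeyValue> metadata = SampleMetadata();
  assert(metadata.size() == 2);
  assert(metadata[0].value.has_value());    // "foo" carries "bar"
  assert(!metadata[1].value.has_value());   // second key has no value
  assert(ValueOrEmpty(metadata[1]).empty());  // but reads back as ""
}
```

A reader that needs to round-trip this file faithfully would keep the optional form; a reader that only exposes plain strings ends up with the `ValueOrEmpty` behavior, which is the client-dependent case the README row mentions.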