+1 (non-binding)

Thanks for your work on this!
Ed
________________________________
From: Antoine Pitrou <[email protected]>
Sent: Thursday, March 7, 2024 5:15 AM
To: [email protected] <[email protected]>
Subject: [VOTE] Expand BYTE_STREAM_SPLIT to support FIXED_LEN_BYTE_ARRAY, INT32 and INT64


Hello,

As discussed previously on this ML [1], I am proposing to expand
the types supported by the BYTE_STREAM_SPLIT encoding. The currently
supported types are FLOAT and DOUBLE. The proposal expands the
supported types to INT32, INT64 and FIXED_LEN_BYTE_ARRAY.
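
As a reminder of what the encoding does, here is a rough sketch in Python. It is not taken from any of the linked PRs, and the function names are hypothetical. N values, each exactly K bytes wide, are scattered into K streams: stream k holds byte k of every value, in value order, and the K streams are concatenated. The sketch assumes INT32 values are serialized little-endian, as in PLAIN encoding. A FIXED_LEN_BYTE_ARRAY of length K is split the same way.

```python
import struct


def byte_stream_split_encode(values: list[bytes], width: int) -> bytes:
    """Scatter fixed-width values into `width` byte streams and concatenate them."""
    if any(len(v) != width for v in values):
        raise ValueError("all values must be exactly `width` bytes long")
    # Stream k collects byte k of every value, in value order.
    streams = [bytes(v[k] for v in values) for k in range(width)]
    return b"".join(streams)


def byte_stream_split_decode(data: bytes, width: int) -> list[bytes]:
    """Inverse of byte_stream_split_encode: byte k of value i lives in stream k."""
    if len(data) % width != 0:
        raise ValueError("data length must be a multiple of `width`")
    n = len(data) // width
    return [bytes(data[k * n + i] for k in range(width)) for i in range(n)]


# INT32 values, assumed serialized little-endian as in PLAIN encoding.
ints = [1, 256, -1]
raw = [struct.pack("<i", x) for x in ints]
encoded = byte_stream_split_encode(raw, 4)
decoded = [struct.unpack("<i", b)[0] for b in byte_stream_split_decode(encoded, 4)]
```

The encoding itself does not shrink the data. It places bytes of equal significance next to each other, which tends to help a downstream compressor when, for example, the high-order bytes of the values are similar.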

The format addition is tracked on JIRA, where measurements on
sample data are also published and discussed [2].

(Please note that the original ML thread only discussed expanding
to FIXED_LEN_BYTE_ARRAY; discussion on the JIRA issue led to the
conclusion that covering INT32 and INT64 would also be beneficial.)

The format additions are submitted as a PR in [3].
A data file for integration testing is submitted in [4].
An implementation for Parquet C++ is submitted in [5].
An implementation for parquet-mr is submitted in [6].

This vote will be open for at least 1 week.

+1: Accept the format additions
+0: ...
-1: Reject the format additions because ...

Regards

Antoine.


[1] https://lists.apache.org/thread/5on7rnc141jnw2cdxtsfgm5xhhdmsb4q
[2] https://issues.apache.org/jira/browse/PARQUET-2414
[3] https://github.com/apache/parquet-format/pull/229
[4] https://github.com/apache/parquet-testing/pull/46
[5] https://github.com/apache/arrow/pull/40094
[6] https://github.com/apache/parquet-mr/pull/1291
