+1 (non-binding)

Thanks for your work on this!

Ed

________________________________
From: Antoine Pitrou <[email protected]>
Sent: Thursday, March 7, 2024 5:15 AM
To: [email protected] <[email protected]>
Subject: [VOTE] Expand BYTE_STREAM_SPLIT to support FIXED_LEN_BYTE_ARRAY, INT32 and INT64

Hello,

As discussed previously on this ML [1], I am proposing to expand the types supported by the BYTE_STREAM_SPLIT encoding. The currently supported types are FLOAT and DOUBLE. The proposal expands the supported types to INT32, INT64 and FIXED_LEN_BYTE_ARRAY.

The format addition is tracked on JIRA where some measurements on sample data are also published and discussed [2].

(please note that the original ML thread only discussed expanding to FIXED_LEN_BYTE_ARRAY; discussion on the JIRA issue led to the conclusion that it would also be beneficial to cover INT32 and INT64)

The format additions are submitted as a PR in [3].
A data file for integration testing is submitted in [4].
An implementation for Parquet C++ is submitted in [5].
An implementation for parquet-mr is submitted in [6].

This vote will be open for at least 1 week.

+1: Accept the format additions
+0: ...
-1: Reject the format additions because ...

Regards

Antoine.

[1] https://lists.apache.org/thread/5on7rnc141jnw2cdxtsfgm5xhhdmsb4q
[2] https://issues.apache.org/jira/browse/PARQUET-2414
[3] https://github.com/apache/parquet-format/pull/229
[4] https://github.com/apache/parquet-testing/pull/46
[5] https://github.com/apache/arrow/pull/40094
[6] https://github.com/apache/parquet-mr/pull/1291
