Hello,

As discussed previously on this ML [1], I am proposing to expand the types supported by the BYTE_STREAM_SPLIT encoding. The currently supported types are FLOAT and DOUBLE. The proposal expands the supported types to INT32, INT64 and FIXED_LEN_BYTE_ARRAY.

The format addition is tracked on JIRA, where some measurements on sample data are also published and discussed [2].

(Please note that the original ML thread only discussed expanding to FIXED_LEN_BYTE_ARRAY; discussion on the JIRA issue led to the conclusion that it would also be beneficial to cover INT32 and INT64.)

The format additions are submitted as a PR in [3].
A data file for integration testing is submitted in [4].
An implementation for Parquet C++ is submitted in [5].
An implementation for parquet-mr is submitted in [6].

This vote will be open for at least 1 week.

+1: Accept the format additions
+0: ...
-1: Reject the format additions because ...

Regards

Antoine.

[1] https://lists.apache.org/thread/5on7rnc141jnw2cdxtsfgm5xhhdmsb4q
[2] https://issues.apache.org/jira/browse/PARQUET-2414
[3] https://github.com/apache/parquet-format/pull/229
[4] https://github.com/apache/parquet-testing/pull/46
[5] https://github.com/apache/arrow/pull/40094
[6] https://github.com/apache/parquet-mr/pull/1291
