+1 (non-binding)

Best,
Gang

On Fri, Mar 8, 2024 at 5:05 AM Edward Seidl <[email protected]> wrote:

> +1 (non-binding)
>
> Thanks for your work on this!
> Ed
> ________________________________
> From: Antoine Pitrou <[email protected]>
> Sent: Thursday, March 7, 2024 5:15 AM
> To: [email protected] <[email protected]>
> Subject: [VOTE] Expand BYTE_STREAM_SPLIT to support FIXED_LEN_BYTE_ARRAY,
> INT32 and INT64
>
> Hello,
>
> As discussed previously on this ML [1], I am proposing to expand
> the types supported by the BYTE_STREAM_SPLIT encoding. The currently
> supported types are FLOAT and DOUBLE. The proposal expands the
> supported types to INT32, INT64 and FIXED_LEN_BYTE_ARRAY.
>
> The format addition is tracked on JIRA where some measurements on
> sample data are also published and discussed [2].
>
> (please note that the original ML thread only discussed expanding
> to FIXED_LEN_BYTE_ARRAY; discussion on the JIRA issue led to the
> conclusion that it would also be beneficial to cover INT32 and INT64)
>
> The format additions are submitted as a PR in [3].
> A data file for integration testing is submitted in [4].
> An implementation for Parquet C++ is submitted in [5].
> An implementation for parquet-mr is submitted in [6].
>
> This vote will be open for at least 1 week.
>
> +1: Accept the format additions
> +0: ...
> -1: Reject the format additions because ...
>
> Regards
>
> Antoine.
>
>
> [1] https://lists.apache.org/thread/5on7rnc141jnw2cdxtsfgm5xhhdmsb4q
> [2] https://issues.apache.org/jira/browse/PARQUET-2414
> [3] https://github.com/apache/parquet-format/pull/229
> [4] https://github.com/apache/parquet-testing/pull/46
> [5] https://github.com/apache/arrow/pull/40094
> [6] https://github.com/apache/parquet-mr/pull/1291
