As part of this track I wrote up two draft PRs for what I think might be a workable release process for new features, including concrete guidance on when they should be enabled by default in other implementations:
https://github.com/apache/parquet-format/pull/258
https://github.com/apache/parquet-site/pull/61

I think having a process around this is critical to avoid the confusion we've had over V2.

Comments/Feedback appreciated.

Thanks,
Micah

On Mon, May 27, 2024 at 10:46 PM Micah Kornfield <[email protected]> wrote:

> As a follow-up to the "V3" Discussions [1][2] I wanted to start a
> discussion to see who is interested in improving Parquet infrastructure.
> In particular, as we consider newer features, I think we should be
> considering regular major version releases, to allow for new features to
> become default.
>
> There are a few areas that we need volunteers for, so it would be good to
> get a sense of who is willing to help out.
>
> 1. Is anyone who isn't already involved in the release process willing to
> volunteer to do parquet-java releases on a regular basis? I believe the
> requirement is being a committer/PMC member on Parquet but might be
> mistaken. Personally, given my current commitments, I think I can help
> drive 1 parquet-java release a year. I think once we can verify we have
> enough people we can try to formalize a new release policy with major
> version bumps to help ensure any work done on the other tracks will someday
> become defaults for consumers.
>
> 2. Is anybody interested in looking more deeply into developing
> integration tests between the different Parquet implementations and major
> downstream consumers of Parquet? I believe Apache Arrow has a pretty good
> model [3][4] in a lot of respects with cross-language integration tests,
> and nightly (via crossbow) integration tests with other consumers, but
> there are a wide variety of things that would improve the current state.
> One other possible concern is the amount of CI resources this might
> consume, and if we will need contributions to fund it.
>
> 3. I believe someone (maybe Ed) already mentioned they are working on a
> full feature matrix for different Parquet implementations but this was also
> called out as critical. If no one else is interested, I can also start
> putting something together here.
>
> Anything else people want to bring up in the discussion?
>
> Thanks,
> Micah
>
> [1] https://lists.apache.org/thread/5jyhzkwyrjk9z52g0b49g31ygnz73gxo
> [2]
> https://docs.google.com/document/d/19hQLYcU5_r5nJB7GtnjfODLlSDiNS24GXAtKg9b0_ls/edit
> [3]
> https://arrow.apache.org/docs/format/Integration.html#integration-testing
> [4] https://github.com/ursacomputing/crossbow
