This context should be added in the PR description itself. My main point is to keep the discussion connected rather than starting new threads on the mailing list or PRs on GitHub that don't refer to the original doc they are connected to.
From a design process perspective, it makes it more difficult to converge the discussion and build consensus if we start multiple threads rather than keeping the discussion on the original thread. Goals are pretty concrete, but we have to write them down to make them clear. They are what motivates the change to the metadata. Discussing the changes in a PR without agreeing on why we're doing them is premature. Similarly, before doing benchmarks we need to agree on what we are optimizing for. PRs

On Fri, May 17, 2024 at 1:48 AM Antoine Pitrou <[email protected]> wrote:
>
> Hi Julien,
>
> Yes, I posted comments on Micah's document, and I referenced this PR in
> those discussions. Personally, I feel more comfortable when I have some
> concrete proposal to comment on, rather than abstract goals, and I
> figured other people might be like me. Discussing actual Thrift
> metadata makes it clearer to me where the friction points might reside,
> and what the opportunities might be.
>
> These changes might also later serve as an experimentation platform to
> run crude benchmarks and try to validate what's really needed for the
> wide-schema case to be handled efficiently.
>
> They are not intended to be submitted for inclusion anytime soon, and
> I'm not planning to push for them if someone comes up with something
> better and more thought out.
>
> All in all, this started as a personal investigation to understand
> whether and how a "v3 schema" could be made backwards-compatible, and
> when I saw that it seemed actually doable I decided it would be worth
> posting the initial sketch instead of keeping it for myself.
>
> Regards
>
> Antoine.
>
>
> On Thu, 16 May 2024 18:41:26 -0700
> Julien Le Dem <[email protected]> wrote:
> > Hi Antoine,
> >
> > On the other thread Micah is collecting feedback in a document.
> > https://lists.apache.org/thread/61z98xgq2f76jxfjgn5xfq1jhxwm3jwf
> >
> > Would you mind putting your feedback there?
> > We should collect the goals before jumping to solutions.
> > It is a bit difficult to discuss those directly in the Thrift metadata.
> >
> > Thank you
> >
> >
> > On Thu, May 16, 2024 at 4:13 AM Antoine Pitrou <[email protected]> wrote:
> >
> > >
> > > Hello,
> > >
> > > In the light of recent discussions, I've put up a very rough proposal
> > > of a Parquet 3 metadata format that allows both for lightweight
> > > file-level metadata and backwards compatibility with legacy readers.
> > >
> > > For the sake of convenience and out of personal preference, I've made
> > > this a PR to parquet-format rather than a Google Doc:
> > > https://github.com/apache/parquet-format/pull/242
> > >
> > > Feel free to point out any glaring mistakes or misunderstandings on my
> > > part, or to comment on details.
> > >
> > > Regards
> > >
> > > Antoine.
