There is a big difference between the repos of Arrow, Avro, Iceberg etc. and Parquet. Those projects keep everything in one repo, including the different language bindings, so it is natural for them to have the specs there as well and to have universal releases. Parquet, meanwhile, has different implementations spread across different repos; parquet-mr is just one of them. So we must have parquet-format tracked and released separately.

How we want to present our specs is an orthogonal topic. I am completely fine with having a "versioned" link to the tagged github repo. The problem with having the specs on our site is that the thrift file itself is part of the spec. We are referencing it in the docs. I don't know how we could represent the thrift file properly for the site.

Cheers,
Gabor

Vinoo Ganesh <[email protected]> wrote (on Thu, Mar 7, 2024, 14:05):

> Hi Antoine - Perhaps my thoughts weren't clear - but I'm mostly pointing
> out a few things:
>
> 1. The parquet-format repo doesn't have much code other than the thrift
> definition
> 2. parquet operates fairly uniquely compared to other products in this
> space, that maintain doc versions either in the github repo or their
> website.
> 3. parquet-format in its current form has been fine, but do we want to
> migrate to following the standard of other major projects that build their
> format either into their "main project" (which in our case would be
> parquet-mr) (ex: https://github.com/apache/iceberg/tree/main/format or
> https://github.com/apache/arrow/tree/main/docs/source/format) or on their
> website docs (which again are also versioned in github).
>
> This doesn't have to do with simply a website building procedure, it has to
> do with the process of managing releases of parquet and whether or not we
> want information duplication and an additional step in the building process.
>
> If folks are happy with the existing state of the world, then there isn't
> any need for action.
>
> To summarize, it sounds like people do not want to archive parquet-format
> and want to continue doing doc versioning work in that repo.
>
> Would it then be fair to suggest that authors of changes to parquet-format
> are also responsible for keeping the website up to date?
>
> On Thu, Mar 7, 2024 at 7:23 AM Uwe L. Korn <[email protected]> wrote:
>
> > I can strongly second Antoine's response here. It is a small but very
> > important repository holding crucial information for the project.
> >
> > Best
> > Uwe
> >
> > On Thu, Mar 7, 2024, at 1:17 PM, Antoine Pitrou wrote:
> > > Hello,
> > >
> > > I am surprised that this is suggesting to deprecate or delete a
> > > repository just because a website building procedure isn't properly
> > > set up to deal with it.
> > >
> > > ISTM the "right" solution would be for the Parquet website to
> > > automatically update its contents based on the latest released version
> > > of parquet-format. Perhaps using a git submodule or something.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > On Tue, 5 Mar 2024 21:30:45 -0500
> > > Vinoo Ganesh <[email protected]>
> > > wrote:
> > >> Hi Parquet Dev -
> > >>
> > >> There have been some conversations about content stored on the
> > >> parquet-format github repo vs. the website. Doing a cursory pass of the
> > >> parquet-format <https://github.com/apache/parquet-format> repo, it looks
> > >> like, other than the markdown documentation stored in the repo, most of
> > >> the core code was marked as deprecated here:
> > >> https://github.com/apache/parquet-format/pull/105, content was moved to
> > >> parquet-mr, and that entire repo really only exists to host this file:
> > >> https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift.
> > >> It's possible I'm missing something, but is my understanding correct?
> > >>
> > >> If so, would it make sense to just deprecate parquet-format as a repo,
> > >> move the content to be exclusively hosted on parquet-site
> > >> <https://github.com/apache/parquet-site/tree/asf-site>, and host the
> > >> thrift file elsewhere? This would solve the content duplication problem
> > >> between parquet format and the website, and would cut down on having to
> > >> manage a separate repo. I know there is benefit to having
> > >> comments/discussions on PRs or issues on the repo, but we could also
> > >> pretty easily port this to the site.
> > >>
> > >> I'm sure this proposal will elicit some strong responses, but wanted to
> > >> see if anyone had insights here / if I'm missing anything.
> > >>
> > >> Thanks, Vinoo
> > >>
> > >>
> > >> <[email protected]>
