Hi Gabor - Thanks for providing that context! Given the variety of
different implementations, it sounds like keeping the status quo is best
for now. I'll publish the latest version of the website and will also look
into including versioned links to the tagged parquet-format github repo
there too.

Thanks everyone for your thoughts and helping resolve this.





On Thu, Mar 7, 2024 at 9:08 AM Gábor Szádovszky <[email protected]> wrote:

> There is a big difference between the repos of Arrow, Avro, Iceberg etc.
> and Parquet. The mentioned projects have everything in one repo including
> the different language bindings etc., so it is natural to have the specs
> there as well and to have universal releases.
> Meanwhile Parquet has different implementations spread in different repos.
> parquet-mr is just one implementation. So we must have parquet-format
> tracked/released separately.
>
> How we want to present our specs is an orthogonal topic. I am
> completely fine with having a "versioned" link to the tagged github repo.
> The problem with having the specs on our site is that the thrift file
> itself is part of the spec. We are referencing it in the docs. I don't know
> how we could represent the thrift file properly for the site.
>
> Cheers,
> Gabor
>
> On Thu, Mar 7, 2024 at 2:05 PM Vinoo Ganesh <[email protected]> wrote:
>
> > Hi Antoine -  Perhaps my thoughts weren't clear - but I'm mostly pointing
> > out a few things:
> >
> > 1. The parquet-format repo doesn't have much code other than the thrift
> > definition
> > 2. parquet operates fairly uniquely compared to other products in this
> > space, which maintain doc versions either in their github repo or on
> > their website.
> > 3. parquet-format in its current form has been fine, but do we want to
> > migrate to following the standard of other major projects that build their
> > format either into their "main project" (which in our case would be
> > parquet-mr) (ex: https://github.com/apache/iceberg/tree/main/format or
> > https://github.com/apache/arrow/tree/main/docs/source/format) or on their
> > website docs (which again are also versioned in github)?
> >
> > This doesn't have to do with simply a website building procedure; it has to
> > do with the process of managing releases of parquet and whether or not we
> > want information duplication and an additional step in the building process.
> >
> > If folks are happy with the existing state of the world, then there isn't
> > any need for action.
> >
> > To summarize, it sounds like people do not want to archive parquet-format
> > and want to continue doing doc versioning work in that repo.
> >
> > Would it then be fair to suggest that authors of changes to parquet-format
> > are also responsible for keeping the website up to date?
> >
> > On Thu, Mar 7, 2024 at 7:23 AM Uwe L. Korn <[email protected]> wrote:
> >
> > > I can strongly second Antoine's response here. It is a small but very
> > > important repository holding crucial information for the project.
> > >
> > > Best
> > > Uwe
> > >
> > > On Thu, Mar 7, 2024, at 1:17 PM, Antoine Pitrou wrote:
> > > > Hello,
> > > >
> > > > I am surprised that this is suggesting to deprecate or delete a
> > > > repository just because a website building procedure isn't properly
> > > > set up to deal with it.
> > > >
> > > > ISTM the "right" solution would be for the Parquet website to
> > > > automatically update its contents based on the latest released version
> > > > of parquet-format. Perhaps using a git submodule or something.
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
> > > >
> > > > On Tue, 5 Mar 2024 21:30:45 -0500
> > > > Vinoo Ganesh <[email protected]>
> > > > wrote:
> > > >> Hi Parquet Dev -
> > > >>
> > > > >> There have been some conversations about content stored on the
> > > > >> parquet-format github repo vs. the website. Doing a cursory pass of the
> > > > >> parquet-format <https://github.com/apache/parquet-format> repo, it looks
> > > > >> like, other than the markdown documentation stored in the repo, most of the
> > > > >> core code was marked as deprecated here:
> > > > >> https://github.com/apache/parquet-format/pull/105, content was moved to
> > > > >> parquet-mr, and that entire repo really only exists to host this file:
> > > > >> https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift .
> > > > >> It's possible I'm missing something, but is my understanding correct?
> > > >>
> > > > >> If so, would it make sense to just deprecate parquet-format as a repo, move
> > > > >> the content to be exclusively hosted on parquet-site
> > > > >> <https://github.com/apache/parquet-site/tree/asf-site>, and host the thrift
> > > > >> file elsewhere? This would solve the content duplication problem between
> > > > >> parquet format and the website, and would cut down on having to manage a
> > > > >> separate repo. I know there is benefit to having comments/discussions on
> > > > >> PRs or issues on the repo, but we could also pretty easily port this to the
> > > > >> site.
> > > >>
> > > > >> I'm sure this proposal will elicit some strong responses, but wanted to see
> > > > >> if anyone had insights here / if I'm missing anything.
> > > >>
> > > >> Thanks, Vinoo
> > > >>
> > > >>
> > > >>
> > >
> >
>
