Hi Antoine - Perhaps my thoughts weren't clear, but I'm mostly pointing
out a few things:

1. The parquet-format repo doesn't have much code other than the thrift
definition
2. parquet operates fairly uniquely compared to other products in this
space, which maintain doc versions either in their github repo or on their
website.
3. parquet-format in its current form has been fine, but do we want to
migrate to following the standard of other major projects that build their
format either into their "main project" (which in our case would be
parquet-mr) (ex: https://github.com/apache/iceberg/tree/main/format or
https://github.com/apache/arrow/tree/main/docs/source/format) or on their
website docs (which again are also versioned in github)?

This isn't simply about a website building procedure; it has to do with
the process of managing releases of parquet and whether or not we want
information duplication and an additional step in the building process.

If folks are happy with the existing state of the world, then there isn't
any need for action.

To summarize, it sounds like people do not want to archive parquet-format
and want to continue doing doc versioning work in that repo.

Would it then be fair to suggest that authors of changes to parquet-format
are also responsible for keeping the website up to date?

On Thu, Mar 7, 2024 at 7:23 AM Uwe L. Korn <[email protected]> wrote:

> I can strongly second Antoine's response here. It is a small but very
> important repository holding crucial information for the project.
>
> Best
> Uwe
>
> On Thu, Mar 7, 2024, at 1:17 PM, Antoine Pitrou wrote:
> > Hello,
> >
> > I am surprised that this is suggesting to deprecate or delete a
> > repository just because a website building procedure isn't properly
> > set up to deal with it.
> >
> > ISTM the "right" solution would be for the Parquet website to
> > automatically update its contents based on the latest released version
> > of parquet-format. Perhaps using a git submodule or something.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On Tue, 5 Mar 2024 21:30:45 -0500
> > Vinoo Ganesh <[email protected]>
> > wrote:
> >> Hi Parquet Dev -
> >>
> >> There have been some conversations about content stored on the
> >> parquet-format github repo vs. the website. Doing a cursory pass of the
> >> parquet-format <https://github.com/apache/parquet-format> repo, it looks
> >> like, other than the markdown documentation stored in the repo, most of the
> >> core code was marked as deprecated here:
> >> https://github.com/apache/parquet-format/pull/105, content was moved to
> >> parquet-mr, and that entire repo really only exists to host this file:
> >> https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift .
> >> It's possible I'm missing something, but is my understanding correct?
> >>
> >> If so, would it make sense to just deprecate parquet-format as a repo, move
> >> the content to be exclusively hosted on parquet-site
> >> <https://github.com/apache/parquet-site/tree/asf-site>, and host the thrift
> >> file elsewhere? This would solve the content duplication problem between
> >> parquet format and the website, and would cut down on having to manage a
> >> separate repo. I know there is benefit to having comments/discussions on
> >> PRs or issues on the repo, but we could also pretty easily port this to the
> >> site.
> >>
> >> I'm sure this proposal will elicit some strong responses, but wanted to see
> >> if anyone had insights here / if I'm missing anything.
> >>
> >> Thanks, Vinoo
> >>
> >>
> >> <[email protected]>
> >>
>
