Yes, that sounds good to me.

Thanks!
Gang

On Tue, Mar 5, 2024 at 10:08 AM Vinoo Ganesh <vinoo.gan...@gmail.com> wrote:

> Hi Gang - We could embed the README.md from parquet-format as an iframe on
> the docsy website (something better than just a URL link). It would also be
> easy to just link to it. The other option, which seems to be what Iceberg does
> (https://iceberg.apache.org/docs/latest/ or
> https://iceberg.apache.org/docs/1.4.3/), is to actually version the entire
> set of docs and tie it to a version of parquet-mr or parquet-format. I've
> mostly treated releases as blog posts for now
> (https://parquet.apache.org/blog/), but if that's not the best way to handle
> versioned docs, we can explore adopting Iceberg's model.
>
>
> On Mon, Mar 4, 2024 at 8:50 PM Gang Wu <ust...@gmail.com> wrote:
>
> > Hi Vinoo,
> >
> > Thanks for the reply! How do you want to embed the README?
> > By linking to the parquet-format repo or by copying the whole content?
> > IMO we need to make it clear to users which version of the format
> > they are looking at. Therefore linking to the format repo (and maybe
> > adding links to different versions as well) sounds much better to me.
> >
> > Best,
> > Gang
> >
> > On Tue, Mar 5, 2024 at 3:18 AM Vinoo Ganesh <vinoo.gan...@gmail.com>
> > wrote:
> >
> > > Hi All - Sorry I missed this email chain. I've been mostly responsible
> > > for building the infrastructure around the new parquet-site website, but
> > > have mostly left the existing content alone. I'm happy to just link to the
> > > parquet-format repo, but that would mean the content is no longer
> > > searchable from the website, and users would have to first find the link to
> > > the parquet-format repo from the docs and then navigate there.
> > >
> > > I could just embed the parquet-format README in an iframe on the spec docs.
> > > Alternatively, as part of the release actions, we can add a task that opens
> > > an issue on parquet-site for update.
> > >
> > > Do people have thoughts / opinions on these two?
> > >
> > > On Thu, Jan 18, 2024 at 1:33 PM Kaili Zhang <kaili...@hotmail.com> wrote:
> > >
> > > > Hi Gabor
> > > >
> > > > I am OK with that. As long as the information is up-to-date, whatever
> > > > method most convenient for the devs will do.
> > > >
> > > > Kind regards
> > > >
> > > > Kaili
> > > >
> > > > ________________________________
> > > > From: Gábor Szádovszky <ga...@apache.org>
> > > > Sent: Monday, January 15, 2024 12:25:39 AM
> > > > To: dev@parquet.apache.org <dev@parquet.apache.org>
> > > > Subject: Re: Discrepancy in parquet format documentation
> > > >
> > > > Hey Gang, Kaili,
> > > >
> > > > I think the easiest way to solve this issue is to completely remove the
> > > > spec from the site and add a reference to the parquet-format repo instead.
> > > > We should probably add the release tag links, along with a "latest" link,
> > > > whenever we make a release of parquet-format. This way we would also avoid
> > > > potential issues where someone makes decisions based on unreleased spec
> > > > changes.
> > > >
> > > > Cheers,
> > > > Gabor
> > > >
> > > > On Sat, Jan 13, 2024 at 20:53, Kaili Zhang <kaili...@hotmail.com> wrote:
> > > >
> > > > > Hi Gang
> > > > >
> > > > > Thank you for looking into this. Updating the description on
> > > > > parquet.apache.org will save everyone searching for this information a
> > > > > few hours of head scratching. It is unfortunate that the slightly
> > > > > out-of-date spec features more prominently in Google results.
> > > > >
> > > > > Kind regards
> > > > >
> > > > > Kaili
> > > > > ________________________________
> > > > > From: Gang Wu <ust...@gmail.com>
> > > > > Sent: Tuesday, January 9, 2024 5:56 PM
> > > > > To: dev@parquet.apache.org <dev@parquet.apache.org>
> > > > > Subject: Re: Discrepancy in parquet format documentation
> > > > >
> > > > > Hi Kaili,
> > > > >
> > > > > You're right. Please refer to the parquet-format repo for specs. The site
> > > > > has unfortunately been out of sync for a long time and there isn't any
> > > > > automatic process to update it. Let me update the site manually to be in
> > > > > sync with the latest format release.
> > > > >
> > > > > Best,
> > > > > Gang
> > > > >
> > > > > On Sun, Jan 7, 2024 at 8:03 AM Kaili Zhang <kaili...@hotmail.com> wrote:
> > > > >
> > > > > > Hi all
> > > > > >
> > > > > > I found this page via Google when searching for a description of the
> > > > > > parquet binary format:
> > > > > > https://parquet.apache.org/docs/file-format/data-pages/. This page
> > > > > > suggests that definition levels are written before repetition levels.
> > > > > >
> > > > > > However, after experimenting with parquet files generated by pandas and
> > > > > > pyarrow and perusing the arrow source code (especially
> > > > > > InitializeLevelDecoders in
> > > > > > https://github.com/apache/arrow/blob/main/cpp/src/parquet/column_reader.cc),
> > > > > > I strongly believe that repetition levels are written before definition
> > > > > > levels. I also found this other documentation of parquet format that has
> > > > > > repetition levels before definition levels:
> > > > > > https://github.com/apache/parquet-format.
> > > > > >
> > > > > > The content of the parquet.apache.org/docs site appears to be tracked on
> > > > > > GitHub under https://github.com/apache/parquet-site. Is the documentation
> > > > > > content still being actively updated? Has there been an effort to
> > > > > > synchronize the format descriptions under apache/parquet-site with those
> > > > > > under apache/parquet-format?
> > > > > >
> > > > > > Kind regards
> > > > > >
> > > > > > Kaili
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
