Hi Vinoo,

Thanks for the reply! How do you want to embed the README?
By linking to the parquet-format repo, or by copying the whole content?
IMO we need to make it clear to users which version of the format they
are looking at. Therefore linking to the format repo (and maybe adding
links to different versions as well) sounds much better to me.

Best,
Gang

On Tue, Mar 5, 2024 at 3:18 AM Vinoo Ganesh <[email protected]> wrote:

> Hi All - Sorry I missed this email chain. I've been mostly responsible
> for building the infrastructure around the new parquet-site website, but
> have mostly left the existing content alone. I'm happy to just link to the
> parquet-format repo, but that would mean the content is no longer
> searchable from the website, and users would have to first find the link to
> the parquet-format repo from the docs and then navigate there.
>
> I could just embed the parquet-format README in an iframe on the spec docs.
> Alternatively, as part of the release actions, we can add a task that opens
> an issue on parquet-site requesting an update.
>
> Do people have thoughts / opinions on these two?
>
> On Thu, Jan 18, 2024 at 1:33 PM Kaili Zhang <[email protected]> wrote:
>
> > Hi Gabor
> >
> > I am OK with that. As long as the information is up-to-date, whatever
> > method most convenient for the devs will do.
> >
> > Kind regards
> >
> > Kaili
> >
> > ________________________________
> > From: Gábor Szádovszky <[email protected]>
> > Sent: Monday, January 15, 2024 12:25:39 AM
> > To: [email protected] <[email protected]>
> > Subject: Re: Discrepancy in parquet format documentation
> >
> > Hey Gang, Kaili,
> >
> > I think the easiest way to solve this issue is to completely remove the
> > spec from the site and add a reference to the parquet-format repo instead.
> > We should probably add the release tag links when we make a release of
> > parquet-format, along with a "latest" link. This way we would also avoid
> > potential issues when someone makes decisions based on unreleased spec
> > changes.
> >
> > Cheers,
> > Gabor
> >
> > Kaili Zhang <[email protected]> wrote (Sat, Jan 13, 2024, 20:53):
> >
> > > Hi Gang
> > >
> > > Thank you for looking into this. Updating the description on
> > > parquet.apache.org will save everyone searching for this information a
> > > few hours of head scratching. It is unfortunate that the slightly
> > > out-of-date spec features more prominently in Google results.
> > >
> > > Kind regards
> > >
> > > Kaili
> > > ________________________________
> > > From: Gang Wu <[email protected]>
> > > Sent: Tuesday, January 9, 2024 5:56 PM
> > > To: [email protected] <[email protected]>
> > > Subject: Re: Discrepancy in parquet format documentation
> > >
> > > Hi Kaili,
> > >
> > > You're right. Please refer to the parquet-format repo for specs. The site
> > > has unfortunately been out of sync for a long time, and there isn't any
> > > automatic process to update it. Let me update the site manually to be in
> > > sync with the latest format release.
> > >
> > > Best,
> > > Gang
> > >
> > > On Sun, Jan 7, 2024 at 8:03 AM Kaili Zhang <[email protected]> wrote:
> > >
> > > > Hi all
> > > >
> > > > I found this page via Google when searching for a description of the
> > > > parquet binary format:
> > > > https://parquet.apache.org/docs/file-format/data-pages/. This page
> > > > suggests that definition levels are written before repetition levels.
> > > >
> > > > However, after experimenting with parquet files generated by pandas and
> > > > pyarrow and perusing the arrow source code (especially
> > > > InitializeLevelDecoders in
> > > > https://github.com/apache/arrow/blob/main/cpp/src/parquet/column_reader.cc),
> > > > I strongly believe that repetition levels are written before definition
> > > > levels. I also found this other documentation of parquet format that has
> > > > repetition levels before definition levels:
> > > > https://github.com/apache/parquet-format.
> > > >
> > > > The content of the parquet.apache.org/docs site appears to be tracked on
> > > > GitHub under https://github.com/apache/parquet-site. Is the documentation
> > > > content still being actively updated? Has there been an effort to
> > > > synchronize the format descriptions under apache/parquet-site with those
> > > > under apache/parquet-format?
> > > >
> > > > Kind regards
> > > >
> > > > Kaili
> > > >
> > > >
> > >
> >
>
