Yes, that sounds good to me. Thanks!

Gang

On Tue, Mar 5, 2024 at 10:08 AM Vinoo Ganesh <vinoo.gan...@gmail.com> wrote:

> Hi Gang - We could embed the README.md from parquet-format as an iframe
> on the docsy website (something better than just a URL link). It would
> also be easy to just link. The other option, which seems to be what
> Iceberg does (https://iceberg.apache.org/docs/latest/ or
> https://iceberg.apache.org/docs/1.4.3/), is to actually version the
> entire set of docs and tie it to a version of parquet-mr or
> parquet-format. I've mostly treated releases as blog posts for now
> (https://parquet.apache.org/blog/), but if that's not the best way to
> handle versioned docs, we can explore adopting Iceberg's model.
>
> On Mon, Mar 4, 2024 at 8:50 PM Gang Wu <ust...@gmail.com> wrote:
>
> > Hi Vinoo,
> >
> > Thanks for the reply! How do you want to embed the README? By linking
> > to the parquet-format repo, or by copying the whole content? IMO we
> > need to make it clear to users which version of the format they are
> > looking at. Therefore linking to the format repo (and maybe adding
> > different versions as well) sounds much better to me.
> >
> > Best,
> > Gang
> >
> > On Tue, Mar 5, 2024 at 3:18 AM Vinoo Ganesh <vinoo.gan...@gmail.com>
> > wrote:
> >
> > > Hi All - Sorry I missed this email chain. I've been mostly
> > > responsible for building the infrastructure around the new
> > > parquet-site website, but have mostly left the existing content
> > > alone. I'm happy to just link to the parquet-format repo, but that
> > > would mean the content is no longer searchable from the website,
> > > and users would have to first find the link to the parquet-format
> > > repo from the docs and then navigate there.
> > >
> > > I could just embed the parquet-format README in an iframe on the
> > > spec docs. Alternatively, as part of the release actions, we can
> > > add a task that opens an issue on parquet-site for an update.
> > >
> > > Do people have thoughts / opinions on these two?
> > >
> > > On Thu, Jan 18, 2024 at 1:33 PM Kaili Zhang <kaili...@hotmail.com>
> > > wrote:
> > >
> > > > Hi Gabor
> > > >
> > > > I am OK with that. As long as the information is up-to-date,
> > > > whatever method is most convenient for the devs will do.
> > > >
> > > > Kind regards
> > > >
> > > > Kaili
> > > >
> > > > ________________________________
> > > > From: Gábor Szádovszky <ga...@apache.org>
> > > > Sent: Monday, January 15, 2024 12:25:39 AM
> > > > To: dev@parquet.apache.org <dev@parquet.apache.org>
> > > > Subject: Re: Discrepancy in parquet format documentation
> > > >
> > > > Hey Gang, Kaili,
> > > >
> > > > I think the easiest way to solve this issue is to completely
> > > > remove the spec from the site and add a reference to the
> > > > parquet-format repo instead. We should probably add the release
> > > > tag links, along with a "latest" link, whenever we make a release
> > > > of parquet-format. This way we would also avoid potential issues
> > > > where someone makes decisions based on unreleased spec changes.
> > > >
> > > > Cheers,
> > > > Gabor
> > > >
> > > > Kaili Zhang <kaili...@hotmail.com> wrote (on Sat, Jan 13, 2024,
> > > > 20:53):
> > > >
> > > > > Hi Gang
> > > > >
> > > > > Thank you for looking into this. Updating the description on
> > > > > parquet.apache.org will save everyone searching for this
> > > > > information a few hours of head scratching. It is unfortunate
> > > > > that the slightly out-of-date spec features more prominently in
> > > > > Google results.
> > > > >
> > > > > Kind regards
> > > > >
> > > > > Kaili
> > > > > ________________________________
> > > > > From: Gang Wu <ust...@gmail.com>
> > > > > Sent: Tuesday, January 9, 2024 5:56 PM
> > > > > To: dev@parquet.apache.org <dev@parquet.apache.org>
> > > > > Subject: Re: Discrepancy in parquet format documentation
> > > > >
> > > > > Hi Kaili,
> > > > >
> > > > > You're right. Please refer to the parquet-format repo for
> > > > > specs. The site has unfortunately been out of sync for a long
> > > > > time and there isn't any automatic process to update it. Let me
> > > > > update the site manually to be in sync with the latest format
> > > > > release.
> > > > >
> > > > > Best,
> > > > > Gang
> > > > >
> > > > > On Sun, Jan 7, 2024 at 8:03 AM Kaili Zhang
> > > > > <kaili...@hotmail.com> wrote:
> > > > >
> > > > > > Hi all
> > > > > >
> > > > > > I found this page via Google when searching for a description
> > > > > > of the parquet binary format:
> > > > > > https://parquet.apache.org/docs/file-format/data-pages/.
> > > > > > This page suggests that definition levels are written before
> > > > > > repetition levels.
> > > > > >
> > > > > > However, after experimenting with parquet files generated by
> > > > > > pandas and pyarrow and perusing the arrow source code
> > > > > > (especially InitializeLevelDecoders in
> > > > > > https://github.com/apache/arrow/blob/main/cpp/src/parquet/column_reader.cc),
> > > > > > I strongly believe that repetition levels are written before
> > > > > > definition levels. I also found this other documentation of
> > > > > > the parquet format that has repetition levels before
> > > > > > definition levels: https://github.com/apache/parquet-format.
> > > > > >
> > > > > > The content of the parquet.apache.org/docs site appears to be
> > > > > > tracked on GitHub under
> > > > > > https://github.com/apache/parquet-site. Is the documentation
> > > > > > content still being actively updated? Has there been an
> > > > > > effort to synchronize the format descriptions under
> > > > > > apache/parquet-site with those under apache/parquet-format?
> > > > > >
> > > > > > Kind regards
> > > > > >
> > > > > > Kaili