Hello Vinoo,

Can you please share a link that says Parquet V2 is not official or not stable for use by third parties?

On Wed, Apr 24, 2024 at 11:28 AM Vinoo Ganesh <[email protected]> wrote:

> Hi Prem,
>
> Wes' comment on the thread you posted on the arrow dev list
> should clear up your confusion:
> https://lists.apache.org/thread/72qwr66wf3xyrl5cozgojz88ct23qzxx. There
> is a difference between the "standard" itself (parquet-format) and the
> implementation (parquet-mr, etc...).
>
> Parquet-format (https://github.com/apache/parquet-format) contains mostly
> just the docs and thrift definition now that a PR to clean up the
> remaining deprecated code was just merged. Releases of this just format,
> which again, is mostly just docs, is what Gang was referring to in [2].
>
> We just started conversations about how a Parquet 2.0 release may look in
> the meeting yesterday. As these conversations progress, the dev list will
> be kept updated.
>
>
> On Wed, Apr 24, 2024 at 11:10 AM Prem Sahoo <[email protected]> wrote:
>
>> Hello Vinoo/Team,
>> As per pyarrow Team , They don't see any concern , please check below.
>> Please let us know *where it says Parquet V2 is not official *
>>
>> "> *As per Apache Parquet Community Parquet V2 is not final yet so it is not
>> > official . They are advising not to use Parquet V2 for writing (though code
>> > is available ) .*
>>
>> This would be news to me. Parquet releases are listed (by the parquet
>> community) at [1]
>>
>> The vote to release parquet 2.10 is here: [2]
>>
>>
>> *Neither of these links mention anything about this being an
>> experimental,unofficial, or non-finalized release.*
>>
>> I understand your concern. I believe your quotes are coming from your
>> discussion on the parquet mailing list here [3]. This communication is
>> unfortunate and confusing to me as well.
>>
>> [1] https://parquet.apache.org/blog/
>> [2] https://lists.apache.org/thread/fdf1zz0f3xzz5zpvo6c811xjswhm1zy6
>> [3] https://lists.apache.org/thread/4nzroc68czwxnp0ndqz15kp1vhcd7vg3"
>>
>>
>> On Mon, Apr 22, 2024 at 4:56 PM Prem Sahoo <[email protected]> wrote:
>>
>>> Hello Vinoo/Team,.
>>> I was going through pyarrow and they have started using V2 as default .
>>> isn't it they should avoid it as it is not official.
>>>
>>>
>>> https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow.parquet.write_table
>>>
>>> version{“1.0”, “2.4”, “2.6”}, default “2.6”
>>>
>>> Determine which Parquet logical types are available for use, whether the
>>> reduced set from the Parquet 1.x.x format or the expanded logical types
>>> added in later format versions. Files written with version=’2.4’ or ‘2.6’
>>> may not be readable in all Parquet implementations, so version=’1.0’ is
>>> likely the choice that maximizes file compatibility. UINT32 and some
>>> logical types are only available with version ‘2.4’. Nanosecond timestamps
>>> are only available with version ‘2.6’. Other features such as compression
>>> algorithms or the new serialized data page format must be enabled
>>> separately (see ‘compression’ and ‘data_page_version’).
>>>
