Hello Vinoo,
Thanks for your assistance. The pyarrow folks are using Parquet V2 even though it is not recommended. I don't want to create a mess, so I am just checking with all the different groups.
On Wed, Apr 24, 2024 at 12:31 PM Vinoo Ganesh <[email protected]> wrote:

> I'm not sure what you're looking for. A few different folks (Ryan/Steve on
> the Spark list, Wes on the Arrow list, and Gang/me on the Parquet list)
> have said that they wouldn't recommend using the Parquet V2 encodings, but
> you're free to do whatever you want in your own data stack, as are the
> clients who are using Parquet V2. Again, I (and others) personally wouldn't
> recommend storing production data in an unstable format, and that's the
> reason we are warning against it.
>
> On Wed, Apr 24, 2024 at 11:47 AM Prem Sahoo <[email protected]> wrote:
>
>> Hello Vinoo,
>> Can you please share a link where it says Parquet V2 is not official or
>> not stable for use by third parties?
>>
>>
>> On Wed, Apr 24, 2024 at 11:28 AM Vinoo Ganesh <[email protected]>
>> wrote:
>>
>>> Hi Prem, Wes' comment on the thread you posted on the arrow dev list
>>> should clear up your confusion:
>>> https://lists.apache.org/thread/72qwr66wf3xyrl5cozgojz88ct23qzxx. There
>>> is a difference between the "standard" itself (parquet-format) and the
>>> implementation (parquet-mr, etc...).
>>>
>>> Parquet-format (https://github.com/apache/parquet-format) contains
>>> mostly just the docs and thrift definition now that a PR to clean up the
>>> remaining deprecated code was just merged. Releases of this just format,
>>> which again, is mostly just docs, is what Gang was referring to in [2].
>>>
>>> We just started conversations about how a Parquet 2.0 release may look
>>> in the meeting yesterday. As these conversations progress, the dev list
>>> will be kept updated.
>>>
>>>
>>> On Wed, Apr 24, 2024 at 11:10 AM Prem Sahoo <[email protected]>
>>> wrote:
>>>
>>>> Hello Vinoo/Team,
>>>> As per pyarrow Team, they don't see any concern, please check below.
>>>> Please let us know *where it says Parquet V2 is not official*
>>>>
>>>> "> *As per Apache Parquet Community Parquet V2 is not final yet so it
>>>> is not
>>>> > official. They are advising not to use Parquet V2 for writing (though
>>>> code
>>>> > is available).*
>>>>
>>>> This would be news to me. Parquet releases are listed (by the parquet
>>>> community) at [1]
>>>>
>>>> The vote to release parquet 2.10 is here: [2]
>>>>
>>>>
>>>> *Neither of these links mention anything about this being an
>>>> experimental, unofficial, or non-finalized release.*
>>>>
>>>> I understand your concern. I believe your quotes are coming from your
>>>> discussion on the parquet mailing list here [3]. This communication is
>>>> unfortunate and confusing to me as well.
>>>>
>>>> [1] https://parquet.apache.org/blog/
>>>> [2] https://lists.apache.org/thread/fdf1zz0f3xzz5zpvo6c811xjswhm1zy6
>>>> [3] https://lists.apache.org/thread/4nzroc68czwxnp0ndqz15kp1vhcd7vg3"
>>>>
>>>>
>>>> On Mon, Apr 22, 2024 at 4:56 PM Prem Sahoo <[email protected]>
>>>> wrote:
>>>>
>>>>> Hello Vinoo/Team,
>>>>> I was going through pyarrow and they have started using V2 as default.
>>>>> isn't it they should avoid it as it is not official.
>>>>>
>>>>>
>>>>> https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow.parquet.write_table
>>>>>
>>>>> version{“1.0”, “2.4”, “2.6”}, default “2.6”
>>>>>
>>>>> Determine which Parquet logical types are available for use, whether
>>>>> the reduced set from the Parquet 1.x.x format or the expanded logical
>>>>> types
>>>>> added in later format versions. Files written with version=’2.4’ or ‘2.6’
>>>>> may not be readable in all Parquet implementations, so version=’1.0’ is
>>>>> likely the choice that maximizes file compatibility. UINT32 and some
>>>>> logical types are only available with version ‘2.4’. Nanosecond timestamps
>>>>> are only available with version ‘2.6’. Other features such as compression
>>>>> algorithms or the new serialized data page format must be enabled
>>>>> separately (see ‘compression’ and ‘data_page_version’).
>>>>>
