Hello Gang/Team,
Thanks for your reply.
From your reply, I understand that there is no reliable way to tell
whether a Parquet file was written with V1 or V2, which is very confusing.
We should have some flag or tag that distinguishes Parquet files written
with V1 from those written with V2. Then, if a reading engine does not
support V2, we would know for sure not to feed it V2 Parquet files.

A few tools/products now use Parquet V2 for both reading and writing, but
Apache Spark does not support writing with V2 encoding on the grounds that,
per the Parquet community, V2 is not final yet.

Is there a target date for when the parquet-mr jar will have Parquet V2
writing functionality, so that Spark can adopt it?
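
In the meantime, here is a rough sketch of how one might inspect a file
today, assuming pyarrow is installed. It reads the version field from the
footer and collects the encodings recorded for each column chunk. The
V2_ENCODINGS set and both function names are illustrative assumptions on my
side, not an official definition of V2, and as the reply below notes, the
footer version is not populated correctly by every writer.

```python
# Heuristic only: encodings often associated with "v2" writers.
# This set is an assumption for illustration, not an official definition.
V2_ENCODINGS = frozenset({
    "DELTA_BINARY_PACKED",
    "DELTA_LENGTH_BYTE_ARRAY",
    "DELTA_BYTE_ARRAY",
})


def v2_encodings_used(encodings):
    """Return the sorted subset of `encodings` found in V2_ENCODINGS."""
    return sorted(set(encodings) & V2_ENCODINGS)


def describe_parquet(path):
    """Summarize the footer version and encodings of one Parquet file."""
    # Imported lazily so the helper above works without pyarrow installed.
    import pyarrow.parquet as pq

    meta = pq.read_metadata(path)
    encodings = set()
    for rg in range(meta.num_row_groups):
        row_group = meta.row_group(rg)
        for col in range(row_group.num_columns):
            encodings.update(row_group.column(col).encodings)

    return {
        "format_version": meta.format_version,  # as recorded by the writer
        "created_by": meta.created_by,
        "encodings": sorted(encodings),
        "v2_encodings": v2_encodings_used(encodings),
    }
```

Calling describe_parquet("some_file.parquet") (a hypothetical path) returns
a small dict. A non-empty "v2_encodings" entry only hints that a reader
without support for those encodings may fail; an empty one does not prove
the file is free of other V2 features.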

On Wed, Apr 24, 2024 at 1:28 AM Gang Wu <[email protected]> wrote:

> As I have said in another thread, Parquet V2 is a concept which contains
> a lot of features. FWIW, what is defined in the specs [1] is finalized,
> and some of those features have been implemented in various
> implementations. Any file that contains one or more of those features can
> be considered v2, but the community has never defined a formal approach to
> distinguish between v1 and v2. Parquet does have a field in the footer
> thrift definition to mark the file version [2]. However, not all
> implementations populate it correctly, and some engines will even throw if
> the version is not 1. To avoid confusion, I strongly suggest not using any
> v2 feature in your case unless you are 100% sure that all your tools
> support the v2 feature set you have enabled.
>
> [1] https://github.com/apache/parquet-format/blob/master/CHANGES.md
> [2] https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L1111
>
> Best,
> Gang
>
> On Wed, Apr 24, 2024 at 10:29 AM Prem Sahoo <[email protected]> wrote:
>
> > Could anyone please shed some light on this?
> > Sent from my iPhone
> >
> > > On Apr 23, 2024, at 4:30 PM, Prem Sahoo <[email protected]> wrote:
> > >
> > > Hello Team,
> > > How can I find out whether a Parquet file is V1 or V2?
> > >
> > > Do we have any tag/identifier that says whether a Parquet file was
> > > created with V2 or V1?
> > >
> > > Are there any specific properties that must be set for a file to be
> > > written as Parquet V2?
> > > Sent from my iPhone
> >
>
