I'm not sure what you're looking for. A few different folks (Ryan/Steve on
the Spark list, Wes on the Arrow list, and Gang/me on the Parquet list)
have said that they wouldn't recommend using the Parquet V2 encodings, but
you're free to do whatever you want in your own data stack, as are the
clients who are using Parquet V2. Again, I (and others) personally wouldn't
recommend storing production data in an unstable format, and that's the
reason we are warning against it.

On Wed, Apr 24, 2024 at 11:47 AM Prem Sahoo <[email protected]> wrote:

> Hello Vinoo,
> Can you please share a link where it says Parquet V2 is not official or
> not stable for use by third parties?
>
>
> On Wed, Apr 24, 2024 at 11:28 AM Vinoo Ganesh <[email protected]>
> wrote:
>
>> Hi Prem, Wes' comment on the thread you posted on the arrow dev list
>> should clear up your confusion:
>> https://lists.apache.org/thread/72qwr66wf3xyrl5cozgojz88ct23qzxx. There
>> is a difference between the "standard" itself (parquet-format) and the
>> implementation (parquet-mr, etc...).
>>
>> Parquet-format (https://github.com/apache/parquet-format) now contains
>> mostly just the docs and Thrift definition, since a PR to clean up the
>> remaining deprecated code was just merged. Releases of just this format,
>> which again is mostly docs, are what Gang was referring to in [2].
>>
>> We just started conversations about how a Parquet 2.0 release may look in
>> the meeting yesterday. As these conversations progress, the dev list will
>> be kept updated.
>>
>>
>> On Wed, Apr 24, 2024 at 11:10 AM Prem Sahoo <[email protected]> wrote:
>>
>>> Hello Vinoo/Team,
>>> As per the pyarrow team, they don't see any concern; please check below.
>>> Please let us know *where it says Parquet V2 is not official*.
>>>
>>> "> *As per Apache Parquet Community Parquet V2 is not final yet so it is not
>>> > official. They are advising not to use Parquet V2 for writing (though code
>>> > is available).*
>>>
>>> This would be news to me.  Parquet releases are listed (by the parquet
>>> community) at [1]
>>>
>>> The vote to release parquet 2.10 is here: [2]
>>>
>>>
>>> *Neither of these links mention anything about this being an
>>> experimental, unofficial, or non-finalized release.*
>>>
>>> I understand your concern.  I believe your quotes are coming from your
>>> discussion on the parquet mailing list here [3].  This communication is
>>> unfortunate and confusing to me as well.
>>>
>>> [1] https://parquet.apache.org/blog/
>>> [2] https://lists.apache.org/thread/fdf1zz0f3xzz5zpvo6c811xjswhm1zy6
>>> [3] https://lists.apache.org/thread/4nzroc68czwxnp0ndqz15kp1vhcd7vg3"
>>>
>>>
>>> On Mon, Apr 22, 2024 at 4:56 PM Prem Sahoo <[email protected]> wrote:
>>>
>>>> Hello Vinoo/Team,
>>>> I was going through pyarrow and they have started using V2 as the default.
>>>> Shouldn't they avoid it, as it is not official?
>>>>
>>>>
>>>> https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow.parquet.write_table
>>>>
>>>> version{“1.0”, “2.4”, “2.6”}, default “2.6”
>>>>
>>>> Determine which Parquet logical types are available for use, whether
>>>> the reduced set from the Parquet 1.x.x format or the expanded logical types
>>>> added in later format versions. Files written with version=’2.4’ or ‘2.6’
>>>> may not be readable in all Parquet implementations, so version=’1.0’ is
>>>> likely the choice that maximizes file compatibility. UINT32 and some
>>>> logical types are only available with version ‘2.4’. Nanosecond timestamps
>>>> are only available with version ‘2.6’. Other features such as compression
>>>> algorithms or the new serialized data page format must be enabled
>>>> separately (see ‘compression’ and ‘data_page_version’).
>>>>
>>>
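
For reference, the pyarrow documentation quoted above describes `version` (which logical types are available) and `data_page_version` (which serialized data page format is used) as separate settings of `pyarrow.parquet.write_table`. A minimal sketch of pinning both to their most conservative values follows. The names `CONSERVATIVE_OPTIONS`, `build_write_options`, and `write_conservative` are hypothetical helpers for illustration, not part of pyarrow, and the sketch assumes pyarrow is installed wherever `write_conservative` is actually called.

```python
# Keyword arguments for pyarrow.parquet.write_table, following the quoted docs.
# "version" selects the set of logical types; "data_page_version" selects the
# serialized data page format. The two are independent of each other.
CONSERVATIVE_OPTIONS = {
    "version": "1.0",            # reduced logical type set from Parquet 1.x.x
    "data_page_version": "1.0",  # original (V1) data page format
}


def build_write_options(**overrides):
    """Return the conservative options, with any caller overrides applied."""
    return {**CONSERVATIVE_OPTIONS, **overrides}


def write_conservative(table, path, **overrides):
    """Write `table` to `path` with the options above (hypothetical helper)."""
    # Imported lazily so the option helpers work without pyarrow installed.
    import pyarrow.parquet as pq

    options = build_write_options(**overrides)
    pq.write_table(table, path, **options)
    return options
```

With pyarrow available, `write_conservative(pa.table({"id": [1, 2, 3]}), "example.parquet")` would write a file restricted to the 1.x logical types and V1 data pages. Passing `version="2.6"` as an override opts into the newer logical types while leaving the data page format unchanged, which is the separation the quoted documentation describes.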
