Hello Vinoo,
Thanks for your assistance. The PyArrow folks are using Parquet V2 even
though it is not recommended. I don't want to make a mess, so I am just
checking with all the different groups.

On Wed, Apr 24, 2024 at 12:31 PM Vinoo Ganesh <[email protected]>
wrote:

> I'm not sure what you're looking for. A few different folks (Ryan/Steve on
> the Spark list, Wes on the Arrow list, and Gang/me on the Parquet list)
> have said that they wouldn't recommend using the Parquet V2 encodings, but
> you're free to do whatever you want in your own data stack, as are the
> clients who are using Parquet V2. Again, I (and others) personally wouldn't
> recommend storing production data in an unstable format, and that's the
> reason we are warning against it.
>
> On Wed, Apr 24, 2024 at 11:47 AM Prem Sahoo <[email protected]> wrote:
>
>> Hello Vinoo,
>> Can you please share a link where it says Parquet V2 is not official or
>> not stable for use by third parties?
>>
>>
>> On Wed, Apr 24, 2024 at 11:28 AM Vinoo Ganesh <[email protected]>
>> wrote:
>>
>>> Hi Prem, Wes' comment on the thread you posted on the arrow dev list
>>> should clear up your confusion:
>>> https://lists.apache.org/thread/72qwr66wf3xyrl5cozgojz88ct23qzxx. There
>>> is a difference between the "standard" itself (parquet-format) and the
>>> implementation (parquet-mr, etc.).
>>>
>>> Parquet-format (https://github.com/apache/parquet-format) contains
>>> mostly just the docs and thrift definition now that a PR to clean up the
>>> remaining deprecated code was just merged. Releases of just this format,
>>> which again is mostly just docs, are what Gang was referring to in [2].
>>>
>>> We just started conversations about how a Parquet 2.0 release may look
>>> in the meeting yesterday. As these conversations progress, the dev list
>>> will be kept updated.
>>>
>>>
>>> On Wed, Apr 24, 2024 at 11:10 AM Prem Sahoo <[email protected]>
>>> wrote:
>>>
>>>> Hello Vinoo/Team,
>>>> According to the PyArrow team, they don't see any concern; please see
>>>> below. Please let us know *where it says Parquet V2 is not official*.
>>>>
>>>> "> *As per Apache Parquet Community Parquet V2 is not final yet so it is
>>>> > not official. They are advising not to use Parquet V2 for writing
>>>> > (though code is available).*
>>>>
>>>> This would be news to me.  Parquet releases are listed (by the parquet
>>>> community) at [1]
>>>>
>>>> The vote to release parquet 2.10 is here: [2]
>>>>
>>>>
>>>> *Neither of these links mentions anything about this being an
>>>> experimental, unofficial, or non-finalized release.*
>>>>
>>>> I understand your concern.  I believe your quotes are coming from your
>>>> discussion on the parquet mailing list here [3].  This communication is
>>>> unfortunate and confusing to me as well.
>>>>
>>>> [1] https://parquet.apache.org/blog/
>>>> [2] https://lists.apache.org/thread/fdf1zz0f3xzz5zpvo6c811xjswhm1zy6
>>>> [3] https://lists.apache.org/thread/4nzroc68czwxnp0ndqz15kp1vhcd7vg3"
>>>>
>>>>
>>>> On Mon, Apr 22, 2024 at 4:56 PM Prem Sahoo <[email protected]>
>>>> wrote:
>>>>
>>>>> Hello Vinoo/Team,
>>>>> I was going through pyarrow and they have started using V2 as the
>>>>> default. Shouldn't they avoid it, since it is not official?
>>>>>
>>>>>
>>>>> https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow.parquet.write_table
>>>>>
>>>>> version : {"1.0", "2.4", "2.6"}, default "2.6"
>>>>>
>>>>> Determine which Parquet logical types are available for use, whether
>>>>> the reduced set from the Parquet 1.x.x format or the expanded logical
>>>>> types added in later format versions. Files written with version='2.4'
>>>>> or '2.6' may not be readable in all Parquet implementations, so
>>>>> version='1.0' is likely the choice that maximizes file compatibility.
>>>>> UINT32 and some logical types are only available with version '2.4'.
>>>>> Nanosecond timestamps are only available with version '2.6'. Other
>>>>> features such as compression algorithms or the new serialized data page
>>>>> format must be enabled separately (see 'compression' and
>>>>> 'data_page_version').
>>>>>
>>>>
