That's not something I can speak to, unfortunately. I'd recommend reaching
out to the Dremio team to ask them.


<[email protected]>


On Mon, Apr 22, 2024 at 10:44 AM Prem Sahoo <[email protected]> wrote:

> Just a quick question: may I know why other tools/software started using
> this version when it is not official yet? :(
> Sent from my iPhone
>
> On Apr 22, 2024, at 8:12 AM, Prem Sahoo <[email protected]> wrote:
>
> Thank you, Vinoo. I will check internally whether we really need this at
> the moment, given all these cautions.
> Sent from my iPhone
>
> On Apr 21, 2024, at 9:24 PM, Vinoo Ganesh <[email protected]> wrote:
>
> 
> If you still want to write your production data in Parquet V2, which,
> again, is not a finalized format and is therefore NOT recommended, you can
> override parquet.writer.version (
> https://github.com/apache/parquet-mr/blob/f51ed41ded4d91c18fc4eaa827664bc3a02b18f3/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java#L142)
> in the latest parquet-mr and set it to output the V2 format.
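>
> A minimal sketch of what that override could look like from Spark follows;
> it is illustrative only (the app name and output path are placeholders, and
> the "PARQUET_2_0" value should be checked against the parquet-mr version on
> your classpath):
>
> // NOT recommended for production data: forces the unfinalized Parquet V2 format.
> import org.apache.spark.sql.SparkSession
>
> val spark = SparkSession.builder().appName("parquet-v2-sketch").getOrCreate()
>
> // The Parquet writer picks up parquet.writer.version from the Hadoop configuration.
> spark.sparkContext.hadoopConfiguration.set("parquet.writer.version", "PARQUET_2_0")
>
> spark.range(100).toDF("id")
>   .write
>   .mode("overwrite")
>   .parquet("/tmp/parquet_v2_sample") // placeholder path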
>
> Again, both the Spark dev list and Parquet dev list have warned against
> this, so I'd recommend you proceed with caution.
>
> On Sun, Apr 21, 2024 at 6:50 PM Prem Sahoo <[email protected]> wrote:
>
>> Hello Team,
>> Do you have any idea in which version of the parquet-mr jar the Parquet
>> V2 encoding code is available?
>>
>> On Sun, Apr 21, 2024 at 6:21 PM Prem Sahoo <[email protected]> wrote:
>>
>>> Thanks, Vinoo, for the valuable information.
>>>
>>> On Sat, Apr 20, 2024 at 5:07 PM Vinoo Ganesh <[email protected]>
>>> wrote:
>>>
>>>> Hi Prem - Maybe I can help clarify to the best of my knowledge. Parquet
>>>> V2 as a standard isn't finalized just yet, meaning there is no formal,
>>>> *finalized* "contract" that specifies what it means to write data in the
>>>> V2 format. The discussions about what the final V2 standard may be are
>>>> still in progress and evolving.
>>>>
>>>> That being said, because V2 code does exist (though unfinalized), there
>>>> are clients and tools that already write data in this unfinalized V2
>>>> format, as seems to be the case with Dremio.
>>>>
>>>> Now, as that comment you quoted said, you can have Spark write V2 files,
>>>> but it's worth being mindful that V2 is a moving target and can (and
>>>> likely will) change. You can override parquet.writer.version to specify
>>>> your desired version, but it can be dangerous to produce data in a
>>>> moving-target format. For example, say you write a bunch of data in
>>>> Parquet V2, and then the community decides to make a breaking change
>>>> (which is completely fine and allowed since V2 isn't finalized). You are
>>>> now left having to deal with a potentially large and complicated file
>>>> format update. That's why it's not recommended to write files in Parquet
>>>> V2 just yet.
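>>>>
>>>> If it helps, here is a small sketch of how that same setting is resolved
>>>> on the parquet-mr side. It assumes the public WRITER_VERSION constant and
>>>> getWriterVersion helper on ParquetOutputFormat in parquet-hadoop;
>>>> double-check both against the parquet-mr version you actually ship:
>>>>
>>>> import org.apache.hadoop.conf.Configuration
>>>> import org.apache.parquet.hadoop.ParquetOutputFormat
>>>>
>>>> val conf = new Configuration()
>>>> // The default is PARQUET_1_0; PARQUET_2_0 opts into the unfinalized V2 writer.
>>>> conf.set(ParquetOutputFormat.WRITER_VERSION, "PARQUET_2_0")
>>>> println(ParquetOutputFormat.getWriterVersion(conf)) // expected: PARQUET_2_0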
>>>>
>>>> On Wed, Apr 17, 2024 at 3:47 PM Prem Sahoo <[email protected]>
>>>> wrote:
>>>>
>>>> > Hello Team,
>>>> > I am working on different products such as Spark and Dremio.
>>>> >
>>>> > Dremio is able to write and read Parquet V2, and due to this upgrade
>>>> > it performs faster than with Parquet V1 files.
>>>> >
>>>> > In the case of Spark, it still defaults to Parquet V1, and when I
>>>> > checked with the Spark community they told me the Parquet community
>>>> > isn't recommending Parquet V2.
>>>> >
>>>> > "Prem, as I said earlier, v2 is not a finalized spec so you should
>>>> not use
>>>> > it. That's why it is not the default. You can get Spark to write v2
>>>> files,
>>>> > but it isn't recommended by the Parquet community."
>>>> >
>>>> > Please advise.
>>>> >
>>>>
>>>
