That's not something I can speak to unfortunately. I'd recommend reaching out to the Dremio team to ask them.
<[email protected]> On Mon, Apr 22, 2024 at 10:44 AM Prem Sahoo <[email protected]> wrote: > Just a quick question May I know why the other tools/softwares started > using this version , when it is not official yet :( > Sent from my iPhone > > On Apr 22, 2024, at 8:12 AM, Prem Sahoo <[email protected]> wrote: > > Thank you Vinoo , will check internally do we really need this atm with > so many caution. > Sent from my iPhone > > On Apr 21, 2024, at 9:24 PM, Vinoo Ganesh <[email protected]> wrote: > > > If you still want to write your production Parquet V2, which again, is not > a finalized format and is therefore NOT recommended, you can override > parquet.writer.version ( > https://github.com/apache/parquet-mr/blob/f51ed41ded4d91c18fc4eaa827664bc3a02b18f3/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java#L142) > in the latest parquet-mr, and set it to output V2 format. > > Again, both the Spark dev list and Parquet dev list have warned against > this, so I'd recommend you proceed with caution. > > > > <[email protected]> > > > On Sun, Apr 21, 2024 at 6:50 PM Prem Sahoo <[email protected]> wrote: > >> Hello Team, >> Do you have any clue in which version of parquet-mr jar Parquet V2 >> encoding code is available ? >> >> On Sun, Apr 21, 2024 at 6:21 PM Prem Sahoo <[email protected]> wrote: >> >>> Thanks Vinoo for the valuable information . >>> >>> On Sat, Apr 20, 2024 at 5:07 PM Vinoo Ganesh <[email protected]> >>> wrote: >>> >>>> Hi Prem - Maybe I can help clarify to the best of my knowledge. Parquet >>>> V2 >>>> as a standard isn't finalized just yet. Meaning there is no formal, >>>> *finalized* "contract" that specifies what it means to write data in >>>> the V2 >>>> version. The discussions/conversations about what the final V2 standard >>>> may >>>> be are still in progress and are evolving. 
>>>> >>>> That being said, because V2 code does exist (though unfinalized), there >>>> are >>>> clients / tools that are writing data in the un-finalized V2 format, as >>>> seems to be the case with Dremio. >>>> >>>> Now, as that comment you quoted said, you can have Spark write V2 files, >>>> but it's worth being mindful about the fact that V2 is a moving target >>>> and >>>> can (and likely will) change. You can overwrite parquet.writer.version >>>> to >>>> specify your desired version, but it can be dangerous to produce data >>>> in a >>>> moving-target format. For example, let's say you write a bunch of data >>>> in >>>> Parquet V2, and then the community decides to make a breaking change >>>> (which >>>> is completely fine / allowed since V2 isn't finalized). You are now left >>>> having to deal with a potentially large and complicated file format >>>> update. >>>> That's why it's not recommended to write files in parquet v2 just yet. >>>> >>>> >>>> >>>> <[email protected]> >>>> >>>> >>>> On Wed, Apr 17, 2024 at 3:47 PM Prem Sahoo <[email protected]> >>>> wrote: >>>> >>>> > Hello Team, >>>> > I am working on different products such as Spark and Dremio. >>>> > >>>> > Dremio is able to write and read Parquet V2 and due this upgrade it is >>>> > working faster than Parquet V1 files. >>>> > >>>> > In case of spark it is still defaulting to Parquet V1 and when I >>>> > checked with Spark community they told me Parquet community isn't >>>> > recommending Parquet V2. >>>> > >>>> > "Prem, as I said earlier, v2 is not a finalized spec so you should >>>> not use >>>> > it. That's why it is not the default. You can get Spark to write v2 >>>> files, >>>> > but it isn't recommended by the Parquet community." >>>> > >>>> > please advise. >>>> > >>>> >>>
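[For readers who find this thread later: a minimal sketch of the override Vinoo describes, from PySpark. This is an illustration only, not a recommendation; per the warnings above, V2 is not a finalized spec. It assumes a working Spark installation; the setting is read from the Hadoop configuration by parquet-mr's ParquetOutputFormat, with "v1" (the default) and "v2" as the accepted values. The output path is hypothetical.]

```python
# Sketch: forcing Spark's Parquet writer to the unfinalized V2 format.
# NOT recommended for production data, per the warnings in this thread.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-v2-demo").getOrCreate()

# parquet.writer.version is picked up from the Hadoop configuration by
# parquet-mr when Spark writes Parquet files. Accepted values: "v1", "v2".
spark.sparkContext._jsc.hadoopConfiguration().set(
    "parquet.writer.version", "v2"
)

# Any subsequent Parquet write in this session uses the V2 writer.
df = spark.range(1000)
df.write.mode("overwrite").parquet("/tmp/parquet_v2_demo")  # hypothetical path
```

Note that this flips the writer version for the whole session; any reader of the output must then cope with V2 data pages and encodings, which is exactly the compatibility risk the thread warns about.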
