Re: [DISCUSS] Extensibility of Parquet

Micah Kornfield Thu, 30 May 2024 00:07:54 -0700

>
> A "vendor" encoding would also allow candidate encodings to be shared
> across the ecosystem before they are eventually christened as regular
> encodings in the Thrift metadata.



I'm not a huge fan of this for two reasons:
1.  I think it makes it much more complicated for end-users to get support
if they happen to have a file with a custom encoding.  There are already
enough rough edges in compatibility between implementations, and this adds
another degree of freedom where things could break.
2.  From a software supply chain perspective, I think this makes Parquet a
lot riskier if it is going to arbitrarily load/invoke code from potentially
unknown sources.
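
To make the first concern concrete, here is a minimal Python sketch of what
a reader would face. Everything in it is hypothetical (VendorEncoding,
KNOWN_DECODERS, decode_page, and the "acme.passthrough" name are made up for
illustration); none of it exists in the Parquet Thrift definitions or in any
implementation. It only assumes a vendor encoding is identified by a
(name, parameters) pair of arbitrary strings, as proposed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorEncoding:
    """Hypothetical (name, parameters) pair of arbitrary strings."""
    name: str
    parameters: str


class UnsupportedEncodingError(Exception):
    """Raised when this reader ships no decoder for a vendor encoding."""


def _passthrough(page: bytes, parameters: str) -> bytes:
    # Placeholder decoder for illustration only: returns the page unchanged.
    return bytes(page)


# Decoders this particular (imaginary) reader happens to ship.
# A different reader could ship a different set, which is the interop risk.
KNOWN_DECODERS = {"acme.passthrough": _passthrough}


def decode_page(encoding: VendorEncoding, page: bytes) -> bytes:
    decoder = KNOWN_DECODERS.get(encoding.name)
    if decoder is None:
        # The file is valid per the (hypothetical) spec, yet unreadable here.
        raise UnsupportedEncodingError(
            f"no decoder for vendor encoding {encoding.name!r}"
        )
    return decoder(page, encoding.parameters)
```

Because the name is an arbitrary string, nothing in the format itself tells
the end-user which implementation, if any, can read a given file.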



On Wed, May 29, 2024 at 12:15 AM Antoine Pitrou <[email protected]> wrote:

>
> I'm not sure how people are envisioning 2) (pluggable encodings) to be
> concretely represented in Thrift data, but perhaps an easy alternative
> is to add a "vendor" encoding that would be described by a (name,
> parameters) pair of arbitrary strings.
>
> A "vendor" encoding would also allow candidate encodings to be shared
> across the ecosystem before they are eventually christened as regular
> encodings in the Thrift metadata.
>
> Finally, I agree that allowing for pluggable encodings will not
> reduce the burden for implementors who want to support a given encoding.
>
> Regards
>
> Antoine.
>
>
> On Wed, 29 May 2024 09:57:47 +0800
> Gang Wu <[email protected]> wrote:
> > I'm supportive of most of the points in this thread.
> >
> > For 2), making encodings pluggable does not eliminate the work on
> > implementation and interoperability. If people are worried about the
> > lengthy process to promote a new encoding to the spec, perhaps we
> > can reserve an encoding type for each new candidate in the spec
> > at its early stage and then officially add or remove it once the idea
> > matures.
> >
> > Best,
> > Gang
> >
> > On Wed, May 29, 2024 at 1:37 AM Micah Kornfield <[email protected]>
> > wrote:
> >
> > > As a follow-up to the "V3" Discussions [1][2] there were some open
> > > questions around extensibility and how it might be handled, so that
> > > readers could determine if they supported the necessary features.
> > >
> > > I think the areas discussed are:
> > > 1.  New encodings (In spec)
> > > 2.  Pluggable encodings
> > > 3.  Extensible logical types.
> > > 4.  New/additional metadata information in footer.
> > >
> > > For 1) these are already handled by existing mechanisms at the column
> > > level (based on page encodings in column metadata).
> > > For 2) the consensus I inferred from PMC members that commented on the
> > > doc is that in general this was not a direction we wanted to take (I also
> > > concur with this sentiment). But if people want to make a more public
> > > argument on why it should be considered we can do it on the ML to make
> > > it official.
> > > For 3) Antoine started a new thread on this [3]
> > > For 4) I think any new footer will have a bitmap that will handle
> > > changes and extensibility will likely be limited here.
> > >
> > > If this doesn't cover the use-cases people were thinking of this would
> > > be a good place to bring it up.
> > >
> > > Thanks,
> > > Micah
> > >
> > >
> > > [1] https://lists.apache.org/thread/5jyhzkwyrjk9z52g0b49g31ygnz73gxo
> > > [2] https://docs.google.com/document/d/19hQLYcU5_r5nJB7GtnjfODLlSDiNS24GXAtKg9b0_ls/edit
> > > [3] https://lists.apache.org/thread/9xo3mp4n23p8psmrhso5t9q899vxwfjt
> > >
> >
>
>
>
>
