> A "vendor" encoding would also allow candidate encodings to be shared
> across the ecosystem before they are eventually enchristened as regular
> encodings in the Thrift metadata.

I'm not a huge fan of this for two reasons:

1. I think it makes it much more complicated for end-users to get support
   if they happen to have a file with a custom encoding. There are already
   enough rough edges in compatibility between implementations that this
   gives another degree of freedom where things could break.
2. From a software supply chain perspective I think this makes Parquet a
   lot riskier if it is going to arbitrarily load/invoke code from
   potentially unknown sources.

On Wed, May 29, 2024 at 12:15 AM Antoine Pitrou <[email protected]> wrote:

>
> I'm not sure how people are envisioning 2) (pluggable encodings) to be
> concretely represented in Thrift data, but perhaps an easy alternative
> is to add a "vendor" encoding that would be described by a (name,
> parameters) pair of arbitrary strings.
>
> A "vendor" encoding would also allow candidate encodings to be shared
> across the ecosystem before they are eventually enchristened as regular
> encodings in the Thrift metadata.
>
> Finally, I agree that allowing for pluggable encodings will not
> reduce the burden for implementors who want to support a given encoding.
>
> Regards
>
> Antoine.
>
>
> On Wed, 29 May 2024 09:57:47 +0800
> Gang Wu <[email protected]> wrote:
> > I'm supportive of most of the points in this thread.
> >
> > For 2), making encodings pluggable does not eliminate the work on
> > implementation and interoperability. If people are worried about the
> > lengthy process to promote a new encoding to the spec, perhaps we
> > can preserve an encoding type for each new candidate in the spec
> > at its early stage and then officially add or remove it once the idea
> > gets mature.
> >
> > Best,
> > Gang
> >
> > On Wed, May 29, 2024 at 1:37 AM Micah Kornfield <[email protected]>
> > wrote:
> >
> > > As a follow-up to the "V3" Discussions [1][2] there were some open
> > > questions around extensibility and how it might be handled, so that
> > > readers could determine if they supported the necessary features.
> > >
> > > I think the areas discussed are:
> > > 1. New encodings (In spec)
> > > 2. Pluggable encodings
> > > 3. Extensible logical types.
> > > 4. New/additional metadata information in footer.
> > >
> > > For 1) these are already handled by existing mechanisms at the column
> > > level (based on page encodings in column metadata).
> > > For 2) the consensus I inferred from PMC members that commented on the
> > > doc is that in general this was not a direction we wanted to take (I
> > > also concur with this sentiment). But if people want to make a more
> > > public argument on why it should be considered we can do it on the ML
> > > to make it official
> > > For 3) Antoine started a new thread on this [3]
> > > For 4) I think any new footer will have a bitmap that will handle
> > > changes and extensibility will likely be limited here.
> > >
> > > If this doesn't cover the use-cases people were thinking of this would
> > > be a good place to bring it up.
> > >
> > > Thanks,
> > > Micah
> > >
> > >
> > > [1] https://lists.apache.org/thread/5jyhzkwyrjk9z52g0b49g31ygnz73gxo
> > > [2] https://docs.google.com/document/d/19hQLYcU5_r5nJB7GtnjfODLlSDiNS24GXAtKg9b0_ls/edit
> > > [3] https://lists.apache.org/thread/9xo3mp4n23p8psmrhso5t9q899vxwfjt
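
For illustration only, here is a rough sketch of how a reader might treat the "vendor" (name, parameters) pair from the quoted thread without ever loading code based on file contents. Everything in it is an assumption made up for this example: `VendorEncoding`, `Reader`, `register_vendor_decoder`, `unsupported`, and the short `SPEC_ENCODINGS` list are hypothetical names, not anything defined in parquet.thrift or in an existing implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Union

# Hypothetical subset of spec-defined encoding names, for illustration only.
SPEC_ENCODINGS = frozenset({"PLAIN", "RLE", "DELTA_BINARY_PACKED"})


@dataclass(frozen=True)
class VendorEncoding:
    """Hypothetical 'vendor' encoding: a (name, parameters) pair of strings."""
    name: str
    parameters: str


# A decoder takes the raw page bytes and the vendor parameter string.
Decoder = Callable[[bytes, str], list]


class Reader:
    def __init__(self) -> None:
        # Decoders are registered explicitly by the application.
        # Nothing is located or imported based on what a file declares.
        self._vendor_decoders: Dict[str, Decoder] = {}

    def register_vendor_decoder(self, name: str, decoder: Decoder) -> None:
        self._vendor_decoders[name] = decoder

    def unsupported(
        self, encodings: Iterable[Union[str, VendorEncoding]]
    ) -> List[str]:
        """Return the encodings this reader cannot decode, in input order."""
        missing: List[str] = []
        for enc in encodings:
            if isinstance(enc, VendorEncoding):
                if enc.name not in self._vendor_decoders:
                    missing.append(f"vendor:{enc.name}")
            elif enc not in SPEC_ENCODINGS:
                missing.append(enc)
        return missing


reader = Reader()
column_encodings = ["PLAIN", VendorEncoding("acme.example", "level=3")]
print(reader.unsupported(column_encodings))
```

With this shape a file that uses an unregistered vendor encoding is reported as unreadable up front, which limits the supply chain exposure in point 2. It does nothing for point 1: the user still has a file that most implementations cannot read.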
