Regarding the high-level layer, I think it is waiting on progress in
https://docs.google.com/document/d/11qz84ajysvVo5ZAV9mXKOeh6ay4-xgkBrubggCP5220/edit?usp=sharing
There has been no activity there since last November. This is unfortunate,
because Tham has put a lot of work into coding the high-level layer (and
addressing 200+ review comments) in the PR
https://github.com/apache/arrow/pull/8023. The code is functional, compatible
with the Java version in parquet-mr, and can be updated with the threading
changes in the doc above. I hope all this good work will not be wasted.

Cheers, Gidon


On Sat, Feb 13, 2021 at 6:52 AM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> My thoughts:
> 1.  I've lost track of the higher-level encryption implementation in C++.
> I think we were trying to reach consensus on the threading/thread-safety
> model?
>
> 2.  I'm open to exposing the lower-level encryption libraries in Python
> (without appropriate namespacing/communication).  It seems that, at least
> for reading, there is potentially less harm (with the caveat that I'm not a
> security expert).  Are both the low-level read and write implementations
> necessary?  (It probably makes sense to have a few smaller PRs for exposing
> this functionality anyway.)
>
>
>
> On Wed, Feb 10, 2021 at 7:10 AM Itamar Turner-Trauring <
> ita...@pythonspeed.com> wrote:
>
> > Hi,
> >
> > Since the PR for the high-level C++ Parquet encryption API appears stalled
> > (https://github.com/apache/arrow/pull/8023), I'm looking into exposing the
> > low-level Parquet encryption API to Python.
> >
> > Arguments for doing this: the low-level API is all that the users I'm
> > talking to need at the moment, so it's plausible that others would also
> > benefit from having the PyArrow API expose low-level Parquet encryption.
> > Then again, it might only be this one company and no one else cares.
> >
> > The arguments against, per Gidon Gershinsky:
> >
> > >  * security: the low-level encryption API is easy to misuse (e.g. giving
> > > the same keys for a number of different files; this would break the
> > > AES-GCM cipher). The high-level encryption layer handles that by applying
> > > envelope encryption and other best practices in data security. Also, this
> > > layer is maintained by the community, meaning that future improvements
> > > and security fixes can be upstreamed by anyone and made available to all.
> > >  * compatibility: parquet-mr implements the high-level encryption layer.
> > > If we want the files produced by Spark/Presto/etc. to be readable by
> > > pandas/PyArrow (and vice versa), we need to provide Arrow users with the
> > > high-level API.
> > > ...
> > >
> > > The current situation is not ideal; it'd be good to merge the high-level
> > > PR (and maybe hide the low level), but here we are. Also, C++ is a kind
> > > of low-level language; Python would expose it to a less experienced
> > > audience.
> >
> > (Source: https://issues.apache.org/jira/browse/ARROW-8040)
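A minimal sketch of the envelope-encryption idea described above, assuming the Python `cryptography` package and its `AESGCM` class. Each file gets a fresh data encryption key (DEK), and only a copy of that DEK wrapped by a master key (KEK) is stored next to the data, so no single key encrypts the contents of many files. The helper names and the dict layout are hypothetical; this does not reproduce the Parquet key-metadata format, KMS integration, or anything in PR 8023.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


# Hypothetical helpers and blob layout, for illustration only
# (not the Parquet / parquet-mr / PR 8023 format).
def encrypt_file_envelope(master_key: bytes, plaintext: bytes) -> dict:
    # Fresh 256-bit data encryption key (DEK) for every file.
    dek = AESGCM.generate_key(bit_length=256)
    data_nonce = os.urandom(12)  # 96-bit nonce, the size GCM is designed for
    ciphertext = AESGCM(dek).encrypt(data_nonce, plaintext, None)
    # Wrap the DEK with the master key (KEK); only the wrapped copy is stored.
    wrap_nonce = os.urandom(12)
    wrapped_dek = AESGCM(master_key).encrypt(wrap_nonce, dek, None)
    return {
        "wrapped_dek": wrapped_dek,
        "wrap_nonce": wrap_nonce,
        "data_nonce": data_nonce,
        "ciphertext": ciphertext,
    }


def decrypt_file_envelope(master_key: bytes, blob: dict) -> bytes:
    # Unwrap the per-file DEK with the master key, then decrypt the data.
    dek = AESGCM(master_key).decrypt(blob["wrap_nonce"], blob["wrapped_dek"], None)
    return AESGCM(dek).decrypt(blob["data_nonce"], blob["ciphertext"], None)


master_key = AESGCM.generate_key(bit_length=256)
blob = encrypt_file_envelope(master_key, b"column chunk bytes")
print(decrypt_file_envelope(master_key, blob))
```

Encrypting the same bytes twice yields different DEKs and ciphertexts, which is the property the high-level layer relies on to avoid key reuse across files.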
> >
> > I find the compatibility argument less compelling; that's readily
> > addressed by documentation. I am not a crypto expert, so I can't evaluate
> > how risky exposing the low-level encryption APIs would be, but I can see
> > how that would be a significant concern.
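To make the key-reuse concern concrete: GCM encrypts with AES in counter mode, so reusing a (key, nonce) pair reuses the keystream, and XORing two ciphertexts cancels it out. The sketch below substitutes a SHA-256-based toy keystream for AES (it is neither secure nor GCM) purely to show that consequence; with real GCM, nonce reuse additionally compromises the authentication tags. When one key encrypts many files, random 96-bit nonces make such a collision increasingly likely, which is why NIST SP 800-38D caps random-nonce GCM at 2^32 invocations per key.

```python
import hashlib


def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # TOY ONLY: SHA-256 in counter mode stands in for AES-CTR, the
    # encryption layer inside GCM. Do not use this for real data.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    return xor(plaintext, toy_keystream(key, nonce, len(plaintext)))


key = b"k" * 32
nonce = b"n" * 12  # the SAME nonce is (wrongly) reused for two messages
p1 = b"salary=100000"
p2 = b"salary=250000"
c1 = toy_encrypt(key, nonce, p1)
c2 = toy_encrypt(key, nonce, p2)

# The keystream cancels: c1 XOR c2 == p1 XOR p2, without knowing the key.
assert xor(c1, c2) == xor(p1, p2)

# An attacker who knows (or guesses) p1 recovers p2.
recovered = xor(xor(c1, c2), p1)
print(recovered)  # b'salary=250000'
```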
> >
> > Some options are:
> >  * Status quo, no Python API for low-level Parquet encryption. This has
> > two possible outcomes:
> >    * Eventually the high-level API gets merged and gets a Python binding.
> >    * The high-level encryption API is never merged, and Python users never
> > get access to encryption.
> >  * Add the low-level Parquet encryption API to PyArrow, perhaps using the
> > "hazmat" idiom used by the Python cryptography package (an API namespace
> > indicating "use at your own risk, this is dangerous", e.g.
> > `cryptography.hazmat.primitives.ciphers.aead.ChaCha20Poly1305`).
> >    * Gidon Gershinsky did not find this suggestion compelling enough to
> > override his security concerns.
> >  * Do low-level encryption as a third-party Python package, either private
> > or open source. This is technically annoying and would plausibly require
> > maintaining a fork.
> > Any other ideas? Thoughts on these options?
> >
> > —Itamar
>
