I think some of the comments might be conflicting.  One of the concerns
(that I would need to refresh myself on to offer an opinion which was
covered in Ben's doc) was the threading model we expect in the library.

On Tue, Feb 16, 2021 at 8:03 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Hi Gidon,
>
> Le 16/02/2021 à 16:42, Gidon Gershinsky a écrit :
> > Regarding the high-level layer, I think it waits for a progress at
> >
> https://docs.google.com/document/d/11qz84ajysvVo5ZAV9mXKOeh6ay4-xgkBrubggCP5220/edit?usp=sharing
> > No activity there since last November. This is unfortunate, because Tham
> > has put a lot of work in coding the high-level layer (and addressing 200+
> > review comments) in the PR https://github.com/apache/arrow/pull/8023.
> The
> > code is functional, compatible with the Java version in parquet-mr, and
> can
> > be updated with the threading changes in the doc above. I hope all this
> > good work will not be wasted.
>
> I'm sorry for the possibly frustrating process.  Looking at the PR,
> though, it seems a bunch of comments were not addressed.  Is it possible
> to go through them and ensure they get an answer and/or a resolution?
>
> Best regards
>
> Antoine.
>
>
>
> >
> > Cheers, Gidon
> >
> >
> > On Sat, Feb 13, 2021 at 6:52 AM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> >> My thoughts:
> >> 1.  I've lost track of the higher level encryption implementation in
> C++.
> >> I think we were trying to come to a consensus on the threading/thread
> >> safety model?
> >>
> >> 2.  I'm open to exposing the lower level encryption libraries in python
> >> (without appropriate namespacing/communication).  It seems at least for
> >> reading, there is potentially less harm (I'll caveat that with I'm not a
> >> security expert).  Are both the low level read and write implementations
> >> necessary?  (it probably makes sense to have a few smaller PRs for
> exposing
> >> this functionality anyways).
> >>
> >>
> >>
> >> On Wed, Feb 10, 2021 at 7:10 AM Itamar Turner-Trauring <
> >> ita...@pythonspeed.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> Since the PR for high-level C++ Parquet encryption API appears stalled
> (
> >>> https://github.com/apache/arrow/pull/8023), I'm looking into exposing
> >> the
> >>> low-level Parquet encryption API to Python.
> >>>
> >>> Arguments for doing this: the low-level API is all the users I'm
> talking
> >>> to need, at the moment, so it's plausible others would also find some
> >>> benefit in having the Pyarrow API expose low-level Parquet encryption.
> >> Then
> >>> again, it might only be this one company and no one else cares.
> >>>
> >>> The arguments against, per Gidon Gershinsky:
> >>>
> >>>>  * security: low-level encryption API is easy to misuse (eg giving the
> >>> same keys for a number of different files; this'd break the AES GCM
> >>> cipher). The high-level encryption layer handles that by applying
> >> envelope
> >>> encryption and other best practices in data security. Also, this layer
> is
> >>> maintained by the community, meaning that future improvements and
> >> security
> >>> fixes can be upstreamed by anyone, and available to all.
> >>>>  * compatibility: parquet-mr implements the high-level encryption
> >> layer.
> >>> If we want the files produced by Spark/Presto/etc to be readable by
> >>> pandas/PyArrow (and vice versa), we need to provide the Arrow users
> with
> >>> the high-level API.
> >>>> ...
> >>>>
> >>>> The current situation is not ideal, it'd be good to merge the
> >> high-level
> >>> PR (and maybe hide the low level), but here we are; also, C++ is a kind
> >> of
> >>> a low-level language; Python would expose it to a less experienced
> >> audience.
> >>>
> >>> (Source: https://issues.apache.org/jira/browse/ARROW-8040)
> >>>
> >>> I find the compatibility argument less compelling, that's readily
> >>> addressed by documentation. I am not a crypto expert so I can't
> evaluate
> >>> how risky exposing the low-level encryption APIs would be, but I can
> see
> >>> how that would be a significant concern.
> >>>
> >>> Some options are:
> >>>  * Status quo, no Python API for low-level Parquet encryption. This has
> >>> two possible outcomes:
> >>>    * Eventually high-level API gets merged, gets Python binding.
> >>>    * High-level encryption API is never merged, Python users never get
> >>> access to encryption.
> >>>  * Add low-level Parquet encryption API to Pyarrow, perhaps using
> >> "hazmat"
> >>> idiom used by the Python cryptography package (API namespace indicating
> >>> "use at your own risk, this is dangerous", basically, e.g.
> >>> `cryptography.hazmat.primitives.ciphers.aead.``ChaCha20Poly1305`).
> >>>    * Gidon Gershinsky did not find this suggestion compelling enough to
> >>> override his security concerns.
> >>>  * Low-level encryption done as third party Python package, either
> >> private
> >>> or open source. This is annoying technically, plausibly would require
> >>> maintaining a fork.
> >>> Any other ideas? Thoughts on these options?
> >>>
> >>> —Itamar
> >>
> >
>

Reply via email to