I'm not sure a threading model is expected for an encryption layer.  Am
I missing something?

Regards

Antoine.


Le 17/02/2021 à 06:59, Gidon Gershinsky a écrit :
> Precisely, the main change is in the threading model. Afaik, the document
> proposes a model that fits pandas, but might be problematic for other users
> of this library.
> Technically, this is not showstopper though; if the community decides on
> this model, it will be compatible with the high-level encryption design;
> but the change implementation would need to be done by pandas experts (not
> us; but we'll help where we can).
> Micah, you know this subject (and the community) better than we do - we'd
> much appreciate it if you'd take a lead on removing this roadblock.
> 
> Cheers, Gidon
> 
> 
> On Wed, Feb 17, 2021 at 6:08 AM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
> 
>> I think some of the comments might be conflicting.  One of the concerns
>> (that I would need to refresh myself on to offer an opinion which was
>> covered in Ben's doc) was the threading model we expect in the library.
>>
>> On Tue, Feb 16, 2021 at 8:03 AM Antoine Pitrou <anto...@python.org> wrote:
>>
>>>
>>> Hi Gidon,
>>>
>>> Le 16/02/2021 à 16:42, Gidon Gershinsky a écrit :
>>>> Regarding the high-level layer, I think it waits for a progress at
>>>>
>>>
>> https://docs.google.com/document/d/11qz84ajysvVo5ZAV9mXKOeh6ay4-xgkBrubggCP5220/edit?usp=sharing
>>>> No activity there since last November. This is unfortunate, because
>> Tham
>>>> has put a lot of work in coding the high-level layer (and addressing
>> 200+
>>>> review comments) in the PR https://github.com/apache/arrow/pull/8023.
>>> The
>>>> code is functional, compatible with the Java version in parquet-mr, and
>>> can
>>>> be updated with the threading changes in the doc above. I hope all this
>>>> good work will not be wasted.
>>>
>>> I'm sorry for the possibly frustrating process.  Looking at the PR,
>>> though, it seems a bunch of comments were not addressed.  Is it possible
>>> to go through them and ensure they get an answer and/or a resolution?
>>>
>>> Best regards
>>>
>>> Antoine.
>>>
>>>
>>>
>>>>
>>>> Cheers, Gidon
>>>>
>>>>
>>>> On Sat, Feb 13, 2021 at 6:52 AM Micah Kornfield <emkornfi...@gmail.com
>>>
>>>> wrote:
>>>>
>>>>> My thoughts:
>>>>> 1.  I've lost track of the higher level encryption implementation in
>>> C++.
>>>>> I think we were trying to come to a consensus on the threading/thread
>>>>> safety model?
>>>>>
>>>>> 2.  I'm open to exposing the lower level encryption libraries in
>> python
>>>>> (without appropriate namespacing/communication).  It seems at least
>> for
>>>>> reading, there is potentially less harm (I'll caveat that with I'm
>> not a
>>>>> security expert).  Are both the low level read and write
>> implementations
>>>>> necessary?  (it probably makes sense to have a few smaller PRs for
>>> exposing
>>>>> this functionality anyways).
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Feb 10, 2021 at 7:10 AM Itamar Turner-Trauring <
>>>>> ita...@pythonspeed.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Since the PR for high-level C++ Parquet encryption API appears
>> stalled
>>> (
>>>>>> https://github.com/apache/arrow/pull/8023), I'm looking into
>> exposing
>>>>> the
>>>>>> low-level Parquet encryption API to Python.
>>>>>>
>>>>>> Arguments for doing this: the low-level API is all the users I'm
>>> talking
>>>>>> to need, at the moment, so it's plausible others would also find some
>>>>>> benefit in having the Pyarrow API expose low-level Parquet
>> encryption.
>>>>> Then
>>>>>> again, it might only be this one company and no one else cares.
>>>>>>
>>>>>> The arguments against, per Gidon Gershinsky:
>>>>>>
>>>>>>>  * security: low-level encryption API is easy to misuse (eg giving
>> the
>>>>>> same keys for a number of different files; this'd break the AES GCM
>>>>>> cipher). The high-level encryption layer handles that by applying
>>>>> envelope
>>>>>> encryption and other best practices in data security. Also, this
>> layer
>>> is
>>>>>> maintained by the community, meaning that future improvements and
>>>>> security
>>>>>> fixes can be upstreamed by anyone, and available to all.
>>>>>>>  * compatibility: parquet-mr implements the high-level encryption
>>>>> layer.
>>>>>> If we want the files produced by Spark/Presto/etc to be readable by
>>>>>> pandas/PyArrow (and vice versa), we need to provide the Arrow users
>>> with
>>>>>> the high-level API.
>>>>>>> ...
>>>>>>>
>>>>>>> The current situation is not ideal, it'd be good to merge the
>>>>> high-level
>>>>>> PR (and maybe hide the low level), but here we are; also, C++ is a
>> kind
>>>>> of
>>>>>> a low-level language; Python would expose it to a less experienced
>>>>> audience.
>>>>>>
>>>>>> (Source: https://issues.apache.org/jira/browse/ARROW-8040)
>>>>>>
>>>>>> I find the compatibility argument less compelling, that's readily
>>>>>> addressed by documentation. I am not a crypto expert so I can't
>>> evaluate
>>>>>> how risky exposing the low-level encryption APIs would be, but I can
>>> see
>>>>>> how that would be a significant concern.
>>>>>>
>>>>>> Some options are:
>>>>>>  * Status quo, no Python API for low-level Parquet encryption. This
>> has
>>>>>> two possible outcomes:
>>>>>>    * Eventually high-level API gets merged, gets Python binding.
>>>>>>    * High-level encryption API is never merged, Python users never
>> get
>>>>>> access to encryption.
>>>>>>  * Add low-level Parquet encryption API to Pyarrow, perhaps using
>>>>> "hazmat"
>>>>>> idiom used by the Python cryptography package (API namespace
>> indicating
>>>>>> "use at your own risk, this is dangerous", basically, e.g.
>>>>>> `cryptography.hazmat.primitives.ciphers.aead.``ChaCha20Poly1305`).
>>>>>>    * Gidon Gershinsky did not find this suggestion compelling enough
>> to
>>>>>> override his security concerns.
>>>>>>  * Low-level encryption done as third party Python package, either
>>>>> private
>>>>>> or open source. This is annoying technically, plausibly would require
>>>>>> maintaining a fork.
>>>>>> Any other ideas? Thoughts on these options?
>>>>>>
>>>>>> —Itamar
>>>>>
>>>>
>>>
>>
> 

Reply via email to