Cool, thank you. This would solve the problem at hand.
I agree it'd be good to kick off the PyArrow API discussion in parallel
with the PR8023 review.
Maybe you and Itamar could prepare a Google Doc draft for the community to
review and comment on.

Cheers, Gidon


On Fri, Sep 4, 2020 at 6:08 PM Roee Shlomo <roe...@gmail.com> wrote:

> Sounds good. In the suggestion above, the builders for
> FileEncryptionProperties/FileDecryptionProperties would not be exposed, so
> only the key tools could create those objects. This is just one option, of
> course.
>
> On 2020/09/03 20:44:26, Antoine Pitrou <anto...@python.org> wrote:
> >
> > It would be useful for outsiders if you could explain what those two API
> > levels are, and to what usage each corresponds.
> > Is Parquet encryption only ever used with Spark?  While Spark
> > interoperability is important, Parquet files are far more ubiquitous than
> > that.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > Le 03/09/2020 à 22:31, Gidon Gershinsky a écrit :
> > > Why would the low-level API be exposed directly? That would break
> > > interop between the two analytics ecosystems down the road.
> > > Again, let me suggest leveraging the high level interface, based on the
> > > PropertiesDrivenCryptoFactory.
> > > It should address your technical requirements; if it doesn't, we can
> > > discuss the gaps.
> > > All questions are welcome.
> > >
> > > Cheers, Gidon
> > >
> > >
> > > On Thu, Sep 3, 2020 at 10:11 PM Roee Shlomo <roe...@gmail.com> wrote:
> > >
> > >> Hi Itamar,
> > >>
> > >> I implemented some Python wrappers for the low-level API and would be
> > >> happy to collaborate on that. The reason I haven't pushed this forward
> > >> yet is what Gidon mentioned: the API to expose to Python users needs to
> > >> be finalized first, and it must include the key tools API for interop
> > >> with Spark.
> > >>
> > >> Perhaps it would be good to kick off a discussion on what the pyarrow
> > >> API for PME should look like (in parallel with reviewing the arrow-cpp
> > >> implementation of key-tools, to ensure that wrapping it would be a
> > >> reasonable effort).
> > >>
> > >> One possible approach is to expose both the low-level API and key-tools
> > >> separately. A user would create and initialize a
> > >> PropertiesDrivenCryptoFactory and use it to create the
> > >> FileEncryptionProperties/FileDecryptionProperties to pass to the
> > >> lower-level API. In pandas this would translate to something like:
> > >> ```
> > >> factory = PropertiesDrivenCryptoFactory(...)
> > >> df.to_parquet(path, engine="pyarrow",
> > >>               encryption=factory.getFileEncryptionProperties(...))
> > >> df = pd.read_parquet(path, engine="pyarrow",
> > >>                      decryption=factory.getFileDecryptionProperties(...))
> > >> ```
> > >> This should also work with reading datasets, since decryption uses a
> > >> KeyRetriever, but I'm not sure what will need to be done once datasets
> > >> support writing.
> > >>
> > >> On 2020/09/03 14:11:51, "Itamar Turner-Trauring" <ita...@pythonspeed.com>
> > >> wrote:
> > >>> Hi,
> > >>>
> > >>> I'm looking into implementing this, and it seems like there are two
> > >>> parts: packaging, but also wrapping the APIs in Python. Is the latter
> > >>> part accurate? If so, are there any examples of similar existing
> > >>> wrapped APIs, or should I just come up with something on my own?
> > >>>
> > >>> Context:
> > >>> https://github.com/apache/arrow/pull/4826
> > >>> https://issues.apache.org/jira/browse/ARROW-8040
> > >>>
> > >>> Thanks,
> > >>>
> > >>> —Itamar
> > >>
> > >
> >
>

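[Editor's note] The two-level shape discussed in this thread (a key-tools factory that alone can create encryption/decryption properties, which are then handed to the lower-level file API) can be sketched with stand-in classes. This is not real pyarrow code: apart from the class and method names taken from the thread (PropertiesDrivenCryptoFactory, FileEncryptionProperties, FileDecryptionProperties, getFileEncryptionProperties, getFileDecryptionProperties), everything here is a hypothetical stub used only to illustrate the constraint that the properties builders stay hidden from users.

```python
# Illustrative stubs only -- NOT the actual pyarrow/Parquet encryption API.
# The point being sketched: user code never constructs the properties
# objects directly; only the key-tools factory mints them.

class FileEncryptionProperties:
    # By convention, user code is not meant to call this constructor;
    # in a real binding the builder would not be exposed at all.
    def __init__(self, footer_key_id):
        self.footer_key_id = footer_key_id


class FileDecryptionProperties:
    def __init__(self, key_retriever):
        # Decryption works through a key retriever, which is why dataset
        # reads need no per-file key material (as noted in the thread).
        self.key_retriever = key_retriever


class PropertiesDrivenCryptoFactory:
    """Hypothetical high-level 'key tools' entry point."""

    def __init__(self, kms_client):
        self._kms = kms_client  # stand-in for a real KMS connection

    def getFileEncryptionProperties(self, footer_key_id):
        # A real factory would wrap data keys via the KMS here.
        return FileEncryptionProperties(footer_key_id)

    def getFileDecryptionProperties(self):
        return FileDecryptionProperties(key_retriever=self._kms)


# Usage mirrors the pandas snippet quoted above: the factory is the only
# object the user configures; the properties flow into the low-level API.
factory = PropertiesDrivenCryptoFactory(kms_client="mock-kms")
enc = factory.getFileEncryptionProperties(footer_key_id="k1")
dec = factory.getFileDecryptionProperties()
print(type(enc).__name__, type(dec).__name__)
```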