Thanks Ben, I left a few comments there.

Cheers, Gidon


On Mon, Nov 16, 2020 at 2:58 AM Benjamin Kietzman <bengil...@gmail.com>
wrote:

> @Gidon
>
> Copied the gist to a google doc for commenting:
>
> https://docs.google.com/document/d/11qz84ajysvVo5ZAV9mXKOeh6ay4-xgkBrubggCP5220/edit#
>
> @Micah
>
> > it would be preferable to put them in an internal namespace to ensure
> adequate unit testing is in place.
>
> Agreed; my intent was only to indicate that implementation details should
> be private. We can even have
> *_internal.h header files for those to make them available for unit
> testing as we do with some other
> classes. I've added a note of this caveat to the doc.
>
> > It seems like KmsClient and maybe PropertiesDrivenCryptoFactory should
> still be implemented
> > with internal synchronization to guarantee thread-safety?
>
> If these are exclusively accessed from the main thread (where they are
> used to assemble
> FileDecryptionProperties) then there is still no need to synchronize them.
>
> @Tham
>
> > As I see, the returned FileDecryptionProperties object from 
> > PropertiesDrivenCryptoFactory
> is not mutable,
> > since it has a DecryptionKeyRetriever object.
>
> If the DecryptionKeyRetriever accesses the caches directly, then it will
> indeed need to be synchronized.
> Instead, we can ensure that cache access happens on the main thread before
> worker tasks are launched.
> After we assemble this ahead-of-time set of keys it will not change during
> the course of a read, so the
> DecryptionKeyRetriever can safely access it without mutexes. I've added a
> comment to the doc
>
> On Fri, Nov 13, 2020 at 3:09 AM Gidon Gershinsky <gg5...@gmail.com> wrote:
>
>> Hi all,
>>
>> Glad to see the parquet-cpp progress on this! Can I suggest creating a
>> googledoc for the technical discussion? The current md doc format seems to
>> be harder for pinpointed comments. I got a few, but they are too minor for
>> sending to the two mailing lists.
>>
>> Cheers, Gidon
>>
>>
>> On Fri, Nov 13, 2020 at 9:17 AM Tham Ha <t...@emotiv.com> wrote:
>>
>>> Hi Ben,
>>>
>>> Can you reconsider this point:
>>> > Properties (including FileDecryptionProperties) are immutable after
>>> construction and thus can be safely shared between threads with no further
>>> synchronization.
>>>
>>> As I see, the returned FileDecryptionProperties object from 
>>> PropertiesDrivenCryptoFactory
>>> is not mutable, since it has a DecryptionKeyRetriever object (i.e. 
>>> FileKeyUnwrapper
>>> object in this pull) that helps get the key from key metadata.
>>> From the current implementation of FileKeyUnwrapper with the cache, I
>>> think synchronization is necessary, unless you have another idea.
>>>
>>> Cheers, Tham
>>>
>>> On Fri, Nov 13, 2020 at 1:28 PM Micah Kornfield <emkornfi...@gmail.com>
>>> wrote:
>>>
>>>> I skimmed through and this seems like a clean design (I would have to
>>>> reread the PR to do a comparison.  A few thoughts of the top of my head:
>>>>
>>>>
>>>> > - Multiple internal classes are left public in header files, where it
>>>> > would be
>>>> >   preferred that public classes be kept to a minimum.
>>>>
>>>> I think some of these classes are non-trivial.  If that  is the case it
>>>> would be preferable to put them in an internal namespace to ensure
>>>> adequate unit testing is in place.
>>>>
>>>> It seems like KmsClient and maybe PropertiesDrivenCryptoFactory should
>>>> still be implemented with internal synchronization to guarantee
>>>> thread-safety?
>>>>
>>>>
>>>>
>>>> On Wed, Nov 11, 2020 at 6:19 PM Benjamin Kietzman <bengil...@gmail.com>
>>>> wrote:
>>>>
>>>> > In the context of https://issues.apache.org/jira/browse/ARROW-9318 /
>>>> > https://github.com/apache/arrow/pull/8023 which port the parquet-mr
>>>> > design to c++: there's some question whether that design is consistent
>>>> > with the style and conventions of the c++ implementation of parquet.
>>>> >
>>>> > Here is a Gist with a sketched alternative design adapted to c++
>>>> > https://gist.github.com/bkietz/f701d32add6f5a2aeed89ce36a443d43
>>>> >
>>>> > Specific concerns in brief:
>>>> > - Multiple internal classes are left public in header files, where it
>>>> > would be
>>>> >   preferred that public classes be kept to a minimum.
>>>> > - Several levels of explicit synchronization with mutexes are used to
>>>> > guarantee
>>>> >   that each class is completely thread safe, but this is unnecessary
>>>> > overhead
>>>> >   given the pattern of use of `parquet-cpp`.
>>>> >
>>>>
>>>
>>>
>>> --
>>> Tham Ha Thi
>>> C++ Developer
>>> *EMOTIV* www.emotiv.com
>>> Neurotech for the Global Community
>>>
>>> LEGAL NOTICE
>>>
>>> This message (including all attachments) contains confidential
>>> information intended for a specific individual and purpose, and is
>>> protected by law.  Any confidentiality or privilege is not waived or lost
>>> because this email has been sent to you by mistake. If you have received it
>>> in error, please let us know by reply email, delete it from your system and
>>> destroy any copies.
>>>
>>> This email is also subject to copyright. Any disclosure, copying, or
>>> distribution of this message, or the taking of any action based on it, is
>>> strictly prohibited.
>>>
>>> Emails may be interfered with, may contain computer viruses or other
>>> defects and may not be successfully replicated on other systems. We give no
>>> warranties in relation to these matters. If you have any doubts about the
>>> authenticity of an email purportedly sent by us, please contact us
>>> immediately.
>>>
>>>

Reply via email to