Thanks Ben, I left a few comments there. Cheers, Gidon
On Mon, Nov 16, 2020 at 2:58 AM Benjamin Kietzman <bengil...@gmail.com> wrote: > @Gidon > > Copied the gist to a google doc for commenting: > > https://docs.google.com/document/d/11qz84ajysvVo5ZAV9mXKOeh6ay4-xgkBrubggCP5220/edit# > > @Micah > > > it would be preferable to put them in an internal namespace to ensure > adequate unit testing is in place. > > Agreed; my intent was only to indicate that implementation details should > be private. We can even have > *_internal.h header files for those to make them available for unit > testing as we do with some other > classes. I've added a note of this caveat to the doc. > > > It seems like KmsClient and maybe PropertiesDrivenCryptoFactory should > still be implemented > > with internal synchronization to guarantee thread-safety? > > If these are exclusively accessed from the main thread (where they are > used to assemble > FileDecryptionProperties) then there is still no need to synchronize them. > > @Tham > > > As I see, the returned FileDecryptionProperties object from > > PropertiesDrivenCryptoFactory > is not mutable, > > since it has a DecryptionKeyRetriever object. > > If the DecryptionKeyRetriever accesses the caches directly, then it will > indeed need to be synchronized. > Instead, we can ensure that cache access happens on the main thread before > worker tasks are launched. > After we assemble this ahead-of-time set of keys it will not change during > the course of a read, so the > DecryptionKeyRetriever can safely access it without mutexes. I've added a > comment to the doc > > On Fri, Nov 13, 2020 at 3:09 AM Gidon Gershinsky <gg5...@gmail.com> wrote: > >> Hi all, >> >> Glad to see the parquet-cpp progress on this! Can I suggest creating a >> googledoc for the technical discussion? The current md doc format seems to >> be harder for pinpointed comments. I got a few, but they are too minor for >> sending to the two mailing lists. >> >> Cheers, Gidon >> >> >> On Fri, Nov 13, 2020 at 9:17 AM Tham Ha <t...@emotiv.com> wrote: >> >>> Hi Ben, >>> >>> Can you reconsider this point: >>> > Properties (including FileDecryptionProperties) are immutable after >>> construction and thus can be safely shared between threads with no further >>> synchronization. >>> >>> As I see, the returned FileDecryptionProperties object from >>> PropertiesDrivenCryptoFactory >>> is not mutable, since it has a DecryptionKeyRetriever object (i.e. >>> FileKeyUnwrapper >>> object in this pull) that helps get the key from key metadata. >>> From the current implementation of FileKeyUnwrapper with the cache, I >>> think synchronization is necessary, unless you have another idea. >>> >>> Cheers, Tham >>> >>> On Fri, Nov 13, 2020 at 1:28 PM Micah Kornfield <emkornfi...@gmail.com> >>> wrote: >>> >>>> I skimmed through and this seems like a clean design (I would have to >>>> reread the PR to do a comparison. A few thoughts of the top of my head: >>>> >>>> >>>> > - Multiple internal classes are left public in header files, where it >>>> > would be >>>> > preferred that public classes be kept to a minimum. >>>> >>>> I think some of these classes are non-trivial. If that is the case it >>>> would be preferable to put them in an internal namespace to ensure >>>> adequate unit testing is in place. >>>> >>>> It seems like KmsClient and maybe PropertiesDrivenCryptoFactory should >>>> still be implemented with internal synchronization to guarantee >>>> thread-safety? >>>> >>>> >>>> >>>> On Wed, Nov 11, 2020 at 6:19 PM Benjamin Kietzman <bengil...@gmail.com> >>>> wrote: >>>> >>>> > In the context of https://issues.apache.org/jira/browse/ARROW-9318 / >>>> > https://github.com/apache/arrow/pull/8023 which port the parquet-mr >>>> > design to c++: there's some question whether that design is consistent >>>> > with the style and conventions of the c++ implementation of parquet. >>>> > >>>> > Here is a Gist with a sketched alternative design adapted to c++ >>>> > https://gist.github.com/bkietz/f701d32add6f5a2aeed89ce36a443d43 >>>> > >>>> > Specific concerns in brief: >>>> > - Multiple internal classes are left public in header files, where it >>>> > would be >>>> > preferred that public classes be kept to a minimum. >>>> > - Several levels of explicit synchronization with mutexes are used to >>>> > guarantee >>>> > that each class is completely thread safe, but this is unnecessary >>>> > overhead >>>> > given the pattern of use of `parquet-cpp`. >>>> > >>>> >>> >>> >>> -- >>> Tham Ha Thi >>> C++ Developer >>> *EMOTIV* www.emotiv.com >>> Neurotech for the Global Community >>> >>> LEGAL NOTICE >>> >>> This message (including all attachments) contains confidential >>> information intended for a specific individual and purpose, and is >>> protected by law. Any confidentiality or privilege is not waived or lost >>> because this email has been sent to you by mistake. If you have received it >>> in error, please let us know by reply email, delete it from your system and >>> destroy any copies. >>> >>> This email is also subject to copyright. Any disclosure, copying, or >>> distribution of this message, or the taking of any action based on it, is >>> strictly prohibited. >>> >>> Emails may be interfered with, may contain computer viruses or other >>> defects and may not be successfully replicated on other systems. We give no >>> warranties in relation to these matters. If you have any doubts about the >>> authenticity of an email purportedly sent by us, please contact us >>> immediately. >>> >>>