Hi all, Glad to see the parquet-cpp progress on this! Can I suggest creating a googledoc for the technical discussion? The current md doc format seems to be harder for pinpointed comments. I got a few, but they are too minor for sending to the two mailing lists.
Cheers, Gidon On Fri, Nov 13, 2020 at 9:17 AM Tham Ha <t...@emotiv.com> wrote: > Hi Ben, > > Can you reconsider this point: > > Properties (including FileDecryptionProperties) are immutable after > construction and thus can be safely shared between threads with no further > synchronization. > > As I see, the returned FileDecryptionProperties object from > PropertiesDrivenCryptoFactory > is not mutable, since it has a DecryptionKeyRetriever object (i.e. > FileKeyUnwrapper > object in this pull) that helps get the key from key metadata. > From the current implementation of FileKeyUnwrapper with the cache, I > think synchronization is necessary, unless you have another idea. > > Cheers, Tham > > On Fri, Nov 13, 2020 at 1:28 PM Micah Kornfield <emkornfi...@gmail.com> > wrote: > >> I skimmed through and this seems like a clean design (I would have to >> reread the PR to do a comparison. A few thoughts of the top of my head: >> >> >> > - Multiple internal classes are left public in header files, where it >> > would be >> > preferred that public classes be kept to a minimum. >> >> I think some of these classes are non-trivial. If that is the case it >> would be preferable to put them in an internal namespace to ensure >> adequate unit testing is in place. >> >> It seems like KmsClient and maybe PropertiesDrivenCryptoFactory should >> still be implemented with internal synchronization to guarantee >> thread-safety? >> >> >> >> On Wed, Nov 11, 2020 at 6:19 PM Benjamin Kietzman <bengil...@gmail.com> >> wrote: >> >> > In the context of https://issues.apache.org/jira/browse/ARROW-9318 / >> > https://github.com/apache/arrow/pull/8023 which port the parquet-mr >> > design to c++: there's some question whether that design is consistent >> > with the style and conventions of the c++ implementation of parquet. >> > >> > Here is a Gist with a sketched alternative design adapted to c++ >> > https://gist.github.com/bkietz/f701d32add6f5a2aeed89ce36a443d43 >> > >> > Specific concerns in brief: >> > - Multiple internal classes are left public in header files, where it >> > would be >> > preferred that public classes be kept to a minimum. >> > - Several levels of explicit synchronization with mutexes are used to >> > guarantee >> > that each class is completely thread safe, but this is unnecessary >> > overhead >> > given the pattern of use of `parquet-cpp`. >> > >> > > > -- > Tham Ha Thi > C++ Developer > *EMOTIV* www.emotiv.com > Neurotech for the Global Community > > LEGAL NOTICE > > This message (including all attachments) contains confidential information > intended for a specific individual and purpose, and is protected by law. > Any confidentiality or privilege is not waived or lost because this email > has been sent to you by mistake. If you have received it in error, please > let us know by reply email, delete it from your system and destroy any > copies. > > This email is also subject to copyright. Any disclosure, copying, or > distribution of this message, or the taking of any action based on it, is > strictly prohibited. > > Emails may be interfered with, may contain computer viruses or other > defects and may not be successfully replicated on other systems. We give no > warranties in relation to these matters. If you have any doubts about the > authenticity of an email purportedly sent by us, please contact us > immediately. > >