4/30/2019
Attendees:
Zoltan and several other folks (Cloudera)
Brian (SaS?)
Ryan Blue (Netflix)
Julien (WeWork)
Wes McKinney (Ursa Labs)
Gidon Gershinsky (IBM)
Steven (?)
Anikt (?)
Deepak (?)
Xinli Shang (Uber)
Topics:
1. Key signing issue
   1. Zoltan/Julien/Ryan:
      1. We have already exchanged emails on this issue.
      2. In the past, key signing was done in person, but it is OK to
         sign each other's keys via video conference. We can hold a
         video session to sign keys.
      3. It is painful to do this for every release.
2. Column Encryption
   1. Gidon:
      1. The C++ version is progressing well. It is pretty much done.
      2. Waiting for the Parquet-1.11.0 release before sending out the
         code review.
      3. Found issues in Java and worked around them. Will talk to the
         Java community.
   2. Xinli:
      1. On top of Gidon’s change, we introduced a plugin/interface to
         Parquet to activate encryption and build the encryption
         properties. Currently we have a schema-driven implementation,
         but it can be implemented in other ways too. I will send out
         the design soon.
   3. Gidon:
      1. Overall, we took a bottom-up approach. We might need another
         layer on top of these to make adoption easier.
   4. Ryan:
      1. Different companies can have different implementations. It is
         good to have a plugin mode.
   5. Brian: Question about the key metadata and KMS.
      1. Currently, Parquet defines it as a byte array. Depending on
         the implementation, it can be used to record the KMS/key
         metadata.
3. Parquet-1.11.0 Release Validation
   1. Ryan:
      1. Validate the write path of the column index. We need to test
         that the calculation is correct; the validation is
         independent. Ryan will take this task.
   2. Brian:
      1. Can help with some testing in the summer if needed.
   3. Steven:
      1. What is the test strategy? Is there any fuzz testing?
   4. Ryan:
      1. We have some randomized tests, but they are not reliable.
         Inside Netflix, we have stable fuzz tests. We may need to port
         some of them to Parquet.
   5. Xinli:
      1. We have run a lot of regression tests on Parquet-1.11.0. We
         added the encryption code on top of 1.11.0 and ran a lot of
         tests. There are no tests for the new 1.11.0 features, but the
         tests of existing features look good so far. Let us know if
         you want us to add more tests to our test suite.
4. Remove old Parquet modules
   1. Ryan:
      1. We should remove those old modules if they are not needed.
      2. Hive module: seems unused.
      3. Scrooge module: if it is used by only one company, we might
         not want to maintain it.
      4. Does anybody still use parquet-tools instead of parquet-cli?
         Maybe we can mark it as deprecated.
      5. Open a Jira ticket for it.
   2. Julien:
      1. Twitter may use it. Julien will check with Twitter.
      2. We should communicate widely.
--
Xinli Shang (Uber)