Hi,

I took some notes during our last Parquet sync; please find them below:

Ryan Blue (Netflix)
Steven Moy (Yelp): encryption, indexing
Jim Apple (Cloudera): Bloom filters, indexing
Xinli Shang (Uber): encryption
Csaba Ringhofer (Cloudera)
Anna Szonyi (Cloudera): how to reduce commit time
Nandor Kollar (Cloudera)
Zoltan Ivanfi (Cloudera): present page size results, merge strategy
Julien (We Work)

*Encryption:*
Voting (and new discussion) has started on the design doc for the modular encryption:
https://lists.apache.org/thread.html/e93a723df1c8c3b961cd9664d5da289f5ccffa47160cf1ecfb3227b5@%3Cdev.parquet.apache.org%3E

*Page size results:*
- https://docs.google.com/spreadsheets/d/1hfQPy8NkGbgGugnHWvIHSzZ-3Q5M7f3Dtf_oD9ACFRg/edit?usp=sharing#gid=552274286

*Merge strategies for feature branches:*
- Squashing: con: losing the (blame) history for lines
- Merge: con: complicated history, difficult to revert
- Rebase or rebase and squash: con: difficult for non-committers collaborating on a feature branch when rebasing is necessary

Generally, treat the feature branch as master by proxy: changes get reviewed as if on master, are squashed, and have JIRAs related to the commit (no junk commits).
Cloudera folks will try out rebase/merge/merge revert and see which is the most feasible.
Discussion started on the dev list:
https://lists.apache.org/thread.html/b923f9608e9d6d59f1040bb196b4aca176d1c06f838dda0b28020ebd@%3Cdev.parquet.apache.org%3E

*Bloom filters*
Discussed what is left to get it committed.
A.I.: Start a vote on the design doc

*Reduce "time to committed"*
Generally, differentiate between changes that affect the format and those that do not: if a change affects the format, the process should be slower; if it doesn't, we should be less strict.
- Format changes: design docs, votes on docs/changes
- Contributors reviewing other contributors can lead to faster committing.
- Frequent reviews -> committership
A.I.: Anna to create a draft of a contributors doc.

Best,
Anna