Attendees/Agenda
Julien (Dremio):
 - 1.9.0 release
Dan, Ryan (Netflix):
 - new statistics discussion (ordering)
 - new encodings
 - IOManager discussion
 - time wasted in GC in Hive Parquet serde
Piyush (Twitter):
 - better thrift integration in Scala
Sergio (Cloudera):
 - presented new people working on Parquet on Cloudera side
 - follow thread about creating APIs in Hive to make it easier for ORC and Parquet to be compatible across versions
Uwe (Blue Yonder):
 - look into a bug in parquet-cpp
 - prepare for first release of parquet-cpp 0.1

Notes:
 - Versioning:
   - we should decouple library versioning from format versioning:
     - allow major releases more often and remove deprecated APIs
     - parquet-mr/cpp/format versioned independently
     - parquet-cpp to start at 0.1
 - 1.9.0 release:
   - blocked on Statistics: need Alex’s feedback
   - want to release ASAP
 - releasing:
   - need to release more often
     - make a minor release at least every 3 months
     - make patch releases as necessary (any bug fix might warrant a patch release)
   - rotate release manager role (Ryan, Piyush, ...)
   - validation integration/performance tests from Netflix/Twitter
 - delete Hive serde in Parquet since it’s been in Hive for a while
 - new encodings:
   - Ryan tried new encodings.
     - RLE + bitwidth + zigzag + delta: good results
   - should make a flag per new encoding:
     - for compatibility with other implementations
     - for performance
   - should document which encodings are supported in each version of parquet-mr/parquet-cpp/impala
     - Action: Ryan. Start a document.
   - strategies to select an encoding:
     - Piyush has started experimenting and would like feedback.
     - better fallback solution.
   - Ryan: tools to re-encode and compare performance of encodings.
     - Action: Ryan email dev list about where to put it.
 - IOManager: perf on S3, allocations with G1 collector.
   - optimization of seeks vs reads, when to ignore
   - reduce first record latency
   - use threads
   - G1 collector humongous allocations pinned to old gen memory.
     - applies to allocations greater than a certain size. Default row group size hits the limit.
     - Action: open JIRA.
 - time wasted in GC in Hive Parquet serde:
   - Action: Create JIRA

On Thu, Oct 6, 2016 at 10:00 AM, Julien Le Dem <jul...@dremio.com> wrote:

> Parquet sync starting now at:
> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
>
> On Wed, Oct 5, 2016 at 9:52 PM, Julien Le Dem <jul...@dremio.com> wrote:
>
>> Yes that's correct
>> The next parquet sync is tomorrow 10am PT on Google hangout
>>
>>
>> On Monday, September 26, 2016, Jim Pivarski <jpivar...@gmail.com> wrote:
>>
>>> On Thu, Sep 22, 2016 at 7:18 PM, Julien Le Dem <jul...@dremio.com>
>>> wrote:
>>>
>>> > The sync next week collides with strata Conf in NY.
>>> > I propose to move it to the following week.
>>>
>>>
>>> Does that mean it would be pushed back to Thursday, October 3 at 10-11am
>>> PT?
>>>
>>
>>
>> --
>> Julien
>>
>>
>
>
> --
> Julien
>

--
Julien
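
PS, on the "zigzag + delta" item under new encodings: for anyone who was not on the call, here is a rough sketch of what those two terms mean when combined. It is only an illustration in Python. It is not the encoding Ryan tested and not any encoding defined in parquet-format, and all function names are made up for this note. The idea is to store the first value as-is, store each later value as the difference from its predecessor, and zigzag-map the signed differences to small unsigned integers so they can be bit-packed at a narrow width.

```python
def zigzag_encode(n: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) if (z & 1) == 0 else -((z + 1) >> 1)


def delta_zigzag_encode(values):
    """Return (first_value, zigzagged_deltas). Hypothetical helper, not a Parquet API."""
    if not values:
        return None, []
    deltas = []
    prev = values[0]
    for v in values[1:]:
        deltas.append(zigzag_encode(v - prev))
        prev = v
    return values[0], deltas


def delta_zigzag_decode(first, deltas):
    """Rebuild the original values from the first value and the zigzagged deltas."""
    if first is None:
        return []
    out = [first]
    for z in deltas:
        out.append(out[-1] + zigzag_decode(z))
    return out


def bit_width(unsigned_values):
    """Number of bits needed to bit-pack every value in the list."""
    return max(unsigned_values, default=0).bit_length()


values = [100, 101, 103, 102, 102]
first, deltas = delta_zigzag_encode(values)
print(first, deltas)                            # 100 [2, 4, 1, 0]
print(bit_width(values), bit_width(deltas))     # 7 3
print(delta_zigzag_decode(first, deltas) == values)  # True
```

For slowly changing integer columns the deltas need far fewer bits than the raw values (3 instead of 7 in this toy input), which is the kind of gain the combination is after.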