Re: Parquet sync starting now

Julien Le Dem Thu, 06 Oct 2016 11:07:37 -0700

 Attendees/Agenda
Julien (Dremio):
 - 1.9.0 release
Dan, Ryan (Netflix):
 - new statistics discussion (ordering)
 - new encodings.
 - IOManager discussion
 - time wasted in GC in Hive Parquet serde
Piyush (Twitter):
 - better thrift integration in Scala
Sergio (Cloudera):
 - introduced new people working on Parquet on the Cloudera side
 - follow the thread about creating APIs in Hive to make it easier for ORC and Parquet to be compatible across versions
Uwe (Blue Yonder):
 - look into a bug in parquet-cpp
 - prepare for first release of parquet-cpp 0.1



Notes:
 - Versioning:
    - we should decouple library versioning from format versioning:
    - allow major releases more often and remove deprecated APIs
    - parquet-mr/cpp/format versioned independently
    - parquet-cpp to start at 0.1
 - 1.9.0 release
   - blocked on Statistics: need Alex’s feedback
   - want to release ASAP
 - releasing:
   - need to release more often
   - at least make a minor release every 3 months.
   - make patch releases as necessary (any bug fix might warrant a patch release)
   - rotate release manager role. (Ryan, Piyush, ...)
   - validation integration/performance tests from Netflix/Twitter
 - delete the Hive serde in Parquet since it’s been in Hive for a while
 - new encodings:
   - Ryan tried new encodings.
      - RLE + bitwidth + zigzag + delta: good results
   - should add a flag for each new encoding:
      - for compatibility with other implementations
      - for performance
   - should document which encodings are supported by each version of parquet-mr/parquet-cpp/Impala
      - Action: Ryan. Start a document.
   - strategies to select an encoding:
      - Piyush has started experimenting and would like feedback.
      - better fallback solution.
      - Ryan: tools to re-encode and compare performance of encodings.
         - Action: Ryan email dev list about where to put it.
 - IOManager: perf on S3, allocations with G1 collector.
     - optimization of seeks vs reads, when to ignore
     - reduce first record latency
     - use threads
     - G1 collector: humongous allocations are pinned to old gen memory.
       - applies to allocations greater than a certain size. The default row group size hits the limit.
     - Action: open JIRA.
 - time wasted in GC in Hive Parquet serde:
    - Action: Create JIRA
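
For readers unfamiliar with the "zigzag + delta" combination mentioned under new encodings, here is a minimal Python sketch of the general idea. It is not Ryan's experiment and not the exact layout of any Parquet encoding; the function names are made up for illustration. Consecutive values are replaced by their differences, the signed differences are mapped to small unsigned integers (zigzag), and the widest result gives the bit width a bit-packer would need.

```python
def zigzag_encode(n: int) -> int:
    """Map a signed integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def zigzag_decode(z: int) -> int:
    """Invert zigzag_encode."""
    return (z >> 1) if (z & 1) == 0 else -((z + 1) >> 1)


def delta_zigzag_encode(values):
    """Return (first value, zigzagged deltas, bits needed per delta).

    Hypothetical helper for illustration only; a real encoder would also
    bit-pack the deltas and split the input into blocks.
    """
    if not values:
        return None, [], 0
    deltas = [b - a for a, b in zip(values, values[1:])]
    encoded = [zigzag_encode(d) for d in deltas]
    bit_width = max((e.bit_length() for e in encoded), default=0)
    return values[0], encoded, bit_width


def delta_zigzag_decode(first, encoded):
    """Rebuild the original values from the first value and the deltas."""
    if first is None:
        return []
    out = [first]
    for z in encoded:
        out.append(out[-1] + zigzag_decode(z))
    return out


# Slowly changing values (e.g. timestamps) yield small deltas.
values = [1000, 1003, 1005, 1004, 1010]
first, encoded, bit_width = delta_zigzag_encode(values)
print(first, encoded, bit_width)  # 1000 [6, 4, 1, 12] 4
```

In this example each delta fits in 4 bits, whereas the raw values need 10 bits each, which is the kind of saving that makes a per-encoding flag and a comparison tool worthwhile.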










On Thu, Oct 6, 2016 at 10:00 AM, Julien Le Dem <jul...@dremio.com> wrote:

> Parquet sync starting now at:
> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
>
> On Wed, Oct 5, 2016 at 9:52 PM, Julien Le Dem <jul...@dremio.com> wrote:
>
>> Yes that's correct
>> The next parquet sync is tomorrow 10am PT on Google hangout
>>
>>
>> On Monday, September 26, 2016, Jim Pivarski <jpivar...@gmail.com> wrote:
>>
>>> On Thu, Sep 22, 2016 at 7:18 PM, Julien Le Dem <jul...@dremio.com>
>>> wrote:
>>>
>>> > The sync next week collides with strata Conf in NY.
>>> > I propose to move it to the following week.
>>>
>>>
>>> Does that mean it would be pushed back to Thursday, October 3 at 10-11am
>>> PT?
>>>
>>
>>
>> --
>> Julien
>>
>>
>
>
> --
> Julien
>



-- 
Julien
