You can't send attachments. Can you post it as a Google Doc or gist?

On Mon, Oct 27, 2014 at 7:41 PM, Zhenxiao Luo <[email protected]> wrote:
> Thanks Brock and Jason.
>
> I just drafted a proposed API for the vectorized Parquet reader (attached
> in this email). Any comments and suggestions are appreciated.
>
> Thanks,
> Zhenxiao
>
> On Tue, Oct 7, 2014 at 5:34 PM, Brock Noland <[email protected]> wrote:
>
>> Hi,
>>
>> The Hive + Parquet community is very interested in improving the
>> performance of Hive + Parquet, and of Parquet generally. We are very
>> interested in contributing to the Parquet vectorization and lazy
>> materialization effort. Please add me to any future meetings on this
>> topic.
>>
>> BTW, here is the JIRA tracking this effort from the Hive side:
>> https://issues.apache.org/jira/browse/HIVE-8120
>>
>> Brock
>>
>> On Tue, Oct 7, 2014 at 2:04 PM, Zhenxiao Luo <[email protected]> wrote:
>>
>>> Thanks Jason.
>>>
>>> Yes, Netflix is using Presto and Parquet for our big data platform (
>>> http://techblog.netflix.com/2014/10/using-presto-in-our-big-data-platform.html
>>> ).
>>>
>>> The fastest format currently in Presto is ORC, not DWRF (Parquet is
>>> fast, but not as fast as ORC). We are referring to ORC, not Facebook's
>>> DWRF implementation.
>>>
>>> We already have Parquet working in Presto. We definitely would like to
>>> get it as fast as ORC.
>>>
>>> Facebook has done native support for ORC in Presto, which does not use
>>> the ORCRecordReader at all. They parse the ORC footer, do predicate
>>> pushdown by skipping row groups, vectorization by introducing
>>> type-specific vectors, and lazy materialization by introducing
>>> LazyVectors (their code has not been committed yet; I mean their pull
>>> request). We are planning to do similar optimizations for Parquet in
>>> Presto.
>>>
>>> For the ParquetRecordReader, we need additional APIs to read the next
>>> batch of values, and to read in a vector of values. For example, here
>>> is the related API in the ORC code:
>>>
>>> /**
>>>  * Read the next row batch. The size of the batch to read cannot be
>>>  * controlled by the callers. Callers need to look at
>>>  * VectorizedRowBatch.size of the returned object to know the batch
>>>  * size read.
>>>  * @param previousBatch a row batch object that can be reused by the
>>>  *                      reader
>>>  * @return the row batch that was read
>>>  * @throws java.io.IOException
>>>  */
>>> VectorizedRowBatch nextBatch(VectorizedRowBatch previousBatch) throws
>>>     IOException;
>>>
>>> And here is the related API in the Presto code, which is used for ORC
>>> support in Presto:
>>>
>>> public void readVector(int columnIndex, Object vector);
>>>
>>> For lazy materialization, we may also consider adding LazyVectors or
>>> LazyBlocks, so that values are not materialized until they are accessed
>>> by the operator.
>>>
>>> Any comments and suggestions are appreciated.
>>>
>>> Thanks,
>>> Zhenxiao
>>>
>>> On Tue, Oct 7, 2014 at 1:05 PM, Jason Altekruse
>>> <[email protected]> wrote:
>>>
>>>> Hello All,
>>>>
>>>> No updates from me yet, just sending out another message for some of
>>>> the Netflix engineers that were still just subscribed to the Google
>>>> Group mail. This will allow them to respond directly with their
>>>> research on the optimized ORC reader for consideration in the design
>>>> discussion.
>>>>
>>>> -Jason
>>>>
>>>> On Mon, Oct 6, 2014 at 10:51 PM, Jason Altekruse
>>>> <[email protected]> wrote:
>>>>
>>>>> Hello Parquet team,
>>>>>
>>>>> I wanted to report the results of a discussion between the Drill team
>>>>> and the engineers at Netflix working to make Parquet run faster with
>>>>> Presto. As we have said in the last few hangouts, we both want to
>>>>> make contributions back to parquet-mr to add features and
>>>>> performance.
>>>>> We thought it would be good to sit down and speak directly about our
>>>>> real goals and the best next steps to get an engineering effort
>>>>> started to accomplish these goals.
>>>>>
>>>>> Below is a summary of the meeting.
>>>>>
>>>>> - Meeting notes
>>>>>   - Attendees:
>>>>>     - Netflix: Eva Tse, Daniel Weeks, Zhenxiao Luo
>>>>>     - MapR (Drill team): Jacques Nadeau, Jason Altekruse, Parth
>>>>>       Chandra
>>>>>   - Minutes
>>>>>     - Introductions / background
>>>>>       - Netflix
>>>>>         - Working on providing interactive SQL querying to users
>>>>>         - Have chosen Presto as the query engine and Parquet as the
>>>>>           high-performance data storage format
>>>>>         - Presto is providing the needed speed in some cases, but
>>>>>           other cases are missing optimizations that could be
>>>>>           avoiding reads
>>>>>         - Have already started some development and investigation,
>>>>>           and have identified key goals
>>>>>         - Some initial benchmarks with DWRF, a modified ORC reader
>>>>>           written by the Presto team, show that such gains are
>>>>>           possible with a different reader implementation
>>>>>         - Goals
>>>>>           - Filter pushdown
>>>>>             - Skipping reads based on filter evaluation on one or
>>>>>               more columns
>>>>>             - This can happen at several granularities: row group,
>>>>>               page, record/value
>>>>>           - Late/lazy materialization
>>>>>             - For columns not involved in a filter, avoid
>>>>>               materializing them entirely until they are known to be
>>>>>               needed after evaluating a filter on other columns
>>>>>       - Drill
>>>>>         - The Drill engine uses an in-memory vectorized
>>>>>           representation of records
>>>>>         - For scalar and repeated types we have implemented a fast
>>>>>           vectorized reader that is optimized to transform between
>>>>>           Parquet's on-disk format and our in-memory format
>>>>>         - This is currently producing performant table scans, but
>>>>>           has no facility for filter pushdown
>>>>>     - Major goals going forward
>>>>>       - Filter pushdown
>>>>>         - Decide the best implementation for incorporating filter
>>>>>           pushdown into our current implementation, or figure out a
>>>>>           way to leverage existing work in the parquet-mr library to
>>>>>           accomplish this goal
>>>>>       - Late/lazy materialization
>>>>>         - See above
>>>>>       - Contribute existing code back to Parquet
>>>>>         - The Drill Parquet reader has a very strong emphasis on
>>>>>           performance and a clear interface to consume; sufficiently
>>>>>           separated from Drill, it could prove very useful for other
>>>>>           projects
>>>>>     - First steps
>>>>>       - The Netflix team will share some of their thoughts and
>>>>>         research from working with the DWRF code
>>>>>         - We can then have a discussion based off of this: which
>>>>>           aspects are done well, and any opportunities they may have
>>>>>           missed that we can incorporate into our design
>>>>>       - Do further investigation and ask the existing community for
>>>>>         guidance on existing parquet-mr features or planned APIs
>>>>>         that may provide the desired functionality
>>>>>       - We will begin a discussion of an API for the new
>>>>>         functionality
>>>>>     - Some outstanding thoughts for down the road
>>>>>       - The Drill team has an interest in very late materialization
>>>>>         for data stored in dictionary-encoded pages, such as running
>>>>>         a join or filter on the dictionary and then going back to
>>>>>         the reader to grab all of the values in the data that match
>>>>>         the needed members of the dictionary
>>>>>         - This is a later consideration, but it is part of the
>>>>>           reason we are opening up the design discussion early: so
>>>>>           that the API can be flexible enough to allow this in the
>>>>>           future, even if it is not implemented right away
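[Editor's note: for readers following the API discussion in the thread, here is a minimal, self-contained sketch of the `nextBatch(previousBatch)` contract quoted above, applied to a toy in-memory long column standing in for a Parquet column chunk. The names (`VectorizedRowBatch`, `LongColumnVector`, `nextBatch`) mirror the ORC pattern cited in the thread; this is an illustrative sketch, not the actual ORC or parquet-mr code.]

```java
// Sketch of a batch-read API modeled on the ORC nextBatch(previousBatch)
// contract quoted in the thread. The toy reader iterates an in-memory
// column; names mirror ORC's and are not the real parquet-mr API.
public class VectorizedReaderSketch {

    static final int BATCH_CAPACITY = 4;

    // One column of longs with a fixed capacity.
    static class LongColumnVector {
        final long[] vector = new long[BATCH_CAPACITY];
    }

    // A row batch: the reader sets `size`; callers must check it, since
    // the last batch is usually smaller than the capacity.
    static class VectorizedRowBatch {
        final LongColumnVector col = new LongColumnVector();
        int size;
    }

    // Toy reader over an in-memory column, standing in for a column chunk.
    static class ToyReader {
        private final long[] data;
        private int pos;
        ToyReader(long[] data) { this.data = data; }

        // Reuses previousBatch when supplied, as the quoted javadoc requires.
        VectorizedRowBatch nextBatch(VectorizedRowBatch previousBatch) {
            VectorizedRowBatch batch =
                previousBatch != null ? previousBatch : new VectorizedRowBatch();
            int n = Math.min(BATCH_CAPACITY, data.length - pos);
            for (int i = 0; i < n; i++) {
                batch.col.vector[i] = data[pos + i];
            }
            pos += n;
            batch.size = n; // caller reads only the first `size` slots
            return batch;
        }
    }

    // Example consumer: sums a column by repeatedly calling nextBatch.
    static long sumColumn(long[] data) {
        ToyReader reader = new ToyReader(data);
        long sum = 0;
        VectorizedRowBatch batch = null;
        while (true) {
            batch = reader.nextBatch(batch);
            if (batch.size == 0) break;
            for (int i = 0; i < batch.size; i++) sum += batch.col.vector[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumColumn(new long[] {1, 2, 3, 4, 5, 6})); // prints 21
    }
}
```

The key design point the thread raises is visible here: the caller cannot choose the batch size, so it must consult `VectorizedRowBatch.size` after every call, and the same batch object is reused across calls to avoid per-batch allocation.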

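[Editor's note: the late/lazy-materialization idea raised in the thread (values not materialized until an operator accesses them) can be sketched with a hypothetical LazyVector whose decode is deferred behind a supplier. `LazyVector`, `filteredSum`, and the `loads` counter are illustrative names for this sketch, not Presto's actual classes.]

```java
import java.util.function.Supplier;

// Sketch of lazy materialization: a LazyVector defers decoding its column
// until an operator first accesses a value. Illustrative only.
public class LazyVectorSketch {

    static class LazyVector {
        private final Supplier<long[]> loader; // stands in for the page decode
        private long[] values;                 // null until first access
        int loads;                             // how many times we decoded

        LazyVector(Supplier<long[]> loader) { this.loader = loader; }

        long get(int row) {
            if (values == null) { // materialize at most once, on demand
                values = loader.get();
                loads++;
            }
            return values[row];
        }
    }

    // The filter runs on the eagerly read column a; column b is only
    // decoded if some row survives the filter.
    static long filteredSum(long[] a, LazyVector b, long threshold) {
        long sum = 0;
        for (int row = 0; row < a.length; row++) {
            if (a[row] > threshold) {
                sum += b.get(row); // first call here triggers the decode
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        LazyVector b = new LazyVector(() -> new long[] {100, 200, 300});
        long sum = filteredSum(new long[] {5, 50, 7}, b, 10);
        System.out.println(sum + " after " + b.loads + " decode(s)");

        // When no row passes the filter, column b is never decoded at all.
        LazyVector c = new LazyVector(() -> new long[] {100, 200, 300});
        filteredSum(new long[] {5, 50, 7}, c, 100);
        System.out.println(c.loads + " decode(s)");
    }
}
```

This is the payoff the thread describes for columns not involved in a filter: when the predicate on other columns rejects every row, the lazy column's decode cost is never paid.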