Hi Ryan, I just created this JIRA for it: https://issues.apache.org/jira/browse/PARQUET-131
Comments and suggestions are welcome.

Thanks,
Zhenxiao

On Mon, Nov 10, 2014 at 10:59 AM, Ryan Blue <[email protected]> wrote:

Hi everyone,

Is there a JIRA issue tracking the vectorized reader API? Brock and I have been working through how we would integrate this with Hive and have a few questions and comments. Thanks!

rb

On 11/01/2014 01:03 PM, Brock Noland wrote:

Hi,

Great! I will take a look soon.

Cheers!
Brock

On Mon, Oct 27, 2014 at 11:18 PM, Zhenxiao Luo <[email protected]> wrote:

Thanks Jacques.

Here is the gist: https://gist.github.com/zhenxiao/2728ce4fe0a7be2d3b30

Comments and suggestions are appreciated.

Thanks,
Zhenxiao

On Mon, Oct 27, 2014 at 10:55 PM, Jacques Nadeau <[email protected]> wrote:

You can't send attachments. Can you post it as a Google doc or gist?

On Mon, Oct 27, 2014 at 7:41 PM, Zhenxiao Luo <[email protected]> wrote:

Thanks Brock and Jason.

I just drafted a proposed API for a vectorized Parquet reader (attached to this email). Any comments and suggestions are appreciated.

Thanks,
Zhenxiao

On Tue, Oct 7, 2014 at 5:34 PM, Brock Noland <[email protected]> wrote:

Hi,

The Hive + Parquet community is very interested in improving the performance of Hive + Parquet, and of Parquet generally. We are very interested in contributing to the Parquet vectorization and lazy materialization effort. Please add me to any future meetings on this topic.

BTW, here is the JIRA tracking this effort from the Hive side:
https://issues.apache.org/jira/browse/HIVE-8120

Brock

On Tue, Oct 7, 2014 at 2:04 PM, Zhenxiao Luo <[email protected]> wrote:

Thanks Jason.
Yes, Netflix is using Presto and Parquet for our big data platform
(http://techblog.netflix.com/2014/10/using-presto-in-our-big-data-platform.html).

The fastest format currently in Presto is ORC, not DWRF (Parquet is fast, but not as fast as ORC). We are referring to ORC, not Facebook's DWRF implementation.

We already have Parquet working in Presto. We would definitely like to get it as fast as ORC.

Facebook did native support for ORC in Presto, which does not use the ORCRecordReader at all. They parse the ORC footer, and do predicate pushdown by skipping row groups, vectorization by introducing type-specific vectors, and lazy materialization by introducing LazyVectors (their code has not been committed yet; I mean their pull request). We are planning to do similar optimizations for Parquet in Presto.

For the ParquetRecordReader, we need additional APIs to read the next batch of values, and to read in a vector of values. For example, here is the related API in the ORC code:

    /**
     * Read the next row batch. The size of the batch to read cannot be
     * controlled by the callers. Callers need to look at
     * VectorizedRowBatch.size of the returned object to know the batch
     * size read.
     * @param previousBatch a row batch object that can be reused by the reader
     * @return the row batch that was read
     * @throws java.io.IOException
     */
    VectorizedRowBatch nextBatch(VectorizedRowBatch previousBatch) throws IOException;

And here is the related API in the Presto code, which is used for ORC support in Presto:

    public void readVector(int columnIndex, Object vector);

For lazy materialization, we may also consider adding LazyVectors or LazyBlocks, so that values are not materialized until they are accessed by the operator.

Any comments and suggestions are appreciated.

Thanks,
Zhenxiao

On Tue, Oct 7, 2014 at 1:05 PM, Jason Altekruse <[email protected]> wrote:

Hello all,

No updates from me yet, just sending out another message for some of the Netflix engineers that were still just subscribed to the Google group mail. This will allow them to respond directly with their research on the optimized ORC reader for consideration in the design discussion.

-Jason

On Mon, Oct 6, 2014 at 10:51 PM, Jason Altekruse <[email protected]> wrote:

Hello Parquet team,

I wanted to report the results of a discussion between the Drill team and the engineers at Netflix working to make Parquet run faster with Presto. As we have said in the last few hangouts, we both want to make contributions back to parquet-mr to add features and performance.
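The two APIs quoted earlier in this thread (ORC's nextBatch and Presto's readVector) could be combined into a reader contract along the following lines. This is only a minimal sketch: VectorizedReader, RowBatch, and InMemoryReader are hypothetical names for illustration, not actual parquet-mr classes.

```java
// Minimal sketch only: VectorizedReader, RowBatch, and InMemoryReader are
// hypothetical names for illustration, not actual parquet-mr classes.
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class VectorizedSketch {
    // Batch holder modeled on ORC's VectorizedRowBatch: callers must check
    // `size` after each read, since they cannot control the batch size.
    static class RowBatch {
        int size;
        long[] col0 = new long[1024];
    }

    // Reader contract modeled on the two quoted APIs: nextBatch() reuses the
    // caller's batch, readVector() fills one column vector.
    interface VectorizedReader {
        RowBatch nextBatch(RowBatch previousBatch) throws IOException;
        void readVector(int columnIndex, long[] vector) throws IOException;
    }

    // Toy implementation backed by an in-memory array, to show the call pattern.
    static class InMemoryReader implements VectorizedReader {
        private final long[] data;
        private int pos = 0;

        InMemoryReader(long[] data) { this.data = data; }

        public RowBatch nextBatch(RowBatch previousBatch) {
            RowBatch batch = (previousBatch != null) ? previousBatch : new RowBatch();
            int n = Math.min(batch.col0.length, data.length - pos);
            System.arraycopy(data, pos, batch.col0, 0, n);
            pos += n;
            batch.size = n;  // size == 0 signals end of stream
            return batch;
        }

        public void readVector(int columnIndex, long[] vector) {
            int n = Math.min(vector.length, data.length - pos);
            System.arraycopy(data, pos, vector, 0, n);
            pos += n;
        }
    }

    // Sums one column the way a scan operator would: loop, reuse the batch,
    // and stop when a zero-sized batch comes back.
    static long sumColumn(VectorizedReader reader) {
        long total = 0;
        RowBatch batch = null;
        try {
            while (true) {
                batch = reader.nextBatch(batch);
                if (batch.size == 0) break;
                for (int i = 0; i < batch.size; i++) total += batch.col0[i];
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] data = new long[3000];
        Arrays.fill(data, 2L);
        System.out.println(sumColumn(new InMemoryReader(data)));  // prints 6000
    }
}
```

As the ORC javadoc quoted above requires, the caller passes its previous batch back in so the reader can reuse the allocation, and inspects the returned batch's size on every call rather than assuming a fixed batch length.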
We thought it would be good to sit down and speak directly about our real goals and the best next steps to get an engineering effort started to accomplish these goals. Below is a summary of the meeting.

- Meeting notes
  - Attendees:
    - Netflix: Eva Tse, Daniel Weeks, Zhenxiao Luo
    - MapR (Drill team): Jacques Nadeau, Jason Altekruse, Parth Chandra
  - Minutes
    - Introductions / background
      - Netflix
        - Working on providing interactive SQL querying to users
        - Have chosen Presto as the query engine and Parquet as the high-performance data storage format
        - Presto is providing the needed speed in some cases, but others are missing optimizations that could be avoiding reads
        - Have already started some development and investigation; have identified key goals
        - Some initial benchmarks with a modified ORC reader (DWRF), written by the Presto team, show that such gains are possible with a different reader implementation
        - Goals
          - Filter pushdown
            - Skipping reads based on filter evaluation on one or more columns
            - This can happen at several granularities: row group, page, record/value
          - Late/lazy materialization
            - For columns not involved in a filter, avoid materializing them entirely until they are known to be needed after evaluating a filter on other columns
      - Drill
        - The Drill engine uses an in-memory vectorized representation of records
        - For scalar and repeated types, we have implemented a fast vectorized reader that is optimized to transform between Parquet's on-disk format and our in-memory format
        - This is currently producing performant table scans, but has no facility for filter pushdown
    - Major goals going forward
      - Filter pushdown
        - Decide the best implementation for incorporating filter pushdown into our current implementation, or figure out a way to leverage existing work in the parquet-mr library to accomplish this goal
      - Late/lazy materialization
        - See above
      - Contribute existing code back to Parquet
        - The Drill Parquet reader has a very strong emphasis on performance and a clear interface to consume; sufficiently separated from Drill, it could prove very useful for other projects
    - First steps
      - The Netflix team will share some of their thoughts and research from working with the DWRF code
        - We can have a discussion based on this: which aspects are done well, and any opportunities they may have missed that we can incorporate into our design
      - Do further investigation and ask the existing community for guidance on existing parquet-mr features or planned APIs that may provide the desired functionality
      - We will begin a discussion of an API for the new functionality
    - Some outstanding thoughts for down the road
      - The Drill team has an interest in very late materialization for data stored in dictionary-encoded pages, such as running a join or filter on the dictionary and then going back to the reader to grab all of the values in the data that match the needed members of the dictionary
        - This is a later consideration, but it is part of the reason we are opening up the design discussion early: so that the API can be flexible enough to allow this in the future, even if it is not implemented right away

--
Ryan Blue
Software Engineer
Cloudera, Inc.
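The row-group-level filter pushdown described in the meeting notes can be illustrated with a small sketch. Assuming per-group min/max statistics like those kept in the Parquet footer, an equality predicate lets the reader skip whole row groups without any I/O; RowGroupStats and groupsToRead are hypothetical names for illustration, not parquet-mr API.

```java
// Hypothetical sketch of statistics-based row-group skipping; RowGroupStats
// and groupsToRead are illustrative names, not parquet-mr's actual classes.
import java.util.ArrayList;
import java.util.List;

public class PushdownSketch {
    // Per-row-group min/max statistics for one column, as a footer would store.
    static class RowGroupStats {
        final long min, max;
        RowGroupStats(long min, long max) { this.min = min; this.max = max; }
    }

    // Returns the indexes of row groups that might contain rows where the
    // column equals `value`; every other group is skipped entirely.
    static List<Integer> groupsToRead(List<RowGroupStats> stats, long value) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < stats.size(); i++) {
            RowGroupStats s = stats.get(i);
            if (value >= s.min && value <= s.max) keep.add(i);
        }
        return keep;
    }

    public static void main(String[] args) {
        List<RowGroupStats> stats = List.of(
            new RowGroupStats(0, 99),     // group 0
            new RowGroupStats(100, 199),  // group 1
            new RowGroupStats(200, 299)); // group 2
        System.out.println(groupsToRead(stats, 150)); // prints [1]
    }
}
```

The same shape extends to the other granularities the notes mention: page-level skipping uses page statistics instead of row-group statistics, and record-level skipping evaluates the predicate per value after decoding the filter column first.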

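The dictionary-page idea raised at the end of the notes (evaluate the filter against the small dictionary first, then go back for the rows whose encoded ids matched) might look roughly like this; matchingRows and the plain-array layout are illustrative assumptions, not any project's actual API.

```java
// Hypothetical sketch of dictionary-based late materialization: filter the
// (small) dictionary once, then scan only the integer ids, never
// materializing a string per row. Names and layout are illustrative only.
import java.util.ArrayList;
import java.util.List;

public class DictionaryFilterSketch {
    // Returns the row indexes whose decoded value starts with `prefix`.
    static List<Integer> matchingRows(String[] dictionary, int[] encodedIds, String prefix) {
        // Step 1: evaluate the predicate once per dictionary entry.
        boolean[] dictMatches = new boolean[dictionary.length];
        for (int i = 0; i < dictionary.length; i++)
            dictMatches[i] = dictionary[i].startsWith(prefix);
        // Step 2: scan the encoded ids, keeping rows whose entry matched.
        List<Integer> rows = new ArrayList<>();
        for (int row = 0; row < encodedIds.length; row++)
            if (dictMatches[encodedIds[row]]) rows.add(row);
        return rows;
    }

    public static void main(String[] args) {
        String[] dict = {"apple", "banana", "apricot"};
        int[] ids = {0, 1, 2, 1, 0};
        System.out.println(matchingRows(dict, ids, "ap")); // prints [0, 2, 4]
    }
}
```

The win is that the predicate runs once per distinct value instead of once per row, which is exactly why the notes flag this as worth keeping the API flexible for, even if it is not implemented right away.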