Hi, great! I will take a look soon.
Cheers!
Brock

On Mon, Oct 27, 2014 at 11:18 PM, Zhenxiao Luo <[email protected]> wrote:
>
> Thanks Jacques.
>
> Here is the gist:
> https://gist.github.com/zhenxiao/2728ce4fe0a7be2d3b30
>
> Comments and suggestions are appreciated.
>
> Thanks,
> Zhenxiao
>
> On Mon, Oct 27, 2014 at 10:55 PM, Jacques Nadeau <[email protected]> wrote:
>
>> You can't send attachments. Can you post as a Google Doc or gist?
>>
>> On Mon, Oct 27, 2014 at 7:41 PM, Zhenxiao Luo <[email protected]> wrote:
>>
>>> Thanks Brock and Jason.
>>>
>>> I just drafted proposed APIs for a vectorized Parquet reader (attached in
>>> this email). Any comments and suggestions are appreciated.
>>>
>>> Thanks,
>>> Zhenxiao
>>>
>>> On Tue, Oct 7, 2014 at 5:34 PM, Brock Noland <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> The Hive + Parquet community is very interested in improving the
>>>> performance of Hive + Parquet and Parquet generally. We are very
>>>> interested in contributing to the Parquet vectorization and lazy
>>>> materialization effort. Please add me to any future meetings on this
>>>> topic.
>>>>
>>>> BTW, here is the JIRA tracking this effort from the Hive side:
>>>> https://issues.apache.org/jira/browse/HIVE-8120
>>>>
>>>> Brock
>>>>
>>>> On Tue, Oct 7, 2014 at 2:04 PM, Zhenxiao Luo <[email protected]> wrote:
>>>>
>>>>> Thanks Jason.
>>>>>
>>>>> Yes, Netflix is using Presto and Parquet for our Big Data Platform (
>>>>> http://techblog.netflix.com/2014/10/using-presto-in-our-big-data-platform.html
>>>>> ).
>>>>>
>>>>> The fastest format currently in Presto is ORC, not DWRF (Parquet is
>>>>> fast, but not as fast as ORC). We are referring to ORC, not Facebook's
>>>>> DWRF implementation.
>>>>>
>>>>> We already have Parquet working in Presto. We would definitely like to
>>>>> get it as fast as ORC.
>>>>> Facebook has done native support for ORC in Presto, which does not use
>>>>> the ORCRecordReader at all. They parse the ORC footer, do predicate
>>>>> pushdown by skipping row groups, vectorization by introducing
>>>>> type-specific vectors, and lazy materialization by introducing
>>>>> LazyVectors (their code has not been committed yet; I mean their pull
>>>>> request). We are planning to do similar optimizations for Parquet in
>>>>> Presto.
>>>>>
>>>>> For the ParquetRecordReader, we need additional APIs to read the next
>>>>> batch of values, and to read in a vector of values. For example, here
>>>>> is the related API in the ORC code:
>>>>>
>>>>>   /**
>>>>>    * Read the next row batch. The size of the batch to read cannot be
>>>>>    * controlled by the callers. Callers need to look at
>>>>>    * VectorizedRowBatch.size of the returned object to know the batch
>>>>>    * size read.
>>>>>    * @param previousBatch a row batch object that can be reused by the
>>>>>    *                      reader
>>>>>    * @return the row batch that was read
>>>>>    * @throws java.io.IOException
>>>>>    */
>>>>>   VectorizedRowBatch nextBatch(VectorizedRowBatch previousBatch)
>>>>>       throws IOException;
>>>>>
>>>>> And here is the related API in the Presto code, which is used for ORC
>>>>> support in Presto:
>>>>>
>>>>>   public void readVector(int columnIndex, Object vector);
>>>>>
>>>>> For lazy materialization, we may also consider adding LazyVectors or
>>>>> LazyBlocks, so that values are not materialized until they are
>>>>> accessed by an Operator.
>>>>>
>>>>> Any comments and suggestions are appreciated.
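To make the batch-read pattern quoted above concrete, here is a minimal, hypothetical sketch in Java of a reader that mirrors ORC's `nextBatch` contract: the caller supplies a reusable vector and inspects its `size` after each call, just as ORC callers inspect `VectorizedRowBatch.size`. The class and method names (`LongVector`, `BatchReader`) are illustrative only and are not actual parquet-mr, ORC, or Presto classes.

```java
import java.util.Arrays;

public class BatchReaderSketch {
    // A type-specific vector, in the spirit of "vectorization by
    // introducing type-specific vectors". Hypothetical name.
    static final class LongVector {
        final long[] values;
        int size; // number of valid entries after a read
        LongVector(int capacity) { values = new long[capacity]; }
    }

    // Reads up to the vector's capacity per call; the caller must check
    // the returned vector's size to learn how many values were read.
    static final class BatchReader {
        private final long[] column; // stands in for a decoded Parquet column
        private int pos;
        BatchReader(long[] column) { this.column = column; }

        LongVector nextBatch(LongVector reuse) {
            int n = Math.min(reuse.values.length, column.length - pos);
            System.arraycopy(column, pos, reuse.values, 0, n);
            pos += n;
            reuse.size = n;
            return reuse;
        }
    }

    public static void main(String[] args) {
        BatchReader reader = new BatchReader(new long[]{1, 2, 3, 4, 5});
        LongVector batch = new LongVector(2); // reused across calls, as in ORC
        StringBuilder out = new StringBuilder();
        int total = 0;
        while (reader.nextBatch(batch).size > 0) {
            total += batch.size;
            out.append(Arrays.toString(Arrays.copyOf(batch.values, batch.size)));
        }
        System.out.println(out + " total=" + total);
    }
}
```

The key design point carried over from ORC is that the batch size is chosen by the reader, not the caller, and the vector is reused to avoid per-batch allocation.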
>>>>> Thanks,
>>>>> Zhenxiao
>>>>>
>>>>> On Tue, Oct 7, 2014 at 1:05 PM, Jason Altekruse <[email protected]> wrote:
>>>>>
>>>>>> Hello All,
>>>>>>
>>>>>> No updates from me yet, just sending out another message for some of
>>>>>> the Netflix engineers that were still just subscribed to the Google
>>>>>> Group mail. This will allow them to respond directly with their
>>>>>> research on the optimized ORC reader for consideration in the design
>>>>>> discussion.
>>>>>>
>>>>>> -Jason
>>>>>>
>>>>>> On Mon, Oct 6, 2014 at 10:51 PM, Jason Altekruse <[email protected]> wrote:
>>>>>>
>>>>>>> Hello Parquet team,
>>>>>>>
>>>>>>> I wanted to report the results of a discussion between the Drill
>>>>>>> team and the engineers at Netflix working to make Parquet run faster
>>>>>>> with Presto. As we have said in the last few hangouts, we both want
>>>>>>> to make contributions back to parquet-mr to add features and improve
>>>>>>> performance. We thought it would be good to sit down and speak
>>>>>>> directly about our real goals and the best next steps to get an
>>>>>>> engineering effort started to accomplish these goals.
>>>>>>>
>>>>>>> Below is a summary of the meeting.
>>>>>>> Meeting notes
>>>>>>>
>>>>>>> Attendees:
>>>>>>> - Netflix: Eva Tse, Daniel Weeks, Zhenxiao Luo
>>>>>>> - MapR (Drill team): Jacques Nadeau, Jason Altekruse, Parth Chandra
>>>>>>>
>>>>>>> Minutes
>>>>>>>
>>>>>>> Introductions / Background
>>>>>>>
>>>>>>> Netflix
>>>>>>> - Working on providing interactive SQL querying to users
>>>>>>> - Have chosen Presto as the query engine and Parquet as the
>>>>>>>   high-performance data storage format
>>>>>>> - Presto is providing the needed speed in some cases, but other
>>>>>>>   cases are missing optimizations that could be avoiding reads
>>>>>>> - Have already started some development and investigation, and have
>>>>>>>   identified key goals
>>>>>>> - Some initial benchmarks with a modified ORC reader (DWRF), written
>>>>>>>   by the Presto team, show that such gains are possible with a
>>>>>>>   different reader implementation
>>>>>>> - Goals
>>>>>>>   - Filter pushdown
>>>>>>>     - Skipping reads based on filter evaluation on one or more
>>>>>>>       columns
>>>>>>>     - This can happen at several granularities: row group, page,
>>>>>>>       record/value
>>>>>>>   - Late/lazy materialization
>>>>>>>     - For columns not involved in a filter, avoid materializing them
>>>>>>>       entirely until they are known to be needed after evaluating a
>>>>>>>       filter on other columns
>>>>>>>
>>>>>>> Drill
>>>>>>> - The Drill engine uses an in-memory vectorized representation of
>>>>>>>   records
>>>>>>> - For scalar and repeated types we have implemented a
>>>>>>>   fast vectorized reader that is optimized to transform between
>>>>>>>   Parquet's on-disk format and our in-memory format
>>>>>>> - This is currently producing performant table scans, but has no
>>>>>>>   facility for filter pushdown
>>>>>>>
>>>>>>> Major goals going forward
>>>>>>> - Filter pushdown
>>>>>>>   - Decide the best implementation for incorporating filter pushdown
>>>>>>>     into our current implementation, or figure out a way to leverage
>>>>>>>     existing work in the parquet-mr library to accomplish this goal
>>>>>>> - Late/lazy materialization
>>>>>>>   - See above
>>>>>>> - Contribute existing code back to Parquet
>>>>>>>   - The Drill Parquet reader has a very strong emphasis on
>>>>>>>     performance and a clear interface to consume; sufficiently
>>>>>>>     separated from Drill, it could prove very useful for other
>>>>>>>     projects
>>>>>>>
>>>>>>> First steps
>>>>>>> - The Netflix team will share some of their thoughts and research
>>>>>>>   from working with the DWRF code
>>>>>>>   - We can have a discussion based off of this: which aspects are
>>>>>>>     done well, and any opportunities they may have missed that we
>>>>>>>     can incorporate into our design
>>>>>>> - Do further investigation and ask the existing community for
>>>>>>>   guidance on existing parquet-mr features or planned APIs that may
>>>>>>>   provide the desired functionality
>>>>>>> - We will begin a discussion of an API for the new functionality
>>>>>>> Some outstanding thoughts for down the road
>>>>>>> - The Drill team has an interest in very late materialization for
>>>>>>>   data stored in dictionary-encoded pages, such as running a join or
>>>>>>>   filter on the dictionary and then going back to the reader to grab
>>>>>>>   all of the values in the data that match the needed members of the
>>>>>>>   dictionary
>>>>>>>   - This is a later consideration, but it is part of the reason we
>>>>>>>     are opening up the design discussion early, so that the API can
>>>>>>>     be flexible enough to allow this in the future, even if it is
>>>>>>>     not implemented soon
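The late/lazy materialization goal that runs through the whole thread (Presto's proposed "LazyVectors"/"LazyBlocks", and the Drill team's interest in deferring decode for filtered columns) can be sketched in a few lines of Java. This is a minimal, hypothetical illustration of the idea only: the column's values are not decoded until an operator first touches them, so a column whose rows are all rejected by a filter on another column is never materialized. `LazyVector` is an illustrative name, not an actual Presto or parquet-mr class.

```java
import java.util.function.Supplier;

public class LazyVectorSketch {
    // Wraps a deferred decode step (e.g. decompressing and decoding a
    // Parquet page) and runs it only on first access. Hypothetical API.
    static final class LazyVector {
        private final Supplier<long[]> loader; // the deferred decode
        private long[] values;                 // null until first access

        LazyVector(Supplier<long[]> loader) { this.loader = loader; }

        boolean isMaterialized() { return values != null; }

        long get(int i) {
            if (values == null) {
                values = loader.get(); // materialize on demand
            }
            return values[i];
        }
    }

    public static void main(String[] args) {
        LazyVector col = new LazyVector(() -> new long[]{7, 8, 9});
        // If a filter on another column rejects every row, col is never
        // decoded; decoding happens only when an operator needs a value.
        System.out.println("materialized before access: " + col.isMaterialized());
        System.out.println("value at index 1: " + col.get(1));
        System.out.println("materialized after access: " + col.isMaterialized());
    }
}
```

A production version would also need to handle nulls, batch-at-a-time access, and the dictionary case the Drill team raised (filtering on the dictionary first, then fetching only matching values), but the deferral mechanism is the same.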
