You can't send attachments. Can you post it as a Google Doc or gist?

On Mon, Oct 27, 2014 at 7:41 PM, Zhenxiao Luo <[email protected]> wrote:
> Thanks Brock and Jason.
>
> I just drafted a proposed API for the vectorized Parquet reader (attached
> in this email). Any comments and suggestions are appreciated.
>
> Thanks,
> Zhenxiao
>
> On Tue, Oct 7, 2014 at 5:34 PM, Brock Noland <[email protected]> wrote:
>
>> Hi,
>>
>> The Hive + Parquet community is very interested in improving the
>> performance of Hive + Parquet, and of Parquet generally. We are very
>> interested in contributing to the Parquet vectorization and lazy
>> materialization effort. Please add me to any future meetings on this
>> topic.
>>
>> BTW, here is the JIRA tracking this effort from the Hive side:
>> https://issues.apache.org/jira/browse/HIVE-8120
>>
>> Brock
>>
>> On Tue, Oct 7, 2014 at 2:04 PM, Zhenxiao Luo <[email protected]> wrote:
>>
>>> Thanks Jason.
>>>
>>> Yes, Netflix is using Presto and Parquet for our big data platform (
>>> http://techblog.netflix.com/2014/10/using-presto-in-our-big-data-platform.html
>>> ).
>>>
>>> The fastest format currently in Presto is ORC, not DWRF (Parquet is
>>> fast, but not as fast as ORC). We are referring to ORC, not Facebook's
>>> DWRF implementation.
>>>
>>> We already have Parquet working in Presto. We definitely would like to
>>> get it as fast as ORC.
>>>
>>> Facebook has done native support for ORC in Presto, which does not use
>>> the ORCRecordReader at all. They parse the ORC footer, do predicate
>>> pushdown by skipping row groups, vectorization by introducing
>>> type-specific vectors, and lazy materialization by introducing
>>> LazyVectors (their code has not been committed yet; I mean their pull
>>> request). We are planning to do similar optimizations for Parquet in
>>> Presto.
>>>
>>> For the ParquetRecordReader, we need additional APIs to read the next
>>> batch of values, and to read in a vector of values. For example, here
>>> is the related API in the ORC code:
>>>
>>> /**
>>>  * Read the next row batch. The size of the batch to read cannot be
>>>  * controlled by the callers. Callers need to look at
>>>  * VectorizedRowBatch.size of the returned object to know the batch
>>>  * size read.
>>>  * @param previousBatch a row batch object that can be reused by the
>>>  *                      reader
>>>  * @return the row batch that was read
>>>  * @throws java.io.IOException
>>>  */
>>> VectorizedRowBatch nextBatch(VectorizedRowBatch previousBatch) throws
>>>     IOException;
>>>
>>> And here is the related API in the Presto code, which is used for ORC
>>> support in Presto:
>>>
>>> public void readVector(int columnIndex, Object vector);
>>>
>>> For lazy materialization, we may also consider adding LazyVectors or
>>> LazyBlocks, so that values are not materialized until they are accessed
>>> by the operator.
>>>
>>> Any comments and suggestions are appreciated.
>>>
>>> Thanks,
>>> Zhenxiao
>>>
>>> On Tue, Oct 7, 2014 at 1:05 PM, Jason Altekruse
>>> <[email protected]> wrote:
>>>
>>>> Hello All,
>>>>
>>>> No updates from me yet, just sending out another message for some of
>>>> the Netflix engineers that were still just subscribed to the Google
>>>> Group mail. This will allow them to respond directly with their
>>>> research on the optimized ORC reader for consideration in the design
>>>> discussion.
>>>>
>>>> -Jason
>>>>
>>>> On Mon, Oct 6, 2014 at 10:51 PM, Jason Altekruse
>>>> <[email protected]> wrote:
>>>>
>>>>> Hello Parquet team,
>>>>>
>>>>> I wanted to report the results of a discussion between the Drill team
>>>>> and the engineers at Netflix working to make Parquet run faster with
>>>>> Presto. As we have said in the last few hangouts, we both want to
>>>>> make contributions back to parquet-mr to add features and
>>>>> performance.
>>>>> We thought it would be good to sit down and speak directly about our
>>>>> real goals and the best next steps to get an engineering effort
>>>>> started to accomplish these goals.
>>>>>
>>>>> Below is a summary of the meeting.
>>>>>
>>>>> - Meeting notes
>>>>>   - Attendees:
>>>>>     - Netflix: Eva Tse, Daniel Weeks, Zhenxiao Luo
>>>>>     - MapR (Drill team): Jacques Nadeau, Jason Altekruse, Parth
>>>>>       Chandra
>>>>>   - Minutes
>>>>>     - Introductions / background
>>>>>       - Netflix
>>>>>         - Working on providing interactive SQL querying to users
>>>>>         - Have chosen Presto as the query engine and Parquet as the
>>>>>           high-performance data storage format
>>>>>         - Presto is providing the needed speed in some cases, but
>>>>>           other cases are missing optimizations that could be
>>>>>           avoiding reads
>>>>>         - Have already started some development and investigation,
>>>>>           and have identified key goals
>>>>>         - Some initial benchmarks with DWRF, a modified ORC reader
>>>>>           written by the Presto team, show that such gains are
>>>>>           possible with a different reader implementation
>>>>>         - Goals
>>>>>           - Filter pushdown
>>>>>             - Skipping reads based on filter evaluation on one or
>>>>>               more columns
>>>>>             - This can happen at several granularities: row group,
>>>>>               page, record/value
>>>>>           - Late/lazy materialization
>>>>>             - For columns not involved in a filter, avoid
>>>>>               materializing them entirely until they are known to be
>>>>>               needed after evaluating a filter on other columns
>>>>>       - Drill
>>>>>         - The Drill engine uses an in-memory vectorized
>>>>>           representation of records
>>>>>         - For scalar and repeated types we have implemented a fast
>>>>>           vectorized reader that is optimized to transform between
>>>>>           Parquet's on-disk format and our in-memory format
>>>>>         - This is currently producing performant table scans, but
>>>>>           has no facility for filter pushdown
>>>>>     - Major goals going forward
>>>>>       - Filter pushdown
>>>>>         - Decide the best implementation for incorporating filter
>>>>>           pushdown into our current implementation, or figure out a
>>>>>           way to leverage existing work in the parquet-mr library to
>>>>>           accomplish this goal
>>>>>       - Late/lazy materialization
>>>>>         - See above
>>>>>       - Contribute existing code back to Parquet
>>>>>         - The Drill Parquet reader has a very strong emphasis on
>>>>>           performance and a clear interface to consume; sufficiently
>>>>>           separated from Drill, it could prove very useful for other
>>>>>           projects
>>>>>     - First steps
>>>>>       - The Netflix team will share some of their thoughts and
>>>>>         research from working with the DWRF code
>>>>>         - We can then have a discussion based off of this: which
>>>>>           aspects are done well, and any opportunities they may have
>>>>>           missed that we can incorporate into our design
>>>>>       - Do further investigation and ask the existing community for
>>>>>         guidance on existing parquet-mr features or planned APIs
>>>>>         that may provide the desired functionality
>>>>>       - We will begin a discussion of an API for the new
>>>>>         functionality
>>>>>     - Some outstanding thoughts for down the road
>>>>>       - The Drill team has an interest in very late materialization
>>>>>         for data stored in dictionary-encoded pages, such as running
>>>>>         a join or filter on the dictionary and then going back to
>>>>>         the reader to grab all of the values in the data that match
>>>>>         the needed members of the dictionary
>>>>>         - This is a later consideration, but it is part of the
>>>>>           reason we are opening up the design discussion early: so
>>>>>           that the API can be flexible enough to allow this in the
>>>>>           future, even if it is not implemented right away
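[Editor's note: for readers following the API discussion in the thread, here is a minimal, self-contained sketch of the `nextBatch(previousBatch)` contract quoted above, applied to a toy in-memory long column standing in for a Parquet column chunk. The names (`VectorizedRowBatch`, `LongColumnVector`, `nextBatch`) mirror the ORC pattern cited in the thread; this is an illustrative sketch, not the actual ORC or parquet-mr code.]

```java
// Sketch of a batch-read API modeled on the ORC nextBatch(previousBatch)
// contract quoted in the thread. The toy reader iterates an in-memory
// column; names mirror ORC's and are not the real parquet-mr API.
public class VectorizedReaderSketch {

    static final int BATCH_CAPACITY = 4;

    // One column of longs with a fixed capacity.
    static class LongColumnVector {
        final long[] vector = new long[BATCH_CAPACITY];
    }

    // A row batch: the reader sets `size`; callers must check it, since
    // the last batch is usually smaller than the capacity.
    static class VectorizedRowBatch {
        final LongColumnVector col = new LongColumnVector();
        int size;
    }

    // Toy reader over an in-memory column, standing in for a column chunk.
    static class ToyReader {
        private final long[] data;
        private int pos;
        ToyReader(long[] data) { this.data = data; }

        // Reuses previousBatch when supplied, as the quoted javadoc requires.
        VectorizedRowBatch nextBatch(VectorizedRowBatch previousBatch) {
            VectorizedRowBatch batch =
                previousBatch != null ? previousBatch : new VectorizedRowBatch();
            int n = Math.min(BATCH_CAPACITY, data.length - pos);
            for (int i = 0; i < n; i++) {
                batch.col.vector[i] = data[pos + i];
            }
            pos += n;
            batch.size = n; // caller reads only the first `size` slots
            return batch;
        }
    }

    // Example consumer: sums a column by repeatedly calling nextBatch.
    static long sumColumn(long[] data) {
        ToyReader reader = new ToyReader(data);
        long sum = 0;
        VectorizedRowBatch batch = null;
        while (true) {
            batch = reader.nextBatch(batch);
            if (batch.size == 0) break;
            for (int i = 0; i < batch.size; i++) sum += batch.col.vector[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumColumn(new long[] {1, 2, 3, 4, 5, 6})); // prints 21
    }
}
```

The key design point the thread raises is visible here: the caller cannot choose the batch size, so it must consult `VectorizedRowBatch.size` after every call, and the same batch object is reused across calls to avoid per-batch allocation.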

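[Editor's note: the late/lazy-materialization idea raised in the thread (values not materialized until an operator accesses them) can be sketched with a hypothetical LazyVector whose decode is deferred behind a supplier. `LazyVector`, `filteredSum`, and the `loads` counter are illustrative names for this sketch, not Presto's actual classes.]

```java
import java.util.function.Supplier;

// Sketch of lazy materialization: a LazyVector defers decoding its column
// until an operator first accesses a value. Illustrative only.
public class LazyVectorSketch {

    static class LazyVector {
        private final Supplier<long[]> loader; // stands in for the page decode
        private long[] values;                 // null until first access
        int loads;                             // how many times we decoded

        LazyVector(Supplier<long[]> loader) { this.loader = loader; }

        long get(int row) {
            if (values == null) { // materialize at most once, on demand
                values = loader.get();
                loads++;
            }
            return values[row];
        }
    }

    // The filter runs on the eagerly read column a; column b is only
    // decoded if some row survives the filter.
    static long filteredSum(long[] a, LazyVector b, long threshold) {
        long sum = 0;
        for (int row = 0; row < a.length; row++) {
            if (a[row] > threshold) {
                sum += b.get(row); // first call here triggers the decode
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        LazyVector b = new LazyVector(() -> new long[] {100, 200, 300});
        long sum = filteredSum(new long[] {5, 50, 7}, b, 10);
        System.out.println(sum + " after " + b.loads + " decode(s)");

        // When no row passes the filter, column b is never decoded at all.
        LazyVector c = new LazyVector(() -> new long[] {100, 200, 300});
        filteredSum(new long[] {5, 50, 7}, c, 100);
        System.out.println(c.loads + " decode(s)");
    }
}
```

This is the payoff the thread describes for columns not involved in a filter: when the predicate on other columns rejects every row, the lazy column's decode cost is never paid.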