Thanks Jason. Yes, Netflix is using Presto and Parquet for our Big Data Platform (http://techblog.netflix.com/2014/10/using-presto-in-our-big-data-platform.html).
The fastest format currently in Presto is ORC, not DWRF (Parquet is fast, but not as fast as ORC). To be clear, we are referring to ORC, not Facebook's DWRF implementation. We already have Parquet working in Presto, and we would definitely like to make it as fast as ORC.

Facebook has added native support for ORC in Presto, which does not use the ORCRecordReader at all. It parses the ORC footer and does predicate pushdown by skipping row groups, vectorization by introducing type-specific vectors, and lazy materialization by introducing LazyVectors (their code has not been committed yet; I mean their pull request).

We are planning to do similar optimizations for Parquet in Presto. For the ParquetRecordReader, we need additional APIs to read the next batch of values and to read values into a vector. For example, here are the related APIs in the ORC code:

  /**
   * Read the next row batch. The size of the batch to read cannot be controlled
   * by the callers. Callers need to look at VectorizedRowBatch.size of the returned
   * object to know the batch size read.
   * @param previousBatch a row batch object that can be reused by the reader
   * @return the row batch that was read
   * @throws java.io.IOException
   */
  VectorizedRowBatch nextBatch(VectorizedRowBatch previousBatch) throws IOException;

And here is the related API in the Presto code, which is used for ORC support in Presto:

  public void readVector(int columnIndex, Object vector);

For lazy materialization, we may also consider adding LazyVectors or LazyBlocks, so that values are not materialized until they are accessed by the Operator.

Any comments and suggestions are appreciated.

Thanks,
Zhenxiao

On Tue, Oct 7, 2014 at 1:05 PM, Jason Altekruse <[email protected]> wrote:

> Hello All,
>
> No updates from me yet, just sending out another message for some of the
> Netflix engineers that were still just subscribed to the google group mail.
> This will allow them to respond directly with their research on the
> optimized ORC reader for consideration in the design discussion.
>
> -Jason
>
> On Mon, Oct 6, 2014 at 10:51 PM, Jason Altekruse <[email protected]> wrote:
>
> > Hello Parquet team,
> >
> > I wanted to report the results of a discussion between the Drill team and
> > the engineers at Netflix working to make Parquet run faster with Presto.
> > As we have said in the last few hangouts we both want to make
> > contributions back to parquet-mr to add features and performance. We
> > thought it would be good to sit down and speak directly about our real
> > goals and the best next steps to get an engineering effort started to
> > accomplish these goals.
> >
> > Below is a summary of the meeting.
> >
> > - Meeting notes
> >   - Attendees:
> >     - Netflix : Eva Tse, Daniel Weeks, Zhenxiao Luo
> >     - MapR (Drill Team) : Jacques Nadeau, Jason Altekruse, Parth Chandra
> >   - Minutes
> >     - Introductions/ Background
> >       - Netflix
> >         - Working on providing interactive SQL querying to users
> >         - have chosen Presto as the query engine and Parquet as high
> >           performance data storage format
> >         - Presto is providing needed speed in some cases, but others are
> >           missing optimizations that could be avoiding reads
> >         - Have already started some development and investigation, have
> >           identified key goals
> >         - Some initial benchmarks with a modified ORC reader, DWRF,
> >           written by the Presto team, show that such gains are possible
> >           with a different reader implementation
> >         - goals
> >           - filter pushdown
> >             - skipping reads based on filter evaluation on one or more
> >               columns
> >             - this can happen at several granularities : row group,
> >               page, record/value
> >           - late/lazy materialization
> >             - for columns not involved in a filter, avoid materializing
> >               them entirely until they are known to be needed after
> >               evaluating a filter on other columns
> >       - Drill
> >         - the Drill engine uses an in-memory vectorized representation
> >           of records
> >         - for scalar and repeated types we have implemented a fast
> >           vectorized reader that is optimized to transform between
> >           Parquet's on disk and our in-memory format
> >         - this is currently producing performant table scans, but has no
> >           facility for filter push down
> >         - Major goals going forward
> >           - filter pushdown
> >             - decide the best implementation for incorporating filter
> >               pushdown into our current implementation, or figure out a
> >               way to leverage existing work in the parquet-mr library to
> >               accomplish this goal
> >           - late/lazy materialization
> >             - see above
> >           - contribute existing code back to parquet
> >             - the Drill parquet reader has a very strong emphasis on
> >               performance; a clear interface to consume it, sufficiently
> >               separated from Drill, could prove very useful for other
> >               projects
> >     - First steps
> >       - Netflix team will share some of their thoughts and research from
> >         working with the DWRF code
> >         - we can have a discussion based off of this, which aspects are
> >           done well, and any opportunities they may have missed that we
> >           can incorporate into our design
> >       - do further investigation and ask the existing community for
> >         guidance on existing parquet-mr features or planned APIs that
> >         may provide desired functionality
> >       - We will begin a discussion of an API for the new functionality
> >       - some outstanding thoughts for down the road
> >         - The Drill team has an interest in very late materialization
> >           for data stored in dictionary encoded pages, such as running a
> >           join or filter on the dictionary and then going back to the
> >           reader to grab all of the values in the data that match the
> >           needed members of the dictionary
> >         - this is a later consideration, but it is part of the reason we
> >           are opening up the design discussion early, so that the API
> >           can be flexible enough to allow this in the future, even if it
> >           is not implemented soon
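
To make the reader-side ideas in this thread a bit more concrete (skipping row groups from min/max statistics, a nextBatch-style call, and a column that is only materialized when it is accessed), here is a minimal Java sketch. Everything in it is hypothetical and for illustration only: LazyBatchSketch, RowGroup, Batch, BatchReader, LazyVector and the sample data are made-up names, not the Presto, ORC, or parquet-mr APIs, and a real reader would work on encoded pages rather than in-memory arrays.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class LazyBatchSketch {

    /** Hypothetical vector whose values are decoded only on first access. */
    static final class LazyVector<T> {
        private final Supplier<T> loader;
        private T values;
        private boolean loaded;

        LazyVector(Supplier<T> loader) {
            this.loader = loader;
        }

        T get() {
            if (!loaded) {
                values = loader.get();
                loaded = true;
            }
            return values;
        }
    }

    /** Hypothetical row group: filter column, its min/max statistics, and a deferred payload column. */
    static final class RowGroup {
        final long[] ids;
        final long min;
        final long max;
        final Supplier<String[]> names;

        RowGroup(long[] ids, Supplier<String[]> names) {
            this.ids = ids;
            this.min = Arrays.stream(ids).min().getAsLong();
            this.max = Arrays.stream(ids).max().getAsLong();
            this.names = names;
        }
    }

    /** One batch per surviving row group; the payload column stays lazy. */
    static final class Batch {
        final long[] ids;
        final LazyVector<String[]> names;

        Batch(long[] ids, LazyVector<String[]> names) {
            this.ids = ids;
            this.names = names;
        }
    }

    /** Reader for the filter lo <= id <= hi. */
    static final class BatchReader {
        final List<RowGroup> groups;
        final long lo;
        final long hi;
        int next;
        int skippedGroups;

        BatchReader(List<RowGroup> groups, long lo, long hi) {
            this.groups = groups;
            this.lo = lo;
            this.hi = hi;
        }

        /** Returns the next batch, or null at end of input. */
        Batch nextBatch() {
            while (next < groups.size()) {
                RowGroup g = groups.get(next++);
                if (g.max < lo || g.min > hi) {
                    skippedGroups++; // statistics prove that no row can match
                    continue;
                }
                return new Batch(g.ids, new LazyVector<>(g.names));
            }
            return null;
        }
    }

    /** Payload loader that counts how often a decode actually happens. */
    static Supplier<String[]> counting(int[] counter, String... values) {
        return () -> {
            counter[0]++;
            return values;
        };
    }

    /** Returns {matched rows, skipped row groups, payload decodes} for the sample data. */
    static int[] demo() {
        int[] decodes = {0};
        List<RowGroup> groups = Arrays.asList(
            new RowGroup(new long[] {1, 2, 3}, counting(decodes, "a", "b", "c")),
            new RowGroup(new long[] {10, 11, 12}, counting(decodes, "d", "e", "f")),
            new RowGroup(new long[] {5, 25}, counting(decodes, "g", "h")),
            new RowGroup(new long[] {20, 21, 22}, counting(decodes, "i", "j", "k")));
        BatchReader reader = new BatchReader(groups, 10, 20);
        int matched = 0;
        for (Batch b = reader.nextBatch(); b != null; b = reader.nextBatch()) {
            for (int i = 0; i < b.ids.length; i++) {
                if (b.ids[i] >= reader.lo && b.ids[i] <= reader.hi) {
                    b.names.get(); // first access triggers the decode
                    matched++;
                }
            }
        }
        return new int[] {matched, reader.skippedGroups, decodes[0]};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("matched=" + r[0] + " skipped=" + r[1] + " decoded=" + r[2]);
    }
}
```

For the sample data and the filter 10 <= id <= 20, the first row group is skipped on statistics alone, and the group with ids {5, 25} overlaps the range but has no matching row, so its payload column is never decoded. The run reports 4 matched rows, 1 skipped row group, and 2 payload decodes.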
