Shubham, I feel that instead of being a separate operator, this should be a plugin to the input operator.
That way, anyone who has another input operator for a particular file system (extending AbstractFileInputOperator) would be able to read Parquet files from that file system using this plugin.

~ Yogi

On 14 March 2016 at 11:31, Tushar Gosavi <[email protected]> wrote:

> +1
>
> Does Parquet support partitioned reads from a single file? If yes, then maybe
> we can also add support in FileSplitterInput and BlockReader to read a
> single file in parallel.
>
> - Tushar.
>
> On Mon, Mar 14, 2016 at 11:23 AM, Devendra Tagare <[email protected]> wrote:
>
> > +1
> >
> > ~Dev
> >
> > On Mon, Mar 14, 2016 at 11:12 AM, Shubham Pathak <[email protected]> wrote:
> >
> > > Hello Community,
> > >
> > > I am working on developing a ParquetReaderOperator which will allow Apex
> > > users to read Parquet files.
> > >
> > > Apache Parquet is a columnar storage format available to any project in the
> > > Hadoop ecosystem, regardless of the choice of data processing framework,
> > > data model or programming language.
> > > For more information: Apache Parquet
> > > <https://parquet.apache.org/documentation/latest/>
> > >
> > > Proposed design:
> > >
> > > 1. Develop AbstractParquetFileReaderOperator that extends
> > >    AbstractFileInputOperator.
> > > 2. Override the openFile() method to instantiate a ParquetReader (the reader
> > >    provided by the parquet-mr <https://github.com/Parquet/parquet-mr> project
> > >    that reads Parquet records from a file) with GroupReadSupport (records
> > >    would be read as Group).
> > > 3. Override the readEntity() method to read the records and call the
> > >    convertGroup() method. Derived classes override convertGroup()
> > >    to convert a Group to any form required by downstream operators.
> > > 4. Provide a concrete implementation, the ParquetFilePOJOReader operator,
> > >    that extends AbstractParquetFileReaderOperator and
> > >    overrides convertGroup() to convert a given Group to a POJO.
> > >
> > > Parquet schema and directory path would be inputs to the base operator. For
> > > ParquetFilePOJOReader, the POJO class would also be required.
> > >
> > > Please feel free to let me know your thoughts on this.
> > >
> > > Thanks,
> > > Shubham
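
For illustration, the four-step design quoted above might look roughly like the sketch below. It is only a sketch under stated assumptions: Group, AbstractGroupReader, User and UserPojoReader are hypothetical stand-ins written for this example, not the real Apex or parquet-mr classes, and the "file" is just an in-memory list. The actual operator would build a ParquetReader with GroupReadSupport in openFile() and would convert to an arbitrary configured POJO class rather than a hard-coded one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-ins only: none of these types are the real Apex or parquet-mr classes.
final class ParquetReaderSketch {

    // Hypothetical stand-in for a Parquet Group: one record as field name -> value.
    static final class Group {
        private final Map<String, Object> fields = new LinkedHashMap<>();

        Group with(String name, Object value) {
            fields.put(name, value);
            return this;
        }

        Object get(String name) {
            return fields.get(name);
        }
    }

    // Hypothetical stand-in for the proposed abstract base operator (steps 1 to 3).
    abstract static class AbstractGroupReader<T> {
        private Iterator<Group> records;

        // Step 2: the real operator would instantiate a ParquetReader here;
        // this sketch simply iterates over an in-memory list of records.
        void openFile(List<Group> fileContents) {
            records = fileContents.iterator();
        }

        // Step 3: read the next record and pass it to convertGroup().
        // Returns null once the "file" is exhausted.
        T readEntity() {
            if (records == null || !records.hasNext()) {
                return null;
            }
            return convertGroup(records.next());
        }

        // Derived classes decide what downstream operators receive.
        protected abstract T convertGroup(Group group);
    }

    // Example POJO, assumed purely for this sketch.
    static final class User {
        final String name;
        final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Step 4: a concrete reader that converts each Group to a POJO.
    static final class UserPojoReader extends AbstractGroupReader<User> {
        @Override
        protected User convertGroup(Group group) {
            return new User((String) group.get("name"), (Integer) group.get("age"));
        }
    }

    // Drains one "file" through the reader and collects the converted records.
    static List<User> readAll(List<Group> fileContents) {
        UserPojoReader reader = new UserPojoReader();
        reader.openFile(fileContents);
        List<User> out = new ArrayList<>();
        for (User u = reader.readEntity(); u != null; u = reader.readEntity()) {
            out.add(u);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Group> file = Arrays.asList(
            new Group().with("name", "asha").with("age", 31),
            new Group().with("name", "ravi").with("age", 27));
        for (User u : readAll(file)) {
            System.out.println(u.name + " " + u.age);
        }
    }
}
```

In the plugin variant suggested at the top of this mail, convertGroup() and the record-reading logic would presumably live in a separate object that any AbstractFileInputOperator subclass could delegate to, instead of in a dedicated base class.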
