+1 for Parquet reader. ~ Yogi
On 14 March 2016 at 11:41, Yogi Devendra <[email protected]> wrote:

> Shubham,
>
> I feel that instead of being a separate operator, this should be a plugin
> to the input operator.
>
> That way, if someone has another input operator for a particular file
> system (extending AbstractFileInputOperator), they should be able to read
> Parquet files from that file system using this plugin.
>
> ~ Yogi
>
> On 14 March 2016 at 11:31, Tushar Gosavi <[email protected]> wrote:
>
>> +1
>>
>> Does Parquet support partitioned reads from a single file? If yes, then
>> maybe we can also add support in FileSplitterInput and BlockReader to
>> read a single file in parallel.
>>
>> - Tushar.
>>
>> On Mon, Mar 14, 2016 at 11:23 AM, Devendra Tagare <[email protected]> wrote:
>>
>> > +1
>> >
>> > ~Dev
>> >
>> > On Mon, Mar 14, 2016 at 11:12 AM, Shubham Pathak <[email protected]> wrote:
>> >
>> > > Hello Community,
>> > >
>> > > I am working on developing a ParquetReaderOperator which will allow
>> > > Apex users to read Parquet files.
>> > >
>> > > Apache Parquet is a columnar storage format available to any project
>> > > in the Hadoop ecosystem, regardless of the choice of data processing
>> > > framework, data model or programming language.
>> > > For more information: Apache Parquet
>> > > <https://parquet.apache.org/documentation/latest/>
>> > >
>> > > Proposed design:
>> > >
>> > > 1. Develop AbstractParquetFileReaderOperator that extends
>> > >    AbstractFileInputOperator.
>> > > 2. Override the openFile() method to instantiate a ParquetReader
>> > >    (the reader provided by the parquet-mr
>> > >    <https://github.com/Parquet/parquet-mr> project that reads
>> > >    Parquet records from a file) with GroupReadSupport (records
>> > >    would be read as Group).
>> > > 3. Override the readEntity() method to read the records and call the
>> > >    convertGroup() method. Derived classes override convertGroup()
>> > >    to convert a Group to any form required by downstream operators.
>> > > 4. Provide a concrete implementation, the ParquetFilePOJOReader
>> > >    operator, that extends AbstractParquetFileReaderOperator and
>> > >    overrides convertGroup() to convert a given Group to a POJO.
>> > >
>> > > The Parquet schema and directory path would be inputs to the base
>> > > operator. For ParquetFilePOJOReader, the POJO class would also be
>> > > required.
>> > >
>> > > Please feel free to let me know your thoughts on this.
>> > >
>> > > Thanks,
>> > > Shubham
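
To make the four steps of the proposed design easier to discuss, here is a rough, self-contained sketch of the class layout. It is only an illustration under stated assumptions: it does not use the real Apex or parquet-mr classes. Group is modelled as a plain Map, the "file" is an in-memory list, and the names ParquetReaderSketch, AbstractParquetFileReader, ParquetFilePojoReader, User, readAll and group are hypothetical stand-ins. Only the method names openFile(), readEntity() and convertGroup() come from the proposal itself.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the proposed class layout. Every type here is a hypothetical
// stand-in: Group is modelled as a Map and the "file" is an in-memory list,
// so no Apex or parquet-mr dependency is needed to run it.
final class ParquetReaderSketch {

  // Stand-in for the abstract operator (steps 1 to 3 of the proposal).
  abstract static class AbstractParquetFileReader<T> {
    private Iterator<Map<String, Object>> reader;

    // Step 2: the real operator would build a ParquetReader with
    // GroupReadSupport here; this sketch just opens an iterator.
    void openFile(List<Map<String, Object>> file) {
      reader = file.iterator();
    }

    // Step 3: read one record and convert it; null signals end of file.
    T readEntity() {
      if (reader == null || !reader.hasNext()) {
        return null;
      }
      return convertGroup(reader.next());
    }

    // Derived classes decide what downstream operators receive.
    protected abstract T convertGroup(Map<String, Object> group);
  }

  // Hypothetical POJO used only for this example.
  static final class User {
    final String name;
    final int age;

    User(String name, int age) {
      this.name = name;
      this.age = age;
    }
  }

  // Step 4: concrete reader that maps a Group to a POJO.
  static final class ParquetFilePojoReader extends AbstractParquetFileReader<User> {
    @Override
    protected User convertGroup(Map<String, Object> group) {
      return new User((String) group.get("name"), (Integer) group.get("age"));
    }
  }

  // Drains one "file" through the concrete reader.
  static List<User> readAll(List<Map<String, Object>> file) {
    ParquetFilePojoReader op = new ParquetFilePojoReader();
    op.openFile(file);
    List<User> out = new ArrayList<>();
    for (User u = op.readEntity(); u != null; u = op.readEntity()) {
      out.add(u);
    }
    return out;
  }

  // Helper that builds one stand-in Group record.
  static Map<String, Object> group(String name, int age) {
    Map<String, Object> g = new LinkedHashMap<>();
    g.put("name", name);
    g.put("age", age);
    return g;
  }

  public static void main(String[] args) {
    List<Map<String, Object>> file = new ArrayList<>();
    file.add(group("alice", 30));
    file.add(group("bob", 25));
    for (User u : readAll(file)) {
      System.out.println(u.name + " " + u.age);
    }
  }
}
```

Keeping convertGroup() as the only abstract hook means the file handling stays in the base class, which is also roughly the seam where the plugin idea suggested in the thread could attach instead of inheritance.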
