Shubham,

I feel that, instead of being a separate operator, this should be a plugin
to the input operator.

That way, anyone who has another input operator for a particular file
system (extending AbstractFileInputOperator) would be able to read
Parquet files from that file system using this plugin.
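
Roughly what I have in mind, as a minimal sketch only. Every name below
(RecordReaderPlugin, PluggableFileReader, LineReaderPlugin) is made up for
illustration and is not an existing Malhar or parquet-mr API. The toy plugin
reads text lines so that the example runs without any dependencies. A real
Parquet plugin would implement the same kind of contract on top of parquet-mr,
and would probably need a seekable input rather than a plain InputStream,
since Parquet keeps its metadata in the file footer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical plugin contract: the input operator owns file system access,
// the plugin owns the file format.
interface RecordReaderPlugin<T> {
    void open(InputStream in) throws IOException;
    T readRecord() throws IOException; // returns null at end of input
    void close() throws IOException;
}

// Stand-in for a file-system-specific input operator. A real one would
// extend AbstractFileInputOperator and open the stream itself.
class PluggableFileReader<T> {
    private final RecordReaderPlugin<T> plugin;

    PluggableFileReader(RecordReaderPlugin<T> plugin) {
        this.plugin = plugin;
    }

    // Drains the stream through the plugin and collects every record.
    List<T> readAll(InputStream in) throws IOException {
        List<T> records = new ArrayList<>();
        plugin.open(in);
        try {
            for (T rec = plugin.readRecord(); rec != null; rec = plugin.readRecord()) {
                records.add(rec);
            }
        } finally {
            plugin.close();
        }
        return records;
    }
}

// Toy plugin: one text line per record. A Parquet plugin would plug in
// at the same place.
class LineReaderPlugin implements RecordReaderPlugin<String> {
    private BufferedReader reader;

    @Override
    public void open(InputStream in) {
        reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    @Override
    public String readRecord() throws IOException {
        return reader.readLine();
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
```

The point of the split is that the operator for any given file system stays
unchanged, and only the plugin passed to it decides how records are decoded.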

~ Yogi

On 14 March 2016 at 11:31, Tushar Gosavi <[email protected]> wrote:

> +1
>
> Does Parquet support partitioned reads from a single file? If yes, then
> maybe we can also add support in FileSplitterInput and BlockReader to read
> a single file in parallel.
>
> - Tushar.
>
>
>
> On Mon, Mar 14, 2016 at 11:23 AM, Devendra Tagare <[email protected]> wrote:
>
> > + 1
> >
> > ~Dev
> >
> > On Mon, Mar 14, 2016 at 11:12 AM, Shubham Pathak <[email protected]> wrote:
> >
> > > Hello Community,
> > >
> > > I am developing a ParquetReaderOperator that will allow Apex
> > > users to read Parquet files.
> > >
> > > Apache Parquet is a columnar storage format available to any project in
> > > the Hadoop ecosystem, regardless of the choice of data processing
> > > framework, data model or programming language.
> > > For more information: Apache Parquet
> > > <https://parquet.apache.org/documentation/latest/>
> > >
> > > Proposed design:
> > >
> > >    1. Develop AbstractParquetFileReaderOperator, which extends
> > >    AbstractFileInputOperator.
> > >    2. Override the openFile() method to instantiate a ParquetReader (the
> > >    reader provided by the parquet-mr project
> > >    <https://github.com/Parquet/parquet-mr>, which reads Parquet records
> > >    from a file) with GroupReadSupport (records would be read as Group).
> > >    3. Override the readEntity() method to read the records and call the
> > >    convertGroup() method. Derived classes override convertGroup() to
> > >    convert a Group to whatever form downstream operators require.
> > >    4. Provide a concrete implementation, the ParquetFilePOJOReader
> > >    operator, which extends AbstractParquetFileReaderOperator and
> > >    overrides convertGroup() to convert a given Group to a POJO.
> > >
> > > The Parquet schema and directory path would be inputs to the base
> > > operator. For ParquetFilePOJOReader, the POJO class would also be
> > > required.
> > >
> > > Please feel free to let me know your thoughts on this.
> > >
> > > Thanks,
> > > Shubham
> > >
> >
>
