+1

Does Parquet support partitioned reads from a single file? If yes, then maybe
we can also add support in FileSplitterInput and BlockReader to read a
single file in parallel.
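Parquet files are laid out as a sequence of row groups, each of which can be decoded independently, and the file footer records where each row group starts. That layout is what would make a parallel read of one file possible. Below is a minimal sketch of one way FileSplitterInput/BlockReader could divide the work. It assumes each block reader handles exactly the row groups whose first byte falls inside its block. `RowGroupSplitter` and `assign` are hypothetical names, and parquet-mr's own split planning may use a different rule.

```java
import java.util.ArrayList;
import java.util.List;

final class RowGroupSplitter {
  // Hypothetical helper: maps each row group (identified by its starting byte
  // offset, as read from the file footer) to the fixed-size block containing
  // that offset. Every row group is read by exactly one block reader, even
  // when its bytes spill over into the next block.
  static List<List<Integer>> assign(long[] rowGroupStarts, long blockSize, long fileLength) {
    if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be positive");
    int numBlocks = (int) ((fileLength + blockSize - 1) / blockSize);
    List<List<Integer>> perBlock = new ArrayList<>();
    for (int b = 0; b < numBlocks; b++) {
      perBlock.add(new ArrayList<>());
    }
    for (int i = 0; i < rowGroupStarts.length; i++) {
      long start = rowGroupStarts[i];
      if (start < 0 || start >= fileLength) {
        throw new IllegalArgumentException("row group start outside file: " + start);
      }
      // Index i of the row group goes to the block that holds its first byte.
      perBlock.get((int) (start / blockSize)).add(i);
    }
    return perBlock;
  }
}
```

For example, row groups starting at offsets 4, 90, 150 and 260 in a 300-byte file with 100-byte blocks would be split as block 0 → row groups 0 and 1, block 1 → row group 2, block 2 → row group 3.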

- Tushar.



On Mon, Mar 14, 2016 at 11:23 AM, Devendra Tagare <[email protected]> wrote:

> +1
>
> ~Dev
>
> On Mon, Mar 14, 2016 at 11:12 AM, Shubham Pathak <[email protected]> wrote:
>
> > Hello Community,
> >
> > I am working on developing a ParquetReaderOperator that will allow Apex
> > users to read Parquet files.
> >
> > Apache Parquet is a columnar storage format available to any project in the
> > Hadoop ecosystem, regardless of the choice of data processing framework,
> > data model, or programming language.
> > For more information, see Apache Parquet
> > <https://parquet.apache.org/documentation/latest/>
> >
> > Proposed design:
> >
> >    1. Develop an AbstractParquetFileReaderOperator that extends
> >    AbstractFileInputOperator.
> >    2. Override the openFile() method to instantiate a ParquetReader (the
> >    reader provided by the parquet-mr <https://github.com/Parquet/parquet-mr>
> >    project, which reads Parquet records from a file) with GroupReadSupport
> >    (records are read as Group objects).
> >    3. Override the readEntity() method to read records and call
> >    convertGroup(). Derived classes override convertGroup() to convert a
> >    Group into whatever form the downstream operators require.
> >    4. Provide a concrete implementation, the ParquetFilePOJOReader operator,
> >    which extends AbstractParquetFileReaderOperator and overrides
> >    convertGroup() to convert a given Group into a POJO.
> >
> > The Parquet schema and directory path would be inputs to the base operator.
> > For ParquetFilePOJOReader, the POJO class would also be required.
> >
> > Please feel free to let me know your thoughts on this.
> >
> > Thanks,
> > Shubham
> >
>
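The four-step design in the quoted proposal is essentially a template method. The base class owns reading records from the file, and subclasses supply convertGroup(). The sketch below is a minimal illustration built on hypothetical stand-ins so that it compiles without Apex or parquet-mr. A `Map<String, Object>` plays the role of parquet-mr's `Group`, and a list iterator plays the role of `ParquetReader`. `AbstractParquetFileReaderSketch`, `PojoReaderSketch` and `Event` are illustrative names only, and the real AbstractFileInputOperator hook signatures are not reproduced. The reflection-based field mapping is one possible POJO conversion, not necessarily what ParquetFilePOJOReader would do.

```java
import java.lang.reflect.Field;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical base class mirroring AbstractParquetFileReaderOperator.
// A Map<String, Object> stands in for parquet-mr's Group.
abstract class AbstractParquetFileReaderSketch<T> {
  private Iterator<Map<String, Object>> reader; // stand-in for ParquetReader<Group>

  // Mirrors openFile(): the real operator would build a ParquetReader with
  // GroupReadSupport here; this sketch just iterates over in-memory records.
  protected void openFile(List<Map<String, Object>> records) {
    reader = records.iterator();
  }

  // Mirrors readEntity(): reads one record and hands it to convertGroup().
  // Returns null once the "file" is exhausted.
  protected T readEntity() {
    if (reader == null || !reader.hasNext()) {
      return null;
    }
    return convertGroup(reader.next());
  }

  // Derived classes decide what downstream operators receive.
  protected abstract T convertGroup(Map<String, Object> group);
}

// Hypothetical concrete reader mirroring ParquetFilePOJOReader: copies each
// column value into the POJO field with the same name via reflection.
class PojoReaderSketch<T> extends AbstractParquetFileReaderSketch<T> {
  private final Class<T> pojoClass;

  PojoReaderSketch(Class<T> pojoClass) {
    this.pojoClass = pojoClass;
  }

  @Override
  protected T convertGroup(Map<String, Object> group) {
    try {
      T pojo = pojoClass.getDeclaredConstructor().newInstance();
      for (Map.Entry<String, Object> column : group.entrySet()) {
        Field field = pojoClass.getDeclaredField(column.getKey());
        field.setAccessible(true);
        field.set(pojo, column.getValue()); // unboxes for primitive fields
      }
      return pojo;
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException("cannot map record to " + pojoClass.getName(), e);
    }
  }
}

// Hypothetical POJO standing in for the user-supplied POJO class.
class Event {
  int id;
  String name;
}
```

Usage would be `new PojoReaderSketch<>(Event.class)`, then `openFile(...)`, then repeated `readEntity()` calls until it returns null. Keeping the file iteration in the base class means a different output format only needs a new convertGroup().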
