+1 for Parquet reader.

~ Yogi

On 14 March 2016 at 11:41, Yogi Devendra <[email protected]> wrote:

> Shubham,
>
> I feel that, instead of being a separate operator, it should be a plugin to
> the input operator.
>
> That way, if someone has another input operator for a particular file
> system (extending AbstractFileInputOperator), they would be able to read
> Parquet files from that file system using this plugin.
>
> ~ Yogi
>
> On 14 March 2016 at 11:31, Tushar Gosavi <[email protected]> wrote:
>
>> +1
>>
>> Does Parquet support partitioned reads from a single file? If yes, then
>> maybe we can also add support in FileSplitterInput and BlockReader to read
>> a single file in parallel.
>>
>> - Tushar.
>>
>>
>>
>> On Mon, Mar 14, 2016 at 11:23 AM, Devendra Tagare <[email protected]> wrote:
>>
>> > + 1
>> >
>> > ~Dev
>> >
>> > On Mon, Mar 14, 2016 at 11:12 AM, Shubham Pathak <[email protected]> wrote:
>> >
>> > > Hello Community,
>> > >
>> > > I am working on developing a ParquetReaderOperator which will allow Apex
>> > > users to read Parquet files.
>> > >
>> > > Apache Parquet is a columnar storage format available to any project in the
>> > > Hadoop ecosystem, regardless of the choice of data processing framework,
>> > > data model or programming language.
>> > > For more information: Apache Parquet
>> > > <https://parquet.apache.org/documentation/latest/>
>> > >
>> > > Proposed design:
>> > >
>> > >    1. Develop AbstractParquetFileReaderOperator that extends
>> > >    AbstractFileInputOperator.
>> > >    2. Override the openFile() method to instantiate a ParquetReader (the
>> > >    reader provided by the parquet-mr
>> > >    <https://github.com/Parquet/parquet-mr> project that reads Parquet
>> > >    records from a file) with GroupReadSupport (records would be read as
>> > >    Group).
>> > >    3. Override the readEntity() method to read the records and call the
>> > >    convertGroup() method. Derived classes override convertGroup() to
>> > >    convert a Group to any form required by downstream operators.
>> > >    4. Provide a concrete implementation, the ParquetFilePOJOReader operator,
>> > >    that extends AbstractParquetFileReaderOperator and overrides the
>> > >    convertGroup() method to convert a given Group to a POJO.
>> > >
>> > > The Parquet schema and directory path would be inputs to the base operator.
>> > > For ParquetFilePOJOReader, the POJO class would also be required.
>> > >
>> > > Please feel free to let me know your thoughts on this.
>> > >
>> > > Thanks,
>> > > Shubham
>> > >
>> >
>>
>
>
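
For readers following the thread, the four-step design proposed in the quoted
message can be sketched roughly as below. This is only an illustration of the
template-method structure (the base class opens and reads, subclasses convert),
not the actual operator. It does not use the Apex or parquet-mr APIs: FakeGroup,
AbstractGroupReaderSketch, User, UserPojoReaderSketch and
ParquetReaderDesignSketch are all hypothetical stand-ins, and a "file" is
assumed to be an in-memory list of records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Parquet Group: one record as field name -> value.
final class FakeGroup {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    FakeGroup with(String name, Object value) {
        fields.put(name, value);
        return this;
    }

    Object get(String name) {
        return fields.get(name);
    }
}

// Analogue of steps 1-3: the base class owns opening and reading,
// subclasses only decide how a group is converted.
abstract class AbstractGroupReaderSketch<T> {
    private Iterator<FakeGroup> reader;

    // Step 2 analogue: "open" a file (assumption: a file is an in-memory list).
    void openFile(List<FakeGroup> file) {
        reader = file.iterator();
    }

    // Step 3 analogue: read the next record and convert it.
    // In this sketch, null signals that the file is exhausted.
    T readEntity() {
        if (reader == null || !reader.hasNext()) {
            return null;
        }
        return convertGroup(reader.next());
    }

    protected abstract T convertGroup(FakeGroup group);
}

// Hypothetical POJO that a downstream operator might expect.
final class User {
    final String name;
    final int age;

    User(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// Step 4 analogue: a concrete reader that converts each group to a POJO.
final class UserPojoReaderSketch extends AbstractGroupReaderSketch<User> {
    @Override
    protected User convertGroup(FakeGroup group) {
        return new User((String) group.get("name"), (Integer) group.get("age"));
    }
}

public class ParquetReaderDesignSketch {
    // Drains one "file" through the reader, collecting the converted records.
    static List<User> readAll(List<FakeGroup> file) {
        UserPojoReaderSketch op = new UserPojoReaderSketch();
        op.openFile(file);
        List<User> out = new ArrayList<>();
        for (User u = op.readEntity(); u != null; u = op.readEntity()) {
            out.add(u);
        }
        return out;
    }

    public static void main(String[] args) {
        List<FakeGroup> file = Arrays.asList(
            new FakeGroup().with("name", "a").with("age", 30),
            new FakeGroup().with("name", "b").with("age", 41));
        for (User u : readAll(file)) {
            System.out.println(u.name + " " + u.age);
        }
    }
}
```

Keeping the conversion in a single overridable method is what lets other
output forms (for example, a map or an Avro record) be added without touching
the file-handling code.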
