Can parallel reads not be achieved by partitioning?

Ram

On Tue, Feb 16, 2016 at 1:01 AM, Priyanka Gugale <[email protected]>
wrote:

> Hi,
>
> It is a common use case to read big files on HDFS in parallel, i.e.
> many reader threads are used to read the file concurrently. We can achieve
> this on top of Apex using the following Malhar operators:
>
> 1. AbstractFileSplitter
> 2. AbstractBlockReader
>
> where FileSplitter, based on the file metadata, creates small reader tasks
> (each covering one part of the file). Those reader tasks are run by
> BlockReaders in parallel to read the file.
>
> As these operators are generally used together to read files, I propose
> we create a module called HDFSFileReader for this.
>
> Please share your suggestions on this.
>
> -Priyanka
>
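
For readers following the thread, the split-then-read pattern described in the quoted message can be sketched roughly as below. This is only an illustration using plain JDK classes on a local file; it is not the Malhar AbstractFileSplitter/AbstractBlockReader API and does not touch HDFS. All names in it (ParallelBlockReadSketch, Block, split, readBlock, readInParallel) are hypothetical, and it assumes the file is smaller than 2 GB so that it fits in one byte array.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: not the Malhar operators, just the same idea with JDK classes.
final class ParallelBlockReadSketch {

    /** A reader task: one contiguous byte range of the file. */
    static final class Block {
        final long offset;
        final int length;
        Block(long offset, int length) { this.offset = offset; this.length = length; }
    }

    /** "Splitter" step: derive block ranges from file metadata (here, only its size). */
    static List<Block> split(long fileSize, int blockSize) {
        List<Block> blocks = new ArrayList<>();
        for (long off = 0; off < fileSize; off += blockSize) {
            blocks.add(new Block(off, (int) Math.min(blockSize, fileSize - off)));
        }
        return blocks;
    }

    /** "Block reader" step: positional read of one block; safe to call from many threads. */
    static byte[] readBlock(FileChannel ch, Block b) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(b.length);
        long pos = b.offset;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos);
            if (n < 0) break;           // unexpected end of file
            pos += n;
        }
        return buf.array();
    }

    /** Runs one task per block on a fixed pool of readers and reassembles the bytes in order. */
    static byte[] readInParallel(Path file, int blockSize, int readers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(readers);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();      // assumption: size fits in an int
            List<Future<byte[]>> parts = new ArrayList<>();
            for (Block b : split(size, blockSize)) {
                parts.add(pool.submit(() -> readBlock(ch, b)));
            }
            byte[] out = new byte[(int) size];
            int pos = 0;
            for (Future<byte[]> f : parts) {
                byte[] p = f.get();     // waits for that block's reader to finish
                System.arraycopy(p, 0, out, pos, p.length);
                pos += p.length;
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("blocks", ".bin");
        Files.write(tmp, new byte[1000]);
        System.out.println(readInParallel(tmp, 64, 4).length + " bytes read");
        Files.delete(tmp);
    }
}
```

The point of the split step is that each block is an independent unit of work, so the number of readers can be changed without changing how the file is divided.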
