Can parallel reads not be achieved by partitioning?

Ram

On Tue, Feb 16, 2016 at 1:01 AM, Priyanka Gugale <[email protected]> wrote:

> Hi,
>
> It is a common use case to read big files on HDFS in parallel, i.e.
> many reader threads are used to read the file in parallel. We can achieve
> this on top of Apex using the following Malhar operators:
>
> 1. AbstractFileSplitter
> 2. AbstractBlockReader
>
> where the FileSplitter, as per the file metadata, creates small reader tasks
> (to read the file in parts). Those reader tasks are run by BlockReaders in
> parallel to read the file.
>
> As these operators are generally used together to achieve a file read
> operation, I propose we create a module, called HDFSFileReader, for this.
>
> Please provide your suggestions on the same.
>
> -Priyanka
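
For readers unfamiliar with the split-then-read pattern described in the quoted message, here is a minimal sketch in plain Java. It is not the Malhar code: the class BlockReadSketch and its methods split, readBlock and readInParallel are hypothetical names invented for illustration, and it reads a local temporary file with a thread pool rather than HDFS blocks with Apex operators. It only shows the general idea of deriving (offset, length) read tasks from file metadata and running them concurrently.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; these are NOT the Malhar operators.
final class BlockReadSketch {

    // "Splitter" step: turn a file length into {offset, length} read tasks.
    static List<long[]> split(long fileLength, long blockSize) {
        List<long[]> blocks = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            blocks.add(new long[] {offset, Math.min(blockSize, fileLength - offset)});
        }
        return blocks;
    }

    // "Block reader" step: read one block through its own file handle.
    static byte[] readBlock(Path file, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] buf = new byte[length];
            raf.seek(offset);
            raf.readFully(buf);
            return buf;
        }
    }

    // Run the read tasks on a thread pool and reassemble them in block order.
    // Assumes the file fits in a single byte array (a simplification).
    static byte[] readInParallel(Path file, long blockSize, int readers) throws Exception {
        long fileLength = Files.size(file);
        ExecutorService pool = Executors.newFixedThreadPool(readers);
        try {
            List<Future<byte[]>> pending = new ArrayList<>();
            for (long[] block : split(fileLength, blockSize)) {
                Callable<byte[]> task = () -> readBlock(file, block[0], (int) block[1]);
                pending.add(pool.submit(task));
            }
            byte[] result = new byte[(int) fileLength];
            int pos = 0;
            for (Future<byte[]> f : pending) {
                byte[] part = f.get();
                System.arraycopy(part, 0, result, pos, part.length);
                pos += part.length;
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("block-read-sketch", ".txt");
        try {
            Files.write(file, "0123456789abcdefghij".getBytes(StandardCharsets.UTF_8));
            // 20 bytes in blocks of 6 gives 4 read tasks, run by 3 reader threads.
            byte[] data = readInParallel(file, 6, 3);
            System.out.println(new String(data, StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

In this sketch the block size and the number of reader threads are independent knobs, which is roughly the separation the quoted proposal draws between the splitter and the block readers.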
