Let me rephrase Ram's question to make it clear: for an application developer using Malhar, what are the advantages and disadvantages of using the proposed HDFS file input module compared to directly using the FileSplitter and BlockReader operators available in Malhar?
~ Yogi

On 16 February 2016 at 21:56, Munagala Ramanath <[email protected]> wrote:

> Can parallel reads not be achieved by partitioning?
>
> Ram
>
> On Tue, Feb 16, 2016 at 1:01 AM, Priyanka Gugale <[email protected]> wrote:
>
> > Hi,
> >
> > It is a common use case to read big files on HDFS in parallel, i.e.
> > many reader threads are used to read the file at the same time. We can
> > achieve this on top of Apex using the following Malhar operators:
> >
> > 1. AbstractFileSplitter
> > 2. AbstractBlockReader
> >
> > where FileSplitter, based on file metadata, creates small reader tasks
> > (to read the file in parts). Those reader tasks are run by BlockReaders
> > in parallel to read the file.
> >
> > As these operators are generally used together to perform a file read,
> > I propose we create a module called HDFSFileReader for this.
> >
> > Please provide your suggestions on this.
> >
> > -Priyanka
