Hi,

Reading large files on HDFS in parallel, i.e. using many reader threads to read different parts of the same file at once, is a common use case. We can achieve this on top of Apex using the following Malhar operators:

1. AbstractFileSplitter
2. AbstractBlockReader

The FileSplitter uses the file metadata to create small reader tasks, each covering one part of the file. The BlockReaders then run those tasks in parallel to read the file.

Since these operators are generally used together to read files, I propose we create a module called HDFSFileReader that combines them. Please share your suggestions.

-Priyanka
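
P.S. To make the split-then-read idea concrete, here is a minimal sketch in plain JDK Java. It is not the Malhar API and not the proposed module: it reads a local file rather than HDFS, and the names ParallelBlockReadSketch, Block, split, readBlock and readInParallel are all hypothetical, chosen only for illustration. The split step plays the role of the FileSplitter (it derives byte ranges from the file length), and the thread pool plays the role of the parallel BlockReaders.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustration only: plain JDK, local file, no Apex or Malhar classes involved.
class ParallelBlockReadSketch {

    /** One reader task: a byte range of the file (hypothetical type). */
    static final class Block {
        final long offset;
        final int length;

        Block(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** "Splitter" step: derive block ranges from the file length alone. */
    static List<Block> split(long fileLength, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Block> blocks = new ArrayList<>();
        for (long off = 0; off < fileLength; off += blockSize) {
            // The last block may be shorter than blockSize.
            blocks.add(new Block(off, (int) Math.min(blockSize, fileLength - off)));
        }
        return blocks;
    }

    /** "Reader" step: read exactly one block through its own file handle. */
    static byte[] readBlock(String path, Block block) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            byte[] buffer = new byte[block.length];
            file.seek(block.offset);
            file.readFully(buffer);
            return buffer;
        }
    }

    /** Runs all reader tasks on a thread pool and returns the blocks in file order. */
    static List<byte[]> readInParallel(String path, int blockSize, int readers) throws Exception {
        long fileLength;
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            fileLength = file.length();
        }
        ExecutorService pool = Executors.newFixedThreadPool(readers);
        try {
            List<Future<byte[]>> pending = new ArrayList<>();
            for (Block block : split(fileLength, blockSize)) {
                pending.add(pool.submit(() -> readBlock(path, block)));
            }
            List<byte[]> result = new ArrayList<>();
            for (Future<byte[]> future : pending) {
                result.add(future.get()); // waits for each block, preserving order
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling readInParallel(path, blockSize, readers) returns one byte array per block, in file order. In the sketch each task opens its own file handle so the readers do not share a seek position; how the actual operators manage streams may well differ.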
