+1

Will this support reading a single file in parallel?

On 29-Apr-2016 3:27 pm, "Mohit Jotwani" <[email protected]> wrote:

> +1
>
> Regards,
> Mohit
>
> On Thu, Apr 28, 2016 at 4:29 PM, Yogi Devendra <[email protected]> wrote:
>
> > Hi,
> >
> > My use case involves reading from HDFS and emitting each record as a
> > separate tuple. A record can be either a fixed-length record or a
> > separator-based record (such as newline-delimited). The expected output
> > is a byte[] for each record.
> >
> > I am planning to solve this as follows:
> > - A new operator which extends BlockReader.
> > - It will have a configuration option to select the mode: FIXED_LENGTH
> >   or SEPARATOR_BASED.
> > - It will use the appropriate ReaderContext based on the mode.
> >
> > The reason for having a separate operator rather than reusing
> > BlockReader is that its output port signature differs from
> > BlockReader's. This new operator can be used in conjunction with
> > FileSplitter.
> >
> > Any feedback?
> >
> > ~ Yogi
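
To make the two modes in the proposal concrete, here is a rough sketch of how one block of bytes could be cut into byte[] records. It is only an illustration in plain Java: RecordSplitter, Mode and split are hypothetical names, it does not use BlockReader, ReaderContext or FileSplitter, and it ignores records that straddle block boundaries, which a real operator would have to handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of any existing library:
// splits one in-memory block into records, one byte[] per record.
final class RecordSplitter {
    enum Mode { FIXED_LENGTH, SEPARATOR_BASED }

    static List<byte[]> split(byte[] block, Mode mode, int recordLength, byte separator) {
        List<byte[]> records = new ArrayList<>();
        if (mode == Mode.FIXED_LENGTH) {
            if (recordLength <= 0) {
                throw new IllegalArgumentException("recordLength must be positive");
            }
            // Cut every recordLength bytes; a trailing partial record is
            // emitted as a shorter array (an assumption of this sketch).
            for (int start = 0; start < block.length; start += recordLength) {
                int end = Math.min(start + recordLength, block.length);
                records.add(Arrays.copyOfRange(block, start, end));
            }
        } else {
            // Cut at each separator byte; the separator itself is dropped.
            int start = 0;
            for (int i = 0; i < block.length; i++) {
                if (block[i] == separator) {
                    records.add(Arrays.copyOfRange(block, start, i));
                    start = i + 1;
                }
            }
            // Emit whatever follows the last separator, if anything.
            if (start < block.length) {
                records.add(Arrays.copyOfRange(block, start, block.length));
            }
        }
        return records;
    }
}
```

Each returned byte[] corresponds to what would be emitted as one tuple. In the separator-based mode the recordLength argument is unused, and in the fixed-length mode the separator argument is unused.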
