Hi all,
To create a RecordReader in the new API, we need a TaskAttemptContext object,
which suggests to me that a RecordReader is only meant to be created for a
split that has already been assigned a task ID. However, I want to do
centralized sampling and create record readers on some splits before the job
is submitted. What I am doing now is creating a dummy TaskAttemptContext and
using it to create the record reader, but I am not sure whether this has any
side effects. Is there a better way to do this? And why does the new API
discourage creating record readers centrally?
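
For reference, here is a minimal sketch of the dummy-context workaround I
mean. It assumes Hadoop 2.x or later, where TaskAttemptContext is an interface
implemented by TaskAttemptContextImpl; on 0.20.x the equivalent would be
new TaskAttemptContext(conf, new TaskAttemptID()). The class and method names
(ClientSideReader, open) are made up for illustration only.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;

// Hypothetical helper: opens a RecordReader on the client, outside any task.
public class ClientSideReader {

    public static <K, V> RecordReader<K, V> open(
            InputFormat<K, V> format, InputSplit split, Configuration conf)
            throws IOException, InterruptedException {
        // Placeholder context: a default TaskAttemptID that does not
        // correspond to any real task attempt (assumption: the reader only
        // needs the Configuration carried by the context).
        TaskAttemptContext context =
                new TaskAttemptContextImpl(conf, new TaskAttemptID());

        // In the new API the reader is created and then initialized with the
        // same split and context.
        RecordReader<K, V> reader = format.createRecordReader(split, context);
        reader.initialize(split, context);
        return reader;
    }
}
```

The caller then loops over nextKeyValue() / getCurrentKey() /
getCurrentValue() and calls close() when done. As far as I can tell,
org.apache.hadoop.mapreduce.lib.partition.InputSampler does much the same
thing when it samples splits on the client.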

Thanks,
-Gang



