Hi all, to create a RecordReader in the new API, we need a TaskAttemptContext object, which suggests to me that a RecordReader is only meant to be created for a split that has already been assigned a task ID. However, I want to do centralized sampling, which means creating record readers on some splits before the job is submitted. What I am doing now is creating a dummy TaskAttemptContext and using it to create the record reader, but I am not sure whether this has side effects. Is there a better way to do this? And why does the new API discourage creating record readers centrally?
Thanks, -Gang