Hi, My application needs to send some objects to the map tasks; these objects specify how to process the input records. I know I can transfer them as strings via the configuration, but I would prefer to leverage the Hadoop Writable interface, since the objects require recursive serialization.
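For context, here is a minimal sketch of the configuration-string route combined with Writable-style recursive serialization: the object writes itself through DataOutput, the bytes are Base64-encoded into a string one could put under a conf key, and the mapper decodes it back. The Rule class, its fields, and the conf key are hypothetical; a stdlib stand-in is used in place of org.apache.hadoop.io.Writable (same write/readFields signatures) so the sketch is self-contained.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

public class WritableConfDemo {

    // Hypothetical stand-in for a Writable with a recursive structure:
    // a processing rule that may contain child rules.
    static class Rule {
        String name;
        List<Rule> children = new ArrayList<>();

        Rule() {}
        Rule(String name) { this.name = name; }

        // Mirrors Writable.write(DataOutput)
        void write(DataOutput out) throws IOException {
            out.writeUTF(name);
            out.writeInt(children.size());
            for (Rule c : children) c.write(out);   // recursive serialization
        }

        // Mirrors Writable.readFields(DataInput)
        void readFields(DataInput in) throws IOException {
            name = in.readUTF();
            children.clear();
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                Rule c = new Rule();
                c.readFields(in);                   // recursive deserialization
                children.add(c);
            }
        }
    }

    // Serialize to a Base64 string, suitable for conf.set("my.job.rules", ...)
    static String encode(Rule r) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        r.write(new DataOutputStream(bytes));
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Decode in the mapper's setup() from conf.get("my.job.rules")
    static Rule decode(String s) throws IOException {
        Rule r = new Rule();
        r.readFields(new DataInputStream(
                new ByteArrayInputStream(Base64.getDecoder().decode(s))));
        return r;
    }
}
```

With this pattern only the driver and the mapper need changing; no InputFormat or FileSplit subclassing is required.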
I tried to create a subclass of FileSplit to convey the data, but in the end I found it inelegant to implement: the FileSplits are initialized in getSplits() of the InputFormat, while the only way to initialize the InputFormat is via setConf(). So I would end up implementing three new subclasses with the same custom fields: a FileSplit, an InputFormat, and a Configuration. Another approach might be to write these objects to a file on HDFS or into the DistributedCache. I just wonder: is there a better way to do this? Thank you. --- Zhiwei Xiao