Hi,

My application needs to send some objects to the map tasks; these objects
specify how to process the input records. I know I can pass them as strings
via the job configuration, but I'd prefer to use Hadoop's Writable
interface, since the objects require recursive serialization.
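
Concretely, what I have in mind is something like this JDK-only sketch
(MyWritable, Rule, and WritableConf are made-up names standing in for my
real classes; the real Writable interface lives in org.apache.hadoop.io):

```java
import java.io.*;
import java.util.Base64;

// Stand-in for Hadoop's Writable interface.
interface MyWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Example object with a recursive field: a label plus an optional child.
class Rule implements MyWritable {
    String label = "";
    Rule child;   // recursive structure

    public void write(DataOutput out) throws IOException {
        out.writeUTF(label);
        out.writeBoolean(child != null);
        if (child != null) child.write(out);   // recursion is straightforward
    }

    public void readFields(DataInput in) throws IOException {
        label = in.readUTF();
        child = in.readBoolean() ? new Rule() : null;
        if (child != null) child.readFields(in);
    }
}

public class WritableConf {
    // Driver side: serialize the object graph and Base64-encode it so it
    // survives as a plain string value in the job Configuration.
    static String store(MyWritable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        w.write(new DataOutputStream(bytes));
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Mapper side: decode the string pulled from the Configuration and
    // rebuild the object graph.
    static void load(MyWritable w, String encoded) throws IOException {
        byte[] raw = Base64.getDecoder().decode(encoded);
        w.readFields(new DataInputStream(new ByteArrayInputStream(raw)));
    }
}
```

If I remember correctly, Hadoop's org.apache.hadoop.io.DefaultStringifier
does essentially this (store/load a Writable through the Configuration), so
maybe that is already the intended mechanism?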

I tried creating a subclass of FileSplit to carry the data, but I found it
inelegant to implement: the FileSplits are created in getSplits() of the
InputFormat, and the only way to initialize the InputFormat is via
setConf(). So I would end up implementing three new subclasses with the
same custom fields: FileSplit, InputFormat, and Configuration.

Another approach might be to write these objects to a file on HDFS and
distribute it via the DistributedCache.
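
For that approach I imagine something like the following sketch, using
local java.io files as a stand-in for the HDFS path I would register with
DistributedCache.addCacheFile(); CacheFileDemo and the string "rules" are
just illustrative placeholders for my real objects:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class CacheFileDemo {
    // Driver side: serialize the per-job objects into a side file. In a
    // real job this would be an HDFS path registered with
    // DistributedCache.addCacheFile(uri, conf).
    static void writeSideFile(File f, List<String> rules) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f))) {
            out.writeInt(rules.size());
            for (String r : rules) out.writeUTF(r);
        }
    }

    // Mapper side: in setup(), open the local cached copy of the file and
    // deserialize the objects before any records are processed.
    static List<String> readSideFile(File f) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
            int n = in.readInt();
            List<String> rules = new ArrayList<>();
            for (int i = 0; i < n; i++) rules.add(in.readUTF());
            return rules;
        }
    }
}
```

This works, but it feels like a lot of plumbing for what is conceptually
just per-job metadata.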

I just wonder: is there a better way to do this?

Thank you.
---
Zhiwei Xiao
