Hi John,

I think this is what you are looking for:
http://archive.cloudera.com/cdh/3/hadoop/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html
http://archive.cloudera.com/cdh/3/hadoop/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html

Usage examples are included in the API docs for both classes.

Regards,
Dino Kečo

On Wed, Aug 10, 2011 at 6:08 PM, Jian Fang <jian.fang.subscr...@gmail.com> wrote:

> Hi,
>
> I am working on a project, which requires multiple input formats and
> multiple output formats. Basically, I store some sales rank data to a
> Cassandra cluster and I get a sales rank update file each day to update the
> ranks in the Cassandra. In the meanwhile, I need to find all the products
> whose rank change exceeds a threshold and output them to a file. That is to
> say, I need two input formats, one from the file system (sales rank update
> file) and one from the Cassandra (current sales rank), and two output
> formats, one to the file system (products whose rank change exceeds a
> threshold) and one to Cassandra (write the new rank to Cassandra).
>
> Right now, I used multiple cascading jobs to do the work and use HDFS to
> share data among jobs. But this is not very efficient since some
> intermediate files need to be read multiple times in different jobs. I
> wonder if there is a more elegant way to solve this problem. Seems Hadoop
> 0.19 supports multiple input/output formats. It would be great if I could
> merge the multiple jobs to one with multiple input formats and multiple
> output formats. Is this doable in Hadoop 0.20.2? Are there any examples of
> multiple input formats and multiple output formats for Hadoop 0.20.2?
>
> Thanks in advance,
>
> John