Thanks. Will try that.

On Wed, Aug 10, 2011 at 12:20 PM, Dino Kečo <dino.k...@gmail.com> wrote:
> Hi John,
>
> I think this is what you are looking for:
>
> http://archive.cloudera.com/cdh/3/hadoop/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html
>
> http://archive.cloudera.com/cdh/3/hadoop/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html
>
> Usage examples are included in the API docs.
>
> Regards,
> Dino Kečo
>
>
> On Wed, Aug 10, 2011 at 6:08 PM, Jian Fang <jian.fang.subscr...@gmail.com> wrote:
>
>> Hi,
>>
>> I am working on a project that requires multiple input formats and
>> multiple output formats. Basically, I store sales rank data in a
>> Cassandra cluster, and each day I get a sales rank update file to update
>> the ranks in Cassandra. Meanwhile, I need to find all the products
>> whose rank change exceeds a threshold and output them to a file. That is
>> to say, I need two input formats, one from the file system (the sales rank
>> update file) and one from Cassandra (the current sales rank), and two
>> output formats, one to the file system (products whose rank change exceeds
>> the threshold) and one to Cassandra (the new rank written back).
>>
>> Right now, I use multiple cascading jobs to do the work and use HDFS to
>> share data among the jobs. But this is not very efficient, since some
>> intermediate files need to be read multiple times by different jobs. I
>> wonder if there is a more elegant way to solve this problem. It seems that
>> Hadoop 0.19 supports multiple input/output formats. It would be great if I
>> could merge the multiple jobs into one with multiple input formats and
>> multiple output formats. Is this doable in Hadoop 0.20.2? Are there any
>> examples of multiple input formats and multiple output formats for
>> Hadoop 0.20.2?
>>
>> Thanks in advance,
>>
>> John
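For reference, the per-product logic John describes can be sketched without Hadoop at all. The sketch joins each product's new rank from the daily update file with its current rank. It keeps every new rank for the write-back to Cassandra and separately collects the products whose rank moved by more than a threshold. Several details are hypothetical assumptions made for illustration and do not come from the thread or from any Hadoop API: the class and method names (`RankDiff`, `process`), the in-memory maps standing in for the two inputs, and the choice to skip products that have no previous rank.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical plain-Java sketch (no Hadoop dependency) of the join-and-compare
 * step: current ranks (Cassandra side) + daily updates (file side) produce
 * two logical outputs, the new ranks and the products that moved a lot.
 */
public class RankDiff {

    /** The two logical outputs a single combined job would emit. */
    public static final class Result {
        /** New rank per product, i.e. what would be written back to Cassandra. */
        public final Map<String, Integer> newRanks = new HashMap<>();
        /** Products whose rank changed by more than the threshold, sorted by ID. */
        public final List<String> bigMovers = new ArrayList<>();
    }

    public static Result process(Map<String, Integer> currentRanks,
                                 Map<String, Integer> updates,
                                 int threshold) {
        Result result = new Result();
        for (Map.Entry<String, Integer> e : updates.entrySet()) {
            String product = e.getKey();
            int newRank = e.getValue();
            // Every update becomes a new rank to store.
            result.newRanks.put(product, newRank);
            Integer oldRank = currentRanks.get(product);
            // Assumption: products without a previous rank are not compared.
            if (oldRank != null && Math.abs(newRank - oldRank) > threshold) {
                result.bigMovers.add(product);
            }
        }
        // HashMap iteration order is unspecified, so sort for stable output.
        Collections.sort(result.bigMovers);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> current = new HashMap<>();
        current.put("A", 10);
        current.put("B", 50);
        current.put("C", 7);

        Map<String, Integer> updates = new HashMap<>();
        updates.put("A", 12); // moved by 2
        updates.put("B", 5);  // moved by 45
        updates.put("D", 3);  // no previous rank

        Result r = process(current, updates, 10);
        System.out.println("new ranks: " + r.newRanks.size()); // prints "new ranks: 3"
        System.out.println("big movers: " + r.bigMovers);      // prints "big movers: [B]"
    }
}
```

In a single Hadoop job, this logic would presumably live in a reducer keyed by product ID. MultipleInputs would assign a mapper to each input source, and MultipleOutputs would route the two result sets to separate named outputs. Note that MultipleInputs associates each input format and mapper with an input path, so a non-file source such as Cassandra may need a different approach. Check the API docs linked above for the version you run.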