Re: Multiple input formats and multiple output formats in Hadoop 0.20.2

Jian Fang Wed, 10 Aug 2011 09:26:32 -0700

Thanks. Will try that.

On Wed, Aug 10, 2011 at 12:20 PM, Dino Kečo <dino.k...@gmail.com> wrote:


> Hi John,
>
> I think this is what you are looking for:
>
>
> http://archive.cloudera.com/cdh/3/hadoop/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html
>
>
> http://archive.cloudera.com/cdh/3/hadoop/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html
>
> Usage examples are part of the API docs.
>
> Regards,
> Dino Kečo
>
>
> On Wed, Aug 10, 2011 at 6:08 PM, Jian Fang
> <jian.fang.subscr...@gmail.com> wrote:
>
>> Hi,
>>
>> I am working on a project that requires multiple input formats and
>> multiple output formats. I store sales rank data in a Cassandra
>> cluster, and each day I receive a sales rank update file that I use to
>> update the ranks in Cassandra. At the same time, I need to find all the
>> products whose rank change exceeds a threshold and write them to a file.
>> That is, I need two input formats, one reading from the file system (the
>> sales rank update file) and one reading from Cassandra (the current sales
>> ranks), and two output formats, one writing to the file system (the
>> products whose rank change exceeds the threshold) and one writing to
>> Cassandra (the new ranks).
>>
>> Right now, I use multiple cascading jobs to do the work and HDFS to
>> share data among the jobs. This is not very efficient, since some
>> intermediate files have to be read multiple times by different jobs. I
>> wonder if there is a more elegant way to solve this problem. It seems that
>> Hadoop 0.19 supports multiple input/output formats. It would be great if I
>> could merge the multiple jobs into one job with multiple input formats and
>> multiple output formats. Is this doable in Hadoop 0.20.2? Are there any
>> examples of multiple input formats and multiple output formats for Hadoop
>> 0.20.2?
>>
>> Thanks in advance,
>>
>> John
>>
>>
>
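
For readers following the thread, the per-product logic that a single merged
job would have to perform can be sketched in plain Java, without any Hadoop or
Cassandra dependency. This is only an illustration of the join-and-split step
described in the original question, not the poster's code: the class name
RankChangeSketch, the method process, and the Result holder are hypothetical,
and it assumes that "exceeds" means strictly greater than the threshold and
that a product with no current rank is written to the store but not reported.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: joins current ranks with a daily update by
// product id and splits the result into the two outputs from the question.
final class RankChangeSketch {

    // Holds both outputs that a merged job would produce.
    static final class Result {
        // New rank per product; in the real job this would go to Cassandra.
        final Map<String, Integer> newRanks = new LinkedHashMap<>();
        // Products whose rank moved by more than the threshold; in the real
        // job this would go to a file.
        final List<String> bigMovers = new ArrayList<>();
    }

    static Result process(Map<String, Integer> currentRanks,
                          Map<String, Integer> updates,
                          int threshold) {
        Result result = new Result();
        for (Map.Entry<String, Integer> update : updates.entrySet()) {
            String productId = update.getKey();
            int newRank = update.getValue();

            // Output 1: every updated product gets its new rank stored.
            result.newRanks.put(productId, newRank);

            // Output 2: report only products that already had a rank and
            // whose absolute change is strictly above the threshold
            // (both conditions are assumptions made for this sketch).
            Integer oldRank = currentRanks.get(productId);
            if (oldRank != null && Math.abs(newRank - oldRank) > threshold) {
                result.bigMovers.add(productId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> current = new LinkedHashMap<>();
        current.put("A", 10);
        current.put("B", 50);

        Map<String, Integer> updates = new LinkedHashMap<>();
        updates.put("A", 12);
        updates.put("B", 5);
        updates.put("C", 7);

        Result result = process(current, updates, 10);
        System.out.println("new ranks: " + result.newRanks);
        System.out.println("big movers: " + result.bigMovers);
    }
}
```

In a MapReduce setting the two maps would correspond to the two inputs keyed
by product id, and the two fields of Result to the two outputs; how those are
wired up depends on the input and output format classes available in the
Hadoop distribution in use.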
