Hi,

I don't know the answer (simply not enough information in your email) but
I'm willing to make a guess:
Are you running on a system with two processing nodes?
If so, try removing the Combiner. The combiner is purely a performance
optimization, so the whole job should work (and produce the same counts)
without it. Sometimes there is a design fault in the job and the combiner
ends up disrupting the processing; running once without it should tell you
whether it is the culprit.
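For example (just a sketch based on your own snippet, not tested), leaving
the combiner out is a one-line change:

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes(scanFamily));

    TableMapReduceUtil.initTableMapperJob(table,
                                          scan,
                                          mapper,
                                          Text.class,
                                          LongWritable.class,
                                          job);
    // The combiner may run zero, one or several times per key, so it must
    // not change the final result; leaving it out only costs some extra
    // shuffle traffic.
    // job.setCombinerClass(LongSumReducer.class);
    job.setReducerClass(reducer);

If the counts are still doubled without the combiner, the duplication is
happening on the input side rather than in the combine/reduce phase.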

HTH

Niels Basjes

2010/11/5 Adam Phelps <a...@opendns.com>

> I've noticed an odd behavior with a map-reduce job I've written which is
> reading data out of an HBase table.  After a couple days of poking at this I
> haven't been able to figure out the cause of the problem, so I figured I'd
> ask on here.
>
> (For reference I'm running with the cdh3b2 release)
>
> The problem is that it seems that every line from the HBase table is passed
> to the mappers twice, thus resulting in counts ending up as exactly double
> what they should be.
>
> I set up the job like this:
>
>            Scan scan = new Scan();
>            scan.addFamily(Bytes.toBytes(scanFamily));
>
>            TableMapReduceUtil.initTableMapperJob(table,
>                                                  scan,
>                                                  mapper,
>                                                  Text.class,
>                                                  LongWritable.class,
>                                                  job);
>            job.setCombinerClass(LongSumReducer.class);
>
>            job.setReducerClass(reducer);
>
> I've set up counters in the mapper to verify what is happening, so that I
> know for certain that the mapper is being called twice with the same bit of
> data.  I've also confirmed (using the hbase shell) that each entry appears
> only once in the table.
>
> Is there a known bug along these lines?  If not, does anyone have any
> thoughts on what might be causing this or where I'd start looking to
> diagnose?
>
> Thanks
> - Adam
>
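PS: to pin down where the doubling happens, a counter incremented at the
very top of map() is handy, because it is updated before any combiner can
run. Roughly like this (class and counter names are made up by me, not
taken from your code):

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;

    public class RowCountingMapper extends TableMapper<Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(ImmutableBytesWritable rowKey, Result row,
                           Context context)
                throws IOException, InterruptedException {
            // Incremented once per physical call to map(), before any
            // combining can take place.
            context.getCounter("debug", "map.input.rows").increment(1);
            context.write(new Text(rowKey.get()), ONE);
        }
    }

If "map.input.rows" ends up at twice the number of rows in the table, each
row really is being fed to the mappers twice and the combiner is innocent;
if it matches the table size, look at the combine/reduce side instead.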



-- 
With kind regards,

Niels Basjes
