Hi, I don't know the answer (there simply isn't enough information in your email), but I'm willing to make a guess: are you running on a system with two processing nodes? If so, try removing the Combiner. The combiner is a performance optimization, and the whole job should produce correct results without it. Sometimes there is a design fault in the processing logic, and the combiner disrupts the results.
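For reference, a minimal sketch of your job setup with the combiner removed (this is just your own snippet with the setCombinerClass call dropped; table, scanFamily, mapper, and reducer are the same placeholders from your mail):

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes(scanFamily));

    TableMapReduceUtil.initTableMapperJob(table,
                                          scan,
                                          mapper,
                                          Text.class,
                                          LongWritable.class,
                                          job);
    // No job.setCombinerClass(...) here: Hadoop may invoke a combiner
    // zero, one, or many times per map output, so the job must produce
    // correct results from the reducer alone.
    job.setReducerClass(reducer);

If the counts come out correct without the combiner, that points to the combiner (or the assumption that it runs exactly once) as the culprit rather than HBase feeding the mappers duplicate rows.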
HTH,
Niels Basjes

2010/11/5 Adam Phelps <a...@opendns.com>

> I've noticed an odd behavior with a map-reduce job I've written which is
> reading data out of an HBase table. After a couple days of poking at this I
> haven't been able to figure out the cause of the problem, so I figured I'd
> ask on here.
>
> (For reference I'm running with the cdh3b2 release)
>
> The problem is that it seems that every line from the HBase table is passed
> to the mappers twice, thus resulting in counts ending up as exactly double
> what they should be.
>
> I set up the job like this:
>
> Scan scan = new Scan();
> scan.addFamily(Bytes.toBytes(scanFamily));
>
> TableMapReduceUtil.initTableMapperJob(table,
>                                       scan,
>                                       mapper,
>                                       Text.class,
>                                       LongWritable.class,
>                                       job);
> job.setCombinerClass(LongSumReducer.class);
>
> job.setReducerClass(reducer);
>
> I've set up counters in the mapper to verify what is happening, so that I
> know for certain that the mapper is being called twice with the same bit of
> data. I've also confirmed (using the hbase shell) that each entry appears
> only once in the table.
>
> Is there a known bug along these lines? If not, does anyone have any
> thoughts on what might be causing this or where I'd start looking to
> diagnose?
>
> Thanks
> - Adam

--
Met vriendelijke groeten,
Niels Basjes