If the task fails the counter for that task is not used. So if you have speculative execution turned on and the JT kills a task, it won't affect your end results.
Again the only major caveat is that the counters are in memory so if you have a lot of counters... On Jul 23, 2012, at 4:52 PM, Peter Marron wrote: > Yeah, I thought about using counters but I was worried about > what happens if a Mapper task fails. Does the counter get adjusted to > remove any contributions that the failed Mapper made before > another replacement Mapper is started? Otherwise in the case of any > Mapper failure I'm going to get an overcount am I not? > > Or is there some way to make sure that counters have > the correct semantics in the face of failures? > > Peter Marron > >> -----Original Message----- >> From: Dave Shine >> [mailto:Dave.Shine@channelintelligence. >> com] >> Sent: 23 July 2012 15:35 >> To: common-user@hadoop.apache.org >> Subject: RE: Counting records >> >> You could just use a counter and never >> emit anything from the Map(). Use the >> getCounter("MyRecords", >> "RecordTypeToCount").increment(1) >> whenever you find the type of record you >> are looking for. Never call >> output.collect(). Call the job with >> reduceTasks(0). When the job finishes, >> you can programmatically get the values >> of all counters including the one you >> create in the Map() method. >> >> >> Dave Shine >> Sr. Software Engineer >> 321.939.5093 direct | 407.314.0122 >> mobile CI Boost(tm) Clients Outperform >> Online(tm) www.ciboost.com >> >> >> -----Original Message----- >> From: Peter Marron >> [mailto:Peter.Marron@trilliumsoftware. >> com] >> Sent: Monday, July 23, 2012 10:25 AM >> To: common-user@hadoop.apache.org >> Subject: Counting records >> >> Hi, >> >> I am a complete noob with Hadoop and >> MapReduce and I have a question that is >> probably silly, but I still don't know the >> answer. >> >> For the purposes of discussion I'll assume >> that I'm using a standard >> TextInputFormat. >> (I don't think that this changes things too >> much.) >> >> To simplify (a fair bit) I want to count all >> the records that meet specific criteria. >> I would like to use MapReduce because I >> anticipate large sources and I want to >> get the performance and reliability that >> MapReduce offers. >> >> So the obvious and simple approach is to >> have my Mapper check whether each >> record meets the criteria and emit a 0 or >> a 1. Then I could use a combiner which >> accumulates (like a LongSumReducer) >> and use this as a reducer as well, and I >> am sure that that would work fine. >> >> However it seems massive overkill to >> have all those "1"s and "0"s emitted and >> stored on disc. >> It seems tempting to have the Mapper >> accumulate the count for all of the >> records that it sees and then just emit >> once at the end the total value. This >> seems simple enough, except that the >> Mapper doesn't seem to have any easy >> way to know when it is presented with >> the last record. >> >> Now I could just make the Mapper take a >> copy of the OutputCollector for each >> record called and then in the close >> method it could do a single emit. >> However, although, this looks like it >> would work with the current >> implementation, there seem to be no >> guarantees that the collector is valid at >> the time that the close is called. This just >> seems ugly. >> >> Or I could get the Mapper to record the >> first offset that it sees and read the split >> length using >> report.getInputSplit().getLength() and >> then it could monitor how far it is >> through the split and it should be able to >> detect the last record. It looks like the >> MapRunner class creates a Mapper >> object and uses it to process a split, and >> so it looks like it's safe to store state in >> the mapper class between invocations of >> the map method. (But is this just an >> implementation artefact? Is the mapper >> class supposed to be completely >> stateless?) >> >> Maybe I should have a custom >> InputFormat class and have it flag the >> last record by placing some extra >> information in the key? (Assuming that >> the InputFormant has enough >> information from the split to be able to >> detect the last record, which seems >> reasonable enough.) >> >> Is there some "blessed" way to do this? >> Or am I barking up the wrong tree >> because I should really just generate all >> those "1"s and "0"s and accept the >> overhead? >> >> Regards, >> >> Peter Marron >> Trillium Software UK Limited >> >> >> The information contained in this email >> message is considered confidential and >> proprietary to the sender and is intended >> solely for review and use by the named >> recipient. Any unauthorized review, use >> or distribution is strictly prohibited. If you >> have received this message in error, >> please advise the sender by reply email >> and delete the message. > > >