Could someone please help explain the job counters shown for Combine records on the JobTracker JSP page?
Here's an example from one of our MR jobs. There are Combine input and output record counters shown for both the Map phase and the Reduce phase. We're not quite sure how to interpret them.

Map Phase:
  Map input records        85,013,261,279
  Map output records       85,013,261,279
  Combine input records   114,936,724,505
  Combine output records   38,750,511,975

Reduce Phase:
  Combine input records     8,827,017,275
  Combine output records       17,986,654
  Reduce input groups           2,221,796
  Reduce input records         17,986,654
  Reduce output records         4,443,590

What makes sense:

* Considering the MR job and its data, the 85.0b count for Map output records is expected
* I would believe a rate of 85.0b / 38.8b = 2.2 for our combiner
* Reduce phase shows Combine output records at 18.0m = Reduce input records at 18.0m
* Reduce input groups at 2.2m is expected
* Reduce output records at 4.4m is verified

What doesn't make sense:

* The 115b count for Combine input records during Map phase
* The 8.8b count for Combine input records during Reduce phase

What would be the actual count of records coming out of the Map phase?

Thanks,
Paco
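P.S. For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes the ratios from the counter values quoted above. The dictionary keys are just labels I made up for this post, not Hadoop counter identifiers, and the script only restates the numbers; it does not explain them.

```python
# Counter values copied from the JobTracker page quoted in this post.
# The key names are ad hoc labels, not official Hadoop counter names.
counters = {
    "map_output_records": 85_013_261_279,
    "map_combine_input_records": 114_936_724_505,
    "map_combine_output_records": 38_750_511_975,
    "reduce_combine_input_records": 8_827_017_275,
    "reduce_combine_output_records": 17_986_654,
    "reduce_input_groups": 2_221_796,
    "reduce_input_records": 17_986_654,
    "reduce_output_records": 4_443_590,
}


def ratio(numerator, denominator):
    """Return counters[numerator] / counters[denominator] as a float."""
    return counters[numerator] / counters[denominator]


ratios = {
    # The combiner rate we believe: map output over map-side combine output.
    "map output / map combine output": ratio(
        "map_output_records", "map_combine_output_records"
    ),
    # The rate implied by the counters themselves on the map side.
    "map combine input / map combine output": ratio(
        "map_combine_input_records", "map_combine_output_records"
    ),
    # The rate implied by the counters on the reduce side.
    "reduce combine input / reduce combine output": ratio(
        "reduce_combine_input_records", "reduce_combine_output_records"
    ),
}

# How many more records the map-side combiner reports as input
# than the mappers report as output.
map_combine_excess = (
    counters["map_combine_input_records"] - counters["map_output_records"]
)

for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
print(f"map combine input minus map output: {map_combine_excess:,}")
```

The gap between the first two ratios, and the size of the excess, is exactly the part we can't account for.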