Could someone please help explain the job counters shown for Combine
records on the JobTracker JSP page?

Here's an example from one of our MR jobs.  There are Combine input
and output record counters shown for both Map phase and Reduce phase.
We're not quite sure how to interpret them:

Map Phase:
   Map input records   85,013,261,279
   Map output records   85,013,261,279
   Combine input records   114,936,724,505
   Combine output records   38,750,511,975

Reduce Phase:
   Combine input records   8,827,017,275
   Combine output records   17,986,654
   Reduce input groups   2,221,796
   Reduce input records   17,986,654
   Reduce output records   4,443,590


What makes sense:
   * Considering the MR job and its data, the 85.0b count for Map
output records is expected
   * A reduction ratio of 85.0b / 38.8b = 2.2 seems plausible for our
combiner
   * In the Reduce phase, Combine output records (18.0m) matches Reduce
input records (18.0m)
   * Reduce input groups at 2.2m is expected
   * Reduce output records at 4.4m is verified

What doesn't make sense:
   * The 115b count for Combine input records during Map phase
   * The 8.8b count for Combine input records during Reduce phase
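
To make the mismatch concrete, here is a rough Python sketch of the ratios implied by the counters above. The variable names are my own and purely illustrative; the numbers are copied straight from the JobTracker page, and the script only does arithmetic on them (it does not explain where the extra records come from).

```python
# Counter values copied from the JobTracker page (names are illustrative).
map_output = 85_013_261_279
map_combine_in = 114_936_724_505
map_combine_out = 38_750_511_975
reduce_combine_in = 8_827_017_275
reduce_combine_out = 17_986_654
reduce_input_groups = 2_221_796
reduce_input_records = 17_986_654
reduce_output = 4_443_590

# Ratios between related counters.
ratios = {
    "map output / map-side combine output": map_output / map_combine_out,
    "map-side combine input / map output": map_combine_in / map_output,
    "reduce-side combine input / output": reduce_combine_in / reduce_combine_out,
    "reduce input records / input groups": reduce_input_records / reduce_input_groups,
    "reduce output / input groups": reduce_output / reduce_input_groups,
}

# Records seen by the map-side combiner beyond what the mappers emitted.
extra_combine_in = map_combine_in - map_output

for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
print(f"map-side combine input beyond map output: {extra_combine_in:,}")
```

By this arithmetic the map-side combiner reports roughly 1.35 input records per Map output record (about 29.9b more than the mappers emitted), and the reduce-side combiner reports roughly 491 input records per output record.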

What would be the actual count of records coming out of the Map phase?

Thanks,
Paco
