Hi Gang,

> My understanding to this is that, the combiner has to re-read some records
> which have already been spilled to disk and combine them with those records
> which come later.
>
I believe the combine operation is done before the map-side spill and after the reduce-side merge. Combining only happens in memory; it does not re-read records from disk.

> Besides, I am not sure whether the combiner can guarantee there is only one
> record for each distinct key in each map task. Or does it just "try its
> best" to combine?
>

Yes, combiners can only "try their best".
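
To illustrate the "try their best" point, here is a rough sketch in plain Python. It is a toy model, not Hadoop code: the names `combine`, `map_task_with_spills`, and `buffer_size` are made up for the example, and it assumes a word-count style combiner that is applied to each in-memory buffer just before that buffer is spilled.

```python
from collections import defaultdict


def combine(records):
    """Hypothetical combiner: sum the values for each key in one buffer."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return sorted(totals.items())


def map_task_with_spills(records, buffer_size):
    """Toy model of a map task: fill an in-memory buffer, combine it, spill it.

    Each spill is combined on its own, so earlier spills are never revisited.
    """
    spills = []
    buffer = []
    for record in records:
        buffer.append(record)
        if len(buffer) == buffer_size:
            spills.append(combine(buffer))
            buffer = []
    if buffer:
        # Combine and spill whatever is left when the map input ends.
        spills.append(combine(buffer))
    return spills


records = [("a", 1), ("b", 1), ("a", 1), ("a", 1), ("c", 1), ("a", 1)]
spills = map_task_with_spills(records, buffer_size=3)
for i, spill in enumerate(spills):
    print("spill", i, spill)
```

In this model the key "a" ends up in both spills, so the map output still has more than one record for it. That is why the reducer must be written to work correctly no matter how many times (including zero) the combiner has been applied.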