Hi Gang,

> My understanding of this is that the combiner has to re-read some records
> which have already been spilled to disk and combine them with those records
> which come later.
>

I believe the combine operation runs before the map-side spill and after
the reduce-side merge. Combining happens only in memory; it does not re-read
records from disk.


> Besides, I am not sure whether the combiner can guarantee there is only one
> record for each distinct key in each map task. Or does it just "try its
> best" to combine?
>

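Yes, the combiner can only "try its best". There is no guarantee of a single
record per distinct key in a map task's output, so the reducer still has to
handle several records for the same key.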
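
To make the "best effort" point concrete, here is a small toy model in plain
Python. It is not Hadoop code: run_map_task, combine and buffer_limit are
names I made up for illustration, and it assumes the combiner is applied only
to the records sitting in the in-memory buffer at the moment of each spill.

```python
from collections import defaultdict


def combine(records):
    """Sum the values per key, like a word-count style combiner."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return sorted(totals.items())


def run_map_task(records, buffer_limit):
    """Hypothetical model of a map task (not the real Hadoop logic).

    Records are buffered in memory. Each time the buffer is full, only the
    buffered records are combined, and the result is "spilled".
    """
    spills, buffer = [], []
    for record in records:
        buffer.append(record)
        if len(buffer) >= buffer_limit:
            spills.append(combine(buffer))  # combine in memory, then spill
            buffer = []
    if buffer:
        spills.append(combine(buffer))  # final spill of the leftover records
    return spills


records = [("a", 1), ("b", 1), ("a", 1), ("a", 1), ("b", 1), ("c", 1)]
spills = run_map_task(records, buffer_limit=3)

# The reduce side still has to merge records that share a key.
reduced = combine(pair for spill in spills for pair in spill)
```

With a buffer of three records, key "a" ends up in both spills, so the map
output holds more than one record for it. The final totals are still correct
because the reduce side sums whatever arrives.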
