Hi there,

Can anybody help me out with a (most likely) simple point of confusion?

I am wondering how intermediate key/value pairs are materialized. I have a job where the map phase produces 600,000 records and the Map output bytes counter is ~300GB. What I assumed up to now is that these 600,000 records, i.e., ~300GB, are materialized on local disk by the mappers and that the reducers later pull them (based on the key). What I see, and cannot explain, is that the FILE_BYTES_WRITTEN counter is as high as ~900GB.

So, where does the factor of 3 between Map output bytes and FILE_BYTES_WRITTEN come from? I thought about the replication factor of 3 in the file system, but that should apply to HDFS only, shouldn't it?
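
To make the numbers concrete, here is the arithmetic as a minimal sketch. The figures are the rounded counter values quoted above, I am assuming decimal gigabytes, and the variable names are my own for illustration, not Hadoop identifiers:

```python
# Rounded figures from the job counters (decimal GB assumed).
GB = 10**9
map_output_records = 600_000
map_output_bytes = 300 * GB      # "Map output bytes", approximately
file_bytes_written = 900 * GB    # FILE_BYTES_WRITTEN, approximately

# Bytes written to local disk per byte of map output.
write_factor = file_bytes_written / map_output_bytes

# Average size of one intermediate record.
avg_record_bytes = map_output_bytes / map_output_records

print(f"write factor: {write_factor:.1f}")                   # write factor: 3.0
print(f"avg record size: {avg_record_bytes / 10**3:.0f} KB")  # avg record size: 500 KB
```

So each intermediate record averages roughly 500 KB, and every byte of map output appears to be written to local disk about three times.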

Thanks
- tim
