As you noticed, your map tasks are spilling three times as many
records as they are outputting. In general, if the map output buffer
is large enough to hold all records in memory, these values will be
equal. If there isn't enough room, as was the case with your job, the
buffer makes additional intermediate spills.
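
For a rough sanity check - this is only back-of-the-envelope, using the
io.sort.mb = 100, io.sort.factor = 10 and 40-mapper figures from earlier
in the thread:

    ~300 GB map output / 40 mappers   ->  ~7.5 GB per map task
    ~7.5 GB / 100 MB sort buffer      ->  ~75 spill files per task

75 spill files is well above io.sort.factor = 10, so the spills can't be
merged in a single pass; much of the data is rewritten during the
intermediate merges and again in the final merge. Very roughly, each
record ends up on disk about three times (initial spill + intermediate
merge + final merged output), which would line up with both the ~3x in
spilled records and the ~3x in FILE_BYTES_WRITTEN.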

To fix this, you can try tuning the per-job configurables io.sort.mb
and io.sort.record.percent. Look at the counters of a few map tasks to
get an idea of how much data (to size io.sort.mb) and how many records
(to size io.sort.record.percent) they produce.
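
If it helps, here is a minimal sketch of bumping these for a job (Hadoop
0.20-style API; the class name and values below are just placeholders to
illustrate, not recommendations - tune them against your own counters):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SortTuningExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Map-side sort buffer, in MB (default 100).
        conf.setInt("io.sort.mb", 200);
        // Fraction of the buffer reserved for record metadata (default 0.05);
        // with few, large records a smaller fraction leaves more room for data.
        conf.setFloat("io.sort.record.percent", 0.05f);
        Job job = new Job(conf, "sort-tuning-example");
        // ... set mapper/reducer/input/output as usual, then job.waitForCompletion(true)
      }
    }

You can also pass them on the command line with -D io.sort.mb=200 etc.,
assuming your driver goes through ToolRunner/GenericOptionsParser.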

Ed

On Wed, Feb 24, 2010 at 2:45 AM, Tim Kiefer <[email protected]> wrote:
> Sure,
> I see:
> Map input records: 10,000
> Map output records: 600,000
> Map output bytes: 307,216,800,000  (each record is about 500KB - that fits
> the application and is to be expected)
>
> Map spilled records: 1,802,965 (ahhh... now that you ask for it - here, too,
> there is a factor of 3 between output and spilled records).
>
> So - the question now is: why are three times as many records spilled as
> are actually produced by the mappers?
>
> In my map function, I do not perform any additional file writing besides the
> context.write() for the intermediate records.
>
> Thanks, Tim
>
>> On 24.02.2010 05:28, Amogh Vasekar wrote:
>>
>> Hi,
>> Can you let us know what is the value for :
>> Map input records
>> Map spilled records
>> Map output bytes
>> Is there any side effect file written?
>>
>> Thanks,
>> Amogh
>>
>>
>> On 2/23/10 8:57 PM, "Tim Kiefer" <[email protected]> wrote:
>>
>> No... 900GB is in the map column. Reduce adds another ~70GB of
>> FILE_BYTES_WRITTEN and the total column consequently shows ~970GB.
>>
>> On 23.02.2010 16:11, Ed Mazur wrote:
>>
>>>
>>> Hi Tim,
>>>
>>> I'm guessing a lot of these writes are happening on the reduce side.
>>> On the JT web interface, there are three columns: map, reduce,
>>> overall. Is the 900GB figure from the overall column? The value in the
>>> map column will probably be closer to what you were expecting. There
>>> are writes on the reduce side too during the shuffle and multi-pass
>>> merge.
>>>
>>> Ed
>>>
>>> 2010/2/23 Tim Kiefer<[email protected]>:
>>>
>>>
>>>>
>>>> Hi Gang,
>>>>
>>>> thanks for your reply.
>>>>
>>>> To clarify: I look at the statistics through the job tracker. In the
>>>> web interface for my job I have columns for map, reduce and total. What I
>>>> was referring to is "map" - i.e. I see FILE_BYTES_WRITTEN = 3 * Map
>>>> Output Bytes in the map column.
>>>>
>>>> About the replication factor: I would expect the exact same thing -
>>>> changing to 6 has no influence on FILE_BYTES_WRITTEN.
>>>>
>>>> About the sorting: I have io.sort.mb = 100 and io.sort.factor = 10.
>>>> Furthermore, I have 40 mappers and map output data is ~300GB. I can't
>>>> see how that ends up as a factor of 3.
>>>>
>>>> - tim
>>>>
>>>> On 23.02.2010 14:39, Gang Luo wrote:
>>>>
>>>>
>>>>>
>>>>> Hi Tim,
>>>>> the intermediate data is materialized to the local file system. Before it
>>>>> is available to reducers, the mappers sort it. If the buffer
>>>>> (io.sort.mb) is too small for the intermediate data, multi-phase sorting
>>>>> happens, which means the same bytes are read and written more than once.
>>>>>
>>>>> Besides, are you looking at the statistics per mapper through the job
>>>>> tracker, or just at the information output when a job finishes? If you look
>>>>> at the information given at the end of the job, note that it is an overall
>>>>> statistic which includes sorting on the reduce side. It may also include the
>>>>> amount of data written to HDFS (I am not 100% sure).
>>>>>
>>>>> Also, FILE_BYTES_WRITTEN has nothing to do with the replication
>>>>> factor. I think if you change the factor to 6, FILE_BYTES_WRITTEN will still
>>>>> be the same.
>>>>>
>>>>>  -Gang
>>>>>
>>>>>
>>>>> Hi there,
>>>>>
>>>>> can anybody help me out with a (most likely) simple point of confusion.
>>>>>
>>>>> I am wondering how intermediate key/value pairs are materialized. I
>>>>> have a job where the map phase produces 600,000 records and map output 
>>>>> bytes
>>>>> is ~300GB. What I thought (up to now) is that these 600,000 records, i.e.,
>>>>> 300GB, are materialized locally by the mappers and that later on reducers
>>>>> pull these records (based on the key).
>>>>> What I see (and cannot explain) is that the FILE_BYTES_WRITTEN counter
>>>>> is as high as ~900GB.
>>>>>
>>>>> So - where does the factor of 3 between Map output bytes and
>>>>> FILE_BYTES_WRITTEN come from??? I thought about the replication factor of 3
>>>>> in the file system - but that should apply to HDFS only?!
>>>>>
>>>>> Thanks
>>>>> - tim
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>
>>
>
