Re: How are intermediate key/value pairs materialized between map and reduce?

2010-02-24 Thread Ed Mazur
As you noticed, your map tasks are spilling three times as many
records as they are outputting. In general, if the map output buffer
is large enough to hold all records in memory, these values will be
equal. If there isn't enough room, as was the case with your job, the
map task makes additional intermediate spills, which later have to be
merged into a single map output file.
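
For a rough sanity check, the arithmetic lines up with the factor of 3
you are seeing. Here is a back-of-the-envelope sketch (illustration
only, not Hadoop code; it assumes the whole buffer holds record data
and ignores io.sort.record.percent and the real merge scheduler's
shortcuts), plugging in the io.sort.mb = 100, io.sort.factor = 10, and
40 mappers mentioned elsewhere in this thread:

public class SpillEstimate {
    public static void main(String[] args) {
        long mapOutputBytes = 307216800000L;    // the "Map output bytes" counter
        int mappers = 40;
        long bufferBytes = 100L * 1024 * 1024;  // io.sort.mb = 100
        int sortFactor = 10;                    // io.sort.factor = 10

        long perMapper = mapOutputBytes / mappers;                  // ~7.7 GB per map task
        long spills = (perMapper + bufferBytes - 1) / bufferBytes;  // ~74 spill files

        int writesPerRecord = 1;                // the initial spill write
        long files = spills;
        while (files > 1) {                     // each merge pass rewrites every record once
            files = (files + sortFactor - 1) / sortFactor;
            writesPerRecord++;
        }
        // 74 spills merged 10 at a time go 74 -> 8 -> 1: two merge passes,
        // so every record hits local disk about three times.
        System.out.println("spill files: " + spills
            + ", writes per record: " + writesPerRecord);
    }
}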

To fix this, you can try tuning the per-job configurables io.sort.mb
and io.sort.record.percent. Look at the counters of a few map tasks to
get an idea of how much data (which informs io.sort.mb) and how many
records (which informs io.sort.record.percent) they produce.
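
For example (a hypothetical snippet using the 0.20-era property names;
the exact values are placeholders you would derive from those
counters, not recommendations):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TunedJobSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Larger in-memory sort buffer (MB); it must still fit in the task
        // heap set via mapred.child.java.opts.
        conf.setInt("io.sort.mb", 200);
        // With ~500KB records there are few records per buffer, so less of
        // the buffer needs to be reserved for per-record accounting.
        conf.setFloat("io.sort.record.percent", 0.01f);
        Job job = new Job(conf, "tuned job");  // placeholder job setup
        // ... set mapper, reducer, and input/output paths as usual ...
        job.waitForCompletion(true);
    }
}

Since io.sort.mb can never cover the ~7.5GB each of your mappers
produces, some spilling is unavoidable here; raising io.sort.factor is
the other lever, since it cuts down the number of merge passes over
the spill files.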

Ed

On Wed, Feb 24, 2010 at 2:45 AM, Tim Kiefer tim-kie...@gmx.de wrote:
 Sure,
 I see:
 Map input records: 10,000
 Map output records: 600,000
 Map output bytes: 307,216,800,000 (each record is about 500KB - that fits
 the application and is to be expected)

 Map spilled records: 1,802,965 (ahhh... now that you ask for it - here there
 also is a factor of 3 between output and spilled).

 So - question now is: why are three times as many records spilled as
 are actually produced by the mappers?

 In my map function, I do not perform any additional file writing besides the
 context.write() for the intermediate records.

 Thanks, Tim

 On 24.02.2010 05:28, Amogh Vasekar wrote:

 Hi,
 Can you let us know what are the values for:
 Map input records
 Map spilled records
 Map output bytes
 Is there any side-effect file written?

 Thanks,
 Amogh


 On 2/23/10 8:57 PM, Tim Kiefer tim-kie...@gmx.de wrote:

 No... 900GB is in the map column. Reduce adds another ~70GB of
 FILE_BYTES_WRITTEN and the total column consequently shows ~970GB.

 On 23.02.2010 16:11, Ed Mazur wrote:


 Hi Tim,

 I'm guessing a lot of these writes are happening on the reduce side.
 On the JT web interface, there are three columns: map, reduce,
 overall. Is the 900GB figure from the overall column? The value in the
 map column will probably be closer to what you were expecting. There
 are writes on the reduce side too during the shuffle and multi-pass
 merge.

 Ed

 2010/2/23 Tim Kiefer tim-kie...@gmx.de:



 Hi Gang,

 thanks for your reply.

 To clarify: I look at the statistics through the job tracker. In the
 web interface for my job I have columns for map, reduce and total. What I
 was referring to is map - i.e. I see FILE_BYTES_WRITTEN = 3 * Map
 Output Bytes in the map column.

 About the replication factor: I would expect the exact same thing -
 changing to 6 has no influence on FILE_BYTES_WRITTEN.

 About the sorting: I have io.sort.mb = 100 and io.sort.factor = 10.
 Furthermore, I have 40 mappers and map output data is ~300GB. I can't
 see how that ends up in a factor of 3?

 - tim

 On 23.02.2010 14:39, Gang Luo wrote:



 Hi Tim,
 the intermediate data is materialized to the local file system. Before it
 is available for reducers, mappers sort it. If the buffer
 (io.sort.mb) is too small for the intermediate data, multi-phase sorting
 happens, which means the same bits are read and written more than once.

 Besides, are you looking at the statistics per mapper through the job
 tracker, or just the information output when a job finishes? If you look at
 the information given out at the end of the job, note that these are
 overall statistics, which include sorting on the reduce side. They also
 include the amount of data written to HDFS (I am not 100% sure).

 And, FILE_BYTES_WRITTEN has nothing to do with the replication
 factor. I think if you change the factor to 6, FILE_BYTES_WRITTEN is still
 the same.

  -Gang


 Hi there,

 Can anybody help me out with a (most likely) simple point of confusion?

 I am wondering how intermediate key/value pairs are materialized. I
 have a job where the map phase produces 600,000 records and map output 
 bytes
 is ~300GB. What I thought (up to now) is that these 600,000 records, i.e.,
 300GB, are materialized locally by the mappers and that later on reducers
 pull these records (based on the key).
 What I see (and cannot explain) is that the FILE_BYTES_WRITTEN counter
 is as high as ~900GB.

 So - where does the factor of 3 come from between Map output bytes and
 FILE_BYTES_WRITTEN??? I thought about the replication factor of 3 in the
 file system - but that should be HDFS only?!

 Thanks
 - tim










Re: How are intermediate key/value pairs materialized between map and reduce?

2010-02-24 Thread Amogh Vasekar
Hi,
Map spilled records: 1,802,965 (ahhh... now that you ask for it - here there 
also is a factor of 3 between output and spilled).
Exactly what I suspected :)
Ed has already provided some pointers as to why this is the case. You should
try to minimize this number as much as possible, since this, along with the
Reduce Shuffle Bytes, degrades your job performance by a considerable amount.
To understand the internals and what Ed said, I would strongly recommend going
through
http://www.slideshare.net/gnap/berkeley-performance-tuning
by a few fellow Yahoos. There is a detailed explanation of why map-side spills
occur and how one can minimize them :)

Thanks,
Amogh

On 2/24/10 1:15 PM, Tim Kiefer tim-kie...@gmx.de wrote:

Sure,
I see:
Map input records: 10,000
Map output records: 600,000
Map output bytes: 307,216,800,000 (each record is about 500KB - that
fits the application and is to be expected)

Map spilled records: 1,802,965 (ahhh... now that you ask for it - here
there also is a factor of 3 between output and spilled).

So - question now is: why are three times as many records spilled as
are actually produced by the mappers?

In my map function, I do not perform any additional file writing besides
the context.write() for the intermediate records.

Thanks, Tim

On 24.02.2010 05:28, Amogh Vasekar wrote:
 Hi,
 Can you let us know what are the values for:
 Map input records
 Map spilled records
 Map output bytes
 Is there any side-effect file written?

 Thanks,
 Amogh





How are intermediate key/value pairs materialized between map and reduce?

2010-02-23 Thread Tim Kiefer

Hi there,

Can anybody help me out with a (most likely) simple point of confusion?

I am wondering how intermediate key/value pairs are materialized. I have 
a job where the map phase produces 600,000 records and map output bytes 
is ~300GB. What I thought (up to now) is that these 600,000 records, 
i.e., 300GB, are materialized locally by the mappers and that later on 
reducers pull these records (based on the key).
What I see (and cannot explain) is that the FILE_BYTES_WRITTEN counter 
is as high as ~900GB.


So - where does the factor of 3 come from between Map output bytes and
FILE_BYTES_WRITTEN??? I thought about the replication factor of 3 in the 
file system - but that should be HDFS only?!


Thanks
- tim


Re: How are intermediate key/value pairs materialized between map and reduce?

2010-02-23 Thread Gang Luo
Hi Tim,
the intermediate data is materialized to the local file system. Before it is
available for reducers, mappers sort it. If the buffer (io.sort.mb) is
too small for the intermediate data, multi-phase sorting happens, which means
the same bits are read and written more than once.
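
As a toy illustration of that multi-phase merging (illustration only,
not Hadoop's actual merger): with a merge factor F, the number of
sorted runs shrinks by a factor of F per pass, and every pass reads
and rewrites all of the data once.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MultiPhaseSort {
    // Combine sorted runs at most 'factor' at a time until one remains.
    static List<Integer> merge(List<List<Integer>> runs, int factor) {
        int passes = 0;
        while (runs.size() > 1) {
            List<List<Integer>> next = new ArrayList<>();
            for (int i = 0; i < runs.size(); i += factor) {
                List<Integer> merged = new ArrayList<>();
                for (List<Integer> run
                        : runs.subList(i, Math.min(i + factor, runs.size()))) {
                    merged.addAll(run);
                }
                Collections.sort(merged);  // stand-in for a streaming k-way merge
                next.add(merged);
            }
            runs = next;
            passes++;  // one more read+write of every record
        }
        System.out.println("merge passes: " + passes);
        return runs.get(0);
    }

    public static void main(String[] args) {
        // 74 one-element "runs" stand in for 74 on-disk spill files.
        List<List<Integer>> runs = new ArrayList<>();
        for (int i = 74; i > 0; i--) runs.add(Collections.singletonList(i));
        merge(runs, 10);  // prints "merge passes: 2" -> spill + 2 rewrites = 3 writes
    }
}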

Besides, are you looking at the statistics per mapper through the job tracker,
or just the information output when a job finishes? If you look at the
information given out at the end of the job, note that these are overall
statistics, which include sorting on the reduce side. They also include the
amount of data written to HDFS (I am not 100% sure).

And, FILE_BYTES_WRITTEN has nothing to do with the replication factor. I
think if you change the factor to 6, FILE_BYTES_WRITTEN is still the same.

 -Gang


----- Original Message -----
From: Tim Kiefer tim-kie...@gmx.de
To: common-user@hadoop.apache.org
Date: 2010/2/23 (Tue) 6:44:28 AM
Subject: How are intermediate key/value pairs materialized between map and reduce?

Hi there,

Can anybody help me out with a (most likely) simple point of confusion?

I am wondering how intermediate key/value pairs are materialized. I have a job 
where the map phase produces 600,000 records and map output bytes is ~300GB. 
What I thought (up to now) is that these 600,000 records, i.e., 300GB, are 
materialized locally by the mappers and that later on reducers pull these 
records (based on the key).
What I see (and cannot explain) is that the FILE_BYTES_WRITTEN counter is as 
high as ~900GB.

So - where does the factor of 3 come from between Map output bytes and
FILE_BYTES_WRITTEN??? I thought about the replication factor of 3 in the file 
system - but that should be HDFS only?!

Thanks
- tim





Re: How are intermediate key/value pairs materialized between map and reduce?

2010-02-23 Thread Tim Kiefer
Hi Gang,

thanks for your reply.

To clarify: I look at the statistics through the job tracker. In the
 web interface for my job I have columns for map, reduce and total. What I
 was referring to is map - i.e. I see FILE_BYTES_WRITTEN = 3 * Map
Output Bytes in the map column.

About the replication factor: I would expect the exact same thing -
changing to 6 has no influence on FILE_BYTES_WRITTEN.

About the sorting: I have io.sort.mb = 100 and io.sort.factor = 10.
Furthermore, I have 40 mappers and map output data is ~300GB. I can't
 see how that ends up in a factor of 3?

- tim

On 23.02.2010 14:39, Gang Luo wrote:
 Hi Tim,
 the intermediate data is materialized to the local file system. Before it is
 available for reducers, mappers sort it. If the buffer (io.sort.mb) is
 too small for the intermediate data, multi-phase sorting happens, which means
 the same bits are read and written more than once.

 Besides, are you looking at the statistics per mapper through the job
 tracker, or just the information output when a job finishes? If you look at
 the information given out at the end of the job, note that these are overall
 statistics, which include sorting on the reduce side. They also include the
 amount of data written to HDFS (I am not 100% sure).

 And, FILE_BYTES_WRITTEN has nothing to do with the replication factor. I
 think if you change the factor to 6, FILE_BYTES_WRITTEN is still the same.

  -Gang


 ----- Original Message -----
 From: Tim Kiefer tim-kie...@gmx.de
 To: common-user@hadoop.apache.org
 Date: 2010/2/23 (Tue) 6:44:28 AM
 Subject: How are intermediate key/value pairs materialized between map and
 reduce?

 Hi there,

 Can anybody help me out with a (most likely) simple point of confusion?

 I am wondering how intermediate key/value pairs are materialized. I have a 
 job where the map phase produces 600,000 records and map output bytes is 
 ~300GB. What I thought (up to now) is that these 600,000 records, i.e., 
 300GB, are materialized locally by the mappers and that later on reducers 
 pull these records (based on the key).
 What I see (and cannot explain) is that the FILE_BYTES_WRITTEN counter is as 
 high as ~900GB.

 So - where does the factor of 3 come from between Map output bytes and
 FILE_BYTES_WRITTEN??? I thought about the replication factor of 3 in the file 
 system - but that should be HDFS only?!

 Thanks
 - tim





Re: How are intermediate key/value pairs materialized between map and reduce?

2010-02-23 Thread Ed Mazur
Hi Tim,

I'm guessing a lot of these writes are happening on the reduce side.
On the JT web interface, there are three columns: map, reduce,
overall. Is the 900GB figure from the overall column? The value in the
map column will probably be closer to what you were expecting. There
are writes on the reduce side too during the shuffle and multi-pass
merge.

Ed

2010/2/23 Tim Kiefer tim-kie...@gmx.de:
 Hi Gang,

 thanks for your reply.

 To clarify: I look at the statistics through the job tracker. In the
 web interface for my job I have columns for map, reduce and total. What I
 was referring to is map - i.e. I see FILE_BYTES_WRITTEN = 3 * Map
 Output Bytes in the map column.

 About the replication factor: I would expect the exact same thing -
 changing to 6 has no influence on FILE_BYTES_WRITTEN.

 About the sorting: I have io.sort.mb = 100 and io.sort.factor = 10.
 Furthermore, I have 40 mappers and map output data is ~300GB. I can't
 see how that ends up in a factor of 3?

 - tim

 On 23.02.2010 14:39, Gang Luo wrote:
 Hi Tim,
 the intermediate data is materialized to the local file system. Before it is
 available for reducers, mappers sort it. If the buffer (io.sort.mb)
 is too small for the intermediate data, multi-phase sorting happens, which
 means the same bits are read and written more than once.

 Besides, are you looking at the statistics per mapper through the job
 tracker, or just the information output when a job finishes? If you look at
 the information given out at the end of the job, note that these are
 overall statistics, which include sorting on the reduce side. They also
 include the amount of data written to HDFS (I am not 100% sure).

 And, FILE_BYTES_WRITTEN has nothing to do with the replication factor. I
 think if you change the factor to 6, FILE_BYTES_WRITTEN is still the same.

  -Gang


 ----- Original Message -----
 From: Tim Kiefer tim-kie...@gmx.de
 To: common-user@hadoop.apache.org
 Date: 2010/2/23 (Tue) 6:44:28 AM
 Subject: How are intermediate key/value pairs materialized between map and
 reduce?

 Hi there,

 Can anybody help me out with a (most likely) simple point of confusion?

 I am wondering how intermediate key/value pairs are materialized. I have a 
 job where the map phase produces 600,000 records and map output bytes is 
 ~300GB. What I thought (up to now) is that these 600,000 records, i.e., 
 300GB, are materialized locally by the mappers and that later on reducers 
 pull these records (based on the key).
 What I see (and cannot explain) is that the FILE_BYTES_WRITTEN counter is as 
 high as ~900GB.

 So - where does the factor of 3 come from between Map output bytes and
 FILE_BYTES_WRITTEN??? I thought about the replication factor of 3 in the 
 file system - but that should be HDFS only?!

 Thanks
 - tim







Re: How are intermediate key/value pairs materialized between map and reduce?

2010-02-23 Thread Tim Kiefer
No... 900GB is in the map column. Reduce adds another ~70GB of
FILE_BYTES_WRITTEN and the total column consequently shows ~970GB.

On 23.02.2010 16:11, Ed Mazur wrote:
 Hi Tim,

 I'm guessing a lot of these writes are happening on the reduce side.
 On the JT web interface, there are three columns: map, reduce,
 overall. Is the 900GB figure from the overall column? The value in the
 map column will probably be closer to what you were expecting. There
 are writes on the reduce side too during the shuffle and multi-pass
 merge.

 Ed

 2010/2/23 Tim Kiefer tim-kie...@gmx.de:
   
 Hi Gang,

 thanks for your reply.

 To clarify: I look at the statistics through the job tracker. In the
 web interface for my job I have columns for map, reduce and total. What I
 was referring to is map - i.e. I see FILE_BYTES_WRITTEN = 3 * Map
 Output Bytes in the map column.

 About the replication factor: I would expect the exact same thing -
 changing to 6 has no influence on FILE_BYTES_WRITTEN.

 About the sorting: I have io.sort.mb = 100 and io.sort.factor = 10.
 Furthermore, I have 40 mappers and map output data is ~300GB. I can't
 see how that ends up in a factor of 3?

 - tim

 On 23.02.2010 14:39, Gang Luo wrote:
 Hi Tim,
 the intermediate data is materialized to the local file system. Before it is
 available for reducers, mappers sort it. If the buffer (io.sort.mb)
 is too small for the intermediate data, multi-phase sorting happens, which
 means the same bits are read and written more than once.

 Besides, are you looking at the statistics per mapper through the job
 tracker, or just the information output when a job finishes? If you look at
 the information given out at the end of the job, note that these are
 overall statistics, which include sorting on the reduce side. They also
 include the amount of data written to HDFS (I am not 100% sure).

 And, FILE_BYTES_WRITTEN has nothing to do with the replication factor.
 I think if you change the factor to 6, FILE_BYTES_WRITTEN is still the same.

  -Gang


 ----- Original Message -----
 From: Tim Kiefer tim-kie...@gmx.de
 To: common-user@hadoop.apache.org
 Date: 2010/2/23 (Tue) 6:44:28 AM
 Subject: How are intermediate key/value pairs materialized between map and
 reduce?

 Hi there,

 Can anybody help me out with a (most likely) simple point of confusion?

 I am wondering how intermediate key/value pairs are materialized. I have a 
 job where the map phase produces 600,000 records and map output bytes is 
 ~300GB. What I thought (up to now) is that these 600,000 records, i.e., 
 300GB, are materialized locally by the mappers and that later on reducers 
 pull these records (based on the key).
 What I see (and cannot explain) is that the FILE_BYTES_WRITTEN counter is 
 as high as ~900GB.

 So - where does the factor of 3 come from between Map output bytes and
 FILE_BYTES_WRITTEN??? I thought about the replication factor of 3 in the 
 file system - but that should be HDFS only?!

 Thanks
 - tim





Re: How are intermediate key/value pairs materialized between map and reduce?

2010-02-23 Thread Amogh Vasekar
Hi,
Can you let us know what are the values for:
Map input records
Map spilled records
Map output bytes
Is there any side-effect file written?

Thanks,
Amogh


On 2/23/10 8:57 PM, Tim Kiefer tim-kie...@gmx.de wrote:

No... 900GB is in the map column. Reduce adds another ~70GB of
FILE_BYTES_WRITTEN and the total column consequently shows ~970GB.

 On 23.02.2010 16:11, Ed Mazur wrote:
 Hi Tim,

 I'm guessing a lot of these writes are happening on the reduce side.
 On the JT web interface, there are three columns: map, reduce,
 overall. Is the 900GB figure from the overall column? The value in the
 map column will probably be closer to what you were expecting. There
 are writes on the reduce side too during the shuffle and multi-pass
 merge.

 Ed

 2010/2/23 Tim Kiefer tim-kie...@gmx.de:

 Hi Gang,

 thanks for your reply.

 To clarify: I look at the statistics through the job tracker. In the
 web interface for my job I have columns for map, reduce and total. What I
 was referring to is map - i.e. I see FILE_BYTES_WRITTEN = 3 * Map
 Output Bytes in the map column.

 About the replication factor: I would expect the exact same thing -
 changing to 6 has no influence on FILE_BYTES_WRITTEN.

 About the sorting: I have io.sort.mb = 100 and io.sort.factor = 10.
 Furthermore, I have 40 mappers and map output data is ~300GB. I can't
 see how that ends up in a factor of 3?

 - tim

 On 23.02.2010 14:39, Gang Luo wrote:

 Hi Tim,
 the intermediate data is materialized to the local file system. Before it is
 available for reducers, mappers sort it. If the buffer (io.sort.mb)
 is too small for the intermediate data, multi-phase sorting happens, which
 means the same bits are read and written more than once.

 Besides, are you looking at the statistics per mapper through the job
 tracker, or just the information output when a job finishes? If you look at
 the information given out at the end of the job, note that these are
 overall statistics, which include sorting on the reduce side. They also
 include the amount of data written to HDFS (I am not 100% sure).

 And, FILE_BYTES_WRITTEN has nothing to do with the replication factor.
 I think if you change the factor to 6, FILE_BYTES_WRITTEN is still the same.

  -Gang


 Hi there,

 Can anybody help me out with a (most likely) simple point of confusion?

 I am wondering how intermediate key/value pairs are materialized. I have a 
 job where the map phase produces 600,000 records and map output bytes is 
 ~300GB. What I thought (up to now) is that these 600,000 records, i.e., 
 300GB, are materialized locally by the mappers and that later on reducers 
 pull these records (based on the key).
 What I see (and cannot explain) is that the FILE_BYTES_WRITTEN counter is 
 as high as ~900GB.

 So - where does the factor of 3 come from between Map output bytes and
 FILE_BYTES_WRITTEN??? I thought about the replication factor of 3 in the 
 file system - but that should be HDFS only?!

 Thanks
 - tim






Re: How are intermediate key/value pairs materialized between map and reduce?

2010-02-23 Thread Tim Kiefer

Sure,
I see:
Map input records: 10,000
Map output records: 600,000
Map output bytes: 307,216,800,000 (each record is about 500KB - that
fits the application and is to be expected)


Map spilled records: 1,802,965 (ahhh... now that you ask for it - here 
there also is a factor of 3 between output and spilled).


So - question now is: why are three times as many records spilled as
actually produced by the mappers?


In my map function, I do not perform any additional file writing besides 
the context.write() for the intermediate records.


Thanks, Tim

On 24.02.2010 05:28, Amogh Vasekar wrote:

Hi,
Can you let us know what are the values for:
Map input records
Map spilled records
Map output bytes
Is there any side-effect file written?

Thanks,
Amogh


On 2/23/10 8:57 PM, Tim Kiefer tim-kie...@gmx.de wrote:

No... 900GB is in the map column. Reduce adds another ~70GB of
FILE_BYTES_WRITTEN and the total column consequently shows ~970GB.

On 23.02.2010 16:11, Ed Mazur wrote:

Hi Tim,

I'm guessing a lot of these writes are happening on the reduce side.
On the JT web interface, there are three columns: map, reduce,
overall. Is the 900GB figure from the overall column? The value in the
map column will probably be closer to what you were expecting. There
are writes on the reduce side too during the shuffle and multi-pass
merge.

Ed

2010/2/23 Tim Kiefer tim-kie...@gmx.de:

 

Hi Gang,

thanks for your reply.

To clarify: I look at the statistics through the job tracker. In the
web interface for my job I have columns for map, reduce and total. What I
was referring to is map - i.e. I see FILE_BYTES_WRITTEN = 3 * Map
Output Bytes in the map column.

About the replication factor: I would expect the exact same thing -
changing to 6 has no influence on FILE_BYTES_WRITTEN.

About the sorting: I have io.sort.mb = 100 and io.sort.factor = 10.
Furthermore, I have 40 mappers and map output data is ~300GB. I can't
see how that ends up in a factor of 3?

- tim

On 23.02.2010 14:39, Gang Luo wrote:

   

Hi Tim,
the intermediate data is materialized to the local file system. Before it is
available for reducers, mappers sort it. If the buffer (io.sort.mb) is
too small for the intermediate data, multi-phase sorting happens, which means
the same bits are read and written more than once.

Besides, are you looking at the statistics per mapper through the job tracker,
or just the information output when a job finishes? If you look at the
information given out at the end of the job, note that these are overall
statistics, which include sorting on the reduce side. They also include the
amount of data written to HDFS (I am not 100% sure).

And, FILE_BYTES_WRITTEN has nothing to do with the replication factor. I
think if you change the factor to 6, FILE_BYTES_WRITTEN is still the same.

  -Gang


Hi there,

Can anybody help me out with a (most likely) simple point of confusion?

I am wondering how intermediate key/value pairs are materialized. I have a job 
where the map phase produces 600,000 records and map output bytes is ~300GB. 
What I thought (up to now) is that these 600,000 records, i.e., 300GB, are 
materialized locally by the mappers and that later on reducers pull these 
records (based on the key).
What I see (and cannot explain) is that the FILE_BYTES_WRITTEN counter is as 
high as ~900GB.

So - where does the factor of 3 come from between Map output bytes and
FILE_BYTES_WRITTEN??? I thought about the replication factor of 3 in the file 
system - but that should be HDFS only?!

Thanks
- tim