[ 
http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12378703 ] 

paul sutter commented on HADOOP-195:
------------------------------------


Owen,

Are you still using 64,000 mappers? If so, wouldn't your average map output file 
size be around 80KB?

I'd suggest cutting the number of mappers by a factor of 10 or 50.

If you had 1880 mappers and 376 reducers, your map output files would be 2.8MB 
each, which might be better than 80KB.

You might try 1800 mappers and 350 reducers, so that you have spare capacity on 
your nodes for failed mappers or reducers (giving you 3MB map output files).
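
A rough sketch of the arithmetic behind those file sizes, in Python. It 
assumes each mapper writes one output file per reducer and that the data is 
spread evenly. The total data size is not stated here, so it is back-computed 
from the 1880 x 376 x 2.8MB figure, and the 376 reducers for the 64,000-mapper 
case is also an assumption. The function and variable names are made up for 
illustration.

```python
KB = 1_000
MB = 1_000_000


def avg_map_output_file_size(total_bytes, num_mappers, num_reducers):
    """Average bytes per map output file, assuming one file per
    (mapper, reducer) pair and evenly distributed data."""
    return total_bytes / (num_mappers * num_reducers)


# Assumption: total map output inferred from 1880 mappers x 376 reducers
# x 2.8 MB per file (roughly 2 TB); it is not given directly in the thread.
total_bytes = 1880 * 376 * 2.8 * MB

# (mappers, reducers) configurations discussed above; the reducer count
# for the 64,000-mapper case is assumed to be 376.
configs = [(64_000, 376), (1880, 376), (1800, 350)]

for mappers, reducers in configs:
    size = avg_map_output_file_size(total_bytes, mappers, reducers)
    print(f"{mappers:>6} mappers x {reducers} reducers: "
          f"{mappers * reducers:>10,} files, ~{size / KB:,.0f} KB each")
```

Under those assumptions this gives roughly 82KB, 2.8MB and 3.1MB per file for 
the three configurations, in line with the numbers above.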

Has anyone measured the map-task creation overhead? Does anyone know the file 
creation/deletion overhead on Linux? Each of those little files is created, 
written, read, and deleted twice in the current code, each time at that 
tiny file size.
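
One crude way to get a number for the per-file overhead is to time the 
create/write/read/delete cycle directly. The sketch below is not Hadoop code. 
It is a standalone Python micro-benchmark with made-up names, and it will 
mostly hit the OS page cache, so treat the result as a lower bound on what a 
loaded node would see.

```python
import os
import tempfile
import time


def time_small_file_cycle(num_files=1000, file_size=80 * 1024):
    """Return average seconds per create/write/read/delete cycle
    for files of file_size bytes in a temporary directory."""
    payload = b"x" * file_size
    with tempfile.TemporaryDirectory() as tmpdir:
        start = time.perf_counter()
        for i in range(num_files):
            path = os.path.join(tmpdir, f"part-{i}")
            with open(path, "wb") as f:   # create and write
                f.write(payload)
            with open(path, "rb") as f:   # read back
                f.read()
            os.remove(path)               # delete
        elapsed = time.perf_counter() - start
    return elapsed / num_files


per_file = time_small_file_cycle()
print(f"~{per_file * 1e6:.0f} microseconds per 80KB file cycle")
```

Multiplying the per-file figure by the number of map output files (and by two, 
since each file goes through the cycle twice) gives a ballpark for the total 
overhead.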

Paul

> transfer map output transfer with http instead of rpc
> -----------------------------------------------------
>
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement

>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3
>  Attachments: netstat.log, netstat.xls
>
> The data transfer of the map output should be transferred via http instead 
> of rpc, because rpc is very slow for this application and the timeout behavior 
> is suboptimal. (server sends data and client ignores it because it took more 
> than 10 seconds to be received.)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
