yes of course. Agree with your analysis.
On May 7, 2006, at 1:50 PM, paul sutter (JIRA) wrote:
[ http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12378315 ]
paul sutter commented on HADOOP-195:
------------------------------------
eric,
most of my suggestions relate to the copy phase of the sort path,
not the sort itself. once that is working, i can make sort
suggestions (although my best sort suggestion is for you guys to
talk with david cossock about sorts).
this whole area is critical. on that cluster, owen's 2TB should
sort in 10 minutes, and the data should be copied in less than that
time, for a total run time of <20 minutes.
pleased that yahoo has resources to apply.
paul
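For a rough sense of what the 10-minute targets above imply, here is a back-of-the-envelope sketch. It assumes decimal terabytes (2 TB = 2e12 bytes); the node count is a hypothetical parameter, since the thread does not state the cluster size.

```python
def required_throughput(total_bytes, seconds, nodes):
    """Return (aggregate, per-node) throughput in bytes/second.

    `nodes` is a hypothetical cluster size; the thread does not give one.
    """
    aggregate = total_bytes / seconds
    return aggregate, aggregate / nodes


if __name__ == "__main__":
    # 2 TB moved in 10 minutes; 100 nodes is an illustrative assumption only.
    agg, per_node = required_throughput(2e12, 10 * 60, nodes=100)
    print(f"aggregate: {agg / 1e9:.2f} GB/s, per node: {per_node / 1e6:.1f} MB/s")
```

The same function applies to both the copy phase and the sort phase, since each is given a 10-minute budget over the same 2 TB.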
transfer map output transfer with http instead of rpc
-----------------------------------------------------
Key: HADOOP-195
URL: http://issues.apache.org/jira/browse/HADOOP-195
Project: Hadoop
Type: Improvement
Components: mapred
Versions: 0.2
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Fix For: 0.3
The map output should be transferred via http instead of rpc,
because rpc is very slow for this application and the timeout
behavior is suboptimal. (The server sends the data and the client
ignores it because it took more than 10 seconds to be received.)
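To illustrate the idea, here is a minimal sketch of a client pulling one map output over http and streaming it to disk. It is not Hadoop's implementation: the function name, URL layout, timeout value, and chunk size are all assumptions made for the example. The point it shows is that the socket timeout in `urllib` applies to each blocking read rather than to the whole transfer, so a large output that keeps making progress is not thrown away after it has been sent.

```python
import os
import shutil
import urllib.request


def fetch_map_output(url, dest_path, timeout_s=30.0, chunk_size=64 * 1024):
    """Stream one map output over HTTP into dest_path; return bytes written.

    Hypothetical helper for illustration. timeout_s bounds each blocking
    socket operation, not the total transfer time.
    """
    with urllib.request.urlopen(url, timeout=timeout_s) as resp, \
            open(dest_path, "wb") as out:
        # Copy in fixed-size chunks so the output never has to fit in memory.
        shutil.copyfileobj(resp, out, chunk_size)
    return os.path.getsize(dest_path)
```

A caller would invoke it once per map output, for example `fetch_map_output(url, "/tmp/map_0.out")`, where the URL is whatever the serving side exposes (hypothetical here).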