I'm having some trouble with one node of a 5-node cluster. Map tasks run successfully on all of them, but the reduce phase always stalls on one particular host, host5. Its reduce task throws a "Connection refused" exception when it tries to connect to itself to fetch map outputs. The only difference I can see between host5 and the other hosts is that on host5, its own hostname resolves to 127.0.0.1 instead of its external IP address. I can't imagine that should prevent it from connecting to itself, however. Has anyone else had a similar problem? Is there a document somewhere that describes the hostname-resolution requirements for nodes in a cluster?
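One way to confirm the resolution difference on each node is a small Java program like the following. It is a hypothetical diagnostic sketch, not part of Hadoop; the class and method names are made up for illustration. It prints what the JVM considers the local hostname and every address that name resolves to, flagging loopback results, and optionally does the same for names passed on the command line (e.g. `host5.test` run from another node).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic (not part of Hadoop): shows what this machine's own
// hostname resolves to, plus any hostnames passed on the command line.
public class HostResolutionCheck {

    // True for loopback addresses such as 127.0.0.1 or ::1.
    static boolean isLoopback(InetAddress addr) {
        return addr.isLoopbackAddress();
    }

    // Prints every address a name resolves to, flagging loopback results.
    static void report(String name) {
        try {
            for (InetAddress addr : InetAddress.getAllByName(name)) {
                String flag = isLoopback(addr) ? "  <-- loopback" : "";
                System.out.println(name + " -> " + addr.getHostAddress() + flag);
            }
        } catch (UnknownHostException e) {
            System.out.println(name + " -> cannot be resolved");
        }
    }

    public static void main(String[] args) {
        try {
            // The name this JVM considers its own hostname.
            report(InetAddress.getLocalHost().getHostName());
        } catch (UnknownHostException e) {
            System.out.println("local hostname cannot be resolved: " + e.getMessage());
        }
        // Optionally check other nodes' names, e.g. host5.test.
        for (String name : args) {
            report(name);
        }
    }
}
```

Running it on host5 and on one of the healthy nodes would show directly whether only host5 maps its own name to a loopback address.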

Thanks,
Brandon

Snippet of the log on host5.test, showing the reduce task failing to copy data from itself:

2008-12-16 12:25:12,532 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200812161640_0003_r_000004_0: Got 2 new map-outputs & number of known map outputs is 2
2008-12-16 12:25:12,532 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200812161640_0003_r_000004_0 Scheduled 1 of 2 known outputs (0 slow hosts and 1 dup hosts)
2008-12-16 12:25:12,533 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200812161640_0003_r_000004_0 copy failed: attempt_200812161640_0003_m_000003_0 from host5.test
2008-12-16 12:25:12,534 WARN org.apache.hadoop.mapred.ReduceTask: java.net.ConnectException: Connection refused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1296)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1290)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:944)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1143)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1084)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:997)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:946)
Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.Socket.connect(Socket.java:519)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:152)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
        at sun.net.www.http.HttpClient.New(HttpClient.java:306)
        at sun.net.www.http.HttpClient.New(HttpClient.java:323)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:788)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:729)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:654)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:977)
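The innermost frames in the trace above show the failure at the plain TCP connect (`Socket.connect` called from `HttpClient.openServer`), before any HTTP is exchanged. A hypothetical probe like the one below, run on host5 against both `host5.test` and host5's external IP at the TaskTracker's HTTP port (the port reducers fetch map outputs from; check `mapred.task.tracker.http.address` in your configuration rather than assuming a value), would show which address actually has a listener. `PortProbe` and its result categories are illustrative names, not a Hadoop tool.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical probe (not part of Hadoop): attempts a plain TCP connect to
// host:port, the same step that fails in the trace (Socket.connect).
public class PortProbe {

    enum Result { OPEN, REFUSED, TIMEOUT, OTHER_ERROR }

    static Result probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return Result.OPEN;
        } catch (ConnectException e) {
            // Typically "Connection refused": nothing listening on that address/port.
            return Result.REFUSED;
        } catch (SocketTimeoutException e) {
            // No answer within the timeout (e.g. packets silently dropped).
            return Result.TIMEOUT;
        } catch (IOException e) {
            // Anything else, e.g. an unresolvable hostname.
            return Result.OTHER_ERROR;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("usage: java PortProbe <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port + " -> " + probe(host, port, 2000));
    }
}
```

If `host5.test` reports REFUSED while the external IP reports OPEN on host5, that would point at the loopback mapping rather than at the TaskTracker itself.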
