If you have IPv6 enabled on your Hadoop cluster, it may cause a severe
performance hit!

When I ran the gridmix2 benchmark on a newly constructed cluster, it took
30% longer than the baseline obtained on a similar cluster.

I noticed that some task processes on some machines took 3+ minutes to
initialize.
After examining these processes in detail, I found that they were stuck at
socket initialization time, as shown in the following stack trace:

"main" prio=10 tid=0x0805b400 nid=0x4681 runnable [0xf7fbb000..0xf7fbc208]
   java.lang.Thread.State: RUNNABLE
    at java.net.PlainSocketImpl.initProto(Native Method)
    at java.net.PlainSocketImpl.<clinit>(PlainSocketImpl.java:84)
    at java.net.Socket.setImpl(Socket.java:434)
    at java.net.Socket.<init>(Socket.java:68)
    at sun.nio.ch.SocketAdaptor.<init>(SocketAdaptor.java:50)
    at sun.nio.ch.SocketAdaptor.create(SocketAdaptor.java:55)
    at sun.nio.ch.SocketChannelImpl.socket(SocketChannelImpl.java:105)
    - locked <0xf17a38c8> (a java.lang.Object)
    at org.apache.hadoop.net.StandardSocketFactory.createSocket(StandardSocketFactory.java:58)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:298)
    - locked <0xf1795db0> (a org.apache.hadoop.ipc.Client$Connection)
    at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:178)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:820)
    at org.apache.hadoop.ipc.Client.call(Client.java:705)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:348)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:335)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:372)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2188)
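
To check whether a given machine shows the same stall, one option is to time
the creation of the first socket in a fresh JVM, since that is the call that
triggers the initialization seen in the trace above. The following is only a
minimal sketch: FirstSocketTimer is a hypothetical name, not part of Hadoop,
and on newer JVMs the socket implementation may initialize differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

// Hypothetical diagnostic (not part of Hadoop): times the creation of the
// first socket in this JVM.
class FirstSocketTimer {

    // Creates one unconnected Socket and returns how long the constructor
    // took, in milliseconds. No network connection is attempted by this code.
    static long timeFirstSocketMillis() {
        long start = System.nanoTime();
        Socket socket = new Socket();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        try {
            socket.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return elapsedMillis;
    }

    public static void main(String[] args) {
        // Prints null if the property was not set on the command line.
        System.out.println("java.net.preferIPv4Stack = "
                + System.getProperty("java.net.preferIPv4Stack"));
        System.out.println("first socket took "
                + timeFirstSocketMillis() + " ms");
    }
}
```

Running it once with and once without -Djava.net.preferIPv4Stack=true on an
affected machine should make any difference in startup time visible.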


I searched the web and found that this is due to a known Java bug related
to IPv6.

More information about the bug can be found at the following two pages:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6483406
http://edocs.bea.com/jrockit/releases/5026x/relnotes/relnotes.html

{quote}
Slow startup because of a hang in java.net.PlainSocketImpl.initProto(),
which typically is called when creating the first Socket or ServerSocket.
In BEA JRockit 5.0 R26 the network stack is configured so that IPv6 is
used in preference to IPv4 when it is present. During initialization of
the network stack, the network code connects a socket to its own loopback
interface to set up some data structures. Blocking this connection (e.g.
with a firewall) will cause the initialization code to wait for a socket
timeout, after which the system falls back on using IPv4.
{quote}

Suggested Workaround:

Either set -Djava.net.preferIPv4Stack=true in the child process options,
which forces Java to use IPv4 instead, or disable IPv6 entirely on the
system. The proper fix is to allow IPv6 traffic from localhost to localhost.
For more information, see the Sun documentation:
http://java.sun.com/j2se/1.4.2/docs/guide/net/ipv6_guide/#ipv6-networking
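
As an illustration of the first workaround, the flag has to end up in the JVM
options string used to launch the child tasks. I am assuming here that "the
child process option" means the mapred.child.java.opts property; please check
that against your Hadoop version. ChildJvmOpts below is a hypothetical helper,
not a Hadoop class, and is only a sketch of appending the flag without
clobbering options that are already set.

```java
// Hypothetical helper (not part of Hadoop): adds the IPv4 preference flag to
// a JVM options string such as the value of mapred.child.java.opts (assumed
// property name).
class ChildJvmOpts {

    static final String PREFER_IPV4 = "-Djava.net.preferIPv4Stack=true";

    // Returns opts with the flag appended. If the property is already set
    // (to either value), the string is returned trimmed but otherwise as-is.
    static String withIPv4Preferred(String opts) {
        String trimmed = (opts == null) ? "" : opts.trim();
        if (trimmed.contains("-Djava.net.preferIPv4Stack=")) {
            return trimmed;
        }
        return trimmed.isEmpty() ? PREFER_IPV4 : trimmed + " " + PREFER_IPV4;
    }

    public static void main(String[] args) {
        // Prints: -Xmx200m -Djava.net.preferIPv4Stack=true
        System.out.println(withIPv4Preferred("-Xmx200m"));
    }
}
```

The resulting string is what you would put into the cluster or job
configuration for the child JVMs.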



Runping

