Hi Marc,

Your error is not related to the transfer thread limit (xceivers). You're hitting a "ulimit -n" cap at your DataNode, i.e. the system's maximum number of open files allowed for the user running the DN process.
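To confirm this is the bottleneck, both the cap and the current descriptor usage can be read from /proc. A minimal shell sketch; DN_PID is a placeholder you'd normally fill via something like pgrep, and the fallback to the current shell's own PID is only so the snippet runs anywhere:

```shell
# Sketch: inspect the fd limit and current fd usage of a process
# (the DataNode, in this case). DN_PID is an assumption -- obtain it
# with e.g.: DN_PID=$(pgrep -f datanode.DataNode | head -n1)
# We fall back to the current shell's PID so the snippet is runnable as-is.
PID="${DN_PID:-$$}"

# The effective 'ulimit -n' cap for that process:
grep 'Max open files' "/proc/$PID/limits"

# How many descriptors it actually has open right now:
ls "/proc/$PID/fd" | wc -l
```

If the second number is close to the soft limit shown by the first command, raising the limit (and restarting the DN) is the fix, not more xceivers.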
Check what your limits say for 'Max open files' in /proc/<DN PID>/limits and raise it if it's proving insufficient today. Also do try to upgrade your cluster to a more recent release, as there were some improvements on this front you can benefit from.

On Thu, Feb 23, 2012 at 8:56 PM, Marc Sturlese <marc.sturl...@gmail.com> wrote:
> Hey there,
> I've been running a cluster for about a year (about 20 machines). I've run
> many concurrent jobs there, some of them with MultipleOutputs, and never had
> any problem (the MultipleOutputs were creating just 3 or 4 different outputs).
> Now I have a job with MultipleOutputs that creates 100 different outputs, and
> it always ends up with errors.
> Tasks start throwing these errors:
>
> java.io.IOException: Bad connect ack with firstBadLink 10.2.0.154:50010
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2963)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2888)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2139)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2329)
>
> or:
>
> java.io.EOFException
>         at java.io.DataInputStream.readByte(DataInputStream.java:250)
>         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>         at org.apache.hadoop.io.Text.readString(Text.java:400)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2961)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2888)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2139)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2329)
>
> Checking the datanode log I see this error hundreds of times:
>
> 2012-02-23 14:22:56,008 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reopen already-open Block for append blk_3368446040000470452_29464903
> 2012-02-23 14:22:56,008 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_3368446040000470452_29464903 received exception java.net.SocketException: Too many open files
> 2012-02-23 14:22:56,008 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.2.0.156:50010, storageID=DS-1194175480-10.2.0.156-50010-1329304363220, infoPort=50075, ipcPort=50020):DataXceiver
> java.net.SocketException: Too many open files
>         at sun.nio.ch.Net.socket0(Native Method)
>         at sun.nio.ch.Net.socket(Net.java:97)
>         at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:84)
>         at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:37)
>         at java.nio.channels.SocketChannel.open(SocketChannel.java:105)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.newSocket(DataNode.java:429)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:296)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:118)
> 2012-02-23 14:22:56,034 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-2698946892792040969_29464904 src: /10.2.0.156:40969 dest: /10.2.0.156:50010
> 2012-02-23 14:22:56,035 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-2698946892792040969_29464904 received exception java.net.SocketException: Too many open files
> 2012-02-23 14:22:56,035 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.2.0.156:50010, storageID=DS-1194175480-10.2.0.156-50010-1329304363220, infoPort=50075, ipcPort=50020):DataXceiver
> java.net.SocketException: Too many open files
>         at sun.nio.ch.Net.socket0(Native Method)
>         at sun.nio.ch.Net.socket(Net.java:97)
>         at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:84)
>         at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:37)
>         at java.nio.channels.SocketChannel.open(SocketChannel.java:105)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.newSocket(DataNode.java:429)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:296)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:118)
>
> I've always had this configured in hdfs-site.xml:
>
> <property>
>   <name>dfs.datanode.max.xcievers</name>
>   <value>4096</value>
> </property>
>
> But I think it's not enough now to handle that many MultipleOutputs. If I
> increase max.xcievers even more, what are the side effects? Which value
> should be considered the maximum (I suppose it depends on the CPU and RAM,
> but approximately)?
>
> Thanks in advance.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/multioutput-dfs-datanode-max-xcievers-and-too-many-open-files-tp3770024p3770024.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.

--
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
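[Editorial note on the fix discussed above: a 'ulimit -n' raised in an interactive shell does not persist, so the cap is usually raised permanently via PAM limits. A minimal sketch, assuming the DataNode runs as a dedicated "hdfs" user (substitute your actual service user; 32768 is only an illustrative starting point, not a recommendation from this thread):]

```
# /etc/security/limits.conf -- raise the open-file cap for the DN's user
hdfs  soft  nofile  32768
hdfs  hard  nofile  32768
```

[The new limit applies to sessions started after the change, so the DataNode must be restarted from a fresh login for /proc/<DN PID>/limits to reflect it.]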