Hi Marc,

Your error is not related to the transfer thread limit (xceivers). You're hitting a "ulimit -n" cap at your DataNode, i.e. the system's maximum number of open files allowed for the user running the DN process.
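To confirm this is the bottleneck, both the cap and the current descriptor usage can be read from /proc. A minimal shell sketch; DN_PID is a placeholder you'd normally fill via something like pgrep, and the fallback to the current shell's own PID is only so the snippet runs anywhere:

```shell
# Sketch: inspect the fd limit and current fd usage of a process
# (the DataNode, in this case). DN_PID is an assumption -- obtain it
# with e.g.: DN_PID=$(pgrep -f datanode.DataNode | head -n1)
# We fall back to the current shell's PID so the snippet is runnable as-is.
PID="${DN_PID:-$$}"

# The effective 'ulimit -n' cap for that process:
grep 'Max open files' "/proc/$PID/limits"

# How many descriptors it actually has open right now:
ls "/proc/$PID/fd" | wc -l
```

If the second number is close to the soft limit shown by the first command, raising the limit (and restarting the DN) is the fix, not more xceivers.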
Check what your limits say for 'Max open files' in /proc/<DN PID>/limits and raise it if it's proving insufficient today. Also do try to upgrade your cluster to a more recent release, as there were some improvements on this front you can benefit from.

On Thu, Feb 23, 2012 at 8:56 PM, Marc Sturlese <marc.sturl...@gmail.com> wrote:
> Hey there,
> I've been running a cluster for about a year (about 20 machines). I've run
> many concurrent jobs there, some of them with MultipleOutputs, and never had
> any problem (the MultipleOutputs were creating just 3 or 4 different outputs).
> Now I have a job with MultipleOutputs that creates 100 different outputs, and
> it always ends up with errors.
> Tasks start throwing these errors:
>
> java.io.IOException: Bad connect ack with firstBadLink 10.2.0.154:50010
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2963)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2888)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2139)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2329)
>
> or:
>
> java.io.EOFException
>         at java.io.DataInputStream.readByte(DataInputStream.java:250)
>         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>         at org.apache.hadoop.io.Text.readString(Text.java:400)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2961)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2888)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2139)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2329)
>
> Checking the datanode log I see this error hundreds of times:
>
> 2012-02-23 14:22:56,008 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reopen already-open Block for append blk_3368446040000470452_29464903
> 2012-02-23 14:22:56,008 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_3368446040000470452_29464903 received exception java.net.SocketException: Too many open files
> 2012-02-23 14:22:56,008 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.2.0.156:50010, storageID=DS-1194175480-10.2.0.156-50010-1329304363220, infoPort=50075, ipcPort=50020):DataXceiver
> java.net.SocketException: Too many open files
>         at sun.nio.ch.Net.socket0(Native Method)
>         at sun.nio.ch.Net.socket(Net.java:97)
>         at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:84)
>         at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:37)
>         at java.nio.channels.SocketChannel.open(SocketChannel.java:105)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.newSocket(DataNode.java:429)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:296)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:118)
> 2012-02-23 14:22:56,034 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-2698946892792040969_29464904 src: /10.2.0.156:40969 dest: /10.2.0.156:50010
> 2012-02-23 14:22:56,035 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-2698946892792040969_29464904 received exception java.net.SocketException: Too many open files
> 2012-02-23 14:22:56,035 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.2.0.156:50010, storageID=DS-1194175480-10.2.0.156-50010-1329304363220, infoPort=50075, ipcPort=50020):DataXceiver
> java.net.SocketException: Too many open files
>         at sun.nio.ch.Net.socket0(Native Method)
>         at sun.nio.ch.Net.socket(Net.java:97)
>         at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:84)
>         at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:37)
>         at java.nio.channels.SocketChannel.open(SocketChannel.java:105)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.newSocket(DataNode.java:429)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:296)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:118)
>
> I've always had this configured in hdfs-site.xml:
>
> <property>
>   <name>dfs.datanode.max.xcievers</name>
>   <value>4096</value>
> </property>
>
> But I think it's not enough now to handle that many MultipleOutputs. If I
> increase max.xcievers even more, what are the side effects? Which value
> should be considered the maximum (I suppose it depends on the CPU and RAM,
> but approximately)?
>
> Thanks in advance.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/multioutput-dfs-datanode-max-xcievers-and-too-many-open-files-tp3770024p3770024.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.

--
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about
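[Editorial note on the fix discussed above: a 'ulimit -n' raised in an interactive shell does not persist, so the cap is usually raised permanently via PAM limits. A minimal sketch, assuming the DataNode runs as a dedicated "hdfs" user (substitute your actual service user; 32768 is only an illustrative starting point, not a recommendation from this thread):]

```
# /etc/security/limits.conf -- raise the open-file cap for the DN's user
hdfs  soft  nofile  32768
hdfs  hard  nofile  32768
```

[The new limit applies to sessions started after the change, so the DataNode must be restarted from a fresh login for /proc/<DN PID>/limits to reflect it.]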