Hi Jeff,

Can you also check what your machine's swappiness is set to by running '/sbin/sysctl vm.swappiness'? HBase recommends setting it very low (0 or 5).
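If it comes back high (most Linux distros default to 60), something along these lines should bring it down; treat it as a sketch and adjust for however you normally manage sysctls on your boxes:

  /sbin/sysctl vm.swappiness                       # check the current value
  /sbin/sysctl -w vm.swappiness=5                  # lower it on the running kernel
  echo 'vm.swappiness = 5' >> /etc/sysctl.conf     # make it stick across reboots

There are also a couple of quick checks for Todd's swapping and xceiver points sketched at the bottom of this mail.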
Alex K

On Fri, Jun 4, 2010 at 12:03 PM, Todd Lipcon <t...@cloudera.com> wrote:
> Hi Jeff,
>
> That seems like a reasonable config, but the error message you pasted
> indicated xceivers was set to 2048 instead of 4096.
>
> Also, in my experience SocketTimeoutExceptions are usually due to swapping.
> Verify that your machines aren't swapping when you're under load.
>
> BTW, since this is hbase-related, it may be better to move this to the
> hbase user list.
>
> -Todd
>
> On Fri, Jun 4, 2010 at 9:37 AM, Jeff Whiting <je...@qualtrics.com> wrote:
>
>> I've tried to follow it the best I can. I already increased the ulimit
>> to 32768. This is what I now have in my hdfs-site.xml. Am I missing
>> anything?
>>
>> <?xml version="1.0"?>
>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>
>> <!-- Put site-specific property overrides in this file. -->
>>
>> <configuration>
>>   <property>
>>     <name>dfs.data.dir</name>
>>     <value>/media/sdb,/media/sdc,/media/sdd</value>
>>   </property>
>>
>>   <property>
>>     <name>dfs.replication</name>
>>     <value>3</value>
>>   </property>
>>
>>   <property>
>>     <name>dfs.datanode.max.xcievers</name>
>>     <value>4096</value>
>>   </property>
>>
>>   <property>
>>     <name>dfs.datanode.handler.count</name>
>>     <value>10</value>
>>   </property>
>> </configuration>
>>
>> Todd Lipcon wrote:
>>
>> Hi Jeff,
>>
>> Have you followed the HDFS configuration guide from the HBase wiki? You
>> need to bump up the transceiver count and probably the ulimit as well. It
>> looks like you already tuned it to 2048, but that isn't high enough if
>> you're still getting the "exceeds the limit" message.
>>
>> The EOFs and Connection Reset messages appear when DFS clients disconnect
>> prematurely from a client stream (probably due to xceiver errors on other
>> streams).
>>
>> -Todd
>>
>> On Fri, Jun 4, 2010 at 8:56 AM, jeff whiting <je...@qualtrics.com> wrote:
>>
>>> I had my HRegionServers go down due to an HDFS exception. In the
>>> datanode logs I'm seeing a lot of different and varied exceptions. I've
>>> increased the data xceiver count now, but these other ones don't make a
>>> lot of sense.
>>>
>>> Among them are:
>>>
>>> 2010-06-04 07:41:56,917 ERROR datanode.DataNode (DataXceiver.java:run(131)) -
>>> DatanodeRegistration(192.168.1.184:50010,
>>> storageID=DS-1601700079-192.168.1.184-50010-1274208308658, infoPort=50075,
>>> ipcPort=50020):DataXceiver
>>> java.io.EOFException
>>>         at java.io.DataInputStream.readByte(DataInputStream.java:250)
>>>         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>>>         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>>>         at org.apache.hadoop.io.Text.readString(Text.java:400)
>>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:313)
>>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>>>         at java.lang.Thread.run(Thread.java:619)
>>>
>>> 2010-06-04 08:49:56,389 ERROR datanode.DataNode (DataXceiver.java:run(131)) -
>>> DatanodeRegistration(192.168.1.184:50010,
>>> storageID=DS-1601700079-192.168.1.184-50010-1274208308658, infoPort=50075,
>>> ipcPort=50020):DataXceiver
>>> java.io.IOException: Connection reset by peer
>>>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:206)
>>>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
>>>         at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>
>>> 2010-06-04 05:36:54,840 ERROR datanode.DataNode (DataXceiver.java:run(131)) -
>>> DatanodeRegistration(192.168.1.184:50010,
>>> storageID=DS-1601700079-192.168.1.184-50010-1274208308658, infoPort=50075,
>>> ipcPort=50020):DataXceiver
>>> java.io.IOException: xceiverCount 2049 exceeds the limit of concurrent xcievers 2047
>>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>>>         at java.lang.Thread.run(Thread.java:619)
>>>
>>> 2010-06-04 05:36:48,848 ERROR datanode.DataNode (DataXceiver.java:run(131)) -
>>> DatanodeRegistration(192.168.1.184:50010,
>>> storageID=DS-1601700079-192.168.1.184-50010-1274208308658, infoPort=50075,
>>> ipcPort=50020):DataXceiver
>>> java.net.SocketTimeoutException: 480000 millis timeout while waiting for
>>> channel to be ready for write. ch :
>>> java.nio.channels.SocketChannel[connected local=/192.168.1.184:50010 remote=/192.168.1.184:55349]
>>>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>>>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>>>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>>>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>>>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>>>         at java.lang.Thread.run(Thread.java:619)
>>>
>>> The EOFException is the most common one I get. I'm also unsure how I
>>> would get a connection reset by peer when I'm connecting locally. Why is
>>> the file prematurely ending?
>>> Any idea of what is going on?
>>>
>>> Thanks,
>>> ~Jeff
>>>
>>> --
>>> Jeff Whiting
>>> Qualtrics Senior Software Engineer
>>> je...@qualtrics.com
>>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>> --
>> Jeff Whiting
>> Qualtrics Senior Software Engineer
>> je...@qualtrics.com
>>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
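To follow up on Todd's swapping and xceiver points, here are the quick checks I mentioned above. These are rough sketches, not gospel; the pgrep pattern and the hdfs-site.xml path are just guesses at your layout, and they assume one datanode JVM per box:

  free -m          # the swap "used" column should stay near zero while you load the cluster
  vmstat 5         # watch the si/so columns during the load; sustained nonzero values mean you're swapping
  cat /proc/$(pgrep -f DataNode)/limits | grep 'open files'      # the file-descriptor limit the datanode process actually got
  grep -A1 'dfs.datanode.max.xcievers' /path/to/hdfs-site.xml    # confirm the 4096 value is really in place on every node

If the running datanode still shows the old 2048 limit even though the config says 4096, it probably just needs a restart to pick the new value up.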