Re: How to test DFS?
you could just list the file contents in your hadoop data/ directories on the individual nodes; somewhere in there the file blocks will be floating around. On Tue, May 26, 2015 at 4:59 PM, Caesar Samsi caesarsa...@mac.com wrote: Hello, How would I go about confirming that a file has been distributed successfully to all datanodes? I would like to demonstrate this capability in a short briefing for my colleagues. Can I access the file from the datanode itself (to date I can only access the files from the master node, not the slaves)? Thank you, Caesar. -- jay vyas
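A hedged sketch of how that can look on one datanode; the data directory below is a made-up example, so substitute whatever dfs.datanode.data.dir points to in your hdfs-site.xml:

    # Print where this datanode stores its block files.
    hdfs getconf -confKey dfs.datanode.data.dir
    # Example path only: list a few raw block files (each has a matching .meta checksum file).
    find /hadoop/hdfs/data -name 'blk_*' | head

Block IDs found this way can be matched against 'hdfs fsck -files -blocks -locations' output for the file, as shown later in this thread.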
DataNode Timeout exceptions.
Hi All, I am on Apache Yarn 2.3.0 and lately I have been seeing this exception happening frequently. Can someone tell me the root cause of this issue? I have set the property in mapred-site.xml as follows; is there any other property that I need to set as well?

<property>
  <name>mapreduce.task.timeout</name>
  <value>180</value>
  <description>The timeout value for tasks. I set this because the JVMs might be busy in GC and this is causing timeouts in Hadoop tasks.</description>
</property>

15/05/26 02:06:53 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1751673171-112.123.123.123-1431824104307:blk_1073749395_8571
java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/112.123.123.123:35398 remote=/112.123.123.123:50010]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1881)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:116)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:726)
15/05/26 02:06:53 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/df/.staging/job_1431824165463_0221
15/05/26 02:06:54 WARN security.UserGroupInformation: PriviledgedActionException as:df (auth:SIMPLE) cause:java.io.IOException: All datanodes 112.123.123.123:50010 are bad. Aborting...
15/05/26 02:06:54 WARN security.UserGroupInformation: PriviledgedActionException as:df (auth:SIMPLE) cause:java.io.IOException: All datanodes 112.123.123.123:50010 are bad. Aborting...
Exception in thread "main" java.io.IOException: All datanodes 112.123.123.123:50010 are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483)
Re: DataNode Timeout exceptions.
bq. All datanodes 112.123.123.123:50010 are bad. Aborting...

How many datanodes do you have? Can you check the datanode and namenode logs?

Cheers

On Tue, May 26, 2015 at 5:00 PM, S.L simpleliving...@gmail.com wrote: Hi All, I am on Apache Yarn 2.3.0 and lately I have been seeing this exception happening frequently. Can someone tell me the root cause of this issue? I have set the property in mapred-site.xml as follows; is there any other property that I need to set as well?

<property>
  <name>mapreduce.task.timeout</name>
  <value>180</value>
  <description>The timeout value for tasks. I set this because the JVMs might be busy in GC and this is causing timeouts in Hadoop tasks.</description>
</property>

15/05/26 02:06:53 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1751673171-112.123.123.123-1431824104307:blk_1073749395_8571
java.net.SocketTimeoutException: 65000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/112.123.123.123:35398 remote=/112.123.123.123:50010]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1881)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:116)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:726)
15/05/26 02:06:53 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/df/.staging/job_1431824165463_0221
15/05/26 02:06:54 WARN security.UserGroupInformation: PriviledgedActionException as:df (auth:SIMPLE) cause:java.io.IOException: All datanodes 112.123.123.123:50010 are bad. Aborting...
15/05/26 02:06:54 WARN security.UserGroupInformation: PriviledgedActionException as:df (auth:SIMPLE) cause:java.io.IOException: All datanodes 112.123.123.123:50010 are bad. Aborting...
Exception in thread "main" java.io.IOException: All datanodes 112.123.123.123:50010 are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483)
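Two side notes that may help here. First, mapreduce.task.timeout is specified in milliseconds (the default is 600000, i.e. ten minutes), so a value of 180 is almost certainly not what you want; in any case the failure above happens in the HDFS write pipeline, not in a task. Second, the 65000 ms figure in the trace is governed by the HDFS client read timeout (the client adds a small per-datanode extension to its base timeout), not by anything in mapred-site.xml. A hedged hdfs-site.xml sketch of that knob, with the name and default to be verified against your version's hdfs-default.xml:

    <property>
      <name>dfs.client.socket-timeout</name>
      <!-- Client read timeout in milliseconds; the assumed default is 60000.
           Raising it only papers over a slow or unhealthy datanode, so check
           the datanode logs asked about above first. -->
      <value>120000</value>
    </property>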
Re: Using YARN with native applications
YARN should kill the container. I’m not sure what JVM you’re referring to, but the NodeManager writes and then spawns a shell script that will invoke your shell script, which in turn (presumably) will invoke your C++ application. A monitoring thread then looks at the memory usage of the process tree and compares it to the limits for the container. -Varun

From: Kevin
Reply-To: user@hadoop.apache.org
Date: Tuesday, May 26, 2015 at 7:22 AM
To: user@hadoop.apache.org
Subject: Re: Using YARN with native applications

Thanks for the reply, Varun. So if I use the DefaultContainerExecutor and run a C++ application via a shell script inside a container whose virtual memory limit is, for example, 2 GB, and that application does a malloc for 3 GB, YARN will kill the container? I always just thought that YARN kept its eye on the JVM it spins up for the container (under the DefaultContainerExecutor). -Kevin

On Mon, May 25, 2015 at 4:17 AM, Varun Vasudev vvasu...@hortonworks.com wrote:

Hi Kevin, By default, the NodeManager monitors physical and virtual memory usage of containers. Containers that exceed either limit are killed. Admins can disable the checks by setting yarn.nodemanager.pmem-check-enabled and/or yarn.nodemanager.vmem-check-enabled to false. The virtual memory limit for a container is determined using the config variable yarn.nodemanager.vmem-pmem-ratio (default value is 2.1). In the case of vcores:

1. If you’re using Cgroups under LinuxContainerExecutor, by default, if there is spare CPU available on the node, your container will be allowed to use it. Admins can restrict containers to use only the CPU allocated to them by setting yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage to true. This setting is only applicable when using Cgroups under LinuxContainerExecutor.

2. If you aren’t using Cgroups under LinuxContainerExecutor, there is no limiting of the amount of CPU that containers can use.

-Varun

From: Kevin
Reply-To: user@hadoop.apache.org
Date: Friday, May 22, 2015 at 3:30 AM
To: user@hadoop.apache.org
Subject: Using YARN with native applications

Hello, I have been using the distributed shell application and Oozie to run native C++ applications in the cluster. Is YARN able to see the resources these native applications use? For example, if I use Oozie's shell action, the NodeManager hosts the mapper container and allocates a certain amount of memory and vcores (as configured). What happens if my C++ application uses more memory or vcores than the NodeManager allocated? I was looking in the Hadoop code and I couldn't find my way to an answer, although it seems the LinuxContainerExecutor may be the answer to my question since it uses cgroups. I'm interested to know how YARN reacts to non-Java applications running inside of it. Thanks, Kevin
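For reference, a sketch of the yarn-site.xml properties described above, using the default values Varun mentions (treat the exact names and defaults as assumptions to verify against your version's yarn-default.xml):

    <property>
      <name>yarn.nodemanager.pmem-check-enabled</name>
      <value>true</value>   <!-- kill containers whose process tree exceeds physical memory -->
    </property>
    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>true</value>   <!-- kill containers whose process tree exceeds virtual memory -->
    </property>
    <property>
      <name>yarn.nodemanager.vmem-pmem-ratio</name>
      <value>2.1</value>    <!-- virtual memory allowed per unit of requested physical memory -->
    </property>
    <property>
      <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
      <value>false</value>  <!-- CPU capping; only meaningful with Cgroups under LinuxContainerExecutor -->
    </property>

With these defaults, a container granted 2 GB may use roughly 4.2 GB of virtual memory before the vmem check kills it, and exceeding 2 GB of resident memory trips the pmem check, regardless of whether the process tree is a JVM or a native C++ binary.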
Re: How to test DFS?
Hi, You can use the 'hdfs fsck' command to determine block locations. A sample run is shown below:

[root@qa-b1 ~]# hdfs fsck /tmp/jack -files -blocks -locations
Connecting to namenode via http://192.168.50.171:50070
FSCK started by root (auth:SIMPLE) from /192.168.50.170 for path /tmp/jack at Wed May 27 14:51:56 KST 2015
/tmp/jack 517472256 bytes, 4 block(s): OK
0. BP-1171919055-192.168.50.171-1431320286009:blk_1073742878_2054 len=134217728 repl=3 [192.168.50.174:50010, 192.168.50.172:50010, 192.168.50.173:50010]
1. BP-1171919055-192.168.50.171-1431320286009:blk_1073742879_2055 len=134217728 repl=3 [192.168.50.174:50010, 192.168.50.172:50010, 192.168.50.173:50010]
2. BP-1171919055-192.168.50.171-1431320286009:blk_1073742880_2056 len=134217728 repl=3 [192.168.50.174:50010, 192.168.50.172:50010, 192.168.50.173:50010]
3. BP-1171919055-192.168.50.171-1431320286009:blk_1073742881_2057 len=114819072 repl=3 [192.168.50.174:50010, 192.168.50.172:50010, 192.168.50.173:50010]

The file /tmp/jack is split into four blocks. Block 0 is replicated on 3 nodes: 192.168.50.174, 192.168.50.172 and 192.168.50.173.

Thanks. Drake 민영근 Ph.D kt NexR

On Wed, May 27, 2015 at 8:58 AM, jay vyas jayunit100.apa...@gmail.com wrote: you could just list the file contents in your hadoop data/ directories on the individual nodes; somewhere in there the file blocks will be floating around. On Tue, May 26, 2015 at 4:59 PM, Caesar Samsi caesarsa...@mac.com wrote: Hello, How would I go about confirming that a file has been distributed successfully to all datanodes? I would like to demonstrate this capability in a short briefing for my colleagues. Can I access the file from the datanode itself (to date I can only access the files from the master node, not the slaves)? Thank you, Caesar. -- jay vyas
Cannot obtain block length for LocatedBlock
Hi all, I have an MR job running and exiting with the following exception.

java.io.IOException: Cannot obtain block length for LocatedBlock {BP-1632531813-172.19.67.67-1393407344218:blk_1109280129_1099547327549; getBlockSize()=139397; corrupt=false; offset=0; locs=[172.19.67.67:50010, 172.19.67.78:50010, 172.19.67.84:50010]}

Now, the fun part is that I don't know which file is in question. In order to find this out, I did this:

hdfs fsck -files -blocks / | grep blk_1109280129_1099547327549

Interestingly enough, it came up with nothing. Did anyone experience anything similar? Or does anyone have a piece of advice on how to resolve this? The Hadoop version is 2.3.0. Thanks in advance! -- Adnan Karač
RE: Cannot obtain block length for LocatedBlock
Can you try the following?

hdfs fsck -openforwrite -files -blocks -locations / | grep blk_1109280129_1099547327549

Thanks Regards Brahma Reddy Battula

From: Adnan Karač [adnanka...@gmail.com]
Sent: Tuesday, May 26, 2015 1:34 PM
To: user@hadoop.apache.org
Subject: Cannot obtain block length for LocatedBlock

Hi all, I have an MR job running and exiting with the following exception. java.io.IOException: Cannot obtain block length for LocatedBlock {BP-1632531813-172.19.67.67-1393407344218:blk_1109280129_1099547327549; getBlockSize()=139397; corrupt=false; offset=0; locs=[172.19.67.67:50010, 172.19.67.78:50010, 172.19.67.84:50010]} Now, the fun part is that I don't know which file is in question. In order to find this out, I did this: hdfs fsck -files -blocks / | grep blk_1109280129_1099547327549 Interestingly enough, it came up with nothing. Did anyone experience anything similar? Or does anyone have a piece of advice on how to resolve this? The Hadoop version is 2.3.0. Thanks in advance! -- Adnan Karač
Re: Cannot obtain block length for LocatedBlock
Hi Brahma, Thanks for the quick response. I assumed that running the file check without the openforwrite option would yield the file containing this block whether or not it was open for write. However, I have just tried it as well; unfortunately, no success. Adnan

On Tue, May 26, 2015 at 10:12 AM, Brahma Reddy Battula brahmareddy.batt...@huawei.com wrote: Can you try the following? hdfs fsck -openforwrite -files -blocks -locations / | grep blk_1109280129_1099547327549 Thanks Regards Brahma Reddy Battula -- From: Adnan Karač [adnanka...@gmail.com] Sent: Tuesday, May 26, 2015 1:34 PM To: user@hadoop.apache.org Subject: Cannot obtain block length for LocatedBlock Hi all, I have an MR job running and exiting with the following exception. java.io.IOException: Cannot obtain block length for LocatedBlock {BP-1632531813-172.19.67.67-1393407344218:blk_1109280129_1099547327549; getBlockSize()=139397; corrupt=false; offset=0; locs=[172.19.67.67:50010, 172.19.67.78:50010, 172.19.67.84:50010]} Now, the fun part is that I don't know which file is in question. In order to find this out, I did this: hdfs fsck -files -blocks / | grep blk_1109280129_1099547327549 Interestingly enough, it came up with nothing. Did anyone experience anything similar? Or does anyone have a piece of advice on how to resolve this? The Hadoop version is 2.3.0. Thanks in advance! -- Adnan Karač -- Adnan Karač
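Another hedged avenue, since fsck cannot see the block at all: the NameNode log usually records the path a block was allocated for, so grepping for the block ID there can reveal the file even if it has since been deleted or was never closed. The log location below is only an example and depends on how your installation is laid out:

    # Hypothetical log path; adjust to wherever your NameNode writes its logs.
    grep blk_1109280129 /var/log/hadoop-hdfs/hadoop-hdfs-namenode-*.log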
Re: Cannot obtain block length for LocatedBlock
Hi Adnan, I've met a similar problem: the reducer output file length was zero and some bytes were missing at the end of the output file. The cause was that I used MultipleOutputs and forgot to close it in the reducer's cleanup method. Hope it helps.

On 26 May 2015 at 17:13, Adnan Karač adnanka...@gmail.com wrote: Hi Brahma, Thanks for the quick response. I assumed that running the file check without the openforwrite option would yield the file containing this block whether or not it was open for write. However, I have just tried it as well; unfortunately, no success. Adnan On Tue, May 26, 2015 at 10:12 AM, Brahma Reddy Battula brahmareddy.batt...@huawei.com wrote: Can you try the following? hdfs fsck -openforwrite -files -blocks -locations / | grep blk_1109280129_1099547327549 Thanks Regards Brahma Reddy Battula -- From: Adnan Karač [adnanka...@gmail.com] Sent: Tuesday, May 26, 2015 1:34 PM To: user@hadoop.apache.org Subject: Cannot obtain block length for LocatedBlock Hi all, I have an MR job running and exiting with the following exception. java.io.IOException: Cannot obtain block length for LocatedBlock {BP-1632531813-172.19.67.67-1393407344218:blk_1109280129_1099547327549; getBlockSize()=139397; corrupt=false; offset=0; locs=[172.19.67.67:50010, 172.19.67.78:50010, 172.19.67.84:50010]} Now, the fun part is that I don't know which file is in question. In order to find this out, I did this: hdfs fsck -files -blocks / | grep blk_1109280129_1099547327549 Interestingly enough, it came up with nothing. Did anyone experience anything similar? Or does anyone have a piece of advice on how to resolve this? The Hadoop version is 2.3.0. Thanks in advance! -- Adnan Karač -- Adnan Karač -- All the best Liu Bo
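A minimal sketch of the fix Liu Bo describes, assuming the new (org.apache.hadoop.mapreduce) API; the class name, key/value types, and base output path are illustrative only:

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class MyReducer extends Reducer<Text, Text, Text, Text> {

        private MultipleOutputs<Text, Text> mos;

        @Override
        protected void setup(Context context) {
            // Create MultipleOutputs once per reducer task.
            mos = new MultipleOutputs<>(context);
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                // "part-extra" is a hypothetical base output path.
                mos.write(key, value, "part-extra");
            }
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            // The crucial part: flush and close the extra outputs so the
            // underlying HDFS files are finalized instead of being left
            // open for write with an unknown last-block length.
            mos.close();
        }
    }

If the outputs are never closed, readers can later hit exactly the kind of "Cannot obtain block length for LocatedBlock" error discussed in this thread, because the last block of the file is still under construction.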
Please unsubscribe Me.
Please Unsubscribe Me. aqeel@gmail.com -- Regards, Aqeel Ahmed
RE: Please unsubscribe Me.
Please send email to user-unsubscr...@hadoop.apache.org

Thanks Regards Brahma Reddy Battula

From: Aqeel Ahmed [aqeel@gmail.com]
Sent: Tuesday, May 26, 2015 6:47 PM
To: user@hadoop.apache.org
Subject: Please unsubscribe Me.

Please Unsubscribe Me. aqeel@gmail.com -- Regards, Aqeel Ahmed
How to test DFS?
Hello, How would I go about confirming that a file has been distributed successfully to all datanodes? I would like to demonstrate this capability in a short briefing for my colleagues. Can I access the file from the datanode itself (to date I can only access the files from the master node, not the slaves)? Thank you, Caesar.
Socket Timeout Exception
Hi, I'm seeing this exception on every HDFS node once in a while on one cluster:

2015-05-26 13:37:31,831 INFO datanode.DataNode (BlockSender.java:sendPacket(566)) - Failed to send data: java.net.SocketTimeoutException: 1 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.22.5.34:50010 remote=/172.22.5.34:31684]
2015-05-26 13:37:31,831 INFO DataNode.clienttrace (BlockSender.java:sendBlock(738)) - src: /172.22.5.34:50010, dest: /172.22.5.34:31684, bytes: 12451840, op: HDFS_READ, cliID: DFSClient_hb_rs_my-hadoop-node-fqdn,60020,1432041913240_-1351889511_35, offset: 47212032, srvID: 9bfc58b8-94b0-40a5-ba33-6d712fa1faa2, blockid: BP-1988583858-172.22.5.40-1424448407690:blk_1105314202_31576629, duration: 10486866121
2015-05-26 13:37:31,831 WARN datanode.DataNode (DataXceiver.java:readBlock(541)) - DatanodeRegistration(172.22.5.34, datanodeUuid=9bfc58b8-94b0-40a5-ba33-6d712fa1faa2, infoPort=50075, ipcPort=8010, storageInfo=lv=-55;cid=CID-962af1ea-201a-4d27-ae80-e4a7b712f1ac;nsid=109597947;c=0):Got exception while serving BP-1988583858-172.22.5.40-1424448407690:blk_1105314202_31576629 to /172.22.5.34:31684
java.net.SocketTimeoutException: 1 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.22.5.34:50010 remote=/172.22.5.34:31684]
    at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
    at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
    at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:745)
2015-05-26 13:37:31,831 ERROR datanode.DataNode (DataXceiver.java:run(250)) - my-hadoop-node-fqdn:50010:DataXceiver error processing READ_BLOCK operation src: /172.22.5.34:31684 dst: /172.22.5.34:50010
java.net.SocketTimeoutException: 1 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.22.5.34:50010 remote=/172.22.5.34:31684]
    at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
    at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
    at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:745)

...and it's basically only complaining about itself. On the same node there are HDFS, a RegionServer and YARN.
I'm struggling a little bit with how to interpret this. The funny thing is that this is our live cluster, the one where we write everything. I'm wondering whether the HBase flush size (256M) could be a problem while the block size is 128M. Any advice on where to look is welcome! Thanks, Dejan
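One hedged reading of the trace: the op is HDFS_READ and the cliID (DFSClient_hb_rs_...) points at the local HBase RegionServer, so the DataNode appears to be timing out while waiting for a slow local reader to drain the socket rather than failing a write path. The knob that governs that wait is sketched below; verify the name and default (commonly cited as 480000 ms) against your version's hdfs-default.xml before relying on it:

    <property>
      <name>dfs.datanode.socket.write.timeout</name>
      <!-- How long the DataNode waits for the reader's socket to become writable,
           in milliseconds; a value of 0 is usually treated as "no timeout". -->
      <value>960000</value>
    </property>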
Re: What is the git clone URL for a stable apache hadoop release
Rongzheng, The correct URL is https://git-wip-us.apache.org/repos/asf/hadoop.git. To target a stable release you can either check out the corresponding git tag or download the source release from https://hadoop.apache.org/releases.html. You may see occasional test failures, as some tests are flaky, so feel free to file a JIRA and attach test logs if the issue has not been reported already. If you see a large number of failures, it could be specific to your setup.

On 5/26/15, 3:18 PM, rongzheng yan rongzheng@oracle.com wrote: Hello, I cloned a local Git repository from the Apache repository at http://git.apache.org/hadoop.git. Before I made any changes, I tried to build and run the tests, but got several test failures. Are any test failures expected in the Apache repository? From JIRA HADOOP-11636, it seems that there are some test failures left in the Apache repository. If this is true, where can I get the git clone URL for a stable release? (e.g. Apache Hadoop 2.7.0) Is a stable release clean, without any test failure? Thanks in advance, Rongzheng
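A hedged sketch of that workflow; the tag name below is only an example, so list the tags first and use the exact name shown for the release you want:

    git clone https://git-wip-us.apache.org/repos/asf/hadoop.git
    cd hadoop
    git tag -l                      # list release tags to find the exact name
    git checkout release-2.7.0      # example tag name; substitute what the listing shows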
What is the git clone URL for a stable apache hadoop release
Hello, I cloned a local Git repository from the Apache repository at http://git.apache.org/hadoop.git. Before I made any changes, I tried to build and run the tests, but got several test failures. Are any test failures expected in the Apache repository? From JIRA HADOOP-11636, it seems that there are some test failures left in the Apache repository. If this is true, where can I get the git clone URL for a stable release? (e.g. Apache Hadoop 2.7.0) Is a stable release clean, without any test failure? Thanks in advance, Rongzheng