[ https://issues.apache.org/jira/browse/HDFS-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049010#comment-15049010 ]
Xiaoyu Yao commented on HDFS-9530:
----------------------------------

This looks like the symptom of HDFS-8072, where RBW reserved space is not released when the Datanode BlockReceiver encounters an IOException. The space won't be released until the DN restarts. The fix should be included in hadoop 2.6.2 and 2.7.1. Can you post the "hadoop version" command output?
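For readers less familiar with the RBW accounting, below is a minimal, purely illustrative sketch of the reservation pattern involved. The names here (SpaceTracker, reserve, release, BLOCK_SIZE) are hypothetical and simplified, not the actual HDFS internals; the point is only that if the error path skips the release, the reserved bytes keep counting against the volume until the process restarts, which is what then surfaces as inflated non-DFS usage.

import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of the RBW space accounting described above.
// SpaceTracker/reserve/release are NOT the real HDFS classes; they only show
// why a missed release on the IOException path inflates "reserved" space
// until a restart.
public class RbwReservationSketch {

    static class SpaceTracker {
        private final AtomicLong reserved = new AtomicLong();

        void reserve(long bytes) { reserved.addAndGet(bytes); }

        void release(long bytes) { reserved.addAndGet(-bytes); }

        long reservedBytes() { return reserved.get(); }
    }

    // A full block's worth is reserved up front for a replica being written.
    static final long BLOCK_SIZE = 128L * 1024 * 1024;

    // Correct pattern: the reservation is always undone, even when the write fails.
    static void writeReplica(SpaceTracker tracker, boolean failMidWrite) {
        tracker.reserve(BLOCK_SIZE);
        try {
            if (failMidWrite) {
                throw new IOException("simulated pipeline failure");
            }
            // ... write the block, then finalize it ...
        } catch (IOException e) {
            System.out.println("write failed: " + e.getMessage());
        } finally {
            // Without this release on the failure path (the situation HDFS-8072
            // describes), the bytes stay reserved until the DataNode restarts.
            tracker.release(BLOCK_SIZE);
        }
    }

    public static void main(String[] args) {
        SpaceTracker tracker = new SpaceTracker();
        writeReplica(tracker, true);  // failing write
        writeReplica(tracker, false); // successful write
        System.out.println("still reserved: " + tracker.reservedBytes() + " bytes");
    }
}

In the real DataNode the reservation is made per replica in flight and trimmed as bytes actually land on disk; the HDFS-8072 fix is about making sure the remaining reservation is also given back when the write terminates abnormally.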
> huge Non-DFS Used in hadoop 2.6.2 & 2.7.1
> -----------------------------------------
>
>          Key: HDFS-9530
>          URL: https://issues.apache.org/jira/browse/HDFS-9530
>      Project: Hadoop HDFS
>   Issue Type: Bug
>     Reporter: Fei Hui
>
> I ran a Hive job, and the errors are as follows
> ===============================================================================
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"k":"1","v":1}
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"k":"1","v":1}
>   at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
>   ... 8 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test_abc/.hive-staging_hive_2015-12-09_15-24-10_553_7745334154733108653-1/_task_tmp.-ext-10002/pt=23/_tmp.000017_3 could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and no node(s) are excluded in this operation.
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1562)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3245)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:663)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034)
>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:787)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
>   at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
>   at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:97)
>   at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:162)
>   at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:508)
>   ... 9 more
> Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test_abc/.hive-staging_hive_2015-12-09_15-24-10_553_7745334154733108653-1/_task_tmp.-ext-10002/pt=23/_tmp.000017_3 could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and no node(s) are excluded in this operation.
>   at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1562)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3245)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:663)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:482)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1469)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1400)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>   at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:399)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1532)
>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1349)
>   at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
> I think there are bugs in HDFS.
> ===============================================================================
> Here is the config:
> <property>
>   <name>dfs.datanode.data.dir</name>
>   <value>file:///mnt/disk4,file:///mnt/disk1,file:///mnt/disk3,file:///mnt/disk2</value>
> </property>
> Here is the dfsadmin report:
> [hadoop@worker-1 ~]$ hadoop dfsadmin -report
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> Configured Capacity: 240769253376 (224.23 GB)
> Present Capacity: 238604832768 (222.22 GB)
> DFS Remaining: 215772954624 (200.95 GB)
> DFS Used: 22831878144 (21.26 GB)
> DFS Used%: 9.57%
> Under replicated blocks: 4
> Blocks with corrupt replicas: 0
> Missing blocks: 0
>
> -------------------------------------------------
> Live datanodes (3):
>
> Name: 10.117.60.59:50010 (worker-2)
> Hostname: worker-2
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 7190958080 (6.70 GB)
> Non DFS Used: 721473536 (688.05 MB)
> DFS Remaining: 72343986176 (67.38 GB)
> DFS Used%: 8.96%
> DFS Remaining%: 90.14%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 1
> Last contact: Wed Dec 09 15:55:02 CST 2015
>
> Name: 10.168.156.0:50010 (worker-3)
> Hostname: worker-3
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 7219073024 (6.72 GB)
> Non DFS Used: 721473536 (688.05 MB)
> DFS Remaining: 72315871232 (67.35 GB)
> DFS Used%: 9.00%
> DFS Remaining%: 90.11%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 1
> Last contact: Wed Dec 09 15:55:03 CST 2015
>
> Name: 10.117.15.38:50010 (worker-1)
> Hostname: worker-1
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 8421847040 (7.84 GB)
> Non DFS Used: 721473536 (688.05 MB)
> DFS Remaining: 71113097216 (66.23 GB)
> DFS Used%: 10.49%
> DFS Remaining%: 88.61%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 1
> Last contact: Wed Dec 09 15:55:03 CST 2015
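As an aside (an editor's illustration, not part of the original report): the per-node figures above can also be pulled programmatically, which makes it easy to watch them while a job is running. A minimal sketch, assuming a Hadoop 2.x client classpath, the same privileges that "dfsadmin -report" needs, and a placeholder NameNode URI; DistributedFileSystem.getDataNodeStats() and the DatanodeInfo getters are standard client API, and the non-DFS figure is derived here with the same capacity minus DFS-used minus remaining arithmetic that the report above reflects (80256417792 - 7190958080 - 72343986176 = 721473536 for worker-2).

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

// Editorial sketch: print per-datanode capacity figures comparable to the
// "dfsadmin -report" output above. "hdfs://namenode:8020" is a placeholder.
public class DatanodeSpaceReport {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        // Cast assumes the default filesystem really is HDFS.
        DistributedFileSystem dfs = (DistributedFileSystem) fs;

        // Requires the same privileges as 'hdfs dfsadmin -report'.
        for (DatanodeInfo dn : dfs.getDataNodeStats()) {
            long capacity  = dn.getCapacity();
            long dfsUsed   = dn.getDfsUsed();
            long remaining = dn.getRemaining();
            // "Non DFS Used" as derived in the report: capacity that is neither
            // accounted as DFS blocks nor reported as remaining. RBW reservations
            // reduce "remaining" and therefore show up in this number.
            long nonDfsUsed = capacity - dfsUsed - remaining;
            System.out.printf("%s capacity=%d dfsUsed=%d remaining=%d nonDfsUsed=%d%n",
                dn.getHostName(), capacity, dfsUsed, remaining, nonDfsUsed);
        }
        fs.close();
    }
}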
> ================================================================================
> When running the hive job, the dfsadmin report is as follows:
> [hadoop@worker-1 ~]$ hadoop dfsadmin -report
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
>
> Configured Capacity: 240769253376 (224.23 GB)
> Present Capacity: 108266011136 (100.83 GB)
> DFS Remaining: 80078416384 (74.58 GB)
> DFS Used: 28187594752 (26.25 GB)
> DFS Used%: 26.04%
> Under replicated blocks: 7
> Blocks with corrupt replicas: 0
> Missing blocks: 0
>
> -------------------------------------------------
> Live datanodes (3):
>
> Name: 10.117.60.59:50010 (worker-2)
> Hostname: worker-2
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 9015627776 (8.40 GB)
> Non DFS Used: 44303742464 (41.26 GB)
> DFS Remaining: 26937047552 (25.09 GB)
> DFS Used%: 11.23%
> DFS Remaining%: 33.56%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 693
> Last contact: Wed Dec 09 15:37:35 CST 2015
>
> Name: 10.168.156.0:50010 (worker-3)
> Hostname: worker-3
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 9163116544 (8.53 GB)
> Non DFS Used: 47895897600 (44.61 GB)
> DFS Remaining: 23197403648 (21.60 GB)
> DFS Used%: 11.42%
> DFS Remaining%: 28.90%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 750
> Last contact: Wed Dec 09 15:37:36 CST 2015
>
> Name: 10.117.15.38:50010 (worker-1)
> Hostname: worker-1
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 10008850432 (9.32 GB)
> Non DFS Used: 40303602176 (37.54 GB)
> DFS Remaining: 29943965184 (27.89 GB)
> DFS Used%: 12.47%
> DFS Remaining%: 37.31%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 632
> Last contact: Wed Dec 09 15:37:36 CST 2015
> =========================================================================
> But the df output on worker-1 is as follows:
> [hadoop@worker-1 ~]$ df
> Filesystem     1K-blocks    Used  Available Use% Mounted on
> /dev/xvda1      20641404 4229676   15363204  22% /
> tmpfs            8165456       0    8165456   0% /dev/shm
> /dev/xvdc       20642428 2596920   16996932  14% /mnt/disk3
> /dev/xvdb       20642428 2692228   16901624  14% /mnt/disk4
> /dev/xvdd       20642428 2445852   17148000  13% /mnt/disk2
> /dev/xvde       20642428 2909764   16684088  15% /mnt/disk1
> The df output conflicts with the dfsadmin report.
> Any suggestions?
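To make the conflict concrete (an editor's arithmetic note, consistent with the HDFS-8072 explanation in the comment above): in both reports each node's Non DFS Used is exactly Configured Capacity minus DFS Used minus DFS Remaining, so anything that shrinks "DFS Remaining" without writing real bytes, such as space reserved for in-flight RBW replicas, inflates Non DFS Used even though df sees almost nothing on disk. A quick check of the worker-1 figures quoted above (the df and dfsadmin snapshots were not taken at the same instant, so the comparison is approximate):

// Quick consistency check of the worker-1 numbers quoted above.
public class NonDfsUsedCheck {
    public static void main(String[] args) {
        // From the "during the hive job" dfsadmin report for worker-1 (10.117.15.38):
        long capacity  = 80256417792L;  // Configured Capacity
        long dfsUsed   = 10008850432L;  // DFS Used
        long remaining = 29943965184L;  // DFS Remaining

        // dfsadmin's "Non DFS Used" is the leftover: capacity - dfsUsed - remaining.
        long nonDfsUsed = capacity - dfsUsed - remaining;
        System.out.println(nonDfsUsed);   // 40303602176, i.e. the reported 37.54 GB

        // Actual bytes on the four data disks according to df (1K blocks -> bytes):
        long dfUsedBytes = (2596920L + 2692228L + 2445852L + 2909764L) * 1024;
        System.out.println(dfUsedBytes);  // ~10.9e9, i.e. only about 10 GB really on disk

        // df's "Used" already includes the DFS block files themselves, so genuinely
        // non-DFS data is only roughly dfUsedBytes - dfsUsed (~0.9 GB here), nowhere
        // near the ~37.5 GB dfsadmin shows: the gap is capacity that is merely
        // reserved for in-flight (RBW) replicas, not bytes written to disk.
        System.out.println(dfUsedBytes - dfsUsed);
    }
}

Which is how a ~37-44 GB "Non DFS Used" per node can coexist with data disks that df shows as only 13-15% full.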