[ https://issues.apache.org/jira/browse/HDFS-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584445#comment-16584445 ]
Xiao Chen commented on HDFS-13833:
----------------------------------

I think the total load == 0.0 issue is a corner case when running with a very small number of datanodes. [~shwetayakkali] was looking at a test failure earlier, and the 0.0 case should be fixed in the consider-load logic.

> Failed to choose from local rack (location = /default); the second replica is
> not found, retry choosing ramdomly
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-13833
>                 URL: https://issues.apache.org/jira/browse/HDFS-13833
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Henrique Barros
>            Priority: Critical
>
> I'm having a random problem with block replication with Hadoop
> 2.6.0-cdh5.15.0 (Cloudera CDH-5.15.0-1.cdh5.15.0.p0.21).
>
> In my case we are getting this error very randomly (after some hours), and
> with only one datanode (for now; we are trying this Cloudera cluster for a
> POC).
> Here is the log:
> {code:java}
> Choosing random from 1 available nodes on node /default, scope=/default, excludedScope=null, excludeNodes=[]
> 2:38:20.527 PM DEBUG NetworkTopology
> Choosing random from 0 available nodes on node /default, scope=/default, excludedScope=null, excludeNodes=[192.168.220.53:50010]
> 2:38:20.527 PM DEBUG NetworkTopology
> chooseRandom returning null
> 2:38:20.527 PM DEBUG BlockPlacementPolicy
> [
> Node /default/192.168.220.53:50010 [
> Datanode 192.168.220.53:50010 is not chosen since the node is too busy (load: 8 > 0.0).
> 2:38:20.527 PM DEBUG NetworkTopology
> chooseRandom returning 192.168.220.53:50010
> 2:38:20.527 PM INFO BlockPlacementPolicy
> Not enough replicas was chosen.
> Reason: {NODE_TOO_BUSY=1}
> 2:38:20.527 PM DEBUG StateChange
> closeFile: /mobi.me/development/apps/flink/checkpoints/a5a6806866c1640660924ea1453cbe34/chk-2118/eef8bff6-75a9-43c1-ae93-4b1a9ca31ad9 with 1 blocks is persisted to the file system
> 2:38:20.527 PM DEBUG StateChange
> *BLOCK* NameNode.addBlock: file /mobi.me/development/apps/flink/checkpoints/a5a6806866c1640660924ea1453cbe34/chk-2118/1cfe900d-6f45-4b55-baaa-73c02ace2660 fileId=129628869 for DFSClient_NONMAPREDUCE_467616914_65
> 2:38:20.527 PM DEBUG BlockPlacementPolicy
> Failed to choose from local rack (location = /default); the second replica is not found, retry choosing ramdomly
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:784)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:694)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:601)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalStorage(BlockPlacementPolicyDefault.java:561)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:464)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:395)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:270)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:142)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:158)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1715)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3505)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:694)
>         at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:219)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:507)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275)
> {code}
> This part makes no sense at all:
> {code:java}
> load: 8 > 0.0{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
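The rejection being discussed can be sketched as follows. In `BlockPlacementPolicyDefault`, a datanode is skipped as "too busy" when its active transceiver (xceiver) count exceeds roughly twice the cluster-wide average load. The `isTooBusy` helper below is a hypothetical illustration of that comparison, not the actual Hadoop code: with a single datanode whose activity is not yet reflected in the averaged stats, the average can read 0.0, so the only available node fails the check — which is exactly the `load: 8 > 0.0` line in the log.

```java
public class ConsiderLoadSketch {
    // Hypothetical stand-in for the consider-load check in
    // BlockPlacementPolicyDefault: a node is treated as "too busy" when its
    // xceiver count exceeds twice the cluster-wide average load.
    static boolean isTooBusy(int nodeXceiverCount, double avgClusterLoad) {
        final double maxLoad = 2.0 * avgClusterLoad;
        return nodeXceiverCount > maxLoad;
    }

    public static void main(String[] args) {
        // Healthy multi-node cluster: average load 4.0, so the cap is 8.0
        // and a node with 8 xceivers is still accepted.
        System.out.println(isTooBusy(8, 4.0));

        // Corner case from this report: the averaged stats read 0.0, so the
        // cap is 0.0 and the only datanode (load 8) is rejected -- the
        // "load: 8 > 0.0" message in the log.
        System.out.println(isTooBusy(8, 0.0));
    }
}
```

On clusters this small, a common mitigation (an operational workaround, not a fix for the 0.0 corner case itself) is to disable the check by setting `dfs.namenode.replication.considerLoad` to `false` in hdfs-site.xml.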