[ https://issues.apache.org/jira/browse/HDFS-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Surendra Singh Lilhore updated HDFS-13768:
------------------------------------------
    Attachment: HDFS-13768.04.patch

> Adding replicas to volume map makes DataNode start slowly
> -----------------------------------------------------------
>
>                 Key: HDFS-13768
>                 URL: https://issues.apache.org/jira/browse/HDFS-13768
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.1.0
>            Reporter: Yiqun Lin
>            Assignee: Surendra Singh Lilhore
>            Priority: Major
>         Attachments: HDFS-13768.01.patch, HDFS-13768.02.patch, HDFS-13768.03.patch, HDFS-13768.04.patch, HDFS-13768.patch, screenshot-1.png
>
> We found the DataNodes starting very slowly during a rolling upgrade of our cluster. After a restart, the DNs take a long time to register with the NameNode, and this causes a lot of the following errors:
> {noformat}
> DataXceiver error processing WRITE_BLOCK operation src: /xx.xx.xx.xx:64360 dst: /xx.xx.xx.xx:50010
> java.io.IOException: Not ready to serve the block pool, BP-1508644862-xx.xx.xx.xx-1493781183457.
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1290)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1298)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:630)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Looking into the DN startup logic, the DN performs the initial block pool operations before registering. During block pool initialization, we found that adding replicas to the volume map is the most expensive operation.
> Related log:
> {noformat}
> 2018-07-26 10:46:23,771 INFO [Thread-105] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume /home/hard_disk/1/dfs/dn/current: 242722ms
> 2018-07-26 10:46:26,231 INFO [Thread-109] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume /home/hard_disk/5/dfs/dn/current: 245182ms
> 2018-07-26 10:46:32,146 INFO [Thread-112] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume /home/hard_disk/8/dfs/dn/current: 251097ms
> 2018-07-26 10:47:08,283 INFO [Thread-106] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume /home/hard_disk/2/dfs/dn/current: 287235ms
> {noformat}
> Currently the DN uses an independent thread to scan and add replicas for each volume, but block pool initialization still has to wait for the slowest thread to finish. So the main problem here is making these per-volume scan threads run faster.
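[Editor's note] The wait-for-the-slowest-volume behavior described above can be sketched with a minimal, hypothetical model (not the actual FsVolumeList code): even with one thread per volume, startup latency is bounded by the slowest volume's scan time.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of the per-volume scan model the issue describes:
// one thread per volume, and block pool initialization blocks until the
// slowest volume's scan finishes.
public class PerVolumeScan {
    // Simulated per-volume scan costs in ms (the real log shows 242s-287s).
    // Returns elapsed wall-clock time, which is roughly max(volumeScanMillis).
    public static long scanAll(List<Long> volumeScanMillis) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(volumeScanMillis.size());
        long start = System.nanoTime();
        for (long cost : volumeScanMillis) {
            new Thread(() -> {
                try {
                    Thread.sleep(cost);     // stand-in for scanning one volume
                } catch (InterruptedException ignored) {
                }
                latch.countDown();          // this volume's replica map is built
            }).start();
        }
        latch.await();                      // initialization waits for ALL volumes
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Running the scans in parallel helps only until one volume dominates; total time stays pinned to the slowest scan, which is why the issue focuses on speeding up the scan of a single volume.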
> The jstack we captured while the DN was blocked adding replicas:
> {noformat}
> "Thread-113" #419 daemon prio=5 os_prio=0 tid=0x00007f40879ff000 nid=0x145da runnable [0x00007f4043a38000]
>    java.lang.Thread.State: RUNNABLE
>     at java.io.UnixFileSystem.list(Native Method)
>     at java.io.File.list(File.java:1122)
>     at java.io.File.listFiles(File.java:1207)
>     at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1165)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:445)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:342)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:864)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:191)
> {noformat}
> One possible improvement is to use a ForkJoinPool for this recursive task rather than running it synchronously. This could be a big win, because it would greatly speed up the recovery process.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
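[Editor's note] The ForkJoinPool idea proposed in the issue could be sketched as below. This is a hypothetical illustration, not the patch attached to HDFS-13768: the class and method names (ParallelVolumeScan, scan) are invented, and it simply collects files from a directory tree by forking one subtask per subdirectory, the same recursive shape as BlockPoolSlice.addToReplicasMap.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch: replace the synchronous recursive directory walk
// with a ForkJoinPool, so sibling subdirectories of one volume are
// scanned in parallel instead of one after another.
public class ParallelVolumeScan {
    static class ScanTask extends RecursiveTask<List<File>> {
        private final File dir;

        ScanTask(File dir) {
            this.dir = dir;
        }

        @Override
        protected List<File> compute() {
            List<File> files = new ArrayList<>();
            File[] entries = dir.listFiles();   // the costly native list() call
            if (entries == null) {
                return files;                   // unreadable dir or not a dir
            }
            List<ScanTask> subTasks = new ArrayList<>();
            for (File f : entries) {
                if (f.isDirectory()) {
                    ScanTask t = new ScanTask(f);
                    t.fork();                   // scan each subdir as its own task
                    subTasks.add(t);
                } else {
                    files.add(f);               // e.g. a block or meta file
                }
            }
            for (ScanTask t : subTasks) {
                files.addAll(t.join());         // merge forked subdirectory results
            }
            return files;
        }
    }

    public static List<File> scan(File volumeDir) {
        return ForkJoinPool.commonPool().invoke(new ScanTask(volumeDir));
    }
}
```

Work stealing lets idle pool threads pick up forked subdirectory tasks, so a deep or wide subdir tree no longer serializes on a single thread per volume.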