[ https://issues.apache.org/jira/browse/HDFS-4130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491147#comment-13491147 ]
Han Xiao commented on HDFS-4130:
--------------------------------

Hi Uma,

There are thousands of them. With the default configuration they will not be purged until there are more than 1,000,000 extra edits, as configured by "dfs.namenode.num.extra.edits.retained". If there is no other operation, each edit-log segment contains only two txids, the start and the end of the segment. That means with the default, the redundant files/ledgers would not be cleaned until there are more than 500,000 edit-log files/ledgers. We can set it smaller. In any case, the way bkjm currently reads ledgers from BookKeeper is inefficient, and the change works better.

> The reading for editlog at NN starting using bkjm is not efficient
> -------------------------------------------------------------------
>
>                 Key: HDFS-4130
>                 URL: https://issues.apache.org/jira/browse/HDFS-4130
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, performance
>    Affects Versions: 2.0.2-alpha
>            Reporter: Han Xiao
>         Attachments: HDFS-4130.patch
>
> Currently, the method BookKeeperJournalManager.selectInputStreams is written like:
>
>     while (true) {
>       EditLogInputStream elis;
>       try {
>         elis = getInputStream(fromTxId, inProgressOk);
>       } catch (IOException e) {
>         LOG.error(e);
>         return;
>       }
>       if (elis == null) {
>         return;
>       }
>       streams.add(elis);
>       if (elis.getLastTxId() == HdfsConstants.INVALID_TXID) {
>         return;
>       }
>       fromTxId = elis.getLastTxId() + 1;
>     }
>
> The EditLogInputStream is obtained from getInputStream(), which reads the ledger list from ZooKeeper on every call. This becomes a large cost in time when the number of ledgers grows large. Reading the ledgers from ZooKeeper is not necessary on every call of getInputStream().
> The log showing the time spent here is as follows:
>
>     2012-10-30 16:44:52,995 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
>     2012-10-30 16:49:24,643 INFO hidden.bkjournal.org.apache.bookkeeper.proto.PerChannelBookieClient: Successfully connected to bookie: /167.52.1.121:318
>
> The stack of the process while blocked between those two log lines:
>
>     "main" prio=10 tid=0x000000004011f000 nid=0x39ba in Object.wait() [0x00007fca020fe000]
>        java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:485)
>         at hidden.bkjournal.org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1253)
>         - locked <0x00000006fb8495a8> (a hidden.bkjournal.org.apache.zookeeper.ClientCnxn$Packet)
>         at hidden.bkjournal.org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1129)
>         at org.apache.hadoop.contrib.bkjournal.utils.RetryableZookeeper.getData(RetryableZookeeper.java:501)
>         at hidden.bkjournal.org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1160)
>         at org.apache.hadoop.contrib.bkjournal.EditLogLedgerMetadata.read(EditLogLedgerMetadata.java:113)
>         at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.getLedgerList(BookKeeperJournalManager.java:725)
>         at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.getInputStream(BookKeeperJournalManager.java:442)
>         at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.selectInputStreams(BookKeeperJournalManager.java:480)
>
> Between two different times, the diff of the stacks is:
>
>     diff stack stack2
>     1c1
>     < 2012-10-30 16:44:53
>     ---
>     > 2012-10-30 16:46:17
>     106c106
>     <  - locked <0x00000006fb8495a8> (a hidden.bkjournal.org.apache.zookeeper.ClientCnxn$Packet)
>     ---
>     >  - locked <0x00000006fae58468> (a hidden.bkjournal.org.apache.zookeeper.ClientCnxn$Packet)
>
> In our environment, the waiting time can even reach tens of minutes.
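For reference, the retention threshold mentioned in the comment can be lowered in hdfs-site.xml. The value below is only an illustration of the workaround, not a recommended setting:

    <property>
      <name>dfs.namenode.num.extra.edits.retained</name>
      <!-- default is 1000000; a smaller value lets old edit ledgers be purged sooner -->
      <value>10000</value>
    </property>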
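The effect of the proposed change (fetching the ledger list from ZooKeeper once in selectInputStreams, instead of once per getInputStream() call) can be illustrated with a self-contained sketch. All names below are hypothetical simplifications, not the actual bkjm API; the real code works on EditLogInputStream and EditLogLedgerMetadata objects:

```java
import java.util.ArrayList;
import java.util.List;

public class LedgerListCachingSketch {
    static int zkReads = 0;

    // Stand-in for getLedgerList(), which goes to ZooKeeper each time.
    // Each entry is {firstTxId, lastTxId}; here: {1,2}, {3,4}, ..., {9,10}.
    static List<long[]> readLedgerListFromZk() {
        zkReads++;
        List<long[]> ledgers = new ArrayList<>();
        for (long tx = 1; tx <= 10; tx += 2) {
            ledgers.add(new long[] {tx, tx + 1});
        }
        return ledgers;
    }

    // Old pattern: every getInputStream() call re-reads the ledger list.
    static long[] getInputStreamOld(long fromTxId) {
        for (long[] l : readLedgerListFromZk()) {
            if (l[0] == fromTxId) {
                return l;
            }
        }
        return null; // no ledger starts at fromTxId
    }

    static int selectInputStreamsOld(long fromTxId) {
        int streams = 0;
        while (true) {
            long[] elis = getInputStreamOld(fromTxId);
            if (elis == null) {
                return streams;
            }
            streams++;
            fromTxId = elis[1] + 1;
        }
    }

    // New pattern: read the ledger list once, then walk it in memory.
    static int selectInputStreamsNew(long fromTxId) {
        int streams = 0;
        for (long[] l : readLedgerListFromZk()) {
            if (l[0] == fromTxId) {
                streams++;
                fromTxId = l[1] + 1;
            }
        }
        return streams;
    }

    public static void main(String[] args) {
        zkReads = 0;
        System.out.println("old: streams=" + selectInputStreamsOld(1)
                + ", zkReads=" + zkReads); // 5 streams, 6 ZK reads
        zkReads = 0;
        System.out.println("new: streams=" + selectInputStreamsNew(1)
                + ", zkReads=" + zkReads); // 5 streams, 1 ZK read
    }
}
```

With 5 ledgers the old pattern performs 6 ZooKeeper reads (one per matched ledger plus the terminating miss), while the new pattern performs 1; with thousands of ledgers, as described above, this difference accounts for the minutes-long stall at NameNode startup.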
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira