[ https://issues.apache.org/jira/browse/HBASE-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13744502#comment-13744502 ]
Lars Hofhansl commented on HBASE-9208:
--------------------------------------

Trunk patch looks good. +1

> ReplicationLogCleaner slow at large scale
> -----------------------------------------
>
>                 Key: HBASE-9208
>                 URL: https://issues.apache.org/jira/browse/HBASE-9208
>             Project: HBase
>          Issue Type: Improvement
>          Components: Replication
>            Reporter: Dave Latham
>            Assignee: Dave Latham
>             Fix For: 0.94.12, 0.96.0
>
>         Attachments: HBASE-9208-0.94.patch, HBASE-9208.patch, HBASE-9208-v2.patch
>
>
> At a large scale the ReplicationLogCleaner fails to clean up .oldlogs as fast
> as the cluster is producing them. For each old HLog file that has been
> replicated and should be deleted, the ReplicationLogCleaner checks every
> replication queue in ZooKeeper before removing it. This means that as a
> cluster scales up, both the number of files to delete and the time to delete
> each file grow, so the cleanup chore scales quadratically. In our case it
> reached the point where the oldlogs were growing faster than they were being
> cleaned up.
> We're now running with a patch that allows the ReplicationLogCleaner to
> refresh its list of files in the replication queues from ZooKeeper just once
> for each batch of files the CleanerChore wants to evaluate.
> I'd propose updating FileCleanerDelegate to take a List<FileStatus> rather
> than a single one at a time. This would allow file cleaners that check an
> external resource for references, such as ZooKeeper (for
> ReplicationLogCleaner) or HDFS (for SnapshotLogCleaner, which looks like it
> may also have similar trouble at scale), to load those references once per
> batch rather than for every log.
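The difference between per-file and per-batch checking can be illustrated with a small, self-contained Java sketch. It is not HBase's FileCleanerDelegate or the attached patch: QueueSource, BatchFileCleaner, filterDeletable, PerFileCleaner and PerBatchCleaner are hypothetical names; ZooKeeper is simulated by an in-memory set; and log files are plain String names instead of Hadoop FileStatus objects. The "reads" counter shows that the per-file approach re-reads every queue once per candidate, while the batched approach reads them once per batch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BatchCleanerSketch {

    // Hypothetical stand-in for the replication queues in ZooKeeper.
    // Each call to loadQueuedLogs() models one full read of every queue.
    static final class QueueSource {
        private final Set<String> queued;
        int scans = 0; // number of full reads performed so far

        QueueSource(Set<String> queued) {
            this.queued = new HashSet<>(queued);
        }

        Set<String> loadQueuedLogs() {
            scans++;
            return new HashSet<>(queued);
        }
    }

    // Hypothetical batch-oriented contract: takes a whole batch of candidate
    // files and returns the subset that is safe to delete.
    interface BatchFileCleaner {
        List<String> filterDeletable(List<String> candidates);
    }

    // Per-file behaviour: re-reads all queues for every single candidate,
    // so total work grows with (#files x size of all queues).
    static final class PerFileCleaner implements BatchFileCleaner {
        private final QueueSource source;

        PerFileCleaner(QueueSource source) {
            this.source = source;
        }

        @Override
        public List<String> filterDeletable(List<String> candidates) {
            List<String> deletable = new ArrayList<>();
            for (String file : candidates) {
                if (!source.loadQueuedLogs().contains(file)) {
                    deletable.add(file);
                }
            }
            return deletable;
        }
    }

    // Batched behaviour: reads the queues once per batch, then does an
    // in-memory set lookup for each candidate.
    static final class PerBatchCleaner implements BatchFileCleaner {
        private final QueueSource source;

        PerBatchCleaner(QueueSource source) {
            this.source = source;
        }

        @Override
        public List<String> filterDeletable(List<String> candidates) {
            Set<String> queued = source.loadQueuedLogs(); // one read per batch
            List<String> deletable = new ArrayList<>();
            for (String file : candidates) {
                if (!queued.contains(file)) {
                    deletable.add(file);
                }
            }
            return deletable;
        }
    }

    public static void main(String[] args) {
        List<String> batch = List.of("hlog.1", "hlog.2", "hlog.3", "hlog.4");
        Set<String> stillQueued = Set.of("hlog.3"); // not yet replicated

        QueueSource a = new QueueSource(stillQueued);
        System.out.println(new PerFileCleaner(a).filterDeletable(batch) + " reads=" + a.scans);

        QueueSource b = new QueueSource(stillQueued);
        System.out.println(new PerBatchCleaner(b).filterDeletable(batch) + " reads=" + b.scans);
    }
}
```

Both cleaners return the same deletable files ([hlog.1, hlog.2, hlog.4]), but the per-file version performs 4 reads against 1 for the batched version. With thousands of logs and many region servers' queues, that gap is what turns the chore from linear into quadratic.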