[ https://issues.apache.org/jira/browse/HBASE-11322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036528#comment-14036528 ]
Lars Hofhansl edited comment on HBASE-11322 at 6/18/14 10:18 PM:
-----------------------------------------------------------------

Let's change min to max in all branches to get this into 0.94.21.

Edit: Just saw the last comment... OK, we need to fix these together.


was (Author: lhofhansl):
Let's change min to max in all branches to get this into 0.94.21.

> SnapshotHFileCleaner makes the wrong check for lastModified time thus causing
> too many cache refreshes
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-11322
>                 URL: https://issues.apache.org/jira/browse/HBASE-11322
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.19
>            Reporter: churro morales
>            Assignee: churro morales
>            Priority: Critical
>             Fix For: 0.94.21
>
>         Attachments: 11322.94.txt, HBASE-11322.patch
>
>
> The SnapshotHFileCleaner calls the SnapshotFileCache to check whether a
> particular HFile in question is part of a snapshot.
> If the HFile is not in the cache, we then refresh the cache and check again.
> The cache refresh first checks whether anything has been modified since the
> last cache refresh, but this logic is incorrect in certain scenarios.
> The last modified time is computed via this operation:
> {code}
> this.lastModifiedTime = Math.min(dirStatus.getModificationTime(),
>                                  tempStatus.getModificationTime());
> {code}
> and the check to see if the snapshot directories have been modified:
> {code}
> // if the snapshot directory wasn't modified since we last check, we are done
> if (dirStatus.getModificationTime() <= lastModifiedTime &&
>     tempStatus.getModificationTime() <= lastModifiedTime) {
>   return;
> }
> {code}
> Suppose the following happens:
> dirStatus modified 6-1-2014
> tempStatus modified 6-2-2014
> lastModifiedTime = 6-1-2014
> Provided these two directories don't get modified again, all subsequent checks
> won't exit early like they should, because tempStatus (6-2-2014) is always
> later than lastModifiedTime (6-1-2014).
> In our cluster, this was a huge performance hit. The cleaner chain fell
> behind, thus almost filling up dfs and our namenode heap.
> It's a simple fix: instead of Math.min we use Math.max for the lastModified.
> I believe that will be correct.
> I'll apply a patch for you guys.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
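
For anyone following along, here is a minimal standalone sketch of the scenario described in the issue. It is not the HBase code: the class name LastModifiedCheck and the method canSkipRefresh are hypothetical, and the timestamps 1 and 2 are placeholders standing in for 6-1-2014 and 6-2-2014. It only mirrors the quoted early-exit condition to show how min and max behave differently.

```java
public class LastModifiedCheck {
    // Hypothetical helper mirroring the early-exit condition quoted in the
    // issue: the refresh can be skipped only if neither directory is newer
    // than the recorded lastModifiedTime.
    public static boolean canSkipRefresh(long dirTime, long tempTime, long lastModifiedTime) {
        return dirTime <= lastModifiedTime && tempTime <= lastModifiedTime;
    }

    public static void main(String[] args) {
        long dirTime = 1L;   // placeholder for "dirStatus modified 6-1-2014"
        long tempTime = 2L;  // placeholder for "tempStatus modified 6-2-2014"

        // Current behavior: record the older of the two timestamps.
        long withMin = Math.min(dirTime, tempTime);
        // Proposed behavior: record the newer of the two timestamps.
        long withMax = Math.max(dirTime, tempTime);

        // With min, tempTime is always greater than the recorded time,
        // so the early exit is never taken and the cache refreshes every time.
        System.out.println("min: skip refresh = " + canSkipRefresh(dirTime, tempTime, withMin));
        // With max, both timestamps are <= the recorded time, so an
        // unchanged pair of directories exits early.
        System.out.println("max: skip refresh = " + canSkipRefresh(dirTime, tempTime, withMax));
    }
}
```

Under these assumptions the min variant reports false and the max variant reports true for unchanged directories, which matches the behavior described above.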