[ 
https://issues.apache.org/jira/browse/HBASE-11322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031486#comment-14031486
 ] 

churro morales commented on HBASE-11322:
----------------------------------------

Hi Lars, 

That is correct, but maybe I'm missing something with the Math.max solution.

Suppose lastModifiedTime is the greater of the two modification times for the 
directories in question, and note that at any given check the maximum value 
lastModifiedTime can hold is the current timestamp.  
Then, if either directory is modified between the current check and the next 
subsequent check, its modification time must be larger than the previous 
lastModifiedTime, as long as the next check happens in the future.

On the other hand, maybe it's late and I'm not thinking of a condition where 
this could be incorrect.  
We applied the patch in production today and the cleaner could finally keep up 
without us having to step in and delete files from the archive directory by 
hand.  Our namenode was finally happy and we stopped running out of dfs disk 
space.
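To make the argument concrete, here is a minimal standalone sketch (not the actual SnapshotFileCache code; the class and method names are made up for illustration) of the refresh check using Math.max, as the patch proposes. With Math.max, an unmodified pair of directories always takes the early exit on subsequent checks, which is exactly the behavior the Math.min version broke.

```java
// Hypothetical sketch of the cache-refresh check with the Math.max fix.
public class LastModifiedCheck {
    private long lastModifiedTime = Long.MIN_VALUE;

    /** Returns true if the cache needs a refresh; updates lastModifiedTime. */
    public boolean needsRefresh(long dirMtime, long tempMtime) {
        // Early exit: neither directory was modified since the last check.
        if (dirMtime <= lastModifiedTime && tempMtime <= lastModifiedTime) {
            return false;
        }
        // The fix: record the NEWER of the two modification times, so once
        // both directories are older than lastModifiedTime we stop refreshing.
        lastModifiedTime = Math.max(dirMtime, tempMtime);
        return true;
    }

    public static void main(String[] args) {
        LastModifiedCheck check = new LastModifiedCheck();
        // Timestamps mirror the issue's example: dir at 6-1, temp at 6-2.
        System.out.println(check.needsRefresh(601, 602)); // true: first check refreshes
        System.out.println(check.needsRefresh(601, 602)); // false: nothing changed, early exit
        System.out.println(check.needsRefresh(601, 603)); // true: temp dir was touched
    }
}
```

With Math.min in place of Math.max, the second call above would return true forever, since tempMtime (602) stays greater than the stored 601 and every check triggers a full cache refresh.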


> SnapshotHFileCleaner makes the wrong check for lastModified time thus causing 
> too many cache refreshes
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-11322
>                 URL: https://issues.apache.org/jira/browse/HBASE-11322
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.19
>            Reporter: churro morales
>            Assignee: churro morales
>            Priority: Critical
>             Fix For: 0.94.21
>
>         Attachments: 11322.94.txt, HBASE-11322.patch
>
>
> The SnapshotHFileCleaner calls the SnapshotFileCache if a particular HFile in 
> question is part of a snapshot.
> If the HFile is not in the cache, we then refresh the cache and check again.
> But the cache refresh only checks whether anything has been modified since 
> the last refresh, and this logic is incorrect in certain scenarios.
> The last modified time is done via this operation:
> {code}
> this.lastModifiedTime = Math.min(dirStatus.getModificationTime(),
>                                      tempStatus.getModificationTime());
> {code}
> and the check to see if the snapshot directories have been modified:
> {code}
> // if the snapshot directory wasn't modified since we last check, we are done
>     if (dirStatus.getModificationTime() <= lastModifiedTime &&
>         tempStatus.getModificationTime() <= lastModifiedTime) {
>       return;
>     }
> {code}
> Suppose the following happens:
> dirStatus modified 6-1-2014
> tempStatus modified 6-2-2014
> lastModifiedTime = 6-1-2014
> Provided these two directories don't get modified again, all subsequent 
> checks won't exit early, like they should.
> In our cluster, this was a huge performance hit.  The cleaner chain fell 
> behind, thus almost filling up dfs and our namenode heap.
> It's a simple fix: instead of Math.min we use Math.max for lastModifiedTime; 
> I believe that will be correct.
> I'll attach a patch for you guys.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
