[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching

Andrew Wang (JIRA) Thu, 07 Nov 2013 14:07:27 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816492#comment-13816492
 ]


Andrew Wang commented on HDFS-5394:
-----------------------------------

Thanks for bumping Colin, basically just rollup in this review:

bq. Could this be written with value.state == State.CACHING_CANCELLED instead?

My point here was about the logic, since I did a "find usages" on 
CACHING_CANCELLED in Eclipse and only saw it being set. Right now it checks 
"not CACHED" which should be equivalent to "is CACHING_CANCELLED" because of 
the state transition invariants, and ideally with this kind of logic, we 
transition based on being *in* a state rather than *not being* in a state.

bq. I would rather not do that, since right now we can look at entries in the 
map and instantly know that anything in state UNCACHING has an associated 
Runnable scheduled in the Executor.

I guess this makes sense in light of HDFS-5182, since uncaching might require 
waiting for clients while cancelling caching shouldn't. In either case though, 
something needs to happen, it's just that instead of deferring the work to an 
UncachingTask, it's deferred to the end of the CachingTask.

bq. <waitFor>

Makes sense, though I'll note that 6,000,000 is 100 minutes, not ten minutes :) 
Overkill.

bq. <catching FileNotFoundException>

This is better, thanks. As a general comment, I'd like to avoid relying on NN 
retries if possible, but I guess it's okay for now.

Test:
* Do we need that {{Preconditions}} check in {{setUp}}? There's already an 
assumeTrue for the same thing right above it, so I don't think it'll do 
anything.
* I'd like to see the {{LogVerificationAppender}} used in 
{{testUncachingBlocksBeforeCachingFinishes}} too. This seems like it might be 
flaky though. What was wrong with the old approach that used a barrier to force 
ordering?

Also need to run through the Jenkins stuff still. The javac warning is fine 
(the new usage of Unsafe to get the page size) but the rest needs to be touched 
up. Not sure about the test failure.

> fix race conditions in DN caching and uncaching
> -----------------------------------------------
>
>                 Key: HDFS-5394
>                 URL: https://issues.apache.org/jira/browse/HDFS-5394
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-5394-caching.001.patch, 
> HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, 
> HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch, 
> HDFS-5394.007.patch, HDFS-5394.008.patch
>
>
> The DN needs to handle situations where it is asked to cache the same replica 
> more than once.  (Currently, it can actually do two mmaps and mlocks.)  It 
> also needs to handle the situation where caching a replica is cancelled 
> before said caching completes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HDFS-5394) fix race conditions in DN caching and uncaching

Reply via email to