[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816492#comment-13816492 ]
Andrew Wang commented on HDFS-5394: ----------------------------------- Thanks for bumping Colin, basically just rollup in this review: bq. Could this be written with value.state == State.CACHING_CANCELLED instead? My point here was about the logic, since I did a "find usages" on CACHING_CANCELLED in Eclipse and only saw it being set. Right now it checks "not CACHED" which should be equivalent to "is CACHING_CANCELLED" because of the state transition invariants, and ideally with this kind of logic, we transition based on being *in* a state rather than *not being* in a state. bq. I would rather not do that, since right now we can look at entries in the map and instantly know that anything in state UNCACHING has an associated Runnable scheduled in the Executor. I guess this makes sense in light of HDFS-5182, since uncaching might require waiting for clients while cancelling caching shouldn't. In either case though, something needs to happen, it's just that instead of deferring the work to an UncachingTask, it's deferred to the end of the CachingTask. bq. <waitFor> Makes sense, though I'll note that 6,000,000 is 100 minutes, not ten minutes :) Overkill. bq. <catching FileNotFoundException> This is better, thanks. As a general comment, I'd like to avoid relying on NN retries if possible, but I guess it's okay for now. Test: * Do we need that {{Preconditions}} check in {{setUp}}? There's already an assumeTrue for the same thing right above it, so I don't think it'll do anything. * I'd like to see the {{LogVerificationAppender}} used in {{testUncachingBlocksBeforeCachingFinishes}} too. This seems like it might be flaky though. What was wrong with the old approach that used a barrier to force ordering? Also need to run through the Jenkins stuff still. The javac warning is fine (the new usage of Unsafe to get the page size) but the rest needs to be touched up. Not sure about the test failure. > fix race conditions in DN caching and uncaching > ----------------------------------------------- > > Key: HDFS-5394 > URL: https://issues.apache.org/jira/browse/HDFS-5394 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode > Affects Versions: 3.0.0 > Reporter: Colin Patrick McCabe > Assignee: Colin Patrick McCabe > Attachments: HDFS-5394-caching.001.patch, > HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, > HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch, > HDFS-5394.007.patch, HDFS-5394.008.patch > > > The DN needs to handle situations where it is asked to cache the same replica > more than once. (Currently, it can actually do two mmaps and mlocks.) It > also needs to handle the situation where caching a replica is cancelled > before said caching completes. -- This message was sent by Atlassian JIRA (v6.1#6144)