[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803740#comment-13803740 ]
Colin Patrick McCabe commented on HDFS-5394:
--------------------------------------------

bq. This state machine logic seems like overkill. It's cool how we can do all this with atomic swaps, but synchronization isn't a big overhead as this isn't a hot path, there aren't that many states, and the transitions aren't that complicated. I think it'd be simpler to just use a couple different maps: replicasMap for cached replicas and pendingCache / pendingUncache for in-progress replicas.

I think the state machine makes things easier to understand, and is more efficient as well. A "big lock" architecture would end up being just as complex, because we can't do long-running operations under the lock. You'd have to carefully re-check the state of everything after re-taking the big lock, which ends up being just as complex as the compare-and-swap here.

Whether to advertise a block is a separate issue from state machine vs. ad hoc. Currently, we only advertise blocks in the {{CACHING}} state, meaning blocks that have been completely cached and not scheduled for uncaching. Originally I had an "uncaching but visible" state, but that seemed like a bad idea, since work might be scheduled on the node in the erroneous belief that the block would stay cached.

bq. I like the idea of UncachingTask, it's good future-proofing. However, I don't think it should be using the same executor as CachingTask, since the executor is currently capped at 4 threads per volume in the interest of disk contention. Uncaching tasks do no I/O, and we wouldn't want stuck uncaching tasks to stop other work. It might be better to instead have a background sweeper thread that goes through pendingUncache (kicked on-demand as necessary). This works well with a separate pendingUncache map too.

It's not really possible for uncaching tasks to "get stuck" currently, since all they do is munmap.
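The compare-and-swap transitions discussed above can be sketched roughly as below. This is an illustrative sketch, not the actual HDFS-5394 patch: the class name, method names, and exact state set are assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicReference;

public class MappableBlockSketch {
    // State names loosely follow the discussion; the real patch's enum may differ.
    public enum State { CACHING, CACHED, CACHING_CANCELLED }

    private final AtomicReference<State> state =
        new AtomicReference<>(State.CACHING);

    // The caching task does its long-running mmap/mlock work outside any lock,
    // then atomically swaps CACHING -> CACHED. A failed swap means the block
    // was cancelled mid-flight, so the task must clean up instead of advertising.
    public boolean finishCaching() {
        return state.compareAndSet(State.CACHING, State.CACHED);
    }

    // Cancellation races with the caching task; exactly one CAS wins, so no
    // "big lock" or post-lock re-validation of state is needed.
    public boolean cancelCaching() {
        return state.compareAndSet(State.CACHING, State.CACHING_CANCELLED);
    }

    public State getState() {
        return state.get();
    }
}
```

The key property is that each transition is a single atomic step, so a task that loses the race finds out immediately from the failed swap rather than by re-validating shared state under a lock.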
Although that's technically a blocking system call, it won't take long, since the mmap is not dirty. We could have a separate executor, I suppose, to prevent caching tasks from blocking uncaching ones. In the long term, we are not going to have threads block waiting for clients. Instead, we'll have tasks that reschedule themselves at some fixed period to poll whether the client has released the mmap. It would be nice to think of a way to edge-trigger this, but that's probably an optimization for later. In any case, I'll create another executor to prevent mmap from blocking munmap.

bq. NativeIO: should move the comment up to be javadoc instead

OK.

bq. A remark, this only sets nextValue to null, which is okay here, but seeing it did make me check that it's not an error:

The Java compiler complains about local variables that are used before initialization, so you don't need to check. See http://developer.nokia.com/Community/Wiki/Initializing_local_variables_in_Java:

bq. In Java, it is a fixed rule that you may not read the value from a local variable unless the compiler can prove that the variable will have been initialized. Otherwise, the code will not compile. This is the "definite assignment" rule.

bq. I don't see a method named "startUncaching", need to update the exception text

OK.

bq. Why is it called "visibleLength" instead of just "length"? We don't support partial caching, reading of blocks open for append, or cached reads of blocks that are being cached, so mention of "visibility" threw me off. That's why I wanted javadoc above.

Yes, but we plan on supporting those things in the future. So we should be clear about what we're measuring, which is not the length of the block file, but the length of the block file that the client is allowed to see.

bq. Could we keep FsDatasetImpl#validToCache method separate rather than inlining it in cacheBlock?
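The answer to this question turns on {{ConcurrentHashMap}}'s atomic insert-if-absent. A minimal sketch of the race it avoids (the class and method names here are hypothetical, not the patch's actual code):

```java
import java.util.concurrent.ConcurrentHashMap;

public class CacheAdmission {
    // Maps block ID -> placeholder state string; the types are illustrative only.
    private final ConcurrentHashMap<Long, String> replicaMap =
        new ConcurrentHashMap<>();

    // Returns true iff this caller won the race to start caching the block.
    // A separate "valid to cache" check (containsKey) followed by put() would
    // leave a window in which two threads both see the block as absent and both
    // proceed to mmap/mlock it; putIfAbsent closes that window by doing the
    // check and the insert as one atomic operation, in a single map lookup.
    public boolean tryStartCaching(long blockId) {
        return replicaMap.putIfAbsent(blockId, "CACHING") == null;
    }
}
```

This is why folding the validity check into the insert (rather than keeping it as a separate method) is not just shorter but necessary for correctness.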
The issue is that {{ConcurrentHashMap}} has a method for atomically adding a value for a given key if one is not already present. If this were a separate method to check the map and then another to add to it, we would have a race condition. Plus, we'd be checking the map twice, which is inefficient. Given that it's only 2 or 3 lines, I don't think a separate method makes sense here.

bq. Don't we need to remove a CACHING_CANCELLED MappableBlock from the replicaMap when the swap check in CachingTask fails?

Fixed, thanks.

> fix race conditions in DN caching and uncaching
> -----------------------------------------------
>
>                 Key: HDFS-5394
>                 URL: https://issues.apache.org/jira/browse/HDFS-5394
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>    Affects Versions: HDFS-4949
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-5394-caching.001.patch, HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch
>
>
> The DN needs to handle situations where it is asked to cache the same replica more than once. (Currently, it can actually do two mmaps and mlocks.) It also needs to handle the situation where caching a replica is cancelled before said caching completes.

--
This message was sent by Atlassian JIRA
(v6.1#6144)