[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803550#comment-13803550 ]
Andrew Wang commented on HDFS-5394: ----------------------------------- Hi Colin, good catch on these races. There's definitely some sloppiness we can prevent here, and some of your other code cleanups look good too. High-level comments: * This state machine logic seems like overkill. It's cool how we can do all this with atomic swaps, but synchronization isn't a big overhead as this isn't a hot path, there aren't that many states, and this the transitions aren't that complicated. I think it'd be simpler to just use a couple different maps: {{replicasMap}} for cached replicas and {{pendingCache}} / {{pendingUncache}} for in-progress replicas. ** {{CACHING_CANCELLED}} could be indicated by just moving it from {{pendingCache}} to {{pendingUncache}} ** Removes the need to worry about "visibility" via {{shouldAdvertise}} * I like the idea of {{UncachingTask}}, it's good future-proofing. However, I don't think it should be using the same executor as {{CachingTask}}, since the executor is currently capped at 4 threads per volume in the interest of disk contention. Uncaching tasks do no I/O, and we wouldn't want stuck uncaching tasks to stop other work. It might be better to instead have a background sweeper thread that goes through {{pendingUncache}} (kicked on-demand as necessary). This works well with a separate {{pendingUncache}} map too. Lower-level: * NativeIO: should move the comment up to be javadoc instead * Nit: FsDatasetCache comment should read "was cancelled" not "got cancelled" * A remark, this only sets {{nextValue}} to null, which is okay here, but seeing it did make me check that it's not an error: {code} Value prevValue, nextValue = null; {code} * I don't see a method named "startUncaching", need to update the exception text * trailing whitespace: {code} if (nextValue.state != State.CACHING_CANCELLED) { {code} * Would be good to refer to the JIRA # in the {{UncachingTask}} TODO * Javadoc for MappableBlock#getVisibleLength and MappableBlock constructor * Javadoc for MappableBlock#load refers to "visibleLeng" instead of "visibleLength" * Would be nice to drop a comment in {{uncacheBlock}} about the purpose of the do while for atomic swaps * Why is it called "visibleLength" instead of just just "length"? We don't support partial caching, reading of blocks open for append, or cached reads of blocks that are being cached, so mention of "visibility" threw me off. That's why I wanted javadoc above. * Could we keep {{FsDatasetImpl#validToCache}} method separate rather than inlining it in {{cacheBlock}}? * Don't we need to remove a {{CACHING_CANCELLED}} {{MappableBlock}} from the replicaMap when the swap check in {{CachingTask}} fails? > fix race conditions in DN caching and uncaching > ----------------------------------------------- > > Key: HDFS-5394 > URL: https://issues.apache.org/jira/browse/HDFS-5394 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode > Affects Versions: HDFS-4949 > Reporter: Colin Patrick McCabe > Assignee: Colin Patrick McCabe > Attachments: HDFS-5394-caching.001.patch, > HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch > > > The DN needs to handle situations where it is asked to cache the same replica > more than once. (Currently, it can actually do two mmaps and mlocks.) It > also needs to handle the situation where caching a replica is cancelled > before said caching completes. -- This message was sent by Atlassian JIRA (v6.1#6144)