[ https://issues.apache.org/jira/browse/HDFS-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814568#comment-13814568 ]

Colin Patrick McCabe commented on HDFS-5394:
--------------------------------------------

bq. Unused imports in FsDatasetImpl and FsVolumeImpl

removed

bq. Do we still need to rename getExecutor to getCacheExecutor in FsVolumeImpl?

Well, the name of the variable is {{cacheExecutor}}; shouldn't the getter be 
{{getCacheExecutor}}?

bq. State#isUncaching() is unused

removed

bq. Could use a core pool size of 0 for uncachingExecutor, I don't think it's 
that latency sensitive

agreed
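
For reference, a core-pool-size-of-0 executor can be built along these lines (just a sketch; the {{uncachingExecutor}} name and the exact keep-alive are illustrative, not the patch code):

{code}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// With a core pool size of 0, the single worker thread is torn down once the
// keep-alive expires, so no thread sits idle when nothing is being uncached.
ThreadPoolExecutor uncachingExecutor = new ThreadPoolExecutor(
    0, 1,                     // core pool size 0, at most one worker
    60L, TimeUnit.SECONDS,    // idle keep-alive before the worker exits
    new LinkedBlockingQueue<Runnable>());
{code}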

bq. usedBytes javadoc: "more things to cache that we can't actually do because 
of" is an awkward turn of phrase, maybe say "assign more blocks than we can 
actually cache because of" instead

ok

bq. MappableBlock#load javadoc: visibleLeng parameter should be renamed to 
length. The return value is now also a MappableBlock, not a boolean.

fixed

bq. Key: rename id to blockId for clarity? or add a bit of javadoc

added javadoc

bq. Naming the HashMap replicaMap is confusing since there's already a datanode 
ReplicaMap class. Maybe mappableBlockMap instead?

ok

bq. Caching can fail if the underlying block is invalidated in between getting 
the block's filename and running the CacheTask. It'd be nice to distinguish 
this race from a real error for when we do metrics (and also quash the 
exception).

I just added a catch block for the {{FileNotFoundException}} that both 
{{getBlockInputStream}} and {{getMetaDataInputStream}} can throw.  I still 
think we want to log this exception, but at INFO rather than WARN.  We will 
retry sending the {{DNA_CACHE}} command (once 5366 is committed), so hitting 
this narrow race while a block is being moved is just a temporary setback.
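
A sketch of roughly what that catch block looks like (surrounding declarations are omitted, and the variable names here are illustrative):

{code}
try {
  blockIn = dataset.getBlockInputStream(block, 0);
  metaIn = dataset.getMetaDataInputStream(block);
} catch (FileNotFoundException e) {
  // The replica was moved or invalidated between scheduling and execution.
  // This is a benign race, not a real error: the NN will resend DNA_CACHE,
  // so log at INFO rather than WARN and bail out of this CachingTask.
  LOG.info("Failed to cache " + key + ": replica file not found.", e);
  return;
}
{code}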

bq. If we get a DNA_CACHE for a block that is currently being uncached, 
shouldn't we try to cancel the uncache and re-cache it? The NN will resend the 
command, but it'd be better to not have to wait for that.

We don't know how far along the uncaching process is.  We can't cancel it if we 
already called {{munmap}}.  We could allow cancellation of pending uncaches by 
splitting {{UNCACHING}} into {{UNCACHING_SCHEDULED}} and 
{{UNCACHING_IN_PROGRESS}}, and only allowing cancellation on the former.  This 
might be a good improvement to make as part of 5182.  But for now, the 
uncaching process is really quick, so let's keep it simple.
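
If we do that later, the split could look something like this (purely a sketch of the idea for 5182, not code in this patch):

{code}
public enum State {
  CACHING,                // mmap/mlock in progress
  CACHING_CANCELLED,      // caching was cancelled before it completed
  CACHED,                 // mapped and locked in memory
  UNCACHING_SCHEDULED,    // Runnable queued on the executor; still cancellable
  UNCACHING_IN_PROGRESS   // munmap may already be underway; too late to cancel
}
{code}

A {{DNA_CACHE}} for a block in {{UNCACHING_SCHEDULED}} could then flip it back to {{CACHED}} and drop the pending {{Runnable}}, while {{UNCACHING_IN_PROGRESS}} would stay uncancellable.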

bq. Could this be written with value.state == State.CACHING_CANCELLED instead? 
Would be clearer, and I believe equivalent since uncacheBlock won't set the 
state to UNCACHING if it's CACHING or CACHING_CANCELLED.

Well, if {{value}} is null, you don't want to be dereferencing it, right?
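
In other words, the check has to stay null-safe, roughly like this (a sketch; {{Value}} is the map's value type):

{code}
Value value = mappableBlockMap.get(key);
// A missing entry means the block is no longer being tracked at all, so the
// null check has to come before any look at value.state.
if (value == null || value.state == State.CACHING_CANCELLED) {
  return;   // nothing (more) to do for this block
}
{code}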

bq. Even better would be interrupting a CachingTask on uncache since it'll save 
us I/O and CPU.

That kind of interruption logic gets complex quickly.  I'd rather save that for 
a potential performance improvement JIRA later down the line.  I also think 
that if we're thrashing (cancelling caching requests right and left) the real 
fix might be on the NameNode anyway...

bq. Could we combine CACHING_CANCELLED into UNCACHING? It seems like 
CachingTask could check for UNCACHING in that if statement at the end and 
uncache, same sort of change for uncacheBlock.

I would rather not do that, since right now we can look at entries in the map 
and instantly know that anything in state {{UNCACHING}} has an associated 
{{Runnable}} scheduled in the {{Executor}}.  Cancelled is not really the same 
thing as uncaching, since in the former case there is actually nothing to do!

bq. I think using a switch/case on the prevValue.state in uncacheBlock would be 
clearer

ok
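
For concreteness, the restructured {{uncacheBlock}} logic reads roughly like this (a sketch; the {{Value}} and {{UncachingTask}} names are illustrative):

{code}
Value prevValue = mappableBlockMap.get(key);
if (prevValue == null) {
  LOG.debug("Block " + key + " is not currently cached.");
  return;
}
switch (prevValue.state) {
case CACHING:
  // Caching hasn't finished yet; mark it cancelled and let the CachingTask
  // clean up after itself when it notices.
  mappableBlockMap.put(key,
      new Value(prevValue.mappableBlock, State.CACHING_CANCELLED));
  break;
case CACHED:
  // Hand the actual munmap/munlock off to the uncaching executor.
  mappableBlockMap.put(key,
      new Value(prevValue.mappableBlock, State.UNCACHING));
  uncachingExecutor.execute(new UncachingTask(key));
  break;
default:
  LOG.debug("Block " + key + " is already being uncached or was cancelled.");
  break;
}
{code}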

bq. 6,000,000 milliseconds seem like very long test timeouts.  Can we change 
them to, say, 60,000?

The general idea is to do anything that can time out inside 
{{GenericTestUtils#waitFor}} blocks.  The waitFor blocks actually give useful 
backtraces and messages when they time out, unlike the generic test timeouts.  
I wanted to avoid the scenario where the test-level timeouts kick in, but out 
of paranoia I set the overall test timeout to 10 minutes in case there was 
some other unexpected timeout.  I wanted to avoid the issues we've had with 
zombie tests in Jenkins causing heisenfailures.
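
For example, a typical check in the test looks something like this (the predicate body is illustrative):

{code}
import java.util.concurrent.TimeoutException;
import org.apache.hadoop.test.GenericTestUtils;
import com.google.common.base.Supplier;

// Poll every 100 ms; after 60 seconds, waitFor throws a TimeoutException with
// a useful message instead of the test just being killed by a global timeout.
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    return fsd.getNumBlocksCached() == expectedBlocks;  // illustrative check
  }
}, 100, 60000);
{code}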

bq. Are these new log prints for sanity checking? Maybe we can just remove them.

It's more so you can see what's going on in the sea of log messages; 
otherwise, it becomes hard to debug.

bq. Some of the comments seem to refer to a previous patch version that used a 
countdown latch.

fixed

bq. It's unclear what this is testing beyond caching and then uncaching a bunch 
of blocks. Can we check for log prints to see that it's actually cancelling as 
expected? Any other ideas for definitively hitting cancellation?

We could add callback hooks at more points in the system and set up a bunch of 
countdown latches (or similar), but that seems like overkill here.  I have 
looked at the logs, and I do see cancellation.  You'd have to hit a GC or 
something to avoid it in this test, I think.

> fix race conditions in DN caching and uncaching
> -----------------------------------------------
>
>                 Key: HDFS-5394
>                 URL: https://issues.apache.org/jira/browse/HDFS-5394
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>    Affects Versions: 3.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-5394-caching.001.patch, 
> HDFS-5394-caching.002.patch, HDFS-5394-caching.003.patch, 
> HDFS-5394-caching.004.patch, HDFS-5394.005.patch, HDFS-5394.006.patch, 
> HDFS-5394.007.patch
>
>
> The DN needs to handle situations where it is asked to cache the same replica 
> more than once.  (Currently, it can actually do two mmaps and mlocks.)  It 
> also needs to handle the situation where caching a replica is cancelled 
> before said caching completes.


