[
https://issues.apache.org/jira/browse/HDFS-9643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108785#comment-15108785
]
James Clampffer commented on HDFS-9643:
---------------------------------------
Your observation is right: if network IO hangs or is just really slow, this
approach won't help much.
To close bad connections in a timely manner, something would need to keep
track of BlockReader instances and push updates out to them, rather than
having them poll the cancelled flag.
I think this could be done by having the FileHandleImpl keep a
vector<weak_ptr<BlockReader>> of the readers it instantiates. When
CancelOperations is called, the FileHandleImpl could iterate through that
vector and close the socket of any readers that are still alive. The downside
is that the vector would need periodic garbage collection to avoid
accumulating a lot of weak_ptrs to nothing. Adding GC that loops over an
operation that will be locking the data bus doesn't sound ideal, so I'm open
to suggestions. Otherwise I'll try to implement the method I described and
post a patch later today.
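A rough sketch of the weak_ptr tracking idea, for illustration only. The
BlockReader and FileHandleImpl below are simplified, hypothetical stand-ins
and not the real libhdfs++ classes; TrackReader, CloseConnection,
TrackedReaderCount and PruneExpiredLocked are made-up names, and an atomic
flag stands in for actually closing a socket. Pruning expired entries on each
insert is one assumed way to avoid a separate GC pass, not necessarily what
the patch will do.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for BlockReader: only what the sketch needs.
class BlockReader {
 public:
  // Stand-in for closing the underlying socket so pending IO fails fast.
  void CloseConnection() { closed_.store(true); }
  bool closed() const { return closed_.load(); }

 private:
  std::atomic<bool> closed_{false};
};

// Hypothetical stand-in for FileHandleImpl.
class FileHandleImpl {
 public:
  // Remember a reader without extending its lifetime.
  void TrackReader(const std::shared_ptr<BlockReader>& reader) {
    assert(reader != nullptr);
    std::lock_guard<std::mutex> lock(readers_mutex_);
    PruneExpiredLocked();  // drop dead entries while we hold the lock anyway
    readers_.push_back(reader);
  }

  // Push the cancel out to every reader that is still alive.
  void CancelOperations() {
    std::lock_guard<std::mutex> lock(readers_mutex_);
    for (const auto& weak : readers_) {
      if (auto reader = weak.lock()) {
        reader->CloseConnection();
      }
    }
    // Everything tracked is now either closed or already destroyed.
    readers_.clear();
  }

  std::size_t TrackedReaderCount() const {
    std::lock_guard<std::mutex> lock(readers_mutex_);
    return readers_.size();
  }

 private:
  // Caller must hold readers_mutex_.
  void PruneExpiredLocked() {
    readers_.erase(
        std::remove_if(readers_.begin(), readers_.end(),
                       [](const std::weak_ptr<BlockReader>& w) {
                         return w.expired();
                       }),
        readers_.end());
  }

  mutable std::mutex readers_mutex_;
  std::vector<std::weak_ptr<BlockReader>> readers_;
};
```

Pruning on insert keeps the vector bounded by the number of live readers plus
whatever expired since the last insert, at the cost of an O(n) scan per new
reader under the lock.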
> libhdfs++: Support async cancellation of read operations
> --------------------------------------------------------
>
> Key: HDFS-9643
> URL: https://issues.apache.org/jira/browse/HDFS-9643
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Reporter: James Clampffer
> Assignee: James Clampffer
> Attachments: HDFS-9643.HDFS-8707.000.patch,
> HDFS-9643.HDFS-8707.001.patch
>
>
> It should be possible for any thread to cancel operations in progress on a
> FileHandle. Any ephemeral objects created by the FileHandle should free
> resources as quickly as possible.