[jira] [Updated] (HDFS-12103) libhdfs++: Provide workaround to support cancel on filesystem connect until HDFS-11437 is resolved
[ https://issues.apache.org/jira/browse/HDFS-12103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Clampffer updated HDFS-12103:
-----------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Thanks for reviewing [~xiaowei.zhu]! Committed to HDFS-8707.

Manual testing was done by verifying that the steps in the workaround procedure can cancel a slow connection. The fix also gets run with and without valgrind as part of another project on a regular basis - in that case it's too closely coupled to the project to isolate the test.

My hope is to fix the root issue and revert this in the next 3-4 weeks once I finish up HDFS-11807 and HDFS-12111.

> libhdfs++: Provide workaround to support cancel on filesystem connect until HDFS-11437 is resolved
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12103
>                 URL: https://issues.apache.org/jira/browse/HDFS-12103
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: James Clampffer
>         Attachments: HDFS-12103.HDFS-8707.000.patch
>
>
> HDFS-11437 is going to take a non-trivial amount of work to do right. In the meantime it'd be nice to have a way to cancel pending connections (even when the FS claims they are finished).
>
> Proposed workaround is to relax the rules about when FileSystem::CancelPendingConnect can be called, since the FS isn't able to properly determine when it's connected anyway. In order to determine when the FS has connected you can do some simple RPC call, since that will wait on failover. If CancelPendingConnect can be called during that first RPC call then it will effectively be canceling FileSystem::Connect.
>
> Current cancel rules - asterisk on steps where CancelPendingConnect is allowed
> FileSystem::Connect called
> FileSystem communicates with first NN *
> FileSystem::Connect returns - even if it hasn't communicated with the active NN
>
> Proposed relaxation
> FileSystem::Connect called
> FileSystem communicates with first NN *
> FileSystem::Connect returns *
> FileSystem::GetFileInfo called * - any namenode RPC call will do, ignore perm errors
> RPC engine blocks until it hits the active or runs out of retries *
> FileSystem::GetFileInfo returns
>
> It'd be up to the user to add in the dummy NN RPC call. Once HDFS-11437 is fixed this workaround can be removed.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12103) libhdfs++: Provide workaround to support cancel on filesystem connect until HDFS-11437 is resolved
[ https://issues.apache.org/jira/browse/HDFS-12103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Clampffer updated HDFS-12103:
-----------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HDFS-12103) libhdfs++: Provide workaround to support cancel on filesystem connect until HDFS-11437 is resolved
[ https://issues.apache.org/jira/browse/HDFS-12103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Clampffer updated HDFS-12103:
-----------------------------------
    Attachment: HDFS-12103.HDFS-8707.000.patch

Attached patch to allow NamenodeOperations::CancelPendingConnect to be called after FileSystem::Connect returns. This will make any blocked RPC calls return an operation canceled failure status.
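
For readers following the thread, the relaxed cancel sequence from the issue description (Connect returns, a dummy namenode RPC blocks, a cancel from another thread unblocks it) might look roughly like the sketch below. It does not use the real libhdfs++ headers: MockFileSystem, MockStatus, SimulateActiveReached and RunCancelDemo are hypothetical stand-ins invented for illustration, and the signatures and timeouts are assumptions rather than the library's actual API.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <future>
#include <mutex>

// Hypothetical status codes; the real libhdfs++ Status class is not modeled.
enum class MockStatus { kOk, kCanceled, kRetriesExhausted };

// Hypothetical stand-in for a file system whose Connect can return before
// the active namenode has been reached.
class MockFileSystem {
 public:
  // Returns right away, as if only the first NN had been contacted.
  MockStatus Connect() { return MockStatus::kOk; }

  // Stands in for any namenode RPC. Blocks until the active NN is reached,
  // the pending connect is canceled, or the retry budget runs out.
  MockStatus GetFileInfo(std::chrono::milliseconds retry_budget) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait_for(lock, retry_budget,
                 [this] { return canceled_ || active_reached_; });
    if (canceled_) return MockStatus::kCanceled;
    if (active_reached_) return MockStatus::kOk;
    return MockStatus::kRetriesExhausted;
  }

  // Allowed after Connect has returned: wakes any blocked RPC so that it
  // reports a canceled status.
  void CancelPendingConnect() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      canceled_ = true;
    }
    cv_.notify_all();
  }

  // Test hook for this sketch only: pretend failover found the active NN.
  void SimulateActiveReached() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      active_reached_ = true;
    }
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool canceled_ = false;
  bool active_reached_ = false;
};

// Drives the relaxed sequence: Connect, a dummy RPC on a worker thread, then
// a cancel from the calling thread while that RPC is still blocked.
MockStatus RunCancelDemo() {
  MockFileSystem fs;
  const MockStatus connect_status = fs.Connect();
  assert(connect_status == MockStatus::kOk);
  (void)connect_status;  // Silences the unused warning when NDEBUG is set.

  // The dummy RPC result tells the caller whether the FS is really usable.
  std::future<MockStatus> rpc = std::async(std::launch::async, [&fs] {
    return fs.GetFileInfo(std::chrono::seconds(30));
  });

  // Application-level deadline: give up on a slow connection after 50 ms.
  if (rpc.wait_for(std::chrono::milliseconds(50)) ==
      std::future_status::timeout) {
    fs.CancelPendingConnect();
  }
  return rpc.get();
}
```

In this mock the active namenode is never reached, so RunCancelDemo ends with the canceled status instead of waiting out the 30 second retry budget. A caller following the workaround would treat that status from the dummy RPC as "the connect was canceled" and discard the file system object.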