[jira] [Updated] (HDFS-12103) libhdfs++: Provide workaround to support cancel on filesystem connect until HDFS-11437 is resolved

2017-07-10 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-12103:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for reviewing, [~xiaowei.zhu]!  Committed to HDFS-8707.

Manual testing consisted of verifying that the steps in the workaround 
procedure can cancel a slow connection.  The fix is also exercised on a regular 
basis, with and without valgrind, as part of another project; that test is too 
closely coupled to the project to isolate.  My hope is to fix the root issue 
and revert this in the next 3-4 weeks, once I finish up HDFS-11807 and 
HDFS-12111.

> libhdfs++: Provide workaround to support cancel on filesystem connect until 
> HDFS-11437 is resolved
> --
>
> Key: HDFS-12103
> URL: https://issues.apache.org/jira/browse/HDFS-12103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: HDFS-12103.HDFS-8707.000.patch
>
>
> HDFS-11437 is going to take a non-trivial amount of work to do right.  In the 
> meantime it'd be nice to have a way to cancel pending connections (even when 
> the FS claims they are finished).
>
> Proposed workaround is to relax the rules about when 
> FileSystem::CancelPendingConnect can be called, since it isn't able to 
> properly determine when it's connected anyway.  To determine when the FS has 
> connected you can do some simple RPC call, since that will wait on failover.  
> If CancelPendingConnect can be called during that first RPC call then it 
> will effectively be canceling FileSystem::Connect.
>
> Current cancel rules - asterisk on steps where CancelPendingConnect is allowed
> 1. FileSystem::Connect called
> 2. FileSystem communicates with first NN *
> 3. FileSystem::Connect returns - even if it hasn't communicated with the 
>    active NN
>
> Proposed relaxation
> 1. FileSystem::Connect called
> 2. FileSystem communicates with first NN *
> 3. FileSystem::Connect returns *
> 4. FileSystem::GetFileInfo called * - any namenode RPC call will do, ignore 
>    perm errors
> 5. RPC engine blocks until it hits the active or runs out of retries *
> 6. FileSystem::GetFileInfo returns
>
> It'd be up to the user to add in the dummy NN RPC call.  Once HDFS-11437 is 
> fixed this workaround can be removed.
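
The relaxed cancel rules quoted above can be illustrated with a small, self-contained sketch.  It does not use libhdfs++ itself: MockFileSystem, Status, ConnectWithProbe and DemoCancelDuringProbe are hypothetical names made up for this illustration, and the mock only imitates the behaviour described in the issue (Connect returns early, the first RPC blocks until the active NN answers or the call is canceled).  The real signatures in libhdfs++ may well differ.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical status type; not the libhdfs++ Status class.
enum class Status { kOk, kCanceled };

// Hypothetical stand-in for a FileSystem, used only to show the call pattern.
class MockFileSystem {
 public:
  // Returns once the first NN has been contacted, even though the active NN
  // may not have been reached yet.
  Status Connect() {
    std::lock_guard<std::mutex> lock(mu_);
    connected_ = true;
    return Status::kOk;
  }

  // Any NN RPC would do: it blocks until the active NN responds or the
  // pending work is canceled.  In this mock the active NN never responds.
  Status GetFileInfo(const std::string& /*path*/) {
    std::unique_lock<std::mutex> lock(mu_);
    assert(connected_ && "Connect must be called before any RPC");
    cv_.wait(lock, [this] { return active_reached_ || canceled_; });
    return canceled_ ? Status::kCanceled : Status::kOk;
  }

  // Under the relaxed rules this may be called after Connect has returned.
  void CancelPendingConnect() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      canceled_ = true;
    }
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool connected_ = false;
  bool active_reached_ = false;  // never set: simulates a slow failover
  bool canceled_ = false;
};

// The user-side workaround: Connect, then a dummy RPC that only returns once
// the FS is really connected (or has been canceled).
Status ConnectWithProbe(MockFileSystem& fs) {
  Status s = fs.Connect();
  if (s != Status::kOk) return s;
  return fs.GetFileInfo("/");
}

// Cancels from a second thread while the dummy RPC is still blocked.
Status DemoCancelDuringProbe() {
  MockFileSystem fs;
  std::thread canceller([&fs] {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    fs.CancelPendingConnect();
  });
  Status s = ConnectWithProbe(fs);
  canceller.join();
  return s;
}
```

In this mock, DemoCancelDuringProbe() returns Status::kCanceled because the simulated active NN never answers, so the only way out of the dummy RPC is the cancel.  With a real client the dummy call could also return success or a permission error, either of which would indicate that the active NN was reached.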



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12103) libhdfs++: Provide workaround to support cancel on filesystem connect until HDFS-11437 is resolved

2017-07-07 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-12103:
---
Status: Patch Available  (was: Open)







[jira] [Updated] (HDFS-12103) libhdfs++: Provide workaround to support cancel on filesystem connect until HDFS-11437 is resolved

2017-07-07 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-12103:
---
Attachment: HDFS-12103.HDFS-8707.000.patch

Attached patch to allow NamenodeOperations::CancelPendingConnect to be called 
after FileSystem::Connect returns.  This will make any blocked RPC calls return 
an operation canceled failure status.



