[jira] [Updated] (HDFS-9103) Retry reads on DN failure

2015-11-17 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-9103:
--
Attachment: HDFS-9103.HDFS-8707.010.patch

Thanks for the clarification [~wheat9]

New patch posted:
-got bad_datanode_tracker.h out of the public headers and moved it into lib/fs 
because that's where it's most tightly coupled
-declared NodeExclusionRule in hdfs.h
-got rid of comment about static cast
-renamed 'optional_exclude_rule' param to 'excluded_nodes'
-inlined SelectBlockAndNode
-changed dn selection to use a find_if rather than an explicit loop
-kept existing bad_datanode_test tests
-put the unit tests for BadDataNodeTracker and ExclusionSet that don't use mock 
objects/methods into a separate test and cmake target
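
For readers following along, the find_if-based datanode selection could look 
roughly like the sketch below.  This is not the code from the patch: 
DataNodeInfo, SelectNode, and the IsBadNode method name on NodeExclusionRule 
are assumptions made for illustration, as is treating a null rule as 
"exclude nothing".

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real libhdfs++ types.
struct NodeExclusionRule {
  virtual ~NodeExclusionRule() = default;
  virtual bool IsBadNode(const std::string &uuid) const = 0;
};

struct DataNodeInfo {
  std::string uuid;
};

// Returns a pointer to the first datanode the rule does not exclude, or
// nullptr if every candidate is excluded.  A null rule excludes nothing.
const DataNodeInfo *SelectNode(const std::vector<DataNodeInfo> &candidates,
                               const NodeExclusionRule *excluded_nodes) {
  auto it = std::find_if(candidates.begin(), candidates.end(),
                         [excluded_nodes](const DataNodeInfo &dn) {
                           return !excluded_nodes ||
                                  !excluded_nodes->IsBadNode(dn.uuid);
                         });
  return it == candidates.end() ? nullptr : &*it;
}
```

Compared with a hand-written loop, the predicate keeps the "is this node 
usable" decision in one place and the "nothing found" case is the end iterator.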

Other things:
-NodeExclusionRule and classes that derive from it got virtual destructors to 
avoid leaks
-Added tests for the ExclusionSet object.  It's incredibly simple but more tests 
won't hurt.
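
As background on the virtual destructor point: deleting a derived object 
through a base-class pointer is undefined behavior unless the base destructor 
is virtual, and in practice the derived members may never be destroyed.  The 
sketch below illustrates this with a hypothetical CountingRule class; the 
interface shape is assumed and is not copied from the patch.

```cpp
#include <memory>
#include <set>
#include <string>

// Assumed shape of the interface; only the virtual destructor matters here.
class NodeExclusionRule {
 public:
  virtual ~NodeExclusionRule() = default;
  virtual bool IsBadNode(const std::string &uuid) const = 0;
};

// Hypothetical derived rule that records when its destructor runs.
class CountingRule : public NodeExclusionRule {
 public:
  explicit CountingRule(int *destroyed) : destroyed_(destroyed) {}
  ~CountingRule() override { ++*destroyed_; }
  bool IsBadNode(const std::string &uuid) const override {
    return excluded_.count(uuid) > 0;
  }

 private:
  std::set<std::string> excluded_;  // released only if ~CountingRule runs
  int *destroyed_;
};

// Destroys a rule through a base-class pointer and reports whether the
// derived destructor ran exactly once.
bool DerivedDestructorRuns() {
  int destroyed = 0;
  {
    std::unique_ptr<NodeExclusionRule> rule(new CountingRule(&destroyed));
  }  // unique_ptr deletes via NodeExclusionRule*
  return destroyed == 1;
}
```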

> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: James Clampffer
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch, 
> HDFS-9103.HDFS-8707.006.patch, HDFS-9103.HDFS-8707.007.patch, 
> HDFS-9103.HDFS-8707.008.patch, HDFS-9103.HDFS-8707.009.patch, 
> HDFS-9103.HDFS-8707.010.patch, HDFS-9103.HDFS-8707.3.patch, 
> HDFS-9103.HDFS-8707.4.patch, HDFS-9103.HDFS-8707.5.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9103) Retry reads on DN failure

2015-11-17 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9103:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

I've committed the patch to trunk and branch-2. Thanks [~James Clampffer] for 
the contribution.

> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: James Clampffer
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch, 
> HDFS-9103.HDFS-8707.006.patch, HDFS-9103.HDFS-8707.007.patch, 
> HDFS-9103.HDFS-8707.008.patch, HDFS-9103.HDFS-8707.009.patch, 
> HDFS-9103.HDFS-8707.010.patch, HDFS-9103.HDFS-8707.3.patch, 
> HDFS-9103.HDFS-8707.4.patch, HDFS-9103.HDFS-8707.5.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.





[jira] [Updated] (HDFS-9103) Retry reads on DN failure

2015-11-13 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-9103:
--
Attachment: HDFS-9103.HDFS-8707.009.patch

New patch:
-Addressed [~bobthansen]'s concerns about test coverage and API
-Got rid of the little GC pass for expired nodes; we can wait and see if that 
ever becomes a real problem
-Got rid of the optional_node_rule default parameter in AsyncPreadSome; just 
pass in nullptr if you don't want to use it.

> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: James Clampffer
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch, 
> HDFS-9103.HDFS-8707.006.patch, HDFS-9103.HDFS-8707.007.patch, 
> HDFS-9103.HDFS-8707.008.patch, HDFS-9103.HDFS-8707.009.patch, 
> HDFS-9103.HDFS-8707.3.patch, HDFS-9103.HDFS-8707.4.patch, 
> HDFS-9103.HDFS-8707.5.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.





[jira] [Updated] (HDFS-9103) Retry reads on DN failure

2015-11-09 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-9103:
--
Attachment: HDFS-9103.HDFS-8707.008.patch

I still need to get rid of some test duplication and write a couple good tests 
for AsyncPreadSome with an override but wanted to post this in case anyone was 
curious.

-Got rid of explicitly passing around the BadDataNodeTracker.  FileSystem and 
InputStream now keep shared_ptrs to the BadDataNodeTracker.  The tracker is 
used by default for methods like PositionRead.  

-I've added an abstraction, NodeExclusionRule, with a uuid->bool virtual method 
for testing bad nodes, so that the user can override the tracker in 
AsyncPreadSome if they want to.  Added a wrapper for std::set that inherits 
from it to provide an easy way to pass in a set of nodes to exclude.
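
A minimal sketch of what such an abstraction and its std::set wrapper might 
look like.  The method name IsBadNode and the constructor signature are 
assumptions; only the uuid->bool shape and the std::set wrapper come from the 
description above.

```cpp
#include <set>
#include <string>
#include <utility>

// uuid -> bool interface used to decide whether a datanode should be skipped.
class NodeExclusionRule {
 public:
  virtual ~NodeExclusionRule() = default;
  virtual bool IsBadNode(const std::string &node_uuid) const = 0;
};

// Thin wrapper that turns a fixed set of UUIDs into an exclusion rule.
class ExclusionSet : public NodeExclusionRule {
 public:
  explicit ExclusionSet(std::set<std::string> excluded)
      : excluded_(std::move(excluded)) {}

  bool IsBadNode(const std::string &node_uuid) const override {
    return excluded_.find(node_uuid) != excluded_.end();
  }

 private:
  std::set<std::string> excluded_;
};
```

A caller that wants to skip particular nodes for one read would build an 
ExclusionSet and pass its address wherever a NodeExclusionRule is expected.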

-Added unit tests for BadDataNodeTracker.  Added a method that can be used in 
tests to move time forward to make sure that nodes get kicked out after enough 
time has elapsed.
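
One way to make expiry testable without sleeping is sketched below.  It is an 
illustration only: the TEST_AdvanceClock name, the member layout, and the 
absence of locking are all assumptions, not the patch's implementation.

```cpp
#include <chrono>
#include <map>
#include <string>

class BadDataNodeTracker {
 public:
  using Clock = std::chrono::steady_clock;

  explicit BadDataNodeTracker(std::chrono::milliseconds exclusion_duration)
      : exclusion_duration_(exclusion_duration) {}

  // Records the (possibly shifted) current time for this node.
  void AddBadNode(const std::string &uuid) { bad_nodes_[uuid] = Now(); }

  // A node is bad only while its exclusion period has not yet elapsed.
  bool IsBadNode(const std::string &uuid) const {
    auto it = bad_nodes_.find(uuid);
    if (it == bad_nodes_.end()) return false;
    return Now() - it->second < exclusion_duration_;
  }

  // Test-only hook: pretend that `delta` more time has passed.
  void TEST_AdvanceClock(std::chrono::milliseconds delta) { shift_ += delta; }

 private:
  Clock::time_point Now() const { return Clock::now() + shift_; }

  std::chrono::milliseconds exclusion_duration_;
  Clock::duration shift_{0};  // artificial offset applied to every reading
  std::map<std::string, Clock::time_point> bad_nodes_;
};
```

A test can then add a node, check that it is excluded, advance the clock past 
the exclusion duration, and check that the node is usable again.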

> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: James Clampffer
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch, 
> HDFS-9103.HDFS-8707.006.patch, HDFS-9103.HDFS-8707.007.patch, 
> HDFS-9103.HDFS-8707.008.patch, HDFS-9103.HDFS-8707.3.patch, 
> HDFS-9103.HDFS-8707.4.patch, HDFS-9103.HDFS-8707.5.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.





[jira] [Updated] (HDFS-9103) Retry reads on DN failure

2015-11-04 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-9103:
--
Attachment: HDFS-9103.HDFS-8707.007.patch

New patch, there's a bit of extra noise due to clang-format hitting a few files 
that hadn't had it before.

Addressing Haohui's batch of concerns in order:
-That name_match function isn't needed after switching bad_datanodes_ to a map
-Got rid of BadDataNodeTracker::GetNodesToExclude and added an IsBadNode method 
instead.  The InputStream takes a shared_ptr to the BadDataNodeTracker and 
calls IsBadNode directly; this should remove any need for caching, since it 
avoids a lot of copies and the other work of building sets of strings.
-Got rid of BadDataNodeTracker::Clear entirely and changed the tests so that 
BadDataNodeTracker is scoped by test function.  This avoids issues with 
possibly carrying state between tests.
-Added a datanode exclusion duration to the Option class with a default of 10 
minutes.  Switched time units to milliseconds to be consistent.  Is there a 
standard name for this?  I didn't see anything in the options used for 
hdfs-site.xml.
-Switched from system_clock to steady_clock to make sure time is always 
monotonically increasing.
-I think rearranging the code that this comment referred to has simplified 
it.  If it hasn't, please let me know exactly what needs to be simplified.
-Made ShouldExclude a static method of InputStream, got rid of the duplicate 
used by the gmock test.
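
To make the timing changes concrete, here is a small sketch of a 10 minute 
default held in milliseconds and an expiry check based on steady_clock.  The 
field name datanode_exclusion_duration and the helper ExclusionExpired are 
hypothetical; the patch may name and structure these differently.

```cpp
#include <chrono>

using SteadyClock = std::chrono::steady_clock;

// steady_clock is guaranteed monotonic, unlike system_clock, which can jump
// when the wall-clock time is adjusted.
static_assert(SteadyClock::is_steady, "steady_clock must be monotonic");

// Hypothetical options struct; the field name is an assumption.
struct Options {
  // Default exclusion period of 10 minutes, held in milliseconds.
  std::chrono::milliseconds datanode_exclusion_duration{
      std::chrono::minutes(10)};
};

// True once at least the configured duration has passed since exclusion.
bool ExclusionExpired(SteadyClock::time_point excluded_at,
                      SteadyClock::time_point now, const Options &options) {
  return now - excluded_at >= options.datanode_exclusion_duration;
}
```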

> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: James Clampffer
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch, 
> HDFS-9103.HDFS-8707.006.patch, HDFS-9103.HDFS-8707.007.patch, 
> HDFS-9103.HDFS-8707.3.patch, HDFS-9103.HDFS-8707.4.patch, 
> HDFS-9103.HDFS-8707.5.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.





[jira] [Updated] (HDFS-9103) Retry reads on DN failure

2015-10-30 Thread James Clampffer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Clampffer updated HDFS-9103:
--
Attachment: HDFS-9103.HDFS-8707.006.patch

Adding a patch:

This is effectively the logic I had in one of the earlier revisions of 
HDFS-8766; however, it now keeps time on a per-node basis.  I added a class, 
BadDataNodeTracker, that encapsulates all locking.  The HadoopFileSystem 
creates it with make_shared and keeps a shared_ptr; each FileHandle object is 
then given a shared_ptr to the tracker as well.
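
The ownership and locking arrangement described above can be sketched as 
follows.  The class names FileSystem and FileHandle and the methods Open, 
ReportFailure, AddBadNode, and IsBadNode are simplified stand-ins chosen for 
illustration, and per-node timing is left out to keep the example short.

```cpp
#include <memory>
#include <mutex>
#include <set>
#include <string>
#include <utility>

// All locking lives inside the tracker, so callers never touch the mutex.
class BadDataNodeTracker {
 public:
  void AddBadNode(const std::string &uuid) {
    std::lock_guard<std::mutex> lock(mutex_);
    bad_nodes_.insert(uuid);
  }
  bool IsBadNode(const std::string &uuid) const {
    std::lock_guard<std::mutex> lock(mutex_);
    return bad_nodes_.count(uuid) > 0;
  }

 private:
  mutable std::mutex mutex_;
  std::set<std::string> bad_nodes_;
};

// Each handle shares ownership of the tracker created by the file system.
class FileHandle {
 public:
  explicit FileHandle(std::shared_ptr<BadDataNodeTracker> tracker)
      : tracker_(std::move(tracker)) {}
  void ReportFailure(const std::string &uuid) { tracker_->AddBadNode(uuid); }

 private:
  std::shared_ptr<BadDataNodeTracker> tracker_;
};

class FileSystem {
 public:
  FileSystem() : tracker_(std::make_shared<BadDataNodeTracker>()) {}
  FileHandle Open() const { return FileHandle(tracker_); }
  bool IsBadNode(const std::string &uuid) const {
    return tracker_->IsBadNode(uuid);
  }

 private:
  std::shared_ptr<BadDataNodeTracker> tracker_;
};
```

Because every handle holds a shared_ptr, a failure reported through one handle 
is visible to all other handles, and the tracker stays alive as long as any 
handle still refers to it.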

> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: James Clampffer
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch, 
> HDFS-9103.HDFS-8707.006.patch, HDFS-9103.HDFS-8707.3.patch, 
> HDFS-9103.HDFS-8707.4.patch, HDFS-9103.HDFS-8707.5.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.





[jira] [Updated] (HDFS-9103) Retry reads on DN failure

2015-09-22 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9103:
-
Attachment: HDFS-9103.HDFS-8707.5.patch

> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch, 
> HDFS-9103.HDFS-8707.3.patch, HDFS-9103.HDFS-8707.4.patch, 
> HDFS-9103.HDFS-8707.5.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.





[jira] [Updated] (HDFS-9103) Retry reads on DN failure

2015-09-22 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9103:
-
Attachment: HDFS-9103.HDFS-8707.4.patch

> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch, 
> HDFS-9103.HDFS-8707.3.patch, HDFS-9103.HDFS-8707.4.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.





[jira] [Updated] (HDFS-9103) Retry reads on DN failure

2015-09-22 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9103:
-
Attachment: HDFS-9103.HDFS-8707.3.patch

Removed the public state mutations that are no longer necessary for testing.

> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch, 
> HDFS-9103.HDFS-8707.3.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.





[jira] [Updated] (HDFS-9103) Retry reads on DN failure

2015-09-21 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9103:
-
Attachment: HDFS-9103.2.patch

> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch, HDFS-9103.2.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.





[jira] [Updated] (HDFS-9103) Retry reads on DN failure

2015-09-21 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9103:
-
Fix Version/s: HDFS-8707

> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Fix For: HDFS-8707
>
> Attachments: HDFS-9103.1.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.





[jira] [Updated] (HDFS-9103) Retry reads on DN failure

2015-09-18 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9103:
-
Status: Patch Available  (was: Open)

Changes of note:
I changed the semantics of InputStream::PositionRead to be success-or-failure.  
It will now retry if there is a Status::exception in the pipeline [Q: will all 
I/O errors be reflected as Status::exceptions?]

I exposed Status::Code so we can use the right internal semantics.

I moved the excluded_datanodes to be a member of the InputStream.  I think we 
would want failed nodes to be remembered across individual reads and not 
re-tried.  Because it's mutable, I didn't want to be passing copies or mutable 
references on the stack.
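
The retry behavior described above might be sketched like this.  It is a 
simplified, synchronous illustration rather than the patch itself: ReadResult, 
ReadAttempt, and the constructor are hypothetical, and a plain callback stands 
in for the asynchronous AsyncPreadSome pipeline.

```cpp
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class ReadResult { kOk, kException };

// Hypothetical callback standing in for a single-datanode read attempt.
using ReadAttempt = std::function<ReadResult(const std::string &datanode)>;

class InputStream {
 public:
  InputStream(std::vector<std::string> datanodes, ReadAttempt attempt)
      : datanodes_(std::move(datanodes)), attempt_(std::move(attempt)) {}

  // Success-or-failure: returns true once any datanode serves the read,
  // false when every datanode has failed or was already excluded.
  bool PositionRead() {
    for (const std::string &dn : datanodes_) {
      if (excluded_datanodes_.count(dn)) continue;  // failed on an earlier read
      if (attempt_(dn) == ReadResult::kOk) return true;
      excluded_datanodes_.insert(dn);  // remember the failure across reads
    }
    return false;
  }

  const std::set<std::string> &excluded_datanodes() const {
    return excluded_datanodes_;
  }

 private:
  std::vector<std::string> datanodes_;
  ReadAttempt attempt_;
  std::set<std::string> excluded_datanodes_;  // member, so it outlives a read
};
```

Keeping the excluded set as a member means a node that failed once is skipped 
on later reads from the same stream instead of being tried again.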


> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9103.1.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.





[jira] [Updated] (HDFS-9103) Retry reads on DN failure

2015-09-18 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9103:
-
Target Version/s: HDFS-8707

> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9103.1.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.





[jira] [Updated] (HDFS-9103) Retry reads on DN failure

2015-09-18 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9103:
-
Attachment: HDFS-9103.1.patch

> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9103.1.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.





[jira] [Updated] (HDFS-9103) Retry reads on DN failure

2015-09-18 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9103:
-
Attachment: HDFS-9103.1.patch

> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9103.1.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.





[jira] [Updated] (HDFS-9103) Retry reads on DN failure

2015-09-18 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9103:
-
Attachment: (was: HDFS-9103.1.patch)

> Retry reads on DN failure
> -
>
> Key: HDFS-9103
> URL: https://issues.apache.org/jira/browse/HDFS-9103
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Bob Hansen
> Attachments: HDFS-9103.1.patch
>
>
> When AsyncPreadSome fails, add the failed DataNode to the excluded list and 
> try again.


