[jira] [Comment Edited] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-30 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597855#comment-16597855
 ] 

Konstantin Shvachko edited comment on HDFS-13779 at 8/30/18 7:53 PM:
-

Looks good, Erik. Few comments from me:
# In ORPP you use raw types: "References to generic type 
AbstractNNFailoverProxyProvider should be parameterized"
# Unused imports (2) in {{TestObserverReadProxyProvider}}.
# For {{CachingHAProxyFactory}} I don't think we achieve much here. It still 
stores the the Proxy twice - one in the {{backingFactory}} and another in the 
cache. If want to cache proxies we should do it in the RPC engine globally for 
all proxies. I suggest we skip this optimization for now.
# Also {{TestObserverNode}} fails for me with your patch. I can debug if it is 
only on my box.


was (Author: shv):
Looks good, Erik. Two comments from me:
# In ORPP you use raw types: "References to generic type 
AbstractNNFailoverProxyProvider should be parameterized"
# Unused imports (2) in {{TestObserverReadProxyProvider}}.
# For {{CachingHAProxyFactory}} I don't think we achieve much here. It still 
stores the the Proxy twice - one in the {{backingFactory}} and another in the 
cache. If want to cache proxies we should do it in the RPC engine globally for 
all proxies. I suggest we skip this optimization for now.
# Also {{TestObserverNode}} fails for me with your patch. I can debug if it is 
only on my box.

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13779-HDFS-12943.000.patch, 
> HDFS-13779-HDFS-12943.001.patch, HDFS-13779-HDFS-12943.WIP00.patch
>
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-30 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597827#comment-16597827
 ] 

Erik Krogen edited comment on HDFS-13779 at 8/30/18 7:34 PM:
-

Just attached v001 patch addressing the review comments as above, filling out 
the remaining Javadoc, fixing TestObserverNode, and a few other minor fixes. 
Should be ready to go now pending any other review comments.


was (Author: xkrogen):
Just attached v001 patch addressing the review comments as above, filling out 
the remaining Javadoc, fixing TestObserverNode, and a few other minor fixes.

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13779-HDFS-12943.000.patch, 
> HDFS-13779-HDFS-12943.001.patch, HDFS-13779-HDFS-12943.WIP00.patch
>
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-27 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594309#comment-16594309
 ] 

Erik Krogen edited comment on HDFS-13779 at 8/27/18 10:50 PM:
--

Attached v000 patch based on recent changes to 
{{ConfiguredFailoverProxyProvider}} and {{ObserverReadProxyProvider}}.

This takes the approach of scanning through each proxy and checking the state; 
if it is not correct, try another one. Once a correct (observer) proxy is 
found, stay on that one until it fails. This is essentially a copy of how 
{{ConfiguredFailoverProxyProvider}} is implemented.

One issue the above approach opens us up to is if we are currently using an 
observer proxy, but it switches to be an active, we may not notice. I think we 
can add a period (say, every 1 or 5 minutes) refresh of the state of the proxy 
we are currently using to ensure it is still an observer. I would prefer to do 
this in a follow-on patch.

I have a few outstanding TODOs about comments in the patch, but otherwise it 
should be ready for review.


was (Author: xkrogen):
Attached v000 patch based on recent changes to 
{{ConfiguredFailoverProxyProvider}} and {{ObserverReadProxyProvider}}.

This takes the approach of scanning through each proxy and checking the state; 
if it is not correct, try another one. Once a correct (observer) proxy is 
found, stay on that one until it fails. This is essentially a copy of how 
{{ConfiguredFailoverProxyProvider}} is implemented.

I have a few outstanding TODOs about comments in the patch, but otherwise it 
should be ready for review.

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13779-HDFS-12943.000.patch, 
> HDFS-13779-HDFS-12943.WIP00.patch
>
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-15 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581593#comment-16581593
 ] 

Erik Krogen edited comment on HDFS-13779 at 8/15/18 8:53 PM:
-

Thanks for looking [~vagarychen]!
 For #1, the {{currentIndex}} only increases, then you take the modulo vs. the 
current list size to get the actual index. So in the case you've discussed with 
{{currentIndex = 2}} (not 3, since it is 0-indexed), before we have (caret 
indicating current index):
{code:java}
1 2 3 4 5
^
{code}
Then we try 3 and 4, they both throw standby so we remove them, so the 
remaining list is ({{currentIndex = 2}} still):
{code:java}
1 2 5
^
{code}
Next we try 5, is says RETRY, so we increment to get {{currentIndex = 3}}, then 
we try 1 ({{3 % 3 == 0}}) as expected.

I went with {{currentIndex}} increasing and taking modulo, rather than always 
maintaining an actual index, since the size of the list may change and 
otherwise we would have to potentially update {{currentIndex}} every time the 
list size changes (to avoid AIOOBE). Modulus gives us AIOOBE avoidance for free.

For #2, good point. For #3 / #4, yes, it needs cleanup, just a WIP for now.


was (Author: xkrogen):
Thanks for looking [~vagarychen]!
For #1, the {{currentIndex}} only increases, then you take the modulo vs. the 
current list size to get the actual index. So in the case you've discussed with 
{{currentIndex = 2}} (not 3, since it is 0-indexed), before we have (caret 
indicating current index):
{code}
1 2 3 4 5
  ^
{code}
Then we try 3 and 4, they both throw standby so we remove them, so the 
remaining list is ({{currentIndex = 2}} still):
{code}
1 2 5
  ^
{code}
Next we try 5, is says RETRY, so we increment to get {{currentIndex = 3}}, then 
we try 1 ({{3 % 3 == 0}}) as expected.

I went with {{currentIndex}} increasing and taking modulo, rather than always 
maintaining an actual index, since the size of the list may change and 
otherwise we would have to potentially update {{currentIndex}} every time the 
list size changes (to avoid AIOOBE). Modulus gives us AIOOBE avoidance for free.

For #2, good point. For #3 / #4, yes, it needs cleanup, just a WIP for now.

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13779-HDFS-12943.WIP00.patch
>
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org