[jira] [Commented] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-03 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568558#comment-16568558
 ] 

Konstantin Shvachko commented on HDFS-13779:


Also reuse one of the existing retry policies for ORPP.

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Priority: Major
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-08 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16574081#comment-16574081
 ] 

Erik Krogen commented on HDFS-13779:


Attaching a WIP patch which covers this and HDFS-13780, and relies on some more 
refactoring in ConfiguredFailoverProxyProvider.

The idea is basically to continue allowing the CFPP layer to manage only the 
Active/Standby NameNodes, and ORPP to manage the observers (then fall back to 
CFPP's non-observer proxies). _Failover_ refers only to switching Active 
NameNodes, _not_ switching between observers.
 * Refactor CFPP a bit to create a separate {{getProxies()}} method where the 
initialization can be done (solving HDFS-13780), and also where it can be 
overridden by ORPP.
 * On proxy initialization, also fetch all of the NameNode states. When CFPP 
requests proxies, filter to only non-observers.
 * Method invocation, on read methods, tries all of the NNs thought to 
currently be in observer state. If any throws a StandbyException, mark it as 
non-observer. Unfortunately we have no way to tell here if one of the 
thought-to-be-observers has actually become Active, but in this case failover 
will happen soon (at the next write request) and the situation will be fixed 
(see below).
 * If all observer NNs fail, or it is a write method, pass the request up to 
CFPP, which will try the current Active. This may trigger failover. If so, 
before picking a new node from the list of non-observers, refresh the states of 
all of the NameNodes. This handles the case where one of the previous observers 
is now active.

[~shv], [~chliang], let me know your thoughts on the above.

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13779-HDFS-12943.WIP00.patch
>
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-15 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581498#comment-16581498
 ] 

Chen Liang commented on HDFS-13779:
---

Thanks for the patch [~xkrogen], looking good to me overall, some comments:
1. I was too sure about the comment
{code}
// Only move the index if a new node should be tried.
// Don't move the index if the node will be removed from the list
// of observers, else one node will be skipped.
{code}
For example, it seems to me that, say there are 5 proxies and {{currentIndex}} 
is currently 3. If no succeed, the for loop will traverse all 5 in the order of 
3, 4, 5, 1, 2. Say 3, 4 both throw {{isStandbyException}} and removed from 
list, then for 5 the policy returns RETRY. Then {{currentIndex}} gets 
incremented by 1 to 4? Seems setting to 1 in this case makes more sense?
2. {{ObserverReadProxyProvider#getProxy}} the current code inserts a ',' 
between proxy strings, seems this is removed in the patch. Should we keep it?
3. need adding some new lines in class {{NameNodeProxyInfo}} 
4. would be great if we can add a unit test.

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13779-HDFS-12943.WIP00.patch
>
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-15 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581593#comment-16581593
 ] 

Erik Krogen commented on HDFS-13779:


Thanks for looking [~vagarychen]!
For #1, the {{currentIndex}} only increases, then you take the modulo vs. the 
current list size to get the actual index. So in the case you've discussed with 
{{currentIndex = 2}} (not 3, since it is 0-indexed), before we have (caret 
indicating current index):
{code}
1 2 3 4 5
  ^
{code}
Then we try 3 and 4, they both throw standby so we remove them, so the 
remaining list is ({{currentIndex = 2}} still):
{code}
1 2 5
  ^
{code}
Next we try 5, is says RETRY, so we increment to get {{currentIndex = 3}}, then 
we try 1 ({{3 % 3 == 0}}) as expected.

I went with {{currentIndex}} increasing and taking modulo, rather than always 
maintaining an actual index, since the size of the list may change and 
otherwise we would have to potentially update {{currentIndex}} every time the 
list size changes (to avoid AIOOBE). Modulus gives us AIOOBE avoidance for free.

For #2, good point. For #3 / #4, yes, it needs cleanup, just a WIP for now.

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13779-HDFS-12943.WIP00.patch
>
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-15 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16581729#comment-16581729
 ] 

Chen Liang commented on HDFS-13779:
---

I see, so I guess the incremented {{currentIndex}} only takes effect when 
{{invoke}} gets called next time, by which time {{getFilteredProxies}} will 
return a different list with those proxies being removed. This is the part I 
was missing, thanks for the clarification!

But this led me think about another situation: it seems to me that {{getProxy}} 
can be called at any point of time. And {{getProxy}} will trigger 
{{getAllProxies}} which further triggers {{refreshProxyState}}. And 
{{refreshProxyState}} calls {{refreshState}} of each NN, which may change 
{{isObserver}} to either true or false. In short, the value of {{isObserver}} 
could change at any point of time due to {{getProxy}}. So when {{invoke}} gets 
called each time, there seems no guarantee what will (not) be in the list 
returned from {{getFilteredProxies}} with regard to the previous state. Is this 
a valid case? 

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13779-HDFS-12943.WIP00.patch
>
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-16 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583012#comment-16583012
 ] 

Erik Krogen commented on HDFS-13779:


{quote}
And {{getProxy}} will trigger {{getAllProxies}} which further triggers 
{{refreshProxyState}}.
{quote}
Actually this is not quite true; {{getAllProxies}} only calls 
{{refreshProxyState}} when {{proxies}} is empty (i.e., initialization). After 
that it just returns the current value of {{proxies}}. However, 
{{refreshProxyState}} may be called at any time by, for example, 
{{performFailover}}, so your point below about {{isObserver}} is still valid:
{quote}
In short, the value of {{isObserver}} could change at any point of time  So 
when invoke gets called each time, there seems no guarantee what will (not) be 
in the list returned from {{getFilteredProxies}} with regard to the previous 
state.
{quote}
So yes, I agree that there isn't a guarantee that {{getFilteredProxies}} will 
return the same thing when invoked multiple times. That's part of why, at the 
top of {{invoke}}, I call {{getFilteredProxies}} a single time and then use 
that value throughout the function. But the list should only change upon 
failover-style actions (e.g. Observer changes to Standby) and the only thing 
that happens if the list changes is to start talking to a different observer, 
so I don't think it's an issue, but let me know if there's something I'm 
missing.

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13779-HDFS-12943.WIP00.patch
>
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-16 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583130#comment-16583130
 ] 

Chen Liang commented on HDFS-13779:
---

Thanks for the correction [~xkrogen]! Makes sense to me. I recall one example 
case I was thinking of was say there are 0, 1, 2 proxies, and currentIndex = 2. 
If 1 gets removed and the list shrinks to 0, 2. Due to modulo on size, seems 
currentIndex would effectively point to 2%size (2) = 0, but it was meant to 
point to 2. I guess the whole point I was concerning was that since 
{{currentObservers}} is based on modulo of size, but size can be changing at 
any point of time, {{currentIndex}} can become not so current over time. But 
seems even if  {{currentIndex}} is wrong, it should be eventually fixed by the 
for loop, and  thanks for point out that this should only happen in failover 
actions. I agree this should not be an issue.

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13779-HDFS-12943.WIP00.patch
>
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-23 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590979#comment-16590979
 ] 

Erik Krogen commented on HDFS-13779:


I had some offline discussion with [~shv] and we realized that the complicated 
logic I use to try to be as efficient as possible about passing CFPP only 
active/standby NameNodes is not really necessary. Given how infrequently 
failover occurs, the few RPCs saved are not really worth the extra complexity 
of the logic and intermingling between CFPP and ORPP. I will put up a new patch 
with simpler logic based on Konstantin's patch for HDFS-13782.

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13779-HDFS-12943.WIP00.patch
>
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-27 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594397#comment-16594397
 ] 

genericqa commented on HDFS-13779:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-12943 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m  
3s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
27s{color} | {color:green} HDFS-12943 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
43s{color} | {color:green} HDFS-12943 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 8s{color} | {color:green} HDFS-12943 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
55s{color} | {color:green} HDFS-12943 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 39s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
31s{color} | {color:green} HDFS-12943 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} HDFS-12943 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m  
9s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 59s{color} | {color:orange} hadoop-hdfs-project: The patch generated 21 new 
+ 11 unchanged - 7 fixed = 32 total (was 18) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  0s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
10s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
28s{color} | {color:red} hadoop-hdfs-project_hadoop-hdfs-client generated 10 
new + 0 unchanged - 0 fixed = 10 total (was 0) {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
51s{color} | {color:red} hadoop-hdfs-project_hadoop-hdfs generated 2 new + 1 
unchanged - 0 fixed = 3 total (was 1) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
36s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}103m  2s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}180m 38s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
|   | hadoop.hdfs.server.namenode.ha.TestObserverNode |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:9b55946 |
| JIRA Issue | HDFS-13779 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12937344/HDFS-13779-HDFS-12943.000.patch
 |
| Option

[jira] [Commented] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-28 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595716#comment-16595716
 ] 

Chen Liang commented on HDFS-13779:
---

Thanks for the patch [~xkrogen]! LGTM overall, only some minor comments:
 # {{ObserverReadProxyProvider#invoke}}, change {{observerCount}} to some like 
{{failedObserverCount}} ?
 # {{ObserverReadProxyProvider#changeProxy}}, seems we can reduce the scope of 
locking to after the {{if (currentProxy != initial)}} check, so that it does 
not need to grab the lock just for this check and then return. May not be that 
necessary, I'm okay with either way.
 # {{CachingHAProxyFactory}}, move the initialization of {{cache}} into 
constructor?
 # More of a question, do we really need to have {{CachingHAProxyFactory}}? 
because for now it seems to me that even without caching, 
{{createProxyIfNeeded}} is doing the job that only when proxy is null it will 
create proxy. Am I missing something here?

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13779-HDFS-12943.000.patch, 
> HDFS-13779-HDFS-12943.WIP00.patch
>
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-28 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595743#comment-16595743
 ] 

Erik Krogen commented on HDFS-13779:


Thanks for the review [~vagarychen]!
# Sure, it is only used for logging how many failed, so sounds good to me.
# The lock is needed before the if-check, else the {{currentProxy}} could have 
changed after checking but before locking. We could add a double check (once 
before lock, once after lock), but since {{changeProxy}} is only called when an 
observer has a failure I don't think we need to try too hard to optimize its 
synchronization. {{getCurrentProxy}}, on the other hand, is called on every RPC 
which is why I wanted to make sure it wasn't synchronized.
# Sure, seems good
# {{createProxyIfNeeded}} will only create it once, yes, but the 
{{NNProxyInfo}} instances within {{ObserverReadProxyProvider}} are not the same 
as the ones within {{ConfiguredFailoverProxyProvider}} (it is delegation, not 
inheritance), so the proxy will still be created twice if 
{{CachingHAProxyFactory}} is not used. I am open to cleaner solutions to avoid 
the redundant proxy creation, though.

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13779-HDFS-12943.000.patch, 
> HDFS-13779-HDFS-12943.WIP00.patch
>
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-30 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597827#comment-16597827
 ] 

Erik Krogen commented on HDFS-13779:


Just attached v001 patch addressing the review comments as above, filling out 
the remaining Javadoc, fixing TestObserverNode, and a few other minor fixes.

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13779-HDFS-12943.000.patch, 
> HDFS-13779-HDFS-12943.001.patch, HDFS-13779-HDFS-12943.WIP00.patch
>
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-30 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597855#comment-16597855
 ] 

Konstantin Shvachko commented on HDFS-13779:


Looks good, Erik. Two comments from me:
# In ORPP you use raw types: "References to generic type 
AbstractNNFailoverProxyProvider should be parameterized"
# Unused imports (2) in {{TestObserverReadProxyProvider}}.
# For {{CachingHAProxyFactory}} I don't think we achieve much here. It still 
stores the the Proxy twice - one in the {{backingFactory}} and another in the 
cache. If want to cache proxies we should do it in the RPC engine globally for 
all proxies. I suggest we skip this optimization for now.
# Also {{TestObserverNode}} fails for me with your patch. I can debug if it is 
only on my box.

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13779-HDFS-12943.000.patch, 
> HDFS-13779-HDFS-12943.001.patch, HDFS-13779-HDFS-12943.WIP00.patch
>
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-30 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598050#comment-16598050
 ] 

Erik Krogen commented on HDFS-13779:


Looks like me and Konstantin hit a race condition :) Comments 2 and 4 were 
fixed in v001 patch.

I just attached v002 which addresses 1 and 3. Konstantin and I discussed 
offline and realized that the caching done in {{ipc.Client}} will make the two 
ProxyProviders share a single connection per NameNode regardless of the caching 
of the proxies themselves, so I agree that the {{CachingHAProxyFactory}} is not 
necessary. This also makes my changes to the constructor (accepting a class and 
using reflection, instead of directly accepting an instance) unnecessary. 

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13779-HDFS-12943.000.patch, 
> HDFS-13779-HDFS-12943.001.patch, HDFS-13779-HDFS-12943.002.patch, 
> HDFS-13779-HDFS-12943.WIP00.patch
>
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-30 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598123#comment-16598123
 ] 

Konstantin Shvachko commented on HDFS-13779:


Erik, this looks good. Two minor nits from my IDE
# Unused import of Constructor in ORPP
# {{TestObserverReadProxyProvider.proxyFactoryInvocations}} is used for 
incrementing it. You might want to get rid of it, or check its value somewhere.

I recommend committing this without waiting for Jenkins to unblock other jiras. 
We should fix warnings if there will be any later on.

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13779-HDFS-12943.000.patch, 
> HDFS-13779-HDFS-12943.001.patch, HDFS-13779-HDFS-12943.002.patch, 
> HDFS-13779-HDFS-12943.WIP00.patch
>
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13779) Implement performFailover logic for ObserverReadProxyProvider.

2018-08-31 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598955#comment-16598955
 ] 

Erik Krogen commented on HDFS-13779:


Good catches, Konstantin! I should have taken a closer look after my v001 -> 
v002 revisions. I also removed the {{throws IOException}} clause from the 
{{ObserverReadProxyProvider}} constructors, as it is no longer applicable. 
Attaching v003 patch, which I also just committed to HDFS-12943 branch. Thanks 
for the help [~shv] and [~vagarychen]!

> Implement performFailover logic for ObserverReadProxyProvider.
> --
>
> Key: HDFS-13779
> URL: https://issues.apache.org/jira/browse/HDFS-13779
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Konstantin Shvachko
>Assignee: Erik Krogen
>Priority: Major
> Fix For: HDFS-12943
>
> Attachments: HDFS-13779-HDFS-12943.000.patch, 
> HDFS-13779-HDFS-12943.001.patch, HDFS-13779-HDFS-12943.002.patch, 
> HDFS-13779-HDFS-12943.003.patch, HDFS-13779-HDFS-12943.WIP00.patch
>
>
> Currently {{ObserverReadProxyProvider}} inherits {{performFailover()}} method 
> from {{ConfiguredFailoverProxyProvider}}, which simply increments the index 
> and switches over to another NameNode. The logic for ORPP should be smart 
> enough to choose another observer, otherwise it can switch to a SBN, where 
> reads are disallowed, or to an ANN, which defeats the purpose of reads from 
> standby.
> This was discussed in HDFS-12976.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org