[jira] [Updated] (HDFS-14017) ObserverReadProxyProviderWithIPFailover should work with HA configuration

2018-11-14 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14017:
--
Attachment: HDFS-14017-HDFS-12943.012.patch

> ObserverReadProxyProviderWithIPFailover should work with HA configuration
> -
>
> Key: HDFS-14017
> URL: https://issues.apache.org/jira/browse/HDFS-14017
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14017-HDFS-12943.001.patch, 
> HDFS-14017-HDFS-12943.002.patch, HDFS-14017-HDFS-12943.003.patch, 
> HDFS-14017-HDFS-12943.004.patch, HDFS-14017-HDFS-12943.005.patch, 
> HDFS-14017-HDFS-12943.006.patch, HDFS-14017-HDFS-12943.008.patch, 
> HDFS-14017-HDFS-12943.009.patch, HDFS-14017-HDFS-12943.010.patch, 
> HDFS-14017-HDFS-12943.011.patch, HDFS-14017-HDFS-12943.012.patch
>
>
> Currently {{ObserverReadProxyProviderWithIPFailover}} extends 
> {{ObserverReadProxyProvider}}, and the only difference is changing the proxy 
> factory to use {{IPFailoverProxyProvider}}. However, this is not enough, 
> because when the constructor of {{ObserverReadProxyProvider}} is invoked via 
> super(...), the following line:
> {code:java}
> nameNodeProxies = getProxyAddresses(uri,
> HdfsClientConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY);
> {code}
> will try to resolve all of the configured NN addresses to perform configured 
> failover. But in the case of IPFailover, this does not really apply.
>  
> A second, closely related issue concerns delegation tokens. For example, in 
> the current IPFailover setup, say we have a virtual host nn.xyz.com, which 
> points to either of two physical nodes, nn1.xyz.com or nn2.xyz.com. In current 
> HDFS, there is always only one DT being exchanged, and it carries the hostname 
> nn.xyz.com. The server only issues this DT, and the client only knows the host 
> nn.xyz.com, so all is good. But with Observer reads, even with IPFailover, the 
> client no longer contacts nn.xyz.com; it actively reaches out to nn1.xyz.com 
> and nn2.xyz.com. During this process, the current code looks for a DT 
> associated with hostname nn1.xyz.com or nn2.xyz.com, which is different from 
> the DT issued by the NN, causing token authentication to fail. This happens in 
> {{AbstractDelegationTokenSelector#selectToken}}. The new IPFailover proxy 
> provider will need to resolve this as well.
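To illustrate the token-selection mismatch described above, here is a minimal, self-contained sketch. It does not use the real Hadoop {{AbstractDelegationTokenSelector}}; the class, host names, and port are only illustrative:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Simplified illustration of selecting a delegation token by service name.
public class TokenSelectionSketch {
  public static void main(String[] args) {
    // The NN issued a single DT whose service is the virtual host.
    Map<String, String> tokensByService = new HashMap<>();
    tokensByService.put("nn.xyz.com:8020", "DT-issued-by-NN");

    // With observer reads the client now resolves physical hosts and
    // looks up the token by the physical address instead.
    String lookupService = "nn1.xyz.com:8020";
    String token = tokensByService.get(lookupService);

    // Prints "null": no token matches the physical host, so authentication
    // fails, analogous to selectToken() finding no matching token.
    System.out.println(token);
  }
}
{code}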






[jira] [Commented] (HDFS-12943) Consistent Reads from Standby Node

2018-11-12 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684222#comment-16684222
 ] 

Chen Liang commented on HDFS-12943:
---

[~xiangheng] thanks for trying Observer read! What was the full command you 
ran? It should be something like {{hdfs haadmin -transitionToObserver <nnID>}}, 
where nnID is the ID of the name node that you want to transition to Observer. 
You can run {{hdfs haadmin -getAllServiceState}} to list all the valid nnIDs in 
the cluster.

> Consistent Reads from Standby Node
> --
>
> Key: HDFS-12943
> URL: https://issues.apache.org/jira/browse/HDFS-12943
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: ConsistentReadsFromStandbyNode.pdf, 
> ConsistentReadsFromStandbyNode.pdf, 
> TestPlan-ConsistentReadsFromStandbyNode.pdf
>
>
> StandbyNode in HDFS is a replica of the active NameNode. The states of the 
> NameNodes are coordinated via the journal. It is natural to consider 
> StandbyNode as a read-only replica. As with any replicated distributed system, 
> the problem of stale reads should be resolved. Our main goal is to provide 
> reads from standby in a consistent way in order to enable a wide range of 
> existing applications running on top of HDFS.






[jira] [Commented] (HDFS-14059) Test reads from standby on a secure cluster with Configured failover

2018-11-12 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684229#comment-16684229
 ] 

Chen Liang commented on HDFS-14059:
---

Thanks for sharing [~zero45]!

For (1), on a quick glance at the code, it seems that if 
{{dfs.ha.automatic-failover.enabled}} is set to true, a manual transition will 
be rejected with that error. Did you have this configured? We don't seem to 
have it.

For (2), I think what you suspect makes a lot of sense. I was getting the same 
error and ended up adding {{hadoop.security.service.user.name.key}}. 
HDFS-14035 should fix this.

> Test reads from standby on a secure cluster with Configured failover
> 
>
> Key: HDFS-14059
> URL: https://issues.apache.org/jira/browse/HDFS-14059
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Konstantin Shvachko
>Assignee: Plamen Jeliazkov
>Priority: Major
>
> Run standard HDFS tests to verify reading from ObserverNode on a secure HA 
> cluster with {{ConfiguredFailoverProxyProvider}}.






[jira] [Updated] (HDFS-14035) NN status discovery does not leverage delegation token

2018-11-12 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14035:
--
Attachment: HDFS-14035-HDFS-12943.013.patch

> NN status discovery does not leverage delegation token
> --
>
> Key: HDFS-14035
> URL: https://issues.apache.org/jira/browse/HDFS-14035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14035-HDFS-12943.001.patch, 
> HDFS-14035-HDFS-12943.002.patch, HDFS-14035-HDFS-12943.003.patch, 
> HDFS-14035-HDFS-12943.004.patch, HDFS-14035-HDFS-12943.005.patch, 
> HDFS-14035-HDFS-12943.006.patch, HDFS-14035-HDFS-12943.007.patch, 
> HDFS-14035-HDFS-12943.008.patch, HDFS-14035-HDFS-12943.009.patch, 
> HDFS-14035-HDFS-12943.010.patch, HDFS-14035-HDFS-12943.011.patch, 
> HDFS-14035-HDFS-12943.012.patch, HDFS-14035-HDFS-12943.013.patch
>
>
> Currently, ObserverReadProxyProvider uses 
> {{HAServiceProtocol#getServiceStatus}} to get the status of each NN. However, 
> {{HAServiceProtocol}} does not leverage delegation tokens. So when running an 
> application on YARN, when the YARN node manager makes this getServiceStatus 
> call, token authentication will fail, causing the application to fail.






[jira] [Commented] (HDFS-14035) NN status discovery does not leverage delegation token

2018-11-12 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684433#comment-16684433
 ] 

Chen Liang commented on HDFS-14035:
---

Thanks for the review [~shv]! The failed test TestConsistentReadsObserver is 
related. It turns out that a side effect of using the client protocol to 
discover server state is that the call to {{changeProxy}} can update the 
client's alignment context state id to the most recent value if it talks to the 
active, introducing a race condition in {{testMsyncSimple}}. Posted the v013 
patch to resolve this.

> NN status discovery does not leverage delegation token
> --
>
> Key: HDFS-14035
> URL: https://issues.apache.org/jira/browse/HDFS-14035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14035-HDFS-12943.001.patch, 
> HDFS-14035-HDFS-12943.002.patch, HDFS-14035-HDFS-12943.003.patch, 
> HDFS-14035-HDFS-12943.004.patch, HDFS-14035-HDFS-12943.005.patch, 
> HDFS-14035-HDFS-12943.006.patch, HDFS-14035-HDFS-12943.007.patch, 
> HDFS-14035-HDFS-12943.008.patch, HDFS-14035-HDFS-12943.009.patch, 
> HDFS-14035-HDFS-12943.010.patch, HDFS-14035-HDFS-12943.011.patch, 
> HDFS-14035-HDFS-12943.012.patch, HDFS-14035-HDFS-12943.013.patch
>
>
> Currently, ObserverReadProxyProvider uses 
> {{HAServiceProtocol#getServiceStatus}} to get the status of each NN. However, 
> {{HAServiceProtocol}} does not leverage delegation tokens. So when running an 
> application on YARN, when the YARN node manager makes this getServiceStatus 
> call, token authentication will fail, causing the application to fail.






[jira] [Updated] (HDFS-14017) ObserverReadProxyProviderWithIPFailover should work with HA configuration

2018-11-14 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14017:
--
Attachment: HDFS-14017-HDFS-12943.011.patch

> ObserverReadProxyProviderWithIPFailover should work with HA configuration
> -
>
> Key: HDFS-14017
> URL: https://issues.apache.org/jira/browse/HDFS-14017
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14017-HDFS-12943.001.patch, 
> HDFS-14017-HDFS-12943.002.patch, HDFS-14017-HDFS-12943.003.patch, 
> HDFS-14017-HDFS-12943.004.patch, HDFS-14017-HDFS-12943.005.patch, 
> HDFS-14017-HDFS-12943.006.patch, HDFS-14017-HDFS-12943.008.patch, 
> HDFS-14017-HDFS-12943.009.patch, HDFS-14017-HDFS-12943.010.patch, 
> HDFS-14017-HDFS-12943.011.patch
>
>
> Currently {{ObserverReadProxyProviderWithIPFailover}} extends 
> {{ObserverReadProxyProvider}}, and the only difference is changing the proxy 
> factory to use {{IPFailoverProxyProvider}}. However, this is not enough, 
> because when the constructor of {{ObserverReadProxyProvider}} is invoked via 
> super(...), the following line:
> {code:java}
> nameNodeProxies = getProxyAddresses(uri,
> HdfsClientConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY);
> {code}
> will try to resolve all of the configured NN addresses to perform configured 
> failover. But in the case of IPFailover, this does not really apply.
>  
> A second, closely related issue concerns delegation tokens. For example, in 
> the current IPFailover setup, say we have a virtual host nn.xyz.com, which 
> points to either of two physical nodes, nn1.xyz.com or nn2.xyz.com. In current 
> HDFS, there is always only one DT being exchanged, and it carries the hostname 
> nn.xyz.com. The server only issues this DT, and the client only knows the host 
> nn.xyz.com, so all is good. But with Observer reads, even with IPFailover, the 
> client no longer contacts nn.xyz.com; it actively reaches out to nn1.xyz.com 
> and nn2.xyz.com. During this process, the current code looks for a DT 
> associated with hostname nn1.xyz.com or nn2.xyz.com, which is different from 
> the DT issued by the NN, causing token authentication to fail. This happens in 
> {{AbstractDelegationTokenSelector#selectToken}}. The new IPFailover proxy 
> provider will need to resolve this as well.






[jira] [Commented] (HDFS-14017) ObserverReadProxyProviderWithIPFailover should work with HA configuration

2018-11-14 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686936#comment-16686936
 ] 

Chen Liang commented on HDFS-14017:
---

v011 patch to fix checkstyle issues.

> ObserverReadProxyProviderWithIPFailover should work with HA configuration
> -
>
> Key: HDFS-14017
> URL: https://issues.apache.org/jira/browse/HDFS-14017
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14017-HDFS-12943.001.patch, 
> HDFS-14017-HDFS-12943.002.patch, HDFS-14017-HDFS-12943.003.patch, 
> HDFS-14017-HDFS-12943.004.patch, HDFS-14017-HDFS-12943.005.patch, 
> HDFS-14017-HDFS-12943.006.patch, HDFS-14017-HDFS-12943.008.patch, 
> HDFS-14017-HDFS-12943.009.patch, HDFS-14017-HDFS-12943.010.patch, 
> HDFS-14017-HDFS-12943.011.patch
>
>
> Currently {{ObserverReadProxyProviderWithIPFailover}} extends 
> {{ObserverReadProxyProvider}}, and the only difference is changing the proxy 
> factory to use {{IPFailoverProxyProvider}}. However, this is not enough, 
> because when the constructor of {{ObserverReadProxyProvider}} is invoked via 
> super(...), the following line:
> {code:java}
> nameNodeProxies = getProxyAddresses(uri,
> HdfsClientConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY);
> {code}
> will try to resolve all of the configured NN addresses to perform configured 
> failover. But in the case of IPFailover, this does not really apply.
>  
> A second, closely related issue concerns delegation tokens. For example, in 
> the current IPFailover setup, say we have a virtual host nn.xyz.com, which 
> points to either of two physical nodes, nn1.xyz.com or nn2.xyz.com. In current 
> HDFS, there is always only one DT being exchanged, and it carries the hostname 
> nn.xyz.com. The server only issues this DT, and the client only knows the host 
> nn.xyz.com, so all is good. But with Observer reads, even with IPFailover, the 
> client no longer contacts nn.xyz.com; it actively reaches out to nn1.xyz.com 
> and nn2.xyz.com. During this process, the current code looks for a DT 
> associated with hostname nn1.xyz.com or nn2.xyz.com, which is different from 
> the DT issued by the NN, causing token authentication to fail. This happens in 
> {{AbstractDelegationTokenSelector#selectToken}}. The new IPFailover proxy 
> provider will need to resolve this as well.






[jira] [Updated] (HDFS-14035) NN status discovery does not leverage delegation token

2018-11-13 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14035:
--
Attachment: HDFS-14035-HDFS-12943.014.patch

> NN status discovery does not leverage delegation token
> --
>
> Key: HDFS-14035
> URL: https://issues.apache.org/jira/browse/HDFS-14035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14035-HDFS-12943.001.patch, 
> HDFS-14035-HDFS-12943.002.patch, HDFS-14035-HDFS-12943.003.patch, 
> HDFS-14035-HDFS-12943.004.patch, HDFS-14035-HDFS-12943.005.patch, 
> HDFS-14035-HDFS-12943.006.patch, HDFS-14035-HDFS-12943.007.patch, 
> HDFS-14035-HDFS-12943.008.patch, HDFS-14035-HDFS-12943.009.patch, 
> HDFS-14035-HDFS-12943.010.patch, HDFS-14035-HDFS-12943.011.patch, 
> HDFS-14035-HDFS-12943.012.patch, HDFS-14035-HDFS-12943.013.patch, 
> HDFS-14035-HDFS-12943.014.patch
>
>
> Currently, ObserverReadProxyProvider uses 
> {{HAServiceProtocol#getServiceStatus}} to get the status of each NN. However, 
> {{HAServiceProtocol}} does not leverage delegation tokens. So when running an 
> application on YARN, when the YARN node manager makes this getServiceStatus 
> call, token authentication will fail, causing the application to fail.






[jira] [Commented] (HDFS-14035) NN status discovery does not leverage delegation token

2018-11-13 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685620#comment-16685620
 ] 

Chen Liang commented on HDFS-14035:
---

Discussed with [~xkrogen] offline; it seems we can also resolve the race 
condition in the unit test without using sleep, by making an uncoordinated call 
to the server early. This initializes the observer proxy and also sets the 
state id on the client side. Posted the v014 patch, which also adds a couple of 
missing javadoc comments.
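For reference, a rough, hypothetical sketch of the warm-up idea, under the assumption that an early read is enough to initialize the proxy and client state id; the actual patch may use a different call:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper: make one early call before the timed part of the test,
// so the observer proxy is initialized and the client's state id is set,
// instead of relying on a sleep-based workaround.
class WarmUpObserverProxy {
  static void warmUp(Configuration conf) throws Exception {
    try (FileSystem fs = FileSystem.get(conf)) {  // conf points at the HA cluster
      fs.getFileStatus(new Path("/"));            // early call issued up front
    }
  }
}
{code}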

> NN status discovery does not leverage delegation token
> --
>
> Key: HDFS-14035
> URL: https://issues.apache.org/jira/browse/HDFS-14035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14035-HDFS-12943.001.patch, 
> HDFS-14035-HDFS-12943.002.patch, HDFS-14035-HDFS-12943.003.patch, 
> HDFS-14035-HDFS-12943.004.patch, HDFS-14035-HDFS-12943.005.patch, 
> HDFS-14035-HDFS-12943.006.patch, HDFS-14035-HDFS-12943.007.patch, 
> HDFS-14035-HDFS-12943.008.patch, HDFS-14035-HDFS-12943.009.patch, 
> HDFS-14035-HDFS-12943.010.patch, HDFS-14035-HDFS-12943.011.patch, 
> HDFS-14035-HDFS-12943.012.patch, HDFS-14035-HDFS-12943.013.patch, 
> HDFS-14035-HDFS-12943.014.patch
>
>
> Currently, ObserverReadProxyProvider uses 
> {{HAServiceProtocol#getServiceStatus}} to get the status of each NN. However, 
> {{HAServiceProtocol}} does not leverage delegation tokens. So when running an 
> application on YARN, when the YARN node manager makes this getServiceStatus 
> call, token authentication will fail, causing the application to fail.






[jira] [Commented] (HDFS-14017) ObserverReadProxyProviderWithIPFailover should work with HA configuration

2018-11-02 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673712#comment-16673712
 ] 

Chen Liang commented on HDFS-14017:
---

Thanks for the clarification [~xkrogen].

bq. when the active changes, how will it ever start using the new active?
My thought was that, just like how IPFailover works now, it simply assumes the 
VIP points to the ANN, and changing the VIP-to-NN mapping happens outside of 
HDFS. So I was thinking of the same approach: in IPFailover, just assume the 
VIP is the fallback when a failure happens (i.e. let the failover proxy 
variable always point to the VIP), and don't bother figuring out what exactly 
it points to. Although the current patch may not do this correctly, or may not 
have it at all.

bq. we would need to introduce additional VIPs for observers, and it's not 
clear to me if this makes sense
Couldn't agree more, that is exactly the struggle I had! The intuition of 
IPFailover is to rely on VIPs and not bother figuring out the physical 
addresses, but with multiple NNs, some being observers and one being active, I 
was not sure what that would look like. So I decided to go (for now at least) 
with IPFailover sending requests to discover the physical nodes by itself.




> ObserverReadProxyProviderWithIPFailover should work with HA configuration
> -
>
> Key: HDFS-14017
> URL: https://issues.apache.org/jira/browse/HDFS-14017
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14017-HDFS-12943.001.patch, 
> HDFS-14017-HDFS-12943.002.patch
>
>
> Currently {{ObserverReadProxyProviderWithIPFailover}} extends 
> {{ObserverReadProxyProvider}}, and the only difference is changing the proxy 
> factory to use {{IPFailoverProxyProvider}}. However, this is not enough, 
> because when the constructor of {{ObserverReadProxyProvider}} is invoked via 
> super(...), the following line:
> {code:java}
> nameNodeProxies = getProxyAddresses(uri,
> HdfsClientConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY);
> {code}
> will try to resolve all of the configured NN addresses to perform configured 
> failover. But in the case of IPFailover, this does not really apply.
>  
> A second, closely related issue concerns delegation tokens. For example, in 
> the current IPFailover setup, say we have a virtual host nn.xyz.com, which 
> points to either of two physical nodes, nn1.xyz.com or nn2.xyz.com. In current 
> HDFS, there is always only one DT being exchanged, and it carries the hostname 
> nn.xyz.com. The server only issues this DT, and the client only knows the host 
> nn.xyz.com, so all is good. But with Observer reads, even with IPFailover, the 
> client no longer contacts nn.xyz.com; it actively reaches out to nn1.xyz.com 
> and nn2.xyz.com. During this process, the current code looks for a DT 
> associated with hostname nn1.xyz.com or nn2.xyz.com, which is different from 
> the DT issued by the NN, causing token authentication to fail. This happens in 
> {{AbstractDelegationTokenSelector#selectToken}}. The new IPFailover proxy 
> provider will need to resolve this as well.






[jira] [Updated] (HDFS-14035) NN status discovery does not leverage delegation token

2018-11-02 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14035:
--
Attachment: HDFS-14035-HDFS-12943.004.patch

> NN status discovery does not leverage delegation token
> --
>
> Key: HDFS-14035
> URL: https://issues.apache.org/jira/browse/HDFS-14035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14035-HDFS-12943.001.patch, 
> HDFS-14035-HDFS-12943.002.patch, HDFS-14035-HDFS-12943.003.patch, 
> HDFS-14035-HDFS-12943.004.patch
>
>
> Currently, ObserverReadProxyProvider uses 
> {{HAServiceProtocol#getServiceStatus}} to get the status of each NN. However, 
> {{HAServiceProtocol}} does not leverage delegation tokens. So when running an 
> application on YARN, when the YARN node manager makes this getServiceStatus 
> call, token authentication will fail, causing the application to fail.






[jira] [Commented] (HDFS-14035) NN status discovery does not leverage delegation token

2018-11-02 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673535#comment-16673535
 ] 

Chen Liang commented on HDFS-14035:
---

Rebased with v004 patch

> NN status discovery does not leverage delegation token
> --
>
> Key: HDFS-14035
> URL: https://issues.apache.org/jira/browse/HDFS-14035
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14035-HDFS-12943.001.patch, 
> HDFS-14035-HDFS-12943.002.patch, HDFS-14035-HDFS-12943.003.patch, 
> HDFS-14035-HDFS-12943.004.patch
>
>
> Currently, ObserverReadProxyProvider uses 
> {{HAServiceProtocol#getServiceStatus}} to get the status of each NN. However, 
> {{HAServiceProtocol}} does not leverage delegation tokens. So when running an 
> application on YARN, when the YARN node manager makes this getServiceStatus 
> call, token authentication will fail, causing the application to fail.






[jira] [Commented] (HDFS-13880) Add mechanism to allow certain RPC calls to bypass sync

2018-08-31 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599019#comment-16599019
 ] 

Chen Liang commented on HDFS-13880:
---

[~shv] Masync is just the current name I picked for methods that do not need to 
go through the sync process (i.e., msync); I simply replaced "sync" with 
"async". Please feel free to propose a different term :).

[~csun] thanks for the clarification. Will double check if 
{{HAServiceProtocol}} is currently synced by msync.

> Add mechanism to allow certain RPC calls to bypass sync
> ---
>
> Key: HDFS-13880
> URL: https://issues.apache.org/jira/browse/HDFS-13880
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13880-HDFS-12943.001.patch, 
> HDFS-13880-HDFS-12943.002.patch
>
>
> Currently, every single call to the NameNode will be synced, in the sense that 
> the NameNode will not process it until its state id catches up. But in certain 
> cases, we would like to bypass this check and allow the call to return 
> immediately, even when the server state id is not up to date. One case could 
> be the new API to be added in HDFS-13749 that requests the current state id. 
> Others may include calls that do not promise real-time responses, such as 
> {{getContentSummary}}. This Jira adds the mechanism to allow certain calls to 
> bypass sync.






[jira] [Updated] (HDFS-13880) Add mechanism to allow certain RPC calls to bypass sync

2018-08-31 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-13880:
--
Status: Patch Available  (was: Open)

> Add mechanism to allow certain RPC calls to bypass sync
> ---
>
> Key: HDFS-13880
> URL: https://issues.apache.org/jira/browse/HDFS-13880
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13880-HDFS-12943.001.patch, 
> HDFS-13880-HDFS-12943.002.patch
>
>
> Currently, every single call to the NameNode will be synced, in the sense that 
> the NameNode will not process it until its state id catches up. But in certain 
> cases, we would like to bypass this check and allow the call to return 
> immediately, even when the server state id is not up to date. One case could 
> be the new API to be added in HDFS-13749 that requests the current state id. 
> Others may include calls that do not promise real-time responses, such as 
> {{getContentSummary}}. This Jira adds the mechanism to allow certain calls to 
> bypass sync.






[jira] [Commented] (HDFS-13872) Only some protocol methods should perform msync wait

2018-08-31 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599035#comment-16599035
 ] 

Chen Liang commented on HDFS-13872:
---

Somehow I missed this Jira completely...so I filed HDFS-13880 and submitted a 
patch there too...Sorry my bad!

I was taking a very similar approach at the beginning: I added an attribute to 
the ReadOnly annotation to indicate whether a method should go through msync. 
But then I ran into an issue: the ReadOnly annotation is only applied to 
ClientProtocol, and when it comes down to the {{ProtobufRpcEngine}} layer, the 
protocol actually changes from {{ClientProtocol}} to {{ClientNamenodeProtocol}}, 
so the annotation can no longer be found. And the {{ClientNamenodeProtocol}} 
class is a protobuf-generated class, so we cannot annotate it there. Also, 
having chatted with Konstantin, it seems a more desirable approach is to do the 
check on the server side.

So the approach I take in HDFS-13880 is that, on the server side, when 
receiving an RPC call, it looks up the method name from the RPC call in 
ClientProtocol; if a method with the same name exists, then that method's 
annotation in ClientProtocol is used to check whether msync should be bypassed.
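Roughly, the server-side check could look like the following self-contained sketch. The annotation and interface here are local stand-ins, not the actual Hadoop ReadOnly annotation or {{ipc.Server}} code:

{code:java}
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class MsyncBypassSketch {

  // Stand-in for a ReadOnly-style annotation carried by the client protocol.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface BypassSync { }

  // Stand-in for ClientProtocol: only the annotated method bypasses msync.
  interface ClientProtocolStub {
    @BypassSync
    long getCurrentStateId();
    String getContentSummary(String path);
  }

  // Server side: look up the RPC method name on the client protocol
  // interface; if a method with that name is annotated, skip the msync wait.
  static boolean shouldBypassSync(String rpcMethodName) {
    for (Method m : ClientProtocolStub.class.getMethods()) {
      if (m.getName().equals(rpcMethodName)) {
        return m.isAnnotationPresent(BypassSync.class);
      }
    }
    return false;  // unknown method: apply the normal sync wait
  }

  public static void main(String[] args) {
    System.out.println(shouldBypassSync("getCurrentStateId"));  // true
    System.out.println(shouldBypassSync("getContentSummary"));  // false
  }
}
{code}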

Again, sorry I missed this Jira earlier...

> Only some protocol methods should perform msync wait
> 
>
> Key: HDFS-13872
> URL: https://issues.apache.org/jira/browse/HDFS-13872
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13872-HDFS-12943.000.patch
>
>
> Currently the implementation of msync added in HDFS-13767 waits until the 
> server has caught up to the client-specified transaction ID regardless of 
> what the inbound RPC is. This particularly causes problems for 
> ObserverReadProxyProvider (see HDFS-13779) when we try to fetch the state 
> from an observer/standby; this should be a quick operation, but it has to 
> wait for the node to catch up to the most current state. I initially thought 
> all {{HAServiceProtocol}} methods should thus be excluded from the wait 
> period, but actually I think the right approach is that _only_ 
> {{ClientProtocol}} methods should be subjected to the wait period. I propose 
> that we can do this via an annotation on client protocol which can then be 
> checked within {{ipc.Server}}.






[jira] [Commented] (HDFS-13924) Handle BlockMissingException when reading from observer

2018-09-20 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622626#comment-16622626
 ] 

Chen Liang commented on HDFS-13924:
---

Thanks for the update [~csun]! Just to add to my previous comment: what I was 
thinking of was to handle the retry in a more uniform fashion. Specifically, in 
the ideal situation, I think on the client side it should always and only be 
the ProxyProvider that handles the NN redirecting logic. To that end, I would 
consider the server side a better place to handle this than DFSInputStream: 
the server side throws an exception, then the ProxyProvider does the 
redirecting properly, so DFSInputStream is hidden from the retry and doesn't 
need to do anything in addition.

So IMO, the better way may be, just like you mentioned, creating a new 
exception, say ObserverOperationFailException, for all the situations where the 
Observer cannot successfully handle a request and a retry against the active is 
worthwhile; just throw this exception. Whenever the ObserverProxyProvider sees 
this exception, it tries again with the active. Something along those lines.
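For illustration only, a minimal sketch of that idea; the exception class and the proxy-provider-style wrapper here are hypothetical, not existing Hadoop classes:

{code:java}
import java.io.IOException;
import java.util.concurrent.Callable;

public class ObserverRetrySketch {

  // Hypothetical exception an Observer would throw for any request it cannot
  // serve but that is worth retrying against the active NN.
  static class ObserverOperationFailException extends IOException {
    ObserverOperationFailException(String msg) { super(msg); }
  }

  // Hypothetical proxy-provider-style wrapper: try the observer first and
  // fall back to the active only when the observer signals this exception.
  static <T> T invoke(Callable<T> observerCall, Callable<T> activeCall)
      throws Exception {
    try {
      return observerCall.call();
    } catch (ObserverOperationFailException e) {
      // Redirect handled entirely in the proxy layer, so the input stream
      // never sees the retry.
      return activeCall.call();
    }
  }

  public static void main(String[] args) throws Exception {
    String result = invoke(
        () -> { throw new ObserverOperationFailException("missing block"); },
        () -> "served by active");
    System.out.println(result);  // prints "served by active"
  }
}
{code}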

> Handle BlockMissingException when reading from observer
> ---
>
> Key: HDFS-13924
> URL: https://issues.apache.org/jira/browse/HDFS-13924
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chao Sun
>Priority: Major
>
> Internally we found that reading from ObserverNode may result in 
> {{BlockMissingException}}. This may happen when the observer sees a smaller 
> number of DNs than the active (maybe due to communication issues with those 
> DNs), or (we guess) late block reports from some DNs to the observer. This 
> error happens in 
> [DFSInputStream#chooseDataNode|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L846],
>  when no valid DN can be found for the {{LocatedBlock}} obtained from the NN side.
> One potential solution (although a little hacky) is to ask the 
> {{DFSInputStream}} to retry the active when this happens. The retry logic is 
> already present in the code - we just have to dynamically set a flag to ask 
> the {{ObserverReadProxyProvider}} to try the active in this case.
> cc [~shv], [~xkrogen], [~vagarychen], [~zero45] for discussion.






[jira] [Commented] (HDFS-13873) ObserverNode should reject read requests when it is too far behind.

2018-09-20 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622645#comment-16622645
 ] 

Chen Liang commented on HDFS-13873:
---

[~csun] any updates/plans for this? I saw you are busy with two other Jiras, I 
can help on this one if you like :).

Either way, I'm curious what the current internal implementation at Uber looks 
like. I was syncing with Konstantin, and we were planning to do this based on 
state id. But the threshold for rejection should probably be based on some 
runtime moving average (e.g. the number of txids processed in the past X 
minutes). Any thoughts on this?

> ObserverNode should reject read requests when it is too far behind.
> ---
>
> Key: HDFS-13873
> URL: https://issues.apache.org/jira/browse/HDFS-13873
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: HDFS-12943
>Reporter: Konstantin Shvachko
>Assignee: Chao Sun
>Priority: Major
>
> Add a server-side threshold for ObserverNode to reject read requests when it 
> is too far behind.






[jira] [Commented] (HDFS-13873) ObserverNode should reject read requests when it is too far behind.

2018-09-20 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622948#comment-16622948
 ] 

Chen Liang commented on HDFS-13873:
---

Thanks [~csun]!

I had some thoughts on this, sharing here for reference:

Version 1: have a tracker such that, whenever a client sends a request to the 
Observer, the tracker records the Observer's current state id X and timestamp 
tx, compares them to the previous value Y and previous timestamp ty, and 
t = (tx - ty) / (X - Y) gives an estimate of how long it takes the observer to 
process one txid (this can be measured as a moving average for better 
accuracy). Then, with delta = clientStateId - X, delta * t gives the estimated 
time until the client request can start being processed, i.e. the msync wait 
time.

Version 2: instead of tracking the rate at which the Observer's state id 
advances, we could take t = the average time to process one request. (This 
needs more code to measure the time a request spends from entering the queue 
until it finishes.) Then delta * t becomes the estimate of when the client 
request will actually finish.

Version 2 requires more code changes, but it can handle the case where the 
Observer's state id is not actually far behind, yet the Observer node itself is 
too slow, still causing a long processing time for a request, which version 1 
does not capture. The downside, though, is that there seem to be cases where 
version 2 would reject many calls over-aggressively. Also, addressing a slow 
Observer seems a bit beyond the scope of this Jira.

I would say maybe we can go with the simpler version 1 first and see how it 
works out. Any comments [~csun], [~shv]?
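A rough sketch of what the "version 1" estimator could look like; all names and the smoothing factor are made up for illustration and this is not proposed patch code:

{code:java}
// Tracks how fast the Observer's state id advances and estimates how long a
// client request would have to wait for the Observer to catch up.
public class ObserverLagEstimator {
  private long lastStateId = -1;
  private long lastTimestampMs = -1;
  // Exponentially weighted moving average of milliseconds per txid.
  private double msPerTxid = 0.0;
  private static final double ALPHA = 0.2;  // smoothing factor (arbitrary)

  // Called whenever a client request observes the Observer's state id X.
  public synchronized void record(long stateIdX, long nowMs) {
    if (lastStateId >= 0 && stateIdX > lastStateId) {
      double sample = (double) (nowMs - lastTimestampMs) / (stateIdX - lastStateId);
      msPerTxid = (msPerTxid == 0.0) ? sample
          : ALPHA * sample + (1 - ALPHA) * msPerTxid;
    }
    lastStateId = stateIdX;
    lastTimestampMs = nowMs;
  }

  // Estimated msync wait: txids behind (delta) times the ms-per-txid rate (t).
  public synchronized double estimatedWaitMs(long clientStateId) {
    long delta = Math.max(0, clientStateId - lastStateId);
    return delta * msPerTxid;
  }

  // A read could be rejected when the estimated wait exceeds some threshold.
  public boolean shouldReject(long clientStateId, double thresholdMs) {
    return estimatedWaitMs(clientStateId) > thresholdMs;
  }
}
{code}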

> ObserverNode should reject read requests when it is too far behind.
> ---
>
> Key: HDFS-13873
> URL: https://issues.apache.org/jira/browse/HDFS-13873
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: HDFS-12943
>Reporter: Konstantin Shvachko
>Assignee: Chao Sun
>Priority: Major
>
> Add a server-side threshold for ObserverNode to reject read requests when it 
> is too far behind.






[jira] [Commented] (HDFS-13873) ObserverNode should reject read requests when it is too far behind.

2018-09-24 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626547#comment-16626547
 ] 

Chen Liang commented on HDFS-13873:
---

Update here for reference: synced offline with Konstantin. It seems one 
fundamental issue with my previously proposed approaches is that, when the 
server state id advances slowly, it is hard to differentiate between: 1. the 
server is slow; 2. there were simply not many writes. In other words, in 
addition to estimating request syncing time, we also need a reasonable estimate 
of the server's catch-up rate, instead of relying purely on the current window.

> ObserverNode should reject read requests when it is too far behind.
> ---
>
> Key: HDFS-13873
> URL: https://issues.apache.org/jira/browse/HDFS-13873
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: HDFS-12943
>Reporter: Konstantin Shvachko
>Assignee: Chao Sun
>Priority: Major
>
> Add a server-side threshold for ObserverNode to reject read requests when it 
> is too far behind.






[jira] [Commented] (HDFS-14017) ObserverReadProxyProviderWithIPFailover should work with HA configuration

2018-11-16 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689781#comment-16689781
 ] 

Chen Liang commented on HDFS-14017:
---

Hmmm...thanks for checking, [~xkrogen]!

> ObserverReadProxyProviderWithIPFailover should work with HA configuration
> -
>
> Key: HDFS-14017
> URL: https://issues.apache.org/jira/browse/HDFS-14017
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14017-HDFS-12943.001.patch, 
> HDFS-14017-HDFS-12943.002.patch, HDFS-14017-HDFS-12943.003.patch, 
> HDFS-14017-HDFS-12943.004.patch, HDFS-14017-HDFS-12943.005.patch, 
> HDFS-14017-HDFS-12943.006.patch, HDFS-14017-HDFS-12943.008.patch, 
> HDFS-14017-HDFS-12943.009.patch, HDFS-14017-HDFS-12943.010.patch, 
> HDFS-14017-HDFS-12943.011.patch, HDFS-14017-HDFS-12943.012.patch, 
> HDFS-14017-HDFS-12943.013.patch, HDFS-14017-HDFS-12943.014.patch
>
>
> Currently {{ObserverReadProxyProviderWithIPFailover}} extends 
> {{ObserverReadProxyProvider}}, and the only difference is changing the proxy 
> factory to use {{IPFailoverProxyProvider}}. However, this is not enough, 
> because when the constructor of {{ObserverReadProxyProvider}} is invoked via 
> super(...), the following line:
> {code:java}
> nameNodeProxies = getProxyAddresses(uri,
> HdfsClientConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY);
> {code}
> will try to resolve all of the configured NN addresses to perform configured 
> failover. But in the case of IPFailover, this does not really apply.
>  
> A second, closely related issue concerns delegation tokens. For example, in 
> the current IPFailover setup, say we have a virtual host nn.xyz.com, which 
> points to either of two physical nodes, nn1.xyz.com or nn2.xyz.com. In current 
> HDFS, there is always only one DT being exchanged, and it carries the hostname 
> nn.xyz.com. The server only issues this DT, and the client only knows the host 
> nn.xyz.com, so all is good. But with Observer reads, even with IPFailover, the 
> client no longer contacts nn.xyz.com; it actively reaches out to nn1.xyz.com 
> and nn2.xyz.com. During this process, the current code looks for a DT 
> associated with hostname nn1.xyz.com or nn2.xyz.com, which is different from 
> the DT issued by the NN, causing token authentication to fail. This happens in 
> {{AbstractDelegationTokenSelector#selectToken}}. The new IPFailover proxy 
> provider will need to resolve this as well.






[jira] [Commented] (HDFS-14058) Test reads from standby on a secure cluster with IP failover

2018-11-16 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689993#comment-16689993
 ] 

Chen Liang commented on HDFS-14058:
---

The tests I've run include the following. Please note that these tests were 
done without several recent changes such as HDFS-14035 and HDFS-14017, but with 
some hacky code changes and workarounds. Although the required changes have 
since been formalized in recent Jiras, the tests have not all been re-run with 
those changes. Posting here for the record.

The tests were done with a setup of 100+ datanodes, 1 Active NameNode and 1 
Observer NameNode, and no other standby nodes. The cluster has a light HDFS 
workload, has YARN deployed, and has security (Kerberos) enabled. The purpose 
was not to evaluate performance gains, but only to verify the functionality. In 
all the tests below, it was verified from the Observer node's audit log that 
the reads actually went to the Observer node.

1. Basic HDFS IO
- From the hdfs command line:
-- create/delete directory
-- basic file put/get/delete
- From a simple Java program. I wrote some code which creates a DFSClient 
instance and performs some basic operations against it (see the sketch after 
this list):
-- create/delete directory
-- get/renew delegation token

One observation is that, from the command line, depending on the relative order 
of the ANN and ONN in the config, a failover may happen every single time, with 
an exception printed. I believe this is because every command-line invocation 
creates a new DFSClient instance, which may start by sending a write to the 
Observer, causing a failover. A reused DFSClient (e.g. a Java program that 
creates and reuses the same DFSClient) does not have this issue.

2. Simple MR job: a simple wordcount job from the mapreduce-examples jar, on a 
very small input.

3. SliveTest: ran Slive from the hadoop-mapreduce-client-jobclient jar, without 
parameters (so it uses the defaults). I ran Slive 3 times each with Observer 
enabled and disabled, and saw roughly the same ops/sec.

4. DFSIO: ran the DFSIO read test several times from the 
hadoop-mapreduce-client-jobclient jar, but only with a very small input size 
(10 files of 1KB each).

5. TeraGen/Sort/Validate: ran TeraGen/Sort/Validate from the 
hadoop-mapreduce-examples jar with 1TB of data. TeraSort used 1800+ mappers and 
500 reducers. All three jobs finished successfully.
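For item 1, a rough, reconstructed sketch of the kind of simple Java client described above; this is not the exact code that was used, and the path and renewer string are placeholders:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.token.Token;

// Reuses a single client instance so reads keep going to the Observer instead
// of triggering a failover on every invocation, unlike one-shot CLI calls.
public class BasicObserverReadCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();  // picks up the cluster's HA config
    try (FileSystem fs = FileSystem.get(conf)) {
      Path dir = new Path("/tmp/observer-read-check");  // placeholder path

      fs.mkdirs(dir);          // write: handled by the active NN
      fs.getFileStatus(dir);   // read: should be served by the Observer
      fs.delete(dir, true);    // write again

      // Fetch a delegation token; renewal was exercised separately.
      Token<?> token = fs.getDelegationToken("placeholder-renewer");
      System.out.println("Got token: " + token);
    }
  }
}
{code}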

> Test reads from standby on a secure cluster with IP failover
> 
>
> Key: HDFS-14058
> URL: https://issues.apache.org/jira/browse/HDFS-14058
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Konstantin Shvachko
>Assignee: Chen Liang
>Priority: Major
>
> Run standard HDFS tests to verify reading from ObserverNode on a secure HA 
> cluster with {{IPFailoverProxyProvider}}.






[jira] [Commented] (HDFS-14181) Suspect there is a bug in NetworkTopology.java chooseRandom function.

2019-01-02 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732304#comment-16732304
 ] 

Chen Liang commented on HDFS-14181:
---

Sorry for the late response, just got back from vacation. The fix seems 
correct, v005 patch LGTM. 

> Suspect there is a bug in NetworkTopology.java chooseRandom function.
> -
>
> Key: HDFS-14181
> URL: https://issues.apache.org/jira/browse/HDFS-14181
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Affects Versions: 2.9.2
>Reporter: Sihai Ke
>Assignee: Sihai Ke
>Priority: Major
> Attachments: 0001-add-UT-for-NetworkTopology.patch, 
> 0001-fix-NetworkTopology.java-chooseRandom-bug.patch, HDFS-14181.01.patch, 
> HDFS-14181.02.patch, HDFS-14181.03.patch, HDFS-14181.04.patch, 
> HDFS-14181.05.patch, image-2018-12-29-15-02-19-415.png
>
>
> While reading Hadoop's NetworkTopology.java, I suspect there is a bug in the 
> function chooseRandom (line 498, Hadoop version 2.9.2-RC0). 
> {color:#f79232}I think the bug is in the code below: "~" + excludedScope does 
> not give the available nodes under the scope node. I also added a unit test 
> for this and got an exception.{color}
> The buggy code is in the else branch:
> {code:java}
> // code placeholder
>  if (excludedScope == null) {
> availableNodes = countNumOfAvailableNodes(scope, excludedNodes);
>   } else {
> availableNodes =
> countNumOfAvailableNodes("~" + excludedScope, excludedNodes);
>   }{code}
> Source code:
> {code:java}
> // code placeholder
> protected Node chooseRandom(final String scope, String excludedScope,
> final Collection excludedNodes) {
>   if (excludedScope != null) {
> if (scope.startsWith(excludedScope)) {
>   return null;
> }
> if (!excludedScope.startsWith(scope)) {
>   excludedScope = null;
> }
>   }
>   Node node = getNode(scope);
>   if (!(node instanceof InnerNode)) {
> return excludedNodes != null && excludedNodes.contains(node) ?
> null : node;
>   }
>   InnerNode innerNode = (InnerNode)node;
>   int numOfDatanodes = innerNode.getNumOfLeaves();
>   if (excludedScope == null) {
> node = null;
>   } else {
> node = getNode(excludedScope);
> if (!(node instanceof InnerNode)) {
>   numOfDatanodes -= 1;
> } else {
>   numOfDatanodes -= ((InnerNode)node).getNumOfLeaves();
> }
>   }
>   if (numOfDatanodes <= 0) {
> LOG.debug("Failed to find datanode (scope=\"{}\" excludedScope=\"{}\")."
> + " numOfDatanodes={}",
> scope, excludedScope, numOfDatanodes);
> return null;
>   }
>   final int availableNodes;
>   if (excludedScope == null) {
> availableNodes = countNumOfAvailableNodes(scope, excludedNodes);
>   } else {
> availableNodes =
> countNumOfAvailableNodes("~" + excludedScope, excludedNodes);
>   }
>   LOG.debug("Choosing random from {} available nodes on node {},"
>   + " scope={}, excludedScope={}, excludeNodes={}. numOfDatanodes={}.",
>   availableNodes, innerNode, scope, excludedScope, excludedNodes,
>   numOfDatanodes);
>   Node ret = null;
>   if (availableNodes > 0) {
> ret = chooseRandom(innerNode, node, excludedNodes, numOfDatanodes,
> availableNodes);
>   }
>   LOG.debug("chooseRandom returning {}", ret);
>   return ret;
> }
> {code}
>  
>  
> Added a unit test in TestClusterTopology.java, but got an exception.
>  
> {code:java}
> // code placeholder
> @Test
> public void testChooseRandom1() {
>   // create the topology
>   NetworkTopology cluster = NetworkTopology.getInstance(new Configuration());
>   NodeElement node1 = getNewNode("node1", "/a1/b1/c1");
>   cluster.add(node1);
>   NodeElement node2 = getNewNode("node2", "/a1/b1/c1");
>   cluster.add(node2);
>   NodeElement node3 = getNewNode("node3", "/a1/b1/c2");
>   cluster.add(node3);
>   NodeElement node4 = getNewNode("node4", "/a1/b2/c3");
>   cluster.add(node4);
>   Node node = cluster.chooseRandom("/a1/b1", "/a1/b1/c1", null);
>   assertSame(node.getName(), "node3");
> }
> {code}
>  
> Exception:
> {code:java}
> // code placeholder
> java.lang.IllegalArgumentException: 1 should >= 2, and both should be 
> positive. 
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) 
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:567) 
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:544) 
> at org.apache.hadoop.net.TestClusterTopology.testChooseRandom1(TestClusterTopology.java:198)
> {code}
>  
> {color:#f79232}!image-2018-12-29-15-02-19-415.png!{color}
>  
>  
> [~vagarychen] this change was introduced in HDFS-11577, could you help 
> check whether this is a bug?
>  





[jira] [Created] (HDFS-14205) Backport HDFS-6440 to branch-2

2019-01-14 Thread Chen Liang (JIRA)
Chen Liang created HDFS-14205:
-

 Summary: Backport HDFS-6440 to branch-2
 Key: HDFS-14205
 URL: https://issues.apache.org/jira/browse/HDFS-14205
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chen Liang


Currently, support for more than 2 NameNodes (HDFS-6440) is only in branch-3. 
This JIRA aims to backport it to branch-2, as it is required by the HDFS-12943 
(consistent reads from standby) backport to branch-2.






[jira] [Created] (HDFS-14204) Backport HDFS-12943 to branch-2

2019-01-14 Thread Chen Liang (JIRA)
Chen Liang created HDFS-14204:
-

 Summary: Backport HDFS-12943 to branch-2
 Key: HDFS-14204
 URL: https://issues.apache.org/jira/browse/HDFS-14204
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chen Liang


Currently, the consistent reads from standby feature (HDFS-12943) is only in 
trunk (branch-3). This JIRA aims to backport the feature to branch-2.






[jira] [Updated] (HDFS-14204) Backport HDFS-12943 to branch-2

2019-01-14 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14204:
--
External issue ID: hdfs-12943

> Backport HDFS-12943 to branch-2
> ---
>
> Key: HDFS-14204
> URL: https://issues.apache.org/jira/browse/HDFS-14204
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Priority: Major
>
> Currently, the consistent reads from standby feature (HDFS-12943) is only in 
> trunk (branch-3). This JIRA aims to backport the feature to branch-2.






[jira] [Updated] (HDFS-14142) Move ipfailover config key out of HdfsClientConfigKeys

2018-12-12 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14142:
--
   Resolution: Fixed
Fix Version/s: HDFS-12943
   Status: Resolved  (was: Patch Available)

> Move ipfailover config key out of HdfsClientConfigKeys
> --
>
> Key: HDFS-14142
> URL: https://issues.apache.org/jira/browse/HDFS-14142
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Minor
> Fix For: HDFS-12943
>
> Attachments: HDFS-14142-HDFS-12943.001.patch
>
>
> Running TestHdfsConfigFields throws an error complaining about the missing key 
> dfs.client.failover.ipfailover.virtual-address. Since this config key is 
> specific to ORFPPwithIP only, this Jira moves the config prefix to 
> ObserverReadProxyProviderWithIPFailover.






[jira] [Commented] (HDFS-14142) Move ipfailover config key out of HdfsClientConfigKeys

2018-12-12 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719300#comment-16719300
 ] 

Chen Liang commented on HDFS-14142:
---

The checkstyle warning can be fixed by adding the final keyword to the key 
string. I fixed it and verified with checkstyle locally. I've committed this to 
the feature branch, thanks [~shv] for the review!

> Move ipfailover config key out of HdfsClientConfigKeys
> --
>
> Key: HDFS-14142
> URL: https://issues.apache.org/jira/browse/HDFS-14142
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Minor
> Fix For: HDFS-12943
>
> Attachments: HDFS-14142-HDFS-12943.001.patch
>
>
> Running TestHdfsConfigFields throws an error complaining about the missing key 
> dfs.client.failover.ipfailover.virtual-address. Since this config key is 
> specific to ORFPPwithIP only, this Jira moves the config prefix to 
> ObserverReadProxyProviderWithIPFailover.






[jira] [Updated] (HDFS-13617) Allow wrapping NN QOP into token in encrypted message

2018-12-10 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-13617:
--
Status: Patch Available  (was: In Progress)

> Allow wrapping NN QOP into token in encrypted message
> -
>
> Key: HDFS-13617
> URL: https://issues.apache.org/jira/browse/HDFS-13617
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13617.001.patch, HDFS-13617.002.patch, 
> HDFS-13617.003.patch, HDFS-13617.004.patch
>
>
> This Jira allows the NN to configurably wrap the QOP it has established with 
> the client into the token message sent back to the client. The QOP is sent 
> back in an encrypted message, using the BlockAccessToken encryption key as the 
> key.






[jira] [Commented] (HDFS-13617) Allow wrapping NN QOP into token in encrypted message

2018-12-10 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715323#comment-16715323
 ] 

Chen Liang commented on HDFS-13617:
---

Have been busy with another project...coming back to this. Posted the v004 
patch for rebase.

> Allow wrapping NN QOP into token in encrypted message
> -
>
> Key: HDFS-13617
> URL: https://issues.apache.org/jira/browse/HDFS-13617
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13617.001.patch, HDFS-13617.002.patch, 
> HDFS-13617.003.patch, HDFS-13617.004.patch
>
>
> This Jira allows the NN to configurably wrap the QOP it has established with 
> the client into the token message sent back to the client. The QOP is sent 
> back in an encrypted message, using the BlockAccessToken encryption key as the 
> key.






[jira] [Updated] (HDFS-13617) Allow wrapping NN QOP into token in encrypted message

2018-12-10 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-13617:
--
Attachment: HDFS-13617.004.patch

> Allow wrapping NN QOP into token in encrypted message
> -
>
> Key: HDFS-13617
> URL: https://issues.apache.org/jira/browse/HDFS-13617
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13617.001.patch, HDFS-13617.002.patch, 
> HDFS-13617.003.patch, HDFS-13617.004.patch
>
>
> This Jira allows the NN to configurably wrap the QOP it has established with 
> the client into the token message sent back to the client. The QOP is sent 
> back in an encrypted message, using the BlockAccessToken encryption key as the 
> key.






[jira] [Commented] (HDFS-14146) Handle exception from internalQueueCall

2018-12-12 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719556#comment-16719556
 ] 

Chen Liang commented on HDFS-14146:
---

Thanks [~csun] for reporting! Interesting...in what situation did you hit this issue?

> Handle exception from internalQueueCall
> ---
>
> Key: HDFS-14146
> URL: https://issues.apache.org/jira/browse/HDFS-14146
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Critical
> Attachments: HDFS-14146-HDFS-12943.000.patch
>
>
> When we re-queue RPC call, the {{internalQueueCall}} will potentially throw 
> exceptions (e.g., RPC backoff), which is then swallowed. This will cause the 
> RPC to be silently discarded without response to the client, which is not 
> good.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14146) Handle exception from internalQueueCall

2018-12-12 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719613#comment-16719613
 ] 

Chen Liang commented on HDFS-14146:
---

This is a good discussion, and good point on the deadlock possibility. Even setting
the potential deadlock aside, handlers should probably never be exposed to blocking
when requeuing requests. Just to clarify, it seems to me we have two things to ensure
about handler requeuing here: 1. it never blocks; 2. if requeuing throws an exception,
the exception is handled properly rather than swallowed. Thanks Chao for working on
this critical issue!
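To make the two requirements concrete, here is a small self-contained sketch; the type
and method names mirror the ipc Server code but are stand-ins, not the actual
implementation. Requeuing uses a non-blocking offer, and any queue-side failure is
turned into an error response instead of being swallowed:
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class RequeueSketch {
  // Stand-in for the server-side Call object.
  static class Call {
    final int id;
    Call(int id) { this.id = id; }
    void doErrorResponse(Exception e) {
      System.out.println("call " + id + " answered with error: " + e.getMessage());
    }
  }

  private final BlockingQueue<Call> callQueue = new LinkedBlockingQueue<>(2);

  // Mirrors internalQueueCall: may reject the call, e.g. under RPC backoff.
  private void internalQueueCall(Call call) throws Exception {
    if (!callQueue.offer(call)) {          // offer() never blocks the handler
      throw new Exception("RPC backoff: call queue is full");
    }
  }

  // Requirement 1: never block the handler. Requirement 2: if requeuing throws,
  // respond to the client with the error rather than dropping the call silently.
  void requeueCall(Call call) {
    try {
      internalQueueCall(call);
    } catch (Exception e) {
      call.doErrorResponse(e);
    }
  }

  public static void main(String[] args) {
    RequeueSketch s = new RequeueSketch();
    for (int i = 0; i < 4; i++) {
      s.requeueCall(new Call(i));          // calls 2 and 3 overflow and get an error
    }
  }
}
{code}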

> Handle exception from internalQueueCall
> ---
>
> Key: HDFS-14146
> URL: https://issues.apache.org/jira/browse/HDFS-14146
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Critical
> Attachments: HDFS-14146-HDFS-12943.000.patch
>
>
> When we re-queue RPC call, the {{internalQueueCall}} will potentially throw 
> exceptions (e.g., RPC backoff), which is then swallowed. This will cause the 
> RPC to be silently discarded without response to the client, which is not 
> good.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14116) Fix a potential class cast error in ObserverReadProxyProvider

2018-12-12 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719657#comment-16719657
 ] 

Chen Liang commented on HDFS-14116:
---

Thanks for the update [~csun]! The v002 patch looks pretty good overall. Just a
couple of minor comments (a small illustrative sketch of point 1 follows below):
1. In {{AbstractNNFailoverProxyProvider}}, can we change {{factory instanceof
ClientHAProxyFactory}} to {{pi.proxy instanceof ClientProtocol}}? That seems
clearer to me and makes fewer assumptions about ClientHAProxyFactory.
2. Maybe rename {{clientProxy}} to something like {{serviceStateProxy}} to be
more informative? We had something similar before.
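A self-contained toy version of point 1; the class names mirror the HDFS ones but are
stand-ins here, not the real classes:
{code:java}
public class ProxyCheckSketch {
  // Stand-ins for the real HDFS interfaces/classes.
  interface ClientProtocol { }
  interface HAProxyFactory<T> { T createProxy(); }

  static class ClientHAProxyFactory implements HAProxyFactory<ClientProtocol> {
    public ClientProtocol createProxy() { return new ClientProtocol() { }; }
  }
  static class NameNodeHAProxyFactory implements HAProxyFactory<Object> {
    public Object createProxy() { return new Object(); }
  }

  // Decide based on what the proxy actually is, not on which factory built it,
  // so proxies from NameNodeHAProxyFactory no longer trigger a cast error.
  static void maybeSetAlignmentContext(Object proxy) {
    if (proxy instanceof ClientProtocol) {
      System.out.println("attach alignment context");
    } else {
      System.out.println("skip: not a ClientProtocol proxy");
    }
  }

  public static void main(String[] args) {
    maybeSetAlignmentContext(new ClientHAProxyFactory().createProxy());
    maybeSetAlignmentContext(new NameNodeHAProxyFactory().createProxy());
  }
}
{code}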

> Fix a potential class cast error in ObserverReadProxyProvider
> -
>
> Key: HDFS-14116
> URL: https://issues.apache.org/jira/browse/HDFS-14116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14116-HDFS-12943.000.patch, 
> HDFS-14116-HDFS-12943.001.patch, HDFS-14116-HDFS-12943.002.patch
>
>
> Currently in {{ObserverReadProxyProvider}} constructor there is this line 
> {code}
> ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext);
> {code}
> This could potentially cause failure, because it is possible that factory can 
> not be casted here. Specifically,  
> {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the 
> constructor will be called, and there are two paths that could call into this:
> (1).{{NameNodeProxies.createProxy}}
> (2).{{NameNodeProxiesClient.createFailoverProxyProvider}}
> (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses 
> {{NameNodeHAProxyFactory}} which can not be casted to 
> {{ClientHAProxyFactory}}, this happens when, for example, running 
> NNThroughputBenmarck. To fix this we can at least:
> 1. introduce setAlignmentContext to HAProxyFactory which is the parent of 
> both  ClientHAProxyFactory and NameNodeHAProxyFactory OR
> 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having a 
> if check with reflection. 
> Depending on whether it make sense to have alignment context for the case (1) 
> calling code paths.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14116) Fix a potential class cast error in ObserverReadProxyProvider

2018-12-13 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720720#comment-16720720
 ] 

Chen Liang commented on HDFS-14116:
---

Just one more trivial thing: can we fix at least the third of the checkstyle
warnings? I think it just needs 'private' added to the service proxy variable.
+1 with that fixed. I've also run the failed tests locally and they all passed.

> Fix a potential class cast error in ObserverReadProxyProvider
> -
>
> Key: HDFS-14116
> URL: https://issues.apache.org/jira/browse/HDFS-14116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14116-HDFS-12943.000.patch, 
> HDFS-14116-HDFS-12943.001.patch, HDFS-14116-HDFS-12943.002.patch, 
> HDFS-14116-HDFS-12943.003.patch
>
>
> Currently in {{ObserverReadProxyProvider}} constructor there is this line 
> {code}
> ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext);
> {code}
> This could potentially cause failure, because it is possible that factory can 
> not be casted here. Specifically,  
> {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the 
> constructor will be called, and there are two paths that could call into this:
> (1).{{NameNodeProxies.createProxy}}
> (2).{{NameNodeProxiesClient.createFailoverProxyProvider}}
> (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses 
> {{NameNodeHAProxyFactory}} which can not be casted to 
> {{ClientHAProxyFactory}}, this happens when, for example, running 
> NNThroughputBenmarck. To fix this we can at least:
> 1. introduce setAlignmentContext to HAProxyFactory which is the parent of 
> both  ClientHAProxyFactory and NameNodeHAProxyFactory OR
> 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having a 
> if check with reflection. 
> Depending on whether it make sense to have alignment context for the case (1) 
> calling code paths.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14116) ObserverReadProxyProvider should work with protocols other than ClientProtocol

2018-12-17 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723327#comment-16723327
 ] 

Chen Liang commented on HDFS-14116:
---

I had an offline discussion with [~shv]. It looks like we started out trying to
resolve the cast exception in this Jira and ended up repurposing it to extend ORFPP
to support non-client protocols. This may not be the right change because, by design,
Observer is meant only for client protocol operations and nothing else; so far ORFPP
has not been designed for any other protocol. So it might actually be the right thing
to just throw an exception if ORFPP is used for any protocol other than
ClientProtocol, and make changes to NNThroughputBenchmark like in the v000 patch.

> ObserverReadProxyProvider should work with protocols other than ClientProtocol
> --
>
> Key: HDFS-14116
> URL: https://issues.apache.org/jira/browse/HDFS-14116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Fix For: HDFS-12943
>
> Attachments: HDFS-14116-HDFS-12943.000.patch, 
> HDFS-14116-HDFS-12943.001.patch, HDFS-14116-HDFS-12943.002.patch, 
> HDFS-14116-HDFS-12943.003.patch, HDFS-14116-HDFS-12943.004.patch
>
>
> Currently in {{ObserverReadProxyProvider}} constructor there is this line 
> {code}
> ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext);
> {code}
> This could potentially cause failure, because it is possible that factory can 
> not be casted here. Specifically,  
> {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the 
> constructor will be called, and there are two paths that could call into this:
> (1).{{NameNodeProxies.createProxy}}
> (2).{{NameNodeProxiesClient.createFailoverProxyProvider}}
> (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses 
> {{NameNodeHAProxyFactory}} which can not be casted to 
> {{ClientHAProxyFactory}}, this happens when, for example, running 
> NNThroughputBenmarck. To fix this we can at least:
> 1. introduce setAlignmentContext to HAProxyFactory which is the parent of 
> both  ClientHAProxyFactory and NameNodeHAProxyFactory OR
> 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having a 
> if check with reflection. 
> Depending on whether it make sense to have alignment context for the case (1) 
> calling code paths.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node

2018-12-17 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723272#comment-16723272
 ] 

Chen Liang edited comment on HDFS-12943 at 12/17/18 10:45 PM:
--

Hi [~brahmareddy],

Thanks for testing! The timeout issue seems interesting. To start with, some
performance degradation *from the CLI* is expected, because the CLI creates a new
DFSClient for each command, and a fresh DFSClient has to fetch the NameNodes' states
every time. But if the same DFSClient is reused, this would not be an issue. I have
never seen the second-call issue. Here is an output from our cluster (log output
omitted), and I think you are right about lowering dfs.ha.tail-edits.period; we had
similar numbers here:
{code:java}
$time hdfs --loglevel debug dfs 
-Ddfs.client.failover.proxy.provider.***=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
 -mkdir /TestsORF1
real0m2.254s
user0m3.608s
sys 0m0.331s
$time hdfs --loglevel debug dfs 
-Ddfs.client.failover.proxy.provider.***=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
 -mkdir /TestsORF2
real0m2.159s
user0m3.855s
sys 0m0.330s{code}
Curious: how many NNs did you have in the test, and were there any errors in the
NN logs?


was (Author: vagarychen):
Hi [~brahmareddy],

Thanks for testing! The timeout issue seems interesting. To start with, it is 
expected to see some performance degradation *from CLI*, because CLI initiates 
a DFSClient every time for each command, a fresh DFSClient has to get status of 
name nodes every time. But if it is the same DFSClient being reused, this would 
not be an issue. I have never seen the second-call issue. Here is an output 
from our cluster (log outpu part omitted), and I think you are right about 
lowering dfs.ha.tail-edits.period, we had similar numbers here:
{code:java}
$time hdfs --loglevel debug dfs 
-Ddfs.client.failover.proxy.provider.ltx1-unonn01=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
 -mkdir /TestsORF1
real0m2.254s
user0m3.608s
sys 0m0.331s
$time hdfs --loglevel debug dfs 
-Ddfs.client.failover.proxy.provider.ltx1-unonn01=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
 -mkdir /TestsORF2
real0m2.159s
user0m3.855s
sys 0m0.330s{code}
 ** Curious, how many NN you had in the testing? and was there any error from 
NN logs?

> Consistent Reads from Standby Node
> --
>
> Key: HDFS-12943
> URL: https://issues.apache.org/jira/browse/HDFS-12943
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: ConsistentReadsFromStandbyNode.pdf, 
> ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch, 
> HDFS-12943-002.patch, TestPlan-ConsistentReadsFromStandbyNode.pdf
>
>
> StandbyNode in HDFS is a replica of the active NameNode. The states of the 
> NameNodes are coordinated via the journal. It is natural to consider 
> StandbyNode as a read-only replica. As with any replicated distributed system 
> the problem of stale reads should be resolved. Our main goal is to provide 
> reads from standby in a consistent way in order to enable a wide range of 
> existing applications running on top of HDFS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14116) ObserverReadProxyProvider should work with protocols other than ClientProtocol

2018-12-17 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723521#comment-16723521
 ] 

Chen Liang commented on HDFS-14116:
---

Thanks [~shv] for the patch! +1 from me on v005 patch

> ObserverReadProxyProvider should work with protocols other than ClientProtocol
> --
>
> Key: HDFS-14116
> URL: https://issues.apache.org/jira/browse/HDFS-14116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Fix For: HDFS-12943
>
> Attachments: HDFS-14116-HDFS-12943.000.patch, 
> HDFS-14116-HDFS-12943.001.patch, HDFS-14116-HDFS-12943.002.patch, 
> HDFS-14116-HDFS-12943.003.patch, HDFS-14116-HDFS-12943.004.patch, 
> HDFS-14116-HDFS-12943.005.patch
>
>
> Currently in {{ObserverReadProxyProvider}} constructor there is this line 
> {code}
> ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext);
> {code}
> This could potentially cause failure, because it is possible that factory can 
> not be casted here. Specifically,  
> {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the 
> constructor will be called, and there are two paths that could call into this:
> (1).{{NameNodeProxies.createProxy}}
> (2).{{NameNodeProxiesClient.createFailoverProxyProvider}}
> (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses 
> {{NameNodeHAProxyFactory}} which can not be casted to 
> {{ClientHAProxyFactory}}, this happens when, for example, running 
> NNThroughputBenmarck. To fix this we can at least:
> 1. introduce setAlignmentContext to HAProxyFactory which is the parent of 
> both  ClientHAProxyFactory and NameNodeHAProxyFactory OR
> 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having a 
> if check with reflection. 
> Depending on whether it make sense to have alignment context for the case (1) 
> calling code paths.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12943) Consistent Reads from Standby Node

2018-12-17 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723272#comment-16723272
 ] 

Chen Liang commented on HDFS-12943:
---

Hi [~brahmareddy],

Thanks for testing! The timeout issue seems interesting. To start with, some
performance degradation *from the CLI* is expected, because the CLI creates a new
DFSClient for each command, and a fresh DFSClient has to fetch the NameNodes' states
every time. But if the same DFSClient is reused, this would not be an issue. I have
never seen the second-call issue. Here is an output from our cluster (log output
omitted), and I think you are right about lowering dfs.ha.tail-edits.period; we had
similar numbers here:
{code:java}
$time hdfs --loglevel debug dfs 
-Ddfs.client.failover.proxy.provider.ltx1-unonn01=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
 -mkdir /TestsORF1
real0m2.254s
user0m3.608s
sys 0m0.331s
$time hdfs --loglevel debug dfs 
-Ddfs.client.failover.proxy.provider.ltx1-unonn01=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider
 -mkdir /TestsORF2
real0m2.159s
user0m3.855s
sys 0m0.330s{code}
Curious: how many NNs did you have in the test, and were there any errors in the
NN logs?

> Consistent Reads from Standby Node
> --
>
> Key: HDFS-12943
> URL: https://issues.apache.org/jira/browse/HDFS-12943
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: ConsistentReadsFromStandbyNode.pdf, 
> ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch, 
> HDFS-12943-002.patch, TestPlan-ConsistentReadsFromStandbyNode.pdf
>
>
> StandbyNode in HDFS is a replica of the active NameNode. The states of the 
> NameNodes are coordinated via the journal. It is natural to consider 
> StandbyNode as a read-only replica. As with any replicated distributed system 
> the problem of stale reads should be resolved. Our main goal is to provide 
> reads from standby in a consistent way in order to enable a wide range of 
> existing applications running on top of HDFS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14116) Fix a potential class cast error in ObserverReadProxyProvider

2018-12-10 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715883#comment-16715883
 ] 

Chen Liang commented on HDFS-14116:
---

I applied the patch and reran NNThroughputBenchmark. The run did work, but I had to
change fs.defaultFS from the name service ID to the virtual IP address, and I was
trying to understand why. [~csun] I wonder, is there a specific reason why the patch
uses createNonHAProxy instead of createProxy? I can try changing this and see if it
works with the name service ID.

> Fix a potential class cast error in ObserverReadProxyProvider
> -
>
> Key: HDFS-14116
> URL: https://issues.apache.org/jira/browse/HDFS-14116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14116-HDFS-12943.000.patch
>
>
> Currently in {{ObserverReadProxyProvider}} constructor there is this line 
> {code}
> ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext);
> {code}
> This could potentially cause failure, because it is possible that factory can 
> not be casted here. Specifically,  
> {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the 
> constructor will be called, and there are two paths that could call into this:
> (1).{{NameNodeProxies.createProxy}}
> (2).{{NameNodeProxiesClient.createFailoverProxyProvider}}
> (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses 
> {{NameNodeHAProxyFactory}} which can not be casted to 
> {{ClientHAProxyFactory}}, this happens when, for example, running 
> NNThroughputBenmarck. To fix this we can at least:
> 1. introduce setAlignmentContext to HAProxyFactory which is the parent of 
> both  ClientHAProxyFactory and NameNodeHAProxyFactory OR
> 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having a 
> if check with reflection. 
> Depending on whether it make sense to have alignment context for the case (1) 
> calling code paths.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14116) Fix a potential class cast error in ObserverReadProxyProvider

2018-12-10 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715883#comment-16715883
 ] 

Chen Liang edited comment on HDFS-14116 at 12/11/18 1:04 AM:
-

I applied the patch and reran NNThroughputBenchmark. The run did work, but I had to
change fs.defaultFS from the name service ID to the virtual IP address, and I was
trying to understand why. [~csun] I wonder, is there a specific reason why the patch
uses createNonHAProxy instead of createProxy? I haven't looked into the details, so
I am not sure whether createProxy actually works here.


was (Author: vagarychen):
I applied the patch and reran NNThroughtputBenchmark. The run did work but I 
had to change fs.defaultFS from name service ID to virtual IP address. I was 
trying to understand why. [~csun] I wonder, is there a specific reason why the 
patch uses createNonHAProxy, how about createProxy? I can try change this and 
see if it works for name service ID.

> Fix a potential class cast error in ObserverReadProxyProvider
> -
>
> Key: HDFS-14116
> URL: https://issues.apache.org/jira/browse/HDFS-14116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14116-HDFS-12943.000.patch
>
>
> Currently in {{ObserverReadProxyProvider}} constructor there is this line 
> {code}
> ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext);
> {code}
> This could potentially cause failure, because it is possible that factory can 
> not be casted here. Specifically,  
> {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the 
> constructor will be called, and there are two paths that could call into this:
> (1).{{NameNodeProxies.createProxy}}
> (2).{{NameNodeProxiesClient.createFailoverProxyProvider}}
> (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses 
> {{NameNodeHAProxyFactory}} which can not be casted to 
> {{ClientHAProxyFactory}}, this happens when, for example, running 
> NNThroughputBenmarck. To fix this we can at least:
> 1. introduce setAlignmentContext to HAProxyFactory which is the parent of 
> both  ClientHAProxyFactory and NameNodeHAProxyFactory OR
> 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having a 
> if check with reflection. 
> Depending on whether it make sense to have alignment context for the case (1) 
> calling code paths.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13617) Allow wrapping NN QOP into token in encrypted message

2018-12-11 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-13617:
--
Attachment: HDFS-13617.005.patch

> Allow wrapping NN QOP into token in encrypted message
> -
>
> Key: HDFS-13617
> URL: https://issues.apache.org/jira/browse/HDFS-13617
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13617.001.patch, HDFS-13617.002.patch, 
> HDFS-13617.003.patch, HDFS-13617.004.patch, HDFS-13617.005.patch
>
>
> This Jira allows NN to configurably wrap the QOP it has established with the 
> client into the token message sent back to the client. The QOP is sent back 
> in encrypted message, using BlockAccessToken encryption key as the key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13617) Allow wrapping NN QOP into token in encrypted message

2018-12-11 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718001#comment-16718001
 ] 

Chen Liang commented on HDFS-13617:
---

Fixed the checkstyle, javac, and findbugs warnings in the v005 patch.

> Allow wrapping NN QOP into token in encrypted message
> -
>
> Key: HDFS-13617
> URL: https://issues.apache.org/jira/browse/HDFS-13617
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13617.001.patch, HDFS-13617.002.patch, 
> HDFS-13617.003.patch, HDFS-13617.004.patch, HDFS-13617.005.patch
>
>
> This Jira allows NN to configurably wrap the QOP it has established with the 
> client into the token message sent back to the client. The QOP is sent back 
> in encrypted message, using BlockAccessToken encryption key as the key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13873) ObserverNode should reject read requests when it is too far behind.

2018-12-11 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717750#comment-16717750
 ] 

Chen Liang commented on HDFS-13873:
---

I guess there are (at least) two directions: estimate-based and timeout-based.

Timeout-based is simple: if a call has been stuck in the queue for too long (> X
sec), reject it. The upside is simplicity; the downside is that some resources get
wasted: the client has already waited, and the server queue slot has been occupied
for X sec.

Estimate-based means that, upon seeing the request, the Observer guesses whether it
will wait too long and, if so, rejects it right away. The upside is immediate
rejection with no wait or queuing; the downside is that the estimate had better be
correct and reasonable, leaving no holes for badly behaved clients to exploit...

We can add some abstraction to allow different reject policies to be plugged in, and
we can even combine both: on seeing a request, make an estimate, but even if the
estimate is inaccurate and the request passes, the timeout still makes sure it won't
stay in the queue indefinitely. For now, we can start by combining Konstantin's and
Chao's strategies; a rough sketch of such a pluggable policy is below.
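To illustrate (the interface and class names here are hypothetical, not the eventual
implementation), a pluggable policy combining both checks could look roughly like
this:
{code:java}
import java.util.concurrent.TimeUnit;

// Hypothetical sketch only. The estimate-based check runs on arrival, the
// timeout-based check runs when a handler finally dequeues the call; either
// one can reject the read.
interface ReadRejectPolicy {
  boolean rejectOnArrival(long clientStateId, long serverStateId);
  boolean rejectOnDequeue(long enqueueTimeNanos);
}

class CombinedRejectPolicy implements ReadRejectPolicy {
  private final long maxLagTxns;         // estimate: how far behind is "too far"
  private final long maxQueueTimeNanos;  // timeout: how long a call may sit queued

  CombinedRejectPolicy(long maxLagTxns, long maxQueueTimeMillis) {
    this.maxLagTxns = maxLagTxns;
    this.maxQueueTimeNanos = TimeUnit.MILLISECONDS.toNanos(maxQueueTimeMillis);
  }

  @Override
  public boolean rejectOnArrival(long clientStateId, long serverStateId) {
    // Reject immediately if the Observer expects to lag too far behind the client.
    return clientStateId - serverStateId > maxLagTxns;
  }

  @Override
  public boolean rejectOnDequeue(long enqueueTimeNanos) {
    // Even if the estimate let the call through, do not keep it queued forever.
    return System.nanoTime() - enqueueTimeNanos > maxQueueTimeNanos;
  }

  public static void main(String[] args) {
    ReadRejectPolicy policy = new CombinedRejectPolicy(50_000, 500);
    System.out.println(policy.rejectOnArrival(120_000, 60_000));  // true: too far behind
    System.out.println(policy.rejectOnDequeue(System.nanoTime())); // false: just queued
  }
}
{code}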

> ObserverNode should reject read requests when it is too far behind.
> ---
>
> Key: HDFS-13873
> URL: https://issues.apache.org/jira/browse/HDFS-13873
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: HDFS-12943
>Reporter: Konstantin Shvachko
>Assignee: Chao Sun
>Priority: Major
>
> Add a server-side threshold for ObserverNode to reject read requests when it 
> is too far behind.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14142) Move ipfailover config key out of HdfsClientConfigKeys

2018-12-11 Thread Chen Liang (JIRA)
Chen Liang created HDFS-14142:
-

 Summary: Move ipfailover config key out of HdfsClientConfigKeys
 Key: HDFS-14142
 URL: https://issues.apache.org/jira/browse/HDFS-14142
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Chen Liang
Assignee: Chen Liang


Running TestHdfsConfigFields throws an error complaining about the missing key
dfs.client.failover.ipfailover.virtual-address. Since this config key is specific
to ORFPPwithIP only, this Jira moves the config prefix to
ObserverReadProxyProviderWithIPFailover.
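For illustration, a rough self-contained sketch of the intended shape; the class name,
the per-nameservice suffix, and the plain Map standing in for Configuration are all
assumptions of this sketch, not the actual patch:
{code:java}
import java.util.HashMap;
import java.util.Map;

public class IpFailoverConfigSketch {
  // Owned by the IP-failover provider itself rather than HdfsClientConfigKeys.
  static final String IPFAILOVER_VIRTUAL_ADDRESS_PREFIX =
      "dfs.client.failover.ipfailover.virtual-address";

  // Whether the key is suffixed per nameservice is an assumption in this sketch.
  static String getVirtualAddress(Map<String, String> conf, String nameservice) {
    return conf.get(IPFAILOVER_VIRTUAL_ADDRESS_PREFIX + "." + nameservice);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(IPFAILOVER_VIRTUAL_ADDRESS_PREFIX + ".mycluster",
        "hdfs://vip.example.com:8020");
    System.out.println(getVirtualAddress(conf, "mycluster"));
  }
}
{code}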



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14142) Move ipfailover config key out of HdfsClientConfigKeys

2018-12-11 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14142:
--
Attachment: HDFS-14142-HDFS-12943.001.patch

> Move ipfailover config key out of HdfsClientConfigKeys
> --
>
> Key: HDFS-14142
> URL: https://issues.apache.org/jira/browse/HDFS-14142
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Minor
> Attachments: HDFS-14142-HDFS-12943.001.patch
>
>
> Running TestHdfsConfigFields throws error complaining missing key 
> dfs.client.failover.ipfailover.virtual-address. Since this config key is 
> specific to only ORFPPwithIP, This Jira moves this config prefix to 
> ObserverReadProxyProviderWithIPFailover.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14142) Move ipfailover config key out of HdfsClientConfigKeys

2018-12-11 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14142:
--
Status: Patch Available  (was: Open)

> Move ipfailover config key out of HdfsClientConfigKeys
> --
>
> Key: HDFS-14142
> URL: https://issues.apache.org/jira/browse/HDFS-14142
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Minor
> Attachments: HDFS-14142-HDFS-12943.001.patch
>
>
> Running TestHdfsConfigFields throws error complaining missing key 
> dfs.client.failover.ipfailover.virtual-address. Since this config key is 
> specific to only ORFPPwithIP, This Jira moves this config prefix to 
> ObserverReadProxyProviderWithIPFailover.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14138) Description errors in the comparison logic of transaction ID

2018-12-13 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720798#comment-16720798
 ] 

Chen Liang commented on HDFS-14138:
---

Hey [~xiangheng], thanks for looking through the code! I missed this Jira; you can
file it as a subtask of HDFS-12943, which makes it a lot easier for us to notice and
track :). This is indeed a typo on my part, but I think the ongoing HDFS-14146 fixes
it as well.
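For the record, a tiny self-contained sketch of the intended comparison described
below (the method name is illustrative, not the actual code):
{code:java}
public class StateIdAlignmentSketch {
  // Postpone processing while the client has seen a newer state than the server
  // has applied; process only once clientStateId <= serverStateId.
  static boolean shouldPostpone(long clientStateId, long serverStateId) {
    return clientStateId > serverStateId;
  }

  public static void main(String[] args) {
    System.out.println(shouldPostpone(100, 90));   // true: server still behind
    System.out.println(shouldPostpone(100, 100));  // false: aligned, process now
  }
}
{code}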

> Description errors in the comparison logic of transaction ID
> 
>
> Key: HDFS-14138
> URL: https://issues.apache.org/jira/browse/HDFS-14138
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: HDFS-12943
>Reporter: xiangheng
>Priority: Minor
> Attachments: HDFS-14138-HDFS-12943.000.patch
>
>
> The call processing should be postponed until the client call's state id is 
> aligned (<=) with the server state id, not >=.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14138) Description errors in the comparison logic of transaction ID

2018-12-14 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14138:
--
   Resolution: Fixed
Fix Version/s: HDFS-12943
   Status: Resolved  (was: Patch Available)

> Description errors in the comparison logic of transaction ID
> 
>
> Key: HDFS-14138
> URL: https://issues.apache.org/jira/browse/HDFS-14138
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-12943
>Reporter: xiangheng
>Assignee: xiangheng
>Priority: Minor
> Fix For: HDFS-12943
>
> Attachments: HDFS-14138-HDFS-12943.000.patch
>
>
> The call processing should be postponed until the client call's state id is 
> aligned (<=) with the server state id,not >=.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14138) Description errors in the comparison logic of transaction ID

2018-12-14 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721786#comment-16721786
 ] 

Chen Liang commented on HDFS-14138:
---

Thanks [~csun] for chiming in! I've committed the v000 patch to the feature 
branch, thanks [~xiangheng] for the contribution!

> Description errors in the comparison logic of transaction ID
> 
>
> Key: HDFS-14138
> URL: https://issues.apache.org/jira/browse/HDFS-14138
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-12943
>Reporter: xiangheng
>Assignee: xiangheng
>Priority: Minor
> Fix For: HDFS-12943
>
> Attachments: HDFS-14138-HDFS-12943.000.patch
>
>
> The call processing should be postponed until the client call's state id is 
> aligned (<=) with the server state id,not >=.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12943) Consistent Reads from Standby Node

2018-12-19 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725379#comment-16725379
 ] 

Chen Liang commented on HDFS-12943:
---

Hi [~brahmareddy],

Some more notes to add:
1. getHAServiceState() is only called when client proxies are initialized (and, of
course, when existing proxies fail and the client reinitializes them). In regular
operation this call does not happen, so it should not be a concern in benchmarks.
2. I tried the unit test you shared locally with Observer read enabled and disabled.
I did not see any difference in mkdir time; it stayed around 2 ms the whole time
regardless. I did see some degradation on getContentSummary, though. But that is
because the unit test does mkdir -> getContentSummary -> getFileStatus -> repeat, so
the client is constantly switching between writes and reads, and thus constantly
switching between proxies (NNs). This is not the IO pattern Observer reads mainly
target, and it is probably the worst case for Observer reads, because every single
getContentSummary call here could potentially trigger an Observer catch-up wait.

> Consistent Reads from Standby Node
> --
>
> Key: HDFS-12943
> URL: https://issues.apache.org/jira/browse/HDFS-12943
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: ConsistentReadsFromStandbyNode.pdf, 
> ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch, 
> HDFS-12943-002.patch, TestPlan-ConsistentReadsFromStandbyNode.pdf
>
>
> StandbyNode in HDFS is a replica of the active NameNode. The states of the 
> NameNodes are coordinated via the journal. It is natural to consider 
> StandbyNode as a read-only replica. As with any replicated distributed system 
> the problem of stale reads should be resolved. Our main goal is to provide 
> reads from standby in a consistent way in order to enable a wide range of 
> existing applications running on top of HDFS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-14138) Description errors in the comparison logic of transaction ID

2018-12-14 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang reassigned HDFS-14138:
-

Assignee: xiangheng

> Description errors in the comparison logic of transaction ID
> 
>
> Key: HDFS-14138
> URL: https://issues.apache.org/jira/browse/HDFS-14138
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: HDFS-12943
>Reporter: xiangheng
>Assignee: xiangheng
>Priority: Minor
> Attachments: HDFS-14138-HDFS-12943.000.patch
>
>
> The call processing should be postponed until the client call's state id is 
> aligned (<=) with the server state id,not >=.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14138) Description errors in the comparison logic of transaction ID

2018-12-14 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14138:
--
Issue Type: Sub-task  (was: Bug)
Parent: HDFS-12943

> Description errors in the comparison logic of transaction ID
> 
>
> Key: HDFS-14138
> URL: https://issues.apache.org/jira/browse/HDFS-14138
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-12943
>Reporter: xiangheng
>Assignee: xiangheng
>Priority: Minor
> Attachments: HDFS-14138-HDFS-12943.000.patch
>
>
> The call processing should be postponed until the client call's state id is 
> aligned (<=) with the server state id,not >=.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14138) Description errors in the comparison logic of transaction ID

2018-12-14 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721657#comment-16721657
 ] 

Chen Liang commented on HDFS-14138:
---

Hi [~xiangheng], absolutely no worries, and welcome to the community! :) I've asked
a Jira admin from our team to add you as a contributor. From this point on, you
should be able to (and feel free to) assign any Hadoop/HDFS Jira you work on to
yourself by setting the Jira's assignee. And just for reference, by convention, if a
Jira turns out to be part of another one (which happens fairly often), you can simply
close it as a duplicate.

> Description errors in the comparison logic of transaction ID
> 
>
> Key: HDFS-14138
> URL: https://issues.apache.org/jira/browse/HDFS-14138
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: HDFS-12943
>Reporter: xiangheng
>Assignee: xiangheng
>Priority: Minor
> Attachments: HDFS-14138-HDFS-12943.000.patch
>
>
> The call processing should be postponed until the client call's state id is 
> aligned (<=) with the server state id,not >=.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14116) ObserverReadProxyProvider should work with protocols other than ClientProtocol

2018-12-14 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721686#comment-16721686
 ] 

Chen Liang commented on HDFS-14116:
---

I've committed v004 patch to the feature branch, thanks [~csun] for the 
contribution!

> ObserverReadProxyProvider should work with protocols other than ClientProtocol
> --
>
> Key: HDFS-14116
> URL: https://issues.apache.org/jira/browse/HDFS-14116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Fix For: HDFS-12943
>
> Attachments: HDFS-14116-HDFS-12943.000.patch, 
> HDFS-14116-HDFS-12943.001.patch, HDFS-14116-HDFS-12943.002.patch, 
> HDFS-14116-HDFS-12943.003.patch, HDFS-14116-HDFS-12943.004.patch
>
>
> Currently in {{ObserverReadProxyProvider}} constructor there is this line 
> {code}
> ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext);
> {code}
> This could potentially cause failure, because it is possible that factory can 
> not be casted here. Specifically,  
> {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the 
> constructor will be called, and there are two paths that could call into this:
> (1).{{NameNodeProxies.createProxy}}
> (2).{{NameNodeProxiesClient.createFailoverProxyProvider}}
> (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses 
> {{NameNodeHAProxyFactory}} which can not be casted to 
> {{ClientHAProxyFactory}}, this happens when, for example, running 
> NNThroughputBenmarck. To fix this we can at least:
> 1. introduce setAlignmentContext to HAProxyFactory which is the parent of 
> both  ClientHAProxyFactory and NameNodeHAProxyFactory OR
> 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having a 
> if check with reflection. 
> Depending on whether it make sense to have alignment context for the case (1) 
> calling code paths.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14116) ObserverReadProxyProvider should work with protocols other than ClientProtocol

2018-12-14 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14116:
--
   Resolution: Fixed
Fix Version/s: HDFS-12943
   Status: Resolved  (was: Patch Available)

> ObserverReadProxyProvider should work with protocols other than ClientProtocol
> --
>
> Key: HDFS-14116
> URL: https://issues.apache.org/jira/browse/HDFS-14116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Fix For: HDFS-12943
>
> Attachments: HDFS-14116-HDFS-12943.000.patch, 
> HDFS-14116-HDFS-12943.001.patch, HDFS-14116-HDFS-12943.002.patch, 
> HDFS-14116-HDFS-12943.003.patch, HDFS-14116-HDFS-12943.004.patch
>
>
> Currently in {{ObserverReadProxyProvider}} constructor there is this line 
> {code}
> ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext);
> {code}
> This could potentially cause failure, because it is possible that factory can 
> not be casted here. Specifically,  
> {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the 
> constructor will be called, and there are two paths that could call into this:
> (1).{{NameNodeProxies.createProxy}}
> (2).{{NameNodeProxiesClient.createFailoverProxyProvider}}
> (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses 
> {{NameNodeHAProxyFactory}} which can not be casted to 
> {{ClientHAProxyFactory}}, this happens when, for example, running 
> NNThroughputBenmarck. To fix this we can at least:
> 1. introduce setAlignmentContext to HAProxyFactory which is the parent of 
> both  ClientHAProxyFactory and NameNodeHAProxyFactory OR
> 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having a 
> if check with reflection. 
> Depending on whether it make sense to have alignment context for the case (1) 
> calling code paths.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14116) Fix a potential class cast error in ObserverReadProxyProvider

2018-11-29 Thread Chen Liang (JIRA)
Chen Liang created HDFS-14116:
-

 Summary: Fix a potential class cast error in 
ObserverReadProxyProvider
 Key: HDFS-14116
 URL: https://issues.apache.org/jira/browse/HDFS-14116
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Chen Liang


Currently in the {{ObserverReadProxyProvider}} constructor there is this line:
{code}
((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext);
{code}
This could potentially cause a failure, because the factory may not be castable
here. Specifically, {{NameNodeProxiesClient.createFailoverProxyProvider}} is where
the constructor gets called, and there are two paths that can reach it:
(1).{{NameNodeProxies.createProxy}}
(2).{{NameNodeProxiesClient.createFailoverProxyProvider}}

(2) works fine because it always uses {{ClientHAProxyFactory}}, but (1) uses
{{NameNodeHAProxyFactory}}, which cannot be cast to {{ClientHAProxyFactory}}; this
happens when, for example, running NNThroughputBenchmark. To fix this we can at
least:
1. introduce setAlignmentContext on HAProxyFactory, which is the parent of both
ClientHAProxyFactory and NameNodeHAProxyFactory, OR
2. only call setAlignmentContext when the factory is a ClientHAProxyFactory, by,
say, adding an if check (or using reflection),
depending on whether it makes sense to have an alignment context for the code paths
that go through (1).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13547) Add ingress port based sasl resolver

2018-11-29 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703810#comment-16703810
 ] 

Chen Liang commented on HDFS-13547:
---

Thanks for checking [~vinodkv]! Will commit to branch-3 and branch-3.1.1 soon.

> Add ingress port based sasl resolver
> 
>
> Key: HDFS-13547
> URL: https://issues.apache.org/jira/browse/HDFS-13547
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HDFS-13547.001.patch, HDFS-13547.002.patch, 
> HDFS-13547.003.patch, HDFS-13547.004.patch
>
>
> This Jira extends the SASL properties resolver interface to take an ingress 
> port parameter, and also adds an implementation based on this.
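As a rough illustration of the idea, here is a self-contained stand-in (not the
actual Hadoop resolver class or its API): SASL properties such as QOP are chosen
based on the port the client connected to, so one RPC port can require privacy while
another allows authentication only.
{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

public class IngressPortSaslSketch {
  private final Map<Integer, Map<String, String>> propsByPort = new HashMap<>();
  private final Map<String, String> defaultProps = new HashMap<>();

  IngressPortSaslSketch() {
    // Default: authentication only.
    defaultProps.put("javax.security.sasl.qop", "auth");
    // Example: require privacy (encryption) on one specific ingress port.
    Map<String, String> privacy = new HashMap<>();
    privacy.put("javax.security.sasl.qop", "auth-conf");
    propsByPort.put(8020, privacy);   // the port number is purely illustrative
  }

  // Resolve SASL properties based on which server port the client connected to.
  Map<String, String> getServerProperties(InetAddress client, int ingressPort) {
    return propsByPort.getOrDefault(ingressPort, defaultProps);
  }

  public static void main(String[] args) throws UnknownHostException {
    IngressPortSaslSketch resolver = new IngressPortSaslSketch();
    InetAddress client = InetAddress.getByName("127.0.0.1");
    System.out.println(resolver.getServerProperties(client, 8020));
    System.out.println(resolver.getServerProperties(client, 9000));
  }
}
{code}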



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14116) Fix a potential class cast error in ObserverReadProxyProvider

2018-11-29 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703877#comment-16703877
 ] 

Chen Liang commented on HDFS-14116:
---

Thanks [~csun]! Yeah, I can take another look later. Posting the error stack trace
from running NNThroughputBenchmark here for the record:
{code}
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hdfs.server.namenode.ha.NameNodeHAProxyFactory cannot be cast 
to org.apache.hadoop.hdfs.server.namenode.ha.ClientHAProxyFactory
at 
org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.(ObserverReadProxyProvider.java:118)
at 
org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProviderWithIPFailover.(ObserverReadProxyProviderWithIPFailover.java:99)
at 
org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProviderWithIPFailover.(ObserverReadProxyProviderWithIPFailover.java:86)
... 12 more
{code}

{code}
Exception in thread "main" java.io.IOException: Couldn't create proxy provider 
class 
org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProviderWithIPFailover
at 
org.apache.hadoop.hdfs.NameNodeProxiesClient.createFailoverProxyProvider(NameNodeProxiesClient.java:261)
at 
org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:115)
at 
org.apache.hadoop.hdfs.DFSTestUtil.getRefreshUserMappingsProtocolProxy(DFSTestUtil.java:2022)
at 
org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.run(NNThroughputBenchmark.java:1524)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at 
org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.runBenchmark(NNThroughputBenchmark.java:1432)
at 
org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.main(NNThroughputBenchmark.java:1552)
{code}

> Fix a potential class cast error in ObserverReadProxyProvider
> -
>
> Key: HDFS-14116
> URL: https://issues.apache.org/jira/browse/HDFS-14116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Chen Liang
>Priority: Major
>
> Currently in {{ObserverReadProxyProvider}} constructor there is this line 
> {code}
> ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext);
> {code}
> This could potentially cause failure, because it is possible that factory can 
> not be casted here. Specifically,  
> {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the 
> constructor will be called, and there are two paths that could call into this:
> (1).{{NameNodeProxies.createProxy}}
> (2).{{NameNodeProxiesClient.createFailoverProxyProvider}}
> (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses 
> {{NameNodeHAProxyFactory}} which can not be casted to 
> {{ClientHAProxyFactory}}, this happens when, for example, running 
> NNThroughputBenmarck. To fix this we can at least:
> 1. introduce setAlignmentContext to HAProxyFactory which is the parent of 
> both  ClientHAProxyFactory and NameNodeHAProxyFactory OR
> 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having a 
> if check with reflection. 
> Depending on whether it make sense to have alignment context for the case (1) 
> calling code paths.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-13547) Add ingress port based sasl resolver

2018-11-29 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang resolved HDFS-13547.
---
   Resolution: Fixed
Fix Version/s: 3.1.1

> Add ingress port based sasl resolver
> 
>
> Key: HDFS-13547
> URL: https://issues.apache.org/jira/browse/HDFS-13547
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: HDFS-13547.001.patch, HDFS-13547.002.patch, 
> HDFS-13547.003.patch, HDFS-13547.004.patch
>
>
> This Jira extends the SASL properties resolver interface to take an ingress 
> port parameter, and also adds an implementation based on this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13547) Add ingress port based sasl resolver

2018-11-29 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703845#comment-16703845
 ] 

Chen Liang commented on HDFS-13547:
---

Committed v004 patch to branch-3 and branch-3.1.1.

> Add ingress port based sasl resolver
> 
>
> Key: HDFS-13547
> URL: https://issues.apache.org/jira/browse/HDFS-13547
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HDFS-13547.001.patch, HDFS-13547.002.patch, 
> HDFS-13547.003.patch, HDFS-13547.004.patch
>
>
> This Jira extends the SASL properties resolver interface to take an ingress 
> port parameter, and also adds an implementation based on this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14120) ORFPP should also clone DT for the virtual IP

2018-11-30 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14120:
--
Description: Currently with HDFS-14017, ORFPP behaves the similar way on 
handling delegation as ConfiguredFailoverProxyProvider. Specifically, given the 
delegation token associated with name service ID, it clones the DTs for all the 
corresponding physical addresses. But ORFPPwIP requires more work than CFPP in 
the sense that it also leverages VIP address for failover, meaning in addition 
to cloning DT for physical addresses, ORFPPwIP also needs to clone DT for the 
VIP address, which is missed from HDFS-14017. This is specific to ORFPPwIP, 
should not affect ORFPP.  (was: Currently with HDFS-14017, ORFPP behaves the 
similar way on handling delegation as ConfiguredFailoverProxyProvider. 
Specifically, given the delegation token associated with name service ID, it 
clones the DTs for all the corresponding physical addresses. But ORFPP requires 
more work than CFPP in the sense that it also leverages VIP address for 
failover, meaning in addition to cloning DT for physical addresses, ORFPP also 
needs to clone DT for the VIP address, which is missed from HDFS-14017.)

> ORFPP should also clone DT for the virtual IP
> -
>
> Key: HDFS-14120
> URL: https://issues.apache.org/jira/browse/HDFS-14120
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-12943
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
>
> Currently with HDFS-14017, ORFPP behaves the similar way on handling 
> delegation as ConfiguredFailoverProxyProvider. Specifically, given the 
> delegation token associated with name service ID, it clones the DTs for all 
> the corresponding physical addresses. But ORFPPwIP requires more work than 
> CFPP in the sense that it also leverages VIP address for failover, meaning in 
> addition to cloning DT for physical addresses, ORFPPwIP also needs to clone 
> DT for the VIP address, which is missed from HDFS-14017. This is specific to 
> ORFPPwIP, should not affect ORFPP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-14120) ORFPP should also clone DT for the virtual IP

2018-11-30 Thread Chen Liang (JIRA)
Chen Liang created HDFS-14120:
-

 Summary: ORFPP should also clone DT for the virtual IP
 Key: HDFS-14120
 URL: https://issues.apache.org/jira/browse/HDFS-14120
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-12943
Reporter: Chen Liang
Assignee: Chen Liang


Currently with HDFS-14017, ORFPP handles delegation tokens in a similar way to
ConfiguredFailoverProxyProvider. Specifically, given the delegation token associated
with the name service ID, it clones the DT for all the corresponding physical
addresses. But ORFPP requires more work than CFPP in the sense that it also
leverages the VIP address for failover, meaning that, in addition to cloning the DT
for the physical addresses, ORFPP also needs to clone the DT for the VIP address,
which was missed in HDFS-14017.
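To illustrate the intent, a toy sketch with stand-in types and example hostnames
(this is not the Hadoop Token/Credentials API):
{code:java}
import java.util.HashMap;
import java.util.Map;

public class DelegationTokenCloningSketch {
  // Stand-in for a delegation token keyed by a "host:port" service name.
  static class Token {
    final byte[] identifier;
    final String service;
    Token(byte[] identifier, String service) {
      this.identifier = identifier;
      this.service = service;
    }
    Token cloneFor(String newService) { return new Token(identifier, newService); }
  }

  public static void main(String[] args) {
    Map<String, Token> credentials = new HashMap<>();
    Token original = new Token(new byte[]{1, 2, 3}, "vip.example.com:8020");
    // Clone the DT for every address the client may dial: both physical NN
    // addresses and the virtual IP itself (the part missed in HDFS-14017).
    String[] targets = {
        "nn1.example.com:8020", "nn2.example.com:8020", "vip.example.com:8020"};
    for (String addr : targets) {
      credentials.put(addr, original.cloneFor(addr));
    }
    System.out.println("token registered for: " + credentials.keySet());
  }
}
{code}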



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14126) DataNode DirectoryScanner holding global lock for too long

2018-12-04 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16709085#comment-16709085
 ] 

Chen Liang commented on HDFS-14126:
---

Thanks for reporting this, [~jojochuang]. I have not seen this issue myself. I 
spot-checked several DNs in our 3.1 cluster, with block counts ranging from 
330K to 1024K, and did not see this warning.

> DataNode DirectoryScanner holding global lock for too long
> --
>
> Key: HDFS-14126
> URL: https://issues.apache.org/jira/browse/HDFS-14126
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Priority: Major
>
> I've got a Hadoop 3 based cluster set up, and this DN has just 434 thousand 
> blocks.
> And yet, DirectoryScanner holds the fsdataset lock for 2.7 seconds:
> {quote}
> 2018-12-03 21:33:09,130 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-4588049-10.17.XXX-XX-281857726 Total blocks: 434401, missing metadata 
> files:0, missing block files:0, missing blocks in memory:0, mismatched blocks:0
> 2018-12-03 21:33:09,131 WARN 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Lock 
> held time above threshold: lock identifier: 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
> lockHeldTimeMs=2710 ms. Suppressed 0 
> lock warnings. The stack trace is: 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
> org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
> org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:473)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:373)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:318)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
> {quote}
> Log messages like this repeats every several hours (6, to be exact). I am not 
> sure if this is a performance regression, or just the fact that the lock 
> information is printed in Hadoop 3. [~vagarychen] or [~templedf] do you know?
> There's no log in DN to indicate any sort of JVM GC going on. Plus, the DN's 
> heap size is set to several GB.
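
For context, the warning quoted above comes from an instrumented lock that measures how long the dataset lock is held. A rough, simplified illustration of that pattern follows; the class, threshold, and logging below are assumptions for illustration, not Hadoop's actual InstrumentedLock code:

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Simplified sketch of the instrumented-lock pattern behind the warning above. */
public class InstrumentedLockSketch {
  private static final long WARN_THRESHOLD_MS = 300;   // assumed threshold
  private final ReentrantLock lock = new ReentrantLock();

  void runUnderLock(Runnable work) {
    lock.lock();
    long acquiredAt = System.nanoTime();
    try {
      work.run();                       // e.g. the directory scan holding the lock
    } finally {
      long heldMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - acquiredAt);
      lock.unlock();
      if (heldMs > WARN_THRESHOLD_MS) {
        System.err.println("Lock held time above threshold: " + heldMs + " ms");
      }
    }
  }
}
{code}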



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14116) Fix a potential class cast error in ObserverReadProxyProvider

2018-12-05 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710791#comment-16710791
 ] 

Chen Liang commented on HDFS-14116:
---

Thanks for the finding [~shv], but this error seems different: it is triggered 
on a different code path and involves a different class cast. As a workaround 
to get fsck working, we can override the config to use the configured proxy 
provider on the fsck command line. For example, if we have fs.defaultFS=ns1, 
we can call fsck as
{code}
hdfs fsck \
  -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider \
  hdfs://ns1/
{code}
We still need to fix this properly though.
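For the proper fix, a minimal sketch of option 2 from the issue description below (an instanceof guard before the cast) could look like the following; this is only an illustration of the idea, not the committed change:
{code:java}
// Illustrative only: set the alignment context only when the factory actually
// supports it, so non-client factories (e.g. NameNodeHAProxyFactory reached via
// NameNodeProxies.createProxy, as in NNThroughputBenchmark) are left untouched.
if (factory instanceof ClientHAProxyFactory) {
  ((ClientHAProxyFactory<T>) factory).setAlignmentContext(alignmentContext);
}
{code}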

> Fix a potential class cast error in ObserverReadProxyProvider
> -
>
> Key: HDFS-14116
> URL: https://issues.apache.org/jira/browse/HDFS-14116
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
>
> Currently in {{ObserverReadProxyProvider}} constructor there is this line 
> {code}
> ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext);
> {code}
> This could potentially cause failure, because it is possible that factory can 
> not be casted here. Specifically,  
> {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the 
> constructor will be called, and there are two paths that could call into this:
> (1).{{NameNodeProxies.createProxy}}
> (2).{{NameNodeProxiesClient.createFailoverProxyProvider}}
> (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses 
> {{NameNodeHAProxyFactory}} which can not be casted to 
> {{ClientHAProxyFactory}}, this happens when, for example, running 
> NNThroughputBenmarck. To fix this we can at least:
> 1. introduce setAlignmentContext to HAProxyFactory which is the parent of 
> both  ClientHAProxyFactory and NameNodeHAProxyFactory OR
> 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having a 
> if check with reflection. 
> Depending on whether it make sense to have alignment context for the case (1) 
> calling code paths.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14120) ORFPP should also clone DT for the virtual IP

2018-12-03 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14120:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> ORFPP should also clone DT for the virtual IP
> -
>
> Key: HDFS-14120
> URL: https://issues.apache.org/jira/browse/HDFS-14120
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-12943
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14120-HDFS-12943.001.patch
>
>
> Currently with HDFS-14017, ORFPP behaves the similar way on handling 
> delegation as ConfiguredFailoverProxyProvider. Specifically, given the 
> delegation token associated with name service ID, it clones the DTs for all 
> the corresponding physical addresses. But ORFPPwIP requires more work than 
> CFPP in the sense that it also leverages VIP address for failover, meaning in 
> addition to cloning DT for physical addresses, ORFPPwIP also needs to clone 
> DT for the VIP address, which is missed from HDFS-14017. This is specific to 
> ORFPPwIP, should not affect ORFPP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14120) ORFPP should also clone DT for the virtual IP

2018-12-03 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708038#comment-16708038
 ] 

Chen Liang commented on HDFS-14120:
---

Thanks [~shv] for the review! I've committed this to the feature branch.

> ORFPP should also clone DT for the virtual IP
> -
>
> Key: HDFS-14120
> URL: https://issues.apache.org/jira/browse/HDFS-14120
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-12943
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14120-HDFS-12943.001.patch
>
>
> Currently with HDFS-14017, ORFPP behaves the similar way on handling 
> delegation as ConfiguredFailoverProxyProvider. Specifically, given the 
> delegation token associated with name service ID, it clones the DTs for all 
> the corresponding physical addresses. But ORFPPwIP requires more work than 
> CFPP in the sense that it also leverages VIP address for failover, meaning in 
> addition to cloning DT for physical addresses, ORFPPwIP also needs to clone 
> DT for the VIP address, which is missed from HDFS-14017. This is specific to 
> ORFPPwIP, should not affect ORFPP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14120) ORFPP should also clone DT for the virtual IP

2018-12-03 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14120:
--
Status: Patch Available  (was: Open)

> ORFPP should also clone DT for the virtual IP
> -
>
> Key: HDFS-14120
> URL: https://issues.apache.org/jira/browse/HDFS-14120
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-12943
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14120-HDFS-12943.001.patch
>
>
> Currently with HDFS-14017, ORFPP behaves the similar way on handling 
> delegation as ConfiguredFailoverProxyProvider. Specifically, given the 
> delegation token associated with name service ID, it clones the DTs for all 
> the corresponding physical addresses. But ORFPPwIP requires more work than 
> CFPP in the sense that it also leverages VIP address for failover, meaning in 
> addition to cloning DT for physical addresses, ORFPPwIP also needs to clone 
> DT for the VIP address, which is missed from HDFS-14017. This is specific to 
> ORFPPwIP, should not affect ORFPP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14120) ORFPP should also clone DT for the virtual IP

2018-12-03 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14120:
--
Attachment: HDFS-14120-HDFS-12943.001.patch

> ORFPP should also clone DT for the virtual IP
> -
>
> Key: HDFS-14120
> URL: https://issues.apache.org/jira/browse/HDFS-14120
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-12943
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14120-HDFS-12943.001.patch
>
>
> Currently with HDFS-14017, ORFPP behaves the similar way on handling 
> delegation as ConfiguredFailoverProxyProvider. Specifically, given the 
> delegation token associated with name service ID, it clones the DTs for all 
> the corresponding physical addresses. But ORFPPwIP requires more work than 
> CFPP in the sense that it also leverages VIP address for failover, meaning in 
> addition to cloning DT for physical addresses, ORFPPwIP also needs to clone 
> DT for the VIP address, which is missed from HDFS-14017. This is specific to 
> ORFPPwIP, should not affect ORFPP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13547) Add ingress port based sasl resolver

2018-12-03 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707743#comment-16707743
 ] 

Chen Liang commented on HDFS-13547:
---

I was not aware of this convention about fix-version for released versions; 
thanks [~vinodkv] for taking care of it!

> Add ingress port based sasl resolver
> 
>
> Key: HDFS-13547
> URL: https://issues.apache.org/jira/browse/HDFS-13547
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: security
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HDFS-13547.001.patch, HDFS-13547.002.patch, 
> HDFS-13547.003.patch, HDFS-13547.004.patch
>
>
> This Jira extends the SASL properties resolver interface to take an ingress 
> port parameter, and also adds an implementation based on this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14058) Test reads from standby on a secure cluster with IP failover

2018-12-04 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16709359#comment-16709359
 ] 

Chen Liang commented on HDFS-14058:
---

Another test done:
6. Performed a manual failover while a TeraSort job was running. Specifically, I 
had two NNs, ha1 and ha2, one as ANN and one as ONN. After starting a TeraSort 
job with 1800+ mappers and 500 reducers, I transitioned the ONN to SNN, then 
called {{hdfs haadmin -failover --forcefence ha1 ha2}} and reconfigured the VIP 
to point to the other name node. Although some task attempts failed due to 
timeouts, the job finished successfully.

> Test reads from standby on a secure cluster with IP failover
> 
>
> Key: HDFS-14058
> URL: https://issues.apache.org/jira/browse/HDFS-14058
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Konstantin Shvachko
>Assignee: Chen Liang
>Priority: Major
>
> Run standard HDFS tests to verify reading from ObserverNode on a secure HA 
> cluster with {{IPFailoverProxyProvider}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14058) Test reads from standby on a secure cluster with IP failover

2018-12-04 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14058:
--
Attachment: dfsio_crs.with-crs.txt
dfsio_crs.no-crs.txt

> Test reads from standby on a secure cluster with IP failover
> 
>
> Key: HDFS-14058
> URL: https://issues.apache.org/jira/browse/HDFS-14058
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Konstantin Shvachko
>Assignee: Chen Liang
>Priority: Major
> Attachments: dfsio_crs.no-crs.txt, dfsio_crs.with-crs.txt
>
>
> Run standard HDFS tests to verify reading from ObserverNode on a secure HA 
> cluster with {{IPFailoverProxyProvider}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14058) Test reads from standby on a secure cluster with IP failover

2018-12-04 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16709444#comment-16709444
 ] 

Chen Liang commented on HDFS-14058:
---

Posting some numbers from the DFSIO test. I ran the test both with observer 
read enabled and with it disabled, three runs each. [^dfsio_crs.no-crs.txt] 
shows the results from the three runs with observer read disabled, while 
[^dfsio_crs.with-crs.txt] shows the results with observer read enabled. The 
numbers are very close. Again, this is because the cluster is fairly idle, so 
the ANN alone can handle the read load, and in this case we don't gain much 
performance from adding observer read. The point of the test is to prove 
correctness and show there is no performance degradation.

> Test reads from standby on a secure cluster with IP failover
> 
>
> Key: HDFS-14058
> URL: https://issues.apache.org/jira/browse/HDFS-14058
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Konstantin Shvachko
>Assignee: Chen Liang
>Priority: Major
> Attachments: dfsio_crs.no-crs.txt, dfsio_crs.with-crs.txt
>
>
> Run standard HDFS tests to verify reading from ObserverNode on a secure HA 
> cluster with {{IPFailoverProxyProvider}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13767) Add msync server implementation.

2018-12-04 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-13767:
--
Fix Version/s: HDFS-12943

> Add msync server implementation.
> 
>
> Key: HDFS-13767
> URL: https://issues.apache.org/jira/browse/HDFS-13767
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Fix For: HDFS-12943
>
> Attachments: HDFS-13767-HDFS-12943.001.patch, 
> HDFS-13767-HDFS-12943.002.patch, HDFS-13767-HDFS-12943.003.patch, 
> HDFS-13767-HDFS-12943.004.patch, HDFS-13767.WIP.001.patch, 
> HDFS-13767.WIP.002.patch, HDFS-13767.WIP.003.patch, HDFS-13767.WIP.004.patch
>
>
> This is a followup on HDFS-13688, where msync API is introduced to 
> {{ClientProtocol}} but the server side implementation is missing. This is 
> Jira is to implement the server side logic.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14058) Test reads from standby on a secure cluster with IP failover

2018-12-04 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689993#comment-16689993
 ] 

Chen Liang edited comment on HDFS-14058 at 12/5/18 12:02 AM:
-

The tests were done on a setup with 100+ datanodes, 1 Active NameNode, and 1 
Observer NameNode, with no other standby nodes. The cluster has a light HDFS 
workload, has YARN deployed, and has security (Kerberos) enabled. The purpose 
here was not to evaluate performance gain, but mainly to prove functionality 
and correctness.

In all the tests below, it is *verified from both name nodes' audit logs* that 
the reads actually went to the Observer node and the writes went to the Active, 
and it is *verified from job/client logs* that when the client could not talk 
to the Observer (e.g. for write requests, or when the Observer node is actually 
in Standby rather than Observer state), it fell back to talking to the Active.

The specific tests done include:
1. basic hdfs IO
- From hdfs command:
-- create/delete directory
-- basic file put/get/delete
- From a simple Java program. I wrote some code which creates a DFSClient 
instance and performs some basic operations against it (a minimal sketch of 
such a program is shown after this list):
-- create/delete directory
-- get/renew delegation token

One observation on this is that, from the command line, depending on the 
relative order of the ANN and ONN in the config, a failover may happen on every 
single call, with an exception printed. This is because every command line 
invocation creates a new DFSClient instance, which may start by sending a write 
to the Observer, causing a failover. For a reused DFSClient (e.g. a Java 
program that creates and reuses the same DFSClient), this issue does not occur.

2. simple MR job: a simple wordcount job from the mapreduce-examples jar, on a 
very small input.

3. SliveTest: ran Slive from the hadoop-mapreduce-client-jobclient jar with 
default parameters. I ran Slive 3 times each with Observer enabled and 
disabled, and saw a similar number of ops/sec.

4. DFSIO: ran the DFSIO read test several times from the 
hadoop-mapreduce-client-jobclient jar; the tests were done with 100 files, 
100 MB each.

5. TeraGen/Sort/Validate: ran TeraGen/Sort/Validate several times from the 
hadoop-mapreduce-examples jar with 1TB of data. TeraSort used 1800+ mappers and 
500 reducers. All three jobs finished successfully.
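
As referenced in item 1 above, here is a minimal sketch of the kind of test program used; the paths and the token renewer name are made up for illustration, and the real test code may differ:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.token.Token;

/** Minimal sketch: exercise reads/writes and delegation token calls through
 *  one long-lived client, so ORFPP routing can be observed in the audit logs. */
public class ObserverReadSmokeTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();        // picks up *-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);            // reuses one DFSClient underneath

    Path dir = new Path("/tmp/observer-read-smoke"); // test path is an assumption
    fs.mkdirs(dir);                                  // write -> should go to the Active
    fs.exists(dir);                                  // read  -> should go to the Observer
    fs.delete(dir, true);                            // write -> Active again

    Token<?> token = fs.getDelegationToken("hdfs");  // renewer name is an assumption
    if (token != null) {
      token.renew(conf);                             // renew against the Active NN
    }
    fs.close();
  }
}
{code}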


was (Author: vagarychen):
The tests I've run include the following. Please note that the following tests 
were done without several recent changes such as HDFS-14035 and HDFS-14017, but 
with some hacky code change and workaround. Although the required changes have 
been formalized to recent Jiras, the following tests haven't all been re-run 
along with those change. Post here for record.

The tests were done with the setup of 100+ datanodes, 1 Active NameNode and 1 
Observer NameNode. No other standby nodes. The cluster has light HDFS workload, 
has YARN deployed, and has security (Kerberos) enabled. The purpose here was 
not evaluate performance gain, but only to prove the functionality. In all the 
tests below, it is verified from Observer node audit log that the reads 
actually went to Observer node.

1. basic hdfs IO
- From hdfs command:
-- create/delete directory
-- basic file put/get/delete
- From a simple Java program. I wrote some code which creates a DFSClient 
instance and perform some basic operations against it:
-- create/delete directory
-- get/renew delegation token

One observation on this is that, from command line, depending on the relative 
order of ANN and ONN in config, the failover may happen every single time, with 
an exception printed. I believe this is because from command, every single 
command line call will create a new DFSClient instance. Which may start with 
calling Observer for write, causing failover. But for reused DFSClient (e.g. 
from a Java program where it create and reuse same DFSClient), there is no this 
issue.

2. simple MR job: a simple wordcount job from mapreduce-examples jar, on a very 
small input.

3. SliveTest: ran Slive from hadoop-mapreduce-client-jobclient jar, without 
parameters (so it uses default). I ran Slive 3 times for both with Observer 
enabled and disabled. I saw roughly the same ops/sec.

4.DFSIO: ran DFSIO read test several times from 
hadoop-mapreduce-client-jobclient jar, but only with very small input size. (10 
files with 1KB each). 

5. TeraGen/Sort/Validate: ran TeraGen/Sort/Validate from 
hadoop-mapreduce-examples jar with 1TB of data. TeraSort used 1800+ mappers and 
500 reducers. All three jobs finished successfully.

> Test reads from standby on a secure cluster with IP failover
> 
>
> Key: HDFS-14058
> URL: https://issues.apache.org/jira/browse/HDFS-14058
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Konstantin Shvachko
>  

[jira] [Commented] (HDFS-14017) ObserverReadProxyProviderWithIPFailover should work with HA configuration

2018-11-16 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690185#comment-16690185
 ] 

Chen Liang commented on HDFS-14017:
---

I've committed v014 patch to feature branch, thanks for all the reviews and 
discussions!

> ObserverReadProxyProviderWithIPFailover should work with HA configuration
> -
>
> Key: HDFS-14017
> URL: https://issues.apache.org/jira/browse/HDFS-14017
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Fix For: HDFS-12943
>
> Attachments: HDFS-14017-HDFS-12943.001.patch, 
> HDFS-14017-HDFS-12943.002.patch, HDFS-14017-HDFS-12943.003.patch, 
> HDFS-14017-HDFS-12943.004.patch, HDFS-14017-HDFS-12943.005.patch, 
> HDFS-14017-HDFS-12943.006.patch, HDFS-14017-HDFS-12943.008.patch, 
> HDFS-14017-HDFS-12943.009.patch, HDFS-14017-HDFS-12943.010.patch, 
> HDFS-14017-HDFS-12943.011.patch, HDFS-14017-HDFS-12943.012.patch, 
> HDFS-14017-HDFS-12943.013.patch, HDFS-14017-HDFS-12943.014.patch
>
>
> Currently {{ObserverReadProxyProviderWithIPFailover}} extends 
> {{ObserverReadProxyProvider}}, and the only difference is changing the proxy 
> factory to use {{IPFailoverProxyProvider}}. However this is not enough 
> because when calling constructor of {{ObserverReadProxyProvider}} in 
> super(...), the follow line:
> {code:java}
> nameNodeProxies = getProxyAddresses(uri,
> HdfsClientConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY);
> {code}
> will try to resolve the all configured NN addresses to do configured 
> failover. But in the case of IPFailover, this does not really apply.
>  
> A second issue closely related is about delegation token. For example, in 
> current IPFailover setup, say we have a virtual host nn.xyz.com, which points 
> to either of two physical nodes nn1.xyz.com or nn2.xyz.com. In current HDFS, 
> there is always only one DT being exchanged, which has hostname nn.xyz.com. 
> Server only issues this DT, and client only knows the host nn.xyz.com, so all 
> is good. But in Observer read, even with IPFailover, the client will no 
> longer contacting nn.xyz.com, but will actively reaching to nn1.xyz.com and 
> nn2.xyz.com. During this process, current code will look for DT associated 
> with hostname nn1.xyz.com or nn2.xyz.com, which is different from the DT 
> given by NN. causing Token authentication to fail. This happens in 
> {{AbstractDelegationTokenSelector#selectToken}}. New IPFailover proxy 
> provider will need to resolve this as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14017) ObserverReadProxyProviderWithIPFailover should work with HA configuration

2018-11-16 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14017:
--
   Resolution: Fixed
Fix Version/s: HDFS-12943
   Status: Resolved  (was: Patch Available)

> ObserverReadProxyProviderWithIPFailover should work with HA configuration
> -
>
> Key: HDFS-14017
> URL: https://issues.apache.org/jira/browse/HDFS-14017
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Fix For: HDFS-12943
>
> Attachments: HDFS-14017-HDFS-12943.001.patch, 
> HDFS-14017-HDFS-12943.002.patch, HDFS-14017-HDFS-12943.003.patch, 
> HDFS-14017-HDFS-12943.004.patch, HDFS-14017-HDFS-12943.005.patch, 
> HDFS-14017-HDFS-12943.006.patch, HDFS-14017-HDFS-12943.008.patch, 
> HDFS-14017-HDFS-12943.009.patch, HDFS-14017-HDFS-12943.010.patch, 
> HDFS-14017-HDFS-12943.011.patch, HDFS-14017-HDFS-12943.012.patch, 
> HDFS-14017-HDFS-12943.013.patch, HDFS-14017-HDFS-12943.014.patch
>
>
> Currently {{ObserverReadProxyProviderWithIPFailover}} extends 
> {{ObserverReadProxyProvider}}, and the only difference is changing the proxy 
> factory to use {{IPFailoverProxyProvider}}. However this is not enough 
> because when calling constructor of {{ObserverReadProxyProvider}} in 
> super(...), the follow line:
> {code:java}
> nameNodeProxies = getProxyAddresses(uri,
> HdfsClientConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY);
> {code}
> will try to resolve the all configured NN addresses to do configured 
> failover. But in the case of IPFailover, this does not really apply.
>  
> A second issue closely related is about delegation token. For example, in 
> current IPFailover setup, say we have a virtual host nn.xyz.com, which points 
> to either of two physical nodes nn1.xyz.com or nn2.xyz.com. In current HDFS, 
> there is always only one DT being exchanged, which has hostname nn.xyz.com. 
> Server only issues this DT, and client only knows the host nn.xyz.com, so all 
> is good. But in Observer read, even with IPFailover, the client will no 
> longer contacting nn.xyz.com, but will actively reaching to nn1.xyz.com and 
> nn2.xyz.com. During this process, current code will look for DT associated 
> with hostname nn1.xyz.com or nn2.xyz.com, which is different from the DT 
> given by NN. causing Token authentication to fail. This happens in 
> {{AbstractDelegationTokenSelector#selectToken}}. New IPFailover proxy 
> provider will need to resolve this as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14017) ObserverReadProxyProviderWithIPFailover should work with HA configuration

2018-11-16 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690170#comment-16690170
 ] 

Chen Liang commented on HDFS-14017:
---

Thanks for looking into this [~xkrogen]! I agree that there is no way the patch 
could be breaking things, since, as you mentioned, it changes a class that is 
not currently used anywhere. I checked with Konstantin as well, and he is okay 
with committing it. I will commit soon.

> ObserverReadProxyProviderWithIPFailover should work with HA configuration
> -
>
> Key: HDFS-14017
> URL: https://issues.apache.org/jira/browse/HDFS-14017
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14017-HDFS-12943.001.patch, 
> HDFS-14017-HDFS-12943.002.patch, HDFS-14017-HDFS-12943.003.patch, 
> HDFS-14017-HDFS-12943.004.patch, HDFS-14017-HDFS-12943.005.patch, 
> HDFS-14017-HDFS-12943.006.patch, HDFS-14017-HDFS-12943.008.patch, 
> HDFS-14017-HDFS-12943.009.patch, HDFS-14017-HDFS-12943.010.patch, 
> HDFS-14017-HDFS-12943.011.patch, HDFS-14017-HDFS-12943.012.patch, 
> HDFS-14017-HDFS-12943.013.patch, HDFS-14017-HDFS-12943.014.patch
>
>
> Currently {{ObserverReadProxyProviderWithIPFailover}} extends 
> {{ObserverReadProxyProvider}}, and the only difference is changing the proxy 
> factory to use {{IPFailoverProxyProvider}}. However this is not enough 
> because when calling constructor of {{ObserverReadProxyProvider}} in 
> super(...), the follow line:
> {code:java}
> nameNodeProxies = getProxyAddresses(uri,
> HdfsClientConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY);
> {code}
> will try to resolve the all configured NN addresses to do configured 
> failover. But in the case of IPFailover, this does not really apply.
>  
> A second issue closely related is about delegation token. For example, in 
> current IPFailover setup, say we have a virtual host nn.xyz.com, which points 
> to either of two physical nodes nn1.xyz.com or nn2.xyz.com. In current HDFS, 
> there is always only one DT being exchanged, which has hostname nn.xyz.com. 
> Server only issues this DT, and client only knows the host nn.xyz.com, so all 
> is good. But in Observer read, even with IPFailover, the client will no 
> longer contacting nn.xyz.com, but will actively reaching to nn1.xyz.com and 
> nn2.xyz.com. During this process, current code will look for DT associated 
> with hostname nn1.xyz.com or nn2.xyz.com, which is different from the DT 
> given by NN. causing Token authentication to fail. This happens in 
> {{AbstractDelegationTokenSelector#selectToken}}. New IPFailover proxy 
> provider will need to resolve this as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13566) Add configurable additional RPC listener to NameNode

2018-09-12 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-13566:
--
Attachment: HDFS-13566.005.patch

> Add configurable additional RPC listener to NameNode
> 
>
> Key: HDFS-13566
> URL: https://issues.apache.org/jira/browse/HDFS-13566
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13566.001.patch, HDFS-13566.002.patch, 
> HDFS-13566.003.patch, HDFS-13566.004.patch, HDFS-13566.005.patch
>
>
> This Jira aims to add the capability to NameNode to run additional 
> listener(s). Such that NameNode can be accessed from multiple ports. 
> Fundamentally, this Jira tries to extend ipc.Server to allow configured with 
> more listeners, binding to different ports, but sharing the same call queue 
> and the handlers. Useful when different clients are only allowed to access 
> certain different ports. Combined with HDFS-13547, this also allows different 
> ports to have different SASL security levels. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13898) Throw retriable exception for getBlockLocations when ObserverNameNode is in safemode

2018-09-14 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615149#comment-16615149
 ] 

Chen Liang commented on HDFS-13898:
---

Thanks [~csun] for reporting and fixing this! My main question was also that, 
if we are mocking BlockManager, why do we still need to change the DN number. 
It seems the {{Requested replication factor of 0 is less than the required 
minimum of 1}} error is reported from {{BlockManager.verifyReplication}}, so I 
wonder whether it is possible to mock the method {{verifyReplication}}, making 
it a no-op so it won't perform this check. E.g. I added the following lines 
before {{createNewFile}}, and the test seems to pass without changing the 
number of DNs. Not sure if this is a better way to go though, just a thought.
{code:java}
// Spy on the BlockManager and make verifyReplication a no-op, so the
// min-replication check is skipped without changing the number of DNs.
BlockManager bmSpy1 = NameNodeAdapter.spyOnBlockManager(namenodes[0]);
doNothing().when(bmSpy1).verifyReplication(anyString(), anyShort(), 
anyString());{code}
Also, there is an inconsistency in the style of the mocking arguments: some are 
{{Mock.any()}}, while others are {{anyBoolean()}} (without the Mock. prefix).

> Throw retriable exception for getBlockLocations when ObserverNameNode is in 
> safemode
> 
>
> Key: HDFS-13898
> URL: https://issues.apache.org/jira/browse/HDFS-13898
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13898-HDFS-12943.000.patch
>
>
> When ObserverNameNode is in safe mode, {{getBlockLocations}} may throw safe 
> mode exception if the given file doesn't have any block yet. 
> {code}
> try {
>   checkOperation(OperationCategory.READ);
>   res = FSDirStatAndListingOp.getBlockLocations(
>   dir, pc, srcArg, offset, length, true);
>   if (isInSafeMode()) {
> for (LocatedBlock b : res.blocks.getLocatedBlocks()) {
>   // if safemode & no block locations yet then throw safemodeException
>   if ((b.getLocations() == null) || (b.getLocations().length == 0)) {
> SafeModeException se = newSafemodeException(
> "Zero blocklocations for " + srcArg);
> if (haEnabled && haContext != null &&
> haContext.getState().getServiceState() == 
> HAServiceState.ACTIVE) {
>   throw new RetriableException(se);
> } else {
>   throw se;
> }
>   }
> }
>   }
> {code}
> It only throws {{RetriableException}} for active NN so requests on observer 
> may just fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13924) Handle BlockMissingException when reading from observer

2018-09-17 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618247#comment-16618247
 ] 

Chen Liang commented on HDFS-13924:
---

Good finding, thanks for reporting [~csun]! I'm wondering whether a full stack 
trace is still available? I just want to get a better idea of why the retry 
logic did not help in this case.

> Handle BlockMissingException when reading from observer
> ---
>
> Key: HDFS-13924
> URL: https://issues.apache.org/jira/browse/HDFS-13924
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chao Sun
>Priority: Major
>
> Internally we found that reading from ObserverNode may result to 
> {{BlockMissingException}}. This may happen when the observer sees a smaller 
> number of DNs than active (maybe due to communication issue with those DNs), 
> or (we guess) late block reports from some DNs to the observer. This error 
> happens in 
> [DFSInputStream#chooseDataNode|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L846],
>  when no valid DN can be found for the {{LocatedBlock}} got from the NN side.
> One potential solution (although a little hacky) is to ask the 
> {{DFSInputStream}} to retry active when this happens. The retry logic already 
> present in the code - we just have to dynamically set a flag to ask the 
> {{ObserverReadProxyProvider}} try active in this case.
> cc [~shv], [~xkrogen], [~vagarychen], [~zero45] for discussion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13924) Handle BlockMissingException when reading from observer

2018-09-17 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618281#comment-16618281
 ] 

Chen Liang commented on HDFS-13924:
---

Thanks [~csun]. I see, so I imagine the error did not happen on the server 
side, because the server does not treat this as an error: it still returns a 
LocatedBlock, just with an empty block location list, and this only becomes an 
exception later when the client actually tries to read the block? If that is 
what was happening, maybe another fix would be on the server side: if the 
server finds itself in observer state and getBlockLocations is called with no 
known block locations, it could throw an exception instead of returning an 
empty list, so that the client retries against a different node.

Letting DFSInputStream switch to the active also makes sense to me though.
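
A rough sketch of that server-side idea, mirroring the safemode check quoted in HDFS-13898; this is illustration only, not committed code, and the exact observer-state check is an assumption:

{code:java}
// Illustration only: in getBlockLocations, if this NameNode is serving as an
// Observer and a block has no known locations yet, fail with a retriable
// exception so ObserverReadProxyProvider retries another NameNode.
if (haContext.getState().getServiceState() == HAServiceState.OBSERVER) {
  for (LocatedBlock b : res.blocks.getLocatedBlocks()) {
    if (b.getLocations() == null || b.getLocations().length == 0) {
      throw new RetriableException(
          "Observer has no locations yet for " + srcArg);
    }
  }
}
{code}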

> Handle BlockMissingException when reading from observer
> ---
>
> Key: HDFS-13924
> URL: https://issues.apache.org/jira/browse/HDFS-13924
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chao Sun
>Priority: Major
>
> Internally we found that reading from ObserverNode may result to 
> {{BlockMissingException}}. This may happen when the observer sees a smaller 
> number of DNs than active (maybe due to communication issue with those DNs), 
> or (we guess) late block reports from some DNs to the observer. This error 
> happens in 
> [DFSInputStream#chooseDataNode|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L846],
>  when no valid DN can be found for the {{LocatedBlock}} got from the NN side.
> One potential solution (although a little hacky) is to ask the 
> {{DFSInputStream}} to retry active when this happens. The retry logic already 
> present in the code - we just have to dynamically set a flag to ask the 
> {{ObserverReadProxyProvider}} try active in this case.
> cc [~shv], [~xkrogen], [~vagarychen], [~zero45] for discussion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13566) Add configurable additional RPC listener to NameNode

2018-09-17 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618291#comment-16618291
 ] 

Chen Liang commented on HDFS-13566:
---

{{TestLeaseRecovery2}} failed regardless of whether the patch is applied or 
not. The other failed tests succeeded locally. The checkstyle issues were not 
introduced by this patch.

> Add configurable additional RPC listener to NameNode
> 
>
> Key: HDFS-13566
> URL: https://issues.apache.org/jira/browse/HDFS-13566
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13566.001.patch, HDFS-13566.002.patch, 
> HDFS-13566.003.patch, HDFS-13566.004.patch, HDFS-13566.005.patch, 
> HDFS-13566.006.patch
>
>
> This Jira aims to add the capability to NameNode to run additional 
> listener(s). Such that NameNode can be accessed from multiple ports. 
> Fundamentally, this Jira tries to extend ipc.Server to allow configured with 
> more listeners, binding to different ports, but sharing the same call queue 
> and the handlers. Useful when different clients are only allowed to access 
> certain different ports. Combined with HDFS-13547, this also allows different 
> ports to have different SASL security levels. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13566) Add configurable additional RPC listener to NameNode

2018-09-13 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-13566:
--
Attachment: HDFS-13566.006.patch

> Add configurable additional RPC listener to NameNode
> 
>
> Key: HDFS-13566
> URL: https://issues.apache.org/jira/browse/HDFS-13566
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13566.001.patch, HDFS-13566.002.patch, 
> HDFS-13566.003.patch, HDFS-13566.004.patch, HDFS-13566.005.patch, 
> HDFS-13566.006.patch
>
>
> This Jira aims to add the capability to NameNode to run additional 
> listener(s). Such that NameNode can be accessed from multiple ports. 
> Fundamentally, this Jira tries to extend ipc.Server to allow configured with 
> more listeners, binding to different ports, but sharing the same call queue 
> and the handlers. Useful when different clients are only allowed to access 
> certain different ports. Combined with HDFS-13547, this also allows different 
> ports to have different SASL security levels. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13880) Add mechanism to allow certain RPC calls to bypass sync

2018-09-13 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614159#comment-16614159
 ] 

Chen Liang commented on HDFS-13880:
---

I've committed the v005 patch to the feature branch (with the unused import 
removed). Thanks for the review, [~shv] and [~xkrogen]!

> Add mechanism to allow certain RPC calls to bypass sync
> ---
>
> Key: HDFS-13880
> URL: https://issues.apache.org/jira/browse/HDFS-13880
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13880-HDFS-12943.001.patch, 
> HDFS-13880-HDFS-12943.002.patch, HDFS-13880-HDFS-12943.003.patch, 
> HDFS-13880-HDFS-12943.004.patch, HDFS-13880-HDFS-12943.005.patch
>
>
> Currently, every single call to NameNode will be synced, in the sense that 
> NameNode will not process it until state id catches up. But in certain cases, 
> we would like to bypass this check and allow the call to return immediately, 
> even when the server id is not up to date. One case could be the to-be-added 
> new API in HDFS-13749 that request for current state id. Others may include 
> calls that do not promise real time responses such as {{getContentSummary}}. 
> This Jira is to add the mechanism to allow certain calls to bypass sync.
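
As one way to picture the mechanism described above (purely illustrative; the class and method names below are assumptions, not the design committed by the patch), the server could keep a small set of RPC methods that skip the state-id sync check:

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Purely illustrative sketch: decide per RPC method whether the server must
 *  wait for its state id to catch up before processing the call. */
public class SyncBypassPolicy {
  // Example method names taken from the description above (msync and calls
  // that do not promise real-time responses, such as getContentSummary).
  private static final Set<String> BYPASS_SYNC = new HashSet<>(
      Arrays.asList("msync", "getContentSummary"));

  public boolean requiresSync(String rpcMethodName) {
    return !BYPASS_SYNC.contains(rpcMethodName);
  }
}
{code}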



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13880) Add mechanism to allow certain RPC calls to bypass sync

2018-09-13 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-13880:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Add mechanism to allow certain RPC calls to bypass sync
> ---
>
> Key: HDFS-13880
> URL: https://issues.apache.org/jira/browse/HDFS-13880
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13880-HDFS-12943.001.patch, 
> HDFS-13880-HDFS-12943.002.patch, HDFS-13880-HDFS-12943.003.patch, 
> HDFS-13880-HDFS-12943.004.patch, HDFS-13880-HDFS-12943.005.patch
>
>
> Currently, every single call to NameNode will be synced, in the sense that 
> NameNode will not process it until state id catches up. But in certain cases, 
> we would like to bypass this check and allow the call to return immediately, 
> even when the server id is not up to date. One case could be the to-be-added 
> new API in HDFS-13749 that request for current state id. Others may include 
> calls that do not promise real time responses such as {{getContentSummary}}. 
> This Jira is to add the mechanism to allow certain calls to bypass sync.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12943) Consistent Reads from Standby Node

2018-12-18 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724465#comment-16724465
 ] 

Chen Liang commented on HDFS-12943:
---

Hi [~brahmareddy]
bq. you can see this issue if "dfs.ha.tail-edits.period" is default value.
Yes, with the default period of 1 min, any read can take up to 1 min to finish; 
this is not specific to the "second" call you were mentioning, but applies to 
any read. I agree that we need to lower this value. In our environment we have 
already set it to 100ms, and with this setting I have never seen the issue of 
the second call always timing out, nor getServiceState taking 2 seconds. I was 
under the impression that you still had the timeout even after setting it to 
100ms? 
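
For reference, a minimal sketch of applying that setting; in a real deployment this key would normally go into hdfs-site.xml on the standby/observer NameNodes, and the accepted unit syntax may depend on the Hadoop version, so treat this only as an illustration of the value discussed above:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class TailEditsConfigSketch {
  public static void main(String[] args) {
    // Sketch only: a test setup (e.g. before starting NameNodes programmatically)
    // can set the edit-tailing period this way; 100ms matches the comment above.
    Configuration conf = new Configuration();
    conf.set("dfs.ha.tail-edits.period", "100ms");
    System.out.println("dfs.ha.tail-edits.period = "
        + conf.get("dfs.ha.tail-edits.period"));
  }
}
{code}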

> Consistent Reads from Standby Node
> --
>
> Key: HDFS-12943
> URL: https://issues.apache.org/jira/browse/HDFS-12943
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: ConsistentReadsFromStandbyNode.pdf, 
> ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch, 
> HDFS-12943-002.patch, TestPlan-ConsistentReadsFromStandbyNode.pdf
>
>
> StandbyNode in HDFS is a replica of the active NameNode. The states of the 
> NameNodes are coordinated via the journal. It is natural to consider 
> StandbyNode as a read-only replica. As with any replicated distributed system 
> the problem of stale reads should be resolved. Our main goal is to provide 
> reads from standby in a consistent way in order to enable a wide range of 
> existing applications running on top of HDFS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14205) Backport HDFS-6440 to branch-2

2019-01-28 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16754241#comment-16754241
 ] 

Chen Liang commented on HDFS-14205:
---

Thanks for chiming in [~csun]! I also had a patch for this backport. It 
compiles, but some tests were failing; I'm not sure whether the failures are 
related, because many of them failed even without the patch. I haven't had the 
bandwidth to look further into the failures though. Posting my patch here, 
hope it helps.

> Backport HDFS-6440 to branch-2
> --
>
> Key: HDFS-14205
> URL: https://issues.apache.org/jira/browse/HDFS-14205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
>
> Currently support for more than 2 NameNodes (HDFS-6440) is only in branch-3. 
> This JIRA aims to backport it to branch-2, as this is required by HDFS-12943 
> (consistent read from standby) backport to branch-2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14205) Backport HDFS-6440 to branch-2

2019-01-28 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14205:
--
Attachment: HDFS-14205-branch-2.001.patch

> Backport HDFS-6440 to branch-2
> --
>
> Key: HDFS-14205
> URL: https://issues.apache.org/jira/browse/HDFS-14205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14205-branch-2.001.patch
>
>
> Currently support for more than 2 NameNodes (HDFS-6440) is only in branch-3. 
> This JIRA aims to backport it to branch-2, as this is required by HDFS-12943 
> (consistent read from standby) backport to branch-2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13699) Add DFSClient sending handshake token to DataNode, and allow DataNode overwrite downstream QOP

2019-04-02 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-13699:
--
Attachment: HDFS-13699.009.patch

> Add DFSClient sending handshake token to DataNode, and allow DataNode 
> overwrite downstream QOP
> --
>
> Key: HDFS-13699
> URL: https://issues.apache.org/jira/browse/HDFS-13699
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13699.001.patch, HDFS-13699.002.patch, 
> HDFS-13699.003.patch, HDFS-13699.004.patch, HDFS-13699.005.patch, 
> HDFS-13699.006.patch, HDFS-13699.007.patch, HDFS-13699.008.patch, 
> HDFS-13699.009.patch, HDFS-13699.WIP.001.patch
>
>
> Given the other Jiras under HDFS-13541, this Jira is to allow DFSClient to 
> redirect the encrypt secret to DataNode. The encrypted message is the QOP 
> that client and NameNode have used. DataNode decrypts the message and enforce 
> the QOP for the client connection. Also, this Jira will also include 
> overwriting downstream QOP, as mentioned in the HDFS-13541 design doc. 
> Namely, this is to allow inter-DN QOP that is different from client-DN QOP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13699) Add DFSClient sending handshake token to DataNode, and allow DataNode overwrite downstream QOP

2019-04-02 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808008#comment-16808008
 ] 

Chen Liang commented on HDFS-13699:
---

Thanks for the clarification [~shv]! I've attached the v009 patch to address 
all the comments.

> Add DFSClient sending handshake token to DataNode, and allow DataNode 
> overwrite downstream QOP
> --
>
> Key: HDFS-13699
> URL: https://issues.apache.org/jira/browse/HDFS-13699
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13699.001.patch, HDFS-13699.002.patch, 
> HDFS-13699.003.patch, HDFS-13699.004.patch, HDFS-13699.005.patch, 
> HDFS-13699.006.patch, HDFS-13699.007.patch, HDFS-13699.008.patch, 
> HDFS-13699.009.patch, HDFS-13699.WIP.001.patch
>
>
> Given the other Jiras under HDFS-13541, this Jira is to allow DFSClient to 
> redirect the encrypt secret to DataNode. The encrypted message is the QOP 
> that client and NameNode have used. DataNode decrypts the message and enforce 
> the QOP for the client connection. Also, this Jira will also include 
> overwriting downstream QOP, as mentioned in the HDFS-13541 design doc. 
> Namely, this is to allow inter-DN QOP that is different from client-DN QOP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13699) Add DFSClient sending handshake token to DataNode, and allow DataNode overwrite downstream QOP

2019-04-02 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-13699:
--
Attachment: HDFS-13699.010.patch

> Add DFSClient sending handshake token to DataNode, and allow DataNode 
> overwrite downstream QOP
> --
>
> Key: HDFS-13699
> URL: https://issues.apache.org/jira/browse/HDFS-13699
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13699.001.patch, HDFS-13699.002.patch, 
> HDFS-13699.003.patch, HDFS-13699.004.patch, HDFS-13699.005.patch, 
> HDFS-13699.006.patch, HDFS-13699.007.patch, HDFS-13699.008.patch, 
> HDFS-13699.009.patch, HDFS-13699.010.patch, HDFS-13699.WIP.001.patch
>
>
> Given the other Jiras under HDFS-13541, this Jira is to allow DFSClient to 
> redirect the encrypt secret to DataNode. The encrypted message is the QOP 
> that client and NameNode have used. DataNode decrypts the message and enforce 
> the QOP for the client connection. Also, this Jira will also include 
> overwriting downstream QOP, as mentioned in the HDFS-13541 design doc. 
> Namely, this is to allow inter-DN QOP that is different from client-DN QOP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13699) Add DFSClient sending handshake token to DataNode, and allow DataNode overwrite downstream QOP

2019-04-02 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808186#comment-16808186
 ] 

Chen Liang commented on HDFS-13699:
---

Posted the v010 patch with one additional unused-import fix.

> Add DFSClient sending handshake token to DataNode, and allow DataNode 
> overwrite downstream QOP
> --
>
> Key: HDFS-13699
> URL: https://issues.apache.org/jira/browse/HDFS-13699
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13699.001.patch, HDFS-13699.002.patch, 
> HDFS-13699.003.patch, HDFS-13699.004.patch, HDFS-13699.005.patch, 
> HDFS-13699.006.patch, HDFS-13699.007.patch, HDFS-13699.008.patch, 
> HDFS-13699.009.patch, HDFS-13699.010.patch, HDFS-13699.WIP.001.patch
>
>
> Given the other Jiras under HDFS-13541, this Jira is to allow DFSClient to 
> redirect the encrypt secret to DataNode. The encrypted message is the QOP 
> that client and NameNode have used. DataNode decrypts the message and enforce 
> the QOP for the client connection. Also, this Jira will also include 
> overwriting downstream QOP, as mentioned in the HDFS-13541 design doc. 
> Namely, this is to allow inter-DN QOP that is different from client-DN QOP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14205) Backport HDFS-6440 to branch-2

2019-03-26 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14205:
--
   Resolution: Fixed
Fix Version/s: 2.10.0
   Status: Resolved  (was: Patch Available)

> Backport HDFS-6440 to branch-2
> --
>
> Key: HDFS-14205
> URL: https://issues.apache.org/jira/browse/HDFS-14205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Fix For: 2.10.0
>
> Attachments: HDFS-14205-branch-2.001.patch, 
> HDFS-14205-branch-2.002.patch, HDFS-14205-branch-2.003.patch, 
> HDFS-14205-branch-2.004.patch, HDFS-14205-branch-2.005.patch, 
> HDFS-14205-branch-2.006.patch, HDFS-14205-branch-2.007.patch, 
> HDFS-14205-branch-2.008.patch, HDFS-14205-branch-2.009.patch
>
>
> Currently, support for more than 2 NameNodes (HDFS-6440) exists only in 
> branch-3. This JIRA aims to backport it to branch-2, as it is required for 
> the branch-2 backport of HDFS-12943 (consistent reads from standby).
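
For context, once HDFS-6440 is available a single HA nameservice can carry 
more than two NameNodes using the standard HA configuration keys. A minimal 
sketch follows; the nameservice ID and host names are illustrative only.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch of a three-NameNode HA nameservice using the standard HA keys.
// Host names and the nameservice ID are placeholders for this example.
public class ThreeNameNodeHaConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("dfs.nameservices", "mycluster");
    // Three NameNode IDs instead of the previous limit of two.
    conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2,nn3");
    conf.set("dfs.namenode.rpc-address.mycluster.nn1", "nn1.example.com:8020");
    conf.set("dfs.namenode.rpc-address.mycluster.nn2", "nn2.example.com:8020");
    conf.set("dfs.namenode.rpc-address.mycluster.nn3", "nn3.example.com:8020");
    System.out.println("NameNodes: " + conf.get("dfs.ha.namenodes.mycluster"));
  }
}
{code}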



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14205) Backport HDFS-6440 to branch-2

2019-03-26 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802083#comment-16802083
 ] 

Chen Liang commented on HDFS-14205:
---

I have backported the v009 patch to branch-2. Thanks [~csun] for the effort!

> Backport HDFS-6440 to branch-2
> --
>
> Key: HDFS-14205
> URL: https://issues.apache.org/jira/browse/HDFS-14205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14205-branch-2.001.patch, 
> HDFS-14205-branch-2.002.patch, HDFS-14205-branch-2.003.patch, 
> HDFS-14205-branch-2.004.patch, HDFS-14205-branch-2.005.patch, 
> HDFS-14205-branch-2.006.patch, HDFS-14205-branch-2.007.patch, 
> HDFS-14205-branch-2.008.patch, HDFS-14205-branch-2.009.patch
>
>
> Currently, support for more than 2 NameNodes (HDFS-6440) exists only in 
> branch-3. This JIRA aims to backport it to branch-2, as it is required for 
> the branch-2 backport of HDFS-12943 (consistent reads from standby).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14205) Backport HDFS-6440 to branch-2

2019-03-26 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801988#comment-16801988
 ] 

Chen Liang commented on HDFS-14205:
---

Thanks [~csun] for the clarification. I re-ran the two tests and they still 
passed. I will commit the v009 patch soon.

> Backport HDFS-6440 to branch-2
> --
>
> Key: HDFS-14205
> URL: https://issues.apache.org/jira/browse/HDFS-14205
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Liang
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-14205-branch-2.001.patch, 
> HDFS-14205-branch-2.002.patch, HDFS-14205-branch-2.003.patch, 
> HDFS-14205-branch-2.004.patch, HDFS-14205-branch-2.005.patch, 
> HDFS-14205-branch-2.006.patch, HDFS-14205-branch-2.007.patch, 
> HDFS-14205-branch-2.008.patch, HDFS-14205-branch-2.009.patch
>
>
> Currently, support for more than 2 NameNodes (HDFS-6440) exists only in 
> branch-3. This JIRA aims to backport it to branch-2, as it is required for 
> the branch-2 backport of HDFS-12943 (consistent reads from standby).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13699) Add DFSClient sending handshake token to DataNode, and allow DataNode overwrite downstream QOP

2019-03-22 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799351#comment-16799351
 ] 

Chen Liang commented on HDFS-13699:
---

Posted v007 patch with various refactoring; the logic remains the same. To make 
it less confusing to review, I want to point out that part of the logic enables 
overwriting the downstream inter-DN QOP. Namely, we want to allow the client to 
talk to the first DN with QOP1 while the DNs talk to each other using QOP2, and 
QOP1 and QOP2 can be different. This is useful when the client is external and 
has security requirements different from the DNs, which are all in the same 
cluster. The patch works by configuring QOP2, which overwrites QOP1 at run time.
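
As a rough sketch of how such a deployment could be expressed, the example 
below uses {{dfs.data.transfer.protection}} (an existing key) for the 
client-DN QOP and a hypothetical placeholder name for the inter-DN overwrite 
key; the real key name is defined by the patch itself.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch of the intended deployment shape: external clients use privacy (QOP1),
// while DataNodes inside the cluster fall back to authentication only (QOP2).
public class QopOverwriteConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // QOP1: client <-> first DataNode uses the existing data transfer key.
    conf.set("dfs.data.transfer.protection", "privacy");
    // QOP2: inter-DN traffic only authenticates.
    // NOTE: this key name is a hypothetical placeholder, not the one from the patch.
    conf.set("dfs.datanode.overwrite.downstream.qop", "authentication");
    System.out.println("client-DN QOP = " + conf.get("dfs.data.transfer.protection"));
  }
}
{code}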

> Add DFSClient sending handshake token to DataNode, and allow DataNode 
> overwrite downstream QOP
> --
>
> Key: HDFS-13699
> URL: https://issues.apache.org/jira/browse/HDFS-13699
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13699.001.patch, HDFS-13699.002.patch, 
> HDFS-13699.003.patch, HDFS-13699.004.patch, HDFS-13699.005.patch, 
> HDFS-13699.006.patch, HDFS-13699.007.patch, HDFS-13699.WIP.001.patch
>
>
> Given the other Jiras under HDFS-13541, this Jira allows DFSClient to 
> redirect the encrypted secret to the DataNode. The encrypted message is the 
> QOP that the client and NameNode have used. The DataNode decrypts the message 
> and enforces that QOP for the client connection. This Jira also covers 
> overwriting the downstream QOP, as mentioned in the HDFS-13541 design doc; 
> namely, it allows an inter-DN QOP that is different from the client-DN QOP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13699) Add DFSClient sending handshake token to DataNode, and allow DataNode overwrite downstream QOP

2019-03-22 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-13699:
--
Attachment: HDFS-13699.007.patch

> Add DFSClient sending handshake token to DataNode, and allow DataNode 
> overwrite downstream QOP
> --
>
> Key: HDFS-13699
> URL: https://issues.apache.org/jira/browse/HDFS-13699
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-13699.001.patch, HDFS-13699.002.patch, 
> HDFS-13699.003.patch, HDFS-13699.004.patch, HDFS-13699.005.patch, 
> HDFS-13699.006.patch, HDFS-13699.007.patch, HDFS-13699.WIP.001.patch
>
>
> Given the other Jiras under HDFS-13541, this Jira allows DFSClient to 
> redirect the encrypted secret to the DataNode. The encrypted message is the 
> QOP that the client and NameNode have used. The DataNode decrypts the message 
> and enforces that QOP for the client connection. This Jira also covers 
> overwriting the downstream QOP, as mentioned in the HDFS-13541 design doc; 
> namely, it allows an inter-DN QOP that is different from the client-DN QOP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14397) Backport HADOOP-15684 to branch-2

2019-04-03 Thread Chen Liang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-14397:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Backport HADOOP-15684 to branch-2
> -
>
> Key: HDFS-14397
> URL: https://issues.apache.org/jira/browse/HDFS-14397
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-14397-branch-2.000.patch, 
> HDFS-14397-branch-2.001.patch
>
>
> As the multi-SBN feature has already been backported to branch-2, this is a 
> follow-up to backport HADOOP-15684.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14397) Backport HADOOP-15684 to branch-2

2019-04-03 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809404#comment-16809404
 ] 

Chen Liang commented on HDFS-14397:
---

+1 on the v001 patch. I've committed it to branch-2. Thanks for the 
contribution, [~csun]!

> Backport HADOOP-15684 to branch-2
> -
>
> Key: HDFS-14397
> URL: https://issues.apache.org/jira/browse/HDFS-14397
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-14397-branch-2.000.patch, 
> HDFS-14397-branch-2.001.patch
>
>
> As the multi-SBN feature has already been backported to branch-2, this is a 
> follow-up to backport HADOOP-15684.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14415) Backport HDFS-13799 to branch-2

2019-04-05 Thread Chen Liang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811148#comment-16811148
 ] 

Chen Liang commented on HDFS-14415:
---

The tests also passed in my local run. +1 on the v000 patch; I've committed it 
to branch-2. Thanks [~csun]!

> Backport HDFS-13799 to branch-2
> ---
>
> Key: HDFS-14415
> URL: https://issues.apache.org/jira/browse/HDFS-14415
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Trivial
> Attachments: HDFS-14415-branch-2.000.patch
>
>
> As the multi-SBN feature has already been backported to branch-2, this is a 
> follow-up to backport HDFS-13799.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


