[jira] [Updated] (HDFS-14017) ObserverReadProxyProviderWithIPFailover should work with HA configuration
[ https://issues.apache.org/jira/browse/HDFS-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen Liang updated HDFS-14017:
Attachment: HDFS-14017-HDFS-12943.012.patch

> ObserverReadProxyProviderWithIPFailover should work with HA configuration
>
> Key: HDFS-14017
> URL: https://issues.apache.org/jira/browse/HDFS-14017
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Chen Liang
> Assignee: Chen Liang
> Priority: Major
> Attachments: HDFS-14017-HDFS-12943.001.patch, HDFS-14017-HDFS-12943.002.patch, HDFS-14017-HDFS-12943.003.patch, HDFS-14017-HDFS-12943.004.patch, HDFS-14017-HDFS-12943.005.patch, HDFS-14017-HDFS-12943.006.patch, HDFS-14017-HDFS-12943.008.patch, HDFS-14017-HDFS-12943.009.patch, HDFS-14017-HDFS-12943.010.patch, HDFS-14017-HDFS-12943.011.patch, HDFS-14017-HDFS-12943.012.patch
>
> Currently {{ObserverReadProxyProviderWithIPFailover}} extends {{ObserverReadProxyProvider}}, and the only difference is changing the proxy factory to use {{IPFailoverProxyProvider}}. However, this is not enough, because when the constructor of {{ObserverReadProxyProvider}} is called via super(...), the following line:
> {code:java}
> nameNodeProxies = getProxyAddresses(uri,
>     HdfsClientConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY);
> {code}
> will try to resolve all the configured NN addresses to do configured failover. But in the case of IPFailover, this does not really apply.
>
> A second, closely related issue is about delegation tokens. For example, in the current IPFailover setup, say we have a virtual host nn.xyz.com, which points to either of two physical nodes, nn1.xyz.com or nn2.xyz.com. In current HDFS there is always only one DT being exchanged, which has hostname nn.xyz.com. The server only issues this DT, and the client only knows the host nn.xyz.com, so all is good. But with Observer read, even with IPFailover, the client will no longer contact nn.xyz.com, but will actively reach out to nn1.xyz.com and nn2.xyz.com. During this process, the current code will look for a DT associated with hostname nn1.xyz.com or nn2.xyz.com, which differs from the DT issued by the NN, causing token authentication to fail. This happens in {{AbstractDelegationTokenSelector#selectToken}}. The new IPFailover proxy provider will need to resolve this as well.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
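The token-lookup mismatch described in this issue can be sketched with a minimal model. The map-based credential store and the hostnames below are illustrative stand-ins, not Hadoop's actual classes; the real selection happens in {{AbstractDelegationTokenSelector#selectToken}}, which keys tokens by service address:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of delegation-token lookup keyed by service address.
// Hostnames and the credential map are illustrative; in Hadoop the lookup
// happens in AbstractDelegationTokenSelector#selectToken.
public class TokenSelection {
    static final Map<String, String> credentials = new HashMap<>();

    // Return the token whose service exactly matches the requested address,
    // or null when no such token exists (authentication would then fail).
    static String selectToken(String service) {
        return credentials.get(service);
    }

    public static void main(String[] args) {
        // The NN issues a single DT bound to the virtual host.
        credentials.put("nn.xyz.com:8020", "DT-issued-by-active");

        // Classic IPFailover: the client always dials the VIP, so lookup succeeds.
        System.out.println(selectToken("nn.xyz.com:8020"));  // DT-issued-by-active

        // Observer reads dial the physical hosts directly; no token carries
        // that service, so selection returns null unless physical addresses
        // are mapped back to the virtual one.
        System.out.println(selectToken("nn1.xyz.com:8020")); // null
    }
}
```

This illustrates why a new IPFailover-aware provider has to translate physical addresses back to the virtual service before token selection.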
[jira] [Commented] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16684222#comment-16684222 ]

Chen Liang commented on HDFS-12943:

[~xiangheng] thanks for trying Observer read! What was the full command you ran? It should be something like {{hdfs haadmin -transitionToObserver <nnID>}}, where nnID is the ID of the name node that you want to transition to Observer. You can run {{hdfs haadmin -getAllServiceState}} to list all the valid nnIDs in the cluster.

> Consistent Reads from Standby Node
>
> Key: HDFS-12943
> URL: https://issues.apache.org/jira/browse/HDFS-12943
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: hdfs
> Reporter: Konstantin Shvachko
> Priority: Major
> Attachments: ConsistentReadsFromStandbyNode.pdf, ConsistentReadsFromStandbyNode.pdf, TestPlan-ConsistentReadsFromStandbyNode.pdf
>
> StandbyNode in HDFS is a replica of the active NameNode. The states of the NameNodes are coordinated via the journal. It is natural to consider StandbyNode as a read-only replica. As with any replicated distributed system, the problem of stale reads should be resolved. Our main goal is to provide reads from standby in a consistent way in order to enable a wide range of existing applications running on top of HDFS.
[jira] [Commented] (HDFS-14059) Test reads from standby on a secure cluster with Configured failover
[ https://issues.apache.org/jira/browse/HDFS-14059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16684229#comment-16684229 ]

Chen Liang commented on HDFS-14059:

Thanks for sharing [~zero45]! For (1), on a quick glance at the code, it seems that if {{dfs.ha.automatic-failover.enabled}} is set to true, a manual transition will be rejected with that error. Did you have this configured? We don't seem to have this. For (2), I think what you suspect makes a lot of sense. I was getting the same error and ended up adding {{hadoop.security.service.user.name.key}}. HDFS-14035 should fix this.

> Test reads from standby on a secure cluster with Configured failover
>
> Key: HDFS-14059
> URL: https://issues.apache.org/jira/browse/HDFS-14059
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: test
> Reporter: Konstantin Shvachko
> Assignee: Plamen Jeliazkov
> Priority: Major
>
> Run standard HDFS tests to verify reading from ObserverNode on a secure HA cluster with {{ConfiguredFailoverProxyProvider}}.
[jira] [Updated] (HDFS-14035) NN status discovery does not leverage delegation token
[ https://issues.apache.org/jira/browse/HDFS-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen Liang updated HDFS-14035:
Attachment: HDFS-14035-HDFS-12943.013.patch

> NN status discovery does not leverage delegation token
>
> Key: HDFS-14035
> URL: https://issues.apache.org/jira/browse/HDFS-14035
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Chen Liang
> Assignee: Chen Liang
> Priority: Major
> Attachments: HDFS-14035-HDFS-12943.001.patch, HDFS-14035-HDFS-12943.002.patch, HDFS-14035-HDFS-12943.003.patch, HDFS-14035-HDFS-12943.004.patch, HDFS-14035-HDFS-12943.005.patch, HDFS-14035-HDFS-12943.006.patch, HDFS-14035-HDFS-12943.007.patch, HDFS-14035-HDFS-12943.008.patch, HDFS-14035-HDFS-12943.009.patch, HDFS-14035-HDFS-12943.010.patch, HDFS-14035-HDFS-12943.011.patch, HDFS-14035-HDFS-12943.012.patch, HDFS-14035-HDFS-12943.013.patch
>
> Currently ObserverReadProxyProvider uses {{HAServiceProtocol#getServiceStatus}} to get the status of each NN. However, {{HAServiceProtocol}} does not leverage delegation tokens. So when running an application on YARN, and the YARN node manager makes this getServiceStatus call, token authentication will fail, causing the application to fail.
[jira] [Commented] (HDFS-14035) NN status discovery does not leverage delegation token
[ https://issues.apache.org/jira/browse/HDFS-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16684433#comment-16684433 ]

Chen Liang commented on HDFS-14035:

Thanks for the review [~shv]! The failed test TestConsistentReadsObserver is related. It turns out that a side effect of using the client protocol to discover server state is that the call to {{changeProxy}} could potentially update the client alignment context state id to the most recent value if it talked to the active, introducing a race condition in {{testMsyncSimple}}. Posted v013 patch to resolve this.

> NN status discovery does not leverage delegation token
>
> Key: HDFS-14035
> URL: https://issues.apache.org/jira/browse/HDFS-14035
[jira] [Updated] (HDFS-14017) ObserverReadProxyProviderWithIPFailover should work with HA configuration
[ https://issues.apache.org/jira/browse/HDFS-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen Liang updated HDFS-14017:
Attachment: HDFS-14017-HDFS-12943.011.patch

> ObserverReadProxyProviderWithIPFailover should work with HA configuration
>
> Key: HDFS-14017
> URL: https://issues.apache.org/jira/browse/HDFS-14017
[jira] [Commented] (HDFS-14017) ObserverReadProxyProviderWithIPFailover should work with HA configuration
[ https://issues.apache.org/jira/browse/HDFS-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16686936#comment-16686936 ]

Chen Liang commented on HDFS-14017:

v011 patch to fix checkstyle issues.

> ObserverReadProxyProviderWithIPFailover should work with HA configuration
>
> Key: HDFS-14017
> URL: https://issues.apache.org/jira/browse/HDFS-14017
[jira] [Updated] (HDFS-14035) NN status discovery does not leverage delegation token
[ https://issues.apache.org/jira/browse/HDFS-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen Liang updated HDFS-14035:
Attachment: HDFS-14035-HDFS-12943.014.patch

> NN status discovery does not leverage delegation token
>
> Key: HDFS-14035
> URL: https://issues.apache.org/jira/browse/HDFS-14035
[jira] [Commented] (HDFS-14035) NN status discovery does not leverage delegation token
[ https://issues.apache.org/jira/browse/HDFS-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685620#comment-16685620 ]

Chen Liang commented on HDFS-14035:

Discussed with [~xkrogen] offline; it seems we can also resolve the race condition in the unit test, while avoiding sleep, by making an uncoordinated call to the server early on. This will initialize the observer proxy and also set the state id on the client side. Posted v014 patch, which also adds a couple of missing javadocs.

> NN status discovery does not leverage delegation token
>
> Key: HDFS-14035
> URL: https://issues.apache.org/jira/browse/HDFS-14035
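The uncoordinated-call-first idea can be modeled in a few lines. Everything here (field names, the fixed state id value) is an illustrative stand-in for the client's alignment context, not the actual Hadoop test code:

```java
// Toy model of the fix: an uncoordinated call still carries the server's
// state id back in its response header, so issuing one early initializes the
// client's alignment context without any sleep. All names/values illustrative.
public class UncoordinatedFirst {
    static long serverStateId = 42;   // stand-in for the active NN's latest txid
    static long clientLastSeen = -1;  // client-side alignment context, uninitialized

    // An uncoordinated call bypasses the msync wait but still piggybacks the
    // server's current state id onto the response.
    static void uncoordinatedCall() {
        clientLastSeen = serverStateId;
    }

    // A subsequent coordinated read would require the observer to catch up to
    // at least this id before serving.
    static long requiredStateId() {
        return clientLastSeen;
    }

    public static void main(String[] args) {
        uncoordinatedCall();  // issue early, before any coordinated assertions
        System.out.println(requiredStateId()); // 42
    }
}
```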
[jira] [Commented] (HDFS-14017) ObserverReadProxyProviderWithIPFailover should work with HA configuration
[ https://issues.apache.org/jira/browse/HDFS-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673712#comment-16673712 ]

Chen Liang commented on HDFS-14017:

Thanks for the clarification [~xkrogen].

bq. when the active changes, how will it ever start using the new active?

My thought was that, just like how current IPFailover works now, it simply assumes the VIP is pointing to the ANN; changing the VIP-to-NN mapping happens outside of HDFS. So I was thinking of the same approach: in IPFailover, just assume the VIP is the savior when failure happens (i.e. let the failoverproxy variable always point to the VIP), and don't bother to figure out what exactly it is pointing to. Although the current patch may not do this correctly, or may not have it at all.

bq. we would need to introduce additional VIPs for observers, and it's not clear to me if this makes sense

Can't agree more; exactly the struggle I had! The intuition of IPFailover is to rely on VIPs and not bother figuring out the physical addresses, but when there are multiple NNs, with some being observer and some being active, I was not sure what that would look like. So I decided (for now at least) to have IPFailover send requests to discover the physical nodes by itself.

> ObserverReadProxyProviderWithIPFailover should work with HA configuration
>
> Key: HDFS-14017
> URL: https://issues.apache.org/jira/browse/HDFS-14017
[jira] [Updated] (HDFS-14035) NN status discovery does not leverage delegation token
[ https://issues.apache.org/jira/browse/HDFS-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen Liang updated HDFS-14035:
Attachment: HDFS-14035-HDFS-12943.004.patch

> NN status discovery does not leverage delegation token
>
> Key: HDFS-14035
> URL: https://issues.apache.org/jira/browse/HDFS-14035
[jira] [Commented] (HDFS-14035) NN status discovery does not leverage delegation token
[ https://issues.apache.org/jira/browse/HDFS-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673535#comment-16673535 ]

Chen Liang commented on HDFS-14035:

Rebased with v004 patch.

> NN status discovery does not leverage delegation token
>
> Key: HDFS-14035
> URL: https://issues.apache.org/jira/browse/HDFS-14035
[jira] [Commented] (HDFS-13880) Add mechanism to allow certain RPC calls to bypass sync
[ https://issues.apache.org/jira/browse/HDFS-13880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599019#comment-16599019 ]

Chen Liang commented on HDFS-13880:

[~shv] Masync is just the name I currently picked for methods that do not need to go through the sync process (which is msync); I just replaced "sync" with "async". Please feel free to propose a different term :). [~csun] thanks for the clarification. Will double check whether {{HAServiceProtocol}} is currently synced by msync.

> Add mechanism to allow certain RPC calls to bypass sync
>
> Key: HDFS-13880
> URL: https://issues.apache.org/jira/browse/HDFS-13880
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Reporter: Chen Liang
> Assignee: Chen Liang
> Priority: Major
> Attachments: HDFS-13880-HDFS-12943.001.patch, HDFS-13880-HDFS-12943.002.patch
>
> Currently, every single call to the NameNode will be synced, in the sense that the NameNode will not process it until its state id catches up. But in certain cases, we would like to bypass this check and allow the call to return immediately, even when the server state id is not up to date. One case could be the to-be-added new API in HDFS-13749 that requests the current state id. Others may include calls that do not promise real-time responses, such as {{getContentSummary}}. This Jira is to add the mechanism to allow certain calls to bypass sync.
[jira] [Updated] (HDFS-13880) Add mechanism to allow certain RPC calls to bypass sync
[ https://issues.apache.org/jira/browse/HDFS-13880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen Liang updated HDFS-13880:
Status: Patch Available (was: Open)

> Add mechanism to allow certain RPC calls to bypass sync
>
> Key: HDFS-13880
> URL: https://issues.apache.org/jira/browse/HDFS-13880
[jira] [Commented] (HDFS-13872) Only some protocol methods should perform msync wait
[ https://issues.apache.org/jira/browse/HDFS-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599035#comment-16599035 ]

Chen Liang commented on HDFS-13872:

Somehow I missed this Jira completely... so I filed HDFS-13880 and submitted a patch there too. Sorry, my bad! I was taking a very similar approach at the beginning: I added a method to the ReadOnly annotation to indicate whether a method should go through msync. But then I ran into an issue: the ReadOnly annotation is only applied to ClientProtocol, but when it comes down to the {{ProtobufRpcEngine}} layer, the type actually changes from {{ClientProtocol}} to {{ClientNamenodeProtocol}}, and the annotation can no longer be found. And the {{ClientNamenodeProtocol}} class is a protobuf-generated class, so we cannot annotate it... Also, having chatted with Konstantin, it seems a more desirable approach is to do the check on the server side. So the approach I take in HDFS-13880 is that, on the server side, when receiving an RPC call, it looks up the method name from the RPC call in ClientProtocol; if a method with the same name exists, then the annotation on that method in ClientProtocol is used to check whether msync should be bypassed. Again, sorry I missed this Jira earlier...

> Only some protocol methods should perform msync wait
>
> Key: HDFS-13872
> URL: https://issues.apache.org/jira/browse/HDFS-13872
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Erik Krogen
> Assignee: Erik Krogen
> Priority: Major
> Attachments: HDFS-13872-HDFS-12943.000.patch
>
> Currently the implementation of msync added in HDFS-13767 waits until the server has caught up to the client-specified transaction ID, regardless of what the inbound RPC is. This particularly causes problems for ObserverReadProxyProvider (see HDFS-13779) when we try to fetch the state from an observer/standby; this should be a quick operation, but it has to wait for the node to catch up to the most current state. I initially thought all {{HAServiceProtocol}} methods should thus be excluded from the wait period, but actually I think the right approach is that _only_ {{ClientProtocol}} methods should be subjected to the wait period. I propose that we can do this via an annotation on the client protocol which can then be checked within {{ipc.Server}}.
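The server-side lookup described in these comments can be sketched as follows. The annotation name ({{ReadOnly}}), its {{isCoordinated}} flag, and the two protocol methods are illustrative stand-ins for the Hadoop ones; the point is only the by-name reflection lookup on an annotated interface, since the protobuf-generated protocol class cannot carry annotations:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// Sketch: when an RPC arrives, look the method up by name on the annotated
// client-protocol interface and use its annotation to decide whether the
// msync wait can be bypassed. Names are illustrative, not the Hadoop API.
public class BypassSyncCheck {

    @Retention(RetentionPolicy.RUNTIME)
    @interface ReadOnly {
        boolean isCoordinated() default true;  // false => bypass the msync wait
    }

    // Stand-in for ClientProtocol with annotated methods.
    interface ClientProtocol {
        @ReadOnly(isCoordinated = false)  // must return immediately
        long getCurrentStateId();

        @ReadOnly  // coordinated read: waits for the state id to catch up
        String getListing(String path);
    }

    // The server only has the RPC method name; find it on the interface and
    // read the annotation there (the generated protobuf class has none).
    static boolean shouldBypassSync(String methodName) {
        for (Method m : ClientProtocol.class.getMethods()) {
            if (m.getName().equals(methodName)) {
                ReadOnly ro = m.getAnnotation(ReadOnly.class);
                return ro != null && !ro.isCoordinated();
            }
        }
        return false;  // unknown or write method: normal handling
    }

    public static void main(String[] args) {
        System.out.println(shouldBypassSync("getCurrentStateId")); // true
        System.out.println(shouldBypassSync("getListing"));        // false
    }
}
```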
[jira] [Commented] (HDFS-13924) Handle BlockMissingException when reading from observer
[ https://issues.apache.org/jira/browse/HDFS-13924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622626#comment-16622626 ]

Chen Liang commented on HDFS-13924:

Thanks for the update [~csun]! Just to add to my previous comment: what I was thinking of was to handle the retry in a more uniform fashion. Specifically, in the ideal situation, I think that on the client side it should always be only the ProxyProvider that handles NN redirecting logic. To this extent, I would think the server side is a better place to handle this than DFSInputStream: the server side throws an exception, then the ProxyProvider does the redirecting properly, so DFSInputStream is hidden from the retry and doesn't need to do anything in addition. So IMO, the better way may be, just like you mentioned, creating a new exception, say, ObserverOperationFailException; in all the situations where the Observer cannot successfully handle a request and a retry on active is worthwhile, just throw this exception. Whenever ObserverProxyProvider sees this exception, it tries again with the active. Something along this line.

> Handle BlockMissingException when reading from observer
>
> Key: HDFS-13924
> URL: https://issues.apache.org/jira/browse/HDFS-13924
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Chao Sun
> Priority: Major
>
> Internally we found that reading from ObserverNode may result in {{BlockMissingException}}. This may happen when the observer sees a smaller number of DNs than the active (maybe due to communication issues with those DNs), or (we guess) late block reports from some DNs to the observer. This error happens in [DFSInputStream#chooseDataNode|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L846], when no valid DN can be found for the {{LocatedBlock}} obtained from the NN side.
>
> One potential solution (although a little hacky) is to ask the {{DFSInputStream}} to retry the active when this happens. The retry logic is already present in the code; we just have to dynamically set a flag to ask the {{ObserverReadProxyProvider}} to try the active in this case.
>
> cc [~shv], [~xkrogen], [~vagarychen], [~zero45] for discussion.
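The uniform retry path proposed in the comment can be sketched in a few lines, using the exception name suggested there. The {{NameNode}} interface, its method, and the return values are illustrative, not the actual Hadoop RPC surface:

```java
// Sketch of the uniform retry: the observer throws a dedicated exception for
// reads it cannot serve, and the proxy layer retries the same call against
// the active. Interface, method, and exception class are illustrative.
public class ObserverRetry {

    static class ObserverOperationFailException extends RuntimeException {}

    interface NameNode {
        String getBlockLocations(String path);
    }

    // The caller (e.g. DFSInputStream) stays unaware of the failover: it just
    // sees a successful result from whichever node could serve the read.
    static String readWithFailover(NameNode observer, NameNode active, String path) {
        try {
            return observer.getBlockLocations(path);
        } catch (ObserverOperationFailException e) {
            // The observer could not serve the read (e.g. it sees too few DNs
            // for the block); retry once against the active.
            return active.getBlockLocations(path);
        }
    }

    public static void main(String[] args) {
        NameNode observer = path -> { throw new ObserverOperationFailException(); };
        NameNode active = path -> "locations-from-active:" + path;
        System.out.println(readWithFailover(observer, active, "/data/file"));
    }
}
```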
[jira] [Commented] (HDFS-13873) ObserverNode should reject read requests when it is too far behind.
[ https://issues.apache.org/jira/browse/HDFS-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622645#comment-16622645 ]

Chen Liang commented on HDFS-13873:

[~csun] any updates/plans for this? I saw you are busy with two other Jiras; I can help on this one if you like :). Either way, I'm curious what the current internal implementation at Uber is. I was syncing with Konstantin; we were planning to do this based on state id, but the threshold for rejection should probably be based on some runtime moving average (e.g. the number of txids processed in the past X mins). Any thoughts on this?

> ObserverNode should reject read requests when it is too far behind.
>
> Key: HDFS-13873
> URL: https://issues.apache.org/jira/browse/HDFS-13873
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client, namenode
> Affects Versions: HDFS-12943
> Reporter: Konstantin Shvachko
> Assignee: Chao Sun
> Priority: Major
>
> Add a server-side threshold for ObserverNode to reject read requests when it is too far behind.
[jira] [Commented] (HDFS-13873) ObserverNode should reject read requests when it is too far behind.
[ https://issues.apache.org/jira/browse/HDFS-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622948#comment-16622948 ] Chen Liang commented on HDFS-13873: --- Thanks [~csun]! I had some thoughts on this, sharing here for reference: version 1: have a tracker such that, whenever the client sends a request to the Observer, the tracker records the Observer's current state id X and timestamp tx, compares them to the previous value Y and previous timestamp ty, and t = (tx - ty) / (X - Y) gives an estimate of how long it takes the observer to process one txid (this can be measured as a moving average for better accuracy). Then, with delta = clientStateId - X, delta * t gives the estimated time until the client request can start being processed, i.e. the msync wait time. version 2: instead of tracking the Observer state id increase rate, we could also take t = the average time of processing one request. (This needs more code, to measure the time a request spends from entering the queue until finished.) Then delta * t becomes the estimate of when the client request will actually finish. version 2 requires more code changes, but is able to handle the case where the Observer state id is actually not too far behind, but the Observer node itself is too slow, still causing a long processing time for requests - which is not captured by version 1. The downside, though, is that it seems to me there can be cases where version 2 rejects many calls over-aggressively. Also, addressing a slow Observer seems a bit beyond the scope of this Jira. I would say maybe we can go with the simpler version 1 first and see how it works out. Any comments [~csun], [~shv]? > ObserverNode should reject read requests when it is too far behind. 
> --- > > Key: HDFS-13873 > URL: https://issues.apache.org/jira/browse/HDFS-13873 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client, namenode >Affects Versions: HDFS-12943 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > > Add a server-side threshold for ObserverNode to reject read requests when it > is too far behind. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
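The version 1 estimate described above can be sketched as a small state-id rate tracker. This is plain Java with hypothetical class and method names, not the HDFS-13873 implementation: it keeps a moving average of t = (tx - ty) / (X - Y) and returns delta * t as the estimated msync wait time.

```java
/**
 * Sketch of the "version 1" estimate: track how long the Observer takes to
 * apply one txid, as an exponential moving average, and use delta * t to
 * estimate the msync wait time. Names are hypothetical, not Hadoop code.
 */
public class TxidRateTracker {
    private long lastStateId = -1;   // Y: observer state id at last sample
    private long lastTimeMs = -1;    // ty: timestamp of last sample
    private double avgMsPerTxid = 0; // moving average of t = (tx-ty)/(X-Y)
    private static final double ALPHA = 0.3; // EMA smoothing factor (assumed)

    /** Record the observer's current state id X at time tx (millis). */
    public void sample(long stateId, long timeMs) {
        if (lastStateId >= 0 && stateId > lastStateId) {
            double msPerTxid = (double) (timeMs - lastTimeMs) / (stateId - lastStateId);
            avgMsPerTxid = avgMsPerTxid == 0
                ? msPerTxid
                : ALPHA * msPerTxid + (1 - ALPHA) * avgMsPerTxid;
        }
        lastStateId = stateId;
        lastTimeMs = timeMs;
    }

    /** Estimated ms until the observer catches up to the client's state id. */
    public double estimateWaitMs(long clientStateId) {
        long delta = Math.max(0, clientStateId - lastStateId);
        return delta * avgMsPerTxid; // delta txids behind * ms per txid
    }

    public static void main(String[] args) {
        TxidRateTracker tracker = new TxidRateTracker();
        tracker.sample(1000, 0);    // X=1000 at tx=0
        tracker.sample(1100, 1000); // 100 txids in 1000 ms -> t = 10 ms/txid
        // Client is 50 txids ahead: estimated wait is 50 * 10 ms.
        System.out.println(tracker.estimateWaitMs(1150));
    }
}
```

A server-side threshold would then be a simple comparison of estimateWaitMs against a configured maximum before rejecting the read.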
[jira] [Commented] (HDFS-13873) ObserverNode should reject read requests when it is too far behind.
[ https://issues.apache.org/jira/browse/HDFS-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626547#comment-16626547 ] Chen Liang commented on HDFS-13873: --- Update here for reference: synced offline with Konstantin. It seems one fundamental issue with my previously proposed approaches is that, when the server state id increment is slow, it is hard to differentiate between: 1. the server is slow; 2. there were not many writes anyway. Meaning, in addition to estimating request syncing time, we also need a reasonable estimate of the server's state catch-up rate, instead of basing it purely on the current window. > ObserverNode should reject read requests when it is too far behind. > --- > > Key: HDFS-13873 > URL: https://issues.apache.org/jira/browse/HDFS-13873 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client, namenode >Affects Versions: HDFS-12943 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > > Add a server-side threshold for ObserverNode to reject read requests when it > is too far behind. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14017) ObserverReadProxyProviderWithIPFailover should work with HA configuration
[ https://issues.apache.org/jira/browse/HDFS-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689781#comment-16689781 ] Chen Liang commented on HDFS-14017: --- Hmmm...thanks for checking [~xkrogen]! > ObserverReadProxyProviderWithIPFailover should work with HA configuration > - > > Key: HDFS-14017 > URL: https://issues.apache.org/jira/browse/HDFS-14017 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-14017-HDFS-12943.001.patch, > HDFS-14017-HDFS-12943.002.patch, HDFS-14017-HDFS-12943.003.patch, > HDFS-14017-HDFS-12943.004.patch, HDFS-14017-HDFS-12943.005.patch, > HDFS-14017-HDFS-12943.006.patch, HDFS-14017-HDFS-12943.008.patch, > HDFS-14017-HDFS-12943.009.patch, HDFS-14017-HDFS-12943.010.patch, > HDFS-14017-HDFS-12943.011.patch, HDFS-14017-HDFS-12943.012.patch, > HDFS-14017-HDFS-12943.013.patch, HDFS-14017-HDFS-12943.014.patch > > > Currently {{ObserverReadProxyProviderWithIPFailover}} extends > {{ObserverReadProxyProvider}}, and the only difference is changing the proxy > factory to use {{IPFailoverProxyProvider}}. However this is not enough > because when calling the constructor of {{ObserverReadProxyProvider}} in > super(...), the following line: > {code:java} > nameNodeProxies = getProxyAddresses(uri, > HdfsClientConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY); > {code} > will try to resolve all the configured NN addresses to do configured > failover. But in the case of IPFailover, this does not really apply. > > A second, closely related issue is about delegation tokens. For example, in > the current IPFailover setup, say we have a virtual host nn.xyz.com, which points > to either of two physical nodes nn1.xyz.com or nn2.xyz.com. In current HDFS, > there is always only one DT being exchanged, which has hostname nn.xyz.com. > The server only issues this DT, and the client only knows the host nn.xyz.com, so all > is good. 
But in Observer read, even with IPFailover, the client will no > longer contact nn.xyz.com, but will actively reach out to nn1.xyz.com and > nn2.xyz.com. During this process, the current code will look for a DT associated > with hostname nn1.xyz.com or nn2.xyz.com, which is different from the DT > given by the NN, causing token authentication to fail. This happens in > {{AbstractDelegationTokenSelector#selectToken}}. The new IPFailover proxy > provider will need to resolve this as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14058) Test reads from standby on a secure cluster with IP failover
[ https://issues.apache.org/jira/browse/HDFS-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689993#comment-16689993 ] Chen Liang commented on HDFS-14058: --- The tests I've run include the following. Please note that the following tests were done without several recent changes such as HDFS-14035 and HDFS-14017, but with some hacky code changes and workarounds. Although the required changes have been formalized in recent Jiras, the following tests haven't all been re-run along with those changes. Posting here for the record. The tests were done with a setup of 100+ datanodes, 1 Active NameNode and 1 Observer NameNode. No other standby nodes. The cluster has a light HDFS workload, has YARN deployed, and has security (Kerberos) enabled. The purpose here was not to evaluate performance gain, but only to prove the functionality. In all the tests below, it is verified from the Observer node audit log that the reads actually went to the Observer node. 1. basic hdfs IO - From the hdfs command: -- create/delete directory -- basic file put/get/delete - From a simple Java program. I wrote some code which creates a DFSClient instance and performs some basic operations against it: -- create/delete directory -- get/renew delegation token One observation on this is that, from the command line, depending on the relative order of ANN and ONN in the config, the failover may happen every single time, with an exception printed. I believe this is because every single command line call will create a new DFSClient instance, which may start by calling the Observer for a write, causing a failover. But for a reused DFSClient (e.g. from a Java program that creates and reuses the same DFSClient), this issue does not occur. 2. simple MR job: a simple wordcount job from the mapreduce-examples jar, on a very small input. 3. SliveTest: ran Slive from the hadoop-mapreduce-client-jobclient jar, without parameters (so it uses defaults). I ran Slive 3 times for both with Observer enabled and disabled. 
I saw roughly the same ops/sec. 4.DFSIO: ran DFSIO read test several times from hadoop-mapreduce-client-jobclient jar, but only with very small input size. (10 files with 1KB each). 5. TeraGen/Sort/Validate: ran TeraGen/Sort/Validate from hadoop-mapreduce-examples jar with 1TB of data. TeraSort used 1800+ mappers and 500 reducers. All three jobs finished successfully. > Test reads from standby on a secure cluster with IP failover > > > Key: HDFS-14058 > URL: https://issues.apache.org/jira/browse/HDFS-14058 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Konstantin Shvachko >Assignee: Chen Liang >Priority: Major > > Run standard HDFS tests to verify reading from ObserverNode on a secure HA > cluster with {{IPFailoverProxyProvider}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
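The CLI observation above (a fresh DFSClient per command may start at the Observer and pay one failover on every invocation, while a reused client remembers the Active) can be modeled with a small self-contained sketch. This is a toy model, not Hadoop code; the classes and probing behavior are illustrative assumptions only.

```java
import java.util.List;

/**
 * Toy model of the per-command failover cost: a freshly created client probes
 * namenodes in configuration order, so if an Observer is listed first, every
 * new client pays a failed write plus a failover; a reused client keeps
 * talking to the Active it already found. Not Hadoop code.
 */
public class FailoverCostSketch {
    enum Role { ACTIVE, OBSERVER }

    static class Client {
        private final List<Role> nodes;
        private int current = 0;   // index of the node we currently talk to
        int failovers = 0;         // failed write attempts observed

        Client(List<Role> nodes) { this.nodes = nodes; }

        void write() {
            while (nodes.get(current) != Role.ACTIVE) { // Observer rejects writes
                failovers++;
                current = (current + 1) % nodes.size();
            }
        }
    }

    public static void main(String[] args) {
        List<Role> conf = List.of(Role.OBSERVER, Role.ACTIVE);

        // CLI pattern: a new client per command -> one failover per command.
        int cliFailovers = 0;
        for (int cmd = 0; cmd < 3; cmd++) {
            Client fresh = new Client(conf);
            fresh.write();
            cliFailovers += fresh.failovers;
        }

        // Reused client: only the very first write pays the failover.
        Client reused = new Client(conf);
        for (int cmd = 0; cmd < 3; cmd++) { reused.write(); }

        System.out.println(cliFailovers + " vs " + reused.failovers); // 3 vs 1
    }
}
```

This matches the test report: long-lived clients (MR jobs, a reused DFSClient in a Java program) amortize the discovery cost, while per-command CLI invocations pay it every time.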
[jira] [Commented] (HDFS-14181) Suspect there is a bug in NetworkTopology.java chooseRandom function.
[ https://issues.apache.org/jira/browse/HDFS-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732304#comment-16732304 ] Chen Liang commented on HDFS-14181: --- Sorry for the late response, just got back from vacation. The fix seems correct, v005 patch LGTM. > Suspect there is a bug in NetworkTopology.java chooseRandom function. > - > > Key: HDFS-14181 > URL: https://issues.apache.org/jira/browse/HDFS-14181 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Affects Versions: 2.9.2 >Reporter: Sihai Ke >Assignee: Sihai Ke >Priority: Major > Attachments: 0001-add-UT-for-NetworkTopology.patch, > 0001-fix-NetworkTopology.java-chooseRandom-bug.patch, HDFS-14181.01.patch, > HDFS-14181.02.patch, HDFS-14181.03.patch, HDFS-14181.04.patch, > HDFS-14181.05.patch, image-2018-12-29-15-02-19-415.png > > > While reading the hadoop NetworkTopology.java, I suspect there is a bug in > the function chooseRandom (line 498, hadoop version 2.9.2-RC0). > I think there is a bug in {color:#f79232}this code: "~" + excludedScope does not count the > availableNodes under the scope node, and I also added a unit test for this and got an > exception.{color} > The buggy code is in the else branch: > {code:java} > // code placeholder > if (excludedScope == null) { > availableNodes = countNumOfAvailableNodes(scope, excludedNodes); > } else { > availableNodes = > countNumOfAvailableNodes("~" + excludedScope, excludedNodes); > }{code} > Source code: > {code:java} > // code placeholder > protected Node chooseRandom(final String scope, String excludedScope, > final Collection excludedNodes) { > if (excludedScope != null) { > if (scope.startsWith(excludedScope)) { > return null; > } > if (!excludedScope.startsWith(scope)) { > excludedScope = null; > } > } > Node node = getNode(scope); > if (!(node instanceof InnerNode)) { > return excludedNodes != null && excludedNodes.contains(node) ? 
> null : node; > } > InnerNode innerNode = (InnerNode)node; > int numOfDatanodes = innerNode.getNumOfLeaves(); > if (excludedScope == null) { > node = null; > } else { > node = getNode(excludedScope); > if (!(node instanceof InnerNode)) { > numOfDatanodes -= 1; > } else { > numOfDatanodes -= ((InnerNode)node).getNumOfLeaves(); > } > } > if (numOfDatanodes <= 0) { > LOG.debug("Failed to find datanode (scope=\"{}\" excludedScope=\"{}\")." > + " numOfDatanodes={}", > scope, excludedScope, numOfDatanodes); > return null; > } > final int availableNodes; > if (excludedScope == null) { > availableNodes = countNumOfAvailableNodes(scope, excludedNodes); > } else { > availableNodes = > countNumOfAvailableNodes("~" + excludedScope, excludedNodes); > } > LOG.debug("Choosing random from {} available nodes on node {}," > + " scope={}, excludedScope={}, excludeNodes={}. numOfDatanodes={}.", > availableNodes, innerNode, scope, excludedScope, excludedNodes, > numOfDatanodes); > Node ret = null; > if (availableNodes > 0) { > ret = chooseRandom(innerNode, node, excludedNodes, numOfDatanodes, > availableNodes); > } > LOG.debug("chooseRandom returning {}", ret); > return ret; > } > {code} > > > Add Unit Test in TestClusterTopology.java, but get exception. 
> > {code:java} > // code placeholder > @Test > public void testChooseRandom1() { > // create the topology > NetworkTopology cluster = NetworkTopology.getInstance(new Configuration()); > NodeElement node1 = getNewNode("node1", "/a1/b1/c1"); > cluster.add(node1); > NodeElement node2 = getNewNode("node2", "/a1/b1/c1"); > cluster.add(node2); > NodeElement node3 = getNewNode("node3", "/a1/b1/c2"); > cluster.add(node3); > NodeElement node4 = getNewNode("node4", "/a1/b2/c3"); > cluster.add(node4); > Node node = cluster.chooseRandom("/a1/b1", "/a1/b1/c1", null); > assertSame(node.getName(), "node3"); > } > {code} > > Exception: > {code:java} > // code placeholder > java.lang.IllegalArgumentException: 1 should >= 2, and both should be > positive. > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at > org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:567) > at > org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:544) > at org.apache.hadoop.net.TestClusterTopology.testChooseRandom1(TestClusterTopology.java:198) > {code} > > {color:#f79232}!image-2018-12-29-15-02-19-415.png!{color} > > > [~vagarychen] this change was introduced in PR HDFS-11577, could you help to > check whether this is a bug? > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
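The "1 should >= 2" exception above follows from a count mismatch that can be reproduced with a toy version of the two counts (plain Java, not the actual Hadoop classes): numOfDatanodes is computed as leaves under scope minus leaves under excludedScope, while the buggy availableNodes uses "~" + excludedScope over the whole cluster.

```java
import java.util.Map;

/**
 * Toy reproduction of the count mismatch in chooseRandom, using the topology
 * from the unit test above. numOfDatanodes is counted under scope minus
 * excludedScope (= 1, only node3), while the buggy availableNodes counts
 * everything outside excludedScope cluster-wide (= 2, node3 and node4),
 * tripping the "1 should >= 2" precondition. Not the Hadoop classes.
 */
public class ChooseRandomCountSketch {
    static final Map<String, String> NODE_PATHS = Map.of(
        "node1", "/a1/b1/c1",
        "node2", "/a1/b1/c1",
        "node3", "/a1/b1/c2",
        "node4", "/a1/b2/c3");

    /** Number of leaves whose rack path falls under the given scope. */
    static long countUnder(String scope) {
        return NODE_PATHS.values().stream().filter(p -> p.startsWith(scope)).count();
    }

    public static void main(String[] args) {
        String scope = "/a1/b1", excludedScope = "/a1/b1/c1";

        // What chooseRandom computes for numOfDatanodes: 3 - 2 = 1 (node3).
        long numOfDatanodes = countUnder(scope) - countUnder(excludedScope);

        // Buggy availableNodes via "~excludedScope": whole cluster minus
        // excludedScope = 4 - 2 = 2 (node3 AND node4, outside scope).
        long buggyAvailable = NODE_PATHS.size() - countUnder(excludedScope);

        // availableNodes (2) exceeds numOfDatanodes (1): precondition fails.
        System.out.println(numOfDatanodes + " vs " + buggyAvailable); // 1 vs 2
    }
}
```

The fix in the v005 patch restricts the exclusion count to nodes under scope, so the two counts agree; this sketch only demonstrates why the unpatched counts diverge.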
[jira] [Created] (HDFS-14205) Backport HDFS-6440 to branch-2
Chen Liang created HDFS-14205: - Summary: Backport HDFS-6440 to branch-2 Key: HDFS-14205 URL: https://issues.apache.org/jira/browse/HDFS-14205 Project: Hadoop HDFS Issue Type: Improvement Reporter: Chen Liang Currently support for more than 2 NameNodes (HDFS-6440) is only in branch-3. This JIRA aims to backport it to branch-2, as this is required by HDFS-12943 (consistent read from standby) backport to branch-2. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14204) Backport HDFS-12943 to branch-2
Chen Liang created HDFS-14204: - Summary: Backport HDFS-12943 to branch-2 Key: HDFS-14204 URL: https://issues.apache.org/jira/browse/HDFS-14204 Project: Hadoop HDFS Issue Type: Improvement Reporter: Chen Liang Currently, consistent read from standby feature (HDFS-12943) is only in trunk (branch-3). This JIRA aims to backport the feature to branch-2. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14204) Backport HDFS-12943 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-14204: -- External issue ID: hdfs-12943 > Backport HDFS-12943 to branch-2 > --- > > Key: HDFS-14204 > URL: https://issues.apache.org/jira/browse/HDFS-14204 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Chen Liang >Priority: Major > > Currently, consistent read from standby feature (HDFS-12943) is only in trunk > (branch-3). This JIRA aims to backport the feature to branch-2. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14142) Move ipfailover config key out of HdfsClientConfigKeys
[ https://issues.apache.org/jira/browse/HDFS-14142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-14142: -- Resolution: Fixed Fix Version/s: HDFS-12943 Status: Resolved (was: Patch Available) > Move ipfailover config key out of HdfsClientConfigKeys > -- > > Key: HDFS-14142 > URL: https://issues.apache.org/jira/browse/HDFS-14142 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Minor > Fix For: HDFS-12943 > > Attachments: HDFS-14142-HDFS-12943.001.patch > > > Running TestHdfsConfigFields throws error complaining missing key > dfs.client.failover.ipfailover.virtual-address. Since this config key is > specific to only ORFPPwithIP, This Jira moves this config prefix to > ObserverReadProxyProviderWithIPFailover. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14142) Move ipfailover config key out of HdfsClientConfigKeys
[ https://issues.apache.org/jira/browse/HDFS-14142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719300#comment-16719300 ] Chen Liang commented on HDFS-14142: --- The checkstyle warning can be fixed by adding final keyword to the key string. I fixed and verified with checkstyle locally. I've committed to the feature branch, thanks [~shv] for the review! > Move ipfailover config key out of HdfsClientConfigKeys > -- > > Key: HDFS-14142 > URL: https://issues.apache.org/jira/browse/HDFS-14142 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Minor > Fix For: HDFS-12943 > > Attachments: HDFS-14142-HDFS-12943.001.patch > > > Running TestHdfsConfigFields throws error complaining missing key > dfs.client.failover.ipfailover.virtual-address. Since this config key is > specific to only ORFPPwithIP, This Jira moves this config prefix to > ObserverReadProxyProviderWithIPFailover. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13617) Allow wrapping NN QOP into token in encrypted message
[ https://issues.apache.org/jira/browse/HDFS-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-13617: -- Status: Patch Available (was: In Progress) > Allow wrapping NN QOP into token in encrypted message > - > > Key: HDFS-13617 > URL: https://issues.apache.org/jira/browse/HDFS-13617 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-13617.001.patch, HDFS-13617.002.patch, > HDFS-13617.003.patch, HDFS-13617.004.patch > > > This Jira allows NN to configurably wrap the QOP it has established with the > client into the token message sent back to the client. The QOP is sent back > in encrypted message, using BlockAccessToken encryption key as the key. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13617) Allow wrapping NN QOP into token in encrypted message
[ https://issues.apache.org/jira/browse/HDFS-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715323#comment-16715323 ] Chen Liang commented on HDFS-13617: --- Have been busy with another project...coming back to this. Post v004 patch for rebase. > Allow wrapping NN QOP into token in encrypted message > - > > Key: HDFS-13617 > URL: https://issues.apache.org/jira/browse/HDFS-13617 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-13617.001.patch, HDFS-13617.002.patch, > HDFS-13617.003.patch, HDFS-13617.004.patch > > > This Jira allows NN to configurably wrap the QOP it has established with the > client into the token message sent back to the client. The QOP is sent back > in encrypted message, using BlockAccessToken encryption key as the key. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13617) Allow wrapping NN QOP into token in encrypted message
[ https://issues.apache.org/jira/browse/HDFS-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-13617: -- Attachment: HDFS-13617.004.patch > Allow wrapping NN QOP into token in encrypted message > - > > Key: HDFS-13617 > URL: https://issues.apache.org/jira/browse/HDFS-13617 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-13617.001.patch, HDFS-13617.002.patch, > HDFS-13617.003.patch, HDFS-13617.004.patch > > > This Jira allows NN to configurably wrap the QOP it has established with the > client into the token message sent back to the client. The QOP is sent back > in encrypted message, using BlockAccessToken encryption key as the key. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14146) Handle exception from internalQueueCall
[ https://issues.apache.org/jira/browse/HDFS-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719556#comment-16719556 ] Chen Liang commented on HDFS-14146: --- Thanks [~csun] for reporting! Interesting... in what situation did you hit this issue? > Handle exception from internalQueueCall > --- > > Key: HDFS-14146 > URL: https://issues.apache.org/jira/browse/HDFS-14146 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ipc >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Critical > Attachments: HDFS-14146-HDFS-12943.000.patch > > > When we re-queue an RPC call, {{internalQueueCall}} will potentially throw > exceptions (e.g., RPC backoff), which are then swallowed. This will cause the > RPC to be silently discarded without response to the client, which is not > good. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14146) Handle exception from internalQueueCall
[ https://issues.apache.org/jira/browse/HDFS-14146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719613#comment-16719613 ] Chen Liang commented on HDFS-14146: --- This is a good discussion and a good point on the deadlock possibility. Even without considering the possible deadlock, handlers should probably never be exposed to potential blocking when requeuing requests. Just to clarify, it seems to me we have two things to ensure about handler requeuing here: 1. it never blocks; 2. if requeuing throws an exception, it is handled as it should be. Thanks Chao for working on this critical issue! > Handle exception from internalQueueCall > --- > > Key: HDFS-14146 > URL: https://issues.apache.org/jira/browse/HDFS-14146 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ipc >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Critical > Attachments: HDFS-14146-HDFS-12943.000.patch > > > When we re-queue an RPC call, {{internalQueueCall}} will potentially throw > exceptions (e.g., RPC backoff), which are then swallowed. This will cause the > RPC to be silently discarded without response to the client, which is not > good. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
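The two requirements above (never block, and surface requeue failures to the client instead of swallowing them) can be sketched as follows. This is a self-contained illustration with hypothetical names, not the Hadoop IPC code; the key idea is that a non-blocking offer() fails fast where a blocking put() would tie up the handler.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Sketch of safe handler-side requeuing: (1) never block the handler thread,
 * (2) if the call cannot be requeued, answer the client with an error rather
 * than silently dropping the RPC. Names are hypothetical, not Hadoop IPC.
 */
public class RequeueSketch {
    static class Call {
        final int id;
        String response; // set when the call is answered, including on error
        Call(int id) { this.id = id; }
    }

    private final BlockingQueue<Call> callQueue;

    RequeueSketch(int capacity) {
        this.callQueue = new ArrayBlockingQueue<>(capacity);
    }

    /** Requeue without blocking; on failure, answer the client with an error. */
    boolean requeue(Call call) {
        boolean queued = callQueue.offer(call); // non-blocking, unlike put()
        if (!queued) {
            // Do not swallow the failure: the client must get a response.
            call.response = "ERROR: server busy, please retry";
        }
        return queued;
    }

    public static void main(String[] args) {
        RequeueSketch server = new RequeueSketch(1);
        Call a = new Call(1), b = new Call(2);
        System.out.println(server.requeue(a)); // true: queue had room
        System.out.println(server.requeue(b)); // false: full, b gets an error
        System.out.println(b.response);
    }
}
```

In a real server the error branch would also need to cover exceptions thrown by backoff policies, not just a full queue, but the shape is the same: every failure path ends with a response sent to the client.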
[jira] [Commented] (HDFS-14116) Fix a potential class cast error in ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719657#comment-16719657 ] Chen Liang commented on HDFS-14116: --- Thanks for the update [~csun]! v002 patch looks pretty good overall. Just a couple of minor comments: 1. in {{AbstractNNFailoverProxyProvider}}, can we change {{factory instanceof ClientHAProxyFactory}} to {{pi.proxy instanceof ClientProtocol}}? This seems clearer to me and makes fewer assumptions about ClientHAProxyFactory. 2. maybe change the name {{clientProxy}} to something like {{serviceStateProxy}} to be more informative? We had something similar before. > Fix a potential class cast error in ObserverReadProxyProvider > - > > Key: HDFS-14116 > URL: https://issues.apache.org/jira/browse/HDFS-14116 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Chen Liang >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-14116-HDFS-12943.000.patch, > HDFS-14116-HDFS-12943.001.patch, HDFS-14116-HDFS-12943.002.patch > > > Currently in {{ObserverReadProxyProvider}} constructor there is this line > {code} > ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext); > {code} > This could potentially cause failure, because it is possible that factory can > not be casted here. Specifically, > {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the > constructor will be called, and there are two paths that could call into this: > (1).{{NameNodeProxies.createProxy}} > (2).{{NameNodeProxiesClient.createFailoverProxyProvider}} > (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses > {{NameNodeHAProxyFactory}} which can not be casted to > {{ClientHAProxyFactory}}, this happens when, for example, running > NNThroughputBenmarck. To fix this we can at least: > 1. introduce setAlignmentContext to HAProxyFactory which is the parent of > both ClientHAProxyFactory and NameNodeHAProxyFactory OR > 2. 
only setAlignmentContext when it is ClientHAProxyFactory by, say, having a > if check with reflection. > Depending on whether it make sense to have alignment context for the case (1) > calling code paths. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
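The review suggestion in comment 1 above (gating on the created proxy's protocol rather than casting the factory) can be sketched with stand-in types. These interfaces are hypothetical simplifications, not the real Hadoop classes:

```java
/**
 * Sketch of checking the proxy's protocol instead of casting the factory.
 * ((ClientHAProxyFactory) factory).setAlignmentContext(...) throws
 * ClassCastException when a NameNodeHAProxyFactory is in play; checking
 * pi.proxy instanceof ClientProtocol avoids that. Stand-in types only.
 */
public class ProtocolCheckSketch {
    interface ClientProtocol {}     // stand-in for the HDFS client protocol
    interface AlignmentContext {}   // stand-in for the state-id context

    static class ProxyInfo {
        final Object proxy;
        ProxyInfo(Object proxy) { this.proxy = proxy; }
    }

    /**
     * Attach the alignment context only when the proxy actually speaks the
     * client protocol; returns whether the wiring happened.
     */
    static boolean wireAlignmentContext(ProxyInfo pi, AlignmentContext ctx) {
        if (pi.proxy instanceof ClientProtocol) {
            // safe to attach the alignment context for observer reads
            return true;
        }
        return false; // e.g. non-client protocols used by NNThroughputBenchmark
    }

    public static void main(String[] args) {
        ProxyInfo client = new ProxyInfo(new ClientProtocol() {});
        ProxyInfo other = new ProxyInfo(new Object());
        AlignmentContext ctx = new AlignmentContext() {};
        System.out.println(wireAlignmentContext(client, ctx)); // true
        System.out.println(wireAlignmentContext(other, ctx));  // false
    }
}
```

The design point is that instanceof on the proxy tests the actual capability needed (the protocol), while instanceof on the factory tests only how the proxy happened to be built.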
[jira] [Commented] (HDFS-14116) Fix a potential class cast error in ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720720#comment-16720720 ] Chen Liang commented on HDFS-14116: --- Just one more trivial thing: can we fix at least the third checkstyle warning? I think it just requires adding 'private' to the service proxy variable. +1 with this fixed. I've also run the failed tests locally and they all passed. > Fix a potential class cast error in ObserverReadProxyProvider > - > > Key: HDFS-14116 > URL: https://issues.apache.org/jira/browse/HDFS-14116 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Chen Liang >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-14116-HDFS-12943.000.patch, > HDFS-14116-HDFS-12943.001.patch, HDFS-14116-HDFS-12943.002.patch, > HDFS-14116-HDFS-12943.003.patch > > > Currently in {{ObserverReadProxyProvider}} constructor there is this line > {code} > ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext); > {code} > This could potentially cause failure, because it is possible that factory can > not be casted here. Specifically, > {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the > constructor will be called, and there are two paths that could call into this: > (1).{{NameNodeProxies.createProxy}} > (2).{{NameNodeProxiesClient.createFailoverProxyProvider}} > (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses > {{NameNodeHAProxyFactory}} which can not be casted to > {{ClientHAProxyFactory}}, this happens when, for example, running > NNThroughputBenmarck. To fix this we can at least: > 1. introduce setAlignmentContext to HAProxyFactory which is the parent of > both ClientHAProxyFactory and NameNodeHAProxyFactory OR > 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having a > if check with reflection. > Depending on whether it make sense to have alignment context for the case (1) > calling code paths. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14116) ObserverReadProxyProvider should work with protocols other than ClientProtocol
[ https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723327#comment-16723327 ] Chen Liang commented on HDFS-14116: --- I had an offline discussion with [~shv]. So it looks like we were trying to resolve the cast exception in this Jira and ended up repurposing it to extend ORFPP to support non-client protocols. This may not be the right change because, by design, the Observer is meant only for client protocol operations and nothing else. We have not designed ORFPP for any other protocol up to this point. So it might actually be the right thing to just throw an exception if ORFPP is used for any protocol other than ClientProtocol, and make changes to NNThroughputBenchmark like in the v000 patch. > ObserverReadProxyProvider should work with protocols other than ClientProtocol > -- > > Key: HDFS-14116 > URL: https://issues.apache.org/jira/browse/HDFS-14116 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Chen Liang >Assignee: Chao Sun >Priority: Major > Fix For: HDFS-12943 > > Attachments: HDFS-14116-HDFS-12943.000.patch, > HDFS-14116-HDFS-12943.001.patch, HDFS-14116-HDFS-12943.002.patch, > HDFS-14116-HDFS-12943.003.patch, HDFS-14116-HDFS-12943.004.patch > > > Currently in {{ObserverReadProxyProvider}} constructor there is this line > {code} > ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext); > {code} > This could potentially cause failure, because it is possible that factory can > not be casted here. 
Specifically, > {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the > constructor will be called, and there are two paths that could call into this: > (1).{{NameNodeProxies.createProxy}} > (2).{{NameNodeProxiesClient.createFailoverProxyProvider}} > (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses > {{NameNodeHAProxyFactory}} which can not be casted to > {{ClientHAProxyFactory}}, this happens when, for example, running > NNThroughputBenmarck. To fix this we can at least: > 1. introduce setAlignmentContext to HAProxyFactory which is the parent of > both ClientHAProxyFactory and NameNodeHAProxyFactory OR > 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having a > if check with reflection. > Depending on whether it make sense to have alignment context for the case (1) > calling code paths. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723272#comment-16723272 ] Chen Liang edited comment on HDFS-12943 at 12/17/18 10:45 PM: -- Hi [~brahmareddy], Thanks for testing! The timeout issue seems interesting. To start with, it is expected to see some performance degradation *from CLI*, because the CLI initiates a DFSClient for each command, and a fresh DFSClient has to get the status of the name nodes every time. But if it is the same DFSClient being reused, this would not be an issue. I have never seen the second-call issue. Here is an output from our cluster (log output part omitted), and I think you are right about lowering dfs.ha.tail-edits.period; we had similar numbers here: {code:java} $time hdfs --loglevel debug dfs -Ddfs.client.failover.proxy.provider.***=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider -mkdir /TestsORF1 real0m2.254s user0m3.608s sys 0m0.331s $time hdfs --loglevel debug dfs -Ddfs.client.failover.proxy.provider.***=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider -mkdir /TestsORF2 real0m2.159s user0m3.855s sys 0m0.330s{code} Curious, how many NNs did you have in the testing? And was there any error in the NN logs? was (Author: vagarychen): Hi [~brahmareddy], Thanks for testing! The timeout issue seems interesting. To start with, it is expected to see some performance degradation *from CLI*, because CLI initiates a DFSClient every time for each command, a fresh DFSClient has to get status of name nodes every time. But if it is the same DFSClient being reused, this would not be an issue. I have never seen the second-call issue. 
Here is an output from our cluster (log outpu part omitted), and I think you are right about lowering dfs.ha.tail-edits.period, we had similar numbers here: {code:java} $time hdfs --loglevel debug dfs -Ddfs.client.failover.proxy.provider.ltx1-unonn01=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider -mkdir /TestsORF1 real0m2.254s user0m3.608s sys 0m0.331s $time hdfs --loglevel debug dfs -Ddfs.client.failover.proxy.provider.ltx1-unonn01=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider -mkdir /TestsORF2 real0m2.159s user0m3.855s sys 0m0.330s{code} ** Curious, how many NN you had in the testing? and was there any error from NN logs? > Consistent Reads from Standby Node > -- > > Key: HDFS-12943 > URL: https://issues.apache.org/jira/browse/HDFS-12943 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs >Reporter: Konstantin Shvachko >Priority: Major > Attachments: ConsistentReadsFromStandbyNode.pdf, > ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch, > HDFS-12943-002.patch, TestPlan-ConsistentReadsFromStandbyNode.pdf > > > StandbyNode in HDFS is a replica of the active NameNode. The states of the > NameNodes are coordinated via the journal. It is natural to consider > StandbyNode as a read-only replica. As with any replicated distributed system > the problem of stale reads should be resolved. Our main goal is to provide > reads from standby in a consistent way in order to enable a wide range of > existing applications running on top of HDFS. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
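The per-command cost described above can be shown with a small dependency-free simulation. SimClient and its counter are hypothetical stand-ins (not Hadoop APIs): a fresh client pays the name-node status lookup on every command, while a reused client pays it once.

```java
// Simulates the CLI-vs-reused-client difference discussed in the comment.
// SimClient is a hypothetical stand-in for DFSClient; its constructor models
// the one-time name-node status lookup done at proxy initialization.
class SimClient {
  static int stateLookups = 0;
  SimClient() { stateLookups++; }   // status lookup happens only at init
  void mkdir(String path) { }       // regular operations do no extra lookup
}

public class ClientReuseSketch {
  public static void main(String[] args) {
    // CLI pattern: a fresh client per command -> one lookup per command.
    for (int i = 0; i < 3; i++) {
      new SimClient().mkdir("/TestsORF" + i);
    }
    System.out.println(SimClient.stateLookups); // prints 3

    // Reused client: the lookup cost is paid once and amortized.
    SimClient.stateLookups = 0;
    SimClient shared = new SimClient();
    for (int i = 0; i < 3; i++) {
      shared.mkdir("/TestsORF" + i);
    }
    System.out.println(SimClient.stateLookups); // prints 1
  }
}
```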
[jira] [Commented] (HDFS-14116) ObserverReadProxyProvider should work with protocols other than ClientProtocol
[ https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723521#comment-16723521 ] Chen Liang commented on HDFS-14116: --- Thanks [~shv] for the patch! +1 from me on the v005 patch > ObserverReadProxyProvider should work with protocols other than ClientProtocol > -- > > Key: HDFS-14116 > URL: https://issues.apache.org/jira/browse/HDFS-14116 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Chen Liang >Assignee: Chao Sun >Priority: Major > Fix For: HDFS-12943 > > Attachments: HDFS-14116-HDFS-12943.000.patch, > HDFS-14116-HDFS-12943.001.patch, HDFS-14116-HDFS-12943.002.patch, > HDFS-14116-HDFS-12943.003.patch, HDFS-14116-HDFS-12943.004.patch, > HDFS-14116-HDFS-12943.005.patch > > > Currently in {{ObserverReadProxyProvider}} constructor there is this line > {code} > ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext); > {code} > This could potentially cause failure, because it is possible that the factory > cannot be cast here. Specifically, > {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the > constructor will be called, and there are two paths that could call into this: > (1).{{NameNodeProxies.createProxy}} > (2).{{NameNodeProxiesClient.createFailoverProxyProvider}} > (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses > {{NameNodeHAProxyFactory}} which cannot be cast to > {{ClientHAProxyFactory}}; this happens when, for example, running > NNThroughputBenchmark. To fix this we can at least: > 1. introduce setAlignmentContext to HAProxyFactory which is the parent of > both ClientHAProxyFactory and NameNodeHAProxyFactory OR > 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having an > if check with reflection. > Depending on whether it makes sense to have alignment context for the case (1) > calling code paths. 
[jira] [Commented] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723272#comment-16723272 ] Chen Liang commented on HDFS-12943: --- Hi [~brahmareddy], Thanks for testing! The timeout issue seems interesting. To start with, it is expected to see some performance degradation *from CLI*, because the CLI initiates a DFSClient for each command, and a fresh DFSClient has to get the status of the name nodes every time. But if it is the same DFSClient being reused, this would not be an issue. I have never seen the second-call issue. Here is an output from our cluster (log output part omitted), and I think you are right about lowering dfs.ha.tail-edits.period, we had similar numbers here:
{code:java}
$ time hdfs --loglevel debug dfs -Ddfs.client.failover.proxy.provider.ltx1-unonn01=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider -mkdir /TestsORF1
real 0m2.254s
user 0m3.608s
sys  0m0.331s

$ time hdfs --loglevel debug dfs -Ddfs.client.failover.proxy.provider.ltx1-unonn01=org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider -mkdir /TestsORF2
real 0m2.159s
user 0m3.855s
sys  0m0.330s{code}
Curious, how many NNs did you have in the testing? And was there any error in the NN logs? > Consistent Reads from Standby Node > -- > > Key: HDFS-12943 > URL: https://issues.apache.org/jira/browse/HDFS-12943 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs >Reporter: Konstantin Shvachko >Priority: Major > Attachments: ConsistentReadsFromStandbyNode.pdf, > ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch, > HDFS-12943-002.patch, TestPlan-ConsistentReadsFromStandbyNode.pdf > > > StandbyNode in HDFS is a replica of the active NameNode. The states of the > NameNodes are coordinated via the journal. It is natural to consider > StandbyNode as a read-only replica. As with any replicated distributed system > the problem of stale reads should be resolved. 
Our main goal is to provide > reads from standby in a consistent way in order to enable a wide range of > existing applications running on top of HDFS. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14116) Fix a potential class cast error in ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715883#comment-16715883 ] Chen Liang commented on HDFS-14116: --- I applied the patch and reran NNThroughputBenchmark. The run did work, but I had to change fs.defaultFS from the name service ID to the virtual IP address. I was trying to understand why. [~csun] I wonder, is there a specific reason why the patch uses createNonHAProxy instead of createProxy? I can try changing this and see if it works with the name service ID. > Fix a potential class cast error in ObserverReadProxyProvider > - > > Key: HDFS-14116 > URL: https://issues.apache.org/jira/browse/HDFS-14116 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Chen Liang >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-14116-HDFS-12943.000.patch > > > Currently in {{ObserverReadProxyProvider}} constructor there is this line > {code} > ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext); > {code} > This could potentially cause failure, because it is possible that the factory > cannot be cast here. Specifically, > {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the > constructor will be called, and there are two paths that could call into this: > (1).{{NameNodeProxies.createProxy}} > (2).{{NameNodeProxiesClient.createFailoverProxyProvider}} > (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses > {{NameNodeHAProxyFactory}} which cannot be cast to > {{ClientHAProxyFactory}}; this happens when, for example, running > NNThroughputBenchmark. To fix this we can at least: > 1. introduce setAlignmentContext to HAProxyFactory which is the parent of > both ClientHAProxyFactory and NameNodeHAProxyFactory OR > 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having an > if check with reflection. > Depending on whether it makes sense to have alignment context for the case (1) > calling code paths. 
[jira] [Comment Edited] (HDFS-14116) Fix a potential class cast error in ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715883#comment-16715883 ] Chen Liang edited comment on HDFS-14116 at 12/11/18 1:04 AM: - I applied the patch and reran NNThroughputBenchmark. The run did work, but I had to change fs.defaultFS from the name service ID to the virtual IP address. I was trying to understand why. [~csun] I wonder, is there a specific reason why the patch uses createNonHAProxy instead of createProxy? I haven't looked into detail; not sure if createProxy actually works here though. was (Author: vagarychen): I applied the patch and reran NNThroughputBenchmark. The run did work, but I had to change fs.defaultFS from the name service ID to the virtual IP address. I was trying to understand why. [~csun] I wonder, is there a specific reason why the patch uses createNonHAProxy instead of createProxy? I can try changing this and see if it works with the name service ID. > Fix a potential class cast error in ObserverReadProxyProvider > - > > Key: HDFS-14116 > URL: https://issues.apache.org/jira/browse/HDFS-14116 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Chen Liang >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-14116-HDFS-12943.000.patch > > > Currently in {{ObserverReadProxyProvider}} constructor there is this line > {code} > ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext); > {code} > This could potentially cause failure, because it is possible that the factory > cannot be cast here. 
Specifically, > {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the > constructor will be called, and there are two paths that could call into this: > (1).{{NameNodeProxies.createProxy}} > (2).{{NameNodeProxiesClient.createFailoverProxyProvider}} > (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses > {{NameNodeHAProxyFactory}} which cannot be cast to > {{ClientHAProxyFactory}}; this happens when, for example, running > NNThroughputBenchmark. To fix this we can at least: > 1. introduce setAlignmentContext to HAProxyFactory which is the parent of > both ClientHAProxyFactory and NameNodeHAProxyFactory OR > 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having an > if check with reflection. > Depending on whether it makes sense to have alignment context for the case (1) > calling code paths. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13617) Allow wrapping NN QOP into token in encrypted message
[ https://issues.apache.org/jira/browse/HDFS-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-13617: -- Attachment: HDFS-13617.005.patch > Allow wrapping NN QOP into token in encrypted message > - > > Key: HDFS-13617 > URL: https://issues.apache.org/jira/browse/HDFS-13617 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-13617.001.patch, HDFS-13617.002.patch, > HDFS-13617.003.patch, HDFS-13617.004.patch, HDFS-13617.005.patch > > > This Jira allows the NN to configurably wrap the QOP it has established with the > client into the token message sent back to the client. The QOP is sent back > in an encrypted message, using the BlockAccessToken encryption key as the key. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13617) Allow wrapping NN QOP into token in encrypted message
[ https://issues.apache.org/jira/browse/HDFS-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718001#comment-16718001 ] Chen Liang commented on HDFS-13617: --- Fixed the checkstyle, javac, and findbugs warnings in the v005 patch. > Allow wrapping NN QOP into token in encrypted message > - > > Key: HDFS-13617 > URL: https://issues.apache.org/jira/browse/HDFS-13617 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-13617.001.patch, HDFS-13617.002.patch, > HDFS-13617.003.patch, HDFS-13617.004.patch, HDFS-13617.005.patch > > > This Jira allows the NN to configurably wrap the QOP it has established with the > client into the token message sent back to the client. The QOP is sent back > in an encrypted message, using the BlockAccessToken encryption key as the key. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13873) ObserverNode should reject read requests when it is too far behind.
[ https://issues.apache.org/jira/browse/HDFS-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717750#comment-16717750 ] Chen Liang commented on HDFS-13873: --- I guess there are (at least) two directions: estimate-based and timeout-based. Timeout-based is simple: if a call has been stuck in the queue for too long (> X sec), reject it. The upside is simplicity, while the downside is that some resources get wasted: the client wait has already happened, and the server queue slot has also been occupied for X sec. Estimate-based means that upon seeing the request, the Observer makes a guess whether it will wait too long, and if so, rejects it right away. The upside is immediate rejection, with no wait or queuing. The downside is that the estimate had better be correct and reasonable, leaving no holes for a bad client to exploit... We can have some abstractions to allow different reject policies to be plugged in, and we can even combine both: on seeing the request, make an estimate, but even if the estimate is inaccurate and the request passes, the timeout still makes sure that the request won't stay in the queue indefinitely. For now, we can start with Konstantin's and Chao's strategies combined. > ObserverNode should reject read requests when it is too far behind. > --- > > Key: HDFS-13873 > URL: https://issues.apache.org/jira/browse/HDFS-13873 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client, namenode >Affects Versions: HDFS-12943 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > > Add a server-side threshold for ObserverNode to reject read requests when it > is too far behind. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
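The pluggable-policy idea above can be sketched as a small interface. All names here (ReadRejectPolicy and the concrete policies) are illustrative assumptions, not actual HDFS classes; the estimate check runs up front and the timeout check covers calls the estimate let through:

```java
// Hedged sketch: a pluggable reject policy combining the estimate-based and
// timeout-based directions discussed in the comment. Names are hypothetical.
interface ReadRejectPolicy {
  boolean shouldReject(long serverStateId, long clientStateId, long queuedMillis);
}

class EstimatePolicy implements ReadRejectPolicy {
  private final long maxLagTxns;
  EstimatePolicy(long maxLagTxns) { this.maxLagTxns = maxLagTxns; }
  public boolean shouldReject(long serverStateId, long clientStateId, long queuedMillis) {
    // Reject immediately if the Observer is estimated to be too far behind.
    return clientStateId - serverStateId > maxLagTxns;
  }
}

class TimeoutPolicy implements ReadRejectPolicy {
  private final long maxQueueMillis;
  TimeoutPolicy(long maxQueueMillis) { this.maxQueueMillis = maxQueueMillis; }
  public boolean shouldReject(long serverStateId, long clientStateId, long queuedMillis) {
    // Reject if the call has already waited too long in the queue.
    return queuedMillis > maxQueueMillis;
  }
}

class CombinedPolicy implements ReadRejectPolicy {
  private final ReadRejectPolicy[] policies;
  CombinedPolicy(ReadRejectPolicy... policies) { this.policies = policies; }
  public boolean shouldReject(long serverStateId, long clientStateId, long queuedMillis) {
    // Reject if any plugged-in policy says so.
    for (ReadRejectPolicy p : policies) {
      if (p.shouldReject(serverStateId, clientStateId, queuedMillis)) return true;
    }
    return false;
  }
}

public class RejectPolicySketch {
  public static void main(String[] args) {
    ReadRejectPolicy policy =
        new CombinedPolicy(new EstimatePolicy(1000), new TimeoutPolicy(5000));
    System.out.println(policy.shouldReject(100, 2000, 0));    // prints true: lag too large
    System.out.println(policy.shouldReject(100, 150, 6000));  // prints true: queued too long
    System.out.println(policy.shouldReject(100, 150, 10));    // prints false
  }
}
```

Even if the estimate is inaccurate and a call passes, the timeout policy still bounds its queue time, matching the combined strategy proposed in the comment.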
[jira] [Created] (HDFS-14142) Move ipfailover config key out of HdfsClientConfigKeys
Chen Liang created HDFS-14142: - Summary: Move ipfailover config key out of HdfsClientConfigKeys Key: HDFS-14142 URL: https://issues.apache.org/jira/browse/HDFS-14142 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen Liang Assignee: Chen Liang Running TestHdfsConfigFields throws an error complaining about the missing key dfs.client.failover.ipfailover.virtual-address. Since this config key is specific to ORFPPwithIP only, this Jira moves the config prefix to ObserverReadProxyProviderWithIPFailover. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14142) Move ipfailover config key out of HdfsClientConfigKeys
[ https://issues.apache.org/jira/browse/HDFS-14142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-14142: -- Attachment: HDFS-14142-HDFS-12943.001.patch > Move ipfailover config key out of HdfsClientConfigKeys > -- > > Key: HDFS-14142 > URL: https://issues.apache.org/jira/browse/HDFS-14142 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Minor > Attachments: HDFS-14142-HDFS-12943.001.patch > > > Running TestHdfsConfigFields throws an error complaining about the missing key > dfs.client.failover.ipfailover.virtual-address. Since this config key is > specific to ORFPPwithIP only, this Jira moves the config prefix to > ObserverReadProxyProviderWithIPFailover. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14142) Move ipfailover config key out of HdfsClientConfigKeys
[ https://issues.apache.org/jira/browse/HDFS-14142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-14142: -- Status: Patch Available (was: Open) > Move ipfailover config key out of HdfsClientConfigKeys > -- > > Key: HDFS-14142 > URL: https://issues.apache.org/jira/browse/HDFS-14142 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Minor > Attachments: HDFS-14142-HDFS-12943.001.patch > > > Running TestHdfsConfigFields throws an error complaining about the missing key > dfs.client.failover.ipfailover.virtual-address. Since this config key is > specific to ORFPPwithIP only, this Jira moves the config prefix to > ObserverReadProxyProviderWithIPFailover. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14138) Description errors in the comparison logic of transaction ID
[ https://issues.apache.org/jira/browse/HDFS-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720798#comment-16720798 ] Chen Liang commented on HDFS-14138: --- Hey [~xiangheng], thanks for looking through the code! I missed this Jira; you can file this Jira as a subtask of HDFS-12943, it would be a lot easier for us to notice and track :). This is indeed a typo by me, but I think the ongoing HDFS-14146 is fixing this also. > Description errors in the comparison logic of transaction ID > > > Key: HDFS-14138 > URL: https://issues.apache.org/jira/browse/HDFS-14138 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: HDFS-12943 >Reporter: xiangheng >Priority: Minor > Attachments: HDFS-14138-HDFS-12943.000.patch > > > The call processing should be postponed until the client call's state id is > aligned (<=) with the server state id, not >=. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
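The corrected comparison described in the issue can be shown in a one-line sketch. The canProcess name is illustrative, not the actual method touched by the patch:

```java
// Illustrates the fixed wording: an Observer should process a read call only
// once the client's last-seen state id is <= the server's applied state id;
// otherwise the call is postponed until the Observer catches up.
public class StateAlignmentSketch {
  static boolean canProcess(long serverStateId, long clientStateId) {
    // The client must not be ahead of the server's applied transactions.
    return clientStateId <= serverStateId;
  }

  public static void main(String[] args) {
    System.out.println(canProcess(100L, 90L));  // prints true
    System.out.println(canProcess(100L, 120L)); // prints false: postpone the call
  }
}
```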
[jira] [Updated] (HDFS-14138) Description errors in the comparison logic of transaction ID
[ https://issues.apache.org/jira/browse/HDFS-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-14138: -- Resolution: Fixed Fix Version/s: HDFS-12943 Status: Resolved (was: Patch Available) > Description errors in the comparison logic of transaction ID > > > Key: HDFS-14138 > URL: https://issues.apache.org/jira/browse/HDFS-14138 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-12943 >Reporter: xiangheng >Assignee: xiangheng >Priority: Minor > Fix For: HDFS-12943 > > Attachments: HDFS-14138-HDFS-12943.000.patch > > > The call processing should be postponed until the client call's state id is > aligned (<=) with the server state id, not >=. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14138) Description errors in the comparison logic of transaction ID
[ https://issues.apache.org/jira/browse/HDFS-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721786#comment-16721786 ] Chen Liang commented on HDFS-14138: --- Thanks [~csun] for chiming in! I've committed the v000 patch to the feature branch, thanks [~xiangheng] for the contribution! > Description errors in the comparison logic of transaction ID > > > Key: HDFS-14138 > URL: https://issues.apache.org/jira/browse/HDFS-14138 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-12943 >Reporter: xiangheng >Assignee: xiangheng >Priority: Minor > Fix For: HDFS-12943 > > Attachments: HDFS-14138-HDFS-12943.000.patch > > > The call processing should be postponed until the client call's state id is > aligned (<=) with the server state id, not >=. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725379#comment-16725379 ] Chen Liang commented on HDFS-12943: --- Hi [~brahmareddy], Some more notes to add: 1. getHAServiceState() only gets called during initialization of the client proxies (and of course when existing proxies fail and the client reinitializes them). In regular operation, this call will not happen, so it should not be a concern in benchmarks. 2. I tried the unit test you shared locally with Observer read enabled/disabled. I did not see a difference in terms of mkdir time; it has been about 2ms the whole time regardless. I saw some degradation on getContentSummary though. But this is because the unit test is doing mkdir -> getContentSummary -> getFileStatus -> repeat. So the client is constantly switching between write and read, and thus constantly switching between proxies (NNs). This is not the IO pattern Observer reads are mainly targeting, and probably the worst case for Observer reads, because every single getContentSummary call here could potentially trigger an Observer catch-up wait. > Consistent Reads from Standby Node > -- > > Key: HDFS-12943 > URL: https://issues.apache.org/jira/browse/HDFS-12943 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs >Reporter: Konstantin Shvachko >Priority: Major > Attachments: ConsistentReadsFromStandbyNode.pdf, > ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch, > HDFS-12943-002.patch, TestPlan-ConsistentReadsFromStandbyNode.pdf > > > StandbyNode in HDFS is a replica of the active NameNode. The states of the > NameNodes are coordinated via the journal. It is natural to consider > StandbyNode as a read-only replica. As with any replicated distributed system > the problem of stale reads should be resolved. Our main goal is to provide > reads from standby in a consistent way in order to enable a wide range of > existing applications running on top of HDFS. 
[jira] [Assigned] (HDFS-14138) Description errors in the comparison logic of transaction ID
[ https://issues.apache.org/jira/browse/HDFS-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang reassigned HDFS-14138: - Assignee: xiangheng > Description errors in the comparison logic of transaction ID > > > Key: HDFS-14138 > URL: https://issues.apache.org/jira/browse/HDFS-14138 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: HDFS-12943 >Reporter: xiangheng >Assignee: xiangheng >Priority: Minor > Attachments: HDFS-14138-HDFS-12943.000.patch > > > The call processing should be postponed until the client call's state id is > aligned (<=) with the server state id, not >=. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14138) Description errors in the comparison logic of transaction ID
[ https://issues.apache.org/jira/browse/HDFS-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-14138: -- Issue Type: Sub-task (was: Bug) Parent: HDFS-12943 > Description errors in the comparison logic of transaction ID > > > Key: HDFS-14138 > URL: https://issues.apache.org/jira/browse/HDFS-14138 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-12943 >Reporter: xiangheng >Assignee: xiangheng >Priority: Minor > Attachments: HDFS-14138-HDFS-12943.000.patch > > > The call processing should be postponed until the client call's state id is > aligned (<=) with the server state id, not >=. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14138) Description errors in the comparison logic of transaction ID
[ https://issues.apache.org/jira/browse/HDFS-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721657#comment-16721657 ] Chen Liang commented on HDFS-14138: --- Hi [~xiangheng], absolutely no worries, and welcome to the community! :) I've asked a Jira admin from our team to add you as a contributor. From this point on, you should be able to (and feel free to) assign a Hadoop/HDFS Jira you work on to yourself by setting the assignee of the Jira. And just for reference, by convention, if a Jira is found to be part of another one (which happens fairly often), you can just close the Jira, marking it as a duplicate. > Description errors in the comparison logic of transaction ID > > > Key: HDFS-14138 > URL: https://issues.apache.org/jira/browse/HDFS-14138 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: HDFS-12943 >Reporter: xiangheng >Assignee: xiangheng >Priority: Minor > Attachments: HDFS-14138-HDFS-12943.000.patch > > > The call processing should be postponed until the client call's state id is > aligned (<=) with the server state id, not >=. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14116) ObserverReadProxyProvider should work with protocols other than ClientProtocol
[ https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721686#comment-16721686 ] Chen Liang commented on HDFS-14116: --- I've committed the v004 patch to the feature branch, thanks [~csun] for the contribution! > ObserverReadProxyProvider should work with protocols other than ClientProtocol > -- > > Key: HDFS-14116 > URL: https://issues.apache.org/jira/browse/HDFS-14116 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Chen Liang >Assignee: Chao Sun >Priority: Major > Fix For: HDFS-12943 > > Attachments: HDFS-14116-HDFS-12943.000.patch, > HDFS-14116-HDFS-12943.001.patch, HDFS-14116-HDFS-12943.002.patch, > HDFS-14116-HDFS-12943.003.patch, HDFS-14116-HDFS-12943.004.patch > > > Currently in {{ObserverReadProxyProvider}} constructor there is this line > {code} > ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext); > {code} > This could potentially cause failure, because it is possible that the factory > cannot be cast here. Specifically, > {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the > constructor will be called, and there are two paths that could call into this: > (1).{{NameNodeProxies.createProxy}} > (2).{{NameNodeProxiesClient.createFailoverProxyProvider}} > (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses > {{NameNodeHAProxyFactory}} which cannot be cast to > {{ClientHAProxyFactory}}; this happens when, for example, running > NNThroughputBenchmark. To fix this we can at least: > 1. introduce setAlignmentContext to HAProxyFactory which is the parent of > both ClientHAProxyFactory and NameNodeHAProxyFactory OR > 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having an > if check with reflection. > Depending on whether it makes sense to have alignment context for the case (1) > calling code paths. 
[jira] [Updated] (HDFS-14116) ObserverReadProxyProvider should work with protocols other than ClientProtocol
[ https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-14116: -- Resolution: Fixed Fix Version/s: HDFS-12943 Status: Resolved (was: Patch Available) > ObserverReadProxyProvider should work with protocols other than ClientProtocol > -- > > Key: HDFS-14116 > URL: https://issues.apache.org/jira/browse/HDFS-14116 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Chen Liang >Assignee: Chao Sun >Priority: Major > Fix For: HDFS-12943 > > Attachments: HDFS-14116-HDFS-12943.000.patch, > HDFS-14116-HDFS-12943.001.patch, HDFS-14116-HDFS-12943.002.patch, > HDFS-14116-HDFS-12943.003.patch, HDFS-14116-HDFS-12943.004.patch > > > Currently in {{ObserverReadProxyProvider}} constructor there is this line > {code} > ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext); > {code} > This could potentially cause failure, because it is possible that the factory > cannot be cast here. Specifically, > {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the > constructor will be called, and there are two paths that could call into this: > (1).{{NameNodeProxies.createProxy}} > (2).{{NameNodeProxiesClient.createFailoverProxyProvider}} > (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses > {{NameNodeHAProxyFactory}} which cannot be cast to > {{ClientHAProxyFactory}}; this happens when, for example, running > NNThroughputBenchmark. To fix this we can at least: > 1. introduce setAlignmentContext to HAProxyFactory which is the parent of > both ClientHAProxyFactory and NameNodeHAProxyFactory OR > 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having an > if check with reflection. > Depending on whether it makes sense to have alignment context for the case (1) > calling code paths. 
[jira] [Created] (HDFS-14116) Fix a potential class cast error in ObserverReadProxyProvider
Chen Liang created HDFS-14116: - Summary: Fix a potential class cast error in ObserverReadProxyProvider Key: HDFS-14116 URL: https://issues.apache.org/jira/browse/HDFS-14116 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Chen Liang Currently in {{ObserverReadProxyProvider}} constructor there is this line {code} ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext); {code} This could potentially cause failure, because it is possible that the factory cannot be cast here. Specifically, {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the constructor will be called, and there are two paths that could call into this: (1).{{NameNodeProxies.createProxy}} (2).{{NameNodeProxiesClient.createFailoverProxyProvider}} (2) works fine because it always uses {{ClientHAProxyFactory}} but (1) uses {{NameNodeHAProxyFactory}} which cannot be cast to {{ClientHAProxyFactory}}; this happens when, for example, running NNThroughputBenchmark. To fix this we can at least: 1. introduce setAlignmentContext to HAProxyFactory which is the parent of both ClientHAProxyFactory and NameNodeHAProxyFactory OR 2. only setAlignmentContext when it is ClientHAProxyFactory by, say, having an if check with reflection. Depending on whether it makes sense to have alignment context for the case (1) calling code paths. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13547) Add ingress port based sasl resolver
[ https://issues.apache.org/jira/browse/HDFS-13547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703810#comment-16703810 ] Chen Liang commented on HDFS-13547: --- Thanks for checking [~vinodkv]! Will commit to branch-3 and branch-3.1.1 soon. > Add ingress port based sasl resolver > > > Key: HDFS-13547 > URL: https://issues.apache.org/jira/browse/HDFS-13547 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: security > Reporter: Chen Liang > Assignee: Chen Liang > Priority: Major > Fix For: 3.2.0 > > Attachments: HDFS-13547.001.patch, HDFS-13547.002.patch, > HDFS-13547.003.patch, HDFS-13547.004.patch > > > This Jira extends the SASL properties resolver interface to take an ingress > port parameter, and also adds an implementation based on this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
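The extension described in HDFS-13547 can be illustrated with a minimal, self-contained sketch: resolve SASL properties (reduced here to a single QOP string) based on the ingress port a connection arrived on. The names below are hypothetical; the actual change extends Hadoop's SaslPropertiesResolver:

```java
import java.util.HashMap;
import java.util.Map;

public class IngressPortSaslResolverSketch {
    // Per-port QOP override, e.g. privacy ("auth-conf") on one port and
    // authentication-only ("auth") on another, so different client
    // populations can be given different protection levels.
    private final Map<Integer, String> qopByPort = new HashMap<>();
    private final String defaultQop;

    public IngressPortSaslResolverSketch(String defaultQop) {
        this.defaultQop = defaultQop;
    }

    public void setQopForPort(int port, String qop) {
        qopByPort.put(port, qop);
    }

    // The new extension point: resolution takes the ingress port into account,
    // falling back to the default when no port-specific setting exists.
    public String resolve(int ingressPort) {
        return qopByPort.getOrDefault(ingressPort, defaultQop);
    }
}
```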
[jira] [Commented] (HDFS-14116) Fix a potential class cast error in ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703877#comment-16703877 ] Chen Liang commented on HDFS-14116: --- Thanks [~csun]! Yeah, I can take another look later. Posting the error stack trace from running NNThroughputBenchmark here for the record: {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hdfs.server.namenode.ha.NameNodeHAProxyFactory cannot be cast to org.apache.hadoop.hdfs.server.namenode.ha.ClientHAProxyFactory at org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider.<init>(ObserverReadProxyProvider.java:118) at org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProviderWithIPFailover.<init>(ObserverReadProxyProviderWithIPFailover.java:99) at org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProviderWithIPFailover.<init>(ObserverReadProxyProviderWithIPFailover.java:86) ... 12 more {code} {code} Exception in thread "main" java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProviderWithIPFailover at org.apache.hadoop.hdfs.NameNodeProxiesClient.createFailoverProxyProvider(NameNodeProxiesClient.java:261) at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:115) at org.apache.hadoop.hdfs.DFSTestUtil.getRefreshUserMappingsProtocolProxy(DFSTestUtil.java:2022) at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.run(NNThroughputBenchmark.java:1524) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.runBenchmark(NNThroughputBenchmark.java:1432) at org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.main(NNThroughputBenchmark.java:1552) {code} > Fix a potential class cast error in ObserverReadProxyProvider > - > > Key: HDFS-14116 > URL: https://issues.apache.org/jira/browse/HDFS-14116 > Project: Hadoop HDFS > Issue Type: Sub-task > 
Components: hdfs-client > Reporter: Chen Liang > Priority: Major > > Currently in the {{ObserverReadProxyProvider}} constructor there is this line: > {code} > ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext); > {code} > This could potentially cause a failure, because it is possible that the factory cannot be cast here. Specifically, {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the constructor is called, and there are two paths that can lead into this: > (1) {{NameNodeProxies.createProxy}} > (2) {{NameNodeProxiesClient.createFailoverProxyProvider}} > (2) works fine because it always uses {{ClientHAProxyFactory}}, but (1) uses {{NameNodeHAProxyFactory}}, which cannot be cast to {{ClientHAProxyFactory}}; this happens when, for example, running NNThroughputBenchmark. To fix this we can at least: > 1. introduce setAlignmentContext to HAProxyFactory, which is the parent of both ClientHAProxyFactory and NameNodeHAProxyFactory, OR > 2. only call setAlignmentContext when the factory is a ClientHAProxyFactory, by, say, having an if check with reflection. > The choice depends on whether it makes sense to have an alignment context for the code paths in case (1). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-13547) Add ingress port based sasl resolver
[ https://issues.apache.org/jira/browse/HDFS-13547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang resolved HDFS-13547. --- Resolution: Fixed Fix Version/s: 3.1.1 > Add ingress port based sasl resolver > > > Key: HDFS-13547 > URL: https://issues.apache.org/jira/browse/HDFS-13547 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: security >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Fix For: 3.2.0, 3.1.1 > > Attachments: HDFS-13547.001.patch, HDFS-13547.002.patch, > HDFS-13547.003.patch, HDFS-13547.004.patch > > > This Jira extends the SASL properties resolver interface to take an ingress > port parameter, and also adds an implementation based on this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13547) Add ingress port based sasl resolver
[ https://issues.apache.org/jira/browse/HDFS-13547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703845#comment-16703845 ] Chen Liang commented on HDFS-13547: --- Committed v004 patch to branch-3 and branch-3.1.1. > Add ingress port based sasl resolver > > > Key: HDFS-13547 > URL: https://issues.apache.org/jira/browse/HDFS-13547 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: security >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Fix For: 3.2.0 > > Attachments: HDFS-13547.001.patch, HDFS-13547.002.patch, > HDFS-13547.003.patch, HDFS-13547.004.patch > > > This Jira extends the SASL properties resolver interface to take an ingress > port parameter, and also adds an implementation based on this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14120) ORFPP should also clone DT for the virtual IP
[ https://issues.apache.org/jira/browse/HDFS-14120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-14120: -- Description: Currently with HDFS-14017, ORFPP handles delegation tokens in a similar way to ConfiguredFailoverProxyProvider. Specifically, given the delegation token associated with the name service ID, it clones the DT for all the corresponding physical addresses. But ORFPPwIP requires more work than CFPP in the sense that it also leverages the VIP address for failover, meaning that in addition to cloning the DT for the physical addresses, ORFPPwIP also needs to clone the DT for the VIP address, which was missed in HDFS-14017. This is specific to ORFPPwIP and should not affect ORFPP. (was: Currently with HDFS-14017, ORFPP handles delegation tokens in a similar way to ConfiguredFailoverProxyProvider. Specifically, given the delegation token associated with the name service ID, it clones the DT for all the corresponding physical addresses. But ORFPP requires more work than CFPP in the sense that it also leverages the VIP address for failover, meaning that in addition to cloning the DT for the physical addresses, ORFPP also needs to clone the DT for the VIP address, which was missed in HDFS-14017.) > ORFPP should also clone DT for the virtual IP > - > > Key: HDFS-14120 > URL: https://issues.apache.org/jira/browse/HDFS-14120 > Project: Hadoop HDFS > Issue Type: Sub-task > Affects Versions: HDFS-12943 > Reporter: Chen Liang > Assignee: Chen Liang > Priority: Major > > Currently with HDFS-14017, ORFPP handles delegation tokens in a similar way to ConfiguredFailoverProxyProvider. Specifically, given the delegation token associated with the name service ID, it clones the DT for all the corresponding physical addresses. But ORFPPwIP requires more work than CFPP in the sense that it also leverages the VIP address for failover, meaning that in addition to cloning the DT for the physical addresses, ORFPPwIP also needs to clone the DT for the VIP address, which was missed in HDFS-14017. This is specific to ORFPPwIP and should not affect ORFPP. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
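The missing step can be sketched with a self-contained toy model. This is only an illustration of the cloning idea under assumed names (strings standing in for org.apache.hadoop.security.token.Token instances and socket addresses); the actual patch clones tokens into the client's credentials:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TokenCloneSketch {
    // Register a copy of the delegation token under every address the client
    // may contact: each physical NN address (the part HDFS-14017 covered) plus
    // the virtual IP (the step this Jira adds for ORFPPwIP).
    static Map<String, String> cloneTokenForAddresses(
            String token, List<String> physicalAddrs, String virtualAddr) {
        Map<String, String> tokenByAddress = new HashMap<>();
        for (String addr : physicalAddrs) {
            tokenByAddress.put(addr, token); // clone per physical host
        }
        tokenByAddress.put(virtualAddr, token); // also clone for the VIP
        return tokenByAddress;
    }

    public static void main(String[] args) {
        Map<String, String> tokens = cloneTokenForAddresses(
                "dt-for-ns1",
                Arrays.asList("nn1.xyz.com:8020", "nn2.xyz.com:8020"),
                "nn.xyz.com:8020");
        // Three entries: two physical hosts plus the virtual host.
        System.out.println(tokens.size()); // 3
    }
}
```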
[jira] [Created] (HDFS-14120) ORFPP should also clone DT for the virtual IP
Chen Liang created HDFS-14120: - Summary: ORFPP should also clone DT for the virtual IP Key: HDFS-14120 URL: https://issues.apache.org/jira/browse/HDFS-14120 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-12943 Reporter: Chen Liang Assignee: Chen Liang Currently with HDFS-14017, ORFPP handles delegation tokens in a similar way to ConfiguredFailoverProxyProvider. Specifically, given the delegation token associated with the name service ID, it clones the DT for all the corresponding physical addresses. But ORFPP requires more work than CFPP in the sense that it also leverages the VIP address for failover, meaning that in addition to cloning the DT for the physical addresses, ORFPP also needs to clone the DT for the VIP address, which was missed in HDFS-14017. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14126) DataNode DirectoryScanner holding global lock for too long
[ https://issues.apache.org/jira/browse/HDFS-14126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16709085#comment-16709085 ] Chen Liang commented on HDFS-14126: --- Thanks for reporting [~jojochuang]. I have not seen this issue though. I just randomly checked several DNs in our 3.1 cluster, with number of blocks from 330K to 1024K, I did not see this exception. > DataNode DirectoryScanner holding global lock for too long > -- > > Key: HDFS-14126 > URL: https://issues.apache.org/jira/browse/HDFS-14126 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Priority: Major > > I've got a Hadoop 3 based cluster set up, and this DN has just 434 thousand > blocks. > And yet, DirectoryScanner holds the fsdataset lock for 2.7 seconds: > {quote} > 2018-12-03 21:33:09,130 INFO > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool > BP-4588049-10.17.XXX-XX-281857726 Total blocks: 434401, missing metadata > fi > les:0, missing block files:0, missing blocks in memory:0, mismatched blocks:0 > 2018-12-03 21:33:09,131 WARN > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Lock > held time above threshold: lock identifier: org.apache.hadoop.hdfs.serve > r.datanode.fsdataset.impl.FsDatasetImpl lockHeldTimeMs=2710 ms. Suppressed 0 > lock warnings. 
The stack trace is: > java.lang.Thread.getStackTrace(Thread.java:1559) > org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032) > org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148) > org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186) > org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133) > org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84) > org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96) > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:473) > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:373) > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:318) > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > java.lang.Thread.run(Thread.java:748) > {quote} > Log messages like this repeats every several hours (6, to be exact). I am not > sure if this is a performance regression, or just the fact that the lock > information is printed in Hadoop 3. [~vagarychen] or [~templedf] do you know? > There's no log in DN to indicate any sort of JVM GC going on. Plus, the DN's > heap size is set to several GB. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
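The warning above comes from the instrumented-lock pattern visible in the stack trace: the lock records when it was acquired and logs a warning on release if the hold time crossed a threshold. A simplified sketch of that pattern follows; Hadoop's actual org.apache.hadoop.util.InstrumentedLock additionally uses a pluggable timer and suppresses repeated warnings:

```java
import java.util.concurrent.locks.ReentrantLock;

public class InstrumentedLockSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final long warnThresholdMs;
    private long acquiredAtMs;

    public InstrumentedLockSketch(long warnThresholdMs) {
        this.warnThresholdMs = warnThresholdMs;
    }

    public void lock() {
        lock.lock();
        acquiredAtMs = System.currentTimeMillis();
    }

    // Returns how long the lock was held; warns past the threshold, which is
    // what produces the "Lock held time above threshold" message in the log.
    public long unlock() {
        long heldMs = System.currentTimeMillis() - acquiredAtMs;
        lock.unlock();
        if (heldMs > warnThresholdMs) {
            System.err.println(
                "Lock held time above threshold: lockHeldTimeMs=" + heldMs + " ms");
        }
        return heldMs;
    }
}
```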
[jira] [Commented] (HDFS-14116) Fix a potential class cast error in ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710791#comment-16710791 ] Chen Liang commented on HDFS-14116: --- Thanks for the finding [~shv]. This error seems different though: it is triggered through a different code path and causes a different class cast. For fsck to work, as a current workaround, we can override the config to use the configured proxy provider in the fsck command. For example, if we have fs.defaultFS=ns1, we can call fsck as {code} hdfs fsck -Ddfs.client.failover.proxy.provider.ns1=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider hdfs://ns1/ {code} Still need to fix this though. > Fix a potential class cast error in ObserverReadProxyProvider > - > > Key: HDFS-14116 > URL: https://issues.apache.org/jira/browse/HDFS-14116 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client > Reporter: Chen Liang > Assignee: Chao Sun > Priority: Major > > Currently in the {{ObserverReadProxyProvider}} constructor there is this line: > {code} > ((ClientHAProxyFactory) factory).setAlignmentContext(alignmentContext); > {code} > This could potentially cause a failure, because it is possible that the factory cannot be cast here. Specifically, {{NameNodeProxiesClient.createFailoverProxyProvider}} is where the constructor is called, and there are two paths that can lead into this: > (1) {{NameNodeProxies.createProxy}} > (2) {{NameNodeProxiesClient.createFailoverProxyProvider}} > (2) works fine because it always uses {{ClientHAProxyFactory}}, but (1) uses {{NameNodeHAProxyFactory}}, which cannot be cast to {{ClientHAProxyFactory}}; this happens when, for example, running NNThroughputBenchmark. To fix this we can at least: > 1. introduce setAlignmentContext to HAProxyFactory, which is the parent of both ClientHAProxyFactory and NameNodeHAProxyFactory, OR > 2. 
only call setAlignmentContext when the factory is a ClientHAProxyFactory, by, say, having an if check with reflection. > The choice depends on whether it makes sense to have an alignment context for the code paths in case (1). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14120) ORFPP should also clone DT for the virtual IP
[ https://issues.apache.org/jira/browse/HDFS-14120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-14120: -- Resolution: Fixed Status: Resolved (was: Patch Available) > ORFPP should also clone DT for the virtual IP > - > > Key: HDFS-14120 > URL: https://issues.apache.org/jira/browse/HDFS-14120 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-12943 >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-14120-HDFS-12943.001.patch > > > Currently with HDFS-14017, ORFPP behaves the similar way on handling > delegation as ConfiguredFailoverProxyProvider. Specifically, given the > delegation token associated with name service ID, it clones the DTs for all > the corresponding physical addresses. But ORFPPwIP requires more work than > CFPP in the sense that it also leverages VIP address for failover, meaning in > addition to cloning DT for physical addresses, ORFPPwIP also needs to clone > DT for the VIP address, which is missed from HDFS-14017. This is specific to > ORFPPwIP, should not affect ORFPP. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14120) ORFPP should also clone DT for the virtual IP
[ https://issues.apache.org/jira/browse/HDFS-14120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708038#comment-16708038 ] Chen Liang commented on HDFS-14120: --- Thanks [~shv] for the review! I've committed to the feature branch. > ORFPP should also clone DT for the virtual IP > - > > Key: HDFS-14120 > URL: https://issues.apache.org/jira/browse/HDFS-14120 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-12943 >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-14120-HDFS-12943.001.patch > > > Currently with HDFS-14017, ORFPP behaves the similar way on handling > delegation as ConfiguredFailoverProxyProvider. Specifically, given the > delegation token associated with name service ID, it clones the DTs for all > the corresponding physical addresses. But ORFPPwIP requires more work than > CFPP in the sense that it also leverages VIP address for failover, meaning in > addition to cloning DT for physical addresses, ORFPPwIP also needs to clone > DT for the VIP address, which is missed from HDFS-14017. This is specific to > ORFPPwIP, should not affect ORFPP. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14120) ORFPP should also clone DT for the virtual IP
[ https://issues.apache.org/jira/browse/HDFS-14120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-14120: -- Status: Patch Available (was: Open) > ORFPP should also clone DT for the virtual IP > - > > Key: HDFS-14120 > URL: https://issues.apache.org/jira/browse/HDFS-14120 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-12943 >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-14120-HDFS-12943.001.patch > > > Currently with HDFS-14017, ORFPP behaves the similar way on handling > delegation as ConfiguredFailoverProxyProvider. Specifically, given the > delegation token associated with name service ID, it clones the DTs for all > the corresponding physical addresses. But ORFPPwIP requires more work than > CFPP in the sense that it also leverages VIP address for failover, meaning in > addition to cloning DT for physical addresses, ORFPPwIP also needs to clone > DT for the VIP address, which is missed from HDFS-14017. This is specific to > ORFPPwIP, should not affect ORFPP. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14120) ORFPP should also clone DT for the virtual IP
[ https://issues.apache.org/jira/browse/HDFS-14120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-14120: -- Attachment: HDFS-14120-HDFS-12943.001.patch > ORFPP should also clone DT for the virtual IP > - > > Key: HDFS-14120 > URL: https://issues.apache.org/jira/browse/HDFS-14120 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-12943 >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-14120-HDFS-12943.001.patch > > > Currently with HDFS-14017, ORFPP behaves the similar way on handling > delegation as ConfiguredFailoverProxyProvider. Specifically, given the > delegation token associated with name service ID, it clones the DTs for all > the corresponding physical addresses. But ORFPPwIP requires more work than > CFPP in the sense that it also leverages VIP address for failover, meaning in > addition to cloning DT for physical addresses, ORFPPwIP also needs to clone > DT for the VIP address, which is missed from HDFS-14017. This is specific to > ORFPPwIP, should not affect ORFPP. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13547) Add ingress port based sasl resolver
[ https://issues.apache.org/jira/browse/HDFS-13547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707743#comment-16707743 ] Chen Liang commented on HDFS-13547: --- I was not aware of this rule about fix-version for released versions, thanks [~vinodkv] for taking care of it! > Add ingress port based sasl resolver > > > Key: HDFS-13547 > URL: https://issues.apache.org/jira/browse/HDFS-13547 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: security > Reporter: Chen Liang > Assignee: Chen Liang > Priority: Major > Fix For: 3.2.0 > > Attachments: HDFS-13547.001.patch, HDFS-13547.002.patch, > HDFS-13547.003.patch, HDFS-13547.004.patch > > > This Jira extends the SASL properties resolver interface to take an ingress > port parameter, and also adds an implementation based on this. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14058) Test reads from standby on a secure cluster with IP failover
[ https://issues.apache.org/jira/browse/HDFS-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16709359#comment-16709359 ] Chen Liang commented on HDFS-14058: --- Another test done: 6. Performed a manual failover while a TeraSort job was running. Specifically, I had two NNs, ha1 and ha2, one as ANN and one as ONN. After starting a TeraSort job with 1800+ mappers and 500 reducers, I changed the ONN to SNN, then called {{hdfs haadmin -failover --forcefence ha1 ha2}}, and also reconfigured the VIP to point to the other name node. Although some task attempts failed due to timeout, the job finished successfully. > Test reads from standby on a secure cluster with IP failover > > > Key: HDFS-14058 > URL: https://issues.apache.org/jira/browse/HDFS-14058 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test > Reporter: Konstantin Shvachko > Assignee: Chen Liang > Priority: Major > > Run standard HDFS tests to verify reading from ObserverNode on a secure HA > cluster with {{IPFailoverProxyProvider}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14058) Test reads from standby on a secure cluster with IP failover
[ https://issues.apache.org/jira/browse/HDFS-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-14058: -- Attachment: dfsio_crs.with-crs.txt dfsio_crs.no-crs.txt > Test reads from standby on a secure cluster with IP failover > > > Key: HDFS-14058 > URL: https://issues.apache.org/jira/browse/HDFS-14058 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Konstantin Shvachko >Assignee: Chen Liang >Priority: Major > Attachments: dfsio_crs.no-crs.txt, dfsio_crs.with-crs.txt > > > Run standard HDFS tests to verify reading from ObserverNode on a secure HA > cluster with {{IPFailoverProxyProvider}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14058) Test reads from standby on a secure cluster with IP failover
[ https://issues.apache.org/jira/browse/HDFS-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16709444#comment-16709444 ] Chen Liang commented on HDFS-14058: --- Posting some numbers from the DFSIO test. I ran the test both with observer read enabled and with it disabled, three runs each. [^dfsio_crs.no-crs.txt] shows the results from the three runs with observer read disabled, while [^dfsio_crs.with-crs.txt] shows the results with observer read enabled. The numbers are very close. Again, this is because the cluster is fairly empty, so the ANN alone is able to handle the read requests; in this case we do not gain much performance improvement from observer read. The test is to prove correctness and show there is no performance degradation. > Test reads from standby on a secure cluster with IP failover > > > Key: HDFS-14058 > URL: https://issues.apache.org/jira/browse/HDFS-14058 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test > Reporter: Konstantin Shvachko > Assignee: Chen Liang > Priority: Major > Attachments: dfsio_crs.no-crs.txt, dfsio_crs.with-crs.txt > > > Run standard HDFS tests to verify reading from ObserverNode on a secure HA > cluster with {{IPFailoverProxyProvider}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13767) Add msync server implementation.
[ https://issues.apache.org/jira/browse/HDFS-13767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-13767: -- Fix Version/s: HDFS-12943 > Add msync server implementation. > > > Key: HDFS-13767 > URL: https://issues.apache.org/jira/browse/HDFS-13767 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Fix For: HDFS-12943 > > Attachments: HDFS-13767-HDFS-12943.001.patch, > HDFS-13767-HDFS-12943.002.patch, HDFS-13767-HDFS-12943.003.patch, > HDFS-13767-HDFS-12943.004.patch, HDFS-13767.WIP.001.patch, > HDFS-13767.WIP.002.patch, HDFS-13767.WIP.003.patch, HDFS-13767.WIP.004.patch > > > This is a followup on HDFS-13688, where msync API is introduced to > {{ClientProtocol}} but the server side implementation is missing. This is > Jira is to implement the server side logic. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14058) Test reads from standby on a secure cluster with IP failover
[ https://issues.apache.org/jira/browse/HDFS-14058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689993#comment-16689993 ] Chen Liang edited comment on HDFS-14058 at 12/5/18 12:02 AM: - The tests were done with a setup of 100+ datanodes, 1 Active NameNode and 1 Observer NameNode. No other standby nodes. The cluster has a light HDFS workload, has YARN deployed, and has security (Kerberos) enabled. The purpose here was not to evaluate performance gain, but mainly to prove the functionality and correctness. In all the tests below, it is *verified from both name nodes' audit logs* that the reads actually went to the Observer node and the writes went to the Active, and it is *verified from job/client logs* that when the client could not talk to the Observer (e.g. for write requests, or when the Observer node is actually in Standby, not Observer, state), it fell back to talking to the Active. The specific tests done include: 1. basic hdfs IO - From the hdfs command: -- create/delete directory -- basic file put/get/delete - From a simple Java program. I wrote some code which creates a DFSClient instance and performs some basic operations against it: -- create/delete directory -- get/renew delegation token One observation on this is that, from the command line, depending on the relative order of ANN and ONN in the config, the failover may happen every single time, with an exception printed. This is because every single command line call creates a new DFSClient instance, which may start by calling the Observer for a write, causing failover. But for a reused DFSClient (e.g. from a Java program that creates and reuses the same DFSClient), this issue does not occur. 2. simple MR job: a simple wordcount job from the mapreduce-examples jar, on a very small input. 3. SliveTest: ran Slive from the hadoop-mapreduce-client-jobclient jar, with default parameters. I ran Slive 3 times both with Observer enabled and disabled. I saw a similar number of ops/sec. 
4. DFSIO: ran the DFSIO read test several times from the hadoop-mapreduce-client-jobclient jar; the tests were done with 100 files, 100 MB each. 5. TeraGen/Sort/Validate: ran TeraGen/Sort/Validate several times from the hadoop-mapreduce-examples jar with 1TB of data. TeraSort used 1800+ mappers and 500 reducers. All three jobs finished successfully. was (Author: vagarychen): The tests I've run include the following. Please note that the following tests were done without several recent changes such as HDFS-14035 and HDFS-14017, but with some hacky code changes and workarounds. Although the required changes have been formalized in recent Jiras, the following tests haven't all been re-run along with those changes. Posting here for the record. The tests were done with a setup of 100+ datanodes, 1 Active NameNode and 1 Observer NameNode. No other standby nodes. The cluster has a light HDFS workload, has YARN deployed, and has security (Kerberos) enabled. The purpose here was not to evaluate performance gain, but only to prove the functionality. In all the tests below, it is verified from the Observer node audit log that the reads actually went to the Observer node. 1. basic hdfs IO - From the hdfs command: -- create/delete directory -- basic file put/get/delete - From a simple Java program. I wrote some code which creates a DFSClient instance and performs some basic operations against it: -- create/delete directory -- get/renew delegation token One observation on this is that, from the command line, depending on the relative order of ANN and ONN in the config, the failover may happen every single time, with an exception printed. I believe this is because every single command line call creates a new DFSClient instance, which may start by calling the Observer for a write, causing failover. But for a reused DFSClient (e.g. from a Java program that creates and reuses the same DFSClient), this issue does not occur. 2. simple MR job: a simple wordcount job from the mapreduce-examples jar, on a very small input. 3. 
SliveTest: ran Slive from the hadoop-mapreduce-client-jobclient jar, without parameters (so it uses the defaults). I ran Slive 3 times both with Observer enabled and disabled. I saw roughly the same ops/sec. 4. DFSIO: ran the DFSIO read test several times from the hadoop-mapreduce-client-jobclient jar, but only with a very small input size (10 files of 1 KB each). 5. TeraGen/Sort/Validate: ran TeraGen/Sort/Validate from the hadoop-mapreduce-examples jar with 1TB of data. TeraSort used 1800+ mappers and 500 reducers. All three jobs finished successfully. > Test reads from standby on a secure cluster with IP failover > > > Key: HDFS-14058 > URL: https://issues.apache.org/jira/browse/HDFS-14058 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test > Reporter: Konstantin Shvachko >
[jira] [Commented] (HDFS-14017) ObserverReadProxyProviderWithIPFailover should work with HA configuration
[ https://issues.apache.org/jira/browse/HDFS-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690185#comment-16690185 ] Chen Liang commented on HDFS-14017: --- I've committed the v014 patch to the feature branch, thanks for all the reviews and discussions! > ObserverReadProxyProviderWithIPFailover should work with HA configuration > - > > Key: HDFS-14017 > URL: https://issues.apache.org/jira/browse/HDFS-14017 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Fix For: HDFS-12943 > > Attachments: HDFS-14017-HDFS-12943.001.patch, > HDFS-14017-HDFS-12943.002.patch, HDFS-14017-HDFS-12943.003.patch, > HDFS-14017-HDFS-12943.004.patch, HDFS-14017-HDFS-12943.005.patch, > HDFS-14017-HDFS-12943.006.patch, HDFS-14017-HDFS-12943.008.patch, > HDFS-14017-HDFS-12943.009.patch, HDFS-14017-HDFS-12943.010.patch, > HDFS-14017-HDFS-12943.011.patch, HDFS-14017-HDFS-12943.012.patch, > HDFS-14017-HDFS-12943.013.patch, HDFS-14017-HDFS-12943.014.patch > > > Currently {{ObserverReadProxyProviderWithIPFailover}} extends > {{ObserverReadProxyProvider}}, and the only difference is changing the proxy > factory to use {{IPFailoverProxyProvider}}. However this is not enough, > because when calling the constructor of {{ObserverReadProxyProvider}} via > super(...), the following line: > {code:java} > nameNodeProxies = getProxyAddresses(uri, > HdfsClientConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY); > {code} > will try to resolve all the configured NN addresses to perform configured > failover. But in the case of IPFailover, this does not really apply. > > A second, closely related issue concerns delegation tokens. For example, in the > current IPFailover setup, say we have a virtual host nn.xyz.com, which points > to either of two physical nodes, nn1.xyz.com or nn2.xyz.com. In current HDFS, > there is always only one DT being exchanged, which has hostname nn.xyz.com. > The server only issues this DT, and the client only knows the host nn.xyz.com, so all > is good. But with Observer reads, even with IPFailover, the client will no > longer contact nn.xyz.com, but will actively reach out to nn1.xyz.com and > nn2.xyz.com. During this process, the current code will look for a DT associated > with hostname nn1.xyz.com or nn2.xyz.com, which is different from the DT > given by the NN, causing token authentication to fail. This happens in > {{AbstractDelegationTokenSelector#selectToken}}. The new IPFailover proxy > provider will need to resolve this as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14017) ObserverReadProxyProviderWithIPFailover should work with HA configuration
[ https://issues.apache.org/jira/browse/HDFS-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-14017: -- Resolution: Fixed Fix Version/s: HDFS-12943 Status: Resolved (was: Patch Available)
[jira] [Commented] (HDFS-14017) ObserverReadProxyProviderWithIPFailover should work with HA configuration
[ https://issues.apache.org/jira/browse/HDFS-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690170#comment-16690170 ] Chen Liang commented on HDFS-14017: --- Thanks for looking into this [~xkrogen]! I agree that there should be no way the patch was breaking things, since it changes a class not currently used anywhere, just as you mentioned. I checked with Konstantin as well; he is okay with committing it. I will commit soon.
[jira] [Updated] (HDFS-13566) Add configurable additional RPC listener to NameNode
[ https://issues.apache.org/jira/browse/HDFS-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-13566: -- Attachment: HDFS-13566.005.patch > Add configurable additional RPC listener to NameNode > > > Key: HDFS-13566 > URL: https://issues.apache.org/jira/browse/HDFS-13566 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ipc >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-13566.001.patch, HDFS-13566.002.patch, > HDFS-13566.003.patch, HDFS-13566.004.patch, HDFS-13566.005.patch > > > This Jira aims to add the capability for the NameNode to run additional > listener(s), such that the NameNode can be accessed from multiple ports. > Fundamentally, this Jira extends ipc.Server to allow it to be configured with > more listeners, binding to different ports but sharing the same call queue > and handlers. This is useful when different clients are only allowed to access > certain ports. Combined with HDFS-13547, this also allows different > ports to have different SASL security levels.
[jira] [Commented] (HDFS-13898) Throw retriable exception for getBlockLocations when ObserverNameNode is in safemode
[ https://issues.apache.org/jira/browse/HDFS-13898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615149#comment-16615149 ] Chen Liang commented on HDFS-13898: --- Thanks [~csun] for reporting and fixing this! My main question was also: if we are mocking BlockManager, why do we still need to change the DN number? It seems that the {{Requested replication factor of 0 is less than the required minimum of 1}} error is reported from {{BlockManager.verifyReplication}}; I wonder whether it is possible to mock the method {{verifyReplication}}, making it a no-op, so that it won't perform this check. E.g. I added the following lines before {{createNewFile}}, and the test seems to pass without changing the number of DNs. Not sure if this is a better way to go though; just a thought. {code:java} BlockManager bmSpy1 = NameNodeAdapter.spyOnBlockManager(namenodes[0]); doNothing().when(bmSpy1).verifyReplication(anyString(), anyShort(), anyString());{code} Also, there is an inconsistency in the style of the mocking arguments: some are {{Mock.any()}}, while others are like {{anyBoolean()}} (no Mock.). > Throw retriable exception for getBlockLocations when ObserverNameNode is in > safemode > > > Key: HDFS-13898 > URL: https://issues.apache.org/jira/browse/HDFS-13898 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13898-HDFS-12943.000.patch > > > When ObserverNameNode is in safe mode, {{getBlockLocations}} may throw a safe > mode exception if the given file doesn't have any blocks yet.
> {code:java}
> try {
>   checkOperation(OperationCategory.READ);
>   res = FSDirStatAndListingOp.getBlockLocations(
>       dir, pc, srcArg, offset, length, true);
>   if (isInSafeMode()) {
>     for (LocatedBlock b : res.blocks.getLocatedBlocks()) {
>       // if safemode & no block locations yet then throw safemodeException
>       if ((b.getLocations() == null) || (b.getLocations().length == 0)) {
>         SafeModeException se = newSafemodeException(
>             "Zero blocklocations for " + srcArg);
>         if (haEnabled && haContext != null &&
>             haContext.getState().getServiceState() ==
>                 HAServiceState.ACTIVE) {
>           throw new RetriableException(se);
>         } else {
>           throw se;
>         }
>       }
>     }
>   }
> {code}
> It only throws {{RetriableException}} for the active NN, so requests on the observer > may just fail.
[jira] [Commented] (HDFS-13924) Handle BlockMissingException when reading from observer
[ https://issues.apache.org/jira/browse/HDFS-13924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618247#comment-16618247 ] Chen Liang commented on HDFS-13924: --- Good finding, thanks for reporting [~csun]! I'm wondering, is a full stack trace still available? Just want to get a better idea of why the retry logic did not help in this case. > Handle BlockMissingException when reading from observer > --- > > Key: HDFS-13924 > URL: https://issues.apache.org/jira/browse/HDFS-13924 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Priority: Major > > Internally we found that reading from the ObserverNode may result in > {{BlockMissingException}}. This may happen when the observer sees a smaller > number of DNs than the active (maybe due to communication issues with those DNs), > or (we guess) late block reports from some DNs to the observer. This error > happens in > [DFSInputStream#chooseDataNode|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L846], > when no valid DN can be found for the {{LocatedBlock}} obtained from the NN side. > One potential solution (although a little hacky) is to ask the > {{DFSInputStream}} to retry the active when this happens. The retry logic is already > present in the code - we just have to dynamically set a flag to ask the > {{ObserverReadProxyProvider}} to try the active in this case. > cc [~shv], [~xkrogen], [~vagarychen], [~zero45] for discussion.
[jira] [Commented] (HDFS-13924) Handle BlockMissingException when reading from observer
[ https://issues.apache.org/jira/browse/HDFS-13924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618281#comment-16618281 ] Chen Liang commented on HDFS-13924: --- Thanks [~csun]. I see, so I imagine the error did not happen on the server side, because the server does not treat this as an error; it still returns a LocatedBlock, but with an empty block info list. This only becomes an exception later, when the client actually tries to read the block? If this is what was happening, maybe another fix would be on the server side: if the server finds itself in observer state and getBlockLocations is called with no known block info, then instead of returning an empty list, it throws an exception, so that the client side triggers a retry to a different node. Letting DFSInputStream switch to the active also makes sense to me, though.
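The server-side alternative suggested above could look roughly like the following. This is a sketch only, not the committed fix; the exact hook point in getBlockLocations and the state check are assumptions:

{code:java}
// Sketch: after computing the result on the NameNode side, if this node
// is serving as Observer and a block has no locations yet, throw a
// retriable exception so the client proxy retries another NameNode
// instead of failing later with BlockMissingException.
if (haContext.getState().getServiceState() == HAServiceState.OBSERVER) {
  for (LocatedBlock b : res.blocks.getLocatedBlocks()) {
    if (b.getLocations() == null || b.getLocations().length == 0) {
      throw new RetriableException(
          "Observer has no locations yet for block " + b.getBlock());
    }
  }
}
{code}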
[jira] [Commented] (HDFS-13566) Add configurable additional RPC listener to NameNode
[ https://issues.apache.org/jira/browse/HDFS-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618291#comment-16618291 ] Chen Liang commented on HDFS-13566: --- {{TestLeaseRecovery2}} failed regardless of whether the patch is applied or not. The other failed tests succeeded locally. The checkstyle issues were not introduced in this patch.
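For reference, enabling an additional listener might look like the following configuration fragment. The property name here is an assumption based on the HDFS-13566 patch and should be verified against the committed version:

{code:xml}
<!-- Ask the NameNode RPC server to also listen on ports 8021 and 8022,
     sharing the same call queue and handlers as the main RPC port. -->
<property>
  <name>dfs.namenode.rpc-address.auxiliary-ports</name>
  <value>8021,8022</value>
</property>
{code}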
[jira] [Updated] (HDFS-13566) Add configurable additional RPC listener to NameNode
[ https://issues.apache.org/jira/browse/HDFS-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-13566: -- Attachment: HDFS-13566.006.patch
[jira] [Commented] (HDFS-13880) Add mechanism to allow certain RPC calls to bypass sync
[ https://issues.apache.org/jira/browse/HDFS-13880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614159#comment-16614159 ] Chen Liang commented on HDFS-13880: --- I've committed the v005 patch to the feature branch (with the unused import removed). Thanks for the review [~shv], [~xkrogen]! > Add mechanism to allow certain RPC calls to bypass sync > --- > > Key: HDFS-13880 > URL: https://issues.apache.org/jira/browse/HDFS-13880 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-13880-HDFS-12943.001.patch, > HDFS-13880-HDFS-12943.002.patch, HDFS-13880-HDFS-12943.003.patch, > HDFS-13880-HDFS-12943.004.patch, HDFS-13880-HDFS-12943.005.patch > > > Currently, every single call to the NameNode is synced, in the sense that the > NameNode will not process it until its state id catches up. But in certain cases > we would like to bypass this check and allow the call to return immediately, > even when the server state id is not up to date. One case could be the to-be-added > new API in HDFS-13749 that requests the current state id. Others may include > calls that do not promise real-time responses, such as {{getContentSummary}}. > This Jira adds the mechanism to allow certain calls to bypass sync.
[jira] [Updated] (HDFS-13880) Add mechanism to allow certain RPC calls to bypass sync
[ https://issues.apache.org/jira/browse/HDFS-13880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-13880: -- Resolution: Fixed Status: Resolved (was: Patch Available)
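A self-contained illustration of the bypass-sync idea described in HDFS-13880 above. All names here are invented for illustration; the committed mechanism may differ. The sketch marks methods that may skip the state-id wait with a runtime annotation and checks it in the call handler:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

public class BypassSyncSketch {
  // Hypothetical marker: calls so annotated return immediately instead of
  // waiting for the server state id to catch up with the client's.
  @Retention(RetentionPolicy.RUNTIME)
  public @interface BypassStateSync {}

  public interface ClientProtocol {
    @BypassStateSync
    long getCurrentStateId();       // e.g. the new API from HDFS-13749
    String getListing(String src);  // a normal read, must stay synced
  }

  // Server-side check: a call requires sync unless its method carries
  // the bypass annotation.
  public static boolean requiresSync(Method m) {
    return !m.isAnnotationPresent(BypassStateSync.class);
  }

  public static void main(String[] args) throws Exception {
    Method stateId = ClientProtocol.class.getMethod("getCurrentStateId");
    Method listing = ClientProtocol.class.getMethod("getListing", String.class);
    System.out.println(requiresSync(stateId)); // prints false: bypasses sync
    System.out.println(requiresSync(listing)); // prints true: waits for state id
  }
}
```

In the real RPC server the non-bypass branch would postpone the call until the server's state id catches up, rather than rejecting it.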
[jira] [Commented] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724465#comment-16724465 ] Chen Liang commented on HDFS-12943: --- Hi [~brahmareddy] bq. you can see this issue if "dfs.ha.tail-edits.period" is default value. Yes, with the default period of 1 min, any read can take up to 1 min to finish; this is not specific to the "second" call you mentioned, but applies to any read. I agree that we need to lower this value. In our environment we have already set it to 100ms, and with this setting I have never seen the issue of the second call always timing out that you mentioned, nor getServiceState taking 2 seconds. I was under the impression that you still had the timeout even after setting it to 100ms? > Consistent Reads from Standby Node > -- > > Key: HDFS-12943 > URL: https://issues.apache.org/jira/browse/HDFS-12943 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs >Reporter: Konstantin Shvachko >Priority: Major > Attachments: ConsistentReadsFromStandbyNode.pdf, > ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch, > HDFS-12943-002.patch, TestPlan-ConsistentReadsFromStandbyNode.pdf > > > StandbyNode in HDFS is a replica of the active NameNode. The states of the > NameNodes are coordinated via the journal. It is natural to consider the > StandbyNode a read-only replica. As with any replicated distributed system, > the problem of stale reads must be resolved. Our main goal is to provide > reads from standby in a consistent way in order to enable a wide range of > existing applications running on top of HDFS.
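The tuning discussed in the comment above corresponds to a configuration along these lines. The 100ms value is the one mentioned in the comment; whether a time-unit suffix is accepted depends on the Hadoop version, so verify against your deployment:

{code:xml}
<!-- How often the Standby/Observer tails edits from the JournalNodes.
     The default of 1 min bounds how stale an Observer read can be;
     lowering it (e.g. to 100ms) keeps client waits short. -->
<property>
  <name>dfs.ha.tail-edits.period</name>
  <value>100ms</value>
</property>
{code}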
[jira] [Commented] (HDFS-14205) Backport HDFS-6440 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16754241#comment-16754241 ] Chen Liang commented on HDFS-14205: --- Thanks for chiming in [~csun]! I also had a patch for this backport. It compiles, but some tests were failing; I'm not sure whether the failures are related, because many of them failed even without the patch. I haven't had the bandwidth to look further into the failures, though. Posting my patch here; hope it helps. > Backport HDFS-6440 to branch-2 > -- > > Key: HDFS-14205 > URL: https://issues.apache.org/jira/browse/HDFS-14205 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Chen Liang >Assignee: Chao Sun >Priority: Major > > Currently support for more than 2 NameNodes (HDFS-6440) is only in branch-3. > This JIRA aims to backport it to branch-2, as it is required by the HDFS-12943 > (consistent reads from standby) backport to branch-2.
[jira] [Updated] (HDFS-14205) Backport HDFS-6440 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-14205: -- Attachment: HDFS-14205-branch-2.001.patch
[jira] [Updated] (HDFS-13699) Add DFSClient sending handshake token to DataNode, and allow DataNode overwrite downstream QOP
[ https://issues.apache.org/jira/browse/HDFS-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-13699: -- Attachment: HDFS-13699.009.patch > Add DFSClient sending handshake token to DataNode, and allow DataNode > overwrite downstream QOP > -- > > Key: HDFS-13699 > URL: https://issues.apache.org/jira/browse/HDFS-13699 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chen Liang >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-13699.001.patch, HDFS-13699.002.patch, > HDFS-13699.003.patch, HDFS-13699.004.patch, HDFS-13699.005.patch, > HDFS-13699.006.patch, HDFS-13699.007.patch, HDFS-13699.008.patch, > HDFS-13699.009.patch, HDFS-13699.WIP.001.patch > > > Given the other Jiras under HDFS-13541, this Jira allows DFSClient to > forward the encrypted secret to the DataNode. The encrypted message is the QOP > that the client and NameNode have used. The DataNode decrypts the message and enforces > that QOP for the client connection. This Jira also includes > overwriting the downstream QOP, as mentioned in the HDFS-13541 design doc; > namely, allowing an inter-DN QOP that is different from the client-DN QOP.
[jira] [Commented] (HDFS-13699) Add DFSClient sending handshake token to DataNode, and allow DataNode overwrite downstream QOP
[ https://issues.apache.org/jira/browse/HDFS-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808008#comment-16808008 ] Chen Liang commented on HDFS-13699: --- Thanks for the clarification [~shv]! I've attached the v009 patch to address all comments.
[jira] [Updated] (HDFS-13699) Add DFSClient sending handshake token to DataNode, and allow DataNode overwrite downstream QOP
[ https://issues.apache.org/jira/browse/HDFS-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-13699: -- Attachment: HDFS-13699.010.patch
[jira] [Commented] (HDFS-13699) Add DFSClient sending handshake token to DataNode, and allow DataNode overwrite downstream QOP
[ https://issues.apache.org/jira/browse/HDFS-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808186#comment-16808186 ] Chen Liang commented on HDFS-13699: --- Posted the v010 patch with one additional unused-import fix.
[jira] [Updated] (HDFS-14205) Backport HDFS-6440 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen Liang updated HDFS-14205:
------------------------------
       Resolution: Fixed
    Fix Version/s: 2.10.0
           Status: Resolved  (was: Patch Available)

> Backport HDFS-6440 to branch-2
> ------------------------------
>
>                 Key: HDFS-14205
>                 URL: https://issues.apache.org/jira/browse/HDFS-14205
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Chen Liang
>            Assignee: Chao Sun
>            Priority: Major
>             Fix For: 2.10.0
>
>         Attachments: HDFS-14205-branch-2.001.patch,
> HDFS-14205-branch-2.002.patch, HDFS-14205-branch-2.003.patch,
> HDFS-14205-branch-2.004.patch, HDFS-14205-branch-2.005.patch,
> HDFS-14205-branch-2.006.patch, HDFS-14205-branch-2.007.patch,
> HDFS-14205-branch-2.008.patch, HDFS-14205-branch-2.009.patch
>
> Currently, support for more than two NameNodes (HDFS-6440) exists only in
> branch-3. This JIRA aims to backport it to branch-2, as it is required by
> the branch-2 backport of HDFS-12943 (consistent read from standby).
[jira] [Commented] (HDFS-14205) Backport HDFS-6440 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802083#comment-16802083 ]

Chen Liang commented on HDFS-14205:
-----------------------------------

I have backported the v009 patch to branch-2. Thanks [~csun] for the effort!

> Backport HDFS-6440 to branch-2
> ------------------------------
>
>                 Key: HDFS-14205
>                 URL: https://issues.apache.org/jira/browse/HDFS-14205
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Chen Liang
>            Assignee: Chao Sun
>            Priority: Major
>         Attachments: HDFS-14205-branch-2.001.patch,
> HDFS-14205-branch-2.002.patch, HDFS-14205-branch-2.003.patch,
> HDFS-14205-branch-2.004.patch, HDFS-14205-branch-2.005.patch,
> HDFS-14205-branch-2.006.patch, HDFS-14205-branch-2.007.patch,
> HDFS-14205-branch-2.008.patch, HDFS-14205-branch-2.009.patch
>
> Currently, support for more than two NameNodes (HDFS-6440) exists only in
> branch-3. This JIRA aims to backport it to branch-2, as it is required by
> the branch-2 backport of HDFS-12943 (consistent read from standby).
[jira] [Commented] (HDFS-14205) Backport HDFS-6440 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801988#comment-16801988 ]

Chen Liang commented on HDFS-14205:
-----------------------------------

Thanks [~csun] for the clarification. I re-ran the two tests and they still
passed. I will commit the v009 patch soon.

> Backport HDFS-6440 to branch-2
> ------------------------------
>
>                 Key: HDFS-14205
>                 URL: https://issues.apache.org/jira/browse/HDFS-14205
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Chen Liang
>            Assignee: Chao Sun
>            Priority: Major
>         Attachments: HDFS-14205-branch-2.001.patch,
> HDFS-14205-branch-2.002.patch, HDFS-14205-branch-2.003.patch,
> HDFS-14205-branch-2.004.patch, HDFS-14205-branch-2.005.patch,
> HDFS-14205-branch-2.006.patch, HDFS-14205-branch-2.007.patch,
> HDFS-14205-branch-2.008.patch, HDFS-14205-branch-2.009.patch
>
> Currently, support for more than two NameNodes (HDFS-6440) exists only in
> branch-3. This JIRA aims to backport it to branch-2, as it is required by
> the branch-2 backport of HDFS-12943 (consistent read from standby).
[jira] [Commented] (HDFS-13699) Add DFSClient sending handshake token to DataNode, and allow DataNode overwrite downstream QOP
[ https://issues.apache.org/jira/browse/HDFS-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799351#comment-16799351 ]

Chen Liang commented on HDFS-13699:
-----------------------------------

Posted v007 patch with various refactoring; the logic remains the same. To
make the review less confusing, I want to mention that part of the logic is
to enable overwriting the downstream inter-DN QOP. Namely, we want to allow
the client to talk to the first DN with QOP1 while the DNs talk to each
other using QOP2, where QOP1 and QOP2 can be different. This is useful when
the client is external and has security requirements different from the
DNs, which are all in the same cluster. The patch works by configuring
QOP2, which overwrites QOP1 at run time.

> Add DFSClient sending handshake token to DataNode, and allow DataNode
> overwrite downstream QOP
> ----------------------------------------------------------------------
>
>                 Key: HDFS-13699
>                 URL: https://issues.apache.org/jira/browse/HDFS-13699
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>            Priority: Major
>         Attachments: HDFS-13699.001.patch, HDFS-13699.002.patch,
> HDFS-13699.003.patch, HDFS-13699.004.patch, HDFS-13699.005.patch,
> HDFS-13699.006.patch, HDFS-13699.007.patch, HDFS-13699.WIP.001.patch
>
> Given the other Jiras under HDFS-13541, this Jira is to allow DFSClient to
> redirect the encrypted secret to DataNode. The encrypted message is the QOP
> that the client and NameNode have used. DataNode decrypts the message and
> enforces that QOP for the client connection. This Jira will also include
> overwriting the downstream QOP, as mentioned in the HDFS-13541 design doc.
> Namely, this is to allow an inter-DN QOP that is different from the
> client-DN QOP.
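The two-QOP setup described in the comment above could be sketched as an hdfs-site.xml fragment on the DataNodes. This is illustrative only: `dfs.data.transfer.protection` is the existing key for the client-facing QOP, but `dfs.datanode.overwrite.downstream.qop` is a hypothetical placeholder name, not necessarily the key the patch introduces.

```xml
<!-- Sketch only: the downstream-overwrite key below is a hypothetical
     placeholder, not confirmed to be the key added by HDFS-13699. -->
<configuration>
  <!-- QOP1: what external clients negotiate with the first DataNode -->
  <property>
    <name>dfs.data.transfer.protection</name>
    <value>privacy</value>
  </property>
  <!-- QOP2 (hypothetical key): overwrites QOP1 for inter-DN pipeline
       traffic, so DNs inside the cluster can use a cheaper QOP -->
  <property>
    <name>dfs.datanode.overwrite.downstream.qop</name>
    <value>authentication</value>
  </property>
</configuration>
```

With such a setup, an external client would write with privacy (encryption) to the first DN, while replication between DNs in the pipeline would use only authentication.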
[jira] [Updated] (HDFS-13699) Add DFSClient sending handshake token to DataNode, and allow DataNode overwrite downstream QOP
[ https://issues.apache.org/jira/browse/HDFS-13699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen Liang updated HDFS-13699:
------------------------------
    Attachment: HDFS-13699.007.patch

> Add DFSClient sending handshake token to DataNode, and allow DataNode
> overwrite downstream QOP
> ----------------------------------------------------------------------
>
>                 Key: HDFS-13699
>                 URL: https://issues.apache.org/jira/browse/HDFS-13699
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>            Priority: Major
>         Attachments: HDFS-13699.001.patch, HDFS-13699.002.patch,
> HDFS-13699.003.patch, HDFS-13699.004.patch, HDFS-13699.005.patch,
> HDFS-13699.006.patch, HDFS-13699.007.patch, HDFS-13699.WIP.001.patch
>
> Given the other Jiras under HDFS-13541, this Jira is to allow DFSClient to
> redirect the encrypted secret to DataNode. The encrypted message is the QOP
> that the client and NameNode have used. DataNode decrypts the message and
> enforces that QOP for the client connection. This Jira will also include
> overwriting the downstream QOP, as mentioned in the HDFS-13541 design doc.
> Namely, this is to allow an inter-DN QOP that is different from the
> client-DN QOP.
[jira] [Updated] (HDFS-14397) Backport HADOOP-15684 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-14397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen Liang updated HDFS-14397:
------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> Backport HADOOP-15684 to branch-2
> ---------------------------------
>
>                 Key: HDFS-14397
>                 URL: https://issues.apache.org/jira/browse/HDFS-14397
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Minor
>         Attachments: HDFS-14397-branch-2.000.patch,
> HDFS-14397-branch-2.001.patch
>
> As the multi-SBN feature is already backported to branch-2, this is a
> follow-up to backport HADOOP-15684.
[jira] [Commented] (HDFS-14397) Backport HADOOP-15684 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-14397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809404#comment-16809404 ]

Chen Liang commented on HDFS-14397:
-----------------------------------

+1 on the v001 patch. I've committed it to branch-2; thanks for the
contribution, [~csun]!

> Backport HADOOP-15684 to branch-2
> ---------------------------------
>
>                 Key: HDFS-14397
>                 URL: https://issues.apache.org/jira/browse/HDFS-14397
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Minor
>         Attachments: HDFS-14397-branch-2.000.patch,
> HDFS-14397-branch-2.001.patch
>
> As the multi-SBN feature is already backported to branch-2, this is a
> follow-up to backport HADOOP-15684.
[jira] [Commented] (HDFS-14415) Backport HDFS-13799 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811148#comment-16811148 ]

Chen Liang commented on HDFS-14415:
-----------------------------------

The tests also passed in my local run. +1 on the v000 patch; I've committed
it to branch-2. Thanks [~csun]!

> Backport HDFS-13799 to branch-2
> -------------------------------
>
>                 Key: HDFS-14415
>                 URL: https://issues.apache.org/jira/browse/HDFS-14415
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>            Priority: Trivial
>         Attachments: HDFS-14415-branch-2.000.patch
>
> As the multi-SBN feature is already backported to branch-2, this is a
> follow-up to backport HDFS-13799.