[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance
[ https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630038#comment-17630038 ]

ASF GitHub Bot commented on HDFS-15383:
---

virajith merged PR #5112:
URL: https://github.com/apache/hadoop/pull/5112

> RBF: Disable watch in ZKDelegationSecretManager for performance
> ---
>
> Key: HDFS-15383
> URL: https://issues.apache.org/jira/browse/HDFS-15383
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Fengnan Li
> Assignee: Fengnan Li
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
>
> In the current design for delegation tokens in the secure Router, the total
> number of watches for tokens is the product of the number of Routers and the
> number of tokens. This is because ZKDelegationTokenManager uses
> PathChildrenCache from Curator, which automatically sets a watch so that ZK
> pushes sync information to each Router. Some evaluations indicate that a
> large number of watches in ZooKeeper degrades ZooKeeper server performance.
> In our experience, when the number of watches exceeds 1.2 million on a single
> ZK server, ZK performance degrades significantly. This ticket therefore
> rewrites ZKDelegationTokenManagerImpl.java to explicitly disable the
> PathChildrenCache and have Routers sync periodically from ZooKeeper. This has
> been working fine at the scale of 10 Routers with 2 million tokens.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
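The watch arithmetic in the HDFS-15383 description, and the proposed alternative of having each Router re-read tokens on a timer instead of holding one watch per token, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from PR #5112: `PeriodicTokenSync`, `TokenStore`, `syncOnce`, and `totalWatches` are hypothetical names, the token store is an in-memory stand-in for ZooKeeper, and no Curator or ZooKeeper API is used.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical Router-side token cache refreshed by polling rather than by watches. */
class PeriodicTokenSync implements AutoCloseable {
  private final TokenStore store;
  private final Map<String, Long> localCache = new ConcurrentHashMap<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  PeriodicTokenSync(TokenStore store) {
    this.store = store;
  }

  /** Watches held on the server when every Router watches every token. */
  static long totalWatches(long routers, long tokens) {
    return routers * tokens;
  }

  /** One polling pass: drop tokens that disappeared, then add or update the rest. */
  void syncOnce() {
    Map<String, Long> snapshot = store.readAllTokens();
    localCache.keySet().retainAll(snapshot.keySet());
    localCache.putAll(snapshot);
  }

  /** Schedule polling passes; the interval is an assumed tuning knob. */
  void start(long intervalSeconds) {
    scheduler.scheduleWithFixedDelay(this::syncOnce, 0, intervalSeconds, TimeUnit.SECONDS);
  }

  int cachedTokenCount() {
    return localCache.size();
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }

  public static void main(String[] args) {
    // 10 Routers x 2 million tokens, the scale quoted in the ticket.
    System.out.println("watches with per-token watches: " + totalWatches(10, 2_000_000));

    Map<String, Long> backing = new ConcurrentHashMap<>();
    backing.put("token-1", 1000L);
    try (PeriodicTokenSync sync = new PeriodicTokenSync(() -> backing)) {
      sync.syncOnce();
      System.out.println("cached tokens after one sync: " + sync.cachedTokenCount());
    }
  }
}

/** Hypothetical stand-in for the shared token store (ZooKeeper in the ticket). */
interface TokenStore {
  Map<String, Long> readAllTokens(); // token id -> expiry time
}
```

With per-token watches, the 10 Routers and 2 million tokens mentioned in the ticket would multiply out to 20 million watches, well above the 1.2 million figure at which degradation was reported; with polling, the cost becomes a periodic read per Router, at the price of each Router's view lagging by up to one interval.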
[jira] [Commented] (HDFS-16827) [RBF SBN] RouterStateIdContext shouldn't update the ResponseState if client doesn't use ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-16827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629948#comment-17629948 ]

ASF GitHub Bot commented on HDFS-16827:
---

simbadzina commented on code in PR #5088:
URL: https://github.com/apache/hadoop/pull/5088#discussion_r1015741184

##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##
@@ -2879,9 +2880,12 @@ private void processRpcRequest(RpcRequestHeaderProto header,
         stateId = alignmentContext.receiveRequestState(
             header, getMaxIdleTime());
         call.setClientStateId(stateId);
-        if (header.hasRouterFederatedState()) {
-          call.setFederatedNamespaceState(header.getRouterFederatedState());
-        }
+      }
+      if (header.hasRouterFederatedState()) {
+        call.setFederatedNamespaceState(header.getRouterFederatedState());
+      } else if (header.hasStateId()) {
+        // Set one empty FederatedNamespaceState to identify the client want to get stateId.
+        call.setFederatedNamespaceState(EMPTY_BYTE_STRING);

Review Comment:
   Typo: "wants" instead of "want".

> [RBF SBN] RouterStateIdContext shouldn't update the ResponseState if client
> doesn't use ObserverReadProxyProvider
> ---
>
> Key: HDFS-16827
> URL: https://issues.apache.org/jira/browse/HDFS-16827
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Major
> Labels: pull-request-available
>
> RouterStateIdContext shouldn't update the ResponseState if client doesn't use
> ObserverReadProxyProvider.
[jira] [Commented] (HDFS-16827) [RBF SBN] RouterStateIdContext shouldn't update the ResponseState if client doesn't use ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-16827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629947#comment-17629947 ]

ASF GitHub Bot commented on HDFS-16827:
---

simbadzina commented on PR #5088:
URL: https://github.com/apache/hadoop/pull/5088#issuecomment-1305990097

The change looks good to me. It still passes the unit tests we refactored in TestObserverWithRouter on trunk.
[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance
[ https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629930#comment-17629930 ]

ASF GitHub Bot commented on HDFS-15383:
---

virajith commented on PR #5112:
URL: https://github.com/apache/hadoop/pull/5112#issuecomment-1305950656

I'll merge this in the next hour.
[jira] [Commented] (HDFS-15383) RBF: Disable watch in ZKDelegationSecretManager for performance
[ https://issues.apache.org/jira/browse/HDFS-15383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629929#comment-17629929 ]

ASF GitHub Bot commented on HDFS-15383:
---

virajith commented on PR #5112:
URL: https://github.com/apache/hadoop/pull/5112#issuecomment-1305950206

Thanks for the backport @melissayou . The changes look good to me - I expect the deprecated method will be addressed by [HADOOP-18520](https://issues.apache.org/jira/browse/HADOOP-18520). The other failures exist in trunk as well - fixing the checkstyles will not make this a clean cherry-pick.
[jira] [Commented] (HDFS-13522) HDFS-13522: Add federated nameservices states to client protocol and propagate it between routers and clients.
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629928#comment-17629928 ]

ASF GitHub Bot commented on HDFS-13522:
---

simbadzina closed pull request #4883: HDFS-13522: Add federated nameservices states to client protocol and propagate it between routers and clients.
URL: https://github.com/apache/hadoop/pull/4883

> HDFS-13522: Add federated nameservices states to client protocol and
> propagate it between routers and clients.
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: federation, namenode
> Reporter: Erik Krogen
> Assignee: Simbarashe Dzinamarira
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch,
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC
> clogging.png, ShortTerm-Routers+Observer.png,
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf,
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
> Time Spent: 20h 50m
> Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state,
> e.g. {{FederationNamenodeServiceState}}.
> This patch captures the state of all namespaces in the routers and propagates
> it to clients. A follow up patch will change router behavior to direct
> requests to the observer.
[jira] [Commented] (HDFS-16821) Fix regression in HDFS-13522 that enables observer reads by default.
[ https://issues.apache.org/jira/browse/HDFS-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629909#comment-17629909 ]

ASF GitHub Bot commented on HDFS-16821:
---

omalley merged PR #5078:
URL: https://github.com/apache/hadoop/pull/5078

> Fix regression in HDFS-13522 that enables observer reads by default.
> ---
>
> Key: HDFS-16821
> URL: https://issues.apache.org/jira/browse/HDFS-16821
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Reporter: Simbarashe Dzinamarira
> Assignee: Simbarashe Dzinamarira
> Priority: Major
> Labels: pull-request-available
>
> Serving reads consistently from Observer Namenodes is a feature that was
> introduced in HDFS-12943.
> Clients opt into this feature by configuring the ObserverReadProxyProvider.
> It is important that the opt-in is explicit because, for third-party reads to
> remain consistent, these clients then need to perform an msync before reads.
> In HDFS-13522, the ClientGSIContext is implicitly added to the DFSClient, thus
> enabling Observer reads for all clients by default. This breaks consistency
> guarantees for clients that haven't opted into observer reads.
> [https://github.com/apache/hadoop/pull/4883/files#diff-a627e2c1f3e68235520d3c28092f4ae8a41aa4557cc530e4e6862c318be7e898R352-R354]
> We need to return to the old behavior of only using the ClientGSIContext when
> users have explicitly opted into Observer reads.
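The behavior HDFS-16821 asks for, using the client-side state context only when the client has explicitly configured ObserverReadProxyProvider, can be sketched roughly as below. This is a hedged illustration rather than the change in PR #5078: `AlignmentContextChooser`, `observerReadsEnabled`, and `chooseContext` are hypothetical names, the configuration is modeled as a plain `Map`, the context is represented by its name only, and the key prefix `dfs.client.failover.proxy.provider.` plus the fully qualified provider class name are assumed to match the usual HDFS client settings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: decide whether a client has opted into observer reads. */
class AlignmentContextChooser {
  // Assumed configuration key prefix; the nameservice id is appended to it.
  static final String PROVIDER_KEY_PREFIX = "dfs.client.failover.proxy.provider.";
  // Assumed fully qualified name of the proxy provider that signals the opt-in.
  static final String OBSERVER_PROVIDER =
      "org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider";

  /** True only when the nameservice is explicitly configured with the observer provider. */
  static boolean observerReadsEnabled(Map<String, String> conf, String nameservice) {
    return OBSERVER_PROVIDER.equals(conf.get(PROVIDER_KEY_PREFIX + nameservice));
  }

  /** Returns the name of the context to attach, or empty when the client has not opted in. */
  static Optional<String> chooseContext(Map<String, String> conf, String nameservice) {
    return observerReadsEnabled(conf, nameservice)
        ? Optional.of("ClientGSIContext")
        : Optional.empty();
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println("default client: " + chooseContext(conf, "ns1"));
    conf.put(PROVIDER_KEY_PREFIX + "ns1", OBSERVER_PROVIDER);
    System.out.println("opted-in client: " + chooseContext(conf, "ns1"));
  }
}
```

The point of the sketch is the default: a client that sets nothing gets no state context, so it keeps reading with the consistency guarantees it had before HDFS-13522.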
[jira] [Commented] (HDFS-16821) Fix regression in HDFS-13522 that enables observer reads by default.
[ https://issues.apache.org/jira/browse/HDFS-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629910#comment-17629910 ]

ASF GitHub Bot commented on HDFS-16821:
---

omalley commented on PR #5078:
URL: https://github.com/apache/hadoop/pull/5078#issuecomment-1305886044

+1 LGTM
[jira] [Commented] (HDFS-16764) ObserverNamenode handles addBlock rpc and throws a FileNotFoundException
[ https://issues.apache.org/jira/browse/HDFS-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629857#comment-17629857 ]

ASF GitHub Bot commented on HDFS-16764:
---

hadoop-yetus commented on PR #4872:
URL: https://github.com/apache/hadoop/pull/4872#issuecomment-1305722509

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 1m 1s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 42m 17s | | trunk passed |
| +1 :green_heart: | compile | 1m 37s | | trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 1m 27s | | trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 1m 16s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 39s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 15s | | trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 43s | | trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 43s | | trunk passed |
| +1 :green_heart: | shadedclient | 26m 37s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 28s | | the patch passed |
| +1 :green_heart: | compile | 1m 35s | | the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 1m 35s | | the patch passed |
| +1 :green_heart: | compile | 1m 23s | | the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 1m 23s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 2s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 31s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 0s | | the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 34s | | the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 49s | | the patch passed |
| +1 :green_heart: | shadedclient | 26m 40s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 351m 17s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 56s | | The patch does not generate ASF License warnings. |
| | | 471m 51s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/4/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4872 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 033df28c4be5 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / a44c5a2c2d01f68e91febdf82c9f2d4e25d53896 |
| Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/4/testReport/ |
| Max. process+thread count | 1847 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4872/4/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

> ObserverNamenode handles ad
[jira] [Commented] (HDFS-16831) [RBF SBN] GetNamenodesForNameserviceId should shuffle Observer NameNodes every time
[ https://issues.apache.org/jira/browse/HDFS-16831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629651#comment-17629651 ]

ASF GitHub Bot commented on HDFS-16831:
---

ZanderXu commented on PR #5098:
URL: https://github.com/apache/hadoop/pull/5098#issuecomment-1305225730

> +1 LGTM after last commit. @ZanderXu - Do you think we need to add a UT ?

@ashutoshcipher Sir, thanks for your review. I will add one UT to test it. Do you have any good ideas for testing the shuffling result?

> [RBF SBN] GetNamenodesForNameserviceId should shuffle Observer NameNodes
> every time
> ---
>
> Key: HDFS-16831
> URL: https://issues.apache.org/jira/browse/HDFS-16831
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Major
> Labels: pull-request-available
>
> The method getNamenodesForNameserviceId in MembershipNamenodeResolver.class
> should shuffle Observer NameNodes every time. The current logic returns
> the cached list, which causes all read requests to be forwarded to the
> first observer namenode.
>
> The related code is as below:
> {code:java}
> @Override
> public List getNamenodesForNameserviceId(
>     final String nsId, boolean listObserversFirst) throws IOException {
>   List ret = cacheNS.get(Pair.of(nsId,
>       listObserversFirst));
>   if (ret != null) {
>     return ret;
>   }
>   ...
> }{code}
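One way to keep the cache and still spread reads, sketched under assumptions: treat the cached list as read-only and, on every call, return a copy whose leading run of observers has been shuffled. `ObserverShuffle`, `Namenode`, `State`, and `shuffleObservers` are hypothetical names for illustration and are not taken from MembershipNamenodeResolver or PR #5098. Passing the `Random` in also suggests one possible answer to the testing question in the comment: a unit test can call the method many times and check that more than one observer ends up first, while the non-observer tail keeps its order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch: shuffle only the observers at the head of a cached namenode list. */
class ObserverShuffle {
  enum State { OBSERVER, ACTIVE, STANDBY }

  /** Minimal stand-in for a namenode entry; the real resolver uses richer types. */
  static final class Namenode {
    final String id;
    final State state;

    Namenode(String id, State state) {
      this.id = id;
      this.state = state;
    }
  }

  /**
   * Returns a new list in which the leading observers are in random order.
   * The cached list itself is never modified, and non-observers keep their order.
   */
  static List<Namenode> shuffleObservers(List<Namenode> cached, Random random) {
    List<Namenode> copy = new ArrayList<>(cached);
    int observers = 0;
    while (observers < copy.size() && copy.get(observers).state == State.OBSERVER) {
      observers++;
    }
    // subList is a view of copy, so shuffling it reorders the head of copy in place.
    Collections.shuffle(copy.subList(0, observers), random);
    return copy;
  }

  public static void main(String[] args) {
    List<Namenode> cached = Arrays.asList(
        new Namenode("obs1", State.OBSERVER),
        new Namenode("obs2", State.OBSERVER),
        new Namenode("obs3", State.OBSERVER),
        new Namenode("active", State.ACTIVE),
        new Namenode("standby", State.STANDBY));
    Random random = new Random();
    Set<String> seenFirst = new HashSet<>();
    for (int i = 0; i < 100; i++) {
      seenFirst.add(shuffleObservers(cached, random).get(0).id);
    }
    System.out.println("observers seen in first position: " + seenFirst);
  }
}
```

Copying on each call costs an allocation per request, but it avoids mutating a list that other callers may be iterating; whether that trade-off suits the Router's request path is a judgment for the actual patch.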