[jira] [Comment Edited] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230612#comment-17230612 ] Yiqun Lin edited comment on HDFS-14090 at 11/12/20, 2:38 PM: - Hi [~fengnanli], three nits for the latest patch: 1. It would look better to rename dfs.federation.router.fairness.handler.count.NS to dfs.federation.router.fairness.handler.count.EXAMPLENAMESERVICE. 2. {noformat} smaller or equal to the total number of router handlers; if the special *concurrent* is not specified, the sum of all configured values must be strictly smaller than the router handlers thus the left will be allocated to the concurrent calls. {noformat} Can we mention the related setting: "strictly smaller than the router handlers (dfs.federation.router.handler.count)"... 3. Can you fix the related failed unit test? |hadoop.hdfs.server.federation.router.TestRBFConfigFields| Others look good to me. was (Author: linyiqun): Hi [~fengnanli], two nits for the latest patch: {noformat} smaller or equal to the total number of router handlers; if the special *concurrent* is not specified, the sum of all configured values must be strictly smaller than the router handlers thus the left will be allocated to the concurrent calls. {noformat} Can we mention the related setting: "strictly smaller than the router handlers (dfs.federation.router.handler.count)"... Can you fix the related failed unit test? |hadoop.hdfs.server.federation.router.TestRBFConfigFields| Others look good to me. > RBF: Improved isolation for downstream name nodes. {Static} > --- > > Key: HDFS-14090 > URL: https://issues.apache.org/jira/browse/HDFS-14090 > Project: Hadoop HDFS > Issue Type: New Feature > Reporter: CR Hota > Assignee: Fengnan Li > Priority: Major > Attachments: HDFS-14090-HDFS-13891.001.patch, > HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, > HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, > HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, > HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, > HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, > HDFS-14090.015.patch, HDFS-14090.016.patch, HDFS-14090.017.patch, > HDFS-14090.018.patch, HDFS-14090.019.patch, HDFS-14090.020.patch, > HDFS-14090.021.patch, HDFS-14090.022.patch, HDFS-14090.023.patch, > HDFS-14090.024.patch, RBF_ Isolation design.pdf > > > The router is a gateway to underlying name nodes. Gateway architectures should > help minimize the impact on clients connecting to healthy clusters vs. unhealthy > clusters. > For example - if there are 2 name nodes downstream, and one of them is > heavily loaded with calls spiking rpc queue times, due to back pressure the > same will start reflecting on the router. As a result, clients > connecting to healthy/faster name nodes will also slow down, as the same rpc queue > is maintained for all calls at the router layer. Essentially the same IPC > thread pool is used by the router to connect to all name nodes. > Currently the router uses one single rpc queue for all calls. Let's discuss how we > can change the architecture and add some throttling logic for > unhealthy/slow/overloaded name nodes. > One way could be to read from the current call queue, immediately identify the > downstream name node, and maintain a separate queue for each underlying name > node. Another simpler way is to maintain some sort of rate limiter configured > for each name node and let routers drop/reject/send error requests after > a certain threshold.
> This won’t be a simple change, as the router’s ‘Server’ layer would need redesign > and reimplementation. Currently this layer is the same as the name node’s. > Opening this ticket to discuss, design and implement this feature.
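For concreteness, here is a hedged sketch of how the handler-count settings discussed in the review above might look in hdfs-rbf-site.xml. The nameservice names (ns1, ns2) and all count values are illustrative assumptions; only the property-name patterns (dfs.federation.router.handler.count, dfs.federation.router.fairness.handler.count.<ns>, and the special "concurrent" name) come from the patch discussion:
{code:xml}
<!-- Total number of router RPC handlers (example value, an assumption). -->
<property>
  <name>dfs.federation.router.handler.count</name>
  <value>20</value>
</property>
<!-- Dedicated handlers per downstream nameservice; ns1/ns2 are assumed names. -->
<property>
  <name>dfs.federation.router.fairness.handler.count.ns1</name>
  <value>8</value>
</property>
<property>
  <name>dfs.federation.router.fairness.handler.count.ns2</name>
  <value>8</value>
</property>
<!-- Handlers reserved for concurrent (fan-out) calls. If this key were
     omitted, the leftover 20 - 8 - 8 = 4 handlers would be allocated to
     concurrent calls, per the patch's documented rule. -->
<property>
  <name>dfs.federation.router.fairness.handler.count.concurrent</name>
  <value>4</value>
</property>
{code}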
[jira] [Comment Edited] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230612#comment-17230612 ] Yiqun Lin edited comment on HDFS-14090 at 11/12/20, 1:07 PM: - Hi [~fengnanli], two nits for the latest patch: {noformat} smaller or equal to the total number of router handlers; if the special *concurrent* is not specified, the sum of all configured values must be strictly smaller than the router handlers thus the left will be allocated to the concurrent calls. {noformat} Can we mention the related setting: "strictly smaller than the router handlers (dfs.federation.router.handler.count)"... Can you fix the related failed unit test? |hadoop.hdfs.server.federation.router.TestRBFConfigFields| Others look good to me. was (Author: linyiqun): Hi [~fengnanli], two nits for the latest patch: {noformat} smaller or equal to the total number of router handlers; if the special *concurrent* is not specified, the sum of all configured values must be strictly smaller than the router handlers thus the left will be allocated to the concurrent calls. {noformat} Can we mention the related setting: "strictly smaller than the router handlers (dfs.federation.router.handler.count)"... Can you fix the related failed unit test? Others look good to me.
[jira] [Comment Edited] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230394#comment-17230394 ] Fengnan Li edited comment on HDFS-14090 at 11/12/20, 7:07 AM: -- Uploaded [^HDFS-14090.024.patch] to add configs for it. I feel like there should be more optimization in how this config can be specified, to make it less verbose (e.g. specifying certain default values so we don't need to list all nameservices), but I cannot come up with a clean way of doing this now. Will revisit when I start tackling the dynamic allocations. Thanks! was (Author: fengnanli): Uploaded [^HDFS-14090.024.patch] to add configs for it. I feel like there should be more optimization in how this config can be specified but make it less verbose (e.g. specifying certain default values so we don't need to list all nameservices), but I cannot come up with a clean way of doing this now. Will revisit when I start tackling the dynamic allocations. Thanks!
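As an aside, a hypothetical sketch of the kind of less-verbose fallback being wished for in the comment above. The ".default" key does not exist in the patch and is purely an assumption; only the per-nameservice key prefix is taken from the discussion:
{code:java}
import org.apache.hadoop.conf.Configuration;

/** Hypothetical fallback lookup to make the per-nameservice handler-count
 *  config less verbose; the ".default" key is an assumption, not part of
 *  the actual patch. */
public final class FairnessConfSketch {
  private static final String PREFIX =
      "dfs.federation.router.fairness.handler.count.";

  public static int handlerCount(Configuration conf, String nsId, int fallback) {
    // A per-nameservice value wins; otherwise fall back to a catch-all default.
    return conf.getInt(PREFIX + nsId, conf.getInt(PREFIX + "default", fallback));
  }
}
{code}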
[jira] [Comment Edited] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227189#comment-17227189 ] Yiqun Lin edited comment on HDFS-14090 at 11/6/20, 6:58 AM: Hi [~fengnanli], some minor comments from me: 1. I see here we introduce CONCURRENT_NS for concurrent calls; why not acquire the permit for the corresponding ns instead? 2. The current description of the setting in hdfs-rbf-default.xml could say more. At least, we need to mention: * The setting name for configuring the handler count for each ns, including the CONCURRENT_NS ns. * The sum of dedicated handler counts should be less than the value of dfs.federation.router.handler.count. 3. It would be better to document this improvement in HDFSRouterFederation.md. Comments #2 and #3 can be addressed in a follow-up JIRA, :). was (Author: linyiqun): Hi [~fengnanli], some minor comments from me: 1. I see here we introduce CONCURRENT_NS for concurrent calls; why not acquire the permit for the corresponding ns instead? 2. The current description of the setting in hdfs-rbf-default.xml could say more. At least, we need to mention: * The setting name for configuring the handler count for each ns, including the CONCURRENT_NS ns. * The sum of dedicated handler counts should be less than the value of dfs.federation.router.handler.count. 3. It would be better to document this improvement in HDFSRouterFederation.md.
[jira] [Comment Edited] (HDFS-14090) RBF: Improved isolation for downstream name nodes.
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908758#comment-16908758 ] CR Hota edited comment on HDFS-14090 at 8/16/19 5:58 AM: - [~xkrogen] [~elgoiri] Many thanks for the detailed reviews. Very helpful :) Have incorporated almost all the points you folks mentioned in 010.patch. On a high level, the changes are: # "permit" is still the word being used. # One configuration controls the feature: {{NoFairnessPolicyController}} is a dummy, whereas {{StaticFairnessPolicyController}} is the fairness implementation. # The whole start-up will fail if fairness class loading has issues. Test cases are changed appropriately to reflect that. # {{NoPermitAvailableException}} is renamed to {{PermitLimitExceededException}}. To [~xkrogen]'s observations: {quote}I was considering the scenario where there are two routers R1 and R2, and two NameNodes N1 and N2. Assume most clients need to access both N1 and N2. What happens in the situation when all of R1's N1-handlers are full (but N2-handlers mostly empty), and all of R2's N2-handlers are full (but N1-handlers mostly empty)? I'm not sure if this is a situation that is likely to arise, or if the system will easily self-heal based on the backoff behavior. Maybe worth thinking about a little--not a blocking concern for me, more of a thought experiment. {quote} It should ideally not happen that all handlers of a specific router are busy while other handlers are completely free, since clients are expected to use a random order while connecting. However, from the beginning the design focuses on getting the system to self-heal as much as possible, to eventually get similar traffic across all routers in a cluster. {quote}The configuration for this seems like it will be really tricky to get right, particularly knowing how many fan-out handlers to allocate. I imagine as an administrator, my thought process would be like: I want 35% allocated to NN1 and 65% allocated to NN2, since NN2 is about 2x as loaded as NN1. This part is fairly intuitive. Then I encounter the fan-out configuration... What am I supposed to do with it? Are there perhaps any heuristics we can provide for reasonable values? {quote} Yes, configuration values are something users have to pay attention to, especially for concurrent calls. In the documentation sub-Jira HDFS-14558, I plan to write more about the concurrent calls and some points for users to focus on. Also, configurations may need to be changed by users based on new use cases, load on downstream clusters, etc. [~aajisaka] [~brahmareddy] [~linyiqun] [~hexiaoqiao] FYI. was (Author: crh): [~xkrogen] [~elgoiri] Many thanks for the detailed reviews. Very helpful :) Have incorporated almost all the points you folks mentioned. On a high level, the changes are: # "permit" is still the word being used. # One configuration controls the feature: {{NoFairnessPolicyController}} is a dummy, whereas {{StaticFairnessPolicyController}} is the fairness implementation. # The whole start-up will fail if fairness class loading has issues. Test cases are changed appropriately to reflect that. # {{NoPermitAvailableException}} is renamed to {{PermitLimitExceededException}}. To [~xkrogen]'s observations: {quote}I was considering the scenario where there are two routers R1 and R2, and two NameNodes N1 and N2. Assume most clients need to access both N1 and N2.
What happens in the situation when all of R1's N1-handlers are full (but N2-handlers mostly empty), and all of R2's N2-handlers are full (but N1-handlers mostly empty)? I'm not sure if this is a situation that is likely to arise, or if the system will easily self-heal based on the backoff behavior. Maybe worth thinking about a little--not a blocking concern for me, more of a thought experiment. {quote} It should ideally not happen that all handlers of a specific router are busy while other handlers are completely free, since clients are expected to use a random order while connecting. However, from the beginning the design focuses on getting the system to self-heal as much as possible, to eventually get similar traffic across all routers in a cluster. {quote}The configuration for this seems like it will be really tricky to get right, particularly knowing how many fan-out handlers to allocate. I imagine as an administrator, my thought process would be like: I want 35% allocated to NN1 and 65% allocated to NN2, since NN2 is about 2x as loaded as NN1. This part is fairly intuitive. Then I encounter the fan-out configuration... What am I supposed to do with it? Are there perhaps any heuristics we can provide for reasonable values? {quote} Yes, configuration values are something users have to pay attention to, especially for concurrent calls. In the documentation sub-Jira HDFS-14558, I plan to write more about the concurrent calls and some points for users to focus on. Also, configurations may need to be changed by users based on new use cases, load on downstream clusters, etc.
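To make the permit idea concrete, here is a minimal sketch of what a semaphore-backed static fairness controller could look like. The class and method shapes below are illustrative assumptions modeled on the names mentioned in this thread ({{StaticFairnessPolicyController}}, acquirePermit/releasePermit), not the patch's actual API:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

/**
 * Illustrative permit-based fairness controller: each nameservice gets a
 * fixed number of handler permits; a handler must acquire a permit before
 * proxying a call downstream and release it afterwards.
 */
public class StaticFairnessSketch {
  private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

  /** Assign a dedicated permit count to a nameservice (e.g. from config). */
  public void assignHandlers(String nsId, int count) {
    permits.put(nsId, new Semaphore(count));
  }

  /** Non-blocking acquire; returning false means the caller should back off. */
  public boolean acquirePermit(String nsId) {
    Semaphore s = permits.get(nsId);
    return s != null && s.tryAcquire();
  }

  /** Must be called in a finally block by whoever acquired the permit. */
  public void releasePermit(String nsId) {
    Semaphore s = permits.get(nsId);
    if (s != null) {
      s.release();
    }
  }
}
{code}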
[jira] [Comment Edited] (HDFS-14090) RBF: Improved isolation for downstream name nodes.
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906493#comment-16906493 ] Erik Krogen edited comment on HDFS-14090 at 8/13/19 6:27 PM: - [~crh] thanks for keeping me honest here :) Finally got around to taking a look at this. I read through the design again and had two additional concerns that came up: # I was considering the scenario where there are two routers R1 and R2, and two NameNodes N1 and N2. Assume most clients need to access both N1 and N2. What happens in the situation when all of R1's N1-handlers are full (but N2-handlers mostly empty), and all of R2's N2-handlers are full (but N1-handlers mostly empty)? I'm not sure if this is a situation that is likely to arise, or if the system will easily self-heal based on the backoff behavior. Maybe worth thinking about a little--not a blocking concern for me, more of a thought experiment. # The configuration for this seems like it will be really tricky to get right, particularly knowing how many fan-out handlers to allocate. I imagine as an administrator, my thought process would be like: ** I want 35% allocated to NN1 and 65% allocated to NN2, since NN2 is about 2x as loaded as NN1. This part is fairly intuitive. ** Then I encounter the fan-out configuration... What am I supposed to do with it? ** Are there perhaps any heuristics we can provide for reasonable values? Regarding the terminology, I actually think that "permit" better conveys the concept to me personally, however I think "quota" more closely matches similar terminology used throughout Hadoop (maybe this is bad – overloaded term?). My one concern with "permit" would be that it might imply that some number of permits are requested at a dynamic startup phase (e.g. a job requests permits at startup), rather than it being a constant allocation count. I don't have a strong preference here. I do agree that {{NoPermitAvailableException}} could use a better name to more readily indicate that it is an overloaded situation; {{PermitLimitExceededException}} might be better. {quote} * Instead of having two configs, one for enabling and one for the implementation, we could have just the implementation and by default provide a dummy implementation that doesn't do fairness. Then we would rename the current DefaultFairnessPolicyController to something more descriptive (to reflect equal or linear or similar). * Coming back to PermitAllocationException, right now we are kind of logging and swallowing; what about failing the whole startup?{quote} +1 on these two ideas from [~elgoiri] I didn't do a thorough review, but from an initial look through the code, I am also impressed by how isolated the changes were able to be. It's great to see. I have some more isolated comments below. # This code seems to assume that the set of nameservices controlled by a router will never change. I haven't been following the router closely, but I thought that you could dynamically change the set of mount points. I see we're loading the nameservices from {{DFS_ROUTER_MONITOR_NAMENODE}} – is it actually accurate that the set of monitored NameNodes matches the set of mount points? # If someone actually has a nameservice called "concurrent" (used for the {{concurrentNS}}), this is going to cause problems. Given that this name will appear in user configurations, maybe it's nice for it to be an easily human-readable name, but we should add some logic to detect this collision and complain about it. 
# I think a {{WARN}} log on a permit allocation failure is a bit strong. This could really flood the logs when things get busy. I would suggest downgrading it to a {{DEBUG}}, or using the {{LogThrottlingHelper}} to limit how frequently this will be logged. # I think we need to document the nameservice-specific configurations in {{hdfs-rbf-default.xml}}, including the presence of the special "concurrent" nameservice. # Nits: ## You've sometimes used {{this.}} as a prefix for fields and sometimes not; can we make it more consistent? ## You should include diamond-types ( {{<>}} ) in your {{HashMap}} / {{HashSet}} instantiations. ## Can we combine the log statements on L113 and L115 of {{DefaultFairnessPolicyController}}? was (Author: xkrogen): [~crh] thanks for keeping me honest here :) Finally got around to taking a look at this. I read through the design again and had two additional concerns that came up: # I was considering the scenario where there are two routers R1 and R2, and two NameNodes N1 and N2. Assume most clients need to access both N1 and N2. What happens in the situation when all of R1's N1-handlers are full (but N2-handlers mostly empty), and all of R2's N2-handlers are full (but N1-handlers mostly empty)? I'm not sure if this is a situation that is likely to arise, or if the system will easily self-heal based on the backoff behavior.
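A hedged sketch of the collision check suggested in point 2 of the review above (detecting a real nameservice named "concurrent"); the class name, method name, and error text are assumptions:
{code:java}
import java.util.Set;

/** Illustrative guard against a real nameservice colliding with the
 *  reserved "concurrent" pseudo-nameservice; names here are assumptions. */
public final class ConcurrentNsCheck {
  private static final String CONCURRENT_NS = "concurrent";

  public static void checkNoCollision(Set<String> nameservices) {
    if (nameservices.contains(CONCURRENT_NS)) {
      throw new IllegalArgumentException("Nameservice '" + CONCURRENT_NS
          + "' collides with the reserved name used for concurrent"
          + " (fan-out) calls; please rename the nameservice.");
    }
  }
}
{code}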
[jira] [Comment Edited] (HDFS-14090) RBF: Improved isolation for downstream name nodes.
[ https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891583#comment-16891583 ] He Xiaoqiao edited comment on HDFS-14090 at 7/24/19 4:00 AM: - Thanks [~crh] for your contribution, [^HDFS-14090.006.patch] almost looks good to me. A minor comment about permit acquires and releases: {{RouterRpcClient#acquirePermit}} & {{RouterRpcClient#releasePermit}} should be invoked in pairs, and mostly they are. However, some logic does not handle exceptions correctly, which may mean a {{Permit}} is not released as expected. 1. In {{RouterRpcClient#invokeSequential}}, when #getNamenodesForNameservice throws an exception, we cannot release the permit as expected.
{code:java}
public <T> T invokeSequential(
    final List<? extends RemoteLocationContext> locations,
    final RemoteMethod remoteMethod,
    Class<T> expectedResultClass,
    Object expectedResultValue) throws IOException {
  ...
  for (final RemoteLocationContext loc : locations) {
    String ns = loc.getNameserviceId();
    acquirePermit(ns, ugi, m);
    // If this throws, the permit is never released.
    List<? extends FederationNamenodeContext> namenodes =
        getNamenodesForNameservice(ns);
    try {
      ...
    } catch (...) {
      ...
    } finally {
      releasePermit(ns, ugi, m);
    }
  }
  ...
}
{code}
2. In {{RouterRpcClient#invokeConcurrent}}, the same issue exists after the second invocation of {{acquirePermit}}. One minor suggestion: the whole code segment between {{acquirePermit}} and {{releasePermit}} should be enclosed in a {{try {} catch {} finally {}}} statement to ensure that we release the acquired permit in any case. Thanks [~crh] again, please let me know if there is something I missed. was (Author: hexiaoqiao): Thanks [~crh] for your contribution, [^HDFS-14090.006.patch] almost looks good to me. A minor comment about permit acquires and releases: {{RouterRpcClient#acquirePermit}} & {{RouterRpcClient#releasePermit}} should be invoked in pairs, and mostly they are. However, some logic does not handle exceptions correctly, which may mean a {{Permit}} is not released as expected. 1. In {{RouterRpcClient#invokeSequential}}, when #getNamenodesForNameservice throws an exception, we cannot release the permit as expected.
{code:java}
public <T> T invokeSequential(
    final List<? extends RemoteLocationContext> locations,
    final RemoteMethod remoteMethod,
    Class<T> expectedResultClass,
    Object expectedResultValue) throws IOException {
  ...
  for (final RemoteLocationContext loc : locations) {
    String ns = loc.getNameserviceId();
    acquirePermit(ns, ugi, m);
    List<? extends FederationNamenodeContext> namenodes =
        getNamenodesForNameservice(ns);
    try {
      ...
    } catch (...) {
      ...
    } finally {
      releasePermit(ns, ugi, m);
    }
  }
  ...
}
{code}
2. In {{RouterRpcClient#invokeConcurrent}}, the same issue exists after the second invocation of {{acquirePermit}}. One minor suggestion: the whole code segment between {{acquirePermit}} and {{releasePermit}} should be enclosed in a {{try {} catch {} finally {}}} statement to ensure that we release the acquired permit in any case. Thanks [~crh] again, please let me know if there is something I missed.
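To illustrate the acquire/release discipline asked for in this review, here is a self-contained sketch using java.util.concurrent.Semaphore as a stand-in for the patch's permit controller. Acquiring before entering the try block guarantees the finally clause releases exactly what was acquired, even when the guarded call throws:
{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

public final class PermitPattern {
  /** Acquire before entering try, so the finally block releases a permit
   *  only when one was actually obtained (the discipline suggested above). */
  public static <T> T withPermit(Semaphore permits, Callable<T> call)
      throws Exception {
    if (!permits.tryAcquire()) {
      // Mirrors the router backing off when a nameservice's permits run out.
      throw new IllegalStateException("no permit available for this ns");
    }
    try {
      return call.call(); // any exception here still releases the permit below
    } finally {
      permits.release();
    }
  }
}
{code}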