[jira] [Comment Edited] (HDFS-14090) RBF: Improved isolation for downstream name nodes. {Static}

Yiqun Lin (Jira) Thu, 12 Nov 2020 06:39:25 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230612#comment-17230612
 ]


Yiqun Lin edited comment on HDFS-14090 at 11/12/20, 2:38 PM:
-------------------------------------------------------------

Hi [~fengnanli], three nits for the latest patch:

1. It would be good to rename dfs.federation.router.fairness.handler.count.NS to 
dfs.federation.router.fairness.handler.count.EXAMPLENAMESERVICE.

2. {noformat}
smaller or equal to the total number of router handlers; if the special
      *concurrent* is not specified, the sum of all configured values must be
      strictly smaller than the router handlers thus the left will be allocated
      to the concurrent calls.
{noformat}
Can we mention the related setting here, e.g. "strictly smaller than the router 
handlers (dfs.federation.router.handler.count)..."?

3. Can you fix the related failing unit test?
|hadoop.hdfs.server.federation.router.TestRBFConfigFields|

The rest looks good to me.
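
To make the rule quoted in nit 2 concrete, here is a minimal sketch of the check as I read it from the description text. It is not the patch's actual code: the function name `allocate_handlers`, the dict-based input, and the error messages are hypothetical, and the "smaller or equal" branch is inferred from a sentence that is cut off in the quote.

```python
# Hypothetical sketch of the handler-count rule quoted above; not the patch code.
CONCURRENT = "concurrent"  # the special key named in the quoted description


def allocate_handlers(total_handlers, configured):
    """Return a per-nameservice handler allocation or raise ValueError.

    total_handlers stands in for dfs.federation.router.handler.count.
    configured maps a nameservice id (or "concurrent") to its configured
    dfs.federation.router.fairness.handler.count.<id> value.
    """
    if any(count < 0 for count in configured.values()):
        raise ValueError("handler counts must be non-negative")

    assigned = sum(configured.values())

    if CONCURRENT in configured:
        # Assumed reading: with an explicit concurrent share, the sum may
        # equal the total number of router handlers.
        if assigned > total_handlers:
            raise ValueError("configured handlers exceed the router handler count")
        return dict(configured)

    # Without an explicit concurrent share, the sum must be strictly smaller
    # so that the remainder can serve the concurrent calls.
    if assigned >= total_handlers:
        raise ValueError("no handlers left for concurrent calls")
    allocation = dict(configured)
    allocation[CONCURRENT] = total_handlers - assigned
    return allocation
```

For example, with 10 router handlers and `{"ns1": 3, "ns2": 4}` configured, the sketch assigns the remaining 3 handlers to the concurrent calls, while `{"ns1": 5, "ns2": 5}` is rejected because nothing is left over.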


was (Author: linyiqun):
Hi [~fengnanli], two nits for the latest patch:
{noformat}
smaller or equal to the total number of router handlers; if the special
      *concurrent* is not specified, the sum of all configured values must be
      strictly smaller than the router handlers thus the left will be allocated
      to the concurrent calls.
{noformat}
Can we mention the related setting here, e.g. "strictly smaller than the router 
handlers (dfs.federation.router.handler.count)..."?

Can you fix the related failing unit test?
|hadoop.hdfs.server.federation.router.TestRBFConfigFields|

Others look good to me.

> RBF: Improved isolation for downstream name nodes. {Static}
> -----------------------------------------------------------
>
>                 Key: HDFS-14090
>                 URL: https://issues.apache.org/jira/browse/HDFS-14090
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: CR Hota
>            Assignee: Fengnan Li
>            Priority: Major
>         Attachments: HDFS-14090-HDFS-13891.001.patch, 
> HDFS-14090-HDFS-13891.002.patch, HDFS-14090-HDFS-13891.003.patch, 
> HDFS-14090-HDFS-13891.004.patch, HDFS-14090-HDFS-13891.005.patch, 
> HDFS-14090.006.patch, HDFS-14090.007.patch, HDFS-14090.008.patch, 
> HDFS-14090.009.patch, HDFS-14090.010.patch, HDFS-14090.011.patch, 
> HDFS-14090.012.patch, HDFS-14090.013.patch, HDFS-14090.014.patch, 
> HDFS-14090.015.patch, HDFS-14090.016.patch, HDFS-14090.017.patch, 
> HDFS-14090.018.patch, HDFS-14090.019.patch, HDFS-14090.020.patch, 
> HDFS-14090.021.patch, HDFS-14090.022.patch, HDFS-14090.023.patch, 
> HDFS-14090.024.patch, RBF_ Isolation design.pdf
>
>
> Router is a gateway to underlying name nodes. Gateway architectures should 
> help isolate clients connecting to healthy clusters from the impact of 
> unhealthy clusters.
> For example, if there are 2 name nodes downstream and one of them is 
> heavily loaded with calls spiking RPC queue times, back pressure will make 
> the same start showing on the router. As a result, clients connecting to 
> healthy/faster name nodes will also slow down, because the same RPC queue 
> is maintained for all calls at the router layer. Essentially the same IPC 
> thread pool is used by the router to connect to all name nodes.
> Currently the router uses a single RPC queue for all calls. Let's discuss 
> how we can change the architecture and add some throttling logic for 
> unhealthy/slow/overloaded name nodes.
> One way could be to read from the current call queue, immediately identify 
> the downstream name node, and maintain a separate queue for each underlying 
> name node. Another, simpler way is to maintain some sort of rate limiter 
> configured for each name node and let routers drop, reject, or return an 
> error for requests after a certain threshold. 
> This won’t be a simple change, as the router’s ‘Server’ layer would need 
> redesign and implementation. Currently this layer is the same as the name 
> node's.
> Opening this ticket to discuss, design and implement this feature.
>  
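
The "rate limiter configured for each name node" option in the description above can be sketched roughly as a set of per-nameservice permits. This is only an illustration under stated assumptions, not the design in the attached patches: the class name `NameserviceThrottle`, its methods, and the nameservice ids are all hypothetical.

```python
import threading


class NameserviceThrottle:
    """Hypothetical per-nameservice permit pool (illustration only)."""

    def __init__(self, permits_per_ns):
        # One bounded semaphore per downstream nameservice; its size is the
        # number of in-flight calls that nameservice may occupy.
        self._permits = {
            ns: threading.BoundedSemaphore(count)
            for ns, count in permits_per_ns.items()
        }

    def try_acquire(self, ns):
        # Non-blocking: returns False when the nameservice is saturated, so
        # the caller can reject the request instead of queueing behind it.
        return self._permits[ns].acquire(blocking=False)

    def release(self, ns):
        # Hand the permit back once the downstream call has finished.
        self._permits[ns].release()
```

A caller would wrap each downstream call in `try_acquire` and `release`; a saturated name node then fails fast while calls to other name nodes keep their own permits.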



