[ https://issues.apache.org/jira/browse/HDFS-16646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Íñigo Goiri updated HDFS-16646:
-------------------------------
    Description:

### Description of PR

As we all know, `StaticRouterRpcFairnessPolicyController` is very helpful for RBF to minimize the impact of clients connecting to healthy vs. unhealthy NameNodes.

But in a prod environment, the client traffic to each NS and the pressure on the downstream NameNodes change dynamically. With only a single static permit configuration, RBF cannot adapt to these changes in traffic to achieve optimal results.

So here I propose an elastic `RouterRpcFairnessPolicyController` to help RBF adapt to traffic changes and achieve an optimal result.

The overall idea is:
- Each name service can be configured with exclusive permits, as in `StaticRouterRpcFairnessPolicyController`.
- TotalPermits is larger than sum(NsExclusivePermit), and the difference, TotalPermits - sum(NsExclusivePermit), is marked as SharedPermits.
- Each name service can preempt SharedPermits after its own exclusive permits are used up.
- But the number of SharedPermits preempted by each name service should be limited, for example to 20% of SharedPermits.

Suppose we have 200 handlers and 5 name services, and each name service is configured with different exclusive permits, like:

| NS1 | NS2 | NS3 | NS4 | NS5 | Concurrent NS |
| -- | -- | -- | -- | -- | -- |
| 9 | 11 | 8 | 12 | 10 | 50 |

The `sum(NsExclusivePermit)` is 100 (including the 50 permits of the concurrent NS), and `SharedPermits = TotalPermits(200) - sum(NsExclusivePermit)(100) = 100`.

Suppose we configure that each name service can preempt up to 20% of SharedPermits, marked as `elasticPercent`.

Then, from the point of view of a single NS, the permits it can use are as follows:
- Exclusive permits, which cannot be used by other name services.
- A limited number of SharedPermits. Whether it can actually use that many shared permits depends on how many SharedPermits remain, because the SharedPermits are preempted by all name services.

If we configure `elasticPercent=100`, one name service can use up all SharedPermits.
If we configure `elasticPercent=0`, a name service can only use its exclusive permits.

If we configure `elasticPercent=20`, RBF can tolerate 5 unhealthy name services at the same time, since each of them can hold at most 20% of SharedPermits.

In our prod environment, we use the following configuration, and it works well:
- RBF has 3000 handlers
- Each name service has 10 exclusive permits
- `elasticPercent` is 30%

Of course, the parameters need to be tuned according to the prod traffic.

> RBF: Support an elastic RouterRpcFairnessPolicyController
> ---------------------------------------------------------
>
>                 Key: HDFS-16646
>                 URL: https://issues.apache.org/jira/browse/HDFS-16646
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
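
The permit accounting in the description can be illustrated with a small model. This is only a sketch in Python, not the Java code of the PR: the class `ElasticPermitPool`, its methods, and the key `CONCURRENT_NS` are hypothetical names, and it assumes `elasticPercent` is applied to SharedPermits and rounded down.

```python
import threading


class ElasticPermitPool:
    """Toy model of exclusive plus shared permits (all names are hypothetical)."""

    def __init__(self, total_permits, exclusive_permits, elastic_percent):
        shared = total_permits - sum(exclusive_permits.values())
        if shared < 0:
            raise ValueError("exclusive permits exceed TotalPermits")
        if not 0 <= elastic_percent <= 100:
            raise ValueError("elastic_percent must be in [0, 100]")
        self._lock = threading.Lock()
        # Free exclusive permits per name service.
        self._exclusive_free = dict(exclusive_permits)
        # Free permits in the pool shared by all name services.
        self._shared_free = shared
        # Assumption: the per-name-service cap is a percentage of SharedPermits.
        self._shared_cap = shared * elastic_percent // 100
        # Shared permits currently held by each name service.
        self._shared_held = {ns: 0 for ns in exclusive_permits}

    def acquire(self, ns):
        """Return "exclusive", "shared", or None when no permit is available."""
        with self._lock:
            if self._exclusive_free[ns] > 0:
                self._exclusive_free[ns] -= 1
                return "exclusive"
            # Exclusive permits are used up: try to preempt a shared permit.
            if self._shared_free > 0 and self._shared_held[ns] < self._shared_cap:
                self._shared_free -= 1
                self._shared_held[ns] += 1
                return "shared"
            return None

    def release(self, ns, kind):
        """Give back a permit of the kind that acquire() returned."""
        with self._lock:
            if kind == "exclusive":
                self._exclusive_free[ns] += 1
            elif kind == "shared":
                self._shared_free += 1
                self._shared_held[ns] -= 1
            else:
                raise ValueError("unknown permit kind: %r" % (kind,))


# The 200-handler example from the description, with elasticPercent=20.
pool = ElasticPermitPool(
    total_permits=200,
    exclusive_permits={"NS1": 9, "NS2": 11, "NS3": 8, "NS4": 12, "NS5": 10,
                       "CONCURRENT_NS": 50},
    elastic_percent=20,
)
grants = [pool.acquire("NS1") for _ in range(30)]
print(grants.count("exclusive"), grants.count("shared"), grants.count(None))
# prints: 9 20 1
```

In this model NS1 gets its 9 exclusive permits plus at most 20 of the 100 shared permits, so the 30th request is rejected while 80 shared permits stay available for the other name services.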
--
This message was sent by Atlassian Jira
(v8.20.10#820010)