[ 
https://issues.apache.org/jira/browse/HDFS-16646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-16646:
-------------------------------
    Description: 
### Description of PR
As we all know, `StaticRouterRpcFairnessPolicyController` is very helpful 
for RBF in minimizing the impact of clients connecting to healthy vs. unhealthy 
NameNodes. 
But in a prod environment, the client traffic to each NS and the pressure on 
the downstream NameNodes change dynamically. So if we only have one static 
permit configuration, RBF cannot adapt to changes in traffic to achieve 
optimal results. 

So here I propose an elastic RouterRpcFairnessPolicyController to help RBF 
adapt to traffic changes and achieve an optimal result.

The overall idea is:
- Each name service can be configured with exclusive permits, as in 
`StaticRouterRpcFairnessPolicyController`.
- TotalPermits is greater than sum(NsExclusivePermit), and the difference, 
TotalPermits - sum(NsExclusivePermit), is marked as SharedPermits.
- Each name service can preempt SharedPermits after its own exclusive permits 
are used up.
- But the maximum number of SharedPermits preempted by each name service should 
be limited, for example to 20% of SharedPermits.
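
The idea in the list above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the controller from the actual patch: the class `ElasticPermitSketch`, the `Permit` enum, and the `acquire`/`release` methods are hypothetical names, and the per-name-service cap is assumed to be `elasticPercent` percent of SharedPermits, rounded down.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch of the elastic permit idea; all names are illustrative. */
final class ElasticPermitSketch {
  /** Which pool a granted permit came from, so it can be returned correctly. */
  enum Permit { EXCLUSIVE, SHARED, NONE }

  // Exclusive permits per name service.
  private final Map<String, Semaphore> exclusive = new HashMap<>();
  // Per-name-service quota limiting how many shared permits it may hold.
  private final Map<String, Semaphore> sharedQuota = new HashMap<>();
  // SharedPermits = TotalPermits - sum(NsExclusivePermit).
  private final Semaphore shared;

  ElasticPermitSketch(int totalPermits, Map<String, Integer> exclusivePermits,
      int elasticPercent) {
    int sum = 0;
    for (int permits : exclusivePermits.values()) {
      sum += permits;
    }
    int sharedPermits = totalPermits - sum;
    if (sharedPermits < 0) {
      throw new IllegalArgumentException("exclusive permits exceed total permits");
    }
    this.shared = new Semaphore(sharedPermits);
    // Assumption: the cap is a percentage of SharedPermits, rounded down.
    int cap = sharedPermits * elasticPercent / 100;
    for (Map.Entry<String, Integer> e : exclusivePermits.entrySet()) {
      exclusive.put(e.getKey(), new Semaphore(e.getValue()));
      sharedQuota.put(e.getKey(), new Semaphore(cap));
    }
  }

  /** Try the exclusive pool first, then the shared pool within the cap. */
  Permit acquire(String ns) {
    Semaphore own = exclusive.get(ns);
    if (own == null) {
      return Permit.NONE; // unknown name service
    }
    if (own.tryAcquire()) {
      return Permit.EXCLUSIVE;
    }
    Semaphore quota = sharedQuota.get(ns);
    if (quota.tryAcquire()) {
      if (shared.tryAcquire()) {
        return Permit.SHARED;
      }
      quota.release(); // shared pool is exhausted, give the quota slot back
    }
    return Permit.NONE;
  }

  /** Return a permit to the pool it was taken from. */
  void release(String ns, Permit permit) {
    if (permit == Permit.EXCLUSIVE) {
      exclusive.get(ns).release();
    } else if (permit == Permit.SHARED) {
      shared.release();
      sharedQuota.get(ns).release();
    }
  }

  public static void main(String[] args) {
    Map<String, Integer> exclusivePermits = new HashMap<>();
    exclusivePermits.put("ns1", 1);
    exclusivePermits.put("ns2", 1);
    // 12 total - 2 exclusive = 10 shared; a 20% cap gives 2 shared permits per name service.
    ElasticPermitSketch controller = new ElasticPermitSketch(12, exclusivePermits, 20);
    for (int i = 0; i < 4; i++) {
      System.out.println("ns1 request " + i + ": " + controller.acquire("ns1"));
    }
  }
}
```

With these toy numbers, the four requests for `ns1` get one exclusive permit, then two shared permits, and the fourth is rejected because `ns1` has reached its cap even though shared permits remain for other name services.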

Suppose we have 200 handlers and 5 name services, and each name service is 
configured with different exclusive permits, like:

| NS1 | NS2 | NS3 | NS4 | NS5 | Concurrent NS |
| -- | -- | -- | -- | -- | -- |
| 9 | 11 | 8 | 12 | 10 | 50 |

The `sum(NsExclusivePermit)` is 100 (the five name services plus the 
Concurrent NS pool), and `SharedPermits = TotalPermits(200) 
- sum(NsExclusivePermit)(100) = 100`.
Suppose we configure that each name service can preempt up to 20% of 
SharedPermits, marked as `elasticPercent`.
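
The arithmetic of this example can be checked with a few lines of Java. This is a sketch only: the class and method names are hypothetical, and it assumes the cap is taken as a percentage of SharedPermits (consistent with the overall idea above) and rounded down.

```java
/** Hypothetical helper that reproduces the numbers of the example. */
final class SharedPermitExample {
  /** SharedPermits = TotalPermits - sum(NsExclusivePermit). */
  static int sharedPermits(int totalPermits, int[] exclusivePermits) {
    int sum = 0;
    for (int permits : exclusivePermits) {
      sum += permits;
    }
    return totalPermits - sum;
  }

  /** Assumed cap: elasticPercent percent of SharedPermits, rounded down. */
  static int elasticCap(int sharedPermits, int elasticPercent) {
    return sharedPermits * elasticPercent / 100;
  }

  public static void main(String[] args) {
    // NS1..NS5 followed by the Concurrent NS pool, as in the table.
    int[] exclusive = {9, 11, 8, 12, 10, 50};
    int shared = sharedPermits(200, exclusive); // 200 - 100 = 100
    int cap = elasticCap(shared, 20);           // 20% of 100 = 20
    System.out.println("SharedPermits = " + shared);
    System.out.println("Max shared permits per name service = " + cap);
    System.out.println("Max permits NS1 can hold = " + (exclusive[0] + cap));
  }
}
```

Under these assumptions NS1 can hold at most 9 + 20 = 29 permits, and only if enough SharedPermits are still free.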

Then, from the point of view of a single NS, the permits it can use are as 
follows:
- Exclusive permits, which cannot be used by other name services.
- Limited SharedPermits. Whether it can actually use that many shared permits 
depends on the number of SharedPermits remaining, because SharedPermits can be 
preempted by all name services.

If we configure `elasticPercent=100`, one name service can use up all 
SharedPermits.
If we configure `elasticPercent=0`, a name service can only use its 
exclusive permits.
If we configure `elasticPercent=20`, five name services can each hold their 
maximum 20% share of SharedPermits at once, so RBF can tolerate 5 unhealthy 
name services at the same time.
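
To make the three settings concrete, the sketch below counts how many name services can sit at their shared-permit cap at the same time before SharedPermits runs out. The class and method names are hypothetical, and the cap is again assumed to be `elasticPercent` percent of SharedPermits, rounded down.

```java
/** Hypothetical helper illustrating the effect of different elasticPercent values. */
final class ElasticPercentEffect {
  /**
   * Number of name services that can simultaneously hold their full
   * shared-permit cap. Returns 0 when the cap is 0, since no name
   * service may then take any shared permit.
   */
  static int nameservicesAtFullCap(int sharedPermits, int elasticPercent) {
    int cap = sharedPermits * elasticPercent / 100;
    return cap == 0 ? 0 : sharedPermits / cap;
  }

  public static void main(String[] args) {
    int sharedPermits = 100; // value from the example above
    for (int percent : new int[] {0, 20, 100}) {
      System.out.println("elasticPercent=" + percent + " -> "
          + nameservicesAtFullCap(sharedPermits, percent)
          + " name service(s) can be at their cap at once");
    }
  }
}
```

A lower `elasticPercent` isolates name services from each other more strongly, while a higher value lets a busy name service absorb more of the spare capacity.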

In our prod environment, we configured it as follows, and it works well:
- RBF has 3000 handlers
- Each name service has 10 exclusive permits
- `elasticPercent` is 30%

Of course, the parameters need to be configured reasonably according to the 
prod traffic.


> RBF: Support an elastic RouterRpcFairnessPolicyController
> ---------------------------------------------------------
>
>                 Key: HDFS-16646
>                 URL: https://issues.apache.org/jira/browse/HDFS-16646
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h


