[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808453#comment-17808453 ]
ASF GitHub Bot commented on HDFS-17302:
---------------------------------------

KeeProMise commented on PR #6380:
URL: https://github.com/apache/hadoop/pull/6380#issuecomment-1899693424

> > @huangzhaobo99 do you still have concerns with the approach?
>
> @goiri No worries anymore. I think the sharing mechanism is really good, and percentage-based allocation is easier to use. cc @KeeProMise

@goiri @huangzhaobo99 Thanks for your review. If there are no more comments, please help merge it. Thanks! @goiri

> RBF: ProportionRouterRpcFairnessPolicyController-Sharing and isolation.
> -----------------------------------------------------------------------
>
> Key: HDFS-17302
> URL: https://issues.apache.org/jira/browse/HDFS-17302
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: rbf
> Reporter: Jian Zhang
> Assignee: Jian Zhang
> Priority: Major
> Labels: pull-request-available
> Attachments: HDFS-17302.001.patch, HDFS-17302.002.patch, HDFS-17302.003.patch
>
>
> h2. Current shortcomings
> [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a
> StaticRouterRpcFairnessPolicyController that supports configuring a different
> number of handlers for each ns. With the StaticRouterRpcFairnessPolicyController,
> the router isolates nameservices from one another, so a heavily loaded ns does
> not affect the router's access to an ns with a normal load. However, the
> StaticRouterRpcFairnessPolicyController still falls short in several ways:
> 1. *Configuration is inconvenient and error-prone*: To use the
> StaticRouterRpcFairnessPolicyController, I first need to know how many
> handlers the router has in total and how many nameservices it currently
> serves. I then have to calculate carefully how many handlers to allocate to
> each ns so that their sum does not exceed the router's total, while also
> deciding how many handlers each ns needs to perform well.
> Therefore, I have to be very careful when configuring it. If I allocate even
> one handler too many to a single ns, the total exceeds the number of handlers
> the router owns and the router fails to start. I then have to investigate why
> the router failed to start and, once I find the reason, rework the number of
> handlers for each ns. In addition, whenever I change the total number of
> handlers on the router, I have to re-allocate handlers to every ns, which
> increases the operation and maintenance burden.
> 2. *Adding a new ns is not supported*: If a new ns is added to the cluster
> and a mount point is created for it while the router is running, the ns
> cannot be accessed through the router because no handlers are allocated to
> it. We must reconfigure the handler counts and refresh the configuration
> before the router can access the ns, and reconfiguring the handler counts
> brings us back to shortcoming 1: configuration is inconvenient and
> error-prone.
> 3. *Wasted handlers*: The main purpose of RouterRpcFairnessPolicyController
> is to let the router access nameservices with a normal load without being
> affected by nameservices with a high load. However, not every ns has a high
> load, and a heavily loaded ns is not busy 24 hours a day; it may be busy only
> during certain periods, such as 0:00 to 8:00, and normally loaded the rest of
> the time. Assume there are 2 ns, and each ns is allocated half of the
> router's handlers.
> Assume that ns1 receives many requests from 0:00 to 14:00 and almost none
> from 14:00 to 24:00, while ns2 receives many requests from 12:00 to 24:00 and
> almost none from 0:00 to 12:00. Between 0:00 and 12:00 and between 14:00 and
> 24:00, only one ns is busy while the other receives almost no requests, so
> half of the router's handlers are wasted during those periods.
> 4. *Only isolation, no sharing*: The StaticRouterRpcFairnessPolicyController
> supports isolation but not sharing. Isolation is a means of improving the
> performance of router access to normally loaded nameservices, not an end in
> itself. It is unrealistic for every ns in the cluster to be heavily loaded; in
> most scenarios, only a few nameservices have high loads and the rest are
> normal. Handlers must be isolated between heavily loaded and normally loaded
> nameservices so that the busy ones do not degrade the others. However,
> nameservices with the same kind of load (all normal, or all high) do not need
> to be isolated from each other and can share the router's handlers. This
> performs better than assigning a fixed number of handlers to each ns, because
> each ns can use any of the router's handlers.
> h2. New features
> Given these usability and performance shortcomings of the
> StaticRouterRpcFairnessPolicyController, I propose a new
> RouterRpcFairnessPolicyController,
> ProportionRouterRpcFairnessPolicyController (a better name is welcome), that
> addresses them:
> 1. *More user-friendly configuration*: Handlers are allocated to each ns as
> a proportion of the total. For example, if ns1 is given a handler ratio of
> 0.2, ns1 can use at most 20% of the router's handlers. With this approach, we
> do not need to know in advance how many handlers the router has.
> 2. *Sharing and isolation*: Sharing is as important as isolation. The sum of
> the handlers allocated to all nameservices may exceed the router's total
> number of handlers. For example, with 10 handlers and 3 ns, we can allocate
> 5 (0.5) handlers to ns1, 5 (0.5) to ns2, and 5 (0.5) to ns3. This feature is
> very important. Consider the following scenarios:
> - Only one ns is busy during any given period: Assume that ns1 receives more
> requests from 0:00 to 8:00, ns2 from 8:00 to 16:00, and ns3 from 16:00 to
> 24:00. In every period, the busy ns uses at most half of the handlers, and
> the other two normally loaded nameservices share the remaining half. Isolation
> is still preserved, and compared with the
> StaticRouterRpcFairnessPolicyController, more handlers are available for both
> the busy and the normal nameservices: with the
> StaticRouterRpcFairnessPolicyController, each ns gets 3 handlers (ns1:3,
> ns2:3, ns3:3), whereas now each ns can use 5.
> - Only ns1 is busy: Assume that ns1 is busy at all times while requests to
> ns2 and ns3 are normal (they are few and fast because the downstream
> NameNodes are under no pressure). We can give ns1 5 (0.5) handlers and give
> ns2 and ns3 10 (1) handlers each. Because ns2 and ns3 receive few requests
> and those requests are processed quickly, they have little impact on ns1's
> performance, and because ns1 is limited to at most half of the handlers,
> isolation is still preserved.
> 3. *Transparent extension*: Adding a new ns does not require refreshing the
> configuration. If no handlers are explicitly assigned to an ns, it receives a
> default proportion of the handlers.
> 4. *Fully compatible*: The new RouterRpcFairnessPolicyController provides
> everything the StaticRouterRpcFairnessPolicyController does.
> If we want isolation only, without sharing, we can allocate 0.3 to ns2, 0.3
> to ns3, and 0.4 to ns1. This is still more convenient than the original
> StaticRouterRpcFairnessPolicyController, because we do not need to know how
> many handlers the router has in total.
> Therefore, the new RouterRpcFairnessPolicyController is more flexible,
> performs better, and is better suited to real production environments.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
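The configuration constraint from shortcoming 1 and the reachability gap from shortcoming 2 can be modeled in a few lines. This is a hypothetical model of the behavior the description reports, not the StaticRouterRpcFairnessPolicyController's code: `validate_static_allocation` and `dedicated_handlers` are illustrative names, the per-ns table is a plain dict rather than Hadoop configuration keys, and Python is used only for brevity (the Hadoop controllers are Java).

```python
from typing import Dict


def validate_static_allocation(total_handlers: int, per_ns: Dict[str, int]) -> None:
    """Hypothetical check: every ns gets a fixed, dedicated handler count,
    and the counts must fit inside the router's total."""
    allocated = sum(per_ns.values())
    if allocated > total_handlers:
        # Models the failure described in the issue: one handler too many
        # and the router refuses to start.
        raise ValueError(
            f"{allocated} handlers allocated, but the router only has "
            f"{total_handlers}"
        )


def dedicated_handlers(ns: str, per_ns: Dict[str, int]) -> int:
    """Handlers reserved for `ns`. A nameservice missing from the table gets
    none and stays unreachable until the table is edited and refreshed."""
    return per_ns.get(ns, 0)
```

With 10 router handlers, `{"ns1": 5, "ns2": 5}` passes, `{"ns1": 6, "ns2": 5}` is rejected, and a newly mounted `ns3` gets 0 handlers until someone redoes the arithmetic for every ns.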
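Proportional allocation, including the default share for unconfigured nameservices and the difference between sharing and pure isolation, can be sketched as follows. The sketch assumes each ratio caps an ns at floor(total × ratio) handlers; the function names, the `default_ratio` parameter, the float epsilon, and the minimum of one handler per ns are illustrative choices, not details taken from the patch.

```python
import math
from typing import Dict, Iterable


def proportional_handlers(total_handlers: int, ratios: Dict[str, float],
                          nameservices: Iterable[str],
                          default_ratio: float) -> Dict[str, int]:
    """Hypothetical sketch: turn per-ns ratios into per-ns handler caps."""
    caps = {}
    for ns in nameservices:
        # An ns without an explicit ratio falls back to default_ratio, so a
        # newly added ns is reachable without a configuration refresh.
        ratio = ratios.get(ns, default_ratio)
        if not 0.0 < ratio <= 1.0:
            raise ValueError(f"ratio for {ns} must be in (0, 1], got {ratio}")
        # The epsilon absorbs float error in total * ratio; the floor of 1
        # (an assumption) keeps every ns reachable even with a tiny ratio.
        caps[ns] = max(1, math.floor(total_handlers * ratio + 1e-9))
    return caps


def shares_handlers(caps: Dict[str, int], total_handlers: int) -> bool:
    """Caps summing past the total mean some handlers are shared; a sum at
    or below the total is pure isolation, as with the static controller."""
    return sum(caps.values()) > total_handlers
```

Applied to the examples in the issue: 10 handlers with 0.5 for each of 3 ns gives caps of 5/5/5 (sum 15, so shared), versus 10 // 3 = 3 each under a static split; 0.4/0.3/0.3 gives 4/3/3 (pure isolation); and growing the router to 20 handlers doubles every cap without touching the ratios.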
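At runtime, sharing and isolation interact through two limits: the router's handler pool and each nameservice's cap. The hypothetical model below (class and method names are illustrative, not Hadoop's API) gives every in-flight request one router handler plus one permit from its nameservice's pool, using non-blocking `threading.Semaphore.acquire`; a real controller may wait with a timeout instead of failing immediately.

```python
import threading
from typing import Dict


class PermitGateSketch:
    """Hypothetical model: an in-flight request holds one router handler
    (global pool) plus one permit from its nameservice's pool."""

    def __init__(self, total_handlers: int, caps: Dict[str, int]) -> None:
        self._handlers = threading.Semaphore(total_handlers)
        self._ns_permits = {ns: threading.Semaphore(n) for ns, n in caps.items()}

    def try_acquire(self, ns: str) -> bool:
        # A request first needs a free router handler...
        if not self._handlers.acquire(blocking=False):
            return False
        # ...then a permit for its nameservice. A saturated ns hands the
        # router handler back at once instead of tying it up (isolation).
        if not self._ns_permits[ns].acquire(blocking=False):
            self._handlers.release()
            return False
        return True

    def release(self, ns: str) -> None:
        # Return both the ns permit and the router handler.
        self._ns_permits[ns].release()
        self._handlers.release()
```

With the "only ns1 is busy" configuration (10 handlers; caps of 5, 10, and 10), ns1 can never hold more than 5 handlers, so it cannot starve ns2 and ns3 of the other 5, which those two nameservices share freely.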