At 2018-05-11 21:23:55, "Willem de Bruijn" <willemdebruijn.ker...@gmail.com> wrote:
>On Fri, May 11, 2018 at 2:20 AM, Gao Feng <gfree.w...@vip.163.com> wrote:
>> At 2018-05-11 11:54:55, "Willem de Bruijn" <willemdebruijn.ker...@gmail.com>
>> wrote:
>>>On Thu, May 10, 2018 at 4:28 AM, <gfree.w...@vip.163.com> wrote:
>>>> From: Gao Feng <gfree.w...@vip.163.com>
>>>>
>>>> The skb flow limit is implemented for each CPU independently. In the
>>>> current code, the function skb_flow_limit gets the softnet_data by
>>>> this_cpu_ptr. But the target cpu of enqueue_to_backlog would not be
>>>> the current cpu when RPS is enabled. As a result, skb_flow_limit checks
>>>> the stats of the current CPU, while the skb is going to be appended to
>>>> the queue of another CPU. That isn't the expected behavior.
>>>>
>>>> Now pass the softnet_data as a param to skb_flow_limit to make this
>>>> consistent.
>>>
>>>The local cpu softnet_data is used on purpose. The operations in
>>>skb_flow_limit() on sd fields could race if not executed on the local cpu.
>>
>> I think the race doesn't exist because of the rps_lock:
>> enqueue_to_backlog has taken the rps_lock before calling skb_flow_limit.
>
>Indeed, I overlooked that. There still is the matter of cache contention.
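
To keep the code path we are discussing concrete, here is an abridged
sketch of the caller side as I read net/core/dev.c (error handling and
unrelated checks elided; the exact code may differ between kernel
versions, so treat this as a sketch rather than the real source):

static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
                              unsigned int *qtail)
{
        struct softnet_data *sd;
        unsigned long flags;
        unsigned int qlen;

        /* softnet_data of the target cpu chosen by RPS/RFS, which is
         * not necessarily the cpu executing this code
         */
        sd = &per_cpu(softnet_data, cpu);

        local_irq_save(flags);
        rps_lock(sd);

        /* qlen is the backlog length of the target cpu ... */
        qlen = skb_queue_len(&sd->input_pkt_queue);

        /* ... while skb_flow_limit() looks up its flow table through
         * this_cpu_ptr(&softnet_data), i.e. on the local (IRQ) cpu
         */
        if (qlen <= netdev_max_backlog && !skb_flow_limit(skb, qlen)) {
                /* enqueue skb to sd->input_pkt_queue, unlock, return */
        }

        /* otherwise: drop path, unlock, return NET_RX_DROP */
}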
Is the cache contention really important in this case? I don't think so,
because enqueue_to_backlog has already touched and modified the
softnet_data of the target cpu.

>
>>>Flow limit tries to detect large ("elephant") DoS flows with a fixed
>>>four-tuple.
>>>These would always hit the same RPS cpu, so that cpu being backlogged
>>
>> They may hit different target CPUs when RFS is enabled, because the app
>> could be scheduled to another CPU; RFS then tries to deliver the skb to
>> the latest core, which has a hot cache.
>
>This even more suggests using the initial (or IRQ) cpu to track state,
>instead of the destination (RPS/RFS) cpu.

I don't understand why it is better to track state on the initial cpu
rather than the target cpu. The latter would give a more accurate result.

>
>>>may be an indication that such a flow is active. But the flow will also
>>>always arrive on the same initial cpu courtesy of RSS. So storing the
>>>lookup table
>>
>> RSS can't guarantee that the irq is handled by the same cpu. It may be
>> balanced between the cpus.
>
>IRQs are usually pinned to cores. Unless using something like irqbalance,
>but that operates at too coarse a timescale to do anything useful at Mpps
>packet rates.

There are some motherboards where the irq cannot be guaranteed to stay
pinned. On those, the flow_limit wouldn't work as well as expected.

>
>>>on the initial CPU is also fine. There may be false positives on other CPUs
>>>with the same RPS destination, but that is unlikely with a highly concurrent
>>>traffic server mix ("mice").
>>
>> If my comment is right, the flow doesn't always arrive on the same
>> initial cpu, although it may be sent to the same target cpu.
>>
>>>
>>>Note that the sysctl net.core.flow_limit_cpu_bitmap enables the feature
>>>for the cpus on which traffic initially lands, not the RPS destination cpus.
>>>See also Documentation/networking/scaling.txt
>>>
>>>That said, I had to reread the code, as it does seem sensible that the
>>>same softnet_data is intended to be used both when testing qlen and
>>>flow_limit.
>>
>> In most cases, the user configures the same bitmap for the RPS map and
>> flow_limit, e.g. 0xff, because the user can't predict which core the
>> evil flow will arrive on.
>>
>> Take an example: there are 2 cores, cpu0 and cpu1. One evil flow has its
>> irq sent to cpu0, but after RPS/RFS the target cpu is cpu1. Now cpu0
>> invokes enqueue_to_backlog, and skb_flow_limit checks the queue length
>> of cpu0. It would certainly pass the skb_flow_limit check, because there
>> is no evil flow on cpu0.
>
>No, enqueue_to_backlog passes qlen to skb_flow_limit, so that does
>check the queue length of the RPS cpu.

Sorry, I overlooked that the qlen is the queue length of the rps cpu.
Then it's OK, except that the stats may not be accurate when the irq
isn't pinned. But I still doubt whether it is really important to track
state on the initial cpu rather than the target cpu (see the abridged
skb_flow_limit() excerpt at the end of this mail), because
enqueue_to_backlog has already touched the softnet_data of the target cpu.

Best Regards
Feng

>
>> Then cpu0 inserts the skb into the queue of cpu1.
>> As a result, skb_flow_limit doesn't work as expected.
>>
>> BTW, I have already sent the v2 patch which only adds the "Fixes:" tag.
>
>The change also makes the code inconsistent with
>Documentation/networking/scaling.txt:
>
>"In such environments, enable the feature on all CPUs that handle
>network rx interrupts (as set in /proc/irq/N/smp_affinity)."
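
P.S. The per-cpu state we keep coming back to is the flow history/bucket
table inside skb_flow_limit(). Below is an abridged sketch, again from my
reading of net/core/dev.c, with the CONFIG_NET_FLOW_LIMIT guard and the
RCU read locking elided; the exact code may differ between kernel versions:

static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen)
{
        struct sd_flow_limit *fl;
        struct softnet_data *sd;
        unsigned int old_flow, new_flow;

        /* only start accounting once the backlog is half full */
        if (qlen < (netdev_max_backlog >> 1))
                return false;

        /* flow state of the local (IRQ) cpu, not of the RPS target cpu */
        sd = this_cpu_ptr(&softnet_data);

        fl = rcu_dereference(sd->flow_limit);
        if (fl) {
                new_flow = skb_get_hash(skb) & (fl->num_buckets - 1);

                /* age out the oldest history entry, record the new one */
                old_flow = fl->history[fl->history_head];
                fl->history[fl->history_head] = new_flow;
                fl->history_head = (fl->history_head + 1) &
                                   (FLOW_LIMIT_HISTORY - 1);
                if (fl->buckets[old_flow])
                        fl->buckets[old_flow]--;

                /* one flow owns more than half of the recent history:
                 * treat it as an elephant flow and ask the caller to drop
                 */
                if (fl->buckets[new_flow]++ > (FLOW_LIMIT_HISTORY >> 1)) {
                        fl->count++;
                        return true;
                }
        }
        return false;
}

Whether that this_cpu_ptr() lookup should stay on the IRQ cpu, as
Documentation/networking/scaling.txt assumes, or follow the RPS/RFS
target cpu is exactly the question being debated above.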