Re: PREEMPT_RT and I-PIPE: the numbers, part 4
Ingo Molnar wrote:
> So why do your "ping flood" results show such difference? It really is
> just another type of interrupt workload and has nothing special in it.
...
> are you suggesting this is not really a benchmark but a way to test how
> well a particular system withholds against extreme external load?

Look, you're basically splitting hairs. No matter how involved an
explanation you can provide, it remains that both vanilla and I-pipe were
subject to the same load. If PREEMPT_RT consistently shows the same
degradation under the same setup, and that is indeed the case, then the
problem is with PREEMPT_RT, not the tests.

> so you can see ping packet flow fluctuations in your tests? Then you
> cannot use those results as any sort of benchmark metric.

I didn't say this. I said that if there is fluctuation, then maybe this is
something we want to see the effect of. In real world applications,
interrupts may not come in at a steady pace, as you try to achieve in your
own tests.

> and from this point on you should see zero lmbench overhead from flood
> pinging. Can vanilla or I-PIPE do that?

Let's not get into what I-pipe can or cannot do, that's not what these
numbers are about.

It's pretty darn amazing that we're even having this conversation. The
PREEMPT_RT stuff is being worked on by more than a dozen developers spread
across some of the most well-known Linux companies out there (Red Hat,
MontaVista, IBM, TimeSys, etc.). Yet, despite this massive involvement,
here we have a patch developed by a single guy, Philippe, who's doing this
work outside his regular work hours, and his patch, which does provide
guaranteed deterministic behavior, is:
a) Much smaller than PREEMPT_RT
b) Less intrusive than PREEMPT_RT
c) Performs very well, as-good-as if not sometimes even better than
   PREEMPT_RT

Splitting hairs won't erase this reality.
And again, before I get the PREEMPT_RT mob on my back again, this is just
for the sake of argument; both approaches remain valid, and are not
mutually exclusive. Like I said before, others are free to publish their
own numbers showing differently from what we've found.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: PREEMPT_RT and I-PIPE: the numbers, part 4
* Karim Yaghmour <[EMAIL PROTECTED]> wrote:
> With ping floods, as with other things, there is room for improvement,
> but keep in mind that these are standard tests [...]

the problem is that ping -f isn't what it used to be. If you are using a
recent distribution with an updated ping utility, these days the
equivalent of 'ping -f' is something like:

  ping -q -l 500 -A -s 10

and even this variant (and the old variant) needs to be carefully
validated for the actual workload generated. Note that this is true for
workloads against vanilla kernels too.

(Also note that i did not claim that the flood ping workload you used is
invalid - you have not published packet rates or interrupt rates that
could help us judge how constant the workload was. I only said that
according to my measurements it's quite unstable, and that you should
double-check it. Just running it and ACK-ing that the packet rates are
stable and identical amongst all of these kernels would be enough to put
this concern to rest.)

to see why i think there might be something wrong with the measurement,
just look at the raw numbers:

LMbench running times:
+--------------------+-------+-------+-------+-------+-------+
| Kernel             | plain | IRQ   | ping  | IRQ & | IRQ & |
|                    |       | test  | flood | ping  | hd    |
+====================+=======+=======+=======+=======+=======+
| Vanilla-2.6.12     | 152 s | 150 s | 188 s | 185 s | 239 s |
+====================+=======+=======+=======+=======+=======+
| with RT-V0.7.51-02 | 152 s | 153 s | 203 s | 201 s | 239 s |
+====================+=======+=======+=======+=======+=======+

note that both the 'IRQ' and 'IRQ & hd' tests involve interrupts, and
PREEMPT_RT shows overhead within statistical error, but only the 'flood
ping' workload created a ~8% slowdown.

my own testing (whatever it's worth) shows that during flood-pings, the
maximum overhead PREEMPT_RT caused was 4%. I.e. PREEMPT_RT used 4% more
system-time than the vanilla UP kernel when the CPU was 99% dedicated to
handling ping replies. But in your tests not the full CPU was dedicated
to flood ping replies (of course).
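The stability check described above can be made mechanical. The following
is a hedged sketch (not from the thread) of how one might judge whether a
series of per-second packet-rate samples is steady enough to compare
across kernels, using the coefficient of variation; the 5% threshold and
the sample values are illustrative assumptions, not measured data:

```python
# Toy stability check for a flood-ping workload (illustrative only):
# given per-second packet-rate samples, compute the mean and the
# coefficient of variation; a high CV means the workload fluctuated
# too much to serve as a comparable benchmark load.
from statistics import mean, pstdev

def rate_stability(samples):
    """Return (mean_pps, coefficient_of_variation) for a list of
    packets-per-second samples."""
    m = mean(samples)
    cv = pstdev(samples) / m if m else float("inf")
    return m, cv

def is_stable(samples, max_cv=0.05):
    """Accept the workload only if rates stay within ~5% of the mean."""
    return rate_stability(samples)[1] <= max_cv

# Example with invented sample values:
steady = [10000, 10100, 9900, 10050, 9950]
bursty = [10000, 4000, 16000, 2000, 18000]
print(is_stable(steady), is_stable(bursty))   # True False
```

If the CV differs significantly between the kernels under test, the
workloads are not comparable and the lmbench deltas cannot be attributed
to kernel overhead alone.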
Your above numbers suggest that under the vanilla kernel 23% of CPU time
was used up by flood pinging. (188/152 == +23.6%) Under PREEMPT_RT, my
tentative guesstimation would be that it should go from 23.6% to 24.8% -
i.e. 1.2% less CPU time for lmbench - which turns into roughly +1 second
of lmbench wall-clock slowdown. Not 15 seconds, like your test suggests.
So there's more than an order of magnitude difference in the numbers,
which i felt worth sharing :)

_And_ your own hd and stable-rate irq workloads suggest that PREEMPT_RT
and vanilla are very close to each other. Let me repeat the table, with
only the numbers included where there was no flood pinging going on:

LMbench running times:
+--------------------+-------+-------+-------+-------+-------+
| Kernel             | plain | IRQ   |       |       | IRQ & |
|                    |       | test  |       |       | hd    |
+====================+=======+=======+=======+=======+=======+
| Vanilla-2.6.12     | 152 s | 150 s |       |       | 239 s |
+====================+=======+=======+=======+=======+=======+
| with RT-V0.7.51-02 | 152 s | 153 s |       |       | 239 s |
+====================+=======+=======+=======+=======+=======+
| with Ipipe-0.7     | 149 s | 150 s |       |       | 236 s |
+====================+=======+=======+=======+=======+=======+

these numbers suggest that outside of ping-flooding all IRQ overhead
results are within statistical error. So why do your "ping flood" results
show such difference? It really is just another type of interrupt
workload and has nothing special in it.

> but keep in mind that these are standard tests used as-is by others
> [...]

are you suggesting this is not really a benchmark but a way to test how
well a particular system withholds against extreme external load?

> For one thing, the heavy fluctuation in ping packets may actually
> induce a state in the monitored kernel which is more akin to the one
> we want to measure than if we had a steady flow of packets.

so you can see ping packet flow fluctuations in your tests? Then you
cannot use those results as any sort of benchmark metric.
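The guesstimate above can be reproduced step by step. This sketch redoes
the arithmetic with the numbers from the table and the 4% overhead figure
from the earlier flood-ping measurement (note the exact rounding comes
out at 23.7% rather than the quoted 23.6%):

```python
# Reproduce the back-of-the-envelope estimate: vanilla lmbench goes
# from 152s (plain) to 188s under flood ping, i.e. ping handling
# consumed the extra wall-clock time.
plain, ping = 152.0, 188.0
extra = ping - plain                      # 36 s spent servicing pings
overhead_pct = extra / plain * 100        # ~23.7% more wall-clock time

# If PREEMPT_RT needs ~4% more system time for the same ping handling,
# the expected extra lmbench slowdown is only the 4% delta on those 36s:
rt_extra = extra * 1.04
expected_rt_slowdown = rt_extra - extra   # ~1.4 s, not the measured 15 s
print(round(overhead_pct, 1), round(expected_rt_slowdown, 1))
```

The order-of-magnitude gap between the predicted ~1 second and the
measured ~15 seconds is exactly what motivates double-checking the
workload's stability.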
under PREEMPT_RT, if you wish to tone down the effects of an interrupt
source then all you have to do is something like:

  P=$(pidof "IRQ "$(grep eth1 /proc/interrupts | cut -d: -f1 | xargs echo))
  chrt -o -p 0 $P   # net irq thread
  renice -n 19 $P
  chrt -o -p 0 5    # softirq-tx
  renice -n 19 5
  chrt -o -p 0 6    # softirq-rx
  renice -n 19 6

and from this point on you should see zero lmbench overhead from flood
pinging. Can vanilla or I-PIPE do that?

Ingo
Re: PREEMPT_RT and I-PIPE: the numbers, part 4
* Kristian Benoit <[EMAIL PROTECTED]> wrote: [...]
> "plain" run:
>
> Measurements | Vanilla | preempt_rt    | ipipe
> -------------+---------+---------------+-------------
> fork         |   97us  |  91us  (-6%)  | 101us (+4%)
> mmap         |  776us  | 629us (-19%)  | 794us (+2%)

some of you have wondered how it's possible that the PREEMPT_RT kernel is
_faster_ than the vanilla kernel in these two metrics. I've done some
more profiling, and one reason is kmap_atomic(). As i pointed out in an
earlier mail, in your tests you not only had HIGHMEM64 enabled, but also
HIGHPTE, which is a heavy kmap_atomic() user. [and which is an option
meant for systems with 8GB or more RAM, not the typical embedded target.]

kmap_atomic() is a pretty preemption-unfriendly per-CPU construct, which
under PREEMPT_RT had to be changed and was mapped into kmap(). The
performance advantage comes from the caching built into kmap() and not
having to do per-page invlpg calls. (which can be pretty slow, especially
on highmem64)

The 'mapping kmap_atomic into kmap' technique is perfectly fine under
PREEMPT_RT because all kernel code is preemptible, but it's not really
possible in the vanilla kernel due to the fundamental non-preemptability
of interrupts, the preempt-off-ness of the mmu_gather mechanism, the
atomicity of the ->page_table_lock spinlock, etc.

so this is a case of 'fully preemptible beats non-preemptible due to
flexibility', but it should be more of an exception than the rule,
because generally the fully preemptible kernel tries to be 1:1 identical
to the vanilla kernel. But it's an interesting phenomenon from a
conceptual angle nevertheless.

Ingo
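The caching argument above can be illustrated outside the kernel. This is
a toy model, not kernel code: it only counts "expensive" invalidation
events, where the kmap_atomic-style path pays the cost on every access
and the kmap-style path pays it only on cache misses. All names and the
workload are invented for illustration:

```python
# Toy model of why kmap()'s caching can beat kmap_atomic()'s
# map/unmap-per-use pattern: count TLB-flush-like events per access
# pattern. Purely illustrative; no real mapping is performed.
class MappingCounter:
    def __init__(self):
        self.flushes = 0

    def atomic_style(self, pages):
        # kmap_atomic-style: every access maps and unmaps, paying the
        # per-page invalidation cost each time.
        for _ in pages:
            self.flushes += 1

    def cached_style(self, pages, cache_size=8):
        # kmap-style: keep recently mapped pages around; only a cache
        # miss pays the invalidation cost.
        cache = []
        for p in pages:
            if p not in cache:
                self.flushes += 1
                cache.append(p)
                if len(cache) > cache_size:
                    cache.pop(0)

# A workload that touches a small set of pages repeatedly, as HIGHPTE
# page-table access tends to do:
pages = [1, 2, 3, 4] * 100
a, c = MappingCounter(), MappingCounter()
a.atomic_style(pages)
c.cached_style(pages)
print(a.flushes, c.flushes)   # 400 4
```

The win depends entirely on locality: a workload that never revisits a
page would see no benefit from the cached path, which matches the "more
of an exception than the rule" caveat above.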
Re: PREEMPT_RT and I-PIPE: the numbers, part 4
On Sat, Jul 09, 2005 at 10:22:07AM -0700, Daniel Walker wrote:
> PREEMPT_RT is not pre-tuned for every situation, but the best
> performance is achieved when the system is tuned. If any of these tests
> rely on a low priority thread, then we just raise the priority and you
> have better performance.

Just think about it. Throttling those threads via the scheduler throttles
the system in super controllable ways. This is very cool stuff. :)

bill
Re: PREEMPT_RT and I-PIPE: the numbers, part 4
On Sat, 2005-07-09 at 09:19 +0200, Ingo Molnar wrote:
> (if your goal was to check how heavily external interrupts can influence
> a PREEMPT_RT box, you should chrt the network IRQ thread to SCHED_OTHER
> and renice it and softirq-net-rx and softirq-net-tx to nice +19.)

This is interesting. I wonder how much tuning like this, just changing
thread priorities, would affect the results of these tests. PREEMPT_RT is
not pre-tuned for every situation, but the best performance is achieved
when the system is tuned. If any of these tests rely on a low priority
thread, then we just raise the priority and you have better performance.
These other systems, like vanilla 2.6.x and I-pipe, aren't massively
tunable like PREEMPT_RT.

Daniel
Re: PREEMPT_RT and I-PIPE: the numbers, part 4
Can't type right anymore ...

Karim Yaghmour wrote:
> BTW, we've also released the latest very of the LRTBF we used to
version

Karim
Re: PREEMPT_RT and I-PIPE: the numbers, part 4
Karim Yaghmour wrote:
> I would usually like very much to entertain this further, but we've
> really busted all the time slots I had allocated to this work. So at
> this time, we really think others should start publishing results.
> After all, our results are no more authoritative than those
> published by others.

BTW, we've also released the latest very of the LRTBF we used to publish
these latest results, so others can give it a try too :)

Karim
Re: PREEMPT_RT and I-PIPE: the numbers, part 4
Ingo Molnar wrote:
> yeah, they definitely have helped, and thanks for this round of testing
> too! I'll explain the recent changes to PREEMPT_RT that resulted in
> these speedups in another mail.

Great, I'm very much looking forward to it.

> Looking at your numbers i realized that the area where PREEMPT_RT is
> still somewhat behind (the flood ping +~10% overhead), you might be
> using an invalid test methodology:

I've got to smile reading this :) If one thing became clear out of these
threads, it is that no matter how careful we are with our testing, there
is always something that can be criticized about it. Take the highmem
thing, for example. I never really bought the argument that highmem was
the root of all evil ;) , and the last comparison we did between 50-35
and 51-02 with and without highmem clearly showed that while highmem is
indeed a factor, there are inherent problems elsewhere that the disabling
of highmem doesn't erase. Also, both vanilla and I-pipe were run with
highmem, and if they don't suffer from it, then the problem is/was with
PREEMPT_RT.

With ping floods, as with other things, there is room for improvement,
but keep in mind that these are standard tests used as-is by others to
make measurements, that each run is made 5 times, and that the values in
those tables represent the average of 5 runs. So while they may not be as
exact as could be, I don't see why they couldn't be interpreted as giving
us a "good idea" of what's happening. For one thing, the heavy
fluctuation in ping packets may actually induce a state in the monitored
kernel which is more akin to the one we want to measure than if we had a
steady flow of packets.

I would usually like very much to entertain this further, but we've
really busted all the time slots I had allocated to this work. So at this
time, we really think others should start publishing results. After all,
our results are no more authoritative than those published by others.
Karim
Re: PREEMPT_RT and I-PIPE: the numbers, part 4
* Paul Rolland <[EMAIL PROTECTED]> wrote:
> > "IRQ & hd" run:
> > Measurements | Vanilla | preempt_rt    | ipipe
> > -------------+---------+---------------+-------------
> > fork         |  101us  |  94us  (-7%)  | 103us (+2%)
> > open/close   |  2.9us  | 2.9us   (~)   | 3.0us (+3%)
> > execve       |  366us  | 370us  (+1%)  | 372us (+2%)
> > select 500fd | 14.3us  | 18.1us (+27%) | 14.5us (+1%)
> > mmap         |  794us  | 654us (+18%)  | 822us (+4%)
>
> You mean -18%, not +18% I think.
>
> Just having a quick look at the numbers, it seems that now the "weak"
> part in PREEMPT_RT is the select 500fd test.
>
> Ingo, any idea about this one ?

yeah. In the '500 fds select' benchmark workload, do_select() does an
extremely tight loop over a 500-entry table that does an fget(). fget()
acquires/releases current->files->file_lock. So we get 1000 lock and
unlock operations in this workload. It cannot be for free. In fact, look
at how the various vanilla kernels compare:

AVG        v2.6.12    v2.6.12-PREEMPT    v2.6.12-SMP
----------------------------------------------------
select:    11.48      12.35 (  7%)       26.40 (129%)

(tested on one of my single-processor testsystems.)

I.e. SMP locking is already 129% overhead, and CONFIG_PREEMPT (which just
bumps the preempt count twice(!)) has 7% overhead. In that sense, the 27%
select-500-fds overhead measured for PREEMPT_RT is more than acceptable.

anyway, these days apps that do select() over 500 fds are expected to
perform badly no matter what locking method is used. [To fix this
particular overhead we could take the current->files->file_lock outside
of the loop and do a get_file() within do_select(). This would improve
SMP too. But i doubt anyone cares.]

Ingo
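The per-fd cost described above is visible even from userspace. This is a
hedged sketch (it measures select()'s per-descriptor work generally, not
the file_lock specifically, and absolute timings will vary by machine)
that times select() over a few vs. many descriptors:

```python
# Userspace demonstration that select() cost grows with the number of
# descriptors polled -- each call has per-fd work to do, which is why
# a 500-fd select magnifies any per-fd locking overhead in the kernel.
import os
import select
import time

def time_select(nfds, iters=2000):
    # Create nfds pipes; nothing is ever written, so every select()
    # scans all descriptors and times out immediately.
    pipes = [os.pipe() for _ in range(nfds)]
    rds = [r for r, _ in pipes]
    t0 = time.perf_counter()
    for _ in range(iters):
        select.select(rds, [], [], 0)
    elapsed = time.perf_counter() - t0
    for r, w in pipes:
        os.close(r)
        os.close(w)
    return elapsed

few, many = time_select(16), time_select(256)
print(f"16 fds: {few:.4f}s  256 fds: {many:.4f}s")
```

On a typical box the 256-fd case costs a large multiple of the 16-fd
case, mirroring the roughly linear scan do_select() performs.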
Re: PREEMPT_RT and I-PIPE: the numbers, part 4
Paul Rolland wrote:
> > mmap | 794us | 654us (+18%) | 822us (+4%)
>
> You mean -18%, not +18% I think.

Doh ... too many numbers flying around ... yes, -18% :)

Karim
Re: PREEMPT_RT and I-PIPE: the numbers, part 4
Hello,

> "IRQ & hd" run:
> Measurements | Vanilla | preempt_rt    | ipipe
> -------------+---------+---------------+-------------
> fork         |  101us  |  94us  (-7%)  | 103us (+2%)
> open/close   |  2.9us  | 2.9us   (~)   | 3.0us (+3%)
> execve       |  366us  | 370us  (+1%)  | 372us (+2%)
> select 500fd | 14.3us  | 18.1us (+27%) | 14.5us (+1%)
> mmap         |  794us  | 654us (+18%)  | 822us (+4%)

You mean -18%, not +18% I think.

Just having a quick look at the numbers, it seems that now the "weak"
part in PREEMPT_RT is the select 500fd test.

Ingo, any idea about this one ?

Regards,
Paul
Re: PREEMPT_RT and I-PIPE: the numbers, part 4
* Kristian Benoit <[EMAIL PROTECTED]> wrote:
> The numbers for PREEMPT_RT, however, have dramatically improved. All
> the 50%+ overhead we saw earlier has now gone away completely. The
> improvement is in fact nothing short of amazing. We were actually so
> surprised that we went around looking for any mistakes we may have
> done in our testing. We haven't found any though. So unless someone
> comes out with another set of numbers showing differently, we think
> that a warm round of applause should go to the PREEMPT_RT folks. If
> nothing else, it gives us satisfaction to know that these test rounds
> have helped make things better.

yeah, they definitely have helped, and thanks for this round of testing
too! I'll explain the recent changes to PREEMPT_RT that resulted in these
speedups in another mail.

Looking at your numbers i realized that in the area where PREEMPT_RT is
still somewhat behind (the flood ping +~10% overhead), you might be using
an invalid test methodology:

> ping = on host: "sudo ping -f $TARGET_IP_ADDR"

i've done a couple of ping -f flood tests between various testboxes
myself, and one thing i found was that it's close to impossible to create
a stable, comparable packets-per-second workload! The pps rate heavily
fluctuated even within the same testrun.

Another phenomenon i noticed is that the PREEMPT_RT kernel has a tendency
to handle _more_ ping packets per second, while the vanilla (and thus i
suspect the i-pipe) kernel throws away more packets. Thus lmbench under
PREEMPT_RT may perform 'slower', but in fact it was just an unbalanced
and thus unfair test. Once i created a stable packet rate, PREEMPT_RT's
IRQ overhead became acceptable.

(if your goal was to check how heavily external interrupts can influence
a PREEMPT_RT box, you should chrt the network IRQ thread to SCHED_OTHER
and renice it and softirq-net-rx and softirq-net-tx to nice +19.)
this phenomenon could be a speciality of my network setup, but still,
could you please verify the comparability of the ping -f workloads on the
vanilla and the PREEMPT_RT kernels? In particular, the interrupt rate
should be constant and comparable - but it might be better to look at
both the received and transmitted packets per second. (Since things like
iptraf are quite expensive when flood pinging is going on, the best way i
found to measure the packet rate was to process netstat -s output via a
simple script.)

Ingo
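The "simple script" alluded to above is not shown in the thread. One
hedged way to do it is to diff two captured netstat -s snapshots; the
field names follow common Linux netstat -s output and the sample counter
values below are invented purely for illustration:

```python
# Compute received/sent ICMP packet rates from two "netstat -s"
# snapshots taken `interval` seconds apart.
import re

def icmp_counts(netstat_output):
    """Extract (received, sent) ICMP message counts from netstat -s text."""
    recv = re.search(r"(\d+) ICMP messages received", netstat_output)
    sent = re.search(r"(\d+) ICMP messages sent", netstat_output)
    return int(recv.group(1)), int(sent.group(1))

def packet_rates(before, after, interval):
    """Return (rx_pps, tx_pps) between two snapshots."""
    r0, s0 = icmp_counts(before)
    r1, s1 = icmp_counts(after)
    return (r1 - r0) / interval, (s1 - s0) / interval

# Invented sample snapshots, 10 seconds apart:
snap1 = "Icmp:\n    1000 ICMP messages received\n    1000 ICMP messages sent"
snap2 = "Icmp:\n    51000 ICMP messages received\n    50800 ICMP messages sent"
rx_pps, tx_pps = packet_rates(snap1, snap2, 10)
print(rx_pps, tx_pps)   # 5000.0 4980.0
```

Running such a sampler during each lmbench run on each kernel would give
exactly the per-kernel packet rates whose absence is noted above.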
Re: PREEMPT_RT and I-PIPE: the numbers, part 4
Missing attachment herein included.

Karim

                L M B E N C H  2 . 0   S U M M A R Y

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------
                          null   null                  open  signal signal    fork  execve /bin/sh
kernel                    call    I/O    stat  fstat  close install handle process process process
------------------------ ------ ------- ------ ------ ------ ------- ------ ------- ------- -------
HIGHMEM-RT-V0.7.50-35      0.18  0.2947   3.02   0.42   3.62   0.59    1.98    156     448    1481
NOHIGHMEM-RT-V0.7.50-35    0.18  0.28635  2.91   0.42   3.70   0.58    2.02    111     383    1372
HIGHMEM-RT-V0.7.51-02      0.18  0.27045  2.47   0.39   3.02   0.56    1.75    103     372    1352
NOHIGHMEM-RT-V0.7.51-02    0.18  0.2673   2.36   0.39   2.77   0.56    1.72     90     351    1328

File select - times in microseconds - smaller is better
------------------------------------------------------------------------
                         select select select select select  select  select  select
kernel                    10 fd 100 fd 250 fd 500 fd 10 tcp 100 tcp 250 tcp 500 tcp
------------------------ ------ ------ ------ ------ ------ ------- ------- -------
HIGHMEM-RT-V0.7.50-35      1.29   5.70  13.21  25.76   1.49  7.8809 18.6905      na
NOHIGHMEM-RT-V0.7.50-35    1.26   5.69  13.25  25.84   1.47      na      na      na
HIGHMEM-RT-V0.7.51-02      1.01   3.88   8.82  17.08   1.24      na 14.1979 27.8158
NOHIGHMEM-RT-V0.7.51-02    1.02   3.90   8.84  17.12   1.30  6.0573      na      na

Context switching with 0K - times in microseconds - smaller is better
------------------------------------------------------------------------
                          2proc/0k  4proc/0k  8proc/0k 16proc/0k 32proc/0k 64proc/0k 96proc/0k
kernel                   ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
------------------------ --------- --------- --------- --------- --------- --------- ---------
HIGHMEM-RT-V0.7.50-35         4.87      5.55      5.01      4.47      4.00      4.45      5.13
NOHIGHMEM-RT-V0.7.50-35       3.25      3.92      3.53      3.10      2.96      3.46      4.09
HIGHMEM-RT-V0.7.51-02         2.70      3.48      3.51      3.50      3.36      3.93      4.82
NOHIGHMEM-RT-V0.7.51-02       1.86      2.23      2.41      2.41      2.41      3.02      3.92

Context switching with 4K - times in microseconds - smaller is better
------------------------------------------------------------------------
                          2proc/4k  4proc/4k  8proc/4k 16proc/4k 32proc/4k 64proc/4k 96proc/4k
kernel                   ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
------------------------ --------- --------- --------- --------- --------- --------- ---------
HIGHMEM-RT-V0.7.50-35         5.48      4.75      4.47      4.76      4.68      5.90      7.24
NOHIGHMEM-RT-V0.7.50-35       3.88      4.54      4.02      3.91      4.04      4.93      5.85
HIGHMEM-RT-V0.7.51-02         3.25      3.59      3.85      3.89      4.18      5.41      6.75
NOHIGHMEM-RT-V0.7.51-02       2.70      3.01      2.99      3.04      3.31      4.56      6.16

Context switching with 8K - times in microseconds - smaller is better
------------------------------------------------------------------------
                          2proc/8k  4proc/8k  8proc/8k 16proc/8k 32proc/8k 64proc/8k 96proc/8k
kernel                   ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
------------------------ --------- --------- --------- --------- --------- --------- ---------
HIGHMEM-RT-V0.7.50-35         6.09      5.31      5.22      5.09      5.68      7.82      8.87
NOHIGHMEM-RT-V0.7.50-35       4.51      5.08      4.54      4.36      4.44      6.49      7.75
HIGHMEM-RT-V0.7.51-02         3.85      4.01      4.20      4.31      5.27      7.38      8.51
NOHIGHMEM-RT-V0.7.51-02       3.05      3.49      3.53      3.60      3.99      6.37      7.56

Context switching with 16K - times in microseconds - smaller is better
-- 2p