On Sunday 29 April 2007, Willy Tarreau wrote:
>On Sun, Apr 29, 2007 at 08:59:01AM +0200, Ingo Molnar wrote:
>> * Willy Tarreau <[EMAIL PROTECTED]> wrote:
>> > I don't know if Mike still has problems with SD, but there are now
>> > several interesting reports of SD giving better feedback than CFS on
>> > real work. In my experience, CFS seems smoother on *technical* tests,
>> > which I agree that they do not really simulate real work.
>>
>> well, there are several reports of CFS being significantly better than
>> SD on a number of workloads - and i know of only two reports where SD
>> was reported to be better than CFS: in Kasper's test (where i'd like to
>> know what the "3D stuff" he uses is and take a good look at that
>> workload), and another 3D report which was done against -v6. (And even
>> in these two reports the 'smoothness advantage' was not dramatic. If you
>> know of any other reports then please let me know!)
>
>There was Caglar Onur too but he said he will redo all the tests. I'm
>not tracking all tests nor versions, so it might be possible that some
>of the differences vanish with v7.
>
>In fact, what I'd like to see in 2.6.22 is something better for everybody
>and with *no* regression, even if it's not perfect. I had the feeling
>that SD matched that goal right now, except for Mike who has not tested
>recent versions. Don't get me wrong, I still think that CFS is a more
>interesting long-term target. But it may require more time to satisfy
>everyone. At least with one of them in 2.6.22, we won't waste time
>comparing to current mainline.
>
>> Ingo
>
>Willy
In the FWIW category, I haven't built and tested a 'mainline' kernel since at least 2-3 weeks ago. That's how dramatic the differences are here.

Here, my main notifier of scheduling fubar artifacts is usually kmail, which in itself seems to have a poor threading model, giving the composer pauses whenever it's off sorting incoming mail, compacting a folder, or doing all the usual stuff it needs to do in the background. With mainline those lags were 10 to 30 seconds long, and I could type whole sentences before they showed up on screen. The best either of these schedulers can do is hold that down to 3 or 4 words, but that's an amazing difference in itself. Likewise, with either of these schedulers, a gzip session that amanda launched in the background now causes kmail to display a new message within about 4 seconds of the + key being tapped, and often much less, where it used to take 5-30 seconds.

SD seems to, as it states, give everyone a turn at the well, so the slowdowns while gzip is running are somewhat more noticeable, whereas with CFS, gzip seems to be preempted long enough to process most user keypresses. Not all, because tapping the + key to display the next message can at times be a pretty complex operation. For my workload, CFS seems to be a marginally better solution, but either is so much better than mainline that there cannot be a reversion to mainline performance here without a lot of kicking and screaming.

'vmstat -n 1' results show that CFS spends a lot less time doing context switches, which, as I understand it, counts against OS overhead since no productive work gets done while a switch is in progress. For CFS that's generally less than 500/second, averaging around 350; compared to SD046's average of about 18,000/second, it would appear that CFS allows more REAL work to get done by holding down the non-productive time that context switches require. FWIW, amanda runtimes tend to back that up: most CFS runs are sub 2 hours, while SD runs are seemingly around 2h:10m. But again, that's not over a sufficiently large sample to be a benchmark either, just one person's observation. I should have marked the amdump logs so I could determine that more easily by tracking which scheduler was running for each dump. amplot can be informative, but one must also correlate, and a week ago is ancient history since I have no way to verify which scheduler I was running then.

The x86's poor register architecture pretty well chains us to the 'context switch' if we want multitasking. I'm reminded of how that was handled on a now largely dead architecture some here may never have seen an example of, TI's 99xx chips, where all accumulators and registers were actually stored in memory, and a context switch was simply a matter of reloading the register that pointed into that memory array with the next process's workspace address. The state image of the process being 'put to sleep' was therefore maintained indefinitely, as long as the memory was refreshed. Too bad we can't do that on the x86, but I assume TI has patent lawyers standing by ready to jump on that one. However, with today's L1 cache being the speed and size that it is, it sure looks like a doable thing even at 2GHz+ clocks. Yup, we've tried lots of very productive ways to get that job done over the last 40 years, and every time somebody has a really good idea, the patent prevents its application.
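To make that idea concrete, here's a rough C sketch of the workspace-pointer scheme: all the register state lives in an ordinary memory array, and "switching" is nothing more than repointing a single pointer. Purely illustrative, of course; the struct layout and names below are mine, not TI's, and this is a toy, not a description of how any real scheduler does it.

#include <stdio.h>

/* Illustrative only: a TMS9900-ish "workspace" is just a register set
 * that lives in ordinary RAM rather than on the chip. */
struct workspace {
    unsigned short r[16];   /* R0..R15, stored in memory        */
    unsigned short pc;      /* saved program counter            */
    unsigned short st;      /* saved status word                */
};

/* One workspace per task, all permanently resident in RAM. */
static struct workspace tasks[4];

/* The "workspace pointer": the only per-CPU state that changes on a
 * context switch.  Repointing it *is* the switch - there is no
 * register file to spill and reload. */
static struct workspace *wp = &tasks[0];

static void switch_to(int next)
{
    wp = &tasks[next];      /* the entire "context switch"      */
}

int main(void)
{
    tasks[0].r[0] = 0x1234; /* task 0's R0 stays put in memory  */
    switch_to(1);           /* run as task 1 for a while        */
    wp->r[0] = 0xbeef;
    switch_to(0);           /* task 0's state was never touched */
    printf("task 0 R0 = 0x%x\n", wp->r[0]);
    return 0;
}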
And they call that progress... SIGH. Just imagine how much money TI would have today if the royalty asked had been only a dime a chip... And to a very limited extent, one could do that on an RCA-180x too, using its plethora of internal registers, any of which could be made the PC or SP (or DMA pointer too, IIRC) with a one-byte command.

Anyway, in my mind CFS wins this round, but it's a smallish margin, like judging ice skaters: a 9.7 vs 9.5 rating. Not a huge difference, until mainline's 0.0 score is thrown into the pot for a reality-check comparison.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
And on the eighth day, we bulldozed it.