On 08/29/2011 08:56 PM, Mathieu Desnoyers wrote:
> * Lai Jiangshan ([email protected]) wrote:
>> On 08/25/2011 04:35 PM, Paolo Bonzini wrote:
>>> On 08/25/2011 10:00 AM, Lai Jiangshan wrote:
>>>>>> I was measuring with 10 readers, not 3. It makes sense to wait more
>>>>>> with fewer readers.
>>>>
>>>> But my box just has 4 cores(i5 760).
>>>
>>> You cannot be always sure that readers are less than the cores. readers >
>>> cores is exactly the case when busy waiting hurts most.
>>>
>>
>>
>> It makes no sense to do a "readers > cores" *performance* rcutorture test.
>
> I think it does make sense to benchmark this use-case actually. One
> major difference between Userspace RCU and kernel code is that Userspace
> RCU has to handle workloads that can sometimes be ill-fitted with
> respect to the system configuration, and still behave reasonably well.

rcutorture's performance test binds each reader thread to a different CPU.
When "readers > cores", "readers - cores" reader threads will fail to be
bound, so I said it makes no sense to do a "readers > cores" *performance*
rcutorture test.

>
>> When "readers > cores", the kernel scheduler will mess the test up.
>
> Even though I agree that the kernel scheduler will become heavily
> involved in these tests, I think that if we keep the same scheduler
> configuration between the tests and only modify the URCU algorithm, we
> can compare the impact of URCU well enough.
>
> So what I'm trying to say here is: I agree with you that we primarily
> need to optimize for performance of the "ideal" configuration (n threads
> for n cpus), but we also need to consider the cases where we have more
> threads than CPUs so, even though this behavior is not the one we
> mainly optimize for, we don't degrade its performance more than we
> should for the sake of very small gains in the ideal configuration.
>
> Thanks,
>
> Mathieu
>

When readers > cores, n_updates is very unstable, so the n_updates results
are less meaningful.

The results show that Paolo's patch is an improvement, but my patchset
improves reader-side performance more. The updater in my patches has less
effect on the readers.

Thanks,
Lai.
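
Because a few runs report n_updates values far above the rest, a median per column is easier to compare across commits than eyeballing 20 lines. The helper below is only a sketch: median_col is a hypothetical name, not part of rcutorture or URCU, and it assumes input in the "value value" form that the echo $b $d in the benchmark loops prints, one pair per line.

```shell
# Hypothetical helper (not part of rcutorture): print the median of one
# whitespace-separated column read from stdin.
#   $1 = column number (1 = first value on each line, 2 = second value)
# Assumes non-negative integer values, one pair per line.
median_col() {
    # Extract the requested column, sort numerically, then pick the middle.
    awk -v c="$1" 'NF >= c { print $c }' | sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1                 # no data: fail, print nothing
            if (NR % 2) print v[(NR + 1) / 2]   # odd count: middle value
            else printf "%.0f\n", (v[NR / 2] + v[NR / 2 + 1]) / 2  # even: rounded mean of the two middle values
        }'
}
```

To use it, pipe the output of one of the loops into it, for example by appending "| median_col 2" after "done" to get the median of the second column for that run set.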

---------------------------------------
78bec1:
[laijs@lai tests]$ for ((i=0;i<20;i++)) do ./rcutorture_qsbr 10 perf 2>/dev/null | (read a b c d e; echo $b $d); done
126477522000 37
124138875000 46
125035204000 38
124364462000 2813
126468264000 33
127630616000 37
124336956000 42
126514624000 35
125380877000 2045
123055119000 50
124811675000 1705
127572424000 30
125952102000 36
126772195000 3289
119900155000 50
126700892000 32
125155070000 38
126137397000 36
125569792000 40
125979757000 595
[laijs@lai tests]$ for ((i=0;i<20;i++)) do ./rcutorture_qsbr 50 perf 2>/dev/null | (read a b c d e; echo $b $d); done
133815759000 12
134410744000 114
134054438000 10
134899649000 11
134909105000 10
134866493000 10
135255807000 12
134674536000 11
134548679000 11
134324605000 11
134932753000 812
135272699000 10
134802202000 13
134966508000 10
133966645000 13
134944451000 11
134123677000 12
135216333000 11
135954511000 10
136495744000 11

-------------------------------------------------------------------------
83a2c4(=78bec1+Paolo's patch)
[laijs@lai tests]$ for ((i=0;i<20;i++)) do ./rcutorture_qsbr 10 perf 2>/dev/null | (read a b c d e; echo $b $d); done
136249775000 54
137578256000 52
136910456000 54
137308567000 52
137417925000 61
137023710000 53
137380666000 53
136926779000 55
136666836000 50
137323468000 52
137262017000 54
137175620000 2060
136971246000 149
137394253000 56
136799328000 50
137803397000 2284
137365536000 52
137353673000 53
137391468000 55
136296672000 54
[laijs@lai tests]$ for ((i=0;i<20;i++)) do ./rcutorture_qsbr 50 perf 2>/dev/null | (read a b c d e; echo $b $d); done
136294321000 16
137155186000 16
135783080000 15
137418765000 499
137668007000 19
137311146000 15
137443675000 16
137231740000 17
135516661000 16
136909649000 16
136805721000 14
136709665000 18
136655673000 17
137326871000 31
136728430000 16
136911747000 65
136827095000 16
137243937000 17
136833854000 17
136780665000 76

------------------------------------------------------------
78bec1+my patchset:
[laijs@lai tests]$ for ((i=0;i<20;i++)) do ./rcutorture_qsbr 10 perf 2>/dev/null | (read a b c d e; echo $b $d); done
227307909000 272
226744284000 52
226503154000 51
227665894000 1617
227029208000 52
226408806000 55
227959365000 53
226864291000 53
227732800000 52
228040512000 49
227128921000 52
228532620000 54
227128007000 53
227908550000 2692
228121321000 54
227504548000 52
228398981000 55
227060552000 55
228058918000 52
226711856000 53
[laijs@lai tests]$ for ((i=0;i<20;i++)) do ./rcutorture_qsbr 50 perf 2>/dev/null | (read a b c d e; echo $b $d); done
228580953000 16
227967003000 16
228436484000 16
227489810000 17
227318023000 15
227743424000 16
226781106000 41
228138633000 17
227809037000 211
226175890000 17
228283901000 15
226989431000 15
226919435000 45
229552764000 128
227096384000 291
226240792000 17
226834648000 17
226925460000 18
227045700000 16
226696383000 15

_______________________________________________
ltt-dev mailing list
[email protected]
http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
