On Tuesday, 21 August 2012 22:36:38, Thiago Macieira wrote:
> RESULT : tst_QMutex::contendedQMutex():"no msleep, 1 mutex":
>      4,087,893.432 CPU ticks per iteration
>       11037.507260 task-clock                #    2.699 CPUs utilized
>     33,483,481,790 cycles                    #    3.034 GHz
>     21,436,137,659 instructions              #    0.64  insns per cycle
>         12,012,804 raw_syscalls:sys_enter    #    1.088 M/sec
>        4.088957193 seconds time elapsed
>
> Other results were: 4.2, 5.7, 5.8, 6.7, 7.1 million ticks.
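For readers less used to perf stat output: the figures after each "#" are ratios that perf derives from the raw counters, with task-clock reported in milliseconds of CPU time. Worked through for the quoted "1 mutex" run, they come out as follows (values rounded):

```latex
\begin{aligned}
\text{CPUs utilized} &= \frac{\text{task-clock}}{\text{elapsed}}
  = \frac{11037.507\ \text{ms}}{4088.957\ \text{ms}} \approx 2.699 \\
\text{GHz} &= \frac{\text{cycles}}{\text{task-clock}}
  = \frac{33{,}483{,}481{,}790}{11.0375\ \text{s}} \approx 3.034 \times 10^{9}\ \text{Hz} \\
\text{insns per cycle} &= \frac{\text{instructions}}{\text{cycles}}
  = \frac{21{,}436{,}137{,}659}{33{,}483{,}481{,}790} \approx 0.64 \\
\text{syscall rate} &= \frac{\text{raw\_syscalls:sys\_enter}}{\text{task-clock}}
  = \frac{12{,}012{,}804}{11.0375\ \text{s}} \approx 1.088\ \text{M/s}
\end{aligned}
```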
Here are the results after the rewrite, without adaptive locking (see below):

RESULT : tst_QMutex::contendedQMutex():"no msleep, 1 mutex":
     3,364,698.345 CPU ticks per iteration
       8775.205691 task-clock                #    2.924 CPUs utilized
    26,978,578,571 cycles                    #    3.074 GHz
    18,091,438,451 instructions              #    0.67  insns per cycle
        10,460,523 raw_syscalls:sys_enter    #    1.192 M/sec
       3.001549490 seconds time elapsed

The result for 4.04 seconds ran with 4.9 million ticks, but all the other numbers are the same. I can't explain why the tick counter is so much higher for that one.

With adaptive locking:

RESULT : tst_QMutex::contendedQMutex():"no msleep, 1 mutex":
     1,919,764.064 CPU ticks per iteration
       5404.168638 task-clock                #    3.783 CPUs utilized
    17,199,382,533 cycles                    #    3.183 GHz
    13,052,044,286 instructions              #    0.76  insns per cycle
         8,071,929 raw_syscalls:sys_enter    #    1.494 M/sec
       1.428415478 seconds time elapsed

> RESULT : tst_QMutex::contendedQMutex():"no msleep, 2 mutexes":
>     29,396,174.807 CPU ticks per iteration
>       48627.618792 task-clock                #    2.183 CPUs utilized
>    141,749,504,525 cycles                    #    2.915 GHz
>     78,008,558,700 instructions              #    0.55  insns per cycle
>         38,536,844 raw_syscalls:sys_enter    #    0.792 M/sec
>       22.271697343 seconds time elapsed

Without adaptive locking:

RESULT : tst_QMutex::contendedQMutex():"no msleep, 2 mutexes":
    28,641,366.537 CPU ticks per iteration
      47886.578653 task-clock                #    2.218 CPUs utilized
   139,684,008,827 cycles                    #    2.917 GHz
    76,540,168,881 instructions              #    0.55  insns per cycle
        38,837,066 raw_syscalls:sys_enter    #    0.811 M/sec
      21.586443075 seconds time elapsed

I.e., roughly the same.
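The ratios below, computed directly from the figures above (values rounded), put numbers on both comparisons: the 1-mutex gain from adaptive locking, and the "roughly the same" 2-mutex result of the rewrite versus the original.

```latex
\begin{aligned}
\text{1 mutex, elapsed (non-adaptive / adaptive):} \quad
  &\frac{3.0015\ \text{s}}{1.4284\ \text{s}} \approx 2.10 \\
\text{1 mutex, task-clock (non-adaptive / adaptive):} \quad
  &\frac{8775.2\ \text{ms}}{5404.2\ \text{ms}} \approx 1.62 \\
\text{2 mutexes, elapsed (rewrite / original):} \quad
  &\frac{21.586\ \text{s}}{22.272\ \text{s}} \approx 0.969 \\
\text{2 mutexes, ticks (rewrite / original):} \quad
  &\frac{28{,}641{,}367}{29{,}396{,}175} \approx 0.974
\end{aligned}
```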
With adaptive locking:

RESULT : tst_QMutex::contendedQMutex():"no msleep, 2 mutexes":
     1,961,622.638 CPU ticks per iteration
       5561.854224 task-clock                #    3.781 CPUs utilized
    17,706,600,180 cycles                    #    3.184 GHz
    13,209,273,979 instructions              #    0.75  insns per cycle
         8,072,609 raw_syscalls:sys_enter    #    1.451 M/sec
       1.471046980 seconds time elapsed

Adaptive locking is a busy-wait spin ahead of the sleep: the thread tries to acquire the mutex up to 1000 times before going to sleep. The Qt 4 solution was time-based, whereas the one I'm implementing uses a fixed number of iterations. That is similar to glibc's solution, which is also based on an iteration count.

Note that the "without adaptive locking" variant still retries the acquisition once before sleeping. Without that retry, the results are much, much worse. I considered a single retry a fair comparison because Olivier's original implementation also tries to lock once before sleeping.

In *this* particular case, it runs in less time and uses less CPU time, but that does not hold in other cases. In the msleep(2) case, it runs in about the same time as pthread, but it uses roughly 33% more CPU.

Conclusion: the biggest gain comes from adaptive locking, even though it introduces a busy-wait. I'd recommend keeping it and making it smarter, so that it really *adapts* to how often the mutex is contended.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
     Intel Sweden AB - Registration Number: 556189-6027
     Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden
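A minimal sketch of the spin-then-sleep idea behind adaptive locking, for illustration only: it is not QMutex's code, and the names SpinThenSleepMutex and runContended are hypothetical. It assumes plain C++11, with a std::mutex/std::condition_variable pair standing in for the futex a real Linux implementation would sleep on. The default spin count of 1000 mirrors the fixed iteration count described above; the state encoding (0 unlocked, 1 locked, 2 locked with possible sleepers) follows the common futex-mutex pattern.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spin-then-sleep mutex; a sketch, not QMutex's implementation.
// State: 0 = unlocked, 1 = locked, 2 = locked and threads may be sleeping.
class SpinThenSleepMutex {
public:
    explicit SpinThenSleepMutex(int spinCount = 1000) : spinCount_(spinCount) {}

    void lock() {
        // Uncontended fast path: a single compare-and-swap.
        int expected = 0;
        if (state_.compare_exchange_strong(expected, 1, std::memory_order_acquire,
                                           std::memory_order_relaxed))
            return;

        // Adaptive part: retry a fixed number of times (a count, not a time
        // limit) before paying for a sleep. Reading first avoids issuing a
        // CAS while the mutex is visibly held.
        for (int i = 0; i < spinCount_; ++i) {
            expected = 0;
            if (state_.load(std::memory_order_relaxed) == 0 &&
                state_.compare_exchange_weak(expected, 1, std::memory_order_acquire,
                                             std::memory_order_relaxed))
                return;
        }

        // Slow path: mark the mutex as contended (2) and sleep until woken.
        // Holding sleepLock_ across the check and the wait prevents lost wake-ups.
        std::unique_lock<std::mutex> lk(sleepLock_);
        while (state_.exchange(2, std::memory_order_acquire) != 0)
            wakeup_.wait(lk);
    }

    void unlock() {
        // Only pay for a wake-up when someone may be sleeping.
        if (state_.exchange(0, std::memory_order_release) == 2) {
            std::lock_guard<std::mutex> lk(sleepLock_);
            wakeup_.notify_one();
        }
    }

private:
    std::atomic<int> state_{0};
    const int spinCount_;
    std::mutex sleepLock_;            // stand-in for a futex wait queue
    std::condition_variable wakeup_;
};

// Hypothetical usage helper: hammer one mutex from several threads and
// return the final value of the shared counter it protects.
long runContended(int threadCount, int iterations) {
    SpinThenSleepMutex m;
    long counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t)
        threads.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<SpinThenSleepMutex> guard(m);
                ++counter;
            }
        });
    for (auto &th : threads)
        th.join();
    return counter;
}
```

With correct mutual exclusion, runContended(4, 10000) returns 40000. Making the spin "smarter", as suggested above, would mean varying the spin count at run time (for example, by how often spinning actually succeeds); this sketch keeps it fixed.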
_______________________________________________
Development mailing list
Development@qt-project.org
http://lists.qt-project.org/mailman/listinfo/development