On Aug 21, 2012, at 10:36 PM, ext Thiago Macieira wrote:

> Hello
>
> I've just done some benchmarking of QMutex on Linux, using the pthread
> implementation instead of the futex one.
>
> Conclusions first:
>
> QMutex is optimised for uncontended case. It does that by keeping the d
> pointer at NULL while unlocked, and uses 0x3 to indicate it's locked. Changing
> from one value to another is extremely quick, requiring a simple atomic
> operation. QMutex when uncontended proves to be roughly 16% faster than
> pthread. This also shows in the benchmarks that use non-zero msleep: the mutex
> is mostly uncontended.
>
> That comes at a price, though: the performance drops considerably when
> contention happens.
>
> When contention happens at a low rate (the "msleep(0)" case), QMutex
> performance is similar to that of pthread, though slightly worse (up to 5%).
>
> When contention happens a lot, the performance is awful. I've measured
> anything from 100% slower to over 1000%.
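For anyone less familiar with the fast path described above, here is a minimal C++ sketch of the general idea. It uses std::atomic rather than Qt's own atomics. A single pointer-sized lock word holds 0 for "unlocked" (like QMutex's NULL d pointer) and the sentinel 0x3 for "locked, no waiters", so an uncontended lock() or unlock() is one atomic operation. The names FastPathMutex and incrementConcurrently are hypothetical, and this is not QMutex's actual source. In particular, its contended path only yields and retries. A real implementation would instead park the thread on a futex or a pthread/OS primitive.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical sketch, NOT QMutex's real implementation.
// The lock word stands in for QMutex's d pointer:
//   0   -> unlocked (QMutex keeps d == NULL)
//   0x3 -> locked, no waiters (the sentinel mentioned in the mail)
class FastPathMutex {
public:
    void lock() {
        // Uncontended fast path: a single compare-and-swap 0 -> 0x3.
        if (tryLock())
            return;
        lockSlow();
    }

    bool tryLock() {
        std::uintptr_t expected = kUnlocked;
        return word_.compare_exchange_strong(expected, kLocked,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed);
    }

    void unlock() {
        // Only the owner may unlock, so the word must hold the sentinel.
        assert(word_.load(std::memory_order_relaxed) == kLocked);
        // Uncontended release: a single atomic store 0x3 -> 0.
        word_.store(kUnlocked, std::memory_order_release);
    }

private:
    static constexpr std::uintptr_t kUnlocked = 0;
    static constexpr std::uintptr_t kLocked = 0x3;

    // Contended path. A real mutex would put the thread to sleep here
    // (futex on Linux, an OS primitive elsewhere); this sketch only
    // yields and retries, so under heavy contention it burns CPU.
    void lockSlow() {
        for (;;) {
            if (word_.load(std::memory_order_relaxed) == kUnlocked && tryLock())
                return;
            std::this_thread::yield();
        }
    }

    std::atomic<std::uintptr_t> word_{kUnlocked};
};

// Usage sketch: several threads bump a shared counter under the lock.
long incrementConcurrently(int threads, int iterations) {
    FastPathMutex m;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&m, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                m.lock();
                ++counter;
                m.unlock();
            }
        });
    }
    for (auto &th : pool)
        th.join();
    return counter;
}
```

The trade-off is the one the numbers reflect. All the optimisation goes into the uncontended path, and whatever the slow path does (here a simple yield loop) largely determines how badly things degrade under contention.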

Wow… do you get the same results with Qt 4? I did quite a bit to optimize contention performance in Qt 4. I do know that mutex contention performance has regressed in Qt 5 on the Mac; I wasn't sure about Linux, though.

> Extrapolating these results to Mac and Windows, I expect QMutex performance in
> uncontended to be *much* better, but still lose horribly in the contended
> case.
>
> Conclusion: I'm glad I use Linux and that we have futex.
>
> DATA:
>
> Reference:
> Intel i7-2620M (SandyBridge)
> 2 cores x 2 threads, 2.7 GHz, turbo to 3.3 GHz
> CPU in "performance" governor
> Linux 3.5.2
> glibc 2.15
> Fedora 17
> GCC 4.7.1, 64-bit mode
> QtCore linked with LTO
>
> All results are the best out of 6 runs, under realtime FIFO scheduling.
>
> Uncontended Mutex results (100 million iterations):
>
> RESULT : tst_QMutex::uncontendedNative():
>   60.5891925 CPU ticks per iteration
>   450.189192 task-clock # 0.999 CPUs utilized
>   1,511,489,291 cycles # 3.357 GHz
>   1,306,287,711 instructions # 0.86 insns per cycle
>   197 raw_syscalls:sys_enter # 0.438 K/sec
>   0.450477229 seconds time elapsed
>
> RESULT : tst_QMutex::uncontendedQMutex():
>   50.7105596 CPU ticks per iteration
>   379.784144 task-clock # 0.999 CPUs utilized
>   1,268,507,621 cycles # 3.340 GHz
>   745,975,928 instructions # 0.59 insns per cycle
>   194 raw_syscalls:sys_enter # 0.511 K/sec
>   0.380036271 seconds time elapsed
>
> Contended Mutex results (1000 iterations):
>
> RESULT : tst_QMutex::contendedNative():"no msleep, 1 mutex":
>   2,052,212.507 CPU ticks per iteration
>   5814.825257 task-clock # 3.797 CPUs utilized
>   18,513,286,444 cycles # 3.184 GHz
>   13,801,932,519 instructions # 0.75 insns per cycle
>   8,609,051 raw_syscalls:sys_enter # 1.481 M/sec
>   1.531495948 seconds time elapsed
>
> RESULT : tst_QMutex::contendedQMutex():"no msleep, 1 mutex":
>   4,087,893.432 CPU ticks per iteration
>   11037.507260 task-clock # 2.699 CPUs utilized
>   33,483,481,790 cycles # 3.034 GHz
>   21,436,137,659 instructions # 0.64 insns per cycle
>   12,012,804 raw_syscalls:sys_enter # 1.088 M/sec
>   4.088957193 seconds time elapsed
>
> Other results were: 4.2, 5.7, 5.8, 6.7, 7.1 million ticks.
>
> RESULT : tst_QMutex::contendedNative():"no msleep, 2 mutexes":
>   2,550,929.603 CPU ticks per iteration
>   7155.513345 task-clock # 3.763 CPUs utilized
>   22,760,839,897 cycles # 3.181 GHz
>   16,370,712,299 instructions # 0.72 insns per cycle
>   10,457,934 raw_syscalls:sys_enter # 1.462 M/sec
>   1.901400808 seconds time elapsed
>
> RESULT : tst_QMutex::contendedQMutex():"no msleep, 2 mutexes":
>   29,396,174.807 CPU ticks per iteration
>   48627.618792 task-clock # 2.183 CPUs utilized
>   141,749,504,525 cycles # 2.915 GHz
>   78,008,558,700 instructions # 0.55 insns per cycle
>   38,536,844 raw_syscalls:sys_enter # 0.792 M/sec
>   22.271697343 seconds time elapsed
>
> 100 iterations:
> RESULT : tst_QMutex::contendedNative():"msleep(0), 1 mutex":
>   67,621,168.46 CPU ticks per iteration
>   4326.998212 task-clock # 0.859 CPUs utilized
>   11,239,050,634 cycles # 2.597 GHz
>   8,415,799,134 instructions # 0.75 insns per cycle
>   2,965,384 raw_syscalls:sys_enter # 0.685 M/sec
>   5.036652093 seconds time elapsed
>
> RESULT : tst_QMutex::contendedQMutex():"msleep(0), 1 mutex":
>   70,621,368.59 CPU ticks per iteration
>   4909.514006 task-clock # 0.934 CPUs utilized
>   13,123,468,429 cycles # 2.673 GHz
>   9,532,793,349 instructions # 0.73 insns per cycle
>   3,619,607 raw_syscalls:sys_enter # 0.737 M/sec
>   5.253921952 seconds time elapsed
>
> RESULT : tst_QMutex::contendedNative():"msleep(0), 2 mutexes":
>   67,478,669.37 CPU ticks per iteration
>   4314.232114 task-clock # 0.857 CPUs utilized
>   11,244,572,017 cycles # 2.606 GHz
>   8,382,057,867 instructions # 0.75 insns per cycle
>   2,939,351 raw_syscalls:sys_enter # 0.681 M/sec
>   5.035212837 seconds time elapsed
>
> RESULT : tst_QMutex::contendedQMutex():"msleep(0), 2 mutexes":
>   70,837,078.76 CPU ticks per iteration
>   4933.702732 task-clock # 0.929 CPUs utilized
>   13,192,133,179 cycles # 2.674 GHz
>   9,554,807,698 instructions # 0.72 insns per cycle
>   3,622,623 raw_syscalls:sys_enter # 0.734 M/sec
>   5.309986829 seconds time elapsed
>
> --
> Thiago Macieira - thiago.macieira (AT) intel.com
> Software Architect - Intel Open Source Technology Center
> Intel Sweden AB - Registration Number: 556189-6027
> Knarrarnäsgatan 15, 164 40 Kista, Stockholm, Sweden
> _______________________________________________
> Development mailing list
> Development@qt-project.org
> http://lists.qt-project.org/mailman/listinfo/development

--
Bradley T. Hughes
bradley.hug...@nokia.com
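The percentages in the quoted conclusions can be cross-checked against the "CPU ticks per iteration" figures in the DATA section. Here is a minimal C++ sketch. The struct and function names (TickPair, kResults, percentSlower) are made up for this illustration, and the tick values are copied from the quoted results.

```cpp
#include <cassert>

// "CPU ticks per iteration" figures copied from the quoted results.
struct TickPair {
    const char *name;
    double native;  // tst_QMutex::*Native (pthread) result
    double qmutex;  // tst_QMutex::*QMutex result
};

const TickPair kResults[] = {
    {"uncontended", 60.5891925, 50.7105596},
    {"no msleep, 1 mutex", 2052212.507, 4087893.432},
    {"no msleep, 2 mutexes", 2550929.603, 29396174.807},
    {"msleep(0), 1 mutex", 67621168.46, 70621368.59},
    {"msleep(0), 2 mutexes", 67478669.37, 70837078.76},
};

// How much more time QMutex needs than pthread, in percent.
// A negative value means QMutex needs less time than pthread.
double percentSlower(const TickPair &r) {
    assert(r.native > 0.0);
    return (r.qmutex / r.native - 1.0) * 100.0;
}
```

With these numbers, uncontended QMutex needs roughly 16.3% less time than pthread. In the two no-msleep contended cases it needs about 99% and about 1052% more time, and in the two msleep(0) cases about 4.4% and 5.0% more. This is consistent with the "roughly 16%", "100% slower to over 1000%" and "up to 5%" figures in the conclusions.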