A few comments from my side: * I am not a big fan of inlining the public classes neither. It feels a bit like over-optimizing at the wrong place. Reason is that this might make it a impossible later on to refactor our code. * It's ok to have some low level inline class for internal use if that helps us in performance critical parts (such as signal/slot code). Nothing says this and the public QMutex class have to be the same. * IMO we should really try to allow using tools such as helgrind also with release build * it would be good to see how much of a real world (ie. not with artificial benchmarks) difference you get due to inlining the mutex code. Is it really relevant?
Cheers, Lars On 5/19/12 12:36 AM, "ext Olivier Goffart" <oliv...@woboq.com> wrote: >On Friday 18 May 2012 23:25:47 Thiago Macieira wrote: >> > > QBasicMutex is an internal POD class that offers non-recursive >>locking. >> > > It's incredibly efficient for Linux, where the single pointer-sized >> > > member variable is enough to execute the futex operations, in all >>cases. >> > > For other platforms, we get the same efficiency in non-contended >>cases, >> > > but incur a non-negligible performance penalty when contention >>happens. >> > >> > Which non-negligible performance penalty are you takling about here? >> >> The need to allocate memory and do the test-and-set loop for setting the >> private. > >There is no memory allocation, it is using the QFreeList. >And this happens when we are about to do a context switch. So I beleive >the >test-and-set loop should be neglectible. > >> Right, and users use the classes as intended... :-P > >(QBasicMutex is an undocumented internal class) > >> Introducing the noop valgrind code (a 32-bit rotate) still consumes CPU >> resources. It will consume front-end decoding, one ALU port, the >>retirement, >> not to mention increased code size. There's no free lunch. > >Slower than a function call (which will likely spill register)? I don't >beleive so. > >(moreever, we are talking about debug build, right?) > >> > That is not relevant. >> > Transactional memory requires a different set of primitives. QMutex >>has >> > nothing to do with transactional memory. >> >> Then you didn't read the manual. I'm going to ignore what you said >>until you >> read the manual because transactional memory has everythig to do with >> QMutex. >> >> Please, either believe me or read the manual. > >Do you have a link to that manuel? > >Transactional memory is about detecting conflicts between transactions, >and >rolling back. >Mutexes are about locking until the resource is free > >Transactional memory could be used to simplify the code that allocate the >QMutexPrivate. > >But I doubt the hypothetical gain is even worth the function call. > >Because remember the uncontended case is much more critical than the >contended >case. (There is no point in winning a dozens of cycles if we are going to >pay >anyway several hendreds in the context switch that follow) > >But fast uncontended case is critical because it allows the use of mutex >in >part that might or not be used with threads. Example: QObject. We need to >put >mutexes while emiting a signal because maybe it is used in multiple >thread. >But we don't want to pay too much when used in a single thread (the >common >case) > >> > > (P2) optimise the Mac and Windows implementations to avoid the need >>for >> > > allocating a dynamic d pointer in case of contention. In fact, >>remove >> > > the >> > > need for dynamic d-pointer allocation altogether: Mac, Windows and >>Linux >> > > should never do it, while the generic Unix implementation should do >>it >> > > all >> > > the time in the constructor >> > >> > The allocation of the d-ponter is not that dynamic. >> > It is already quite optimized in Qt5 >> >> My suggestion was to avoid the QMutexPrivate allocation on Mac and >>Windows, >> since they require just a bit of initialisation. Now, given what you >>said >> about semaphore_t, we may not be able to do it on the Mac. But we can >>try to >> apply the same optimisation for Windows -- the initialisation is a call >>to >> CreateEvent. > >But CreateEvent still probably allocate memory behind. > >> > I beleive there is more important priority right now than touching >>QMutex, >> > which already had its lifting for Qt5. >> >> I disagree. If we shoot ourselves in the foot by not being able to >>support >> TSX and valgrind in Qt 5, we've lost a lot. That's why I propose >> de-inlining for 5.0, so we have enough time to investigate those >>potential >> drawbacks. > >I think the inline is not a problem for valgrind. > >And I don't think we can gain much with TSX. > >And even if we could, we still do pretty much everything in a binary >compatible way despite the inlines (We can say 2 means unlocked and 3 >locked, >then the old code fallbacks to the non-inline case (The first lock would >handle the 0 or 1)) > >Is it not however shooting ourself in the feet not to inline it? Because >we >hardly can inline it later in a binary compatible way. > >-- >Olivier > >Woboq - Qt services and support - http://woboq.com >_______________________________________________ >Development mailing list >Development@qt-project.org >http://lists.qt-project.org/mailman/listinfo/development _______________________________________________ Development mailing list Development@qt-project.org http://lists.qt-project.org/mailman/listinfo/development