RE: [Patch] R/W lock slowness on Windows (Was: Windows R/W lock comment / Reader Writer lock performance on Windows)
-Original Message- From: Ivan Zhakov [mailto:i...@visualsvn.com] Sent: woensdag 27 augustus 2014 15:29 To: Bert Huijben Subject: Re: [Patch] R/W lock slowness on Windows (Was: Windows R/W lock comment / Reader Writer lock performance on Windows) Hi Bert! Did you get any feedback about this from APR developers? What do you think about my suggestion that I've posted on apr@ mailing list? I didn't get any feedback except yours :( 1) In general apr doesn't promise anything about its output arguments on error cases, so I'm not sure if we should start look at this specifically here. (Other platforms don't care and I see at least one that don't set it on all error cases) 2) Same reasoning: Other platforms differ here, so I don't see any requirement to keep Windows strictly as it was. Bert On 4 August 2014 02:27, Ivan Zhakov i...@visualsvn.com wrote: On 2 August 2014 13:02, Bert Huijben b...@qqmail.nl wrote: Hi, With this mail I would like to ping the original problem with the second patch I send to the list. This patch doesn't have the likely original problems of potentially changing the lock behavior to a spin lock, but provides all the performance improvements anyway. It just replaces the Windows inter-proces mutex in the apr code with the already existing apr-mutex implementation, which is usually implemented as the far more efficient Windows in-process Critical section. This makes the implementation 10* to 140* times more efficient on Windows and more similar to the performance on other platforms. A performance fix like this is critical for Subversion, or we would have to implement our own r/w locking implementation for our caching infrastructure. Hi Bert, Comments on your patch: 1. It seems behavior of apr_thread_rwlock_create() functions changes a bit: code in trunk intializes output *rwlock pointer to NULL in case of error, while your patch returns pointer to partially initialized object. It seems that apr_thread_rwlock_create() doesn't have API promise to initalize output to NULL, but I think it's better to leave current behavior. 2. The apr_thread_rwlock_unlock() currently uses checked mutex behavior for its logic. This is possible reason why current rwlock implementation uses OS mutexes instead of cirtical sections. I don't see problem with proposed change though. -- Ivan Zhakov -- Ivan Zhakov CTO | VisualSVN | http://www.visualsvn.com
Re: [Patch] R/W lock slowness on Windows (Was: Windows R/W lock comment / Reader Writer lock performance on Windows)
On 27 August 2014 17:34, Bert Huijben b...@qqmail.nl wrote: -Original Message- From: Ivan Zhakov [mailto:i...@visualsvn.com] Sent: woensdag 27 augustus 2014 15:29 To: Bert Huijben Subject: Re: [Patch] R/W lock slowness on Windows (Was: Windows R/W lock comment / Reader Writer lock performance on Windows) Hi Bert! Did you get any feedback about this from APR developers? What do you think about my suggestion that I've posted on apr@ mailing list? I didn't get any feedback except yours :( 1) In general apr doesn't promise anything about its output arguments on error cases, so I'm not sure if we should start look at this specifically here. (Other platforms don't care and I see at least one that don't set it on all error cases) Yes, I know that general practice do not initialize output arguments on error, but I think it worth to keep old behavior because this is very sensitive area (deadlocks and etc). 2) Same reasoning: Other platforms differ here, so I don't see any requirement to keep Windows strictly as it was. Ok. -- Ivan Zhakov
Re: [Patch] R/W lock slowness on Windows (Was: Windows R/W lock comment / Reader Writer lock performance on Windows)
On 2 August 2014 13:02, Bert Huijben b...@qqmail.nl wrote: Hi, With this mail I would like to ping the original problem with the second patch I send to the list. This patch doesn't have the likely original problems of potentially changing the lock behavior to a spin lock, but provides all the performance improvements anyway. It just replaces the Windows inter-proces mutex in the apr code with the already existing apr-mutex implementation, which is usually implemented as the far more efficient Windows in-process Critical section. This makes the implementation 10* to 140* times more efficient on Windows and more similar to the performance on other platforms. A performance fix like this is critical for Subversion, or we would have to implement our own r/w locking implementation for our caching infrastructure. Hi Bert, Comments on your patch: 1. It seems behavior of apr_thread_rwlock_create() functions changes a bit: code in trunk intializes output *rwlock pointer to NULL in case of error, while your patch returns pointer to partially initialized object. It seems that apr_thread_rwlock_create() doesn't have API promise to initalize output to NULL, but I think it's better to leave current behavior. 2. The apr_thread_rwlock_unlock() currently uses checked mutex behavior for its logic. This is possible reason why current rwlock implementation uses OS mutexes instead of cirtical sections. I don't see problem with proposed change though. -- Ivan Zhakov
[Patch] R/W lock slowness on Windows (Was: Windows R/W lock comment / Reader Writer lock performance on Windows)
Hi, With this mail I would like to ping the original problem with the second patch I send to the list. This patch doesn't have the likely original problems of potentially changing the lock behavior to a spin lock, but provides all the performance improvements anyway. It just replaces the Windows inter-proces mutex in the apr code with the already existing apr-mutex implementation, which is usually implemented as the far more efficient Windows in-process Critical section. This makes the implementation 10* to 140* times more efficient on Windows and more similar to the performance on other platforms. A performance fix like this is critical for Subversion, or we would have to implement our own r/w locking implementation for our caching infrastructure. Bert apr-rwlock-AprMutex.patch Description: Binary data
RE: Windows R/W lock comment
I think it is a hybrid loop, but the documentation is not really clear about it. It is not as hybrid as a critical section (which internally falls back to the kernel based mutex behavior for long waits), but it is more CPU efficient than a loop around a few simple atomic operations. Looking at the old implementation I’m thinking that as a first step we should switch the current/old implementation to using an APR mutex instead of the Windows mutex. On all NT platforms the apr mutex is implemented as a critical section, which is much faster than the inter-process capable Windows mutex. (Handling this mutex is where the current implementation spends almost all its time) Bert From: Jeff Trawick [mailto:traw...@gmail.com] Sent: donderdag 3 juli 2014 21:57 To: APR Developer List Subject: Windows R/W lock comment I'm getting filtered when I try to respond to the original thread. This is an important issue: If a writer holds the lock for a significant time (e.g., refreshing a table from a database), do all would-be readers spin-loop, or is it a hybrid spin-then-sleep solution? We should know this for certain. Busy-loop is a no-go, at least without some non-default build-time option (or in later releases, a _ex() form of the create API which allows the flavor to be selected).
RE: Windows R/W lock comment
By googling for more details I just found an additional problem… TryAcquireSRWLockExclusive and TryAcquireSRWLockShared are only implemented on Windows 7 and later (not Vista and later), so the code needs more fallback behavior for Vista. Bert From: Bert Huijben [mailto:b...@qqmail.nl] Sent: maandag 7 juli 2014 12:25 To: 'Jeff Trawick'; 'APR Developer List' Subject: RE: Windows R/W lock comment I think it is a hybrid loop, but the documentation is not really clear about it. It is not as hybrid as a critical section (which internally falls back to the kernel based mutex behavior for long waits), but it is more CPU efficient than a loop around a few simple atomic operations. Looking at the old implementation I’m thinking that as a first step we should switch the current/old implementation to using an APR mutex instead of the Windows mutex. On all NT platforms the apr mutex is implemented as a critical section, which is much faster than the inter-process capable Windows mutex. (Handling this mutex is where the current implementation spends almost all its time) Bert From: Jeff Trawick [mailto:traw...@gmail.com] Sent: donderdag 3 juli 2014 21:57 To: APR Developer List Subject: Windows R/W lock comment I'm getting filtered when I try to respond to the original thread. This is an important issue: If a writer holds the lock for a significant time (e.g., refreshing a table from a database), do all would-be readers spin-loop, or is it a hybrid spin-then-sleep solution? We should know this for certain. Busy-loop is a no-go, at least without some non-default build-time option (or in later releases, a _ex() form of the create API which allows the flavor to be selected).
RE: Windows R/W lock comment
I created a patch for the ‘use apr mutex instead of Windows mutex’ scenario with a surprising result in my initial measurements. It looks like this implementation is actually a tiny bit faster than the Windows SRW lock on all cases in testlockperf.exe… It could be that this is caused by trying to fit everything in a single pointer value (as Windows does for the SRWLock), or it could be noise in my test environment. But at least it makes the apr rwlock fast enough for Subversion’s in memory cache, without really changing any of the api behavior of apr. Note that I now have to use rwlock-readers in apr_thread_rwlock_unlock(), like in my original patch as the apr unlock doesn’t produce an usable errorcode. But as we can assume having at least one reader or writer lock when entering this function, this is safe to do. Bert From: Bert Huijben [mailto:b...@qqmail.nl] Sent: maandag 7 juli 2014 12:25 To: 'Jeff Trawick'; 'APR Developer List' Subject: RE: Windows R/W lock comment I think it is a hybrid loop, but the documentation is not really clear about it. It is not as hybrid as a critical section (which internally falls back to the kernel based mutex behavior for long waits), but it is more CPU efficient than a loop around a few simple atomic operations. Looking at the old implementation I’m thinking that as a first step we should switch the current/old implementation to using an APR mutex instead of the Windows mutex. On all NT platforms the apr mutex is implemented as a critical section, which is much faster than the inter-process capable Windows mutex. (Handling this mutex is where the current implementation spends almost all its time) Bert From: Jeff Trawick [mailto:traw...@gmail.com] Sent: donderdag 3 juli 2014 21:57 To: APR Developer List Subject: Windows R/W lock comment I'm getting filtered when I try to respond to the original thread. This is an important issue: If a writer holds the lock for a significant time (e.g., refreshing a table from a database), do all would-be readers spin-loop, or is it a hybrid spin-then-sleep solution? We should know this for certain. Busy-loop is a no-go, at least without some non-default build-time option (or in later releases, a _ex() form of the create API which allows the flavor to be selected). apr-rwlock-AprMutex.patch Description: Binary data
Windows R/W lock comment
I'm getting filtered when I try to respond to the original thread. This is an important issue: If a writer holds the lock for a significant time (e.g., refreshing a table from a database), do all would-be readers spin-loop, or is it a hybrid spin-then-sleep solution? We should know this for certain. Busy-loop is a no-go, at least without some non-default build-time option (or in later releases, a _ex() form of the create API which allows the flavor to be selected).