RE: [Patch] R/W lock slowness on Windows (Was: Windows R/W lock comment / Reader Writer lock performance on Windows)

2014-08-27 Thread Bert Huijben


 -Original Message-
 From: Ivan Zhakov [mailto:i...@visualsvn.com]
 Sent: woensdag 27 augustus 2014 15:29
 To: Bert Huijben
 Subject: Re: [Patch] R/W lock slowness on Windows (Was: Windows R/W lock
 comment / Reader Writer lock performance on Windows)
 
 Hi Bert!
 
 Did you get any feedback about this from APR developers? What do you
 think about my suggestion that I've posted on apr@ mailing list?

I didn't get any feedback except yours :(

1)
In general apr doesn't promise anything about its output arguments on error 
cases, so I'm not sure if we should start look at this specifically here.
(Other platforms don't care and I see at least one that don't set it on all 
error cases)

2)
Same reasoning: Other platforms differ here, so I don't see any requirement to 
keep Windows strictly as it was.

Bert
 
 On 4 August 2014 02:27, Ivan Zhakov i...@visualsvn.com wrote:
  On 2 August 2014 13:02, Bert Huijben b...@qqmail.nl wrote:
  Hi,
 
  With this mail I would like to ping the original problem with the second
  patch I send to the list. This patch doesn't have the likely original
  problems of potentially changing the lock behavior to a spin lock, but
  provides all the performance improvements anyway.
 
  It just replaces the Windows inter-proces mutex in the apr code with the
  already existing apr-mutex implementation, which is usually implemented
 as
  the far more efficient Windows in-process Critical section.
 
  This makes the implementation 10* to 140* times more efficient on
 Windows
  and more similar to the performance on other platforms. A performance
 fix
  like this is critical for Subversion, or we would have to implement our own
  r/w locking implementation for our caching infrastructure.
 
  Hi Bert,
 
  Comments on your patch:
  1. It seems behavior of apr_thread_rwlock_create() functions changes a
  bit: code in trunk intializes output *rwlock pointer to NULL in case
  of error, while your patch returns pointer to partially initialized
  object. It seems that apr_thread_rwlock_create() doesn't have API
  promise to initalize output to NULL, but I think it's better to leave
  current behavior.
 
  2. The apr_thread_rwlock_unlock() currently uses checked mutex
  behavior for its logic. This is possible reason why current rwlock
  implementation uses OS mutexes instead of cirtical sections. I don't
  see problem with proposed change  though.
 
 
  --
  Ivan Zhakov
 
 
 
 --
 Ivan Zhakov
 CTO | VisualSVN | http://www.visualsvn.com



Re: [Patch] R/W lock slowness on Windows (Was: Windows R/W lock comment / Reader Writer lock performance on Windows)

2014-08-27 Thread Ivan Zhakov
On 27 August 2014 17:34, Bert Huijben b...@qqmail.nl wrote:


 -Original Message-
 From: Ivan Zhakov [mailto:i...@visualsvn.com]
 Sent: woensdag 27 augustus 2014 15:29
 To: Bert Huijben
 Subject: Re: [Patch] R/W lock slowness on Windows (Was: Windows R/W lock
 comment / Reader Writer lock performance on Windows)

 Hi Bert!

 Did you get any feedback about this from APR developers? What do you
 think about my suggestion that I've posted on apr@ mailing list?

 I didn't get any feedback except yours :(

 1)
 In general apr doesn't promise anything about its output arguments on error 
 cases, so I'm not sure if we should start look at this specifically here.
 (Other platforms don't care and I see at least one that don't set it on all 
 error cases)

Yes, I know that general practice do not initialize output arguments
on error, but I think it worth to keep old behavior because this is
very sensitive area (deadlocks and etc).

 2)
 Same reasoning: Other platforms differ here, so I don't see any requirement 
 to keep Windows strictly as it was.
Ok.

-- 
Ivan Zhakov


Re: [Patch] R/W lock slowness on Windows (Was: Windows R/W lock comment / Reader Writer lock performance on Windows)

2014-08-03 Thread Ivan Zhakov
On 2 August 2014 13:02, Bert Huijben b...@qqmail.nl wrote:
 Hi,

 With this mail I would like to ping the original problem with the second
 patch I send to the list. This patch doesn't have the likely original
 problems of potentially changing the lock behavior to a spin lock, but
 provides all the performance improvements anyway.

 It just replaces the Windows inter-proces mutex in the apr code with the
 already existing apr-mutex implementation, which is usually implemented as
 the far more efficient Windows in-process Critical section.

 This makes the implementation 10* to 140* times more efficient on Windows
 and more similar to the performance on other platforms. A performance fix
 like this is critical for Subversion, or we would have to implement our own
 r/w locking implementation for our caching infrastructure.

Hi Bert,

Comments on your patch:
1. It seems behavior of apr_thread_rwlock_create() functions changes a
bit: code in trunk intializes output *rwlock pointer to NULL in case
of error, while your patch returns pointer to partially initialized
object. It seems that apr_thread_rwlock_create() doesn't have API
promise to initalize output to NULL, but I think it's better to leave
current behavior.

2. The apr_thread_rwlock_unlock() currently uses checked mutex
behavior for its logic. This is possible reason why current rwlock
implementation uses OS mutexes instead of cirtical sections. I don't
see problem with proposed change  though.


-- 
Ivan Zhakov


[Patch] R/W lock slowness on Windows (Was: Windows R/W lock comment / Reader Writer lock performance on Windows)

2014-08-02 Thread Bert Huijben
Hi,

With this mail I would like to ping the original problem with the second
patch I send to the list. This patch doesn't have the likely original
problems of potentially changing the lock behavior to a spin lock, but
provides all the performance improvements anyway.

It just replaces the Windows inter-proces mutex in the apr code with the
already existing apr-mutex implementation, which is usually implemented as
the far more efficient Windows in-process Critical section.

This makes the implementation 10* to 140* times more efficient on Windows
and more similar to the performance on other platforms. A performance fix
like this is critical for Subversion, or we would have to implement our own
r/w locking implementation for our caching infrastructure.

Bert


apr-rwlock-AprMutex.patch
Description: Binary data


RE: Windows R/W lock comment

2014-07-07 Thread Bert Huijben
I think it is a hybrid loop, but the documentation is not really clear about 
it. It is not as hybrid as a critical section (which internally falls back to 
the kernel based mutex behavior for long waits), but it is more CPU efficient 
than a loop around a few simple atomic operations.

 

Looking at the old implementation I’m thinking that as a first step we should 
switch the current/old implementation to using an APR mutex instead of the 
Windows mutex. On all NT platforms the apr mutex is implemented as a critical 
section, which is much faster than the inter-process capable Windows mutex.

(Handling this mutex is where the current implementation spends almost all its 
time)

 

Bert

 

 

From: Jeff Trawick [mailto:traw...@gmail.com] 
Sent: donderdag 3 juli 2014 21:57
To: APR Developer List
Subject: Windows R/W lock comment

 

I'm getting filtered when I try to respond to the original thread. 

 

This is an important issue:  If a writer holds the lock for a significant time 
(e.g., refreshing a table from a database), do all would-be readers spin-loop, 
or is it a hybrid spin-then-sleep solution?  We should know this for certain.

 

Busy-loop is a no-go, at least without some non-default build-time option (or 
in later releases, a _ex() form of the create API which allows the flavor to be 
selected).

 



RE: Windows R/W lock comment

2014-07-07 Thread Bert Huijben
By googling for more details I just found an additional problem…

 

TryAcquireSRWLockExclusive and TryAcquireSRWLockShared are only implemented on 
Windows 7 and later (not Vista and later), so the code needs more fallback 
behavior for Vista.

 

Bert

 

From: Bert Huijben [mailto:b...@qqmail.nl] 
Sent: maandag 7 juli 2014 12:25
To: 'Jeff Trawick'; 'APR Developer List'
Subject: RE: Windows R/W lock comment

 

I think it is a hybrid loop, but the documentation is not really clear about 
it. It is not as hybrid as a critical section (which internally falls back to 
the kernel based mutex behavior for long waits), but it is more CPU efficient 
than a loop around a few simple atomic operations.

 

Looking at the old implementation I’m thinking that as a first step we should 
switch the current/old implementation to using an APR mutex instead of the 
Windows mutex. On all NT platforms the apr mutex is implemented as a critical 
section, which is much faster than the inter-process capable Windows mutex.

(Handling this mutex is where the current implementation spends almost all its 
time)

 

Bert

 

 

From: Jeff Trawick [mailto:traw...@gmail.com] 
Sent: donderdag 3 juli 2014 21:57
To: APR Developer List
Subject: Windows R/W lock comment

 

I'm getting filtered when I try to respond to the original thread. 

 

This is an important issue:  If a writer holds the lock for a significant time 
(e.g., refreshing a table from a database), do all would-be readers spin-loop, 
or is it a hybrid spin-then-sleep solution?  We should know this for certain.

 

Busy-loop is a no-go, at least without some non-default build-time option (or 
in later releases, a _ex() form of the create API which allows the flavor to be 
selected).

 



RE: Windows R/W lock comment

2014-07-07 Thread Bert Huijben
I created a patch for the ‘use apr mutex instead of Windows mutex’ scenario 
with a surprising result in my initial measurements.

 

It looks like this implementation is actually a tiny bit faster than the 
Windows SRW lock on all cases in testlockperf.exe…

It could be that this is caused by trying to fit everything in a single pointer 
value (as Windows does for the SRWLock), or it could be noise in my test 
environment.

 

But at least it makes the apr rwlock fast enough for Subversion’s in memory 
cache, without really changing any of the api behavior of apr.

 

Note that I now have to use rwlock-readers in apr_thread_rwlock_unlock(), like 
in my original patch as the apr unlock doesn’t produce an usable errorcode. But 
as we can assume having at least one reader or writer lock when entering this 
function, this is safe to do.

 

Bert

 

 

From: Bert Huijben [mailto:b...@qqmail.nl] 
Sent: maandag 7 juli 2014 12:25
To: 'Jeff Trawick'; 'APR Developer List'
Subject: RE: Windows R/W lock comment

 

I think it is a hybrid loop, but the documentation is not really clear about 
it. It is not as hybrid as a critical section (which internally falls back to 
the kernel based mutex behavior for long waits), but it is more CPU efficient 
than a loop around a few simple atomic operations.

 

Looking at the old implementation I’m thinking that as a first step we should 
switch the current/old implementation to using an APR mutex instead of the 
Windows mutex. On all NT platforms the apr mutex is implemented as a critical 
section, which is much faster than the inter-process capable Windows mutex.

(Handling this mutex is where the current implementation spends almost all its 
time)

 

Bert

 

 

From: Jeff Trawick [mailto:traw...@gmail.com] 
Sent: donderdag 3 juli 2014 21:57
To: APR Developer List
Subject: Windows R/W lock comment

 

I'm getting filtered when I try to respond to the original thread. 

 

This is an important issue:  If a writer holds the lock for a significant time 
(e.g., refreshing a table from a database), do all would-be readers spin-loop, 
or is it a hybrid spin-then-sleep solution?  We should know this for certain.

 

Busy-loop is a no-go, at least without some non-default build-time option (or 
in later releases, a _ex() form of the create API which allows the flavor to be 
selected).

 



apr-rwlock-AprMutex.patch
Description: Binary data


Windows R/W lock comment

2014-07-03 Thread Jeff Trawick
I'm getting filtered when I try to respond to the original thread.

This is an important issue:  If a writer holds the lock for a significant
time (e.g., refreshing a table from a database), do all would-be readers
spin-loop, or is it a hybrid spin-then-sleep solution?  We should know this
for certain.

Busy-loop is a no-go, at least without some non-default build-time option
(or in later releases, a _ex() form of the create API which allows the
flavor to be selected).