Re: mod_authz_svn and caching

2017-01-15 Thread Stefan Fuhrmann

On 14.11.2016 10:34, Stefan Fuhrmann wrote:

On 11.11.2016 01:01, Branko Čibej wrote:

On 06.11.2016 13:47, Stefan Fuhrmann wrote:

On 05.11.2016 17:51, Hans van Kranenburg wrote:

I don't see noticable performance issues at all while reparsing the
little files every time right now (how big/complex is the average authz
file?)

I remember generated authz files > 10MB.  That translates
into 100s of ms parser overhead per e.g. GET request.


That's true with the current code, but the code on the authzperf branch
should be quite a bit faster. Still slower than SHA1 though (although,
for this kind of cache, MD5 is good enough and faster).

Sure. I just wanted to provide a data point because
Hans was wondering how big "big" actually is.

Just don't forget that mod_authz_svn can also pull authz files from a
Subversion repo, not just from disk.

Yup, I think I mentioned that in my initial reply b/c
we typically get the SHA1 for free in that case.


As of r1778923, Subversion /trunk always uses the latest authz config
for any HTTP request.

-- Stefan^2.


Re: mod_authz_svn and caching

2016-11-14 Thread Stefan Fuhrmann

On 11.11.2016 01:01, Branko Čibej wrote:

On 06.11.2016 13:47, Stefan Fuhrmann wrote:

On 05.11.2016 17:51, Hans van Kranenburg wrote:

I don't see noticable performance issues at all while reparsing the
little files every time right now (how big/complex is the average authz
file?)

I remember generated authz files > 10MB.  That translates
into 100s of ms parser overhead per e.g. GET request.


That's true with the current code, but the code on the authzperf branch
should be quite a bit faster. Still slower than SHA1 though (although,
for this kind of cache, MD5 is good enough and faster).

Sure. I just wanted to provide a data point because
Hans was wondering how big "big" actually is.

Just don't forget that mod_authz_svn can also pull authz files from a
Subversion repo, not just from disk.

Yup, I think I mentioned that in my initial reply b/c
we typically get the SHA1 for free in that case.

-- Stefan^2.


Re: mod_authz_svn and caching

2016-11-10 Thread Branko Čibej
On 06.11.2016 13:47, Stefan Fuhrmann wrote:
> On 05.11.2016 17:51, Hans van Kranenburg wrote:
>> I don't see noticable performance issues at all while reparsing the
>> little files every time right now (how big/complex is the average authz
>> file?)
>
> I remember generated authz files > 10MB.  That translates
> into 100s of ms parser overhead per e.g. GET request.


That's true with the current code, but the code on the authzperf branch
should be quite a bit faster. Still slower than SHA1 though (although,
for this kind of cache, MD5 is good enough and faster).

Just don't forget that mod_authz_svn can also pull authz files from a
Subversion repo, not just from disk.


-- Brane


Re: mod_authz_svn and caching

2016-11-06 Thread Stefan Fuhrmann

On 05.11.2016 17:51, Hans van Kranenburg wrote:

On 11/05/2016 05:15 PM, Daniel Shahaf wrote:

Hans van Kranenburg wrote on Sat, Nov 05, 2016 at 16:39:44 +0100:

On 11/05/2016 11:44 AM, Stefan Fuhrmann wrote:

Looking at your workaround, I think the cleanest solution would be to
introduce an AuthzSVNDisableCache option.


I'm not sure a knob to _disable_ caching is the right approach.  So long
as using the cache trades off correctness for performance, shouldn't the
cache be off by default, with a knob to enable it?  I.e., shouldn't the
authz code be conservative and default to prefering correctness over
performance?


Ack.


The opt-out approach would be the one that does not break
existing known working setups.  Otherwise, opt-in might be
the better choice.


I also guess that the cache invalidation change will have to lead
to deprecation/removal of the option again shortly thereafter, causing
extra cruft in the code.


Sure.  OTOH, in other parts of Subversion, being able to
disable certain caches has come in handy in the past for a
variety of - often unforeseen - reasons.  Thus, the option
would not become entirely pointless.


I don't see noticable performance issues at all while reparsing the
little files every time right now (how big/complex is the average authz
file?)


I remember generated authz files > 10MB.  That translates
into 100s of ms parser overhead per e.g. GET request.


on large operations, consisting of many http requests to do
checkouts/commits etc. Especially not compared to using htgroup, which
feels like it gets into some O(2^n) horror when you put a user into too
many groups.


Well, I think the current authz group resolution can also
be expensive if you carelessly nest groups.


As to changing the value of AuthzSVNDisableCache: the test framework
doesn't let you do that; it presumes an httpd is already running.  But
the test suite could launch the server with cache enabled, and provide
a harness option to run with cache disabled, similar to the handling of
the 'short-circuit' option.

(I assume the parsed file's cache lifetime is per-connection in svnserve too)


No, svnserve has a global cache for authz.  Correctly
controlling object/pool lifetime had been a major issue.

-- Stefan^2.


Re: mod_authz_svn and caching

2016-11-05 Thread Daniel Shahaf
Hans van Kranenburg wrote on Sat, Nov 05, 2016 at 17:51:13 +0100:
> On 11/05/2016 05:15 PM, Daniel Shahaf wrote:
> > Hans van Kranenburg wrote on Sat, Nov 05, 2016 at 16:39:44 +0100:
> >> On 11/05/2016 11:44 AM, Stefan Fuhrmann wrote:
> >> I also guess that ideally the tests need to be updated with cases which
> >> show that authz file changes while doing requests are not seen when
> >> caching is enabled and doing multiple requests uwing HTTP/1.1 keepalive
> >> on a single connection, while they *are* seen when AuthzSVNDisableCache
> >> is enabled. From what I see so far, this is not a trivial undertaking.
> > 
> > The catch is, how to write code that commits twice to a single
> > repository over a single connection.  I don't think the command-line
> > client lets you do that.  Given that we want to test both svnserve and
> > mod_dav_svn, using httplib directly would be suboptimal.  I think that
> > leaves (a) write a C test, (b) write a C helper tool for doing multiple
> > commits over a single connection and write a Python test using that.
> > (Example: atomic-ra-revprop-change.)
> 
> Let httpd proxy to itself using keepalive on the backend and do multiple
> actions? Gheh, but, very tricky, since you don't control the behaviour.

It'd probably be easier to perform a commit 'by hand' using httplib from
the Python tests, but that would test just the mod_dav_svn caching, not
the svnserve caching.

I agree with the remainder (snipped).

Cheers,

Daniel


Re: mod_authz_svn and caching

2016-11-05 Thread Hans van Kranenburg
On 11/05/2016 05:15 PM, Daniel Shahaf wrote:
> Hans van Kranenburg wrote on Sat, Nov 05, 2016 at 16:39:44 +0100:
>> On 11/05/2016 11:44 AM, Stefan Fuhrmann wrote:
>>> Looking at your workaround, I think the cleanest solution would be to
>>> introduce an AuthzSVNDisableCache option.
> 
> I'm not sure a knob to _disable_ caching is the right approach.  So long
> as using the cache trades off correctness for performance, shouldn't the
> cache be off by default, with a knob to enable it?  I.e., shouldn't the
> authz code be conservative and default to prefering correctness over
> performance?

Ack.

> (However, this point may be moot; see below.)
> 
>>> On 05.11.2016 00:19, Hans van Kranenburg wrote:
 I can give it a shot to prepare a patch to do this, if wanted.
>>>
>>> Patches are always welcome.  I think adding AuthzSVNDisableCache
>>> should be easy enough for "giving it a shot".
>>
>> I guess I have to target the development (to be 1.10) code for the
>> patch?
> 
> You would, yes.  We don't add new features in patch releases (1.x.y
> with y>0).
> 
>> I also guess that the cache invalidation change will have to lead
>> to deprecation/removal of the option again shortly thereafter, causing
>> extra cruft in the code.
> 
> That is a valid point: if cache invalidation is performant, we might not
> need a knob to disable the cache.

Did you mean "we might not need a knob to *en*able the cache"?

If "calculating the SHA1 is ~10x as fast as actually parsing the file",
then performance of disabling cache is always worse. (Ok, except in the
case of a modified file, where it adds 10% overhead. But, it's safe to
assume that the relative change rate is very low.)

Anyway, having a method that provides correctness with a 10% performance
penalty (and only on this little step), is imho prefered to having an
extra options that a user needs to have an opinion about.

I don't see noticable performance issues at all while reparsing the
little files every time right now (how big/complex is the average authz
file?) on large operations, consisting of many http requests to do
checkouts/commits etc. Especially not compared to using htgroup, which
feels like it gets into some O(2^n) horror when you put a user into too
many groups.

>> So, does it make sense at all? I'm using Debian on the servers, with 1.8
>> now, and 1.9 in the next stable release, so I'll have to roll my own
>> packages anyway, for which the quick patch is good enough.
> 
> (For the list archives: the 'quick patch' updates get_access_conf() to
> remove the CACHE_KEY computation and always take the «access_conf ==
> NULL» codepath.)
> 
>> I also guess that ideally the tests need to be updated with cases which
>> show that authz file changes while doing requests are not seen when
>> caching is enabled and doing multiple requests uwing HTTP/1.1 keepalive
>> on a single connection, while they *are* seen when AuthzSVNDisableCache
>> is enabled. From what I see so far, this is not a trivial undertaking.
> 
> The catch is, how to write code that commits twice to a single
> repository over a single connection.  I don't think the command-line
> client lets you do that.  Given that we want to test both svnserve and
> mod_dav_svn, using httplib directly would be suboptimal.  I think that
> leaves (a) write a C test, (b) write a C helper tool for doing multiple
> commits over a single connection and write a Python test using that.
> (Example: atomic-ra-revprop-change.)

Let httpd proxy to itself using keepalive on the backend and do multiple
actions? Gheh, but, very tricky, since you don't control the behaviour.

But, all of this is a bit ridiculous, since the complexity of the test
will vastly overshadow the complexity of the feature being tested, which
means there's a bigger chance of having bugs in the test code than
having bugs in the feature itself.

> As to changing the value of AuthzSVNDisableCache: the test framework
> doesn't let you do that; it presumes an httpd is already running.  But
> the test suite could launch the server with cache enabled, and provide
> a harness option to run with cache disabled, similar to the handling of
> the 'short-circuit' option.
> 
> (I assume the parsed file's cache lifetime is per-connection in svnserve too)

So, my general feeling is... Let's forget about this, and +1 for the
sha1 cache invalidate way. (hm, complexity of the test still stands, if
you would want to test the cache invalidation behaviour...)

-- 
Hans van Kranenburg


Re: mod_authz_svn and caching

2016-11-05 Thread Daniel Shahaf
Hans van Kranenburg wrote on Sat, Nov 05, 2016 at 16:39:44 +0100:
> On 11/05/2016 11:44 AM, Stefan Fuhrmann wrote:
> > Looking at your workaround, I think the cleanest solution would be to
> > introduce an AuthzSVNDisableCache option.

I'm not sure a knob to _disable_ caching is the right approach.  So long
as using the cache trades off correctness for performance, shouldn't the
cache be off by default, with a knob to enable it?  I.e., shouldn't the
authz code be conservative and default to prefering correctness over
performance?

(However, this point may be moot; see below.)

> > On 05.11.2016 00:19, Hans van Kranenburg wrote:
> >> I can give it a shot to prepare a patch to do this, if wanted.
> > 
> > Patches are always welcome.  I think adding AuthzSVNDisableCache
> > should be easy enough for "giving it a shot".
> 
> I guess I have to target the development (to be 1.10) code for the
> patch?

You would, yes.  We don't add new features in patch releases (1.x.y
with y>0).

> I also guess that the cache invalidation change will have to lead
> to deprecation/removal of the option again shortly thereafter, causing
> extra cruft in the code.

That is a valid point: if cache invalidation is performant, we might not
need a knob to disable the cache.

> So, does it make sense at all? I'm using Debian on the servers, with 1.8
> now, and 1.9 in the next stable release, so I'll have to roll my own
> packages anyway, for which the quick patch is good enough.

(For the list archives: the 'quick patch' updates get_access_conf() to
remove the CACHE_KEY computation and always take the «access_conf ==
NULL» codepath.)

> I also guess that ideally the tests need to be updated with cases which
> show that authz file changes while doing requests are not seen when
> caching is enabled and doing multiple requests uwing HTTP/1.1 keepalive
> on a single connection, while they *are* seen when AuthzSVNDisableCache
> is enabled. From what I see so far, this is not a trivial undertaking.

The catch is, how to write code that commits twice to a single
repository over a single connection.  I don't think the command-line
client lets you do that.  Given that we want to test both svnserve and
mod_dav_svn, using httplib directly would be suboptimal.  I think that
leaves (a) write a C test, (b) write a C helper tool for doing multiple
commits over a single connection and write a Python test using that.
(Example: atomic-ra-revprop-change.)

As to changing the value of AuthzSVNDisableCache: the test framework
doesn't let you do that; it presumes an httpd is already running.  But
the test suite could launch the server with cache enabled, and provide
a harness option to run with cache disabled, similar to the handling of
the 'short-circuit' option.

(I assume the parsed file's cache lifetime is per-connection in svnserve too)

Cheers,

Daniel


Re: mod_authz_svn and caching

2016-11-05 Thread Hans van Kranenburg
Hi Stefan,

Thanks for the quick reply.

On 11/05/2016 11:44 AM, Stefan Fuhrmann wrote:
> 
> Thanks for the issue report!  There seem to be two independent
> approaches that we could take to address the problem -- see below.
> 
> On 05.11.2016 00:19, Hans van Kranenburg wrote:
>> (please keep me in Cc: since I'm not on the list (yet))
>>
>> [...]
>>
>> Currently, I'm running a modified mod_authz_svn that simply disables
>> caching (not causing any noticable performance difference):
>>
>> http://paste.debian.net/plainh/0b2f4e20
>>
>> I understand that it's possible to create really complex authorization
>> structures inside a file for a single repository, so the caching of the
>> result of parsing the file might be useful in some cases.
> 
> Looking at your workaround, I think the cleanest solution would be to
> introduce an AuthzSVNDisableCache option.  Just grep in the same source
> file for AuthzSVNNoAuthWhenAnonymousAllowed and no_auth_when_anon_ok as
> an example for a boolean option.  Use the new flag in get_access_conf()
> to optionally skip the cache access.

Ok, sounds good. I'm mainly a python programmer, and my C is currently
on the level of 'can read and understand, can modify and copy/paste
program', so that matches. :)

>> My suggestion about improving this situation would be to simply save the
>> modification time of the authz file with the cached information, and
>> check the mtime again on every request, invalidating the cache whenever
>> we notice that the file has changed.
> 
> svnserve already has code for that.  The idea is to use content SHA1
> instead of timestamps.  The latter might be racy / have to low a
> resolution during the repo creation phase where an authz file could
> be created first and then filled shortly thereafter.
> 
> SHA1, on the other hand, are immediately available for in-repository
> authz and even for external files, calculating the SHA1 is ~10x as
> fast as actually parsing the file.

Aha, so read the file anyway, and if the sha differs, continue do the
parsing step on the already read text.

I see the problem with the race condition with actual changing file
contents and mtime update.

I think changing the files in place is a bad idea in any case. For
changing the files, writing a new updated authz file first, and then
os.rename() it over the old one, should avoid these kind of problems.
But, as you mention, there can be more cases.

> The plan is to introduce a new authz implementation in, hopefully, 1.10
> and making the cache invalidation available everywhere is part of that.

Yes, if this way of working is used in multiple places, then it's a very
good reason to not do something else here.

>> I can give it a shot to prepare a patch to do this, if wanted.
> 
> Patches are always welcome.  I think adding AuthzSVNDisableCache
> should be easy enough for "giving it a shot".

I guess I have to target the development (to be 1.10) code for the
patch? I also guess that the cache invalidation change will have to lead
to deprecation/removal of the option again shortly thereafter, causing
extra cruft in the code.

So, does it make sense at all? I'm using Debian on the servers, with 1.8
now, and 1.9 in the next stable release, so I'll have to roll my own
packages anyway, for which the quick patch is good enough.

I also guess that ideally the tests need to be updated with cases which
show that authz file changes while doing requests are not seen when
caching is enabled and doing multiple requests uwing HTTP/1.1 keepalive
on a single connection, while they *are* seen when AuthzSVNDisableCache
is enabled. From what I see so far, this is not a trivial undertaking.

Thanks,
-- 
Hans van Kranenburg


Re: mod_authz_svn and caching

2016-11-05 Thread Stefan Fuhrmann

Hi Hans,

Thanks for the issue report!  There seem to be two independent
approaches that we could take to address the problem -- see below.

On 05.11.2016 00:19, Hans van Kranenburg wrote:

(please keep me in Cc: since I'm not on the list (yet))

Hi,

We recently switched from using a htgroup file with apache2 to using the
/conf/authz file with AuthzSVNReposRelativeAccessFile, for
performance reasons.

The authz files are really simple, like this:

[/]
f...@example.com = rw
b...@example.com = rw

>

The performance benefit is ludicrous, but we ran into an issue:

Since we have nginx in front of the apache2+svn box, there's only a
small amount of connections that is opened between nginx and apache2.
Those connections carry requests from different end users, and live
relatively long.

Inside mode_authz_svn the per-repo authz files are cached using a
per-connection cache.


Up to here, everything should "just" work.


Since we dynamically create and remove repositories, and dynamically add
and remove users, users that are added to a repository are not able to
use the repository immediately. Also, our automated create repo flow
which first creates the repo, checks if it can be accessed correctly and
then adds the user to the authorization list triggers this problem in
nearly 100% of the cases.


Yes, the dynamic aspect is an issue.


Currently, I'm running a modified mod_authz_svn that simply disables
caching (not causing any noticable performance difference):

http://paste.debian.net/plainh/0b2f4e20

I understand that it's possible to create really complex authorization
structures inside a file for a single repository, so the caching of the
result of parsing the file might be useful in some cases.


Looking at your workaround, I think the cleanest solution would be to
introduce an AuthzSVNDisableCache option.  Just grep in the same source
file for AuthzSVNNoAuthWhenAnonymousAllowed and no_auth_when_anon_ok as
an example for a boolean option.  Use the new flag in get_access_conf()
to optionally skip the cache access.


My suggestion about improving this situation would be to simply save the
modification time of the authz file with the cached information, and
check the mtime again on every request, invalidating the cache whenever
we notice that the file has changed.


svnserve already has code for that.  The idea is to use content SHA1
instead of timestamps.  The latter might be racy / have to low a
resolution during the repo creation phase where an authz file could
be created first and then filled shortly thereafter.

SHA1, on the other hand, are immediately available for in-repository
authz and even for external files, calculating the SHA1 is ~10x as
fast as actually parsing the file.

The plan is to introduce a new authz implementation in, hopefully, 1.10
and making the cache invalidation available everywhere is part of that.


I can give it a shot to prepare a patch to do this, if wanted.


Patches are always welcome.  I think adding AuthzSVNDisableCache
should be easy enough for "giving it a shot".

-- Stefan^2.