Re: mod_authz_svn and caching
On 14.11.2016 10:34, Stefan Fuhrmann wrote: On 11.11.2016 01:01, Branko Čibej wrote: On 06.11.2016 13:47, Stefan Fuhrmann wrote: On 05.11.2016 17:51, Hans van Kranenburg wrote: I don't see noticable performance issues at all while reparsing the little files every time right now (how big/complex is the average authz file?) I remember generated authz files > 10MB. That translates into 100s of ms parser overhead per e.g. GET request. That's true with the current code, but the code on the authzperf branch should be quite a bit faster. Still slower than SHA1 though (although, for this kind of cache, MD5 is good enough and faster). Sure. I just wanted to provide a data point because Hans was wondering how big "big" actually is. Just don't forget that mod_authz_svn can also pull authz files from a Subversion repo, not just from disk. Yup, I think I mentioned that in my initial reply b/c we typically get the SHA1 for free in that case. As of r1778923, Subversion /trunk always uses the latest authz config for any HTTP request. -- Stefan^2.
Re: mod_authz_svn and caching
On 11.11.2016 01:01, Branko Čibej wrote: On 06.11.2016 13:47, Stefan Fuhrmann wrote: On 05.11.2016 17:51, Hans van Kranenburg wrote: I don't see noticable performance issues at all while reparsing the little files every time right now (how big/complex is the average authz file?) I remember generated authz files > 10MB. That translates into 100s of ms parser overhead per e.g. GET request. That's true with the current code, but the code on the authzperf branch should be quite a bit faster. Still slower than SHA1 though (although, for this kind of cache, MD5 is good enough and faster). Sure. I just wanted to provide a data point because Hans was wondering how big "big" actually is. Just don't forget that mod_authz_svn can also pull authz files from a Subversion repo, not just from disk. Yup, I think I mentioned that in my initial reply b/c we typically get the SHA1 for free in that case. -- Stefan^2.
Re: mod_authz_svn and caching
On 06.11.2016 13:47, Stefan Fuhrmann wrote: > On 05.11.2016 17:51, Hans van Kranenburg wrote: >> I don't see noticable performance issues at all while reparsing the >> little files every time right now (how big/complex is the average authz >> file?) > > I remember generated authz files > 10MB. That translates > into 100s of ms parser overhead per e.g. GET request. That's true with the current code, but the code on the authzperf branch should be quite a bit faster. Still slower than SHA1 though (although, for this kind of cache, MD5 is good enough and faster). Just don't forget that mod_authz_svn can also pull authz files from a Subversion repo, not just from disk. -- Brane
Re: mod_authz_svn and caching
On 05.11.2016 17:51, Hans van Kranenburg wrote: On 11/05/2016 05:15 PM, Daniel Shahaf wrote: Hans van Kranenburg wrote on Sat, Nov 05, 2016 at 16:39:44 +0100: On 11/05/2016 11:44 AM, Stefan Fuhrmann wrote: Looking at your workaround, I think the cleanest solution would be to introduce an AuthzSVNDisableCache option. I'm not sure a knob to _disable_ caching is the right approach. So long as using the cache trades off correctness for performance, shouldn't the cache be off by default, with a knob to enable it? I.e., shouldn't the authz code be conservative and default to prefering correctness over performance? Ack. The opt-out approach would be the one that does not break existing known working setups. Otherwise, opt-in might be the better choice. I also guess that the cache invalidation change will have to lead to deprecation/removal of the option again shortly thereafter, causing extra cruft in the code. Sure. OTOH, in other parts of Subversion, being able to disable certain caches has come in handy in the past for a variety of - often unforeseen - reasons. Thus, the option would not become entirely pointless. I don't see noticable performance issues at all while reparsing the little files every time right now (how big/complex is the average authz file?) I remember generated authz files > 10MB. That translates into 100s of ms parser overhead per e.g. GET request. on large operations, consisting of many http requests to do checkouts/commits etc. Especially not compared to using htgroup, which feels like it gets into some O(2^n) horror when you put a user into too many groups. Well, I think the current authz group resolution can also be expensive if you carelessly nest groups. As to changing the value of AuthzSVNDisableCache: the test framework doesn't let you do that; it presumes an httpd is already running. But the test suite could launch the server with cache enabled, and provide a harness option to run with cache disabled, similar to the handling of the 'short-circuit' option. (I assume the parsed file's cache lifetime is per-connection in svnserve too) No, svnserve has a global cache for authz. Correctly controlling object/pool lifetime had been a major issue. -- Stefan^2.
Re: mod_authz_svn and caching
Hans van Kranenburg wrote on Sat, Nov 05, 2016 at 17:51:13 +0100: > On 11/05/2016 05:15 PM, Daniel Shahaf wrote: > > Hans van Kranenburg wrote on Sat, Nov 05, 2016 at 16:39:44 +0100: > >> On 11/05/2016 11:44 AM, Stefan Fuhrmann wrote: > >> I also guess that ideally the tests need to be updated with cases which > >> show that authz file changes while doing requests are not seen when > >> caching is enabled and doing multiple requests uwing HTTP/1.1 keepalive > >> on a single connection, while they *are* seen when AuthzSVNDisableCache > >> is enabled. From what I see so far, this is not a trivial undertaking. > > > > The catch is, how to write code that commits twice to a single > > repository over a single connection. I don't think the command-line > > client lets you do that. Given that we want to test both svnserve and > > mod_dav_svn, using httplib directly would be suboptimal. I think that > > leaves (a) write a C test, (b) write a C helper tool for doing multiple > > commits over a single connection and write a Python test using that. > > (Example: atomic-ra-revprop-change.) > > Let httpd proxy to itself using keepalive on the backend and do multiple > actions? Gheh, but, very tricky, since you don't control the behaviour. It'd probably be easier to perform a commit 'by hand' using httplib from the Python tests, but that would test just the mod_dav_svn caching, not the svnserve caching. I agree with the remainder (snipped). Cheers, Daniel
Re: mod_authz_svn and caching
On 11/05/2016 05:15 PM, Daniel Shahaf wrote: > Hans van Kranenburg wrote on Sat, Nov 05, 2016 at 16:39:44 +0100: >> On 11/05/2016 11:44 AM, Stefan Fuhrmann wrote: >>> Looking at your workaround, I think the cleanest solution would be to >>> introduce an AuthzSVNDisableCache option. > > I'm not sure a knob to _disable_ caching is the right approach. So long > as using the cache trades off correctness for performance, shouldn't the > cache be off by default, with a knob to enable it? I.e., shouldn't the > authz code be conservative and default to prefering correctness over > performance? Ack. > (However, this point may be moot; see below.) > >>> On 05.11.2016 00:19, Hans van Kranenburg wrote: I can give it a shot to prepare a patch to do this, if wanted. >>> >>> Patches are always welcome. I think adding AuthzSVNDisableCache >>> should be easy enough for "giving it a shot". >> >> I guess I have to target the development (to be 1.10) code for the >> patch? > > You would, yes. We don't add new features in patch releases (1.x.y > with y>0). > >> I also guess that the cache invalidation change will have to lead >> to deprecation/removal of the option again shortly thereafter, causing >> extra cruft in the code. > > That is a valid point: if cache invalidation is performant, we might not > need a knob to disable the cache. Did you mean "we might not need a knob to *en*able the cache"? If "calculating the SHA1 is ~10x as fast as actually parsing the file", then performance of disabling cache is always worse. (Ok, except in the case of a modified file, where it adds 10% overhead. But, it's safe to assume that the relative change rate is very low.) Anyway, having a method that provides correctness with a 10% performance penalty (and only on this little step), is imho prefered to having an extra options that a user needs to have an opinion about. I don't see noticable performance issues at all while reparsing the little files every time right now (how big/complex is the average authz file?) on large operations, consisting of many http requests to do checkouts/commits etc. Especially not compared to using htgroup, which feels like it gets into some O(2^n) horror when you put a user into too many groups. >> So, does it make sense at all? I'm using Debian on the servers, with 1.8 >> now, and 1.9 in the next stable release, so I'll have to roll my own >> packages anyway, for which the quick patch is good enough. > > (For the list archives: the 'quick patch' updates get_access_conf() to > remove the CACHE_KEY computation and always take the «access_conf == > NULL» codepath.) > >> I also guess that ideally the tests need to be updated with cases which >> show that authz file changes while doing requests are not seen when >> caching is enabled and doing multiple requests uwing HTTP/1.1 keepalive >> on a single connection, while they *are* seen when AuthzSVNDisableCache >> is enabled. From what I see so far, this is not a trivial undertaking. > > The catch is, how to write code that commits twice to a single > repository over a single connection. I don't think the command-line > client lets you do that. Given that we want to test both svnserve and > mod_dav_svn, using httplib directly would be suboptimal. I think that > leaves (a) write a C test, (b) write a C helper tool for doing multiple > commits over a single connection and write a Python test using that. > (Example: atomic-ra-revprop-change.) Let httpd proxy to itself using keepalive on the backend and do multiple actions? Gheh, but, very tricky, since you don't control the behaviour. But, all of this is a bit ridiculous, since the complexity of the test will vastly overshadow the complexity of the feature being tested, which means there's a bigger chance of having bugs in the test code than having bugs in the feature itself. > As to changing the value of AuthzSVNDisableCache: the test framework > doesn't let you do that; it presumes an httpd is already running. But > the test suite could launch the server with cache enabled, and provide > a harness option to run with cache disabled, similar to the handling of > the 'short-circuit' option. > > (I assume the parsed file's cache lifetime is per-connection in svnserve too) So, my general feeling is... Let's forget about this, and +1 for the sha1 cache invalidate way. (hm, complexity of the test still stands, if you would want to test the cache invalidation behaviour...) -- Hans van Kranenburg
Re: mod_authz_svn and caching
Hans van Kranenburg wrote on Sat, Nov 05, 2016 at 16:39:44 +0100: > On 11/05/2016 11:44 AM, Stefan Fuhrmann wrote: > > Looking at your workaround, I think the cleanest solution would be to > > introduce an AuthzSVNDisableCache option. I'm not sure a knob to _disable_ caching is the right approach. So long as using the cache trades off correctness for performance, shouldn't the cache be off by default, with a knob to enable it? I.e., shouldn't the authz code be conservative and default to prefering correctness over performance? (However, this point may be moot; see below.) > > On 05.11.2016 00:19, Hans van Kranenburg wrote: > >> I can give it a shot to prepare a patch to do this, if wanted. > > > > Patches are always welcome. I think adding AuthzSVNDisableCache > > should be easy enough for "giving it a shot". > > I guess I have to target the development (to be 1.10) code for the > patch? You would, yes. We don't add new features in patch releases (1.x.y with y>0). > I also guess that the cache invalidation change will have to lead > to deprecation/removal of the option again shortly thereafter, causing > extra cruft in the code. That is a valid point: if cache invalidation is performant, we might not need a knob to disable the cache. > So, does it make sense at all? I'm using Debian on the servers, with 1.8 > now, and 1.9 in the next stable release, so I'll have to roll my own > packages anyway, for which the quick patch is good enough. (For the list archives: the 'quick patch' updates get_access_conf() to remove the CACHE_KEY computation and always take the «access_conf == NULL» codepath.) > I also guess that ideally the tests need to be updated with cases which > show that authz file changes while doing requests are not seen when > caching is enabled and doing multiple requests uwing HTTP/1.1 keepalive > on a single connection, while they *are* seen when AuthzSVNDisableCache > is enabled. From what I see so far, this is not a trivial undertaking. The catch is, how to write code that commits twice to a single repository over a single connection. I don't think the command-line client lets you do that. Given that we want to test both svnserve and mod_dav_svn, using httplib directly would be suboptimal. I think that leaves (a) write a C test, (b) write a C helper tool for doing multiple commits over a single connection and write a Python test using that. (Example: atomic-ra-revprop-change.) As to changing the value of AuthzSVNDisableCache: the test framework doesn't let you do that; it presumes an httpd is already running. But the test suite could launch the server with cache enabled, and provide a harness option to run with cache disabled, similar to the handling of the 'short-circuit' option. (I assume the parsed file's cache lifetime is per-connection in svnserve too) Cheers, Daniel
Re: mod_authz_svn and caching
Hi Stefan, Thanks for the quick reply. On 11/05/2016 11:44 AM, Stefan Fuhrmann wrote: > > Thanks for the issue report! There seem to be two independent > approaches that we could take to address the problem -- see below. > > On 05.11.2016 00:19, Hans van Kranenburg wrote: >> (please keep me in Cc: since I'm not on the list (yet)) >> >> [...] >> >> Currently, I'm running a modified mod_authz_svn that simply disables >> caching (not causing any noticable performance difference): >> >> http://paste.debian.net/plainh/0b2f4e20 >> >> I understand that it's possible to create really complex authorization >> structures inside a file for a single repository, so the caching of the >> result of parsing the file might be useful in some cases. > > Looking at your workaround, I think the cleanest solution would be to > introduce an AuthzSVNDisableCache option. Just grep in the same source > file for AuthzSVNNoAuthWhenAnonymousAllowed and no_auth_when_anon_ok as > an example for a boolean option. Use the new flag in get_access_conf() > to optionally skip the cache access. Ok, sounds good. I'm mainly a python programmer, and my C is currently on the level of 'can read and understand, can modify and copy/paste program', so that matches. :) >> My suggestion about improving this situation would be to simply save the >> modification time of the authz file with the cached information, and >> check the mtime again on every request, invalidating the cache whenever >> we notice that the file has changed. > > svnserve already has code for that. The idea is to use content SHA1 > instead of timestamps. The latter might be racy / have to low a > resolution during the repo creation phase where an authz file could > be created first and then filled shortly thereafter. > > SHA1, on the other hand, are immediately available for in-repository > authz and even for external files, calculating the SHA1 is ~10x as > fast as actually parsing the file. Aha, so read the file anyway, and if the sha differs, continue do the parsing step on the already read text. I see the problem with the race condition with actual changing file contents and mtime update. I think changing the files in place is a bad idea in any case. For changing the files, writing a new updated authz file first, and then os.rename() it over the old one, should avoid these kind of problems. But, as you mention, there can be more cases. > The plan is to introduce a new authz implementation in, hopefully, 1.10 > and making the cache invalidation available everywhere is part of that. Yes, if this way of working is used in multiple places, then it's a very good reason to not do something else here. >> I can give it a shot to prepare a patch to do this, if wanted. > > Patches are always welcome. I think adding AuthzSVNDisableCache > should be easy enough for "giving it a shot". I guess I have to target the development (to be 1.10) code for the patch? I also guess that the cache invalidation change will have to lead to deprecation/removal of the option again shortly thereafter, causing extra cruft in the code. So, does it make sense at all? I'm using Debian on the servers, with 1.8 now, and 1.9 in the next stable release, so I'll have to roll my own packages anyway, for which the quick patch is good enough. I also guess that ideally the tests need to be updated with cases which show that authz file changes while doing requests are not seen when caching is enabled and doing multiple requests uwing HTTP/1.1 keepalive on a single connection, while they *are* seen when AuthzSVNDisableCache is enabled. From what I see so far, this is not a trivial undertaking. Thanks, -- Hans van Kranenburg
Re: mod_authz_svn and caching
Hi Hans, Thanks for the issue report! There seem to be two independent approaches that we could take to address the problem -- see below. On 05.11.2016 00:19, Hans van Kranenburg wrote: (please keep me in Cc: since I'm not on the list (yet)) Hi, We recently switched from using a htgroup file with apache2 to using the /conf/authz file with AuthzSVNReposRelativeAccessFile, for performance reasons. The authz files are really simple, like this: [/] f...@example.com = rw b...@example.com = rw > The performance benefit is ludicrous, but we ran into an issue: Since we have nginx in front of the apache2+svn box, there's only a small amount of connections that is opened between nginx and apache2. Those connections carry requests from different end users, and live relatively long. Inside mode_authz_svn the per-repo authz files are cached using a per-connection cache. Up to here, everything should "just" work. Since we dynamically create and remove repositories, and dynamically add and remove users, users that are added to a repository are not able to use the repository immediately. Also, our automated create repo flow which first creates the repo, checks if it can be accessed correctly and then adds the user to the authorization list triggers this problem in nearly 100% of the cases. Yes, the dynamic aspect is an issue. Currently, I'm running a modified mod_authz_svn that simply disables caching (not causing any noticable performance difference): http://paste.debian.net/plainh/0b2f4e20 I understand that it's possible to create really complex authorization structures inside a file for a single repository, so the caching of the result of parsing the file might be useful in some cases. Looking at your workaround, I think the cleanest solution would be to introduce an AuthzSVNDisableCache option. Just grep in the same source file for AuthzSVNNoAuthWhenAnonymousAllowed and no_auth_when_anon_ok as an example for a boolean option. Use the new flag in get_access_conf() to optionally skip the cache access. My suggestion about improving this situation would be to simply save the modification time of the authz file with the cached information, and check the mtime again on every request, invalidating the cache whenever we notice that the file has changed. svnserve already has code for that. The idea is to use content SHA1 instead of timestamps. The latter might be racy / have to low a resolution during the repo creation phase where an authz file could be created first and then filled shortly thereafter. SHA1, on the other hand, are immediately available for in-repository authz and even for external files, calculating the SHA1 is ~10x as fast as actually parsing the file. The plan is to introduce a new authz implementation in, hopefully, 1.10 and making the cache invalidation available everywhere is part of that. I can give it a shot to prepare a patch to do this, if wanted. Patches are always welcome. I think adding AuthzSVNDisableCache should be easy enough for "giving it a shot". -- Stefan^2.