On 10/23/2010 10:48 AM, Graham Leggett wrote:
> Hi all,
>
> I am looking to improve per-request performance when you have an environment with many
> (thousands of) Location sections, and looking at ap_location_walk(), it looks like it is
> currently doing a simple linear search of locations on every request, which is far from
> ideal.
>
> Ideally, I'd like to create an index of the location sec_dir and sec_url array, by
> creating a tree of hashtables that will allow us to drill down to the relevant locations
> as quick as possible. This will also allow us to pre-merge many of the configurations at
> startup, instead of just at runtime (obviously because .htaccess is parsed at runtime we
> won't completely eliminate merging, but we will minimise it).
>
> If we're careful about what pools we use on startup, we can also free a lot of the memory
> used in the initial configuration scan, keeping a copy of the merged config, instead of a
> copy of each location's configs.

There are four factors here: 1) location matching, 2) merge caching, 3) premerging, and 4) path lookups. You've conflated these in a way that really concerns me.

The requisite changes and improvements to merge caching, and the introduction of pre-merging, have me very concerned because of the current state of other recently 'improved' code, including the cache, proxy and even ldap abstractions. I am not hopeful for a usable API to all of this, although it is very much needed and an improved design would be most welcome. I'd be very happy to have my concerns alleviated in a proof-of-concept sandbox.

To the four specific issues:

1) I concur that <Location > should follow the hierarchical structure of http resources. /app matching /application was always wrong. <LocationMatch > can clearly preserve this behavior when it is desired. So Location should pick up the precise mechanics of Directory matching. Sortation, too, can be very misleading in a world of Include'd sections; <LocationMatch > again can preserve the arbitrary merging behavior when desired.

2) The merge caching API is absolutely fair game for improvement, but only to the degree that it is an improvement. Recent APIs don't seem to demonstrate effective functional design and segregation. Looking forward to a solid design proposal before this moves forward.

3) Predicated on the above, premerging is a lovely idea in theory and non-trivial in practice. An effective design would facilitate premerges of groups (say, for example, <Directory />, <Directory /foo> and <Directory /foo/bar>) but back off as intervening merges occur; for example, when some other <If > section is triggered within the <Directory /foo> scope. The gains might be less than you are expecting, but even combining the current config with the merged pair of <Directory /> and <Directory /foo> could save cycles, if the search logic were sufficiently optimal.

4) The hashtable is likely to be less efficient than a tree structure.
The number of hash constructions alone will be suboptimal; take a look at the hashes applied in socache, which completely fall over on ASCII input.

I am very concerned that sufficient review occurs for such a massive reorganization; it took several years before the last thorough pass at optimization was fully inspected and its small flaws were resolved. As Nick points out, this aspect of the design is critical to httpd's operation. Consider this a pre -1 until enough eyes have asserted that they have reviewed such a sandbox and declared it an improvement.

Biting off all four at once will probably be a good way to attract insufficient review of the changes. So one approach might be:

1) refactor <Directory > to demonstrate that trees or hashes are more efficient than the current segments lookup.

2) improve the optimization API to make more efficient use of premerges.

3) refactor <Location > to approximate <Directory >, perhaps abstracting both into common and reusable code.

4) demonstrate that post-config premerges offer performance improvements without incurring significant startup penalties.

That's only one way to dice up the problem, but I strongly suggest you dice it up in the manner that makes the most sense to you. In any case, it isn't suitable for presentation as a mongo-patch applied at trunk :)

I remain concerned that recent APIs aren't really reusable, e.g. mod_disk_cache's familiarity with the protocol. I'm strongly pondering a -1 to shipping 2.4.0 with mod_cache.h presented as a public API (in any shared include/ tree). So I hope you understand my hesitation about encouraging wholesale refactorings without seeing some proof of usability.

That said, the config section logic needs love, and if someone is willing to offer that and demonstrate it in a sandbox, I will review it. In all likelihood, my vote will be that this won't belong in 2.4 or 3.0 (the 'next' release) but in a subsequent major release.
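
To make the /app versus /application point concrete, here is a minimal sketch of matching on path-segment boundaries rather than on a raw string prefix. It is an illustration only: `location_matches` is a hypothetical name, not an existing httpd function, and it ignores everything the real walk has to handle (regex sections, proxy requests, URI normalization).

```c
#include <string.h>

/* Hypothetical helper, not part of httpd: returns 1 when `uri` falls
 * under the configured location `loc` on a path-segment boundary,
 * and 0 otherwise. */
int location_matches(const char *loc, const char *uri)
{
    size_t len = strlen(loc);

    /* Ignore trailing slashes on the configured location, but keep
     * a bare "/" intact. */
    while (len > 1 && loc[len - 1] == '/')
        len--;

    /* The location must at least be a string prefix of the URI. */
    if (strncmp(loc, uri, len) != 0)
        return 0;

    /* The root location covers every path that starts with "/". */
    if (len == 1 && loc[0] == '/')
        return 1;

    /* Otherwise the prefix must end exactly where a segment ends. */
    return uri[len] == '\0' || uri[len] == '/';
}
```

With this rule, `location_matches("/app", "/application")` returns 0, while `location_matches("/app", "/app/index.html")` returns 1. The same boundary test is what would let a tree keyed on path segments replace the linear scan.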

Whenever this is refactored, recycling the "server dir" concept into just an ordinary "dir" (a module may have more than one config section, of course, so per-server variables are never merged per-dir), with one API, premerging for virtual servers just as with the proposed premerging for location and directory, would be a really valuable simplification.

Bill