On 10/23/2010 10:48 AM, Graham Leggett wrote:
> Hi all,
> 
> I am looking to improve per-request performance when you have an environment
> with many (thousands of) Location sections, and looking at ap_location_walk(),
> it looks like it is currently doing a simple linear search of locations on
> every request, which is far from ideal.
> 
> Ideally, I'd like to create an index of the location sec_dir and sec_url
> array, by creating a tree of hashtables that will allow us to drill down to
> the relevant locations as quick as possible. This will also allow us to
> pre-merge many of the configurations at startup, instead of just at runtime
> (obviously because .htaccess is parsed at runtime we won't completely
> eliminate merging, but we will minimise it).
> 
> If we're careful about what pools we use on startup, we can also free a lot
> of the memory used in the initial configuration scan, keeping a copy of the
> merged config, instead of a copy of each location's configs.

There are four factors here: 1) location matching, 2) merge caching,
3) premerging, and 4) path lookups.  You've conflated these in a way that
really concerns me.

The requisite changes and improvements to merge caching and the introduction of
pre-merging have me very concerned, because of the current state of other
recently 'improved' code, including cache, proxy and even ldap abstractions.
I am not hopeful for a usable API to all of this, although it is both very much
needed and an improved design would be most welcome.  I'd be very happy to have
my concerns alleviated in a proof-of-concept sandbox.

To the specific four issues:

 1) I concur that <Location > should follow the hierarchical structure of http
    resources.  /app matching /application was always wrong.  <LocationMatch >
    can clearly preserve this behavior when it is desired.  So Location should
    pick up the precise mechanics of Directory matching.  Sorting, too, can be
    very misleading in a world of Include'd sections; <LocationMatch > again
    can preserve the arbitrary merging behavior when desired.

 2) Merge caching API is absolutely fair game for improvement, but only to the
    degree that it is an improvement.  Recent APIs don't seem to demonstrate
    effective functional design and segregation.  Looking forward to a solid
    design proposal before this moves forward.

 3) Predicated on the above, premerging is a lovely idea in theory and
    non-trivial in practice.  An effective design would facilitate premerges of
    groups (say for example, <Directory />, <Directory /foo> and
    <Directory /foo/bar>) but back off as intervening merges occur; for
    example, when some other <If > section is triggered within the
    <Directory /foo> scope.  The gains might be less than you are expecting,
    but even combining the current config to the merged pair of <Directory />
    to <Directory /foo> could save cycles, if the search logic were
    sufficiently optimal.

 4) The hashtable is likely to be less efficient than a tree structure.  The
    number of hash constructions alone will be suboptimal; take a look at the
    hashes applied in socache, which completely fall over on ASCII input.
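
To make point 1 concrete, here is a minimal sketch of matching on path-segment
boundaries, so that /app no longer matches /application.  The function name
segment_prefix_match() is hypothetical and is not an existing httpd API; it
only illustrates the rule, not how the core would implement it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not an httpd API: returns 1 when `prefix` matches
 * `uri` on a path-segment boundary, 0 otherwise. */
static int segment_prefix_match(const char *prefix, const char *uri)
{
    size_t len;

    assert(prefix != NULL && uri != NULL);
    len = strlen(prefix);

    /* Ignore one trailing slash so "/app/" and "/app" behave alike. */
    if (len > 1 && prefix[len - 1] == '/')
        len--;

    /* The leading bytes must be identical. */
    if (strncmp(prefix, uri, len) != 0)
        return 0;

    /* The root prefix "/" covers every absolute path. */
    if (len == 1 && prefix[0] == '/')
        return 1;

    /* The match must end at the end of the URI or at a '/' separator. */
    return uri[len] == '\0' || uri[len] == '/';
}
```

With this rule, "/app" matches "/app" and "/app/index.html" but not
"/application"; the old prefix behavior would remain available through
<LocationMatch >.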

I am very concerned that sufficient review occurs for such a massive
reorganization; it took several years before the last thorough pass at
optimization was fully inspected and its small flaws were resolved.  As Nick
points out, this aspect of the design is critical to httpd's operation.
Consider this a pre -1 until enough eyes have asserted that they have reviewed
such a sandbox and declared it an improvement.

Biting off all four at once will probably be a good way to attract insufficient
review of the changes.  So one approach might be;

  1) refactor <Directory > to demonstrate that trees or hashes are more
     efficient than the current segments lookup.

  2) improve the optimization API to make more efficient use of premerges.

  3) refactor <Location > to approximate <Directory >, perhaps abstracting these
     both into common and reusable code.

  4) demonstrate post-config premerges offer performance improvements without
     incurring significant startup penalties.
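
As a starting point for step 1, a tree keyed on path segments might look like
the sketch below.  Everything here (seg_node, node_new, node_child,
tree_insert, tree_walk, and the integer section ids) is hypothetical and is
not taken from the httpd source.  It uses plain malloc rather than pools, and
children are kept in a simple linked list, so it shows the shape of the lookup
rather than a tuned implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type: one node per path segment. */
typedef struct seg_node {
    char *name;                /* segment text, e.g. "foo" */
    int section_id;            /* -1 when no section is configured here */
    struct seg_node *child;    /* first child */
    struct seg_node *sibling;  /* next sibling under the same parent */
} seg_node;

static seg_node *node_new(const char *name, size_t len)
{
    seg_node *n = calloc(1, sizeof(*n));
    assert(n != NULL);
    n->name = malloc(len + 1);
    assert(n->name != NULL);
    memcpy(n->name, name, len);
    n->name[len] = '\0';
    n->section_id = -1;
    return n;
}

/* Find the child named name[0..len); create it when `create` is nonzero. */
static seg_node *node_child(seg_node *parent, const char *name, size_t len,
                            int create)
{
    seg_node *c;
    for (c = parent->child; c != NULL; c = c->sibling) {
        if (strlen(c->name) == len && memcmp(c->name, name, len) == 0)
            return c;
    }
    if (!create)
        return NULL;
    c = node_new(name, len);
    c->sibling = parent->child;
    parent->child = c;
    return c;
}

/* Register a configured section for `path`, one node per segment. */
static void tree_insert(seg_node *root, const char *path, int section_id)
{
    seg_node *cur = root;
    while (*path) {
        size_t len;
        while (*path == '/')
            path++;
        len = strcspn(path, "/");
        if (len == 0)
            break;
        cur = node_child(cur, path, len, 1);
        path += len;
    }
    cur->section_id = section_id;
}

/* Walk `path` and collect the ids of every configured section on the way
 * down, outermost first.  Returns the number of ids stored. */
static size_t tree_walk(seg_node *root, const char *path, int *ids,
                        size_t max_ids)
{
    seg_node *cur = root;
    size_t n = 0;
    if (cur->section_id >= 0 && n < max_ids)
        ids[n++] = cur->section_id;
    while (*path) {
        size_t len;
        while (*path == '/')
            path++;
        len = strcspn(path, "/");
        if (len == 0)
            break;
        cur = node_child(cur, path, len, 0);
        if (cur == NULL)
            break;              /* nothing configured below this point */
        if (cur->section_id >= 0 && n < max_ids)
            ids[n++] = cur->section_id;
        path += len;
    }
    return n;
}
```

The walk costs one step per segment of the request path rather than one per
configured section, and the ordered list it returns is exactly the chain a
premerge would want to cache (for example <Directory />, <Directory /foo>,
<Directory /foo/bar>).  Whether that beats the current segments lookup is what
the sandbox would need to demonstrate.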

That's only one way to dice up the problem, but I strongly suggest you dice it
up in the manner that makes the most sense to you.  In any case, it isn't
suitable for presentation as a mongo-patch applied at trunk :)

I remain concerned that recent APIs aren't really reusable, e.g.
mod_disk_cache's familiarity with the protocol.  I'm strongly pondering a -1 to
shipping 2.4.0 with mod_cache.h presented as a public API (in any shared
include/ tree).  So I hope you understand my hesitation about encouraging
wholesale refactorings without seeing some proof of usability.  That said, the
config section logic needs love, and if someone is willing to offer that and
demonstrate it in a sandbox, I will review it.  In all likelihood, my vote will
be that this won't belong in 2.4 or 3.0 (the 'next' release) but in a
subsequent major release.

Whenever this is refactored, recycling the "server dir" concept into just an
ordinary "dir" (a module may have more than one config section, of course, so
per-server variables are never merged per-dir), with one API, premerging for
virtual servers just as with the proposed premerging for location and directory,
would be a really valuable simplification.

Bill
