On Tue, 10 Jul 2001, Brian Pane wrote: > 1. Add a 2nd create function for apr_hash_t that lets the caller > supply three callback functions: > - a hash function > - an equality-check function > - a concatenation function (if non-NULL, setting x=y followed by > x=y results in concat_fn(y,z) being stored in the table as the > value for x)
if you do concatenation this way you'll end up back in the land of O(n^2) DOS attacks which we fixed years ago... this is why the merging is done with those odd looking sort functions. (it's in the archives if you want to dig it all up, there were a lot of O(n^2) attacks :) see get_mime_headers and apr_table_overlap. i'm not sure it's appropriate to use generic interfaces with function pointer indirections when you're trying to make something go faster. if you want it go fast, then don't make it generic. make it specific. (aside on modern cpus: indirect function calls with multiple targets hurt... they hurt because the CPU has difficulty predicting where the branch/call target will be, so it has difficulty pre-fetching and decoding across the branch/call. this typically results in stalls in the pipeline -- the number of stalls depends on how many stages from the front of the pipeline the branch execute phase is. some CPUs do just fine if your indirect function call always has the same target -- but as soon as you add a second target, this predictor will break.) there's a lot of compiler theory which is appropriate here -- the HTTP header names themselves are very similar to keywords in a language. typically a tool such as gperf is used to map keywords to an enumerated type when writing (fast) parsers for languages. this would be a great thing for a webserver other than apache, apache can't use it directly because we're so general we allow people to modify the set of recognized/generated HTTP headers at run-time. i think ultimately what you want to be able to do is the following: - for all common HTTP headers we've got a compile-time perfect hash mapping keyword <-> header_enum - get_mime_headers uses that hash and stores the headers in a structure mapping header_enum -> header value - r->headers_{in,out,err} manipulations use the integer values rather than the string names (this allows table lookups simply by comparing integers rather than strings) - an additional mechanism exists to add to the enum at run-time the latter part is a pain in the ass. (but it's pretty typical of apache perf problems :) initially at least, you might as well do the dynamic part using the existing tables... so you've got a hybrid structure which supports quick lookups for the core headers, and the same old same old for the rest. do this just to find out how much of a perf benefit you're getting from the perfect hash. (hell just ignore the dynamic header crud for perf testing.) but you'll probably get resistance to putting this into the server for good... 'cause if a module uses a "non-standard" header "BlahBlah" and then rfc3942 HTTP/1.2 makes BlahBlah standard, and it's moved into the perfect hash, then the module would be broken. (i've got other ideas, but it's senseless to think about them until some sort of perf improvement is shown.) -dean