Hash tables and request headers Re: observations on fragmentation in SMS pools
Roy T. Fielding wrote: Another reason why apr_hash_t isn't a good match for HTTP headers is that the same field name may appear multiple times in the request or response headers (section 4.2 of RFC 2616), but the hash table implementation is designed around unique keys. HTTP headers were designed to be processed as a hash table, hence the rules regarding combining multiple fields with the same name. The only exception is Set-Cookie, because the folks at Netscape didn't follow the rules.

Ah, I was wondering why RFC 2109 seemed at odds with 2616 on the subject of field-combining syntax. :-( If Set-Cookie is the only exception, it might not be a bad idea to use apr_hash_t for the request and response headers. I'm even willing to contribute a patch for this, but it would impact both APR (slightly) and Apache (a lot). Here's my proposed approach:

1. Add a second create function for apr_hash_t that lets the caller supply three callback functions:
   - a hash function
   - an equality-check function
   - a concatenation function (if non-NULL, setting x=y followed by x=z results in concat_fn(y,z) being stored in the table as the value for x)

2. Change the headers_in and headers_out fields of the request_rec from tables to hash tables. Supply case-insensitive hash and comparison functions for these hash tables. For headers_out, also supply a concatenation function that builds comma-separated lists (and semicolon-separated lists when the key is Set-Cookie).

3. Change all of the references to headers_in and headers_out in the core and standard modules to use the hash API instead of the table API.

4. Document the change so that maintainers of 3rd-party modules know how to make their code compile once again.

Thoughts? Is a change like this worth doing? --Brian
Re: Observations on fragmentation in SMS pools
On Mon, Jul 09, 2001 at 04:50:31PM -0700, [EMAIL PROTECTED] wrote: cool, huh? [and it's only 1024 LOC yes i know it's not portable like APR i was v.impressed that someone actually looked at it when i first mentioned it here, btw ] so *grin*. can you guarantee thread-safety on apr_hash_t using apr_hash_first()/next()/this()? can you guarantee complete traversal with multiple simultaneous adders, deleters, readers and writers? and does anyone want a challenge of porting tdb to apr? *grin* Challenge, did somebody say challenge? I'm always up for a challenge. :-) BTW: Why are tables in APR at all? The only thing I see used is headers in apr-util hooks code, and in the xml, but that of course can be fixed. Step 1, remove tables from APR, Step 2, remove tables from Apache. agh! tables are kinda... entrenched into xvl. okay, maybe not: only 10 instances of apr_table_make. 10 of apr_table_do [which is why i was advocating it: i really like it :)] Tables are in APR because they were originally moved from Apache to APR before APR-util existed. They should really move to apr-util. They should never be removed from Apache. Tables are useful because they guarantee a general order to the data, namely, the order you insert information into the table. Hashes have a different use case. Tables should definitely be moved to APR-util if they are to remain. As for Apache, there are better structures that dictate general order than the table. IMNSHO, the only reason tables are still in Apache is inertia. Nobody wants to go through and change everything to a more sane data structure. Case-insensitive searches -- linear search time over the table ... ugh. -- Jon
table inefficiencies Re: Observations on fragmentation in SMS pools
Jon Travis wrote: [...] Tables should definitely be moved to APR-util if they are to remain. As for Apache, there are better structures that dictate general order than the table. IMNSHO, the only reason tables are still in Apache is inertia. Nobody wants to go through and change everything to a more sane data structure. Case insensitive searches -- linear search time over the table ... ugh. It's worth noting that half of the apr_table_get calls in Apache are from mod_mime. I posted a patch to new-httpd a couple of weeks ago that replaces mod_mime's tables with hash tables. Other frequent callers of linear-time table lookups: ap_set_byterange, ap_setup_client_block. --Brian
Re: table inefficiencies Re: Observations on fragmentation in SMS pools
[Is it just me or is it nearly impossible to have a conversation about Apache or APR that doesn't in some way belong on BOTH lists? sigh] On Mon, 9 Jul 2001, Brian Pane wrote: It's worth noting that half of the apr_table_get calls in Apache are from mod_mime. I posted a patch to new-httpd a couple of weeks ago that replaces mod_mime's tables with hash tables. Can you repost that to new-httpd? I'd forgotten about it but am definitely interested. Other frequent callers of linear-time table lookups: ap_set_byterange, ap_setup_client_block. Patches for those would be fine by me, as well. --Cliff -- Cliff Woolley [EMAIL PROTECTED] Charlottesville, VA
Re: Observations on fragmentation in SMS pools
On Mon, Jul 09, 2001 at 08:14:26PM -0700, Jon Travis wrote: Tables should definitely be moved to APR-util if they are to remain. As for Apache, there are better structures that dictate general order than the table. IMNSHO, the only reason tables are still in Apache is inertia. Nobody wants to go through and change everything to a more sane data structure. Case insensitive searches -- linear search time over the table ... ugh. One problem with hash tables is that if you want to retrieve a HTTP header, you now need to have it be the same case as it came in (unless you are normalizing every call). RFC 2616 states that (Section 4.2): Field names are case-insensitive. Therefore, if the client sends: CoNnEcTiOn: Keep-Alive it should respect that. And, this data structure is primarily used for storing HTTP header info (I'm not sure about mod_mime). I'm not aware of any browsers that do this, but I'd bet someone will because the RFC says you can. (It's possible I'm misinterpreting the RFC, wouldn't be the first time...) AFAICT, this doesn't call for a true hash table or a true array. It's something slightly different. Any data structure which replaces tables needs to be able to handle the case-insensitivity gracefully. Just want to make sure that any solution keeps us correct to the RFC. -- justin
Re: Observations on fragmentation in SMS pools
Justin Erenkrantz wrote: [...] AFAICT, this doesn't call for a true hash table or a true array. It's something slightly different. Any data structure which replaces tables needs to be able to handle the case-insensitivity gracefully. Just want to make sure that any solution keeps us correct to the RFC. -- justin

Another reason why apr_hash_t isn't a good match for HTTP headers is that the same field name may appear multiple times in the request or response headers (section 4.2 of RFC 2616), but the hash table implementation is designed around unique keys. --Brian
data types on type of shm (Re: Observations on fragmentation in SMS pools)
On Mon, 9 Jul 2001, Luke Kenneth Casson Leighton wrote: HOWEVER! supporting the data types that apr_pool_xxx() USES is a different matter. shm can be at different memory locations in all processes. pointers don't work. you'd need to radically change the basic data types, which would affect performance and make the code ugly. so i think this is a bad idea. btw, if you want a shm database/hash table done properly, take a look at sleepycat db. -dean
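dean's point that pointers don't work in shm (the segment may be mapped at a different base address in each process) is usually handled by storing offsets from the segment base instead. This is a plain C sketch of the technique, not actual tdb, Berkeley DB, or APR code:

```c
/* Sketch: linking data inside a shared memory segment by offset rather
 * than pointer.  Because only base-relative offsets are stored, the data
 * is valid no matter where each process happens to map the segment. */
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t next_off;   /* offset of next node from segment base; 0 == end */
    int    value;
} shm_node;

/* Convert an offset to a pointer relative to this process's mapping. */
static shm_node *node_at(char *base, size_t off)
{
    return off ? (shm_node *)(base + off) : NULL;
}

/* Walk a list starting at head_off and sum its values -- works from any
 * mapping of the same bytes, since no absolute pointers are involved. */
static int sum_list(char *base, size_t head_off)
{
    int total = 0;
    for (shm_node *n = node_at(base, head_off); n; n = node_at(base, n->next_off))
        total += n->value;
    return total;
}
```

A byte-for-byte copy of the segment at a different address yields identical traversal results, which is exactly the property a shm-backed table needs.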
tables - hash (Re: Observations on fragmentation in SMS pools)
On Mon, 9 Jul 2001, Roy T. Fielding wrote: Tables are in APR because they were originally moved from Apache to APR before APR-util existed. They should really move to apr-util. They should never be removed from Apache. Tables are useful because they guarantee a general order to the data, namely, the order you insert information into the table. Hashes have a different use case. Ummm, no, tables do not guarantee that -- arrays do. Tables were specifically created to be an abstract hash table, but the implementation remained simple because we never used them for large tables. the r->headers_{in,out,err} tables are referenced so frequently that making them hash tables is a win... there was a russian fellow, dimitri? i forget his last name -- works on freebsd as well i think. he tried this and said it was a win. i did a hash implementation for tables, but it was a loss because the hash overhead on the zillion other small tables kills you. but if you just fix mod_mime (Brian's patch is a start :), and the headers_{in,out,err} then you get the biggest bang. (btw, headers_{in,out,err} should be turned into a perfect hash using gperf... and all the compile-time string constants such as Connection, Accept-Types, ... should become integers with #defines.) -dean
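dean's gperf suggestion would produce something shaped roughly like this hand-written sketch: known header names become integer constants, so later comparisons are integer compares. The H_* names are invented, and real gperf output would use a generated minimal perfect hash rather than a switch on length:

```c
/* Hand-rolled sketch of compile-time header constants.  gperf would
 * generate the real lookup; this just shows the shape of the idea. */
#include <string.h>
#include <strings.h>   /* strcasecmp */

enum {
    H_UNKNOWN = 0,
    H_CONNECTION,
    H_CONTENT_LENGTH,
    H_HOST,
    H_TRANSFER_ENCODING
};

static int header_token(const char *name)
{
    /* Discriminate on length first, then confirm with exactly one
     * case-insensitive compare -- roughly the shape of gperf output. */
    switch (strlen(name)) {
    case 4:
        if (!strcasecmp(name, "Host")) return H_HOST;
        break;
    case 10:
        if (!strcasecmp(name, "Connection")) return H_CONNECTION;
        break;
    case 14:
        if (!strcasecmp(name, "Content-Length")) return H_CONTENT_LENGTH;
        break;
    case 17:
        if (!strcasecmp(name, "Transfer-Encoding")) return H_TRANSFER_ENCODING;
        break;
    }
    return H_UNKNOWN;
}
```

Once a header is tokenized at parse time, code like ap_set_byterange can test `tok == H_CONNECTION` instead of doing a linear strcasecmp scan per lookup.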
Re: Observations on fragmentation in SMS pools
... errr.. should i be using apr_hash_t or something? :) Yes. okay. will take a look at it, to see where i could use it. i have a suspicion that a lot of instances i really _need_ the case-insensitive stuff, and also need the dual set and add capability. not least because HTTP POST args can have this: key1=value1&key1=value2 and, correct me if i'm wrong, here, that results in a table: (key1, value1) (key1, value2) luke
Re: Observations on fragmentation in SMS pools
On Sun, Jul 08, 2001 at 10:19:33AM -0700, Justin Erenkrantz wrote: On Sun, Jul 08, 2001 at 10:14:16AM -0700, dean gaudet wrote: an ideal situation for free-lists (blocks of freed, but not free()d memory) is one per cpu. a less ideal situation is one per thread. an even less ideal situation is one per process (which requires locking). it's insane to have one per pool :) I think we're shooting for #2 (less ideal). Unless someone can come up with a way to dynamically tell how many CPUs we are running on and bind a free-list to a specific CPU. We're currently doing #3 (even less ideal) in apr_pools.c. And, yeah, the current trivial SMS is doing #4. =) But, don't expect it to stay like that. And, if we implement the apr_sms_child_malloc, it gets to somewhere between #2 and #3. -- justin #5 the approach taken by tdb. have a hash-chain. then you get the best of all worlds. simultaneous access only requires that you lock *one* hash chain, which allows a theoretical limit of number-of-hash-bucket simultaneous accesses. luke p.s. i'm hoping that people will not be put off by tdb's GPL license because i think that there are some really good techniques used in there that could be applied to APR code, too.
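luke's option #5, a free list per hash chain with one lock per chain, might be sketched like this. The names are invented, and real tdb locks byte ranges in its backing file rather than using pthread mutexes, but the contention property is the same: two threads block each other only when they hit the same bucket.

```c
/* Sketch of per-chain locking for a free list: NCHAINS independent lists,
 * each guarded by its own mutex, so up to NCHAINS threads can push/pop
 * simultaneously without contending. */
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define NCHAINS 64

typedef struct block { struct block *next; } block;

typedef struct {
    pthread_mutex_t lock;   /* protects only this one chain */
    block *head;
} chain;

static chain chains[NCHAINS];

static void chains_init(void)
{
    for (int i = 0; i < NCHAINS; i++) {
        pthread_mutex_init(&chains[i].lock, NULL);
        chains[i].head = NULL;
    }
}

/* Return a freed block to the chain picked by a hash of its address. */
static void chain_push(void *mem)
{
    size_t i = ((uintptr_t)mem >> 4) % NCHAINS;
    block *b = mem;
    pthread_mutex_lock(&chains[i].lock);
    b->next = chains[i].head;
    chains[i].head = b;
    pthread_mutex_unlock(&chains[i].lock);
}

/* Pop from one specific chain; returns NULL if that chain is empty. */
static void *chain_pop(size_t i)
{
    i %= NCHAINS;
    pthread_mutex_lock(&chains[i].lock);
    block *b = chains[i].head;
    if (b) chains[i].head = b->next;
    pthread_mutex_unlock(&chains[i].lock);
    return b;
}
```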
Re: Observations on fragmentation in SMS pools
On Tue, Jul 10, 2001 at 04:52:25PM +0200, Luke Kenneth Casson Leighton wrote: ... errr.. should i be using apr_hash_t or something? :) Yes. okay. will take a look at it, to see where i could use it. i have a suspicion that a lot of instances i really _need_ the case-insensitive stuff, and also need the dual set and add capability. not least because HTTP POST args can have this: key1=value1&key1=value2 and, correct me if i'm wrong, here, that results in a table: (key1, value1) (key1, value2) I'll reply to this one, but it also goes towards the other guys pointing to the spec. There isn't anything wrong with storing the case-sensitive original and then keying off of a toupper of the string, and chaining it off the end if the entry already exists in the table (for multiple headers of the same name). One could even take the existing table code, add a field for the toupper'd string, add a small adler32 (or whatever cksum is to your liking) field, and get a lot better performance than we see now. Of course I still am no big fan of the API. -- Jon
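Jon's augmented-table idea (keep the header in its original case, but precompute an upper-cased key plus a small checksum, so most lookup misses cost one integer compare) could be sketched like this. The function names and the simplified adler32-flavored checksum are invented for illustration, not taken from the table code:

```c
/* Sketch: table entry carrying a normalized key and its checksum, computed
 * once at insert time.  Lookups stay linear, but each non-matching entry
 * is usually rejected with a single integer compare. */
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *key;        /* original case, as received from the client */
    char       *key_upper;  /* normalized once, at insert time */
    unsigned    cksum;      /* checksum of key_upper */
    const char *val;
} table_entry;

static unsigned small_cksum(const char *s)
{
    unsigned a = 1, b = 0;  /* adler32-flavoured, modulus dropped for brevity */
    for (; *s; s++) { a += (unsigned char)*s; b += a; }
    return (b << 16) | (a & 0xffff);
}

static void entry_init(table_entry *e, const char *key, const char *val)
{
    size_t i, n = strlen(key);
    e->key = key;
    e->key_upper = malloc(n + 1);
    for (i = 0; i <= n; i++)    /* <= n copies the terminating NUL too */
        e->key_upper[i] = (char)toupper((unsigned char)key[i]);
    e->cksum = small_cksum(e->key_upper);
    e->val = val;
}

static const char *table_find(table_entry *t, int n, const char *key)
{
    char up[128];
    size_t i, len = strlen(key);
    if (len >= sizeof up) return NULL;
    for (i = 0; i <= len; i++)
        up[i] = (char)toupper((unsigned char)key[i]);
    unsigned ck = small_cksum(up);
    for (int j = 0; j < n; j++)
        if (t[j].cksum == ck && strcmp(t[j].key_upper, up) == 0)
            return t[j].val;
    return NULL;
}
```

The original casing survives in `key`, so anything that needs to echo the header back verbatim still can.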
Re: Observations on fragmentation in SMS pools
Sander Striker wrote: I'll think out loud now: Me too :-) The solution might be adding specific allocation functions for SMS implementations. These functions could look something like this: APR_DECLARE(void *) apr_sms_child_malloc(apr_sms_t *sms, apr_sms_t *child, apr_size_t size); APR_DECLARE(apr_status_t) apr_sms_child_free(apr_sms_t *sms, apr_sms_t *child, void *mem); Internally the framework will use the child_malloc_fn if present. If not, it will fall back to malloc_fn. This is something that overthrows the KIS policy though... I think this approach (or a variant thereof) has a lot of promise. One way to look at the root cause of the sms-pool fragmentation and performance issues is that the notion of parent pool implies two things: 1. If P is the parent of S, S must cease to exist when P is destroyed. 2. If S needs more memory, it obtains it from P by calling the malloc method of P. It often doesn't make sense for these two roles to be combined in the same object. Corollary: It doesn't make sense for apr_sms_malloc(parent) to be the mechanism by which a child allocates additional space. For example, a pool created for a subrequest needs to go out of scope when its parent does, but it doesn't derive any real benefit from routing requests for additional blocks through its parent. We could get better performance if the role of the parent SMS were to supply a block source to the child. This block source would be a pointer to an SMS that the child should call whenever it needed more memory. The block source could be the parent itself, or it could be any ancestor of the parent. In the case of a succession of sms_trivials stacked on top of an sms_std, each sms_trivial would supply the pointer to the sms_std as the block source for its child(ren). (The resulting setup would then look like the original pools implementation, in the sense that children would bypass their parents to get additional blocks. 
The difference is that the parent gets to decide what type of block source its children use--so if a stack of SMSs are supposed to use memory from a shared memory segment, each SMS in the stack can ensure that its descendents are using the shared mem block source rather than some malloc(3)-based block source.) Thoughts? Thanks, --Brian
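The parent-vs-block-source split Brian describes can be sketched with a toy struct: one link controls lifetime, the other controls where memory actually comes from. The types and names here are hypothetical, not the real apr_sms_t API:

```c
/* Sketch: an SMS with two distinct relationships.  'parent' answers
 * "when do I die?"; 'block_source' answers "who do I ask for memory?".
 * A deep child allocates straight from the root rather than walking the
 * whole ancestry one hop at a time. */
#include <stdlib.h>

typedef struct sketch_sms sketch_sms;
struct sketch_sms {
    sketch_sms *parent;        /* lifetime: destroyed when parent is */
    sketch_sms *block_source;  /* allocation: where new blocks come from */
    int is_root;
};

static void *sms_alloc_block(sketch_sms *sms, size_t size)
{
    if (!sms->is_root)
        return sms_alloc_block(sms->block_source, size);
    return malloc(size);       /* the root is the only real allocator */
}

static sketch_sms *sms_create_child(sketch_sms *parent)
{
    sketch_sms *child = malloc(sizeof(*child));
    child->parent = parent;
    /* The parent decides its children's block source; here it hands down
     * its own (so every descendant ends up pointing at the root). */
    child->block_source = parent->is_root ? parent : parent->block_source;
    child->is_root = 0;
    return child;
}
```

A shared-memory stack would differ only in what the root hands out; the descendants' code is unchanged, which is the point of the indirection.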
Re: Observations on fragmentation in SMS pools
To be honest changing sms_trivial to use malloc instead of apr_sms_malloc is an easy move (just a few lines to change) so it's probably worth trying and then seeing what we get... BTW, I'm impressed by the amount of traffic this has generated. Exactly what I hoped it would do! I guess also I'm wondering if we should apply my patch. For all the reasons given it's not ideal, but it's a starting point, and if in 2 weeks it looks totally different, well, no bother, but at least it lets everyone work on it with a minimum of fuss... Opinions? 3 +1's and I'll commit while I'm in Phoenix (got a nice connection) david - Original Message - From: Brian Pane [EMAIL PROTECTED] To: Sander Striker [EMAIL PROTECTED] Cc: APR Development List dev@apr.apache.org Sent: Monday, July 09, 2001 1:46 AM Subject: Re: Observations on fragmentation in SMS pools [quoted message trimmed]
RE: Observations on fragmentation in SMS pools
To be honest changing sms_trivial to use malloc instead of apr_sms_malloc is an easy move (just a few lines to change) so it's probably worth trying and then seeing what we get... Massive leakage :) Remember how the apr_sms_reset code works; it does not have to traverse and destroy all children since they all use the memory from that sms. BTW, I'm impressed by the amount of traffic this has generated. Exactly what I hoped it would do! I guess also I'm wondering if we should apply my patch. For all the reasons given it's not ideal, but it's a starting point, and if in 2 weeks it looks totally different, well, no bother, but at least it lets everyone work on it with a minimum of fuss... Opinions? 3 +1's and I'll commit while I'm in Phoenix (got a nice connection) Don't know if it counts, but +1 :) david Sander
Re: Observations on fragmentation in SMS pools
Sander Striker wrote: To be honest changing sms_trivial to use malloc instead of apr_sms_malloc is an easy move (just a few lines to change) so it's probably worth trying and then seeing what we get... Massive leakage :) Remember how the apr_sms_reset code works; it does not have to traverse and destroy all children since they all use the memory from that sms. I propose that it's better to not optimize for reset. With all the fragmentation I'm seeing in the httpd, the number of blocks managed by an SMS in the middle of a stack can be larger than the number of blocks that its descendants would have to manage (and free upon destruction) if they got memory from a block source themselves. Thus we'd likely have _less_ work to do upon reset if the SMS being reclaimed had to recurse through its descendants and ask them to return their blocks by themselves (because the total number of blocks being returned to SMSs further down the stack would be smaller). This problem would go away, of course, if the memory management strategy employed by the parent SMS were one that didn't cause so much fragmentation. But I don't see a way to accomplish that without slowing down allocation. --Brian
Re: Observations on fragmentation in SMS pools
regarding fragmentation: is it reasonable to assume that pre-allocating larger chunks of memory and sub-dividing them yourself will prevent memory fragmentation, with the side-effect of having More Code? use hash tables to optimise the list-walking? write a better sms instance, one that isn't 'trivial'? luke
Re: Observations on fragmentation in SMS pools
On Sun, Jul 08, 2001 at 10:56:59AM -0700, Jon Travis wrote: On Sun, Jul 08, 2001 at 10:41:12AM -0700, dean gaudet wrote: On Sun, 8 Jul 2001, Jon Travis wrote: There is still all this tomfoolery with locking, though, which I think would be nice to fix with different sized buckets in the freelist. Stuff that the malloc in glibc does. I cringe at the thought of how much overhead due to abstraction this whole project is causing. haha, the abstraction problems started AGES ago :) i think the first real sign of the doom ahead was the argument of changing ap_pool_t to ap_context_t because there was this arbitrary hash table attached to it. i'd suggest that before any decisions are made regarding global free list or per thread free list someone should study what the non-SMS, pools-only server behaves like. the problems you're looking at now in the SMS server are a result of SMS's feature of allocating all the way up through the ancestry adding bytes each step of the way. this _can_ be mitigated by doing your own memory-management on top of a more normal one. which is what sander's 'smart memory allocator' does. sma is targeted at having to have massive amounts of reallocs and mallocs and then freeing large numbers of them in one blast, repeat. luke
Re: Observations on fragmentation in SMS pools
On Mon, Jul 09, 2001 at 05:49:24PM +0200, Luke Kenneth Casson Leighton wrote: On Sun, Jul 08, 2001 at 11:06:33AM -0700, Justin Erenkrantz wrote: On Sun, Jul 08, 2001 at 11:04:09AM -0700, Jon Travis wrote: As for the ability to use shared memory for this ... yeah that is an interesting idea. What are your actual use cases for that? Ian posted a patch to do something like this - I think he wanted a hash table that was shared across all processes. The problem gets to be growing the memory space, but I think his use case was with fixed memory. He could probably tell more about how he wanted to do it. Both rbb and I suggested that this is what SMS was designed for. Have a SMS that is backed by shared memory and then the hashtable immediately becomes shared when you create it with this shared memory SMS. No change needed to the hashtable code. okay, had an idea here that i thought i'd share with you about shmem/pools/sms. lessons from tdb's design. accessing shmem, you really have to treat it as a very-fast-file-in-memory. therefore, best thing to do to be able to easily access sections of it: treat the shmem as an in-memory database. the simplest way to fit that into the existing APR code is to have direct support for apr_table_t in a pool-based or sms-based shared-memory-segment. if you examine tdb.c's design, you will notice that apr_table_do() becomes identical to [but more powerful than] tdb_traverse(). apr_array_header_t? again, i haven't thought about it, but you could likely get away with the same thing. The apr_table_t is just an apr_array_header_t with a specific key/val entry type. in other words, i think that supporting apr_pool_t on top of shared memory (via sms) is going to be tricky: in my opinion, the apr_pool_xxx() API _and_ the apr_sms_xxx() API can't Deal With shared memory at a convenient user-acceptable level. HOWEVER! supporting the data types that apr_pool_xxx() USES is a different matter. 
by supporting apr_array_header_t (if possible) and apr_table_t the complexity of messing about will be hidden from users of these data types, but they will work quite happily as expected, even if it's shared memory behind them. what people think? I think you are in for a ride on this one. The API for dealing with the tables is currently rather poor, since typically it is casted to an array and then iterated over. The strcasecmp's of the table code are a whole separate issue. I would think that the (nearly analogous) apr_hash_t would better suit what you are talking about here. BTW: Why are tables in APR at all? The only thing I see used is headers in apr-util hooks code, and in the xml, but that of course can be fixed. Step 1, remove tables from APR, Step 2, remove tables from Apache. -- Jon
Re: Observations on fragmentation in SMS pools
On Sun, Jul 08, 2001 at 12:47:32AM -0700, Brian Pane wrote: I added some instrumentation to apr_sms_trivial_malloc to study in more detail where its bottlenecks were. As a result, I found a few interesting phenomena:

1. The good news is that allocations of small amounts of memory are very efficient. They almost always take the fastest path through the code, in which some available space is reserved from the sms->used_sentinel.prev block with a handful of pointer arithmetic operations.

2. The bad news is that allocations for larger blocks (in the >=8KB range) typically require a call to the parent SMS to get data. On my test machine, I'm seeing elapsed times in the 30 microsecond range when this happens, compared to less than 1 microsecond for small allocations that don't require more memory from the parent SMS. And when an allocation falls through to the parent, it often seems to fall all the way through to the root SMS (I suspect that 30us includes a malloc). The problem seems to be particularly bad for things that create subrequests, like mod_include.

3. The worse news is that there seems to be a lot of fragmentation. For example, I saw this pattern during a server-parsed request:
- the application code requests 12296 bytes from a pool
- not enough memory is available in the SMS, so it requests 16400 bytes from its parent SMS.
- the parent SMS doesn't have enough free space either, so it requests 20504 bytes from the grandparent SMS.
- the grandparent SMS doesn't have enough space either, but it has to iterate through 15 blocks in its free list to figure that out. Each of these blocks has between 8176 and 12272 bytes available.
- the grandparent calls through to the great-grandparent to get 24608 bytes. The great-grandparent doesn't have a block with that much free space, but it iterates through 9 blocks in its free list in search of one; all of these blocks had 16376 bytes free.
- the great-grandparent thus requests 28712 bytes from the great-great grandparent. 
The great-great-grandparent doesn't have any blocks in its free list, so it calls through to its parent, which at last is an SMS that does a real malloc. This type of pattern may explain the reported higher memory use of the SMS-based httpd compared with the original pools; there's a lot of memory in those free lists that can't be used in this example. For an SMS that's going to be a parent of other SMSs, we'll need something with more sophisticated policies for reassigning freed space than the current trivial-SMS. Yup. I've brought this up to Sander and David before, but this is how pools work EXCEPT for the fact that we get a little more pathological in SMS's addition of memory because each parent tacks on memory (as you described). The optimization here is for the 4KB allocation. I'm not sure that is what we need to be optimizing for. More analysis is probably needed to see what our typical usage pattern is. This is where Ian's pool replay code is *very* helpful (although I don't like adding more #ifdef to the pool code - yuck - I'd like to see a SMS that just prints that info and passes it up to the parent to do what it needs to). Before turning SMS on in the mainline code by default, I think we need to come up with a more robust allocation algorithm. And, the whole point of SMS is that we can play with memory allocation algorithms until we are blue in the face and nothing else changes. You can't do that with the current pools code. We need more people playing with it and suggesting new allocation algorithms. =) The trivial SMS tries to keep the memory allocation strategy as close as possible to what we currently have. Personally, I'd set MIN_FREE to 0 and bite the small allocations (now they'd be slow, but really, how many allocations are under 4KB - data structures, perhaps?). I'd also like to see it smarter about reclaiming free space, but that gets really tricky. 
It's a tradeoff (and is purposeful for lots of small allocations), but until someone can write a better memory allocation algorithm, this is what we got. Trivial SMS has its Achilles heel here - this is just an aggravation of how the original pool code works (the pool code would add 4KB, while each level of the SMS adds 4KB). Aaron pointed out that CLR (Cormen/Leiserson/Rivest for those that never took algorithms classes) has many memory allocation algorithms. I'd bet we could find one that would work better. -- justin
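For what it's worth, the numbers Brian posted fit a constant per-level overhead exactly: each SMS asks its parent for the child's request plus 4104 bytes. That constant is inferred from the posted figures (16400 - 12296 = 4104), not read out of the SMS source, but it makes the compounding easy to see:

```c
/* The allocation cascade from Brian's trace, reduced to arithmetic:
 * every level requests what its child asked for plus a fixed overhead,
 * so a 12296-byte application request becomes 28712 bytes by the time
 * it reaches the great-great-grandparent. */
#include <stddef.h>

#define PER_LEVEL_OVERHEAD 4104   /* inferred from the posted trace */

static size_t parent_request(size_t child_request)
{
    return child_request + PER_LEVEL_OVERHEAD;
}
```

Five levels deep, the overhead alone is 4 * 4104 = 16416 bytes on top of the original request, which is more than the request itself.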
Re: Observations on fragmentation in SMS pools
Justin Erenkrantz wrote: Aaron pointed out that CLR (Cormen/Leiserson/Rivest for those that never took algorithms classes) has many memory allocation algorithms. I'd bet we could find one that would work better. -- justin At USENIX there was a talk about extending the slab allocator to multiple CPUs. One of the things in the talk was a thing called vmem, which is a general purpose resource allocator with low fragmentation and constant-time allocations (optionally - for more time you can get a better fit). Sounds pretty good, eh? http://www.usenix.org/publications/library/proceedings/usenix01/bonwick.html Cheers, Ben. -- http://www.apache-ssl.org/ben.html There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit. - Robert Woodruff
Re: Observations on fragmentation in SMS pools
1. The good news is that allocations of small amounts of memory are very efficient. They almost always take the fastest path through the code, in which some available space is reserved from the sms->used_sentinel.prev block with a handful of pointer arithmetic operations. Cool. 2. The bad news is that allocations for larger blocks (in the >=8KB range) typically require a call to the parent SMS to get data. On my test machine, I'm seeing elapsed times in the 30 microsecond range when this happens, compared to less than 1 microsecond for small allocations that don't require more memory from the parent SMS. And when an allocation falls through to the parent, it often seems to fall all the way through to the root SMS (I suspect that 30us includes a malloc). The problem seems to be particularly bad for things that create subrequests, like mod_include. OK, not so cool. 3. The worse news is that there seems to be a lot of fragmentation. For example, I saw this pattern during a server-parsed request: snip. For an SMS that's going to be a parent of other SMSs, we'll need something with more sophisticated policies for reassigning freed space than the current trivial-SMS. Definitely. Yup. I've brought this up to Sander and David before, but this is how Really - first i've heard of it :( pools work EXCEPT for the fact that we get a little more pathological in SMS's addition of memory because each parent tacks on memory (as you described). The optimization here is for the 4KB allocation. I'm not sure that is what we need to be optimizing for. More analysis is probably needed to see what our typical usage pattern is. This is where Ian's pool replay code is *very* helpful (although I don't like adding more #ifdef to the pool code - yuck - I'd like to see a SMS that just prints that info and passes it up to the parent to do what it needs to). Yep, i had a very simple debug sms that simply printed out what it was doing, then did it. 
Maybe it's time to revive it, but it generates a LOT of output.

Before turning SMS on in the mainline code by default, I think we need to come up with a more robust allocation algorithm. And, the whole point of SMS is that we can play with memory allocation algorithms until we are blue in the face and nothing else changes. You can't do that with the current pools code. We need more people playing with it and suggesting new allocation algorithms. =)

Absolutely. This was my motivation in doing the work in the first place, and it seems to be working. The trivial SMS tries to keep the memory allocation strategy as close as possible to what we currently have. Oh, we're nowhere near ready to turn it on by default. I never thought we were, but you need a test case to prove/disprove what's going on, and that's what this whole pools-as-sms code gives us: a complex example that we can use to improve our code :)

Personally, I'd set MIN_FREE to 0 and bite the small allocations (now they'd be slow, but really, how many allocations are under 4KB - data structures, perhaps?). I'd also like to see it smarter about reclaiming free space, but that gets really tricky.

Oh, if you look at it we do a LOT of allocations below 4K from pools. Turn on the checking of every allocation and look at the sizes being requested. It's a tradeoff (and is purposeful for lots of small allocations), but until someone can write a better memory allocation algorithm, this is what we've got. Trivial SMS has its Achilles heel here - this is just an aggravation of how the original pool code works (the pool code would add 4KB, while each level of the SMS adds 4KB).

Well, we have what we have. My main concern about trivial is that the code gets complex, and I'm worried that we may be over-complicating things for ourselves. 
KISS is a guiding principle that it never does any harm to follow :)

Aaron pointed out that CLR (Cormen/Leiserson/Rivest for those that never took algorithms classes) has many memory allocation algorithms. I'd bet we could find one that would work better. -- justin

The sms code was added on May 9th, which looks like less than 2 months ago, so to be honest we're in pretty damn good shape. Let's take some time, do the testing, and define the problems, then start looking at fixes. Rushing in with a lot of quick solutions won't get us anywhere but a world of hurt. david
RE: Observations on fragmentation in SMS pools
1. The good news is that allocations of small amounts of memory are very efficient. They almost always take the fastest path through the code, in which some available space is reserved from the sms->used_sentinel.prev block with a handful of pointer arithmetic operations. Cool.

This is what the pools code did. I only used a different way to store the required accounting information.

2. The bad news is that allocations for larger blocks (in the >=8KB range) typically require a call to the parent SMS to get data. On my test machine, I'm seeing elapsed times in the 30 microsecond range when this happens, compared to less than 1 microsecond for small allocations that don't require more memory from the parent SMS. And when an allocation falls through to the parent, it often seems to fall all the way through to the root SMS (I suspect that 30us includes a malloc). The problem seems to be particularly bad for things that create subrequests, like mod_include. OK, not so cool.

No, a first optimization might be doing some simple defines in the sms_private.h file which eliminate the overhead of the framework code when apr_sms_malloc is called from inside an sms module. The defines would look something like this:

#define apr_sms_malloc(sms, size) sms->malloc_fn(sms, size)
#define apr_sms_calloc(sms, size) sms->calloc_fn(sms, size)

I'm afraid we can't do the free_fn because it might be NULL. This would get rid of a bit of overhead, but it isn't going to make an enormous amount of difference.

3. The worse news is that there seems to be a lot of fragmentation. For example, I saw this pattern during a server-parsed request: snip. For an SMS that's going to be a parent of other SMSs, we'll need something with more sophisticated policies for reassigning freed space than the current trivial-SMS.

Definitely. Yup. 
I've brought this up to Sander and David before, but this is how

Really - first i've heard of it :(

Yes, Justin brought something like this up with me when he reviewed the trivial code. We didn't discuss this usage pattern though. Of course it is very logical that it happens... I'll think out loud now: The solution might be adding specific allocation functions for SMS implementations. These functions could look something like this:

APR_DECLARE(void *) apr_sms_child_malloc(apr_sms_t *sms, apr_sms_t *child, apr_size_t size);
APR_DECLARE(apr_status_t) apr_sms_child_free(apr_sms_t *sms, apr_sms_t *child, void *mem);

Internally the framework will use the child_malloc_fn if present. If not, it will fall back to malloc_fn. This is something that overthrows the KISS policy though...

pools work EXCEPT for the fact that we get a little more pathological in SMS's addition of memory because each parent tacks on memory (as you described). The optimization here is for the 4KB allocation. I'm not sure that is what we need to be optimizing for. More analysis is probably needed to see what our typical usage pattern is. This is where Ian's pool replay code is *very* helpful (although I don't like adding more #ifdef to the pool code - yuck - I'd like to see an SMS that just prints that info and passes it up to the parent to do what it needs to).

I want to state here that POOLS_ARE_SMS effectively makes each pool a trivial sms. To take full advantage of the SMS code the stack of SMSs should be optimized. Right now we have this (in httpd with POOLS_ARE_SMS):

    std <- trivial <- trivial [ <- trivial ...]
    ^^^
    APR        httpd

Justin already hinted in a private mail that he was thinking about something like this:

    ... <- trivial <- tracking [ <- tracking ...]

In the buckets code, blocks could be a very good choice. So, a first pass at allowing httpd to run with pools as sms is not optimal, but it gets things working.

Yep, I had a very simple debug sms that simply printed out what it was doing, then did it. 
Maybe it's time to revive it, but it generates a LOT of output.

Another option is just using the debug code you put in and tagging the relevant SMSs. This way you can see the usage patterns in each part of the application.

Before turning SMS on in the mainline code by default, I think we need to come up with a more robust allocation algorithm. And, the whole point of SMS is that we can play with memory allocation algorithms until we are blue in the face and nothing else changes. You can't do that with the current pools code. We need more people playing with it and suggesting new allocation algorithms. =)

Absolutely. This was my motivation in doing the work in the first place, and it seems to be working.

Yes :) I've been working on a power of 2 allocator. I still haven't got that up to speed
Re: Observations on fragmentation in SMS pools
I've been working on a power of 2 allocator. I still haven't got that up to speed yet, but it is worth a look. I'll play with it this week and post it to the list I hacked a simple power of two allocator together as a proof of concept that replacing malloc/free with apr_malloc/apr_free could give us a performance boost (it does). Find it here (it is for Windows). http://wstoddard.com/patches/apr_mem_alloc.tar There are several simple changes to this code that could make it much closer to something that could be committed. Bill
Re: Observations on fragmentation in SMS pools
On Sun, 8 Jul 2001, Ian Holsman wrote: Bill Stoddard wrote: I've been working on a power of 2 allocator. I still haven't got that up to speed yet, but it is worth a look. I'll play with it this week and post it to the list I hacked a simple power of two allocator together as a proof of concept that replacing malloc/free with apr_malloc/apr_free could give us a performance boost (it does). Find it here (it is for Windows). Hi Bill, I was wondering why the brigade code doesn't make use of pools/sms to handle its memory. It has a pool passed to it from what I can see...

Pools aren't used for brigades because when a pool is cleaned up, the memory goes away. This means that whenever we are doing a pipelined request, and we allocate the brigade out of the request_rec, we would have to copy all of the data into the conn_rec before we could destroy the request. If we allocate everything out of the conn_rec, then we have a huge resource leak until the end of the connection.

For a complete description of why we don't use pools for brigades, please take a look at the mailing list archives from the time period around when the brigades were introduced. This question was brought up MANY times, and it has been answered in great detail. Ryan _ Ryan Bloom [EMAIL PROTECTED] Covalent Technologies [EMAIL PROTECTED] -
Re: Observations on fragmentation in SMS pools
On Sun, 8 Jul 2001, Ian Holsman wrote: I was wondering why the brigade code doesn't make use of pools/sms to handle it's memory. It has a pool passed to it from what I can see... The brigade is allocated out of a pool, and that pool is used to register a cleanup on the brigade to make sure all of the buckets in that brigade get destroyed. The buckets themselves use malloc(). The buckets do not and cannot use pools because the lifetimes could get to be all wrong pretty easily, and it could be a _major_ resource leak on big requests or pipelined connections. I'm in the process of switching them to use apr_sms_malloc(). I've got a 3-phase plan for making that switchover and have written phase 1, which I just need to test and commit. The three phases go something like this (I might have mentioned this on the list before... I can't remember whether I did or not, so here it is again): Phase 1: s/malloc/apr_sms_malloc/g; s/free/apr_sms_free/g; Add in a global apr_sms_std instance in the buckets code that all buckets will use to get their memory from. Add an apr_sms_t* to the apr_bucket struct so that the apr_bucket_foo_make() functions know which SMS to use when allocating memory for private structures. Set that pointer to reference the global apr_sms_std instance in all of the apr_bucket_foo_create() functions for now. Phase 2: Add an apr_sms_t parameter to all apr_bucket_foo_create() functions that lets the caller determine which SMS to use. For now, they can all just pass in a pointer to the global apr_sms_std instance. Phase 3: Create a per-thread blocks SMS in the Apache MPM's and stash a pointer to it in the conn_rec for each request. Use that instead of the global apr_sms_std when creating all buckets and get rid of the global apr_sms_std. --Cliff -- Cliff Woolley [EMAIL PROTECTED] Charlottesville, VA
Re: Observations on fragmentation in SMS pools
On Sun, 8 Jul 2001, Justin Erenkrantz wrote: Yup. I've brought this up to Sander and David before, but this is how pools

woah. no way really? that's not at all how it was in 1.3 or in early 2.0 ... in 2.0 as of uh a year ago say, there was one free list per process, and locks were used to access it. no matter where in the tree of pools you tried an allocation, if it didn't fit into the current block, the allocator would lock and go to the global free block list and pick up the first fit block. none of this going up through a chain of pools or anything, that's insane.

It's a tradeoff (and is purposeful for lots of small allocations), but until someone can write a better memory allocation algorithm, this is what we got.

1.3's memory allocation algorithm kicks ass, to put it mildly. i've not read the apr-dev archives as to the Why and the Goals of SMS ... -dean
Re: Observations on fragmentation in SMS pools
On Sun, Jul 08, 2001 at 10:14:16AM -0700, dean gaudet wrote: an ideal situation for free-lists (blocks of freed, but not free()d memory) is one per cpu. a less ideal situation is one per thread. an even less ideal situation is one per process (which requires locking). it's insane to have one per pool :) I think we're shooting for #2 (less ideal). Unless someone can come up with a way to dynamically tell how many CPUs we are running on and bind a free-list to a specific CPU. We're currently doing #3 (even less ideal) in apr_pools.c. And, yeah, the current trivial SMS is doing #4. =) But, don't expect it to stay like that. And, if we implement the apr_sms_child_malloc, it gets to somewhere between #2 and #3. -- justin
Re: Observations on fragmentation in SMS pools
On Sun, Jul 08, 2001 at 10:18:12AM -0700, dean gaudet wrote: On Sun, 8 Jul 2001, Justin Erenkrantz wrote: Yup. I've brought this up to Sander and David before, but this is how pools woah. no way really? that's not at all how it was in 1.3 or in early 2.0 ... in 2.0 as of uh a year ago say, there was one free list per process, and locks were used to access it. no matter where in the tree of pools you tried an allocation, if it didn't fit into the current block, the allocator would lock and go to the global free block list and pick up the first fit block. none of this going up through a chain of pools or anything, that's insane. No, nothing changed. We just added the chain of pools. =) It's a work in progress. You may call it a feature right now. We're not saying it is ready to be activated - just hammered/examined by people on this list. That definitely needs to be cleaned up. We know that. If we switch that global free block list to per-thread (where we are heading), is this allocation algorithm the best one for httpd-style memory usage? If so, then I guess I won't look at other allocation algorithms. I'll defer to people who've studied this a lot longer than I have. -- justin
Re: Observations on fragmentation in SMS pools
On Sun, Jul 08, 2001 at 10:34:22AM -0700, Jon Travis wrote: There is still all this tomfoolery with locking, though, which I think would be nice to fix with different sized buckets in the freelist. Stuff that the malloc in glibc does. I cringe at the thought of how much overhead due to abstraction this whole project is causing. Well, most of the tomfoolery goes away when the free-list is per thread. No need for locking then. I think that this gets us better performance in a threaded environment than the current apr_pools.c - which has a per-process global list. The key is the allocation algorithm needs to take that into account (at this point, trivial isn't). In order to truly support a threaded environment, the free-lists need to become per-thread. With this abstraction, that *is* much easier. We're not there yet, but we're laying the groundwork to do so. -- justin
RE: Observations on fragmentation in SMS pools
Stuff that the malloc in glibc does. I cringe at the thought of how much overhead due to abstraction this whole project is causing.

The other thing this abstraction buys us is the ability to pass in different memory to standard APR operations. Hopefully we can get shared memory to work in this fashion. I know for sure that we can do mlock()'d memory (which might be useful for security-type calculation stuff you don't want to go to swap space). Sander
Re: Observations on fragmentation in SMS pools
On Sun, Jul 08, 2001 at 10:41:12AM -0700, dean gaudet wrote: also -- it would be worthwhile for someone to try a version of pools which keeps no freelist, and uses malloc()/free() on each block. i'm guessing this is faster in a multithreaded server than us doing a per-process global list because malloc() itself already has to deal with multithreading, and has all sorts of other optimisations (i.e. it's not just a first-fit allocator). and as an added bonus it'd be worth trying www.hoard.org's malloc replacement. Switch the apr_sms_pools.c to call apr_sms_tracking_create instead of apr_sms_trivial_create. (You need the tracking because you will have to free the memory, so you have to at least keep track of what you have allocated.) Also, I did try having the pools use malloc/free directly (ALLOC_USE_MALLOC) and the performance was dreadful. At least on Solaris. -- justin
Re: Observations on fragmentation in SMS pools
On Sun, Jul 08, 2001 at 11:04:09AM -0700, Jon Travis wrote: As for the ability to use shared memory for this ... yeah that is an interesting idea. What are your actual use cases for that? Ian posted a patch to do something like this - I think he wanted a hash table that was shared across all processes. The problem gets to be growing the memory space, but I think his use case was with fixed memory. He could probably tell more about how he wanted to do it. Both rbb and I suggested that this is what SMS was designed for. Have a SMS that is backed by shared memory and then the hashtable immediately becomes shared when you create it with this shared memory SMS. No change needed to the hashtable code. And, potentially, the scoreboard could (??) get switched to something using a shared memory SMS. But, I don't know enough about the scoreboard to say more about that. -- justin
Re: Observations on fragmentation in SMS pools
On Sun, 8 Jul 2001, Jon Travis wrote: I talked to rbb about this not too long ago, and he told me that you did a lot of performance work, and that is why pools use their own allocation scheme (8k blocks, freelists, etc.) instead of using a total malloc() oriented scheme.

yes and no. i can't say that i ever compared an implementation of apache using malloc/free to one just using palloc ... it would have been insane to try to go about adding free() calls everywhere they would have been required to do that. (and you can sit back and do some math and figure out that it would be slower by virtue of having to track many more pointers, and double the function call overhead of each allocation.) so given the restriction that i was implementing palloc -- an allocator with only a group free operation of all allocated elements -- then what i did was to tune the apache-1.3 (maybe it was 1.2 at the time) code.

the palloc stuff goes back to before apache 1.0, and i'm pretty sure it's rst's code (but you should ask the real old timers, not me, i came on board around 1.2). there were definitely a few bugs in both the alloc.c code, and in the way apache used it. i don't remember the details (but i'm responsible for the mess o' ifdefs of debugging code).

another thing you need to consider is that this was a process-per-connection server -- and the global free list made a lot of sense. but i don't remember actually timing it against using malloc/free directly for the blocks. i think that would have sucked on many of the unixes at the time -- on some of which it was standard practice to use gnumalloc instead of libc malloc to get a perf improvement. it's important to remember that malloc/free haven't stood still, and the technology in there has improved.

You are right that libc malloc() implementations are very fast, can deal with threading issues more elegantly than we could in APR, and have other niceties (different sized buckets, etc.). 
There is currently a ton of locking in the pool implementation to deal with the threads (and we are actually missing a lock in one needed location, I believe). you know, i'd be tempted to say get rid of the free list even in the process-per-connection servers. it simplifies the code, which is way better for maintenance. and it's probably a toss up today as to which is faster. (it might be slower on ancient unixes, but, um, i fail to see the problem ;) -dean
Re: Observations on fragmentation in SMS pools
On Sun, 8 Jul 2001, Justin Erenkrantz wrote: Also, I did try having the pools use malloc/free directly (ALLOC_USE_MALLOC) and the performance was dreadful. At least on Solaris. -- justin

yes, ALLOC_USE_MALLOC is dreadful -- that's not what i meant. what i mean is something like the below patch, which i haven't even tried to compile or test ;) (the patch needs work to rescue some of the debugging code in new_block). it removes block_freelist and uses malloc/free for blocks. this is much less expensive than malloc/free on each allocation (amortization).

-dean

Index: apr_pools.c
===================================================================
RCS file: /home/cvs/apr/memory/unix/apr_pools.c,v
retrieving revision 1.100
diff -u -r1.100 apr_pools.c
--- apr_pools.c	2001/07/07 22:23:54	1.100
+++ apr_pools.c	2001/07/08 18:17:36
@@ -248,7 +248,6 @@
 /*
  * Static cells for managing our internal synchronisation.
  */
-static union block_hdr *block_freelist = NULL;
 
 #if APR_HAS_THREADS
 static apr_lock_t *alloc_mutex;
@@ -417,120 +416,20 @@
 
 static void free_blocks(union block_hdr *blok)
 {
-#ifdef ALLOC_USE_MALLOC
     union block_hdr *next;
 
     for ( ; blok; blok = next) {
         next = blok->h.next;
         DO_FREE(blok);
     }
-#else /* ALLOC_USE_MALLOC */
-
-#ifdef ALLOC_STATS
-    unsigned num_blocks;
-#endif /* ALLOC_STATS */
-
-    /*
-     * First, put new blocks at the head of the free list ---
-     * we'll eventually bash the 'next' pointer of the last block
-     * in the chain to point to the free blocks we already had.
-     */
-
-    union block_hdr *old_free_list;
-
-    if (blok == NULL) {
-        return;  /* Sanity check --- freeing empty pool? */
-    }
-
-#if APR_HAS_THREADS
-    if (alloc_mutex) {
-        apr_lock_acquire(alloc_mutex);
-    }
-#endif
-    old_free_list = block_freelist;
-    block_freelist = blok;
-
-    /*
-     * Next, adjust first_avail pointers of each block --- have to do it
-     * sooner or later, and it simplifies the search in new_block to do it
-     * now.
-     */
-
-#ifdef ALLOC_STATS
-    num_blocks = 1;
-#endif /* ALLOC_STATS */
-
-    while (blok->h.next != NULL) {
-
-#ifdef ALLOC_STATS
-        ++num_blocks;
-#endif /* ALLOC_STATS */
-
-        chk_on_blk_list(blok, old_free_list);
-        blok->h.first_avail = (char *) (blok + 1);
-        debug_fill(blok->h.first_avail, blok->h.endp - blok->h.first_avail);
-#ifdef APR_POOL_DEBUG
-        blok->h.owning_pool = FREE_POOL;
-#endif /* APR_POOL_DEBUG */
-        blok = blok->h.next;
-    }
-
-    chk_on_blk_list(blok, old_free_list);
-    blok->h.first_avail = (char *) (blok + 1);
-    debug_fill(blok->h.first_avail, blok->h.endp - blok->h.first_avail);
-#ifdef APR_POOL_DEBUG
-    blok->h.owning_pool = FREE_POOL;
-#endif /* APR_POOL_DEBUG */
-
-    /* Finally, reset next pointer to get the old free blocks back */
-
-    blok->h.next = old_free_list;
-
-#ifdef ALLOC_STATS
-    if (num_blocks > max_blocks_in_one_free) {
-        max_blocks_in_one_free = num_blocks;
-    }
-    ++num_free_blocks_calls;
-    num_blocks_freed += num_blocks;
-#endif /* ALLOC_STATS */
-
-#if APR_HAS_THREADS
-    if (alloc_mutex) {
-        apr_lock_release(alloc_mutex);
-    }
-#endif /* APR_HAS_THREADS */
-#endif /* ALLOC_USE_MALLOC */
 }
 
 /*
- * Get a new block, from our own free list if possible, from the system
- * if necessary.  Must be called with alarms blocked.
+ * get a new block from the system
  */
 static union block_hdr *new_block(apr_size_t min_size, apr_abortfunc_t abortfunc)
 {
-    union block_hdr **lastptr = &block_freelist;
-    union block_hdr *blok = block_freelist;
-
-    /* First, see if we have anything of the required size
-     * on the free list...
-     */
-
-    while (blok != NULL) {
-        if ((apr_ssize_t)min_size + BLOCK_MINFREE <= blok->h.endp - blok->h.first_avail) {
-            *lastptr = blok->h.next;
-            blok->h.next = NULL;
-            debug_verify_filled(blok->h.first_avail, blok->h.endp,
-                                "[new_block] Ouch!  Someone trounced a block "
-                                "on the free list!\n");
-            return blok;
-        }
-        else {
-            lastptr = &blok->h.next;
-            blok = blok->h.next;
-        }
-    }
-
-    /* Nope. */
+    union block_hdr *blok;
 
     min_size += BLOCK_MINFREE;
     blok = malloc_block((min_size > BLOCK_MINALLOC)
@@ -586,7 +485,6 @@
     union block_hdr *blok;
     apr_pool_t *new_pool;
 
-
 #if APR_HAS_THREADS
     if (alloc_mutex) {
         apr_lock_acquire(alloc_mutex);
@@ -960,7 +858,7 @@
 
 APR_DECLARE(apr_size_t) apr_pool_free_blocks_num_bytes(void)
 {
-    return bytes_in_block_list(block_freelist);
+    return 0;
 }
 
 /* the unix linker defines this symbol as the last byte + 1 of
@@ -1255,13 +1153,7 @@
         cur_len = strp - blok->h.first_avail;
 
         /* must try another blok */
-#if APR_HAS_THREADS
-        apr_lock_acquire(alloc_mutex);
-#endif
         nblok = new_block(2 * cur_len, NULL);
-#if
Re: Observations on fragmentation in SMS pools
dean gaudet wrote: On Sun, 8 Jul 2001, Justin Erenkrantz wrote: Also, I did try having the pools use malloc/free directly (ALLOC_USE_MALLOC) and the performance was dreadful. At least on Solaris. -- justin yes, ALLOC_USE_MALLOC is dreadful -- that's not what i meant. what i mean is something like the below patch, which i haven't even tried to compile or test ;) (the patch needs work to rescue some of the debugging code in new_block). it removes block_freelist and uses malloc/free for blocks. this is much less expensive than malloc/free on each allocation (amortization). -dean

If you can wait till monday, and you are interested, I'll put it through the wringer on an 8-CPU Sun4500 ..Ian

[quoted patch snipped]
Re: Observations on fragmentation in SMS pools
On Sun, 8 Jul 2001, Ian Holsman wrote: dean gaudet wrote: it removes block_freelist and uses malloc/free for blocks. this is much less expensive than malloc/free on each allocation (amortization). If you can wait till monday, and you are interested, I'll put it through the wringer on an 8-CPU Sun4500 oh i'm in no rush, i'm being a lazy ass and not even testing these things i'm posting, so thanks for helping :) also, you would definitely want to try the www.hoard.org allocator on a (solaris) machine with that many cpus. -dean
Re: Observations on fragmentation in SMS pools
On Sun, 8 Jul 2001, Brian Pane wrote: dean gaudet wrote: On Sun, 8 Jul 2001, Ian Holsman wrote: dean gaudet wrote: it removes block_freelist and uses malloc/free for blocks. this is much less expensive than malloc/free on each allocation (amortization). If you can wait till monday, and you are interested, I'll put it through the wringer on an 8-CPU Sun4500 oh i'm in no rush, i'm being a lazy ass and not even testing these things i'm posting, so thanks for helping :) also, you would definitely want to try the www.hoard.org allocator on a (solaris) machine with that many cpus. If I remember correctly, he's tried the Hoard allocator on a server that size with ALLOC_USE_MALLOC, and it performed much better than the global free list. But Hoard is LGPL. :-(

i would totally believe that the ALLOC_USE_MALLOC code performs better with hoard on a server that size... because the ALLOC_USE_MALLOC code doesn't need to use a mutex ever. the global freelist stuff is just a bad idea on a box with 8 cpus; the mutex kills ya. but with the elimination of the mutex we should scale better even just using regular malloc.

i wasn't advocating bundling hoard; it's firmly my opinion that each libc vendor should be doing their damndest to provide the best malloc possible. (especially because each libc vendor tends to also be a libpthread vendor and they know the dirty details of making the two work well together). i've no qualms with putting "try hoard" into the "how do i perf tune apache?" page. it wouldn't be a requirement... -dean
Re: Observations on fragmentation in SMS pools
Jon Travis wrote: On Sun, Jul 08, 2001 at 11:27:08AM -0700, Ian Holsman wrote: Justin Erenkrantz wrote: On Sun, Jul 08, 2001 at 11:04:09AM -0700, Jon Travis wrote: [snip] I'm still curious about the use case. I'm sure there is one, if you could just enlighten me. I already know it's 'cool'.

for every page request which comes in, we need to figure out what app is serving it, and which category it belongs to (eg. it is a price comparison page for a mobile phone). this information is used by other modules to determine different things (eg. we store info on the category tree so we know that a mobile phone category is part of electronics). so we plug parts of the URI into a hash table and store info about it for each different category.

now.. the reason we are going with shared memory on this is that there are ~10,000 (soon to be 60,000) categories on our site, and we also have other info we store, so it blows out to be ~8M-64M of memory being used just to store this stuff. we 'could' have used a gdbm/db but I think that direct shared memory would be faster than a file-based DB shoved into a memory cache.

The other unique application of shared memory that we plan to do soon is having a state table stored in shared mem to implement a faster setenvIf module. ..Ian (ps.. if you can suggest other methods of doing the above, I would be grateful for the ideas) -- Jon
Re: Observations on fragmentation in SMS pools
dean gaudet wrote: On Sun, 8 Jul 2001, Ben Laurie wrote: Justin Erenkrantz wrote: Aaron pointed out that CLR (Cormen/Leiserson/Rivest for those that never took algorithms classes) has many memory allocation algorithms. I'd bet we could find one that would work better. -- justin At USENIX there was a talk about extending the slab allocator to multiple CPUs. One of the things in the talk was a thing called vmem, which is a general purpose resource allocator with low fragmentation and constant-time allocations (optionally - for more time you can get a better fit). Sounds pretty good, eh? http://www.usenix.org/publications/library/proceedings/usenix01/bonwick.html

linux 2.4 uses a slab allocator for many critical data structures -- in particular the networking code uses it. this is one of the tricks they used between 2.2 and 2.4 to increase the scalability to large numbers of CPUs. their allocator does have per-cpu caching, but i don't remember all the details.

I should be clear: this gadget goes underneath the slab allocator. That is, the slab allocator uses memory allocated by vmem. Apparently, this is because slab allocators always defer the question of how to get big lumps of memory, and vmem is what it is proposed it should defer it to.

BTW, the obvious thing to do to cure the problem described in the first place is to not add extra memory after the first step up the tree.

Cheers, Ben. -- http://www.apache-ssl.org/ben.html There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit. - Robert Woodruff