Re: bucket free lists (was Re: [PATCH] buckets sms phase 2)

Ryan Bloom 24 Sep 2001 00:30:09 -0000

On Sunday 23 September 2001 05:09 pm, Cliff Woolley wrote:
> On Sun, 23 Sep 2001, Ryan Bloom wrote:
> > Mostly good.  I did notice a few ommissions that will be necessary.
> >
> > apr_status_t apr_bucket_list_init(int num_threads)
> >
> > This function creates the static array.  We need to tell them how many
> > threads will be creating pools.
>
> We'll also need the apr_bucket_alloc(int list) and apr_bucket_free
> functions, whatever they're called.  I left all these out for now because
> I just wanted to be sure that the other part was what you had had in mind.


OK.

> > I assume we'll be removing the apr_bucket_*create_in functions once
> > the code is finished.
>
> We can.  But if we're going to do that then let's forget all this #define
> business and just go ahead and change the API.  That saves us changing the
> names to _create_in and then changing them back.

Sounds good to me.

> > I am trying to decide if we should pass in a pool or an integer. I
> > think we need to at least associate a pool with each array slot.
> > That way, the buckets will always be allocated out of a pool, so that
> > they are guaranteed to be cleaned up properly.  The reason to not pass
> > a pool each time, is that we want to thread pool to be the pool
> > associated with array slot, not the request pool.
> >
> > So, I think we will also need:
> >
> > apr_status_t apr_bucket_list_pool(int num, apr_pool_t *p)
> >
> > Feel free to change that name.
>
> I've been debating this whole allocate-out-of-the-thread-pool thing.  If
> some request manages to generate 8192 1-byte buckets before freeing any of
> them, then we have forced the amount of allocated memory for that thread
> to be roughly 8192*80 ~= 640KB and to stay that way for as long as the
> thread lives.  I don't think that's what we want.  Perhaps if we created a
> subpool for each slot and allocated the buckets out of those?  We could
> then clean up the pool at certain intervals [once a connection, once every
> ten connections, it doesn't matter] and start over again... of course that
> would require some other API function to do it
> (apr_bucket_list_cleanup(int num) or something like that).  What do you
> think?
>
> Alternatively, of course (and I'm kind of leaning this way), we could just
> keep using malloc and basically just use the logic from the (former)
> blocks SMS, which after all was written expressly with the buckets in mind
> anyway.  It behaves a lot like pools except it gives you the free-list
> capabilities and already has the cleanup feature built in.  I know you
> really want everything to use the same allocator, of course...

Well, my initial thought was that we should allocate out of the pool.  However,
having thought about this more, it really doesn't matter.  We can very safely
allocate using malloc.  We should never call free though.  This basically works
on the same prinicple as the pools.  If a module decides to allocate 10 Meg
into the connection pool, then the connection pool for that thread is always 
10 Meg, because we clear that pool, but we never destroy it.  If a module wants
to implement buckets stupidely, then the module needs to be fixed.  We don't 
need
to design around that bug though.  Since we should only be free'ing when the 
thread
is dying, we should never free the buckets.

Now, thinking this through even more, there is an advantage to allocating out
of the thread pool.  The buckets are automatically free'd when the pool (thread)
goes away.  We also get the advantage that in perchild, when the thread dies,
all of the buckets for that thread are free'd immediately, without any special
logic.

> > I am not sure about using the value from apr_thread_current to do this.
> > That function just returns a tid from the OS, so it unlikely to be really
> > useful.  The core should just use the thread id, which can be retrieved
> > cleanly from the conn_rec->id field, using AP_CHILD_THREAD_FROM_ID.
>
> Okay, you've convinced me that we don't need this _list_find() thing.
> Therefore we also don't need the _create_in() stuff.  We can just expect
> the caller to pass us a number unique to that thread.  We can always give
> them the hint that apr_os_thread_current() is one way to create a number
> unique to each thread in the documentation, of course, though we should
> also mention that they probably just want to find that unique number once
> and cache its value somewhere.  Single-threaded apps can of course just
> use 0 or some other constant; it's better to let the caller decide that
> than for us to do it and base the decision on whether APR_HAS_THREADS or
> not.  For instance, in prefork we could always use a constant number, but
> apr-util has no way to know that prefork won't ever have buckets in more
> than one thread since APR is still threaded.

exactly.

> > That will be faster, because we don't need to call into the thread libs,
> > and it is more likely to work cleanly.
>
> Yep.  My intention (even when we were talking about using sms's) was
> always that this 'key' in whatever form it takes would get stashed in the
> conn_rec.  It's convenient that the id is already in there... though you
> can't quite use AP_CHILD_THREAD_FROM_ID, because that macro returns two
> numbers separated by a comma.  Just the second half of it is
> what we want (that's c->id%HARD_THREAD_LIMIT for mpmt MPM's, c->id for
> spmt MPM's, and 0 for prefork), but we'd have to make a new macro to get
> just the second half.

Or, we could just use the c->id % size_of_array.  Because we are going to
always create the array with the HARD_THREAD_LIMIT, we will be safe.
We will need to seriously document that, but conceptually it will work for non-
httpd programs, assuming they don't do something screwy.

> My only question is this: if we use a static array, then we expect the
> caller to pass us an index basically; a number between 0 and num_threads.
> Fine.  But what if num_threads isn't constant?  What if in some
> application it's hard to determine the index from what information you do
> have (thread ID) for some reason?  Something in the back of my mind tells
> me we might have to use a hash table instead of a static array so that we
> can account for cases like this, even though a hash table would be less
> efficient.  The extra benefit would be that the key the caller gives us
> can really be any value at all, as long as that value is unique to that
> thread.  They don't have to worry about shoehorning the value into the
> range 0..num_threads.  I fear that if we use a static array we're tying
> this too closely to httpd, and more than that to current MPM's for httpd.
> Thoughts?

By definition, every thread in an MPM can be identified by a unique ID.  If we
have a dynamic thread MPM (perchild), then we create the array with 
HARD_THREAD_LIMIT slices.  I don't think this is a problem, because the c->id
must be unique across processes, so as long as we key off of that value, we are
safe. In order to do that, we will waste a little bit of space by always 
allocating
HARD_THREAD_LIMIT slices in the array.

> PS: Why is conn_rec->id a long and not an APR type?

No reason to use an APR type.  We just a number that can store max_child_proc *
max_threads.  We could most likely use an int and be perfectly safe.

I think I answered all of the issues above.  If you have any questions, let me 
know, I'll be on-line most of the evening.

Ryan

______________________________________________________________
Ryan Bloom                              [EMAIL PROTECTED]
Covalent Technologies                   [EMAIL PROTECTED]
--------------------------------------------------------------

Re: bucket free lists (was Re: [PATCH] buckets sms phase 2)

Reply via email to