slotmem API notes

Chris Darroch Wed, 13 May 2009 21:36:27 -0700

Hi --

  Well, I should be working on mod_fcgid but I wanted to first
follow up on a long-ago promise to review the slotmem API by developing
a "test harness" framework and a slotmem provider that worked with
ZooKeeper as an experiment.


  To that end, mod_shmap now offers a simple "RESTful" interface to the
slotmem API, and mod_slotmem_zk and mod_zk together provide data storage
using ZooKeeper.  See http://people.apache.org/~chrisd/projects/shared_map/
and http://people.apache.org/~chrisd/projects/zookeeper/ for the
gory details.  (Note that the current version of mod_sharedmem in
trunk won't fully work with mod_shmap because of the work-in-progress
on the inuse flag array.  A slightly older version such as r773977
should work OK.)


  I have a few thoughts and concerns about the slotmem API as it
stands.  I'll try to divvy them up into sections below.  First, some
ideas about how to make the existing interface cleaner.

  Remove mod_slotmem (server/slotmem.c) and the associated ap_slotmem_*()
"wrapper" functions.  They do nothing that I can see except add an extra
layer of complexity.  I had no problems using the standard provider
interface to the slotmem providers.  Why would anyone not just do this?

  The wrappers also add confusion because the meanings of common
terms are changed.  For example, providers become "methods" (i.e.,
"storage methods"); the familiar ap_list_provider_names() becomes
ap_slotmem_methods().  But "methods" are what I expect the provider
to provide.  I suggest removing or renaming away any references
to "storage methods" in favour of conventional providers, instances, etc.

  So, define AP_SLOTMEM_PROVIDER_GROUP as "slotmem" and
AP_SLOTMEM_PROVIDER_VERSION as "0".  Remove AP_SLOTMEM_STORAGE.

  Rename mod_sharedmem to mod_slotmem_shm and mod_plainmem to
mod_slotmem_plain.

  Rename ap_slotmem_storage_method to ap_slotmem_provider_t.

  Rename ap_slotmem_t to ap_slotmem_instance_t.

  Remove the "slotmem_" prefix from method names (i.e., the names
of the functions provided by a provider).  If I've got a slotmem
provider, it's just extra typing to have to call
slotmem->slotmem_create() instead of slotmem->create().


  Next, there are several problems I see with the apslotmem_type
enum at the moment.  For one thing, there's only one enumerated value,
SLOTMEM_PERSIST, so there's no way to specify "don't persist".
I hacked around that by passing 1 in mod_shmap.

  Also, if we retain this type, I'd suggest renaming it to a
slightly more conventional looking ap_slotmem_type_t.

  But my personal preference would be to remove it and instead
add an unsigned int flags field to ap_slotmem_provider_t, and
define some flag values, such as AP_SLOTMEM_FLAG_NOTMPSAFE = 0x0001
(read "not MP-safe") and AP_SLOTMEM_FLAG_PERSIST = 0x0002.

  While enumerated types are clean if you've got a long list of
mutually exclusive values, a flags field follows the existing socache
framework, and allows for a number of mutually non-exclusive flags to
be set.  This is actually closer to what I suspect future development
might bring.


  The mention of the socache API brings me to my next set of
points, which proceed from the experience of programming with both
APIs and from a strong feeling that the two have a great deal in
common.  That's where I'm starting from -- specifically a sense
that the two APIs should mirror each other closely.  (Obviously
not everyone might agree with that.)

  To that end, having a AP_SLOTMEM_FLAG_NOTMPSAFE flag would
make the two APIs align on the issue of determining when a caller
needs to take special care in a multi-processing context.

  It would also allow us to remove the global mutex from
mod_sharedmem.  That would simplify maintenance a good deal.

  It would also clear up some things I find odd at the moment;
for instance, why is there locking in the do() method but not
the put() method?  (I'm also not 100% sure the locking in the
create() method will be very valuable.  For example, when
APR_USE_SHMGET is true, apr_shm_remove() will call shmctl()
to mark the segment for destruction once all processes have
detacted from it.  So locking around apr_shm_remove() and
apr_shm_create() doesn't help the case where another process,
which has already attached to the old segment, just keeps on
using that old segment after the lock is released.)

  At any rate, moving responsibility for locking up to the
caller level, as the socache API does, I think makes a lot of
sense.  It means that a caller running in a single-process,
single-threaded context can simply choose not to add the overhead
of a global lock.  Other callers can make their own decisions
about what kind of locking they need.


  Getting to the end now ... next up, a few comments on the
provider methods and their arguments.

  It would be great to pass server_rec* and apr_pool_t* arguments
to all the methods (except maybe num_slots() and slot_size()).
Some providers may need to report error messages, and to do that
they really need a server_rec*.  They may also need to allocate
data from a temporary pool.  The most obvious use case is that
r->pool would be passed to a method like get() or put().

  I'd also love to see create() split into two methods, create()
and init().  The create() method is then responsible for allocating
the ap_slotmem_instance_t structure and filling in the arguments;
it's also assumed that it will do any argument checks that are
required.  To this end, the typical use case is to call it in
the *first* invocation of the post_config hook, and pass the
pconf memory pool.  Note that this implies that create() takes
two apr_pool_t arguments, one for allocating "permanent" structures
which will last the lifetime of the instance, and one for temporary
allocations.  One would normally pass pconf and ptemp from the
post_config hook.

  The init() method then does all the work of actually setting
up the provider's data storage.  The typical use case would
be to call this from the *second* or subsequent invocations of
the post_config hook.  This is how the mod_socache_shmcb provider
functions and like mod_sharedmem is allocates shared memory segments.

  I'd suggest removing both the mem() and attach() methods.

  The mem() method assumes that all providers will return a
slab of memory, and that writes to this memory are equivalent to
a put().  That's not going to be possible unless all providers
deal with memory -- in which case, the only two possible providers
that I can see are the "plain" and shared memory providers we have,
and then the purpose of a whole API and provider framework is lost
on me.  The mod_slotmem_zk provider I wrote, for example, just has
to return APR_ENOTIMPL for mem() because there's really nothing else
it can do.  Similarly any provider which uses a DB file or similar
backing store isn't going to be able to make mem() available.
Better, I think, to drop the method since it's so deeply tied to
an in-memory provider model.

  The attach() method is similarly tied to an in-memory provider
model, and in fact, even more tied to the notion of shared memory,
I think.  From the code in mod_plainmem and mod_sharedmem, moreover,
it looks to me like just calling create() will have virtually the
same effect as attach() -- if a existing segment is available with
the same name, it'll be found and used.  If we really need the
ability to call create() but have it stop short of actually creating
a new segment if no pre-existing one is found, let's add an argument
to create() that specifies this behaviour.  But my guess would be
that there will be few uses for attach() that aren't satisfied by
create() as it stands now.

  The ap_slotmem_callback_fn_t function type should, I think, be
modified not to take a void *mem argument in its signature, but
instead a slot id (i.e., an unsigned int).  Either that or we should
remove the do() method.  This is again because the existing design
assumes that all providers can provide a slab of memory which
the function can operate on.  (I suppose there is a third option
here, in which non-in-memory providers implement do() by allocating
a temporary slab of memory and then, for each in-use slot, performing
a get(), the callback function, and a put().  This obviously would
require an apr_pool_t to be passed to do(), but I'd recommend that
anyway.)

  I'd also suggest putting num_items and item_size elements
into the ap_slotmem_provider_t structure, and then we can remove
the num_items() and item_size() "getter" methods.  That seems
equally clean and slightly simpler to me.

  I'd add a destroy() method that tears down an instance, and
expect callers to use it appropriately.  That means the caller
likely needs to register a cleanup function whenever a successful
create() is performed on a slotmem provider.  This cleanup function
then should call destroy() on the returned slotmem instance.  The
advantage here is that the providers don't need to register anything,
which simplifies their code, and it gives the caller full control
over when the slotmem instance should be destroyed.

  Last but not least on the method front, I'd suggest adding
set() and reset() methods.  The set() method would work roughly like
memset() -- that is, it would accept a byte value and write that
into every byte in the slot.

  The reset() method would do the same as set() with a zero
byte value (i.e., blank the memory) but would also "release"
the slot.  The proposed grab() and return() methods would be
unnecessary.  A provider which wanted to track which slots were
"in use" would simply set the in-use flag when put(), set(), or
get() was called, and reset the flag when reset() was called.


  Finally, just in terms of code simplicity, I noticed that
the mod_plainmem and mod_sharedmem modules both have loops like this:

       if (next) {
           for (;;) {
               ...
               if (!next->next)
                   break;
               next = next->next;
           }
       }

which I think might be reduced to:

       while (next) {
           ...
           next = next->next;
       }

  I'd also be inclined to remove the sharedmem_getstorage(),
sharedmem_initgpool(), and sharedmem_initialize_cleanup() functions,
and just in-line their contents in the places they are used.
All are used just one time each that I can see.

Chris.

--
GPG Key ID: 366A375B
GPG Key Fingerprint: 485E 5041 17E1 E2BB C263  E4DE C8E3 FA36 366A 375B

slotmem API notes

Reply via email to