On Wed, Feb 26, 2003 at 04:36:04PM +1100, Stas Bekman wrote:
The problem:
ithreads-enabled Perl cannot share nested (Perl/C) datastructures. If it could,
the issue of dbi-pool would be a no-brainer, since all we would need to do
is $dbh ||= connect();
(There's also tied magic to consider. A DBI handle is more than a nested datastructure. But you know that already.)
That was my first solution which didn't work. I was trying to share the inner handle as is.
Final (semi-working) solution:
I'm really glad you're doing this Stas, and your timing is good as I'm actively working on the DBI again now.
Great!
Before we dig into the fine implementation details I'd like to review the high-level concepts first to make sure we're "on the same page".
We definitely are. The only difference between your following description and my current implementation is that I was trying to provide a temporary external solution which doesn't mess with DBI directly. And once it's working it can be integrated into DBI's API. Notice that mine works (to a degree of "works") without changing a single bit in DBI ;)
As discussed with Tim and Hugo back at TPC (Jul 2002), the only possible solution is to share only the driver's private part of dbh.
A DBI handle is made up of four parts:
1. The outer handle (ref to tied hash).
2. The inner handle (ref to plain hash holding attribute cache).
3. The 'implementors data' (attached by 'magic' to the inner hash).
4. The 'DBI common data' at the start of the 'implementors data'.
The 'DBI common data' holds various flags (like RaiseError, AutoCommit etc), pointers (like the stash of the driver class), and other info like the current recursive call_depth and how many kids the handle has. There's also a bunch of pointers to certain attribute values that are just there for performance.
[Currently the DBI common data is embedded directly into the implementors data structure rather than being pointed to by a pointer in the implementors data structure. I've had that penciled in as something to change for a v2.x DBI. It would probably make your life simpler.]
The 'implementors data' (after the DBI common data at the start) holds data that's private to the driver, like pointers to the database API objects like connection handles etc.
That's exactly what I'm sharing. Though we have to agree that if the DBD chooses to include anything that has to do with Perl (SVs, references to my_perl, etc) this DBD can't participate in the world domination scheme. ;)
Now, back to the threads... as I recall, the idea we worked out was that when a new handle is being created it could be told to copy/share the implementors data from some other handle instead of initializing its own. Something like:
my $dbh2 = DBI->connect($dsn, $user, $pass, { CloneHandle => $dbh1 });
The effect would be that $dbh2 would be a completely new handle in all respects except that it would not actually have issued a database API connect call, it would have just copied the implementors data from $dbh1.
[In practice we may pass some other attribute name with some other value that isn't an actual $dbh but something extracted from the $dbh, possibly in another thread. Something like:
my $internal_id = $dbh1->internal_id; # in thread 1
my $dbh2 = DBI->connect(..., { CloneInternal => $internal_id }); # in thread 2
But I wanted to keep the initial example simple.]
As it happens, a very similar concept has already been implemented by Gerald Richter in DBD::Oracle 1.13 (not yet released, sadly). It works like this:
our $orashr : shared = '' ;
$dbh = DBI->connect($dsn, $user, $passwd, { ora_dbh_share => \$orashr }) ;
The first connect sees $orashr as false and so does a proper connection and then sets $orashr to a copy of the implementors data structure. Subsequent connects see $orashr set and initialise their own implementors data structure from $orashr.
That seems to work for active concurrent sharing across threads but may not fit well into a pool model (and may not be thread safe for some drivers).
That's pretty much what my code does. We use locking when borrowing from the pool and putting items back, so I don't see where this could be thread unsafe.
I think of a pool as something that stores things while they're not being used and can 'loan them out' to be used for a while before then being returned to the pool. While an item is 'out' it can't be given out to any other requestor.
I think this is the model we spoke about at TPC. Where the pool holds database connections that are then loaned out for use by a request before being returned. If two threads both request connections at the same time then the pool will grow to have two connections.
Yes, but my current API is doing: if there are available connections in the pool, use them. Otherwise do a normal login and store. This is different from the generic pool concept, where you have a grow() function which creates new items when there is a shortage. But it works all the same.
In this model a connection is only ever used by a single thread at any one time. This makes it much safer and more widely useable across drivers because the underlying database API does not need to be thread safe *in its handling of multiple threads using a single connection concurrently*. Oracle is, recent mysql might be, but I doubt many others are.
That's absolutely transparent to a driver. If $dbh ||= connect works with a DBD in question, the pool scheme should work too.
So, in this scenario we want to allow a handle to 'loan out' use of its implementors data and for another handle to be created and initialised to use that 'borrowed' implementors data before finally returning it.
Let's look at the 'loan out' first. We need a method to get some value that represents (points to) the implementors data, let's call it the ID, and also puts the handle into a 'brain dead' state. For example:
my $id = $h->borrow_id;
And another method to say the implementors data is no longer being used elsewhere and clear the 'brain dead' flag. (The flag will be used to prevent the handle being used to do anything while it's brain dead.) For example:
$h->restore_id;
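The 'brain dead' flag described above can be pictured with a small C sketch. Everything here is hypothetical: demo_handle_t, demo_borrow_id, demo_restore_id and demo_handle_usable are illustrative stand-ins and not part of DBI, whose real handle layout is far richer. The sketch only shows the state change: borrowing hands out the pointer and disables the lending handle, and restoring re-enables it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a handle; not DBI's real structure. */
typedef struct {
    void *imp_data;   /* the 'implementors data' being loaned out */
    int   brain_dead; /* non-zero while the data is on loan */
} demo_handle_t;

/* Loan out the implementors data and mark the handle brain dead.
 * Returns NULL if the data is already on loan. */
void *demo_borrow_id(demo_handle_t *h)
{
    assert(h != NULL);
    if (h->brain_dead)
        return NULL;
    h->brain_dead = 1;
    return h->imp_data;
}

/* The borrower is done: clear the brain dead flag. */
void demo_restore_id(demo_handle_t *h)
{
    assert(h != NULL);
    h->brain_dead = 0;
}

/* Every other operation on the handle would check this first. */
int demo_handle_usable(const demo_handle_t *h)
{
    assert(h != NULL);
    return !h->brain_dead;
}
```

A second borrow attempt returns NULL until the first loan has been restored, which is the "only one user at a time" property the pool model relies on.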
Well, currently I simply identify the objects by their memory address. So when the object is borrowed it's moved from the free list to the busy list. When it's returned the busy list is searched, the matching item is removed and moved to the free list.
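As a rough illustration of the free-list/busy-list bookkeeping just described, here is a minimal C sketch. It is not the code under discussion: pool_t, POOL_MAX and the pool_* functions are hypothetical names, the capacity is fixed for brevity, and a pthread mutex stands in for whatever locking the real pool uses. Items are identified purely by their memory address.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POOL_MAX 8 /* hypothetical fixed capacity, for brevity */

typedef struct {
    void  *free_items[POOL_MAX];
    size_t n_free;
    void  *busy_items[POOL_MAX];
    size_t n_busy;
    pthread_mutex_t lock;
} pool_t;

void pool_init(pool_t *p)
{
    assert(p != NULL);
    p->n_free = 0;
    p->n_busy = 0;
    pthread_mutex_init(&p->lock, NULL);
}

/* Move one item from the free list to the busy list.
 * Returns NULL when nothing is free; the caller then does a
 * normal login and registers the result with pool_store_busy(). */
void *pool_borrow(pool_t *p)
{
    void *item = NULL;
    pthread_mutex_lock(&p->lock);
    if (p->n_free > 0) {
        item = p->free_items[--p->n_free];
        p->busy_items[p->n_busy++] = item;
    }
    pthread_mutex_unlock(&p->lock);
    return item;
}

/* Register a freshly created item as busy. Returns 0, or -1 if full. */
int pool_store_busy(pool_t *p, void *item)
{
    int rc = -1;
    pthread_mutex_lock(&p->lock);
    if (p->n_free + p->n_busy < POOL_MAX) {
        p->busy_items[p->n_busy++] = item;
        rc = 0;
    }
    pthread_mutex_unlock(&p->lock);
    return rc;
}

/* Search the busy list by address and move the match to the free list.
 * Returns 0 on success, -1 if the item was not on loan. */
int pool_give_back(pool_t *p, void *item)
{
    int rc = -1;
    size_t i;
    pthread_mutex_lock(&p->lock);
    for (i = 0; i < p->n_busy; i++) {
        if (p->busy_items[i] == item) {
            p->busy_items[i] = p->busy_items[--p->n_busy];
            p->free_items[p->n_free++] = item;
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&p->lock);
    return rc;
}
```

Because the total of free and busy items never exceeds POOL_MAX, neither array can overflow. In an unthreaded build the lock and unlock calls could be compiled out as no-ops.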
On the borrowers side we need to pass in the borrowed ID to the
connect() call so the driver can use it. For example:
my $dbh = DBI->connect($dsn, $user, $pass, { UseID => $id });
Nuh, it can be transparent for the user API. Just lock the pool, check whether there are free items, borrow if any or do a normal login and store otherwise. I don't think the top level connect() API should change at all, other than finding a way to turn the pooling on/off.
For safety and simplicity it would seem best to copy the implementors data structure and overwrite anything that needs overwriting.
Yes, but the problem is that when $dbh is destroyed the private implementation is destroyed as well. So no matter if you copy it or not, if we have no way to prevent the destruction/cleanup of the implementors data, we are in trouble. That's one of the problems that I've stumbled on.
I don't think anything else is essential to the design but I could easily have missed lots of issues.
There are a few details to figure out, but they are irrelevant at this point. Once we get the core working (not the API, but the safe storage/retrieval/reuse) we can work out these things. YMMV of course, since you are looking at the big picture ;)
(It would be very nice if we could find some way to automatically restore the id if the thread that borrowed it died or exits without returning it.)
We could have a garbage collector. Simply store in the pool the id of the thread which has borrowed the item (inside the item). Then once in a while scan the busy list and free any items whose borrowing threads are no longer alive.
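A sketch of that garbage-collection pass, in C, might look like the following. It is only an illustration under stated assumptions: gc_pool_t, busy_entry_t, pool_reap and the alive_fn callback are hypothetical names, the question of how to tell whether a thread is still alive is delegated to a caller-supplied predicate, and the caller is assumed to hold the pool lock during the scan. demo_only_tid1_alive is a stand-in predicate for demonstration only.

```c
#include <assert.h>
#include <stddef.h>

#define BUSY_MAX 8 /* hypothetical fixed capacity */

typedef struct {
    void         *item;      /* the pooled data */
    unsigned long owner_tid; /* id of the thread that borrowed it */
} busy_entry_t;

typedef struct {
    busy_entry_t busy[BUSY_MAX];
    size_t       n_busy;
    void        *free_items[BUSY_MAX];
    size_t       n_free;
} gc_pool_t;

/* Caller-supplied predicate: non-zero if the thread is still alive. */
typedef int (*alive_fn)(unsigned long tid);

/* Scan the busy list and move every item whose borrower is gone
 * back to the free list. Returns the number of items reclaimed.
 * Assumes the caller already holds the pool lock. */
size_t pool_reap(gc_pool_t *p, alive_fn is_alive)
{
    size_t reclaimed = 0;
    size_t i = 0;
    assert(p != NULL && is_alive != NULL);
    while (i < p->n_busy) {
        if (!is_alive(p->busy[i].owner_tid) && p->n_free < BUSY_MAX) {
            p->free_items[p->n_free++] = p->busy[i].item;
            /* fill the hole with the last entry, then re-check slot i */
            p->busy[i] = p->busy[--p->n_busy];
            reclaimed++;
        } else {
            i++;
        }
    }
    return reclaimed;
}

/* Stand-in predicate for demonstration: pretend only thread 1 is alive. */
int demo_only_tid1_alive(unsigned long tid)
{
    return tid == 1;
}
```

Keeping the liveness check behind a callback sidesteps the fact that C has no portable way to ask whether another thread has exited; a Perl-level pool would presumably consult the interpreter's own thread list instead.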
Nothing in the design requires the use of threads. The borrowing, using, and restoring of implementors data can be done between handles in the same thread or an unthreaded perl.
That's correct. The only issue here is to have NOOPs for locking and sharing, but this is trivial.
These are the open issues:
2. I need support from DBI to help me access the *really* private data in struct imp_dbh_st, because the following is a hack:
D_imp_dbh(dbh);
imp_dbh->mysql = ((imp_dbh_t *)imp_dbh_new)->mysql;
Perhaps, but why exactly are you calling it a hack?
because it calls ->mysql.
1. it's a hardcoded DBD::mysql call, which shouldn't be there.
2. since the DBD is free to have any kind of imp_dbh_st, it may have more than one record. In fact DBD::mysql has ->mysql and ->has_transactions. Which I didn't bother to update, but it could be 10 entries in some other driver.
Instead it should be:
UPDATE_PRIVATE_DATA(imp_dbh, imp_dbh_new);
or something like that and DBI will know what the private data is. So it should overwrite everything but the common ->com entry. Because that can't be re-used. At least because of the h_perl thing. I've tried to fix that, but didn't succeed, mainly because it includes perl data in it (SVs and such).
When I re-install the stored dbh,
What do you mean by 're-install' here?
see above.
I must not break the ->com structure, but overwrite the rest. So I guess the right approach is to copy away the original ->com, overwrite the whole imp_dbh and then copy back the original ->com. Also I'd prefer to store in the pool only the really private data. I guess all I need is to know the size of the ->com struct with its sub-structs, preferably at compile time.
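The save/overwrite/restore approach described above can be sketched in plain C. The types here are stand-ins, not DBI's: demo_com_t plays the role of the common ->com data and demo_imp_dbh_t plays the role of a driver's imp_dbh_st, under the assumption (stated in the thread) that the common data sits at the very start of the structure. DEMO_PRIVATE_SIZE shows how the size of the private part could be derived at compile time with sizeof.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the DBI common data; not the real layout. */
typedef struct {
    unsigned int flags;
    int          call_depth;
    int          kids;
} demo_com_t;

/* Stand-in for a driver's imp_dbh_st: common data first,
 * driver-private members after it. */
typedef struct {
    demo_com_t com;
    void      *conn;            /* e.g. a database API connection */
    int        has_transactions;
} demo_imp_dbh_t;

/* Bytes following the common data, known at compile time. */
enum { DEMO_PRIVATE_SIZE = sizeof(demo_imp_dbh_t) - sizeof(demo_com_t) };

/* Copy away dst's own com, overwrite everything from src,
 * then put the original com back. */
void demo_update_private(demo_imp_dbh_t *dst, const demo_imp_dbh_t *src)
{
    demo_com_t saved;
    assert(dst != NULL && src != NULL && dst != src);
    saved = dst->com;
    memcpy(dst, src, sizeof *dst);
    dst->com = saved;
}

/* Variant: touch only the private tail and leave com alone.
 * This is also the region a pool would need to store. */
void demo_update_private_tail(demo_imp_dbh_t *dst, const demo_imp_dbh_t *src)
{
    assert(dst != NULL && src != NULL && dst != src);
    memcpy((char *)dst + sizeof(demo_com_t),
           (const char *)src + sizeof(demo_com_t),
           DEMO_PRIVATE_SIZE);
}
```

Both variants copy every private member without naming any of them, which is the point of the proposed UPDATE_PRIVATE_DATA: the copy no longer needs to know about ->mysql or ->has_transactions. Neither variant helps if the private part itself holds Perl data.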
If the \%attr containing { UseID => $id } is passed into DBI::_new_dbh(...) by the driver then the DBI can look after copying the given implementors data into the new handle's implementors data structure that it's setting up.
That's the method I'm after.
If it also sets a 'HAS_COPIED_ID' flag then all the driver has to do in its _login sub is check for the flag and, if set, skip almost all of the normal connection setup.
it's more than that. We need a way to tell the driver not to destroy its private data.
3. $dbh->DESTROY. Currently I had to: SvREFCNT_inc(dbh); so imp_dbh won't lose its data when $dbh goes out of scope. I have tried copying it but wasn't very successful. Playing with DBIc_FLAGS(imp_dbh) didn't help either, but that's probably because I'm not very familiar with DBI guts. Your help is needed here.
I'm not sure what you're doing here and it may not be relevant to the model I've outlined above.
it's very relevant ;) I had to do that to prevent the destruction of the private data when $dbh was going out of scope.
5. Finally, the most important issue is that if a thread logged in for real and created imp_dbh, it must not exit while other threads use the same data.
Must not exit because the other threads are *pointing* to its implementors data rather than using a copy? (And if it exits the memory will be freed.)
something like that. It's not about pointing. It's about freeing.
thread A allocates imp_dbh, finishes with it and puts it into the pool.
Now, when it exits, chances are that the data in the pool may become invalid. I haven't debugged this issue yet, but it seems like that's what's happening. What we need to do is to ensure that the data in the pool can survive the exit of the threads that created that data in the first place. I did try to copy the data but the result was the same.
First I want to figure out how to tell the driver not to destroy its private data when $dbh is destroyed. Then I'll be able to pursue this threads-exit issue.
Take a look at the code. It's pretty much trivial.
__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:[EMAIL PROTECTED] http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com
