On Wed, Feb 26, 2003 at 04:36:04PM +1100, Stas Bekman wrote:
> The problem:
> 
> ithreads-enabled Perl cannot share nested (Perl/C) datastructures. If it
> could, the issue of dbi-pool would be a no-brainer, since all we would need
> to do is $dbh ||= connect();

(There's also tied magic to consider. A DBI handle is more than a
nested datastructure. But you know that already.)

> Final (semi-working) solution:

I'm really glad you're doing this Stas, and your timing is good as I'm
actively working on the DBI again now.

Before we dig into the fine implementation details I'd like to
review the high-level concepts first to make sure we're "on the
same page".

> as discussed with Tim and Hugo back at TPC (Jul 2002), the only
> possible solution is to share only the driver's private part of dbh.

A DBI handle is made up of four parts:

  The outer handle (ref to tied hash).
  The inner handle (ref to plain hash holding attribute cache).
  The 'implementors data' (attached by 'magic' to the inner hash).
  The 'DBI common data' at the start of the 'implementors data'.

The 'DBI common data' holds various flags (like RaiseError, AutoCommit
etc), pointers (like the stash of the driver class), and other info
like the current recursive call_depth and how many kids the handle has.
And there's also a bunch of pointers to certain attribute values that
are just there for performance.

[Currently the DBI common data is embedded directly into the
implementors data structure rather than being pointed to by a pointer
in the implementors data structure.  I've had that penciled in as
something to change for a v2.x DBI. It would probably make your
life simpler.]

The 'implementors data' (after the DBI common data at the start) holds
data that's private to the driver, like pointers to the database
API objects like connection handles etc.
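A minimal C sketch of that layout, assuming a heavily simplified model: the struct and field names below (dbi_common_sketch_t, imp_dbh_sketch_t) are hypothetical and are not the DBI's real definitions. It shows the common data embedded as the first member of the implementors data, with the driver-private fields following it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the 'DBI common data'.
 * The real DBI structure has many more fields. */
typedef struct {
    unsigned int flags;      /* e.g. RaiseError and AutoCommit bits */
    int          call_depth; /* current recursive call depth */
    int          kids;       /* number of child handles */
} dbi_common_sketch_t;

/* Hypothetical driver 'implementors data': the common data is embedded
 * as the FIRST member, followed by the driver-private fields. */
typedef struct {
    dbi_common_sketch_t com;            /* must come first */
    void               *conn;           /* e.g. a database API connection */
    int                 server_version; /* other driver-private state */
} imp_dbh_sketch_t;

/* Offset at which the driver-private part begins (may include padding). */
static size_t private_offset(void)
{
    return offsetof(imp_dbh_sketch_t, conn);
}
```

Because the common data is the first member, a pointer to the whole structure also points to it. The private part may start after a padding gap, so it is the offset of the first private field, not the size of the common struct, that marks where the private data begins.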

Now, back to the threads... as I recall, the idea we worked out was
that when a new handle is being created it could be told to copy/share
the implementors data from some other handle instead of initializing
its own. Something like:

  my $dbh2 = DBI->connect($dsn, $user, $pass, { CloneHandle => $dbh1 });

The effect would be that $dbh2 would be a completely new handle in
all respects except that it would not actually have issued a database
API connect call, it would have just copied the implementors data
from $dbh1.
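A minimal C sketch of those semantics, assuming a toy model where the implementors data is just a connection id and the database API connect call is simulated. The names connect_sketch() and fake_api_connect() are hypothetical and not part of the DBI. The new handle always gets fresh common data; only the private part is copied from the source handle, and no API connect is issued for it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified model, not the real DBI structures. */
typedef struct { unsigned int flags; int kids; } com_t;
typedef struct { com_t com; int conn_id; } imp_t;

static int api_connect_calls = 0;  /* how many simulated API connects ran */
static int next_conn_id = 100;

/* Simulated database API connect call. */
static int fake_api_connect(void)
{
    api_connect_calls++;
    return next_conn_id++;
}

/* connect() with an optional CloneHandle-style source handle. */
static void connect_sketch(imp_t *out, const imp_t *clone_from)
{
    assert(out != NULL);
    memset(&out->com, 0, sizeof out->com);  /* completely new handle state */
    if (clone_from != NULL)
        out->conn_id = clone_from->conn_id; /* copy implementors data only */
    else
        out->conn_id = fake_api_connect();  /* real connection */
}
```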

[In practice we may pass some other attribute name with some other
value that isn't an actual $dbh but something extracted from the
$dbh, possibly in another thread. Something like:
  my $internal_id = $dbh1->internal_id; # in thread 1
  my $dbh2 = DBI->connect(..., { CloneInternal => $internal_id }); # in thread 2
But I wanted to keep the initial example simple.
]

As it happens, a very similar concept has already been implemented
by Gerald Richter in DBD::Oracle 1.13 (not yet released, sadly).
It works like this:

  our $orashr : shared = '' ;
 
  $dbh = DBI->connect($dsn, $user, $passwd, { ora_dbh_share => \$orashr }) ;

The first connect sees $orashr as false and so does a proper connection
and then sets $orashr to a copy of the implementors data structure.
Subsequent connects see $orashr set and initialise their own
implementors data structure from $orashr.
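The same flow as a minimal C sketch, assuming a static slot stands in for the shared Perl scalar $orashr and that the private data is a single connection id. connect_with_share() and real_connect() are hypothetical names. A threaded version would also need to lock the slot while checking and filling it.

```c
#include <assert.h>

/* Hypothetical private data: just a simulated connection id. */
typedef struct { int conn_id; } priv_t;

/* Stand-in for the shared scalar $orashr: 'set' is false until the
 * first connect publishes its private data. */
static struct { int set; priv_t data; } share_slot = { 0, { 0 } };

static int api_connect_calls = 0;

/* Simulated real database API connection. */
static priv_t real_connect(void)
{
    priv_t p;
    p.conn_id = 42 + api_connect_calls++;
    return p;
}

/* The first caller connects for real and stores a copy in the slot;
 * later callers initialise their own copy from the slot instead.
 * A threaded version would need to lock share_slot here. */
static priv_t connect_with_share(void)
{
    if (!share_slot.set) {
        share_slot.data = real_connect();
        share_slot.set = 1;
    }
    return share_slot.data;  /* returned by value, i.e. a copy */
}
```

Both callers end up with copies that refer to the same underlying connection, which is why this suits concurrent sharing only where the database API tolerates it.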

That seems to work for active concurrent sharing across threads but
may not fit well into a pool model (and may not be thread safe for
some drivers).

I think of a pool as something that stored things while they're
not being used and can 'loan them out' to be used for a while before
then being returned to the pool. While an item is 'out' it can't be
used given out to any other requestor.

I think this is the model we spoke about at TPC, where the pool
holds database connections that are then loaned out for use by a
request before being returned. If two threads both request connections
at the same time then the pool will grow to have two connections.
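A minimal single-threaded C sketch of that loan model, assuming a fixed-size array of entries and simulated connections. pool_checkout() and pool_checkin() are hypothetical names, and a real pool shared between threads would need a lock around both functions. An idle connection is reused when one exists; otherwise the pool grows by one.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_MAX 8

/* Hypothetical pool entry: a connection plus an 'on loan' flag. */
typedef struct {
    int  conn_id;
    bool on_loan;
} pool_entry_t;

typedef struct {
    pool_entry_t entries[POOL_MAX];
    int          size;          /* connections created so far */
    int          next_conn_id;  /* used to simulate new connections */
} pool_t;

/* Loan out an idle connection, growing the pool when none is idle.
 * Returns the entry index, or -1 if the pool is full. */
static int pool_checkout(pool_t *p)
{
    for (int i = 0; i < p->size; i++) {
        if (!p->entries[i].on_loan) {
            p->entries[i].on_loan = true;
            return i;
        }
    }
    if (p->size == POOL_MAX)
        return -1;
    int i = p->size++;
    p->entries[i].conn_id = p->next_conn_id++;  /* simulated new connection */
    p->entries[i].on_loan = true;
    return i;
}

/* Return a loaned connection so it can be given out again. */
static void pool_checkin(pool_t *p, int i)
{
    assert(i >= 0 && i < p->size && p->entries[i].on_loan);
    p->entries[i].on_loan = false;
}
```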

In this model a connection is only ever used by a single thread at
any one time. This makes it much safer and more widely useable across
drivers because the underlying database API does not need to be
thread safe *in its handling of multiple threads using a single
connection concurrently*. Oracle is, recent mysql might be, but I
doubt many others are.

So, in this scenario we want to allow a handle to 'loan out'
use of its implementors data and for another handle to be created
and initialised to use that 'borrowed' implementors data before
finally returning it.

Let's look at the 'loan out' first. We need a method that returns some
value that represents (points to) the implementors data, let's call it
the ID, and also puts the handle into a 'brain dead' state. For example:
        my $id = $h->borrow_id;
And another method to say the implementors data is no longer being
used elsewhere and clear the 'brain dead' flag. (The flag will be
used to prevent the handle being used to do anything while it's
brain dead.) For example:
        $h->restore_id;
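A minimal C sketch of the proposed borrow_id/restore_id pair, assuming the implementors data is reduced to a connection id and the ID is simply a pointer to it. The C function names mirror the proposed method names but are hypothetical, as is do_query(), which stands in for any normal method that must refuse to run while the handle is brain dead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical handle: implementors data plus a 'brain dead' flag. */
typedef struct {
    int  conn_id;      /* stands in for the implementors data */
    bool brain_dead;   /* set while the implementors data is on loan */
} handle_t;

/* Like the proposed $h->borrow_id: hand out a value identifying the
 * implementors data and mark the handle as unusable. */
static const int *borrow_id(handle_t *h)
{
    assert(!h->brain_dead);  /* can't lend it out twice */
    h->brain_dead = true;
    return &h->conn_id;
}

/* Like the proposed $h->restore_id: the data is no longer used elsewhere. */
static void restore_id(handle_t *h)
{
    assert(h->brain_dead);
    h->brain_dead = false;
}

/* Any normal method checks the flag first; returns -1 if refused. */
static int do_query(const handle_t *h)
{
    if (h->brain_dead)
        return -1;           /* handle is brain dead: refuse */
    return h->conn_id;       /* pretend to use the connection */
}
```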

On the borrowers side we need to pass in the borrowed ID to the
connect() call so the driver can use it. For example:
  my $dbh = DBI->connect($dsn, $user, $pass, { UseID => $id });

For safety and simplicity it would seem best to copy the implementors
data structure and overwrite anything that needs overwriting.

I don't think anything else is essential to the design but I could
easily have missed lots of issues.

(It would be very nice if we could find some way to automatically
restore the id if the thread that borrowed it dies or exits
without returning it.)

Nothing in the design requires the use of threads.  The borrowing,
using, and restoring of implementors data can be done between handles
in the same thread or an unthreaded perl.


> These are the open issues:
> 
> 2. I need support from DBI to help me access the *really* private
>    data in struct imp_dbh_st, because the following is a hack:
> 
>     D_imp_dbh(dbh);
>     imp_dbh->mysql = ((imp_dbh_t *)imp_dbh_new)->mysql;

Perhaps, but why exactly are you calling it a hack?

>    When I re-install the stored dbh,

What do you mean by 're-install' here?

>    I must not break the ->com
>    structure, but overwrite the rest. So I guess the right approach is
>    to copy away the original ->com, overwrite the whole imp_dbh and
>    then copy back the original ->com. Also I'd prefer to store in the
>    pool only the really private data. I guess all I need is to know
>    the size of ->com struct with its sub-structs, preferably at
>    compile time.

If the \%attr containing { UseID => $id } is passed into DBI::_new_dbh(...)
by the driver then the DBI can look after copying the given implementors
data into the new handle's implementors data structure that it's setting up.
If it also sets a 'HAS_COPIED_ID' flag then all the driver has to
do in its _login sub is check for the flag and, if set, skip almost all
of the normal connection setup.
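A minimal C sketch of that division of labour, assuming a toy model: copy_borrowed() plays the DBI's part in _new_dbh and login_sketch() plays the driver's _login. Both names, and the HAS_COPIED_ID bit value, are hypothetical. The copy saves the new handle's own common data, overwrites the whole structure from the borrowed one, and then puts the common data back, which is the approach described in point 2 above, without needing to know where the private part starts.

```c
#include <assert.h>
#include <string.h>

#define HAS_COPIED_ID 0x1u  /* hypothetical flag bit, not a real DBI flag */

/* Hypothetical simplified common data and implementors data. */
typedef struct { unsigned int flags; int call_depth; } com_t;
typedef struct { com_t com; int conn_id; } imp_sketch_t;

static int api_connect_calls = 0;

/* DBI side: initialise a new handle from borrowed implementors data
 * while keeping the new handle's own common data. */
static void copy_borrowed(imp_sketch_t *dst, const imp_sketch_t *src)
{
    com_t saved = dst->com;          /* copy away the new handle's com */
    memcpy(dst, src, sizeof *dst);   /* overwrite everything (shallow copy) */
    dst->com = saved;                /* put the original com back */
    dst->com.flags |= HAS_COPIED_ID;
}

/* Driver side: _login skips the real connection if data was copied. */
static void login_sketch(imp_sketch_t *imp)
{
    if (imp->com.flags & HAS_COPIED_ID)
        return;                      /* already set up from the borrowed data */
    api_connect_calls++;             /* simulated database API connect */
    imp->conn_id = 500 + api_connect_calls;
}
```

This is a shallow copy: any pointers in the private part, such as a connection handle, still refer to the lender's objects.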


> 3. $dbh->DESTROY. Currently I had to:
>     SvREFCNT_inc(dbh);
>    so imp_dbh won't lose its data when $dbh goes out of scope. I have
>    tried copying it but wasn't very successful. Neither did playing with
>    DBIc_FLAGS(imp_dbh) help, but that's probably because I'm not
>    very familiar with DBI guts. Your help is needed here.

I'm not sure what you're doing here, and it may not be relevant to the
model I've outlined above.

> 5. Finally, the most important issue is that if a thread logged in for
>    real and created imp_dbh, it must not exit while other threads use
>    the same data.

Must not exit because the other threads are *pointing* to its
implementors data rather than using a copy? (And if it exits the
memory will be freed.)

Tim.
