On Thu, 2004-08-26 at 03:04, Tim Bunce wrote:
> We would all like many things but have to settle for what's practical.
> ...
>
> Tim.

umm okay

All I knew, before my unsuccessful attempt to locate instances of connect(2) in DBD::ODBC's xs and dbdimp.c files, was that I have in the past written code, in C and C++, that talked with SQL servers by opening regular old TCP streams to servers and conversing on them. Informix and Oracle. And both times I was extending working code, so I didn't have to know or support all the possibilities.

The only time a TCP stream blocks is when one is waiting to read from it, and this can be worked around, even on systems that do not have nonblocking sockets, by only trying to read from the socket when select(2) has indicated that there is something there. (If you're multithreading you also need a mutex to avoid the race condition, so this works even then, provided you can set up a nonblocking mutex.) It can also block when you're writing a lot to it, but when you can't get non-blocking sockets this can be worked around by being sure to only send small chunks.

In the application I am currently working on, I want prepare and execute to return as soon as they can, even if error reports will get deferred, and I would like to know if data is ready before attempting to fetch a row, so I don't have to wait for the server to send it.

Expecting the *all* access functions to return partial sets is not a core requirement for nonblocking support; mandating that *all* block would work just as well and would not slow down the blocking case with needless checks.

Yes, this class of issues can be trivially solved by demanding threading, but that does not help when a(n unrealistic?) design constraint limits you to one thread.

I realized after posting the feature request for ready(), more(), done() that $h->timeout() also would make sense. For those tuning in late, select(2) defines the timeout as a time value, or null for indefinite blocking.

Without a mandated interface to non-blocking, drivers will all implement it differently.

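A minimal sketch of the select(2) workaround described above: only read from the stream once select says something is there. It uses Perl's IO::Select on a local socketpair that stands in for the client/server TCP stream. The helper name read_if_ready and the 4096-byte read size are assumptions made for illustration, not anything taken from DBI or a driver.

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;
use IO::Select;

# A connected pair of stream sockets stands in for the link to the server.
socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$server->autoflush(1);

my $sel = IO::Select->new($client);

# Hypothetical helper: attempt a read only when select(2) reports data,
# so the read itself never has to wait.
sub read_if_ready {
    my ($sel, $fh, $timeout) = @_;
    my @readable = $sel->can_read($timeout);    # 0 polls, undef waits forever
    return undef unless @readable;              # nothing has arrived yet
    my $n = sysread($fh, my $buf, 4096);
    die "sysread: $!" unless defined $n;
    return $buf;                                # '' means the peer closed
}

# Nothing has been sent yet, so this poll comes back empty-handed.
my $before = read_if_ready($sel, $client, 0);

# The "server" sends a row; now select reports the socket as readable.
print {$server} "row 1\n";
my $after = read_if_ready($sel, $client, 1);
```

The timeout argument follows the select(2) convention mentioned above: a number of seconds, or undef to wait indefinitely.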
I think it is not only practical but a good idea to define a standard interface before every driver that can support a nonblocking mode does it differently.

@{$sth}{qw/more done/} can be defined in terms of $sth->{Active}:

    sub more { $_[0]->{Active} }
    sub done { !$_[0]->{Active} }

Supporting $h->ready implies that a driver supports a non-blocking mode, in which partial state is maintained in the handle and we are waiting for something from the other end. In a blocking driver, $h->ready would always be true because we are never in an incomplete state. Blocking DBI either returns or throws, and then that phase of the operation is complete.

In a nonblocking situation, a driver might return immediately, but register a callback with itself so that when ready() is called, the callback will run, and the pending operation will either get completed and ready() will be true, or the pending operation will still be pending and ready() will still be false. ready() would throw the deferred error.

To make things easier, (prepare execute fetch) could be mandated to all internally do

    readycheck:
    $me->ready() or goto readycheck;

as they start, or even better,

    {
        my $oldtimeout = $me->{Timeout};
        undef $me->{Timeout};
        return $h->set_err($err, "Deferred $errmsg", $state)
            unless $me->ready();    # mandated to finish or throw
        $me->{Timeout} = $oldtimeout
    }

Or even better still, a driver could stack requests up, and each would become ready or not-ready on its own. No flexibility is taken from the driver implementors.

So, my proposal is to declare three read-only access functions and one attribute. These are more, done, ready and {Timeout}. If pointed to the simplest TCP-based driver, I will cheerfully add this stuff to demonstrate that it's practical.

Referring to M.
Peppler's link:

http://sybooks.sybase.com/onlinebooks/group-sd/sdg1251e/ctref/@Generic__BookTextView/1039;pt=799/*#X

the ready() function would be always true in Synchronous mode, would invoke the ct_poll in Deferred Asynchronous mode, and would check to see if the callback has occurred automatically in Fully Asynchronous mode:

    @AsyncStHandles = map {$dbh->prepare($_)} @Statements;
    do {
        $pending = 0;
        for (@AsyncStHandles) {
            unless ($_->ready()) {
                $pending++;
                next;
            };
            ...
            $_->more and $pending++;
        }
    } while $pending

How cool is that?

Full support of the optional NonBlocking attribute (settable by attempting to define the Timeout attribute?) in general would do something like Deferred Asynchronous mode.

The exact reason the pending operation is not ready -- CS_BUSY, EWOULDBLOCK, etc. -- could be described in $DBI_ERRSTR. Ready shouldn't throw just because it isn't ready, so set_err would not be the right way to set the message.

J Leffler wrote:
> One of the issues I think that the specification will have to address,
> probably restrictively, is whether you can have both an asynchronous
> (non-blocking) statement and other synchronous (or, indeed, other
> asynchronous) statements active on a single $dbh -- I suspect that the
> portable answer will be "No; only one statement, whether synchronous
> or asynchronous, can be active on a $dbh at any given time".

That would be the portable answer, but the non-portable answer would be "Yes, if your driver supports it," just like any other fancy feature. It's easy to imagine a drh opening additional streams for additional statement handles, for instance, to mock up multiple simultaneous asynchronous statements against a back-end that does not do that natively.

> ... Perl threading ...

How drivers implement the standard interface is up to the authors of individual drivers. If your driver requires a threaded perl, your driver requires a threaded perl. If your driver breaks on a threaded perl, your driver breaks on a threaded perl.

S.
Goeldner wrote:
> it looks like ADO uses events:
>
> http://msdn.microsoft.com/library/en-us/ado270/htm/mdmscadoevents.asp

So an asynchronous ADO driver might return immediately after issuing an UPDATE command, and would not be ready until a RecordChangeComplete event was received.

Dean Arnold reports:
> DBD::Teradata has async support via driver-specific
> methods/attributes:
>
> my @dbhs = ();
> my @sths = ();
>
> ...connect N sessions, storing handles in @dbhs...
>
> foreach (@dbhs) {
>     push @sths, $_->prepare('insert into table values(?,?,?)',
>         { tdat_nowait => 1 });
> }
>
> foreach (@sths) {
>     $_->execute([shift @paramtuples]);
> }
>
> while (params to load) {
>     @avails = $drh->tdat_FirstAvailList([EMAIL PROTECTED], $timeout);
>
>     foreach (@avails) {
>         $rc = $sths[$_]->tdat_Realize();
>         $sths[$_]->execute([shift @paramtuples]);
>     }
> }
>
> The API also supports including filehandles in the list passed to
> tdat_FirstAvailList, in order to handle other async I/O events.

This sounds like tdat_FirstAvailList([EMAIL PROTECTED], $timeout) is a wrapper around select(2) that tells us that there is data available on the stream, not necessarily that a whole row has been returned. Looking at http://www.presicient.com/tdatdbd/#realize I gather that Teradata's execute() never blocks and tdat_Realize runs the callback on a statement, to completion.

The ready() method I am proposing combines both of these, so that in nonblocking mode ready would return false until enough data has come back that tdat_Realize would not have to block were it to run.

> While this handles async execution, esp. in support of multiconnection
> operations relevant to Teradata's parallel nature,
> it doesn't really handle async completion notification, and
> I'm not certain there's a clean, DBMS-independent way to support
> that without using timers, signals, or threads, all of which
> may be problematic.

Working out the implementation is not our problem yet (well, it is our problem if we're maintaining a driver, but it's not our problem if we're wearing "Lords Of The DBI Specification" hats).

http://www.presicient.com/tdatdbd/#optimize indicates that tdat_ already returns immediately from prepare(). Using the proposed interface, if something were to go wrong with the prepare(), the error might get reported by a subsequent call to ready() before the error would be reported in the failing execute().

http://www.presicient.com/tdatdbd/#dblbuf

In a situation where there are multiple active statements, the results of the ready() method on each object in the system will mean different things.

    $dbh->ready()   # Can the session accept more instructions without
                    # blocking, or are we still chewing on the last
                    # prepare and there's no more space in the statement
                    # preparation queue?
    $sth1->ready()  # is this statement able to have a complete row
                    # fetched from it without blocking?
    $sth2->ready()  # or this one?

http://www.presicient.com/tdatdbd/#moreres

> When a fetch operation returns undef, a non zero tdat_more_results
> value indicates more Teradata statements are available for fetching on
> the statement handle.

In the proposed interface, fetch would throw the equivalent of an EWOULDBLOCK error with its undef (how far it throws it depends on what RaiseError is set to) when there is more

NO NO NO! fetch would block if called while $sth->ready is still false.

Maybe the standard could specify two levels of asynchronous support, one in which fetch gives an error when there is more data and another in which it blocks, but that is needlessly complex. The situation is analogous to the discussions that I imagine occurred in Redmond whenever they decided that Microsoft sockets would not support non-blocking modes.
We can say that a true ready indicates that there is at least one row to fetch, and driver behavior when fetch is called before readiness happens is left to the driver: it can block, or treat the situation as an error, either one.

Finally, Tim Bunce agreed with Dean Arnold that
> Also, in many (most?) instances, driver support for Perl threads
> may obviate the need for an async API; in fact, I'd prefer to see
> driver developers focus first on thread support, since that doesn't
> really require any API definitions, and provides much the same
> capability.

Here's a proposed non-blocking API extension. Approving it does not interfere with development of improved threading support. In fact, approving it may encourage threading development, as threads will be within-limits for implementation strategies.

Here is my proposal for a non-blocking DBI extension that is a full and unchanged superset of current synchronous DBI:

Requesting a connection in nonblocking mode:

    $dbh = DBI->connect($source, $user, $passwd, {Timeout => 0})

$h->{Timeout}
    if supported, recommends a timeout for blocking calls. It is inherited, and is changeable at every level without affecting the parent. C<undef $h->{Timeout}> gives synchronous behavior.

Conformance to this extension can be determined by the existence of the ready() method.

$sth->ready()
    if supported, guarantees that the next fetch will not block, or that a non-data-returning command has completed. Invokes communication-related callbacks as in Sybase's Deferred Asynchronous mode. Sets $errstr and may die when RaiseError is set.

$dbh->ready()
    if supported, guarantees that a session handle will not block when asked to prepare a new statement handle. Sets $errstr and may die from a deferred error when RaiseError is set.

$h->done()
    is a trivial wrapper for !$h->{Active}

$h->more()
    is a trivial wrapper for $h->{Active}

Drivers that do not implement return-without-answer at the prepare and execute levels should not implement $dbh->ready.
Intermediate versions of individual drivers may block on any methods.

$t_a_r = $sth->fetchavail_arrayref( $slice, $max_rows )
    just like fetchall_arrayref, except that only data that has already arrived and been buffered is returned. The *all* functions will continue to block.

Implementing the ready() method with threads will be easy. Implementing ready() without threads will be tricky but possible, when one has access to the communications layer.

Implementation suggestions:

I expect that the best way to add this extension to your module would be to write a nonblocking module separate from the main regular module, and have the connect method return an object of the nonblocking type when $attr{Timeout} exists.

no threads:
    Include all currently pending operations on all handles of this type of database in a set that is checked whenever ready() is called on a pending object.

threads:
    Launch a thread with each statement handle. The thread responds to incoming socket data by writing it into a buffer, and when there is enough there, the thread sets the statement's ready attribute. All ready() has to do is return the ready attribute.

David Nicol
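
The "no threads" suggestion above can be sketched roughly as follows. This is an illustration under stated assumptions, not a DBI or driver implementation: the class Sketch::Pending, its fields, and the rule that one newline-terminated line counts as one complete row are all hypothetical, and a socketpair stands in for each server connection.

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;
use IO::Select;

package Sketch::Pending;    # hypothetical class, not part of DBI

my %pending;                       # fileno => every operation still waiting
my $select = IO::Select->new;      # the set checked on each ready() call

sub new {
    my ($class, $fh) = @_;
    my $self = bless { fh => $fh, buf => '', ready => 0 }, $class;
    $pending{ fileno $fh } = $self;
    $select->add($fh);
    return $self;
}

# Poll every pending socket once without blocking, buffer whatever has
# arrived, then report whether this particular operation is ready.
sub ready {
    my ($self) = @_;
    return 1 if $self->{ready};
    for my $fh ($select->can_read(0)) {
        my $op = $pending{ fileno $fh } or next;
        my $n = sysread($fh, $op->{buf}, 4096, length $op->{buf});
        die "sysread: $!" unless defined $n;
        # Assumption: a newline ends a row; end of stream also ends the wait.
        if ($n == 0 or $op->{buf} =~ /\n/) {
            $op->{ready} = 1;
            $select->remove($fh);
            delete $pending{ fileno $fh };
        }
    }
    return $self->{ready};
}

package main;

socketpair(my $c1, my $s1, AF_UNIX, SOCK_STREAM, PF_UNSPEC) or die "socketpair: $!";
socketpair(my $c2, my $s2, AF_UNIX, SOCK_STREAM, PF_UNSPEC) or die "socketpair: $!";
$_->autoflush(1) for $s1, $s2;

my $op1 = Sketch::Pending->new($c1);
my $op2 = Sketch::Pending->new($c2);

my $op1_before = $op1->ready;      # nothing has arrived on either socket
print {$s2} "a row\n";             # only the second "server" answers
my $op1_after  = $op1->ready;      # this call also services op2's socket
my $op2_ready  = $op2->ready;      # so op2 is already marked ready here
```

Calling ready() on any pending object services the whole set, so one operation's data gets buffered even while the caller is asking about a different one.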