Re: DBI v2 - The Plan and How You Can Help

Darren Duncan Mon, 04 Jul 2005 18:31:58 -0700

Tim et al,

Following are some ideas I have for the new DBI, which I thought about at length while both working on Rosetta/SQL::Routine and writing Perl 6 under Pugs. These are all language-independent and should be implemented at the Parrot-DBI level for all Parrot-hosted languages to take advantage of, rather than just in the Perl 6 specific additions. I believe in them strongly enough that they are in the core of how Rosetta et al operates (partly released, partly pending).

0. There were a lot of good ideas in other people's replies to this topic and I won't repeat them here, for the most part.

1. Always use distinct functions/methods to separate the declaration and destruction of a resource handle / object from any of its activities. With a database connection handle, both open/connect() and close/disconnect() are $dbh methods; the $dbh itself is created separately, such as with a DBI.new_connection() function. With a statement handle, prepare() is also a $sth method, like execute() et al; the $sth itself is created separately, such as with a $dbh.new_statement() method. If new handle types are created, such as a separate one for cursors, they would likewise be declared and used separately.

With this separation, you can re-use the resource handles more easily, and you don't have to re-supply static descriptive configuration details each time you use a handle, but rather only when the handle is declared. At the very least, such static details for a connection handle include what DBI implementor/driver module to use; as well, these details include what database product is being used, and locating details for the database, whether internet address or local service name or on-disk file name and so on. This can optionally include the authorization identifier / user name and password, or those details can be provided at open() time instead if they are likely to be variable.
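
As a rough illustration of this separation, here is a minimal Python sketch. The class name ConnectionHandle, its attributes, and the "file" setting are all hypothetical and not part of any existing DBI; nothing here talks to a real database.

```python
class ConnectionHandle:
    """Hypothetical handle: static settings at declaration, credentials at open()."""

    def __init__(self, driver, **location):
        self.driver = driver        # which driver module to use
        self.location = location    # e.g. file name, host, or service name
        self.is_open = False
        self.credentials = None

    def open(self, user=None, password=None):
        # A real driver would contact the database here.
        self.credentials = (user, password)
        self.is_open = True

    def close(self):
        self.is_open = False
        self.credentials = None


# Declared once; no database contact happens yet.
dbh = ConnectionHandle("DBD::SQLite", file="test.db")

dbh.open(user="jane")   # first session
dbh.close()
dbh.open(user="bob")    # same handle reused; static details are not re-supplied
```

The handle keeps the driver and location details across sessions, so only the variable parts (here, the user) are passed at open() time.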

2. Always separate out any usage stages that can be performed apart from the database itself. This allows an application to do those stages more efficiently, consuming fewer resources of both itself and the database.

For example, a pre-forked Apache process can declare all of the database and statement handles that it plans to use, and do as much of the prepare()-type work as can be done internally, prior to forking; all of that work can be done just once, saving CPU, and only one instance of it consumes RAM. All actual invocations of a database, the open()/connect() and execute(), happen after forking, and at that point all of the database-involving work is consolidated.

Or even when you have a single process, most of the work you have to do, including any SQL generation et al, can be more easily pre-performed and the results cached for multiple later uses. Some DBI wrappers may do a lot of work with SQL generation et al and be slow, but if this work is mainly preparatory, they can still be used in a high-speed environment as that work tends to only need doing once. Most of the prep work of a DBI wrapper can be done effectively prior to ever opening the database connection.

3. Redefine prepare() and execute() such that the first is expressly for activities that can be done apart from a database (and hence can also be done for a connection handle that is closed at the time) while all activities that require database interaction are deferred to the second.

Under this new scheme, when a database has native prepared statement support that you want to leverage, the database will be invoked to prepare said statements the first time you run execute(), and then the result of this is cached by DBI or the driver for all subsequent execute() calls to use. In that case, any input errors detected by the database will be thrown at execute() time regardless of their nature; only input errors detected by the DBD module itself would be thrown at prepare() time. (Note that module-caught input errors are much more likely when the module itself is handling SQL in AST form, whereas database-caught input errors are much more likely when SQL is always maintained in the program in string form.) Note also that the deferral to execute() time of error detection is what tends to happen already with any databases that don't have native prepared statement support or for which the DBI driver doesn't use it; these won't be affected by the official definition change.

Now I realize that it may be critically important for an application to know at prepare() time about statically-determinable errors, such as malformed SQL syntax, where error detection is handled just by the database. For their benefit, the prepare()+execute() duality could be broken up into more methods, either all used in sequence or some alternately to each other, so users get their errors when they want them. But regardless of the solution, it should permit all database-independent preparation to be separated out.
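
To make the redefined split concrete, here is a small Python sketch under stated assumptions: FakeDatabase stands in for a real database and only counts native prepare calls, and the Statement class with its method names is hypothetical. prepare() does only local work (finding named host parameters with a naive regex), while the native prepare is deferred to the first execute() and cached.

```python
import re


class FakeDatabase:
    """Stand-in for a real database; counts native prepare calls."""

    def __init__(self):
        self.native_prepares = 0

    def native_prepare(self, sql):
        self.native_prepares += 1
        return ("compiled", sql)

    def run(self, compiled, params):
        return (compiled, dict(params))


class Statement:
    """Hypothetical statement handle with database-free prepare()."""

    def __init__(self, sql):
        self.sql = sql
        self.param_names = None
        self._compiled = None

    def prepare(self):
        # Database-independent work only. The regex is naive: it does not
        # skip colons inside string literals.
        self.param_names = set(re.findall(r":(\w+)", self.sql))

    def execute(self, db, params):
        if self.param_names is None:
            self.prepare()
        # Module-detected input error, raised without touching the database.
        missing = self.param_names - set(params)
        if missing:
            raise ValueError(f"missing parameters: {sorted(missing)}")
        # Native prepare happens on the first execute() and is then cached.
        if self._compiled is None:
            self._compiled = db.native_prepare(self.sql)
        return db.run(self._compiled, params)


db = FakeDatabase()
sth = Statement("select * from baz where abc = :bar or def = :bar")
sth.prepare()                   # no database involved
sth.execute(db, {"bar": "x"})   # native prepare happens here
sth.execute(db, {"bar": "y"})   # cached; no second native prepare
```

In this sketch, errors the database would detect (such as malformed SQL) could only surface inside execute(), which matches the trade-off described above.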

4. All host parameters should be named (like ":foo") rather than positional (like "?"), in line with the SQL:2003 standard. The named format is a lot easier to use and more flexible; it makes programmers a lot less error prone and more powerful, and it is particularly more resource efficient when the same parameter is conceptually used multiple times in a SQL statement (it only has to be bound once). If anyone wants to use the positional format, it could easily be emulated on top of this. Or, if native positional support is still important, then it should be a parallel option that can be used at the same time as named in any particular SQL statement. See the native API of SQLite 3 for one example that (I believe) supports both in parallel. This also means that execute() et al should take arguments in a hash rather than an array.
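
For a runnable illustration of named host parameters, the sketch below uses Python's standard sqlite3 module, which accepts ":name" placeholders bound from a dict. The helper positional_to_named is a hypothetical, deliberately naive emulation of "?" on top of named parameters; it does not skip "?" characters inside string literals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table baz (abc text, def text)")

# Named parameters are passed as a mapping, not a positional list.
conn.execute(
    "insert into baz (abc, def) values (:p_abc, :p_def)",
    {"p_abc": "hello", "p_def": "world"},
)

# :bar appears twice in the statement but is bound only once.
rows = conn.execute(
    "select abc, def from baz where abc = :bar or def = :bar",
    {"bar": "world"},
).fetchall()


def positional_to_named(sql, values):
    """Rewrite each '?' as ':p1', ':p2', ... and return (sql, params)."""
    parts = sql.split("?")
    if len(parts) - 1 != len(values):
        raise ValueError("placeholder count does not match value count")
    names = [f"p{i}" for i in range(1, len(values) + 1)]
    named_sql = parts[0] + "".join(
        f":{name}{rest}" for name, rest in zip(names, parts[1:])
    )
    return named_sql, dict(zip(names, values))


named_sql, params = positional_to_named(
    "select abc, def from baz where abc = ? or def = ?", ["hello", "nope"]
)
emulated_rows = conn.execute(named_sql, params).fetchall()
```

Both queries return the single inserted row; the second goes through the positional-to-named rewrite first.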

5. All details used to construct a connection handle should be completely decomposed rather than shoved into an ungainly "data source". Examples of what should be distinct (not all being applicable at once) are: 1. the DBI driver module to use; 2. the internet server IP address or domain name and port; 3. the locally defined server device socket; 4. the locally defined service (eg, ODBC or SQL*Net) name; 5. the file system file name; 6. the file system directory name; 7. some other detail if any for fully in-RAM databases; 8. the authorization identifier / user name; 9. the password; 10. some other authorization credential, or channel encryption details, or whatever else; 11. what kind of database or what database product is being used, if known. If the DBI driver talks to a client-configurable DBI proxy server, then it should be possible to nest a set of the above settings (eg, as a hash-ref) as one part of the main settings given to the proxy client.
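
One possible shape for such decomposed settings, sketched in Python. The ConnectionSettings class and every field name in it are assumptions made for illustration, as are the host name and port; the nested "target" field shows how a proxy client could carry a second, complete set of settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionSettings:
    """Hypothetical decomposed replacement for a single 'data source' string."""

    driver: str                           # 1. driver module to use
    host: Optional[str] = None            # 2. server address or domain name
    port: Optional[int] = None            # 2. server port
    socket: Optional[str] = None          # 3. local server device socket
    service_name: Optional[str] = None    # 4. local service name
    file_name: Optional[str] = None       # 5. file system file name
    directory_name: Optional[str] = None  # 6. file system directory name
    user: Optional[str] = None            # 8. authorization identifier
    password: Optional[str] = None        # 9. password
    product: Optional[str] = None         # 11. database product, if known
    # Nested settings for the database behind a proxy server, if any.
    target: Optional["ConnectionSettings"] = None


settings = ConnectionSettings(
    driver="DBD::Proxy",
    host="proxy.example.com",
    port=3334,
    target=ConnectionSettings(driver="DBD::SQLite", file_name="test.db"),
)
```

Each detail is addressable on its own, so no code has to parse a combined string to find, say, the file name of the proxied database.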

6. DBI drivers should always be specified by users with their actual package name, such as 'DBD::SQLite', and not some alternate or abbreviated version that either leaves the 'DBD::' out or is spelled differently. Similarly, the DBI driver loader should simply try to load exactly the driver name it is given, without munging of any type. This approach is a lot simpler, more flexible, and lacks the kludges of the current DBI. DBI driver implementers can also name their module anything they want, and don't have to name it 'DBD::*'. A DBI driver should not have to conform to anything except a specific API by which it is called, which includes its behaviour upon initialization, invocation, and destruction.
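
A loader of this kind can be very small. The Python sketch below uses the standard importlib.import_module to load exactly the name it is given; the function name load_driver and the choice of a callable connect attribute as the required API are assumptions for illustration only.

```python
import importlib


def load_driver(name):
    """Load exactly the module named; no prefix is added or stripped."""
    module = importlib.import_module(name)
    # The only requirement is the calling API, not the module's name.
    # Treating a callable "connect" as that API is an assumption of this sketch.
    if not callable(getattr(module, "connect", None)):
        raise TypeError(f"{name} does not provide connect()")
    return module


driver = load_driver("sqlite3")      # any module name works, if the API fits
conn = driver.connect(":memory:")
```

A module that exists but lacks the expected API (for example the standard json module) is rejected with a TypeError, and a name that cannot be imported raises ModuleNotFoundError from importlib itself.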

7. Error conditions should *always* be thrown as exceptions by DBI; no exception thrown means that the request succeeded, even if its result was nothing/undef. This is a lot simpler to implement or use than any alternative. If people don't like that, then some wrapper should be employed to block the exceptions. Or, if it is really important to have a non-exception alternative, then that should be an alternative, with thrown exceptions being the default behaviour.
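
The wrapper idea can be sketched in a few lines of Python. DBIError, lookup, and without_exceptions are hypothetical names; lookup stands in for any interface call that raises on failure and may legitimately return None on success.

```python
class DBIError(Exception):
    """Hypothetical base class for every error raised by the interface."""


def lookup(table, key):
    """Raise on failure; returning None means success with no result."""
    if not isinstance(table, dict):
        raise DBIError("not a table")
    return table.get(key)


def without_exceptions(func):
    """Optional wrapper for callers who prefer (result, error) pairs."""
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs), None
        except DBIError as err:
            return None, err
    return wrapper


safe_lookup = without_exceptions(lookup)

found = safe_lookup({"a": 1}, "a")      # (1, None)
empty = safe_lookup({"a": 1}, "zzz")    # (None, None): succeeded, no result
failed = safe_lookup(None, "a")         # (None, DBIError(...))
```

Exceptions stay the default; the pair-returning form exists only for code that explicitly opts in, and it can still tell "no result" apart from "failed".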

8. Split off the proxy server/client stuff into a separate distribution; they are conceptually add-ons anyway and could benefit from independent development. Split off any SQL parser utilities (eg, SQL::Nano, SQL::Statement) into a separate distribution, since only a small fraction of potential drivers would use them, and those drivers are better off just requiring them separately. Split off all bundled DBI drivers (DBD::File, etc) into separate distributions, unless they exist solely to provide an example of how to make a DBI driver and are not actually useful in themselves. The DBI distribution should focus simply on defining an interface, and let anything that helps with implementing the drivers be optional and separate.

9. As Sam Vilain suggested, prepare() type methods should accept both SQL strings and any type of object as input, so that drivers have the option to directly accept AST forms; this is particularly useful when the drivers themselves would otherwise have to parse the SQL into an AST anyway.
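
As a rough sketch of a prepare() that accepts either form, the Python below defines a tiny hypothetical AST node (Delete) and a prepare function that takes a string or that node. All names are assumptions; a real driver might use the AST directly instead of rendering it to SQL as this toy does.

```python
from dataclasses import dataclass


@dataclass
class Delete:
    """Hypothetical AST node for 'delete from <table> where <column> = :<param>'."""

    table: str
    where_column: str
    param: str


def prepare(statement):
    """Accept a SQL string or an AST node; no database is contacted."""
    if isinstance(statement, str):
        return {"sql": statement, "ast": None}
    if isinstance(statement, Delete):
        sql = (
            f"delete from {statement.table} "
            f"where {statement.where_column} = :{statement.param}"
        )
        return {"sql": sql, "ast": statement}
    raise TypeError(f"unsupported statement type: {type(statement).__name__}")


from_string = prepare("delete from baz where def = :foo")
from_ast = prepare(Delete(table="baz", where_column="def", param="foo"))
```

Both calls end up with the same SQL text, but only the AST form lets the driver inspect the statement without parsing a string.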


And now ...

Here's an example of some things that implementing some of the above suggestions will let an application do (code may not compile as is):


  method init($self) {
    $self.db = DBI.new_connection( driver => 'DBD::SQLite', host => 'test' );

    my $sth1 = $self.db.new_statement(
      "select * from baz where abc = :bar or def = :bar" );
    $sth1.prepare();
    my $sth2 = $self.db.new_statement(
      "insert into baz (abc, def) values (:p_abc, :p_def)" );
    $sth2.prepare();

    $self.routines = (
      'get_all_baz' => -> ($bar) {
        $sth1.execute( { bar => $bar } );
        return $sth1.fetch_all_hashref();
      },
      'add_one_baz' => -> ($abc, $def) {
        $sth2.execute( { p_abc => $abc, p_def => $def } );
      },
    );
  }

  method main($self) {
    try {
      $self.db.open( user => 'jane', pass => 'k34l5jr' );

      try {
        $self.routines.{'add_one_baz'}.('hello','world');

        my $results = $self.routines.{'get_all_baz'}.('world');

        my $sth3 = $self.db.new_statement(
          "delete from baz where def = :foo" );
        $sth3.prepare();
        $sth3.execute( { foo => 'blarch' } );
      };
      $! and say "dag nabit!";

      $self.db.close();
    };
    $! and say "dog gone!";
  }

In the above example, only main() actually invokes a database; init() does load the DBI driver, though. You can also invoke main() as many times as you want, and you can run init() prior to forking without trouble.

What I've said in this email is not exhaustive and I may add or amend items later; but it's a good start. Feedback is welcome of course.


Thank you. -- Darren Duncan
