Tim et al,

Following are some ideas I have for the new DBI, that were thought about greatly as I was both working on Rosetta/SQL::Routine and writing Perl 6 under Pugs. These are all language-independent and should be implemented at the Parrot-DBI level for all Parrot-hosted languages to take advantage of, rather than just in the Perl 6 specific additions. I believe in them strongly enough that they are in the core of how Rosetta et al operates (partly released, partly pending).

0. There were a lot of good ideas in other people's replies to this topic and I won't repeat them here, for the most part.

1. Always use distinct functions/methods to separate the declaration and destruction of a resource handle / object from any of its activities. With a database connection handle, both the open/connect() and close/disconnect() are $dbh methods; the $dbh itself is created separately, such as with a DBI.new_connection() function. With a statement handle, the prepare() is also a $sth method like with execute() et al; the $sth itself is created separately, such as with a $dbh.new_statement() method. If new handle types are created, such as a separate one for cursors, they would likewise be declared and used separately.

With this separation, you can re-use the resource handles more easily, and you don't have to re-supply static descriptive configuration details each time you use it, but rather only when the handle is declared. At the very least, such static details for a connection handle include what DBI implementor/driver module to use; as well, these details include what database product is being used, and locating details for the database, whether internet address or local service name or on-disk file name and so on. This can optionally include the authorization identifier / user name and password, or those details can be provided at open() time instead if they are likely to be variable.

2. Always separate out any usage stages that can be performed apart from the database itself. This allows an application to do those stages more efficiently, consuming fewer resources of both itself and the database.

For example, a pre-forked Apache process can declare all of the database and statement handles that it plans to use, and do as much of the prepare()-type work that can be done internally as possible, prior to forking; all of that work can be done just once, saving CPU, and only one instance of it consumes RAM. All actual invocations of a database, the open()/connect() and execute() happen after forking, and at that point all of the database-involving work is consolidated.

Or even when you have a single process, most of the work you have to do, including any SQL generation et al, can be more easily be pre-performed and the results cached for multiple later uses. Some DBI wrappers may do a lot of work with SQL generation et al and be slow, but if this work is mainly preparatory, they can still be used in a high-speed environment as that work tends to only need doing once. Most of the prep work of a DBI wrapper can be done effectively prior to ever opening the database connection.

3. Redefine prepare() and execute() such that the first is expressly for activities that can be done apart from a database (and hence can also be done for a connection handle that is closed at the time) while all activities that require database interaction are deferred to the second.

Under this new scheme, when a database has native prepared statements support that you want to leverage, the database will be invoked to prepare said statements the first time you run execute(), and then the result of this is cached by DBI or the driver for all subsequent execute() to use. In that case, any input errors detected by the database will be thrown at execute() time regardless of their nature; only input errors detected by the DBD module itself would be thrown at prepare() time. (Note that module-caught input errors are much more likely when the module itself is handling SQL in AST form, whereas database-caught input errors are much more likely when SQL is always maintained in the program as string form.) Note also that the deferal to execute() time of error detection is what tends to happen already with any databases that don't have native prepared statement support or for whom the DBI driver doesn't use them; these won't be affected by the official definition change.

Now I realize that it may be critically important for an application to know at prepare() time about statically-determinable errors, such as mal-formed SQL syntax, where error detection is handled just by the database. For their benefit, the prepare()+execute() duality could be broken up into more methods, either all used in sequence or some alternately to each other, so users get their errors when they want them. But regardless of the solution, it should permit for all database-independent preparation to be separated out.

4. All host parameters should be named (like ":foo") rather than positional (like "?"), meeting with the SQL:2003 standard. The named format is a lot easier to use and flexible, making programmers a lot less error prone, more powerful, and particularly more resource efficient when the same parameter is conceptually used multiple times in a SQL statement (it only has to be bound once). If anyone wants to use positional format, it could easily be emulated on top of this. Or, if native positional support is still important, then it should be a parallel option that can be used at the same time as named in any particular SQL statement. See the native API of SQLite 3 for one example that (I believe) supports both in parallel. This also means that execute() et al should take arguments in a hash rather than an array.

5. All details used to construct a connection handle should be completely decomposed rather than shoved into an ungainly "data source". Examples of what should be distinct (not all being applicable at once) are: 1. the DBI driver module to use; 2. the internet server IP address or domain name and port; 3. the locally defined server device socket; 4. the locally defined service (eg, ODBC or SQL*Net) name; 5. the file system file name; 6. the file system directory name; 7. some other detail if any for fully in-RAM databases; 8. the authorization identifier / user name; 9. the password; 10. some other authorization credential, or channel encryption details, or whatever else; 11. what kind of database or what database product is being used, if known. If the DBI driver talks to a client-configurable DBI proxy server, then, it should be possible to nest a set of the above settings (eg, as a hash-ref) as one part of the main settings given to the proxy client.

6. DBI drivers should always be specified by users with their actual package name, such as 'DBD::SQLite', and not some alternate or abbreviated version that either leaves the 'DBD::' out or is spelled differently. Similarly, the DBI driver loader should simply try to load exactly the driver name it is given, without munging of any type. This approach is a lot more simple, flexible and lacks the cludges of the current DBI. DBI driver implementers can also name their module anything they want, and don't have to name it 'DBD::*'. A DBI driver should not have to conform to anything except a specific API by which it is called, which includes its behaviour upon initialization, invocation, and destruction.

7. Error conditions should *always* be thrown as exceptions by DBI; no exception thrown means that the request succeeded, even if its result was nothing/undef. This is a lot simpler to implement or use than any alternative. If people don't like that, then some wrapper should be employed to block the exceptions. Or, if it is really important to have a non-exception alternative, then that should be an alternative, with thrown exceptions being the default behaviour.

8. Split off the proxy server/client stuff into a separate distribution; they are conceptually add-ons anyway and could benefit from independent development. Split off any SQL parser utilities (eg, SQL::Nano, SQL::Statement) into a separate distribution, since only a small fraction of potential drivers would use them, and they are better off to just require them separately. Split off all bundled DBI drivers (DBD::File, etc) into separate distributions, unless they exist soley to provide an example of how to make a DBI driver and are not actually useful in themselves. The DBI distribution should focus simply on defining an interface, and let anything that will help with implementing the drivers to be optional and separate.

9. As Sam Vilain suggested, prepare() type methods should accept both SQL strings and any type of object as input, so that drivers have the option to directly accept AST forms; particularly useful when the drivers themselves would otherwise have to parse the SQL into an AST anyway.

And now ...

Here's an example of some things that implementing some of the above suggestions will let an application do (code may not compile as is):

  method init($self) {
    $self.db = DBI.new_connection( driver => 'DBD::SQLite', host => 'test' );

    my $sth1 = $self.db.new_statement(
      "select * from baz where abc = :bar or def = :bar" );
    $sth1.prepare();
    my $sth2 = $self.db.new_statement(
      "insert into baz (abc, def) values (:p_abc, :p_def)" );
    $sth2.prepare();

    $self.routines = (
      'get_all_baz' => -> ($bar) {
        $sth1.execute( { bar => $bar } );
        return $sth1.fetch_all_hashref();
      },
      'add_one_baz' => -> ($abc, $def) {
        $sth2.execute( { p_abc => $abc, p_def => $def } );
      },
   );
  }

  method main($self) {
    try {
      $self.db.open( user => 'jane', pass => 'k34l5jr' );

      try {
        $self.routines.{'add_one_baz'}.('hello','world');

        my $results = $self.routines.{'get_all_baz'}.('world');

        my $sth3 = $self.db.new_statement(
          "delete from baz where def = :foo" );
        $sth3.prepare();
        $sth3.execute( { foo => 'blarch' } );
      };
      $! and say "dag nabit!";

      $self.db.close();
    };
    $! and say "dog gone!";
  }

In the above example, only main() actually invokes a database; init() does load the DBI driver, though. You can also invoke main() as many times as you want, and you can run init() prior to forking without trouble.

What I've said in this email is not exhaustive and I may add or amend items later; but, its a good start. Feedback is welcome of course.

Thank you. -- Darren Duncan

Reply via email to