Tim et al,
Following are some ideas I have for the new DBI, that were thought
about greatly as I was both working on Rosetta/SQL::Routine and
writing Perl 6 under Pugs. These are all language-independent and
should be implemented at the Parrot-DBI level for all Parrot-hosted
languages to take advantage of, rather than just in the Perl 6
specific additions. I believe in them strongly enough that they are
in the core of how Rosetta et al operates (partly released, partly
pending).
0. There were a lot of good ideas in other people's replies to this
topic and I won't repeat them here, for the most part.
1. Always use distinct functions/methods to separate the declaration
and destruction of a resource handle / object from any of its
activities. With a database connection handle, both the
open/connect() and close/disconnect() are $dbh methods; the $dbh
itself is created separately, such as with a DBI.new_connection()
function. With a statement handle, the prepare() is also a $sth
method like with execute() et al; the $sth itself is created
separately, such as with a $dbh.new_statement() method. If new
handle types are created, such as a separate one for cursors, they
would likewise be declared and used separately.
With this separation, you can re-use the resource handles more
easily, and you don't have to re-supply static descriptive
configuration details each time you use it, but rather only when the
handle is declared. At the very least, such static details for a
connection handle include what DBI implementor/driver module to use;
as well, these details include what database product is being used,
and locating details for the database, whether internet address or
local service name or on-disk file name and so on. This can
optionally include the authorization identifier / user name and
password, or those details can be provided at open() time instead if
they are likely to be variable.
2. Always separate out any usage stages that can be performed apart
from the database itself. This allows an application to do those
stages more efficiently, consuming fewer resources of both itself and
the database.
For example, a pre-forked Apache process can declare all of the
database and statement handles that it plans to use, and do as much
of the prepare()-type work that can be done internally as possible,
prior to forking; all of that work can be done just once, saving CPU,
and only one instance of it consumes RAM. All actual invocations of
a database, the open()/connect() and execute() happen after forking,
and at that point all of the database-involving work is consolidated.
Or even when you have a single process, most of the work you have to
do, including any SQL generation et al, can be more easily be
pre-performed and the results cached for multiple later uses. Some
DBI wrappers may do a lot of work with SQL generation et al and be
slow, but if this work is mainly preparatory, they can still be used
in a high-speed environment as that work tends to only need doing
once. Most of the prep work of a DBI wrapper can be done effectively
prior to ever opening the database connection.
3. Redefine prepare() and execute() such that the first is expressly
for activities that can be done apart from a database (and hence can
also be done for a connection handle that is closed at the time)
while all activities that require database interaction are deferred
to the second.
Under this new scheme, when a database has native prepared statements
support that you want to leverage, the database will be invoked to
prepare said statements the first time you run execute(), and then
the result of this is cached by DBI or the driver for all subsequent
execute() to use. In that case, any input errors detected by the
database will be thrown at execute() time regardless of their nature;
only input errors detected by the DBD module itself would be thrown
at prepare() time. (Note that module-caught input errors are much
more likely when the module itself is handling SQL in AST form,
whereas database-caught input errors are much more likely when SQL is
always maintained in the program as string form.) Note also that the
deferal to execute() time of error detection is what tends to happen
already with any databases that don't have native prepared statement
support or for whom the DBI driver doesn't use them; these won't be
affected by the official definition change.
Now I realize that it may be critically important for an application
to know at prepare() time about statically-determinable errors, such
as mal-formed SQL syntax, where error detection is handled just by
the database. For their benefit, the prepare()+execute() duality
could be broken up into more methods, either all used in sequence or
some alternately to each other, so users get their errors when they
want them. But regardless of the solution, it should permit for all
database-independent preparation to be separated out.
4. All host parameters should be named (like ":foo") rather than
positional (like "?"), meeting with the SQL:2003 standard. The named
format is a lot easier to use and flexible, making programmers a lot
less error prone, more powerful, and particularly more resource
efficient when the same parameter is conceptually used multiple times
in a SQL statement (it only has to be bound once). If anyone wants
to use positional format, it could easily be emulated on top of this.
Or, if native positional support is still important, then it should
be a parallel option that can be used at the same time as named in
any particular SQL statement. See the native API of SQLite 3 for one
example that (I believe) supports both in parallel. This also means
that execute() et al should take arguments in a hash rather than an
array.
5. All details used to construct a connection handle should be
completely decomposed rather than shoved into an ungainly "data
source". Examples of what should be distinct (not all being
applicable at once) are: 1. the DBI driver module to use; 2. the
internet server IP address or domain name and port; 3. the locally
defined server device socket; 4. the locally defined service (eg,
ODBC or SQL*Net) name; 5. the file system file name; 6. the file
system directory name; 7. some other detail if any for fully in-RAM
databases; 8. the authorization identifier / user name; 9. the
password; 10. some other authorization credential, or channel
encryption details, or whatever else; 11. what kind of database or
what database product is being used, if known. If the DBI driver
talks to a client-configurable DBI proxy server, then, it should be
possible to nest a set of the above settings (eg, as a hash-ref) as
one part of the main settings given to the proxy client.
6. DBI drivers should always be specified by users with their actual
package name, such as 'DBD::SQLite', and not some alternate or
abbreviated version that either leaves the 'DBD::' out or is spelled
differently. Similarly, the DBI driver loader should simply try to
load exactly the driver name it is given, without munging of any
type. This approach is a lot more simple, flexible and lacks the
cludges of the current DBI. DBI driver implementers can also name
their module anything they want, and don't have to name it 'DBD::*'.
A DBI driver should not have to conform to anything except a specific
API by which it is called, which includes its behaviour upon
initialization, invocation, and destruction.
7. Error conditions should *always* be thrown as exceptions by DBI;
no exception thrown means that the request succeeded, even if its
result was nothing/undef. This is a lot simpler to implement or use
than any alternative. If people don't like that, then some wrapper
should be employed to block the exceptions. Or, if it is really
important to have a non-exception alternative, then that should be an
alternative, with thrown exceptions being the default behaviour.
8. Split off the proxy server/client stuff into a separate
distribution; they are conceptually add-ons anyway and could benefit
from independent development. Split off any SQL parser utilities
(eg, SQL::Nano, SQL::Statement) into a separate distribution, since
only a small fraction of potential drivers would use them, and they
are better off to just require them separately. Split off all
bundled DBI drivers (DBD::File, etc) into separate distributions,
unless they exist soley to provide an example of how to make a DBI
driver and are not actually useful in themselves. The DBI
distribution should focus simply on defining an interface, and let
anything that will help with implementing the drivers to be optional
and separate.
9. As Sam Vilain suggested, prepare() type methods should accept both
SQL strings and any type of object as input, so that drivers have the
option to directly accept AST forms; particularly useful when the
drivers themselves would otherwise have to parse the SQL into an AST
anyway.
And now ...
Here's an example of some things that implementing some of the above
suggestions will let an application do (code may not compile as is):
method init($self) {
$self.db = DBI.new_connection( driver => 'DBD::SQLite', host => 'test' );
my $sth1 = $self.db.new_statement(
"select * from baz where abc = :bar or def = :bar" );
$sth1.prepare();
my $sth2 = $self.db.new_statement(
"insert into baz (abc, def) values (:p_abc, :p_def)" );
$sth2.prepare();
$self.routines = (
'get_all_baz' => -> ($bar) {
$sth1.execute( { bar => $bar } );
return $sth1.fetch_all_hashref();
},
'add_one_baz' => -> ($abc, $def) {
$sth2.execute( { p_abc => $abc, p_def => $def } );
},
);
}
method main($self) {
try {
$self.db.open( user => 'jane', pass => 'k34l5jr' );
try {
$self.routines.{'add_one_baz'}.('hello','world');
my $results = $self.routines.{'get_all_baz'}.('world');
my $sth3 = $self.db.new_statement(
"delete from baz where def = :foo" );
$sth3.prepare();
$sth3.execute( { foo => 'blarch' } );
};
$! and say "dag nabit!";
$self.db.close();
};
$! and say "dog gone!";
}
In the above example, only main() actually invokes a database; init()
does load the DBI driver, though. You can also invoke main() as many
times as you want, and you can run init() prior to forking without
trouble.
What I've said in this email is not exhaustive and I may add or amend
items later; but, its a good start. Feedback is welcome of course.
Thank you. -- Darren Duncan