On 10/7/11 9:11 AM, Steve Teale wrote:
I'm thinking that for each database that gets covered there will
need to be two modules, like:
etc.c.mysql
etc.mysqld
The etc.c.xxx modules would be completely different between databases -
just translations of the necessary header files. The xxxd modules
would be as similar as possible consistent with allowing the strong
points of individual database systems to show through, and the weak
points of others to be avoided. I don't think all should be reduced
to some LCD.
Well we may be hasty to go that way. The driver-based approach works
well for other languages and APIs - why wouldn't it work for us? The
differences across different DBMSs would be mostly in the SQL support,
not the basic APIs.
These modules should attempt to make a good range of capabilities
available to the D programmer, but they not have to be all encompassing.
Those users who want to do really fancy things can drop back to the
low-level interface. They should probably have the following capabilities:
1) Establishing and closing database connections.
Sounds good. Since DB connections are a precious resource, there must be
a RAII struct holding them. The functions below may be members of it.
2) The capability to execute literal SQL statements - execSQL()
if you like. Some of these will generate result sets, of which more below.
Great.
3) The capability to create prepared statements with in and out
parameters and association of the parameters with a source, and
then to execute these. This breaks down into several components/
capabilities, which could be labeled:
3a) createPreparedStatement() - marshal parameters, associate them
with a sourceand have the server prepare the statement.
3b) execStatement() - for those SQL statements that don't have a
result set.
3c) execStatementIncremental()/getNext() - execute the prepared statement,
then fetch the result rows one at a time into some kind of result set.
Here's where the range interface might be used. We might simply have
execStatement() that returns an input range. If the statement produced
no rows, the range will be empty. No need for distinguishing
execStatement() and execStatementIncremental().
3d) execStatementAll() - execute the prepared statement and get all
the resulting rows into some kind of result set.
This is not a primitive, but instead a convenience function that should
be used with caution and only for queries known to be small. I agree
that it's good to have.
3e) (maybe) execScalar() - do the whole sequence prepare, execute,
and get a single value result set placed into a D variable.
That's convenient for e.g. "SELECT COUNT(*) FROM table WHERE condition",
which people run all the time.
3f) (maybe) execStoredProcedure() - another 'do the whole thing'
capability TBD.
Well wouldn't that just execute a special SQL a la "CALL procedure"?
It is when we come to the nature of the result sets that there is
likely to be dissent. I favor arrays of structs, but we should
probably do arrays of arrays of variants too for those situations
where structures can't be sorted out at compile time. There needs
to be some symmetry between what is used here, and what can be used
as input to operations such as a prepared insert. It is of course
vital that this part of each middle layer produce exactly the same
kind of results. Otherwise the top layer could become difficult.
I'd like arrays of structs too but we must acknowledge that most
databases people use will be large, in which case loading the whole
thing eagerly takes a lot of RAM and potentially wastes time. So arrays
are out except for e.g. convenience functions - but then we already have
array() that converts an arbitrary input range into an array.
Now, regarding the "structs" part, I'd like that too when the schema is
statically known. Two issues:
1. MFC had at a point a wizard that generated one struct per resultset.
It was an absolute maintenance disaster and they recanted by offering
dynamically-bound result sets. The lesson there is that defining a
struct for each query won't likely play out, so we better use
Tuple!(ColType1, ColType2, ...). A possible API would be:
auto db = std.database.connect("cdb.mysql");
auto rows = db.sql!(double, ulong)
("SELECT SUM(x), COUNT(x) FROM table");
// We know that rows is a range of Tuple!(double, ulong).
writeln("Sum is ", rows.front[0], " count is ", rows.front[1]);
Cool beans. I'd love to use such an API!
2. Statically-bound tuples work only when we know the query beforehand.
We need to cater for resultsets of unknown column count and types. The
question here is whether we traffic in untyped memory a la ubyte[] or
some variant type. I favor Variant because it's efficient, accommodates
any SQL type easily, and is convenient to use.
On top of this set of two modules for each database, I envisage a
higher-level module - etc.dbops - that provides a bunch of convenience
templates for various common database operations, spanning the databases.
Once the middle layer is in place, this top layer should be relatively
easy to implement. It should be noted though that all these database
wrappers will be monstrously difficult to test.
I am at the point with MySQL where I can get the result of a plain
old query into an array of a checked structure type. I have the
prepared statement stuff, and know how the result will be created from
a prepared query (the execStatementAll() case) - I just have to plow
through a bit more binding and fetching.
This is probably rather general and vague, but I would like to get
comments so we can iterate toward a practical design.
Sounds good. What I'd suggest given the plethora of DB bindings already
available is to form a sort of a "task force" and develop a shared
vision towards a solid proposal. You may self-organize with community's
help on this group and exchange ideas under e.g. the prefix
"[std.database]" in the message titles.
What I can say for sure is that there will be an std.database.
Thanks for this initiative.
Andrei