Re: Database interface design - was how to build up the library.

2011-10-07 Thread Andrei Alexandrescu

On 10/07/11 15:00, Piotr Szturmaj wrote:

Andrei Alexandrescu wrote:

On 10/7/11 12:02 PM, Piotr Szturmaj wrote:

Did you see http://pszturmaj.github.com/ddb/db.html ? It maps tuples,
structs and arrays in similar manner.


I did. Looks good. Why do you need DBRow? You could simply use
Variant[], the given struct, or Tuple directly.


Sure, that's a good idea. I used DBRow because I wanted generic
set/setNull for any of its base types, but it could be done with
function templates anyway...

The only thing that makes DBRow useful is opIndex access to Variant[]
fields, like row["field"].


Oh that makes sense, thanks.

Andrei


Re: Database interface design - was how to build up the library.

2011-10-07 Thread Piotr Szturmaj

Andrei Alexandrescu wrote:

On 10/7/11 12:02 PM, Piotr Szturmaj wrote:

Did you see http://pszturmaj.github.com/ddb/db.html ? It maps tuples,
structs and arrays in similar manner.


I did. Looks good. Why do you need DBRow? You could simply use
Variant[], the given struct, or Tuple directly.


Sure, that's a good idea. I used DBRow because I wanted generic 
set/setNull for any of its base types, but it could be done with 
function templates anyway...


The only thing that makes DBRow useful is opIndex access to Variant[] 
fields, like row["field"].
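
A minimal sketch of the convenience DBRow adds over a bare Variant[], namely opIndex lookup by column name. The layout and names here are assumed for illustration rather than taken from ddb:

import std.variant : Variant;

struct Row
{
    Variant[] values;
    size_t[string] columnIndex;   // column name -> position in values

    Variant opIndex(size_t i) { return values[i]; }
    Variant opIndex(string name) { return values[columnIndex[name]]; }
}

unittest
{
    Row row;
    row.values = [Variant(42), Variant("hello")];
    row.columnIndex["id"] = 0;
    row.columnIndex["name"] = 1;

    assert(row["id"].get!int == 42);
    assert(row[1].get!string == "hello");
}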


Re: Database interface design - was how to build up the library.

2011-10-07 Thread Andrei Alexandrescu

On 10/7/11 1:27 PM, Steve Teale wrote:

Andrei,

So if I'm understanding you correctly, if structs are used, and there's a 
database
change, then the app has to be rebuilt.


Yes. Touched to update the structs and rebuilt.


If Variants, then only the app's config
file has to be updated (or the app can look in the database to get such
information) - have I got this right?


Yah. The exact behavior depends on the assumption the app makes about 
the resultset. Consider:


auto result = db.sql(queryString);
enforce(result.front[0].type == typeid(double));
...

In this case you'd pretty much need to touch the code if the queryString 
returns something other than a double in the first column.



Andrei


Re: Database interface design - was how to build up the library.

2011-10-07 Thread Jacob Carlborg

On 2011-10-07 16:11, Steve Teale wrote:

I'm thinking that for each database that gets covered there will
need to be two modules, like:

etc.c.mysql
etc.mysqld

The etc.c.xxx modules would be completely different between databases -
just translations of the necessary header files. The xxxd modules
would be as similar as possible, consistent with allowing the strong
points of individual database systems to show through, and the weak
points of others to be avoided. I don't think all should be reduced
to some LCD.

These modules should attempt to make a good range of capabilities
available to the D programmer, but they do not have to be all-encompassing.
Those users who want to do really fancy things can drop back to the
low-level interface. They should probably have the following capabilities:

1) Establishing and closing database connections.

2) The capability to execute literal SQL statements - execSQL()
if you like. Some of these will generate result sets, of which more below.

3) The capability to create prepared statements with in and out
parameters and association of the parameters with a source, and
then to execute these. This breaks down into several components/
capabilities, which could be labeled:

3a) createPreparedStatement() - marshal parameters, associate them
with a source, and have the server prepare the statement.

3b) execStatement() - for those SQL statements that don't have a
result set.

3c) execStatementIncremental()/getNext() - execute the prepared statement,
then fetch the result rows one at a time into some kind of result set.

3d) execStatementAll() - execute the prepared statement and get all
the resulting rows into some kind of result set.

3e) (maybe) execScalar() - do the whole sequence: prepare, execute,
and get a single-value result set placed into a D variable.

3f) (maybe) execStoredProcedure() - another 'do the whole thing'
capability TBD.

It is when we come to the nature of the result sets that there is
likely to be dissent. I favor arrays of structs, but we should
probably do arrays of arrays of variants too for those situations
where structures can't be sorted out at compile time. There needs
to be some symmetry between what is used here, and what can be used
as input to operations such as a prepared insert. It is of course
vital that this part of each middle layer produce exactly the same
kind of results. Otherwise the top layer could become difficult.

On top of this set of two modules for each database, I envisage a
higher-level module - etc.dbops - that provides a bunch of convenience
templates for various common database operations, spanning the databases.
Once the middle layer is in place, this top layer should be relatively
easy to implement. It should be noted though that all these database
wrappers will be monstrously difficult to test.

I am at the point with MySQL where I can get the result of a plain
old query into an array of a checked structure type. I have the
prepared statement stuff, and know how the result will be created from
a prepared query (the execStatementAll() case) - I just have to plow
through a bit more binding and fetching.

This is probably rather general and vague, but I would like to get
comments so we can iterate toward a practical design.

Thanks
Steve


I don't know if a special function will be necessary for this, but it
would be nice to be able to get batches of results from a query.
Or this might be solved by the range interface.
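
If the driver hands rows back as a range, batching can probably be layered on top with the existing range tools rather than a dedicated function. A toy sketch, with a plain array standing in for the row range:

import std.range : chunks;
import std.stdio : writeln;

void main()
{
    auto rows = ["row1", "row2", "row3", "row4", "row5"];   // placeholder rows

    foreach (batch; rows.chunks(2))   // 2 rows at a time; a real batch would be larger
        writeln("batch: ", batch);
}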


--
/Jacob Carlborg


Re: Database interface design - was how to build up the library.

2011-10-07 Thread Adam Ruppe
Sean Kelly wrote:
> Does your Row equate to the ResultSet above?

Fairly similar.

Mine looks something like this:

interface ResultSet {
// name for associative array to result index
int getFieldIndex(string field);
string[] fieldNames();

bool empty();
Row front();
void popFront();
int length();
}

struct Row {
// the actual column is returned as a string - probably should
// change that, but mysql, postgres, and sqlite all offered that
// and it works for me, so I went with it
string opIndex(size_t idx) { return data[idx]; }
string opIndex(string columnName) { return data[fieldIndexes[columnName]]; }
int opApply(scope int delegate(string) dg) {
foreach (value; data)
if (auto r = dg(value)) return r;
return 0;
}

private string[] data;
private size_t[string] fieldIndexes; // column name -> position in data
}



Then, each of the database implementations uses that little
ResultSet interface to feed Row structs back to the user code.
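
A hypothetical consumer of that interface; the column names here are made up, and only the ResultSet and Row types above are assumed:

import std.stdio : writeln;

void report(ResultSet rs)
{
    auto nameIdx = rs.getFieldIndex("name");   // look the index up once

    foreach (row; rs)                          // empty/front/popFront make it foreach-able
        writeln(row[nameIdx], "  ", row["email"]);
}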


Re: Database interface design - was how to build up the library.

2011-10-07 Thread Sean Kelly
On Oct 7, 2011, at 11:27 AM, Steve Teale wrote:

> Andrei,
> 
> So if I'm understanding you correctly, if structs are used, and there's a 
> database
> change, then the app has to be rebuilt. If Variants, then only the app's 
> config
> file has to be updated (or the app can look in the database to get such
> information) - have I got this right?

I'd say yes, though qualify this by saying that a properly designed app will 
target only views and stored procedures, which insulate the app from schema 
changes.

Re: Database interface design - was how to build up the library.

2011-10-07 Thread Sean Kelly
On Oct 7, 2011, at 10:14 AM, Adam Ruppe wrote:

> In my database.d, I used a database row struct to provide both
> integer and key based indexes.

Funny… I have a ResultSet object which contains an array of Column objects.  
Here's the basic idea:

class ResultSet {
static class Column {
int index();
string name();
int size();
int length();
void[] data();
bool opEquals(NullValue);
T as(T)(); // return raw if T matches underlying, or call to!(T)
// on the field as appropriate
}

size_t numCols();
bool nextResultSet();
bool next();
bool prev();
bool first();
bool last();
void update(); // SQLSetPos(rowNumber=0)
Column opIndex(size_t);
// easy enough to add opIndex(string)
}

Does your Row equate to the ResultSet as above?
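
One way the as!(T) idea above might be realized, assuming the column keeps its value in a Variant; the value member and the text fallback are assumptions for the sketch, not the actual implementation:

import std.conv : to;
import std.variant : Variant;

struct Column
{
    Variant value;

    T as(T)()
    {
        if (auto p = value.peek!T)        // exact match: hand back the raw value
            return *p;
        return value.coerce!string.to!T;  // otherwise go through a text conversion
    }
}

unittest
{
    auto c = Column(Variant(42));
    assert(c.as!int == 42);          // raw
    assert(c.as!string == "42");     // converted
}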

Re: Database interface design - was how to build up the library.

2011-10-07 Thread Steve Teale
Andrei,

So if I'm understanding you correctly, if structs are used, and there's a 
database
change, then the app has to be rebuilt. If Variants, then only the app's config
file has to be updated (or the app can look in the database to get such
information) - have I got this right?

Steve


Re: Database interface design - was how to build up the library.

2011-10-07 Thread Sean Kelly
On Oct 7, 2011, at 8:46 AM, Andrei Alexandrescu wrote:

> On 10/7/11 9:11 AM, Steve Teale wrote:
>> I'm thinking that for each database that gets covered there will
>> need to be two modules, like:
>> 
>> etc.c.mysql
>> etc.mysqld
>> 
>> The etc.c.xxx modules would be completely different between databases -
>> just translations of the necessary header files. The xxxd modules
>> would be as similar as possible, consistent with allowing the strong
>> points of individual database systems to show through, and the weak
>> points of others to be avoided. I don't think all should be reduced
>> to some LCD.
> 
> Well we may be hasty to go that way. The driver-based approach works well for 
> other languages and APIs - why wouldn't it work for us? The differences 
> across different DBMSs would be mostly in the SQL support, not the basic APIs.

It may be worth targeting ODBC early on as well, since that would get the API 
working for pretty much everything.  Then targeting a DBMS-specific API would 
be an optimization step rather than a necessary one.


>> These modules should attempt to make a good range of capabilities
>> available to the D programmer, but they do not have to be all-encompassing.
>> Those users who want to do really fancy things can drop back to the
>> low-level interface. They should probably have the following capabilities:
>> 
>> 1) Establishing and closing database connections.
> 
> Sounds good. Since DB connections are a precious resource, there must be a 
> RAII struct holding them. The functions below may be members of it.

The way I've done this before is via reference counting, since the connection 
handle often has to be held by all related statements, resultsets, etc.  Would 
scope(exit) be sufficient to provide RAII functionality here?
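
A sketch of the reference-counting approach described above, assuming the raw handle comes from the C binding; all names are made up, and the commented-out close call stands in for whatever the driver actually requires:

import std.typecons : RefCounted;

struct ConnectionPayload
{
    void* handle;   // e.g. the MYSQL* obtained from the C layer

    ~this()
    {
        // mysql_close(cast(MYSQL*) handle) or equivalent would go here,
        // running exactly once when the last reference disappears
        handle = null;
    }
}

alias Connection = RefCounted!ConnectionPayload;

// Statements and result sets each hold a Connection by value, so the handle
// stays alive until the last of them (or the connection itself) is destroyed.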


>> 3) The capability to create prepared statements with in and out
>> parameters and association of the parameters with a source, and
>> then to execute these. This breaks down into several components/
>> capabilities, which could be labeled:
>> 
>> 3a) createPreparedStatement() - marshal parameters, associate them
>> with a source, and have the server prepare the statement.
>> 
>> 3b) execStatement() - for those SQL statements that don't have a
>> result set.
>> 
>> 3c) execStatementIncremental()/getNext() - execute the prepared statement,
>> then fetch the result rows one at a time into some kind of result set.
> 
> Here's where the range interface might be used. We might simply have 
> execStatement() that returns an input range. If the statement produced no 
> rows, the range will be empty. No need for distinguishing execStatement() and 
> execStatementIncremental().

Absolutely, though things may be a bit tricky here.  It's possible for a SQL 
query to return multiple resultsets, each of which is ideally represented by a 
range.
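
If each resultset is itself a range of rows, one option is for the execution call to hand back a range of those ranges, so several resultsets are consumed with a nested foreach. A shape-only sketch, with both range types standing in for whatever the driver would provide:

void drainAll(RS)(RS resultSets)
{
    foreach (rows; resultSets)   // one element per resultset the query produced
        foreach (row; rows)      // the rows of that particular resultset
        {
            // process row
        }
}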


>> 3d) execStatementAll() - execute the prepared statement and get all
>> the resulting rows into some kind of result set.
> 
> This is not a primitive, but instead a convenience function that should be 
> used with caution and only for queries known to be small. I agree that it's 
> good to have.
> 
>> 3e) (maybe) execScalar() - do the whole sequence: prepare, execute,
>> and get a single-value result set placed into a D variable.
> 
> That's convenient for e.g. "SELECT COUNT(*) FROM table WHERE condition", 
> which people run all the time.
> 
>> 3f) (maybe) execStoredProcedure() - another 'do the whole thing'
>> capability TBD.
> 
> Well wouldn't that just execute a special SQL a la "CALL procedure"?

A stored procedure can have a return value and multiple resultsets, I believe.  
Certainly one or the other anyway.  I think SPs are covered by the other query 
methods, though perhaps someone can think of a QOI feature that would benefit 
from a special call.


>> It is when we come to the nature of the result sets that there is
>> likely to be dissent. I favor arrays of structs, but we should
>> probably do arrays of arrays of variants too for those situations
>> where structures can't be sorted out at compile time. There needs
>> to be some symmetry between what is used here, and what can be used
>> as input to operations such as a prepared insert. It is of course
>> vital that this part of each middle layer produce exactly the same
>> kind of results. Otherwise the top layer could become difficult.
> 
> I'd like arrays of structs too but we must acknowledge that most databases 
> people use will be large, in which case loading the whole thing eagerly takes 
> a lot of RAM and potentially wastes time. So arrays are out except for e.g. 
> convenience functions - but then we already have array() that converts an 
> arbitrary input range into an array.
> 
> Now, regarding the "structs" part, I'd like that too when the schema is 
> statically known. Two issues:
> 
> 1. MFC had at a point a wizard that generated one struct per resultset. It 
> was an absolute maintenance disaster and they recanted by offering 
> dynamically-bound result sets.

Re: Database interface design - was how to build up the library.

2011-10-07 Thread Steve Teale
Andrei,

I've actually already got that in a way. Before I attempt to populate fields 
into
a struct, the struct is checked to see if it conforms to the field properties.

Steve


Re: Database interface design - was how to build up the library.

2011-10-07 Thread Adam Ruppe
In my database.d, I used a database row struct to provide both
integer and key based indexes.


Re: Database interface design - was how to build up the library.

2011-10-07 Thread Andrei Alexandrescu

On 10/7/11 12:02 PM, Piotr Szturmaj wrote:

Andrei Alexandrescu wrote:

1. MFC had at a point a wizard that generated one struct per resultset.
It was an absolute maintenance disaster and they recanted by offering
dynamically-bound result sets. The lesson there is that defining a
struct for each query won't likely play out, so we better use
Tuple!(ColType1, ColType2, ...). A possible API would be:

auto db = std.database.connect("cdb.mysql");
auto rows = db.sql!(double, ulong)
("SELECT SUM(x), COUNT(x) FROM table");
// We know that rows is a range of Tuple!(double, ulong).
writeln("Sum is ", rows.front[0], " count is ", rows.front[1]);

Cool beans. I'd love to use such an API!
Andrei


Did you see http://pszturmaj.github.com/ddb/db.html ? It maps tuples,
structs and arrays in similar manner.


I did. Looks good. Why do you need DBRow? You could simply use 
Variant[], the given struct, or Tuple directly.


Andrei


Re: Database interface design - was how to build up the library.

2011-10-07 Thread Piotr Szturmaj

Andrei Alexandrescu wrote:

1. MFC had at a point a wizard that generated one struct per resultset.
It was an absolute maintenance disaster and they recanted by offering
dynamically-bound result sets. The lesson there is that defining a
struct for each query won't likely play out, so we better use
Tuple!(ColType1, ColType2, ...). A possible API would be:

auto db = std.database.connect("cdb.mysql");
auto rows = db.sql!(double, ulong)
("SELECT SUM(x), COUNT(x) FROM table");
// We know that rows is a range of Tuple!(double, ulong).
writeln("Sum is ", rows.front[0], " count is ", rows.front[1]);

Cool beans. I'd love to use such an API!
Andrei


Did you see http://pszturmaj.github.com/ddb/db.html ? It maps tuples, 
structs and arrays in similar manner.


Re: Database interface design - was how to build up the library.

2011-10-07 Thread Andrei Alexandrescu

On 10/7/11 11:51 AM, Steve Teale wrote:

Andrei,

The suggestion in your penultimate paragraph is what I'm trying to get started.

Maybe the answer to the problem with structs lies somewhere in std.typecons, in
combination with the field discovery features supported by most databases. If we
can manufacture the struct to suit the circumstances, then we've cracked it! I'm
probably talking rubbish - Friday night, couple of beers.


No can do. This is a hard wall. Structs must be defined during 
compilation. Database structure can be discovered at run time.


What you can do, however, is pass a struct type to store a row of the
resultset; at runtime the fields of the struct are matched dynamically
with the fields in the rowset. That is doable and useful.
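
A sketch of that idea: pass a struct type and match its fields against the columns at run time. The Variant[string] row here is a stand-in for one row of a dynamically described resultset, not an actual API:

import std.variant : Variant;

T bindRow(T)(Variant[string] row)
{
    T result;
    foreach (i, ref field; result.tupleof)
    {
        enum name = __traits(identifier, T.tupleof[i]);  // field name, known at compile time
        if (auto col = name in row)                      // column presence checked at run time
            field = col.get!(typeof(field));
    }
    return result;
}

unittest
{
    struct Person { int id; string name; }

    Variant[string] row;
    row["id"] = Variant(7);
    row["name"] = Variant("Ada");

    auto p = bindRow!Person(row);
    assert(p.id == 7 && p.name == "Ada");
}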



I will attempt to start a new thread - [std.database] - your 10 cents worth?


http://www.youtube.com/watch?v=lmUZGdi7Ty4


Andrei



Re: Database interface design - was how to build up the library.

2011-10-07 Thread Steve Teale
Andrei,

The suggestion in your penultimate paragraph is what I'm trying to get started.

Maybe the answer to the problem with structs lies somewhere in std.typecons, in
combination with the field discovery features supported by most databases. If we
can manufacture the struct to suit the circumstances, then we've cracked it! I'm
probably talking rubbish - Friday night, couple of beers.

I will attempt to start a new thread - [std.database] - your 10 cents worth?

Thanks
Steve


Re: Database interface design - was how to build up the library.

2011-10-07 Thread Alex Rønne Petersen

On 07-10-2011 16:11, Steve Teale wrote:

I'm thinking that for each database that gets covered there will
need to be two modules, like:

etc.c.mysql
etc.mysqld

The etc.c.xxx modules would be completely different between databases -
just translations of the necessary header files. The xxxd modules
would be as similar as possible, consistent with allowing the strong
points of individual database systems to show through, and the weak
points of others to be avoided. I don't think all should be reduced
to some LCD.

These modules should attempt to make a good range of capabilities
available to the D programmer, but they do not have to be all-encompassing.
Those users who want to do really fancy things can drop back to the
low-level interface. They should probably have the following capabilities:

1) Establishing and closing database connections.

2) The capability to execute literal SQL statements - execSQL()
if you like. Some of these will generate result sets, of which more below.

3) The capability to create prepared statements with in and out
parameters and association of the parameters with a source, and
then to execute these. This breaks down into several components/
capabilities, which could be labeled:

3a) createPreparedStatement() - marshal parameters, associate them
with a source, and have the server prepare the statement.

3b) execStatement() - for those SQL statements that don't have a
result set.

3c) execStatementIncremental()/getNext() - execute the prepared statement,
then fetch the result rows one at a time into some kind of result set.

3d) execStatementAll() - execute the prepared statement and get all
the resulting rows into some kind of result set.

3e) (maybe) execScalar() - do the whole sequence: prepare, execute,
and get a single-value result set placed into a D variable.

3f) (maybe) execStoredProcedure() - another 'do the whole thing'
capability TBD.

It is when we come to the nature of the result sets that there is
likely to be dissent. I favor arrays of structs, but we should
probably do arrays of arrays of variants too for those situations
where structures can't be sorted out at compile time. There needs
to be some symmetry between what is used here, and what can be used
as input to operations such as a prepared insert. It is of course
vital that this part of each middle layer produce exactly the same
kind of results. Otherwise the top layer could become difficult.

On top of this set of two modules for each database, I envisage a
higher-level module - etc.dbops - that provides a bunch of convenience
templates for various common database operations, spanning the databases.
Once the middle layer is in place, this top layer should be relatively
easy to implement. It should be noted though that all these database
wrappers will be monstrously difficult to test.

I am at the point with MySQL where I can get the result of a plain
old query into an array of a checked structure type. I have the
prepared statement stuff, and know how the result will be created from
a prepared query (the execStatementAll() case) - I just have to plow
through a bit more binding and fetching.

This is probably rather general and vague, but I would like to get
comments so we can iterate toward a practical design.

Thanks
Steve


Just a note on module naming: mysqld is misleading - it could mean 
"mysql daemon". I would recommend just calling it mysql and calling the 
C interface etc.c.mysqlc or something like this.


- Alex


Re: Database interface design - was how to build up the library.

2011-10-07 Thread Andrei Alexandrescu

On 10/7/11 9:11 AM, Steve Teale wrote:

I'm thinking that for each database that gets covered there will
need to be two modules, like:

etc.c.mysql
etc.mysqld

The etc.c.xxx modules would be completely different between databases -
just translations of the necessary header files. The xxxd modules
would be as similar as possible, consistent with allowing the strong
points of individual database systems to show through, and the weak
points of others to be avoided. I don't think all should be reduced
to some LCD.


Well we may be hasty to go that way. The driver-based approach works 
well for other languages and APIs - why wouldn't it work for us? The 
differences across different DBMSs would be mostly in the SQL support, 
not the basic APIs.



These modules should attempt to make a good range of capabilities
available to the D programmer, but they do not have to be all-encompassing.
Those users who want to do really fancy things can drop back to the
low-level interface. They should probably have the following capabilities:

1) Establishing and closing database connections.


Sounds good. Since DB connections are a precious resource, there must be 
a RAII struct holding them. The functions below may be members of it.



2) The capability to execute literal SQL statements - execSQL()
if you like. Some of these will generate result sets, of which more below.


Great.


3) The capability to create prepared statements with in and out
parameters and association of the parameters with a source, and
then to execute these. This breaks down into several components/
capabilities, which could be labeled:

3a) createPreparedStatement() - marshal parameters, associate them
with a source, and have the server prepare the statement.

3b) execStatement() - for those SQL statements that don't have a
result set.

3c) execStatementIncremental()/getNext() - execute the prepared statement,
then fetch the result rows one at a time into some kind of result set.


Here's where the range interface might be used. We might simply have 
execStatement() that returns an input range. If the statement produced 
no rows, the range will be empty. No need for distinguishing 
execStatement() and execStatementIncremental().
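
A sketch of how such an input range could be built over an incremental fetch. The delegate stands in for the driver's "next row, or nothing" call; none of these names come from an actual API:

import std.typecons : Nullable;

struct RowRange(Row)
{
    Nullable!Row current;
    Nullable!Row delegate() fetchNext;

    bool empty() { return current.isNull; }
    Row front() { return current.get; }
    void popFront() { current = fetchNext(); }
}

auto rowRange(Row)(Nullable!Row delegate() fetchNext)
{
    auto r = RowRange!Row(Nullable!Row.init, fetchNext);
    r.popFront();                       // fetch the first row; the range may be empty
    return r;
}

unittest
{
    import std.algorithm : equal;

    int[] data = [1, 2, 3];
    auto r = rowRange!int(delegate Nullable!int() {
        if (data.length == 0) return Nullable!int.init;
        auto v = Nullable!int(data[0]);
        data = data[1 .. $];
        return v;
    });
    assert(r.equal([1, 2, 3]));
}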



3d) execStatementAll() - execute the prepared statement and get all
the resulting rows into some kind of result set.


This is not a primitive, but instead a convenience function that should 
be used with caution and only for queries known to be small. I agree 
that it's good to have.



3e) (maybe) execScalar() - do the whole sequence: prepare, execute,
and get a single-value result set placed into a D variable.


That's convenient for e.g. "SELECT COUNT(*) FROM table WHERE condition", 
which people run all the time.
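
With that range interface in place, execScalar could probably be a thin convenience on top. The db.sql!(T) call is the API idea proposed later in this message; everything else here is assumed:

import std.exception : enforce;

T execScalar(T, Db)(Db db, string query)
{
    auto rows = db.sql!T(query);                    // a range of single-column tuples
    enforce(!rows.empty, "query returned no rows");
    return rows.front[0];
}

// hypothetical usage:
//   auto n = db.execScalar!long("SELECT COUNT(*) FROM table WHERE condition");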



3f) (maybe) execStoredProcedure() - another 'do the whole thing'
capability TBD.


Well wouldn't that just execute a special SQL a la "CALL procedure"?


It is when we come to the nature of the result sets that there is
likely to be dissent. I favor arrays of structs, but we should
probably do arrays of arrays of variants too for those situations
where structures can't be sorted out at compile time. There needs
to be some symmetry between what is used here, and what can be used
as input to operations such as a prepared insert. It is of course
vital that this part of each middle layer produce exactly the same
kind of results. Otherwise the top layer could become difficult.


I'd like arrays of structs too but we must acknowledge that most 
databases people use will be large, in which case loading the whole 
thing eagerly takes a lot of RAM and potentially wastes time. So arrays 
are out except for e.g. convenience functions - but then we already have 
array() that converts an arbitrary input range into an array.


Now, regarding the "structs" part, I'd like that too when the schema is 
statically known. Two issues:


1. MFC had at a point a wizard that generated one struct per resultset. 
It was an absolute maintenance disaster and they recanted by offering 
dynamically-bound result sets. The lesson there is that defining a 
struct for each query won't likely play out, so we better use 
Tuple!(ColType1, ColType2, ...). A possible API would be:


auto db = std.database.connect("cdb.mysql");
auto rows = db.sql!(double, ulong)
("SELECT SUM(x), COUNT(x) FROM table");
// We know that rows is a range of Tuple!(double, ulong).
writeln("Sum is ", rows.front[0], " count is ", rows.front[1]);

Cool beans. I'd love to use such an API!

2. Statically-bound tuples work only when we know the query beforehand. 
We need to cater for resultsets of unknown column count and types. The 
question here is whether we traffic in untyped memory a la ubyte[] or 
some variant type. I favor Variant because it's efficient, accommodates 
any SQL type easily, and is convenient to use.
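
What consuming such a Variant-based row might look like; the row here is hand-built where a driver would normally fill it in:

import std.exception : enforce;
import std.stdio : writeln;
import std.variant : Variant;

void main()
{
    Variant[] row = [Variant(3.14), Variant(42UL)];

    enforce(row[0].type == typeid(double));   // check before committing to a type
    auto sum = row[0].get!double;
    auto count = row[1].coerce!long;          // coerce when any integral width will do

    writeln("sum ", sum, " count ", count);
}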



On top of this set of two modules for each database, I envisage a
higher-level module - etc.dbops - that provides a bunch of convenience
templates for various common database operations, spanning the databases.

Database interface design - was how to build up the library.

2011-10-07 Thread Steve Teale
I'm thinking that for each database that gets covered there will
need to be two modules, like:

etc.c.mysql
etc.mysqld

The etc.c.xxx modules would be completely different between databases -
just translations of the necessary header files. The xxxd modules
would be as similar as possible, consistent with allowing the strong
points of individual database systems to show through, and the weak
points of others to be avoided. I don't think all should be reduced
to some LCD.

These modules should attempt to make a good range of capabilities
available to the D programmer, but they do not have to be all-encompassing.
Those users who want to do really fancy things can drop back to the
low-level interface. They should probably have the following capabilities:

1) Establishing and closing database connections.

2) The capability to execute literal SQL statements - execSQL()
if you like. Some of these will generate result sets, of which more below.

3) The capability to create prepared statements with in and out
parameters and association of the parameters with a source, and
then to execute these. This breaks down into several components/
capabilities, which could be labeled:

3a) createPreparedStatement() - marshal parameters, associate them
with a source, and have the server prepare the statement.

3b) execStatement() - for those SQL statements that don't have a
result set.

3c) execStatementIncremental()/getNext() - execute the prepared statement,
then fetch the result rows one at a time into some kind of result set.

3d) execStatementAll() - execute the prepared statement and get all
the resulting rows into some kind of result set.

3e) (maybe) execScalar() - do the whole sequence: prepare, execute,
and get a single-value result set placed into a D variable.

3f) (maybe) execStoredProcedure() - another 'do the whole thing'
capability TBD.

It is when we come to the nature of the result sets that there is
likely to be dissent. I favor arrays of structs, but we should
probably do arrays of arrays of variants too for those situations
where structures can't be sorted out at compile time. There needs
to be some symmetry between what is used here, and what can be used
as input to operations such as a prepared insert. It is of course
vital that this part of each middle layer produce exactly the same
kind of results. Otherwise the top layer could become difficult.

On top of this set of two modules for each database, I envisage a
higher-level module - etc.dbops - that provides a bunch of convenience
templates for various common database operations, spanning the databases.
Once the middle layer is in place, this top layer should be relatively
easy to implement. It should be noted though that all these database
wrappers will be monstrously difficult to test.

I am at the point with MySQL where I can get the result of a plain
old query into an array of a checked structure type. I have the
prepared statement stuff, and know how the result will be created from
a prepared query (the execStatementAll() case) - I just have to plow
through a bit more binding and fetching.

This is probably rather general and vague, but I would like to get
comments so we can iterate toward a practical design.

Thanks
Steve
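
A rough sketch of how capabilities 1) through 3f) above could hang together as D declarations. Every name and signature is an assumption meant for discussion, and the stub bodies exist only so the outline compiles:

import std.variant : Variant;

alias Row = Variant[];

struct ResultSet                 // used for 3c) and 3d): rows as an input range
{
    Row[] rows;                  // eager here for the sketch; a driver would fetch lazily
    bool empty() { return rows.length == 0; }
    Row front() { return rows[0]; }
    void popFront() { rows = rows[1 .. $]; }
}

struct PreparedStatement
{
    void bind(T)(size_t index, T value) { }                       // part of 3a): marshal one parameter
    ulong execStatement() { return 0; }                           // 3b): no resultset, rows affected
    ResultSet execStatementIncremental() { return ResultSet(); }  // 3c): rows fetched as consumed
    ResultSet execStatementAll() { return ResultSet(); }          // 3d): everything, eagerly
    T execScalar(T)() { return T.init; }                          // 3e): a single value
    ResultSet[] execStoredProcedure() { return null; }            // 3f): possibly several resultsets
}

struct Connection
{
    static Connection connect(string connectionString) { return Connection(); }          // 1)
    void close() { }                                                                     // 1)
    ResultSet execSQL(string sql) { return ResultSet(); }                                // 2)
    PreparedStatement createPreparedStatement(string sql) { return PreparedStatement(); } // 3a)
}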