current dbd initiatives

Chris Darroch Wed, 07 Jun 2006 08:33:36 -0700

Hi --

   I thought I'd try to summarize a few of the initiatives underway
for the APR DBD interface, partly for my own benefit so I don't lose
track of anything.  I've added one new subject as well (#4).



1) Transaction modes and explicit rollbacks.  Bojan Smojver has created
a patch set which he's been testing:

http://marc.theaimsgroup.com/?l=apr-dev&m=114956299124140&w=2

(Thank you, Bojan!  Please prod me by email to do an Oracle test
for you this week.)


2) We seem to have some consensus that a new function to check whether
a particular column value is null should look like:

APU_DECLARE(int) apr_dbd_entry_is_null(const apr_dbd_driver_t *driver,
                                       apr_dbd_row_t *row, int col);

   I can create a trivial patch set for apr_dbd.{c,h} and
apr_dbd_oracle.c and I'm willing to take a stab at a few other drivers,
but I don't have a test setup to hand for them.

   One outstanding question, I think, might be how to report error
conditions.  This concerns existing functions as well; see #3 below.
Perhaps this function, for now, should ignore errors?  Or should
it look like this:

APU_DECLARE(int) apr_dbd_entry_is_null(const apr_dbd_driver_t *driver,
                                       apr_dbd_row_t *row, int col, int *null);

and return 0 on success after setting *null to 0 or 1?  That allows
for the usual APR-style error handling, i.e., don't touch *null and
return the error code.


3) Error reporting via a standard set of APR_DBD_* error codes, per
Bojan's suggestion:

http://marc.theaimsgroup.com/?l=apr-dev&m=114803269220454&w=2

   As a simple example, apr_dbd_set_dbname() is documented as returning
0 for success or, presumably, a DB-specific error code like most other
DBD functions.  However, several drivers currently return APR_ENOTIMPL.
We probably shouldn't be mixing APR error codes with DB codes like this,
for one thing.

   As a sidebar, I've never been entirely clear on the rules governing
out-of-memory situations in APR: should one catch a NULL return from
apr_palloc() and return APR_ENOMEM?  A few functions like
apr_xml_parser_convert_doc() catch NULL returns and bubble up APR_ENOMEM;
many others just plow ahead without a check.  I mention this only
because if we're supposed to return APR_ENOMEM from a DBD function on
a memory allocation error, then we need that error code to be distinct
from anything coming out of the DB.

   Overall, this seems like a *large* effort to me.  Forewarned is
forearmed, I guess.


4) I've been wondering if we should deprecate and then remove the
apr_dbd_escape() function.  We generally encourage the use of
placeholders in our query and select functions, which is good.
These two related security alerts caught my eye:

http://lwn.net/Vulnerabilities/184920/
http://lwn.net/Vulnerabilities/186068/
http://lwn.net/Articles/185077/

because they both involve string escaping, and as noted in the final
(subscriber-only, sorry) link, using placeholders avoids these kinds
of security holes.

   Because we encourage the use of placeholders, and because we
introduce a new special character (%) which we really ought to be
escaping as well as the usual ones (i.e., single-quote, maybe backslash),
I'm thinking that unless we plan to craft a bulletproof escaping
function for each DB -- perhaps not the easiest thing in the world --
then we should forget it and discourage people from trying to escape
strings and then wedge them into SQL queries.

   So, I'd propose having this function always return NULL for now,
documenting this as the error return case, and then removing it in 1.3,
if possible (see #5 below).


5) Renaming functions like apr_dbd_get_entry() to apr_dbd_entry_get().
Is this the kind of thing that can be done with a minor version number
change, i.e., in 1.3?

   My own inclination would be to provide apr_dbd_entry_get() in 1.2
and make get_entry() a wrapper for it, and then remove get_entry()
in 1.3, if possible, and the same for get_row(), etc.; otherwise,
remove them in 2.0.


6) The gnarly subject of complex data types, including binary data
types like BLOBs and BFILEs, both as input and output parameters.
Bojan's latest proposals are documented here, I believe:

http://marc.theaimsgroup.com/?l=apr-dev&m=114928825410705&w=2
http://marc.theaimsgroup.com/?l=apr-dev&m=114929898406504&w=2

   Putting aside all the new sprintf()-style format specifiers for
a moment, we have a question about how to handle binary data.
One option discussed was to create a structure with length, data,
and other elements.  Alex Dubov had some concerns about this:

http://marc.theaimsgroup.com/?l=apr-dev&m=114888730427147&w=2

To me, though, it seems nearly identical to the apr_datum_t structure
used in the DBM interface, so I'm less concerned.

   Bucket brigades also came up in relation to this subject
for use, especially, with large amounts of data.  Personally
I think that's a clever approach; using apr_brigade_length() would
seem to be appropriate for determining the length of the data, since
it will be efficient whenever possible (e.g., if the buckets' length
was initialized by apr_bucket_shared_make() or a similar function).

   For certain drivers, we also have a need to provide additional
elements -- in particular, for Oracle, knowing the table and column
name that the CLOB/BLOB/BFILE relates to is very helpful.  If the
existing [p][v]query|select() functions need to support these
data types, what if we provided an apr_bucket_type_dbd_foo bucket type
so that a bucket of that type could be prepended on the front of
the brigade?

   The challenge, it would seem to me, is how to inform the caller
when this bucket is required, and what to put in it.  Some sort of
introspection facility, perhaps?

   As a another sidebar, this whole discussion reminds of this one on
the httpd-dev list a while back, where William A. Rowe, Jr. asked
about strings in httpd 3.x:

http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=112180295023956&w=2

Among other things, he raises the question of character sets,
which could be important to the DBD interface as well.

   In sum, then, bucket brigades would seem to be the "right" way
to handle large amounts of data.  That leads back to the question
of format specifiers and the current function definitions, which
all take const char* inputs and, in the case of apr_dbd_get_entry(),
return a const char* as well.

   Alex and Bojan seem to be leaning toward a more complex
sprintf()-style set of format specifiers.  At present, IIRC, %s is
really the key documented one, plus %d for PostgreSQL.  Oracle has
some others but they're only in trunk and can be altered or dropped.

   If this route is followed, my own recommendation would be to
avoid confusion by not deviating from the meaning of existing sprintf()
and apr_vformatter() format specifiers.  For instance, rather than
%h for short and %H for unsigned short, we would use %hd and %hu.
Any extensions would follow the apr_vformatter() lead and look like
%pDx, %pDz, etc.  So long as apr_vformatter() stays away from using
%pD for anything, there's no overlap.

   Alex also suggested a sscanf()-style function to read data from a
row.  So far as I know, APR doesn't have any sscanf()-like functions
yet, so we'd be breaking new ground a little.  I'm also not sure
how this would interact with the possibility that any given value
might be null, in the SQL sense.

   Speaking just for myself, though, I can't quite shake the conviction
that it might be saner to leave the existing interface largely as-is
and to provide a secondary, less string-oriented set of interfaces
for more complex SQL queries.  That is, leave [p][v]query|select()
in place handling just %s and %d, as they do now, and let get_entry()
continue to return a const char*.

   For everything else, we go back to Nick Kew's original proposal
of an apr_dbd_entry_t type and Bojan's followup suggestions:

http://marc.theaimsgroup.com/?l=apr-dev&m=114909866521113&w=2
http://marc.theaimsgroup.com/?l=apr-dev&m=114912478418539&w=2

   What about starting with something like this?  For the data
types, keep the list as small as possible by dealing only with
common, larger data types.  Column types would try to a superset of
all the common data types in the different DBs.

typedef enum {
    APR_DBD_DATA_NULL,
    APR_DBD_DATA_STRING,
    APR_DBD_DATA_INT32,
    APR_DBD_DATA_UINT32,
    APR_DBD_DATA_DOUBLE,
    APR_DBD_DATA_TIME,
    APR_DBD_DATA_BUFFER,        /* string plus length */
    APR_DBD_DATA_BRIGADE
} apr_dbd_data_e;

typedef enum {
    APR_DBD_COLUMN_VARCHAR,
    APR_DBD_COLUMN_NUMBER,
    APR_DBD_COLUMN_DATE,
    APR_DBD_COLUMN_TIME,
    APR_DBD_COLUMN_DATETIME,
    APR_DBD_COLUMN_CLOB,
    APR_DBD_COLUMN_BLOB
} apr_dbd_column_e;

typedef enum {
    APR_DBD_FLAGS_TABLE_NAME,
    APR_DBD_FLAGS_COLUMN_NAME
} apr_dbd_flags_e;

typedef struct {
    apr_dbd_data_e type;
    union apr_dbd_data_u_t {
        const char *s;
        apr_int32_t *i;
        apr_uint32_t *u;
        double d;
        apr_time_t *tm;
        struct apr_dbd_buffer_t {
            const char *buf;
            apr_size_t *len;
        } buf;
        apr_bucket_brigade *bb;
    } data;
    union apr_dbd_flag_data_u_t {
        const char *table_name;
        const char *column_name;
    } flag_data;
} apr_dbd_datum_t;

APU_DECLARE(apr_status_t) apr_dbd_datum_get(const apr_dbd_driver_t *driver,
                                            apr_dbd_row_t *row, int col);

APU_DECLARE(apr_status_t) apr_dbd_datum_pquery(const apr_dbd_driver_t *driver,
                                               apr_pool_t *pool,
                                               apr_dbd_t *handle, int *nrows,
                                               apr_dbd_prepared_t *statement,
                                               int nargs,
                                               const apr_dbd_datum_t **args);

and so forth for all the other [p][v]query|select() functions.  An
introspection function might look like:

APU_DECLARE(apr_dbd_flags_e)
    apr_dbd_column_flags_get(const apr_dbd_driver_t *driver,
                             apr_dbd_column_e column_type);

For Oracle's pesky LOBs, this would return APR_DBD_FLAGS_TABLE_NAME |
APR_DBD_FLAGS_COLUMN_NAME.

   Thoughts and flames?

Chris.

-- 
GPG Key ID: 366A375B
GPG Key Fingerprint: 485E 5041 17E1 E2BB C263  E4DE C8E3 FA36 366A 375B

current dbd initiatives

Reply via email to