Hi -- I thought I'd try to summarize a few of the initiatives underway for the APR DBD interface, partly for my own benefit so I don't lose track of anything. I've added one new subject as well (#4).
1) Transaction modes and explicit rollbacks. Bojan Smojver has created a patch set which he's been testing: http://marc.theaimsgroup.com/?l=apr-dev&m=114956299124140&w=2 (Thank you, Bojan! Please prod me by email to do an Oracle test for you this week.) 2) We seem to have some consensus that a new function to check whether a particular column value is null should look like: APU_DECLARE(int) apr_dbd_entry_is_null(const apr_dbd_driver_t *driver, apr_dbd_row_t *row, int col); I can create a trivial patch set for apr_dbd.{c,h} and apr_dbd_oracle.c and I'm willing to take a stab at a few other drivers, but I don't have a test setup to hand for them. One outstanding question, I think, might be how to report error conditions. This concerns existing functions as well; see #3 below. Perhaps this function, for now, should ignore errors? Or should it look like this: APU_DECLARE(int) apr_dbd_entry_is_null(const apr_dbd_driver_t *driver, apr_dbd_row_t *row, int col, int *null); and return 0 on success after setting *null to 0 or 1? That allows for the usual APR-style error handling, i.e., don't touch *null and return the error code. 3) Error reporting via a standard set of APR_DBD_* error codes, per Bojan's suggestion: http://marc.theaimsgroup.com/?l=apr-dev&m=114803269220454&w=2 As a simple example, apr_dbd_set_dbname() is documented as returning 0 for success or, presumably, a DB-specific error code like most other DBD functions. However, several drivers currently return APR_ENOTIMPL. We probably shouldn't be mixing APR error codes with DB codes like this, for one thing. As a sidebar, I've never been entirely clear on the rules governing out-of-memory situations in APR: should one catch a NULL return from apr_palloc() and return APR_ENOMEM? A few functions like apr_xml_parser_convert_doc() catch NULL returns and bubble up APR_ENOMEM; many others just plow ahead without a check. I mention this only because if we're supposed to return APR_ENOMEM from a DBD function on a memory allocation error, then we need that error code to be distinct from anything coming out of the DB. Overall, this seems like a *large* effort to me. Forewarned is forearmed, I guess. 4) I've been wondering if we should deprecate and then remove the apr_dbd_escape() function. We generally encourage the use of placeholders in our query and select functions, which is good. These two related security alerts caught my eye: http://lwn.net/Vulnerabilities/184920/ http://lwn.net/Vulnerabilities/186068/ http://lwn.net/Articles/185077/ because they both involve string escaping, and as noted in the final (subscriber-only, sorry) link, using placeholders avoids these kinds of security holes. Because we encourage the use of placeholders, and because we introduce a new special character (%) which we really ought to be escaping as well as the usual ones (i.e., single-quote, maybe backslash), I'm thinking that unless we plan to craft a bulletproof escaping function for each DB -- perhaps not the easiest thing in the world -- then we should forget it and discourage people from trying to escape strings and then wedge them into SQL queries. So, I'd propose having this function always return NULL for now, documenting this as the error return case, and then removing it in 1.3, if possible (see #5 below). 5) Renaming functions like apr_dbd_get_entry() to apr_dbd_entry_get(). Is this the kind of thing that can be done with a minor version number change, i.e., in 1.3? My own inclination would be to provide apr_dbd_entry_get() in 1.2 and make get_entry() a wrapper for it, and then remove get_entry() in 1.3, if possible, and the same for get_row(), etc.; otherwise, remove them in 2.0. 6) The gnarly subject of complex data types, including binary data types like BLOBs and BFILEs, both as input and output parameters. Bojan's latest proposals are documented here, I believe: http://marc.theaimsgroup.com/?l=apr-dev&m=114928825410705&w=2 http://marc.theaimsgroup.com/?l=apr-dev&m=114929898406504&w=2 Putting aside all the new sprintf()-style format specifiers for a moment, we have a question about how to handle binary data. One option discussed was to create a structure with length, data, and other elements. Alex Dubov had some concerns about this: http://marc.theaimsgroup.com/?l=apr-dev&m=114888730427147&w=2 To me, though, it seems nearly identical to the apr_datum_t structure used in the DBM interface, so I'm less concerned. Bucket brigades also came up in relation to this subject for use, especially, with large amounts of data. Personally I think that's a clever approach; using apr_brigade_length() would seem to be appropriate for determining the length of the data, since it will be efficient whenever possible (e.g., if the buckets' length was initialized by apr_bucket_shared_make() or a similar function). For certain drivers, we also have a need to provide additional elements -- in particular, for Oracle, knowing the table and column name that the CLOB/BLOB/BFILE relates to is very helpful. If the existing [p][v]query|select() functions need to support these data types, what if we provided an apr_bucket_type_dbd_foo bucket type so that a bucket of that type could be prepended on the front of the brigade? The challenge, it would seem to me, is how to inform the caller when this bucket is required, and what to put in it. Some sort of introspection facility, perhaps? As a another sidebar, this whole discussion reminds of this one on the httpd-dev list a while back, where William A. Rowe, Jr. asked about strings in httpd 3.x: http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=112180295023956&w=2 Among other things, he raises the question of character sets, which could be important to the DBD interface as well. In sum, then, bucket brigades would seem to be the "right" way to handle large amounts of data. That leads back to the question of format specifiers and the current function definitions, which all take const char* inputs and, in the case of apr_dbd_get_entry(), return a const char* as well. Alex and Bojan seem to be leaning toward a more complex sprintf()-style set of format specifiers. At present, IIRC, %s is really the key documented one, plus %d for PostgreSQL. Oracle has some others but they're only in trunk and can be altered or dropped. If this route is followed, my own recommendation would be to avoid confusion by not deviating from the meaning of existing sprintf() and apr_vformatter() format specifiers. For instance, rather than %h for short and %H for unsigned short, we would use %hd and %hu. Any extensions would follow the apr_vformatter() lead and look like %pDx, %pDz, etc. So long as apr_vformatter() stays away from using %pD for anything, there's no overlap. Alex also suggested a sscanf()-style function to read data from a row. So far as I know, APR doesn't have any sscanf()-like functions yet, so we'd be breaking new ground a little. I'm also not sure how this would interact with the possibility that any given value might be null, in the SQL sense. Speaking just for myself, though, I can't quite shake the conviction that it might be saner to leave the existing interface largely as-is and to provide a secondary, less string-oriented set of interfaces for more complex SQL queries. That is, leave [p][v]query|select() in place handling just %s and %d, as they do now, and let get_entry() continue to return a const char*. For everything else, we go back to Nick Kew's original proposal of an apr_dbd_entry_t type and Bojan's followup suggestions: http://marc.theaimsgroup.com/?l=apr-dev&m=114909866521113&w=2 http://marc.theaimsgroup.com/?l=apr-dev&m=114912478418539&w=2 What about starting with something like this? For the data types, keep the list as small as possible by dealing only with common, larger data types. Column types would try to a superset of all the common data types in the different DBs. typedef enum { APR_DBD_DATA_NULL, APR_DBD_DATA_STRING, APR_DBD_DATA_INT32, APR_DBD_DATA_UINT32, APR_DBD_DATA_DOUBLE, APR_DBD_DATA_TIME, APR_DBD_DATA_BUFFER, /* string plus length */ APR_DBD_DATA_BRIGADE } apr_dbd_data_e; typedef enum { APR_DBD_COLUMN_VARCHAR, APR_DBD_COLUMN_NUMBER, APR_DBD_COLUMN_DATE, APR_DBD_COLUMN_TIME, APR_DBD_COLUMN_DATETIME, APR_DBD_COLUMN_CLOB, APR_DBD_COLUMN_BLOB } apr_dbd_column_e; typedef enum { APR_DBD_FLAGS_TABLE_NAME, APR_DBD_FLAGS_COLUMN_NAME } apr_dbd_flags_e; typedef struct { apr_dbd_data_e type; union apr_dbd_data_u_t { const char *s; apr_int32_t *i; apr_uint32_t *u; double d; apr_time_t *tm; struct apr_dbd_buffer_t { const char *buf; apr_size_t *len; } buf; apr_bucket_brigade *bb; } data; union apr_dbd_flag_data_u_t { const char *table_name; const char *column_name; } flag_data; } apr_dbd_datum_t; APU_DECLARE(apr_status_t) apr_dbd_datum_get(const apr_dbd_driver_t *driver, apr_dbd_row_t *row, int col); APU_DECLARE(apr_status_t) apr_dbd_datum_pquery(const apr_dbd_driver_t *driver, apr_pool_t *pool, apr_dbd_t *handle, int *nrows, apr_dbd_prepared_t *statement, int nargs, const apr_dbd_datum_t **args); and so forth for all the other [p][v]query|select() functions. An introspection function might look like: APU_DECLARE(apr_dbd_flags_e) apr_dbd_column_flags_get(const apr_dbd_driver_t *driver, apr_dbd_column_e column_type); For Oracle's pesky LOBs, this would return APR_DBD_FLAGS_TABLE_NAME | APR_DBD_FLAGS_COLUMN_NAME. Thoughts and flames? Chris. -- GPG Key ID: 366A375B GPG Key Fingerprint: 485E 5041 17E1 E2BB C263 E4DE C8E3 FA36 366A 375B
