I just started reading about the new date and time module consolidation
project on the [EMAIL PROTECTED] mailing list. The subject of database-
specific date and time formats quickly appeared, mostly in the context of
how to expose this kind of functionality (a subclass for each database? DB
names as parameters? DWIMery? etc.)
My response to the discussion was that the design of the db-specific date
parsing and formatting APIs need not concentrate too much on "end-user"
niceties since, ideally, the only modules that would ever need to use these
APIs directly would be the DBD:: modules.
Whether or not this ever comes to pass depends mostly on the reaction of the
DBI developers to this idea, so let me outline my proposal and you can tell
me what you think. (Sorry if I'm a bit long-winded, but I want to give some
background and explain my reasoning, such as it is :)
The purpose of DBI ("to provide a consistent database interface, independent
of the actual database being used") can be narrowly interpreted to mean that
someone writes code to work with the database's native C API, and then
exposes that functionality via a common Perl API (the DBI). But I favor a
broader interpretation in which the functionality exposed is not limited to
that implemented by the database's native API.
The existing DBI function that was foremost in my mind when considering this
philosophy was $dbh->quote(...). Although not all databases conveniently
expose literal quoting functionality, it was deemed sufficiently important
(and convenient) for DBI to include this in the interface. Happily, there
is enough commonality across databases that a single default implementation
covers many databases. And when different behavior is required, it's just a
simple matter of overriding the default implementation in the DBD.
I believe that database-specific date parsing and formatting is equally ripe
for support in DBI. Almost every database table has some sort of date field
in it, so the need to deal with dates is very common. And most databases
share very similar date formats (e.g. YYYY-MM-DD HH:MM:SS), so there is a
lot of common ground for a default implementation. But there are enough
db-specific oddities ("-infinity", "allballs" :) that requiring the end-user
to deal with dates himself forces him to add database-specific logic to his
DBI code, subverting the purpose of DBI.
Since date column types are also very common across databases (or are
semantically equivalent, if not exact matches), I think they're the best
place to start forming an API. At minimum, I propose a family of methods
like this:
$val = $dbh->format_<column-type>(...);
Covering every possible date/time column type may not be feasible, since
some databases support column types that other databases do not. But these
formats are often equivalent (e.g. DATETIME and TIMESTAMP may both be full
"year, month, day, hour, minute, second" dates, despite the different
names.) And even just covering the most common date and time column types
will be a tremendous help. Here's an initial list off the top of my head:
date
time
datetime
timestamp
But before we explore that further, let's look at the arguments and the
return values of these functions. Example:
$val = $dbh->format_datetime(...);
Obviously $val is a string suitable for use in the "datetime" column of of
whatever database $dbh is connected to. (This string should be unquoted,
since already have $dbh->quote() to do that part.)
The argument that goes in place of the "..." is where things start to get a
bit tricky. The first thing it calls to mind is the need to parse date
strings in a variety of formats. For example:
$val = $dbh->format_datetime('1/1/2004');
$val = $dbh->format_date('tomorrow');
$val = $dbh->format_timestamp('next Thursday at noon');
Such functionality is well beyond the scope of DBI, however. But that being
so, we are left with the problem of what, exactly, should be passed to the
proposed DBI date formatting methods.
It would be nice if we had a "canonical" date/time representation. That's
part of what the new DateTime modules being discussed on the
[EMAIL PROTECTED] mailing list hope to deliver. But regardless of if/when
that project comes to fruition, I still believe this functionality is an
important addition to DBI.
So, in the absence of both a canonical date/time representation and
extensive date parsing abilities built into DBI, how do we support this? I
propose that each $dbh be (optionally) parameterized with a date parser and
formatter.
* The parser must be able to parse any of the database's natively supported
date/time formats.
* The formatter must be able to take the output of the parser and return
strings suitable for use in any of the database's natively supported
date/time columns.
The DBD implementations must include default parsers and formatters of their
choosing, using whatever modules (or custom C code or whatever :) they
decide is best for their driver. DBI should also provide default
implementations that cover most of the common formats.
Alternate parsers and formatters may be specified at different
granularities. The broadest form is that of a class name.
$dbh->date_parser('My::Date::Parser');
$dbh->date_formatter('My::Date::Formatter');
These classes must support methods with the same names as the DBI date
methods. In the case of the parser, there is just a single method:
parse_date()
The formatter must have a method for each column type:
format_date()
format_time()
format_datetime()
...etc.
At a finer granularity, individual methods may be overridden with either a
class name or a code reference. For example:
$dbh->date_formatter('My::Date::Formatter');
$dbh->date_formatter(timestamp => 'My::Date::Timestamp::Formatter');
$dbh->date_formatter(time => \&format_time);
In this case, all date column types other than times and timestamps would be
parsed by calling:
My::Date::Formatter->format_<column-type>(...)
but timestamps would be formatted by calling:
My::Date::Timestamp::Formatter->format_timestamp(...)
and times would be formatted by calling the subroutine:
format_time(...)
The long-term goal is to use the new DateTime modules, which will presumably
have very regular interfaces for database-specific date parsing (making the
DBD author's life easier). This would also have the added benefit of making
the date representations returned from the date parser be the same across
all DBDs.
But the existence of these new DateTime APIs will not be sufficient to
ensure any measure of database independence. Unless the functionality is
accessed via a common DBI interface, and not the DateTime modules
themselves, end-user programs will still be compromised by
database-dependent code.
Consequently, the number one goal of this proposal is to ensure that any
database-specific date/time handling is exposed via DBI, since that is the
only context in which database-specific decisions can be made without
end-user involvement.
What I've outlined above is the bare minimum functionality that I think is
required to remove the majority of the burden of reading and writing
date/time column values in a database-independent way.
Perhaps an even more general system is possible. You'll notice that I
didn't delve into even the most obvious areas of possible DWIMery (e.g.
hooking date parsing and auto-formatting into $sth->execute(...)). And I've
avoided proposing an even more generic column-type formatting and parsing
API. I don't want this to snowball into a huge project that never gets
completed :) And the details of the interface are certainly up for grabs.
But I've wanted this functionality in some form or another for years. I've
implemented it myself in DBI wrapper classes more times than I care to
remember. But it properly belongs in DBI itself, IMO. I hope that the DBI
powers-that-be agree :)
If so, I hope we can hammer out a spec quickly and get the ball rolling,
while also pursing long-term integration with the [EMAIL PROTECTED] folks.
There is a tremendous amount of low-hanging fruit in this area, IMO. Date
and time column types tend to have very regular formats, making them easy to
parse and easy to create.
If we do the spec right, we won't have to change anything if/when the new
DateTime modules have stabilized. And in the meantime, DBI users everywhere
can stop dealing with date and time columns the same way they've stopped
dealing with manually quoting literals via s/'/''/g or s/'/\\'/ or whatever.
So, what do you all think? :)
-John