On Tue, Jul 06, 2004 at 11:34:28AM +0200, Michael Peppler wrote:
> I've been asked to investigate the feasibility of adding access to
> Sybase's bulk-load API in DBD::Sybase. This is an API that allows you to
> load table rows in a minimally logged manner, and is of course much
> faster than normal INSERT statements.
>
> I have this API available in Sybase::CTlib, and I have a number of users
> who mix DBI and Sybase::CTlib to get the necessary functionality.
>
> The API consists of an init call (blk_init()), of binding columns to be
> loaded (blk_bind()) and blk_rowxfer() to load one or more bound rows.
> blk_done() is called to commit loaded rows.
>
> I suspect that other databases have similar APIs, and that access to
> these APIs might be useful as a general case.
>
> I can of course implement this as a bunch of private DBD::Sybase calls,
> but after thinking about this a little I thought that it might be
> possible to integrate this to the prepare()/execute()/commit() sequence.
>
> Maybe something like:
>
> $sth = $dbh->prepare("BULK INSERT <tablename> -- attributes to be defined");
> while (<>) {
>     @row = split('\|');   # or whatever...
>     $sth->execute(@row);
> }
> $sth->commit;
>
> Obviously the driver (or DBI) would have to recognize the BULK INSERT
> statement and switch to the bulk load API. That's similar to what I do
> now with the "EXEC ..." statement that generates RPC calls instead of
> SQL language commands to the server.
>
> Does this make sense, or is this too Sybase-specific to be of general
> DBI interest?
Oracle has a vaguely similar concept, and it would be nice to come up with
a common API, as it's a relatively common requirement.
But the DBI tries very hard to avoid parsing the SQL (beyond looking
for placeholders) and I'd rather not change that. Adding a pseudo-sql
interface on top of the underlying blk-load code doesn't really buy
you much. The only gain would be if the SQL itself was coming from
a script, but that's not very likely for bulk loading. It would also be
hard to define a pseudo-sql syntax that we'd be sure would work for all
drivers that might want to use it.
So at this point I'd suggest you try something along these lines...
  $sth = $dbh->prepare("INSERT INTO table (...column names...) VALUES (?,?,...)", {
      syb_bulk_load => { ... },
  });
That's an *ordinary, valid SQL INSERT statement*, but the syb_bulk_load
attribute triggers DBD::Sybase to parse the statement to get the
table and column names (which is trivial, and DBI::SQL::Nano can
do it for you).
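To illustrate, here's a minimal sketch of that parsing step using a plain
regex rather than DBI::SQL::Nano. It assumes the simple
INSERT ... (cols) VALUES (...) shape shown above, and parse_insert is just
an illustrative name, not part of any API:

```perl
use strict;
use warnings;

# Extract table and column names from a simple INSERT statement.
# A real driver might use DBI::SQL::Nano / SQL::Statement instead.
sub parse_insert {
    my ($sql) = @_;
    my ($table, $cols) = $sql =~
        /^\s*INSERT\s+INTO\s+(\w+)\s*\(([^)]+)\)\s*VALUES/i
        or return;                          # not a recognised INSERT
    my @columns = map { s/^\s+|\s+$//g; $_ } split /,/, $cols;
    return ($table, @columns);
}

my ($table, @cols) = parse_insert(
    "INSERT INTO mytable (a, b, c) VALUES (?,?,?)"
);
# $table is 'mytable', @cols is ('a', 'b', 'c')
```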
Then override the execute_for_fetch() method as a way to feed in the
data (that'll make execute_array() work for you as well). Something
like this:
  sub execute_for_fetch {
      my $sth = shift;
      return $sth->SUPER::execute_for_fetch(@_)
          unless $sth->{syb_bulk_load};
      ... your code here ...
  }
This is more efficient than execute() because you don't return from
execute_for_fetch() until the loading is complete. It also makes
it easy to switch back to non-bulk-load behaviour by just commenting
out the syb_bulk_load attribute in the prepare() call.
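For what it's worth, the body of such an override consumes rows via DBI's
fetch-tuple coderef, which returns one arrayref per row and undef when the
data is exhausted. Here's a standalone sketch of that loop, with comments
marking where the Sybase blk_* calls would sit; drain_tuples is an
illustrative name, not part of any API:

```perl
use strict;
use warnings;

# Sketch of the loop a bulk-loading execute_for_fetch() might run.
# $fetch_tuple_sub follows DBI's protocol: arrayref per row, undef at end.
sub drain_tuples {
    my ($fetch_tuple_sub) = @_;
    my $rows = 0;
    while (my $tuple = $fetch_tuple_sub->()) {
        # blk_bind()/blk_rowxfer() would transfer @$tuple here
        $rows++;
    }
    # blk_done() would commit the loaded rows here
    return $rows;
}

my @data  = ([1, 'x'], [2, 'y'], [3, 'z']);
my $count = drain_tuples(sub { shift @data });
# $count is 3
```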
Once more than one driver has some kind of working bulk load support
then we could look at defining a common API, but the above seems
sufficient for now.
Tim.