On Tue, Jul 06, 2004 at 11:34:28AM +0200, Michael Peppler wrote:
> I've been asked to investigate the feasibility of adding access to
> Sybase's bulk-load API in DBD::Sybase. This is an API that allows you to
> load table rows in a minimally logged manner, and is of course much
> faster than normal INSERT statements.
>
> I have this API available in Sybase::CTlib, and I have a number of users
> who mix DBI and Sybase::CTlib to get the necessary functionality.
>
> The API consists of an init call (blk_init()), of binding columns to be
> loaded (blk_bind()) and blk_rowxfer() to load one or more bound rows.
> blk_done() is called to commit loaded rows.
>
> I suspect that other databases have similar APIs, and that access to
> these APIs might be useful as a general case.
>
> I can of course implement this as a bunch of private DBD::Sybase calls,
> but after thinking about this a little I thought that it might be
> possible to integrate this to the prepare()/execute()/commit() sequence.
>
> Maybe something like:
>
> $sth = $dbh->prepare("BULK INSERT <tablename> -- attributes to be defined");
> while (<>) {
>     @row = split('\|');   # or whatever...
>     $sth->execute(@row);
> }
> $sth->commit;
>
> Obviously the driver (or DBI) would have to recognize the BULK INSERT
> statement and switch to the bulk load API. That's similar to what I do
> now with the "EXEC ..." statement that generates RPC calls instead of
> SQL language commands to the server.
>
> Does this make sense, or is this too Sybase-specific to be of general
> DBI interest?
Oracle has a vaguely similar concept, and it would be nice to come up with
a common API, as it's a relatively common requirement.
But the DBI tries very hard to avoid parsing the SQL (beyond looking
for placeholders) and I'd rather not change that. Adding a pseudo-sql
interface on top of the underlying blk-load code doesn't really buy
you much. The only gain would be if the SQL itself was coming from
a script, but that's not very likely for bulk loading. It would also be
hard to define a pseudo-sql syntax that we'd be sure would work for all
drivers that might want to use it.
So at this point I'd suggest you try something along these lines...
  $sth = $dbh->prepare("INSERT INTO table (...column names...) VALUES (?,?,...)", {
      syb_bulk_load => { ... },
  });
That's an *ordinary, valid SQL INSERT statement*, but the syb_bulk_load
attribute triggers DBD::Sybase to parse the statement to get the
table and column names (which is trivial, and DBI::SQL::Nano can
do it for you).
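To illustrate, here's a minimal sketch of that parsing step using a plain
regex rather than DBI::SQL::Nano. It assumes the simple
INSERT ... (cols) VALUES (...) shape shown above, and parse_insert is just
an illustrative name, not part of any API:

```perl
use strict;
use warnings;

# Extract table and column names from a simple INSERT statement.
# A real driver might use DBI::SQL::Nano / SQL::Statement instead.
sub parse_insert {
    my ($sql) = @_;
    my ($table, $cols) = $sql =~
        /^\s*INSERT\s+INTO\s+(\w+)\s*\(([^)]+)\)\s*VALUES/i
        or return;                          # not a recognised INSERT
    my @columns = map { s/^\s+|\s+$//g; $_ } split /,/, $cols;
    return ($table, @columns);
}

my ($table, @cols) = parse_insert(
    "INSERT INTO mytable (a, b, c) VALUES (?,?,?)"
);
# $table is 'mytable', @cols is ('a', 'b', 'c')
```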
Then override the execute_for_fetch() method as a way to feed in the
data (that'll make execute_array() work for you as well). Something
like this:
  sub execute_for_fetch {
      my $sth = shift;
      return $sth->SUPER::execute_for_fetch(@_)
          unless $sth->{syb_bulk_load};
      ... your code here ...
  }
This is more efficient than execute() because you don't return from
execute_for_fetch() until the loading is complete. It also makes
it easy to switch back to non-bulk-load behaviour by just commenting
out the syb_bulk_load attribute in the prepare() call.
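For what it's worth, the body of such an override consumes rows via DBI's
fetch-tuple coderef, which returns one arrayref per row and undef when the
data is exhausted. Here's a standalone sketch of that loop, with comments
marking where the Sybase blk_* calls would sit; drain_tuples is an
illustrative name, not part of any API:

```perl
use strict;
use warnings;

# Sketch of the loop a bulk-loading execute_for_fetch() might run.
# $fetch_tuple_sub follows DBI's protocol: arrayref per row, undef at end.
sub drain_tuples {
    my ($fetch_tuple_sub) = @_;
    my $rows = 0;
    while (my $tuple = $fetch_tuple_sub->()) {
        # blk_bind()/blk_rowxfer() would transfer @$tuple here
        $rows++;
    }
    # blk_done() would commit the loaded rows here
    return $rows;
}

my @data  = ([1, 'x'], [2, 'y'], [3, 'z']);
my $count = drain_tuples(sub { shift @data });
# $count is 3
```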
Once more than one driver has some kind of working bulk load support
then we could look at defining a common API, but the above seems
sufficient for now.
Tim.