Re: Potential DBI array insert solution

Jonathan Leffler Wed, 28 Feb 2001 21:00:28 -0800
Dean Arnold wrote:

> You might want to view the dbi-dev maillist, we've been kicking around
> solutions to this. The last proposal was bind_param_array()
> and bind_col_array() methods,
> and associated bind_param_status() and bind_row_status() for
> reporting individual tuple status info.

I've not previoiusly contributed to this discussion, but...

1.  The next version of DBD::Informix will have a mechanism analogous to an array
insert (it's called an INSERT cursor in Informix) enabled, and it gets a very
considerable performance benefit too.  I think I measured it at around 10x faster
over a WAN.  It will be enabled via Informix-specific attributes.

2.  Informix does not provide a direct way to mess with the row count as described
in the proposal.  Nor does it provide a direct way to mess with the buffer size --
it uses an environment variable, and (a) I don't know when it gets read so I don't
know whether you can set the environment variable on the fly and have it take
effect, and (b) even if you can set it on the fly, Perl is apt to get a tad upset
if your C code goes messing with its environment.  I know this from past experience
(Informix ESQL/C sometimes sets an environment variable on the fly), and its been
discussed inconclusively a couple of times on perl5-porters, and I submitted an RFC
to the Perl 6 process requesting that some mechanism be provided for allowing a
module to tell Perl to check whether the environment has been changed.  It's at a
very low level of detail, but it would make some difference.

Given item 2 above, DBD::Informix cannot directly comply with the proposed feature,
and especially not the proposed parameterization of the feature.  Consequently, I
cannot recommend it to DBI as described.

However, I can see that there is a desire, and a need, for saying "Please optimize
this sequence of inserts".  I don't much care how you say that, but you should be
able to say it.  With the bind_param_array mechanism, its pretty obvious how
DBD::Informix could do it.  I could also accept an attribute (set on connect or
prepare) such as InsertOptimization => 1 to enable INSERT optimization.  If that
was provided with optional hints like InsertCount => 20 and InsertSize or
InsertCache or InsertBuffer => 4096 as in the proposal I'm rejecting, that would be
acceptable and implementable by DBD::Informix (those hints would be ignored).  The
key difference between what I'm suggesting and what I'm rejecting is that the
optimization is requested in a database neutral manner, and the parameterization
(which DBD::Informix would not be able to do reliably) is optional and ignorable.

[About InsertCount: yes, of course DBD::Informix could count every N rows and do a
FLUSH of the INSERT cursor, but there's no guarantee that the library has not
already flushed it several times before the N rows were inserted.  The size of the
buffer is (most probably) fixed when the process first starts communicating with
the database (so the InsertBuffer attribute would have to be available to
DBD::Informix before the driver made the connection to the database), and the
number of rows it can contain is (obviously) controlled by the size of the rows
inserted.]


> I have hacked up DBD::ODBC and the DBI layer to support these -with pretty
> good performance improvement for DB2 (up to 20x faster) and Oracle (up to 10x
> faster). I have turned this over to the keeper of DBD::ODBC, and (hopefully)
> it will be incorporated into the next release.
>
> As for your suggestions, I think the caching of rows may have some
> problems wrt transaction behavior, and also making inter-object calls for each
> row would detract from the performance improvement (as opposed to passing
> arrays in a single inter-object call). Also, the bind_param_array() API permits
> mixing arrays and scalars in a single statement execution
> (e.g., for stored procedure parameters).

For DBD::Informix, AutoCommit => 1 is messy -- IMNSHO the insert optimization
doesn't make sense with AutoCommit => 1.  With AutoCommit => 0, then the
transactional behaviour will depend on whether you do an explicit commit or not.
If you commit, then the pending data will be sent and the TX committed.  If you
rollback or disconnect without explicitly terminating the TX, then the whole lot
will be discarded.  If that's not what you want, don't do it.  Basically, Informix
would 'sense' the pending data and send it to the database as part of committing a
TX.  And No, off the top of my head, I have no idea what happens if some of the
rows are not acceptable for insertion -- that's a whole 'nother sticky area.  The
non-insertable rows are definitely not inserted; I don't know whether that means
that the COMMIT fails and the TX is rolled back instead, or the COMMIT fails but
the TX is still active, or the COMMIT succeeds, or what.


> ----- Original Message -----
> From: "Jeff Holt" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Sent: Wednesday, February 28, 2001 4:43 AM
> Subject: Potential DBI array insert solution
>
> > I respectfully request feedback for the following potential DBI array
> > insert solution.
> >
> > PART 1
> > Add two new hash array attributes for the connect() and prepare()
> > methods called 'insertcount' and 'insertsize'. The 'insertcount'
> > attribute tells the execute() method to 'cache' a number of rows before
> > performing an array insert. The 'insertsize' attribute tells the
> > execute() method to not exceed the specified 'cache' size. The
> > 'insertsize' attribute should override the 'insertcount' attribute. If
> > these two attributes aren't set then, for backwards compatibility, the
> > existing DBI v1.14 behavior should prevail (the default for
> > 'insertcount' should be 1). If the statement being prepared is not an
> > insert and the attributes were set via:
> >     . connect(), then they should be ignored
> >     . prepare(), then an error should be raised
> >
> > PART 2
> > Add a new method called execflush() that allows the programmer to
> > control the DBI's behavior on commit or disconnect. One could possibly
> > argue that
> > the DBI could 'sense' pending rows prior to a commit or disconnect.

--
Jonathan Leffler ([EMAIL PROTECTED], [EMAIL PROTECTED])
Guardian of DBD::Informix 1.00.PC1 -- see http://www.cpan.org/
#include <disclaimer.h>
Re: Potential DBI array insert solution

Reply via email to