mark d wrote:
On 1/18/06, Robert Hicks <[EMAIL PROTECTED]> wrote:
mark d wrote:
On 1/18/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
---- mark d <[EMAIL PROTECTED]> wrote:
This optimization might have made some sense decades ago over a serial line,
but not today. It saves 320 single-packet round trips, which is an
infinitesimal amount of traffic.
Well, traffic-wise, you're right. But it also saves parsing all those rows out
of the DB and through the DBI; this way the only process that has to work with
the data is the Oracle engine. Use DBI where appropriate... right tool for the
job, and all that...
I have no idea what "parsing all those rows out of the DB and through the DBI"
means, though it sounds like you think row data is copied from the DB to the
DBI in the course of a SQL DELETE statement (that would be a fundamental
misunderstanding of SQL and DBI).
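For what it's worth, here's a minimal sketch (the DSN, credentials, and table
name are placeholders, not from this thread) showing that a DELETE through DBI
hands back only a row count -- no row data ever travels to the client:

code:
use DBI;

# Placeholder connection details -- illustration only.
my $dbh = DBI->connect("dbi:Oracle:orcl", "user", "password",
                       { RaiseError => 1, AutoCommit => 0 });

# do() sends the statement and returns just the affected-row count;
# nothing is fetched, parsed, or copied back besides that number.
my $count = $dbh->do("delete from SOME_TABLE where status = 'OLD'");
print "deleted $count rows\n";
$dbh->commit;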
In any case, you're mistaken that the PL/SQL saves anything but a few packets.
To be explicit, the proposed PL/SQL was:
declare
    rows_processed number;
    batch_size     number := 50000;
begin
    rows_processed := batch_size;
    while rows_processed = batch_size loop
        delete from table where rownum < batch_size + 1;
        rows_processed := SQL%ROWCOUNT;
        commit;
    end loop;
end;
The equivalent DBI code is:
$sth = $dbh->prepare("delete from TABLE where rownum < 50001");
while ( ($deleted = $sth->execute) > 0 ) {
    printf "deleted %d\n", $deleted;
}
The only difference between the two is a few network packets.
Is that the only difference? My reading, and I could be wrong, is that your
SQL will stop when rownum hits 50001, while his will keep going in batches
of 50000 until there is nothing left to delete.
No. Look more closely. The execute is inside a while loop.
The execute() method returns the number of "affected" rows; for a DELETE,
that is the number of rows deleted. The test in the while condition checks
whether more than zero rows were deleted, so the loop repeats the delete
until no rows are left.
Here's a real-life sample:
code:
use Time::HiRes qw(time);   # needed for the fractional seconds below

$batch = 50001;
$sth = $dbh->prepare("delete from SNAFU where rownum < $batch");
printf "Rows in table = %d\n",
    $dbh->selectall_arrayref("select count(*) from SNAFU")->[0]->[0];
$t = time();
while ( ($deleted = $sth->execute) > 0 ) {
    printf "batch %d: deleted %d rows in %f secs\n",
        ++$i, $deleted, time() - $t;
    $t = time();
}
Output:
Rows in table = 700000
batch 1: deleted 50000 rows in 0.850589 secs
batch 2: deleted 50000 rows in 0.840419 secs
batch 3: deleted 50000 rows in 0.931219 secs
batch 4: deleted 50000 rows in 0.835611 secs
batch 5: deleted 50000 rows in 0.841162 secs
batch 6: deleted 50000 rows in 0.834864 secs
batch 7: deleted 50000 rows in 3.877935 secs
batch 8: deleted 50000 rows in 0.871310 secs
batch 9: deleted 50000 rows in 0.826238 secs
batch 10: deleted 50000 rows in 0.825013 secs
batch 11: deleted 50000 rows in 0.829521 secs
batch 12: deleted 50000 rows in 0.838740 secs
batch 13: deleted 50000 rows in 0.830472 secs
batch 14: deleted 50000 rows in 0.881580 secs
Done
Yours is a simple delete. I need to break up my deletes into manageable
sizes, do the delete, do the next batch, etc.
Note that unless you have a meaningful basis for, and definition of,
"manageable", this is all a complete waste of time anyway. (I originally only
chimed in to note the pointless "optimization", but I guess I'll comment on
the original question too :)
"large" is highly relative. 16 million is "small" to me. It might be
large for an MS Access rolodex, but it's certainly not large by any objective
measure.
On the delete side I have about 16 million records that will be deleted.
I have two of those to do. I have one update that changes a few values
on about 3.1 million records (not the whole table).
Unless you have good reason to believe it would be a problem, just issue
a single SQL statement and don't waste your time fooling around with
these hacks.
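For example (a sketch, assuming the same $dbh and the SNAFU table from the
sample above):

code:
# One statement, one commit -- the whole job, no batching.
$dbh->do("delete from SNAFU");
$dbh->commit;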
That is probably what "I" would do, but my DBA is a little jittery and wants
it broken up. :)
I do appreciate your comments though.
Robert