I'm writing a short script which, as part of its normal operation,
performs a SUM on values from a Postgresql database.  When running the
summation query from the command line interface of Postgresql, it
looks like this:

dev=> EXPLAIN ANALYZE SELECT SUM(bytes) AS bytes FROM
files,users,files_headers,headers,senders WHERE files.file_id =
files_headers.file_id AND files_headers.header_id = headers.header_id
AND files.sender_id = senders.sender_id AND files.user_id =
users.user_id AND file_name like 'testdata.%';
 
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=62172.92..62172.93 rows=1 width=4) (actual
time=351.910..351.911 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..62153.44 rows=7789 width=4) (actual
time=0.072..325.140 rows=22887 loops=1)
         ->  Nested Loop  (cost=0.00..2235.26 rows=7789 width=8)
(actual time=0.060..97.635 rows=22887 loops=1)
               ->  Nested Loop  (cost=0.00..239.11 rows=34 width=4)
(actual time=0.046..5.323 rows=323 loops=1)
                     ->  Nested Loop  (cost=0.00..61.58 rows=34
width=8) (actual time=0.036..2.286 rows=323 loops=1)
                           ->  Seq Scan on users  (cost=0.00..16.05
rows=5 width=4) (actual time=0.005..0.029 rows=5 loops=1)
                           ->  Index Scan using file_name_user_id_idx
on files  (cost=0.00..8.99 rows=9 width=12) (actual time=0.118..0.292
rows=65 loops=5)
                                 Index Cond: ((files.file_name >=
'testdata.'::bpchar) AND (files.file_name < 'testdata/'::bpchar) AND
(files.user_id = users.user_id))
                                 Filter: (files.file_name ~~ 'testdata.
%'::text)
                     ->  Index Scan using senders_pkey on senders
(cost=0.00..5.21 rows=1 width=4) (actual time=0.003..0.004 rows=1
loops=323)
                           Index Cond: (senders.sender_id =
files.sender_id)
               ->  Index Scan using file_id_idx on files_headers
(cost=0.00..55.85 rows=229 width=12) (actual time=0.006..0.113 rows=71
loops=323)
                     Index Cond: (files_headers.file_id =
files.file_id)
         ->  Index Scan using header_id_idx on headers
(cost=0.00..7.68 rows=1 width=12) (actual time=0.003..0.005 rows=1
loops=22887)
               Index Cond: (headers.header_id =
files_headers.header_id)
 Total runtime: 351.977 ms
(16 rows)



However the same query (as far as I can tell) executed through DBI,
takes over 50 seconds to execute.  Test script pared down to the
minimum required to reproduce the issue:

#!/usr/local/bin/perl -w
use DBI;
$dbuser = 'user';
$dbpass = 'pass';
use constant BYTE_COUNT_STH => 'SELECT SUM(bytes) AS bytes FROM
files,users,files_headers,headers,senders WHERE files.file_id =
files_headers.file_id AND files_headers.header_id = headers.header_id
AND files.sender_id = senders.sender_id AND files.user_id =
users.user_id AND file_name like ?';
$dbh = DBI->connect( "DBI:Pg:dbname=dev", $dbuser, $dbpass ) ||
    die( "Cannot connect to database: " . DBI->errstr );
$bytes_sth= $dbh->prepare( BYTE_COUNT_STH ) ||
    die( "Cannot prepare statement.\n" );

$filename = 'testdata.%';
$bytes_sth->execute( $filename );
$bytes_sth->finish;
$dbh->disconnect;
#END

The output looks like this:
$ DBI_PROFILE=2 perl -d:Profile test.pl
DBI::Profile: 45.780639s 99.52% (12 calls) test.pl @ 2010-08-09
15:03:38
'' =>
    0.090086s / 9 = 0.010010s avg (first 0.090018s, min 0.000002s, max
0.090018s)
'SELECT SUM(bytes) AS bytes FROM
files,users,files_headers,headers,senders WHERE files.file_id =
files_headers.file_id AND files_headers.header_id = headers.header_id
AND files.sender_id = senders.sender_id AND files.user_id =
users.user_id AND file_name like ?' =>
    45.690553s / 3 = 15.230184s avg (first 0.000093s, min 0.000009s,
max 45.690451s)

And the perl profiling data shows that the vast majority of the time
is spent in DBI::st::execute:
$ head prof.out
time elapsed (wall):   46.0127
time running program:  46.0065  (99.99%)
time profiling (est.): 0.0062  (0.01%)
number of calls:       405

%Time    Sec.     #calls   sec/call  F  name
99.31   45.6905        1  45.690480     DBI::st::execute
 0.20    0.0898        1   0.089833     DBD::Pg::db::_login
 0.14    0.0630        1   0.063033     <anon>:test.pl:2
 0.09    0.0432        6   0.007198     DBI::common::STORE



I'm a little confused by the DBI profiling output, which seems to
indicate that the query was run three times, but even if that is
accurate, each run would be far longer than the same run on the
console.

What could be causing this massive discrepancy?  In both cases, I'm
connecting to the same database on the same machine as the same user.
I've repeated this over and over, so it isn't likely related to
caching.  Other queries don't seem to be giving the DBI nearly this
much trouble.

System:  8.0-RELEASE-p4 amd64
DBI Version: p5-DBI-1.61.1
Postgres Version: 8.4.3

Any thoughts?

Reply via email to