I'm writing a short script which, as part of its normal operation, performs a SUM on values from a Postgresql database. When running the summation query from the command line interface of Postgresql, it looks like this:
dev=> EXPLAIN ANALYZE SELECT SUM(bytes) AS bytes FROM files,users,files_headers,headers,senders WHERE files.file_id = files_headers.file_id AND files_headers.header_id = headers.header_id AND files.sender_id = senders.sender_id AND files.user_id = users.user_id AND file_name like 'testdata.%'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=62172.92..62172.93 rows=1 width=4) (actual time=351.910..351.911 rows=1 loops=1) -> Nested Loop (cost=0.00..62153.44 rows=7789 width=4) (actual time=0.072..325.140 rows=22887 loops=1) -> Nested Loop (cost=0.00..2235.26 rows=7789 width=8) (actual time=0.060..97.635 rows=22887 loops=1) -> Nested Loop (cost=0.00..239.11 rows=34 width=4) (actual time=0.046..5.323 rows=323 loops=1) -> Nested Loop (cost=0.00..61.58 rows=34 width=8) (actual time=0.036..2.286 rows=323 loops=1) -> Seq Scan on users (cost=0.00..16.05 rows=5 width=4) (actual time=0.005..0.029 rows=5 loops=1) -> Index Scan using file_name_user_id_idx on files (cost=0.00..8.99 rows=9 width=12) (actual time=0.118..0.292 rows=65 loops=5) Index Cond: ((files.file_name >= 'testdata.'::bpchar) AND (files.file_name < 'testdata/'::bpchar) AND (files.user_id = users.user_id)) Filter: (files.file_name ~~ 'testdata. %'::text) -> Index Scan using senders_pkey on senders (cost=0.00..5.21 rows=1 width=4) (actual time=0.003..0.004 rows=1 loops=323) Index Cond: (senders.sender_id = files.sender_id) -> Index Scan using file_id_idx on files_headers (cost=0.00..55.85 rows=229 width=12) (actual time=0.006..0.113 rows=71 loops=323) Index Cond: (files_headers.file_id = files.file_id) -> Index Scan using header_id_idx on headers (cost=0.00..7.68 rows=1 width=12) (actual time=0.003..0.005 rows=1 loops=22887) Index Cond: (headers.header_id = files_headers.header_id) Total runtime: 351.977 ms (16 rows) However the same query (as far as I can tell) executed through DBI, takes over 50 seconds to execute. Test script pared down to the minimum required to reproduce the issue: #!/usr/local/bin/perl -w use DBI; $dbuser = 'user'; $dbpass = 'pass'; use constant BYTE_COUNT_STH => 'SELECT SUM(bytes) AS bytes FROM files,users,files_headers,headers,senders WHERE files.file_id = files_headers.file_id AND files_headers.header_id = headers.header_id AND files.sender_id = senders.sender_id AND files.user_id = users.user_id AND file_name like ?'; $dbh = DBI->connect( "DBI:Pg:dbname=dev", $dbuser, $dbpass ) || die( "Cannot connect to database: " . DBI->errstr ); $bytes_sth= $dbh->prepare( BYTE_COUNT_STH ) || die( "Cannot prepare statement.\n" ); $filename = 'testdata.%'; $bytes_sth->execute( $filename ); $bytes_sth->finish; $dbh->disconnect; #END The output looks like this: $ DBI_PROFILE=2 perl -d:Profile test.pl DBI::Profile: 45.780639s 99.52% (12 calls) test.pl @ 2010-08-09 15:03:38 '' => 0.090086s / 9 = 0.010010s avg (first 0.090018s, min 0.000002s, max 0.090018s) 'SELECT SUM(bytes) AS bytes FROM files,users,files_headers,headers,senders WHERE files.file_id = files_headers.file_id AND files_headers.header_id = headers.header_id AND files.sender_id = senders.sender_id AND files.user_id = users.user_id AND file_name like ?' => 45.690553s / 3 = 15.230184s avg (first 0.000093s, min 0.000009s, max 45.690451s) And the perl profiling data shows that the vast majority of the time is spent in DBI::st::execute: $ head prof.out time elapsed (wall): 46.0127 time running program: 46.0065 (99.99%) time profiling (est.): 0.0062 (0.01%) number of calls: 405 %Time Sec. #calls sec/call F name 99.31 45.6905 1 45.690480 DBI::st::execute 0.20 0.0898 1 0.089833 DBD::Pg::db::_login 0.14 0.0630 1 0.063033 <anon>:test.pl:2 0.09 0.0432 6 0.007198 DBI::common::STORE I'm a little confused by the DBI profiling output, which seems to indicate that the query was run three times, but even if that is accurate, each run would be far longer than the same run on the console. What could be causing this massive discrepancy? In both cases, I'm connecting to the same database on the same machine as the same user. I've repeated this over and over, so it isn't likely related to caching. Other queries don't seem to be giving the DBI nearly this much trouble. System: 8.0-RELEASE-p4 amd64 DBI Version: p5-DBI-1.61.1 Postgres Version: 8.4.3 Any thoughts?