Hi all, here is my postgresql configuration: "version";"PostgreSQL 9.0.3 on amd64-portbld-freebsd8.0, compiled by GCC cc (GCC) 4.2.1 20070719 [FreeBSD], 64-bit" "bytea_output";"escape" "checkpoint_segments";"64" "client_encoding";"UNICODE" "effective_cache_size";"6GB" "fsync";"off" "lc_collate";"C" "lc_ctype";"C" "listen_addresses";"*" "log_destination";"syslog" "max_connections";"20" "max_stack_depth";"2MB" "server_encoding";"UTF8" "shared_buffers";"4GB" "silent_mode";"on" "synchronous_commit";"off" "TimeZone";"Europe/Jersey" "update_process_title";"off" "work_mem";"24MB"
I have a partitioned table tcpsessiondata: CREATE TABLE appqosdata.tcpsessiondata ( detectorid smallint not null, createdtime bigint not null, sessionid bigint not null, ... ); That table has many millions of rows in each partition and no data in the main one. Here is an example: select count(*) from appqosdata.tcpsessiondata_1; count ---------- 49377910 (1 row) Every partition has a "Primary Key (detectorid, createdtime)" I run the following query on that table: select cast (SD.detectorid as numeric), CAST( (createdtime / 61000000000::bigint) AS numeric) as timegroup, sum(datafromsource)+sum(datafromdestination) as numbytes, CAST ( sum(packetsfromsource)+sum(packetsfromdestination) AS numeric) as numpackets from appqosdata.tcpsessiondata SD where SD.detectorid >= 0 and SD.createdtime >= 1297266601368086000::bigint and SD.createdtime < 1297270202368086000::bigint group by SD.detectorid, timegroup The table is partitioned by a "sessionid" which is not used in this particular query so I already expect all partitions to be touched. However I have a bad scan choice on at least a couple of partitions: "HashAggregate (cost=5679026.42..5679028.76 rows=67 width=34) (actual time=160113.366..160113.366 rows=0 loops=1)" " Output: (sd.detectorid)::numeric, (((sd.createdtime / 61000000000::bigint))::numeric), (sum(sd.datafromsource) + sum(sd.datafromdestination)), ((sum(sd.packetsfromsource) + sum(sd.packetsfromdestination)))::numeric, sd.detectorid" " -> Result (cost=0.00..5679025.41 rows=67 width=34) (actual time=160113.360..160113.360 rows=0 loops=1)" " Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination, ((sd.createdtime / 61000000000::bigint))::numeric" " -> Append (cost=0.00..5679025.08 rows=67 width=34) (actual time=160113.356..160113.356 rows=0 loops=1)" " -> Seq Scan on appqosdata.tcpsessiondata sd (cost=0.00..23.65 rows=1 width=34) (actual time=0.002..0.002 rows=0 loops=1)" " Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination" " Filter: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))" " -> Seq Scan on appqosdata.tcpsessiondata_1 sd (cost=0.00..1373197.46 rows=1 width=34) (actual time=46436.737..46436.737 rows=0 loops=1)" " Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination" " Filter: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))" " -> Seq Scan on appqosdata.tcpsessiondata_2 sd (cost=0.00..2447484.00 rows=1 width=34) (actual time=108359.967..108359.967 rows=0 loops=1)" " Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination" " Filter: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))" " -> Index Scan using tcpsessiondata_3_pkey on appqosdata.tcpsessiondata_3 sd (cost=0.00..11.51 rows=1 width=34) (actual time=0.016..0.016 rows=0 loops=1)" " Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination" " Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))" ... (many more partitions here).... " -> Index Scan using tcpsessiondata_61_pkey on appqosdata.tcpsessiondata_61 sd (cost=0.00..8162.42 rows=1 width=34) (actual time=25.446..25.446 rows=0 loops=1)" " Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination" " Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))" " -> Index Scan using tcpsessiondata_62_pkey on appqosdata.tcpsessiondata_62 sd (cost=0.00..11.51 rows=1 width=34) (actual time=0.008..0.008 rows=0 loops=1)" " Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination" " Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))" " -> Index Scan using tcpsessiondata_63_pkey on appqosdata.tcpsessiondata_63 sd (cost=0.00..11.51 rows=1 width=34) (actual time=0.006..0.006 rows=0 loops=1)" " Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination" " Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))" " -> Seq Scan on appqosdata.tcpsessiondata_64 sd (cost=0.00..13.00 rows=1 width=34) (actual time=0.102..0.102 rows=0 loops=1)" " Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination" " Filter: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))" " -> Seq Scan on appqosdata.tcpsessiondata_65 sd (cost=0.00..117.64 rows=1 width=34) (actual time=0.854..0.854 rows=0 loops=1)" " Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination" " Filter: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))" " -> Index Scan using tcpsessiondata_66_pkey on appqosdata.tcpsessiondata_66 sd (cost=0.00..11.51 rows=1 width=34) (actual time=0.007..0.007 rows=0 loops=1)" " Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination" " Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))" " -> Index Scan using tcpsessiondata_67_pkey on appqosdata.tcpsessiondata_67 sd (cost=0.00..11.51 rows=1 width=34) (actual time=0.005..0.005 rows=0 loops=1)" " Output: sd.detectorid, sd.createdtime, sd.datafromsource, sd.datafromdestination, sd.packetsfromsource, sd.packetsfromdestination" " Index Cond: ((sd.detectorid >= 0) AND (sd.createdtime >= 1297266601368086000::bigint) AND (sd.createdtime < 1297270202368086000::bigint))" "Total runtime: 160114.339 ms" The question is: why do we get a seq scan on appqosdata.tcpsessiondata_1 and appqosdata.tcpsessiondata_2 even if the planner estimates correctly 1 row out of millions could potentially be selected? As you can see ~90% of the time is spent on those 2 partitions even if they are not apparently different from any of the others. I would appreciate any help with this issue. Thank you, Svetlin Manavski