Steve, does this just apply to count(*), or is count(id) just as bad? I was originally a MySQL user and count(*) could be very efficient there.
cheers Ben On 26/08/2011, at 12:01 AM, Stephen Woodbridge wrote: > The issue here is the count(*) which forces a full table scan in postgresql > as fork mentioned. You need to look at a real query, unless you are really > doing a count(*). > > -Steve > > On 8/25/2011 11:49 AM, Ben Madin wrote: >> I'm no expert at this, but my understanding (which is limited) was >> that you are asking for the whole table, so indexing doesn't really >> get used (my understanding is that indexing helps to find the page >> for a subset of data more quickly than scanning through the whole >> lot). >> >> Also, you might be able to get some speed up by using a different >> join type (outer join and not null where clause)? >> >> cheers >> >> Ben >> >> >> On 25/08/2011, at 9:41 PM, Charles Galpin wrote: >> >>> If this is too off topic, please let me know and I'll sign up on a >>> postgres list to get help. But this is related to my use of postgis >>> and If anyone knows this stuff, it's you guys. >>> >>> I have an example query that I expect to be much faster, but my >>> main concern is we are about to do some visualization of historical >>> congestion data which will require queries across much larger data >>> sets - like 150 million records a day. We are about to test using >>> partitions but the number per table will still be much larger than >>> what I am dealing with now. >>> >>> So here is a query I would think would be much faster than 43 >>> seconds for two tables, one with about 97k rows, and the other 3.2 >>> million. >>> >>> explain select count(l.*) from links l, source_link ld where >>> l.link_id = ld.link_id; /* 'Aggregate (cost=174731.72..174731.73 >>> rows=1 width=32)' ' -> Hash Join (cost=13024.27..166784.14 >>> rows=3179029 width=32)' ' Hash Cond: (ld.link_id = >>> l.link_id)' ' -> Seq Scan on source_link ld >>> (cost=0.00..58282.29 rows=3179029 width=10)' ' -> Hash >>> (cost=10963.12..10963.12 rows=96812 width=42)' ' -> >>> Seq Scan on links l (cost=0.00..10963.12 rows=96812 width=42)' */ >>> >>> Each table has an index on link_id, defined like this >>> >>> CREATE INDEX links_link_id_idx ON links USING btree (link_id); >>> >>> CREATE INDEX source_link_link_id_idx ON source_link USING btree >>> (link_id); >>> >>> Shouldn't this index prevent these sequential scans, or am I >>> misreading this? Should this really take 43 seconds? >>> >>> thanks for any advice, charles >>> >>> _______________________________________________ postgis-users >>> mailing list postgis-users@postgis.refractions.net >>> http://postgis.refractions.net/mailman/listinfo/postgis-users >> >> _______________________________________________ postgis-users mailing >> list postgis-users@postgis.refractions.net >> http://postgis.refractions.net/mailman/listinfo/postgis-users > > _______________________________________________ > postgis-users mailing list > postgis-users@postgis.refractions.net > http://postgis.refractions.net/mailman/listinfo/postgis-users _______________________________________________ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users