On Thursday 16 July 2009 07:20:18 Marc Cousin wrote: > Le Thursday 16 July 2009 01:56:37, Devin Ben-Hur a écrit : > > Marc Cousin wrote: > > > This mail contains the asked plans : > > > Plan 1 > > > around 1 million records to insert, seq_page_cost 1, random_page_cost 4 > > > > > > -> Hash (cost=425486.72..425486.72 rows=16746972 width=92) > > > (actual time=23184.196..23184.196 rows=16732049 loops=1) -> Seq Scan > > > on path (cost=0.00..425486.72 rows=16746972 width=92) (actual > > > time=0.004..7318.850 rows=16732049 loops=1) > > > > > > -> Hash (cost=1436976.15..1436976.15 rows=79104615 width=35) > > > (actual time=210831.840..210831.840 rows=79094418 loops=1) -> Seq Scan > > > on filename (cost=0.00..1436976.15 rows=79104615 width=35) (actual > > > time=46.324..148887.662 rows=79094418 loops=1) > > > > This doesn't address the cost driving plan question, but I think it's a > > bit puzzling that a seq scan of 17M 92-byte rows completes in 7 secs, > > while a seqscan of 79M 35-byte rows takes 149secs. It's about 4:1 row > > ratio, less than 2:1 byte ratio, but a 20:1 time ratio. Perhaps there's > > some terrible bloat on filename that's not present on path? If that seq > > scan time on filename were proportionate to path this plan would > > complete about two minutes faster (making it only 6 times slower instead > > of 9 :). > > Much simpler than that I think : there is a bigger percentage of path that > is used all the time than of filename. The database used is the production > database, so there were other insert queries running a few minutes before I > got this plan. > > But I'll give it a look today and come back with bloat and cache > information on these 2 tables.
Here are the stats for filename : SELECT * from pgstattuple('public.filename'); table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent ------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+-------------- 5308719104 | 79338344 | 4717466438 | 88.86 | 0 | 0 | 0 | 11883396 | 0.22 So I guess it's not bloated. I checked in the cache, the times displayed before were with path in the cache. filename couldn't stay in the cache, as it's too big. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance