"John Lister" <john.lister...@kickstone.com> writes:
Am I right in assuming the planner thinks a sequential scan is quicker than 10k index hits? Would tweaking the costs fix this, or would I be better off updating the stats for the product_id and manufacturer_id fields?

AFAICT the planner did exactly the right things here.  Your first
example is fetching 40 times as many rows from retailer_offer as
the second one is.  If the planner had stuck with the nestloop plan,
it would've taken about 40x as long, and been significantly slower
than the hash join.

Cheers for the quick reply. Maybe those weren't the best example values; see the following two plans with approximately the same number of product rows but very different results and times. I forgot to mention that the product table has 2.5M rows, although this is apparent from the plans:

with hash join:

Unique  (cost=199627.47..199900.51 rows=1029 width=13) (actual time=2226.358..2238.255 rows=49 loops=1)
  ->  Sort  (cost=199627.47..199718.48 rows=36406 width=13) (actual time=2226.356..2230.342 rows=37086 loops=1)
        Sort Key: m.name, m.id
        Sort Method:  quicksort  Memory: 3276kB
        ->  Hash Join  (cost=101700.78..196869.37 rows=36406 width=13) (actual time=1759.983..2193.453 rows=37086 loops=1)
              Hash Cond: (p.manufacturer_id = m.id)
              ->  Hash Join  (cost=101667.62..196335.64 rows=36406 width=4) (actual time=1759.338..2174.826 rows=37086 loops=1)
                    Hash Cond: (o.product_id = p.id)
                    ->  Bitmap Heap Scan on retailer_offer o  (cost=921.66..84697.06 rows=36406 width=4) (actual time=12.168..49.759 rows=37086 loops=1)
                          Recheck Cond: ((retailer_id = 5149) AND active)
                          ->  Bitmap Index Scan on idx_retaileroffer_retailerid  (cost=0.00..912.56 rows=36406 width=0) (actual time=7.136..7.136 rows=37089 loops=1)
                                Index Cond: (retailer_id = 5149)
                    ->  Hash  (cost=59067.54..59067.54 rows=2540354 width=8) (actual time=1746.670..1746.670 rows=2540383 loops=1)
                          ->  Seq Scan on product_product p  (cost=0.00..59067.54 rows=2540354 width=8) (actual time=0.012..787.095 rows=2540383 loops=1)
              ->  Hash  (cost=20.29..20.29 rows=1029 width=13) (actual time=0.635..0.635 rows=1029 loops=1)
                    ->  Seq Scan on manufacturer_manufacturer m  (cost=0.00..20.29 rows=1029 width=13) (actual time=0.009..0.296 rows=1029 loops=1)
Total runtime: 2244.036 ms

and without:

Unique  (cost=43237.53..43266.80 rows=1029 width=13) (actual time=410.191..421.953 rows=332 loops=1)
  ->  Sort  (cost=43237.53..43247.29 rows=3903 width=13) (actual time=410.189..414.351 rows=32959 loops=1)
        Sort Key: m.name, m.id
        Sort Method:  quicksort  Memory: 3384kB
        ->  Hash Join  (cost=134.15..43004.71 rows=3903 width=13) (actual time=16.356..328.938 rows=32959 loops=1)
              Hash Cond: (p.manufacturer_id = m.id)
              ->  Nested Loop  (cost=100.99..42917.89 rows=3903 width=4) (actual time=15.716..308.037 rows=32959 loops=1)
                    ->  Bitmap Heap Scan on retailer_offer o  (cost=100.99..13663.64 rows=3903 width=4) (actual time=15.693..67.479 rows=32959 loops=1)
                          Recheck Cond: ((retailer_id = 2016) AND active)
                          ->  Bitmap Index Scan on idx_retaileroffer_retailerid  (cost=0.00..100.02 rows=3903 width=0) (actual time=7.863..7.863 rows=33369 loops=1)
                                Index Cond: (retailer_id = 2016)
                    ->  Index Scan using product_product_pkey on product_product p  (cost=0.00..7.48 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=32959)
                          Index Cond: (p.id = o.product_id)
              ->  Hash  (cost=20.29..20.29 rows=1029 width=13) (actual time=0.627..0.627 rows=1029 loops=1)
                    ->  Seq Scan on manufacturer_manufacturer m  (cost=0.00..20.29 rows=1029 width=13) (actual time=0.009..0.270 rows=1029 loops=1)
Total runtime: 422.058 ms

You can see that the plan with the sequential scan is significantly slower than the one with the index scan (I've tried to mitigate any caching by the OS in these results). This is PostgreSQL 8.3.5 running on a quad-core 2GHz Xeon with 12GB RAM. All costs are set to their defaults, with shared_buffers = 4.2GB and effective_cache_size = 6GB. I thought that with the later versions more shared_buffers was better; is this too much?
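For what it's worth, here is the sort of thing I was considering trying; this is just a sketch based on the standard knobs (the statistics target of 500 and random_page_cost of 2.0 are arbitrary values I picked for illustration, not recommendations):

```sql
-- Raise the per-column statistics target so ANALYZE samples more rows
-- and gets a better estimate of the retailer_id / product_id distributions.
ALTER TABLE retailer_offer ALTER COLUMN retailer_id SET STATISTICS 500;
ALTER TABLE retailer_offer ALTER COLUMN product_id  SET STATISTICS 500;
ANALYZE retailer_offer;

-- Alternatively, nudge the planner toward index scans for this session only,
-- to see whether the cost model (rather than the stats) is the problem.
SET random_page_cost = 2.0;   -- default is 4.0
EXPLAIN ANALYZE
SELECT DISTINCT m.name, m.id
FROM retailer_offer o
JOIN product_product p ON p.id = o.product_id
JOIN manufacturer_manufacturer m ON m.id = p.manufacturer_id
WHERE o.retailer_id = 5149 AND o.active;
```

(The query above is my reconstruction from the plan output; I don't have the original SQL in front of me, so treat it as approximate.)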

Thanks

JOHN



--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
