Poking a bit deeper, it *does* think the plan with sorts is cheaper than without. The mergejoin plan it really prefers is:
regression=# set enable_hashjoin TO 0; SET regression=# set enable_nestloop TO 0; SET regression=# explain ... QUERY PLAN --------------------------------------------------------------------------------------------------- Update (cost=41.69..379234.38 rows=14225817 width=88) -> Merge Join (cost=41.69..379234.38 rows=14225817 width=88) Merge Cond: ((t1.f1 = t2.f1) AND (t1.f2 = t2.f2) AND (t1.f3 = t2.f3) AND (t1.f4 = t2.f4)) -> Index Scan using i1 on t1 (cost=0.00..21198.88 rows=234839 width=65) -> Sort (cost=41.69..43.19 rows=600 width=65) Sort Key: t2.f1, t2.f2, t2.f3, t2.f4 -> Seq Scan on t2 (cost=0.00..14.00 rows=600 width=65) (7 rows) but if you force it to use indexscans: regression=# set enable_seqscan TO 0; SET regression=# explain ... QUERY PLAN --------------------------------------------------------------------------------------------------- Update (cost=52001.84..410457.60 rows=14225817 width=88) -> Merge Join (cost=52001.84..410457.60 rows=14225817 width=88) Merge Cond: ((t1.f3 = t2.f3) AND (t1.f1 = t2.f1) AND (t1.f2 = t2.f2) AND (t1.f4 = t2.f4)) -> Sort (cost=51783.56..52370.66 rows=234839 width=65) Sort Key: t1.f3, t1.f1, t1.f2, t1.f4 -> Index Scan using i1 on t1 (cost=0.00..21198.88 rows=234839 width=65) -> Sort (cost=93.12..94.62 rows=600 width=65) Sort Key: t2.f3, t2.f1, t2.f2, t2.f4 -> Index Scan using i2 on t2 (cost=0.00..65.44 rows=600 width=65) (9 rows) and then without sorts: regression=# set enable_sort TO 0; SET regression=# explain ... QUERY PLAN --------------------------------------------------------------------------------------------------- Update (cost=0.00..483609.37 rows=14225817 width=88) -> Merge Join (cost=0.00..483609.37 rows=14225817 width=88) Merge Cond: ((t2.f1 = t1.f1) AND (t2.f2 = t1.f2) AND (t2.f3 = t1.f3) AND (t2.f4 = t1.f4)) -> Index Scan using i2 on t2 (cost=0.00..65.44 rows=600 width=65) -> Materialize (cost=0.00..23547.27 rows=234839 width=65) -> Index Scan using i1 on t1 (cost=0.00..21198.88 rows=234839 width=65) (6 rows) Note that the join cost is way higher than the sum of the input costs in all three cases. The reason for that is that it's expecting a whole lot of rescanning of the inner relation due to duplicate merge keys. This means that a "bare" inner indexscan is going to be penalized very heavily for refetches, whereas plans with either sort or materialize in between look better because the refetch cost is very low. So that's how a plan with a sort can be preferred to one without. I think the weird looking choices of sort order may just be randomness because all sort orders cost the same once it decides to sort. However, even given that, it's odd that it prefers a plan with two sorts to a plan with one materialize. Poking around in costsize.c, I think that the reason for this is that the rescan cost of a sort is estimated at cpu_operator_cost per tuple, whereas rescanning a materialize node is being estimated at cpu_tuple_cost per tuple. For a plan where rescan cost is the dominant factor, that matters. We probably ought to make those two estimates the same. Since neither plan node type does any projection or qual checking, the lower number is probably the better choice. BTW, the real bottom line here is that mergejoin is a crummy plan choice when there are so few distinct join key values. The planner would never have picked any of these plans if you hadn't forced it to. So I'm not sure how important this is in the real world. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers