Is there actually evidence that there's a lot of problems with bad join orders? ISTM that's one of the areas where the planner actually does a pretty good job.
I put together a quick demonstration using AxBxC where AxB is empty but AxC is not. Sure enough, postgres chooses AxC first, then xB, which results in extra work. This is a contrived example, but it would be a pain to try to post a real example with all the data and analysis. I think it is fair to say that if it is making the wrong choice in this example, it is sometimes making the wrong choice in practice. Cross-table statistics are supposed to help avoid this, right? But I think it would only help if the system had the statistics for AxB. I think I have been hearing other people propose systems which would track which joins the system is actually using and then recommend to the user that those statistics be gathered. I think we need to go beyond that to recommending statistics (or automatically gathering statistics, or whatever) for joins that *might* be used given different plans than the one currently chosen by the planner.
test=# create table A (a integer); CREATE TABLE Time: 60.151 ms test=# create table B (b integer); CREATE TABLE Time: 3.270 ms test=# create table C (c integer); CREATE TABLE Time: 2.421 ms test=# insert into A (a) (select * from generate_series(1,10000,2)); INSERT 0 5000 Time: 67.829 ms test=# insert into B (b) (select * from generate_series(2,10000,2)); INSERT 0 5000 Time: 60.031 ms test=# insert into C (c) (select * from generate_series(1,1000,2)); INSERT 0 500 Time: 6.303 ms test=# analyze A; ANALYZE Time: 69.669 ms test=# analyze B; ANALYZE Time: 24.548 ms test=# analyze C; ANALYZE Time: 2.936 ms test=# explain select * from A, B, C where A.a = B.b and A.a = C.c; QUERY PLAN --------------------------------------------------------------------------- Hash Join (cost=113.50..216.50 rows=500 width=12) Hash Cond: ("outer".b = "inner".a) -> Seq Scan on b (cost=0.00..73.00 rows=5000 width=4) -> Hash (cost=112.25..112.25 rows=500 width=8) -> Hash Join (cost=9.25..112.25 rows=500 width=8) Hash Cond: ("outer".a = "inner".c) -> Seq Scan on a (cost=0.00..73.00 rows=5000 width=4) -> Hash (cost=8.00..8.00 rows=500 width=4) -> Seq Scan on c (cost=0.00..8.00 rows=500 width=4) (9 rows) Time: 4.807 ms test=# select * from A, B, C where A.a = B.b and A.a = C.c; a | b | c ---+---+--- (0 rows) Time: 34.561 ms test=# select count(*) from A, C where A.a = C.c; count ------- 500 (1 row) Time: 8.450 ms test=# select count(*) from A, B where A.a = B.b; count ------- 0 (1 row) Time: 33.757 ms ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org