Hi Bruno,

Thanks for the moral support! I feel so too - but I am confident it will show up soon.

W.r.t. your rewrite of the query, I get this "ERROR: could not devise a query plan for the given query" but no further details - I will try google

Regards,
Kim.

Bruno Wolff III wrote:

On Wed, Jun 08, 2005 at 11:37:40 +0200,
 Kim Bisgaard <[EMAIL PROTECTED]> wrote:
Hi,

I'm having problems with the query optimizer and FULL OUTER JOIN on PostgreSQL 7.4. I cannot get it to use my indexes with full outer joins. I might be naive, but I think that it should be possible?

I have two BIG tables (virtually identical) with 3 NOT NULL columns Station_id, TimeObs, Temp_XXXX, with unique indexes on (Station_id, TimeObs) and valid ANALYSE (set statistics=100). I want to join the two tables with a FULL OUTER JOIN.

When I specify the query as:

SELECT station_id, timeobs,temp_grass, temp_dry_at_2m
      FROM temp_dry_at_2m a
FULL OUTER JOIN temp_grass b USING (station_id, timeobs)
      WHERE station_id = 52981
        AND timeobs = '2004-1-1 0:0:0'

I get the correct results

station_id |       timeobs       | temp_grass | temp_dry_at_2m
------------+---------------------+------------+----------------
    52944 | 2004-01-01 00:10:00 |            |           -1.1
(1 row)

BUT LOUSY performance, and the following EXPLAIN:

                                                                                
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Full Join (cost=1542369.83..1618958.58 rows=6956994 width=32) (actual time=187176.408..201436.264 rows=1 loops=1) Merge Cond: (("outer".station_id = "inner".station_id) AND ("outer".timeobs = "inner".timeobs)) Filter: ((COALESCE("outer".station_id, "inner".station_id) = 52981) AND (COALESCE("outer".timeobs, "inner".timeobs) = '2004-01-01 00:00:00'::timestamp without time zone)) -> Sort (cost=1207913.44..1225305.93 rows=6956994 width=16) (actual time=145748.253..153851.607 rows=6956994 loops=1)
       Sort Key: a.station_id, a.timeobs
-> Seq Scan on temp_dry_at_2m a (cost=0.00..117549.94 rows=6956994 width=16) (actual time=0.049..54226.770 rows=6956994 loops=1) -> Sort (cost=334456.38..340472.11 rows=2406292 width=16) (actual time=31668.876..34491.123 rows=2406292 loops=1)
       Sort Key: b.station_id, b.timeobs
-> Seq Scan on temp_grass b (cost=0.00..40658.92 rows=2406292 width=16) (actual time=0.052..5484.489 rows=2406292 loops=1)
Total runtime: 201795.989 ms
(10 rows)

Someone else will need to comment on why Postgres can't use a more
efficient plan. What I think will work for you is to restrict
the station_id and timeobs on each side and then do a full join.
You can try something like the sample query below (which hasn't been tested):
SELECT station_id, timeobs, temp_grass, temp_dry_at_2m
 FROM
   (SELECT station_id, timeobs, temp_dry_at_2m
     FROM temp_dry_at_2m
     WHERE
       station_id = 52981
       AND
       timeobs = '2004-1-1 0:0:0') a
   FULL OUTER JOIN
   (SELECT station_id, timeobs, temp_grass
     FROM temp_grass
     WHERE
       station_id = 52981
       AND
       timeobs = '2004-1-1 0:0:0') b
   USING (station_id, timeobs)

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
     subscribe-nomail command to [EMAIL PROTECTED] so that your
     message can get through to the mailing list cleanly


---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

              http://archives.postgresql.org

Reply via email to