Sometimes using a temp table is a better idea:

e.g.

-- start by creating a temp table 'tids' that hold the to_ids that
-- we are interested in.
 SELECT to_id
   INTO TEMP TABLE tids
   FROM correlation
  WHERE from_id  = 1234
  ORDER BY val DESC limit 100;

-- The following temp table makes use of the primary key on
-- the correlation table, and the stated goal from the original
-- question that:
--         from_id > to_id
--     and from_id in (tids.to_id)
--    and to_id in (tids.to_id)

 SELECT t1.to_id AS from_id, t2.to_id
   INTO TEMP TABLE from_to
   FROM tids t1, tids t2
  WHERE t1.to_id > t2.to_id;

-- Now we can use the from_to table as an index into the correlation
-- table.

 SELECT c.from_id, c.to_id, c.val
   FROM from_to
   JOIN correlation c USING(from_id, to_id)
  WHERE val > 0.5;


The explain analyze for the final select works out to:
Nested Loop (cost=0.00..50692.00 rows=8488 width=16) (actual time=0.171..150.095 rows=2427 loops=1)
-> Seq Scan on from_to (cost=0.00..79.38 rows=5238 width=8) (actual time=0.006..7.660 rows=4950 loops=1)
-> Index Scan using correlation_pkey on correlation c (cost=0.00..9.63 rows=2 width=16) (actual time=0.024..0.025 rows=0 loops=4950)
Index Cond: (("outer".from_id = c.from_id) AND ("outer".to_id = c.to_id))
Filter: (val > 0.5::double precision)
Total runtime: 152.261 ms



Richard Huxton wrote:

Sean Davis wrote:

I answer my own question, if only for my own records. The following query is about 5-6 times faster than the original. Of course, if anyone else has other ideas, I'd be happy to hear them.

Sean

explain analyze select from_id,to_id,val from exprsdb.correlation where from_id in (select to_id from exprsdb.correlation where from_id=2424 order by val desc limit 100) and to_id in (select to_id from exprsdb.correlation where from_id=2424 order by val desc limit 100) and val>0.6 and to_id<from_id;


Might not be any faster, but you can do this as a self-join with subquery:

SELECT c1.from_id, c1.to_id, c1.val
FROM
  correlation c1,
  (
    SELECT to_id FROM correlation WHERE from_id=2424
    ORDER BY val DESC LIMIT 100
  ) AS c2
  (
    SELECT to_id FROM correlation WHERE from_id=2424
    ORDER BY val DESC LIMIT 100
  ) AS c3
WHERE
  c1.from_id = c2.to_id
  AND c1.to_id = c3.to_id
  AND c1.val > 0.5
  AND c1.to_id < from_id
;

I think PG should be smart enough nowadays to figure out these two queries are basically the same.


--
Edmund Bacon <[EMAIL PROTECTED]>


---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly

Reply via email to