[HACKERS] IN vs EXISTS equivalence

Kevin Grittner Mon, 22 Oct 2007 07:35:56 -0700

I've requested this before without response, but I'm asking again
because it just caused me pain again: could we get a TODO added to
have the planner recognize equivalent IN and EXISTS constructs and
have them compete on cost estimates?  I know it's not a trivial
improvement, but if it's on the list maybe someone will pick it up,
and I see it as the single biggest weakness in PostgreSQL
performance.
 
I don't need help resolving this particular case, because the fix is
always blinding obvious when we hit this, and it doesn't even break
portability because no other database we've tested fails to recognize
these equivalent cases.
 
step=# explain DELETE FROM "Body" WHERE "bodySeqNo" NOT IN (SELECT "bodySeqNo" 
FROM "Message");
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Seq Scan on "Body"  (cost=90277.43..285235351699.39 rows=3313379 width=6)
   Filter: (NOT (subplan))
   SubPlan
     ->  Materialize  (cost=90277.43..159793.40 rows=6627957 width=11)
           ->  Seq Scan on "Message"  (cost=0.00..80413.07 rows=6627957 
width=11)
(5 rows)


step=# explain DELETE FROM "Body" WHERE NOT EXISTS (SELECT * FROM "Message" m 
WHERE m."bodySeqNo" = "Body"."bodySeqNo");
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Seq Scan on "Body"  (cost=0.00..3401760.88 rows=3313416 width=6)
   Filter: (NOT (subplan))
   SubPlan
     ->  Index Scan using "Message_Body" on "Message" m  (cost=0.00..0.49 
rows=1 width=136)
           Index Cond: (("bodySeqNo")::numeric = ($0)::numeric)
(5 rows)
 
The bodySeqNo column is NOT NULL in both tables, and is the primary
key in the Body table.  The Message table has a non-unique index on
it.  (\d lists will follow at the bottom.)
 
I cancelled the first query after it had been running for 54 hours
over our slowest hours (the weekend).  The second form ran in four
minutes in competition with peak time queries.
 
-Kevin
 
 
step=# \d "Body"
               Table "public.Body"
   Column    |          Type          | Modifiers
-------------+------------------------+-----------
 bodySeqNo   | "SequenceT"            | not null
 contentType | character varying(255) | not null
 encoding    | character varying(255) |
 body        | "BodyT"                |
Indexes:
    "Body_pkey" PRIMARY KEY, btree ("bodySeqNo")

step=# \d "Message"
                 Table "public.Message"
     Column      |           Type           | Modifiers
-----------------+--------------------------+-----------
 messageId       | "SequenceT"              | not null
 clientMessageId | "ClientMessageIdT"       | not null
 correlationId   | "SequenceT"              |
 destQueue       | "QueueNameT"             | not null
 replyToQueue    | "QueueNameT"             | not null
 typeCode        | character(2)             |
 expiration      | timestamp with time zone |
 priority        | smallint                 | not null
 status          | character(2)             | not null
 created         | timestamp with time zone | not null
 lastModified    | timestamp with time zone | not null
 bodySeqNo       | "SequenceT"              | not null
 messageIdSearch | "PrioritySequenceT"      | not null
Indexes:
    "Message_pkey" PRIMARY KEY, btree ("messageId")
    "MessageIndex2" UNIQUE, btree ("destQueue", "clientMessageId")
    "Message_MessageIdSearch" UNIQUE, btree ("destQueue", status, 
"messageIdSearch") CLUSTER
    "Message_Body" btree ("bodySeqNo")
    "Message_Created" btree ("destQueue", status, created)
    "Message_Created2" btree ("destQueue", created)
    "Message_Expiration" btree (expiration)
    "Message_LastModified" btree ("destQueue", "lastModified")
    "Message_ReplyToQueue" btree ("replyToQueue")
Foreign-key constraints:
    "Message_fk1" FOREIGN KEY ("destQueue") REFERENCES "Queue"(name)
    "Message_fk2" FOREIGN KEY ("replyToQueue") REFERENCES "Queue"(name)


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

[HACKERS] IN vs EXISTS equivalence

Reply via email to