On 07/06/2012 03:34 PM, Perry Smith wrote:
Hi Guys,

This isn't a PostgreSQL specific question but just a SQL question.  If this is 
not an appropriate question for this list, please let me know.

It is also, perhaps, a really silly question.

This query (without the 'explain' keyword) , when executed takes forever and a 
day:

condor_development=> explain select id from filesets where id not in ( select 
fileset_id from service_pack_fileset_maps );
                                           QUERY PLAN
----------------------------------------------------------------------------------------------
  Seq Scan on filesets  (cost=0.00..71937742.00 rows=26088 width=4)
    Filter: (NOT (SubPlan 1))
    SubPlan 1
      ->  Materialize  (cost=0.00..2517.78 rows=95852 width=4)
            ->  Seq Scan on service_pack_fileset_maps  (cost=0.00..1663.52 
rows=95852 width=4)
(5 rows)

This query returns within a second:

condor_development=> explain select id from filesets where id not in ( select 
distinct fileset_id from service_pack_fileset_maps );
                                           QUERY PLAN
----------------------------------------------------------------------------------------------
  Seq Scan on filesets  (cost=2102.31..3153.53 rows=26088 width=4)
    Filter: (NOT (hashed SubPlan 1))
    SubPlan 1
      ->  HashAggregate  (cost=1903.15..2062.48 rows=15933 width=4)
            ->  Seq Scan on service_pack_fileset_maps  (cost=0.00..1663.52 
rows=95852 width=4)
(5 rows)

The difference is the "distinct" keyword in the inner select.

What I'm confused about is why isn't the "distinct" implicit?  I thought the construct "blah 
in ( select ... )" was using "sets" and an item (I thought) can not be in a set more than once.

Perhaps my question is the opposite really?  Why would you not always use "distinct" in the inner 
select when the operator is "in" or "not in" ?

And if I can throw in another question on top: is there a different method other than 
"not in" that would work better?

Thank you guys for the help and a really awesome database.
pedz
(this is *suppose* to be sent as plain text... I hope my mailer does what it is 
told)




Well they are distinct records, they just may have the same values. And I'm not trying to be flippant. We don't see the structure of those table: are all the id fields involved primary keys or with unique index coverage? Does "not exists ( select fileset.id = fileset_id from service_pack_fileset_map) change the behaviour?





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to