Re: [GENERAL] Help with sql

Rob Sargent Fri, 06 Jul 2012 17:40:55 -0700

On 07/06/2012 03:34 PM, Perry Smith wrote:

Hi Guys,


This isn't a PostgreSQL specific question but just a SQL question.  If this is 
not an appropriate question for this list, please let me know.

It is also, perhaps, a really silly question.

This query (without the 'explain' keyword) , when executed takes forever and a 
day:

condor_development=> explain select id from filesets where id not in ( select 
fileset_id from service_pack_fileset_maps );
                                           QUERY PLAN
----------------------------------------------------------------------------------------------
  Seq Scan on filesets  (cost=0.00..71937742.00 rows=26088 width=4)
    Filter: (NOT (SubPlan 1))
    SubPlan 1
      ->  Materialize  (cost=0.00..2517.78 rows=95852 width=4)
            ->  Seq Scan on service_pack_fileset_maps  (cost=0.00..1663.52 
rows=95852 width=4)
(5 rows)


This query returns within a second:

condor_development=> explain select id from filesets where id not in ( select 
distinct fileset_id from service_pack_fileset_maps );
                                           QUERY PLAN
----------------------------------------------------------------------------------------------
  Seq Scan on filesets  (cost=2102.31..3153.53 rows=26088 width=4)
    Filter: (NOT (hashed SubPlan 1))
    SubPlan 1
      ->  HashAggregate  (cost=1903.15..2062.48 rows=15933 width=4)
            ->  Seq Scan on service_pack_fileset_maps  (cost=0.00..1663.52 
rows=95852 width=4)
(5 rows)


The difference is the "distinct" keyword in the inner select.

What I'm confused about is why isn't the "distinct" implicit?  I thought the construct "blah 
in ( select ... )" was using "sets" and an item (I thought) can not be in a set more than once.

Perhaps my question is the opposite really?  Why would you not always use "distinct" in the inner 
select when the operator is "in" or "not in" ?

And if I can throw in another question on top: is there a different method other than 
"not in" that would work better?

Thank you guys for the help and a really awesome database.
pedz
(this is *suppose* to be sent as plain text... I hope my mailer does what it is 
told)

Well they are distinct records, they just may have the same values. AndI'm not trying to be flippant. We don't see the structure of thosetable: are all the id fields involved primary keys or with unique indexcoverage? Does "not exists ( select fileset.id = fileset_id fromservice_pack_fileset_map) change the behaviour?






--
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Help with sql

Reply via email to