Achilleas Mantzios wrote:
Thanx,

Στις Monday 17 January 2011 18:52:27 ο/η Ing. Marcos Ortiz Valmaseda έγραψε:
Well, on the Release Notes on the PostgreSQL-8.4 Documentation, the developers 
recommend to use NOT EXISTS
instead NOT IN, because the first clause has a better performance. So, you can 
use it on that way.

You mean this?
(from 8.4 changes)
"Create explicit concepts of semi-joins and anti-joins (Tom)
       This work formalizes our previous ad-hoc treatment of IN (SELECT
       ...) clauses, and extends it to EXISTS and NOT EXISTS clauses. It
       should result in significantly better planning of EXISTS and NOT
       EXISTS queries. In general, logically equivalent IN and EXISTS
       clauses should now have similar performance, whereas previously IN
       often won."

I haven't found any other recent reference to this issue. And this is far from 
what you suggest.
Here the entry talks about "similar" performance.

Also a similar issue was hot back in 7.4 days :
"IN / NOT IN subqueries are now much more efficient
          In previous releases, IN/NOT IN subqueries were joined to the
          upper query by sequentially scanning the subquery looking for a
          match. The 7.4 code uses the same sophisticated techniques used
          by ordinary joins and so is much faster. An IN will now usually
          be as fast as or faster than an equivalent EXISTS subquery; this
          reverses the conventional wisdom that applied to previous
          releases."

Other questions?
- Do you have a partial index on marinerstates.marinerid where this condition 
is accomplished?
No, but i just tried it (on state='Active') with no impact.
- Do you have a index on mariner.id?
Yes, It is the primary key.
- Can you provide a explain of these queries on the PostgreSQL-9.0 machines?
Sure, first i'll post the table definitions and then some stats and then the 
epxlain analyze(s)

mariner
=====
 id                           | integer                  | not null default 
nextval(('public.mariner_id_seq'::text)::regclass)
 givenname                    | character varying(200)   |
 midname                      | character varying(100)   |
 surname                      | character varying(200)   | not null
...
Indexes:
    "mariner_pkey" PRIMARY KEY, btree (id)
    "mariner_smauid" UNIQUE, btree (smauid)
    "mariner_username_key" UNIQUE, btree (username)
    "mariner_nationalityid" btree (nationalityid)
    "mariner_parentid" btree (parentid)
    "mariner_surname" btree (surname)

marinerstates
========
 id                     | integer                  | not null default 
nextval(('public.marinerstates_id_seq'::text)::regclass)
 marinerid              | integer                  | not null
 state                  | character varying(20)    | not null
 vslid                  | integer                  |
 leave_period_days      | integer                  |
 comment                | text                     |
 starttime              | timestamp with time zone | not null
 endtime                | timestamp with time zone |
 trid                   | integer                  |
 sal_bw                 | real                     | not null default 0.0
 sal_ot                 | real                     | not null default 0.0
 sal_lp                 | real                     | not null default 0.0
 sal_misc               | real                     | not null default 0.0
 rankid                 | integer                  |
 system_vslid           | integer                  |
 startport              | text                     |
 endport                | text                     |
.....
Indexes:
    "marinerstates_pkey" PRIMARY KEY, btree (id)
    "marinerstates_mariner_cur_state" UNIQUE, btree (marinerid) WHERE endtime 
IS NULL
    "marinerstates_system_vslid" UNIQUE, btree (marinerid, system_vslid)
    "marinerstates__system_vslid" btree (system_vslid)
    "marinerstates_cur_mariners_states" btree (endtime) WHERE endtime IS NULL
    "marinerstates_mariner_past_state" btree (marinerid, starttime, endtime) 
WHERE endtime IS NOT NULL
    "marinerstates_marinerid" btree (marinerid)
    "marinerstates_marinerid_starttime" btree (marinerid, starttime)
    "marinerstates_rankid" btree (rankid)
    "marinerstates_rankid_cur_mariners" btree (rankid) WHERE endtime IS NULL
    "marinerstates_rankid_past_state" btree (rankid, starttime, endtime) WHERE 
endtime IS NOT NULL
    "marinerstates_state" btree (state)
    "marinerstates_state_cur_mariners" btree (state) WHERE endtime IS NULL
    "marinerstates_state_past_state" btree (state, starttime, endtime) WHERE 
endtime IS NOT NULL
    "marinerstates_vslid" btree (vslid)
    "marinerstates_vslid_cur_mariners" btree (vslid) WHERE endtime IS NULL
    "marinerstates_vslid_past_state" btree (vslid, starttime, endtime) WHERE 
endtime IS NOT NULL

vessels
=====
 name               | character varying(200) | not null
 id                 | integer                | not null default 
nextval(('public.vessels_id_seq'::text)::regclass)
...
Indexes:
    "vessels_pkey" PRIMARY KEY, btree (id)
    "vessels_name_key" UNIQUE, btree (name)
    "idx_name" btree (name)
    "vessels_flag" btree (flag)
    "vessels_groupno" btree (groupno)
    "vessels_vslstatus_idx" btree (vslstatus)

dynacom=# SELECT count(*) from mariner;
 count
-------
 14447

dynacom=# SELECT count(*) from marinerstates;
 count
-------
 51013

dynacom=# SELECT avg(marqry.cnt),stddev(marqry.cnt) FROM (SELECT 
m.id,count(ms.id) as cnt from mariner m, marinerstates ms WHERE 
m.id=ms.marinerid group by m.id) AS marqry;
        avg         |       stddev
--------------------+--------------------
 3.5665944207508914 | 4.4416879361829170

(vessels do not play any impact in the query, so i'll leave them out)

Slow plan in 9.0.2 :
                                                                                
                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=11525.09..11571.55 rows=3717 width=23) (actual 
time=10462.561..10462.937 rows=603 loops=1)
   ->  Sort  (cost=11525.09..11534.38 rows=3717 width=23) (actual 
time=10462.560..10462.664 rows=603 loops=1)
         Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), 
(COALESCE(m.midname, ''::character varying)), m.id
         Sort Method:  quicksort  Memory: 71kB
         ->  Hash Join  (cost=8281.98..11304.67 rows=3717 width=23) (actual 
time=10425.261..10461.621 rows=603 loops=1)
               Hash Cond: (ms.marinerid = m.id)
               ->  Hash Join  (cost=20.12..2963.83 rows=3717 width=4) (actual 
time=0.228..34.993 rows=2625 loops=1)
                     Hash Cond: (ms.vslid = vsl.id)
                     ->  Seq Scan on marinerstates ms  (cost=0.00..2889.32 
rows=4590 width=8) (actual time=0.011..33.494 rows=2625 loops=1)
                           Filter: (((state)::text = 'Active'::text) AND 
((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date 
>= '2006-07-15'::date))
                     ->  Hash  (cost=16.72..16.72 rows=272 width=4) (actual 
time=0.207..0.207 rows=272 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 10kB
                           ->  Seq Scan on vessels vsl  (cost=0.00..16.72 
rows=272 width=4) (actual time=0.004..0.118 rows=272 loops=1)
               ->  Hash  (cost=8172.57..8172.57 rows=7143 width=23) (actual 
time=10424.994..10424.994 rows=12832 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 702kB
                     ->  Seq Scan on mariner m  (cost=7614.86..8172.57 
rows=7143 width=23) (actual time=10409.498..10419.971 rows=12832 loops=1)
                           Filter: ((NOT (hashed SubPlan 1)) AND 
((marinertype)::text = 'Mariner'::text))
                           SubPlan 1
                             ->  Unique  (cost=2768.00..7614.86 rows=1 width=4) 
(actual time=87.495..10408.446 rows=1454 loops=1)
                                   ->  Nested Loop  (cost=2768.00..7614.86 
rows=1 width=4) (actual time=87.493..10407.517 rows=1835 loops=1)
                                         Join Filter: (msold.marinerid = 
mold.id)
                                         ->  Index Scan using mariner_pkey on 
mariner mold  (cost=0.00..1728.60 rows=14286 width=4) (actual time=0.007..13.931 
rows=14286 loops=1)
                                               Filter: ((marinertype)::text = 
'Mariner'::text)
                                         ->  Materialize  
(cost=2768.00..5671.97 rows=1 width=8) (actual time=0.003..0.330 rows=1876 
loops=14286)
                                               ->  Nested Loop  
(cost=2768.00..5671.96 rows=1 width=8) (actual time=39.723..85.401 rows=1876 
loops=1)
                                                     ->  Hash Semi Join  
(cost=2768.00..5671.67 rows=1 width=12) (actual time=39.708..81.501 rows=1876 
loops=1)
                                                           Hash Cond: 
(msold.marinerid = msold2.marinerid)
                                                           Join Filter: ((msold2.id 
<> msold.id) AND (msold2.starttime < msold.starttime) AND ((msold.starttime - 
msold2.endtime) <= '1 year 6 mons'::interval))
                                                           ->  Seq Scan on 
marinerstates msold  (cost=0.00..2889.32 rows=4590 width=20) (actual 
time=0.003..33.952 rows=2625 loops=1)
                                                                 Filter: 
(((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) 
AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date))
                                                           ->  Hash  
(cost=2251.66..2251.66 rows=41307 width=24) (actual time=39.613..39.613 rows=41250 
loops=1)
                                                                 Buckets: 8192  
Batches: 1  Memory Usage: 2246kB
                                                                 ->  Seq Scan 
on marinerstates msold2  (cost=0.00..2251.66 rows=41307 width=24) (actual 
time=0.002..24.882 rows=41250 loops=1)
                                                                       Filter: 
((state)::text = 'Active'::text)
                                                     ->  Index Scan using 
vessels_pkey on vessels vslold  (cost=0.00..0.28 rows=1 width=4) (actual 
time=0.001..0.001 rows=1 loops=1876)
                                                           Index Cond: 
(vslold.id = msold.vslid)
 Total runtime: 10463.619 ms
(37 rows)

Fast plan in 8.3.13 :
                                                                                
                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=633677.56..633700.48 rows=1834 width=23) (actual 
time=543.684..551.003 rows=603 loops=1)
   ->  Sort  (cost=633677.56..633682.14 rows=1834 width=23) (actual 
time=543.676..546.070 rows=603 loops=1)
         Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), 
(COALESCE(m.midname, ''::character varying)), m.id
         Sort Method:  quicksort  Memory: 53kB
         ->  Hash Join  (cost=630601.65..633578.15 rows=1834 width=23) (actual 
time=439.969..540.573 rows=603 loops=1)
               Hash Cond: (ms.vslid = vsl.id)
               ->  Hash Join  (cost=630580.33..633530.01 rows=2261 width=27) 
(actual time=437.459..532.847 rows=603 loops=1)
                     Hash Cond: (ms.marinerid = m.id)
                     ->  Seq Scan on marinerstates ms  (cost=0.00..2875.32 
rows=4599 width=8) (actual time=0.017..80.153 rows=2625 loops=1)
                           Filter: (((state)::text = 'Active'::text) AND 
((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date 
>= '2006-07-15'::date))
                     ->  Hash  (cost=630491.54..630491.54 rows=7103 width=23) 
(actual time=437.307..437.307 rows=12832 loops=1)
                           ->  Index Scan using mariner_pkey on mariner m  
(cost=628776.89..630491.54 rows=7103 width=23) (actual time=311.023..380.168 
rows=12832 loops=1)
                                 Filter: ((NOT (hashed subplan)) AND 
((marinertype)::text = 'Mariner'::text))
                                 SubPlan
                                   ->  Unique  (cost=0.00..628772.30 rows=1834 
width=4) (actual time=0.129..303.981 rows=1454 loops=1)
                                         ->  Nested Loop  (cost=0.00..628767.72 
rows=1834 width=4) (actual time=0.120..289.961 rows=1835 loops=1)
                                               ->  Nested Loop  
(cost=0.00..627027.98 rows=1865 width=4) (actual time=0.099..237.128 rows=1876 
loops=1)
                                                     ->  Index Scan using 
marinerstates_marinerid on marinerstates msold  (cost=0.00..626316.07 rows=2299 
width=8) (actual time=0.079..186.150 rows=1876 loops=1)
                                                           Filter: (((state)::text = 
'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND 
((COALESCE(endtime, now()))::date >= '2006-07-15'::date) AND (subplan))
                                                           SubPlan
                                                             ->  Bitmap Heap 
Scan on marinerstates msold2  (cost=4.28..12.11 rows=1 width=0) (actual 
time=0.020..0.020 rows=1 loops=2625)
                                                                   Recheck Cond: 
((marinerid = $0) AND (starttime < $2))
                                                                   Filter: ((id <> 
$1) AND ((state)::text = 'Active'::text) AND (($2 - endtime) <= '1 year 6 
mons'::interval))
                                                                   ->  Bitmap 
Index Scan on marinerstates_marinerid_starttime  (cost=0.00..4.28 rows=2 width=0) 
(actual time=0.009..0.009 rows=6 loops=2625)
                                                                         Index 
Cond: ((marinerid = $0) AND (starttime < $2))
                                                     ->  Index Scan using 
vessels_pkey on vessels vslold  (cost=0.00..0.30 rows=1 width=4) (actual 
time=0.006..0.010 rows=1 loops=1876)
                                                           Index Cond: 
(vslold.id = msold.vslid)
                                               ->  Index Scan using 
mariner_pkey on mariner mold  (cost=0.00..0.92 rows=1 width=4) (actual 
time=0.007..0.012 rows=1 loops=1876)
                                                     Index Cond: (mold.id = 
msold.marinerid)
                                                     Filter: 
((mold.marinertype)::text = 'Mariner'::text)
               ->  Hash  (cost=17.81..17.81 rows=281 width=4) (actual 
time=2.491..2.491 rows=273 loops=1)
                     ->  Seq Scan on vessels vsl  (cost=0.00..17.81 rows=281 
width=4) (actual time=0.012..1.306 rows=273 loops=1)
 Total runtime: 553.601 ms
(33 rows)

Is there any other data i could post (pg_stat,...) that would help?

thanx a lot.

Regards


Ing. Marcos Luís Ortíz Valmaseda
Linux User # 418229 && PostgreSQL DBA
Centro de Tecnologías Gestión de Datos (DATEC)
http://postgresql.uci.cu
http://www.postgresql.org
http://it.toolbox.com/blogs/sql-apprentice




--
Achilleas Mantzios

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Achilleas, here is the slow part from 9.02:

                                                    ->  Hash Semi Join  
(cost=2768.00..5671.67 rows=1 width=12) (actual time=39.708..81.501 rows=1876 
loops=1)
                                                          Hash Cond: 
(msold.marinerid = msold2.marinerid)
                                                          Join Filter: ((msold2.id <> 
msold.id) AND (msold2.starttime < msold.starttime) AND ((msold.starttime - 
msold2.endtime) <= '1 year 6 mons'::interval))
                                                          ->  Seq Scan on 
marinerstates msold  (cost=0.00..2889.32 rows=4590 width=20) (actual 
time=0.003..33.952 rows=2625 loops=1)
                                                                Filter: 
(((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) 
AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date))
                                                          ->  Hash  
(cost=2251.66..2251.66 rows=41307 width=24) (actual time=39.613..39.613 rows=41250 
loops=1)
                                                                Buckets: 8192  
Batches: 1  Memory Usage: 2246kB
-> Seq Scan on marinerstates msold2 (cost=0.00..2251.66 rows=41307 width=24) (actual time=0.002..24.882

The same part from 8.3.3 looks like this:

Seq Scan on marinerstates ms  (cost=0.00..2875.32 rows=4599 width=8) (actual 
time=0.017..80.153 rows=2625 loops=1)
                          Filter: (((state)::text = 'Active'::text) AND 
((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date 
>= '2006-07-15'::date))
                    ->  Hash  (cost=630491.54..630491.54 rows=7103 width=23) 
(actual time=437.307..437.307 rows=12832 loops=1)
                          ->  Index Scan using mariner_pkey on mariner m  
(cost=628776.89..630491.54 rows=7103 width=23) (actual time=311.023..380.168 
rows=12832 loops=1)
                                Filter: ((NOT (hashed subplan)) AND 
((marinertype)::text = 'Mariner'::text))
                                SubPlan
                                  ->  Unique  (cost=0.00..628772.30 rows=1834 
width=4) (actual time=0.129..303.981 rows=1454 loops=1)
                                        ->  Nested Loop  (cost=0.00..628767.72 
rows=1834 width=4) (actual time=0.120..289.961 rows=1835 loops=1)


This leads me to the conclusion that the queries differ significantly. 8.3.3 mentions NOT hashed plan, I don't see it in 9.02 and the filtering conditions look differently. Are you sure that the plans are from the same query?

--
Mladen Gogala Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to