>X-Original-To: [EMAIL PROTECTED]
>X-Authentication-Warning: houston.familyhealth.com.au: chriskl owned process 
doing -bs
>Date: Wed, 4 Aug 2004 21:21:51 +0800 (WST)
>From: Christopher Kings-Lynne <[EMAIL PROTECTED]>
>To: Valerie Schneider DSI/DEV <[EMAIL PROTECTED]>
>Cc: [EMAIL PROTECTED], <[EMAIL PROTECTED]>
>Subject: Re: [PERFORM] Tuning queries on large database
>MIME-Version: 1.0
>X-Virus-Scanned: by amavisd-new at hub.org
>X-Spam-Status: No, hits=0.0 tagged_above=0.0 required=5.0 tests=
>X-Spam-Level: 
>X-Mailing-List: pgsql-performance
>
>>      sort_mem = 50000
>
>That is way, way too large.  Try more like 5000 or lower.
>
>>  num_poste  | numeric(9,0)                | not null
>
>For starters numerics are really, really slow compared to integers.  Why
>aren't you using an integer for this field since youhave '0' decimal
>places.
>
>>  schema | relfilenode |      table       |   index    |  reltuples  |   size
>> 
--------+-------------+------------------+------------+-------------+----------
>>  public |   125615917 | data             |            | 1.25113e+08 | 
72312040
>>  public |   251139049 | data             | i_data_dat | 1.25113e+08 |  
2744400
>>  public |   250870177 | data             | pk_data    | 1.25113e+08 |  
4395480
>>
>> My first remark is that the table takes a lot of place on disk, about
>> 70 Gb, instead of 35 Gb with oracle.
>
>Integers will take a lot less space than numerics.
>
>> The different queries of the bench are "simple" queries (no join,
>> sub-query, ...) and are using indexes (I "explained" each one to
>> be sure) :
>> Q1 select_court : access to about 700 rows  : 1 "num_poste" and 1 month
>>      (using PK : num_poste=p1  and dat between p2 and p3)
>> Q2 select_moy   : access to about 7000 rows : 10 "num_poste" and 1 month
>>      (using PK : num_poste between p1 and p1+10 and dat between p2 and p3)
>> Q3 select_long  : about 250 000 rows        : 2 "num_poste"
>>      (using PK : num_poste in (p1,p1+2))
>> Q4 select_tres_long : about 3 millions rows : 25 "num_poste"
>>      (using PK : num_poste between p1 and p1 + 25)
>>
>> The result is that for "short queries" (Q1 and Q2) it runs in a few
>> seconds on both Oracle and PG. The difference becomes important with
>> Q3 : 8 seconds with oracle
>>      80 sec with PG
>> and too much with Q4 : 28s with oracle
>>                        17m20s with PG !
>>
>> Of course when I run 100 or 1000 parallel queries such as Q3 or Q4,
>> it becomes a disaster !
>
>Please reply with the EXPLAIN ANALYZE output of these queries so we can
>have some idea of how to help you.
>
>Chris
>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster

Q1 :
bench=> explain analyze select 'Q1',min(td),max(u) from data where 
num_poste=1000 and dat between 
(date_trunc('month',to_timestamp('31012004','ddmmyyyy')-interval '2000 
days'))::timestamp and 
(date_trunc('month',to_timestamp('31012004','ddmmyyyy')-interval '2000 days') + 
interval '1 month' - interval '1 hour')::timestamp;
                                                                                 
                                                                                 
                            QUERY PLAN                                                 
                                                                                       
                                                      
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
----------------------------------------------------------------------
 Aggregate  (cost=2501.90..2501.90 rows=1 width=21) (actual 
time=581.460..581.461 rows=1 loops=1)
   ->  Index Scan using pk_data on data  (cost=0.00..2498.80 rows=619 width=21) 
(actual time=92.986..579.089 rows=744 loops=1)
         Index Cond: ((num_poste = 1000::numeric) AND (dat >= 
(date_trunc('month'::text, (to_timestamp('31012004'::text, 'ddmmyyyy'::text) - 
'2000 days'::interval)))::timestamp without time zone) AND (dat <= 
(((date_trunc('month'::text, (to_timestamp('31012004'::text, 'ddmmyyyy'::text) - 
'2000 days'::interval)) + '1 mon'::interval) - '01:00:00'::interval))::timestamp 
without time zone))
 Total runtime: 609.149 ms
(4 rows)


Q2 :
bench=> explain analyze select 'Q2',count(*) from data where num_poste between 
100 and 100+10 and dat between 
(date_trunc('month',to_timestamp('31012004','ddmmyyyy')-interval '3000 
days'))::timestamp and 
(date_trunc('month',to_timestamp('31012004','ddmmyyyy')-interval '3000 days') + 
interval '1 month' - interval '1 hour')::timestamp;
                                                                                 
                                                                                 
                                            QUERY PLAN                                 
                                                                                       
                                                                                      
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
----------------------
 Aggregate  (cost=23232.05..23232.05 rows=1 width=0) (actual 
time=5678.849..5678.850 rows=1 loops=1)
   ->  Index Scan using pk_data on data  (cost=0.00..23217.68 rows=5747 width=0) 
(actual time=44.408..5669.387 rows=7920 loops=1)
         Index Cond: ((num_poste >= 100::numeric) AND (num_poste <= 
110::numeric) AND (dat >= (date_trunc('month'::text, 
(to_timestamp('31012004'::text, 'ddmmyyyy'::text) - '3000 
days'::interval)))::timestamp without time zone) AND (dat <= 
(((date_trunc('month'::text, (to_timestamp('31012004'::text, 'ddmmyyyy'::text) - 
'3000 days'::interval)) + '1 mon'::interval) - '01:00:00'::interval))::timestamp 
without time zone))
 Total runtime: 5679.059 ms
(4 rows)


Q3 :
bench=> explain analyze select 'Q3',sum(rr1),count(ff) from data where num_poste 
in (50,50+2);
                                                                    QUERY PLAN         
                                                           
--------------------------------------------------------------------------------
------------------------------------------------------------------
 Aggregate  (cost=986770.56..986770.56 rows=1 width=17) (actual 
time=75401.030..75401.031 rows=1 loops=1)
   ->  Index Scan using pk_data, pk_data on data  (cost=0.00..985534.43 
rows=247225 width=17) (actual time=35.823..74885.689 rows=250226 loops=1)
         Index Cond: ((num_poste = 50::numeric) OR (num_poste = 52::numeric))
 Total runtime: 75405.666 ms
(4 rows)


Q4 :
bench=> explain analyze select 'Q4',count(*) from data where num_poste between 
600 and 625;
                                                                  QUERY PLAN           
                                                       
--------------------------------------------------------------------------------
--------------------------------------------------------------
 Aggregate  (cost=12166763.62..12166763.62 rows=1 width=0) (actual 
time=1162090.302..1162090.303 rows=1 loops=1)
   ->  Index Scan using pk_data on data  (cost=0.00..12159021.19 rows=3096971 
width=0) (actual time=94.679..1158266.561 rows=3252938 loops=1)
         Index Cond: ((num_poste >= 600::numeric) AND (num_poste <= 
625::numeric))
 Total runtime: 1162102.217 ms
(4 rows)


Now I'm going to recreate my table with integer and real datatype,
and to decrease sort_mem to 5000.
Then I'll try these queries again.
Thanks.



********************************************************************
*    Les points de vue exprimes sont strictement personnels et     *
*      n'engagent pas la responsabilite de METEO-FRANCE.           *
********************************************************************
* Valerie SCHNEIDER             Tel : +33 (0)5 61 07 81 91         *
* METEO-FRANCE / DSI/DEV        Fax : +33 (0)5 61 07 81 09         *
* 42, avenue G. Coriolis        Email : [EMAIL PROTECTED] *
* 31057 TOULOUSE Cedex - FRANCE         http://www.meteo.fr        *
********************************************************************


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Reply via email to