Re: [PERFORM] IDE x SAS RAID 0 on HP DL 380 G5 P400i controller performance problem

2010-07-20 Thread Scott Carey
If you are using ext3, your performance on the RAID card may also improve if 
the postgres xlog is not on the same partition as the data.  Otherwise, for 
every transaction commit, all of the data on the whole partition will have to 
be sync()'d not just the xlog.

Also, what is the performance difference between all the inserts in one script 
if you do:

* all your statements in the script
vs.
* first line is "BEGIN;"  then all your statements, then "COMMIT;" at the end?

If these two are about the same on your old IDE drive, then your I/O stack 
(file system + OS + hardware) is lying to you about fsync().  The latter should 
be a lot faster on your RAID card if write-back caching is not on.



On Jul 19, 2010, at 12:53 PM, Daniel Ferreira de Lima wrote:



That said, a look into the write-caching+BBU policy on your controller is 
worthwhile.  If you're running this application successfully on some hardware 
but not others, that could be a source for the difference.

I think it's really a BBU/BBWC problem.
The tests that we made in the lab with HP Blade G5 (G6 doesn't support kernel 
version 2.4) turning the battery off show us the 'same' performance of the 
"pizza-box" HP DL 380 G5..  an old joke.  40 secs to 2000 insertions: like a 
chariot.

We're finding the cache expansion and batteries (and.. why it's not default???).


I think that this thread is now finished.

Thanks!
--
Daniel Ferreira



Re: [PERFORM] potential performance gain by query planner optimization

2010-07-20 Thread Pavel Stehule
Hello

2010/7/20 Kneringer, Armin :
> Hi there.
>
> I think I found a potential performance gain if the query planner would be 
> optimized. All Tests has been performed with 8.4.1 (and earlier versions) on 
> CentOS 5.3 (x64)
>
> The following query will run on my database (~250 GB) for ca. 1600 seconds 
> and the sort will result in a disk merge deploying ca. 200 GB of data to the 
> local disk (ca. 180.000 tmp-files)

can you try show check explain with set enable_hashjoin to off;   ?

Regards

Pavel Stehule

>
> explain SELECT DISTINCT t4.objid
> FROM fscsubfile t4, cooobject t6
>  NOT EXISTS (
>  WHERE t6.objid = t4.objid AND
>  t4.fileresporgid = 573936067464397682 AND
>   NOT EXISTS (
>   SELECT 1
>   FROM ataggval q1_1,
>   atdateval t5
>   WHERE q1_1.objid = t4.objid AND
>   q1_1.attrid = 281479288456451 AND
>   q1_1.aggrid = 0 AND
>   t5.aggrid = q1_1.aggval AND
>   t5.objid = t4.objid AND
>   t5.attrid = 281479288456447 ) AND
>  ((t6.objclassid IN (285774255832590,285774255764301))) AND
>  ((t4.objid > 573936097512390656 and t4.objid < 573936101807357952))
>  ORDER BY t4.objid;
>
>                                                                               
>    QUERY PLAN
> --
> Unique  (cost=2592137103.99..2592137104.00 rows=1 width=8)
>   ->  Sort  (cost=2592137103.99..2592137104.00 rows=1 width=8)
>         Sort Key: t4.objid
>         ->  Nested Loop  (cost=1105592553.38..2592137103.98 rows=1 width=8)
>               ->  Hash Anti Join  (cost=1105592553.38..2592137095.75 rows=1 
> width=8)
>                     Hash Cond: ((t4.objid = q1_1.objid) AND (t4.objid = 
> t5.objid))
>                     ->  Bitmap Heap Scan on fscsubfile t4  
> (cost=154.42...14136.40 rows=5486 width=8)
>                           Recheck Cond: ((fileresporgid = 
> 573936067464397682::bigint) AND (objid > 573936097512390656::bigint) AND 
> (objid < 573936101807357952::bigint))
>                           ->  Bitmap Index Scan on ind_fscsubfile_filerespons 
>  (cost=0.00..153.05 rows=5486 width=0)
>                                 Index Cond: ((fileresporgid = 
> 573936067464397682::bigint) AND (objid > 573936097512390656::bigint) AND 
> (objid < 573936101807357952::bigint))
>                     ->  Hash  (cost=11917516.57..11917516.57 rows=55006045159 
> width=16)
>                           ->  Nested Loop  (cost=0.00..11917516.57 
> rows=55006045159 width=16)
>                                 ->  Seq Scan on atdateval t5  
> (cost=0.00...294152.40 rows=1859934 width=12)
>                                       Filter: (attrid = 
> 281479288456447::bigint)
>                                 ->  Index Scan using ind_ataggval on ataggval 
> q1_1  (cost=0.00..6.20 rows=4 width=12)
>                                       Index Cond: ((q1_1.attrid = 
> 281479288456451::bigint) AND (q1_1.aggval = t5.aggrid))
>                                       Filter: (q1_1.aggrid = 0)
>               ->  Index Scan using cooobjectix on cooobject t6  
> (cost=0.00..8.22 rows=1 width=8)
>                     Index Cond: (t6.objid = t4.objid)
>                     Filter: (t6.objclassid = ANY 
> ('{285774255832590,285774255764301}'::bigint[]))
> (20 rows)
>
>
> As the disks pace is limited on my test system I can't provide the "explain 
> analyze" output
> If I change the query as follows the query takes only 12 seconds and only 
> needs 2 tmp files for sorting.
> (Changed lines are marked with [!] as I don't know HTML-Mails will be 
> delivered without conversion
>
> explain SELECT DISTINCT t4.objid
> FROM fscsubfile t4, cooobject t6
> WHERE t6.objid = t4.objid AND
> t4.fileresporgid = 573936067464397682 AND
>   NOT EXISTS (
>   SELECT 1
>   FROM ataggval q1_1,
>   atdateval t5
>   WHERE q1_1.objid = t4.objid AND
>   q1_1.attrid = 281479288456451 AND
>   q1_1.aggrid = 0 AND
>   t5.aggrid = q1_1.aggval AND
>   t5.objid = q1_1.objid AND                 [!]
>   t5.attrid = 281479288456447 ) AND
>   ((t6.objclassid IN (285774255832590,285774255764301))) AND
>   ((t4.objid > 573936097512390656 and t4.objid < 573936101807357952))
>  ORDER BY t4.objid;
>                                                                            
> QUERY PLAN
> --
> Unique  (cost=918320.29..971968.88 rows=1 width=8)
>   ->  Nested Loop  (cost=918320.29..971968.88 rows=1 width=8)
>         ->  Merge Anti Join  (cost=918320.29..971960.65 rows=1 width=8)
>               Merge Cond: (t4.objid = q1_1.objid)
>               ->  Index Scan using ind_fscsubfile_filerespons on fscsubfile 
> t4  (cost=0.00..19016.05 rows=5486 width=8)
>                     Index Cond: ((fileresporgid = 573936067464397682::bigint) 
> AND (objid > 57

[PERFORM] potential performance gain by query planner optimization

2010-07-20 Thread Kneringer, Armin
Hi there.

I think I found a potential performance gain if the query planner would be 
optimized. All Tests has been performed with 8.4.1 (and earlier versions) on 
CentOS 5.3 (x64)

The following query will run on my database (~250 GB) for ca. 1600 seconds and 
the sort will result in a disk merge deploying ca. 200 GB of data to the local 
disk (ca. 180.000 tmp-files)

explain SELECT DISTINCT t4.objid
FROM fscsubfile t4, cooobject t6
 NOT EXISTS (
 WHERE t6.objid = t4.objid AND
 t4.fileresporgid = 573936067464397682 AND
   NOT EXISTS (
   SELECT 1
   FROM ataggval q1_1,
   atdateval t5
   WHERE q1_1.objid = t4.objid AND
   q1_1.attrid = 281479288456451 AND
   q1_1.aggrid = 0 AND
   t5.aggrid = q1_1.aggval AND
   t5.objid = t4.objid AND
   t5.attrid = 281479288456447 ) AND
 ((t6.objclassid IN (285774255832590,285774255764301))) AND
 ((t4.objid > 573936097512390656 and t4.objid < 573936101807357952))
 ORDER BY t4.objid;


  QUERY PLAN
--
Unique  (cost=2592137103.99..2592137104.00 rows=1 width=8)
   ->  Sort  (cost=2592137103.99..2592137104.00 rows=1 width=8)
 Sort Key: t4.objid
 ->  Nested Loop  (cost=1105592553.38..2592137103.98 rows=1 width=8)
   ->  Hash Anti Join  (cost=1105592553.38..2592137095.75 rows=1 
width=8)
 Hash Cond: ((t4.objid = q1_1.objid) AND (t4.objid = 
t5.objid))
 ->  Bitmap Heap Scan on fscsubfile t4  
(cost=154.42...14136.40 rows=5486 width=8)
   Recheck Cond: ((fileresporgid = 
573936067464397682::bigint) AND (objid > 573936097512390656::bigint) AND (objid 
< 573936101807357952::bigint))
   ->  Bitmap Index Scan on ind_fscsubfile_filerespons  
(cost=0.00..153.05 rows=5486 width=0)
 Index Cond: ((fileresporgid = 
573936067464397682::bigint) AND (objid > 573936097512390656::bigint) AND (objid 
< 573936101807357952::bigint))
 ->  Hash  (cost=11917516.57..11917516.57 rows=55006045159 
width=16)
   ->  Nested Loop  (cost=0.00..11917516.57 
rows=55006045159 width=16)
 ->  Seq Scan on atdateval t5  
(cost=0.00...294152.40 rows=1859934 width=12)
   Filter: (attrid = 
281479288456447::bigint)
 ->  Index Scan using ind_ataggval on ataggval 
q1_1  (cost=0.00..6.20 rows=4 width=12)
   Index Cond: ((q1_1.attrid = 
281479288456451::bigint) AND (q1_1.aggval = t5.aggrid))
   Filter: (q1_1.aggrid = 0)
   ->  Index Scan using cooobjectix on cooobject t6  
(cost=0.00..8.22 rows=1 width=8)
 Index Cond: (t6.objid = t4.objid)
 Filter: (t6.objclassid = ANY 
('{285774255832590,285774255764301}'::bigint[]))
(20 rows)


As the disks pace is limited on my test system I can't provide the "explain 
analyze" output
If I change the query as follows the query takes only 12 seconds and only needs 
2 tmp files for sorting.
(Changed lines are marked with [!] as I don't know HTML-Mails will be 
delivered without conversion

explain SELECT DISTINCT t4.objid
FROM fscsubfile t4, cooobject t6
WHERE t6.objid = t4.objid AND
t4.fileresporgid = 573936067464397682 AND
   NOT EXISTS (
   SELECT 1
   FROM ataggval q1_1,
   atdateval t5
   WHERE q1_1.objid = t4.objid AND
   q1_1.attrid = 281479288456451 AND
   q1_1.aggrid = 0 AND
   t5.aggrid = q1_1.aggval AND
   t5.objid = q1_1.objid AND [!]
   t5.attrid = 281479288456447 ) AND
   ((t6.objclassid IN (285774255832590,285774255764301))) AND
   ((t4.objid > 573936097512390656 and t4.objid < 573936101807357952))
 ORDER BY t4.objid;

QUERY PLAN
--
Unique  (cost=918320.29..971968.88 rows=1 width=8)
   ->  Nested Loop  (cost=918320.29..971968.88 rows=1 width=8)
 ->  Merge Anti Join  (cost=918320.29..971960.65 rows=1 width=8)
   Merge Cond: (t4.objid = q1_1.objid)
   ->  Index Scan using ind_fscsubfile_filerespons on fscsubfile t4 
 (cost=0.00..19016.05 rows=5486 width=8)
 Index Cond: ((fileresporgid = 573936067464397682::bigint) 
AND (objid > 573936097512390656::bigint) AND (objid < 
573936101807357952::bigint))
   ->  Materialize  (cost=912418.42..956599.36 rows=22689 width=8)
 ->  Merge Join  (cost=912418.42..956372.47 rows=22689 
width=8)
   Merge Cond: ((t5.objid = q1

Re: [PERFORM] dates and partitioning

2010-07-20 Thread Jochen Erwied
Tuesday, July 20, 2010, 5:36:07 PM you wrote:


> 2) the above query creates a plan that does a sequential scan & filter on 
> every partition. Why won't it only hit the correct partitions? Is it due to
> the way the date was specified? or maybe the "at time zone" syntax?

Quick guess: How is your 'constraint_exclusion'-setting? 
Which version of postgres?

-- 
Jochen Erwied |   home: joc...@erwied.eu +49-208-38800-18, FAX: -19
Sauerbruchstr. 17 |   work: j...@mbs-software.de  +49-2151-7294-24, FAX: -50
D-45470 Muelheim  | mobile: jochen.erw...@vodafone.de   +49-173-5404164


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] dates and partitioning

2010-07-20 Thread Brad Nicholson
On Tue, 2010-07-20 at 09:36 -0600, Kevin Kempter wrote:
> Hi All;
> 
> we have a table partitioned by day, the check constraint on the child tables 
> looks like this (this is the may 31st partition):
> 
> CHECK 
> (stime >= '2010-05-30 00:00:00+00'::timestamp with time zone 
>   AND stime <= '2010-05-30 23:59:59+00'::timestamp with time zone)
> 
> 
> We have a python based app that creates code like this:
> 
> select
>  *
> from
> table_a a, 
> table_b b,
> table_d d
> where a.id = b.id
> and b.id = d.id
> and stime >= timestamp %s at time zone \'UTC\'
> and stime < timestamp %s at time zone \'UTC\'
> and stime >= timestamp %s at time zone d.name
> and stime < timestamp %s at time zone d.name
>...
> 
> 
> so here's my questions:
> 
> 1) the above app generated query pshows up like this in pg_stat_activity:
> 
> and stime >= timestamp E'2010-07-17' at time zone 'UTC'   
> and stime < timestamp E'2010-07-21' at time zone 'UTC' 
> and stime >= timestamp E'2010-07-18' at time zone d.name  
> and stime < timestamp E'2010-07-19' at time zone d.name 
> 
> what's the E'date' from? and why does it show up this way?

That's E is an escape character.  Python is likely putting that in.

See http://www.postgresql.org/docs/8.4/static/sql-syntax-lexical.html -
section 4.1.2.2

> 
> 2) the above query creates a plan that does a sequential scan & filter on 
> every partition. Why won't it only hit the correct partitions? Is it due to 
> the way the date was specified? or maybe the "at time zone" syntax?

Do you have constraint_exclusion turned on?

-- 
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] dates and partitioning

2010-07-20 Thread Kevin Kempter
Hi All;

we have a table partitioned by day, the check constraint on the child tables 
looks like this (this is the may 31st partition):

CHECK 
(stime >= '2010-05-30 00:00:00+00'::timestamp with time zone 
  AND stime <= '2010-05-30 23:59:59+00'::timestamp with time zone)


We have a python based app that creates code like this:

select
 *
from
table_a a, 
table_b b,
table_d d
where a.id = b.id
and b.id = d.id
and stime >= timestamp %s at time zone \'UTC\'
and stime < timestamp %s at time zone \'UTC\'
and stime >= timestamp %s at time zone d.name
and stime < timestamp %s at time zone d.name
   ...


so here's my questions:

1) the above app generated query pshows up like this in pg_stat_activity:

and stime >= timestamp E'2010-07-17' at time zone 'UTC'   
and stime < timestamp E'2010-07-21' at time zone 'UTC' 
and stime >= timestamp E'2010-07-18' at time zone d.name  
and stime < timestamp E'2010-07-19' at time zone d.name 

what's the E'date' from? and why does it show up this way?


2) the above query creates a plan that does a sequential scan & filter on 
every partition. Why won't it only hit the correct partitions? Is it due to 
the way the date was specified? or maybe the "at time zone" syntax?


Thanks in advance...

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance