Re: [PERFORM] analyzing intermediate query

2008-12-02 Thread Andrus
Generally, if you know your temptbl will always contain a few rows (say, 
generally a few and never more than a few thousand) it is better to use 
something like this:


- get list of items
- SELECT * FROM table WHERE id IN (...)


My list can contain 1 .. 100000 records and the table contains 3000000 records 
and is growing.


As discussed here some time ago, IN (...) forces a seq scan over 3000000 rows, 
and maybe a stack overflow exception also occurs (the stack overflow occurs in 
8.0; maybe it is fixed in 8.1).


Using a temp table + ANALYZE enables a bitmap index scan for this query and is 
thus a lot faster.


I formerly used IN (...) but changed this to use a temp table + a primary key on 
the temp table + ANALYZE on the temp table.


Using 8.1.4

I could also switch to a temp table here if it helps.
This requires some special logic to generate the temp table name, since there may 
be a number of such tables in a single transaction, so it would be a major 
application rewrite.


Andrus. 





[PERFORM] analyzing intermediate query

2008-12-02 Thread Andrus

I noticed that the query

SELECT dok.*
FROM dok
JOIN  (SELECT DISTINCT dokumnr FROM  temptbl ) x USING(dokumnr);

is slow in 8.1.4.
I cannot use EXPLAIN ANALYZE since this query uses results from the temporary 
table temptbl, which is not available.


Sometimes the inner table returns only 1 row, so maybe a seq scan is selected 
instead of single-row index access because the expected count is 1000.


As I understand, PostgreSQL requires manually running ANALYZE on temporary 
tables if their row count differs from 1000.


How can I force PostgreSQL to analyze the inner table in this query, or otherwise 
get a query plan that uses an index when the inner query returns a single row?







Re: [PERFORM] analyzing intermediate query

2008-12-02 Thread Andrus
My list can contain 1 .. 100000 records and the table contains 3000000 
records and is growing.


Ah. No IN(), then ;)
Temp table + ANALYZE seems your only option...


In 8.3 or 8.4 I think that IN() and a temp table produce exactly the same 
result.


Andrus. 





[PERFORM] Seq scan over 3.3 millions of rows instead of using date and pattern indexes

2008-11-30 Thread Andrus

explain analyze SELECT sum(1)
  FROM dok JOIN rid USING (dokumnr)
  WHERE dok.kuupaev>='2008-05-01'
and
( (
 dok.doktyyp IN
('V','G','Y','K','I','T','D','N','H','M','E','B','A','R','C','F','J','Q')
 AND CASE WHEN NOT dok.objrealt OR dok.doktyyp='I' THEN dok.yksus
ELSE rid.kuluobjekt END LIKE 'AEGVIIDU%'
 )
OR
( dok.doktyyp IN ('O','S','I','U','D','P')
 AND CASE WHEN dok.objrealt THEN rid.kuluobjekt ELSE dok.sihtyksus
END LIKE 'AEGVIIDU%'
 )
  )

Aggregate  (cost=369240.67..369240.68 rows=1 width=0) (actual time=41135.557..41135.560 rows=1 loops=1)
  ->  Hash Join  (cost=96614.24..369229.39 rows=4508 width=0) (actual time=5859.704..40912.979 rows=59390 loops=1)
        Hash Cond: (outer.dokumnr = inner.dokumnr)
        Join Filter: ((((inner.doktyyp = 'V'::bpchar) OR (inner.doktyyp = 'G'::bpchar) OR (inner.doktyyp = 'Y'::bpchar) OR (inner.doktyyp = 'K'::bpchar) OR (inner.doktyyp = 'I'::bpchar) OR (inner.doktyyp = 'T'::bpchar) OR (inner.doktyyp = 'D'::bpchar) OR (inner.doktyyp = 'N'::bpchar) OR (inner.doktyyp = 'H'::bpchar) OR (inner.doktyyp = 'M'::bpchar) OR (inner.doktyyp = 'E'::bpchar) OR (inner.doktyyp = 'B'::bpchar) OR (inner.doktyyp = 'A'::bpchar) OR (inner.doktyyp = 'R'::bpchar) OR (inner.doktyyp = 'C'::bpchar) OR (inner.doktyyp = 'F'::bpchar) OR (inner.doktyyp = 'J'::bpchar) OR (inner.doktyyp = 'Q'::bpchar)) AND (CASE WHEN ((NOT (inner.objrealt)::boolean) OR (inner.doktyyp = 'I'::bpchar)) THEN inner.yksus ELSE outer.kuluobjekt END ~~ 'AEGVIIDU%'::text)) OR (((inner.doktyyp = 'O'::bpchar) OR (inner.doktyyp = 'S'::bpchar) OR (inner.doktyyp = 'I'::bpchar) OR (inner.doktyyp = 'U'::bpchar) OR (inner.doktyyp = 'D'::bpchar) OR (inner.doktyyp = 'P'::bpchar)) AND (CASE WHEN (inner.objrealt)::boolean THEN outer.kuluobjekt ELSE inner.sihtyksus END ~~ 'AEGVIIDU%'::text)))
        ->  Seq Scan on rid  (cost=0.00..129911.53 rows=3299853 width=18) (actual time=0.039..17277.888 rows=3299777 loops=1)
        ->  Hash  (cost=92983.97..92983.97 rows=336110 width=38) (actual time=3965.478..3965.478 rows=337455 loops=1)
              ->  Bitmap Heap Scan on dok  (cost=1993.66..92983.97 rows=336110 width=38) (actual time=135.810..2389.703 rows=337455 loops=1)
                    Recheck Cond: (kuupaev >= '2008-05-01'::date)
                    Filter: ((doktyyp = 'V'::bpchar) OR (doktyyp = 'G'::bpchar) OR (doktyyp = 'Y'::bpchar) OR (doktyyp = 'K'::bpchar) OR (doktyyp = 'I'::bpchar) OR (doktyyp = 'T'::bpchar) OR (doktyyp = 'D'::bpchar) OR (doktyyp = 'N'::bpchar) OR (doktyyp = 'H'::bpchar) OR (doktyyp = 'M'::bpchar) OR (doktyyp = 'E'::bpchar) OR (doktyyp = 'B'::bpchar) OR (doktyyp = 'A'::bpchar) OR (doktyyp = 'R'::bpchar) OR (doktyyp = 'C'::bpchar) OR (doktyyp = 'F'::bpchar) OR (doktyyp = 'J'::bpchar) OR (doktyyp = 'Q'::bpchar) OR (doktyyp = 'O'::bpchar) OR (doktyyp = 'S'::bpchar) OR (doktyyp = 'I'::bpchar) OR (doktyyp = 'U'::bpchar) OR (doktyyp = 'D'::bpchar) OR (doktyyp = 'P'::bpchar))
                    ->  Bitmap Index Scan on dok_kuupaev_idx  (cost=0.00..1993.66 rows=347618 width=0) (actual time=97.881..97.881 rows=337770 loops=1)
                          Index Cond: (kuupaev >= '2008-05-01'::date)
Total runtime: 41136.348 ms

8.1.4. The db is analyzed, default_statistics_target is 40.
PostgreSQL still chooses a seq scan over rid.

This query could be optimized as follows:

1. The kuupaev >= '2008-05-01' index can reduce the number of scanned rows 10 times
(to 330000).
2. AEGVIIDU% can reduce the number of rows 6 times (to 60000).

How to force pg to use indexes for those conditions ?

This query can be executed against different shop groups (in this case
there is a value other than AEGVIIDU) and for different dates.

There are 6 different shop groups containing roughly the same number of records
each.
So using an index on AEGVIIDU% can decrease the number of scanned rows 6 times.
Usually 90% of dok records contain 'Y' in dok.doktyyp column and
dok.objrealt is false for those records.

Is it possible to use some functional index or other method to speed it up?

rid.kuluobjekt, dok.yksus and dok.sihtyksus types are char(10).

There are indexes

dok(yksus bpchar_pattern_ops)
dok(sihtyksus bpchar_pattern_ops)

Is it possible to re-write the query so that it uses those indexes, or to create some
other indexes?
Using 8.1.4, us-en locale, utf-8 db encoding.
The select column list has been removed from the sample.

Andrus. 





Re: [PERFORM] Increasing GROUP BY CHAR columns speed

2008-11-29 Thread Andrus
I'm still not sure why the planner chose to sort rather than hash with 
oversized work_mem (is there an implied order in the query results I 
missed?).


The GROUP BY contains the decimal column exchrate. Maybe pg is not capable 
of using hash with the numeric datatype.


  My guess is that this query can still get much faster if a hash is 
possible on the last part.  It looks like the gain so far has more to do 
with sorting purely in memory which reduced the number of compares 
required.  But that is just a guess.


I fixed this by adding a cast to ::float:

bilkaib.exchrate::float

In this case the query is much faster.
Hopefully this will not affect the result, since numeric(13,8) can be cast to 
float without data loss.


Andrus. 





Re: [PERFORM] Increasing pattern index query speed

2008-11-28 Thread Andrus
 WHEN NOT dok.objrealt OR dok.doktyyp='I' THEN dok.yksus
ELSE rid.kuluobjekt END LIKE 'RIISIPERE%'
 )
OR
( dok.doktyyp IN ('O','S','I','U','D','P')
 AND CASE WHEN dok.objrealt THEN rid.kuluobjekt ELSE dok.sihtyksus
END LIKE 'RIISIPERE%'
 )

or is it possible to re-write this condition so that it uses the existing
pattern indexes

CREATE INDEX dok_sihtyksus_pattern_idx  ON firma2.dok  (sihtyksus
bpchar_pattern_ops);
CREATE INDEX dok_yksus_pattern_idx ON firma2.dok  (yksus
bpchar_pattern_ops);

without changing the table structure?

Andrus. 





[PERFORM] Increasing GROUP BY CHAR columns speed

2008-11-28 Thread Andrus

GROUP BY using CHAR columns takes an abnormally long time.

How to speed it up?

Andrus.

8.1.4, cluster locale is en-us, db encoding is utf-8

set search_path to firma2,public;
explain analyze SELECT
CASE WHEN bilkaib.raha='EEK' THEN 0 ELSE bilkaib.id END,
bilkaib.DB,
CASE WHEN dbkonto.objekt1='+' THEN bilkaib.DBOBJEKT ELSE null END:: 
CHAR(10) AS dbobjekt,
CASE WHEN dbkonto.objekt2='+' THEN bilkaib.DB2OBJEKT ELSE null END:: 
CHAR(10) AS db2objekt,
CASE WHEN dbkonto.objekt3='+' THEN bilkaib.DB3OBJEKT ELSE null END:: 
CHAR(10) AS db3objekt,
CASE WHEN dbkonto.objekt4='+' THEN bilkaib.DB4OBJEKT ELSE null END:: 
CHAR(10) AS db4objekt,
CASE WHEN dbkonto.objekt5='+' THEN bilkaib.DB5OBJEKT ELSE null END:: 
CHAR(10) AS db5objekt,
CASE WHEN dbkonto.objekt6='+' THEN bilkaib.DB6OBJEKT ELSE null END:: 
CHAR(10) AS db6objekt,
CASE WHEN dbkonto.objekt7='+' THEN bilkaib.DB7OBJEKT ELSE null END:: 
CHAR(10) AS db7objekt,
CASE WHEN dbkonto.objekt8='+' THEN bilkaib.DB8OBJEKT ELSE null END:: 
CHAR(10) AS db8objekt,
CASE WHEN dbkonto.objekt9='+' THEN bilkaib.DB9OBJEKT ELSE null END:: 
CHAR(10) AS db9objekt,

bilkaib.CR,
CASE WHEN crkonto.objekt1='+' THEN bilkaib.crOBJEKT ELSE null END:: 
CHAR(10) AS crobjekt,
CASE WHEN crkonto.objekt2='+' THEN bilkaib.cr2OBJEKT ELSE null END:: 
CHAR(10) AS cr2objekt,
CASE WHEN crkonto.objekt3='+' THEN bilkaib.cr3OBJEKT ELSE null END:: 
CHAR(10) AS cr3objekt,
CASE WHEN crkonto.objekt4='+' THEN bilkaib.cr4OBJEKT ELSE null END:: 
CHAR(10) AS cr4objekt,
CASE WHEN crkonto.objekt5='+' THEN bilkaib.cr5OBJEKT ELSE null END:: 
CHAR(10) AS cr5objekt,
CASE WHEN crkonto.objekt6='+' THEN bilkaib.cr6OBJEKT ELSE null END:: 
CHAR(10) AS cr6objekt,
CASE WHEN crkonto.objekt7='+' THEN bilkaib.cr7OBJEKT ELSE null END:: 
CHAR(10) AS cr7objekt,
CASE WHEN crkonto.objekt8='+' THEN bilkaib.cr8OBJEKT ELSE null END:: 
CHAR(10) AS cr8objekt,
CASE WHEN crkonto.objekt9='+' THEN bilkaib.cr9OBJEKT ELSE null END:: 
CHAR(10) AS cr9objekt,

bilkaib.RAHA,
CASE WHEN crkonto.klienkaupa OR dbkonto.klienkaupa OR crkonto.tyyp IN 
('K','I') OR dbkonto.tyyp IN ('K','I')

THEN  bilkaib.KLIENT ELSE NULL END AS klient,

bilkaib.EXCHRATE,

CASE WHEN crkonto.klienkaupa OR dbkonto.klienkaupa
  OR crkonto.tyyp IN ('K','I') OR dbkonto.tyyp IN ('K','I')
THEN
  klient.nimi ELSE NULL END AS kliendinim,  -- 24.

CAST(CASE WHEN crkonto.arvekaupa OR dbkonto.arvekaupa
  OR (bilkaib.cr<>'00' AND crkonto.tyyp='K')
  OR (bilkaib.db<>'00' AND dbkonto.tyyp='K')
THEN bilkaib.doknr ELSE NULL END AS CHAR(25)) AS doknr

,bilkaib.ratediffer
,CASE WHEN bilkaib.raha='EEK' THEN DATE'20070101' ELSE bilkaib.kuupaev END 
AS kuupaev


,SUM(bilkaib.summa)::numeric(14,2) AS summa
  from BILKAIB join KONTO CRKONTO ON bilkaib.cr=crkonto.kontonr AND
  crkonto.iseloom='A'
join KONTO DBKONTO ON bilkaib.db=dbkonto.kontonr AND
  dbkonto.iseloom='A'
left join klient on bilkaib.klient=klient.kood
  where  ( bilkaib.cr LIKE '112'||'%' OR bilkaib.db LIKE '112'||'%' ) AND 
bilkaib.kuupaev BETWEEN '2007-01-01' AND '2008-11-26'
 GROUP BY 
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28


GroupAggregate  (cost=52316.23..61434.48 rows=41923 width=838) (actual 
time=10771.337..11372.135 rows=577 loops=1)
  ->  Sort  (cost=52316.23..52421.03 rows=41923 width=838) (actual 
time=10770.529..11012.651 rows=52156 loops=1)
Sort Key: CASE WHEN (bilkaib.raha = 'EEK'::bpchar) THEN 0 ELSE 
bilkaib.id END, bilkaib.db, (CASE WHEN (dbkonto.objekt1 = '+'::bpchar) THEN 
bilkaib.dbobjekt ELSE NULL::bpchar END)::character(10), (CASE WHEN 
(dbkonto.objekt2 = '+'::bpchar) THEN bilkaib.db2objekt ELSE NULL::bpchar 
END)::character(10), (CASE WHEN (dbkonto.objekt3 = '+'::bpchar) THEN 
bilkaib.db3objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN 
(dbkonto.objekt4 = '+'::bpchar) THEN bilkaib.db4objekt ELSE NULL::bpchar 
END)::character(10), (CASE WHEN (dbkonto.objekt5 = '+'::bpchar) THEN 
bilkaib.db5objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN 
(dbkonto.objekt6 = '+'::bpchar) THEN bilkaib.db6objekt ELSE NULL::bpchar 
END)::character(10), (CASE WHEN (dbkonto.objekt7 = '+'::bpchar) THEN 
bilkaib.db7objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN 
(dbkonto.objekt8 = '+'::bpchar) THEN bilkaib.db8objekt ELSE NULL::bpchar 
END)::character(10), (CASE WHEN (dbkonto.objekt9 = '+'::bpchar) THEN 
bilkaib.db9objekt ELSE NULL::bpchar END)::character(10), bilkaib.cr, (CASE 
WHEN (crkonto.objekt1 = '+'::bpchar) THEN bilkaib.crobjekt ELSE NULL::bpchar 
END)::character(10), (CASE WHEN (crkonto.objekt2 = '+'::bpchar) THEN 
bilkaib.cr2objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN 
(crkonto.objekt3 = '+'::bpchar) THEN bilkaib.cr3objekt ELSE NULL::bpchar 
END)::character(10), (CASE WHEN (crkonto.objekt4 = '+'::bpchar) THEN 
bilkaib.cr4objekt ELSE NULL::bpchar END)::character(10), (CASE WHEN 
(crkonto.objekt5 = '+'::bpchar) THEN bilkaib.cr5objekt ELSE NULL::bpchar 
END)::character(10), (CASE WHEN (crkonto.objekt6 = '+'::bpchar) THEN 
bilkaib.cr6objekt

Re: [PERFORM] Increasing GROUP BY CHAR columns speed

2008-11-28 Thread Andrus

Scott,

Thank you.


The below query is spending most of its time in the sort, or perhaps the
complicated check condition before it.
The explain has an 8 second gap in time between the 2.8 seconds after the
Hash Left Join and before the Sort.  I'm guessing it's hidden in the sort.
You can get the planner to switch from a sort to a hash aggregate with a
large work_mem.  Try calling
SET work_mem = '100MB';
before this query first.
It may not help that much if the check time is as expensive as it looks in
the plan below, but it's very easy to try.
If it does help, you may want to temporarily increase that value only for
this query rather than making it a default in the config file.


SET work_mem = 2097151 (this is the max allowed value) or SET work_mem = 97151
decreases query time from 12 seconds to 9 seconds.

My application may run on servers with only 1 GB RAM. I'm afraid that on
those servers 2097151 will cause an error and abort the query.

Is it reasonable to add

SET work_mem = 97151

before this query and

SET work_mem TO  DEFAULT

after this query ?
Or should I use the max value in cases where there is much more data? This
query may return much more data for a longer period and more accounts.

CASE WHEN dbkonto.objekt1='+' THEN bilkaib.DBOBJEKT ELSE null END::
CHAR(10) AS dbobjekt

is ugly.  I tried to rewrite it using NullIfNot(dbkonto.objekt1, '+') AS
dbobjekt, but got the error
ERROR:  function nullifnot(character, unknown) does not exist

How to re-write this in a nicer and faster way?

For most rows the check

WHEN objektn='+'

will fail: objektn values are rarely equal to '+'; they are mostly empty
or null.

Maybe this can be used to optimize the query.

Btw.
Tom Lane's reply from an earlier discussion about this query's speed (then there
were '' instead of NULL in the group columns) some years ago:

I think the problem is probably that you're sorting two dozen CHAR
columns, and that in many of the rows all these entries are '' forcing
the sort code to compare all two dozen columns (not so)?  So the sort
ends up doing lots and lots and lots of CHAR comparisons.  Which can
be slow, especially in non-C locales.

A locale-specific check is not necessary for those CHAR(10) columns. How to
force PostgreSQL to use a binary comparison for grouping?
Some DBMSs allow marking columns as C locale. I haven't found this, nor a
chartobin() function, in PostgreSQL.
Will creating a BinaryNullIfNot(dbkonto.objekt1, '+') function solve this?

Andrus.

New testcase:

set search_path to firma2,public;
SET work_mem = 2097151;  -- 9 seconds
-- SET work_mem = 1097151;  -- 9 seconds
--SET work_mem to default; -- 12 seconds

explain analyze SELECT
CASE WHEN bilkaib.raha='EEK' THEN 0 ELSE bilkaib.id END,
bilkaib.DB,
CASE WHEN dbkonto.objekt1='+' THEN bilkaib.DBOBJEKT ELSE null END::
CHAR(10) AS dbobjekt,
CASE WHEN dbkonto.objekt2='+' THEN bilkaib.DB2OBJEKT ELSE null END::
CHAR(10) AS db2objekt,
CASE WHEN dbkonto.objekt3='+' THEN bilkaib.DB3OBJEKT ELSE null END::
CHAR(10) AS db3objekt,
CASE WHEN dbkonto.objekt4='+' THEN bilkaib.DB4OBJEKT ELSE null END::
CHAR(10) AS db4objekt,
CASE WHEN dbkonto.objekt5='+' THEN bilkaib.DB5OBJEKT ELSE null END::
CHAR(10) AS db5objekt,
CASE WHEN dbkonto.objekt6='+' THEN bilkaib.DB6OBJEKT ELSE null END::
CHAR(10) AS db6objekt,
CASE WHEN dbkonto.objekt7='+' THEN bilkaib.DB7OBJEKT ELSE null END::
CHAR(10) AS db7objekt,
CASE WHEN dbkonto.objekt8='+' THEN bilkaib.DB8OBJEKT ELSE null END::
CHAR(10) AS db8objekt,
CASE WHEN dbkonto.objekt9='+' THEN bilkaib.DB9OBJEKT ELSE null END::
CHAR(10) AS db9objekt,
bilkaib.CR,
CASE WHEN crkonto.objekt1='+' THEN bilkaib.crOBJEKT ELSE null END::
CHAR(10) AS crobjekt,
CASE WHEN crkonto.objekt2='+' THEN bilkaib.cr2OBJEKT ELSE null END::
CHAR(10) AS cr2objekt,
CASE WHEN crkonto.objekt3='+' THEN bilkaib.cr3OBJEKT ELSE null END::
CHAR(10) AS cr3objekt,
CASE WHEN crkonto.objekt4='+' THEN bilkaib.cr4OBJEKT ELSE null END::
CHAR(10) AS cr4objekt,
CASE WHEN crkonto.objekt5='+' THEN bilkaib.cr5OBJEKT ELSE null END::
CHAR(10) AS cr5objekt,
CASE WHEN crkonto.objekt6='+' THEN bilkaib.cr6OBJEKT ELSE null END::
CHAR(10) AS cr6objekt,
CASE WHEN crkonto.objekt7='+' THEN bilkaib.cr7OBJEKT ELSE null END::
CHAR(10) AS cr7objekt,
CASE WHEN crkonto.objekt8='+' THEN bilkaib.cr8OBJEKT ELSE null END::
CHAR(10) AS cr8objekt,
CASE WHEN crkonto.objekt9='+' THEN bilkaib.cr9OBJEKT ELSE null END::
CHAR(10) AS cr9objekt,
bilkaib.RAHA,
CASE WHEN crkonto.klienkaupa OR dbkonto.klienkaupa OR crkonto.tyyp IN
('K','I') OR dbkonto.tyyp IN ('K','I')
THEN  bilkaib.KLIENT ELSE NULL END AS klient,

bilkaib.EXCHRATE,

CASE WHEN crkonto.klienkaupa OR dbkonto.klienkaupa
  OR crkonto.tyyp IN ('K','I') OR dbkonto.tyyp IN ('K','I')
THEN
  klient.nimi ELSE NULL END AS kliendinim,  -- 24.

CAST(CASE WHEN crkonto.arvekaupa OR dbkonto.arvekaupa
  OR (bilkaib.cr<>'00' AND crkonto.tyyp='K')
  OR (bilkaib.db<>'00' AND dbkonto.tyyp='K')
THEN bilkaib.doknr ELSE NULL END AS CHAR(25)) AS doknr

,bilkaib.ratediffer
,CASE WHEN bilkaib.raha='EEK

Re: [PERFORM] Increasing GROUP BY CHAR columns speed

2008-11-28 Thread Andrus
The application should work with any server starting at 8.1, with any RAM size 
(probably starting at 1 GB).

How to find a safe value which does not cause an error in the SET work_mem command?

If I use 2 GB, maybe this causes an exception when running on a server with 1 
GB RAM where this setting may not be allowed.


Andrus. 





Re: [PERFORM] Increasing GROUP BY CHAR columns speed

2008-11-28 Thread Andrus

It seems that the slowness is caused by grouping by the column

exchrate numeric(13,8)

If this column is excluded, the query takes 12 seconds;
if this column is present, the query takes 27 (!) seconds.
How to fix this?

Andrus.

set search_path to firma2,public;
SET work_mem = 2097151 ;
explain analyze SELECT
CASE WHEN bilkaib.raha='EEK' THEN 0 ELSE bilkaib.id END,
bilkaib.DB,
bilkaib.CR,
  bilkaib.RAHA, -- 12 sec
  bilkaib.EXCHRATE, -- 27 sec
SUM(bilkaib.summa)::numeric(14,2) AS summa
from BILKAIB join KONTO CRKONTO ON bilkaib.cr=crkonto.kontonr AND
  crkonto.iseloom='A'
join KONTO DBKONTO ON bilkaib.db=dbkonto.kontonr AND
  dbkonto.iseloom='A'
left join klient on bilkaib.klient=klient.kood
WHERE --(bilkaib.cr LIKE ''||'%' OR bilkaib.db LIKE ''||'%')
 bilkaib.kuupaev BETWEEN '2008-01-01' AND '2008-12-31'
GROUP BY 1,2,3,4,5

GroupAggregate  (cost=71338.72..79761.05 rows=240638 width=58) (actual time=24570.085..27382.022 rows=217 loops=1)
  ->  Sort  (cost=71338.72..71940.31 rows=240638 width=58) (actual time=24566.700..25744.006 rows=322202 loops=1)
        Sort Key: CASE WHEN (bilkaib.raha = 'EEK'::bpchar) THEN 0 ELSE bilkaib.id END, bilkaib.db, bilkaib.cr, bilkaib.raha, bilkaib.exchrate
        ->  Hash Left Join  (cost=193.31..49829.89 rows=240638 width=58) (actual time=17.072..9901.578 rows=322202 loops=1)
              Hash Cond: (outer.klient = inner.kood)
              ->  Hash Join  (cost=42.18..45624.00 rows=240638 width=74) (actual time=4.715..7151.111 rows=322202 loops=1)
                    Hash Cond: (outer.cr = inner.kontonr)
                    ->  Hash Join  (cost=21.09..41803.63 rows=278581 width=74) (actual time=2.306..4598.703 rows=322202 loops=1)
                          Hash Cond: (outer.db = inner.kontonr)
                          ->  Seq Scan on bilkaib  (cost=0.00..37384.19 rows=322507 width=74) (actual time=0.075..1895.027 rows=322202 loops=1)
                                Filter: ((kuupaev >= '2008-01-01'::date) AND (kuupaev <= '2008-12-31'::date))
                          ->  Hash  (cost=20.49..20.49 rows=241 width=14) (actual time=2.193..2.193 rows=241 loops=1)
                                ->  Seq Scan on konto dbkonto  (cost=0.00..20.49 rows=241 width=14) (actual time=0.011..1.189 rows=241 loops=1)
                                      Filter: (iseloom = 'A'::bpchar)
                    ->  Hash  (cost=20.49..20.49 rows=241 width=14) (actual time=2.386..2.386 rows=241 loops=1)
                          ->  Seq Scan on konto crkonto  (cost=0.00..20.49 rows=241 width=14) (actual time=0.020..1.394 rows=241 loops=1)
                                Filter: (iseloom = 'A'::bpchar)
              ->  Hash  (cost=147.90..147.90 rows=1290 width=16) (actual time=12.319..12.319 rows=1290 loops=1)
                    ->  Seq Scan on klient  (cost=0.00..147.90 rows=1290 width=16) (actual time=0.032..6.979 rows=1290 loops=1)

Total runtime: 27434.724 ms


set search_path to firma2,public;
SET work_mem = 2097151 ;
explain analyze SELECT
CASE WHEN bilkaib.raha='EEK' THEN 0 ELSE bilkaib.id END,
bilkaib.DB,
bilkaib.CR,
  bilkaib.RAHA,
SUM(bilkaib.summa)::numeric(14,2) AS summa
from BILKAIB join KONTO CRKONTO ON bilkaib.cr=crkonto.kontonr AND
  crkonto.iseloom='A'
join KONTO DBKONTO ON bilkaib.db=dbkonto.kontonr AND
  dbkonto.iseloom='A'
left join klient on bilkaib.klient=klient.kood
WHERE
 bilkaib.kuupaev BETWEEN '2008-01-01' AND '2008-12-31'
GROUP BY 1,2,3,4

HashAggregate  (cost=52837.86..57049.03 rows=240638 width=50) (actual time=11744.137..11745.578 rows=215 loops=1)
  ->  Hash Left Join  (cost=193.31..49829.89 rows=240638 width=50) (actual time=17.330..9826.549 rows=322202 loops=1)
        Hash Cond: (outer.klient = inner.kood)
        ->  Hash Join  (cost=42.18..45624.00 rows=240638 width=66) (actual time=4.804..7141.983 rows=322202 loops=1)
              Hash Cond: (outer.cr = inner.kontonr)
              ->  Hash Join  (cost=21.09..41803.63 rows=278581 width=66) (actual time=2.343..4600.683 rows=322202 loops=1)
                    Hash Cond: (outer.db = inner.kontonr)
                    ->  Seq Scan on bilkaib  (cost=0.00..37384.19 rows=322507 width=66) (actual time=0.081..1939.376 rows=322202 loops=1)
                          Filter: ((kuupaev >= '2008-01-01'::date) AND (kuupaev <= '2008-12-31'::date))
                    ->  Hash  (cost=20.49..20.49 rows=241 width=14) (actual time=2.207..2.207 rows=241 loops=1)
                          ->  Seq Scan on konto dbkonto  (cost=0.00..20.49 rows=241 width=14) (actual time=0.014..1.179 rows=241 loops=1)
                                Filter: (iseloom = 'A'::bpchar)
              ->  Hash  (cost=20.49..20.49 rows=241 width=14) (actual time=2.426..2.426 rows=241 loops=1)
                    ->  Seq Scan on konto crkonto  (cost=0.00..20.49 rows=241 width=14) (actual time=0.029..1.444 rows=241 loops=1)
                          Filter: (iseloom = 'A'::bpchar)
        ->  Hash  (cost=147.90..147.90

Re: [PERFORM] Increasing GROUP BY CHAR columns speed

2008-11-28 Thread Andrus

It seems that the slowness is caused by grouping by the column

exchrate numeric(13,8)


exchrate has different values in only a few rows.
It usually has the value 0.
In this sample query it is always 0.

I tried to replace the constant 0 exchrate values with NULL using

nullif(bilkaib.EXCHRATE, 0)

but this does not speed up the query, no idea why.

Andrus.



Re: [PERFORM] Increasing pattern index query speed

2008-11-26 Thread Andrus

Richard and Mario,


You can't use xxx_pattern_ops indexes for non-pattern tests.


I missed the regular index. Sorry for that. Now the issue with the testcase is 
solved. Thank you very much.


I researched the issue in the live 8.1.4 db a bit more.
Performed vacuum and a whole-db reindex.
Tried several times to run the two same pattern queries in a quiet db.

The additional condition

AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21'

takes 239 seconds to run.

The additional condition

AND dok.kuupaev = date'2008-11-21'

takes 1 second.

Both query conditions are logically the same.
How to make the BETWEEN query fast (real queries run as BETWEEN queries 
over some date range)?


P.S. VACUUM issues a warning that the free space map of 150000 page slots is 
not sufficient; 160000 nodes are required.
Two days ago, after VACUUM FULL, there were 60000 used entries in the FSM. No idea 
why this occurs.


Andrus.

set search_path to firma2,public;
explain analyze SELECT sum(1)
  FROM dok JOIN rid USING (dokumnr)
  JOIN toode USING (toode)
  WHERE rid.toode like '9910%'
  AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21'
Aggregate  (cost=17.86..17.87 rows=1 width=0) (actual time=239346.647..239346.651 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..17.85 rows=1 width=0) (actual time=3429.715..239345.923 rows=108 loops=1)
        ->  Nested Loop  (cost=0.00..11.84 rows=1 width=24) (actual time=3429.666..239339.687 rows=108 loops=1)
              Join Filter: (outer.dokumnr = inner.dokumnr)
              ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..5.81 rows=1 width=4) (actual time=0.028..13.341 rows=1678 loops=1)
                    Index Cond: ((kuupaev >= '2008-11-21'::date) AND (kuupaev <= '2008-11-21'::date))
              ->  Index Scan using rid_toode_pattern_idx on rid  (cost=0.00..6.01 rows=1 width=28) (actual time=0.025..86.156 rows=15402 loops=1678)
                    Index Cond: ((toode ~>=~ '9910'::bpchar) AND (toode ~<~ '9911'::bpchar))
                    Filter: (toode ~~ '9910%'::text)
        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.00 rows=1 width=24) (actual time=0.032..0.037 rows=1 loops=108)
              Index Cond: (outer.toode = toode.toode)
Total runtime: 239347.132 ms

explain analyze SELECT sum(1)
  FROM dok JOIN rid USING (dokumnr)
  JOIN toode USING (toode)
  WHERE rid.toode like '9910%'
  AND dok.kuupaev = date'2008-11-21'
Aggregate  (cost=17.86..17.87 rows=1 width=0) (actual time=707.028..707.032 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..17.85 rows=1 width=0) (actual time=60.890..706.460 rows=108 loops=1)
        ->  Nested Loop  (cost=0.00..11.84 rows=1 width=24) (actual time=60.848..701.908 rows=108 loops=1)
              ->  Index Scan using rid_toode_pattern_idx on rid  (cost=0.00..6.01 rows=1 width=28) (actual time=0.120..247.636 rows=15402 loops=1)
                    Index Cond: ((toode ~>=~ '9910'::bpchar) AND (toode ~<~ '9911'::bpchar))
                    Filter: (toode ~~ '9910%'::text)
              ->  Index Scan using dok_dokumnr_idx on dok  (cost=0.00..5.81 rows=1 width=4) (actual time=0.021..0.021 rows=0 loops=15402)
                    Index Cond: (dok.dokumnr = outer.dokumnr)
                    Filter: (kuupaev = '2008-11-21'::date)
        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.00 rows=1 width=24) (actual time=0.021..0.026 rows=1 loops=108)
              Index Cond: (outer.toode = toode.toode)
Total runtime: 707.250 ms

vmstat 5 output during running slower query:

procs ---memory-- ---swap-- -io --system-- cpu
r  b   swpd   free   buff  cache   si   sobibo   incs us sy id 
wa
2  0332 738552  0 126483200 4 1111  6  1 83 
10
1  0332 738520  0 126483200 0   135  25934 24 76  0 
0
1  0332 738488  0 126483200 0   112  26342 24 76  0 
0
1  0332 738504  0 126483200 013  25219 23 77  0 
0
1  0332 738528  0 126483200 031  25526 26 74  0 
0
1  0332 738528  0 126483200 0 6  25118 27 73  0 
0
1  0332 738544  0 126485600 522  25425 27 73  0 
0
1  0332 737908  0 126485600 013  25222 27 73  0 
0
1  0332 737932  0 126485600 0 2  25118 23 77  0 
0
1  0332 737932  0 126485600 0 2  25117 25 75  0 
0
1  0332 737932  0 126485600 0 4  25219 25 75  0 
0
1  0332 737932  0 126485600 0 0  25016 26 74  0 
0
1  0332 737932  0 126485600 0 8  25219 26 74  0 
0
1  0332 737924  0 126485600 067  25219 24 76  0 
0
1  0332 737900  0 126485600 013  25837 25 75  0 
0
1  0332 737916  0 126485600 0 0  25116 26 74  0 
0
1  0332 737932  0 126485600 0 2

Re: [PERFORM] Increasing pattern index query speed

2008-11-26 Thread Andrus

Scott,

My first thought on the query where a pattern being faster than the query 
with an exact value is that the planner does not have good enough 
statistics on that column.  Without looking at the explain plans further, I 
would suggest trying something simple.  The fact that it is faster on 8.3 
but slower on 8.1 may have to do with changes between versions, or may 
simply be due to luck in the statistics sampling.
See if increasing the statistics target on that column significantly does 
anything:

EXPLAIN (your query);

ALTER TABLE orders_products ALTER COLUMN product_id SET STATISTICS 2000;
ANALYZE orders_products;
EXPLAIN (your query);
2000 is simply a guess of mine for a value much larger than the default. 
This will generally make query planning slower but the system will have a 
lot more data about that column and the distribution of data in it.  This 
should help stabilize the query performance.

If this has an effect, the query plans will change.
Your question below really boils down to something more simple:
--Why is the most optimal query plan not chosen?  This is usually due to 
either insufficient statistics or quirks in how the query planner works on 
a specific data set or with certain configuration options.


Thank you very much.
I found that  AND dok.kuupaev = date'2008-11-21' runs fast but
AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21'  runs very 
slow.


explain SELECT sum(1)
  FROM dok JOIN rid USING (dokumnr)
  JOIN toode USING (toode)
  WHERE rid.toode like '9910%'
  AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21'

plan with default statistics:

Aggregate  (cost=17.86..17.87 rows=1 width=0)
  ->  Nested Loop  (cost=0.00..17.85 rows=1 width=0)
        ->  Nested Loop  (cost=0.00..11.84 rows=1 width=24)
              Join Filter: (outer.dokumnr = inner.dokumnr)
              ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..5.81 rows=1 width=4)
                    Index Cond: ((kuupaev >= '2008-11-21'::date) AND (kuupaev <= '2008-11-21'::date))
              ->  Index Scan using rid_toode_pattern_idx on rid  (cost=0.00..6.01 rows=1 width=28)
                    Index Cond: ((toode ~>=~ '9910'::bpchar) AND (toode ~<~ '9911'::bpchar))
                    Filter: (toode ~~ '9910%'::text)
        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.00 rows=1 width=24)
              Index Cond: (outer.toode = toode.toode)

After the statistics target is changed, the query runs fast (70 ... 1000 ms):

ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 1000;
analyze rid;
explain analyze SELECT sum(1)
  FROM dok JOIN rid USING (dokumnr)
  JOIN toode USING (toode)
  WHERE rid.toode like '9910%'
  AND dok.kuupaev BETWEEN date'2008-11-21' AND date'2008-11-21'
Aggregate  (cost=27.04..27.05 rows=1 width=0) (actual time=44.830..44.834 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..27.04 rows=1 width=0) (actual time=0.727..44.370 rows=108 loops=1)
        ->  Nested Loop  (cost=0.00..21.02 rows=1 width=24) (actual time=0.688..40.519 rows=108 loops=1)
              ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..5.81 rows=1 width=4) (actual time=0.027..8.094 rows=1678 loops=1)
                    Index Cond: ((kuupaev >= '2008-11-21'::date) AND (kuupaev <= '2008-11-21'::date))
              ->  Index Scan using rid_dokumnr_idx on rid  (cost=0.00..15.20 rows=1 width=28) (actual time=0.011..0.011 rows=0 loops=1678)
                    Index Cond: (outer.dokumnr = rid.dokumnr)
                    Filter: (toode ~~ '9910%'::text)
        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.00 rows=1 width=24) (actual time=0.016..0.020 rows=1 loops=108)
              Index Cond: (outer.toode = toode.toode)
Total runtime: 45.050 ms

It seems that you are a genius.

I used 1000 since the docs say that the max value is 1000.

The rid table contains 3.5 million rows, will increase by 1 million rows per 
year, and is updated frequently, mostly by adding.


Is it OK to leave the

SET STATISTICS 1000;

setting for this column of this table, or should I try to decrease it?

Andrus. 





Re: [PERFORM] Increasing pattern index query speed

2008-11-26 Thread Andrus

Richard,


And the results were?


Results are provided at the bottom of the message to which you replied.


One problem at a time. Let's get the pattern-matching speed problems on
your live server sorted, then we can look at different queries.


The first message in this thread described the issue with a query having the
additional condition

AND dok.kuupaev BETWEEN '2008-11-21' AND  '2008-11-21'

It seems that this problem occurs when pattern matching and BETWEEN
conditions are used in the same query.

According to Scott Carey's great recommendation I added

ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 1000;

This fixes the testcase on the live server; see my other message.
Is it OK to run

ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 1000

in the prod database, or should I try to decrease 1000 to a smaller value?
rid is a big, growing table and is changed frequently, mostly by adding 
rows.


Andrus. 





Re: [PERFORM] Increasing pattern index query speed

2008-11-26 Thread Andrus

Is it OK to run

ALTER TABLE rid ALTER COLUMN toode SET STATISTICS 1000

in prod database or should I try to decrease 1000 to smaller value ?
rid is big increasing table and is changed frequently, mostly by adding
rows.


pgAdmin shows that default_statistics_target has its default value 10 in the 
postgresql.conf file.


Andrus. 





Re: [PERFORM] Increasing pattern index query speed

2008-11-26 Thread Andrus

Richard,


Results are provided in bottom of the message to which you replied.


No - the explains there were contrasting a date test BETWEEN versus =.


I changed the rid.toode statistics target to 100:

ALTER TABLE firma2.rid ALTER COLUMN toode SET STATISTICS 100;
analyze firma2.rid;

Analyze takes 3 seconds and the testcase runs fast.
I'm planning to monitor results by checking the log file for queries which take 
longer than 10 seconds.


Do you still need results?
If yes, which query and how many times should I run it?


Ah, I think I understand. The test case was *missing* this clause.


I added this clause to the testcase. Also added char(70) columns containing 
padding characters to all three tables. Still cannot reproduce this issue
in the testcase on the fast 8.3 devel notebook.
In the testcase order_products contains product_id values in a very regular 
order; maybe this affects the results. No idea how to use random() to 
generate random products for every order.

Andrus. 





Re: [PERFORM] Hash join on int takes 8..114 seconds

2008-11-24 Thread Andrus

Tomas,

Let's suppose you set a reasonable value (say 8096) instead of 2GB. That 
gives about 160MB.
Anyway this depends - if you have a lot of slow queries caused by on-disk 
sorts / hash tables, use a higher value. Otherwise leave it as it is.


Probably the product orders table is frequently joined with the product table.
Currently there was work_mem = 512 in the conf file.

I changed it to work_mem = 8096


If it is all cached in memory, you may want to ensure that your
shared_buffers is a reasonable size so that there is less shuffling of data
from the kernel to postgres and back.  Generally, shared_buffers works best
between 5% and 25% of system memory.


currently shared_buffers = 15000


That's 120MB, i.e. about 6% of the memory. Might be a little bit higher, 
but seems reasonable.


I changed it to 20000

Given the fact that the performance issues are caused by bloated tables 
and / or slow I/O subsystem, moving to a similar system won't help I 
guess.


I have run VACUUM FULL ANALYZE VERBOSE
and set max_fsm_pages = 150000.

So there is no bloat except the pg_shdepend indexes, which should not affect 
query speed.


Andrus. 





Re: [PERFORM] limit clause produces wrong query plan

2008-11-24 Thread Andrus

Scott,


And how exactly should it be optimized?  If a query is even moderately
interesting, with a few joins and a where clause, postgresql HAS to
create the rows that come before your offset in order to assure that
it's giving you the right rows.


SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 100 LIMIT 100

It should scan the primary key in index order for the first 200 keys, skipping 
the first 100 keys.



SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 0 LIMIT 100


That should be plenty fast.


The example which I posted shows that

SELECT ... FROM bigtable ORDER BY intprimarykey LIMIT 100

is extremely *slow*: a seq scan is performed over the whole bigtable.


A standard workaround is to use some kind of sequential, or nearly so,
id field, and then use between on that field.

select * from table where idfield between x and x+100;


Users can delete and insert any rows in the table.
This approach requires updating x in every row of the big table after each
insert, delete or order column change, and is thus extremely slow.
So I don't understand how this can be used for large tables.

Andrus.




Re: [PERFORM] Increasing pattern index query speed

2008-11-24 Thread Andrus

Richard,


These are the same but the times are different. I'd be very surprised if
you can reproduce these times reliably.


I re-tried today and got the same results: in the production database the pattern 
query is many times slower than the equality query.

The toode and rid tables contain only a single product starting with 9910,
so both queries should scan exactly the same number of rows.


Can I give you some wider-ranging suggestions Andrus?
1. Fix the vacuuming issue in your hash-join question.


I have run VACUUM FULL VERBOSE ANALYZE and set max_fsm_pages = 150000,
so this issue was fixed before those tests.


2. Monitor the system to make sure you know if/when disk activity is high.


I optimized this system. Now there are short (a few seconds) sales queries 
about every 5 - 300 seconds which cause little disk activity and add a few 
new rows to some tables.

I haven't seen this activity affect the test results.


3. *Then* start to profile individual queries and look into their plans.
Change the queries one at a time and monitor again.


How to change the pattern matching query to be faster?

Andrus.

Btw.

I tried to reproduce this big difference on the test server in 8.3 using the sample 
data script below and got a big difference, but in the opposite direction.


explain analyze   SELECT sum(1)
FROM   orders
JOIN orders_products USING (order_id)
JOIN products USING (product_id)
WHERE orders.order_date>'2006-01-01' and ...

different where clauses produce different results:

AND orders_products.product_id = '3370'  -- 880 .. 926 ms
AND  orders_products.product_id like '3370%' -- 41 ..98 ms

So the pattern index is always 10 .. 20 times (!) faster.
No idea why.

Test data creation script:

begin;
CREATE OR REPLACE FUNCTION Counter() RETURNS int IMMUTABLE AS
$_$
SELECT 3500000;
$_$ LANGUAGE SQL;

CREATE TEMP TABLE orders (order_id INTEGER NOT NULL, order_date DATE NOT 
NULL);
CREATE TEMP TABLE products (product_id CHAR(20) NOT NULL, product_name 
char(70) NOT NULL, quantity numeric(12,2) default 1);
CREATE TEMP TABLE orders_products (order_id INTEGER NOT NULL, product_id 
CHAR(20),

 id serial, price numeric(12,2) default 1 );

INSERT INTO products SELECT (n*power( 10,13))::INT8::CHAR(20),
  'product number ' || n::TEXT FROM generate_series(0,13410) AS n;

INSERT INTO orders
SELECT n,'2005-01-01'::date + (4000.0 * n/Counter() * '1 DAY'::interval)
FROM generate_series(0, Counter()/3 ) AS n;

SET work_mem TO 2097151;

INSERT INTO orders_products SELECT
  generate_series/3 as  order_id,
  ( (1+ (generate_series % 13410))*power( 10,13))::INT8::CHAR(20) AS 
product_id

FROM generate_series(1, Counter());

ALTER TABLE orders ADD PRIMARY KEY (order_id);
ALTER TABLE products ADD PRIMARY KEY (product_id);
ALTER TABLE orders_products ADD PRIMARY KEY (id);

ALTER TABLE orders_products ADD FOREIGN KEY (product_id) REFERENCES 
products(product_id);
ALTER TABLE orders_products ADD FOREIGN KEY (order_id) REFERENCES 
orders(order_id) ON DELETE CASCADE;


CREATE INDEX orders_date ON orders( order_date );
CREATE INDEX order_product_pattern_idx ON orders_products( product_id 
bpchar_pattern_ops );


COMMIT;
SET work_mem TO DEFAULT;
ANALYZE; 





Re: [PERFORM] Hash join on int takes 8..114 seconds

2008-11-24 Thread Andrus
top activity:

top - 23:07:49 up 27 days,  4:20,  1 user,  load average: 0.25, 0.22, 0.12
Tasks:  50 total,   3 running,  47 sleeping,   0 stopped,   0 zombie
Cpu(s): 29.3% us, 70.7% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:   2075828k total,  1951604k used,   124224k free,0k buffers
Swap:  3911816k total,  216k used,  3911600k free,  1899348k cached

 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
29687 postgres  25   0  144m 124m 121m R 96.3  6.2   8:18.04 postmaster
8147 root  16   0  4812 1628 1316 S  1.0  0.1   0:00.03 sshd
8141 root  15   0  5080 1892 1528 S  0.7  0.1   0:00.02 sshd
8145 sshd  15   0  4816 1220  912 S  0.3  0.1   0:00.01 sshd
8151 sshd  15   0  4812 1120  816 S  0.3  0.1   0:00.01 sshd
   1 root  16   0  1480  508  444 S  0.0  0.0   0:01.89 init
   2 root  34  19 000 S  0.0  0.0   0:00.01 ksoftirqd/0
   3 root  10  -5 000 S  0.0  0.0   0:00.00 events/0
   4 root  10  -5 000 S  0.0  0.0   0:00.63 khelper
   5 root  10  -5 000 S  0.0  0.0   0:00.00 kthread
   7 root  10  -5 000 S  0.0  0.0   2:21.73 kblockd/0
   8 root  20  -5 000 S  0.0  0.0   0:00.00 kacpid
 115 root  13  -5 000 S  0.0  0.0   0:00.00 aio/0
 114 root  15   0 000 S  0.0  0.0   9:21.41 kswapd0
 116 root  10  -5 000 S  0.0  0.0   0:12.06 xfslogd/0
 117 root  10  -5 000 S  0.0  0.0   1:36.43 xfsdatad/0
 706 root  10  -5 000 S  0.0  0.0   0:00.00 kseriod
 723 root  13  -5 000 S  0.0  0.0   0:00.00 kpsmoused
 738 root  11  -5 000 S  0.0  0.0   0:00.00 ata/0
 740 root  11  -5 000 S  0.0  0.0   0:00.00 scsi_eh_0
 741 root  11  -5 000 S  0.0  0.0   0:00.00 scsi_eh_1
 742 root  11  -5 000 S  0.0  0.0   0:00.00 scsi_eh_2
 743 root  11  -5 000 S  0.0  0.0   0:00.00 scsi_eh_3
 762 root  10  -5 000 S  0.0  0.0   0:18.64 xfsbufd
 763 root  10  -5 000 S  0.0  0.0   0:00.79 xfssyncd
 963 root  16  -4  1712  528  336 S  0.0  0.0   0:00.24 udevd
6677 root  15   0  1728  572  400 S  0.0  0.0   0:10.14 syslog-ng
7183 root  15   0  3472  828  672 S  0.0  0.0   0:08.50 sshd
7222 root  16   0  1736  672  556 S  0.0  0.0   0:00.03 cron
7237 root  16   0  1620  712  608 S  0.0  0.0   0:00.00 agetty
7238 root  17   0  1616  712  608 S  0.0  0.0   0:00.00 agetty
7239 root  16   0  1616  712  608 S  0.0  0.0   0:00.00 agetty
7240 root  16   0  1616  708  608 S  0.0  0.0   0:00.00 agetty
7241 root  16   0  1616  708  608 S  0.0  0.0   0:00.00 agetty
31873 root  15   0  1616  712  608 S  0.0  0.0   0:00.00 agetty
14908 postgres  16   0  141m  10m 9936 S  0.0  0.5   0:01.44 postmaster
14915 postgres  16   0  8468 1360  896 S  0.0  0.1   0:00.36 postmaster

Andrus.




Re: [PERFORM] limit clause produces wrong query plan

2008-11-24 Thread Andrus

it was very fast. To be honest I do not know what is happening?!


This is really weird.
It seems that PostgreSQL OFFSET / LIMIT are not optimized, and thus typical 
paging queries

SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET pageno*100 LIMIT 100

or even the first page query

SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 0 LIMIT 100

cannot be used in PostgreSQL at all for big tables.

Do you have any idea how to fix this?

Do you have any idea how to fix this ?

Andrus. 





Re: [PERFORM] seq scan over 3.3 million rows instead of single keyindex access

2008-11-23 Thread Andrus
An index scan only makes sense if rid contains considerably more than 
300 rows.


I'm sorry, I meant using an index to get the row.

Andrus. 





Re: [PERFORM] Hash join on int takes 8..114 seconds

2008-11-23 Thread Andrus

You could try writing a plpgsql function which would generate the data
set.
Or you could use your existing data set.


Creating 3.5 million rows using a stored proc is probably slow.
Probably it would be better and faster to use some random() plus
generate_series() trick.
In that case others can try it, and dataset generation is faster.


By the way, a simple way to de-bloat your big table without blocking
would be this :

- stop all inserts and updates
- begin
- create table new like old table
- insert into new select * from old (order by perhaps)
- create indexes
- rename new into old
- commit

If this is just a reporting database where you insert a batch of new data
every day, for instance, that's very easy to do. If it's OLTP, then, no.


Those are the orders and order_products tables.
I ran vacuum full analyze verbose last night.
Now the database is 4832 MB in size, including 1 GB of
bloated pg_shdepend indexes.
I added max_fsm_pages = 150000 and re-booted.

The query below and other queries are still too slow.

set search_path to firma2,public;
explain analyze
SELECT sum(1)
  FROM dok JOIN rid USING (dokumnr)
  JOIN toode USING (toode)
  WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01'

Aggregate  (cost=181795.13..181795.14 rows=1 width=0) (actual time=23678.265..23678.268 rows=1 loops=1)
  ->  Nested Loop  (cost=73999.44..181733.74 rows=24555 width=0) (actual time=18459.230..23598.956 rows=21476 loops=1)
        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.01 rows=1 width=24) (actual time=0.134..0.145 rows=1 loops=1)
              Index Cond: ('X05'::bpchar = toode)
        ->  Hash Join  (cost=73999.44..181482.18 rows=24555 width=24) (actual time=18459.076..23441.098 rows=21476 loops=1)
              Hash Cond: (outer.dokumnr = inner.dokumnr)
              ->  Bitmap Heap Scan on rid  (cost=4082.88..101779.03 rows=270252 width=28) (actual time=9337.782..12720.365 rows=278182 loops=1)
                    Recheck Cond: (toode = 'X05'::bpchar)
                    ->  Bitmap Index Scan on rid_toode_idx  (cost=0.00..4082.88 rows=270252 width=0) (actual time=9330.634..9330.634 rows=278183 loops=1)
                          Index Cond: (toode = 'X05'::bpchar)
              ->  Hash  (cost=69195.13..69195.13 rows=112573 width=4) (actual time=8894.465..8894.465 rows=109890 loops=1)
                    ->  Bitmap Heap Scan on dok  (cost=1492.00..69195.13 rows=112573 width=4) (actual time=1618.763..8404.847 rows=109890 loops=1)
                          Recheck Cond: (kuupaev >= '2008-09-01'::date)
                          ->  Bitmap Index Scan on dok_kuupaev_idx  (cost=0.00..1492.00 rows=112573 width=0) (actual time=1612.177..1612.177 rows=110484 loops=1)
                                Index Cond: (kuupaev >= '2008-09-01'::date)
Total runtime: 23678.790 ms


Here is a list of untried recommendations from this thread:

1. CLUSTER rid ON rid_toode_pkey; CLUSTER dok ON dok_kuupaev_idx
- In 8.1.4 this form of CLUSTER causes a syntax error (the 8.1 syntax is
CLUSTER indexname ON tablename, e.g. CLUSTER rid_toode_pkey ON rid).
Risky to try in the prod server. Requires creating a randomly distributed
product_id testcase to measure the
difference.

2. Change CHAR(20) product index to int index by adding update trigger.
Risky to try in prod server. Requires creating  randomly distributed
product_id testcase to measure
difference.

3. Denormalization of sale date to order_producs table by adding update
trigger.
Risky to try in prod server. Requires creating  randomly distributed
product_id testcase to measure
difference.

4. Check on the performance of the RAID: Does it leverage NCQ appropriately
when running queries in parallel ?
No idea how.

5. Materialized views. I need date granularity so it is possible to sum only
one day's sales.
http://www.pgcon.org/2008/schedule/events/69.en.html
Seems to be major appl re-write, no idea how.

Approaches which probably do not change performance:

6. Upgrade to 8.4 or to 8.3.5

7. Run the server in standalone mode and recover the 1 GB bloated pg_shdepend index.

8. tune some conf file parameters:

work_mem = 512

I'd consider increasing this value a little - 0.5 MB seems too low to me
(but not necessarily).


effective_cache_size = 70000

Well, your server has 2GB of RAM and usually it's recommended to set
this value to about 60-70% of your RAM, so using 540MB (25%) seems quite
low.

The data size is nearly the same as the RAM size. It is an unpleasant surprise that
queries take such a long time.

What should I do next?


Andrus.

 1     40926  firma2.rid                               1737 MB
 2     40595  firma2.dok                               1632 MB
 3      1214  pg_catalog.pg_shdepend                   1235 MB
 4      1232  pg_catalog.pg_shdepend_depender_index     795 MB
 7      1233  pg_catalog.pg_shdepend_reference_index    439 MB
 8     44299  firma2.rid_toode_idx                      298 MB
 9     44286  firma2.dok_tasudok_idx                    245 MB
10  19103791  firma2.rid_toode_pattern_idx

[PERFORM] limit clause produces wrong query plan

2008-11-23 Thread Andrus

Adding a limit clause causes a very slow query:

explain analyze select * from firma2.dok where doktyyp='J' order by dokumnr
limit 100
Limit  (cost=0.00..4371.71 rows=100 width=1107) (actual time=33189.971..33189.971 rows=0 loops=1)
  ->  Index Scan using dok_dokumnr_idx on dok  (cost=0.00..278740.01 rows=6376 width=1107) (actual time=33189.959..33189.959 rows=0 loops=1)
        Filter: (doktyyp = 'J'::bpchar)
Total runtime: 33190.103 ms


Without limit it is fast:

explain analyze select * from firma2.dok where doktyyp='J' order by dokumnr
Sort  (cost=7061.80..7077.74 rows=6376 width=1107) (actual time=0.119..0.119 rows=0 loops=1)
  Sort Key: dokumnr
  ->  Index Scan using dok_doktyyp on dok  (cost=0.00..3118.46 rows=6376 width=1107) (actual time=0.101..0.101 rows=0 loops=1)
        Index Cond: (doktyyp = 'J'::bpchar)
Total runtime: 0.245 ms

How to fix this without dropping the dok_doktyyp index, so that LIMIT can safely be 
used for paged data access?


indexes:

dok_doktyyp:  dok(doktyyp)
dok_dokumnr_idx:  dok(dokumnr)

types:

dokumnr int primary key
doktyyp char(1)

Andrus.

Using 8.1.4




Re: [PERFORM] Hash join on int takes 8..114 seconds

2008-11-23 Thread Andrus

I guess you have backups - take them, restore the database on a different
machine (preferably with the same / similar hw config) and tune the
queries on it.

After restoring all the tables / indexes will be 'clean' (not bloated), so
you'll see if performing VACUUM FULL / CLUSTER is the right solution or if
you have to change the application internals.

Sure, the times will be slightly different but the performance problems
should remain the same.


VACUUM FULL has already been run.
My test computer has PostgreSQL 8.3, 4 GB RAM, SSD disks, an Intel X2 Extreme
CPU, so it is much faster than this prod server.
No idea how to emulate this environment.
I can create a new db in the prod server as a copy of the old one, but this can be 
used in late night hours only.

Where to find script which clones some database in server? Something like

CREATE DATABASE newdb AS SELECT *  FROM olddb;

It would be more convenient to run the db cloning script from a pgAdmin command
window.
The only way I found is to use SSH with pg_dump/pg_restore. This requires SSH
access to the server and opening the SSH port to the public internet.

Or probably try to run the CLUSTER command in the prod server. Hopefully clustering
by product id cannot slow things down
too much.

Andrus. 





Re: [PERFORM] Hash join on int takes 8..114 seconds

2008-11-23 Thread Andrus
 016  25218 26 75  0
0
1  0232  77424  0 193699600 0 0  25218 37 63  0
0
1  0232  71464  0 194192800 073  26034 38 62  0
0
0  0232 123420  0 189104400 032  25731  8 15 77
0  end of slower query
0  0232 123420  0 189104400 025  25524  0  0
100  0
0  0232 123420  0 189104400 028  25527  0  0
100  0


Is it safe to set

work_mem = 2097151

in postgresql.conf file ?


First, demonstrate that it is all or mostly in memory -- use iostat or
other tools to ensure that there is not much disk activity during the
query.  If your system doesn't have iostat installed, it should be
installed.  It is a very useful tool.


# iostat
bash: iostat: command not found
# locate iostat
/usr/src/linux-2.6.16-gentoo-r9/Documentation/iostats.txt

I have little experience with Linux. No idea how to install or compile iostat on
this system.


If it is all cached in memory, you may want to ensure that your
shared_buffers is a reasonable size so that there is less shuffling of data
from the kernel to postgres and back.  Generally, shared_buffers works best
between 5% and 25% of system memory.


currently shared_buffers = 15000


If it is completely CPU bound then the work done for the query has to be
reduced by altering the plan to a more optimal one or making the work it
has to do at each step easier.  Most of the ideas in this thread revolve
around those things.


When running on the loaded server, even after VACUUM FULL, response time with the
original work_mem is longer, probably because it must fetch blocks from 
disk.



Based on the time it took to do the vacuum, I suspect your disk subsystem
is a bit slow.  If it can be determined that there is much disk I/O in your
use cases, there are generally several things that can be done to tune
Linux I/O.  The main ones in my experience are the 'readahead' value for
each disk which helps sequential reads significantly, and trying out the
linux 'deadline' scheduler and comparing it to the more commonly used 'cfq'
scheduler.  If the system is configured with the anticipatory scheduler,
absolutely switch to cfq or deadline as the anticipatory scheduler will
perform horribly poorly for a database.


This is a 3 year old cheap server.
No idea what to configure.

There is also another similar server which has 1.2 GB more usable memory. No 
idea whether it is worth switching to it.

After some years the sales data will still exceed this larger amount of memory.

Andrus. 





Re: [PERFORM] Hash join on int takes 8..114 seconds

2008-11-22 Thread Andrus

You could perhaps run a little check on the performance of the RAID, is
it better than linux software RAID ?
Does it leverage NCQ appropriately when running queries in parallel ?


I was told that this RAID is software RAID.
I have no experience with what to check.
This HP server was installed 3 years ago and in its time it was not a
high performance server.


explain analyze
SELECT sum(1)
  FROM dok JOIN rid USING (dokumnr)
  JOIN toode USING (toode)
  LEFT JOIN artliik using(grupp,liik)
  WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01'


By the way, note that the presence of the toode table in the query above
is not required at all, unless you use columns of toode in your
aggregates.


In the real query, the SELECT column list contains data from the sales table dok (sale
date and time) and the sales detail table rid (quantity, price).
The WHERE clause may contain additional filters from the product table (product
category, supplier).


Let's play with that, after all, it's friday night.


Thank you very much for the great sample.
I tried to create a test case from this to match the production db:

1.2 million orders
3.5 million order details
13400 products with CHAR(20) primary keys, mostly containing EAN-13 codes
last 3 years of data
every order usually has 1..3 detail lines
the same product can appear multiple times in an order
products are queried by the start of their code

This sample does not distribute products randomly between orders.
How to change this so that every order contains 3 (or 1..6) random
products?

I tried to use random row sample from
http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks-i

but in this case a constant product is always returned. It seems that a subquery
containing random() is executed only once.
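
(One possible fix, as an untested sketch reusing the 3500000 and 13410 constants from the script below: inline the random expression in the row-generating SELECT. random() is VOLATILE and is re-evaluated for every generated row, while an uncorrelated subquery with ORDER BY random() runs only once:)

INSERT INTO orders_products (order_id, product_id)
SELECT gs/3 AS order_id,
       -- random product number 0..13410, formatted like the products key
       ((random()*13410)::int * power(10,13))::int8::char(20)
FROM generate_series(1, 3500000) AS gs;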



Andrus.

begin;
CREATE TEMP TABLE orders (order_id INTEGER NOT NULL, order_date DATE NOT 
NULL);
CREATE TEMP TABLE products (product_id CHAR(20) NOT NULL, product_name 
char(70) NOT NULL, quantity numeric(12,2) default 1);
CREATE TEMP TABLE orders_products (order_id INTEGER NOT NULL, product_id 
CHAR(20),padding1 char(70),

 id serial, price numeric(12,2) default 1 );

INSERT INTO products SELECT (n*power( 10,13))::INT8::CHAR(20),
  'product number ' || n::TEXT FROM generate_series(0,13410) AS n;

INSERT INTO orders
SELECT n, '2005-01-01'::date + (4000.0 * n/3500000.0 * '1 DAY'::interval)
FROM generate_series(0,3500000/3) AS n;


SET work_mem TO 2097151; -- 1048576;

INSERT INTO orders_products SELECT
  generate_series/3 as  order_id,
  ( ((generate_series/3500000.0)*13410.0)::int*power( 10,13))::INT8::CHAR(20)

FROM generate_series(1,3500000)
where generate_series/3 > 0;

ALTER TABLE orders ADD PRIMARY KEY (order_id);
ALTER TABLE products ADD PRIMARY KEY (product_id);
ALTER TABLE orders_products ADD PRIMARY KEY (id);

ALTER TABLE orders_products ADD FOREIGN KEY (product_id) REFERENCES 
products(product_id);
ALTER TABLE orders_products ADD FOREIGN KEY (order_id) REFERENCES 
orders(order_id) ON DELETE CASCADE;


CREATE INDEX orders_date ON orders( order_date );
CREATE INDEX order_product_pattern_idx ON orders_products( product_id 
bpchar_pattern_ops );


COMMIT;
SET work_mem TO DEFAULT;
ANALYZE;

 SELECT sum(quantity*price)
FROM   orders
JOIN orders_products USING (order_id)
JOIN products USING (product_id)
WHERE orders.order_date='2008-01-17'
and orders_products.product_id like '130%' 





[PERFORM] seq scan over 3.3 million rows instead of single key index access

2008-11-22 Thread Andrus

There are indexes on rid(dokumnr) and dok(dokumnr), and dokumnr is an int.
Instead of using a single-key index, 8.1.4 scans the whole rid table.
Sometimes idtellUued can contain more than a single row, so replacing the join with
an equality test is not possible.


How to fix ?

Andrus.

CREATE TEMP TABLE idtellUued(dokumnr INT) ON COMMIT DROP;
INSERT INTO idtellUued VALUES(1249228);
explain analyze  select 1
  from dok JOIN rid USING(dokumnr)
JOIN idtellUued USING(dokumnr)

Hash Join  (cost=7483.22..59.77 rows=5706 width=0) (actual
time=14905.981..27065.903 rows=8 loops=1)
  Hash Cond: ("outer".dokumnr = "inner".dokumnr)
  ->  Seq Scan on rid  (cost=0.00..198240.33 rows=3295833 width=4) (actual
time=0.036..15021.641 rows=3280576 loops=1)
  ->  Hash  (cost=7477.87..7477.87 rows=2140 width=8) (actual
time=0.114..0.114 rows=1 loops=1)
        ->  Nested Loop  (cost=0.00..7477.87 rows=2140 width=8) (actual
time=0.076..0.099 rows=1 loops=1)
              ->  Seq Scan on idtelluued  (cost=0.00..31.40 rows=2140
width=4) (actual time=0.006..0.011 rows=1 loops=1)
              ->  Index Scan using dok_dokumnr_idx on dok  (cost=0.00..3.47
rows=1 width=4) (actual time=0.051..0.058 rows=1 loops=1)
                    Index Cond: (dok.dokumnr = "outer".dokumnr)
Total runtime: 27066.080 ms




[PERFORM] Increasing pattern index query speed

2008-11-22 Thread Andrus

Both queries return the same result (19) and return the same data.
The pattern query is much slower (93 sec) than the equality check (13 sec).
How to fix this?
Using 8.1.4, utf-8 encoding, et-EE locale.

Andrus.

SELECT sum(1)
  FROM dok JOIN rid USING (dokumnr)
  JOIN toode USING (toode)
  WHERE rid.toode = '9910' AND dok.kuupaev BETWEEN '2008-11-21' AND 
'2008-11-21'

 AND dok.yksus  LIKE 'ORISSAARE%'

Aggregate  (cost=43.09..43.10 rows=1 width=0) (actual
time=12674.675..12674.679 rows=1 loops=1)
  ->  Nested Loop  (cost=29.57..43.08 rows=1 width=0) (actual
time=2002.045..12673.645 rows=19 loops=1)
        ->  Nested Loop  (cost=29.57..37.06 rows=1 width=24) (actual
time=2001.922..12672.344 rows=19 loops=1)
              ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..3.47
rows=1 width=4) (actual time=342.812..9810.627 rows=319 loops=1)
                    Index Cond: ((kuupaev >= '2008-11-21'::date) AND
(kuupaev <= '2008-11-21'::date))
                    Filter: (yksus ~~ 'ORISSAARE%'::text)
              ->  Bitmap Heap Scan on rid  (cost=29.57..33.58 rows=1
width=28) (actual time=8.948..8.949 rows=0 loops=319)
                    Recheck Cond: (("outer".dokumnr = rid.dokumnr) AND
(rid.toode = '9910'::bpchar))
                    ->  BitmapAnd  (cost=29.57..29.57 rows=1 width=0)
(actual time=8.930..8.930 rows=0 loops=319)
                          ->  Bitmap Index Scan on rid_dokumnr_idx
(cost=0.00..2.52 rows=149 width=0) (actual time=0.273..0.273 rows=2
loops=319)
                                Index Cond: ("outer".dokumnr =
rid.dokumnr)
                          ->  Bitmap Index Scan on rid_toode_idx
(cost=0.00..26.79 rows=1941 width=0) (actual time=8.596..8.596 rows=15236
loops=319)
                                Index Cond: (toode = '9910'::bpchar)
        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.01 rows=1
width=24) (actual time=0.043..0.048 rows=1 loops=19)
              Index Cond: ('9910'::bpchar = toode)
Total runtime: 12675.191 ms

explain analyze SELECT sum(1)
  FROM dok JOIN rid USING (dokumnr)
  JOIN toode USING (toode)
  WHERE rid.toode like '9910%' AND dok.kuupaev BETWEEN '2008-11-21' AND 
'2008-11-21'

 AND dok.yksus  LIKE 'ORISSAARE%'


Aggregate  (cost=15.52..15.53 rows=1 width=0) (actual
time=92966.501..92966.505 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..15.52 rows=1 width=0) (actual
time=24082.032..92966.366 rows=19 loops=1)
        ->  Nested Loop  (cost=0.00..9.50 rows=1 width=24) (actual
time=24081.919..92965.116 rows=19 loops=1)
              Join Filter: ("outer".dokumnr = "inner".dokumnr)
              ->  Index Scan using dok_kuupaev_idx on dok  (cost=0.00..3.47
rows=1 width=4) (actual time=0.203..13924.324 rows=319 loops=1)
                    Index Cond: ((kuupaev >= '2008-11-21'::date) AND
(kuupaev <= '2008-11-21'::date))
                    Filter: (yksus ~~ 'ORISSAARE%'::text)
              ->  Index Scan using rid_toode_pattern_idx on rid
(cost=0.00..6.01 rows=1 width=28) (actual time=0.592..166.778 rows=15235
loops=319)
                    Index Cond: ((toode ~>=~ '9910'::bpchar) AND (toode
~<~ '9911'::bpchar))
                    Filter: (toode ~~ '9910%'::text)
        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.01 rows=1
width=24) (actual time=0.041..0.046 rows=1 loops=19)
              Index Cond: ("outer".toode = toode.toode)
Total runtime: 92967.512 ms




Re: [PERFORM] seq scan over 3.3 million rows instead of single key index access

2008-11-22 Thread Andrus

Gregory,


 I would suggest running ANALYZE
idtellUued at some point before the problematic query.


Thank you.
After adding analyze all is OK.
Is the analyze command required in 8.3 also?
Or is it better to specify some hint at create temp table time, since
I know the number of rows before running the query?


Andrus.

set search_path to firma2,public;
CREATE TEMP TABLE idtellUued(dokumnr INT) ON COMMIT DROP;
INSERT INTO idtellUued VALUES(1249228);
analyze idtelluued;
explain analyze  select 1
  from dok JOIN rid USING(dokumnr)
JOIN idtellUued USING(dokumnr)

Nested Loop  (cost=0.00..275.18 rows=3 width=0) (actual time=87.266..87.388
rows=8 loops=1)
  ->  Nested Loop  (cost=0.00..6.95 rows=1 width=8) (actual
time=36.613..36.636 rows=1 loops=1)
        ->  Seq Scan on idtelluued  (cost=0.00..1.01 rows=1 width=4)
(actual time=0.009..0.015 rows=1 loops=1)
        ->  Index Scan using dok_dokumnr_idx on dok  (cost=0.00..5.93
rows=1 width=4) (actual time=36.585..36.590 rows=1 loops=1)
              Index Cond: (dok.dokumnr = "outer".dokumnr)
  ->  Index Scan using rid_dokumnr_idx on rid  (cost=0.00..267.23 rows=80
width=4) (actual time=50.635..50.672 rows=8 loops=1)
        Index Cond: ("outer".dokumnr = rid.dokumnr)
Total runtime: 87.586 ms




Re: [PERFORM] Hash join on int takes 8..114 seconds

2008-11-21 Thread Andrus

Richard,


In addition to top below, you'll probably find vmstat 5 useful.


Thank you.
During this query run (65 sec), vmstat 5 shows big values in  bi,cs and wa 
columns:


procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
1  1 88  51444  0 18544040017 6   1513  5  1 83 10
0  1 88  52140  0 185430400  362695  562   784 15 38  0 47
0  1 92  51608  0 185566800 14116   103 1382  2294  4  8  0 88
0  1 92  51620  0 185725601 1525831 1210  1975  4  8  0 88
0  2 92  50448  0 185918800 1311819 1227  1982  3  7  0 90
0  1 92  51272  0 185908800  769153  828  1284 14  4  0 82
0  1 92  52472  0 185879200 10691 9  758   968  3  7  0 89
0  2 92  51240  0 185859600  8204  7407  717  1063  2  5  0 93
0  1 92  51648  0 186038800 20622   121 1118  2229 12  9  0 79
2  1 92  50564  0 186139600 20994  3277  969  1681 15  8  0 76
1  0 92  52180  0 186019200 1854236  802  1276 36 12  0 51
0  0 92  91872  0 182094801 1528547  588   774  9 12 32 47
0  0 92  91904  0 182094800 0 4  25118  0  0 100  0
0  0 92  92044  0 182094800 0 0  25017  0  0 100  0
0  0 92  91668  0 1821156002793  27266  5  0 92 3
0  0 92  91668  0 182115600 064  26038  0  0 100  0
0  0 92  91636  0 182115600 0   226  27771  0  0 100  0
0  0 92  91676  0 182115600 026  25523  0  0 100  0



Here you're stuck waiting for disks (91.0% wa). Check out vmstat and
iostat to see what's happening.


typing  iostat returns

bash: iostat: command not found

It seems that this is not installed on this Gentoo system.

Andrus. 





Re: [PERFORM] Hash join on int takes 8..114 seconds

2008-11-21 Thread Andrus
 have been deleted, 0 are currently reusable.
CPU 0.18s/0.66u sec elapsed 1.78 sec.
INFO:  index rid_toode_pattern_idx now contains 3275230 row versions in 
16310 pages

DETAIL:  17800 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.44s/1.04u sec elapsed 6.55 sec.
INFO:  rid: removed 38331 row versions in 3090 pages
DETAIL:  CPU 0.20s/0.10u sec elapsed 5.49 sec.
INFO:  rid: found 38331 removable, 3275189 nonremovable row versions in 
165282 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 1878923 unused item pointers.
0 pages are entirely empty.
CPU 5.06s/7.27u sec elapsed 607.59 sec.

Query returned successfully with no result in 1058319 ms.


- what's the size of the dataset relative to the RAM ?


Db size is 7417 MB
relevant table sizes in desc by size order:

 1  40595     dok                          2345 MB
 2  1214      pg_shdepend                  2259 MB
 3  40926     rid                          2057 MB
 6  1232      pg_shdepend_depender_index    795 MB
 7  1233      pg_shdepend_reference_index   438 MB
 8  44286     dok_tasudok_idx               245 MB
 9  44299     rid_toode_idx                 243 MB
10  44283     dok_klient_idx                142 MB
11  19103791  rid_toode_pattern_idx         127 MB
14  44298     rid_inpdokumnr_idx            118 MB
15  44297     rid_dokumnr_idx               110 MB
16  43573     rid_pkey                      109 MB
18  44288     dok_tasumata_idx               91 MB
19  44289     dok_tellimus_idx               89 MB
20  44284     dok_krdokumnr_idx              89 MB
21  44285     dok_kuupaev_idx                84 MB
23  43479     dok_pkey                       74 MB
24  44282     dok_dokumnr_idx                74 MB
25  19076304  rid_rtellimus_idx              58 MB
26  18663923  dok_yksus_pattern_idx          43 MB
27  18801591  dok_sihtyksus_pattern_idx      42 MB
29  18774881  dok_doktyyp                    30 MB
46  40967     toode                          13 MB

server is HP Proliant DL320 G3
http://h18000.www1.hp.com/products/quickspecs/12169_ca/12169_ca.HTML
CPU is a 2.93 GHz Celeron with 256 KB cache.

Server has 2 GB RAM.
It has SATA RAID 0,1 integrated controller (1.5Gbps) and SAMSUNG HD160JJ
mirrored disks.


Now let's look more closely at the query :

explain analyze
SELECT sum(1)
  FROM dok JOIN rid USING (dokumnr)
  JOIN toode USING (toode)
  LEFT JOIN artliik using(grupp,liik)
  WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01'


I presume doing the query without artliik changes nothing to the runtime,
yes ?


Yes. After removing artliik from the join I got response times of 65 and 50
seconds, so this makes no difference.


Your problem here is that, no matter what, postgres will have to examine
- all rows where dok.kuupaev>='2008-09-01',
- and all rows where rid.toode = 'X05'.
If you use dok.kuupaev>='2007-09-01' (note : 2007) it will probably have
to scan many, many more rows.


Probably yes, since then it reads one more year of sales data.


If you perform this query often you could CLUSTER rid on (toode) and dok
on (kuupaev), but this can screw other queries.


Some reports are by sales date (dok.kuupaev) and customers.
CLUSTER rid on (toode) slows them down. Also autovacuum cannot do 
clustering.



What is the meaning of the columns ?


This is typical sales data:

-- Receipt headers:
DOK ( dokumnr  INT SERIAL PRIMARY KEY,
 kuupaev DATE --- sales date
)

-- Receipt details
RID ( dokumnr INT,
toode CHAR(20),  -- item code
CONSTRAINT rid_dokumnr_fkey FOREIGN KEY (dokumnr)   REFERENCES dok
(dokumnr),
 CONSTRAINT rid_toode_fkey FOREIGN KEY (toode)
 REFERENCES firma2.toode (toode)
)

-- Products
TOODE (
 toode CHAR(20) PRIMARY KEY
)


To make this type of query faster I would tend to think about :



- denormalization (ie adding a column in one of your tables and a
multicolumn index)


For this query it is possible to duplicate the kuupaev column into the rid table.
However, most of the time seems to go to scanning the rid table, so I suspect
that this would help.
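
(A hedged sketch of that denormalization — it assumes the application keeps rid.kuupaev in sync on insert/update; the index name is made up:)

ALTER TABLE rid ADD COLUMN kuupaev date;
UPDATE rid SET kuupaev = dok.kuupaev
  FROM dok WHERE rid.dokumnr = dok.dokumnr;
CREATE INDEX rid_toode_kuupaev_idx ON rid (toode, kuupaev);
ANALYZE rid;
-- the test query could then filter rid alone:
--   WHERE rid.toode = 'X05' AND rid.kuupaev >= '2008-09-01'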


- materialized views
- materialized summary tables (ie. summary of sales for last month, for
instance)


There are about 1000 items and reports are different.

Andrus. 





Re: [PERFORM] Hash join on int takes 8..114 seconds

2008-11-21 Thread Andrus

Richard,

Thank you.


Try SELECT count(*) FROM pg_shdepend.


This query returns 3625  and takes 35 seconds to run.


If it's not a million rows, then the table is bloated. Try (as postgres
or some other db superuser) vacuum full pg_shdepend and a reindex
pg_shdepend.


vacuum full verbose pg_shdepend
INFO:  vacuuming pg_catalog.pg_shdepend
INFO:  pg_shdepend: found 16103561 removable, 3629 nonremovable row 
versions in 131425 pages

DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 49 to 49 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 1009387632 bytes.
131363 pages are or will become empty, including 0 at the end of the table.
131425 pages containing 1009387632 free bytes are potential move 
destinations.

CPU 2.12s/1.69u sec elapsed 52.66 sec.
INFO:  index pg_shdepend_depender_index now contains 3629 row versions in 
101794 pages

DETAIL:  16103561 index row versions were removed.
101311 index pages have been deleted, 2 are currently reusable.
CPU 20.12s/14.52u sec elapsed 220.66 sec.

After 400 seconds of running I got phone calls that the server was not responding
to other clients, so I was forced to cancel the vacuum full verbose pg_shdepend

 command.

How can I run it so that other users can use the database at the same time?
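
(On 8.1 the only non-blocking option seems to be a plain VACUUM, which takes no exclusive lock — a sketch, to be run as a superuser:)

VACUUM VERBOSE pg_shdepend;  -- marks dead rows reusable, no exclusive lock
-- note: plain VACUUM will not shrink the already-bloated table or indexes;
-- VACUUM FULL + REINDEX still need a maintenance window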


If it is a million rows, you'll need to find out why. Do you have a lot
of temporary tables that aren't being dropped or something similar?


The application creates temporary tables in many places. Every sales operation
probably creates some temporary tables.
Should I change something in the configuration, change the application (only a
single POS application is used to access this db), or is the only solution to
manually run

vacuum full pg_shdepend
reindex pg_shdepend

periodically?
How to vacuum full pg_shdepend automatically so that other users can work at
the same time?


Hopefully this table's size does not affect query speed.

Andrus. 





Re: [PERFORM] Hash join on int takes 8..114 seconds

2008-11-21 Thread Andrus

Thomas,

Thank you.


Just the most important points:

1) dok table contains 1235086 row versions in 171641 pages (with 8kB
pages this means about 1.4 GB of data), but there are 1834279 unused item
pointers (i.e. about 60% of the space is wasted)

2) rid table contains 3275189 rows in 165282 pages (with 8kB pages this means
about 1.3GB of data), but there are 1878923 unused item pointers (i.e.
about 30% of the space is wasted)

3) don't forget to execute analyze after vacuuming (or vacuum analyze)


autovacuum is running.
So if I understand properly, I must run
VACUUM FULL ANALYZE dok;
VACUUM FULL ANALYZE rid;

Those commands will probably cause the server to stop responding to other
clients, like
vacuum full pg_shdepend
did.

Will vacuum_cost_delay = 2000 allow other users to work while running those
commands?


4) I'm not sure why the sizes reported by you (for example 2.3GB vs 1.5GB
for doc table) - the difference seems too large for me.


I used pg_total_relation_size(). So 2.3 GB includes indexes also:

 8  44286     dok_tasudok_idx            245 MB
10  44283     dok_klient_idx             142 MB
18  44288     dok_tasumata_idx            91 MB
19  44289     dok_tellimus_idx            89 MB
20  44284     dok_krdokumnr_idx           89 MB
21  44285     dok_kuupaev_idx             84 MB
22  43531     makse_pkey                  77 MB
23  43479     dok_pkey                    74 MB
24  44282     dok_dokumnr_idx             74 MB
26  18663923  dok_yksus_pattern_idx       43 MB
27  18801591  dok_sihtyksus_pattern_idx   42 MB


Anyway the amount of wasted rows seems significant to me - I'd try to
solve this first. Either by VACUUM FULL or by CLUSTER. The CLUSTER will
lock the table exclusively, but the results may be better (when sorting by
a well chosen index). Don't forget to run ANALYZE afterwards.


How can I invoke those commands so that other clients can continue working?
I'm using 8.1.4.
Log files show that autovacuum is running.

I'm planning the following solution:

1. Set

vacuum_cost_delay=2000

2. Run the following commands periodically in this order:

VACUUM FULL;
vacuum full pg_shdepend;
CLUSTER rid on (toode);
CLUSTER dok  on (kuupaev);
REINDEX DATABASE mydb;
REINDEX SYSTEM mydb;
ANALYZE;

Are all those commands required, or can something be left out?
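
(A trimmed sketch, assuming the CLUSTER operations are acceptable: CLUSTER rewrites a table and rebuilds all of its indexes, so a separate VACUUM FULL and REINDEX of rid and dok should be redundant; 8.1 uses the "CLUSTER index ON table" syntax:)

CLUSTER rid_toode_idx ON rid;     -- rewrites rid, rebuilds its indexes
CLUSTER dok_kuupaev_idx ON dok;   -- rewrites dok, rebuilds its indexes
VACUUM FULL pg_shdepend;
REINDEX TABLE pg_shdepend;
ANALYZE;                          -- refresh statistics afterwards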


Several other things to consider:

1) Regarding the toode column - why are you using CHAR(20) when the values
are actually shorter? This may significantly increase the amount of space
required.


There may be some products whose codes are up to 20 characters.
PostgreSQL does not hold trailing spaces in the db, so this does *not* affect
space.


2) I've noticed the CPU used is Celeron, which may negatively affect the
speed of hash computation. I'd try to replace it by something faster - say
INTEGER as an artificial primary key of the toode table and using it as
a FK in other tables.  This might improve the Bitmap Heap Scan on rid
part, but yes - it's just a minor improvement compared to the Hash Join
part of the query.


The natural key toode CHAR(20) is widely used in different queries. Replacing it
with an INT surrogate key would require a major application rewrite.

Should I add surrogate INT key columns to the toode and rid tables and measure
the test query speed in that case?
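
(A sketch of that experiment, with hypothetical column and index names; the assumption is that 8.1 backfills the serial default when the column is added:)

ALTER TABLE toode ADD COLUMN toode_id serial;   -- int + sequence default
ALTER TABLE toode ADD CONSTRAINT toode_toode_id_uq UNIQUE (toode_id);
ALTER TABLE rid ADD COLUMN toode_id int;
UPDATE rid SET toode_id = toode.toode_id
  FROM toode WHERE rid.toode = toode.toode;
CREATE INDEX rid_toode_id_idx ON rid (toode_id);
ANALYZE toode; ANALYZE rid;
-- then compare:
--   ... FROM dok JOIN rid USING (dokumnr)
--       JOIN toode USING (toode_id) WHERE toode.toode = 'X05' ...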


Materialized views seem like a good idea to me, but maybe I'm not seeing
something. What do you mean by reports are different? If there is a lot
of rows for a given product / day, then creating an aggregated table with
(product code / day) as a primary key is quite simple. It may require a
lot of disk space, but it'll remove the hash join overhead. But if the
queries are very different, then it may be difficult to build such
materialized view(s).


The log file shows that mostly only queries of this shape are slow:

SELECT ...
  FROM dok JOIN rid USING (dokumnr)
  JOIN ProductId USING (ProductId)
  WHERE rid.ProductId LIKE :p1 || '%' AND dok.SaleDate >= :p2

:p1 and :p2 are parameters which differ from query to query.

dok contains several years of data. :p2 is usually only a few months back, or at
most about a year.
The SELECT column list contains a fixed list of known columns from all tables.

How to create an index or materialized view to optimize this type of query?
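
(One materialized-summary sketch, assuming the reports can tolerate slightly stale data and that rid.kogus is the quantity to aggregate — the table and index names are made up, and the table would need a periodic rebuild or trigger maintenance:)

CREATE TABLE sales_summary AS
SELECT rid.toode, dok.kuupaev, sum(rid.kogus) AS kogus
  FROM dok JOIN rid USING (dokumnr)
 GROUP BY rid.toode, dok.kuupaev;

CREATE INDEX sales_summary_toode_kuupaev_idx
    ON sales_summary (toode bpchar_pattern_ops, kuupaev);
ANALYZE sales_summary;

-- report query sketch, served from the much smaller summary:
--   SELECT ... FROM sales_summary
--    WHERE toode LIKE :p1 || '%' AND kuupaev >= :p2;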

Andrus.




Re: [PERFORM] Hash join on int takes 8..114 seconds

2008-11-21 Thread Andrus
How to vacuum full pg_shdepend automatically so that other users can 
work at same time ?


Your table is horribly bloated.
You must use VACUUM FULL + REINDEX (as superuser) on it, however 
unfortunately, it is blocking.
Therefore, you should wait for sunday night to do this, when noone will 
notice.


Shops are closed late at night only for a short time, including Sunday night.
This window may be shorter than the time required to complete the VACUUM command.

I discovered the vacuum_cost_delay=2000 option. Will this remove the blocking
issue and allow vacuum full to work?



Meanwhile, you can always VACUUM it (as superuser) and REINDEX it.


I expect that autovacuum does this automatically.


And while you're at it, VACUUM FULL + reindex the entire database.
To avoid such annoyances in the future, you should ensure that autovacuum 
runs properly ; you should investigate this. If you use a cron'ed VACUUM 
that does not run as superuser, then it will not be able to VACUUM the 
system catalogs, and the problem will come back.


autovacuum is turned on in the postgresql.conf file.
The log file shows a lot of messages every day that databases are being vacuumed.
I assume that it is running as user postgres.

I don't understand how autovacuum can avoid this: it does not perform vacuum
full, so pg_shdepend and my tables become

bloated again and again.

Andrus. 





Re: [PERFORM] Hash join on int takes 8..114 seconds

2008-11-21 Thread Andrus

Alvaro,


1. vacuum_cost_delay does not affect vacuum full
2. vacuum full is always blocking, regardless of settings


So the only way is to disable other database access when vacuum full is required.


So I gather you're not doing any vacuuming, eh?


The log files for every day are full of messages like those below.
So I hope that vacuum is running well, isn't it?

Andrus.

2008-11-19 00:00:48 EET11728  1 LOG:  autovacuum: processing database 
postgres
2008-11-19 00:01:48 EET11729  1 LOG:  autovacuum: processing database 
mydb1
2008-11-19 00:02:48 EET11730  1 LOG:  autovacuum: processing database 
emydb1
2008-11-19 00:03:48 EET11731  1 LOG:  autovacuum: processing database 
template1
2008-11-19 00:04:48 EET11732  1 LOG:  autovacuum: processing database 
testmydb1
2008-11-19 00:05:48 EET11733  1 LOG:  autovacuum: processing database 
mydb3
2008-11-19 00:06:48 EET11734  1 LOG:  autovacuum: processing database 
postgres
2008-11-19 00:07:48 EET11735  1 LOG:  autovacuum: processing database 
mydb1
2008-11-19 00:08:48 EET11736  1 LOG:  autovacuum: processing database 
emydb1
2008-11-19 00:09:48 EET11737  1 LOG:  autovacuum: processing database 
template1
2008-11-19 00:10:48 EET11750  1 LOG:  autovacuum: processing database 
testmydb1
2008-11-19 00:11:48 EET11751  1 LOG:  autovacuum: processing database 
mydb3
2008-11-19 00:12:48 EET11752  1 LOG:  autovacuum: processing database 
postgres
2008-11-19 00:13:48 EET11753  1 LOG:  autovacuum: processing database 
mydb1
2008-11-19 00:14:48 EET11754  1 LOG:  autovacuum: processing database 
emydb1
2008-11-19 00:15:48 EET11755  1 LOG:  autovacuum: processing database 
template1
2008-11-19 00:16:48 EET11756  1 LOG:  autovacuum: processing database 
testmydb1
2008-11-19 00:17:48 EET11757  1 LOG:  autovacuum: processing database 
mydb3
... 





Re: [PERFORM] Hash join on int takes 8..114 seconds

2008-11-20 Thread Andrus

Just a question, what are you doing with the 20.000 result rows ?


Those rows represent the monthly sales data of one item.
They are used as follows:

1. Detailed sales report for a month. This report can be browsed on screen for
monthly sales and ordering analysis.


2. Total reports. In those reports, sum(sales) and sum(quantity) are used to
get the total sales per day, week, month, or time for an item, and the resulting
rows are summed.


Andrus. 





Re: [PERFORM] Hash join on int takes 8..114 seconds

2008-11-20 Thread Andrus
 udevd
6677 root  15   0  1728  572  400 S  0.0  0.0   0:04.99 syslog-ng


Andrus. 





[PERFORM] Hash join on int takes 8..114 seconds

2008-11-19 Thread Andrus

Query below seems to use indexes everywhere in most optimal way.
dokumnr column is of type int

The speed of this query varies widely:

In the live db the fastest response I have got is 8 seconds.
Re-running the same query after 10 seconds may take 60 seconds.
Re-running it again after 10 seconds may take 114 seconds.

Any idea how to speed it up?

Is it possible to optimize it, will upgrading to 8.3.5 help, or do I need
to add more RAM, disk, or CPU capacity?

The real query contains a column list instead of the sum(1) used in the test below.

Andrus.


explain analyze
SELECT sum(1)
  FROM dok JOIN rid USING (dokumnr)
  JOIN toode USING (toode)
  LEFT JOIN artliik using(grupp,liik)
  WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01'

longest response time:

Aggregate  (cost=234278.53..234278.54 rows=1 width=0) (actual
time=114479.933..114479.936 rows=1 loops=1)
  ->  Hash Left Join  (cost=52111.20..234218.21 rows=24126 width=0) (actual
time=100435.523..114403.293 rows=20588 loops=1)
        Hash Cond: (("outer".grupp = "inner".grupp) AND ("outer".liik =
"inner".liik))
        ->  Nested Loop  (cost=52103.94..233735.35 rows=24126 width=19)
(actual time=100405.258..114207.387 rows=20588 loops=1)
              ->  Index Scan using toode_pkey on toode  (cost=0.00..6.01
rows=1 width=43) (actual time=18.312..18.325 rows=1 loops=1)
                    Index Cond: ('X05'::bpchar = toode)
              ->  Hash Join  (cost=52103.94..233488.08 rows=24126 width=24)
(actual time=100386.921..114037.986 rows=20588 loops=1)
                    Hash Cond: ("outer".dokumnr = "inner".dokumnr)
                    ->  Bitmap Heap Scan on rid  (cost=4127.51..175020.84
rows=317003 width=28) (actual time=9.932..76225.918 rows=277294
loops=1)
                          Recheck Cond: (toode = 'X05'::bpchar)
                          ->  Bitmap Index Scan on rid_toode_idx
(cost=0.00..4127.51 rows=317003 width=0) (actual time=11105.807..11105.807
rows=280599 loops=1)
                                Index Cond: (toode = 'X05'::bpchar)
                    ->  Hash  (cost=47376.82..47376.82 rows=93444 width=4)
(actual time=35082.427..35082.427 rows=105202 loops=1)
                          ->  Index Scan using dok_kuupaev_idx on dok
(cost=0.00..47376.82 rows=93444 width=4) (actual time=42.110..34586.331
rows=105202 loops=1)
                                Index Cond: (kuupaev >=
'2008-09-01'::date)
        ->  Hash  (cost=6.84..6.84 rows=84 width=19) (actual
time=30.220..30.220 rows=84 loops=1)
              ->  Seq Scan on artliik  (cost=0.00..6.84 rows=84 width=19)
(actual time=20.104..29.845 rows=84 loops=1)
Total runtime: 114480.373 ms

Same query in other runs:


Aggregate  (cost=234278.53..234278.54 rows=1 width=0) (actual
time=62164.496..62164.500 rows=1 loops=1)
  ->  Hash Left Join  (cost=52111.20..234218.21 rows=24126 width=0) (actual
time=46988.005..62088.379 rows=20588 loops=1)
        Hash Cond: (("outer".grupp = "inner".grupp) AND ("outer".liik =
"inner".liik))
        ->  Nested Loop  (cost=52103.94..233735.35 rows=24126 width=19)
(actual time=46957.750..61893.613 rows=20588 loops=1)
              ->  Index Scan using toode_pkey on toode  (cost=0.00..6.01
rows=1 width=43) (actual time=146.530..146.543 rows=1 loops=1)
                    Index Cond: ('X05'::bpchar = toode)
              ->  Hash Join  (cost=52103.94..233488.08 rows=24126 width=24)
(actual time=46811.194..61595.560 rows=20588 loops=1)
                    Hash Cond: ("outer".dokumnr = "inner".dokumnr)
                    ->  Bitmap Heap Scan on rid  (cost=4127.51..175020.84
rows=317003 width=28) (actual time=1870.209..55864.237 rows=277294 loops=1)
                          Recheck Cond: (toode = 'X05'::bpchar)
                          ->  Bitmap Index Scan on rid_toode_idx
(cost=0.00..4127.51 rows=317003 width=0) (actual time=1863.713..1863.713
rows=280599 loops=1)
                                Index Cond: (toode = 'X05'::bpchar)
                    ->  Hash  (cost=47376.82..47376.82 rows=93444 width=4)
(actual time=1650.823..1650.823 rows=105202 loops=1)
                          ->  Index Scan using dok_kuupaev_idx on dok
(cost=0.00..47376.82 rows=93444 width=4) (actual time=0.091..1190.962
rows=105202 loops=1)
                                Index Cond: (kuupaev >=
'2008-09-01'::date)
        ->  Hash  (cost=6.84..6.84 rows=84 width=19) (actual
time=30.210..30.210 rows=84 loops=1)
              ->  Seq Scan on artliik  (cost=0.00..6.84 rows=84 width=19)
(actual time=20.069..29.836 rows=84 loops=1)
Total runtime: 62164.789 ms



Aggregate  (cost=234278.53..234278.54 rows=1 width=0) (actual
time=40185.499..40185.503 rows=1 loops=1)
  ->  Hash Left Join  (cost=52111.20..234218.21 rows=24126 width=0) (actual
time=32646.761..40109.470 rows=20585 loops=1)
        Hash Cond: (("outer".grupp = "inner".grupp) AND ("outer".liik =
"inner".liik))
        ->  Nested Loop  (cost=52103.94..233735.35 rows=24126 width=19)
(actual time=32645.933..39944.242 rows=20585 loops=1)
              ->  Index Scan using toode_pkey on toode  (cost

[PERFORM] Increasing select max(datecol) from bilkaib where datecol<=date'2008-11-01' and (cr='00' or db='00') speed

2008-11-12 Thread Andrus

There are columns
kuupaev date, cr char(10), db char(10)
and regular indexes on all those fields.
The bilkaib table contains a large number of rows.

The following query takes too much time.
How to make it faster?
I think PostgreSQL should use multiple indexes as bitmaps to speed it up.

I can re-write this query in any way or split it into multiple statements if this
makes it faster.

Andrus.

explain analyze select max(kuupaev) from bilkaib where
kuupaev<=date'2008-11-01' and (cr='00' or db='00')

Result  (cost=339.75..339.76 rows=1 width=0) (actual
time=52432.256..52432.260 rows=1 loops=1)
  InitPlan
    ->  Limit  (cost=0.00..339.75 rows=1 width=4) (actual
time=52432.232..52432.236 rows=1 loops=1)
          ->  Index Scan Backward using bilkaib_kuupaev_idx on bilkaib
(cost=0.00..1294464.73 rows=3810 width=4) (actual time=52432.222..52432.222
rows=1 loops=1)
                Index Cond: (kuupaev <= '2008-11-01'::date)
                Filter: ((kuupaev IS NOT NULL) AND ((cr = '00'::bpchar) OR
(db = '00'::bpchar)))
Total runtime: 52432.923 ms

PostgreSQL 8.1.4 on i686-pc-linux-gnu, compiled by GCC
i686-pc-linux-gnu-gcc (GCC) 3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0,
pie-8.7.9)




Re: [PERFORM] Increasing select max(datecol) from bilkaib where datecol<=date'2008-11-01' and (cr='00' or db='00') speed

2008-11-12 Thread Andrus

Vladimir,

I am afraid PostgreSQL is not smart enough to rewrite a query with "or" into
two separate index scans. There is no way to improve the query
significantly without rewriting it.
Note:  for this case indices on (datecol), (cr) and (db) are not very 
helpful.


Thank you very much.
I added your indexes to the db and rewrote the query.
Now it runs fast.
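
(For the archives, the rewrite was presumably along these lines — a sketch with made-up index names; PostgreSQL's GREATEST ignores NULL arguments, so a missing 00 record on one side does not spoil the result:)

CREATE INDEX bilkaib_cr_kuupaev_idx ON bilkaib (cr, kuupaev);
CREATE INDEX bilkaib_db_kuupaev_idx ON bilkaib (db, kuupaev);

SELECT greatest(
  (SELECT max(kuupaev) FROM bilkaib
    WHERE cr = '00' AND kuupaev <= date'2008-11-01'),
  (SELECT max(kuupaev) FROM bilkaib
    WHERE db = '00' AND kuupaev <= date'2008-11-01'));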

Andrus.




Re: [PERFORM] Increasing select max(datecol) from bilkaib where datecol<=date'2008-11-01' and (cr='00' or db='00') speed

2008-11-12 Thread Andrus

Matthew,

Thank you.

The bilkaib table contains GL transactions for every day.
00 records are initial balance records, and they appear only at the start of a
year or the start of a month.
They may or may not be present for some month if the initial balance has not been
calculated yet.
If 00 records are present, there are usually a lot of them for a single date, in
both the db and cr columns.

This query finds the initial balance date before a given date.
The bilkaib table contains several years of transactions, so it is large.

Alternatively if you create an index on (cr, bilkaib) and one on (db, 
bilkaib) then you will be able to use other values in the query too.


I'm sorry, I don't understand this.
What does the (cr, bilkaib) syntax mean?
Should I create the two function indexes and rewrite the query as Vladimir
suggests, or is there a better approach?


Andrus. 





[PERFORM] Using index for IS NULL query

2008-11-11 Thread Andrus

Index is not used for

is null

condition:

create index  makse_dokumnr_idx on makse(dokumnr);
explain select
sum( summa)
  from MAKSE
  where  dokumnr is null

Aggregate  (cost=131927.95..131927.96 rows=1 width=10)
  ->  Seq Scan on makse  (cost=0.00..131927.94 rows=1 width=10)
        Filter: (dokumnr IS NULL)



The makse table is large, but only about 800 rows have dokumnr is null, so
using the index would be much faster than a seq scan.

How to fix this?

Andrus.

PostgreSQL 8.1.4 on i686-pc-linux-gnu, compiled by GCC 
i686-pc-linux-gnu-gcc (GCC) 3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0, 
pie-8.7.9)
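
(A common workaround on pre-8.3 versions, where a btree cannot be searched with an IS NULL condition: a partial index that contains only the interesting rows — a sketch:)

create index makse_dokumnr_null_idx
    on makse(dokumnr) where dokumnr is null;
analyze makse;
-- the planner can satisfy "where dokumnr is null" by scanning this
-- tiny partial index instead of the whole table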





[PERFORM] Simple indexed IN query takes 40 seconds

2008-11-10 Thread Andrus

I found that a simple IN query on indexed tables takes too much time.

dok and rid both have indexes on the int dokumnr column, and dokumnr is not
null.

PostgreSQL could use the index on dok or even on rid, so the query could execute
fast.

How to make this query run fast?

Andrus.



note: the list contains a lot of integers; the output below is abbreviated in this
part.

explain analyze select
   sum(rid.kogus)
  from dok JOIN rid USING(dokumnr)
  where dok.dokumnr in
(869906,869907,869910,869911,869914,869915,869916,869917,869918,869921,869925,869926,869928,869929,869934,869935,869936,...)

Aggregate  (cost=327569.15..327569.16 rows=1 width=9) (actual
time=39749.842..39749.846 rows=1 loops=1)
  ->  Hash Join  (cost=83872.74..327537.74 rows=12563 width=9) (actual
time=25221.702..39697.249 rows=11857 loops=1)
        Hash Cond: ("outer".dokumnr = "inner".dokumnr)
        ->  Seq Scan on rid  (cost=0.00..195342.35 rows=3213135 width=13)
(actual time=0.046..26347.959 rows=3243468 loops=1)
        ->  Hash  (cost=83860.76..83860.76 rows=4792 width=4) (actual
time=128.366..128.366 rows=4801 loops=1)
              ->  Bitmap Heap Scan on dok  (cost=9618.80..83860.76
rows=4792 width=4) (actual time=58.667..108.611 rows=4801 loops=1)
Recheck Cond: ((dokumnr = 869906) OR (dokumnr = 869907)
OR (dokumnr = 869910) OR (dokumnr = 869911) OR (dokumnr = 869914) OR
(dokumnr = 869915) OR (dokumnr = 869916) OR (dokumnr = 869917) OR (dokumnr =
869918) OR (dokumnr = 869921) OR (dokumnr = 869925) OR (dokumnr = 869926) OR
(dokumnr = 869928) OR (dokumnr = 869929) OR (dokumnr = 869934) OR (dokumnr =
869935) OR (dokumnr = 869936) OR (dokumnr = 869937) OR (dokumnr = 869940) OR
(dokumnr = 869941) OR (dokumnr = 869945) OR (dokumnr = 869951) OR (dokumnr =
869964) OR (dokumnr = 869966) OR (dokumnr = 869969) OR (dokumnr = 869974) OR
(dokumnr = 869979) OR (dokumnr = 869986) OR (dokumnr = 869992) OR (dokumnr =
869993) OR (dokumnr = 869995) OR (dokumnr = 869997) OR (dokumnr = 870007) OR
(dokumnr = 870018) OR (dokumnr = 870021) OR (dokumnr = 870023) OR (dokumnr =
870025) OR (dokumnr = 870033) OR (dokumnr = 870034) OR (dokumnr = 870036) OR
(dokumnr = 870038) OR (dokumnr = 870043) OR (dokumnr = 870044) OR (dokumnr =
870046) OR (dokumnr = 870050) OR (dokumnr = 870051) OR (dokumnr = 870053) OR
(dokumnr = 870054) OR (dokumnr = 870055) OR (dokumnr = 870064) OR (dokumnr =
870066) OR (dokumnr = 870069) OR (dokumnr = 870077) OR (dokumnr = 870079) OR
(dokumnr = 870081) OR (dokumnr = 870084) OR (dokumnr = 870085) OR (dokumnr =
870090) OR (dokumnr = 870096) OR (dokumnr = 870110) OR (dokumnr = 870111) OR
(dokumnr = 870117) OR (dokumnr = 870120) OR (dokumnr = 870124) OR (dokumnr =
870130)
...
OR (dokumnr = 890907) OR (dokumnr = 890908))
                    ->  BitmapOr  (cost=9618.80..9618.80 rows=4801 width=0)
(actual time=58.248..58.248 rows=0 loops=1)
                          ->  Bitmap Index Scan on dok_dokumnr_idx
(cost=0.00..2.00 rows=1 width=0) (actual time=0.052..0.052 rows=3 loops=1)
                                Index Cond: (dokumnr = 869906)
                          ->  Bitmap Index Scan on dok_dokumnr_idx
(cost=0.00..2.00 rows=1 width=0) (actual time=0.011..0.011 rows=3 loops=1)
                                Index Cond: (dokumnr = 869907)
                          ->  Bitmap Index Scan on dok_dokumnr_idx
(cost=0.00..2.00 rows=1 width=0) (actual time=0.020..0.020 rows=3 loops=1)
                                Index Cond: (dokumnr = 869910)
                          ->  Bitmap Index Scan on dok_dokumnr_idx
(cost=0.00..2.00 rows=1 width=0) (actual time=0.010..0.010 rows=3 loops=1)
                                Index Cond: (dokumnr = 869911)
                          ->  Bitmap Index Scan on dok_dokumnr_idx
(cost=0.00..2.00 rows=1 width=0) (actual time=0.008..0.008 rows=3 loops=1)
                                Index Cond: (dokumnr = 869914)
...
                          ->  Bitmap Index Scan on dok_dokumnr_idx
(cost=0.00..2.00 rows=1 width=0) (actual time=0.008..0.008 rows=1 loops=1)
                                Index Cond: (dokumnr = 890908)
Total runtime: 39771.385 ms

PostgreSQL 8.1.4 on i686-pc-linux-gnu, compiled by GCC 
i686-pc-linux-gnu-gcc (GCC) 3.4.6 (Gentoo 3.4.6-r1, ssp-3.4.5-1.0, 
pie-8.7.9) 





Re: [PERFORM] Simple indexed IN query takes 40 seconds

2008-11-10 Thread Andrus

Tom,


Using something newer than 8.1 would help.


Thank you.

If

CREATE TEMP TABLE  ids ( id int ) ON COMMIT DROP;

is created, ids are added to this table, and the ids table is used in an inner
join instead of the IN clause, or

the IN clause is replaced with

... dokumnr IN ( SELECT id FROM ids )  ...

, will this fix the issue in 8.1.4?
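
(A sketch of the temp-table variant; the explicit ANALYZE is what matters on 8.1, since the planner otherwise assumes a default row count for the temp table:)

CREATE TEMP TABLE ids (dokumnr int PRIMARY KEY) ON COMMIT DROP;
INSERT INTO ids VALUES (869906);
-- ... one INSERT (or a single COPY) per dokumnr ...
ANALYZE ids;

SELECT sum(rid.kogus)
  FROM dok JOIN rid USING (dokumnr)
  JOIN ids USING (dokumnr);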

Andrus.



Re: [PERFORM] why group expressions cause query to run forever

2006-06-27 Thread Andrus
 I think the problem is probably that you're sorting two dozen CHAR
 columns, and that in many of the rows all these entries are '' forcing
 the sort code to compare all two dozen columns (not so)?

Yes, most of the columns return empty strings.

I changed the empty strings to null, cast to varchar, and simplified the
statement.
However, this select statement runs forever.

Any idea how to speed it up?

Andrus.

SELECT
bilkaib.DB,
CASE WHEN dbkonto.objekt1='+' THEN bilkaib.DBOBJEKT ELSE null 
END::VARCHAR(10) AS dbobjekt,
CASE WHEN dbkonto.objekt2='+' THEN bilkaib.DB2OBJEKT ELSE null 
END::VARCHAR(10) AS db2objekt,
CASE WHEN dbkonto.objekt3='+' THEN bilkaib.DB3OBJEKT ELSE null 
END::VARCHAR(10) AS db3objekt,
CASE WHEN dbkonto.objekt4='+' THEN bilkaib.DB4OBJEKT ELSE null 
END::VARCHAR(10) AS db4objekt,
CASE WHEN dbkonto.objekt5='+' THEN bilkaib.DB5OBJEKT ELSE null 
END::VARCHAR(10) AS db5objekt,
CASE WHEN dbkonto.objekt6='+' THEN bilkaib.DB6OBJEKT ELSE null 
END::VARCHAR(10) AS db6objekt,
CASE WHEN dbkonto.objekt7='+' THEN bilkaib.DB7OBJEKT ELSE null 
END::VARCHAR(10) AS db7objekt,
CASE WHEN dbkonto.objekt8='+' THEN bilkaib.DB8OBJEKT ELSE null 
END::VARCHAR(10) AS db8objekt,
CASE WHEN dbkonto.objekt9='+' THEN bilkaib.DB9OBJEKT ELSE null 
END::VARCHAR(10) AS db9objekt
  from BILKAIB join KONTO CRKONTO ON bilkaib.cr=crkonto.kontonr
  join KONTO DBKONTO ON bilkaib.db=dbkonto.kontonr
 where
 bilkaib.kuupaev BETWEEN '2006-01-01' AND '2006-12-31'
  GROUP BY 1,2,3,4,5,6,7,8,9,10





Re: [PERFORM] why group expressions cause query to run forever

2006-06-26 Thread Andrus
Tom,

thank you.

 I think the problem is probably that you're sorting two dozen CHAR
 columns, and that in many of the rows all these entries are '' forcing
 the sort code to compare all two dozen columns (not so)?  So the sort
 ends up doing lots and lots and lots of CHAR comparisons.  Which can
 be slow, especially in non-C locales.  What's your locale setting?

show all returns

lc_collate;en_US.UTF-8
lc_ctype;en_US.UTF-8
lc_messages;C
lc_monetary;et_EE.utf-8
lc_numeric;et_EE.utf-8
lc_time;et_EE.utf-8

How to speed up this query?
Is it possible to force a binary comparison for grouping?
Should I concatenate all the char columns into a single column?

Andrus.






[PERFORM] why group expressions cause query to run forever

2006-06-22 Thread Andrus
) AND (kuupaev <= '2006-12-31'::date))
  ->  Hash  (cost=20.29..20.29 rows=227 width=14) (actual
time=2.112..2.112 rows=227 loops=1)
        ->  Seq Scan on konto dbkonto  (cost=0.00..20.29
rows=227 width=14) (actual time=0.011..1.126 rows=227 loops=1)
              Filter: (iseloom = 'A'::bpchar)
  ->  Hash  (cost=20.29..20.29 rows=227 width=14) (actual
time=2.149..2.149 rows=227 loops=1)
        ->  Seq Scan on konto crkonto  (cost=0.00..20.29 rows=227
width=14) (actual time=0.022..1.152 rows=227 loops=1)
              Filter: (iseloom = 'A'::bpchar)
Total runtime: 4519.063 ms

Postgres 8.1 on Gentoo Linux.

Andrus. 





Re: [PERFORM] Why date index is not used

2006-06-09 Thread Andrus
Tom,

 Because it doesn't help --- the system still has to do the sort.

It can help a lot in this case.

kuupaev is the sales date,
kellaaeg is the sales time.

Postgres can use the kuupaev index to fetch the first 100 rows, plus the
additional rows whose kellaaeg value equals the kellaaeg of the 100th row. I have
500 sales per day,
so it could fetch 600 rows using the index on the kuupaev column.

After that it could sort those 600 rows fast.
Currently it blindly sorts all 54000 rows in the table.

 You'd need a two-column index on both of the ORDER BY columns to avoid
 sorting.

Thank you. It works.
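
(For the archives, a sketch of the fix — the table name is hypothetical, since the query itself is not quoted in this reply:)

CREATE INDEX myyk_kuupaev_kellaaeg_idx ON myyk (kuupaev, kellaaeg);
-- SELECT ... FROM myyk ORDER BY kuupaev, kellaaeg LIMIT 100;
-- both ORDER BY columns are in one index, so the sort disappears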

Andrus. 





Re: [PERFORM] How to force Postgres to use index on ILIKE

2006-06-06 Thread Andrus
 SELECT toode, nimetus
 FROM toode
 WHERE toode  ILIKE  'x10%' ESCAPE '!'
 ORDER BY UPPER(toode ),nimetus  LIMIT 100

 runs 1 minute in first time for small table size.

 Toode field type is CHAR(20)

 1) why are you using CHAR and not VARCHAR or TEXT?   CHAR will give you
 problems using an index, period.

1. I haven't seen any example where VARCHAR is better than CHAR for indexing.
2. I have a lot of existing code. Changing CHAR to VARCHAR would probably require
rewriting a lot of code, a huge amount of work.

 2) You can't use an index on ILIKE.

It's very sad. I expected that a lower(toode) index could be used.


 You can, however, use an index on
 lower(field) if your query is properly phrased and if you've created an
 expression index on lower(field).

I tried, but Postgres does not use the index. Why?

create index nimib2 on firma1.klient(lower(nimi) bpchar_pattern_ops);

explain analyze select nimi from firma1.klient where lower(nimi) like
'mokter%'

Seq Scan on klient  (cost=0.00..9.79 rows=1 width=74) (actual
time=0.740..0.761 rows=1 loops=1)
  Filter: (lower((nimi)::text) ~~ 'mokter%'::text)
Total runtime: 0.877 ms
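
(Two likely explanations, as a sketch: lower(nimi) yields text, so the index may need text_pattern_ops rather than bpchar_pattern_ops, and on a table this small a seq scan is cheap anyway. One way to check whether the index is usable at all:)

set enable_seqscan = off;   -- for testing only, never in production
explain analyze select nimi from firma1.klient
 where lower(nimi) like 'mokter%';
set enable_seqscan = on;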






[PERFORM] How to force Postgres to use index on ILIKE

2006-06-05 Thread Andrus
I have UTF-8 Postgres 8.1 database on W2K3

Query

SELECT toode, nimetus
FROM toode
WHERE toode  ILIKE  'x10%' ESCAPE '!'
ORDER BY UPPER(toode ),nimetus  LIMIT 100

runs 1 minute the first time even though the table is small.

The toode field type is CHAR(20).

How to create an index on the toode field so that the query can use it?
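
(A sketch that should work for case-insensitive prefix searches in a non-C locale: index the upper-cased value with a pattern opclass and phrase the query to match — upper(toode) is text, hence text_pattern_ops; the index name is made up:)

CREATE INDEX toode_upper_pattern_idx
    ON toode (upper(toode) text_pattern_ops);

SELECT toode, nimetus
FROM toode
WHERE upper(toode) LIKE 'X10%'
ORDER BY UPPER(toode), nimetus LIMIT 100;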






[PERFORM] Query runs 38 seconds for small database!

2006-05-08 Thread Andrus
I have a small database. However, the following query takes 38 (!) seconds to
run.
How to speed it up (preferably not by changing table structures, but possibly by
creating indexes)?

Andrus.

set search_path to public,firma1;
explain analyze select bilkaib.summa  from BILKAIB join KONTO CRKONTO ON
bilkaib.cr=crkonto.kontonr AND
  crkonto.iseloom='A'
join KONTO DBKONTO ON bilkaib.db=dbkonto.kontonr AND
  dbkonto.iseloom='A'
left join klient on bilkaib.klient=klient.kood
  where  ( bilkaib.cr LIKE '3'||'%' OR
   bilkaib.db LIKE '3'||'%' )
 AND bilkaib.kuupaev BETWEEN '2006-01-01' AND '2006-03-31'
AND  ( kuupaev='20060101' OR  (cr!='00' and db!='00'))
AND ( 3 IN(2,3) or (NOT bilkaib.ratediffer and (
 TRIM(bilkaib.masin)='' or bilkaib.masin IS NULL or
  bilkaib.alusdok not in ('KV', 'DU', 'DJ') or
bilkaib.andmik is NULL or bilkaib.alusdok is NULL or
substring(andmik from 1 for 9)!='Kursivahe'
 ))) and
(position(bilkaib.laustyyp IN 'x')=0 or
bilkaib.laustyyp is null or bilkaib.laustyyp=' ')


Nested Loop Left Join  (cost=23.30..1964.10 rows=1 width=10) (actual
time=7975.470..38531.724 rows=3151 loops=1)
  ->  Nested Loop  (cost=23.30..1958.08 rows=1 width=26) (actual
time=7975.407..37978.718 rows=3151 loops=1)
        Join Filter: ("inner".cr = "outer".kontonr)
        ->  Seq Scan on konto crkonto  (cost=0.00..23.30 rows=1 width=44)
(actual time=0.135..13.913 rows=219 loops=1)
              Filter: (iseloom = 'A'::bpchar)
        ->  Hash Join  (cost=23.30..1934.64 rows=11 width=40) (actual
time=1.650..155.734 rows=3151 loops=219)
              Hash Cond: ("outer".db = "inner".kontonr)
              ->  Index Scan using bilkaib_kuupaev_idx on bilkaib
(cost=0.00..1897.10 rows=2826 width=54) (actual time=1.628..111.216
rows=3151 loops=219)
                    Index Cond: ((kuupaev >= '2006-01-01'::date) AND
(kuupaev <= '2006-03-31'::date))
                    Filter: (((cr ~~ '3%'::text) OR (db ~~ '3%'::text)) AND
((kuupaev = '2006-01-01'::date) OR ((cr <> '00'::bpchar) AND (db <>
'00'::bpchar))) AND ((position('x'::text, (laustyyp)::text) = 0) OR
(laustyyp IS NULL) OR (laustyyp = ' '::bpc (..)
              ->  Hash  (cost=23.30..23.30 rows=1 width=44) (actual
time=2.278..2.278 rows=219 loops=1)
                    ->  Seq Scan on konto dbkonto  (cost=0.00..23.30 rows=1
width=44) (actual time=0.017..1.390 rows=219 loops=1)
                          Filter: (iseloom = 'A'::bpchar)
  ->  Index Scan using klient_pkey on klient  (cost=0.00..6.01 rows=1
width=52) (actual time=0.138..0.158 rows=1 loops=3151)
        Index Cond: ("outer".klient = klient.kood)
Total runtime: 38561.745 ms





CREATE TABLE firma1.bilkaib
(
  id int4 NOT NULL DEFAULT nextval('bilkaib_id_seq'::regclass),
  kuupaev date NOT NULL,
  db char(10) NOT NULL,
  dbobjekt char(10),
  cr char(10) NOT NULL,
  crobjekt char(10),
  summa numeric(14,2) NOT NULL,
  raha char(3) NOT NULL,
  masin char(5),
  klient char(12),
  alusdok char(2),
  dokumnr int4 NOT NULL DEFAULT nextval('bilkaib_dokumnr_seq'::regclass),
  db2objekt char(10),
  cr2objekt char(10),
  db3objekt char(10),
  db4objekt char(10),
  db5objekt char(10),
  db6objekt char(10),
  db7objekt char(10),
  db8objekt char(10),
  db9objekt char(10),
  cr3objekt char(10),
  cr4objekt char(10),
  cr5objekt char(10),
  cr6objekt char(10),
  cr7objekt char(10),
  cr8objekt char(10),
  cr9objekt char(10),
  exchrate numeric(13,8),
  doknr char(25),
  andmik text,
  laustyyp char(1),
  ratediffer ebool,
  adoknr char(25),
  jarjeknr numeric(7),
  CONSTRAINT bilkaib_pkey PRIMARY KEY (id),
  CONSTRAINT bilkaib_alusdok_fkey FOREIGN KEY (alusdok)
  REFERENCES firma1.alusdok (alusdok) MATCH SIMPLE
  ON UPDATE CASCADE ON DELETE NO ACTION DEFERRABLE INITIALLY IMMEDIATE,
  CONSTRAINT bilkaib_cr2objekt_fkey FOREIGN KEY (cr2objekt)
  REFERENCES firma1.yksus2 (yksus) MATCH SIMPLE
  ON UPDATE CASCADE ON DELETE NO ACTION DEFERRABLE INITIALLY IMMEDIATE,
  CONSTRAINT bilkaib_cr3objekt_fkey FOREIGN KEY (cr3objekt)
  REFERENCES firma1.yksus3 (yksus) MATCH SIMPLE
  ON UPDATE CASCADE ON DELETE NO ACTION DEFERRABLE INITIALLY IMMEDIATE,
  CONSTRAINT bilkaib_cr4objekt_fkey FOREIGN KEY (cr4objekt)
  REFERENCES firma1.yksus4 (yksus) MATCH SIMPLE
  ON UPDATE CASCADE ON DELETE NO ACTION DEFERRABLE INITIALLY IMMEDIATE,
  CONSTRAINT bilkaib_cr5objekt_fkey FOREIGN KEY (cr5objekt)
  REFERENCES firma1.yksus5 (yksus) MATCH SIMPLE
  ON UPDATE CASCADE ON DELETE NO ACTION DEFERRABLE INITIALLY IMMEDIATE,
  CONSTRAINT bilkaib_cr6objekt_fkey FOREIGN KEY (cr6objekt)
  REFERENCES firma1.yksus6 (yksus) MATCH SIMPLE
  ON UPDATE CASCADE ON DELETE NO ACTION DEFERRABLE INITIALLY IMMEDIATE,
  CONSTRAINT bilkaib_cr7objekt_fkey FOREIGN KEY (cr7objekt)
  REFERENCES firma1.yksus7 (yksus) MATCH SIMPLE
  ON UPDATE CASCADE ON DELETE NO ACTION DEFERRABLE INITIALLY IMMEDIATE,
  CONSTRAINT bilkaib_cr8objekt_fkey FOREIGN KEY (cr8objekt)
  REFERENCES firma1.yksus8 (yksus) MATCH SIMPLE

Re: [PERFORM] Query runs 38 seconds for small database!

2006-05-08 Thread Andrus
 ->  Seq Scan on konto dbkonto  (cost=0.00..23.30 rows=1
 width=44) (actual time=0.017..1.390 rows=219 loops=1)
       Filter: (iseloom = 'A'::bpchar)

 Anytime you see a row estimate that far off about a simple single-column
 condition, it means your statistics are out-of-date.

Thank you. I added an ANALYZE command and now the query works fast.

I see autovacuum: processing database mydb messages in the log file and I have

stats_start_collector = on
stats_row_level = on

in the config file. Why were the statistics out-of-date?

Andrus.


My postgres.conf file (only uncommented settings are listed):

listen_addresses = '*'
max_connections = 40
shared_buffers = 1000
log_destination = 'stderr'
redirect_stderr = on   # Enable capturing of stderr into log
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # Log file name pattern.
log_rotation_age = 1440  # Automatic rotation of logfiles will
log_rotation_size = 10240  # Automatic rotation of logfiles will
log_min_error_statement = 'warning' # Values in order of increasing 
severity:
silent_mode = on
log_line_prefix = '%t %u %d %h %p %i %l %x %q'
stats_start_collector = on
stats_row_level = on
autovacuum = on   # enable autovacuum subprocess?
lc_messages = 'C'   # locale for system error message
lc_monetary = 'C'   # locale for monetary formatting
lc_numeric = 'C'   # locale for number formatting
lc_time = 'C'# locale for time formatting





Re: [PERFORM] Query runs 38 seconds for small database!

2006-05-08 Thread Andrus
 BTW, you might want to cut all the autovac thresholds in half; that's
 what I typically do.

I added an ANALYZE command to my procedure which creates the postgres database
and loads data into it
from another DBMS. This runs only once, after installing my application. I
hope this is sufficient.
If the default thresholds are such conservative values, I expect there is some
reason for it.

Andrus. 





Re: [PERFORM] Query runs 38 seconds for small database!

2006-05-08 Thread Andrus
 The only reason for being so conservative that I'm aware of was that it
 was a best guess. Everyone I've talked to cuts the defaults down by at
 least a factor of 2, sometimes even more.

Can we ask Tom to make the default values 2 times smaller in 8.1.4?

 BTW, these parameters are already tweaked from what we started with in
 contrib/pg_autovacuum. It would allow a table to grow to 2x larger than
 it should be before vacuuming, as opposed to the 40% that the current
 settings allow. But even there, is there any real reason you want to
 have 40% bloat? To make matters worse, those settings ensure that all
 but the smallest databases will suffer runaway bloat unless you bump up
the FSM settings.

I created the empty table konto and loaded more than 219 records into it during
database creation.
So it seems that even when a table grows from zero to more than 219 records, it
is still not processed.

Andrus. 





Re: [PERFORM] CPU usage goes to 100%, query seems to ran forever

2006-04-28 Thread Andrus
 You have ANALYZEd all these tables recently, I hope?  The planner
 certainly doesn't think this query will take very long.

I have autovacuum running, so I expect it takes care of ANALYZE, doesn't it?

I also ran the analyze command before running explain analyze.

 To find out what's wrong, you're going to have to be patient enough to
 let an EXPLAIN ANALYZE run to completion.  Plain EXPLAIN won't tell.

Here it is, run on my local computer. I expect a run time of no more than 1
second.

Nested Loop Left Join  (cost=0.00..1829.95 rows=1 width=24) (actual
time=492064.990..492064.990 rows=0 loops=1)
  Filter: ((("inner".arttyyp <> 'Teenus'::bpchar) AND ("inner".arttyyp <>
'Komplekt'::bpchar)) OR ("inner".arttyyp IS NULL))
  ->  Nested Loop  (cost=0.00..1825.01 rows=1 width=43) (actual
time=492064.983..492064.983 rows=0 loops=1)
        ->  Nested Loop  (cost=0.00..1819.04 rows=1 width=24) (actual
time=492064.978..492064.978 rows=0 loops=1)
              Join Filter: (("outer".dokumnr = "inner".dokumnr) AND
(((position('VGYKITDNHMEBARCFJ'::text, ("outer".doktyyp)::text) <> 0) AND
(CASE WHEN ((NOT ("outer".objrealt)::boolean) OR ("outer".doktyyp =
'I'::bpchar)) THEN "outer".yksus ELSE "inner" (..)
              ->  Seq Scan on dok  (cost=0.00..787.80 rows=1 width=39)
(actual time=0.152..878.198 rows=7670 loops=1)
                    Filter: ((kuupaev >= '2006-04-08'::date) AND (kuupaev
<= '2006-04-27'::date) AND (NOT (eimuuda)::boolean) AND ((laonr)::double
precision = 1::double precision) AND (position('OSIDVGYKIF'::text,
(doktyyp)::text) <> 0) AND (((kuupaev):: (..)
              ->  Seq Scan on rid  (cost=0.00..1019.42 rows=315 width=51)
(actual time=22.003..62.216 rows=839 loops=7670)
                    Filter: ((toode = '1EEKPANT'::bpchar) AND
(length(btrim((toode)::text)) > 2) AND (toode IS NOT NULL))
        ->  Index Scan using toode_pkey on toode  (cost=0.00..5.96 rows=1
width=43) (never executed)
              Index Cond: ('1EEKPANT'::bpchar = toode)
  ->  Index Scan using artliik_pkey on artliik  (cost=0.00..4.92 rows=1
width=31) (never executed)
        Index Cond: (("outer".grupp = artliik.grupp) AND ("outer".liik =
artliik.liik))
Total runtime: 492065.840 ms


Andrus. 





Re: [PERFORM] CPU usage goes to 100%, query seems to ran forever

2006-04-28 Thread Andrus
 Something seems to have truncated your EXPLAIN output, but anyway we
 can see where the problem is:

I copied it from pgAdmin at 640x480 screen resolution on XP.
Maybe a pgAdmin bug?

 The planner is expecting to get one row from dok passing the filter
 condition, and hence chooses a plan that is suitable for a small number
 of rows ... but in reality there are 7670 rows matching the filter
 condition, and that's what blows the runtime out of the water.  (Most of
 the runtime is actually going into 7670 repeated scans of rid, which
 wouldn't have happened with another plan type.)

I added index

CREATE INDEX rid_toode_idx  ON firma2.rid  USING btree  (toode);

and the query started working fast!

 So you need to see about getting that estimate to be more accurate.
 First thing is to make sure that dok has been ANALYZEd --- just do it
 by hand.

As I wrote, I have autovacuum running. Isn't this sufficient?

  I can see at least three things you are doing that are
 unnecessarily destroying the planner's ability to estimate the number of
 matching rows:

 dok.laonr='1'::float8  and
 Since laonr apparently isn't float8, this forces a runtime type
 conversion as well as interfering with statistics use.  (The planner
 will have ANALYZE stats about dok.laonr, but the connection to
 dok.laonr::float8 escapes it.)  Just write the constant with quotes
 and no type coercion.

I re-wrote it as

dok.laonr=1

This query is automatically generated by VFP, and the ODBC parameter
substitution adds those type conversions.
VFP has only a float8 numeric type, and it probably forces the ODBC driver
to convert numbers to float8.
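
The difference in miniature (a sketch, assuming laonr is declared as an
integer column):

  -- Forces (laonr)::float8 at runtime; the ANALYZE stats on laonr go unused:
  SELECT count(*) FROM dok WHERE laonr = '1'::float8;

  -- Plain constant; the planner can apply its statistics directly:
  SELECT count(*) FROM dok WHERE laonr = 1;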

 POSITION( dok.doktyyp IN 'OSIDVGYKIF')!=0 AND

 This is completely unestimatable given the available statistics, and it
 doesn't look to me like it is all that great a semantic representation
 either.  Perhaps the query that's really meant here is dok.doktyyp IN
 ('O','S','I', ...)?  If so, you should say what you mean, not play
 games with converting the query into some strange string operation.

'OSID ...' is a string parameter substituted into a SELECT template.
Changing this to IN ('O', 'S', ...) would require rewriting parts of the
code, and I'm not sure it would make the query faster.
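
The two spellings side by side (a sketch; the IN list below just expands
the same 'OSIDVGYKIF' parameter string):

  -- Opaque to the planner: no statistics can be applied to this test.
  SELECT count(*) FROM dok WHERE POSITION(dok.doktyyp IN 'OSIDVGYKIF') != 0;

  -- Transparent: estimated from the per-value statistics on doktyyp.
  SELECT count(*) FROM dok
   WHERE dok.doktyyp IN ('O','S','I','D','V','G','Y','K','F');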

  AND dok.kuupaev||dok.kellaaeg BETWEEN '2006-04-08' AND '2006-04-2723 59'

 This is another case where the planner is not going to have any ability
 to make a useful estimate, and it's because you are using a crummy
 representation of your data.  You should merge those two columns into
 one timestamp column and just do a simple BETWEEN test.
 By and large, unnatural representations of data that you use in WHERE
 clauses are going to cost you big-time in SQL queries.  It's worth
 taking time up front to design a clean table schema, and taking time
 to revise it when requirements change.

The date range test in the other part of the WHERE clause

dok.kuupaev BETWEEN ...

is optimizable.

AND dok.kuupaev||dok.kellaaeg adds a time range test on top of that date
range. There are fewer than a few thousand documents per day.

Spending time re-engineering the database and the deployed application does
not seem reasonable in this case.
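
Should the schema ever be revised, a minimal sketch of the merged column Tom
describes (the column name dokaeg is made up here, and it assumes kellaaeg
stores an 'HH24:MI'-style time string):

  ALTER TABLE dok ADD COLUMN dokaeg timestamp;
  UPDATE dok
     SET dokaeg = to_timestamp(kuupaev::text || ' ' || kellaaeg,
                               'YYYY-MM-DD HH24:MI')::timestamp;
  CREATE INDEX dok_dokaeg_idx ON dok (dokaeg);

  -- The combined range test then becomes estimable and indexable:
  SELECT count(*) FROM dok
   WHERE dokaeg BETWEEN '2006-04-08 00:00' AND '2006-04-27 23:59';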

Andrus. 





[PERFORM] CPU usage goes to 100%, query seems to ran forever

2006-04-27 Thread Andrus
I have a small database running on 8.1.3 on a W2K server.
The following query causes the Postgres process to use 100% CPU and seems to
run forever.
If I change '1EEKPANT' to a less frequently used item code, it runs fast.

How can I speed it up?

set search_path to public,firma2;
 select  rid.toode
   FROM dok JOIN rid USING (dokumnr)
   JOIN toode USING (toode)
   LEFT JOIN artliik ON toode.grupp=artliik.grupp and
toode.liik=artliik.liik
   WHERE (NOT '0'  or dok.kinnitatud)
  AND dok.kuupaev BETWEEN '2006-04-08' AND '2006-04-27'
  AND rid.toode='1EEKPANT'
 AND (NOT dok.eimuuda or '0' ) and
 dok.laonr='1'::float8  and
 POSITION( dok.doktyyp IN 'OSIDVGYKIF')!=0 AND
 ( ( ('1'  OR (POSITION(dok.doktyyp IN 'TUNH')=0 and
   (rid.kogus>0 or
('1'  and rid.kogus=0
 and
 POSITION(dok.doktyyp IN 'VGYKITDNHMEBARCFJ' )!=0
  AND CASE WHEN NOT dok.objrealt OR dok.doktyyp='I' THEN dok.yksus 
ELSE rid.kuluobjekt END LIKE 'LADU%' ESCAPE '!'
  )
 OR
 (POSITION(dok.doktyyp IN 'OSIUDP' )!=0
  AND CASE WHEN dok.objrealt THEN rid.kuluobjekt ELSE dok.sihtyksus 
END LIKE 'LADU%' ESCAPE '!'
  )
   )
  AND dok.kuupaev||dok.kellaaeg BETWEEN '2006-04-08' AND '2006-04-2723 59'
 AND ('0'  or ( length(trim(rid.toode))>2 AND
 rid.toode is NOT NULL))

 AND ( LENGTH('' )=0 OR rid.partii=''  OR (dok.doktyyp='I' AND
rid.kulupartii=''  ) )
 AND (NOT dok.inventuur or rid.kogus!=0)
   AND dok.dokumnr!= 0
 AND ( artliik.arttyyp NOT IN ('Teenus', 'Komplekt' ) OR artliik.arttyyp IS 
NULL)


explain returns:

Nested Loop Left Join  (cost=0.00..1828.18 rows=1 width=24)
  Filter: ((("inner".arttyyp <> 'Teenus'::bpchar) AND ("inner".arttyyp <>
'Komplekt'::bpchar)) OR ("inner".arttyyp IS NULL))
  ->  Nested Loop  (cost=0.00..1822.51 rows=1 width=43)
        ->  Nested Loop  (cost=0.00..1816.56 rows=1 width=24)
              Join Filter: (("outer".dokumnr = "inner".dokumnr) AND
(((position('VGYKITDNHMEBARCFJ'::text, ("outer".doktyyp)::text) <> 0) AND
(CASE WHEN ((NOT ("outer".objrealt)::boolean) OR ("outer".doktyyp =
'I'::bpchar)) THEN "outer".yksus ELSE "inner" (..)
              ->  Seq Scan on dok  (cost=0.00..787.80 rows=1 width=39)
                    Filter: ((kuupaev >= '2006-04-08'::date) AND (kuupaev
<= '2006-04-27'::date) AND (NOT (eimuuda)::boolean) AND ((laonr)::double
precision = 1::double precision) AND (position('OSIDVGYKIF'::text,
(doktyyp)::text) <> 0) AND (((kuupaev):: (..)
              ->  Seq Scan on rid  (cost=0.00..1019.42 rows=249 width=51)
                    Filter: ((toode = '1EEKPANT'::bpchar) AND
(length(btrim((toode)::text)) > 2) AND (toode IS NOT NULL))
        ->  Index Scan using toode_pkey on toode  (cost=0.00..5.94 rows=1
width=43)
              Index Cond: ('1EEKPANT'::bpchar = toode)
  ->  Index Scan using artliik_pkey on artliik  (cost=0.00..5.65 rows=1
width=88)
        Index Cond: (("outer".grupp = artliik.grupp) AND ("outer".liik =
artliik.liik))


Andrus. 





Re: [PERFORM] Query runs too long for indexed tables

2006-04-05 Thread Andrus
 Let me guess, you've updated it a lot and aren't familiar with Vacuum?

 run a vacuum full on your database.  schedule a vacuum (plain one) to
 run every so often (hours or days are a good interval for most folks)

 If that's NOT your problem, then please, let us know.

Scot, thank you. Excellent. If the database is created and VACUUM ANALYZE is
issued, this query runs fast.
However, I need to speed it up while the creation script is running.

This is a database creation script. It does the following:

1. CREATE DATABASE foo;
2. START TRANSACTION;
3. Create 145 tables with primary keys. Add data to those tables.
4. Create some additional indexes
5. ANALYZE
6. Clear bad foreign key fields using commands like

UPDATE firma1.rid SET toode=NULL
WHERE toode IS NOT NULL AND
toode NOT IN (SELECT TOODE  FROM firma1.TOODE);

7. Create foreign key references
8. COMMIT

This script runs for about 1 hour on a modern server with fsync off.
The largest table has 10 records, a few other tables have 15000 records, and
the remaining tables have fewer records.

How can this be sped up?
Isn't running ANALYZE sufficient to speed up the foreign key clearing?

It seems that ANALYZE doesn't help. Should I issue a COMMIT before running
ANALYZE, or issue more commits?
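
One thing worth verifying: ANALYZE (unlike VACUUM) is allowed inside a
transaction block, so the script can refresh statistics immediately before
step 6. A sketch with the table names above:

  BEGIN;
  -- ... steps 1-4: create tables, load data, add indexes ...

  ANALYZE firma1.toode;   -- legal inside the transaction, unlike VACUUM
  ANALYZE firma1.rid;

  UPDATE firma1.rid SET toode = NULL
   WHERE toode IS NOT NULL
     AND toode NOT IN (SELECT toode FROM firma1.toode);

  -- ... step 7: create foreign key references ...
  COMMIT;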

The server has 4 GB RAM.

The postgresql.conf file is the default from the 8.1.3 Windows zip file,
except that the following settings are added at the end:

fsync=off
shared_buffers = 3
redirect_stderr = on
log_min_error_statement = error
autovacuum = on
... also 2 stats settings for autovacuum
max_fsm_pages = 3

Andrus. 





Re: [PERFORM] Query runs too long for indexed tables

2006-04-05 Thread Andrus
 Why not use a LEFT JOIN for this ?

UPDATE firma1.rid SET rid.toode=NULL
LEFT join firma1.toode using(toode)
 WHERE rid.toode IS NOT NULL AND toode.toode IS NULL;

Causes:

ERROR:  syntax error at or near "LEFT" at character 41

Outer joins are not supported in the Postgres UPDATE command.
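
PostgreSQL's UPDATE does accept extra tables through a FROM clause, but not
an outer join against the update target. The same anti-join can be written
with NOT EXISTS instead (a sketch, same tables as above):

  UPDATE firma1.rid
     SET toode = NULL
   WHERE toode IS NOT NULL
     AND NOT EXISTS (SELECT 1
                       FROM firma1.toode t
                      WHERE t.toode = rid.toode);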

Andrus. 





[PERFORM] Query runs too long for indexed tables

2006-04-04 Thread Andrus
I have relatively small tables (toode and rid) on a fast server.
Both tables are indexed on the toode field.

The following query takes a long time to run.
The toode field type is char(20); it is difficult to change this field type.

Any idea how to speed up this query?

UPDATE firma1.rid SET toode=NULL
   WHERE toode IS NOT NULL AND
   toode NOT IN (SELECT TOODE  FROM firma1.TOODE);

Query returned successfully: 0 rows affected, 594813 ms execution time.

The explain window shows:

Seq Scan on rid  (cost=2581.07..20862553.77 rows=51848 width=1207)
  Filter: ((toode IS NOT NULL) AND (NOT (subplan)))
  SubPlan
    ->  Materialize  (cost=2581.07..2944.41 rows=14734 width=84)
          ->  Seq Scan on toode  (cost=0.00..2350.34 rows=14734 width=84)
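
One knob worth trying on 8.1 before rewriting anything: with enough
work_mem, the planner can switch a NOT IN subplan from Materialize
(rescanned for every outer row) to a hashed subplan that probes a hash
table instead. A sketch; the value is only illustrative:

  SET work_mem = 32768;   -- 8.1 reads this in KB, i.e. 32 MB, session-local

  UPDATE firma1.rid SET toode = NULL
   WHERE toode IS NOT NULL AND
   toode NOT IN (SELECT TOODE FROM firma1.TOODE);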


Andrus. 


