Re: [PERFORM] Slow query with planner row strange estimation

2010-07-05 Thread damien hostin

Hello,

Before the weekend I tried changing the index, but even with 
single-column indexes on different columns, the estimated number of rows 
from dwhinv is still 1.


Does anyone have a suggestion? What can I check?
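
A minimal sketch of one routine check for this kind of misestimate (an 
assumption on my part, not something suggested in the thread so far): 
raise the statistics target on the filtered columns, re-analyze, then 
re-run EXPLAIN.

-- hypothetical statistics bump; repeat for the other WHERE-clause columns
ALTER TABLE dwhinv ALTER COLUMN dwhinv___rfoindide SET STATISTICS 1000;
ANALYZE dwhinv;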


thx


damien hostin wrote:

Hello,

I am trying to make a query run quicker, but I don't really know how to 
give hints to the planner.


We are using PostgreSQL 8.4.3 (64-bit) on Ubuntu 9.10 Server. The 
hardware is ten 15k SAS drives in a single RAID 10 array, with 8 GB of RAM.
Queries come from a J2EE application (an OLAP cube), but running them in 
pgAdmin performs the same way.


I made a short example that shows what I think is the problem. The 
real query is much longer, but even with only one join it already causes 
problems.


Here is the short example:

select rfoadv_8.rfoadvsup as c8,
       sum(dwhinv.dwhinvqte) as m0
from
   dwhinv as dwhinv,
   rfoadv as rfoadv_8
where dwhinv.dwhinv___rforefide = 'HPLUS'
  and dwhinv.dwhinv___rfodomide = 'PMSI'
  and dwhinv.dwhinv___rfoindrvs = '1'
  and dwhinv.dwhinv___rfoindide = 'recN3_BB_reel'
  and dwhinv.dwhinv_p2rfodstide = rfoadv_8.rfoadvinf
  and rfoadv_8.rfoadvsup = 'ACTI'
group by rfoadv_8.rfoadvsup

dwhinv is a table with almost 6,000,000 records.
rfoadv is a view with 800,000 records; it is based on rfoade, which has 
50,000 records.

Here is the EXPLAIN ANALYZE:
GroupAggregate  (cost=0.00..16.56 rows=1 width=13) (actual time=2028.452..2028.453 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..16.54 rows=1 width=13) (actual time=0.391..1947.432 rows=42664 loops=1)
        Join Filter: (((ade2.rfoadegch)::text = (ade1.rfoadegch)::text) AND ((ade2.rfoadedrt)::text = (ade1.rfoadedrt)::text))
        ->  Nested Loop  (cost=0.00..12.54 rows=1 width=214) (actual time=0.304..533.281 rows=114350 loops=1)
              ->  Index Scan using dwhinv_rdi_idx on dwhinv  (cost=0.00..4.87 rows=1 width=12) (actual time=0.227..16.827 rows=6360 loops=1)
                    Index Cond: (((dwhinv___rforefide)::text = 'HPLUS'::text) AND ((dwhinv___rfodomide)::text = 'PMSI'::text) AND ((dwhinv___rfoindide)::text = 'recN3_BB_reel'::text) AND (dwhinv___rfoindrvs = 1))
              ->  Index Scan using rfoade_dsi_idx on rfoade ade2  (cost=0.00..7.63 rows=3 width=213) (actual time=0.007..0.037 rows=18 loops=6360)
                    Index Cond: ((ade2.rfoade_i_rfodstide)::text = (dwhinv.dwhinv_p2rfodstide)::text)
        ->  Index Scan using rfoade_pk on rfoade ade1  (cost=0.00..3.98 rows=1 width=213) (actual time=0.008..0.009 rows=0 loops=114350)
              Index Cond: (((ade1.rfoade___rforefide)::text = (ade2.rfoade___rforefide)::text) AND ((ade1.rfoade_i_rfodstide)::text = 'ACTI'::text) AND ((ade1.rfoade___rfovdeide)::text = (ade2.rfoade___rfovdeide)::text) AND (ade1.rfoadervs = ade2.rfoadervs))


We can see that the planner thinks that accessing dwhinv through the 
dwhinv_rdi_idx index will return 1 row, when in fact there are 6360, so 
the nested loop is executed not once but 6360 times. With only one join 
the query runs in about 1.5 s, which is not really long, but with 8 
joins the same mistake is repeated 8 times and the query runs for 30-60 
s. If I disable nested loops, hash joins and merge joins are used 
instead: the example query runs in 0.2-0.5 s, and the real query in no 
more than 1 s, which is great!
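
For reference, the session-level switch used for that test, as a minimal 
sketch (PostgreSQL has no true planner hints; the enable_* settings only 
discourage a plan type through costing):

SET enable_nestloop = off;
-- run the query and compare the EXPLAIN ANALYZE output
RESET enable_nestloop;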


Here is the execution plan with nested loop off:

GroupAggregate  (cost=12.56..2453.94 rows=1 width=13) (actual time=817.306..817.307 rows=1 loops=1)
  ->  Hash Join  (cost=12.56..2453.93 rows=1 width=13) (actual time=42.583..720.746 rows=42664 loops=1)
        Hash Cond: (((ade2.rfoade___rforefide)::text = (ade1.rfoade___rforefide)::text) AND ((ade2.rfoade___rfovdeide)::text = (ade1.rfoade___rfovdeide)::text) AND (ade2.rfoadervs = ade1.rfoadervs))
        Join Filter: (((ade2.rfoadegch)::text = (ade1.rfoadegch)::text) AND ((ade2.rfoadedrt)::text = (ade1.rfoadedrt)::text))
        ->  Hash Join  (cost=4.88..2446.21 rows=1 width=214) (actual time=42.168..411.962 rows=114350 loops=1)
              Hash Cond: ((ade2.rfoade_i_rfodstide)::text = (dwhinv.dwhinv_p2rfodstide)::text)
              ->  Seq Scan on rfoade ade2  (cost=0.00..2262.05 rows=47805 width=213) (actual time=0.057..78.988 rows=47805 loops=1)
              ->  Hash  (cost=4.87..4.87 rows=1 width=12) (actual time=41.632..41.632 rows=6360 loops=1)
                    ->  Index Scan using dwhinv_rdi_idx on dwhinv  (cost=0.00..4.87 rows=1 width=12) (actual time=0.232..28.199 rows=6360 loops=1)
                          Index Cond: (((dwhinv___rforefide)::text = 'HPLUS'::text) AND ((dwhinv___rfodomide)::text = 'PMSI'::text) AND ((dwhinv___rfoindide)::text = 'recN3_BB_reel'::text) AND (dwhinv___rfoindrvs = 1))
        ->  Hash  (cost=7.63..7.63 rows=3 width=213) (actual time=0.347..0.347 rows=11 loops=1)
              ->  Index Scan using rfoade_dsi_idx on rfoade ade1  (cost=0.00..7.63 rows=3 width=213) (actual time=0.095..0.307 rows=11 loops=1)
                    Index Cond: 

[PERFORM] Two equivalent WITH RECURSIVE queries, one of them slow.

2010-07-05 Thread Octavio Alvarez

Hello.

I have a tree-like table with a three-field PK (name, date, id) and one 
parent field.
It has 5k to 6k records as of now, but it will hold about 1 million 
records.


I am trying the following WITH RECURSIVE query:

WITH RECURSIVE t AS (
     SELECT par.id AS tid, par.name, par.date, par.id,
            par.text, par.h_title, par.h_name, par.parent
       FROM _books.par
     UNION
     SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
            p.h_title, p.h_name, p.parent
       FROM t, _books.par p
      WHERE p.name = t.name AND p.date = t.date AND t.id = p.parent
)
SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
  FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';

... which takes 2547.503 ms

However, if I try the same query but add the same WHERE clause to the
non-recursive term, I get much better results.


WITH RECURSIVE t AS (
     SELECT par.id AS tid, par.name, par.date, par.id,
            par.text, par.h_title, par.h_name, par.parent
       FROM _books.par
      WHERE name = 'cfx' AND date = '2009-08-19' AND par.id = '28340'
     UNION
     SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
            p.h_title, p.h_name, p.parent
       FROM t, _books.par p
      WHERE p.name = t.name AND p.date = t.date AND t.id = p.parent
)
SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
  FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';

... which takes 0.221 ms

I am being forced to use the slow query because I want to define it as a
view, leaving the WHERE clause to the application.

I fail to see how the two queries differ, or what cases the slow one
considers that the fast one doesn't, so as to get a clue on how to
work around this.
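
For context, a sketch of the kind of view I mean (the view name is 
invented here and the column list abbreviated; the body is just the slow 
query above minus its final WHERE clause):

CREATE VIEW book_tree AS
WITH RECURSIVE t AS (
     SELECT par.id AS tid, par.name, par.date, par.id, par.parent
       FROM _books.par
     UNION
     SELECT t.tid, p.name, p.date, p.id, p.parent
       FROM t, _books.par p
      WHERE p.name = t.name AND p.date = t.date AND t.id = p.parent
)
SELECT * FROM t;

-- the application would then supply the filter:
-- SELECT * FROM book_tree
--  WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';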

I have taken the EXPLAIN ANALYZE output for both queries. It looks like
the slow one is processing all records (read: not pushing the WHERE clause
down into the non-recursive term).


                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 CTE Scan on t  (cost=96653.20..96820.57 rows=1 width=144) (actual time=32.931..2541.792 rows=1 loops=1)
   Filter: (((name)::text = 'cfx'::text) AND (date = '2009-08-19'::date) AND (tid = 28340))
   CTE t
     ->  Recursive Union  (cost=0.00..96653.20 rows=6086 width=212) (actual time=0.017..2442.655 rows=33191 loops=1)
           ->  Seq Scan on par  (cost=0.00..237.96 rows=5996 width=208) (actual time=0.011..5.591 rows=5996 loops=1)
           ->  Merge Join  (cost=8909.74..9629.35 rows=9 width=212) (actual time=225.979..254.727 rows=3022 loops=9)
                 Merge Cond: (((t.name)::text = (p.name)::text) AND (t.date = p.date) AND (t.id = p.parent))
                 ->  Sort  (cost=7700.54..7850.44 rows=59960 width=44) (actual time=58.163..59.596 rows=3685 loops=9)
                       Sort Key: t.name, t.date, t.id
                       Sort Method:  quicksort  Memory: 17kB
                       ->  WorkTable Scan on t  (cost=0.00..1199.20 rows=59960 width=44) (actual time=0.027..3.486 rows=3688 loops=9)
                 ->  Materialize  (cost=1209.20..1284.15 rows=5996 width=208) (actual time=163.062..177.415 rows=5810 loops=9)
                       ->  Sort  (cost=1209.20..1224.19 rows=5996 width=208) (actual time=163.054..172.543 rows=5810 loops=9)
                             Sort Key: p.name, p.date, p.parent
                             Sort Method:  external merge  Disk: 1304kB
                             ->  Seq Scan on par p  (cost=0.00..237.96 rows=5996 width=208) (actual time=0.015..3.330 rows=5996 loops=9)
 Total runtime: 2547.503 ms
(17 rows)



                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 CTE Scan on t  (cost=927.80..928.10 rows=1 width=144) (actual time=0.036..0.132 rows=1 loops=1)
   Filter: (((name)::text = 'cfx'::text) AND (date = '2009-08-19'::date) AND (tid = 28340))
   CTE t
     ->  Recursive Union  (cost=0.00..927.80 rows=11 width=212) (actual time=0.030..0.124 rows=1 loops=1)
           ->  Index Scan using par_id on par  (cost=0.00..8.27 rows=1 width=208) (actual time=0.024..0.026 rows=1 loops=1)
                 Index Cond: (id = 28340)
                 Filter: (((name)::text = 'cfx'::text) AND (date = '2009-08-19'::date))
           ->  Nested Loop  (cost=0.00..91.93 rows=1 width=212) (actual time=0.091..0.091 rows=0 loops=1)
                 Join Filter: (((t.name)::text = (p.name)::text) AND (t.date = p.date))
                 ->  WorkTable Scan on t  (cost=0.00..0.20 rows=10 width=44) (actual 

Re: [PERFORM] how to (temporarily) disable/minimize benefits of disk block cache or postgresql shared buffer

2010-07-05 Thread Matthew Wakeling

On Fri, Jul 2, 2010 at 8:30 AM, Craig Ringer cr...@postnewspapers.com.au wrote:

Yeah, if you're in a weird virtualized environment like that you're
likely to have problems...


On Sat, 3 Jul 2010, Rajesh Kumar Mallah wrote:

Thanks for thinking about it. I do not understand why you feel OpenVZ is
weird; at most it's just not very popular.


It's not OpenVZ that is weird, but virtualisation in general. If you are 
running in a virtual machine, then all sorts of things will not run as 
well as expected.


Matthew

--
Contrary to popular belief, Unix is user friendly. It just happens to be
very selective about who its friends are. -- Kyle Hearn



[PERFORM] SeqScans on boolen values / How to speed this up?

2010-07-05 Thread Jens Hoffrichter
Hello everyone,

We've recently finished developing a fairly large web application, and
we are about to put it online.

I ran some load tests yesterday, and configured 'slow query' logging
beforehand, so I could see if there might be a performance bottleneck
in PostgreSQL. While I discovered no real problems, log file analysis
using pgFouine revealed two queries which are executed often and
take quite a bit of time.

I'm just curious if there is any way to improve the performance of
those queries. I'm seeing SeqScans in the EXPLAIN ANALYZE, but nothing
I have done yet has removed those.

The statements and query plans are:

 Query 1 -

explain analyze SELECT n.name_short, n.flag, n.nation_id, n.urlidentifier,
       count(p.person_id) as athletes
from nations n
left join persons p on n.nation_id = p.nation_id
left join efclicences e on p.person_id = e.person_id
where continent = 'eu' and p.deleted = false and p.inactive = false
  and e.fencer = true
group by n.name_short, n.flag, n.nation_id, n.urlidentifier
order by n.name_short;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Sort  (cost=9997.21..9997.32 rows=44 width=33) (actual time=872.000..872.000 rows=44 loops=1)
   Sort Key: n.name_short
   Sort Method:  quicksort  Memory: 28kB
   ->  HashAggregate  (cost=9995.45..9996.01 rows=44 width=33) (actual time=872.000..872.000 rows=44 loops=1)
         ->  Hash Join  (cost=5669.49..9611.83 rows=30690 width=33) (actual time=332.000..720.000 rows=142240 loops=1)
               Hash Cond: (e.person_id = p.person_id)
               ->  Seq Scan on efclicences e  (cost=0.00..2917.29 rows=143629 width=8) (actual time=0.000..80.000 rows=143629 loops=1)
                     Filter: fencer
               ->  Hash  (cost=5285.87..5285.87 rows=30690 width=33) (actual time=332.000..332.000 rows=142240 loops=1)
                     ->  Hash Join  (cost=7.10..5285.87 rows=30690 width=33) (actual time=0.000..256.000 rows=142240 loops=1)
                           Hash Cond: (p.nation_id = n.nation_id)
                           ->  Seq Scan on persons p  (cost=0.00..4438.29 rows=142288 width=16) (actual time=0.000..112.000 rows=142418 loops=1)
                                 Filter: ((NOT deleted) AND (NOT inactive))
                           ->  Hash  (cost=6.55..6.55 rows=44 width=25) (actual time=0.000..0.000 rows=44 loops=1)
                                 ->  Seq Scan on nations n  (cost=0.00..6.55 rows=44 width=25) (actual time=0.000..0.000 rows=44 loops=1)
                                       Filter: ((continent)::text = 'eu'::text)
 Total runtime: 880.000 ms
(17 rows)

--- Query 2 ---
explain analyze SELECT persons.person_id AS persons_person_id
FROM persons
LEFT OUTER JOIN indexing_persons ON persons.person_id = indexing_persons.person_id
WHERE indexing_persons.person_id IS NULL
   OR persons.modified > indexing_persons.indexed
ORDER BY persons.modified DESC LIMIT 1000;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Limit  (cost=17755.23..17757.73 rows=1000 width=16) (actual time=372.000..372.000 rows=0 loops=1)
   ->  Sort  (cost=17755.23..17994.61 rows=95753 width=16) (actual time=372.000..372.000 rows=0 loops=1)
         Sort Key: persons.modified
         Sort Method:  quicksort  Memory: 25kB
         ->  Hash Left Join  (cost=4313.44..12505.20 rows=95753 width=16) (actual time=372.000..372.000 rows=0 loops=1)
               Hash Cond: (persons.person_id = indexing_persons.person_id)
               Filter: ((indexing_persons.person_id IS NULL) OR (persons.modified > indexing_persons.indexed))
               ->  Seq Scan on persons  (cost=0.00..4438.29 rows=143629 width=16) (actual time=0.000..56.000 rows=143629 loops=1)
               ->  Hash  (cost=2534.86..2534.86 rows=142286 width=16) (actual time=140.000..140.000 rows=143629 loops=1)
                     ->  Seq Scan on indexing_persons  (cost=0.00..2534.86 rows=142286 width=16) (actual time=0.000..72.000 rows=143629 loops=1)
 Total runtime: 372.000 ms
(11 rows)

 Table definitions ---

\d persons
                              Table "public.persons"
       Column       |          Type          |                  Modifiers
--------------------+------------------------+----------------------------------------------
 person_id          | bigint                 | not null default nextval('persons_person_id_seq'::regclass)
 givenname          | character varying(100) | not null
 surname            | character varying(100) | not null
 name_display_short | character varying(20)  | not null
 name_display_long  | character varying(50)  | not null
 title              | character varying(50)  |
 

Re: [PERFORM] SeqScans on boolen values / How to speed this up?

2010-07-05 Thread Stephen Frost
Jens,

* Jens Hoffrichter (jens.hoffrich...@gmail.com) wrote:
 I'm just curious if there is any way to improve the performance of
 those queries. I'm seeing SeqScans in the EXPLAIN ANALYZE, but nothing
 I have done yet has removed those.

SeqScans aren't necessarily bad.  Also, providing your postgresql.conf
parameters would be useful in doing any kind of analysis work like this.

For starters, why are you using left joins for these queries?  When you
use a left-join and then have a filter on the right-hand table that
requires it to be non-null, you're causing it to be an inner join
anyway.  Fixing that might change/improve the plans you're getting.
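
As a sketch of what I mean for the first query (untested against your
schema; since the WHERE clause already forces p and e to be non-null,
plain inner joins say so explicitly):

SELECT n.name_short, n.flag, n.nation_id, n.urlidentifier,
       count(p.person_id) AS athletes
  FROM nations n
  JOIN persons p     ON n.nation_id = p.nation_id
  JOIN efclicences e ON p.person_id = e.person_id
 WHERE continent = 'eu'
   AND NOT p.deleted
   AND NOT p.inactive
   AND e.fencer
 GROUP BY n.name_short, n.flag, n.nation_id, n.urlidentifier
 ORDER BY n.name_short;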

 The statements and query plans are:
 
  Query 1 -
 
 explain analyze SELECT
 n.name_short,n.flag,n.nation_id,n.urlidentifier,count(p.person_id) as
 athletes from nations n left join persons p on n.nation_id =
 p.nation_id left join efclicences e on p.person_id = e.person_id where
 continent = 'eu' and  p.deleted = false and p.inactive = false and
 e.fencer = true group by
 n.name_short,n.flag,n.nation_id,n.urlidentifier order by n.name_short;

Alright, for this one you're processing 144k rows from persons up into
the aggregate; how big is the table?  If it's anything less than 1M
rows, seqscanning it is almost certainly the fastest way.  You could
*test* that theory by disabling seqscans and running the query again for
the timing.  If it's faster, then you probably need to adjust some PG
parameters (e.g. effective_cache_size, maybe random_page_cost) for your
system.
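
Something like this in a scratch session (a sketch; enable_seqscan only
raises the cost of seqscans, which is all the test needs):

SET enable_seqscan = off;
-- re-run the EXPLAIN ANALYZE from above and compare timings
RESET enable_seqscan;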

                                                   QUERY PLAN
 --------------------------------------------------------------------------------------------------------------
  Sort  (cost=9997.21..9997.32 rows=44 width=33) (actual time=872.000..872.000 rows=44 loops=1)
    Sort Key: n.name_short
    Sort Method:  quicksort  Memory: 28kB
    ->  HashAggregate  (cost=9995.45..9996.01 rows=44 width=33) (actual time=872.000..872.000 rows=44 loops=1)
          ->  Hash Join  (cost=5669.49..9611.83 rows=30690 width=33) (actual time=332.000..720.000 rows=142240 loops=1)
                Hash Cond: (e.person_id = p.person_id)
                ->  Seq Scan on efclicences e  (cost=0.00..2917.29 rows=143629 width=8) (actual time=0.000..80.000 rows=143629 loops=1)
                      Filter: fencer
                ->  Hash  (cost=5285.87..5285.87 rows=30690 width=33) (actual time=332.000..332.000 rows=142240 loops=1)
                      ->  Hash Join  (cost=7.10..5285.87 rows=30690 width=33) (actual time=0.000..256.000 rows=142240 loops=1)
                            Hash Cond: (p.nation_id = n.nation_id)
                            ->  Seq Scan on persons p  (cost=0.00..4438.29 rows=142288 width=16) (actual time=0.000..112.000 rows=142418 loops=1)
                                  Filter: ((NOT deleted) AND (NOT inactive))
                            ->  Hash  (cost=6.55..6.55 rows=44 width=25) (actual time=0.000..0.000 rows=44 loops=1)
                                  ->  Seq Scan on nations n  (cost=0.00..6.55 rows=44 width=25) (actual time=0.000..0.000 rows=44 loops=1)
                                        Filter: ((continent)::text = 'eu'::text)
  Total runtime: 880.000 ms
 (17 rows)
 
 --- Query 2 ---
 explain analyze SELECT persons.person_id AS persons_person_id FROM
 persons LEFT OUTER JOIN indexing_persons ON persons.person_id =
 indexing_persons.person_id WHERE indexing_persons.person_id IS NULL
 OR persons.modified > indexing_persons.indexed ORDER BY
 persons.modified DESC LIMIT 1000;

For this one, you might try indexing persons.modified and
indexing_persons.indexed and see if that changes things.
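
For example (a sketch; the index names are arbitrary):

CREATE INDEX persons_modified_idx ON persons (modified);
CREATE INDEX indexing_persons_indexed_idx ON indexing_persons (indexed);
ANALYZE persons;
ANALYZE indexing_persons;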

 --------------------------------------------------------------------------------------------------------------
  Limit  (cost=17755.23..17757.73 rows=1000 width=16) (actual time=372.000..372.000 rows=0 loops=1)
    ->  Sort  (cost=17755.23..17994.61 rows=95753 width=16) (actual time=372.000..372.000 rows=0 loops=1)
          Sort Key: persons.modified
          Sort Method:  quicksort  Memory: 25kB
          ->  Hash Left Join  (cost=4313.44..12505.20 rows=95753 width=16) (actual time=372.000..372.000 rows=0 loops=1)
                Hash Cond: (persons.person_id = indexing_persons.person_id)
                Filter: ((indexing_persons.person_id IS NULL) OR (persons.modified > indexing_persons.indexed))
                ->  Seq Scan on persons  (cost=0.00..4438.29 rows=143629 width=16) (actual time=0.000..56.000 rows=143629 loops=1)
                ->  Hash  (cost=2534.86..2534.86 rows=142286 width=16) (actual time=140.000..140.000 rows=143629 loops=1)
                      ->  Seq Scan on indexing_persons  (cost=0.00..2534.86 rows=142286 width=16) (actual time=0.000..72.000 rows=143629 loops=1)
  Total runtime: 372.000 ms
 (11 rows)

Thanks,

Stephen



Re: [PERFORM] using dbt2 postgresql 8.4 - rampup time issue

2010-07-05 Thread MUHAMMAD ASIF

 A clarification of terms may help to start.  The "terminals per
 warehouse" in the scripts correlates to the number of terminals
 emulated.  An emulated terminal is tied to a warehouse's district.
 In other words, the number of terminals translates to the number of
 districts in a warehouse across the entire database.  To increase the
 terminals per warehouse implies you have scaled the database
 differently, which I'm assuming is not the case here.
 

"Scale the database" … can you please elaborate? To increase terminals per
warehouse I added only one option (i.e. -t for dbt2-run-workload) to the
normal dbt2 test, i.e.:

./dbt2-pgsql-create-db
./dbt2-pgsql-build-db -d $DBDATA -g -r -w $WAREHOUSES
./dbt2-run-workload -a pgsql -c $DB_CONNECTIONS -d $REGRESS_DURATION_SEC -w $WAREHOUSES -o $OUTPUT_DIR -t $TERMINAL_PER_WAREHOUSE
./dbt2-pgsql-stop-db

Is this change enough, or am I missing something?

  1.
  Settings:
    DATABASE CONNECTIONS: 50
    TERMINALS PER WAREHOUSE: 10
    SCALE FACTOR (WAREHOUSES): 200
    DURATION OF TEST (in sec): 7200
  Result:
                            Response Time (s)
    Transaction      %    Average : 90th %    Total  Rollbacks      %
    ------------  -----  -----------------  -------  ---------  -----
    Delivery       3.96      0.285 : 0.023    26883          0   0.00
    New Order     45.26      0.360 : 0.010   307335       3082   1.01
    Order Status   3.98      0.238 : 0.003    27059          0   0.00
    Payment       42.82      0.233 : 0.003   290802          0   0.00
    Stock Level    3.97      0.245 : 0.002    26970          0   0.00
    ------------  -----  -----------------  -------  ---------  -----

    2508.36 new-order transactions per minute (NOTPM)
    120.1 minute duration
    0 total unknown errors
    2000 second(s) ramping up
 
  2.
  Settings:
    DATABASE CONNECTIONS: 50
    TERMINALS PER WAREHOUSE: 40
    SCALE FACTOR (WAREHOUSES): 200
    DURATION OF TEST (in sec): 7200
  Result:
                            Response Time (s)
    Transaction      %    Average : 90th %    Total  Rollbacks      %
    ------------  -----  -----------------  -------  ---------  -----
    Delivery       3.95      8.123 : 4.605    43672          0   0.00
    New Order     45.19     12.205 : 2.563   499356       4933   1.00
    Order Status   4.00      7.385 : 3.314    44175          0   0.00
    Payment       42.89      7.221 : 1.920   473912          0   0.00
    Stock Level    3.97      7.093 : 1.887    43868          0   0.00
    ------------  -----  -----------------  -------  ---------  -----

    7009.40 new-order transactions per minute (NOTPM)
    69.8 minute duration
    0 total unknown errors
    8016 second(s) ramping up
 

8016 (actual rampup time) + (69.8 * 60) = 12204

5010 (estimated rampup time) + 7200 (estimated steady-state time) = 12210

  3.
  Settings:
    DATABASE CONNECTIONS: 50
    TERMINALS PER WAREHOUSE: 40
    SCALE FACTOR (WAREHOUSES): 200
    DURATION OF TEST (in sec): 7200
  Result:
                            Response Time (s)
    Transaction      %    Average : 90th %    Total  Rollbacks      %
    ------------  -----  -----------------  -------  ---------  -----
    Delivery       3.98      9.095 : 16.103   15234          0   0.00
    New Order     45.33      7.896 : 14.794  173539       1661   0.97
    Order Status   3.96      8.165 : 13.989   15156          0   0.00
    Payment       42.76      7.295 : 12.470  163726          0   0.00
    Stock Level    3.97      7.198 : 12.520   15198          0   0.00
    ------------  -----  -----------------  -------  ---------  -----

    10432.09 new-order transactions per minute (NOTPM)
    16.3 minute duration
    0 total unknown errors
    11227 second(s) ramping up

11227 (actual rampup time) + (16.3 * 60) = 12205
5010 (estimated rampup time) + 7200 (estimated steady-state time) = 12210

 
  These results show that the dbt2 test actually did not run for 2 hours;
  the duration starts varying as the TERMINALS PER WAREHOUSE value
  increases, i.e. 1st run (120.1 minute duration), 2nd run (69.8 minute
  duration) and 3rd run (16.3 minute duration).

 The ramp-up times are actually as expected (explained below).  What
 you are witnessing is more likely the driver crashing because the
 values are out of range for the scale of the database.  You have
 effectively told the driver that there are more than 10 districts per
 warehouse, and have likely not built the database that way.  I'm
 actually surprised the driver ramped up completely.
 

I ran the dbt2 test with the following configuration, i.e.:

WAREHOUSES=100
DB_CONNECTIONS=20
REGRESS_DURATION=7200 #HOURS

Re: [PERFORM] big data - slow select (speech search)

2010-07-05 Thread Robert Haas
On Thu, Jul 1, 2010 at 6:34 PM, Michal Fapso michal.fa...@gmail.com wrote:
 It took about 4.5 seconds. If I rerun it, it takes
 less than 2 milliseconds, but that is because of the cache. I need to
 optimize the first run.

 ASUS laptop, dual-core T2300 CPU at 1.66 GHz, 1.5 GB RAM

 EXPLAIN ANALYZE SELECT h1.docid
 FROM hyps AS h1
 WHERE h1.wordid=65658;

  Bitmap Heap Scan on hyps h1  (cost=10.97..677.09 rows=171 width=4) (actual time=62.106..4416.864 rows=343 loops=1)
    Recheck Cond: (wordid = 65658)
    ->  Bitmap Index Scan on hyps_wordid_index  (cost=0.00..10.92 rows=171 width=0) (actual time=42.969..42.969 rows=343 loops=1)
          Index Cond: (wordid = 65658)
  Total runtime: 4432.015 ms

 If I run the same query in the Lucene search engine, it takes 0.105
 seconds on the same data, which is quite a huge difference.

So PostgreSQL is reading 343 rows from disk in 4432 ms, or about 12
ms/row.  I'm not an expert on seek times, but that might not really be
that unreasonable, considering that those rows may be scattered all
over the index and thus it may be basically random I/O.  Have you
tried clustering hyps on hyps_wordid_index?  If you had a more
sophisticated disk subsystem you could try increasing
effective_io_concurrency but that's not going to help with only one
spindle.
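
Spelled out, the clustering experiment would be (a sketch; CLUSTER
rewrites the table once, so it has to be repeated after substantial
churn):

CLUSTER hyps USING hyps_wordid_index;
ANALYZE hyps;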

If you run the same query in Lucene and it takes only 0.105 s, then
Lucene is obviously doing a lot less I/O.  I doubt that any amount of
tuning of your existing schema is going to produce that kind of result
on PostgreSQL.  Using the full-text search stuff, or a gin index of
some kind, might get you closer, but it's hard to beat a
special-purpose engine that implements exactly the right algorithm for
your use case.
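
As a rough illustration of the full-text route (the table and column
names here are invented, since the posted schema stores word ids rather
than raw text):

CREATE INDEX docs_fts_idx ON docs USING gin (to_tsvector('english', doc_text));

SELECT doc_id
  FROM docs
 WHERE to_tsvector('english', doc_text) @@ to_tsquery('english', 'someword');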

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [PERFORM] Performance issues with postgresql-8.4.0: Query gets stuck sometimes

2010-07-05 Thread Robert Haas
On Fri, Jul 2, 2010 at 1:40 AM, Sachin Kumar sachin...@globallogic.com wrote:
 At times we have observed that postgres stops responding for several
 minutes; we couldn't even fetch the number of entries in a particular
 table. One such instance happens when we execute the following steps:

Sounds sort of like a checkpoint spike.
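
A quick way to check that guess (a sketch of mine, not from the thread)
is to see whether the stalls line up with checkpoints; log_checkpoints
can be enabled in postgresql.conf, and the relevant settings inspected
from SQL:

SHOW checkpoint_segments;
SHOW checkpoint_completion_target;
-- in postgresql.conf: log_checkpoints = on to confirm the correlation,
-- and raising checkpoint_completion_target toward 0.9 spreads the
-- checkpoint I/O over the interval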

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [PERFORM] SeqScans on boolen values / How to speed this up?

2010-07-05 Thread Craig Ringer
On 05/07/10 19:36, Jens Hoffrichter wrote:
 Hello everyone,
 
 We've recently finished developing a bigger webapplication, and we are
 about to put it online.
 
 I ran some load tests yesterday, and configured 'slow query' logging
 beforehand, so I could see if there might be a performance bottleneck
 in PostgreSQL. While I discovered no real problems, log file analysis
 using pgFouine revealed two queries which are executed often and
 take quite a bit of time.

It might be worth looking at what queries have results that change
infrequently or don't have to be up to the minute accurate, so they're
candidates for caching. Memcached is an incredibly handy tool for taking
load off your database.

--
Craig Ringer



Re: [PERFORM] SeqScans on boolen values / How to speed this up?

2010-07-05 Thread Scott Marlowe
On Mon, Jul 5, 2010 at 5:36 AM, Jens Hoffrichter
jens.hoffrich...@gmail.com wrote:
 Hello everyone,

 We've recently finished developing a bigger webapplication, and we are
 about to put it online.

If you're filtering on booleans, and 99.99% of the rows are just true or
just false, look at creating a partial index on the 0.01% part.

create index somename on sometable (boolfield) where boolfield is true;

(or is false)
