[PERFORM] Big difference in time returned by EXPLAIN ANALYZE SELECT ... AND SELECT ...

2010-07-26 Thread Piotr Gasidło
Hello,

I've found a strange problem in my database (8.4.4, but also 9.0beta3,
default postgresql.conf, shared_buffers raised to 256MB).

EXPLAIN ANALYZE SELECT ...
Total runtime: 4.782 ms
Time: 25,970 ms

SELECT ...
...
(21 rows)

Time: 23,042 ms

Test done in psql connected by socket to server (same host, using
\timing to get runtime).

Is such a big difference between Total runtime and Time normal?

I've noticed that removing one index (ocache_*_ukey, not used in the query,
kept only for data integrity) gives me:

EXPLAIN ANALYZE SELECT ...
Total runtime: 3.439 ms
Time: 13,324 ms

Why is there such a big difference in timing (vs. the query with this index)?
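
(As a quick sanity check before dropping an index that looks unused, the
standard statistics views can confirm it never gets scanned; a sketch, using
the table name from this thread:)

SELECT relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname LIKE 'ocache%'
ORDER BY idx_scan, indexrelname;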

Query is:
select oc_h_id,oc_duration,SUM(oc_count) FROM ocache WHERE
oc_date_from >= '2010-07-22'::date AND oc_date_from <= '2010-07-24'::date
AND oc_h_id =
ANY('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::int[])
GROUP BY oc_h_id, oc_duration;

EXPLAIN ANALYZE is:

 HashAggregate  (cost=42060.58..42095.53 rows=2796 width=12) (actual time=4.357..4.368 rows=21 loops=1)
   ->  Append  (cost=0.00..41850.91 rows=27955 width=12) (actual time=0.432..3.928 rows=1439 loops=1)
         ->  Seq Scan on ocache  (cost=0.00..17.20 rows=6 width=12) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: ((oc_date_from >= '2010-07-22'::date) AND (oc_date_from <= '2010-07-24'::date) AND (oc_h_id = ANY ('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::integer[])))
         ->  Bitmap Heap Scan on ocache_2010_7 ocache  (cost=357.41..8117.29 rows=5466 width=12) (actual time=0.430..0.582 rows=196 loops=1)
               Recheck Cond: (oc_h_id = ANY ('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::integer[]))
               Filter: ((oc_date_from >= '2010-07-22'::date) AND (oc_date_from <= '2010-07-24'::date))
               ->  Bitmap Index Scan on ocache_2010_7_oc_h_id_key  (cost=0.00..356.04 rows=16397 width=0) (actual time=0.174..0.174 rows=1156 loops=1)
                     Index Cond: (oc_h_id = ANY ('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::integer[]))
         ->  Bitmap Heap Scan on ocache_2010_8 ocache  (cost=370.91..9067.15 rows=6060 width=12) (actual time=0.175..0.615 rows=562 loops=1)
               Recheck Cond: (oc_h_id = ANY ('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::integer[]))
               Filter: ((oc_date_from >= '2010-07-22'::date) AND (oc_date_from <= '2010-07-24'::date))
               ->  Bitmap Index Scan on ocache_2010_8_oc_h_id_key  (cost=0.00..369.39 rows=18181 width=0) (actual time=0.156..0.156 rows=1124 loops=1)
                     Index Cond: (oc_h_id = ANY ('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::integer[]))
         ->  Bitmap Heap Scan on ocache_2010_9 ocache  (cost=389.47..9891.79 rows=6703 width=12) (actual time=0.158..0.513 rows=448 loops=1)
               Recheck Cond: (oc_h_id = ANY ('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::integer[]))
               Filter: ((oc_date_from >= '2010-07-22'::date) AND (oc_date_from <= '2010-07-24'::date))
               ->  Bitmap Index Scan on ocache_2010_9_oc_h_id_key  (cost=0.00..387.80 rows=20108 width=0) (actual time=0.140..0.140 rows=896 loops=1)
                     Index Cond: (oc_h_id = ANY ('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::integer[]))
         ->  Bitmap Heap Scan on ocache_2010_10 ocache  (cost=268.42..6735.90 rows=4721 width=12) (actual time=0.107..0.300 rows=229 loops=1)
               Recheck Cond: (oc_h_id = ANY ('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::integer[]))
               Filter: ((oc_date_from >= '2010-07-22'::date) AND (oc_date_from <= '2010-07-24'::date))
               ->  Bitmap Index Scan on ocache_2010_10_oc_h_id_key  (cost=0.00..267.24 rows=14162 width=0) (actual time=0.096..0.096 rows=458 loops=1)
                     Index Cond: (oc_h_id = ANY ('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::integer[]))
         ->  Bitmap Heap Scan on ocache_2010_11 ocache  (cost=139.48..3340.84 rows=2395 width=12) (actual time=0.046..0.047 rows=4 loops=1)
               Recheck Cond: (oc_h_id = ANY ('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::integer[]))
               Filter: ((oc_date_from >= '2010-07-22'::date) AND (oc_date_from <= '2010-07-24'::date))
               ->  Bitmap Index Scan on ocache_2010_11_oc_h_id_key  (cost=0.00..138.88 rows=7184 width=0) (actual

[PERFORM] Re: Big difference in time returned by EXPLAIN ANALYZE SELECT ... AND SELECT ...

2010-07-26 Thread Piotr Gasidło
On 26 July 2010 at 10:35, Piotr Gasidło qua...@barbara.eu.org wrote:
 \d+ ocache_2010_12
                              Table public.ocache_2010_12
 Indexes:
 (...)

An index was missing from the listing:
ocache_2010_12_ukey UNIQUE, btree (oc_date_from, oc_date_to,
oc_h_id, oc_transport, oc_ofc_id_1) WITH (fillfactor=80)

-- 
Piotr Gasidło



[PERFORM] Explains of queries to partitioned tables

2010-07-26 Thread Vlad Arkhipov

There is a partitioned table with 2 partitions:

drop table if exists p cascade;

create table p (
  id bigint not null primary key,
  ts timestamp);

create table p_actual ( check (ts is null) ) inherits (p);
create table p_historical ( check (ts is not null) ) inherits (p);

-- I skipped the triggers and rules creation
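
-- (Purely illustrative stand-in for the skipped routing trigger, so the
-- example runs end to end; this is not the poster's actual code, and it
-- assumes plpgsql is installed: "create language plpgsql;" if it is not.)
create or replace function p_insert() returns trigger as $$
begin
  if new.ts is null then
    insert into p_actual values (new.*);
  else
    insert into p_historical values (new.*);
  end if;
  return null; -- the row is stored only in the child table
end;
$$ language plpgsql;

create trigger p_insert_trigger before insert on p
  for each row execute procedure p_insert();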

insert into p (id, ts) values (1, '2000-01-01');
insert into p (id, ts) values (2, null);
insert into p (id, ts) values (3, '2001-01-01');
insert into p (id, ts) values (4, '2005-01-01');

analyze p;
analyze p_actual;
analyze p_historical;

Here is the explain output for the query 'select * from p where ts is null'

Result  (cost=0.00..188.10 rows=10 width=16) (actual time=0.028..0.038 rows=1 loops=1)
  ->  Append  (cost=0.00..188.10 rows=10 width=16) (actual time=0.023..0.029 rows=1 loops=1)
        ->  Seq Scan on p  (cost=0.00..187.00 rows=9 width=16) (actual time=0.002..0.002 rows=0 loops=1)
              Filter: (ts IS NULL)
        ->  Seq Scan on p_actual p  (cost=0.00..1.10 rows=1 width=16) (actual time=0.014..0.016 rows=1 loops=1)
              Filter: (ts IS NULL)
Total runtime: 0.080 ms

You can see that the optimizer expects 10 rows in the table p, and as a
result of this assumption the whole query is estimated at 10 rows. Will
this cause a performance impact later on? pg_stats does not contain any
statistics for the table 'p'. Is this the cause of this behaviour?
The estimate is worse for some other queries, for example 'select *
from p where ts is not null':


Result  (cost=0.00..188.30 rows=1764 width=16) (actual time=0.021..0.049 rows=3 loops=1)
  ->  Append  (cost=0.00..188.30 rows=1764 width=16) (actual time=0.016..0.032 rows=3 loops=1)
        ->  Seq Scan on p  (cost=0.00..187.00 rows=1761 width=16) (actual time=0.003..0.003 rows=0 loops=1)
              Filter: (ts IS NOT NULL)
        ->  Seq Scan on p_historical p  (cost=0.00..1.30 rows=3 width=16) (actual time=0.008..0.015 rows=3 loops=1)
              Filter: (ts IS NOT NULL)
Total runtime: 0.095 ms
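
One way to see where those default estimates come from (a sketch using only
the standard catalogs) is to check that the empty parent table p keeps
relpages = 0 and gets no rows in pg_stats even after ANALYZE; for a zero-page
table the planner falls back to a built-in size guess rather than assuming
zero rows:

select relname, relpages, reltuples
from pg_class
where relname in ('p', 'p_actual', 'p_historical');

select tablename, attname, null_frac
from pg_stats
where tablename = 'p';   -- returns no rows for the empty parent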




Re: [PERFORM] Big difference in time returned by EXPLAIN ANALYZE SELECT ... AND SELECT ...

2010-07-26 Thread Craig Ringer
On 26/07/10 16:35, Piotr Gasidło wrote:
 Hello,
 
 I've found strange problem in my database (8.4.4, but also 9.0beta3,
 default postgresql.conf, shared_buffers raised to 256MB).
 
 EXPLAIN ANALYZE SELECT ...
 Total runtime: 4.782 ms
 Time: 25,970 ms
 
 SELECT ...
 ...
 (21 rows)
 
 Time: 23,042 ms
 
 Test done in psql connected by socket to server (same host, using
 \timing to get runtime).
 
 Does big difference in Total runtime and Time is normal?

Given that EXPLAIN ANALYZE doesn't transfer large rowsets to the client,
it can't really be time taken to transfer the data, which is the usual
difference between 'explain analyze' timings and psql client-side timings.

Given that, I'm wondering if the difference in this case is planning
time. I can't really imagine the query planner taking 20 seconds (!!) to
run, though, no matter how horrifyingly complicated the query and table
structure were, unless there was something going wrong.

Another possibility, then, is that for some reason queries are being
delayed from starting or delayed before results are being returned, so
the server completes them in a short amount of time but it takes a while
for psql to find out they're finished.

In your position, at this point I'd be doing things like hooking a
debugger up to the postgres backend and interrupting its execution
periodically to see what it's up to while this query runs. I'd also be
using wireshark to look at network activity to see if there were any
clues there. I'd be using top, vmstat and iostat to examine
system-level load if it was practical to leave the system otherwise
idle, so I could see if CPU/memory/disk were in demand, and for how long.
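
A cheaper first look, from a second psql session while the slow statement
runs, is to check whether the backend is actually waiting on a lock rather
than planning or executing (a sketch; column names as in the 8.4
pg_stat_activity view):

SELECT procpid, waiting, now() - query_start AS running_for, current_query
FROM pg_stat_activity
WHERE current_query <> '<IDLE>'
ORDER BY query_start;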

--
Craig Ringer



Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Yeb Havinga

Yeb Havinga wrote:

Greg Smith wrote:
Put it on ext3, toggle on noatime, and move on to testing.  The 
overhead of the metadata writes is the least of the problems when 
doing write-heavy stuff on Linux.
I ran a pgbench run and power failure test during pgbench with a 3 
year old computer



More tests on the same config.

scale 10 read only and read/write tests. note: only 240 s.

starting vacuum...end.
transaction type: SELECT only
scaling factor: 10
query mode: prepared
number of clients: 10
duration: 240 s
number of transactions actually processed: 8208115
tps = 34197.109896 (including connections establishing)
tps = 34200.658720 (excluding connections establishing)

y...@client45:~$ pgbench -c 10 -l -M prepared -T 240 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: prepared
number of clients: 10
duration: 240 s
number of transactions actually processed: 809271
tps = 3371.147020 (including connections establishing)
tps = 3371.518611 (excluding connections establishing)

--
scale 300 (just fits in RAM) read only and read/write tests

pgbench -c 10 -M prepared -T 300 -S test
starting vacuum...end.
transaction type: SELECT only
scaling factor: 300
query mode: prepared
number of clients: 10
duration: 300 s
number of transactions actually processed: 9219279
tps = 30726.931095 (including connections establishing)
tps = 30729.692823 (excluding connections establishing)

The test above doesn't really test the drive but shows the CPU/RAM limit.

pgbench -c 10 -l -M prepared -T 3600 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 300
query mode: prepared
number of clients: 10
duration: 3600 s
number of transactions actually processed: 8838200
tps = 2454.994217 (including connections establishing)
tps = 2455.012480 (excluding connections establishing)

--
scale 2000

pgbench -c 10 -M prepared -T 300 -S test
starting vacuum...end.
transaction type: SELECT only
scaling factor: 2000
query mode: prepared
number of clients: 10
duration: 300 s
number of transactions actually processed: 755772
tps = 2518.547576 (including connections establishing)
tps = 2518.762476 (excluding connections establishing)

So the test above measures random seek performance. iostat on the drive
showed a steady rate of just over 4000 read IOs/s:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.39    0.00   13.37   60.40    0.00   14.85

Device:  rrqm/s  wrqm/s     r/s    w/s     rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda        0.00    0.00 4171.00   0.00  60624.00     0.00    29.07    11.81    2.83   0.24 100.00
sdb        0.00    0.00    0.00   0.00      0.00     0.00     0.00     0.00    0.00   0.00   0.00


pgbench -c 10 -l -M prepared -T 24000 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 2000
query mode: prepared
number of clients: 10
duration: 24000 s
number of transactions actually processed: 30815691
tps = 1283.979098 (including connections establishing)
tps = 1283.980446 (excluding connections establishing)

Note the duration of several hours. No long waits occurred. For this last
test the latency PNG is at http://yfrog.com/f/0vlatencywp/ and the TPS
graph is at http://yfrog.com/f/b5tpsp/


regards,
Yeb Havinga




Re: [PERFORM] Big difference in time returned by EXPLAIN ANALYZE SELECT ... AND SELECT ...

2010-07-26 Thread Vitalii Tymchyshyn

On 26.07.10 12:15, Craig Ringer wrote:

On 26/07/10 16:35, Piotr Gasidło wrote:
   

Hello,

I've found strange problem in my database (8.4.4, but also 9.0beta3,
default postgresql.conf, shared_buffers raised to 256MB).

EXPLAIN ANALYZE SELECT ...
Total runtime: 4.782 ms
Time: 25,970 ms

SELECT ...
...
(21 rows)

Time: 23,042 ms

Test done in psql connected by socket to server (same host, using
\timing to get runtime).

Does big difference in Total runtime and Time is normal?
 

Given that EXPLAIN ANALYZE doesn't transfer large rowsets to the client,
it can't really be time taken to transfer the data, which is the usual
difference between 'explain analyze' timings and psql client-side timings.

Given that, I'm wondering if the difference in this case is planning
time. I can't really imagine the query planner taking 20 seconds (!!) to
run, though, no matter how horrifyingly complicated the query and table
structure were, unless there was something going wrong.
   

Actually it's 20ms, so I suspect your point about planning time is correct.
Piotr: You can try preparing your statement and then analyzing execute 
time to check if this is planning time.
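
A minimal sketch of that idea, using a throwaway table so it runs anywhere
(the next message in this thread shows the real run against ocache):

CREATE TEMP TABLE t AS SELECT generate_series(1, 100000) AS id;
ANALYZE t;
PREPARE q AS SELECT count(*) FROM t WHERE id < 100;  -- planning cost is paid here, once
EXPLAIN ANALYZE EXECUTE q;                           -- \timing on this reflects execution only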


Best regards, Vitalii Tymchyshyn



Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Matthew Wakeling

On Sun, 25 Jul 2010, Yeb Havinga wrote:
Graph of TPS at http://tinypic.com/r/b96aup/3 and latency at 
http://tinypic.com/r/x5e846/3


Does your latency graph really have milliseconds as the y axis? If so, 
this device is really slow - some requests have a latency of more than a 
second!


Matthew

--
The early bird gets the worm, but the second mouse gets the cheese.



Re: [PERFORM] Big difference in time returned by EXPLAIN ANALYZE SELECT ... AND SELECT ...

2010-07-26 Thread Craig Ringer
On 26/07/10 17:25, Vitalii Tymchyshyn wrote:
 On 26.07.10 12:15, Craig Ringer wrote:
 On 26/07/10 16:35, Piotr Gasidło wrote:
   
 Hello,

 I've found strange problem in my database (8.4.4, but also 9.0beta3,
 default postgresql.conf, shared_buffers raised to 256MB).

 EXPLAIN ANALYZE SELECT ...
 Total runtime: 4.782 ms
 Time: 25,970 ms

 Given that, I'm wondering if the difference in this case is planning
 time. I can't really imagine the query planner taking 20 seconds (!!) to
 run, though, no matter how horrifyingly complicated the query and table
 structure were, unless there was something going wrong.

 Actually it's 20ms, so I suspect your point about planning time is correct.

Oh, a commas-as-fraction-separator locale.

That makes sense. Thanks for the catch.

--
Craig Ringer



Re: [PERFORM] Big difference in time returned by EXPLAIN ANALYZE SELECT ... AND SELECT ...

2010-07-26 Thread Piotr Gasidło
2010/7/26 Vitalii Tymchyshyn tiv...@gmail.com:
 On 26.07.10 12:15, Craig Ringer wrote:
 (...)
 Piotr: You can try preparing your statement and then analyzing execute time
 to check if this is planning time.

You are right.

I've done a simple PREPARE (without params, etc.).

 PREPARE query as select oc_h_id,oc_duration,SUM(oc_count) FROM ocache WHERE
 oc_date_from >= '2010-07-22'::date AND oc_date_from <= '2010-07-24'::date AND
 oc_h_id =
 ANY('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::int[])
 GROUP BY oc_h_id, oc_duration;
PREPARE
Time: 19,873 ms

 EXPLAIN ANALYZE EXECUTE query;
...
Total runtime: 3.237 ms
Time: 5,118 ms

 EXECUTE query;
 oc_h_id | oc_duration | sum
---------+-------------+------
   27929 |           7 |  546
    3098 |           7 |  552
   27929 |          14 |  512
    3098 |          14 |  444
   22769 |          14 |  984
   32842 |          14 |  444
   27929 |          22 |    4
   27929 |          15 |   44
   32842 |           7 |  552
   22769 |           7 | 1356
    2229 |           7 |  496
     226 |          14 |  536
    2130 |           7 |  536
    2130 |          14 |  448
     226 |           7 |  584
    2229 |          14 |  400
   33433 |          14 |  444
    3095 |           7 |  552
   33433 |           7 |  552
    3095 |          14 |  444
   27929 |           8 |   40
(21 rows)

Time: 3,494 ms

The time matches the EXPLAIN ANALYZE runtime.

Compared to the non-prepared query, it's a big difference!

 select oc_h_id,oc_duration,SUM(oc_count) FROM ocache WHERE oc_date_from >=
 '2010-07-22'::date AND oc_date_from <= '2010-07-24'::date AND oc_h_id =
 ANY('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::int[])
  GROUP BY oc_h_id, oc_duration;
 oc_h_id | oc_duration | sum
---------+-------------+------
   27929 |           7 |  546
    3098 |           7 |  552
   27929 |          14 |  512
    3098 |          14 |  444
   22769 |          14 |  984
   32842 |          14 |  444
   27929 |          22 |    4
   27929 |          15 |   44
   32842 |           7 |  552
   22769 |           7 | 1356
    2229 |           7 |  496
     226 |          14 |  536
    2130 |           7 |  536
    2130 |          14 |  448
     226 |           7 |  584
    2229 |          14 |  400
   33433 |          14 |  444
    3095 |           7 |  552
   33433 |           7 |  552
    3095 |          14 |  444
   27929 |           8 |   40
(21 rows)

Time: 22,571 ms

OK. Is there any way to tune PostgreSQL to shorten the planning time for
such queries?
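
(For reference, a rough way to count how many child tables the planner has
to look at for each query against the parent; a sketch using the standard
catalogs:)

SELECT count(*) AS partitions
FROM pg_inherits i
JOIN pg_class parent ON parent.oid = i.inhparent
WHERE parent.relname = 'ocache';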

-- 
Piotr Gasidło



Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Yeb Havinga

Matthew Wakeling wrote:

On Sun, 25 Jul 2010, Yeb Havinga wrote:
Graph of TPS at http://tinypic.com/r/b96aup/3 and latency at 
http://tinypic.com/r/x5e846/3


Does your latency graph really have milliseconds as the y axis?

Yes
If so, this device is really slow - some requests have a latency of 
more than a second!
I try to just give the facts. Please remember that those particular graphs
are from a read/write pgbench run on a bigger-than-RAM database that ran for
some time (so with checkpoints), on a *single* $435 50GB drive without a BBU
raid controller. Also, this is a picture with a few million points: the ones
above 200 ms are perhaps a hundred and hence make up a very small fraction.


So far I'm pretty impressed with this drive. Let's be fair to OCZ and the
SandForce guys and not shoot from the hip with claims like "really slow"
without backing them up with a graphed pgbench run together with its cost,
so we can compare numbers with numbers.


regards,
Yeb Havinga




Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Greg Smith

Matthew Wakeling wrote:
Does your latency graph really have milliseconds as the y axis? If so, 
this device is really slow - some requests have a latency of more than 
a second!


Have you tried that yourself?  If you generate one of those with 
standard hard drives and a BBWC under Linux, I expect you'll discover 
those latencies to be 5 seconds long.  I recently saw 100 *seconds* 
running a large pgbench test due to latency flushing things to disk, on 
a system with 72GB of RAM.  Takes a long time to flush 3GB of random 
I/O out to disk when the kernel will happily cache that many writes 
until checkpoint time.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Greg Smith

Yeb Havinga wrote:
Please remember that particular graphs are from a read/write pgbench 
run on a bigger than RAM database that ran for some time (so with 
checkpoints), on a *single* $435 50GB drive without BBU raid controller.


To get similar *average* performance results you'd need to put about 4 
drives and a BBU into a server.  The worst-case latency on that solution 
is pretty bad though, when a lot of random writes are queued up; I 
suspect that's where the SSD will look much better.


By the way:  if you want to run a lot more tests in an organized 
fashion, that's what http://github.com/gregs1104/pgbench-tools was 
written to do.  That will spit out graphs by client and by scale showing 
how sensitive the test results are to each.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Matthew Wakeling

On Mon, 26 Jul 2010, Greg Smith wrote:

Matthew Wakeling wrote:
Does your latency graph really have milliseconds as the y axis? If so, this 
device is really slow - some requests have a latency of more than a second!


Have you tried that yourself?  If you generate one of those with standard 
hard drives and a BBWC under Linux, I expect you'll discover those latencies 
to be 5 seconds long.  I recently saw 100 *seconds* running a large pgbench 
test due to latency flushing things to disk, on a system with 72GB of RAM. 
Takes a long time to flush 3GB of random I/O out to disk when the kernel 
will happily cache that many writes until checkpoint time.


Apologies, I was interpreting the graph as the latency of the device, not 
all the layers in-between as well. There isn't any indication in the email 
with the graph as to what the test conditions or software are. Obviously 
if you factor in checkpoints and the OS writing out everything, then you 
would have to expect some large latency operations. However, if the device 
itself behaved as in the graph, I would be most unhappy and send it back.


Yeb also made the point - there are far too many points on that graph to 
really tell what the average latency is. It'd be instructive to have a few 
figures, like only x% of requests took longer than y.


Matthew

--
I wouldn't be so paranoid if you weren't all out to get me!!



Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Yeb Havinga

Matthew Wakeling wrote:
Apologies, I was interpreting the graph as the latency of the device, 
not all the layers in-between as well. There isn't any indication in 
the email with the graph as to what the test conditions or software are.
That info was in the email preceding the graph mail, but I see now I
forgot to mention it was an 8.4.4 Postgres version.


regards,
Yeb Havinga




Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Greg Spiegelberg
On Mon, Jul 26, 2010 at 10:26 AM, Yeb Havinga yebhavi...@gmail.com wrote:

 Matthew Wakeling wrote:

 Apologies, I was interpreting the graph as the latency of the device, not
 all the layers in-between as well. There isn't any indication in the email
 with the graph as to what the test conditions or software are.

 That info was in the email preceding the graph mail, but I see now I forgot
 to mention it was a 8.4.4 postgres version.


Speaking of the layers in between, has this test been done with the ext3
journal on a different device?  Maybe the SSD is being put to the wrong use:
use the SSD for the ext3 journal and the spinning drives for the filesystem?
Another possibility is to use ext2 on the SSD.

Greg


Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Greg Smith

Matthew Wakeling wrote:
Yeb also made the point - there are far too many points on that graph 
to really tell what the average latency is. It'd be instructive to 
have a few figures, like only x% of requests took longer than y.


Average latency is the inverse of TPS.  So if the result is, say, 1200 
TPS, that means the average latency is 1 / (1200 transactions/second) = 
0.83 milliseconds/transaction.  The average TPS figure is normally on a 
more useful scale as far as being able to compare them in ways that make 
sense to people.


pgbench-tools derives average, worst-case, and 90th percentile figures 
for latency from the logs.  I have 37MB worth of graphs from a system 
showing how all this typically works for regular hard drives I've been 
given permission to publish; I just need to find a place to host it
internally and I'll make the whole stack available to the world.  So far
Yeb's data is showing that a single SSD is competitive with a small 
array on average, but with better worst-case behavior than I'm used to 
seeing.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Greg Smith

Greg Spiegelberg wrote:
Speaking of the layers in-between, has this test been done with the 
ext3 journal on a different device?  Maybe the purpose is wrong for 
the SSD.  Use the SSD for the ext3 journal and the spindled drives for 
filesystem?  


The main disk bottleneck on PostgreSQL databases is the random seeks
for reading and writing to the main data blocks.  The journal 
information is practically noise in comparison--it barely matters 
because it's so much less difficult to keep up with.  This is why I 
don't really find ext2 interesting either.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Kevin Grittner
Greg Smith g...@2ndquadrant.com wrote:
 
 Yeb's data is showing that a single SSD is competitive with a
 small array on average, but with better worst-case behavior than
 I'm used to seeing.
 
So, how long before someone benchmarks a small array of SSDs?  :-)
 
-Kevin



Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Yeb Havinga

Greg Smith wrote:

Yeb Havinga wrote:
Please remember that particular graphs are from a read/write pgbench 
run on a bigger than RAM database that ran for some time (so with 
checkpoints), on a *single* $435 50GB drive without BBU raid controller.


To get similar *average* performance results you'd need to put about 4 
drives and a BBU into a server.  The worst-case latency on that 
solution is pretty bad though, when a lot of random writes are queued 
up; I suspect that's where the SSD will look much better.


By the way:  if you want to run a lot more tests in an organized 
fashion, that's what http://github.com/gregs1104/pgbench-tools was 
written to do.  That will spit out graphs by client and by scale 
showing how sensitive the test results are to each.

Got it, running the default config right now.

When you say 'comparable to a small array' - could you give a ballpark 
figure for 'small'?


regards,
Yeb Havinga

PS: Some update on the testing: I did some ext3, ext4, xfs, jfs and also
ext2 tests on the just-in-memory read/write test (scale 300). No real
winners or losers, though ext2 isn't really faster, and the need to manually
answer fsck's fix (y) prompt during boot makes it impractical in its
standard configuration. I did some poweroff tests with barriers explicitly
off in ext3, ext4 and xfs, and all recoveries still went OK.





Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Yeb Havinga

Yeb Havinga wrote:
To get similar *average* performance results you'd need to put about 
4 drives and a BBU into a server.  The 


Please forget this question, I now see it in the mail I'm replying to.
Sorry for the spam!


-- Yeb




Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Greg Smith

Yeb Havinga wrote:
I did some ext3,ext4,xfs,jfs and also ext2 tests on the just-in-memory 
read/write test. (scale 300) No real winners or losers, though ext2 
isn't really faster and the manual need for fix (y) during boot makes 
it impractical in its standard configuration.


That's what happens every time I try it too.  The theoretical benefits 
of ext2 for hosting PostgreSQL just don't translate into significant 
performance increases on database oriented tests, certainly not ones 
that would justify the downside of having fsck issues come back again.  
Glad to see that holds true on this hardware too.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Scott Marlowe
On Mon, Jul 26, 2010 at 12:40 PM, Greg Smith g...@2ndquadrant.com wrote:
 Greg Spiegelberg wrote:

 Speaking of the layers in-between, has this test been done with the ext3
 journal on a different device?  Maybe the purpose is wrong for the SSD.  Use
 the SSD for the ext3 journal and the spindled drives for filesystem?

 The main disk bottleneck on PostgreSQL databases are the random seeks for
 reading and writing to the main data blocks.  The journal information is
 practically noise in comparison--it barely matters because it's so much less
 difficult to keep up with.  This is why I don't really find ext2 interesting
 either.

Note that SSDs aren't usually really fast at large sequential writes,
though, so it might be worth putting pg_xlog on a spinning pair in a
mirror and seeing how much, if any, the SSD drive speeds up when not
having to handle pg_xlog.



Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Greg Spiegelberg
On Mon, Jul 26, 2010 at 1:45 PM, Greg Smith g...@2ndquadrant.com wrote:

 Yeb Havinga wrote:

 I did some ext3,ext4,xfs,jfs and also ext2 tests on the just-in-memory
 read/write test. (scale 300) No real winners or losers, though ext2 isn't
 really faster and the manual need for fix (y) during boot makes it
 impractical in its standard configuration.


 That's what happens every time I try it too.  The theoretical benefits of
 ext2 for hosting PostgreSQL just don't translate into significant
 performance increases on database oriented tests, certainly not ones that
 would justify the downside of having fsck issues come back again.  Glad to
 see that holds true on this hardware too.


I know I'm talking development now, but is there a case for a pg_xlog block
device to remove the filesystem overhead and guarantee your data is
written sequentially every time?

Greg


Re: [PERFORM] Strange explain on partitioned tables

2010-07-26 Thread Gerald Fontenay

Joshua D. Drake wrote:

On Fri, 2010-07-23 at 15:03 -0700, Gerald Fontenay wrote:
  
The PostgreSQL partitioning system is aimed to support perhaps a 
hundred inherited tables.  You can expect to get poor performance on 
queries if you create 1000 of them. 
  

Hi,

Why is it that you would expect poor performance for, say, 1000 or more? I
have ~1000 inherited tables and I don't see any significant slowdowns.
I only ever access a single inherited table at a time in this situation,
though. I suppose I am using inheritance only for organization in
this case...



It is variable based on workload and as I recall has to do with the
planning time. As the number of children increases, so does the planning
time.

Joshua D. Drake


  
Thank you for your response. So if I query only my target child table,
it should be planned just like any other single table, right? I
have thousands of these tables. (I suppose that I'm only using
inheritance for the sake of organization in this situation...)
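
(To illustrate the two access patterns being compared; the table names here
are made up:)

create table readings (sensor_id int, val float8);
create table readings_2010_07 () inherits (readings);
-- through the parent: the planner has to consider every child table
explain select * from readings where sensor_id = 42;
-- one child directly: planned like any ordinary standalone table
explain select * from readings_2010_07 where sensor_id = 42;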


Thanks!
Gerald


Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Andres Freund
On Mon, Jul 26, 2010 at 03:23:20PM -0600, Greg Spiegelberg wrote:
 On Mon, Jul 26, 2010 at 1:45 PM, Greg Smith g...@2ndquadrant.com wrote:
  Yeb Havinga wrote:
  I did some ext3,ext4,xfs,jfs and also ext2 tests on the just-in-memory
  read/write test. (scale 300) No real winners or losers, though ext2 isn't
  really faster and the manual need for fix (y) during boot makes it
  impractical in its standard configuration.
 
 
  That's what happens every time I try it too.  The theoretical benefits of
  ext2 for hosting PostgreSQL just don't translate into significant
  performance increases on database oriented tests, certainly not ones that
  would justify the downside of having fsck issues come back again.  Glad to
  see that holds true on this hardware too.
 I know I'm talking development now but is there a case for a pg_xlog block
 device to remove the file system overhead and guaranteeing your data is
 written sequentially every time?
For one, I doubt that it's a relevant enough efficiency loss in
comparison with a significantly more complex implementation
(for one thing you can't grow/shrink, for another you have to do more
complex, hw-dependent things like rounding to hardware boundaries,
page size etc. to stay efficient). For another, my experience is that at
a relatively low point XLogInsert gets to be the bottleneck - so I
don't see much point in improving at that low level (yet, at least).

Where I would like to do some hw-dependent measuring (because I see
significant improvements there) is prefetching for seqscans,
indexscans et al. using blktrace... But I currently don't have the
time. And it's another topic ;-)

Andres



Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Greg Smith

Greg Spiegelberg wrote:
I know I'm talking development now but is there a case for a pg_xlog 
block device to remove the file system overhead and guaranteeing your 
data is written sequentially every time?


It's possible to set the PostgreSQL wal_sync_method parameter in the 
database to open_datasync or open_sync, and if you have an operating 
system that supports direct writes it will use those and bypass things 
like the OS write cache.  That's close to what you're suggesting, 
supposedly portable, and it does show some significant benefit when it's 
properly supported.  Problem has been, the synchronous writing code on 
Linux in particular hasn't ever worked right against ext3, and the 
PostgreSQL code doesn't make the right call at all on Solaris.  So 
there's two popular platforms that it just plain doesn't work on, even 
though it should.
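
For anyone who wants to experiment, a sketch of checking and changing the
setting (the parameter is real; whether open_datasync actually helps depends
on the platform caveats above):

SHOW wal_sync_method;   -- typically fdatasync on Linux builds
-- then in postgresql.conf, followed by a reload:
--   wal_sync_method = open_datasync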


We've gotten reports that there are bleeding edge Linux kernel and 
library versions available now that finally fix that issue, and that 
PostgreSQL automatically takes advantage of them when it's compiled on 
one of them.  But I'm not aware of any distribution that makes this easy 
to try out that's available yet, paint is still wet on the code I think.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] Big difference in time returned by EXPLAIN ANALYZE SELECT ... AND SELECT ...

2010-07-26 Thread Tom Lane
Piotr Gasidło qua...@barbara.eu.org writes:
 Ok. Is there any way to tune postgresql, to shorten planning time for
 such queries?

You've got a ridiculously large number of partitions.  Use fewer.

regards, tom lane



[PERFORM] Linux Filesystems again - Ubuntu this time

2010-07-26 Thread Mark Kirkwood
I'm involved with the setup of replacement hardware for one of our 
systems. It is going to be using Ubuntu Lucid Server (kernel 2.6.32 I 
believe). The issue of filesystems has raised its head again.


I note that ext4 is now the default for Lucid; what do folks think about
using it: stable enough now? Also, xfs has seen quite a bit of
development in these later kernels; any thoughts on that?


Cheers

Mark

P.S.: We are quite keen to move away from ext3, as we have encountered
its tendency to hit a wall under heavy load and leave us waiting for
kjournald and pdflush to catch up.

