[PERFORM] Big difference in time returned by EXPLAIN ANALYZE SELECT ... AND SELECT ...
Hello,

I've found a strange problem in my database (8.4.4, but also 9.0beta3; default postgresql.conf, shared_buffers raised to 256MB).

EXPLAIN ANALYZE SELECT ...
Total runtime: 4.782 ms
Time: 25,970 ms

SELECT ...
...
(21 rows)
Time: 23,042 ms

The test was done in psql connected by socket to the server (same host, using \timing to get the client-side time). Is such a big difference between Total runtime and Time normal?

I've noticed that removing one index (not used in the query, ocache_*_ukey, which I keep only for data integrity) gives me:

EXPLAIN ANALYZE SELECT ...
Total runtime: 3.439 ms
Time: 13,324 ms

Why such a big difference in timing (vs. the query with this index)?

The query is:

select oc_h_id, oc_duration, SUM(oc_count)
  FROM ocache
 WHERE oc_date_from >= '2010-07-22'::date
   AND oc_date_from <= '2010-07-24'::date
   AND oc_h_id = ANY('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::int[])
 GROUP BY oc_h_id, oc_duration;

The EXPLAIN ANALYZE output is:

HashAggregate  (cost=42060.58..42095.53 rows=2796 width=12) (actual time=4.357..4.368 rows=21 loops=1)
  ->  Append  (cost=0.00..41850.91 rows=27955 width=12) (actual time=0.432..3.928 rows=1439 loops=1)
        ->  Seq Scan on ocache  (cost=0.00..17.20 rows=6 width=12) (actual time=0.002..0.002 rows=0 loops=1)
              Filter: ((oc_date_from >= '2010-07-22'::date) AND (oc_date_from <= '2010-07-24'::date) AND (oc_h_id = ANY ('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::integer[])))
        ->  Bitmap Heap Scan on ocache_2010_7 ocache  (cost=357.41..8117.29 rows=5466 width=12) (actual time=0.430..0.582 rows=196 loops=1)
              Recheck Cond: (oc_h_id = ANY ('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::integer[]))
              Filter: ((oc_date_from >= '2010-07-22'::date) AND (oc_date_from <= '2010-07-24'::date))
              ->  Bitmap Index Scan on ocache_2010_7_oc_h_id_key  (cost=0.00..356.04 rows=16397 width=0) (actual time=0.174..0.174 rows=1156 loops=1)
                    Index Cond: (oc_h_id = ANY ('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::integer[]))
        ->  Bitmap Heap Scan on ocache_2010_8 ocache  (cost=370.91..9067.15 rows=6060 width=12) (actual time=0.175..0.615 rows=562 loops=1)
              Recheck Cond: (oc_h_id = ANY ('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::integer[]))
              Filter: ((oc_date_from >= '2010-07-22'::date) AND (oc_date_from <= '2010-07-24'::date))
              ->  Bitmap Index Scan on ocache_2010_8_oc_h_id_key  (cost=0.00..369.39 rows=18181 width=0) (actual time=0.156..0.156 rows=1124 loops=1)
                    Index Cond: (oc_h_id = ANY ('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::integer[]))
        ->  Bitmap Heap Scan on ocache_2010_9 ocache  (cost=389.47..9891.79 rows=6703 width=12) (actual time=0.158..0.513 rows=448 loops=1)
              Recheck Cond: (oc_h_id = ANY ('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::integer[]))
              Filter: ((oc_date_from >= '2010-07-22'::date) AND (oc_date_from <= '2010-07-24'::date))
              ->  Bitmap Index Scan on ocache_2010_9_oc_h_id_key  (cost=0.00..387.80 rows=20108 width=0) (actual time=0.140..0.140 rows=896 loops=1)
                    Index Cond: (oc_h_id = ANY ('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::integer[]))
        ->  Bitmap Heap Scan on ocache_2010_10 ocache  (cost=268.42..6735.90 rows=4721 width=12) (actual time=0.107..0.300 rows=229 loops=1)
              Recheck Cond: (oc_h_id = ANY ('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::integer[]))
              Filter: ((oc_date_from >= '2010-07-22'::date) AND (oc_date_from <= '2010-07-24'::date))
              ->  Bitmap Index Scan on ocache_2010_10_oc_h_id_key  (cost=0.00..267.24 rows=14162 width=0) (actual time=0.096..0.096 rows=458 loops=1)
                    Index Cond: (oc_h_id = ANY ('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::integer[]))
        ->  Bitmap Heap Scan on ocache_2010_11 ocache  (cost=139.48..3340.84 rows=2395 width=12) (actual time=0.046..0.047 rows=4 loops=1)
              Recheck Cond: (oc_h_id = ANY ('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::integer[]))
              Filter: ((oc_date_from >= '2010-07-22'::date) AND (oc_date_from <= '2010-07-24'::date))
              ->  Bitmap Index Scan on ocache_2010_11_oc_h_id_key  (cost=0.00..138.88 rows=7184 width=0) (actual
[PERFORM] Re: Big difference in time returned by EXPLAIN ANALYZE SELECT ... AND SELECT ...
On 26 July 2010 at 10:35, Piotr Gasidło <qua...@barbara.eu.org> wrote:
> \d+ ocache_2010_12
> Table "public.ocache_2010_12"
> Indexes:
> (...)

An index was missing from that listing:

    "ocache_2010_12_ukey" UNIQUE, btree (oc_date_from, oc_date_to, oc_h_id, oc_transport, oc_ofc_id_1) WITH (fillfactor=80)

--
Piotr Gasidło

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Explains of queries to partitioned tables
There is a partitioned table with 2 partitions:

drop table if exists p cascade;

create table p (
    id bigint not null primary key,
    ts timestamp
);
create table p_actual ( check (ts is null) ) inherits (p);
create table p_historical ( check (ts is not null) ) inherits (p);

-- I skipped the triggers and rules creation

insert into p (id, ts) values (1, '2000-01-01');
insert into p (id, ts) values (2, null);
insert into p (id, ts) values (3, '2001-01-01');
insert into p (id, ts) values (4, '2005-01-01');

analyze p;
analyze p_actual;
analyze p_historical;

Here is the explain output for the query 'select * from p where ts is null':

Result  (cost=0.00..188.10 rows=10 width=16) (actual time=0.028..0.038 rows=1 loops=1)
  ->  Append  (cost=0.00..188.10 rows=10 width=16) (actual time=0.023..0.029 rows=1 loops=1)
        ->  Seq Scan on p  (cost=0.00..187.00 rows=9 width=16) (actual time=0.002..0.002 rows=0 loops=1)
              Filter: (ts IS NULL)
        ->  Seq Scan on p_actual p  (cost=0.00..1.10 rows=1 width=16) (actual time=0.014..0.016 rows=1 loops=1)
              Filter: (ts IS NULL)
Total runtime: 0.080 ms

You can see that the optimizer expects 10 rows from the parent table p, and as a result of this assumption the whole query is estimated at 10 rows. Will this cause a performance impact further on? pg_stats does not contain any statistics for the table 'p'. Is this the cause of such behaviour?
The estimate is worse for some other queries, for example 'select * from p where ts is not null':

Result  (cost=0.00..188.30 rows=1764 width=16) (actual time=0.021..0.049 rows=3 loops=1)
  ->  Append  (cost=0.00..188.30 rows=1764 width=16) (actual time=0.016..0.032 rows=3 loops=1)
        ->  Seq Scan on p  (cost=0.00..187.00 rows=1761 width=16) (actual time=0.003..0.003 rows=0 loops=1)
              Filter: (ts IS NOT NULL)
        ->  Seq Scan on p_historical p  (cost=0.00..1.30 rows=3 width=16) (actual time=0.008..0.015 rows=3 loops=1)
              Filter: (ts IS NOT NULL)
Total runtime: 0.095 ms
Re: [PERFORM] Big difference in time returned by EXPLAIN ANALYZE SELECT ... AND SELECT ...
On 26/07/10 16:35, Piotr Gasidło wrote:
> I've found a strange problem in my database (8.4.4, but also 9.0beta3; default postgresql.conf, shared_buffers raised to 256MB).
>
> EXPLAIN ANALYZE SELECT ...
> Total runtime: 4.782 ms
> Time: 25,970 ms
>
> SELECT ...
> ...
> (21 rows)
> Time: 23,042 ms
>
> The test was done in psql connected by socket to the server (same host, using \timing to get the client-side time). Is such a big difference between Total runtime and Time normal?

Given that EXPLAIN ANALYZE doesn't transfer large rowsets to the client, it can't really be the time taken to transfer the data, which is the usual difference between EXPLAIN ANALYZE timings and psql client-side timings.

Given that, I'm wondering if the difference in this case is planning time. I can't really imagine the query planner taking 20 seconds (!!) to run, though, no matter how horrifyingly complicated the query and table structure were, unless something was going wrong.

Another possibility, then, is that for some reason queries are being delayed from starting, or delayed before results are returned, so the server completes them in a short amount of time but it takes a while for psql to find out they're finished.

In your position, at this point I'd be hooking a debugger up to the postgres backend and interrupting its execution periodically to see what it's up to while this query runs. I'd also be using wireshark to look at network activity to see if there were any clues there. And I'd be using top, vmstat and iostat to examine system-level load, if it was practical to leave the system otherwise idle, so I could see whether CPU/memory/disk were in demand, and for how long.

--
Craig Ringer
Re: [PERFORM] Testing Sandforce SSD
Yeb Havinga wrote:
> Greg Smith wrote:
>> Put it on ext3, toggle on noatime, and move on to testing. The overhead of the metadata writes is the least of the problems when doing write-heavy stuff on Linux.
> I ran a pgbench run and power failure test during pgbench with a 3 year old computer

More tests on the same config: scale 10, read-only and read/write tests. Note: only 240 s.

starting vacuum...end.
transaction type: SELECT only
scaling factor: 10
query mode: prepared
number of clients: 10
duration: 240 s
number of transactions actually processed: 8208115
tps = 34197.109896 (including connections establishing)
tps = 34200.658720 (excluding connections establishing)

y...@client45:~$ pgbench -c 10 -l -M prepared -T 240 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: prepared
number of clients: 10
duration: 240 s
number of transactions actually processed: 809271
tps = 3371.147020 (including connections establishing)
tps = 3371.518611 (excluding connections establishing)

-- scale 300 (just fits in RAM), read-only and read/write tests

pgbench -c 10 -M prepared -T 300 -S test
starting vacuum...end.
transaction type: SELECT only
scaling factor: 300
query mode: prepared
number of clients: 10
duration: 300 s
number of transactions actually processed: 9219279
tps = 30726.931095 (including connections establishing)
tps = 30729.692823 (excluding connections establishing)

The test above doesn't really exercise the drive; it shows the CPU/RAM limit.

pgbench -c 10 -l -M prepared -T 3600 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 300
query mode: prepared
number of clients: 10
duration: 3600 s
number of transactions actually processed: 8838200
tps = 2454.994217 (including connections establishing)
tps = 2455.012480 (excluding connections establishing)

-- scale 2000

pgbench -c 10 -M prepared -T 300 -S test
starting vacuum...end.
transaction type: SELECT only
scaling factor: 2000
query mode: prepared
number of clients: 10
duration: 300 s
number of transactions actually processed: 755772
tps = 2518.547576 (including connections establishing)
tps = 2518.762476 (excluding connections establishing)

So this test exercises random seek performance. iostat on the drive showed a steady rate of just over 4000 read IOs/s:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.39    0.00   13.37   60.40    0.00   14.85

Device:  rrqm/s  wrqm/s     r/s   w/s    rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    0.00 4171.00  0.00 60624.00    0.00    29.07    11.81   2.83   0.24 100.00
sdb        0.00    0.00    0.00  0.00     0.00    0.00     0.00     0.00   0.00   0.00   0.00

pgbench -c 10 -l -M prepared -T 24000 test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 2000
query mode: prepared
number of clients: 10
duration: 24000 s
number of transactions actually processed: 30815691
tps = 1283.979098 (including connections establishing)
tps = 1283.980446 (excluding connections establishing)

Note the duration of several hours. No long waits occurred. For this last test the latency graph is at http://yfrog.com/f/0vlatencywp/ and the TPS graph at http://yfrog.com/f/b5tpsp/

regards,
Yeb Havinga
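As a sanity check on the figures above: pgbench's tps is simply transactions processed divided by the run duration, and iostat's avgrq-sz is reported in 512-byte sectors, so it follows from rkB/s and r/s. A quick sketch of the arithmetic:

```python
# Cross-check the pgbench and iostat numbers quoted above.

def tps(transactions, duration_s):
    """pgbench reports tps = transactions / duration."""
    return transactions / duration_s

# scale 10, SELECT-only run: 8208115 transactions in 240 s
print(round(tps(8208115, 240), 1))      # ~34200, matching the reported tps

# scale 2000, TPC-B run: 30815691 transactions in 24000 s
print(round(tps(30815691, 24000), 1))   # ~1284, matching the reported tps

# iostat avgrq-sz is in 512-byte sectors: rkB/s * 2 sectors/kB / (r/s)
avgrq_sz = 60624.00 * 2 / 4171.00
print(round(avgrq_sz, 2))               # ~29.07, matching the iostat column
```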
Re: [PERFORM] Big difference in time returned by EXPLAIN ANALYZE SELECT ... AND SELECT ...
On 26.07.10 12:15, Craig Ringer wrote:
> On 26/07/10 16:35, Piotr Gasidło wrote:
>> EXPLAIN ANALYZE SELECT ...
>> Total runtime: 4.782 ms
>> Time: 25,970 ms
> (...)
> Given that, I'm wondering if the difference in this case is planning time. I can't really imagine the query planner taking 20 seconds (!!) to run, though, no matter how horrifyingly complicated the query and table structure were, unless something was going wrong.

Actually it's 20 ms, so I suspect your point about planning time is correct.

Piotr: you can try preparing your statement and then timing EXECUTE, to check whether this is planning time.

Best regards,
Vitalii Tymchyshyn
Re: [PERFORM] Testing Sandforce SSD
On Sun, 25 Jul 2010, Yeb Havinga wrote:
> Graph of TPS at http://tinypic.com/r/b96aup/3 and latency at http://tinypic.com/r/x5e846/3

Does your latency graph really have milliseconds as the y axis? If so, this device is really slow - some requests have a latency of more than a second!

Matthew

--
The early bird gets the worm, but the second mouse gets the cheese.
Re: [PERFORM] Big difference in time returned by EXPLAIN ANALYZE SELECT ... AND SELECT ...
On 26/07/10 17:25, Vitalii Tymchyshyn wrote:
> On 26.07.10 12:15, Craig Ringer wrote:
>> Given that, I'm wondering if the difference in this case is planning time. I can't really imagine the query planner taking 20 seconds (!!) to run, though, no matter how horrifyingly complicated the query and table structure were, unless something was going wrong.
> Actually it's 20 ms, so I suspect your point about planning time is correct.

Oh, a comma-as-fraction-separator locale. That makes sense. Thanks for the catch.

--
Craig Ringer
Re: [PERFORM] Big difference in time returned by EXPLAIN ANALYZE SELECT ... AND SELECT ...
2010/7/26 Vitalii Tymchyshyn <tiv...@gmail.com>:
> (...)
> Piotr: you can try preparing your statement and then timing EXECUTE, to check whether this is planning time.

You are right. I've done a simple PREPARE (without params, etc.):

PREPARE query AS
select oc_h_id, oc_duration, SUM(oc_count)
  FROM ocache
 WHERE oc_date_from >= '2010-07-22'::date
   AND oc_date_from <= '2010-07-24'::date
   AND oc_h_id = ANY('{32842,3095,27929,2229,22769,3098,33433,22559,226,2130,226,2130,2229,3095,3098,22559,22769,27929,32842,33433}'::int[])
 GROUP BY oc_h_id, oc_duration;
PREPARE
Time: 19,873 ms

EXPLAIN ANALYZE EXECUTE query;
...
Total runtime: 3.237 ms
Time: 5,118 ms

EXECUTE query;
 oc_h_id | oc_duration | sum
---------+-------------+------
   27929 |           7 |  546
    3098 |           7 |  552
   27929 |          14 |  512
    3098 |          14 |  444
   22769 |          14 |  984
   32842 |          14 |  444
   27929 |          22 |    4
   27929 |          15 |   44
   32842 |           7 |  552
   22769 |           7 | 1356
    2229 |           7 |  496
     226 |          14 |  536
    2130 |           7 |  536
    2130 |          14 |  448
     226 |           7 |  584
    2229 |          14 |  400
   33433 |          14 |  444
    3095 |           7 |  552
   33433 |           7 |  552
    3095 |          14 |  444
   27929 |           8 |   40
(21 rows)

Time: 3,494 ms

The time matches the EXPLAIN ANALYZE runtime. Compared to the unprepared query, it's a big difference! Running the same query unprepared returns the same 21 rows, but with:

Time: 22,571 ms

OK. Is there any way to tune PostgreSQL to shorten planning time for such queries?
--
Piotr Gasidło
Re: [PERFORM] Testing Sandforce SSD
Matthew Wakeling wrote:
> On Sun, 25 Jul 2010, Yeb Havinga wrote:
>> Graph of TPS at http://tinypic.com/r/b96aup/3 and latency at http://tinypic.com/r/x5e846/3
> Does your latency graph really have milliseconds as the y axis?

Yes.

> If so, this device is really slow - some requests have a latency of more than a second!

I try to just give the facts. Please remember that those particular graphs are from a read/write pgbench run on a bigger-than-RAM database that ran for some time (so with checkpoints), on a *single* $435 50GB drive without a BBU RAID controller. Also, this is a picture with a few million points: the ones above 200 ms are perhaps a hundred, and hence make up a very small fraction.

So far I'm pretty impressed with this drive. Let's be fair to OCZ and the SandForce guys and not shoot from the hip with things like "really slow" without that being backed by a graphed pgbench run together with its cost, so we can compare numbers with numbers.

regards,
Yeb Havinga
Re: [PERFORM] Testing Sandforce SSD
Matthew Wakeling wrote:
> Does your latency graph really have milliseconds as the y axis? If so, this device is really slow - some requests have a latency of more than a second!

Have you tried that yourself? If you generate one of those with standard hard drives and a BBWC under Linux, I expect you'll discover those latencies to be 5 seconds long. I recently saw 100 *seconds* running a large pgbench test, due to latency flushing things to disk, on a system with 72GB of RAM. It takes a long time to flush 3GB of random I/O out to disk when the kernel will happily cache that many writes until checkpoint time.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us
Re: [PERFORM] Testing Sandforce SSD
Yeb Havinga wrote:
> Please remember that those particular graphs are from a read/write pgbench run on a bigger-than-RAM database that ran for some time (so with checkpoints), on a *single* $435 50GB drive without a BBU RAID controller.

To get similar *average* performance results you'd need to put about 4 drives and a BBU into a server. The worst-case latency on that solution is pretty bad, though, when a lot of random writes are queued up; I suspect that's where the SSD will look much better.

By the way: if you want to run a lot more tests in an organized fashion, that's what http://github.com/gregs1104/pgbench-tools was written to do. It will spit out graphs by client count and by scale, showing how sensitive the test results are to each.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us
Re: [PERFORM] Testing Sandforce SSD
On Mon, 26 Jul 2010, Greg Smith wrote:
> Matthew Wakeling wrote:
>> Does your latency graph really have milliseconds as the y axis? If so, this device is really slow - some requests have a latency of more than a second!
> Have you tried that yourself? If you generate one of those with standard hard drives and a BBWC under Linux, I expect you'll discover those latencies to be 5 seconds long. I recently saw 100 *seconds* running a large pgbench test, due to latency flushing things to disk, on a system with 72GB of RAM.

Apologies, I was interpreting the graph as the latency of the device, not of all the layers in between as well. There isn't any indication in the email with the graph as to what the test conditions or software were. Obviously, if you factor in checkpoints and the OS writing everything out, then you would have to expect some large-latency operations. However, if the device itself behaved as in the graph, I would be most unhappy and send it back.

Yeb also made the point - there are far too many points on that graph to really tell what the average latency is. It'd be instructive to have a few figures, like "only x% of requests took longer than y".

Matthew

--
I wouldn't be so paranoid if you weren't all out to get me!!
Re: [PERFORM] Testing Sandforce SSD
Matthew Wakeling wrote:
> Apologies, I was interpreting the graph as the latency of the device, not of all the layers in between as well. There isn't any indication in the email with the graph as to what the test conditions or software were.

That info was in the email preceding the graph mail, but I see now that I forgot to mention it was PostgreSQL 8.4.4.

regards,
Yeb Havinga
Re: [PERFORM] Testing Sandforce SSD
On Mon, Jul 26, 2010 at 10:26 AM, Yeb Havinga <yebhavi...@gmail.com> wrote:
> That info was in the email preceding the graph mail, but I see now that I forgot to mention it was PostgreSQL 8.4.4.

Speaking of the layers in between, has this test been done with the ext3 journal on a different device? Maybe the SSD is being used for the wrong purpose: use the SSD for the ext3 journal and the spinning drives for the filesystem? Another possibility is to use ext2 on the SSD.

Greg
Re: [PERFORM] Testing Sandforce SSD
Matthew Wakeling wrote:
> Yeb also made the point - there are far too many points on that graph to really tell what the average latency is. It'd be instructive to have a few figures, like "only x% of requests took longer than y".

Average latency is the inverse of TPS. So if the result is, say, 1200 TPS, that means the average latency is 1 / (1200 transactions/second) = 0.83 milliseconds/transaction. The average TPS figure is normally on a more useful scale, as far as being able to compare results in ways that make sense to people.

pgbench-tools derives average, worst-case, and 90th-percentile figures for latency from the logs. I have 37MB worth of graphs from a system showing how all this typically works for regular hard drives, which I've been given permission to publish; I just need to find a place to host them internally and I'll make the whole stack available to the world. So far Yeb's data is showing that a single SSD is competitive with a small array on average, but with better worst-case behavior than I'm used to seeing.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us
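The arithmetic above can be checked directly; a minimal sketch, using the illustrative 1200 TPS figure from the message (for a single serial transaction stream):

```python
# Average latency as the inverse of TPS, per the worked example above.
def avg_latency_ms(tps):
    # seconds per transaction = 1 / tps; convert to milliseconds
    return 1000.0 / tps

print(round(avg_latency_ms(1200), 2))  # ~0.83 ms per transaction
```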
Re: [PERFORM] Testing Sandforce SSD
Greg Spiegelberg wrote:
> Speaking of the layers in between, has this test been done with the ext3 journal on a different device? Maybe the SSD is being used for the wrong purpose: use the SSD for the ext3 journal and the spinning drives for the filesystem?

The main disk bottleneck on PostgreSQL databases is the random seeking for reading and writing the main data blocks. The journal information is practically noise in comparison - it barely matters, because it's so much less difficult to keep up with. This is why I don't really find ext2 interesting either.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us
Re: [PERFORM] Testing Sandforce SSD
Greg Smith <g...@2ndquadrant.com> wrote:
> Yeb's data is showing that a single SSD is competitive with a small array on average, but with better worst-case behavior than I'm used to seeing.

So, how long before someone benchmarks a small array of SSDs? :-)

-Kevin
Re: [PERFORM] Testing Sandforce SSD
Greg Smith wrote:
> Yeb Havinga wrote:
>> Please remember that those particular graphs are from a read/write pgbench run on a bigger-than-RAM database that ran for some time (so with checkpoints), on a *single* $435 50GB drive without a BBU RAID controller.
> To get similar *average* performance results you'd need to put about 4 drives and a BBU into a server. The worst-case latency on that solution is pretty bad, though, when a lot of random writes are queued up; I suspect that's where the SSD will look much better.
>
> By the way: if you want to run a lot more tests in an organized fashion, that's what http://github.com/gregs1104/pgbench-tools was written to do. It will spit out graphs by client count and by scale, showing how sensitive the test results are to each.

Got it - running the default config right now. When you say "comparable to a small array", could you give a ballpark figure for "small"?

regards,
Yeb Havinga

PS: Some update on the testing: I did some ext3, ext4, xfs, jfs and also ext2 tests on the just-in-memory read/write test (scale 300). No real winners or losers, though ext2 isn't really faster, and the manual need for "fix (y)" during boot makes it impractical in its standard configuration. I also did some poweroff tests with barriers explicitly off in ext3, ext4 and xfs; all recoveries still went OK.
Re: [PERFORM] Testing Sandforce SSD
Yeb Havinga wrote:
> To get similar *average* performance results you'd need to put about 4 drives and a BBU into a server.

Please forget this question - I now see the answer in the mail I'm replying to. Sorry for the spam!

--
Yeb
Re: [PERFORM] Testing Sandforce SSD
Yeb Havinga wrote:
> I did some ext3, ext4, xfs, jfs and also ext2 tests on the just-in-memory read/write test (scale 300). No real winners or losers, though ext2 isn't really faster, and the manual need for "fix (y)" during boot makes it impractical in its standard configuration.

That's what happens every time I try it too. The theoretical benefits of ext2 for hosting PostgreSQL just don't translate into significant performance increases on database-oriented tests, certainly not ones that would justify the downside of having fsck issues come back again. Glad to see that holds true on this hardware too.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us
Re: [PERFORM] Testing Sandforce SSD
On Mon, Jul 26, 2010 at 12:40 PM, Greg Smith <g...@2ndquadrant.com> wrote:
> Greg Spiegelberg wrote:
>> Speaking of the layers in between, has this test been done with the ext3 journal on a different device? Maybe the SSD is being used for the wrong purpose: use the SSD for the ext3 journal and the spinning drives for the filesystem?
> The main disk bottleneck on PostgreSQL databases is the random seeking for reading and writing the main data blocks. The journal information is practically noise in comparison - it barely matters, because it's so much less difficult to keep up with. This is why I don't really find ext2 interesting either.

Note that SSDs aren't usually very fast at large sequential writes, though, so it might be worth putting pg_xlog on a spinning pair in a mirror and seeing how much, if at all, the SSD speeds up when it doesn't have to handle pg_xlog.
Re: [PERFORM] Testing Sandforce SSD
On Mon, Jul 26, 2010 at 1:45 PM, Greg Smith <g...@2ndquadrant.com> wrote:
> Yeb Havinga wrote:
>> I did some ext3, ext4, xfs, jfs and also ext2 tests on the just-in-memory read/write test (scale 300). No real winners or losers, though ext2 isn't really faster, and the manual need for "fix (y)" during boot makes it impractical in its standard configuration.
> That's what happens every time I try it too. The theoretical benefits of ext2 for hosting PostgreSQL just don't translate into significant performance increases on database-oriented tests, certainly not ones that would justify the downside of having fsck issues come back again. Glad to see that holds true on this hardware too.

I know I'm talking development now, but is there a case for a pg_xlog block device, to remove the filesystem overhead and guarantee that your data is written sequentially every time?

Greg
Re: [PERFORM] Strange explain on partitioned tables
Joshua D. Drake wrote:
> On Fri, 2010-07-23 at 15:03 -0700, Gerald Fontenay wrote:
>>> The PostgreSQL partitioning system is aimed to support perhaps a hundred inherited tables. You can expect to get poor performance on queries if you create 1000 of them.
>> Hi, why is it that you would expect poor performance for, say, 1000 or more? I have ~1000 inherited tables and I don't see any significant slowdowns. I only ever access a single inherited table at a time in this situation, though. I suppose I am using inheritance only for organization in this case...
> It is variable based on workload and, as I recall, has to do with the planning time. As the number of children increases, so does the planning time.
>
> Joshua D. Drake

Thank you for your response. So if I query only my target child table directly, should planning be just like for any other single table? I have thousands of these tables. (I suppose I'm only using inheritance for the sake of organization in this situation...)

Thanks!
Gerald
Re: [PERFORM] Testing Sandforce SSD
On Mon, Jul 26, 2010 at 03:23:20PM -0600, Greg Spiegelberg wrote: On Mon, Jul 26, 2010 at 1:45 PM, Greg Smith g...@2ndquadrant.com wrote: Yeb Havinga wrote: I did some ext3, ext4, xfs, jfs and also ext2 tests on the just-in-memory read/write test (scale 300). No real winners or losers, though ext2 isn't really faster, and the manual fsck "fix (y)?" prompt during boot makes it impractical in its standard configuration. That's what happens every time I try it too. The theoretical benefits of ext2 for hosting PostgreSQL just don't translate into significant performance increases on database-oriented tests, certainly not ones that would justify the downside of having fsck issues come back again. Glad to see that holds true on this hardware too. I know I'm talking development now, but is there a case for a raw pg_xlog block device, to remove the filesystem overhead and guarantee your data is written sequentially every time? For one, I doubt it's a relevant enough efficiency loss compared to a significantly more complex implementation (you can't grow/shrink it, and you have to do more complex, hardware-dependent things like rounding to hardware boundaries and page sizes to stay efficient). For another, my experience is that XLogInsert becomes the bottleneck at a relatively low point, so I don't see much point in improving things at that low level (yet, at least). Where I would like to do some hardware-dependent measuring (because I see significant improvements there) is prefetching for seqscans, indexscans et al. using blktrace... But I currently don't have the time. And it's another topic ;-) Andres
Re: [PERFORM] Testing Sandforce SSD
Greg Spiegelberg wrote: I know I'm talking development now, but is there a case for a raw pg_xlog block device, to remove the filesystem overhead and guarantee your data is written sequentially every time? It's possible to set the PostgreSQL wal_sync_method parameter in the database to open_datasync or open_sync, and if you have an operating system that supports direct writes it will use those and bypass things like the OS write cache. That's close to what you're suggesting, supposedly portable, and it does show some significant benefit when it's properly supported. The problem has been that the synchronous writing code on Linux in particular has never worked right against ext3, and the PostgreSQL code doesn't make the right call at all on Solaris. So there are two popular platforms on which it just plain doesn't work, even though it should. We've gotten reports that there are bleeding-edge Linux kernel and library versions available now that finally fix that issue, and that PostgreSQL automatically takes advantage of them when compiled on one of them. But I'm not aware of any distribution available yet that makes this easy to try out; the paint is still wet on that code, I think. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
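[For reference, the setting Greg describes is a one-line change in postgresql.conf. Whether the direct-write methods actually bypass the OS cache depends on the platform caveats above, so this is something to benchmark rather than assume:]

```
# postgresql.conf fragment -- sketch only; benchmark before relying on it
wal_sync_method = open_datasync    # or open_sync; the default varies by platform
```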
Re: [PERFORM] Big difference in time returned by EXPLAIN ANALYZE SELECT ... AND SELECT ...
Piotr Gasidło qua...@barbara.eu.org writes: Ok. Is there any way to tune PostgreSQL to shorten planning time for such queries? You've got a ridiculously large number of partitions. Use fewer. regards, tom lane
[PERFORM] Linux Filesystems again - Ubuntu this time
I'm involved with the setup of replacement hardware for one of our systems. It is going to be running Ubuntu Lucid Server (kernel 2.6.32, I believe). The issue of filesystems has raised its head again. I note that ext4 is now the default for Lucid; what do folks think about using it: stable enough now? Also, xfs has seen quite a bit of development in these later kernels; any thoughts on that? Cheers Mark P.S.: We are quite keen to move away from ext3, as we have encountered its tendency to hit a wall under heavy load and leave us waiting for kjournald and pdflush to catch up
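[Whichever of ext4 or xfs is chosen, the mount options deserve a look as well. A hypothetical /etc/fstab line (device and mount point made up): noatime trims metadata writes on a busy data directory, and write barriers should normally stay enabled unless the controller has a battery-backed cache.]

```
# /etc/fstab sketch -- adjust the device and mount point to your layout
/dev/sdb1  /var/lib/postgresql  ext4  noatime  0  2
# or, for xfs:
# /dev/sdb1  /var/lib/postgresql  xfs   noatime  0  2
```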