[PERFORM] WAL still kept in pg_xlog even long after heavy workload is done
Hello,

We're encountering some problems with WAL growth in production with PostgreSQL 9.6.3 and 9.6.2. From what I know, a WAL file can either be recycled (and reused) or deleted. We'd like to have better control over the amount of WAL that is kept around; there were a few occasions where we had to resize partitions because pg_xlog grew as much as it did. According to the docs [1] there are some GUC parameters (postgresql.conf) about this. The parameters I've been able to identify are the following:

* wal_keep_segments
* max_wal_size
* min_wal_size

Our WAL grows a lot around the time of our product upgrades (that is, when we upgrade to a new version of our database; not a Pg upgrade, just a newer version of our db schema, plpgsql code etc). As part of this upgrade, we add new columns or run some large UPDATEs on tables as big as 300M rows (in one case we also have one with 1.5B rows).

I am seeing the following in the docs [3]:

    min_wal_size (integer)
    As long as WAL disk usage stays below this setting, old WAL files are
    always recycled for future use at a checkpoint, rather than removed.
    This can be used to ensure that enough WAL space is reserved to handle
    spikes in WAL usage, for example when running large batch jobs. The
    default is 80 MB. This parameter can only be set in the postgresql.conf
    file or on the server command line.

This sounds very familiar, because that's essentially what we're doing: there are some large jobs that cause a lot of workload and changes and generate a lot of WAL. So far, the way I interpret this is that min_wal_size is the amount of WAL recycled (that is, kept around to be reused) and max_wal_size is the total amount of WAL allowed to be kept on disk.

I would also like to interpret the current values of min_wal_size and max_wal_size. If I run the following query:

    psql -c "select name, setting from pg_settings where name like '%wal_size';"

I get the following:

    max_wal_size|2097152
    min_wal_size|1048576

Do these two values look ok?
Both these values were generated by pgtune [4], but it seems pgtune assumed they are expressed in KB. Looking at the PostgreSQL source, these two parameters appear to be expressed in MB; at least that's what I understand when I see GUC_UNIT_MB in the code [6]. So maybe the pgtune fork we're using has a bug, in the sense that it produces values for those two parameters in KB when they should be in MB, and therefore much higher than they should be.

Another question is: how can I use any of the checkpoint settings to control the WAL that is kept around?

* checkpoint_timeout
* checkpoint_completion_target
* checkpoint_flush_after
* checkpoint_warning

=

I actually tried something with these settings in a test environment. I used the following settings:

    checkpoint_timeout = 40s
    min_wal_size = 600MB
    max_wal_size = 900MB

Then I created a db named x1 and ran this on it four or five times:

    pgbench -i -s 70 x1

The pg_xlog directory grew to 2.2G, and after a few minutes it decreased to 2.0G. After about 40 minutes it decreased to 1.4G, and it's not going any lower. I was expecting pg_xlog's size to be 600MB after the first WAL removal had run. Should I expect that the size will eventually drop to 600MB, or will it just sit there at 1.4G?

=

Other thoughts: I have looked a bit at the Pg internals too. I'm seeing four functions there that are responsible for removing WAL: XLogArchiveIsReady, RemoveXlogFile, RemoveOldXlogFiles and XLOGfileslop. All of these belong to src/backend/access/transam/xlog.c. The only place in the code that seems to take a decision about how much WAL to recycle and how much to remove is the function XLOGfileslop [2]. It seems that XLOGfileslop is an estimate of the number of WAL segments to keep around (recycled WAL). Both max_wal_size and min_wal_size are used inside XLOGfileslop. As far as the checkpoint_* GUC settings go, they seem to be involved as well.
So far, the only thing I know about checkpoints is that many WAL segments are created between checkpoints, and that the amount of WAL between checkpoints can vary. I don't have a good understanding of the interplay between checkpoints and WAL. I'd be grateful for any thoughts on how to improve this and better control the amount of WAL kept in pg_xlog.

Thank you,
Stefan

[1] https://www.postgresql.org/docs/9.6/static/wal-configuration.html
[2] https://github.com/postgres/postgres/blob/0c5803b450e0cc29b3527df3f352e6f18a038cc6/src/backend/access/transam/xlog.c#L2258
[3] https://www.postgresql.org/docs/9.6/static/runtime-config-wal.html#RUNTIME-CONFIG-WAL-CHECKPOINTS
[4] https://github.com/kmatt/pgtune
[5] https://github.com/kmatt/pgtune/blob/master/pgtune#L560
[6] https://github.com/postgres/postgres/blob/f49842d1ee31b976c681322f76025d7732e860f3/src/backend/utils/misc/guc.c#L2268

Stefan Petrea
System Engineer
stefan.pet...@tangoe.com

--
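One way to double-check the suspected pgtune unit bug described above is to read pg_settings together with its unit column, which reports the unit PostgreSQL itself uses for each parameter. This is a sketch, assuming a 9.6 server where these two GUCs are stored in MB (as the GUC_UNIT_MB reference suggests):

```sql
-- Show the WAL sizing parameters together with the unit the server reports.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('min_wal_size', 'max_wal_size', 'wal_keep_segments');

-- If min_wal_size/max_wal_size come back as 1048576 and 2097152 with unit 'MB',
-- they actually mean 1 TB and 2 TB respectively, which would confirm the
-- KB-vs-MB mixup. pg_size_pretty makes that obvious:
SELECT name, pg_size_pretty(setting::bigint * 1024 * 1024) AS effective_size
FROM pg_settings
WHERE name IN ('min_wal_size', 'max_wal_size');
```

If the MB interpretation is right, resetting them to the intended values (for example `ALTER SYSTEM SET max_wal_size = '2GB';` followed by `SELECT pg_reload_conf();`) should let checkpoints cap pg_xlog growth near max_wal_size.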
Re: [PERFORM] anti-join with small table via text/varchar cannot estimate rows correctly
On 02.03.2017 02:06, Tom Lane wrote:
> Stefan Andreatta <s.andrea...@synedra.com> writes:
>> The same anti-join using the text fields, however, estimates just 1
>> resulting row, while there are still of course 9,999 of them:
>>
>> =# explain analyze
>>      select tmp_san_1.id
>>      from tmp_san_1
>>      left join tmp_san_2 on tmp_san_1.text = tmp_san_2.text
>>      where tmp_san_2.id is null;
>
> That is not an anti-join. To make it one, you have to constrain the RHS
> join column to be IS NULL, not some random other column. Note the join
> type isn't getting shown as Anti:
>
>   Hash Left Join  (cost=1.02..192.53 rows=1 width=4) (actual time=0.020..3.091 rows= loops=1)
>
> As written, the query could return some rows that weren't actually
> antijoin rows, ie tmp_san_1.text *did* have a match in tmp_san_2, but
> that row chanced to have a null value of id. Possibly the planner could
> be smarter about estimating for this case, but it doesn't look much like
> a typical use-case to me.
>
>             regards, tom lane

Thanks a lot! Right, my problem had nothing to do with the type of the join field, but with the selection of the proper field for the NULL condition. So, even a join on the id field is badly estimated if checked on the text field:

=# EXPLAIN ANALYZE
     SELECT tmp_san_1.id
     FROM tmp_san_1
     LEFT JOIN tmp_san_2 ON tmp_san_1.id = tmp_san_2.id
     WHERE (tmp_san_2.text IS NULL);

                                  QUERY PLAN
---------------------------------------------------------------------------
 Hash Left Join  (cost=1.02..192.53 rows=1 width=4) (actual time=0.019..2.939 rows= loops=1)
   Hash Cond: (tmp_san_1.id = tmp_san_2.id)
   Filter: (tmp_san_2.text IS NULL)
   Rows Removed by Filter: 1
   ->  Seq Scan on tmp_san_1  (cost=0.00..154.00 rows=1 width=4) (actual time=0.007..1.003 rows=1 loops=1)
   ->  Hash  (cost=1.01..1.01 rows=1 width=6) (actual time=0.004..0.004 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on tmp_san_2  (cost=0.00..1.01 rows=1 width=6) (actual time=0.001..0.002 rows=1 loops=1)
 Planning time: 0.062 ms
 Execution time: 3.381 ms
(10 rows)

...
but if the join and the check refer to the same field, everything is fine:

=# EXPLAIN ANALYZE
     SELECT tmp_san_1.id
     FROM tmp_san_1
     LEFT JOIN tmp_san_2 ON tmp_san_1.id = tmp_san_2.id
     WHERE (tmp_san_2.id IS NULL);

                                  QUERY PLAN
---------------------------------------------------------------------------
 Hash Anti Join  (cost=1.02..281.26 rows= width=4) (actual time=0.018..2.672 rows= loops=1)
   Hash Cond: (tmp_san_1.id = tmp_san_2.id)
   ->  Seq Scan on tmp_san_1  (cost=0.00..154.00 rows=1 width=4) (actual time=0.007..0.962 rows=1 loops=1)
   ->  Hash  (cost=1.01..1.01 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on tmp_san_2  (cost=0.00..1.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
 Planning time: 0.051 ms
 Execution time: 3.164 ms
(8 rows)

It gets more interesting again if the text field really could be NULL and I wanted to include those rows. If I just include "OR tmp_san_2.text IS NULL", estimates are off again:

=# EXPLAIN ANALYZE
     SELECT tmp_san_1.id
     FROM tmp_san_1
     LEFT JOIN tmp_san_2 ON tmp_san_1.id = tmp_san_2.id
     WHERE (tmp_san_2.id IS NULL OR tmp_san_2.text IS NULL);

                                  QUERY PLAN
---------------------------------------------------------------------------
 Hash Left Join  (cost=1.02..192.53 rows=1 width=4) (actual time=0.019..2.984 rows= loops=1)
   Hash Cond: (tmp_san_1.id = tmp_san_2.id)
   Filter: ((tmp_san_2.id IS NULL) OR (tmp_san_2.text IS NULL))
   Rows Removed by Filter: 1
   ->  Seq Scan on tmp_san_1  (cost=0.00..154.00 rows=1 width=4) (actual time=0.008..1.024 rows=1 loops=1)
   ->  Hash  (cost=1.01..1.01 rows=1 width=6) (actual time=0.004..0.004 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on tmp_san_2  (cost=0.00..1.01 rows=1 width=6) (actual time=0.001..0.002 rows=1 loops=1)
 Planning time: 0.088 ms
 Execution time: 3.508 ms
(10 rows)

Instead, it seems, I have to move this condition (inverted) into the join clause for the planner to make correct estimates again:

=# EXPLAIN ANALYZE
     SELECT tmp_san_1.id
     FROM tmp_san_1
     LEFT JOIN tmp_san_2 ON tmp_san_1.id = tmp_san_2.id
         AND tmp_san_2.text IS NOT NULL
     WHERE (tmp_san_2.id IS NULL);
QUERY PLAN
[PERFORM] anti-join with small table via text/varchar cannot estimate rows correctly
Hello,

I have encountered a strange problem when doing an anti-join with a very small table via a varchar or text field, as opposed to an integer field. The Postgres version is 9.5.3.

I did some experiments to extract the problem in a simple form. First, generate two tables with a series of numbers, once as integers and once as text. The first table has 10,000 rows, the second table just one:

=# select generate_series(1, 1) as id, generate_series(1,1)::text as text into table tmp_san_1;
SELECT 1
=# select generate_series(1, 1) as id, generate_series(1,1)::text as text into table tmp_san_2;
SELECT 1

=# analyze tmp_san_1;
ANALYZE
=# analyze tmp_san_2;
ANALYZE

=# \d tmp_san_*
   Table "public.tmp_san_1"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 text   | text    |

   Table "public.tmp_san_2"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 text   | text    |

Now I do an anti-join between the two tables via the id field (integer). The number of resulting rows is estimated correctly as 9,999:

=# explain analyze
     select tmp_san_1.id
     from tmp_san_1
     left join tmp_san_2 on tmp_san_1.id = tmp_san_2.id
     where tmp_san_2.id is null;

                                  QUERY PLAN
---------------------------------------------------------------------------
 Hash Anti Join  (cost=1.02..281.26 rows= width=4) (actual time=0.019..2.743 rows= loops=1)
   Hash Cond: (tmp_san_1.id = tmp_san_2.id)
   ->  Seq Scan on tmp_san_1  (cost=0.00..154.00 rows=1 width=4) (actual time=0.007..1.023 rows=1 loops=1)
   ->  Hash  (cost=1.01..1.01 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on tmp_san_2  (cost=0.00..1.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)
 Planning time: 0.138 ms
 Execution time: 3.218 ms
(8 rows)

The same anti-join using the text fields, however, estimates just 1 resulting row, while there are still of course 9,999 of them:

=# explain analyze
     select tmp_san_1.id
     from tmp_san_1
     left join tmp_san_2 on tmp_san_1.text = tmp_san_2.text
     where tmp_san_2.id is null;

                                  QUERY PLAN
---------------------------------------------------------------------------
 Hash Left Join  (cost=1.02..192.53 rows=1 width=4) (actual time=0.020..3.091 rows= loops=1)
   Hash Cond: (tmp_san_1.text = tmp_san_2.text)
   Filter: (tmp_san_2.id IS NULL)
   Rows Removed by Filter: 1
   ->  Seq Scan on tmp_san_1  (cost=0.00..154.00 rows=1 width=8) (actual time=0.008..0.983 rows=1 loops=1)
   ->  Hash  (cost=1.01..1.01 rows=1 width=6) (actual time=0.004..0.004 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on tmp_san_2  (cost=0.00..1.01 rows=1 width=6) (actual time=0.002..0.002 rows=1 loops=1)
 Planning time: 0.173 ms
 Execution time: 3.546 ms
(10 rows)

I cannot explain that behavior, and much less think of a fix or workaround. Unfortunately my real-world example has to use varchar for the join.

Thanks for any help,
Stefan
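One rewrite that sidesteps this whole class of problem is to express the anti-join explicitly with NOT EXISTS, which the planner recognizes as an anti-join regardless of which column carries the NULL check. This is a sketch against the tmp_san_* tables above; the claim that NOT EXISTS plans as an anti-join is standard planner behavior, not something tested in this thread:

```sql
-- Expressed as NOT EXISTS, this is unambiguously an anti-join on the
-- text column, so the estimate should come out near 9,999 rather than 1.
SELECT t1.id
FROM tmp_san_1 t1
WHERE NOT EXISTS (
    SELECT 1
    FROM tmp_san_2 t2
    WHERE t2.text = t1.text
);
```

It also avoids the subtle correctness trap Tom points out: with LEFT JOIN ... WHERE other_column IS NULL, a matching row that happens to have a NULL in the checked column would wrongly survive the filter.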
[PERFORM] Re: [PERFORM] Re: Query > 1000× slowdown after adding datetime comparison
2015-08-31 21:46 GMT+02:00 twoflower wrote:
> I created a new boolean column and filled it for every row in DOCUMENT with
> *(doc.date_last_updated >= date(now() - '171:00:00'::interval))*, reanalyzed ...

... and have you put an index on that new boolean column (say "updated")?

    CREATE INDEX index_name ON some_table (boolean_field);

or tried a conditional index like

    CREATE INDEX index_name ON some_table (some_field) WHERE boolean_field;

-S.

2015-08-31 21:46 GMT+02:00 twoflower:
> David G Johnston wrote
>> What happens if you pre-compute the date condition and hard code it?
>
> I created a new boolean column and filled it for every row in DOCUMENT with
> *(doc.date_last_updated >= date(now() - '171:00:00'::interval))*, reanalyzed
> the table and modified the query to just compare this column to TRUE. I
> expected this to be very fast, considering that a (to me, anyway) similar
> query also containing a constant value comparison finishes immediately.
> However, the query is running now for 4 minutes already. That's really
> interesting.
>
> --
> View this message in context:
> http://postgresql.nabble.com/Query-1-000-000-slowdown-after-adding-datetime-comparison-tp5864045p5864088.html
> Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Re: [PERFORM] Re: Query > 1000× slowdown after adding datetime comparison
So, if I'm understanding you correctly, we're talking solely about the following clause in the query you gave initially:

    WHERE doc.date_last_updated >= date(now() - '171:00:00'::interval)

which initially was

    WHERE documenttype = 4

and now is being replaced by a temporary (I'd say derived) column:

    WHERE updated

? In any case, I have to go, but run your queries through http://explain.depesz.com/ and give a weblink to the explain plans.

-S.

2015-08-31 22:30 GMT+02:00 twoflower:
> I did not. I wanted to compare this query to the one I tried before, having
> *documenttype = 4* as the sole condition. That one was very fast and the
> *documenttype* was not indexed either.
>
> But this query, using the new temporary column, still runs, after 48
> minutes...
>
> --
> View this message in context:
> http://postgresql.nabble.com/Query-1-000-000-slowdown-after-adding-datetime-comparison-tp5864045p5864101.html
> Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
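For the plan-sharing step suggested above, the usual recipe is to capture the full plan with buffer counts and paste the output to explain.depesz.com. A minimal sketch; `document` and the `updated` column are the hypothetical names from this thread:

```sql
-- EXPLAIN (ANALYZE, BUFFERS) runs the query and reports actual row counts,
-- timings and buffer hits/reads per plan node -- exactly what the site
-- needs to highlight the expensive steps.
EXPLAIN (ANALYZE, BUFFERS)
SELECT doc.id
FROM document doc
WHERE doc.updated;   -- the precomputed boolean column
```

Since the query here runs for tens of minutes, plain `EXPLAIN` (without ANALYZE) is a safer first step to see whether the plan even uses an index before paying for a full execution.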
Re: [PERFORM] Strange performance problem with query
Hi Kevin,

Thanks for the advice. I opted for setting random_page_cost a bit lower, as that made the most sense in the context of the current setup, where there is quite a high cache hit ratio. Is 97% high enough?

    =# SELECT 'cache hit rate' AS name,
              sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) AS ratio
       FROM pg_statio_user_tables;
          name      |         ratio
    ----------------+------------------------
     cache hit rate | 0.97344836172381212996

When I set random_page_cost down from 4 to 2, the query plan changes to the faster one.

Kind Regards
Stefan
Cell : 072-380-1479
Desk : 087-577-7241

On 2014/09/15 03:25 PM, Kevin Grittner wrote:
> Van Der Berg, Stefan svanderb...@fnb.co.za wrote:
>> I get a similar plan selected on the original query if I set
>> enable_seqscan to off. I much prefer the second result.
>> My questions are:
>> 1. Why is this happening?
>
> Your cost factors don't accurately model actual costs.
>
>> 2. How can I encourage the behavior of the second query without
>> changing the original query?
>
> You didn't give enough information to really give solid advice, but
> when people see what you are seeing, some common tuning needed is:
>
> Set shared_buffers to about 25% of system RAM or 8GB, whichever is lower.
> Set effective_cache_size to 50% to 75% of system RAM.
> Set work_mem to about 25% of system RAM divided by max_connections.
> If you have a high cache hit ratio (which you apparently do) reduce
> random_page_cost, possibly to something near or equal to seq_page_cost.
> Increase cpu_tuple_cost, perhaps to 0.03.
>
> You might want to play with the above, and if you still have a problem,
> read this page and post with more detail:
> http://wiki.postgresql.org/wiki/SlowQueryQuestions
>
>> Is there some column level setting I can set?
>
> The statistics looked pretty accurate, so that shouldn't be necessary.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

To read FirstRand Bank's Disclaimer for this email click on the following address or copy into your Internet browser: https://www.fnb.co.za/disclaimer.html If you are unable to access the Disclaimer, send a blank e-mail to firstrandbankdisclai...@fnb.co.za and we will send you a copy of the Disclaimer.
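The settings Kevin lists can be tried per-session before being committed to postgresql.conf, which keeps experiments reversible. A sketch; the concrete values are assumptions to adapt, not recommendations from this thread:

```sql
-- Session-local experiment: lower random_page_cost toward seq_page_cost
-- for a well-cached workload, and raise cpu_tuple_cost as suggested.
SET random_page_cost = 2;     -- default is 4
SET cpu_tuple_cost = 0.03;    -- default is 0.01
-- ...then re-run EXPLAIN ANALYZE on the slow query and compare plans.

-- If the new plan holds up, persist the setting cluster-wide (9.4+):
ALTER SYSTEM SET random_page_cost = 2;
SELECT pg_reload_conf();
```

The advantage over enable_seqscan=off is that cost parameters reshape the planner's model everywhere instead of forbidding one node type outright.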
[PERFORM] Strange performance problem with query
Hi All,

Please see the output from the following query analysis:

=# explain analyze select count(1)
   from jbpmprocess.jbpm_taskinstance ti
   join jbpmprocess.jbpm_task task on (ti.task_ = task.id_)
   join jbpmprocess.jbpm_processinstance pi on ti.procinst_ = pi.id_
   where ti.isopen_ = true;

                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=47372.04..47372.05 rows=1 width=0) (actual time=647.070..647.071 rows=1 loops=1)
   ->  Hash Join  (cost=44806.99..47336.72 rows=14127 width=0) (actual time=605.077..645.410 rows=20359 loops=1)
         Hash Cond: (ti.task_ = task.id_)
         ->  Hash Join  (cost=44779.80..47115.28 rows=14127 width=8) (actual time=604.874..640.541 rows=20359 loops=1)
               Hash Cond: (ti.procinst_ = pi.id_)
               ->  Index Scan using idx_task_instance_isopen on jbpm_taskinstance ti  (cost=0.00..1995.84 rows=22672 width=16) (actual time=0.011..16.606 rows=20359 loops=1)
                     Index Cond: (isopen_ = true)
                     Filter: isopen_
               ->  Hash  (cost=28274.91..28274.91 rows=1320391 width=8) (actual time=604.601..604.601 rows=1320391 loops=1)
                     Buckets: 262144  Batches: 1  Memory Usage: 51578kB
                     ->  Seq Scan on jbpm_processinstance pi  (cost=0.00..28274.91 rows=1320391 width=8) (actual time=0.004..192.166 rows=1320391 loops=1)
         ->  Hash  (cost=18.75..18.75 rows=675 width=8) (actual time=0.196..0.196 rows=675 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 27kB
               ->  Seq Scan on jbpm_task task  (cost=0.00..18.75 rows=675 width=8) (actual time=0.003..0.106 rows=675 loops=1)
 Total runtime: 652.266 ms
(15 rows)

I'm not sure why the planner insists on doing the sequential scan on jbpm_processinstance even though the 22672 rows from jbpm_taskinstance it has to match against are only about 1% of the number of rows in jbpm_processinstance. So far I think it is because the values in procinst_ of jbpm_taskinstance are not entirely unique.
The very strange thing though is the way the query plan changes if I repeat the where clause:

explain analyze select count(1)
from jbpmprocess.jbpm_taskinstance ti
join jbpmprocess.jbpm_task task on (ti.task_ = task.id_)
join jbpmprocess.jbpm_processinstance pi on ti.procinst_ = pi.id_
where ti.isopen_ = true and ti.isopen_ = true;

                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=2074.61..2074.62 rows=1 width=0) (actual time=80.126..80.126 rows=1 loops=1)
   ->  Hash Join  (cost=27.19..2074.24 rows=151 width=0) (actual time=0.217..77.959 rows=20359 loops=1)
         Hash Cond: (ti.task_ = task.id_)
         ->  Nested Loop  (cost=0.00..2044.97 rows=151 width=8) (actual time=0.016..71.429 rows=20359 loops=1)
               ->  Index Scan using idx_task_instance_isopen on jbpm_taskinstance ti  (cost=0.00..29.72 rows=243 width=16) (actual time=0.012..16.928 rows=20359 loops=1)
                     Index Cond: ((isopen_ = true) AND (isopen_ = true))
                     Filter: (isopen_ AND isopen_)
               ->  Index Scan using jbpm_processinstance_pkey on jbpm_processinstance pi  (cost=0.00..8.28 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=20359)
                     Index Cond: (id_ = ti.procinst_)
         ->  Hash  (cost=18.75..18.75 rows=675 width=8) (actual time=0.196..0.196 rows=675 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 27kB
               ->  Seq Scan on jbpm_task task  (cost=0.00..18.75 rows=675 width=8) (actual time=0.002..0.107 rows=675 loops=1)
 Total runtime: 80.170 ms

I get a similar plan selected on the original query if I set enable_seqscan to off. I much prefer the second result. My questions are:

1. Why is this happening?
2. How can I encourage the behavior of the second query without changing the original query? Is there some column level setting I can set?

(BTW the tables are analyzed, and I currently have no special settings/attributes set for any of the tables.)
--
Kind Regards
Stefan
Cell : 072-380-1479
Desk : 087-577-7241
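To compare the two plans without editing the query or the server config, the planner can be steered for a single transaction. This is a sketch; enable_seqscan is a blunt diagnostic tool rather than a production setting, which is why SET LOCAL inside a rolled-back transaction is used here:

```sql
BEGIN;
SET LOCAL enable_seqscan = off;   -- affects only this transaction
EXPLAIN ANALYZE
SELECT count(1)
FROM jbpmprocess.jbpm_taskinstance ti
JOIN jbpmprocess.jbpm_task task ON ti.task_ = task.id_
JOIN jbpmprocess.jbpm_processinstance pi ON ti.procinst_ = pi.id_
WHERE ti.isopen_ = true;
ROLLBACK;                         -- the setting reverts automatically
```

If the nested-loop plan wins under this setting, the longer-term fix is usually adjusting cost parameters (random_page_cost, effective_cache_size) rather than leaving enable_seqscan off.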
[PERFORM] Seqscan on big table, when an Index-Usage should be possible
Hello,

I'm currently testing some queries on data which I imported from another database system into Postgres 9.4. After the import I created the indexes and ran an analyze and vacuum. I also played a little bit with seq_page_cost and random_page_cost. But currently I have no clue which parameter I have to adjust to get a query time like the example with 'enable_seqscan=off'.

Stefan

pd=# set enable_seqscan=off;
pd=# explain analyze
     select t.name
     from product p
     left join measurements m on p.productid=m.productid
     inner join measurementstype t on m.measurementstypeid=t.measurementstypeid
     where p.timestamp between '2013-02-01 15:00:00' and '2013-02-05 21:30:00'
     group by t.name;

                                        QUERY PLAN
------------------------------------------------------------------------------------------
 HashAggregate  (cost=200380892.01..200380936.43 rows=4442 width=16) (actual time=34428.335..34428.693 rows=656 loops=1)
   Group Key: t.name
   ->  Hash Join  (cost=8995.44..200361772.19 rows=7647926 width=16) (actual time=103.670..30153.958 rows=5404751 loops=1)
         Hash Cond: (m.measurementstypeid = t.measurementstypeid)
         ->  Nested Loop  (cost=8279.61..200188978.03 rows=7647926 width=4) (actual time=75.939..22488.725 rows=5404751 loops=1)
               ->  Bitmap Heap Scan on product p  (cost=8279.03..662659.76 rows=526094 width=8) (actual time=75.903..326.850 rows=368494 loops=1)
                     Recheck Cond: ((timestamp >= '2013-02-01 15:00:00'::timestamp without time zone) AND (timestamp <= '2013-02-05 21:30:00'::timestamp without time zone))
                     Heap Blocks: exact=3192
                     ->  Bitmap Index Scan on product_timestamp  (cost=0.00..8147.51 rows=526094 width=0) (actual time=75.050..75.050 rows=368494 loops=1)
                           Index Cond: ((timestamp >= '2013-02-01 15:00:00'::timestamp without time zone) AND (timestamp <= '2013-02-05 21:30:00'::timestamp without time zone))
               ->  Index Scan using measurements_productid on measurements m  (cost=0.58..347.12 rows=3214 width=12) (actual time=0.018..0.045 rows=15 loops=368494)
                     Index Cond: (productid = p.productid)
         ->  Hash  (cost=508.91..508.91 rows=16554 width=20) (actual time=27.704..27.704 rows=16554 loops=1)
               Buckets: 2048  Batches: 1  Memory Usage: 686kB
               ->  Index Scan using measurementstype_pkey on measurementstype t  (cost=0.29..508.91 rows=16554 width=20) (actual time=0.017..15.719 rows=16554 loops=1)
 Planning time: 2.176 ms
 Execution time: 34429.080 ms
(17 Zeilen)

Zeit: 34432,187 ms

pd=# set enable_seqscan=on;
SET
Zeit: 0,193 ms
pd=# explain analyze
     select t.name
     from product p
     left join measurements m on p.productid=m.productid
     inner join measurementstype t on m.measurementstypeid=t.measurementstypeid
     where p.timestamp between '2013-02-01 15:00:00' and '2013-02-05 21:30:00'
     group by t.name;

                                        QUERY PLAN
------------------------------------------------------------------------------------------
 HashAggregate  (cost=108645282.49..108645326.91 rows=4442 width=16) (actual time=5145182.269..5145182.656 rows=656 loops=1)
   Group Key: t.name
   ->  Hash Join  (cost=671835.40..108626162.68 rows=7647926 width=16) (actual time=2087822.232..5141351.539 rows=5404751 loops=1)
         Hash Cond: (m.measurementstypeid = t.measurementstypeid)
         ->  Hash Join  (cost=671291.94..108453540.88 rows=7647926 width=4) (actual time=2087800.816..5134312.822 rows=5404751 loops=1)
               Hash Cond: (m.productid = p.productid)
               ->  Seq Scan on measurements m  (cost=0.00..49325940.08 rows=2742148608 width=12) (actual time=0.007..2704591.045 rows=2742146806 loops=1)
               ->  Hash  (cost=662659.76..662659.76 rows=526094 width=8) (actual time=552.480..552.480 rows=368494 loops=1)
                     Buckets: 16384  Batches: 4  Memory Usage: 2528kB
                     ->  Bitmap Heap Scan on product p  (cost=8279.03..662659.76 rows=526094 width=8) (actual time=73.353..302.482 rows=368494 loops=1)
                           Recheck Cond: ((timestamp >= '2013-02-01 15:00:00'::timestamp without time zone) AND (timestamp <= '2013-02-05 21:30:00'::timestamp without time zone))
                           Heap Blocks: exact=3192
                           ->  Bitmap Index Scan on product_timestamp  (cost=0.00..8147.51 rows=526094 width=0) (actual time=72.490..72.490 rows=368494 loops=1)
                                 Index Cond: ((timestamp >= '2013-02-01 15:00:00'::timestamp without time zone) AND (timestamp <= '2013-02-05 21:30:00'::timestamp without time zone))
         ->  Hash  (cost=336.54..336.54 rows=16554
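Since the fast plan is the one that probes the measurements index ~368k times, the knob with broader effect than enable_seqscan is lowering random_page_cost (and telling the planner how much cache is available) so the index probes stop being overpriced. A per-session sketch; the values are assumptions to tune, not tested recommendations:

```sql
-- Assumption: most of the working set is cached, so random reads are cheap.
SET random_page_cost = 1.5;        -- default 4 models mostly-uncached storage
SET effective_cache_size = '8GB';  -- assumption: set to RAM available for caching
-- Then re-run EXPLAIN ANALYZE on the join and check whether the
-- measurements_productid index scan is chosen without enable_seqscan=off.
```

If the plan flips with these settings, they can be made permanent in postgresql.conf instead of disabling sequential scans globally.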
Re: [PERFORM] Getting query plan alternatives from query planner?
Hi Craig and Shawn,

I fully agree with your argumentation. Who is the elephant in the room that is so reluctant to introduce explicit hints?

-S.

2014-04-14 17:35 GMT+02:00 Craig James cja...@emolecules.com:

Shaun Thomas stho...@optionshouse.com wrote:

  these issues tend to get solved through optimization fences. Reorganize a query into a CTE, or use the (gross) OFFSET 0 trick. How are these nothing other than unofficial hints?

Yeah, the cognitive dissonance levels get pretty high around this issue. Some of the same people who argue strenuously against adding hints about what plan should be chosen also argue against having clearly equivalent queries optimize to the same plan, because they find the fact that they don't useful for coercing a decent plan sometimes. That amounts to a hint, but an obscure and undocumented one. (The OP may be wondering what this OFFSET 0 trick is, and how he can use it.)

+1. I've said this or something like it at least a half-dozen times. Postgres DOES have hints; they're just obscure, undocumented and hard to use. If a developer chooses to use them, they become embedded in the app and forgotten. They're hard to find because there's nothing explicit in the SQL to look for. You have to know to look for things like OFFSET or SET. Five years down the road, when the developer is long gone, who's going to know why "... OFFSET 0" was put in the code unless the developer made careful comments?

With explicit, documented hints, one could search for hints of a particular type should the optimizer improve to the point where they are no longer needed. It is harder to do that with subtle differences in syntax choice. Figuring out which CTEs or LIMITs were chosen because they caused optimization barriers rather than for their semantic merit takes some effort.

Exactly. I'll make a bet here.
I'll bet that the majority of large Postgres installations have at least one, probably several, SQL statements that have been hinted in some way, either with CTEs or LIMITs, or by using SET to disable a particular query type, and that these hints are critical to the system's performance. The question is not whether to have hints. The question is how to expose hints to users. Craig
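For readers wondering what the tricks under discussion actually look like: both rely on optimization fences. A generic sketch with made-up table names; note that in the era of this thread CTEs were always fences, while from PostgreSQL 12 onward the `MATERIALIZED` keyword is needed to force that behavior:

```sql
-- OFFSET 0 fence: a no-op LIMIT clause that prevents the subquery from
-- being flattened into the outer query, pinning the subquery's plan.
SELECT *
FROM (SELECT * FROM orders WHERE status = 'open' OFFSET 0) o
JOIN customers c ON c.id = o.customer_id;

-- CTE fence: the CTE is planned and evaluated on its own.
-- (Implicit before v12; v12+ requires AS MATERIALIZED to keep the fence.)
WITH open_orders AS (
    SELECT * FROM orders WHERE status = 'open'
)
SELECT *
FROM open_orders o
JOIN customers c ON c.id = o.customer_id;
```

Which illustrates the thread's point exactly: nothing in either query says "hint" — the fence is invisible unless you already know the idiom.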
Re: [PERFORM] slow join not using index properly
to get past, so I think it might be invalid:

    ERROR:  syntax error at or near "UNION"
    LINE 8:     UNION (
                ^

So I landed on the version that I posted above, which seems to select the same set in all of the cases that I tried. Anyway, thanks again for taking a stab at helping, I do appreciate it. If you have any other ideas that might be of help I'd certainly be happy to hear them.

Take care,
/Stefan

On Thu, Mar 20, 2014 at 11:02 PM, Ilya Kosmodemiansky ilya.kosmodemian...@postgresql-consulting.com wrote:
> Hi Stefan!
>
> Probably you need to rewrite your query like this (check it first):
>
>     with RECURSIVE qq(cont_key, anc_key) as (
>         select min(a1.context_key), ancestor_key
>         from virtual_ancestors a1
>         union
>         select (SELECT a1.context_key, ancestor_key
>                 FROM virtual_ancestors a1
>                 where context_key > cont_key
>                 order by context_key limit 1)
>         from qq
>         where cont_key is not null
>     )
>     select a1.cont_key
>     from qq a1, collection_data, virtual_ancestors a2
>     WHERE a1.anc_key = collection_data.context_key
>       AND collection_data.collection_context_key = a2.context_key
>       AND a2.ancestor_key = ?
>
> best regards, Ilya

On Fri, Mar 21, 2014 at 12:56 AM, Stefan Amshey srams...@gmail.com wrote:
> We have a slow performing query that we are trying to improve, and it
> appears to be performing a sequential scan at a point where it should be
> utilizing an index. Can anyone tell me why postgres is opting to do it
> this way?
>
> The original query is as follows:
>
>     SELECT DISTINCT a1.context_key
>     FROM virtual_ancestors a1, collection_data, virtual_ancestors a2
>     WHERE a1.ancestor_key = collection_data.context_key
>       AND collection_data.collection_context_key = a2.context_key
>       AND a2.ancestor_key = ?
The key relationships should all be using indexed columns, but the query plan that postgres comes up with ends up performing a sequential scan on the collection_data table (in this case about 602k rows) where we would have expected it to utilize the index:

 HashAggregate  (cost=60905.73..60935.73 rows=3000 width=4) (actual time=3366.165..3367.354 rows=3492 loops=1)
   Buffers: shared hit=16291 read=1222
   ->  Nested Loop  (cost=17546.26..60898.23 rows=3000 width=4) (actual time=438.332..3357.918 rows=13037 loops=1)
         Buffers: shared hit=16291 read=1222
         ->  Hash Join  (cost=17546.26..25100.94 rows=98 width=4) (actual time=408.554..415.767 rows=2092 loops=1)
               Hash Cond: (a2.context_key = collection_data.collection_context_key)
               Buffers: shared hit=4850 read=3
               ->  Index Only Scan using virtual_ancestors_pkey on virtual_ancestors a2  (cost=0.00..233.32 rows=270 width=4) (actual time=8.532..10.703 rows=1960 loops=1)
                     Index Cond: (ancestor_key = 1072173)
                     Heap Fetches: 896
                     Buffers: shared hit=859 read=3
               ->  Hash  (cost=10015.56..10015.56 rows=602456 width=8) (actual time=399.708..399.708 rows=602570 loops=1)
                     Buckets: 65536  Batches: 1  Memory Usage: 23538kB
                     Buffers: shared hit=3991
                     ## sequential scan occurs here ##
                     ->  Seq Scan on collection_data  (cost=0.00..10015.56 rows=602456 width=8) (actual time=0.013..163.509 rows=602570 loops=1)
                           Buffers: shared hit=3991
         ->  Index Only Scan using virtual_ancestors_pkey on virtual_ancestors a1  (cost=0.00..360.70 rows=458 width=8) (actual time=1.339..1.403 rows=6 loops=2092)
               Index Cond: (ancestor_key = collection_data.context_key)
               Heap Fetches: 7067
               Buffers: shared hit=11441 read=1219
 Total runtime: 3373.058 ms

The table definitions are as follows:

       Table "public.virtual_ancestors"
    Column    |   Type   | Modifiers
--------------+----------+-----------
 ancestor_key | integer  | not null
 context_key  | integer  | not null
 degree       | smallint | not null
Indexes:
    virtual_ancestors_pkey PRIMARY KEY, btree (ancestor_key, context_key)
    virtual_context_key_idx btree (context_key)
Foreign-key constraints:
    virtual_ancestors_ancestor_key_fkey FOREIGN KEY (ancestor_key) REFERENCES contexts(context_key)
    virtual_ancestors_context_key_fkey FOREIGN KEY (context_key) REFERENCES contexts(context_key)

            Table "public.collection_data"
         Column         |         Type         | Modifiers
------------------------+----------------------+-----------
 collection_context_key | integer              | not null
 context_key            | integer              | not null
 type                   | character varying(1) | not null
 source                 | character varying(1) | not null
Indexes:
    collection_data_context_key_idx btree (context_key
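One detail visible in the plan above: the hash is built by probing collection_data on collection_context_key, a column the listed indexes do not cover (only context_key is indexed). A hedged sketch of an index that would give the planner the option to probe instead of scanning all 602k rows; untested against this schema, so treat it as a hypothesis to verify with EXPLAIN:

```sql
-- Assumption: this join always arrives via collection_context_key,
-- so index that column. Including context_key as a second column
-- lets the join's output be served by an index-only scan.
CREATE INDEX collection_data_collection_context_key_idx
    ON collection_data (collection_context_key, context_key);

ANALYZE collection_data;  -- refresh stats so the planner sees the new index
```

Whether the planner actually prefers the index probe over the hash still depends on how many rows the outer side produces, so re-checking the plan after creating it is essential.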
Re: [PERFORM] Getting query plan alternatives from query planner?
Hi Tom,

You wrote:
> Path alternatives are rejected whenever possible before moving up to the
> next join level, so that what we have rejected is actually just a plan
> fragment in most cases.

Thanks for the quick answer. This sounds like a fair implementation decision. The background for asking this is, of course, that one wants (1) to understand and (2) to influence the optimizer in cases where one thinks that the planner is wrong :-).

So, the bottom line is:
1. PostgreSQL offers no means to understand the planner except EXPLAIN-ing the chosen plan?
2. And there's no road map to introduce planner hinting (like in EnterpriseDB or Ora)?

Regards,
Stefan

2014-03-20 18:08 GMT+01:00 Tom Lane t...@sss.pgh.pa.us:
> Stefan Keller sfkel...@gmail.com writes:
>> I'd like to know from the query planner which query plan alternatives
>> have been generated and rejected. Is this possible?
>
> No, not really. People have occasionally hacked the planner to print
> rejected paths before they're discarded, but there's no convenient way
> to do anything except send the data to the postmaster log, which isn't
> all that convenient. A bigger problem is that people who are asking for
> this typically imagine that the planner generates complete plans before
> rejecting them; which it does not. Path alternatives are rejected
> whenever possible before moving up to the next join level, so that what
> we have rejected is actually just a plan fragment in most cases.
>
>             regards, tom lane
[PERFORM] Getting query plan alternatives from query planner?
Hi, I'd like to know from the query planner which query plan alternatives have been generated and rejected. Is this possible? --Stefan
Re: [PERFORM] FTS performance issue - planner problem identified (but only partially resolved)
Hi Kevin

Well, you're right :-) But my use cases are unspecific by design, since I'm using FTS as a general-purpose function. So I still propose to enhance the planner too, as Tom Lane and your colleague suggest based on repeated similar complaints [1].

Yours, Stefan

[1] http://www.postgresql.org/message-id/ca+tgmozgqbeu2kn305hwds+axw7yp0yn9vzwbsbwa8unst+...@mail.gmail.com

2013/7/29 Kevin Grittner kgri...@ymail.com: Stefan Keller sfkel...@gmail.com wrote: Finally, setting random_page_cost to 1 helps also - but I don't like this setting either. Well, you should learn to like whichever settings best model your actual costs given your level of caching and your workload. ;-) FWIW, I have found page costs less volatile and easier to tune with cpu_tuple_cost increased. I just always start by bumping that to 0.03. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [PERFORM] How to properly index hstore tags column to faster search for keys
Hi Yuri and Radu-Stefan

I wouldn't give up too fast on PostgreSQL! When looking at your query plan, I wonder if one could reformulate the query to compute the ST_DWithin first (assuming you have an index on the node geometries!) before it filters the tags. To investigate that, you could formulate a CTE query [1] which computes the ST_DWithin first.

Yours, Stefan

[1] http://www.postgresql.org/docs/9.2/static/queries-with.html

2013/7/8 Yuri Levinsky yu...@celltick.com: Dear Radu-Stefan, It seems to me that you are trying hard to solve a problem by SQL that probably can't be solved. Take a look please at Apache HBase. You can access HBase from PostgreSQL as well, by utilizing Java or Python for example. Sincerely yours, Yuri Levinsky, DBA Celltick Technologies Ltd., 32 Maskit St., Herzliya 46733, Israel Mobile: +972 54 6107703, Office: +972 9 9710239; Fax: +972 9 9710222

From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Radu-Stefan Zugravu Sent: Monday, July 08, 2013 12:20 PM To: Richard Huxton Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] How to properly index hstore tags column to faster search for keys

Any improvement is welcomed. The overall performance of the application is not very good. It takes about 200 seconds to compute a path for not-so-far start and end points. I want to improve this query as much as I can. How exactly should I post the explain without the index? Do I have to drop all created indexes for the tags column? It takes some time to create them back.

On Mon, Jul 8, 2013 at 11:53 AM, Richard Huxton d...@archonet.com wrote: On 08/07/13 09:31, Radu-Stefan Zugravu wrote: Hi, Thank you for your answer. My EXPLAIN ANALYZE output can be found here: http://explain.depesz.com/s/Wbo.
Thanks. Also, there is a discussion on this subject on dba.stackexchange.com: http://dba.stackexchange.com/questions/45820/how-to-properly-index-hstore-tags-column-to-faster-search-for-keys

Thanks - also useful to know. I can't see anything wrong with your query. Reading it from the bottom upwards: 1. Index used for historic search - builds a bitmap of blocks. 2. Index used for geometry search - builds a bitmap of blocks. 3. See where the bitmaps overlap (BitmapAnd). 4. Grab those disk blocks and find the rows (Bitmap Heap Scan). The whole thing takes under 20ms - what sort of time were you hoping for? The bulk of it (15ms) is taken up locating the historic rows. There are 36351 of those, but presumably most of them are far away on the map. Could you post the explain without the index? I'm curious as to how slow it is just testing the tags after doing the geometry search. -- Richard Huxton Archonet Ltd -- Radu-Stefan Zugravu 0755 950 145 0760 903 464 raduzugrav...@gmail.com radu.zugr...@yahoo.com
[PERFORM] Re: FTS performance issue - planner problem identified (but only partially resolved)
Hi

Sorry, referring to the GiST index in my mail before was not a good idea. The bottom line still is that for the query (as recommended by the docs) the planner doesn't choose the index, which makes it slow - unless the original query...

select id,title,left(content,100) from fulltextsearch where plainto_tsquery('pg_catalog.english','good') @@ to_tsvector('pg_catalog.english',content);

...is reformulated as...

select id,title,left(content,100) from fulltextsearch, plainto_tsquery('pg_catalog.english','good') query where query @@ to_tsvector('pg_catalog.english',content);

...using default values for enable_seqscan and random_page_cost.

Yours, S.

2013/7/19 Stefan Keller sfkel...@gmail.com: Hi At 2013/2/8 I wrote: I have problems with the performance of FTS in a query like this: SELECT * FROM FullTextSearch WHERE content_tsv_gin @@ plainto_tsquery('english', 'good'); It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB zipped). The planner obviously always chooses a table scan. Now, I've identified (but only partially resolved) the issue. Here are my comments: That's the query in question (see commented log below): select id,title,left(content,100) from fulltextsearch where plainto_tsquery('pg_catalog.english','good') @@ to_tsvector('pg_catalog.english',content); After having created the GIN index, the FTS query unexpectedly is fast because the planner chooses a Bitmap Index Scan. After the index statistics have been updated, the same query becomes slow - except when using the trick with the function call in the FROM clause (I think GiST doesn't change anything): select id,title,left(content,100) from fulltextsearch, plainto_tsquery('pg_catalog.english','good') query where query @@ to_tsvector('pg_catalog.english',content); => This hint should be mentioned in the docs! Then, setting enable_seqscan to off makes the original query fast again. But that's a setting I want to avoid in a multi-user database.
Finally, setting random_page_cost to 1 helps also - but I don't like this setting either. => To me, the planner should be updated to recognize the immutable plainto_tsquery() function in the WHERE clause and choose a Bitmap Index Scan in the first place. What do you think?

Yours, Stefan

Let's look at table fulltextsearch:

movies=# \d fulltextsearch
                        Table "public.fulltextsearch"
 Column  |  Type   |                          Modifiers
---------+---------+--------------------------------------------------------------
 id      | integer | not null default nextval('fulltextsearch_id_seq'::regclass)
 docid   | integer | default 0
 title   | text    |
 content | text    | not null

movies=# CREATE INDEX fulltextsearch_gincontent ON fulltextsearch USING gin(to_tsvector('pg_catalog.english',content));

movies=# SELECT * FROM pg_class c WHERE relname LIKE 'fullt%';
  oid   |           name            | kind |   tuples    | pages | allvisible | toastrelid | hasindex
--------+---------------------------+------+-------------+-------+------------+------------+----------
 476289 | fulltextsearch            | r    |       27886 |   555 |          0 |     476293 | t
 503080 | fulltextsearch_gincontent | i    | 8.97135e+06 | 11133 |          0 |          0 | f
 476296 | fulltextsearch_id_seq     | S    |           1 |     1 |          0 |          0 | f
 503075 | fulltextsearch_pkey       | i    |       27886 |    79 |          0 |          0 | f
(4 rows)

=> fulltextsearch_gincontent has an arbitrarily large number of tuples (the statistics are wrong and not yet updated).

movies=# explain (analyze,costs,timing,buffers) select id,title,left(content,100) from fulltextsearch where plainto_tsquery('pg_catalog.english','good') @@ to_tsvector('pg_catalog.english',content);

=> Unexpectedly, the query is fast!
See query plan http://explain.depesz.com/s/ewn

Let's update the statistics:

movies=# VACUUM ANALYZE VERBOSE fulltextsearch;
movies=# SELECT * FROM pg_class c WHERE relname LIKE 'fullt%';
  oid   |           name            | kind | tuples | pages | allvisible | toastrelid | hasindex
--------+---------------------------+------+--------+-------+------------+------------+----------
 476289 | fulltextsearch            | r    |  27886 |   555 |        555 |     476293 | t
 503080 | fulltextsearch_gincontent | i    |  27886 | 11133 |          0 |          0 | f
 476296 | fulltextsearch_id_seq     | S    |      1 |     1 |          0 |          0 | f
 503075 | fulltextsearch_pkey       | i    |  27886 |    79 |          0 |          0 | f
(4 rows)

=> Now, after having updated the statistics (see especially the tuples of fulltextsearch_gincontent), the original query is slow! See query plan http://explain.depesz.com/s/MQ60

Now, let's reformulate the original query and move the function call to plainto_tsquery to the FROM clause:

movies=# explain (analyze,costs,timing,buffers) select id,title,left(content,100) from fulltextsearch, plainto_tsquery('pg_catalog.english','good
Re: [PERFORM] FTS performance issue - planner problem identified (but only partially resolved)
Hi Marc

Thanks a lot for your hint! You mean doing a SET track_counts (true); for the whole session? That would be OK if it were possible just for the GIN index. It's obviously an issue of the planner's cost estimation. The data I'm speaking about (movies) has a text attribute with a length of more than 8K, so it obviously has to do with detoasting. But the thoughts about @@ operators together with this GIN index also seem to be valid. I hope this issue is being tracked in preparation for 9.3.

Regards, Stefan

2013/7/19 Marc Mamin m.ma...@intershop.de: SELECT * FROM FullTextSearch WHERE content_tsv_gin @@ plainto_tsquery('english', 'good'); It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB zipped). The planner obviously always chooses a table scan. Hello, A probable reason for the time difference is the cost of decompressing toasted content. At least in 8.3, the planner was not good at estimating it. I'm getting better overall performance since I've stopped collecting statistics on tsvectors. An alternative would have been to disallow compression on them. I'm aware this is a drastic way and would not recommend it without testing. The benefit may depend on the type of data you are indexing. In our use case these are error logs with many Java stack traces, hence with many lexemes that are poorly discriminative. See: http://www.postgresql.org/message-id/27953.1329434...@sss.pgh.pa.us as a comment on http://www.postgresql.org/message-id/c4dac901169b624f933534a26ed7df310861b...@jenmail01.ad.intershop.net regards, Marc Mamin
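What Marc describes - no longer collecting statistics on the tsvector column - can be expressed per column; a minimal sketch, assuming the tsvector column is the content_tsv_gin column from the query above:

```sql
-- Setting the statistics target to 0 stops ANALYZE from collecting
-- statistics for this column (and from detoasting its values to do so).
ALTER TABLE FullTextSearch ALTER COLUMN content_tsv_gin SET STATISTICS 0;
ANALYZE FullTextSearch;
```

As Marc says, this is drastic and worth testing first; it trades planner information for avoiding the detoasting cost during ANALYZE.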
[PERFORM] FTS performance issue - planner problem identified (but only partially resolved)
Hi

At 2013/2/8 I wrote: I have problems with the performance of FTS in a query like this: SELECT * FROM FullTextSearch WHERE content_tsv_gin @@ plainto_tsquery('english', 'good'); It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB zipped). The planner obviously always chooses a table scan.

Now, I've identified (but only partially resolved) the issue. Here are my comments: That's the query in question (see commented log below): select id,title,left(content,100) from fulltextsearch where plainto_tsquery('pg_catalog.english','good') @@ to_tsvector('pg_catalog.english',content); After having created the GIN index, the FTS query unexpectedly is fast because the planner chooses a Bitmap Index Scan. After the index statistics have been updated, the same query becomes slow - except when using the trick with the function call in the FROM clause (I think GiST doesn't change anything): select id,title,left(content,100) from fulltextsearch, plainto_tsquery('pg_catalog.english','good') query where query @@ to_tsvector('pg_catalog.english',content); => This hint should be mentioned in the docs! Then, setting enable_seqscan to off makes the original query fast again. But that's a setting I want to avoid in a multi-user database. Finally, setting random_page_cost to 1 helps also - but I don't like this setting either. => To me, the planner should be updated to recognize the immutable plainto_tsquery() function in the WHERE clause and choose a Bitmap Index Scan in the first place. What do you think?
Yours, Stefan

Let's look at table fulltextsearch:

movies=# \d fulltextsearch
                        Table "public.fulltextsearch"
 Column  |  Type   |                          Modifiers
---------+---------+--------------------------------------------------------------
 id      | integer | not null default nextval('fulltextsearch_id_seq'::regclass)
 docid   | integer | default 0
 title   | text    |
 content | text    | not null

movies=# CREATE INDEX fulltextsearch_gincontent ON fulltextsearch USING gin(to_tsvector('pg_catalog.english',content));

movies=# SELECT * FROM pg_class c WHERE relname LIKE 'fullt%';
  oid   |           name            | kind |   tuples    | pages | allvisible | toastrelid | hasindex
--------+---------------------------+------+-------------+-------+------------+------------+----------
 476289 | fulltextsearch            | r    |       27886 |   555 |          0 |     476293 | t
 503080 | fulltextsearch_gincontent | i    | 8.97135e+06 | 11133 |          0 |          0 | f
 476296 | fulltextsearch_id_seq     | S    |           1 |     1 |          0 |          0 | f
 503075 | fulltextsearch_pkey       | i    |       27886 |    79 |          0 |          0 | f
(4 rows)

=> fulltextsearch_gincontent has an arbitrarily large number of tuples (the statistics are wrong and not yet updated).

movies=# explain (analyze,costs,timing,buffers) select id,title,left(content,100) from fulltextsearch where plainto_tsquery('pg_catalog.english','good') @@ to_tsvector('pg_catalog.english',content);

=> Unexpectedly, the query is fast! See query plan http://explain.depesz.com/s/ewn

Let's update the statistics:

movies=# VACUUM ANALYZE VERBOSE fulltextsearch;
movies=# SELECT * FROM pg_class c WHERE relname LIKE 'fullt%';
  oid   |           name            | kind | tuples | pages | allvisible | toastrelid | hasindex
--------+---------------------------+------+--------+-------+------------+------------+----------
 476289 | fulltextsearch            | r    |  27886 |   555 |        555 |     476293 | t
 503080 | fulltextsearch_gincontent | i    |  27886 | 11133 |          0 |          0 | f
 476296 | fulltextsearch_id_seq     | S    |      1 |     1 |          0 |          0 | f
 503075 | fulltextsearch_pkey       | i    |  27886 |    79 |          0 |          0 | f
(4 rows)

=> Now, after having updated the statistics (see especially the tuples of fulltextsearch_gincontent), the original query is slow!
See query plan http://explain.depesz.com/s/MQ60

Now, let's reformulate the original query and move the function call to plainto_tsquery to the FROM clause:

movies=# explain (analyze,costs,timing,buffers) select id,title,left(content,100) from fulltextsearch, plainto_tsquery('pg_catalog.english','good') query where query @@ to_tsvector('pg_catalog.english',content);

=> This special query is fast again! See query plan http://explain.depesz.com/s/FVT

Setting enable_seqscan to off makes the query fast again: see query plan http://explain.depesz.com/s/eOr

Finally, setting random_page_cost to 1 helps also (default is 4):

movies=# set enable_seqscan to default;
movies=# set random_page_cost to 1.0;

=> Query is fast. See query plan http://explain.depesz.com/s/M5Ke
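If the enable_seqscan trick has to be used at all, it can at least be confined to a single transaction so that other sessions in a multi-user database are unaffected; a small sketch using standard PostgreSQL behaviour (not from the thread itself):

```sql
BEGIN;
SET LOCAL enable_seqscan = off;   -- reverts automatically at COMMIT/ROLLBACK
SELECT id, title, left(content, 100)
FROM fulltextsearch
WHERE plainto_tsquery('pg_catalog.english', 'good')
      @@ to_tsvector('pg_catalog.english', content);
COMMIT;
```

SET LOCAL scopes the setting to the current transaction, which avoids the concern about changing planner behaviour for everyone.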
Re: [PERFORM] How to properly index hstore tags column to faster search for keys
Hi, Thank you for your answer. My EXPLAIN ANALYZE output can be found here: http://explain.depesz.com/s/Wbo. Also, there is a discussion on this subject on dba.stackexchange.com: http://dba.stackexchange.com/questions/45820/how-to-properly-index-hstore-tags-column-to-faster-search-for-keys

On Mon, Jul 8, 2013 at 10:44 AM, Richard Huxton d...@archonet.com wrote: On 07/07/13 08:28, Radu-Stefan Zugravu wrote: Each node has a geometry column called geom and an hstore column called tags. I need to extract nodes along a line that have certain keys in the tags column. To do that I use the following query: SELECT id, tags FROM nodes WHERE ST_DWithin(nodes.geom, ST_MakeLine('{$geom1}', '{$geom2}'), 0.001) AND tags ? '{$type}'; CREATE INDEX nodes_tags_idx ON nodes USING GIN(tags); After creating the index I searched again for nodes using the same first query, but there is no change in performance. How can I properly use GIN and GIST to index the tags column so I can search faster for nodes that have a certain key in the tags column? Your index definition looks OK. Try showing the output of EXPLAIN ANALYSE for your query - that way we'll see if the index is being used. You can always paste explain output to http://explain.depesz.com/ if it's too long for the email. -- Richard Huxton Archonet Ltd -- Radu-Stefan Zugravu 0755 950 145 0760 903 464 raduzugrav...@gmail.com radu.zugr...@yahoo.com
Re: [PERFORM] How to properly index hstore tags column to faster search for keys
I do call the query for each neighbour node to find which one is better for building my path. I think I will try the first way you mentioned. I also found some references using BTREE indexes: CREATE INDEX nodes_tags_btree_historic_idx on nodes USING BTREE ((tags ? 'historic')); CREATE INDEX nodes_tags_btree_tourist_idx on nodes USING BTREE ((tags ? 'tourist')); Do you think this could make a difference?

On Mon, Jul 8, 2013 at 1:27 PM, Richard Huxton d...@archonet.com wrote: On 08/07/13 10:20, Radu-Stefan Zugravu wrote: Any improvement is welcomed. The overall performance of the application is not very good. It takes about 200 seconds to compute a path for not-so-far start and end points. So you have to call this query 1000 times with different start and end points? I want to improve this query as much as I can. There are only two ways I can see to get this much below 20ms. This will only work if you want a very restricted range of tags. Drop the tag index and create multiple geometry indexes instead: CREATE INDEX node_geo_tourist_idx index details WHERE tags ? 'tourist'; CREATE INDEX node_geo_history_idx index details WHERE tags ? 'history'; etc. This will only work if you have a literal WHERE clause that checks the tag. It should be fast though. The second way would be to delete all the nodes that aren't tagged tourist or history. That assumes you are never interested in them, of course. How exactly should I post the explain without the index? Do I have to drop all created indexes for the tags column? It takes some time to create them back. Not important - I was just curious. -- Richard Huxton Archonet Ltd -- Radu-Stefan Zugravu 0755 950 145 0760 903 464 raduzugrav...@gmail.com radu.zugr...@yahoo.com
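Richard leaves the index details open; filled in with a concrete (hypothetical) GiST definition on the geometry column, his partial-index idea could look like this:

```sql
-- One geometry index per tag of interest; only rows carrying that tag
-- are stored in the index, so the geometry search never touches others.
CREATE INDEX node_geo_tourist_idx  ON nodes USING GIST (geom)
    WHERE tags ? 'tourist';
CREATE INDEX node_geo_historic_idx ON nodes USING GIST (geom)
    WHERE tags ? 'historic';

-- The planner can only use a partial index when the query repeats the
-- tag test literally, e.g.:
SELECT id, tags
FROM nodes
WHERE ST_DWithin(nodes.geom, ST_MakeLine('{$geom1}', '{$geom2}'), 0.001)
  AND tags ? 'tourist';
```

The trade-off is exactly what Richard states: one index per tag value, and no help for queries whose tag is only known at run time as a parameter.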
[PERFORM] How to properly index hstore tags column to faster search for keys
I imported a large area of OpenStreetMap's planet.osm file into a postgresql database. The database contains a table called nodes. Each node has a geometry column called geom and an hstore column called tags. I need to extract nodes along a line that have certain keys in the tags column. To do that I use the following query: SELECT id, tags FROM nodes WHERE ST_DWithin(nodes.geom, ST_MakeLine('{$geom1}', '{$geom2}'), 0.001) AND tags ? '{$type}'; $geom1 and $geom2 are geometries for the start and end points of my line. The $type variable contains the key I want to search for. Now, it can have one of the following values: 'historic' or 'tourist'. The query given above works but it is too slow. I guess searching for a key in the tags column takes too much time. I read about GIN and GIST indexes and I created a GIN index using the following statement: CREATE INDEX nodes_tags_idx ON nodes USING GIN(tags); After creating the index I searched again for nodes using the same first query, but there is no change in performance. How can I properly use GIN and GIST to index the tags column so I can search faster for nodes that have a certain key in the tags column? Thank you, Radu-Stefan
[PERFORM] Slow CTE Query
Hi, I'm experiencing a very slow CTE query (see below). When I split the three aggregations into three separate views, it's decently fast. So I think it's due to the planner. Any ideas for reformulating the query?

These are the tables and views involved:
* Table promotion with start/end date and a region, and table promo2mission (each one to a dozen tuples).
* View all_errors (more than 20'000 tuples, based on table errors without the tuples from table fix).
* Table error_type (7 tuples).

Here's the EXPLAIN ANALYZE log: http://explain.depesz.com/s/tbF

Yours, Stefan

CTE Query: WITH aggregation1 AS (SELECT p.id AS promo_id, p.startdate, p.enddate, p.geom AS promogeom, pm.error_type, pm.mission_extra_coins AS extra_coins FROM (promotion p join promo2mission pm ON (( p.id = pm.promo_id ))) WHERE ( ( p.startdate <= Now() ) AND ( p.enddate >= Now() ) )), aggregation2 AS (SELECT e.error_id AS missionid, e.schemaid, t.TYPE, e.osm_id, e.osm_type, t.description AS title, t.view_type, t.answer_placeholder, t.bug_question AS description, t.fix_koin_count, t.vote_koin_count, e.latitude, e.longitude, e.geom AS missiongeom, e.txt1, e.txt2, e.txt3, e.txt4, e.txt5 FROM all_errors e, error_type t WHERE ( ( e.error_type_id = t.error_type_id ) AND ( NOT ( EXISTS (SELECT 1 FROM fix f WHERE ( ( ( ( f.error_id = e.error_id ) AND ( f.osm_id = e.osm_id ) ) AND ( ( f.schemaid ) :: text = ( e.schemaid ) :: text ) ) AND ( ( f.complete AND f.valid ) OR ( NOT f.complete ) ) )) ) ) )), aggregation3 AS (SELECT ag2.missionid AS missionidtemp, ag1.promo_id, ag1.extra_coins FROM (aggregation2 ag2 join aggregation1 ag1 ON (( ( ag2.TYPE ) :: text = ( ag1.error_type ) :: text ))) WHERE public._st_contains(ag1.promogeom, ag2.missiongeom)) SELECT ag2.missionid AS id, ag2.schemaid, ag2.TYPE, ag2.osm_id, ag2.osm_type, ag2.title, ag2.description, ag2.latitude, ag2.longitude, ag2.view_type, ag2.answer_placeholder, ag2.fix_koin_count, ag2.missiongeom, ag2.txt1, ag2.txt2, ag2.txt3, ag2.txt4, ag2.txt5, ag3.promo_id,
ag3.extra_coins FROM (aggregation2 ag2 left join aggregation3 ag3 ON (( ag2.missionid = ag3.missionidtemp )));
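The "three separate views" variant Stefan reports as fast would look roughly like this (view names are hypothetical; only aggregation1 is spelled out, using the column list from the CTE above):

```sql
CREATE VIEW aggregation1_v AS
SELECT p.id AS promo_id, p.startdate, p.enddate, p.geom AS promogeom,
       pm.error_type, pm.mission_extra_coins AS extra_coins
FROM promotion p
JOIN promo2mission pm ON p.id = pm.promo_id
WHERE p.startdate <= now() AND p.enddate >= now();

-- aggregation2_v and aggregation3_v are created the same way from the
-- corresponding CTE bodies; the final SELECT then joins the views
-- instead of inlining the CTEs.
```

In the PostgreSQL versions of this thread, CTEs act as optimization fences (each WITH term is materialized), whereas view definitions are inlined into the outer query and planned as a whole, which is one plausible reason for the difference Stefan sees.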
Re: [PERFORM] [BUGS] BUG #8130: Hashjoin still gives issues
Dear Tom,

On Wed, 1 May 2013, Tom Lane wrote: What can we do to provide a bit more of information? https://wiki.postgresql.org/wiki/Slow_Query_Questions There is no particularly good reason to think this is a bug; please take it up on pgsql-performance if you have more questions.

I beg to disagree: the select * query and the select b.id query are both run hot, yet they result in fundamentally different query plans (and performance). Combined with the recent bugfix regarding hash estimation, this gives me a good indication that there might be a bug. I am not deep into the query optimiser of PostgreSQL, but given that merely selecting different columns can change an entire query plan (and * is in fact out of the box 30 times faster than b.id), it does. When hashjoin is disabled, the entire query is - depending on the system checked - 2 to 30x faster.

The original query: select * from ambit_privateevent_calendars as a, ambit_privateevent as b, ambit_calendarsubscription as c, ambit_calendar as d where c.calendar_id = d.id and a.privateevent_id = b.id and c.user_id = 1270 and c.calendar_id = a.calendar_id and c.STATUS IN (1, 8, 2, 15, 18, 4, 12, 20) and not b.main_recurrence = true; select b.id from ambit_privateevent_calendars as a, ambit_privateevent as b, ambit_calendarsubscription as c, ambit_calendar as d where c.calendar_id = d.id and a.privateevent_id = b.id and c.user_id = 1270 and c.calendar_id = a.calendar_id and c.STATUS IN (1, 8, 2, 15, 18, 4, 12, 20) and not b.main_recurrence = true; (the two queries are identical except for the select list; the star query is *fastest*)

We compare: http://explain.depesz.com/s/jRx http://explain.depesz.com/s/eKE By setting set enable_hashjoin = off; throughput in our entire application increased 30-fold, which was a bit unexpected but highly appreciated.
The result of the last query after switching to the mergejoin: http://explain.depesz.com/s/AWB It is also visible that once hashjoin is off, the b.id query is faster than the * query (as would be expected). Our test machine is over-budgeted, with 4x the memory of the entire database (~4GB), and uses the stock PostgreSQL settings. Stefan
Re: [PERFORM] FTS performance issue probably due to wrong planner estimate of detoasting
Hi Jesper and Pavel

Thanks for your hints. I'm rather reluctant to do tuning with unwanted side effects; we'll see. I have to set up my system and db again before I can try out your tricks.

Yours, Stefan

2013/2/8 Jesper Krogh jes...@krogh.cc: On 08/02/13 01:52, Stefan Keller wrote: Hi, I have problems with the performance of FTS in a query like this: SELECT * FROM FullTextSearch WHERE content_tsv_gin @@ plainto_tsquery('english', 'good'); It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB zipped). The planner obviously always chooses a table scan: http://explain.depesz.com/s/EEE I have to check again if I'm doing something wrong, but I'm pretty sure it has to do with de-toasting and (wrong?) cost estimations. If you haven't done it: bump up the statistics target on the column and re-analyze, and see what that gives. I have also been playing with the cost numbers in order to get it to favour an index scan more often. That is, lowering random_page_cost to be close to seq_page_cost; depending on your system, the amount of memory, etc., this can have negative side effects on non-GIN queries. -- Jesper
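Jesper's two suggestions, written out as a hedged sketch (the target value of 1000 is an arbitrary example, and the tsvector column name is taken from the query in the thread):

```sql
-- Raise the per-column statistics target and refresh the statistics.
ALTER TABLE FullTextSearch ALTER COLUMN content_tsv_gin SET STATISTICS 1000;
ANALYZE FullTextSearch;

-- Nudge the planner towards index scans by narrowing the gap between
-- random and sequential page costs (watch for side effects on other queries).
SET random_page_cost = 2.0;
```

As Jesper warns, the cost change is system-dependent and can hurt non-GIN queries, so it is best tried per session before being made permanent.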
[PERFORM] FTS performance issue probably due to wrong planner estimate of detoasting
Hi, I have problems with the performance of FTS in a query like this: SELECT * FROM FullTextSearch WHERE content_tsv_gin @@ plainto_tsquery('english', 'good'); It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB zipped). The planner obviously always chooses a table scan: http://explain.depesz.com/s/EEE I have to check again if I'm doing something wrong, but I'm pretty sure it has to do with de-toasting and (wrong?) cost estimations. I've seen some comments here saying that estimating detoasting costs (especially with the @@ operator and a GIN index) is an open issue (for years?). And I found a nice blog post [1] which uses 9.2/9.1 and proposes disabling sequential table scans (SET enable_seqscan = off;). But this is no option for me, since other queries still need seqscans. Can anyone tell me if this is on some agenda here (e.g. as an open item for 9.2)? Yours, Stefan [1] http://palominodb.com/blog/2012/03/06/considerations-about-text-searchs-big-fields-and-planner-costs
Re: [PERFORM] serious under-estimation of n_distinct for clustered distributions
A status update on this problem: 1.) Workarounds (setting n_distinct manually) are tested and - as far as workarounds go - OK. 2.) Source of the problem and possible solution: The source of these troubles is the sampling method employed in src/backend/commands/analyze.c. Judging from Tom Lane's comment for the original implementation in 2004 this has never been thought to be perfect. Does anybody see a chance to improve that part? Should this discussion be taken elsewhere? Is there any input from my side that could help? btw: I do find this problem to be very frequent in our databases. And considering the commonplace conditions leading to it, I would expect many systems to be affected. But searching the forums and the web I hardly found any references to it - which amazes me to no end. Best Regards, Stefan On 12/30/2012 07:02 PM, Stefan Andreatta wrote: On 12/29/2012 10:57 PM, Peter Geoghegan wrote: On 29 December 2012 20:57, Stefan Andreatta s.andrea...@synedra.com wrote: Now, the 2005 discussion goes into great detail on the advantages and disadvantages of this algorithm, particularly when using small sample sizes, and several alternatives are discussed. I do not know whether anything has been changed after that, but I know that the very distinct problem, which I will focus on here, still persists. It's a really hard problem to solve satisfactorily. It's a problem that has been studied in much detail. Yes, the algorithm used is still the same. See the comments within src/backend/commands/analyze.c (IBM Research Report RJ 10025 is referenced there). Thanks a lot for this information! I looked through the code a bit. The Haas Stokes Formula is fine. The problem really lies with the two phase random selection procedure: Starting from line 1039, there is a comment: * As of May 2004 we use a new two-stage method: Stage one selects up * to targrows random blocks (or all blocks, if there aren't so many). 
* Stage two scans these blocks and uses the Vitter algorithm to create * a random sample of targrows rows (or less, if there are less in the * sample of blocks). The two stages are executed simultaneously: each * block is processed as soon as stage one returns its number and while * the rows are read stage two controls which ones are to be inserted * into the sample. * * Although every row has an equal chance of ending up in the final * sample, this sampling method is not perfect: not every possible * sample has an equal chance of being selected. For large relations * the number of different blocks represented by the sample tends to be * too small. We can live with that for now. Improvements are welcome. Now the problem with clustered data is, that the probability of sampling a value twice is much higher when the same page is repeatedly sampled. As stage one takes a random sample of pages, and stage two samples rows from these pages, the probability of visiting the same page twice (or more often) is much higher than if random rows were selected from the whole table. Hence we get a lot more multiple values for clustered data and we end up with the severe under-estimation we can see in those cases. Probabilities do my brain in, as usual, but I tested the procedure for my test data with a simple python script. There is absolutely nothing wrong with the implementation. It seems to be a purely statistical problem. Not everything may be hopeless though ;-) The problem could theoretically be avoided if random rows were selected from the whole table. Again, that may not be feasible - the two phase approach was probably not implemented for nothing. Another possible solution would be to avoid much of the resampling (not all) in phase two. For that - in theory - every page visited would have to get a lower weight, so that revisiting this page is not any more likely as rows were selected from the whole column. That does not sound easy or elegant to implement. 
But perhaps there is some clever algorithm - unfortunately I do not know. The general advice here is: 1) Increase default_statistics_target for the column. 2) If that doesn't help, consider using the following DDL: alter table foo alter column bar set ( n_distinct = 5.0); Yes, I will probably have to live with that for now - I will come back to these workarounds with one or two questions. Thanks again Regards, Stefan -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
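The sampling effect described above can be checked with a short simulation, in the spirit of the "simple python script" mentioned. This is an illustrative sketch, not the actual analyze.c algorithm; the table layout, page geometry and sample sizes are made-up assumptions:

```python
import random

# Sketch (assumptions, not analyze.c itself): compare f1 -- the number of
# values seen exactly once in the sample -- under pure row sampling vs.
# two-stage block sampling, for a clustered column.
ROWS_PER_PAGE = 50           # assumed rows per block
N_PAGES = 10_000             # table of 500,000 rows
DUP = 5                      # each value appears 5 times, stored adjacently
SAMPLE = 3_000               # target sample size

random.seed(42)
table = [i // DUP for i in range(N_PAGES * ROWS_PER_PAGE)]  # clustered values

def f1(sample):
    # count values that occur exactly once in the sample
    counts = {}
    for v in sample:
        counts[v] = counts.get(v, 0) + 1
    return sum(1 for c in counts.values() if c == 1)

# Pure row sampling: rows drawn uniformly from the whole table.
row_sample = random.sample(table, SAMPLE)

# Two-stage: pick random pages first, then sample rows only from those pages.
pages = random.sample(range(N_PAGES), SAMPLE // ROWS_PER_PAGE * 3)
pooled = [v for p in pages
            for v in table[p * ROWS_PER_PAGE:(p + 1) * ROWS_PER_PAGE]]
block_sample = random.sample(pooled, SAMPLE)

print("f1, row sampling:  ", f1(row_sample))
print("f1, block sampling:", f1(block_sample))
```

Because a chosen page contributes several adjacent (equal) values, f1 collapses under block sampling, which is exactly the term that drives the Haas/Stokes estimate down to the sample's distinct count.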
Re: [PERFORM] Simple join doesn't use index
On 01/03/2013 11:54 PM, Alex Vinnik wrote: Don't understand why PG doesn't use views_visit_id_index in that query but rather scans whole table. One explanation I have found that when resulting dataset constitutes ~15% of total number of rows in the table then seq scan is used. In this case resulting dataset is just 1.5% of total number of rows. So it must be something different. Any reason why it happens and how to fix it? But does the query planner know the same? If you added the EXPLAIN ANALYZE output of the query and something like: SELECT tablename AS table_name, attname AS column_name, null_frac, avg_width, n_distinct, correlation FROM pg_stats WHERE tablename in ('views', 'visits'); .. one could possibly tell a bit more. Postgres 9.2 Ubuntu 12.04.1 LTS shared_buffers = 4GB the rest of the settings are default ones There are more than just this one memory related value, that need to be changed for optimal performance. E.g. effective_cache_size can have a direct effect on use of nested loops. See: http://www.postgresql.org/docs/9.2/static/runtime-config-query.html Regards, Stefan -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] serious under-estimation of n_distinct for clustered distributions
On 12/29/2012 10:57 PM, Peter Geoghegan wrote: On 29 December 2012 20:57, Stefan Andreatta s.andrea...@synedra.com wrote: ... The general advice here is: 1) Increase default_statistics_target for the column. I tried that, but to get good estimates under these circumstances, I need to set the statistics_target so high that the whole table gets analyzed. As this problem matters most for all of our large tables, I would have to set default_statistics_target to something like 10 - that's a bit scary for production systems with tables of appr. 100GB, I find. 2) If that doesn't help, consider using the following DDL: alter table foo alter column bar set ( n_distinct = 5.0); Yes, that's probably best - even if it means quite some maintenance work. I do it like that: ALTER TABLE test_1 ALTER COLUMN clustered_random_2000k SET (n_distinct = -0.05); btw: Postgres will never set relative n_distinct values for anything larger than -0.1. If I determine (or know) it to be a constant but lower fraction, could it be a problem to explicitly set this value to between -0.1 and 0? To activate that setting, however, an ANALYZE has to be run. 
That was not clear to me from the documentation: ANALYZE verbose test_1; To check column options and statistics values: SELECT pg_class.relname AS table_name, pg_attribute.attname AS column_name, pg_attribute.attoptions FROM pg_attribute JOIN pg_class ON pg_attribute.attrelid = pg_class.oid WHERE pg_attribute.attnum > 0 AND pg_class.relname = 'test_1' AND pg_attribute.attname = 'clustered_random_2000k'; SELECT tablename AS table_name, attname AS column_name, null_frac, avg_width, n_distinct, correlation FROM pg_stats WHERE tablename = 'test_1' and attname = 'clustered_random_2000k'; And finally, we can undo the whole thing, if necessary: ALTER TABLE test_1 ALTER COLUMN clustered_random_2000k RESET (n_distinct); ANALYZE VERBOSE test_1; Regards, Stefan -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] serious under-estimation of n_distinct for clustered distributions
On 12/29/2012 10:57 PM, Peter Geoghegan wrote: On 29 December 2012 20:57, Stefan Andreatta s.andrea...@synedra.com wrote: Now, the 2005 discussion goes into great detail on the advantages and disadvantages of this algorithm, particularly when using small sample sizes, and several alternatives are discussed. I do not know whether anything has been changed after that, but I know that the very distinct problem, which I will focus on here, still persists. It's a really hard problem to solve satisfactorily. It's a problem that has been studied in much detail. Yes, the algorithm used is still the same. See the comments within src/backend/commands/analyze.c (IBM Research Report RJ 10025 is referenced there). Thanks a lot for this information! I looked through the code a bit. The Haas Stokes Formula is fine. The problem really lies with the two phase random selection procedure: Starting from line 1039, there is a comment: * As of May 2004 we use a new two-stage method: Stage one selects up * to targrows random blocks (or all blocks, if there aren't so many). * Stage two scans these blocks and uses the Vitter algorithm to create * a random sample of targrows rows (or less, if there are less in the * sample of blocks). The two stages are executed simultaneously: each * block is processed as soon as stage one returns its number and while * the rows are read stage two controls which ones are to be inserted * into the sample. * * Although every row has an equal chance of ending up in the final * sample, this sampling method is not perfect: not every possible * sample has an equal chance of being selected. For large relations * the number of different blocks represented by the sample tends to be * too small. We can live with that for now. Improvements are welcome. Now the problem with clustered data is, that the probability of sampling a value twice is much higher when the same page is repeatedly sampled. 
As stage one takes a random sample of pages, and stage two samples rows from these pages, the probability of visiting the same page twice (or more often) is much higher than if random rows were selected from the whole table. Hence we get a lot more multiple values for clustered data and we end up with the severe under-estimation we can see in those cases. Probabilities do my brain in, as usual, but I tested the procedure for my test data with a simple python script. There is absolutely nothing wrong with the implementation. It seems to be a purely statistical problem. Not everything may be hopeless though ;-) The problem could theoretically be avoided if random rows were selected from the whole table. Again, that may not be feasible - the two phase approach was probably not implemented for nothing. Another possible solution would be to avoid much of the resampling (not all) in phase two. For that - in theory - every page visited would have to get a lower weight, so that revisiting this page is not any more likely than if rows were selected from the whole column. That does not sound easy or elegant to implement. But perhaps there is some clever algorithm - unfortunately I do not know. The general advice here is: 1) Increase default_statistics_target for the column. 2) If that doesn't help, consider using the following DDL: alter table foo alter column bar set ( n_distinct = 5.0); Yes, I will probably have to live with that for now - I will come back to these workarounds with one or two questions. Thanks again Regards, Stefan -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] serious under-estimation of n_distinct for clustered distributions
I have encountered serious under-estimations of distinct values when values are not evenly distributed but clustered within a column. I think this problem might be relevant to many real-world use cases and I wonder if there is a good workaround or possibly a programmatic solution that could be implemented. Thanks for your help! Stefan *The Long Story:* When Postgres collects statistics, it estimates the number of distinct values for every column (see pg_stats.n_distinct). This is one important source for the planner to determine the selectivity and hence can have great influence on the resulting query plan. My Problem: When I collected statistics on some columns that have rather high selectivity but not anything like unique values, I consistently got n_distinct values that are far too low (easily by some orders of magnitude). Worse still, the estimates did not really improve until I analyzed the whole table. I tested this for Postgres 9.1 and 9.2. An artificial test-case is described at the end of this mail. Some Analysis of the Problem: Unfortunately it is not trivial to estimate the total number of different values based on a sample. As far as I found out, Postgres uses an algorithm that is based on the number of values that are found only once in the sample used for ANALYZE. I found references to Good-Turing frequency estimation (http://encodestatistics.org/publications/statistics_and_postgres.pdf) and to a paper from Haas Stokes, Computer Science, 1996 (http://researcher.ibm.com/researcher/files/us-phaas/jasa3rj.pdf). The latter source is from Josh Berkus in a 2005 discussion on the Postgres Performance List (see e.g. http://grokbase.com/t/postgresql/pgsql-performance/054kztf8pf/bad-n-distinct-estimation-hacks-suggested for a look on the whole discussion there). The formula given there for the total number of distinct values is: n*d / (n - f1 + f1*n/N) where f1 is the number of values that occurred only once in the sample. 
n is the number of rows sampled, d the number of distincts found and N the total number of rows in the table. Now, the 2005 discussion goes into great detail on the advantages and disadvantages of this algorithm, particularly when using small sample sizes, and several alternatives are discussed. I do not know whether anything has been changed after that, but I know that the very distinct problem, which I will focus on here, still persists. When the number of values that are found only once in the sample (f1) becomes zero, the whole term equals d, that is, n_distinct is estimated to be just the number of distincts found in the sample. This is basically fine as it should only happen when the sample has really covered more or less all distinct values. However, we have a sampling problem here: for maximum efficiency Postgres samples not random rows but random pages. If the distribution of the values is not random but clustered (that is, the same values tend to be close together) we run into problems. The probability that any value from a clustered distribution is sampled only once, when any page covers multiple adjacent rows, is very low. So, under these circumstances, the estimate for n_distinct will always be close to the number of distincts found in the sample. Even if every value would in fact only appear a few times in the table. Relevance: I think this is not just an unfortunate border case, but a very common situation. Imagine two tables that are filled continually over time where the second table references the first - some objects and multiple properties for each for example. Now the foreign key column of the second table will have many distinct values but a highly clustered distribution. It is probably not helpful, if the planner significantly underestimates the high selectivity of the foreign key column. 
Workarounds: There are workarounds: manually setting table column statistics or using an extremely high statistics target, so that the whole table gets analyzed. However, these workarounds do not seem elegant and may be impractical. Questions: A) Did I find the correct reason for my problem? Specifically, does Postgres really estimate n_distinct as described above? B) Are there any elegant workarounds? C) What could be a programmatic solution to this problem? I think, it might be possible to use the number of values that are found in only one page (vs. found only once at all) for f1. Or the number of distincts could be calculated using some completely different approach? Test Case: For an artificial test-case let's create a table and fill it with 10 million rows (appr. 1,300 MB required). There is an ID column featuring unique values and 4 groups of 3 columns each that have selectivities of: - 5 (x_2000k = 2,000,000 distinct values) - 25 (x_400k = 400,000 distinct values) - 125 (x_80k = 80,000 distinct values). The 4 groups of columns show different distributions
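The Haas/Stokes formula quoted above is easy to sanity-check numerically. A small sketch with invented figures (the sample and table sizes below are illustrative assumptions, not the test case's actual numbers):

```python
# Sketch of the quoted estimator: n*d / (n - f1 + f1*n/N), where
# n = rows sampled, d = distincts seen, f1 = values seen exactly once,
# N = total rows in the table. All numbers below are made up.
def n_distinct_estimate(n, d, f1, N):
    return n * d / (n - f1 + f1 * n / N)

n, N = 30_000, 10_000_000   # 30k-row sample from a 10M-row table
d = 25_000                  # distincts seen in the sample

# Random distribution: most sampled values are seen only once,
# so f1 is large and the estimate extrapolates well beyond d.
print(n_distinct_estimate(n, d, f1=20_000, N=N))   # ~ 74,553

# Clustered distribution: block sampling drives f1 toward 0, and the
# whole term collapses to d, the distincts actually seen in the sample.
print(n_distinct_estimate(n, d, f1=0, N=N))        # 25000.0
```

This reproduces the failure mode described: with f1 = 0 the estimator cannot do better than the sample's own distinct count, however many distincts the table really holds.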
Re: [PERFORM] Index over all partitions (aka global index)?
Yes a physical index would be one solution - but it's not the only one. The indexes could be treated in parallel in their physical places where they are. That's why I called it still logical. I don't think that I would lose all benefits of partitioning since an index could adapt itself when partitions are attached or removed. That's probably how Oracle resolves it which knows global indexes probably since version 8(!) [1] Yours, S. [1] http://www.oracle-base.com/articles/8i/partitioned-tables-and-indexes.php 2012/10/14 Jeff Janes jeff.ja...@gmail.com: On Sat, Oct 13, 2012 at 5:43 PM, Stefan Keller sfkel...@gmail.com wrote: Say, there is a table with 250 mio. rows split into 250 tables with 1 mio. rows each. And say the index behavior is O(log n). Then a search for a key takes O(log(250*n)) or 8.4 time units. What PG (9.1) currently probably does is an iterative call to all 250 partitioned tables, which will take O(250*log(n)) - or 1500 time units in this case. This is about 180 times slower. What do you think about introducing a global index over all partitions (like Ora :-)? This would be a (logically) single index which can even be parallelized given the partitioned tables are optimally distributed like in different tablespaces. What do you think about this? What you already have is a logically single index. What you want is a physically single index. But wouldn't that remove most of the benefits of partitioning? You could no longer add or remove partitions instantaneously, for example. Cheers, Jeff -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Index over all partitions (aka global index)?
Hi, Given I have a large table implemented with partitions and need fast access to a (primary) key value in a scenario where every minute updates (inserts/updates/deletes) are coming in. Now since PG does not allow any index (nor constraint) on master table, I have a performance issue (and a possible parallelization opportunity). Say, there is a table with 250 mio. rows split into 250 tables with 1 mio. rows each. And say the index behavior is O(log n). Then a search for a key takes O(log(250*n)) or 8.4 time units. What PG (9.1) currently probably does is an iterative call to all 250 partitioned tables, which will take O(250*log(n)) - or 1500 time units in this case. This is about 180 times slower. What do you think about introducing a global index over all partitions (like Ora :-)? This would be a (logically) single index which can even be parallelized given the partitioned tables are optimally distributed like in different tablespaces. What do you think about this? -S. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
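The back-of-the-envelope numbers above can be reproduced directly (assuming base-10 logarithms, which is what makes log(250,000,000) come out at 8.4 "time units"):

```python
import math

# One logical lookup in a hypothetical global index over 250M rows,
# vs. probing 250 per-partition indexes of 1M rows each.
total_rows = 250 * 1_000_000

global_index  = math.log10(total_rows)        # ~ 8.4 "time units"
per_partition = 250 * math.log10(1_000_000)   # 250 * 6 = 1500

print(round(global_index, 1))                 # 8.4
print(per_partition)                          # 1500.0
print(round(per_partition / global_index))    # ~ 179, i.e. "about 180 times slower"
```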
Re: [PERFORM] Inserts in 'big' table slowing down the database
Sorry for the delay. I had to sort out the problem (among other things). It's mainly about swapping. The table nodes contains about 2^31 entries and occupies about 80GB on disk space plus index. If one would store the geom values in a big array (where id is the array index) it would only make up about 16GB, which means that the ids are dense (with few deletes). Then updates come in every hour as bulk insert statements with entries having ids in sorted manner. Now PG becomes slower and slower! CLUSTER could help - but obviously this operation needs a table lock. And if this operation takes longer than an hour, it delays the next update. Any ideas? Partitioning? Yours, S. 2012/9/3 Ivan Voras ivo...@freebsd.org: On 03/09/2012 13:03, Stefan Keller wrote: Hi, I'm having performance issues with a simple table containing 'Nodes' (points) from OpenStreetMap: CREATE TABLE nodes ( id bigint PRIMARY KEY, user_name text NOT NULL, tstamp timestamp without time zone NOT NULL, geom GEOMETRY(POINT, 4326) ); CREATE INDEX idx_nodes_geom ON nodes USING gist (geom); The number of rows grows steadily and soon reaches one billion (1'000'000'000), therefore the bigint id. Now, hourly inserts (update and deletes) are slowing down the database (PostgreSQL 9.1) constantly. Before I'm looking at non-durable settings [1] I'd like to know what choices I have to tune it while keeping the database productive: cluster index? partition table? use tablespaces? reduce physical block size? You need to describe in detail what does slowing down mean in your case. Do the disk drives somehow do more operations per transaction? Does the database use more CPU cycles? Is there swapping? What is the expected (previous) performance? At a guess, it is very unlikely that using non-durable settings will help you here. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Inserts in 'big' table slowing down the database
Hi, I'm having performance issues with a simple table containing 'Nodes' (points) from OpenStreetMap: CREATE TABLE nodes ( id bigint PRIMARY KEY, user_name text NOT NULL, tstamp timestamp without time zone NOT NULL, geom GEOMETRY(POINT, 4326) ); CREATE INDEX idx_nodes_geom ON nodes USING gist (geom); The number of rows grows steadily and soon reaches one billion (1'000'000'000), therefore the bigint id. Now, hourly inserts (update and deletes) are slowing down the database (PostgreSQL 9.1) constantly. Before I'm looking at non-durable settings [1] I'd like to know what choices I have to tune it while keeping the database productive: cluster index? partition table? use tablespaces? reduce physical block size? Stefan [1] http://www.postgresql.org/docs/9.1/static/non-durability.html -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m
Hi 2012/8/8 Jeff Janes jeff.ja...@gmail.com: On Tue, Aug 7, 2012 at 5:07 PM, Stefan Keller sfkel...@gmail.com wrote: Hi Craig Clever proposal! I slightly tried to adapt it to the hstore involved. Now I'm having a weird problem that PG says that relation 'p' does not exist. Why does PG recognize table b in the subquery but not table p? Any ideas? I don't think it does recognize b, either. It just fell over on p before it had a chance to fall over on b. No, the b gets recognized. See my original query. That's a strange behaviour of the SQL parser which I can't understand. I think you have to use WITH if you want to reference the same subquery in multiple FROMs. I'll try that with CTE too. Another approach would be to add explicit conditions for there being at least 1 school and 1 pharmacy within distance. There can't be >1 unless there is >=1, but the join possibilities for >=1 (i.e. where exists rather than where (select count(*)...)>1 ) are much more attractive than the ones for >1. Cheers, Jeff You mean, first doing a select on existence and then applying the count condition later? Stefan -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m
Hi I have an interesting query to be optimized related to this one [1]. The query definition is: Select all buildings that have more than 1 pharmacies and more than 1 schools within a radius of 1000m. The problem is that I think that this query is inherently O(n^2). In fact the solution I propose below takes forever... My questions: 1. Any comments about the nature of this problem? 2. ... on how to speed it up ? 3. In the original query [1] there's a count which contains a subquery. According to my tests PostgreSQL does not allow this despite the documentation which says count(expression). Remarks: I know that count(*) could be faster on PostgreSQL but count(osm_id) does not change the query plan and this does not seem to be the bottleneck here anyway. Yours, S. [1] http://gis.stackexchange.com/questions/11445/selecting-pois-around-specific-buildings-using-postgis Here's my query: -- Select all buildings that have >1 pharmacies and >1 schools within 1000m: SELECT osm_id AS building_id FROM (SELECT osm_id, way FROM osm_polygon WHERE tags @> hstore('building','yes') ) AS b WHERE (SELECT count(*) > 1 FROM osm_poi AS p WHERE p.tags @> hstore('amenity','pharmacy') AND ST_DWithin(b.way,p.way,1000) ) AND (SELECT count(*) > 1 FROM osm_poi AS p WHERE p.tags @> hstore('amenity','school') AND ST_DWithin(b.way,p.way,1000) ) -- Total query runtime: 4308488 ms. 66345 rows retrieved. 
Here's the query plan (from EXPLAIN): Index Scan using osm_polygon_tags_idx on osm_polygon (cost=0.00..406812.81 rows=188 width=901) Index Cond: (tags @> 'building=yes'::hstore) Filter: ((SubPlan 1) AND (SubPlan 2)) SubPlan 1 -> Aggregate (cost=269.19..269.20 rows=1 width=0) -> Bitmap Heap Scan on osm_poi p (cost=7.76..269.19 rows=1 width=0) Recheck Cond: (way && st_expand(osm_polygon.way, 1000::double precision)) Filter: ((tags @> 'amenity=pharmacy'::hstore) AND (osm_polygon.way && st_expand(way, 1000::double precision)) AND _st_dwithin(osm_polygon.way, way, 1000::double precision)) -> Bitmap Index Scan on osm_poi_way_idx (cost=0.00..7.76 rows=62 width=0) Index Cond: (way && st_expand(osm_polygon.way, 1000::double precision)) SubPlan 2 -> Aggregate (cost=269.19..269.20 rows=1 width=0) -> Bitmap Heap Scan on osm_poi p (cost=7.76..269.19 rows=1 width=0) Recheck Cond: (way && st_expand(osm_polygon.way, 1000::double precision)) Filter: ((tags @> 'amenity=school'::hstore) AND (osm_polygon.way && st_expand(way, 1000::double precision)) AND _st_dwithin(osm_polygon.way, way, 1000::double precision)) -> Bitmap Index Scan on osm_poi_way_idx (cost=0.00..7.76 rows=62 width=0) Index Cond: (way && st_expand(osm_polygon.way, 1000::double precision)) *** -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m
Your proposal lacks the requirement that it's the same building from where pharmacies and schools are reachable. But I'll think about it. Yours, S. 2012/8/7 Tomas Vondra t...@fuzzy.cz: On 7 Srpen 2012, 14:01, Stefan Keller wrote: Hi I have an interesting query to be optimized related to this one [1]. The query definition is: Select all buildings that have more than 1 pharmacies and more than 1 schools within a radius of 1000m. The problem is that I think that this query is inherently O(n^2). In fact the solution I propose below takes forever... What about plain INTERSECT? Something like SELECT osm_id FROM osm_poi AS p, osm_polygon b WHERE p.tags @> hstore('amenity','pharmacy') AND ST_DWithin(b.way,p.way,1000) INTERSECT SELECT osm_id FROM osm_poi AS p, osm_polygon b WHERE p.tags @> hstore('amenity','school') AND ST_DWithin(b.way,p.way,1000) Or something like that. But maybe it's a complete nonsense ... Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow query: Select all buildings that have >1 pharmacies and >1 schools within 1000m
Hi Craig Clever proposal! I slightly tried to adapt it to the hstore involved. Now I'm having a weird problem that PG says that relation 'p' does not exist. Why does PG recognize table b in the subquery but not table p? Any ideas? -- Stefan SELECT b.way AS building_geometry FROM (SELECT way FROM osm_polygon WHERE tags @> hstore('building','yes') ) AS b, (SELECT way, tags->'amenity' as value FROM osm_poi WHERE tags ? 'amenity' ) AS p WHERE (SELECT count(*) > 1 FROM p WHERE p.value = 'pharmacy' AND ST_DWithin(b.way,p.way,1000) ) AND (SELECT count(*) > 1 FROM p WHERE p.value = 'school' AND ST_DWithin(b.way,p.way,1000) ) ERROR: relation p does not exist LINE 14: FROM p 2012/8/7 Craig James cja...@emolecules.com: On Tue, Aug 7, 2012 at 5:01 AM, Stefan Keller sfkel...@gmail.com wrote: Hi I have an interesting query to be optimized related to this one [1]. The query definition is: Select all buildings that have more than 1 pharmacies and more than 1 schools within a radius of 1000m. The problem is that I think that this query is inherently O(n^2). In fact the solution I propose below takes forever... Maybe you could get rid of the O(n^2) aspect like this: Select all buildings that have more than 1 pharmacies and more than 1 schools within a radius of 1000m from (Select all buildings that have more than four (pharmacy or school) within a radius of 1000m) The inner select should be fast -- you could make it fast by creating a new property like building of interest that was pharmacy or school and build an index on the building of interest property. The inner query would reduce your sample set to a much smaller set of buildings, and presumably the outer query could handle that pretty quickly. Craig James My questions: 1. Any comments about the nature of this problem? 2. ... on how to speed it up ? 3. In the original query [1] there's a count which contains a subquery. According to my tests PostgreSQL does not allow this despite the documentation which says count(expression). 
Remarks: I know that count(*) could be faster on PostgreSQL but count(osm_id) does not change the query plan and this does not seem to be the bottleneck here anyway. Yours, S. [1] http://gis.stackexchange.com/questions/11445/selecting-pois-around-specific-buildings-using-postgis Here's my query: -- Select all buildings that have >1 pharmacies and >1 schools within 1000m: SELECT osm_id AS building_id FROM (SELECT osm_id, way FROM osm_polygon WHERE tags @> hstore('building','yes') ) AS b WHERE (SELECT count(*) > 1 FROM osm_poi AS p WHERE p.tags @> hstore('amenity','pharmacy') AND ST_DWithin(b.way,p.way,1000) ) AND (SELECT count(*) > 1 FROM osm_poi AS p WHERE p.tags @> hstore('amenity','school') AND ST_DWithin(b.way,p.way,1000) ) -- Total query runtime: 4308488 ms. 66345 rows retrieved. Here's the query plan (from EXPLAIN): Index Scan using osm_polygon_tags_idx on osm_polygon (cost=0.00..406812.81 rows=188 width=901) Index Cond: (tags @> 'building=yes'::hstore) Filter: ((SubPlan 1) AND (SubPlan 2)) SubPlan 1 -> Aggregate (cost=269.19..269.20 rows=1 width=0) -> Bitmap Heap Scan on osm_poi p (cost=7.76..269.19 rows=1 width=0) Recheck Cond: (way && st_expand(osm_polygon.way, 1000::double precision)) Filter: ((tags @> 'amenity=pharmacy'::hstore) AND (osm_polygon.way && st_expand(way, 1000::double precision)) AND _st_dwithin(osm_polygon.way, way, 1000::double precision)) -> Bitmap Index Scan on osm_poi_way_idx (cost=0.00..7.76 rows=62 width=0) Index Cond: (way && st_expand(osm_polygon.way, 1000::double precision)) SubPlan 2 -> Aggregate (cost=269.19..269.20 rows=1 width=0) -> Bitmap Heap Scan on osm_poi p (cost=7.76..269.19 rows=1 width=0) Recheck Cond: (way && st_expand(osm_polygon.way, 1000::double precision)) Filter: ((tags @> 'amenity=school'::hstore) AND (osm_polygon.way && st_expand(way, 1000::double precision)) AND _st_dwithin(osm_polygon.way, way, 1000::double precision)) -> Bitmap Index Scan on osm_poi_way_idx (cost=0.00..7.76 rows=62 width=0) Index Cond: (way && st_expand(osm_polygon.way, 
1000::double precision)) *** -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?
2012/3/1 Jeff Janes jeff.ja...@gmail.com: On Tue, Feb 28, 2012 at 3:46 PM, Stefan Keller sfkel...@gmail.com wrote: 2012/2/28 Claudio Freire klaussfre...@gmail.com: In the OP, you say There is enough main memory to hold all table contents.. I'm assuming, there you refer to your current system, with 4GB memory. Sorry for the confusion: I'm doing these tests on this machine with one table (osm_point) and one country. This table has a size of 2.6GB and 10 million tuples. The other machine has to deal with at least 5 tables in total and will be hold more than one country plus routing etc.. What is your shared_buffers set to? 2.6GB is uncomfortably close to 4GB, considering the computer has other things it needs to use memory for as well. These are the current modified settings in postgresql.conf: shared_buffers = 128MB work_mem = 3MB maintenance_work_mem = 30MB effective_cache_size = 352MB wal_buffers = 8MB default_statistics_target = 50 constraint_exclusion = on checkpoint_completion_target = 0.9 checkpoint_segments = 16 max_connections = 80 -Stefan -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?
2012/2/29 Jeff Janes jeff.ja...@gmail.com: It's quite possible the vacuum full is thrashing your disk cache due to maintenance_work_mem. You can overcome this issue with the tar trick, which is more easily performed as: tar cf /dev/null $PG_DATA/base But on many implementations, that will not work. tar detects the output is going to the bit bucket, and so doesn't bother to actually read the data. Right. But what about the commands cp $PG_DATA/base /dev/null or cat $PG_DATA/base > /dev/null ? They seem to do something. -Stefan -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?
2012/2/29 Stefan Keller sfkel...@gmail.com: 2012/2/29 Jeff Janes jeff.ja...@gmail.com: It's quite possible the vacuum full is thrashing your disk cache due to maintenance_work_mem. You can overcome this issue with the tar trick, which is more easily performed as: tar cf /dev/null $PG_DATA/base But on many implementations, that will not work. tar detects the output is going to the bit bucket, and so doesn't bother to actually read the data. Right. But what about the commands cp $PG_DATA/base /dev/null or cat $PG_DATA/base > /dev/null ? They seem to do something. ...or let's try /dev/zero instead of /dev/null: tar cf /dev/zero $PG_DATA/base -Stefan -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Re: 回复: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?
Hi Wales 2012/2/27 Wales Wang wormw...@yahoo.com wrote: There are many approaches for PostgreSQL in-memory. The quick and easy way is making the slave pgsql run on a persistent RAM filesystem, where the slave is part of a master/slave replication cluster. The fstab entry and the script making the RAM file system persistent are below: Setup: First, create a mountpoint for the disk: mkdir /mnt/ramdisk Secondly, add this line to /etc/fstab to mount the drive at boot-time. tmpfs /mnt/ramdisk tmpfs defaults,size=65536M 0 0 #! /bin/sh # /etc/init.d/ramdisk.sh # case $1 in start) echo Copying files to ramdisk rsync -av /data/ramdisk-backup/ /mnt/ramdisk/ echo [`date +%Y-%m-%d %H:%M`] Ramdisk Synched from HD >> /var/log/ramdisk_sync.log ;; sync) echo Synching files from ramdisk to Harddisk echo [`date +%Y-%m-%d %H:%M`] Ramdisk Synched to HD >> /var/log/ramdisk_sync.log rsync -av --delete --recursive --force /mnt/ramdisk/ /data/ramdisk-backup/ ;; stop) echo Synching logfiles from ramdisk to Harddisk echo [`date +%Y-%m-%d %H:%M`] Ramdisk Synched to HD >> /var/log/ramdisk_sync.log rsync -av --delete --recursive --force /mnt/ramdisk/ /data/ramdisk-backup/ ;; *) echo Usage: /etc/init.d/ramdisk {start|stop|sync} exit 1 ;; esac exit 0 You can run it at startup and shutdown, and via crontab hourly. Wales Wang Thank you for the tip. Making the slave pgsql run on a persistent RAM filesystem is surely at least a possibility which I'll try out. But what I'm finally after is a solution where records don't get pushed back to disk a.s.a.p. but rather get held in memory as long as possible, assuming that there is enough memory. I suspect that currently there is quite some overhead because of that (besides disk-oriented structures). -Stefan -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?
2012/2/28 Claudio Freire klaussfre...@gmail.com: On Tue, Feb 28, 2012 at 5:48 PM, Stefan Keller sfkel...@gmail.com wrote: P.S. And yes, the database is aka 'read-only' and truncated and re-populated from scratch every night. fsync is off so I don't care about ACID. After the indexes on name, hstore and geometry are generated I do a VACUUM FULL FREEZE. The current installation is a virtual machine with 4GB memory and the filesystem is read/write. The future machine will be a pizza box with 72GB memory. I don't get this. Something's wrong. In the OP, you say There is enough main memory to hold all table contents.. I'm assuming there you refer to your current system, with 4GB memory. Sorry for the confusion: I'm doing these tests on this machine with one table (osm_point) and one country. This table has a size of 2.6GB and 10 million tuples. The other machine has to deal with at least 5 tables in total and will hold more than one country plus routing etc. -Stefan
Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?
Hi Jeff and Wales, 2012/2/26 Jeff Janes jeff.ja...@gmail.com wrote: The problem is that the initial queries are too slow - and there is no second chance. I do have to trash the buffer every night. There is enough main memory to hold all table contents. Just that table, or the entire database? The entire database, consisting of only about 5 tables which are similar but with different geometry types, plus a relations table (as OpenStreetMap calls it). 1. How can I warm up or re-populate shared buffers of Postgres? Instead, warm the OS cache. Then data will get transferred into the postgres shared_buffers pool from the OS cache very quickly. tar -c $PGDATA/base/ | wc -c Ok. So with OS cache you mean the files which to me are THE database itself? A cache to me is a second storage with controlled redundancy for performance reasons. 2. Are there any hints on how to tell Postgres to read in all table contents into memory? I don't think so, at least not in core. I've wondered if it would make sense to suppress ring-buffer strategy when there are buffers on the free-list. That way a sequential scan would populate shared_buffers after a restart. But it wouldn't help you get the indexes into cache. So, are there any developments going on with PostgreSQL as Stephen suggested in the former thread? 2012/2/26 Wales Wang wormw...@yahoo.com: You can try PostgreSQL 9.x master/slave replication, then try running the slave on a persistent RAM filesystem (tmpfs). So, access all your data from the slave PostgreSQL that runs on tmpfs.. Nice idea. I do have a single upscaled server and up to now I hesitated to allocate say 48 Gigabytes (out of 72) to such a RAM filesystem (tmpfs). Still, wouldn't it be more flexible if I could dynamically instruct PostgreSQL to behave like an in-memory database? Yours, Stefan
Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?
2012/2/26 Andy Colson a...@squeakycode.net wrote: On 02/25/2012 06:16 PM, Stefan Keller wrote: 1. How can I warm up or re-populate shared buffers of Postgres? 2. Are there any hints on how to tell Postgres to read in all table contents into memory? Yours, Stefan How about after you load the data, vacuum freeze it, then do something like: SELECT count(*) FROM osm_point WHERE tags @> 'tourism=junk' -Andy That good idea is what I proposed elsewhere on one of the PG lists and got told that this doesn't help. I can accept the view that users shouldn't directly interfere with the optimizer. But I think it's still worth discussing a configuration option (per table) or so which tells PG that this table's contents should fit into memory, so that it tries to load the table into memory and keeps it there. This option probably only makes sense in combination with unlogged tables. Yours, Stefan
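Complementary to warming the cache, one can inspect what actually sits in shared_buffers. A sketch using the contrib module pg_buffercache (it must be installed separately; this is not from the original thread):

```sql
CREATE EXTENSION pg_buffercache;

-- Top 10 relations by number of 8kB pages currently held in shared_buffers.
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```

Running this before and after a warm-up query such as the count(*) above shows whether the table's pages really landed in the buffer pool.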
Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?
Hi, 2012/2/26 Cédric Villemain ced...@2ndquadrant.fr wrote: 1. How can I warm up or re-populate shared buffers of Postgres? There was a patch proposed for postgresql whose purpose was to Which patch are you referring to? snapshot/restore postgresql buffers, but it is still not sure how far that really helps to have that part loaded. What's not sure, and why? 2. Are there any hints on how to tell Postgres to read in all table contents into memory? I wrote pgfincore for the OS part: you can use it to preload table/index in OS cache, and do snapshot/restore if you want fine-grained control of what part of the object you want to warm. https://github.com/klando/pgfincore Yes, now I remember. I'll have a look at that. I'd still like to see something where PG really preloads tuples and treats them always in-memory (given they fit into RAM). Since I have a read-only database there's no WAL and no locking needed. But as soon as we allow writes I realize that the in-memory feature needs to be coupled with other enhancements like replication (which somehow would avoid WAL). Yours, Stefan
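For reference, a usage sketch of pgfincore along the lines discussed (the extension must be installed; function and column names follow the pgfincore documentation, and osm_point is the table from this thread):

```sql
-- Ask the kernel to preload the table's files into the OS page cache:
SELECT * FROM pgfadvise_willneed('osm_point');

-- Inspect how much of the table is currently resident in the OS cache:
SELECT relpath, pages_mem, rel_os_pages
FROM pgfincore('osm_point');
```

Note this warms the OS cache, not shared_buffers; PostgreSQL still copies pages into its own buffer pool on first access, but from memory rather than from disk.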
[PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?
Hi, 2011/10/24 Stephen Frost sfr...@snowman.net wrote: Now, we've also been discussing ways to have PG automatically re-populate shared buffers and possibly OS cache based on what was in memory at the time of the last shut-down, but I'm not sure that would help your case either since you're rebuilding everything every night and that's what's trashing your buffers (because everything ends up getting moved around). You might actually want to consider if that's doing more harm than good for you. If you weren't doing that, then the cache wouldn't be getting destroyed every night.. I'd like to come back on the issue of aka of in-memory key-value database. To remember, it contains table definition and queries as indicated in the appendix [0]. There exist 4 other tables of similar structure. There are indexes on each column. The tables contain around 10 million tuples. The database is read-only; it's completely updated every day. I don't expect more than 5 concurrent users at any time. A typical query looks like [1] and varies in an unforeseeable way (that's why hstore is used). EXPLAIN tells me that the indexes are used [2]. The problem is that the initial queries are too slow - and there is no second chance. I do have to trash the buffer every night. There is enough main memory to hold all table contents. 1. How can I warm up or re-populate shared buffers of Postgres? 2. Are there any hints on how to tell Postgres to read in all table contents into memory? Yours, Stefan APPENDIX [0] CREATE TABLE osm_point ( osm_id integer, name text, tags hstore, geom geometry(Point,4326) ); [1] SELECT osm_id, name FROM osm_point WHERE tags @> 'tourism=viewpoint' AND ST_Contains( GeomFromText('BOX(8.42 47.072, 9.088 47.431)'::box2d, 4326), geom) [2] EXPLAIN ANALYZE returns: Bitmap Heap Scan on osm_point (cost=402.15..40465.85 rows=430 width=218) (actual time=121.888..137. 
Recheck Cond: (tags @> 'tourism=viewpoint'::hstore) Filter: (('01030...'::geometry && geom) AND _st_contains('01030'::geometry, geom)) -> Bitmap Index Scan on osm_point_tags_idx (cost=0.00..402.04 rows=11557 width=0) (actual time=1 6710 loops=1) Index Cond: (tags @> 'tourism=viewpoint'::hstore) Total runtime: 137.881 ms (6 rows)
Re: [PERFORM] hstore query: Any better idea than adding more memory?
Hi Stephen Thanks for your answer and hints. 2011/10/24 Stephen Frost sfr...@snowman.net wrote: * Stefan Keller (sfkel...@gmail.com) wrote: Adding more memory (say to a total of 32 GB) would only postpone the problem. Erm, seems like you're jumping to conclusions here... Sorry. I actually only wanted to report here what's special in my postgresql.conf. First time the query lasts about 10 times longer (~ 1010 ms) - but I'd like to get better results already in the first query. Do you mean first time after a database restart? No: I simply meant doing the query when one can assume that the query result is not yet in postgres' cache. You can check that here online: http://labs.geometa.info/postgisterminal = 1. When I add the actual time from EXPLAIN above, I get 11 + 10 + 10ms which is three times greater than the 11ms reported. Why? Because they include the times from the nodes under them. = 2. Why does the planner choose to sort first instead of sorting the (smaller) result set at the end? You're reading the explain 'backwards' regarding time.. It *does* do the sort last. Nodes which are indented feed the nodes above them, so the bitmap index scan and recheck feed into the sort, hence the sort is actually done after. Can't really work any other way anyway, PG has to get the data before it can sort it.. Oh, thanks. I should have realized that. But then what is the arrow (->) supposed to stand for? Sort (cost=30819.51... -> Bitmap Heap Scan on osm_point (cost=313.21... -> Bitmap Index Scan on osm_point_tags_idx I would suggest that the inverse arrow would be more intuitive: Sort (cost=30819.51... <- Bitmap Heap Scan on osm_point (cost=313.21... <- Bitmap Index Scan on osm_point_tags_idx = 3. What could I do to speed up such queries (first time, i.e. without caching) besides simply adding more memory? There didn't look like anything there that could really be done much faster, at the plan level. 
It's not uncommon for people to intentionally get a box with more memory than the size of their database, so everything is in memory. At the end of the day, if the blocks aren't in memory then PG has to get them from disk. If disk is slow, the query is going to be slow. Now, hopefully, you're hitting this table often enough with similar queries that important, common parts of the table and index are already in memory, but there's no magic PG can perform to ensure that. If there's a lot of updates/changes to this table, you might check if there's a lot of bloat (check_postgres works great for this..). Eliminating excessive bloat, if there is any, could help with all accesses to that table, of course, since it would reduce the amount of data which would need to be read. Thanks for the hint. But there are only periodic updates (currently once a night) and these are actually done by 1. truncating the database and 2. bulk loading all the stuff, then 3. reindexing. If one tries to completely fit the whole data into memory, then to me PostgreSQL features borrowed from in-memory databases become interesting. = Is there anything other than index-only scans (planned for 9.2?) which could be of interest here? Stefan
[PERFORM] hstore query: Any better idea than adding more memory?
Hi all I'd like to tune the following hstore-related query which selects all zoos from table osm_point: SELECT osm_id, name, tags FROM osm_point WHERE tags @> hstore('tourism','zoo') ORDER BY name; ... given the following table and indexes definition: CREATE TABLE osm_point ( osm_id integer, name text, tags hstore, way geometry ) CREATE INDEX osm_point_index ON osm_point USING gist (way); CREATE INDEX osm_point_name_idx ON osm_point USING btree (name) WITH (FILLFACTOR=100); ALTER TABLE osm_point CLUSTER ON osm_point_name_idx; CREATE INDEX osm_point_pkey ON osm_point USING btree (osm_id); CREATE INDEX osm_point_tags_idx ON osm_point USING gist (tags) WITH (FILLFACTOR=100); ... and the following statistics: * Live Tuples 9626138 (that's also what COUNT(*) returns) * Table Size 1029 MB * Toast Table Size 32 kB * Indexes Size 1381 MB (?) ** osm_point_index 1029 MB ** osm_point_name_idx 1029 MB ** osm_point_pkey 1029 MB ** osm_point_tags_idx 1029 MB PostgreSQL has version 9.0.4, runs on Ubuntu Linux 10.04 LTS (64-Bit) with 1 vCPU and 1 GB vRAM. Adding more memory (say to a total of 32 GB) would only postpone the problem. I already increased the PostgreSQL configuration of shared_buffers (using pgtune). Now EXPLAIN ANALYZE returns (if run several times): Sort (cost=30819.51..30843.58 rows=9626 width=65) (actual time=11.502..11.502 rows=19 loops=1) Sort Key: name Sort Method: quicksort Memory: 29kB -> Bitmap Heap Scan on osm_point (cost=313.21..30182.62 rows=9626 width=65) (actual time=10.727..11.473 rows=19 loops=1) Recheck Cond: (tags @> 'tourism=zoo'::hstore) -> Bitmap Index Scan on osm_point_tags_idx (cost=0.00..310.80 rows=9626 width=0) (actual time=10.399..10.399 rows=591 loops=1) Index Cond: (tags @> 'tourism=zoo'::hstore) Total runtime: 11 ms First time the query lasts about 10 times longer (~ 1010 ms) - but I'd like to get better results already in the first query. = 1. 
When I add the actual time from EXPLAIN above, I get 11 + 10 + 10ms which is three times greater than the 11ms reported. Why? = 2. Why does the planner choose to sort first instead of sorting the (smaller) result set at the end? = 3. What could I do to speed up such queries (first time, i.e. without caching) besides simply adding more memory? Yours, Stefan
Re: [PERFORM] Hash index use presently(?) discouraged since 2005: revive or bury it?
Merlin and Jeff, General remark again: It's hard for me to imagine that btree is superior for all the issues mentioned before. I still believe in hash indexes for primary keys and certain unique constraints where you need equality search and don't need ordering or range search. 2011/9/17 Jeff Janes jeff.ja...@gmail.com: (...) Also, that link doesn't address concurrency of selects at all, only of inserts. How would (or did) you test and benchmark concurrency of inserts and selects? Use pgbench with a custom config for a blackbox test? 2011/9/18 Merlin Moncure mmonc...@gmail.com: Here again, I think that any proposed improvement in the current hash index code should be measured against wrapping a btree index. You get wal logging and high concurrency for free if you decide to do that. As I understand it, this would be an enhancement of btree. That's ok for btree but doesn't really exploit all the advantages of a separate hash index, wouldn't it? Stefan 2011/9/18 Merlin Moncure mmonc...@gmail.com: On Sat, Sep 17, 2011 at 4:48 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Tue, Sep 13, 2011 at 5:04 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 14 September 2011 00:04, Stefan Keller sfkel...@gmail.com wrote: Has this been verified on a recent release? I can't believe that hash performs so badly over all these points. Theory tells me otherwise and http://en.wikipedia.org/wiki/Hash_table seems to be a success. My understanding is that a huge amount of work has gone into making btree what it is in PG, and not nearly as much work has gone into making hash indexes what they could be. Hash indexes have been improved since 2005 - their performance was improved quite a bit in 9.0. Here's a more recent analysis: http://www.depesz.com/index.php/2010/06/28/should-you-use-hash-index/ They are 3 times faster to build. But if you rip the WAL logging out of btree, how much faster would those get? Also, that link doesn't address concurrency of selects at all, only of inserts. 
Of course hash indexes are faster to build than varlen string indexes :-). I use natural keys 50-80% of the time and hash indexing would remove some of the pain in cases where I don't need ordering and range operations. In fact, if they are made to properly support wal logging and uniqueness, I imagine they should supplant btree in a broad range of cases, so much so that it would be awful nice to be able to have syntax to choose hash for primary keys and unique constraints. @ Jeff: I think that adding WAL to hash indexes without first addressing the heavy-weight locking issue would be a mistake. Even if the WAL was fixed, the bad performance under concurrent selects would still make it at best a narrow niche thing. And fixing the locking *after* WAL is in place would probably be very much harder than the other order. Here again, I think that any proposed improvement in the current hash index code should be measured against wrapping a btree index. You get wal logging and high concurrency for free if you decide to do that. merlin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hash index use presently(?) discouraged since 2005: revive or bury it?
I'm simply referring to literature (like the intro by Ramakrishnan & Gehrke). I just know that Oracle and MySQL actually do have them too and use them without those current implementation-specific restrictions in Postgres. IMHO by design hash indexes (e.g. linear hashing) work best when: 1. only equality (=) tests are used (on whole values) 2. columns (key values) have very high cardinality And ideally, but not necessarily, when index values do not change and the number of rows is known ahead of time (avoiding O(N) worst case - but there are approaches to chaining with dynamic resizing). I just collected this to encourage ourselves that enhancing hash indexes could be worthwhile. Stefan 2011/9/18 Kevin Grittner kevin.gritt...@wicourts.gov: Stefan Keller wrote: It's hard for me to imagine that btree is superior for all the issues mentioned before. It would be great if you could show a benchmark technique which shows otherwise. -Kevin
[PERFORM] What about implementing a bitmap index? Any use cases?
Hi, I know there exist Bitmap Index Scan and Bitmap Heap Scan in Postgres. What about implementing a bitmap index for explicit use (CREATE INDEX ...)? Any use cases? Bitmap indexes work best on values with low cardinality (categorical data), would be efficient in space and ready for logic operations. Stefan P.S. Disclaimer (referring to my other thread about hash): I'm not a btree opposer :-) I'm just evaluating index alternatives.
[PERFORM] Index containing records instead of pointers to the data?
Hi, Sorry if this is an odd question: I assume that Postgres indexes don't store records but only pointers to the data. This means that there is always an additional access needed (real table I/O). Would an index containing data records make sense? Stefan
[PERFORM] PostgreSQL-related topics of theses and seminary works sought (Was: Hash index use presently(?) discouraged...)
2011/9/17 Tomas Vondra t...@fuzzy.cz wrote: (...) We've been asked by a local university for PostgreSQL-related topics of theses and seminary works. I'm also interested in such proposals or ideas! Here's some list of topics: * Adding WAL support to hash indexes in PostgreSQL (see ex-topic) * Time in PostgreSQL * Storing (Weather) Sensor Data in PostgreSQL * Fast Bulk Data Inserting in PostgreSQL with Unlogged Tables (incl. adding GiST support) * Performance Tuning of a Read-Only PostgreSQL Database * Materialized Views in PostgreSQL: Experiments around Jonathan Gardner's Proposal * more... ? Yours, Stefan
Re: [PERFORM] Hash index use presently(?) discouraged since 2005: revive or bury it?
2011/9/16 Tom Lane t...@sss.pgh.pa.us: I'm not entirely following this eagerness to junk that AM, anyway. We've put a lot of sweat into it over the years, in the hopes that it would eventually be good for something. It's on the edge of being good for something now, and there's doubtless room for more improvements, so why are the knives out? No knives from my side. Sorry for the exaggerated subject title. I'm also in favor of an enhanced hash index for cases where only = tests are processed and where only few inserts/deletes will occur. Stefan
Re: [PERFORM] Hash index use presently(?) discouraged since 2005: revive or bury it?
2011/9/14 Tom Lane t...@sss.pgh.pa.us: (...) I think that the current state of affairs is still what depesz said, namely that there might be cases where they'd be a win to use, except the lack of WAL support is a killer. I imagine somebody will step up and do that eventually. Should I open a ticket? Stefan 2011/9/14 Tom Lane t...@sss.pgh.pa.us: Peter Geoghegan pe...@2ndquadrant.com writes: On 14 September 2011 00:04, Stefan Keller sfkel...@gmail.com wrote: Has this been verified on a recent release? I can't believe that hash performs so bad over all these points. Theory tells me otherwise and http://en.wikipedia.org/wiki/Hash_table seems to be a success. Hash indexes have been improved since 2005 - their performance was improved quite a bit in 9.0. Here's a more recent analysis: http://www.depesz.com/index.php/2010/06/28/should-you-use-hash-index/ Yeah, looking into the git logs shows several separate major changes committed during 2008, including storing only the hash code not the whole indexed value (big win on wide values, and lets you index values larger than one index page, which doesn't work in btree). I think that the current state of affairs is still what depesz said, namely that there might be cases where they'd be a win to use, except the lack of WAL support is a killer. I imagine somebody will step up and do that eventually. The big picture though is that we're not going to remove hash indexes, even if they're nearly useless in themselves, because hash index opclasses provide the foundation for the system's knowledge of how to do the datatype-specific hashing needed for hash joins and hash aggregation. And those things *are* big wins, even if hash indexes themselves never become so. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hash index use presently(?) discouraged since 2005: revive or bury it?
2011/9/14 Tom Lane t...@sss.pgh.pa.us writes: (...) I think that the current state of affairs is still what depesz said, namely that there might be cases where they'd be a win to use, except the lack of WAL support is a killer. I imagine somebody will step up and do that eventually. How much work (in man-days) do you estimate this would mean for someone who can program but has to learn PG internals first? Stefan 2011/9/14 Tom Lane t...@sss.pgh.pa.us: Peter Geoghegan pe...@2ndquadrant.com writes: On 14 September 2011 00:04, Stefan Keller sfkel...@gmail.com wrote: Has this been verified on a recent release? I can't believe that hash performs so badly over all these points. Theory tells me otherwise and http://en.wikipedia.org/wiki/Hash_table seems to be a success. Hash indexes have been improved since 2005 - their performance was improved quite a bit in 9.0. Here's a more recent analysis: http://www.depesz.com/index.php/2010/06/28/should-you-use-hash-index/ Yeah, looking into the git logs shows several separate major changes committed during 2008, including storing only the hash code not the whole indexed value (big win on wide values, and lets you index values larger than one index page, which doesn't work in btree). I think that the current state of affairs is still what depesz said, namely that there might be cases where they'd be a win to use, except the lack of WAL support is a killer. I imagine somebody will step up and do that eventually. The big picture though is that we're not going to remove hash indexes, even if they're nearly useless in themselves, because hash index opclasses provide the foundation for the system's knowledge of how to do the datatype-specific hashing needed for hash joins and hash aggregation. And those things *are* big wins, even if hash indexes themselves never become so. 
regards, tom lane
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
Interesting debate. 2011/9/13 Marti Raudsepp ma...@juffo.org: Yes, every BEGIN/EXCEPTION block creates a subtransaction -- like a SAVEPOINT it can roll back to in case of an error. Are you sure? In theory I always understood that there are no subtransactions. In fact when looking at the docs there is chapter 39.6.6, saying By default, any error occurring in a PL/pgSQL function aborts execution of the function, and indeed of the surrounding transaction as well. You can trap errors and recover from them by using a BEGIN block with an EXCEPTION clause. (http://www.postgresql.org/docs/current/interactive/plpgsql-control-structures.html ) So the doc isn't totally explicit about this. But whatever: What would be the function of a subtransaction? To give the possibility to recover and continue within the surrounding transaction? Stefan 2011/9/13 Marti Raudsepp ma...@juffo.org: On Tue, Sep 13, 2011 at 19:34, Robert Klemme shortcut...@googlemail.com wrote: I don't think so. You only need to catch the error (see attachment). Or does this create a sub transaction? Yes, every BEGIN/EXCEPTION block creates a subtransaction -- like a SAVEPOINT it can roll back to in case of an error. Yes, I mentioned the speed issue. But regardless of the solution for MySQL's INSERT..ON DUPLICATE KEY UPDATE which Igor mentioned, you will have the locking problem anyhow if you plan to insert concurrently into the same table and be robust. In a mass-loading application you can often divide the work between threads in a manner that doesn't cause conflicts. For example, if the unique key is foobar_id and you have 4 threads, thread 0 will handle rows where (foobar_id%4)=0, thread 1 takes (foobar_id%4)=1 etc. Or potentially hash foobar_id before dividing the work. I already suggested this in my original post. 
Regards, Marti
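To make the subtransaction point concrete, here is a minimal sketch (table and function names invented for illustration): each BEGIN ... EXCEPTION block takes an implicit savepoint, so a trapped error rolls back only the block's own work, not the surrounding transaction.

```sql
CREATE TABLE t (id integer PRIMARY KEY);

CREATE OR REPLACE FUNCTION insert_or_ignore(k integer) RETURNS void AS $$
BEGIN
    INSERT INTO t (id) VALUES (k);
EXCEPTION WHEN unique_violation THEN
    -- the implicit savepoint taken at BEGIN is rolled back here:
    -- the failed INSERT is undone, but the outer transaction stays alive
    RAISE NOTICE 'key % already present, skipping', k;
END;
$$ LANGUAGE plpgsql;
```

This implicit savepoint is also the source of the speed issue mentioned above: in a bulk load, every row passed through such a block pays the subtransaction overhead.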
[PERFORM] Hash index use presently(?) discouraged since 2005: revive or bury it?
The doc at http://www.postgresql.org/docs/current/interactive/indexes-types.html says: Caution: Hash index operations are not presently WAL-logged, so hash indexes might need to be rebuilt with REINDEX after a database crash. They are also not replicated over streaming or file-based replication. For these reasons, hash index use is presently discouraged. I found a thread here http://archives.postgresql.org/pgsql-general/2005-05/msg00370.php about Hash index vs. b-tree index (PostgreSQL 8.0) mentioning some issues, like they * are not faster than B-trees even for = comparisons * aren't WAL safe * have poor concurrency (require coarser locks) * are significantly slower to create than a b+-tree index. In fact these statements seem to rely on the docs back in version 7.2 (see http://www.postgresql.org/docs/7.2/static/indexes-types.html ). Has this been verified on a recent release? I can't believe that hash performs so badly over all these points. Theory tells me otherwise and http://en.wikipedia.org/wiki/Hash_table seems to be a success. Are there any plans to give hash indexes another chance (or to bury them with a reason)? Stefan
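For completeness, the syntax under discussion (a sketch; osm_point is the table from the poster's other threads, and the WAL caveat quoted above applies to these index builds):

```sql
-- A hash index supports only equality lookups:
CREATE INDEX osm_point_id_hash ON osm_point USING hash (osm_id);

-- This predicate can use the hash index; range or ORDER BY queries cannot.
SELECT name FROM osm_point WHERE osm_id = 12345;
```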
Fwd: [PERFORM] Summaries on SSD usage?
Shaun, 2011/9/2 Shaun Thomas stho...@peak6.com: Ironically, this is actually the topic of my presentation at Postgres Open. Do you think my problem would now be solved with an NVRAM PCI card? Stefan
Re: [PERFORM] Summaries on SSD usage?
2011/9/3 Jesper Krogh jes...@krogh.cc: On 2011-09-03 00:04, Stefan Keller wrote: It's not that hard to figure out.. take some of your typical queries. say the one above.. Change the search-term to something you'd expect the user to enter in a minute, but hasn't been run. (could be museum instead of zoo.. then you run it with \timing and twice.. if the two queries are close to each other in timing, then you only hit memory anyway and neither SSD, NVRAM or more RAM will buy you anything. Faster memory and faster CPU-cores will.. if you have a significant speedup to the second run, then more RAM, NVRAM, SSD is a good fix. Typically I have slow-query-logging turned on, permanently set to around 250ms. If I find queries in the log that I didn't expect to take above 250ms then I'd start to investigate if query-plans are correct .. and so on.. The above numbers are raw-data size and not how PG uses them.. or? And you haven't told anything about the size of your current system. It's definitely the case that the second query run is much faster (first ones go up to 30 seconds and more...). PG uses the raw data for Switzerland like this: 10 GB total disk space based on 2 GB raw XML input. Table osm_point is one of the four big tables and uses 984 MB for the table and 1321 MB for indexes (where the hstore one is the biggest, next to id, name and geometry). Stefan
Re: [PERFORM] Summaries on SSD usage?
2011/9/2 Scott Marlowe scott.marl...@gmail.com: On Tue, Aug 30, 2011 at 11:23 AM, Stefan Keller sfkel...@gmail.com wrote: How big is your DB? What kind of reads are most common, random access or sequential? How big of a dataset do you pull out at once with a query? SSDs are usually not a big winner for read-only databases. If the dataset is small (a dozen or so gigs), get more RAM to fit it in. If it's big and sequentially accessed, then build a giant RAID-10 or RAID-6. If it's big and randomly accessed, then buy a bunch of SSDs and RAID them.

My dataset is a mirror of OpenStreetMap updated daily. For Switzerland it's about 10 GB total disk space used (half for tables, half for indexes) based on 2 GB raw XML input. Europe would be about 70 times larger (130 GB) and the world has 250 GB raw input. It's both randomly (= index scan?) and sequentially (= seq scan?) accessed, with queries like: SELECT * FROM osm_point WHERE tags @> hstore('tourism','zoo') AND name ILIKE 'Zoo%'. You can try it yourself online, e.g. http://labs.geometa.info/postgisterminal/?xapi=node[tourism=zoo] So I'm still unsure what's better: SSD, NVRAM (PCI card) or plain RAM? And I'm eager to understand whether unlogged tables could help anyway. Yours, Stefan
Re: [PERFORM] Summaries on SSD usage?
You mean something like Unlogged Tables in PostgreSQL 9.1 (= in-memory database) or simply a large ramdisk? Yours, Stefan 2011/9/1 Jim Nasby j...@nasby.net: On Aug 30, 2011, at 12:23 PM, Stefan Keller wrote: I'm looking for summaries (or best practices) on SSD usage with PostgreSQL. My use case is mainly a read-only database. Are there any around? I'm not sure, but for read-only why not just put more memory in the server? It'll be a lot cheaper than SSDs. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
[PERFORM] Summaries on SSD usage?
Hi, I'm looking for summaries (or best practices) on SSD usage with PostgreSQL. My use case is mainly a read-only database. Are there any around? Yours, Stefan
Re: [PERFORM] hstore - Implementation and performance issues around its operators
Hi Robert Many thanks for your answers. 2011/7/19 Robert Haas robertmh...@gmail.com: Putting the elements in order wouldn't really help, would it? I mean, you'd need some kind of an index inside the hstore... which there isn't. Sorry for my imprecise question. In fact the elements of an hstore are stored in order of (keylength, key), with the key comparison done bytewise (not locale-dependent). See e.g. the function hstoreUniquePairs in http://doxygen.postgresql.org/ . This ordered property is used by some hstore functions but not all - and I'm still wondering why. Yours, Stefan 2011/7/19 Robert Haas robertmh...@gmail.com: On Sun, Jun 19, 2011 at 2:59 PM, Stefan Keller sfkel...@gmail.com wrote: 1. Obviously the '@>' operator has to be used in order to make use of the GiST index. Why is the '->' operator not supported by GiST ('->' is actually mentioned in all examples of the doc.)? Because it's not a comparison operator. 2. Currently the hstore elements are stored in the order in which they come from the insert statement / constructor. Why are the elements not ordered, i.e. why is the ordering not used in all hstore functions (like hstore_fetchval etc.)? Putting the elements in order wouldn't really help, would it? I mean, you'd need some kind of an index inside the hstore... which there isn't. 3. In the source code 'hstore_io.c' one finds the following enigmatic note: ... very large hstore values can't be output. this could be fixed, but many other data types probably have the same issue. What is the max. length of an hstore (i.e. the max. length of the sum of all elements in text representation)? I think that anything of half a gigabyte or more is at risk of falling down there. But probably it's not smart to use such big hstores anyway. 4. Last, I don't fully understand the following note in the hstore doc. (http://www.postgresql.org/docs/current/interactive/hstore.html ): Notice that the old names are reversed from the convention formerly followed by the core geometric data types! 
Why names? Why not rather 'operators' or 'functions'? It's referring to the operator names. What does this reversed from the convention mean concretely? That comment could be a little more clear, but I think what it's saying is that hstore's old @ is like the core geometric types' old ~, and vice versa. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
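The (keylength, key) ordering Stefan describes can be sketched in a few lines (Python used purely for illustration; the real logic lives in hstore's C code, e.g. hstoreUniquePairs):

```python
# Illustration of hstore's internal pair ordering: pairs are sorted by
# (key length, key), with keys compared bytewise rather than by locale.
# This mirrors the rule described above; it is not the actual C code.
def hstore_sort_key(pair):
    key, _value = pair
    kb = key.encode("utf-8")
    return (len(kb), kb)  # shorter keys first, ties broken bytewise

pairs = [("tourism", "zoo"), ("name", "Zoo Basel"), ("id", "42")]
print(sorted(pairs, key=hstore_sort_key))
# [('id', '42'), ('name', 'Zoo Basel'), ('tourism', 'zoo')]
```

Because the pairs sit in this canonical order, a lookup inside a single hstore value could binary-search the keys, which is presumably the point of the ordered property even though not every hstore function exploits it.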
[PERFORM] hstore - Implementation and performance issues around its operators
Hi, We did a benchmark comparing Key-Value-Pairs stored as an EAV db schema versus hstore. The results are promising in favor of hstore, but some questions remain. 1. Obviously the '@>' operator has to be used in order to make use of the GiST index. Why is the '->' operator not supported by GiST ('->' is actually mentioned in all examples of the doc.)? 2. Currently the hstore elements are stored in the order in which they come from the insert statement / constructor. Why are the elements not ordered, i.e. why is the ordering not used in all hstore functions (like hstore_fetchval etc.)? 3. In the source code 'hstore_io.c' one finds the following enigmatic note: ... very large hstore values can't be output. this could be fixed, but many other data types probably have the same issue. What is the max. length of an hstore (i.e. the max. length of the sum of all elements in text representation)? 4. Last, I don't fully understand the following note in the hstore doc. (http://www.postgresql.org/docs/current/interactive/hstore.html ): Notice that the old names are reversed from the convention formerly followed by the core geometric data types! Why names? Why not rather 'operators' or 'functions'? What does this reversed from the convention mean concretely? Yours, Stefan
Re: FW: [PERFORM] KVP table vs. hstore - hstore performance (Was: Postgres NoSQL emulation)
Hi all Thank you to all who answered. This worked: CREATE INDEX planet_osm_point_tags_amenity ON planet_osm_point ((tags->'amenity')) WHERE (tags->'amenity') IS NOT NULL; My problem is that in fact I don't know which tag to index, since I'm running a web admin application where users can enter arbitrary queries. Yours, Stefan 2011/5/25 Pierre C li...@peufeu.com: You wrote Try to create a btree index on (bench_hstore->'bench_id') WHERE (bench_hstore->'bench_id') IS NOT NULL. What do you mean exactly? = CREATE INDEX myhstore_kps_gin_idx ON myhstore USING gin(kvps) WHERE ??? IS NOT NULL; My table's def is: CREATE TABLE myhstore ( id bigint PRIMARY KEY, kvps hstore NOT NULL ); So I'm doing something like: CREATE INDEX myhstore_kps_gin_idx ON myhstore USING gin(kvps); Hello; I meant a plain old btree index like this: CREATE INDEX foo ON myhstore((kvps->'yourkeyname')) WHERE (kvps->'yourkeyname') IS NOT NULL; The idea is that: - The reason to use hstore is to have an arbitrary number of keys and use the keys you want, not a fixed set of columns like in a table - Therefore, no hstore key is present in all rows (if one was, you'd make it a table column, and maybe index it) - You'll probably only want to index some of the keys/values (avoiding indexing values that contain serialized data or other stuff that never appears in a WHERE clause) So, for each key that corresponds to a searchable attribute, I'd use a conditional index on that key, which only indexes the relevant rows. For keys that never appear in a WHERE, no index is needed. gist is good if you want the intersection of an hstore with another one (for instance); btree is good if you want simple search or range search.
Re: FW: [PERFORM] KVP table vs. hstore - hstore performance (Was: Postgres NoSQL emulation)
Salut Pierre You wrote Try to create a btree index on (bench_hstore->'bench_id') WHERE (bench_hstore->'bench_id') IS NOT NULL. What do you mean exactly? = CREATE INDEX myhstore_kps_gin_idx ON myhstore USING gin(kvps) WHERE ??? IS NOT NULL; My table's def is: CREATE TABLE myhstore ( id bigint PRIMARY KEY, kvps hstore NOT NULL ); So I'm doing something like: CREATE INDEX myhstore_kps_gin_idx ON myhstore USING gin(kvps); Stefan 2011/5/23 Pierre C li...@peufeu.com: Hi Merlin The analyze command gave the following result: On the KVP table: Index Scan using kvpidx on bench_kvp (cost=0.00..8.53 rows=1 width=180) (actual time=0.037..0.038 rows=1 loops=1) Index Cond: (bench_id = '20_20'::text) Total runtime: 0.057 ms And on the hstore table: Bitmap Heap Scan on bench_hstore (cost=32.22..3507.54 rows=1000 width=265) (actual time=145.040..256.173 rows=1 loops=1) Recheck Cond: (bench_hstore @> 'bench_id=>20_20'::hstore) -> Bitmap Index Scan on hidx (cost=0.00..31.97 rows=1000 width=0) (actual time=114.748..114.748 rows=30605 loops=1) Index Cond: (bench_hstore @> 'bench_id=>20_20'::hstore) Total runtime: 256.211 ms For hstore I'm using a GiST index. Try to create a btree index on (bench_hstore->'bench_id') WHERE (bench_hstore->'bench_id') IS NOT NULL.
Re: [PERFORM] KVP table vs. hstore - hstore performance (Was: Postgres NoSQL emulation)
Hi Jim You actually made me think about the schema Michel and I are using: And KVP is? ;) CREATE TABLE mykvpstore( id bigint PRIMARY KEY ); CREATE TABLE kvp ( id bigint REFERENCES mykvpstore(id), key text NOT NULL, value text ); -- with index on key And the table with the associative array type (hstore) is: CREATE TABLE myhstore ( id bigint PRIMARY KEY, kvps hstore NOT NULL ); -- with GiST index on kvps It seems to me that in the mykvpstore/kvp split there is also some overhead. And yes, we have no clue what keys to anticipate, except for some common ones like 'name': the use case is coming from OpenStreetMap (http://wiki.openstreetmap.org/wiki/Database_schema ). Yours, Stefan 2011/5/17 Jim Nasby j...@nasby.net: On May 16, 2011, at 8:47 AM, Merlin Moncure wrote: On Sat, May 14, 2011 at 5:10 AM, Stefan Keller sfkel...@gmail.com wrote: Hi, I am conducting a benchmark to compare a KVP table vs. hstore and got bad hstore performance results when the no. of records is greater than about 500'000. CREATE TABLE kvp ( id SERIAL PRIMARY KEY, key text NOT NULL, value text ); -- with index on key CREATE TABLE myhstore ( id SERIAL PRIMARY KEY, obj hstore NOT NULL ); -- with GIST index on obj Does anyone have experience with that? hstore is not really designed for large-ish sets like that. And KVP is? ;) IIRC hstore ends up just storing everything as text, with pointers to know where things start and end. There's no real indexing inside hstore, so basically the only thing it can do is scan the entire hstore. That said, I would strongly reconsider using KVP for anything except the most trivial of data sets. It is *extremely* inefficient. Do you really have absolutely no idea what *any* of your keys will be? Even if you need to support a certain amount of non-deterministic stuff, I would put everything you possibly can into real fields and only use KVP or hstore for things that you really didn't anticipate. 
Keep in mind that for every *value*, your overhead is 24 bytes for the heap header, 2+ varlena bytes in the heap, plus the length of the key. In the index you're looking at 6+ bytes of overhead, 1+ byte for varlena, plus the length of the key. The PK will cost you an additional 16-24 bytes, depending on alignment. So that's a *minimum* of ~50 bytes per value, and realistically the overhead will be closer to 65-70 bytes, *per value*. Unless your values are decent-sized strings, the overhead is going to be many times larger than the actual data! -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
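Jim's arithmetic can be made concrete with a back-of-envelope sketch (the byte counts are his approximations from the mail above, not exact on-disk accounting for any particular PostgreSQL version):

```python
# Rough per-value storage overhead for the EAV (KVP) layout, following
# Jim Nasby's figures: heap tuple header + varlena header + key bytes,
# the same key repeated in the index, plus the PK cost.
HEAP_TUPLE_HEADER = 24   # bytes per heap tuple header
HEAP_VARLENA = 2         # short varlena header in the heap (2+ bytes)
INDEX_ENTRY = 6          # per-entry index overhead (6+ bytes)
INDEX_VARLENA = 1        # varlena byte in the index (1+ byte)
PK_COST = 16             # 16-24 bytes depending on alignment

def eav_overhead(key_len: int) -> int:
    """Approximate bytes of overhead stored per key/value pair."""
    heap = HEAP_TUPLE_HEADER + HEAP_VARLENA + key_len
    index = INDEX_ENTRY + INDEX_VARLENA + key_len
    return heap + index + PK_COST

print(eav_overhead(0))   # 49 -> the "minimum of ~50 bytes" floor
print(eav_overhead(7))   # 63 -> a 7-byte key like 'amenity'
```

With realistic key lengths this lands squarely in the 65-70 bytes per value Jim quotes, dwarfing short values.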
[PERFORM] KVP table vs. hstore - hstore performance (Was: Postgres NoSQL emulation)
Hi, I am conducting a benchmark to compare KVP table vs. hstore and got bad hstore performance results when the no. of records is greater than about 500'000. CREATE TABLE kvp ( id SERIAL PRIMARY KEY, key text NOT NULL, value text ); -- with index on key CREATE TABLE myhstore ( id SERIAL PRIMARY KEY, obj hstore NOT NULL ); -- with GIST index on obj Does anyone have experience with that? Yours, Stefan
[PERFORM] How to configure a read-only database server?
I browsed the FAQ and looked at PostgreSQL performance books but I could not find the obvious: how to configure a read-only database server? I have a single-disk virtual Linux system and a read-only dataset which is exposed to the internet and completely replaced from time to time. This is what I found so far: * Disabling the autovacuum daemon. * Setting postgresql.conf parameters: fsync=off synchronous_commit=off full_page_writes=off * For the session: SET transaction_read_only TO TRUE; SET TRANSACTION READ ONLY; * What about wal_level and archive_mode? = Any comments on speeding up/optimizing such a database server? Yours, Stefan
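Collected into a postgresql.conf fragment, the settings discussed might look as follows (a sketch only; fsync=off risks the whole cluster on a crash and is defensible here solely because the dataset can be reloaded from scratch):

```
# Read-only, wholesale-replaced dataset: trade durability for speed.
fsync = off                 # acceptable only because data is reloadable
synchronous_commit = off
full_page_writes = off
autovacuum = off            # no UPDATE/DELETE churn to clean up
wal_level = minimal         # no archiving or replication consumers
archive_mode = off
```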
Re: [PERFORM] Dell PERC H700/H800
Joshua D. Drake wrote: On Thu, 2010-02-11 at 12:39 +, Matthew Wakeling wrote: Just a heads up - apparently the more recent Dell RAID controllers will no longer recognise hard discs that weren't sold through Dell. http://www.channelregister.co.uk/2010/02/10/dell_perc_11th_gen_qualified_hdds_only/ As one of the comments points out, that kind of makes them no longer SATA or SAS compatible, and they shouldn't be allowed to use those acronyms any more. That's interesting. I know that IBM, at least on some of their models, has done the same. Glad I use HP :)

All of the main vendors do that - IBM does, and so does HP (unless you count the toy boxes without a real RAID controller). The latter actually goes so far as to blacklist some of their own HDD firmware levels in more recent controller versions, which can cause quite some surprising results during maintenance operations. I find it quite strange that people seem to be surprised by Dell now starting with that as well (I actually find it really surprising they have not done it before). Stefan
Re: [PERFORM] Best suiting OS
Stefan Kaltenbrunner wrote: Devrim GÜNDÜZ wrote: On Mon, 2009-10-05 at 12:07 +0200, Jean-Michel Pouré wrote: Go for Debian: * It is a free community, very active. Well, we need to state that this is not a unique feature. * It is guaranteed to be upgradable. Depends. I had lots of issues with the upgrade process in the past -- but yeah, it is much better than most distros. * Very easy to administrate via apt-get. Right. apt is better than yum (in terms of speed). Choose Debian SID or testing, which will provide the latest fixes. One thing that I don't like about Debian is their update policy. If upstream is releasing a security update, I'd like to be able to find new packages as upstream announces updated sets. Yes, I'm talking about PostgreSQL here. This is exactly what Debian has done for a while now (at least for PostgreSQL). I.e.: Debian Etch aka oldstable has 8.1.18 and Debian Lenny (the current release) has 8.3.8... Stefan
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Scott Carey wrote: On 7/30/09 11:24 AM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: Kevin Grittner wrote: Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Since the dump to custom format ran longer than the full pg_dump piped directly to psql would have taken, the overall time to use this technique is clearly longer for our databases on our hardware. Hmmm ... AFAIR there isn't a good reason for dump to custom format to take longer than plain text dump, except for applying compression. Maybe -Z0 would be worth testing? Or is the problem that you have to write the data to a disk file rather than just piping it? I did some checking with the DBA who normally copies these around for development and test environments. He confirmed that when the source and target are on the same machine, a pg_dump piped to psql takes about two hours. If he pipes across the network, it runs more like three hours. My pg_dump to custom format ran for six hours. The single-transaction restore from that dump file took two hours, with both on the same machine. I can confirm with benchmarks, but this guy generally knows what he's talking about (and we do create a lot of development and test databases this way). Either the compression is tripling the dump time, or there is something inefficient about how pg_dump writes to the disk. seems about right - compression in pg_dump -Fc is a serious bottleneck, and unless we can significantly speed it up or make it use multiple cores (either for the dump itself - which would be awesome - or for the compression) I would recommend not using it at all. That's not an option when a compressed dump is 200GB and uncompressed is 1.3TB, for example. yeah, that was not meant as don't use compression at all but rather as use a different way to compress than what pg_dump provides internally. 
Stefan
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Tom Lane wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Since the dump to custom format ran longer than the full pg_dump piped directly to psql would have taken, the overall time to use this technique is clearly longer for our databases on our hardware. Hmmm ... AFAIR there isn't a good reason for dump to custom format to take longer than plain text dump, except for applying compression. Maybe -Z0 would be worth testing? Or is the problem that you have to write the data to a disk file rather than just piping it?

I always dump with -Z0 (and compress afterwards, or even in a pipe to get two cores busy) because otherwise custom dump times are simply ridiculous. However, Kevin is on to something here - on the typical 4-8 core boxes I tested, I managed to get an around cores/2 speedup for the restore, which means that for a pure upgrade, or for testing similar to what Kevin is doing, custom dumps + parallel restore might result in no win or even a loss. On one of our datasets I did some benchmarking a while ago (for those who attended Bruce's pg_migrator talk @ pgcon, these are the same numbers):

* 150GB database (on disk - ~100GB as a plain text dump)
* time to dump (-Z0): 120min
* time to restore (single-threaded): 180min
* time to restore (-j 16): 59min

However, the problem is that this does not actually mean that parallel restore shaves ~120min off the dump/restore time, because you get the following real runtimes:

* plain text dump + single-threaded restore in a pipe: 188min
* custom dump to file + parallel restore: 179min

This is without compression; with the default, custom dump + parallel restore is way slower than the simple approach on reasonable hardware. Stefan
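The trade-off Stefan describes comes down to simple arithmetic on the wall-clock numbers he quotes: the plain-text pipe overlaps dump and restore, while the custom-format file serializes them, so the parallel restore first has to claw back the whole dump time.

```python
# Wall-clock totals for the two upgrade paths, using the runtimes quoted
# above (in minutes). The piped plain dump runs dump and restore
# concurrently; the custom dump must finish before pg_restore -j starts.
plain_pipe_total = 188         # pg_dump | psql, one overlapped step

dump_custom = 120              # pg_dump -Fc -Z0 to a file
restore_parallel = 59          # pg_restore -j 16
custom_total = dump_custom + restore_parallel

print(custom_total)                       # 179
print(plain_pipe_total - custom_total)    # 9 -> only ~9 minutes saved
```

So even with a 3x faster restore step, the end-to-end win is marginal for this dataset, which is exactly the point being made.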
Re: [PERFORM] PostgreSQL 8.4 performance tuning questions
Kevin Grittner wrote: Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Since the dump to custom format ran longer than the full pg_dump piped directly to psql would have taken, the overall time to use this technique is clearly longer for our databases on our hardware. Hmmm ... AFAIR there isn't a good reason for dump to custom format to take longer than plain text dump, except for applying compression. Maybe -Z0 would be worth testing? Or is the problem that you have to write the data to a disk file rather than just piping it? I did some checking with the DBA who normally copies these around for development and test environments. He confirmed that when the source and target are on the same machine, a pg_dump piped to psql takes about two hours. If he pipes across the network, it runs more like three hours. My pg_dump to custom format ran for six hours. The single-transaction restore from that dump file took two hours, with both on the same machine. I can confirm with benchmarks, but this guy generally knows what he's talking about (and we do create a lot of development and test databases this way). Either the compression is tripling the dump time, or there is something inefficient about how pg_dump writes to the disk. seems about right - compression in pg_dump -Fc is a serious bottleneck, and unless we can significantly speed it up or make it use multiple cores (either for the dump itself - which would be awesome - or for the compression) I would recommend not using it at all. Stefan
Re: [PERFORM] hyperthreaded cpu still an issue in 8.4?
Greg Smith wrote: On Wed, 29 Jul 2009, Stefan Kaltenbrunner wrote: Well the real problem is that pgbench itself does not scale too well to lots of concurrent connections and/or to high transaction rates so it seriously skews the result. Sure, but that's what the multi-threaded pgbench code aims to fix, which didn't show up until after you ran your tests. I got the 90K select TPS with a completely unoptimized postgresql.conf, so that's by no means the best it's possible to get out of the new pgbench code on this hardware. I've seen as much as a 40% improvement over the standard pgbench code in my limited testing so far, and the patch author has seen a 450% one. You might be able to see at least the same results you got from sysbench out of it. oh - the 90k tps are with the new multithreaded pgbench? missed that fact. As you can see from my results I managed to get 83k with the 8.4 pgbench on a slightly slower Nehalem which does not sound too impressive for the new code... Stefan
Re: [PERFORM] hyperthreaded cpu still an issue in 8.4?
Greg Smith wrote: On Tue, 28 Jul 2009, Scott Marlowe wrote: Just FYI, I ran the same basic test but with -c 10 since -c shouldn't really be greater than -s That's only true if you're running the TPC-B-like or other write tests, where access to the small branches table becomes a serious hotspot for contention. The select-only test has no such restriction as it only operates on the big accounts table. Often peak throughput is closer to a very small multiple of the number of cores though, possibly even clients=cores, presumably because it's more efficient to approximately peg one backend per core rather than switch among more than one on each - reduced L1 cache contention etc. That's the behavior you measured when your test showed better results with c=10 than c=16 on an 8 core system, rather than suffering from the c must be <= s contention limitation.

Well the real problem is that pgbench itself does not scale too well to lots of concurrent connections and/or to high transaction rates, so it seriously skews the result. If you look at http://www.kaltenbrunner.cc/blog/index.php?/archives/26-Benchmarking-8.4-Chapter-1Read-Only-workloads.html it is pretty clear that 90k (or the 83k I got due to the slower E5530) tps is actually a pgbench limit and that the backend really can go almost twice as fast (I only demonstrated ~140k tps using sysbench there, but I later managed to do ~160k tps with queries that are closer to what pgbench does in the lab). Stefan
Re: [PERFORM] 8.4 COPY performance regression on Solaris
Alan Li wrote: Hi, It seems that a COPY of 8M rows to a table on 8.4rc1 takes 30% longer than it does on 8.3.7 on Solaris. Here are the steps I've taken to reproduce this problem on two different Solaris boxes (Solaris 10 11/06 s10x_u3wos_10 X86 and Solaris 10 8/07 s10x_u4wos_12b X86). I've tried this on a Linux box, and I do not see the problem there.

I tried that on my box (though I increased the testset size by 10x to get more sensible runtimes) and I can reproduce it on Linux (CentOS 5.3/x86_64, Nehalem Xeon E5530) as well. I get ~45 rows/s on 8.3 and only ~33 rows/s on 8.4.

On 8.4 I get 3m59s/4m01s/3m56s runtime and a profile of:

samples  %        symbol name
636302   19.6577  XLogInsert
415510   12.8366  CopyReadLine
225347    6.9618  DoCopy
131143    4.0515  ParseDateTime
122043    3.7703  DecodeNumber
 81730    2.5249  DecodeDate
 81045    2.5038  DecodeDateTime
 80900    2.4993  pg_verify_mbstr_len
 80235    2.4787  pg_next_dst_boundary
 67571    2.0875  LWLockAcquire
 64548    1.9941  heap_insert
 64178    1.9827  LWLockRelease
 63609    1.9651  PageAddItem
 63402    1.9587  heap_form_tuple
 56544    1.7468  timestamp_in
 48697    1.5044  heap_fill_tuple
 45248    1.3979  pg_atoi
 42390    1.3096  IsSystemRelation
 41287    1.2755  BufferGetBlockNumber
 38936    1.2029  ValidateDate
 36619    1.1313  ExecStoreTuple
 35367    1.0926  DecodeTime

On 8.3.7 I get 2m58s/2m54s/2m55s and a profile of:

samples  %        symbol name
460966   16.2924  XLogInsert
307386   10.8643  CopyReadLine
301745   10.6649  DoCopy
153452    5.4236  pg_next_dst_boundary
119757    4.2327  DecodeNumber
105356    3.7237  heap_formtuple
 83456    2.9497  ParseDateTime
 83020    2.9343  pg_verify_mbstr_len
 72735    2.5708  DecodeDate
 70425    2.4891  LWLockAcquire
 65820    2.3264  LWLockRelease
 61823    2.1851  DecodeDateTime
 55895    1.9756  hash_any
 51305    1.8133  PageAddItem
 47440    1.6767  AllocSetAlloc
 47218    1.6689  heap_insert
 38912    1.3753  DecodeTime
 34871    1.2325  ReadBuffer_common
 34519    1.2200  date2j
 33093    1.1696  DetermineTimeZoneOffset
 31334    1.1075  MemoryContextAllocZero
 30951    1.0939  RelationGetBufferForTuple

If I do the same test utilizing WAL bypass the picture 
changes. 8.3 runtimes: 2m16s/2m14s/2m22s and profile:

samples  %        symbol name
445583   16.      CopyReadLine
332772   12.5300  DoCopy
156974    5.9106  pg_next_dst_boundary
131952    4.9684  heap_formtuple
119114    4.4850  DecodeNumber
 94340    3.5522  ParseDateTime
 81624    3.0734  pg_verify_mbstr_len
 75012    2.8245  DecodeDate
 74950    2.8221  DecodeDateTime
 64467    2.4274  hash_any
 62859    2.3669  PageAddItem
 62054    2.3365  LWLockAcquire
 57209    2.1541  LWLockRelease
 45812    1.7250  hash_search_with_hash_value
 41530    1.5637  DetermineTimeZoneOffset
 40790    1.5359  heap_insert
 39694    1.4946  AllocSetAlloc
 38855    1.4630  ReadBuffer_common
 36056    1.3576  MemoryContextAllocZero
 36030    1.3567  DecodeTime
 29057    1.0941  UnpinBuffer
 28291    1.0653  PinBuffer

8.4 runtimes: 2m01s/2m00s/1m59s and profile:

samples  %        symbol name
404775   17.9532  CopyReadLine
208482    9.2469  DoCopy
148898    6.6042  ParseDateTime
118645    5.2623  DecodeNumber
 80972    3.5914  DecodeDate
 79005    3.5042  pg_verify_mbstr_len
 73645    3.2664  PageAddItem
 72167    3.2009  DecodeDateTime
 65264    2.8947  heap_form_tuple
 52680    2.3365  timestamp_in
 46264    2.0520  pg_next_dst_boundary
 45819    2.0322  ExecStoreTuple
 45745    2.0290  heap_fill_tuple
 43690    1.9378  heap_insert
 38453    1.7055  InputFunctionCall
 37050    1.6433  LWLockAcquire
 36853    1.6346  BufferGetBlockNumber
 36428    1.6157  heap_compute_data_size
 33818    1.5000  DetermineTimeZoneOffset
 33468    1.4844  DecodeTime
 30896    1.3703  tm2timestamp
 30888    1.3700  GetCurrentTransactionId

Stefan
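Reducing the runtimes quoted above to ratios makes the point of the WAL-bypass experiment explicit (a small sketch with a hypothetical helper; the seconds are converted from the times in the message):

```python
# The 8.4 regression shows up only when WAL is written, which together
# with XLogInsert's larger profile share points at WAL insertion rather
# than COPY parsing as the culprit.
def avg_seconds(times):
    return sum(times) / len(times)

wal_83 = avg_seconds([178, 174, 175])    # 2m58s, 2m54s, 2m55s
wal_84 = avg_seconds([239, 241, 236])    # 3m59s, 4m01s, 3m56s
nowal_83 = avg_seconds([136, 134, 142])  # 2m16s, 2m14s, 2m22s
nowal_84 = avg_seconds([121, 120, 119])  # 2m01s, 2m00s, 1m59s

print(round(wal_84 / wal_83 - 1, 2))     # 0.36 -> ~36% slower with WAL
print(round(nowal_84 / nowal_83 - 1, 2)) # negative -> 8.4 faster w/o WAL
```

In other words, with WAL bypassed 8.4 actually comes out ahead of 8.3 on this box, isolating the slowdown to the WAL path.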
Re: [PERFORM] PostgreSQL with PostGIS on embedded hardware
Greg Stark wrote: On Mon, May 11, 2009 at 5:05 PM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: Good to know!!! I imagine that on a PS3 it would be _really_ fast... :-) well not really - while it is fairly easy to get postgresql running on a PS3, it is not a fast platform. While the main CPU there is a pretty fast Power based core, it only has 256MB of RAM and a single SATA disk available (though you could add some USB disks). The nice thing about it is that TPC-C and other benchmarks all specify their bottom-line number in some unit like transactions per second PER DOLLAR. So using a PS3 should be able to get ridiculously good results compared to expensive server hardware... I kinda doubt that - the PS3 is certainly not server grade hardware, so you can only compare it to a desktop, and I would bet that the typical desktop you get for the 400€ price of a PS3 (you can get 4GB RAM and a quad-core CPU for that) is going to outperform it significantly for almost every workload... Stefan
Re: [PERFORM] Any better plan for this query?..
Dimitri wrote: Folks, before you start to think what a dumb guy doing a dumb thing :-)) I'll explain a few details: for more than 10 years I've been using the db_STRESS kit (http://dimitrik.free.fr/db_STRESS.html) to check database performance and scalability. Until now I was very happy with the results it gave me, as it stresses each database engine's internals very well and brings to light some things I would probably miss on other workloads. What can I say - with time, a fast query that executed in 500ms before now runs within 1-2ms; not only has the hardware improved, database engines have also increased their performance a lot! :-))

I was attempting to look into that benchmark kit a bit, but I find the information on that page a bit lacking :( A few notes:

* is the source code for the benchmark actually available? The kit seems to contain a few precompiled binaries and some source/header files, but there are no build instructions, no Makefile or even a README, which makes it really hard to verify exactly what the benchmark is doing, or whether the benchmark client might actually be the problem here.
* there is very little information on how the toolkit talks to the database - some of the binaries seem to contain a static copy of libpq or such?
* how many queries per session is the toolkit actually using - some earlier comments seem to imply you are doing a connect/disconnect cycle for every query; is that actually true?

Stefan
Re: [PERFORM] Any better plan for this query?..
Dimitri wrote: Hi Stefan, sorry, I did not have time to bring all the details into the toolkit - but at least I published it instead of just telling a nice story about it :-) fair point and appreciated. But it seems important that benchmarking results can be verified by others as well... The client process is a binary compiled with libpq. The client interprets a scenario script and publishes via SHM the time spent on each SQL request. I did not publish the sources yet as that would also require explaining how to compile them :-)) So for the moment it's shipped as freeware, but with time everything will be available (BTW, you're the first to ask for the sources (well, except the IBM guys who asked to get it on POWER boxes, but that's another story :-)) well there is no licence tag (or a copyright notice) or anything else associated with the download, which makes it a bit harder than it really needs to be. The reason why I was actually looking for the source is that none of my available benchmark platforms are among the ones you are providing binaries for, which kinda reduces its usefulness. What is good is that each client publishes its internal stats *live*, so we're able to get live data and follow any kind of waves in performance. Each session is a single process, so there is no contention between clients as you may see with some other tools. The current scenario script contains 2 selects (representing a Read transaction) and delete/insert/update (representing a Write transaction). According to the start parameters, each client executes a given number of Reads per Write. It connects at the beginning and disconnects at the end of the test. well I have seen clients getting bottlenecked internally (like wasting more time on absorbing/getting rid of the actual result than it took the server to generate the answer...). How sure are you that your live publishing of data does not affect the benchmark results (because it kinda generates an artificial think time), for example?
But what I get from your answer is that you are basically doing one connect/disconnect per client, and the testcase you are talking about has 256 clients? Stefan
Re: [PERFORM] Any better plan for this query?..
Matthew Wakeling wrote: On Tue, 12 May 2009, Simon Riggs wrote: won't connect operations all be handled by a single thread - the parent postmaster? No, we spawn, then authenticate. But you still have a single thread doing the accept() and spawn. At some point (maybe not now, but in the future) this could become a bottleneck given very short-lived connections. well the main cost is backend startup, and that one is extremely expensive (compared to the cost of a simple query, and also depending on the OS). We have more overhead there than other databases (most notably MySQL), hence my question on how the benchmark was operating. For any kind of workload that contains frequent connection establishment one wants to use a connection pooler like pgbouncer (as mentioned elsewhere in the thread already). Stefan
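For workloads like this, a minimal pgbouncer setup might look as follows; every value here is an illustrative placeholder, not a setting recommended anywhere in the thread:

```ini
; pgbouncer.ini -- illustrative sketch only
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling absorbs frequent connect/disconnect cycles,
; so backend startup cost is paid only for the small server pool
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```

Clients then connect to port 6432 instead of 5432 and reuse the pooled backends, which sidesteps the per-connection backend startup cost discussed above.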
Re: [PERFORM] PostgreSQL with PostGIS on embedded hardware
Paolo Rizzi wrote: Are you saying that PostgreSQL+PostGIS can actually run on a smartphone??? Intriguing... Did anyone ever actually try that??? If it's a supported CPU type and you've got a suitable build toolchain, sure. Seven or eight years ago we were getting a good laugh out of the fact that you could run PG on a PlayStation 2. Good to know!!! I imagine that on a PS3 it would be _really_ fast... :-) well not really - while it is fairly easy to get postgresql running on a PS3, it is not a fast platform. While the main CPU there is a pretty fast Power-based core, it only has 256MB of RAM and a single SATA disk available (though you could add some USB disks). Stefan
Re: [PERFORM] How to unique-ify HUGE table?
Scott Marlowe wrote: On Tue, Dec 23, 2008 at 11:14 AM, George Pavlov gpav...@mynewplace.com wrote: You don't say what PG version you are on, but just for kicks you may try using GROUP BY instead of DISTINCT. Yes, the two should perform the same, but with 8.1 (or maybe 8.0) I had seen situations where GROUP BY was faster (admittedly this happened with more complex queries). So, try this: Even in 8.3 it looks like GROUP BY is faster. Tested it on a decent-sized table and GROUP BY used a hash aggregate and ran in ~600 ms, while DISTINCT used a sort and ran in 1300 ms. That was on 500k rows. On a much larger table, one with about 10M rows, a similar statement runs in 1500 ms with GROUP BY and in 2390 ms with DISTINCT. Not surprising - this is a known limitation in all released versions of PostgreSQL (GROUP BY can use hashing and sorting - DISTINCT only sorting). 8.4 is going to improve that, though. Stefan
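The rewrite being suggested is mechanical; with a placeholder table and column it looks like:

```sql
-- DISTINCT: pre-8.4 planners can only sort to remove duplicates
SELECT DISTINCT col FROM big_table;

-- GROUP BY: the planner may choose a HashAggregate instead of a sort,
-- which is what produced the faster timings quoted above
SELECT col FROM big_table GROUP BY col;
```

Running each form under EXPLAIN ANALYZE shows whether the planner picked a Sort/Unique or a HashAggregate node.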
Re: [PERFORM] Explain Analyze - Total runtime very differentes
[EMAIL PROTECTED] wrote: Hello friends ... I'm evaluating the performance of algorithms for optimization of queries. I am comparing results between the algorithm of Dynamic Programming and an implementation of Kruskal's algorithm. When submitting a query that references only 2 tables of my database, logically the same query plan is shown. But the total runtime displayed by EXPLAIN ANALYZE shows a very large variation: Dynamic Programming Total runtime: 1204.220 ms; Kruskal Total runtime: 3744.879 ms. No change of data (insert, delete, update) in the tables was made during the tests. The same query was submitted several times (with the Kruskal and Dynamic Programming algorithms) and the variation in results persists. EXPLAIN ANALYZE only reports the time to *execute* the query. With the same query plan, I do not understand why this variation occurs. The query plans are attached. are you sure it is not something as simple as a caching effect - i.e. you ran the slow variant first and pg and/or the OS buffered the data, and the repeated executions just benefited from that? Try running all variations a few dozen times, both in cached and uncached state, and you should see the difference level out. Stefan
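The suggested test procedure can be sketched like this (the query itself is elided; only the repetition pattern matters):

```sql
-- run each variant several times and discard the first timings:
-- the first execution pays the cost of reading cold data, later
-- ones hit shared_buffers and/or the OS page cache
EXPLAIN ANALYZE SELECT ... ;  -- cold cache: discard
EXPLAIN ANALYZE SELECT ... ;  -- warm cache: compare these
EXPLAIN ANALYZE SELECT ... ;  -- warm cache: compare these
```

Only timings taken in the same cache state are comparable between the two optimizer variants.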
Re: [PERFORM] pg_dump error - out of memory, Failed on request of size 536870912
Marcin Citowicki wrote: Hello, I forgot to add - all those 'out of memory' errors happen when the backup db is trying to create an index. Every 'CREATE INDEX' operation is followed by an 'out of memory' error. are you sure that your OS (or ulimit) is able to support a maintenance_work_mem setting that large? - try reducing it to, say, 128MB for a start and try again. Stefan
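The suggested starting point is just a one-line change in postgresql.conf; 128MB is the value proposed above, not a tuned recommendation:

```ini
# postgresql.conf -- start low and raise again only within what the
# OS / ulimit actually allows for a single backend
maintenance_work_mem = 128MB
```

After a reload (`pg_ctl reload` or SELECT pg_reload_conf() on newer versions), retry the CREATE INDEX and see whether the out-of-memory errors disappear.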
[PERFORM] query plan, index scan cost
hi list, i have a problem with a time-consuming query. first of all my table structure:

CREATE TABLE nw_tla_2008_4_deu (
  ID bigint NOT NULL,
  NET2CLASS smallint,
  FOW smallint,
  CONSTRAINT nw_tla_2008_4_deu_pkey PRIMARY KEY (ID)
) WITHOUT OIDS;
CREATE INDEX nw_tla_2008_4_deu_fow_idx ON nw_tla_2008_4_deu USING btree (FOW);
CREATE INDEX nw_tla_2008_4_deu_net2class_idx ON nw_tla_2008_4_deu USING btree (NET2CLASS);
CREATE INDEX nw_tla_2008_4_deu_the_geom_gist ON nw_tla_2008_4_deu USING gist (the_geom gist_geometry_ops);
ALTER TABLE nw_tla_2008_4_deu CLUSTER ON nw_tla_2008_4_deu_the_geom_gist;

when i run the following query with explain analyze i get the following result:

EXPLAIN ANALYZE SELECT nw.ID AS id FROM nw_tla_2008_4_deu AS nw WHERE expand(st_pointfromtext('POINT(13.7328934 51.049476)',4326), 0.2476961598054) && nw.the_geom AND nw.FOW IN (1,2,3,4,10,17) AND nw.NET2CLASS IN (0,1,2,3)

Bitmap Heap Scan on nw_tla_2008_4_deu nw (cost=35375.52..77994.15 rows=11196 width=8) (actual time=13307.830..13368.969 rows=15425 loops=1)
  Recheck Cond: (NET2CLASS = ANY ('{0,1,2,3}'::integer[]))
  Filter: (('010320E610010005C06BF82A4000A0A066494000C06BF82A4000C009A6494000E00FF62B4000C009A6494000E00FF62B4000A0A066494000C06BF82A4000A0A0664940'::geometry && the_geom) AND (FOW = ANY ('{1,2,3,4,10,17}'::integer[])))
  ->  BitmapAnd (cost=35375.52..35375.52 rows=12614 width=0) (actual time=13307.710..13307.710 rows=0 loops=1)
        ->  Bitmap Index Scan on nw_tla_2008_4_deu_the_geom_gist (cost=0.00..1759.12 rows=55052 width=0) (actual time=22.452..22.452 rows=52840 loops=1)
              Index Cond: ('010320E610010005C06BF82A4000A0A066494000C06BF82A4000C009A6494000E00FF62B4000C009A6494000E00FF62B4000A0A066494000C06BF82A4000A0A0664940'::geometry && the_geom)
        ->  Bitmap Index Scan on nw_tla_2008_4_deu_net2class_idx (cost=0.00..33610.55 rows=1864620 width=0) (actual time=13284.121..13284.121 rows=2021814 loops=1)
              Index Cond: (NET2CLASS = ANY ('{0,1,2,3}'::integer[]))
Total runtime: 13332 ms

running the next query, which is only
slightly different and has one instead of two AND conditions, leads to the following result:

EXPLAIN ANALYZE SELECT nw.ID AS id FROM nw_tla_2008_4_deu AS nw WHERE expand(st_pointfromtext('POINT(13.7328934 51.049476)',4326), 0.2476961598054) && nw.the_geom AND nw.FOW IN (1,2,3,4,10,17)

Bitmap Heap Scan on nw_tla_2008_4_deu nw (cost=1771.34..146161.54 rows=48864 width=8) (actual time=23.285..99.493 rows=47723 loops=1)
  Filter: (('010320E610010005C06BF82A4000A0A066494000C06BF82A4000C009A6494000E00FF62B4000C009A6494000E00FF62B4000A0A066494000C06BF82A4000A0A0664940'::geometry && the_geom) AND (FOW = ANY ('{1,2,3,4,10,17}'::integer[])))
  ->  Bitmap Index Scan on nw_tla_2008_4_deu_the_geom_gist (cost=0.00..1759.12 rows=55052 width=0) (actual time=22.491..22.491 rows=52840 loops=1)
        Index Cond: ('010320E610010005C06BF82A4000A0A066494000C06BF82A4000C009A6494000E00FF62B4000C009A6494000E00FF62B4000A0A066494000C06BF82A4000A0A0664940'::geometry && the_geom)
Total runtime: 109 ms

so both queries have AND conditions: two in the first query and one in the second. unfortunately i am not an expert in reading the PostgreSQL query plan. basically i am wondering why a second index scan is done in the first query whereas in the second query it is not. the second query runs a hundred times faster than the first one, which is surprising to me. any ideas? regards, stefan
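One experiment worth trying (purely a sketch, not something tested in the thread) is to check whether the plan improves when the planner cannot use the unselective NET2CLASS index as a second bitmap input, or when one btree index covers both non-spatial conditions:

```sql
-- per-session experiment: see what plan is chosen without bitmap scans
SET enable_bitmapscan = off;
EXPLAIN ANALYZE SELECT ... ;  -- re-run the slow query here
RESET enable_bitmapscan;

-- alternative: a combined index, so one btree scan can serve
-- both the FOW and the NET2CLASS condition (hypothetical name)
CREATE INDEX nw_tla_2008_4_deu_fow_net2class_idx
    ON nw_tla_2008_4_deu USING btree (FOW, NET2CLASS);
```

Note that disabling bitmap scans also affects the gist index, so this is a diagnostic step rather than a fix; the plan output shows whether the NET2CLASS scan (which matched ~2M rows) was the real cost.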
Re: [PERFORM] Planning a new server - help needed
Laszlo Nagy wrote: [...] The RAID controller that I have selected can only handle 8 disks. I guess I need to find a different one with 16 channels and use more disks. So are you saying that with all disks in a bigger RAID 6 array, I will get the most out of the hardware? In that case, I'll try to get a bit more money from the management and build RAID 6 with 12 disks. Here is a good line of SATA controllers for 4/8/12/16 disks: http://www.tekram.com/product2/product_detail.asp?pid=51 Stefan
Re: [PERFORM] URI to kind of a benchmark
Harald Armin Massa wrote: reading postgres benchmarks for beginners advises one to stop reading at the word default (i.e. an unchanged postgresql.conf); but the real test is given right after: http://www.kaltenbrunner.cc/blog/index.php?/archives/21-guid.html That confirms my first impression (on a different workload) that the speed has doubled. If reality confirms it, 8.2 to 8.3 will be a small step in version numbers, and a great step for databases. yeah, but keep in mind that this one only tested a very specific scenario (update-heavy, workload fits easily in the buffer cache and benefits from HOT) - it is a fairly neat improvement though ... Stefan ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] DRBD and Postgres: how to improve the perfomance?
Joshua D. Drake wrote: Gregory Stark wrote: Simon Riggs [EMAIL PROTECTED] writes: You're right, but the distinction is a small one. What are the chances of losing two independent servers within a few milliseconds of each other? If they're on the same power bus? That chance is minuscule, or at least should be. Of course we are assuming some level of conditioned power that is independent of the power bus, e.g. a UPS. how is that making any difference in practice? - if both are on the same UPS they are effectively on the same power bus ... If the UPS fails (or the generator does not kick in, which happens way more often than people would believe) they could still fail at the very same time. Stefan
Re: [PERFORM] DRBD and Postgres: how to improve the perfomance?
Joshua D. Drake wrote: Stefan Kaltenbrunner wrote: Joshua D. Drake wrote: Gregory Stark wrote: Simon Riggs [EMAIL PROTECTED] writes: You're right, but the distinction is a small one. What are the chances of losing two independent servers within a few milliseconds of each other? If they're on the same power bus? That chance is minuscule, or at least should be. Of course we are assuming some level of conditioned power that is independent of the power bus, e.g. a UPS. how is that making any difference in practice? - if both are on the same UPS they are effectively on the same power bus ... Well I was thinking of the bus that is in the wall. I would assume that people were smart enough to have independent UPS systems for each server: city power -> line conditioning -> generator -> panel -> plug -> UPS -> server; wash, rinse, repeat. the typical datacenter version of this is actually more like: city power -> UPS (with generator in parallel) -> panel -> plug, or city power -> flywheel -> (maybe UPS) -> panel -> plug. it is not really that common to have, say, two different UPS feeds in your rack (at least not for normal housing or the average corporate datacenter) - mostly you get two feeds from different power distribution panels (so different breakers), but that's about it. Having a local UPS attached is usually not that helpful either, because those have limited capacity, need space and are an additional thing that can (and will) fail. Stefan
Re: [PERFORM] [HACKERS] Proposal: Pluggable Optimizer Interface
Julius Stroffek wrote: Hi All, Tomas Kovarik and I presented at PGCon 2007 in Ottawa the ideas about other possible optimizer algorithms to be used in PostgreSQL. We are quite new to the PostgreSQL project, so it took us some time to go through the sources and explore the possibilities of how things could be implemented. There is a proposal attached to this mail about the interface we would like to implement for switching between different optimizers. Please review it and provide feedback to us. Thank you. hmm - how is that proposal different from what got implemented with: http://archives.postgresql.org/pgsql-committers/2007-05/msg00315.php Stefan
Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...
Marc Mamin wrote: Postgres configuration for 64 CPUs, 128 GB RAM... there are probably not that many installations out there that large - comments below. Hello, We have the opportunity to benchmark our application on a large server. I have to prepare the Postgres configuration and I'd appreciate some comments on it, as I am not experienced with servers of such a scale. Moreover the configuration should be fail-proof as I won't be able to attend the tests. Our application (java + perl) and Postgres will run on the same server, whereas the application activity is low when Postgres has large transactions to process. There is a large gap between our current production server (Linux, 4GB RAM, 4 cpus) and the benchmark server; one of the targets of this benchmark is to verify the scalability of our application. [...] Postgres version: 8.2.1 - upgrade to 8.2.4. File system: http://en.wikipedia.org/wiki/ZFS - way more important is what kind of disk-I/O subsystem you have attached ... Planned configuration: # we don't expect more than 150 parallel connections, # but I suspect a leak in our application that lets some idle connections open max_connections=2000 ssl = off #maximum allowed shared_buffers= 262143 - this is probably on the low side for a 128GB box # on our current best production server with 4GB RAM (not dedicated to Postgres), work_mem is set to 600 MB # this limitation is probably the bottleneck for our application as the files in pgsql_tmp grow up to 15 GB # during large aggregations (we have a locking mechanism to avoid parallel processing of such transactions) work_mem = 31457280 # (30 GB) - this is simply ridiculous: work_mem is PER SORT, so if your query requires 8 sorts it will feel free to use 8x30GB, and that needs to be multiplied by the number of concurrent connections. # index creation time is also an issue for us; the process is locking other large processes too.
# our largest table so far is 13 GB + 11 GB indexes maintenance_work_mem = 31457280 # (30 GB) - this is ridiculous too: testing has shown that there is not much point in going beyond 1GB or so # more than the max number of tables + indexes expected during the benchmark max_fsm_relations = 10 max_fsm_pages = 180 - this is probably way too low for a database the size of yours; watch the output of VACUUM VERBOSE on a database-wide vacuum for some stats on that. # don't know if I should modify this. # seems to be sufficient on our production servers max_stack_depth = 2MB # vacuum will be done by hand between each test session autovacuum = off # required to analyse the benchmark log_min_duration_statement = 1000 max_prepared_transactions = 100 # seems to be required to drop schema/roles containing large numbers of objects max_locks_per_transaction = 128 # I use the default for the bgwriter as I couldn't find recommendations on those #bgwriter_delay = 200ms # 10-10000ms between rounds #bgwriter_lru_percent = 1.0 # 0-100% of LRU buffers scanned/round #bgwriter_lru_maxpages = 5 # 0-1000 buffers max written/round #bgwriter_all_percent = 0.333 # 0-100% of all buffers scanned/round #bgwriter_all_maxpages = 5 # 0-1000 buffers max written/round #WAL fsync = on #use default #wal_sync_method # we are using 32 on our production system wal_buffers=64 - values up to 512 or so have been reported to help on systems with very high concurrency. what is missing here are your settings for effective_cache_size and random_page_cost. Stefan
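The worst-case arithmetic behind the work_mem warning is easy to sketch; the sort and connection counts below are illustrative, taken from the numbers mentioned in the reply:

```sql
-- work_mem is a per-sort (and per-hash) limit, not a per-server cap,
-- so the worst case multiplies out:
SELECT 30        -- proposed work_mem, in GB
     * 8         -- sorts a single complex query might run
     * 150       -- expected parallel connections
  AS worst_case_gb;  -- 36000 GB, vastly more than the 128 GB of RAM
```

Even if real queries rarely hit the worst case, the setting leaves nothing in reserve; keeping work_mem modest and raising it per-session for the known large aggregations is the usual compromise.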
Re: [PERFORM] PostgreSQL publishes first real benchmark
Jignesh K. Shah wrote: Can you list others that seemed out of place? well to me the ones that look most questionable are: work_mem=100MB - so this benchmark is really low concurrency (which does not fit with max_connections=1000) and with trivial queries? enable_seqscan = off - why? effective_cache_size = 40GB - on a box with 16GB this seems wrong, especially since there are some indications out there suggesting that while overestimating effective_cache_size was not a problem in versions before 8.2, it might well be one in 8.2 and up. wal_buffers = 2300 - there have been some numbers reported that going over the default of 8 helps, but it is generally considered that going beyond 500 or maybe 1000 does not help at all ... and one more is that you claim you used -fast -O4 -xtarget=ultraT1, which is something we explicitly advise against in our own FAQ (http://www.postgresql.org/docs/faqs.FAQ_Solaris.html): Do not use any flags that modify behavior of floating point operations and errno processing (e.g., -fast). These flags could cause some nonstandard PostgreSQL behavior, for example in the date/time computations. Stefan
Re: [PERFORM] PostgreSQL Configuration Tool for Dummies
Campbell, Lance wrote: Now I am at the difficult part: what parameters to calculate and how to calculate them. Everything below has to do with PostgreSQL version 8.2. The parameters I would think we should calculate are: max_connections shared_buffers work_mem maintenance_work_mem effective_cache_size random_page_cost Any other variables? I am open to suggestions. we should also scale max_fsm_pages according to the database size and workload answers - I also note that the configuration file it generates seems to look like one for PostgreSQL 7.x or so - I think we should just include the specific parameters to change. Stefan
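As a sketch of the kind of rules such a tool might encode, here is what the listed parameters could work out to for a hypothetical 8 GB box with 200 connections; the ratios are common folklore, not official recommendations:

```sql
-- crude 8.2-era starting points, all values in MB for an 8192 MB box
SELECT 8192 / 4          AS shared_buffers_mb,        -- ~25% of RAM
       8192 * 3 / 4      AS effective_cache_size_mb,  -- ~75% of RAM
       8192 / (200 * 4)  AS work_mem_mb;              -- modest: multiplies
                                                      -- per sort and per
                                                      -- connection
```

A real tool would also clamp maintenance_work_mem (around 1GB is usually enough) and scale max_fsm_pages from the expected database size, as noted above.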
Re: [PERFORM] upgraded to pgsql 8.2.4, getting worse performance then 7.4.x
Michael Fuhr wrote: On Sat, Jun 02, 2007 at 09:13:32AM -0400, Douglas J Hunley wrote: Our 'esteemed' Engr group recently informed a customer that in their testing, upgrading to 8.2.x improved the performance of our J2EE application approximately 20%, so of course, the customer then tasked me with upgrading them. We dumped their db, removed pgsql, installed the 8.2.4 rpms from postgresql.org, did an initdb, and then pg_restored their data. It's been about a week now, and the customer is complaining that in their testing, they are seeing a 30% /decrease/ in general performance. After the restore, did you ANALYZE the entire database to update the planner's statistics? Have you enabled autovacuum or are you otherwise vacuuming and analyzing regularly? What kind of queries are slower than desired? If you post an example query and the EXPLAIN ANALYZE output then we might be able to see if the slowness is due to query plans. A few differences between the configuration files stand out. The 7.4 file has the following settings: shared_buffers = 25000 sort_mem = 15000 effective_cache_size = 196608 The 8.2 config has: #shared_buffers = 32MB #work_mem = 1MB #effective_cache_size = 128MB To be equivalent to the 7.4 config the 8.2 config would need: shared_buffers = 195MB work_mem = 15000kB effective_cache_size = 1536MB With 8GB of RAM you might try increasing shared_buffers to 400MB - 800MB (less if the entire database isn't that big) and effective_cache_size to 5GB - 6GB. You might have to increase the kernel's shared memory settings before increasing shared_buffers. some testing here has shown that while it is usually a good idea to set effective_cache_size rather optimistically in versions before 8.2, it is advisable to make it accurate or even a bit lower than that in 8.2 and up. Stefan
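The unit conversions above follow directly from the default 8 kB page size used by 7.4's page-count settings:

```sql
-- 7.4 expressed shared_buffers and effective_cache_size in 8 kB pages;
-- 8.2 accepts memory units directly, so the equivalents are:
SELECT 25000  * 8 / 1024 AS shared_buffers_mb,       -- ~195 MB
       196608 * 8 / 1024 AS effective_cache_size_mb; -- 1536 MB
```

(sort_mem was already in kB, so sort_mem = 15000 maps straight to work_mem = 15000kB.)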
Re: [PERFORM] Domains versus Check Constraints
Jim C. Nasby wrote: On Tue, May 22, 2007 at 12:56:21PM -0400, Chander Ganesan wrote: Are there any performance improvements that come from using a domain over a check constraint (aside from the ease of management component)? No. Plus support for domain constraints isn't universal (plpgsql doesn't honor them, for example). since 8.2 domain constraints are enforced everywhere ... Stefan
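For reference, the two forms being compared look like this (table and domain names are illustrative):

```sql
-- a CHECK constraint attached to one particular table
CREATE TABLE orders_a (
    qty integer CHECK (qty > 0)
);

-- the same rule expressed once as a reusable domain; per the note
-- above, since 8.2 the domain's constraint is enforced everywhere
-- the type is used, including plpgsql variables
CREATE DOMAIN positive_int AS integer CHECK (VALUE > 0);
CREATE TABLE orders_b (
    qty positive_int
);
```

The choice is therefore about manageability (one definition reused across tables) rather than performance.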
Re: [PERFORM] more on high load on postgres 7.4.16
Geoffrey wrote: We are trying to attack this problem from multiple avenues, thus I'm starting a separate thread. This is with regard to the problem posted via thread: http://archives.postgresql.org/pgsql-performance/2007-04/msg00120.php One thing we are seeing with this move to the new hardware (and RHAS 4) is database connection processes that are left over by users who have exited the application. I've attached to these processes via gdb and find they all have the same backtrace. Any insights into what might be causing this issue would be appreciated. Understand, we did not have this problem on the previous hardware running RHES 3. Here is the backtrace:

#0 0x00ba47a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x0019f1de in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#2 0x0019ca7a in _L_mutex_lock_23 () from /lib/tls/libpthread.so.0
#3 0xbfed9438 in ?? ()
#4 0x00c96a4e in pthread_cond_destroy@@GLIBC_2.3.2 () from /lib/tls/libc.so.6
#5 0x00c96a4e in pthread_cond_destroy@@GLIBC_2.3.2 () from /lib/tls/libc.so.6
#6 0x0015243f in critSec::~critSec () from /usr/local/pcm170/libdalkutil.so
#7 0x003a48b8 in Comp_ZipFiles () from /usr/local/pcm170/libcompress.so

/usr/local on RHEL should only contain software installed directly from source - what exactly is pcm170/libdalkutil? besides that - is pg actually compiled with debugging symbols on that platform? Stefan
Re: [PERFORM] SCSI vs SATA
Joshua D. Drake wrote: Good point. On another note, I am wondering why nobody's brought up the command-queuing perf benefits (yet). Is this because sata vs scsi are at par here? SATAII has similar features. I'm finding conflicting information on this -- some calling sata's ncq mostly crap, others stating the real-world results are negligible. I'm inclined to believe SCSI's pretty far ahead here but am having trouble finding recent articles on this. What I find is, a bunch of geeks sit in a room and squabble about a few percentages one way or the other. One side feels very l33t because their white paper looks like the latest swimsuit edition. Real-world specs and real-world performance show that SATAII performs very, very well. It is kind of like X86. No chip engineer that I know has ever said X86 is elegant, but guess which chip design is conquering all others in the general and enterprise marketplace? SATAII brute-forces some of its performance, for example with a 16MB write cache on each drive. sure, but for any serious usage one either wants to disable that cache (and rely on tagged command queuing, or whatever that is called in the SATAII world) or rely on the OS/RAID controller implementing some sort of FUA/write-barrier feature (which Linux, for example, only does in pretty recent kernels). Stefan