Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
Hello. To me, all this "hot" tuning really looks uncertain and dynamic enough. Two things that I could directly use right now (and they are needed as a pair) are: 1) Per-table/index/database buffer pools (split shared buffers into parts, and allow specifying which index/table/database goes where). 2) Per-table/index cost settings. If I had this, I could allocate dedicated buffer pools for tables/indexes that MUST stay hot in memory, and set low costs for those specific tables. P.S. A third thing that would nicely complement these two is a "load on startup" flag to automatically populate the buffer pools with a fast sequential read, but this can easily be emulated with a statement. Best regards, Vitalii Tymchyshyn -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
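For what it's worth, a rough approximation of both wishes exists today: per-tablespace planner costs (9.0+) and a manual warm-up statement. A sketch, assuming a hypothetical tablespace `fast_space` and table `hot_table`:

```sql
-- Closest current analogue to per-table cost settings: put the hot
-- tables/indexes in their own tablespace and mark its pages cheap to fetch.
ALTER TABLESPACE fast_space SET (random_page_cost = 1.1, seq_page_cost = 1.0);

-- Emulate the "load on startup" flag with a statement that forces a
-- sequential read of the whole table through the cache:
SELECT count(*) FROM hot_table;
```

This does not give true per-table buffer pools, of course; it only biases the planner and pre-populates the OS/shared cache.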
Re: [PERFORM] Hash Anti Join performance degradation
2011/5/24 panam pa...@gmx.net: Hi, In my DB, there is a large table containing messages and one that contains message boxes. Messages are assigned to boxes via a child-parent link m->b. In order to obtain the last message for a specific box, I use the following SQL: SELECT m1.id FROM message m1 LEFT JOIN message m2 ON (m1.box_id = m2.box_id AND m1.id < m2.id) WHERE m2.id IS NULL AND m1.box_id = id; This worked quite well for a long time. But now, suddenly, the performance of this query drastically degraded as new messages were added. If these new messages are removed again, everything is back to normal. If other messages are removed instead, the problem remains, so it does not seem to be a memory issue. I'm afraid I have difficulty understanding what is going wrong here. We need more information here; the case is in fact interesting. What's the PostgreSQL version, and do you have logs of vacuum and checkpoint activity? (No vacuum full/cluster or similar running?) Obviously, Craig's suggestion to read http://wiki.postgresql.org/wiki/SlowQueryQuestions is relevant, as it helps to gather all the common information required to analyze the issue.
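As an aside, the "last message per box" that this LEFT JOIN / IS NULL anti-join computes can also be written as a plain aggregate, which sidesteps the hash anti-join entirely. A sketch (the `:box_id` parameter is a placeholder for the box being queried):

```sql
-- Last message for one box:
SELECT max(m.id) FROM message m WHERE m.box_id = :box_id;

-- Or the last message for every box at once:
SELECT m.box_id, max(m.id) AS last_message_id
FROM message m
GROUP BY m.box_id;
```

With an index on (box_id, id) this form can typically be resolved without scanning all messages of a box twice.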
This is the query plan when everything is fine:

Seq Scan on public.box this_  (cost=0.00..10467236.32 rows=128 width=696) (actual time=0.169..7683.978 rows=128 loops=1)
  Output: this_.id, this_.login, (SubPlan 1)
  Buffers: shared hit=188413 read=94635 written=135, temp read=22530 written=22374
  SubPlan 1
    ->  Hash Anti Join  (cost=41323.25..81775.25 rows=20427 width=8) (actual time=59.571..59.877 rows=1 loops=128)
          Output: m1.id
          Hash Cond: (m1.box_id = m2.box_id)
          Join Filter: (m1.id < m2.id)
          Buffers: shared hit=188412 read=94633 written=135, temp read=22530 written=22374
          ->  Bitmap Heap Scan on public.message m1  (cost=577.97..40212.28 rows=30640 width=16) (actual time=3.152..9.514 rows=17982 loops=128)
                Output: m1.id, m1.box_id
                Recheck Cond: (m1.box_id = $0)
                Buffers: shared hit=131993 read=9550 written=23
                ->  Bitmap Index Scan on "message_box_Idx"  (cost=0.00..570.31 rows=30640 width=0) (actual time=2.840..2.840 rows=18193 loops=128)
                      Index Cond: (m1.box_id = $0)
                      Buffers: shared hit=314 read=6433 written=23
          ->  Hash  (cost=40212.28..40212.28 rows=30640 width=16) (actual time=26.840..26.840 rows=20014 loops=115)
                Output: m2.box_id, m2.id
                Buckets: 4096  Batches: 4 (originally 2)  Memory Usage: 5444kB
                Buffers: shared hit=56419 read=85083 written=112, temp written=7767
                ->  Bitmap Heap Scan on public.message m2  (cost=577.97..40212.28 rows=30640 width=16) (actual time=2.419..20.007 rows=20014 loops=115)
                      Output: m2.box_id, m2.id
                      Recheck Cond: (m2.box_id = $0)
                      Buffers: shared hit=56419 read=85083 written=112
                      ->  Bitmap Index Scan on "message_box_Idx"  (cost=0.00..570.31 rows=30640 width=0) (actual time=2.166..2.166 rows=20249 loops=115)
                            Index Cond: (m2.box_id = $0)
                            Buffers: shared hit=6708
Total runtime: 7685.202 ms

This is the plan when the query gets sluggish:

Seq Scan on public.box this_  (cost=0.00..10467236.32 rows=128 width=696) (actual time=0.262..179333.086 rows=128 loops=1)
  Output: this_.id, this_.login, (SubPlan 1)
  Buffers: shared hit=189065 read=93983 written=10, temp read=22668 written=22512
  SubPlan 1
    ->  Hash Anti Join  (cost=41323.25..81775.25 rows=20427 width=8) (actual time=1264.700..1400.886 rows=1 loops=128)
          Output: m1.id
          Hash Cond: (m1.box_id = m2.box_id)
          Join Filter: (m1.id < m2.id)
          Buffers: shared hit=189064 read=93981 written=10, temp read=22668 written=22512
          ->  Bitmap Heap Scan on public.message m1  (cost=577.97..40212.28 rows=30640 width=16) (actual time=3.109..9.850 rows=18060 loops=128)
                Output: m1.id, m1.box_id
                Recheck Cond: (m1.box_id = $0)
                Buffers: shared hit=132095 read=9448
                ->  Bitmap Index Scan on "message_box_Idx"  (cost=0.00..570.31 rows=30640 width=0) (actual time=2.867..2.867 rows=18193 loops=128)
                      Index Cond: (m1.box_id = $0)
                      Buffers: shared hit=312 read=6435
          ->  Hash  (cost=40212.28..40212.28 rows=30640 width=16) (actual time=27.533..27.533 rows=20102 loops=115)
                Output: m2.box_id, m2.id
                Buckets: 4096  Batches: 4 (originally 2)  Memory Usage: 5522kB
                Buffers: shared hit=56969 read=84533 written=10, temp written=7811
                ->  Bitmap Heap Scan on public.message m2  (cost=577.97..40212.28 rows=30640 width=16) (actual time=2.406..20.492 rows=20102 loops=115)
Re: [PERFORM] Hash Anti Join performance degradation
Hi Craig and Cédric, thanks for the very informative introduction to the netiquette here, and thanks for sharing your time. I wasn't aware of http://explain.depesz.com/, very useful. So, here are the query plans: http://explain.depesz.com/s/6AU (1st from previous post, good) http://explain.depesz.com/s/YPS (2nd from previous post, bad) "The usual cause is that the statistics for estimated row counts cross a threshold that makes the query planner think that a different kind of plan will be faster." Hm, as far as I understand the plans, they are equivalent, aren't they? "If the query planner is using bad information about the performance of the storage, then it will be making bad decisions about which approach is faster. So the usual thing to do is to adjust seq_page_cost and random_page_cost to more closely reflect the real performance of your hardware, and to make sure that effective_cache_size matches the real amount of memory your computer has free for disk cache use." Will this make any difference even when the plans are equivalent, as assumed above? The table creation SQL is as follows: http://pastebin.com/qFDUP7Aa (Message table); ~2328680 rows, growing constantly (~ 1 new rows each day); http://pastebin.com/vEmh4hb8 (Box table); ~128 rows (growing very slowly, 1 row every two days, each row updated about 2x a day). The DB contains the same data, except that for the good query, the last 10976 rows (0.4%) of message are removed by doing a DELETE FROM message WHERE timestamp > TO_DATE ('05/23/2011','mm/dd/'); This speeds up the query by a factor of ~27 (207033.081 ms (bad) vs. 7683.978 ms (good)). Each query was run before and after a VACUUM ANALYZE: one time to create appropriate statistics, and the second time to do the actual measurement.
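For anyone trying to reproduce the measurement procedure described above, the sequence boils down to something like this (the cutoff date format string is truncated in the post, so the DELETE below is an assumption; `:box_id` is a placeholder parameter):

```sql
-- Remove the most recent ~0.4% of rows (cutoff as described in the post):
DELETE FROM message WHERE "timestamp" > TO_DATE('05/23/2011', 'mm/dd/yyyy');

-- Rebuild statistics, then time the query twice:
VACUUM ANALYZE message;

-- run #1 creates/warms state, run #2 is the measurement
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT m1.id
FROM message m1
LEFT JOIN message m2 ON (m1.box_id = m2.box_id AND m1.id < m2.id)
WHERE m2.id IS NULL AND m1.box_id = :box_id;
```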
All tests were made on the dev machine, which is an 8GB, Core i7, Windows 7 box. I experienced the issue first on the production environment, which is 64-bit Ubuntu running "PostgreSQL 9.0.1 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit", and later, for analysis, on the dev environment, which is 64-bit Windows 7 running "PostgreSQL 9.0.4, compiled by Visual C++ build 1500, 64-bit". For testing, I've increased the buffers that I judge important for the issue to the following values: effective_cache_size: 4GB, shared_buffers: 1GB, work_mem: 1GB, temp_buffers: 32MB. After that, the configuration was reloaded and the postgresql service was restarted using pgAdmin. Interestingly, there was no performance gain compared to the default settings; the bad query even took about 30 seconds (15%) longer. It also seems all data fit into memory, so there is not much disk I/O involved. @Cédric "do you have logs of vacuum and checkpoint activity? (no vacuum full/cluster or such thing running?)" There is no clustering involved here, it's a pretty basic setup. How can I obtain the information you require here? I could send you the output of the VACUUM ANALYZE command from pgAdmin, but is there a way to make it output the information in English (rather than German)? Thanks for your interest in this issue. Regards, panam
Re: [PERFORM] [PERFORMANCE] expanding to SAN: which portion best to move
On 05/17/2011 05:47 AM, Craig Ringer wrote: "This makes me wonder if Pg attempts to pre-fetch blocks of interest for areas where I/O needs can be known in advance, while there's still other work or other I/O to do. For example, pre-fetching for the next iteration of a nested loop while still executing the prior one. Is it even possible?" Well, remember that a nested loop isn't directly doing any I/O. It's pulling rows from some lower-level query node. So the useful question to ask is "how can pre-fetch speed up the table access methods?" That worked out like this: Sequential scan: logic here was added and measured as useful for one system with terrible I/O. Everywhere else it was tried on Linux, the read-ahead logic in the kernel seems to make this redundant. Punted as too much complexity relative to measured average gain. You can try to tweak this on a per-file basis in an application, but the kernel has almost as much information to make that decision usefully as the database does. Index scan: it's hard to know what you're going to need in advance here and pipeline the reads, so this hasn't really been explored yet. Bitmap heap scan: here, the exact list of blocks to fetch is known in advance, they're random, and it's quite possible for the kernel to schedule them more efficiently than serial access can. This was added as the effective_io_concurrency feature (it's the only thing that feature impacts), which so far is only proven to work on Linux. Any OS implementing the POSIX API used will also get this, however; FreeBSD was the next likely candidate that might benefit when I last looked around. "I'm guessing not, because (AFAIK) Pg uses only synchronous blocking I/O, and with that there isn't really a way to pre-fetch without threads or helper processes. Linux (at least) supports buffered async I/O, so it'd be possible to submit such prefetch requests ... on modern Linux kernels. Portably doing so, though - not so much."
Linux supports the POSIX_FADV_WILLNEED advisory call, which is perfect for suggesting what blocks will be accessed in the near future in the bitmap heap scan case. That's how effective_io_concurrency works. Both Solaris and Linux also have async I/O mechanisms that could be used instead. Greg Stark built a prototype, and there's an obvious speed-up there to be had. But the APIs for this aren't very standard, and it's really hard to rearchitect the PostgreSQL buffer manager to operate in a less synchronous way. Hoping that more kernels usefully support the WILLNEED API, which meshes very well with how PostgreSQL thinks about the problem, is where things are at right now. With so many of the bigger PostgreSQL sites on Linux, that's worked out well so far.
--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
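Since effective_io_concurrency only influences bitmap heap scans (as noted above), it is easy to experiment with per session; the right value is workload-dependent, so the `4` below is purely illustrative:

```sql
-- Allow the bitmap heap scan machinery to keep several reads in flight;
-- only has an effect on platforms with posix_fadvise (e.g. Linux).
SET effective_io_concurrency = 4;   -- rough starting point: number of spindles

-- then re-run the bitmap-heap-scan query and compare timings, e.g.:
-- EXPLAIN (ANALYZE, BUFFERS) SELECT ... ;
```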
Re: [PERFORM] Performance degradation of inserts when database size grows
On 24.5.2011 07:24, Terry Schmitt wrote: "As near as I can tell from your test configuration description, you have JMeter -> J2EE -> Postgres. Have you ruled out the J2EE server as the problem? This problem may not be the database. I would take a look at your app server's health and look for any potential issues there before spending too much time on the database. Perhaps there are memory issues or excessive garbage collection on the app server?" It might be part of the problem, yes, but it's just a guess. We need to see some iostat / iotop / vmstat output to confirm that. The probable cause here is that the indexes grow with the table and get deeper, so when you insert a new row you need to modify more and more pages. That's why the number of buffers grows over time and the checkpoint takes more and more time (the average write speed is about 15 MB/s; not sure if that's good or bad performance). The question is whether this is influenced by other activity (Java GC or something). I see three ways to improve the checkpoint performance: 1) set checkpoint_completion_target = 0.9 or something like that (this should spread the checkpoint, but it also increases the number of checkpoint segments to keep); 2) make the background writer more aggressive (tune the bgwriter_* variables), which is similar to (1); 3) improve the write performance (not sure how random the I/O is in this case, but a decent controller with a cache might help). And then two ways to decrease the index overhead / amount of modified buffers: 1) keep only the really necessary indexes (remove duplicate indexes, remove indexes where another index already performs reasonably, etc.); 2) partition the table (so that only indexes on the current partition will be modified, and those will be shallower). Tomas
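The checkpoint-side suggestions above map onto postgresql.conf roughly like this; the values are illustrative starting points under the stated assumptions, not measured recommendations:

```
# 1) spread each checkpoint over more of the checkpoint interval
checkpoint_completion_target = 0.9
checkpoint_segments = 32           # more WAL segments to absorb the spread

# 2) a more aggressive background writer
bgwriter_delay = 100ms             # wake more often (default 200ms)
bgwriter_lru_maxpages = 500        # write more dirty buffers per round
```

After editing, a reload (`pg_ctl reload` or SELECT pg_reload_conf()) applies these without a restart.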
Re: [PERFORM] Hash Anti Join performance degradation
panam pa...@gmx.net writes: "In my DB, there is a large table containing messages and one that contains message boxes. Messages are assigned to boxes via a child-parent link m->b. In order to obtain the last message for a specific box, I use the following SQL: SELECT m1.id FROM message m1 LEFT JOIN message m2 ON (m1.box_id = m2.box_id AND m1.id < m2.id) WHERE m2.id IS NULL AND m1.box_id = id;" BTW, this query doesn't actually match the EXPLAIN outputs... "So from my limited experience, the only significant difference I see is that the Hash Anti Join takes a lot more time in plan 2, but I do not understand why." Whatever's going on is below the level that EXPLAIN can show. I can think of a couple of possibilities: 1. The extra rows in the slower case all manage to come out to the same hash value, or some very small number of distinct hash values, such that we spend a lot of time searching a single hash chain. But it's hard to credit that adding 0.4% more rows could result in a near-100x slowdown, no matter how bad their distribution. 2. There's some inefficiency in the use of the temp files, though again it's far from clear why your two cases would be noticeably different there. Possibly enabling log_temp_files would tell you something useful. One other thing I'm not following is how come it's using hash temp files at all, when you claim in your later message that you've got work_mem set to 1GB. It should certainly not take more than a couple meg to hold 20K rows at 16 payload bytes per row. You might want to check whether that setting actually took effect. regards, tom lane
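Both checks suggested here can be done from a single psql session, for example:

```sql
-- Verify the work_mem setting actually took effect in this session:
SHOW work_mem;

-- Log every temp file created (0 = no size threshold), then re-run the query;
-- file names and sizes will appear in the server log:
SET log_temp_files = 0;
```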
Re: [PERFORM] Performance degradation of inserts when database size grows
On 05/17/2011 08:45 AM, Andrey Vorobiev wrote: "1. How does database size affect insert performance?" As indexes grow, it becomes slower to insert into them. The database has to navigate all of the indexes on the table to figure out where to add the new row, and that navigation time goes up when tables are larger. Try using the queries at http://wiki.postgresql.org/wiki/Disk_Usage to quantify how big your indexes are. Many people are absolutely shocked to see how large they become, and some database designers throw indexes onto every possible column combination as if they were free. "2. Why does the number of written buffers increase when database size grows?" As indexes grow, the changes needed to insert more rows get spread over more blocks too. You can install pg_buffercache and analyze what's actually getting dirty in the buffer cache to directly measure what's changing here. If you look at http://projects.2ndquadrant.com/talks and download the "Inside the PostgreSQL Buffer Cache" talk and its "Sample Queries" set, those will give you some examples of how to summarize everything. "3. How can I further analyze this problem?" This may not actually be a problem in the sense of something you can resolve. If you assume that you can insert into a giant table at the same speed you can insert into a trivial one, you'll have to adjust your thinking, because that's never going to be true. Removing some indexes may help; reducing the columns in the index is also good; and some people end up partitioning their data specifically to help with this situation. It's also possible to regain some of the earlier performance using things like REINDEX and CLUSTER.
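A minimal version of the wiki's disk-usage queries, for seeing how large the indexes on one table have grown (the table name is a placeholder):

```sql
-- List each index on your_table with its on-disk size, largest first:
SELECT c.relname AS index_name,
       pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM pg_class c
JOIN pg_index i ON c.oid = i.indexrelid
WHERE i.indrelid = 'your_table'::regclass
ORDER BY pg_relation_size(c.oid) DESC;
```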
--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: FW: [PERFORM] KVP table vs. hstore - hstore performance (Was: Postgres NoSQL emulation)
Hi Pierre, you wrote: "Try to create a btree index on (bench_hstore->'bench_id') WHERE (bench_hstore->'bench_id') IS NOT NULL." What do you mean exactly? => CREATE INDEX myhstore_kps_gin_idx ON myhstore USING gin(kvps) WHERE ??? IS NOT NULL; My table's def is: CREATE TABLE myhstore ( id bigint PRIMARY KEY, kvps hstore NOT NULL ); So I'm doing something like: CREATE INDEX myhstore_kps_gin_idx ON myhstore USING gin(kvps); Stefan 2011/5/23 Pierre C li...@peufeu.com: Hi Merlin. The analyze command gave the following result. On the KVP table:

Index Scan using kvpidx on bench_kvp  (cost=0.00..8.53 rows=1 width=180) (actual time=0.037..0.038 rows=1 loops=1)
  Index Cond: (bench_id = '20_20'::text)
Total runtime: 0.057 ms

And on the hstore table:

Bitmap Heap Scan on bench_hstore  (cost=32.22..3507.54 rows=1000 width=265) (actual time=145.040..256.173 rows=1 loops=1)
  Recheck Cond: (bench_hstore @> 'bench_id=>20_20'::hstore)
  ->  Bitmap Index Scan on hidx  (cost=0.00..31.97 rows=1000 width=0) (actual time=114.748..114.748 rows=30605 loops=1)
        Index Cond: (bench_hstore @> 'bench_id=>20_20'::hstore)
Total runtime: 256.211 ms

For hstore I'm using a GiST index. Try to create a btree index on (bench_hstore->'bench_id') WHERE (bench_hstore->'bench_id') IS NOT NULL.
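Spelled out against the table definition above, Pierre's suggestion is a btree over the extracted value, not a GIN/GiST index over the whole hstore column. A sketch (index name is hypothetical; the key name follows the benchmark's example):

```sql
-- btree on one extracted hstore value, skipping rows that lack the key:
CREATE INDEX myhstore_bench_id_idx
    ON myhstore ((kvps -> 'bench_id'))
    WHERE (kvps -> 'bench_id') IS NOT NULL;

-- The lookup must use the same expression so the planner can match the index:
SELECT id FROM myhstore WHERE kvps -> 'bench_id' = '20_20';
```

This trades the generality of GIN/GiST (any key) for a fast equality lookup on one known key, which is exactly the KVP-style access pattern being benchmarked.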
Re: [PERFORM] Hash Anti Join performance degradation
On 24/05/11 22:34, panam wrote: ""The usual cause is that the statistics for estimated row counts cross a threshold that makes the query planner think that a different kind of plan will be faster." Hm, as far as I understand the plans, they are equivalent, aren't they?" Yes, they are, and the estimates are too. This isn't the usual case where the planner trips over a threshold and switches to a totally different plan type that it thinks is faster, but isn't. The estimates are actually IDENTICAL for the hash anti-join node of interest, and so are the actual loop count and row count. Temp file activity is much the same across both plans too. Can you reproduce this behaviour consistently? It's _seriously_ weird, and the sort of thing that, when I encounter it myself, I tend to ask "what else is going on that I'm missing?". What happens if you DELETE more rows? Or fewer? What's the threshold? What happens if you DELETE rows from the start rather than the end, or a random selection? Does the problem persist if you DELETE the rows and then CLUSTER the table before running the query? Does the problem persist if you DELETE the rows and then REINDEX? "If the query planner is using bad information about the performance of the storage, then it will be making bad decisions about which approach is faster. [snip] Will this make any difference even when the plans are equivalent as assumed above?" Nope. It doesn't seem to be a problem with plan selection. "This speeds up the query by a factor of ~27 (207033.081 (bad) vs. 7683.978 (good))." That's a serious WTF. "@Cédric "do you have logs of vacuum and checkpoint activity? (no vacuum full/cluster or such thing running?)" There is no clustering involved here, it's a pretty basic setup." He means CLUSTER, the SQL command that tells PostgreSQL to re-organize a table. The answer from the rest of your post would appear to be "no, it's being run in an otherwise-idle stand-alone test environment". Right? "How can I obtain the information you require here?"
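The two physical-layout experiments proposed here look like this in SQL (run after the DELETE, before re-timing the query; the index name follows the plans above):

```sql
-- Rewrite the table in index order, then refresh statistics:
CLUSTER message USING "message_box_Idx";
ANALYZE message;

-- Or, separately: rebuild just the indexes without reordering the heap:
REINDEX TABLE message;
```

If the slowdown disappears after either step, that points at physical table/index layout rather than planning.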
"I could send you the output of the VACUUM ANALYZE command from pgAdmin, but is there a way to make it output the information in English (rather than German)?" It's easy enough to read familiar output like that in German, if needs be.
--
Craig Ringer