[PERFORM] Unexpected slow query time when joining small table with large table
Hello all! I'm new to PostgreSQL, so please bear with me. First of all, I have the following settings in my postgresql.conf file:

shared_buffers = 2GB
work_mem = 2GB
maintenance_work_mem = 4GB
checkpoint_segments = 50
random_page_cost = 3.5
cpu_tuple_cost = 0.1
effective_cache_size = 48GB

I am trying to join a small table containing 127,375 records with a larger table containing 4,830,840 records. The following query currently takes about 390 ms:

select bigtable.a, bigtable.b, bigtable.c, count(*)
from smalltable, bigtable
where bigtable.id = smalltable.user_id
  and smalltable.utc between 132537600 and 132672160
group by bigtable.a, bigtable.b, bigtable.c;

There's an index on the smalltable.utc field, and bigtable.id is the primary key for that table. Here's the result of running explain analyze:

 HashAggregate  (cost=227061.05..227063.45 rows=24 width=6) (actual time=388.519..388.527 rows=24 loops=1)
   ->  Nested Loop  (cost=0.85..226511.95 rows=54911 width=6) (actual time=0.054..359.969 rows=54905 loops=1)
         ->  Index Scan using smalltable_utc_idx on smalltable  (cost=0.42..7142.13 rows=54911 width=8) (actual time=0.034..28.803 rows=54905 loops=1)
               Index Cond: ((utc >= 132537600::bigint) AND (utc <= 132672160::bigint))
         ->  Index Scan using bigtable_pkey on bigtable  (cost=0.43..3.90 rows=1 width=14) (actual time=0.005..0.005 rows=1 loops=54905)
               Index Cond: (id = ht.user_id)
 Total runtime: 388.613 ms
(7 rows)

Time: 389.922 ms

When I do \d+, I see that bigtable is 387MB and smalltable is only 10MB. Is there a way that I can get this query to perform faster? Or is this the type of performance that I can expect for this type of join? Thank you!

Ryan

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
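[Editor's note: one quick, session-local experiment — a sketch, not a recommendation — is to check whether the planner's nested-loop choice is actually the fastest plan available, by disabling it for a single transaction and re-running the same query. Table and column names below are the ones from the query above; SET LOCAL confines the change to this transaction, so nothing else on the server is affected:]

```sql
-- Compare the nested-loop plan against a hash join on the same query.
BEGIN;
SET LOCAL enable_nestloop = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT bigtable.a, bigtable.b, bigtable.c, count(*)
FROM smalltable
JOIN bigtable ON bigtable.id = smalltable.user_id
WHERE smalltable.utc BETWEEN 132537600 AND 132672160
GROUP BY bigtable.a, bigtable.b, bigtable.c;
ROLLBACK;
```

If the forced hash join comes out slower, the ~390 ms nested loop is likely close to what this join costs on this hardware.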
Re: [PERFORM] Horrific time for getting 1 record from an index?
On Mon, Nov 11, 2013 at 3:28 PM, Jim Nasby wrote:
> On 11/11/13 4:57 PM, Jeff Janes wrote:
>> On Mon, Nov 11, 2013 at 1:57 PM, Jim Nasby <jna...@enova.com> wrote:
>> Btree indexes have special code that kill index-tuples when the
>> table-tuple is dead-to-all, so only the first such query after the mass
>> deletion becomes vacuum-eligible should be slow, even if a vacuum is not
>> done. But if there are long running transactions that prevent the dead
>> rows from going out of scope, nothing can be done until those
>> transactions go away.
>
> There is? I didn't know that, can you point me at code?

git grep "kill_prior_tuple"

> BTW, I originally had this, even after multiple queries:
>
>     Buffers: shared hit=1 read=9476
>
> Then vacuum:
>
> INFO:  index "page_hits_raw_pkey" now contains 50343572 row versions in 182800 pages
> DETAIL:  3466871 index row versions were removed.
> 44728 index pages have been deleted, 35256 are currently reusable.
>
> Then...
>
>     Buffers: shared hit=1 read=4
>
> So I suspect a vacuum is actually needed...

Hmm. Maybe the kill method doesn't unlink the empty pages from the tree?

Cheers,

Jeff
Re: [PERFORM] Horrific time for getting 1 record from an index?
On 11/11/13 4:57 PM, Jeff Janes wrote:
> On Mon, Nov 11, 2013 at 1:57 PM, Jim Nasby <jna...@enova.com> wrote:
> Btree indexes have special code that kill index-tuples when the
> table-tuple is dead-to-all, so only the first such query after the mass
> deletion becomes vacuum-eligible should be slow, even if a vacuum is not
> done. But if there are long running transactions that prevent the dead
> rows from going out of scope, nothing can be done until those
> transactions go away.

There is? I didn't know that, can you point me at code?

BTW, I originally had this, even after multiple queries:

    Buffers: shared hit=1 read=9476

Then vacuum:

INFO:  index "page_hits_raw_pkey" now contains 50343572 row versions in 182800 pages
DETAIL:  3466871 index row versions were removed.
44728 index pages have been deleted, 35256 are currently reusable.

Then...

    Buffers: shared hit=1 read=4

So I suspect a vacuum is actually needed...

--
Jim Nasby, Lead Data Architect   (512) 569-9461
Re: [PERFORM] Horrific time for getting 1 record from an index?
On Mon, Nov 11, 2013 at 1:57 PM, Jim Nasby wrote:
> explain (analyze,buffers) select min(insert_time) from cnu_stats.page_hits_raw ;
>
>                                   QUERY PLAN
> -------------------------------------------------------------------------
>  Result  (cost=0.12..0.13 rows=1 width=0) (actual time=119.347..119.347 rows=1 loops=1)
>    Buffers: shared hit=1 read=9476
>    InitPlan 1 (returns $0)
>      ->  Limit  (cost=0.00..0.12 rows=1 width=8) (actual time=119.335..119.336 rows=1 loops=1)
>            Buffers: shared hit=1 read=9476
>            ->  Index Scan using page_hits_raw_pkey on page_hits_raw  (cost=0.00..5445004.65 rows=47165480 width=8) (actual time=119.333..119.333 rows=1 loops=1)
>                  Index Cond: (insert_time IS NOT NULL)
>                  Buffers: shared hit=1 read=9476
>  Total runtime: 119.382 ms
> (9 rows)
>
> We do run a regular process to remove older rows... I thought we were
> vacuuming after that process but maybe not.

Btree indexes have special code that kills index-tuples when the table-tuple is dead-to-all, so only the first such query after the mass deletion becomes vacuum-eligible should be slow, even if a vacuum is not done. But if there are long-running transactions that prevent the dead rows from going out of scope, nothing can be done until those transactions go away.

Cheers,

Jeff
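[Editor's note: a quick way to check for such long-running transactions is to query pg_stat_activity. This sketch uses the 9.1-era column names (procpid, current_query); on 9.2 and later they became pid and query:]

```sql
-- List backends whose open transaction started more than an hour ago;
-- any of these can hold back cleanup of dead rows.
SELECT procpid, usename, xact_start, current_query
FROM pg_stat_activity
WHERE xact_start < now() - interval '1 hour'
ORDER BY xact_start;
```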
Re: [PERFORM] Horrific time for getting 1 record from an index?
On Mon, Nov 11, 2013 at 1:57 PM, Jim Nasby wrote:
> We do run a regular process to remove older rows... I thought we were
> vacuuming after that process but maybe not.

Could be a long query, idle-in-transaction, prepared transaction, or hot standby feedback gone bad, too.
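[Editor's note: of those suspects, forgotten two-phase transactions are easy to rule out from SQL — a sketch that works on 9.1:]

```sql
-- Orphaned prepared (two-phase) transactions hold back the cleanup
-- horizon just like an open transaction would. Ideally this returns
-- zero rows, or only very recently prepared ones.
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
ORDER BY prepared;
```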
Re: [PERFORM] Horrific time for getting 1 record from an index?
On 11/11/13 3:51 PM, Daniel Farina wrote:
> On Mon, Nov 11, 2013 at 1:48 PM, Jim Nasby wrote:
>> Postgres 9.1.9.
>>
>> explain analyze select min(insert_time) from cnu_stats.page_hits_raw ;
>>
>> I checked and there were no un-granted locks... but I have a hard time
>> believing it actually took 257 seconds to get 2 pages (one index, one
>> heap) back from our SAN.
>
> Try adding EXPLAIN (ANALYZE, BUFFERS). I am wondering if you are reading
> through a lot of pages addressing tuples not visible to the transaction.

explain (analyze,buffers) select min(insert_time) from cnu_stats.page_hits_raw ;

                                  QUERY PLAN
-------------------------------------------------------------------------
 Result  (cost=0.12..0.13 rows=1 width=0) (actual time=119.347..119.347 rows=1 loops=1)
   Buffers: shared hit=1 read=9476
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..0.12 rows=1 width=8) (actual time=119.335..119.336 rows=1 loops=1)
           Buffers: shared hit=1 read=9476
           ->  Index Scan using page_hits_raw_pkey on page_hits_raw  (cost=0.00..5445004.65 rows=47165480 width=8) (actual time=119.333..119.333 rows=1 loops=1)
                 Index Cond: (insert_time IS NOT NULL)
                 Buffers: shared hit=1 read=9476
 Total runtime: 119.382 ms
(9 rows)

We do run a regular process to remove older rows... I thought we were vacuuming after that process but maybe not.

--
Jim Nasby, Lead Data Architect   (512) 569-9461
Re: [PERFORM] Horrific time for getting 1 record from an index?
On Mon, Nov 11, 2013 at 1:48 PM, Jim Nasby wrote:
> Postgres 9.1.9.
>
> explain analyze select min(insert_time) from cnu_stats.page_hits_raw ;
>
>                                   QUERY PLAN
> ---------------------------------------------------------------------------
>  Result  (cost=0.12..0.13 rows=1 width=0) (actual time=257545.835..257545.836 rows=1 loops=1)
>    InitPlan 1 (returns $0)
>      ->  Limit  (cost=0.00..0.12 rows=1 width=8) (actual time=257545.828..257545.829 rows=1 loops=1)
>            ->  Index Scan using page_hits_raw_pkey on page_hits_raw  (cost=0.00..5445004.65 rows=47165480 width=8) (actual time=257545.826..257545.826 rows=1 loops=1)
>                  Index Cond: (insert_time IS NOT NULL)
>  Total runtime: 257545.881 ms
> (6 rows)
>
> I checked and there were no un-granted locks... but I have a hard time
> believing it actually took 257 seconds to get 2 pages (one index, one
> heap) back from our SAN.

Try adding EXPLAIN (ANALYZE, BUFFERS). I am wondering if you are reading through a lot of pages addressing tuples not visible to the transaction.
[PERFORM] Horrific time for getting 1 record from an index?
Postgres 9.1.9.

explain analyze select min(insert_time) from cnu_stats.page_hits_raw ;

                                  QUERY PLAN
---------------------------------------------------------------------------
 Result  (cost=0.12..0.13 rows=1 width=0) (actual time=257545.835..257545.836 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..0.12 rows=1 width=8) (actual time=257545.828..257545.829 rows=1 loops=1)
           ->  Index Scan using page_hits_raw_pkey on page_hits_raw  (cost=0.00..5445004.65 rows=47165480 width=8) (actual time=257545.826..257545.826 rows=1 loops=1)
                 Index Cond: (insert_time IS NOT NULL)
 Total runtime: 257545.881 ms
(6 rows)

I checked and there were no un-granted locks... but I have a hard time believing it actually took 257 seconds to get 2 pages (one index, one heap) back from our SAN. Am I missing something here?

--
Jim Nasby, Lead Data Architect   (512) 569-9461
Re: [PERFORM] postgresql recommendation memory
On Thu, Nov 7, 2013 at 2:13 AM, Евгений Селявка wrote:
> All my sar statistics
> ...
> sar -u ALL
> 11:40:02 AM  CPU   %usr  %nice  %sys  %iowait  %steal  %irq  %soft  %guest  %idle
> 01:15:01 PM  all   8.57   0.00  1.52     1.46    0.00  0.00   0.05    0.00  88.40
> 01:20:01 PM  all   8.50   0.00  1.53     1.61    0.00  0.00   0.05    0.00  88.31
> 01:25:01 PM  all   9.00   0.00  1.78     1.27    0.00  0.00   0.06    0.00  87.89
> 01:30:01 PM  all   9.58   0.00  1.63     1.71    0.00  0.00   0.06    0.00  87.01
> 01:35:01 PM  all   8.75   0.00  1.47     1.57    0.00  0.00   0.06    0.00  88.15

Did a freeze-up occur in there someplace? Otherwise, that is not so useful. You should try to decrease the sar interval to 1 min if you can. The extra overhead is negligible and the extra information can be very valuable.

I'd also have something like "vmstat 5" running and capture that. Although perhaps one of the options to sar other than -u captures that same information, I know little of the other sar options.

Cheers, Jeff
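[Editor's note: on most Linux distributions the sar interval is driven by the sysstat cron job. A sketch of the change — the exact file path and sa1 location vary by distribution; /usr/lib64/sa/sa1 is the RHEL/CentOS location:]

```
# /etc/cron.d/sysstat -- collect a sample every minute instead of every 10.
# The stock line is typically: */10 * * * * root /usr/lib64/sa/sa1 1 1
* * * * * root /usr/lib64/sa/sa1 1 1
```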
Re: [PERFORM] postgresql recommendation memory
On Mon, Nov 11, 2013 at 8:14 AM, Scott Marlowe wrote:
> well you can hopefully reduce connections from jdbc pooling then. The
> fact that the connections are idle is good.
>
> The problem you run into is what happens when things go into
> "overload" I.e. when the db server starts to slow down, more of those
> idle connections become not idle. If all 300 are then waiting on the
> db server, it will slow to a crawl and eventually fall over.

+1. Try to monitor your connections, for example like this

while true; do
    echo -n "$(date): "
    psql -XAt -c "select count(1) from pg_stat_activity"
    sleep 1
done > activity.log

and its correlation with slowdowns.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com
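[Editor's note: a variation on Sergey's loop that also distinguishes idle from waiting backends can help spot the "overload" pattern Scott describes. A sketch using the 9.1-era pg_stat_activity columns — waiting and the '<IDLE>' marker in current_query were replaced in later releases:]

```sql
-- Total vs. waiting vs. idle backends; a rising "waiting" count during
-- a slowdown points at lock or resource contention rather than idle load.
SELECT count(*) AS total,
       count(CASE WHEN waiting THEN 1 END) AS waiting,
       count(CASE WHEN current_query = '<IDLE>' THEN 1 END) AS idle
FROM pg_stat_activity;
```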
Re: [PERFORM] postgresql recommendation memory
On Sun, Nov 10, 2013 at 11:48 PM, Евгений Селявка wrote:
> Sergey, yes these are all of my kernel settings. I don't use THP
> intentionally. I think that I need a special library to use THP with
> postgresql like this
> http://code.google.com/p/pgcookbook/wiki/Database_Server_Configuration.
> These are my values for these kernel settings:

Then it is definitely not THP.

ps. BTW, pgcookbook was moved to GitHub several weeks ago: https://github.com/grayhemp/pgcookbook.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com
Re: [PERFORM] Planner performance extremely affected by a hanging transaction (20-30 times)?
Andres Freund writes:
> Also, this really isn't going to fix the issue discussed here - this was
> just about the additional ProcArrayLock contention. I don't think it
> would change anything dramatical in your case.

All of these proposals are pretty scary for back-patching purposes, anyway. I think what we should consider doing is just changing get_actual_variable_range() to use a cheaper snapshot type, as in the attached patch (which is for 9.3 but applies to 9.2 with slight offset). On my machine, this seems to make the pathological behavior in BR's test case go away just fine. I'd be interested to hear what it does in the real-world scenarios being complained of.

			regards, tom lane

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d8c1a88..d1b9473 100644
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
*************** get_actual_variable_range(PlannerInfo *r
*** 4951,4956 ****
--- 4951,4957 ----
  	HeapTuple	tup;
  	Datum		values[INDEX_MAX_KEYS];
  	bool		isnull[INDEX_MAX_KEYS];
+ 	SnapshotData SnapshotDirty;

  	estate = CreateExecutorState();
  	econtext = GetPerTupleExprContext(estate);
*************** get_actual_variable_range(PlannerInfo *r
*** 4973,4978 ****
--- 4974,4980 ----
  	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRel));
  	econtext->ecxt_scantuple = slot;
  	get_typlenbyval(vardata->atttype, &typLen, &typByVal);
+ 	InitDirtySnapshot(SnapshotDirty);

  	/* set up an IS NOT NULL scan key so that we ignore nulls */
  	ScanKeyEntryInitialize(&scankeys[0],
*************** get_actual_variable_range(PlannerInfo *r
*** 4989,4996 ****
  	/* If min is requested ... */
  	if (min)
  	{
! 		index_scan = index_beginscan(heapRel, indexRel, SnapshotNow,
! 									 1, 0);
  		index_rescan(index_scan, scankeys, 1, NULL, 0);

  		/* Fetch first tuple in sortop's direction */
--- 4991,5013 ----
  	/* If min is requested ... */
  	if (min)
  	{
! 		/*
! 		 * In principle, we should scan the index with our current
! 		 * active snapshot, which is the best approximation we've got
! 		 * to what the query will see when executed.  But that won't
! 		 * be exact if a new snap is taken before running the query,
! 		 * and it can be very expensive if a lot of uncommitted rows
! 		 * exist at the end of the index (because we'll laboriously
! 		 * fetch each one and reject it).  What seems like a good
! 		 * compromise is to use SnapshotDirty.  That will accept
! 		 * uncommitted rows, and thus avoid fetching multiple heap
! 		 * tuples in this scenario.  On the other hand, it will reject
! 		 * known-dead rows, and thus not give a bogus answer when the
! 		 * extreme value has been deleted; that case motivates not
! 		 * using SnapshotAny here.
! 		 */
! 		index_scan = index_beginscan(heapRel, indexRel,
! 									 &SnapshotDirty, 1, 0);
  		index_rescan(index_scan, scankeys, 1, NULL, 0);

  		/* Fetch first tuple in sortop's direction */
*************** get_actual_variable_range(PlannerInfo *r
*** 5021,5028 ****
  	/* If max is requested, and we didn't find the index is empty */
  	if (max && have_data)
  	{
! 		index_scan = index_beginscan(heapRel, indexRel, SnapshotNow,
! 									 1, 0);
  		index_rescan(index_scan, scankeys, 1, NULL, 0);

  		/* Fetch first tuple in reverse direction */
--- 5038,5045 ----
  	/* If max is requested, and we didn't find the index is empty */
  	if (max && have_data)
  	{
! 		index_scan = index_beginscan(heapRel, indexRel,
! 									 &SnapshotDirty, 1, 0);
  		index_rescan(index_scan, scankeys, 1, NULL, 0);

  		/* Fetch first tuple in reverse direction */
Re: [PERFORM] postgresql recommendation memory
On Mon, Nov 11, 2013 at 09:14:43AM -0700, Scott Marlowe wrote:
> On Mon, Nov 11, 2013 at 1:09 AM, Евгений Селявка wrote:
>> Scott hi, i calculate all of my jdbc pool size. Maximum is 300 connections
>> from components wich use jdbc. I don't think that this is a good idea use
>> pgbouncer, because our application using spring framework which using jdbc
>> and prepared statement. I try to talk with our developer about disabling
>> prepared statement in this framework, they don't want do this. Thats why i
>> will try to upgrade HW and buy CPU with more core as you say based on
>> formula 3-4xcore. But most of this connection is idle. This is a web based
>> app not a datawarehouse, thats why all this connection is lightwear.
>>
>> About my db freeze i set this kernel parameter:
>> echo 1048576 > /proc/sys/vm/min_free_kbytes
>> echo 80 > /proc/sys/vm/vfs_cache_pressure
>>
>> And my freeze intervals is steel smaller. I try to dig deeper.
>
> well you can hopefully reduce connections from jdbc pooling then. The
> fact that the connections are idle is good.
>
> The problem you run into is what happens when things go into
> "overload" I.e. when the db server starts to slow down, more of those
> idle connections become not idle. If all 300 are then waiting on the
> db server, it will slow to a crawl and eventually fall over.

+1 I would definitely encourage the use of pgbouncer to map the 300 connections to a saner number that your DB can actually handle. We had a similar problem and very, very occasionally the server would "lockup". Once we put the resource management pooler in place, performance has been the same best-case and much, much better worst-case and NO lockups.

Regards,
Ken
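[Editor's note: for anyone weighing the same move, a minimal pgbouncer configuration might look like the sketch below; the hostnames, database name, and pool sizes are made-up placeholders. Note the caveat raised earlier in this thread: transaction pooling is incompatible with server-side prepared statements, so an application that insists on them needs pool_mode = session, which still caps the number of server connections:]

```
; pgbouncer.ini -- sketch with hypothetical names and sizes
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; session mode is the safe choice when JDBC uses prepared statements
pool_mode = session
max_client_conn = 300
default_pool_size = 30
```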
Re: [PERFORM] postgresql recommendation memory
On Mon, Nov 11, 2013 at 1:09 AM, Евгений Селявка wrote:
> Scott hi, i calculate all of my jdbc pool size. Maximum is 300 connections
> from components wich use jdbc. I don't think that this is a good idea use
> pgbouncer, because our application using spring framework which using jdbc
> and prepared statement. I try to talk with our developer about disabling
> prepared statement in this framework, they don't want do this. Thats why i
> will try to upgrade HW and buy CPU with more core as you say based on
> formula 3-4xcore. But most of this connection is idle. This is a web based
> app not a datawarehouse, thats why all this connection is lightwear.
>
> About my db freeze i set this kernel parameter:
> echo 1048576 > /proc/sys/vm/min_free_kbytes
> echo 80 > /proc/sys/vm/vfs_cache_pressure
>
> And my freeze intervals is steel smaller. I try to dig deeper.

well you can hopefully reduce connections from jdbc pooling then. The fact that the connections are idle is good.

The problem you run into is what happens when things go into "overload", i.e. when the db server starts to slow down, more of those idle connections become not idle. If all 300 are then waiting on the db server, it will slow to a crawl and eventually fall over.
Re: [PERFORM] Size of IN list affects query plan
Thanks for your comments.

On 8.11.2013 15:31, Tom Lane wrote:
> AFAICT, the reason the second plan is slow is the large number of checks
> of the IN list. The planner does account for the cost of that, but it's
> drastically underestimating that cost relative to the cost of I/O for the
> heap and index accesses. I suppose that your test case is fully cached in
> memory, which helps make the CPU costs more important than I/O costs. If
> you think this is representative of your real workload, then you need to
> decrease random_page_cost (and maybe seq_page_cost too) to make the cost
> estimates correspond better to that reality.

I am not sure I understand it well - in the first case (fast query), is the cache utilized in a better way? Going down with random_page_cost gives me fast query plans with big lists, as you expected. I tested the slow query on different machines with (default) settings of seq_page_cost, and I am getting those fast query plans too, so I am curious what else could affect that (same db, vacuum analyzed).

Anyway, it opens the question of whether big (tens to hundreds of entries) IN lists are bad practice, or just something that has to be used carefully. I have to admit I am surprised that this rather standard technique leads to so wide a range of performance.

On 8.11.2013 15:31, bricklen wrote:
> Looking at your EXPLAIN ANALYZE plan I was immediately reminded of this
> article
> http://www.datadoghq.com/2013/08/100x-faster-postgres-performance-by-changing-1-line/,
> where changing the array to a VALUES() clause was a huge win for them.

Yeah, I saw it before. Unfortunately that does not help significantly in my case.

Jan
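[Editor's note: for readers landing here, the rewrite that linked article describes turns a literal IN list into a join against a VALUES list, which the planner can treat like a small table and hash, instead of re-checking the whole list for every row. A sketch with hypothetical table and column names:]

```sql
-- Before (hypothetical): a long literal IN list, checked per row
SELECT * FROM events
WHERE user_id IN (101, 102, 103 /* , ... hundreds more */);

-- After: the same ids as a VALUES "table" joined in
SELECT e.*
FROM events e
JOIN (VALUES (101), (102), (103)) AS ids(user_id)
  ON e.user_id = ids.user_id;
```

As Jan notes above, this is not a guaranteed win; it depends on how the planner costs the two forms.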
Re: [PERFORM] postgresql recommendation memory
Scott hi, I calculated all of my jdbc pool sizes. The maximum is 300 connections from the components which use jdbc. I don't think that using pgbouncer is a good idea, because our application uses the Spring framework, which uses jdbc and prepared statements. I tried to talk with our developers about disabling prepared statements in this framework; they don't want to do this. That's why I will try to upgrade the HW and buy a CPU with more cores, as you say, based on the 3-4x cores formula. But most of these connections are idle. This is a web-based app, not a data warehouse, that's why all these connections are lightweight.

About my db freeze, I set these kernel parameters:

echo 1048576 > /proc/sys/vm/min_free_kbytes
echo 80 > /proc/sys/vm/vfs_cache_pressure

And my freeze intervals are now smaller. I'll try to dig deeper.

2013/11/6 Scott Marlowe
> Also also, the definitive page for postgres and dirty pages etc is here:
>
> http://www.westnet.com/~gsmith/content/linux-pdflush.htm
>
> Not sure if it's out of date with more modern kernels. Maybe Greg will
> chime in.

--
Best regards, Селявка Евгений
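[Editor's note: the two echo commands above only last until reboot; a sketch of persisting them, with the values copied verbatim from the commands:]

```
# /etc/sysctl.conf -- persist the ad-hoc vm settings across reboots
vm.min_free_kbytes = 1048576
vm.vfs_cache_pressure = 80
# apply without rebooting: sysctl -p
```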