Re: [PERFORM] Rowcount estimation changes based on from clause order
On Thu, Oct 12, 2017 at 11:50 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Ants Aasma <ants.aa...@eesti.ee> writes:
>> I stumbled upon a severe row count underestimation that confusingly
>> went away when two inner joins in the from clause were reordered.
>
> Hm, looks more like an overestimate in this example, but anyway ...
>
>> Does anybody have any idea what is going on here?
>
> set_joinrel_size_estimates says
>
>  * Since there is more than one way to make a joinrel for more than two
>  * base relations, the results we get here could depend on which component
>  * rel pair is provided.  In theory we should get the same answers no matter
>  * which pair is provided; in practice, since the selectivity estimation
>  * routines don't handle all cases equally well, we might not.  But there's
>  * not much to be done about it.
>
> In this example I think the core of the issue is actually not so much
> bad selectivity estimates as rowcount roundoff error.
>
> If we first consider joining "small" with "big", we get an estimate of
> 2000 rows (which is dead on for what would happen if we just joined
> those).  Then we estimate the final result size as the join of that to
> "lookup".  The selectivity number for that step is somewhat hogwash but
> happens to yield a result that's not awful (8 rows).
>
> In the other case we first estimate the size of the join of "small" with
> the "lookup" subquery, and we get a rounded-off estimate of one row,
> whereas without the roundoff it would have been probably about 0.01.
> When that's joined to "big", we are computing one row times 1 million rows
> times a selectivity estimate that's about right for the "small.id =
> big.small_id" clause; but because the roundoff already inflated the first
> join's size so much, you end up with an inflated final result.
> This suggests that there might be some value in considering the
> sub-relations from largest to smallest, so that roundoff error
> in the earlier estimates is less likely to contaminate the final
> answer.  Not sure how expensive it would be to do that or what
> sort of instability it might introduce into plan choices.
>
> Whether that's got anything directly to do with your original problem is
> hard to say.  Joins to subqueries, which we normally lack any stats for,
> tend to produce pretty bogus selectivity numbers in themselves; so the
> original problem might've been more of that nature.

Thanks for pointing me in the correct direction. The original issue was
that the values from lookup joined to ref_id and the subset filter on the
small table were almost perfectly correlated, which caused the
underestimate. In the second case this was hidden by the intermediate
clamping to 1, accidentally resulting in a more correct estimate.

I actually think that it might be better to consider relations from
smallest to largest. The reasoning being: a join cannot produce a fraction
of a row, it will either produce 0 or 1, and we should probably plan for
the case where it does return something.

Going even further, and I haven't looked at how feasible this is, I have
run into several cases lately where cardinality underestimates clamping to
1 resulted in catastrophically bad plans. Like a stack of nested loops
with unparameterized GroupAggregates and HashAggregates as inner sides
bad. It seems to me that row estimates should clamp to something slightly
larger than 1 unless the estimate is provably going to be 1.

Regards,
Ants Aasma

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
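The roundoff inflation Tom describes is easy to see with toy numbers. This is only a sketch: the selectivities below are made up to mirror the shape of the example, and this is not the planner's actual code.

```python
def clamp(rows):
    # The planner never carries a fractional rowcount between join steps:
    # intermediate estimates are rounded and clamped to at least one row.
    return max(1.0, round(rows))

big = 1_000_000                 # rows in "big"
small_join_lookup = 0.01        # true (fractional) size of small JOIN lookup
sel_small_big = 1e-3            # assumed selectivity of small.id = big.small_id

# Without intermediate clamping the final estimate stays small.
unclamped = small_join_lookup * big * sel_small_big        # 10.0 rows

# With clamping, 0.01 rows becomes 1 row before the join to "big",
# inflating the final estimate by a factor of 100.
clamped = clamp(small_join_lookup) * big * sel_small_big   # 1000.0 rows
```

The factor of inflation is exactly 1 / (true intermediate rowcount), which is why tiny intermediate results are the worst case.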
[PERFORM] Rowcount estimation changes based on from clause order
I stumbled upon a severe row count underestimation that confusingly went
away when two inner joins in the from clause were reordered. I whittled it
down to a reproducible test case.

Schema:

CREATE TABLE small (id serial primary key, ref_id int not null, subset int not null);
CREATE TABLE big (id serial primary key, small_id int not null);
INSERT INTO small (ref_id, subset) SELECT i/2+1, i/2+1 FROM generate_series(1,1000) i;
INSERT INTO big (small_id) SELECT (i % 1000) + 1 FROM generate_series(1,1000000) i;
CREATE INDEX ON small (ref_id);
CREATE INDEX ON big (small_id);
ANALYZE;

And the queries, differing only in the order of joins:

SELECT *
FROM small
INNER JOIN big ON small.id = big.small_id
INNER JOIN (SELECT 1 UNION ALL SELECT 2) lookup(ref) ON lookup.ref = small.ref_id
WHERE small.subset = 42;

SELECT *
FROM small
INNER JOIN (SELECT 1 UNION ALL SELECT 2) lookup(ref) ON lookup.ref = small.ref_id
INNER JOIN big ON small.id = big.small_id
WHERE small.subset = 42;

Resulting plan for the first case:

 Nested Loop  (cost=20.45..2272.13 rows=8 width=24)
   ->  Nested Loop  (cost=0.28..16.69 rows=1 width=16)
         ->  Append  (cost=0.00..0.04 rows=2 width=4)
               ->  Result  (cost=0.00..0.01 rows=1 width=4)
               ->  Result  (cost=0.00..0.01 rows=1 width=4)
         ->  Index Scan using small_ref_id_idx on small  (cost=0.28..8.32 rows=1 width=12)
               Index Cond: (ref_id = (1))
               Filter: (subset = 42)
   ->  Bitmap Heap Scan on big  (cost=20.18..2245.44 rows=1000 width=8)
         Recheck Cond: (small_id = small.id)
         ->  Bitmap Index Scan on big_small_id_idx  (cost=0.00..19.93 rows=1000 width=0)
               Index Cond: (small_id = small.id)

The second case's plan is identical except for the row count of the
topmost nested loop:

 Nested Loop  (cost=20.45..2272.13 rows=1000 width=24)

The union subselect was in reality somewhat more complicated, but for the
row count issue the simplification does not seem to matter. The behavior
is seen both on 9.4 and on master.

Does anybody have any idea what is going on here?
In the real world case this is based on, the estimate was 5 rows instead
of 200k, which resulted in quite bad plan choices downstream.

Regards,
Ants Aasma
Re: [PERFORM] In progress INSERT wrecks plans on table
On Sat, Jul 13, 2013 at 1:47 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> But I also tried it with 4 pgbench clients, and ran into a collapse of
> the performance, TPS dropping down to ~8 TPS.  It is too variable to
> figure out how reliable the speed-up with this patch is, so far.
> Apparently they are all fighting over the spinlock on the ProcArrayLock.
>
> This is a single quad core, Intel(R) Xeon(R) CPU X3210 @ 2.13GHz
>
> So I agree with (3) above, about not checking TransactionIdIsInProgress
> repeatedly.  Or could we change the order of operations so that
> TransactionIdIsInProgress is checked only after XidInMVCCSnapshot?

I haven't checked the patch in detail, but it sounds like my proposal for
CSN based snapshots [1] could help here. Using it,
TransactionIdIsInProgress can be done completely lock-free. It would
involve a direct dense array lookup, a read barrier and a check of the
dense/sparse horizon, and if necessary a binary search in the sparse
array plus another read barrier and a check of the sparse array version
counter.

I plan to start working on the patch next week. I hope to have a first
cut available for CF2.

[1] http://www.postgresql.org/message-id/CA+CSw_tEpJ=md1zgxPkjH6CWDnTDft4gBi=+p9snoc+wy3p...@mail.gmail.com

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
Re: [PERFORM] Question about postmaster's CPU usage
On Mar 28, 2013 9:07 PM, kelphet xiong <kelp...@gmail.com> wrote:
> explain analyze select * from inventory;
>
>                               QUERY PLAN
> --------------------------------------------------------------------------
>  Seq Scan on inventory  (cost=0.00..180937.00 rows=11745000 width=16)
>    (actual time=0.005..1030.403 rows=11745000 loops=1)
>  Total runtime: 1750.889 ms
> (2 rows)

A large fraction of that time, if not most, is due to timing overhead.
You can try the same query without timing by using:

explain (analyze on, timing off) select * from inventory;

Regards,
Ants Aasma
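To get a feel for why timing overhead matters here, multiply it out. The per-call clock cost is machine-dependent; the 40ns figure below is an assumed number for illustration, not measured from this system.

```python
rows = 11_745_000     # rows scanned in the plan above
calls_per_row = 2     # instrumentation reads the clock at node entry and exit
clock_read_ns = 40    # assumed cost of one clock read; varies widely per system

overhead_s = rows * calls_per_row * clock_read_ns * 1e-9
# ~0.94s of pure timing overhead against a 1.75s total runtime: timing can
# dominate cheap per-row nodes, which is why (analyze on, timing off) helps.
```

On systems with a slow clock source (e.g. falling back from TSC to HPET) the per-call cost can be several times higher, making the overhead even more dominant.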
Re: [PERFORM] dbt2 performance regresses from 9.1.6 to 9.2.1
On Thu, Nov 1, 2012 at 12:51 AM, Dong Ye <y...@vmware.com> wrote:
> The average notpm is 61384.24 with 9.1.6 and 57381.43 with 9.2.1.
> Plotting notps over time shows that the slowdown of 9.2.1 is evident
> across the entire run period. Since we also observed sustained 80+% CPU
> utilization during both runs, we suspected this is a CPU bottleneck
> issue. So we ran oprofile hoping that the profiles may suggest one is
> using CPU less productively than the other; but nothing jumped out to
> that explanation. The profiling results are posted on
> http://pgsql.privatepaste.com/3fa3ae0627 (9.1.6 run) and
> http://pgsql.privatepaste.com/930bb51374 (9.2.1 run).

You are using prepared statements, which makes me think that this
regression might be due to the support for parameter specific plans for
prepared statements. [1] Can you run the test on both versions without
prepared statements and see if the regression remains?

I compared the profile results; I'll reproduce them here in case they
ring any other bells for someone.
Here are the top 20 functions that take more time under 9.2:

Function                                                Diff    v9.2%   v9.1%
postgres.copyObject                                     3.48   1.2436  0.3569
postgres.check_stack_depth                              1.92   0.7244  0.3774
postgres.eval_const_expressions_mutator                 1.87   0.3473  0.1853
jbd./jbd                                                1.82   0.4127  0.2271
libc-2.14.1.so._int_malloc                              1.75   1.4938  0.8540
libc-2.14.1.so.__strlen_sse42                           1.72   0.7098  0.4124
vmlinux-2.6.32.59-0.7-default.copy_user_generic_string  1.70   0.5130  0.3017
postgres.MemoryContextCreate                            1.68   0.3206  0.1914
postgres.MemoryContextAllocZeroAligned                  1.64   1.5443  0.9443
libc-2.14.1.so._int_free                                1.60   0.7182  0.4476
postgres.expression_tree_walker                         1.60   0.8350  0.5235
postgres.XLogInsert                                     1.58   2.7251  1.7210
ext3./ext3                                              1.55   0.2065  0.1335
libc-2.14.1.so.__strcpy_ssse3                           1.50   0.3061  0.2046
postgres.expression_tree_mutator                        1.41   0.3461  0.2447
libc-2.14.1.so.__memcpy_ssse3_back                      1.40   1.2379  0.8830
postgres.AllocSetAlloc                                  1.39   4.6567  3.3467
postgres.LockAcquireExtended                            1.39   0.2799  0.2015
postgres.MemoryContextAlloc                             1.38   1.0151  0.7373
postgres.AllocSetDelete                                 1.33   0.2130  0.1600

And the top 10 functions present under 9.2 but not present with 9.1:

Function
postgres._copyList.isra.15         0.341
postgres._SPI_execute_plan.isra.4  0.224
postgres.grouping_planner          0.220
postgres.IndexOnlyNext             0.213
postgres.GetCachedPlan             0.189
postgres.MemoryContextStrdup       0.171
postgres.list_copy                 0.165
postgres.index_getnext_tid         0.155
postgres.MemoryContextSetParent    0.128
postgres.cost_qual_eval_walker     0.127

I have no idea why XLogInsert is taking so much longer on 9.2.

[1] http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e6faf910

Ants Aasma
Re: [PERFORM] SELECT AND AGG huge tables
On Wed, Oct 17, 2012 at 2:24 PM, houmanb <hou...@gmx.at> wrote:
> Hi all,
> Thanks for your advice and the link about posting my question in an
> appropriate form. Here is the info. I thank all of you in advance.

Can you run the EXPLAIN once more with EXPLAIN (ANALYZE, BUFFERS, TIMING
OFF)? Given the number of rows processed by the query, the detailed per
node timing overhead might be a considerable factor here.

What happened to the WHERE T.creation_date=$SOME_DATE part of the query?
These examples go through the whole table.

The plans shown are about as fast as it gets. Summarizing 5GB of data
will never be fast. If you need that information quickly, you'll need to
actively maintain the aggregate values via triggers.

Regards,
Ants Aasma
Re: [PERFORM] Guide to Posting Slow Query Questions
On Wed, Sep 26, 2012 at 11:11 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Wed, Sep 12, 2012 at 11:40 PM, Ants Aasma <a...@cybertec.at> wrote:
>> I don't have any links for OS level monitoring, but with version 9.2
>> track_io_timing would do the job.
>
> I don't know how to advise people on how to use this to obtain
> information on a specific query. Would someone else like to take a stab
> at explaining that?

I added a line suggesting that 9.2 users turn it on via SET
track_io_timing TO on;

Ants Aasma
Re: [PERFORM] Same query doing slow then quick
Sorry for the late answer; I was going through my e-mail backlog and
noticed that this question hadn't been answered.

On Thu, Sep 27, 2012 at 11:33 AM, Undertaker Rude <ffw_r...@hotmail.com> wrote:
> Oh ok. But what is this command doing? I'm gonna run it today. I'll
> keep you posted. Here is some EXPLAIN ANALYZE from the queries:
>
>  Nested Loop  (cost=0.00..353722.89 rows=124893 width=16)
>    (actual time=261158.061..10304193.501 rows=99 loops=1)
>    Join Filter: ((t2.X >= (t1.x_min)::double precision) AND
>                  (t2.X <= (t1.x_max)::double precision) AND
>                  (t2.Y >= (t1.y_min)::double precision) AND
>                  (t2.Y <= (t1.y_max)::double precision))
>    ->  Seq Scan on gps_22 t1  (cost=0.00..3431.80 rows=177480 width=44)
>          (actual time=0.036..1399.621 rows=177480 loops=1)
>    ->  Materialize  (cost=0.00..20572.83 rows=57 width=20)
>          (actual time=0.012..10.274 rows=2924 loops=177480)
>          ->  Seq Scan on adresses_22 t2  (cost=0.00..20572.55 rows=57 width=20)
>                (actual time=1570.240..1726.376 rows=2924 loops=1)
>                Filter: ((id_maille_200m)::text = '0'::text)
>  Total runtime: 10304211.648 ms

As you can see from the explain plan, PostgreSQL is not using any indexes
here. The reason is the type mismatch between the X and x_min columns.
Use matching types between tables to enable index use. The same goes for
the id column: if the column type is integer, use a numeric literal 0,
not a text literal '0'.

Regards,
Ants Aasma
Re: [PERFORM] Guide to Posting Slow Query Questions
On Wed, Sep 12, 2012 at 7:00 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> Regarding the wiki page on reporting slow queries: We currently
> recommend EXPLAIN ANALYZE over just EXPLAIN. Should we recommend
> EXPLAIN (ANALYZE, BUFFERS) instead? I know I very often wish I could
> see that data. I don't think turning buffer accounting on adds much
> cost over a mere ANALYZE.

Given the amount of version 8 installs out there, the recommendation
should be qualified with version 9.0. Otherwise a strong +1.

> Also, an additional thing that would be nice for people to report is
> whether long running queries are CPU bound or IO bound. Should we add
> that recommendation with links to how to do that in a couple of OSes,
> say, Linux and Windows? If so, does anyone know of good links that
> explain it for those OSes?

I don't have any links for OS level monitoring, but with version 9.2
track_io_timing would do the job.

Ants Aasma
Re: [PERFORM] very very slow inserts into very large table
On Tue, Jul 17, 2012 at 6:30 AM, Craig Ringer <ring...@ringerc.id.au> wrote:
> On 07/17/2012 01:56 AM, Jon Nelson wrote:
>
> To perform reasonably well, Pg would need to be able to defer index
> updates when bulk-loading data in a single statement (or even
> transaction), then apply them when the statement finished or
> transaction committed. Doing this at a transaction level would mean
> you'd need a way to mark indexes as 'lazily updated' and have Pg avoid
> using them once they'd been dirtied within a transaction. No such
> support currently exists, and it'd be non-trivial to implement,
> especially since people loading huge amounts of data often want to do
> it with multiple concurrent sessions. You'd need some kind of 'DISABLE
> INDEX' and 'ENABLE INDEX' commands plus a transactional backing table
> of pending index updates.

It seems to me that if the insertion is done as a single statement it
wouldn't be a problem to collect up all btree insertions and apply them
before completing the statement. I'm not sure how much that would help
though. If the new rows have a uniform distribution you end up reading in
the whole index anyway. Because index pages are not stored in logical
order you don't get to benefit from sequential I/O.

The lazy merging approach (the paper that Claudio linked) on the other
hand seems promising, but a lot trickier to implement.

Regards,
Ants Aasma
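The benefit and the caveat of collecting-then-sorting index insertions can both be seen in a toy page-locality model (a sketch only; the page layout and numbers are made up, and a real btree's leaves are not laid out this simply):

```python
import random

def leaf_pages_touched(keys, page_size=100):
    """Count leaf-page switches when inserting `keys` in the given order,
    assuming each leaf covers a contiguous range of `page_size` keys."""
    touches = 0
    last_page = None
    for k in keys:
        page = k // page_size
        if page != last_page:
            touches += 1
            last_page = page
    return touches

random.seed(0)
keys = random.sample(range(100_000), 5_000)   # uniformly distributed new keys

random_order = leaf_pages_touched(keys)          # nearly one page hop per insert
sorted_order = leaf_pages_touched(sorted(keys))  # one visit per distinct page
```

Sorting collapses thousands of scattered page visits into roughly one per leaf, but with uniformly distributed keys the sorted pass still touches almost every leaf page, which is the "you end up reading in the whole index anyway" point above.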
Re: [PERFORM] how could select id=xx so slow?
On Thu, Jul 12, 2012 at 3:48 PM, Yan Chunlu <springri...@gmail.com> wrote:
> yes the system seems overloaded, I am dealing with a simple INSERT but
> not sure if it is normal that it took more time than the explain
> estimated:

The estimated cost is in arbitrary units; its purpose is to compare
different execution plans, not to estimate the time taken. So it's
completely normal that it doesn't match the actual time taken.

Ants Aasma
Re: [PERFORM] how could select id=xx so slow?
On Wed, Jul 11, 2012 at 9:24 AM, Yan Chunlu <springri...@gmail.com> wrote:
> I have logged one day of data and found the checkpoints are rather
> frequent (detail: https://gist.github.com/3088338). Not sure if it is
> normal, but the average time of a checkpoint is about 100sec~200sec. It
> seems related to my settings:
>
> 574 checkpoint_segments = 64
> 575 wal_keep_segments = 5000
>
> I set checkpoint_segments to a very large value because otherwise the
> slave server always could not follow the master. Should I lower that
> value? Or is the slow query about something else? Thanks!

Some things to notice from the checkpoints log:

* All checkpoints are triggered by checkpoint_timeout, using up only a
  couple of log files
* Checkpoints write out around 40MB of buffers
* The write out period is spread out nicely, like it's supposed to be,
  but the sync phase is occasionally taking a very long time (more than
  2 minutes)

This looks like something (not necessarily the checkpoint sync itself) is
overloading the IO system. You might want to monitor the IO load with
iostat and correlate it with the checkpoints and slow queries to find the
culprit. It's also possible that something else is causing the issues.

If the cause is checkpoints, just making them less frequent might make
the problem worse. I'm assuming you have 16GB+ of RAM because you have
4GB of shared_buffers. Just making checkpoint_timeout longer will
accumulate a larger number of dirty buffers that will clog up the IO
queues even worse. If you are on Linux, lowering dirty_expire_centisecs
or dirty_background_bytes might help to spread the load out, but will
make overall throughput worse.

On the other hand, if the IO overload is from queries (more likely
because some checkpoints sync quickly) there are no easy tuning answers.
Making queries less IO intensive is probably the best you can do.
From the tuning side, newer Linux kernels handle IO fairness a lot
better, and you could also try tweaking the IO scheduler to achieve
better throughput and avoid congestion, or at least provide better
latency for trivial queries. And of course it's always possible to throw
more hardware at the problem and upgrade the IO subsystem.

Ants Aasma
Re: [PERFORM] [pgsql-cluster-hackers][performance] fast reads on a busy server
On Jun 27, 2012 2:29 PM, Willy-Bas Loos <willy...@gmail.com> wrote:
> Should I use a larger shared_buffers for the other cluster(s) too, so
> that I bypass the inefficient OS file-cache?

Once the in-memory cluster has filled its shared buffers, the pages go
cold for the OS cache and get replaced with pages of other clusters that
are actually referenced.

Ants Aasma
Re: [PERFORM] does the query planner consider work_mem?
On Wed, May 30, 2012 at 8:57 PM, Murat Tasan <mmu...@gmail.com> wrote:
> any insights here?

Have you tried running the slow option multiple times? According to the
explain output, all of the time was accounted to the bitmap heap scan.
For the second explain plan the same node was fast. It looks to me as if
the first explain on 8.4 was slow because the data was still on disk.

Raising work_mem doubled the speed of the sort from 800ms to 400ms.

Regards,
Ants Aasma
Re: [PERFORM] Query got slow from 9.0 to 9.1 upgrade
On Tue, May 1, 2012 at 12:17 AM, Josh Turmel <jtur...@gmail.com> wrote:
> We just upgraded from 9.0 to 9.1; we're using the same server
> configuration, that has been confirmed 3 or 4 times over. Any help
> would be appreciated. If I remove the ORDER BY it gets fast again
> because it goes back to using the user_id index, if I remove the
> LIMIT/OFFSET it gets fast again; obviously I need both of those, but
> that was just to test and see what would happen.
>
> Query:
>
> SELECT * FROM bookmark_groups
> WHERE user_id = 6708929
> ORDER BY created DESC
> LIMIT 25 OFFSET 0;

Based on the explain numbers I'd say that 9.0 was fast by accident of
having inaccurate statistics. You can see that 9.0 estimated that 757
rows have this user_id, while actually it had 33868 rows. 9.1 estimated
a more accurate 35980 rows, and because of that assumed that reading the
newest created rows would return 25 rows of this user rather fast, faster
than sorting the 35980 rows. This assumption seems to be incorrect,
probably because the rows with this user_id are all rather old.

You could try tweaking cpu_index_tuple_cost to be higher so that large
index scans get penalized. But ultimately, with the current PG version
there isn't a good general way to fix this kind of behavior. You can
rewrite the query to enforce filtering before sorting:

SELECT * FROM (
    SELECT * FROM bookmark_groups
    WHERE user_id = 6708929
    OFFSET 0 -- Prevents pushdown of ordering and limit
) AS sub
ORDER BY created DESC
LIMIT 25 OFFSET 0;

This is the same issue that Simon Riggs talks about in this e-mail:
http://archives.postgresql.org/message-id/ca+u5nmlbxfut9cwdhj3tpxjc3btwqizbkqtwdgzebcb5bag...@mail.gmail.com

The more general approach would be to be more pessimistic about limited
filtered index scans, or to collect multi-dimensional stats to figure out
the correlation, i.e. that all rows for this user are likely to be old.
Regards,
Ants Aasma
Re: [PERFORM] bad planning with 75% effective_cache_size
On Thu, Apr 19, 2012 at 3:44 AM, Josh Berkus <j...@agliodbs.com> wrote:
>> 7500ms http://explain.depesz.com/s/
>
> This plan seems very odd -- doing individual index lookups on 2.8m rows
> is not standard planner behavior. Can you confirm that all of your
> other query cost parameters are the defaults?

This is similar to the issue with limit that Simon was complaining about
a few weeks ago [1]. A lot of the estimation in the planner is biased to
give overestimations for the number of rows returned in the face of
uncertainty. This works well for joins but interacts really badly with
limits. The two issues here are the join cardinality being overestimated
by a factor of 15x and then the unique being off by another 50x. The
result is that the planner thinks that it needs to scan 0.25% of the
input, while actually it needs to scan the whole of it, underestimating
the real cost by a factor of 400.

I'm not sure what to do about the unique node overestimation, but I think
the planner could be coaxed to be less optimistic about the limit by
adding an optimization barrier and some selectivity decreasing clauses
between the limit and the rest of the query:

select * from (
    select distinct product_code
    from product p_
    inner join product_parent par_ on p_.parent_id = par_.id
    where par_.parent_name like 'aa%'
    offset 0 -- optimization barrier
) as x
where product_code = product_code -- reduce selectivity estimate by 200x
limit 2;

[1] http://archives.postgresql.org/message-id/ca+u5nmlbxfut9cwdhj3tpxjc3btwqizbkqtwdgzebcb5bag...@mail.gmail.com

Cheers,
Ants Aasma
Re: [PERFORM] bad plan
On Thu, Apr 5, 2012 at 2:47 PM, Julien Cigar <jci...@ulb.ac.be> wrote:
> - http://www.pastie.org/3731956 : with default config
> - http://www.pastie.org/3731960 : this is with enable_seq_scan = off

It looks like the join selectivity of (context_to_context_links,
ancestors) is being overestimated by almost two orders of magnitude. The
optimizer thinks that there are 564 rows in the context_to_context_links
table for each taxon_id, while in fact for this query the number is 9.

To confirm this, you can force the selectivity estimate to be 200x lower
by adding a geo_id = geo_id where clause to the subquery. If it does
help, then the next question is why the estimate is so far off. It could
be either because the stats for context_to_context_links.taxon_id are
wrong, or because the set of taxon_ids from ancestors (subphylum_id =
18830) is a special case. To help figure this out, you could run the
following two queries and post the results:

SELECT floor(log(num,2)) AS nmatch, COUNT(*) AS freq
FROM (SELECT COUNT(*) AS num
      FROM context_to_context_links
      GROUP BY taxon_id) AS dist
GROUP BY 1 ORDER BY 1;

SELECT floor(log(num,2)) AS nmatch, COUNT(*) AS freq
FROM (SELECT COUNT(*) AS num
      FROM context_to_context_links
      WHERE NOT geo_id IS NULL AND taxon_id = ANY (
          SELECT taxon_id FROM rab.ancestors
          WHERE ancestors.subphylum_id = 18830)
      GROUP BY taxon_id) AS dist
GROUP BY 1 ORDER BY 1;

If the second distribution has a significantly different shape, then
cross column statistics are necessary to get good plans. As it happens,
I'm currently working on adding this functionality to PostgreSQL and
would love to hear more details about your use case to understand
whether it would be solved by this work.

Regards,
Ants Aasma
Re: [PERFORM] Distinct + Limit
On Tue, Mar 27, 2012 at 11:54 PM, Francois Deliege <fdeli...@gmail.com> wrote:
> select col1 from table1 group by col1 limit 1;
> select distinct on (col1) col1 from table1 limit 1;
> select col1 from table1 group by col1 limit 2;
> select distinct on (col1) col1 from table1 limit 2;
>
> Performing any of the following queries results in a full sequential
> scan, followed by a hash aggregate, and then the limit. An optimization
> could be to stop the sequential scan as soon as the limit of results
> has been reached. Am I missing something?

Yes, that would be an optimization. Unfortunately the aggregation logic
currently doesn't have special case logic to start outputting tuples
immediately when no aggregate functions are in use. In principle it's
possible to teach it to do that; peeking at the code, it seems that it
wouldn't even be too hard to implement.

Currently your best options are to add indexes for the columns that you
select distinct values from, use a server side cursor and do the distinct
operation on the client (you might need to adjust cursor_tuple_fraction
before running the query to make the cost estimates better), or use a
stored procedure to do the cursor + manual distinct trick.

> Similarly, the following query results in a sequential scan:
>
> select * from table1 where col1 <> col1;

The PostgreSQL query optimizer doesn't try to be a theorem prover, and
so doesn't deduce the logical impossibility. For most queries, looking
for nonsensical clauses would be a complete waste of time. The optimizer
does notice impossibilities that crop up during constant propagation, so
WHERE false or WHERE 0 = 1 would work fine. It would be best to fix
Sequel to output a literal constant false for PostgreSQL. However, I
wonder if it's worth checking for this very specific case, because it is
a common idiom among Oracle users for implementing constant false in
where predicates, due to Oracle not allowing top level literal booleans
for some arcane reason or another.
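The early-exit behavior described above, and the client-side cursor workaround, amount to the following logic (a plain Python sketch; with a real server-side cursor the `rows` iterable would be the cursor itself):

```python
def distinct_limit(rows, limit):
    """Stream distinct values in first-seen order, stopping as soon as
    `limit` distinct values have been found, instead of hash-aggregating
    the entire input first."""
    seen = set()
    for value in rows:
        if value not in seen:
            seen.add(value)
            yield value
            if len(seen) >= limit:
                return  # stop consuming the scan early
```

Driven by e.g. a psycopg named cursor, this abandons the table scan after the first few pages whenever the limit is small and the distinct values are common.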
Ants Aasma
Re: [PERFORM] Obtaining resource usage statistics from execution? (v 9.1)
On Fri, Mar 16, 2012 at 4:31 PM, Karl Denninger <k...@denninger.net> wrote:
> What I'd like to be able to do is have the code track performance all
> the time and raise alerts when it sees outliers, giving me a
> continually-improving set of targets for reduction of resource
> consumption (up until I reach the point where I don't seem to be able
> to make it any faster of course :-))

Sounds almost exactly like what the auto_explain contrib module is
designed to do:

http://www.postgresql.org/docs/9.1/static/auto-explain.html

It can run with reasonably low overhead if your system has fast timing.
You can check the timing performance of your system with the tool
attached here:

http://archives.postgresql.org/message-id/4F15B930.50108%402ndQuadrant.com

Anything under 200ns should be OK.

Cheers,
Ants Aasma
Re: [PERFORM] index choosing problem
2012/3/15 Rural Hunter <ruralhun...@gmail.com>:
> Now queries 3-5 using article_others_pkey are quite slow. The rows for
> cid 74 are very old and seldom get updated. I think pg needs to scan
> quite a lot of article_others_pkey before it gets to the rows for cid
> 74. The same query for other cids with new rows and the majority of
> rows runs very fast.

This is because the PostgreSQL cost model doesn't know about the
correlation between aid and cid. In the absence of that information it
assumes that it will find a row with cid=74 about every 68 rows
(889520/13047). One option to fix this case is to use OFFSET 0 as an
optimization barrier:

SELECT max(aid) FROM (
    SELECT aid FROM article_others WHERE cid=74 OFFSET 0
) AS x;

That has the unfortunate effect of performing badly for cids that are
extremely popular. That may or may not be acceptable in your case.

To fix this properly, the query optimizer needs to know the relationship
between aid and cid and needs to know how to apply that to estimating
the cost of index scans. A prerequisite for implementing this is to have
multi-column statistics. To do the estimation, the current linear cost
model needs to be changed to something that can express a non-linear
relationship between tuples returned and cost, e.g. a piece-wise linear
model. The stats collection part is actually feasible; in fact I'm
currently working on a patch for that. As for the estimation
improvement, I have an idea how it might work, but I'm not really sure
yet whether the performance hit for query planning would be acceptable.

> Another question: why does the plan show rows=13047 for cid=74 while it
> actually only has 4199 rows? There are almost no data changes for cid
> 74 and I just vacuumed/analyzed the table this morning.

That might just be an artifact of random sampling. Try raising your
stats target and re-analyzing to confirm.
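For what it's worth, the 68-row figure above is just the uniformity assumption made explicit (the numbers come from the message; the reading of the backwards scan cost is my own sketch):

```python
total_rows = 889520   # rows in article_others, from the plan
matching = 13047      # planner's rowcount estimate for cid = 74

# Without correlation stats the planner assumes cid = 74 rows are spread
# uniformly through the aid order, so a backwards primary-key scan for
# max(aid) expects a match after reading about this many index entries:
rows_per_match = total_rows / matching   # roughly 68

# If the cid = 74 rows are in fact clustered at the old (low aid) end,
# that same scan reads nearly the whole index before its first hit,
# which is why the pkey plan collapses for this particular cid.
```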
All the best,
Ants Aasma
Re: [PERFORM] Advice sought : new database server
On Wed, Mar 7, 2012 at 10:18 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> those numbers are stupendous for 8 drive sata. how much shared buffers
> do you have?

A couple of things to notice:
1) The benchmark can run fully in memory, although not 100% in shared_buffers.
2) These are 100k transaction runs, meaning that probably no checkpointing was going on.
3) Given the amount of memory in the server, with the usual dirty flush settings the OS will do mostly sequential writes.

Just ran a quick test. With synchronous_commit=off to simulate a BBU I have no trouble hitting 11k tps on a single SATA disk. Seems to be mostly CPU bound on my workstation (Intel i5 2500K @ 3.9GHz, 16GB memory): dirty writes stay in OS buffers, about 220tps/6MBps of traffic to the xlogs, and checkpoint dumps everything to OS cache which is then flushed at about 170MB/s (which would probably do nasty things to latency in real-world cases). Unlogged tables give me about 12k tps, which seems to confirm it's mostly CPU bound.

So regardless of whether the benchmark is a good representation of the target workload or not, it definitely isn't benchmarking the IO system.

Ants Aasma
Re: [PERFORM] Bad estimation for where field not in
On Thu, Mar 1, 2012 at 6:40 PM, Daniele Varrazzo <daniele.varra...@gmail.com> wrote:
> Is this a known planner shortcoming or something unexpected, to be
> escalated to -bugs? Server version is 9.0.1.

The relevant code is in the scalararraysel() function. It makes the assumption that element-wise comparisons are completely independent, while the exact opposite is true. It has been this way since http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=290166f93404d8759f4bf60ef1732c8ba9a52785 introduced it in version 8.2.

At least for equality and inequality ops it would be good to rework the logic to aggregate with s1 = s1 + s2 and s1 = s1 + s2 - 1 respectively.

--
Ants Aasma
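[Editorial note: the two aggregation rules can be compared with a toy model. The function names are made up; the per-element selectivities below assume 100 equally frequent values, 50 of them excluded by a NOT IN list.]

```python
def not_in_independent(ne_sels):
    """What scalararraysel effectively does for col <> ALL (...):
    multiply per-element selectivities as if they were independent."""
    s = 1.0
    for s_i in ne_sels:
        s *= s_i
    return s

def not_in_disjoint(ne_sels):
    """The suggested rule for inequality ops: aggregate s1 = s1 + s2 - 1,
    which is exact when the excluded constants match disjoint row sets."""
    s = 1.0
    for s_i in ne_sels:
        s = max(0.0, s + s_i - 1.0)
    return s

ne_sels = [0.99] * 50                # each constant excludes 1% of the rows
print(not_in_independent(ne_sels))   # ~0.605: too many rows predicted to survive
print(not_in_disjoint(ne_sels))      # 0.5: the true surviving fraction
```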
Re: [PERFORM] problems with set_config, work_mem, maintenance_work_mem, and sorting
On Feb 29, 2012 1:44 AM, Claudio Freire <klaussfre...@gmail.com> wrote:
> Another option, depending on your SQLA version, when connections are sent
> back to the pool, I seem to remember they were reset. That would also reset
> the work_mem, you'd still see the same pid on PG logs, but it's not the
> same session.

No reset is done other than rolling back any open transactions. The correct way to handle this would be to set the options and commit the transaction in Pool connect or checkout events. The choice of event depends on whether application-scope or request-scope parameters are wanted.

--
Ants Aasma
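[Editorial note: a minimal sketch of the pattern, with sqlite3 and PRAGMA standing in for a PostgreSQL driver and SET work_mem. The Pool class here is a hypothetical stand-in for SQLAlchemy's pool plus a checkout event listener, not SQLAlchemy code.]

```python
import queue
import sqlite3

class Pool:
    """Toy connection pool that re-applies session settings on every
    checkout, which is what a checkout event handler buys you."""
    def __init__(self, size, session_settings):
        self.settings = session_settings
        self.conns = queue.Queue()
        for _ in range(size):
            self.conns.put(sqlite3.connect(":memory:", check_same_thread=False))

    def checkout(self):
        conn = self.conns.get()
        for stmt in self.settings:  # on PostgreSQL: SET work_mem = '...', etc.
            conn.execute(stmt)
        conn.commit()               # on PostgreSQL the commit matters: a SET
        return conn                 # in a rolled-back transaction is undone

    def checkin(self, conn):
        conn.rollback()             # the only reset the pool really performs
        self.conns.put(conn)

pool = Pool(2, ["PRAGMA cache_size = -4000"])  # stand-in for work_mem
conn = pool.checkout()
print(conn.execute("PRAGMA cache_size").fetchone())  # (-4000,)
pool.checkin(conn)
```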
Re: [PERFORM] Why so slow?
On Feb 17, 2012 8:35 PM, Alessandro Gagliardi <alessan...@path.com> wrote:
> Here is the EXPLAIN: http://explain.depesz.com/s/ley
> I'm using PostgreSQL 9.0.6 on i486-pc-linux-gnu, compiled by GCC
> gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 32-bit
> My random_page_cost is 2 and yet it still insists on using Seq Scan on
> blocks.

As could be inferred from the row counts, it's slow because it's joining and then aggregating a quarter of the blocks table. The hash join with its sequential scan is probably the correct choice for that type of join; it's the join itself that should be optimized out. The optimizer doesn't figure out that the join can be turned into a semi-join when the output is aggregated with DISTINCT and comes from only one of the tables (in this case, because the output is the join key, it can be from either table). To make the optimizer's job easier you can rewrite it as a semi-join explicitly:

SELECT DISTINCT(user_id) FROM seen_its
 WHERE EXISTS (SELECT 1 FROM blocks
                WHERE blocks.user_id = seen_its.user_id)
   AND seen_its.created BETWEEN (now()::date - interval '8 days')::timestamp
                            AND now()::date::timestamp;

--
Ants Aasma
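[Editorial note: the equivalence of the two query shapes is easy to sanity-check. sqlite3 stands in for PostgreSQL here, and the date filter is dropped to keep the sketch short; the data is invented.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE seen_its (user_id INTEGER);
    CREATE TABLE blocks   (user_id INTEGER);
    INSERT INTO seen_its VALUES (1), (1), (2), (3);
    INSERT INTO blocks   VALUES (1), (2), (2), (4);
""")

# Original shape: join, then deduplicate the (possibly multiplied) join output.
join_form = conn.execute("""
    SELECT DISTINCT s.user_id
      FROM seen_its s JOIN blocks b ON b.user_id = s.user_id
""").fetchall()

# Semi-join shape: each seen_its row is tested for a match and emitted at most once.
semi_form = conn.execute("""
    SELECT DISTINCT user_id FROM seen_its
     WHERE EXISTS (SELECT 1 FROM blocks
                    WHERE blocks.user_id = seen_its.user_id)
""").fetchall()

print(sorted(join_form) == sorted(semi_form))  # True: same answer, cheaper shape
```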
Re: [PERFORM] Bogus startup cost for WindowAgg
On Wed, Oct 13, 2010 at 10:35 PM, Mladen Gogala <mladen.gog...@vmsinfo.com> wrote:
> You will see that for most of the columns, the length of the histogram
> array corresponds to the value of the default_statistics_target parameter.
> For those that are smaller, the size is the total number of values in the
> column in the sample taken by the analyze command. The longer histogram,
> the better plan. In this case, the size does matter.

The issue here isn't that the statistics are off. The issue is, as Tom said, that the optimizer doesn't consider them for the cost model of the window aggregate. The trivial case I put forward wouldn't be too hard to cover: if there's no partitioning of the window and the frame is over the full partition, the startup cost should be nearly the same as the full cost. But outside of the trick I tried, I'm not sure the trivial case matters much. I can also see how the estimation gets pretty hairy when partitioning, frames and real window functions come into play.

One idea would be to cost three different cases. If the aggregate only needs to read ahead some most-likely-constant number of rows, i.e. it is not using an unbounded following frame, leave the startup cost as is. If there is partitioning, estimate the number of groups produced by the partitioning and add one n-th of the difference between startup and total cost. Otherwise, if the frame extends to the end of the partition and there is no partitioning, set the startup cost equal to the total cost, or in terms of the previous case, n=1. I don't know how accurate estimating the number of groups would be, or even whether it is feasible to do. But if those assumptions hold, it seems to me that this method should at least cover any large O(n) effects.
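[Editorial note: the three cases described above can be written out as a small costing sketch. This is plain arithmetic illustrating the proposal, not the actual PostgreSQL cost model; all names are made up.]

```python
def windowagg_startup_cost(startup, total, n_groups=1, unbounded_following=False):
    """Three-case idea from the message above:
    1. bounded (constant read-ahead) frame: leave the startup cost as is;
    2. unbounded-following frame with partitioning into n groups: the first
       output row appears after roughly 1/n of the input is consumed, so
       add (total - startup) / n to the startup cost;
    3. unbounded-following frame, no partitioning: n == 1, startup == total.
    """
    if not unbounded_following:
        return startup
    return startup + (total - startup) / n_groups

# COUNT(*) OVER () from the example below: one partition, unbounded frame,
# so the corrected startup cost equals the total cost.
print(windowagg_startup_cost(0.0, 46209.93, n_groups=1, unbounded_following=True))
```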
[PERFORM] Bogus startup cost for WindowAgg
I hit an issue with window aggregate costing while experimenting with providing a count of the full match alongside a limited result set. Seems that the window aggregate node doesn't take into account that it has to consume the whole input before outputting the first row. When this is combined with a limit, the resulting cost estimate is wildly underestimated, leading to suboptimal plans. Is this a known issue? I couldn't find anything referring to this on the mailing list or in the todo.

Code to reproduce follows:

ants=# CREATE TABLE test (a int, b int);
CREATE TABLE
ants=# INSERT INTO test (a,b) SELECT random()*1e6, random()*1e6 FROM generate_series(1,1000000);
INSERT 0 1000000
ants=# CREATE INDEX a_idx ON test (a);
CREATE INDEX
ants=# CREATE INDEX b_idx ON test (b);
CREATE INDEX
ants=# ANALYZE test;
ANALYZE
ants=# EXPLAIN ANALYZE SELECT *, COUNT(*) OVER () FROM test WHERE a < 2500 ORDER BY b LIMIT 10;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Limit  (cost=0.00..195.31 rows=10 width=8) (actual time=728.325..728.339 rows=10 loops=1)
   ->  WindowAgg  (cost=0.00..46209.93 rows=2366 width=8) (actual time=728.324..728.337 rows=10 loops=1)
         ->  Index Scan using b_idx on test  (cost=0.00..46180.36 rows=2366 width=8) (actual time=0.334..727.221 rows=2512 loops=1)
               Filter: (a < 2500)
 Total runtime: 728.401 ms
(5 rows)

ants=# SET enable_indexscan = off;
SET
ants=# EXPLAIN ANALYZE SELECT *, COUNT(*) OVER () FROM test WHERE a < 2500 ORDER BY b LIMIT 10;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Limit  (cost=3986.82..3986.85 rows=10 width=8) (actual time=7.186..7.189 rows=10 loops=1)
   ->  Sort  (cost=3986.82..3992.74 rows=2366 width=8) (actual time=7.185..7.187 rows=10 loops=1)
         Sort Key: b
         Sort Method:  top-N heapsort  Memory: 25kB
         ->  WindowAgg  (cost=46.70..3935.69 rows=2366 width=8) (actual time=4.181..6.508 rows=2512 loops=1)
               ->  Bitmap Heap Scan on test  (cost=46.70..3906.12 rows=2366 width=8) (actual time=0.933..3.555 rows=2512 loops=1)
                     Recheck Cond: (a < 2500)
                     ->  Bitmap Index Scan on a_idx  (cost=0.00..46.10 rows=2366 width=0) (actual time=0.512..0.512 rows=2512 loops=1)
                           Index Cond: (a < 2500)
 Total runtime: 7.228 ms
(10 rows)
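[Editorial note: the chosen plan's 195.31 estimate follows from the standard Limit costing: Limit assumes it can pull 10 of the estimated 2366 rows for a proportional share of the child's run cost, which is only valid when the child's startup cost is honest. A quick check of the arithmetic (a sketch of fraction-of-run-cost costing, not the exact PostgreSQL code):]

```python
def limit_cost(child_startup, child_total, child_rows, limit_rows):
    """Limit node total cost: the child's startup cost plus the fetched
    fraction of the child's run cost."""
    fraction = min(1.0, limit_rows / child_rows)
    return child_startup + fraction * (child_total - child_startup)

# WindowAgg advertises cost=0.00..46209.93 rows=2366, so LIMIT 10 looks cheap:
print(round(limit_cost(0.0, 46209.93, 2366, 10), 2))  # 195.31, as in the plan
# If WindowAgg reported startup ~= total (it must read all input before the
# first row), the Limit would cost ~46209.93 and the 3986.85 bitmap plan
# would win outright:
print(round(limit_cost(46209.93, 46209.93, 2366, 10), 2))
```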