Re: [GENERAL] Why does row estimation on nested loop make no sense to me
Ok - I agree - Can somebody help me understand where the row estimates come from on a nested-loop operation in postgres then? - Original Message - From: hubert depesz lubaczewski dep...@depesz.com To: Jeff Amiel becauseimj...@yahoo.com Cc: pgsql-general@postgresql.org pgsql-general@postgresql.org Sent: Saturday, May 18, 2013 3:39 AM Subject: Re: [GENERAL] Why does row estimation on nested loop make no sense to me Not sure if it helps, but it's apparently not a very rare thing. Quick analysis on data from explain.depesz.com showed that 12% of plans with nested loop have such estimate. Couple of examples: http://explain.depesz.com/s/Qm4 http://explain.depesz.com/s/qmW http://explain.depesz.com/s/qnG http://explain.depesz.com/s/QO http://explain.depesz.com/s/qov ... -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
On Mon, May 20, 2013 at 11:01 PM, Jeff Amiel becauseimj...@yahoo.com wrote: Ok - I agree - Can somebody help me understand where the row estimates come from on a nested-loop operation in postgres then? In case you haven't noticed already in the documentation, there are following lines: ... It might appear from inspection of the EXPLAIN output that the estimate of join rows comes from 50 * 1, that is, the number of outer rows times the estimated number of rows obtained by each inner index scan on tenk2. But this is not the case: *the join relation size is estimated before any particular join plan has been considered*. If everything is working well then the two ways of estimating the join size will produce about the same answer, but due to roundoff error and other factors they sometimes diverge significantly. Read more at: http://www.postgresql.org/docs/9.2/static/row-estimation-examples.html It also refers where in source code these table size estimations are done. Hope this helps. -- Amit Langote -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
I also found one other discussion which has similar issues addressed: http://postgresql.1045698.n5.nabble.com/Bogus-nestloop-rows-estimate-in-8-4-7-td5710254.html -- Amit Langote -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
Thanks much! (sorry for top-posting, yahoo email sucks) - Original Message - From: Amit Langote amitlangot...@gmail.com To: Jeff Amiel becauseimj...@yahoo.com Cc: dep...@depesz.com dep...@depesz.com; pgsql-general@postgresql.org pgsql-general@postgresql.org Sent: Monday, May 20, 2013 9:51 AM Subject: Re: [GENERAL] Why does row estimation on nested loop make no sense to me I also found one other discussion which has similar issues addressed: http://postgresql.1045698.n5.nabble.com/Bogus-nestloop-rows-estimate-in-8-4-7-td5710254.html -- Amit Langote -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
On Tue, May 21, 2013 at 12:43 AM, Jeff Amiel becauseimj...@yahoo.com wrote: Thanks much! (sorry for top-posting, yahoo email sucks) I wonder if you could arrive at some conclusions with the statistics (pg_stats) you have and the join selectivity formulas described in the referred documentation link. I would like to know if you still get the same row estimates (after explain) and also if possible, the value that is computed from that formula. Do they resemble each other? -- Amit Langote -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
Not sure if it helps, but it's apparently not a very rare thing. Quick analysis on data from explain.depesz.com showed that 12% of plans with nested loop have such estimate. Couple of examples: http://explain.depesz.com/s/Qm4 http://explain.depesz.com/s/qmW http://explain.depesz.com/s/qnG http://explain.depesz.com/s/QO http://explain.depesz.com/s/qov http://explain.depesz.com/s/qqb http://explain.depesz.com/s/QqF http://explain.depesz.com/s/qQO http://explain.depesz.com/s/qrI http://explain.depesz.com/s/QRK http://explain.depesz.com/s/QUX9 http://explain.depesz.com/s/QvN http://explain.depesz.com/s/QWL http://explain.depesz.com/s/r4F http://explain.depesz.com/s/R7q http://explain.depesz.com/s/r8 http://explain.depesz.com/s/R8 http://explain.depesz.com/s/RaB http://explain.depesz.com/s/RbV http://explain.depesz.com/s/Rc7 all these plans are public and not anonymized. depesz -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Why does row estimation on nested loop make no sense to me
On most nested loops that I do explain/explain analyze on, the row estimation for the nested-loop itself is a product of the inner nodes of the nested loop. However in this case, I am stumped! explain select era.child_entity from entity_rel era join user_entity ue on ue.entity_id = era.parent_entity and ue.user_id=12345 Nested Loop (cost=0.00..2903.37 rows=29107 width=4) - Index Only Scan using entity_pk on user_entity ue (cost=0.00..62.68 rows=2 width=4) Index Cond: (user_id = 10954) - Index Scan using rel_parent on entity_rel era (cost=0.00..1261.85 rows=317 width=8) Index Cond: (parent_entity = ue.entity_id) How can the estimated number of rows for the nested loop node EXCEED the product of the 2 row estimates of the tables being joined? Not only does it exceed it - but it is orders of magnitude greater. Am I missing something obvious here? I an see the nested loop row estimate being LESS but certainly not more. PostgreSQL 9.2.4 on x86_64-pc-solaris2.10, compiled by gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath), 64-bit -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
Jeff Amiel becauseimj...@yahoo.com writes: How can the estimated number of rows for the nested loop node EXCEED the product of the 2 row estimates of the tables being joined? Not only does it exceed it - but it is orders of magnitude greater. Can you provide a self-contained test case that does this? regards, tom lane -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
On Sat, May 18, 2013 at 1:25 AM, Jeff Amiel becauseimj...@yahoo.com wrote: On most nested loops that I do explain/explain analyze on, the row estimation for the nested-loop itself is a product of the inner nodes of the nested loop. However in this case, I am stumped! explain select era.child_entity from entity_rel era join user_entity ue on ue.entity_id = era.parent_entity and ue.user_id=12345 Nested Loop (cost=0.00..2903.37 rows=29107 width=4) - Index Only Scan using entity_pk on user_entity ue (cost=0.00..62.68 rows=2 width=4) Index Cond: (user_id = 10954) - Index Scan using rel_parent on entity_rel era (cost=0.00..1261.85 rows=317 width=8) Index Cond: (parent_entity = ue.entity_id) How can the estimated number of rows for the nested loop node EXCEED the product of the 2 row estimates of the tables being joined? Not only does it exceed it - but it is orders of magnitude greater. Am I missing something obvious here? I an see the nested loop row estimate being LESS but certainly not more. Can you also post the output of explain analyze your-query? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
Can you provide a self-contained test case that does this? That response scares me. :) I can try - Every other table set (small, easy to experiment with) returns results as expected - Is the implication that this looks 'unusual'? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
- Original Message - From: Amit Langote amitlangot...@gmail.com To: Jeff Amiel becauseimj...@yahoo.com Cc: pgsql-general@postgresql.org pgsql-general@postgresql.org Sent: Friday, May 17, 2013 11:37 AM Subject: Re: [GENERAL] Why does row estimation on nested loop make no sense to me On Sat, May 18, 2013 at 1:25 AM, Jeff Amiel becauseimj...@yahoo.com wrote: On most nested loops that I do explain/explain analyze on, the row estimation for the nested-loop itself is a product of the inner nodes of the nested loop. However in this case, I am stumped! explain select era.child_entity from entity_rel era join user_entity ue on ue.entity_id = era.parent_entity and ue.user_id=12345 Nested Loop (cost=0.00..2903.37 rows=29107 width=4) - Index Only Scan using entity_pk on user_entity ue (cost=0.00..62.68 rows=2 width=4) Index Cond: (user_id = 10954) - Index Scan using rel_parent on entity_rel era (cost=0.00..1261.85 rows=317 width=8) Index Cond: (parent_entity = ue.entity_id) How can the estimated number of rows for the nested loop node EXCEED the product of the 2 row estimates of the tables being joined? Not only does it exceed it - but it is orders of magnitude greater. Am I missing something obvious here? I an see the nested loop row estimate being LESS but certainly not more. Can you also post the output of explain analyze your-query? I'm not worried about performance (per se) but the row estimation issue which propagates up as part of a bigger query. But here ya go: explain analyze select era.child_entity from entity_rel era join user_entity ue on ue.entity_id = era.parent_entity and ue.user_id=12345 Nested Loop (cost=0.00..2903.37 rows=29107 width=4) (actual time=0.028..0.274 rows=201 loops=1) - Index Only Scan using entity_pk on user_entity ue (cost=0.00..62.68 rows=2 width=4) (actual time=0.011..0.012 rows=1 loops=1) Index Cond: (user_id = 12345) Heap Fetches: 1 - Index Scan using rel_parent on entity_rel era (cost=0.00..1261.85 rows=317 width=8) (actual time=0.013..0.164 rows=201 loops=1) Index Cond: (parent_entity = ue.entity_id) Total runtime: 0.361 ms -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
On Sat, May 18, 2013 at 1:47 AM, Jeff Amiel becauseimj...@yahoo.com wrote: - Original Message - From: Amit Langote amitlangot...@gmail.com To: Jeff Amiel becauseimj...@yahoo.com Cc: pgsql-general@postgresql.org pgsql-general@postgresql.org Sent: Friday, May 17, 2013 11:37 AM Subject: Re: [GENERAL] Why does row estimation on nested loop make no sense to me On Sat, May 18, 2013 at 1:25 AM, Jeff Amiel becauseimj...@yahoo.com wrote: On most nested loops that I do explain/explain analyze on, the row estimation for the nested-loop itself is a product of the inner nodes of the nested loop. However in this case, I am stumped! explain select era.child_entity from entity_rel era join user_entity ue on ue.entity_id = era.parent_entity and ue.user_id=12345 Nested Loop (cost=0.00..2903.37 rows=29107 width=4) - Index Only Scan using entity_pk on user_entity ue (cost=0.00..62.68 rows=2 width=4) Index Cond: (user_id = 10954) - Index Scan using rel_parent on entity_rel era (cost=0.00..1261.85 rows=317 width=8) Index Cond: (parent_entity = ue.entity_id) How can the estimated number of rows for the nested loop node EXCEED the product of the 2 row estimates of the tables being joined? Not only does it exceed it - but it is orders of magnitude greater. Am I missing something obvious here? I an see the nested loop row estimate being LESS but certainly not more. Can you also post the output of explain analyze your-query? I'm not worried about performance (per se) but the row estimation issue which propagates up as part of a bigger query. But here ya go: explain analyze select era.child_entity from entity_rel era join user_entity ue on ue.entity_id = era.parent_entity and ue.user_id=12345 Nested Loop (cost=0.00..2903.37 rows=29107 width=4) (actual time=0.028..0.274 rows=201 loops=1) - Index Only Scan using entity_pk on user_entity ue (cost=0.00..62.68 rows=2 width=4) (actual time=0.011..0.012 rows=1 loops=1) Index Cond: (user_id = 12345) Heap Fetches: 1 - Index Scan using rel_parent on entity_rel era (cost=0.00..1261.85 rows=317 width=8) (actual time=0.013..0.164 rows=201 loops=1) Index Cond: (parent_entity = ue.entity_id) Total runtime: 0.361 ms Have you tried analyze (it's probably a case of insufficient/outdated statistics to planner's disposal) or probably consider changing default_statistics_target? -- Amit Langote -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
explain analyze select era.child_entity from entity_rel era join user_entity ue on ue.entity_id = era.parent_entity and ue.user_id=12345 Nested Loop (cost=0.00..2903.37 rows=29107 width=4) (actual time=0.028..0.274 rows=201 loops=1) - Index Only Scan using entity_pk on user_entity ue (cost=0.00..62.68 rows=2 width=4) (actual time=0.011..0.012 rows=1 loops=1) Index Cond: (user_id = 12345) Heap Fetches: 1 - Index Scan using rel_parent on entity_rel era (cost=0.00..1261.85 rows=317 width=8) (actual time=0.013..0.164 rows=201 loops=1) Index Cond: (parent_entity = ue.entity_id) Total runtime: 0.361 ms Have you tried analyze (it's probably a case of insufficient/outdated statistics to planner's disposal) or probably consider changing default_statistics_target? Again - my question revolves not around the whether or not I am getting good or bad estimates - my question is related to the fact that the nested-loop row estimation does not appear to be derived from the nodes below it - it is off by orders of magnitude. I've never seen this before. That aside, yes - I did analyze and tweak stats target during experimentation - no change. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
explain analyze select era.child_entity from entity_rel era join user_entity ue on ue.entity_id = era.parent_entity and ue.user_id=12345 Nested Loop (cost=0.00..2903.37 rows=29107 width=4) (actual time=0.028..0.274 rows=201 loops=1) - Index Only Scan using entity_pk on user_entity ue (cost=0.00..62.68 rows=2 width=4) (actual time=0.011..0.012 rows=1 loops=1) Index Cond: (user_id = 12345) Heap Fetches: 1 - Index Scan using rel_parent on entity_rel era (cost=0.00..1261.85 rows=317 width=8) (actual time=0.013..0.164 rows=201 loops=1) Index Cond: (parent_entity = ue.entity_id) Total runtime: 0.361 ms I noticed when the explain output in your first mail shows Index Cond: (user_id = 10954) whereas your query says: ue.user_id=12345. Something with that? Although, your explain analyze does show the same values at both places with the row estimate being 29107 in both cases, which, well, looks awful and quite unexpected though there seem to have been similar observations before Have you tried analyze (it's probably a case of insufficient/outdated statistics to planner's disposal) or probably consider changing default_statistics_target? Again - my question revolves not around the whether or not I am getting good or bad estimates - my question is related to the fact that the nested-loop row estimation does not appear to be derived from the nodes below it - it is off by orders of magnitude. I've never seen this before. That aside, yes - I did analyze and tweak stats target during experimentation - no change. Did you also check select count(*) on both the relations and found related numbers? -- Amit Langote -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does row estimation on nested loop make no sense to me
- Original Message - From: Amit Langote amitlangot...@gmail.com To: Jeff Amiel becauseimj...@yahoo.com Cc: pgsql-general@postgresql.org pgsql-general@postgresql.org Sent: Friday, May 17, 2013 2:21 PM Subject: Re: [GENERAL] Why does row estimation on nested loop make no sense to me explain analyze select era.child_entity from entity_rel era join user_entity ue on ue.entity_id = era.parent_entity and ue.user_id=12345 Nested Loop (cost=0.00..2903.37 rows=29107 width=4) (actual time=0.028..0.274 rows=201 loops=1) - Index Only Scan using entity_pk on user_entity ue (cost=0.00..62.68 rows=2 width=4) (actual time=0.011..0.012 rows=1 loops=1) Index Cond: (user_id = 12345) Heap Fetches: 1 - Index Scan using rel_parent on entity_rel era (cost=0.00..1261.85 rows=317 width=8) (actual time=0.013..0.164 rows=201 loops=1) Index Cond: (parent_entity = ue.entity_id) Total runtime: 0.361 ms I noticed when the explain output in your first mail shows Index Cond: (user_id = 10954) whereas your query says: ue.user_id=12345. Something with that? Although, your explain analyze does show the same values at both places with the row estimate being 29107 in both cases, which, well, looks awful and quite unexpected though there seem to have been similar observations before That was a weak attempt at hiding 'real' data - intended to change them all to 12345. :) Did you also check select count(*) on both the relations and found related numbers? Nothing related (that I could find) on the rowcounts - one table has 20 million rows or so ad the other 65K. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general