Re: [PERFORM] 7k records into Sort node, 4.5m out?
> It isn't ... or at least, you've offered no evidence that it is. Sorry, I thought Christophe had sent you the details offlist. Checking ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 7k records into Sort node, 4.5m out?
Josh Berkus writes: > I guess what I'm really not understanding is why it's calculating a cost > of 3.7m for the index scan, and then discarding that *entire* cost and > not including it in the total cost of the query? It isn't ... or at least, you've offered no evidence that it is. It's discounting some fraction of the cost on the (apparently correct) basis that the merge won't read that input all the way to the end. Whether it's discounted by an appropriate fraction is hard to tell from the information given. The actual rows count is about a quarter the whole-scan estimate, so a multiplier of 0.25 seems right in hindsight, and that seems to match up roughly right with the mergejoin cost estimate --- but not knowing the actual table size, there's a lot of uncertainty here. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 7k records into Sort node, 4.5m out?
> Ah, I see where I was confused: in the original query plan I'd been > imagining that charlie.sierra was a unique column, but your gloss on > that as being house.district_id implies that it's highly non-unique. > And looking at the rowcounts in the original plan backs that up: > there are about 600 house rows per district row. So my thought of > having district as the outer side of a nestloop scanning the index > on house.district_id would not really work very well --- maybe it > would end up cheaper than the mergejoin plan, but it's far from a > clear-cut win. > > On the whole I'm thinking the code is operating as designed here. Well, except for the part where it's choosing a plan which takes 486 seconds over a plan which takes 4 seconds. I guess what I'm really not understanding is why it's calculating a cost of 3.7m for the index scan, and then discarding that *entire* cost and not including it in the total cost of the query? This seems wrong, especially since that index scan, in fact, ends up being 85% of the execution time of the query: Merge Join (cost=7457.670..991613.190 rows=1092168 width=4) (actual time=57.854..481062.706 rows=4514968 loops=1) Merge Cond: (charlie.sierra = four.quebec_seven) Index Scan using whiskey_delta on charlie (cost=0.000..3775171.860 rows=84904088 width=8) (actual time=0.006..459501.341 rows=20759070 loops=1) If the cost of the index scan were included in the total cost of the query plan, then the planner *would* chose the nestloop plan instead. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 7k records into Sort node, 4.5m out?
Josh Berkus writes: > SELECT COUNT(*) > FROM "user" > INNER JOIN "house" > ON ("user"."house_id" = "house"."id") > LEFT OUTER JOIN "district" > ON ("house"."district_id" = "district"."id") > WHERE ("user"."status" = 0 > AND ("district"."update_status" = 2 > OR "district"."update_status" = 3 ) > AND ("user"."valid" = 1 > OR "user"."valid" = 3 ) > AND "district"."is_test" = false ); > However, since the anonymization above doesn't quite match that used in > the EXPLAIN plan, I'm not sure what you'll get out of it. And yes, we > know that the outer join is being invalidated. Ah, I see where I was confused: in the original query plan I'd been imagining that charlie.sierra was a unique column, but your gloss on that as being house.district_id implies that it's highly non-unique. And looking at the rowcounts in the original plan backs that up: there are about 600 house rows per district row. So my thought of having district as the outer side of a nestloop scanning the index on house.district_id would not really work very well --- maybe it would end up cheaper than the mergejoin plan, but it's far from a clear-cut win. On the whole I'm thinking the code is operating as designed here. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 7k records into Sort node, 4.5m out?
> If you can show us the anonymized query plan, why not the anonymized query? > It doesn't look like it could be all that complicated. It's not: SELECT COUNT(*) FROM "user" INNER JOIN "house" ON ("user"."house_id" = "house"."id") LEFT OUTER JOIN "district" ON ("house"."district_id" = "district"."id") WHERE ("user"."status" = 0 AND ("district"."update_status" = 2 OR "district"."update_status" = 3 ) AND ("user"."valid" = 1 OR "user"."valid" = 3 ) AND "district"."is_test" = false ); However, since the anonymization above doesn't quite match that used in the EXPLAIN plan, I'm not sure what you'll get out of it. And yes, we know that the outer join is being invalidated. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 7k records into Sort node, 4.5m out?
Josh Berkus writes: > On 8/15/12 1:51 PM, Tom Lane wrote: >> Maybe you had better show us the actual query, and the table/index >> definitions. Because it's sure making odd choices here. This seems >> like the wrong join order altogether ... > We'll need to do that off-list for confidentiality reasons. If you can show us the anonymized query plan, why not the anonymized query? It doesn't look like it could be all that complicated. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 7k records into Sort node, 4.5m out?
On 8/15/12 1:51 PM, Tom Lane wrote: > Christophe Pettus writes: >> On Aug 13, 2012, at 7:11 PM, Tom Lane wrote: >>> The whole thing looks a bit weird to me --- why did it not use a >>> nestloop join with inner indexscan on charlie? With 7000 rows on the >>> other side, the estimated cost for that shouldn't have been more than >>> about 3 ... > >> Here's the same query with set enable_megejoin = off. All of the other >> query tuning parameters are default. > >> http://explain.depesz.com/s/dqO > > Maybe you had better show us the actual query, and the table/index > definitions. Because it's sure making odd choices here. This seems > like the wrong join order altogether ... We'll need to do that off-list for confidentiality reasons. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 7k records into Sort node, 4.5m out?
Christophe Pettus writes: > On Aug 13, 2012, at 7:11 PM, Tom Lane wrote: >> The whole thing looks a bit weird to me --- why did it not use a >> nestloop join with inner indexscan on charlie? With 7000 rows on the >> other side, the estimated cost for that shouldn't have been more than >> about 3 ... > Here's the same query with set enable_megejoin = off. All of the other query > tuning parameters are default. > http://explain.depesz.com/s/dqO Maybe you had better show us the actual query, and the table/index definitions. Because it's sure making odd choices here. This seems like the wrong join order altogether ... regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 7k records into Sort node, 4.5m out?
On Aug 13, 2012, at 7:11 PM, Tom Lane wrote: > The whole thing looks a bit weird to me --- why did it not use a > nestloop join with inner indexscan on charlie? With 7000 rows on the > other side, the estimated cost for that shouldn't have been more than > about 3 ... Here's the same query with set enable_megejoin = off. All of the other query tuning parameters are default. http://explain.depesz.com/s/dqO -- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 7k records into Sort node, 4.5m out?
Christophe Pettus writes: > Thanks, that makes sense. Something a colleague of mine just noticed is that > the estimate cost of the Index Scan node isn't being included in the cost of > the Merge Join above it, which makes the Merge Join seem much cheaper than it > really is. Could this be a planner bug? No, that looks sane. It's probably expecting that the range of keys on the right-hand side is a lot less than the range of keys on the left, and thus the merge won't have to read all of the left side. Since the output shows an estimated total number of rows in the LHS of 84 million, but the join stopped after reading 20 million of them, it looks like that effect did in fact occur. If the planner had that fraction dead on, it would only have charged the mergejoin with a quarter of the indexscan's total estimated cost. It's hard to tell though exactly what it did think. The whole thing looks a bit weird to me --- why did it not use a nestloop join with inner indexscan on charlie? With 7000 rows on the other side, the estimated cost for that shouldn't have been more than about 3 ... regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 7k records into Sort node, 4.5m out?
On Aug 13, 2012, at 6:35 PM, Tomas Vondra wrote: > On 14 Srpen 2012, 3:15, Christophe Pettus wrote: >> Consider this EXPLAIN ANALYZE output: >> >> http://explain.depesz.com/s/TCi >> >> Note the Bitmap Heap Scan at the bottom claims to be producing 7094 rows, >> and the Sort above it expects to be processing 7330 rows (the same number >> the Bitmap Heap Scan expected to produce)... but the sort is actually >> producing 4512231 rows, which the sort time would indicate is what really >> happened. How can this be? > > Hi, > > notice there's a merge join right above the sort. If there are duplicate > values in the first table (charlie in the explain plans), the matching > rows from the sort will be read repeatedly (re-scanned) and thus counted > multiple times. Thanks, that makes sense. Something a colleague of mine just noticed is that the estimate cost of the Index Scan node isn't being included in the cost of the Merge Join above it, which makes the Merge Join seem much cheaper than it really is. Could this be a planner bug? -- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 7k records into Sort node, 4.5m out?
On 14 Srpen 2012, 3:15, Christophe Pettus wrote: > Consider this EXPLAIN ANALYZE output: > > http://explain.depesz.com/s/TCi > > Note the Bitmap Heap Scan at the bottom claims to be producing 7094 rows, > and the Sort above it expects to be processing 7330 rows (the same number > the Bitmap Heap Scan expected to produce)... but the sort is actually > producing 4512231 rows, which the sort time would indicate is what really > happened. How can this be? Hi, notice there's a merge join right above the sort. If there are duplicate values in the first table (charlie in the explain plans), the matching rows from the sort will be read repeatedly (re-scanned) and thus counted multiple times. Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] 7k records into Sort node, 4.5m out?
Consider this EXPLAIN ANALYZE output: http://explain.depesz.com/s/TCi Note the Bitmap Heap Scan at the bottom claims to be producing 7094 rows, and the Sort above it expects to be processing 7330 rows (the same number the Bitmap Heap Scan expected to produce)... but the sort is actually producing 4512231 rows, which the sort time would indicate is what really happened. How can this be? -- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance