Re: [PERFORM] 7k records into Sort node, 4.5m out?

2012-08-20 Thread Josh Berkus

> It isn't ... or at least, you've offered no evidence that it is.

Sorry, I thought Christophe had sent you the details offlist.  Checking ...


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 7k records into Sort node, 4.5m out?

2012-08-20 Thread Tom Lane
Josh Berkus  writes:
> I guess what I'm really not understanding is why it's calculating a cost
> of 3.7m for the index scan, and then discarding that *entire* cost and
> not including it in the total cost of the query?

It isn't ... or at least, you've offered no evidence that it is.
It's discounting some fraction of the cost on the (apparently correct)
basis that the merge won't read that input all the way to the end.
Whether it's discounted by an appropriate fraction is hard to tell
from the information given.  The actual rows count is about a quarter
the whole-scan estimate, so a multiplier of 0.25 seems right in
hindsight, and that seems to match up roughly right with the mergejoin
cost estimate --- but not knowing the actual table size, there's a lot
of uncertainty here.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 7k records into Sort node, 4.5m out?

2012-08-19 Thread Josh Berkus

> Ah, I see where I was confused: in the original query plan I'd been
> imagining that charlie.sierra was a unique column, but your gloss on
> that as being house.district_id implies that it's highly non-unique.
> And looking at the rowcounts in the original plan backs that up:
> there are about 600 house rows per district row.  So my thought of
> having district as the outer side of a nestloop scanning the index
> on house.district_id would not really work very well --- maybe it
> would end up cheaper than the mergejoin plan, but it's far from a
> clear-cut win.
> 
> On the whole I'm thinking the code is operating as designed here.

Well, except for the part where it's choosing a plan which takes 486
seconds over a plan which takes 4 seconds.

I guess what I'm really not understanding is why it's calculating a cost
of 3.7m for the index scan, and then discarding that *entire* cost and
not including it in the total cost of the query?  This seems wrong,
especially since that index scan, in fact, ends up being 85% of the
execution time of the query:

 Merge Join (cost=7457.670..991613.190 rows=1092168 width=4) (actual
time=57.854..481062.706 rows=4514968 loops=1)

Merge Cond: (charlie.sierra = four.quebec_seven)

Index Scan using whiskey_delta on charlie (cost=0.000..3775171.860
rows=84904088 width=8) (actual time=0.006..459501.341 rows=20759070 loops=1)

If the cost of the index scan were included in the total cost of the
query plan, then the planner *would* chose the nestloop plan instead.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 7k records into Sort node, 4.5m out?

2012-08-15 Thread Tom Lane
Josh Berkus  writes:
> SELECT COUNT(*)
> FROM "user"
> INNER JOIN "house"
>   ON ("user"."house_id" = "house"."id")
> LEFT OUTER JOIN "district"
>   ON ("house"."district_id" = "district"."id")
> WHERE ("user"."status" = 0
>  AND ("district"."update_status" = 2
> OR "district"."update_status" = 3 )
>  AND ("user"."valid" = 1
> OR "user"."valid" = 3 )
>  AND "district"."is_test" = false );

> However, since the anonymization above doesn't quite match that used in
> the EXPLAIN plan, I'm not sure what you'll get out of it.  And yes, we
> know that the outer join is being invalidated.

Ah, I see where I was confused: in the original query plan I'd been
imagining that charlie.sierra was a unique column, but your gloss on
that as being house.district_id implies that it's highly non-unique.
And looking at the rowcounts in the original plan backs that up:
there are about 600 house rows per district row.  So my thought of
having district as the outer side of a nestloop scanning the index
on house.district_id would not really work very well --- maybe it
would end up cheaper than the mergejoin plan, but it's far from a
clear-cut win.

On the whole I'm thinking the code is operating as designed here.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 7k records into Sort node, 4.5m out?

2012-08-15 Thread Josh Berkus

> If you can show us the anonymized query plan, why not the anonymized query?
> It doesn't look like it could be all that complicated.

It's not:

SELECT COUNT(*)
FROM "user"
INNER JOIN "house"
  ON ("user"."house_id" = "house"."id")
LEFT OUTER JOIN "district"
  ON ("house"."district_id" = "district"."id")
WHERE ("user"."status" = 0
 AND ("district"."update_status" = 2
OR "district"."update_status" = 3 )
 AND ("user"."valid" = 1
OR "user"."valid" = 3 )
 AND "district"."is_test" = false );

However, since the anonymization above doesn't quite match that used in
the EXPLAIN plan, I'm not sure what you'll get out of it.  And yes, we
know that the outer join is being invalidated.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 7k records into Sort node, 4.5m out?

2012-08-15 Thread Tom Lane
Josh Berkus  writes:
> On 8/15/12 1:51 PM, Tom Lane wrote:
>> Maybe you had better show us the actual query, and the table/index
>> definitions.  Because it's sure making odd choices here.  This seems
>> like the wrong join order altogether ...

> We'll need to do that off-list for confidentiality reasons.

If you can show us the anonymized query plan, why not the anonymized query?
It doesn't look like it could be all that complicated.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 7k records into Sort node, 4.5m out?

2012-08-15 Thread Josh Berkus
On 8/15/12 1:51 PM, Tom Lane wrote:
> Christophe Pettus  writes:
>> On Aug 13, 2012, at 7:11 PM, Tom Lane wrote:
>>> The whole thing looks a bit weird to me --- why did it not use a
>>> nestloop join with inner indexscan on charlie?  With 7000 rows on the
>>> other side, the estimated cost for that shouldn't have been more than
>>> about 3 ...
> 
>> Here's the same query with set enable_megejoin = off.  All of the other 
>> query tuning parameters are default.
> 
>>  http://explain.depesz.com/s/dqO
> 
> Maybe you had better show us the actual query, and the table/index
> definitions.  Because it's sure making odd choices here.  This seems
> like the wrong join order altogether ...

We'll need to do that off-list for confidentiality reasons.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 7k records into Sort node, 4.5m out?

2012-08-15 Thread Tom Lane
Christophe Pettus  writes:
> On Aug 13, 2012, at 7:11 PM, Tom Lane wrote:
>> The whole thing looks a bit weird to me --- why did it not use a
>> nestloop join with inner indexscan on charlie?  With 7000 rows on the
>> other side, the estimated cost for that shouldn't have been more than
>> about 3 ...

> Here's the same query with set enable_megejoin = off.  All of the other query 
> tuning parameters are default.

>   http://explain.depesz.com/s/dqO

Maybe you had better show us the actual query, and the table/index
definitions.  Because it's sure making odd choices here.  This seems
like the wrong join order altogether ...

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 7k records into Sort node, 4.5m out?

2012-08-15 Thread Christophe Pettus

On Aug 13, 2012, at 7:11 PM, Tom Lane wrote:
> The whole thing looks a bit weird to me --- why did it not use a
> nestloop join with inner indexscan on charlie?  With 7000 rows on the
> other side, the estimated cost for that shouldn't have been more than
> about 3 ...

Here's the same query with set enable_megejoin = off.  All of the other query 
tuning parameters are default.

http://explain.depesz.com/s/dqO

--
-- Christophe Pettus
   x...@thebuild.com



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 7k records into Sort node, 4.5m out?

2012-08-13 Thread Tom Lane
Christophe Pettus  writes:
> Thanks, that makes sense.  Something a colleague of mine just noticed is that 
> the estimate cost of the Index Scan node isn't being included in the cost of 
> the Merge Join above it, which makes the Merge Join seem much cheaper than it 
> really is.  Could this be a planner bug?

No, that looks sane.  It's probably expecting that the range of keys on
the right-hand side is a lot less than the range of keys on the left,
and thus the merge won't have to read all of the left side.  Since the
output shows an estimated total number of rows in the LHS of 84 million,
but the join stopped after reading 20 million of them, it looks like
that effect did in fact occur.  If the planner had that fraction dead
on, it would only have charged the mergejoin with a quarter of the
indexscan's total estimated cost.  It's hard to tell though exactly what
it did think.

The whole thing looks a bit weird to me --- why did it not use a
nestloop join with inner indexscan on charlie?  With 7000 rows on the
other side, the estimated cost for that shouldn't have been more than
about 3 ...

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 7k records into Sort node, 4.5m out?

2012-08-13 Thread Christophe Pettus

On Aug 13, 2012, at 6:35 PM, Tomas Vondra wrote:

> On 14 Srpen 2012, 3:15, Christophe Pettus wrote:
>> Consider this EXPLAIN ANALYZE output:
>> 
>>  http://explain.depesz.com/s/TCi
>> 
>> Note the Bitmap Heap Scan at the bottom claims to be producing 7094 rows,
>> and the Sort above it expects to be processing 7330 rows (the same number
>> the Bitmap Heap Scan expected to produce)... but the sort is actually
>> producing 4512231 rows, which the sort time would indicate is what really
>> happened.  How can this be?
> 
> Hi,
> 
> notice there's a merge join right above the sort. If there are duplicate
> values in the first table (charlie in the explain plans), the matching
> rows from the sort will be read repeatedly (re-scanned) and thus counted
> multiple times.

Thanks, that makes sense.  Something a colleague of mine just noticed is that 
the estimate cost of the Index Scan node isn't being included in the cost of 
the Merge Join above it, which makes the Merge Join seem much cheaper than it 
really is.  Could this be a planner bug?

--
-- Christophe Pettus
   x...@thebuild.com



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 7k records into Sort node, 4.5m out?

2012-08-13 Thread Tomas Vondra
On 14 Srpen 2012, 3:15, Christophe Pettus wrote:
> Consider this EXPLAIN ANALYZE output:
>
>   http://explain.depesz.com/s/TCi
>
> Note the Bitmap Heap Scan at the bottom claims to be producing 7094 rows,
> and the Sort above it expects to be processing 7330 rows (the same number
> the Bitmap Heap Scan expected to produce)... but the sort is actually
> producing 4512231 rows, which the sort time would indicate is what really
> happened.  How can this be?

Hi,

notice there's a merge join right above the sort. If there are duplicate
values in the first table (charlie in the explain plans), the matching
rows from the sort will be read repeatedly (re-scanned) and thus counted
multiple times.

Tomas



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] 7k records into Sort node, 4.5m out?

2012-08-13 Thread Christophe Pettus
Consider this EXPLAIN ANALYZE output:

http://explain.depesz.com/s/TCi

Note the Bitmap Heap Scan at the bottom claims to be producing 7094 rows, and 
the Sort above it expects to be processing 7330 rows (the same number the 
Bitmap Heap Scan expected to produce)... but the sort is actually producing 
4512231 rows, which the sort time would indicate is what really happened.  How 
can this be?

--
-- Christophe Pettus
   x...@thebuild.com



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance