On Wed, Nov 2, 2016 at 4:00 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Tomas Vondra <tomas.von...@2ndquadrant.com> writes:
>> while eye-balling some explain plans for parallel queries, I got a bit
>> confused by the row count estimates. I wonder whether I'm alone.
>
> I got confused by that a minute ago, so no you're not alone.  The problem
> is even worse in join cases.  For example:
>
>  Gather  (cost=34332.00..53265.35 rows=100 width=8)
>    Workers Planned: 2
>    ->  Hash Join  (cost=33332.00..52255.35 rows=100 width=8)
>          Hash Cond: ((pp.f1 = cc.f1) AND (pp.f2 = cc.f2))
>          ->  Append  (cost=0.00..8614.96 rows=417996 width=8)
>                ->  Parallel Seq Scan on pp  (cost=0.00..8591.67 rows=416667 width=8)
>                ->  Parallel Seq Scan on pp1  (cost=0.00..23.29 rows=1329 width=8)
>          ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=8)
>                ->  Seq Scan on cc  (cost=0.00..14425.00 rows=1000000 width=8)
>
> There are actually 1000000 rows in pp, and none in pp1.  I'm not bothered
> particularly by the nonzero estimate for pp1, because I know where that
> came from, but I'm not very happy that nowhere here does it look like
> it's estimating a million-plus rows going into the join.

I welcome suggestions for improvement, but you will note that if the
row count didn't reflect some kind of guess about the number of rows
each individual worker will see, the costing would be hopelessly
broken.  The cost needs to reflect a guess about when the query will
finish, not the total amount of effort expended.
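For what it's worth, the rows=416667 in the plan above falls out of the
per-worker divisor the planner applies to a parallel scan. A rough sketch of
that heuristic (each planned worker counts as one full process, and the
leader also contributes, but its share shrinks by 0.3 per worker while that
stays positive) reproduces the number; this is an illustration of the
arithmetic, not the planner code itself:

```python
def parallel_divisor(workers: int, leader_participates: bool = True) -> float:
    """Effective number of processes sharing a parallel scan.

    Each planned worker counts as 1.  The leader also scans, but its
    contribution is assumed to shrink as workers are added, since it
    spends more time gathering tuples from them.
    """
    divisor = float(workers)
    if leader_participates:
        leader_contribution = 1.0 - 0.3 * workers
        if leader_contribution > 0:
            divisor += leader_contribution
    return divisor

# 1,000,000 rows in pp, 2 planned workers:
# divisor = 2 + (1 - 0.3 * 2) = 2.4, so each process is expected to
# see about 1,000,000 / 2.4 rows.
per_worker = 1_000_000 / parallel_divisor(2)
print(round(per_worker))  # 416667, matching the Parallel Seq Scan estimate
```

So the node's row count is a per-process figure, which is exactly why the
totals above the Gather look small relative to the table sizes.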

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers