2013/12/11 Jeff Janes <jeff.ja...@gmail.com>

> On Tuesday, December 10, 2013, jacket41142 wrote:
>
>> Thanks very much.
>>
>> I think another problem is that the cost estimation isn't good enough to
>> reflect the real cost. As we can see from "explain analyze ...",
>> count(distinct ...) has the smallest cost among the alternatives, but
>> since it uses sorts, the time complexity should be higher, especially
>> for a large number of rows.
>>
>
> That one is easy to explain. The cost estimate is not intended to be an
> absolute estimate; it is just a relative estimate used to choose between
> alternatives. Since the current implementation of count(distinct ...)
> does not *have* any alternatives for that step in the process, there is
> no point in estimating a cost for it. So part of providing it with
> alternatives will have to be providing those cost estimates as well.
>
I got it. Thanks very much for the explanation.

>> Also I think even if we can have multiple count() expressions, the
>> optimizer should also be able to choose between using a sort,
>> HashAggregate, or maybe something like a linear aggregate if sorts are
>> not needed, or other methods if they exist. Also this could be done as
>> just one job over the entire table for the columns of interest, or for
>> each column separately.
>>
>
> Right. I hope this gets fixed. It's been on my todo list for a while,
> but at the current rate of going through my todo list, it will take a few
> decades to get to if it is left up to me....
>

Thanks very much for your effort. It's also good to know that this problem
will be fixed in the future. :)

So for now, if someone wants to use count(distinct ...), they can use a
workaround like a subquery if performance is a concern. (Of course, they
also need to take care of NULL values, as mentioned in
http://www.postgresql.org/message-id/flat/CAPNY-2Utce-c+kNTwsMCbAk58=9myeaevitxt9lo7r1k77j...@mail.gmail.com#CAPNY-2Utce-c+kNTwsMCbAk58=9myeaevitxt9lo7r1k77j...@mail.gmail.com )

best regards,
jacket41142
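
P.S. A minimal sketch of that subquery workaround, using a hypothetical
table t with a column c (the names are illustrative only, not from the
thread):

    -- Straightforward form: currently always planned as a sorted
    -- aggregate, with no alternative for the planner to cost.
    SELECT count(DISTINCT c) FROM t;

    -- Workaround: hoist the DISTINCT into a subquery, so the planner can
    -- choose a strategy (e.g. HashAggregate) for the distinct step.  The
    -- IS NOT NULL filter keeps the result identical, since
    -- count(DISTINCT c) ignores NULLs while SELECT DISTINCT would keep
    -- one NULL row.
    SELECT count(*)
    FROM (SELECT DISTINCT c FROM t WHERE c IS NOT NULL) AS s;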