2013/12/11 Jeff Janes <jeff.ja...@gmail.com>

> On Tuesday, December 10, 2013, jacket41142 wrote:
>
>> Thanks very much.
>>
>> I think another problem is that the cost estimation isn't good enough to
>> reflect the real cost. As we can see from "explain analyze ...",
>> count(distinct ...) has the smallest cost among the alternatives, but
>> since it uses sorts, the time complexity should be higher, especially
>> for a large number of rows.
>>
>
> That one is easy to explain. The cost estimate is not intended to be an
> absolute estimate; it is just a relative estimate used to choose between
> alternatives. Since the current implementation of count(distinct ...)
> does not *have* any alternatives for that step in the process, there is
> no point in estimating a cost for it. So part of providing it with
> alternatives will have to be providing those cost estimates as well.
>
I got it. Thanks very much for the explanation.

>> Also I think even if we can have multiple count() expressions, the
>> optimizer should also be able to choose between using a sort,
>> HashAggregate, or maybe something like a linear aggregate if sorts are
>> not needed, or other methods if they exist. Also this could be done as
>> just one job over the entire table for the columns of interest, or for
>> each column separately.
>>
>
> Right. I hope this gets fixed. It's been on my todo list for a while,
> but at the current rate of going through my todo list, it will take a few
> decades to get to if it is left up to me....
>

Thanks very much for your effort. It's also good to know that this problem
will be fixed in the future. :)

So for now, if someone wants to use count(distinct ...), they can use a
workaround like a subquery if performance is a concern. (Of course, they
also need to take care of NULL values, as mentioned in
http://www.postgresql.org/message-id/flat/CAPNY-2Utce-c+kNTwsMCbAk58=9myeaevitxt9lo7r1k77j...@mail.gmail.com#CAPNY-2Utce-c+kNTwsMCbAk58=9myeaevitxt9lo7r1k77j...@mail.gmail.com )

best regards,
jacket41142
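
P.S. A minimal sketch of that subquery workaround, using a hypothetical
table t with a column c (the names are illustrative only, not from the
thread):

    -- Straightforward form: currently always planned as a sorted
    -- aggregate, with no alternative for the planner to cost.
    SELECT count(DISTINCT c) FROM t;

    -- Workaround: hoist the DISTINCT into a subquery, so the planner can
    -- choose a strategy (e.g. HashAggregate) for the distinct step.  The
    -- IS NOT NULL filter keeps the result identical, since
    -- count(DISTINCT c) ignores NULLs while SELECT DISTINCT would keep
    -- one NULL row.
    SELECT count(*)
    FROM (SELECT DISTINCT c FROM t WHERE c IS NOT NULL) AS s;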