Re: [PERFORM] count * performance issue

Bill Moran Thu, 06 Mar 2008 07:43:19 -0800

In response to Craig James <[EMAIL PROTECTED]>:

> In the 3 years I've been using Postgres, the problem of count() performance 
> has come up more times than I can recall, and each time the answer is, "It's 
> a sequential scan -- redesign your application."
> 
> My question is: What do the other databases do that Postgres can't do, and 
> why not?
> 
> Count() on Oracle and MySQL is almost instantaneous, even for very large 
> tables. So why can't Postgres do what they do?


I don't know about Oracle, but MySQL has this problem as well.  Use
innodb tables and see how slow it is.  The only reason myisam tables
don't have this problem is because they don't implement any of the
features that make the problem difficult to solve.

> On the one hand, I understand that Postgres has its architecture, and I 
> understand the issue of row visibility, and so forth.  On the other hand, my 
> database is just sitting there, nothing going on, no connections except me, 
> and... it takes FIFTY FIVE SECONDS to count 20 million rows, a query that 
> either Oracle or MySQL would answer in a fraction of a second.  It's hard for 
> me to believe there isn't a better way.

There's been discussion about putting visibility information in indexes.
I don't know how far along that effort is, but I expect that will improve
count() performance significantly.

> This is a real problem.  Countless people (including me) have spent 
> significant effort rewriting applications because of this performance flaw in 
> Postgres.  Over and over, the response is, "You don't really need to do that 
> ... change your application."  Well, sure, it's always possible to change the 
> application, but that misses the point.  To most of us users, count() seems 
> like it should be a trivial operation.  On other relational database systems, 
> it is a trivial operation.
> 
> This is really a significant flaw on an otherwise excellent relational 
> database system.

Not really.  It really is a design flaw in your application ... it doesn't
make relational sense to use the number of rows in a table for anything.
Just because other people do it frequently doesn't make it right.

That being said, it's still a useful feature, and I don't hear anyone
denying that.  As I said, google around a bit WRT to PG storing
visibility information in indexes, as I think that's the way this will
be improved.

> My rant for today...

Feel better now?

-- 
Bill Moran

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance

Re: [PERFORM] count * performance issue

Reply via email to