Re: A question about the select count(*) performance and the InnoDB engine

SGreen Tue, 28 Jun 2005 08:57:30 -0700

Mathias <[EMAIL PROTECTED]> wrote on 06/28/2005 06:13:08 AM:

> Selon Behrang Saeedzadeh <[EMAIL PROTECTED]>:
> 
> > Mathias,
> >
> > Thanks a lot!
> >
> > > I will not explain the same thing for sqlserver, sybase ..., but 
when
> > > your RDBMs
> > > have a data dictionnary, you don't need to execute count(*)  :o) 
WITH
> > > Updated
> > > statistics of course.
> >
> > I'm a little bit confused here. Why the count(*) is not transformed to 
a
> > select from the data dictionary if this way is faster? And what's the
> > difference between updated statistics and statistics not updated?
> >
> > > With information_schema in 5.x and higher, innodb will act as it's 
done
> > > in all
> > > the other RDBMS.
> > >
> > > Hope that helps
> >
> > Sure! It helped by orders of magnitured more than I thought it can 
help ;-)
> >
> > > :o)
> > > Mathias
> >
> >
> >
> > --
> > Behrang Saeedzadeh
> > http://www.jroller.com/page/behrangsa
> >
> > Using Opera's revolutionary e-mail client
> >
> 
> Well,
> The information in data dictionnary are correct only just after updating 
them.
> imagine at 12h, you update statistics, num_rows=2000. At 12h05, you 
> insert 1000
> lignes and delete 500.
> 
> At 12h10, you ask the data dictinary num_rows, it will give you 2000, 
even if
> they are 2500.
> 
> 
> Hope that helps
> :o)
> Mathias
> 
Mathias,


COUNT(*) is not slow in InnoDB due to a lack of statistics. It's due to 
the fact that for any user the value of COUNT(*) can be completely 
different than for any other user.  Assume for a moment that there is a 
table stored in InnoDB that has 1000 records in it. UserA starts a 
transaction that adds 200 records and changes 50. UserB also starts a 
transaction and adds 500 records of his own. For the rest of this example, 
both transactions remain "pending".

Physically, the database now contains 1000 (original) + 200 (added by 
UserA) + 50 (changes pending from UserA) + 500 (added by UserB) = 1750 
total records. However, if UserA performs a COUNT(*) query, they would 
only be able to see the 1200 records visible within their transaction. 
UserB will only be able to count 1500 records for the same reason. 

The slowness of performing a COUNT(*) query is caused by the need to 
individually evaluate all 1750 records to see if the user that asked to 
"count" them should actually know about them. Unless the engine is changed 
to maintain a separate set of table statistics for each user there won't 
be any way to just "look up" the number because the record count can (and 
usually will) be different for each user.

After both transactions commit, the database will only have 1700 records 
(total) as the 50 pending updates, from UserA's transaction, will have 
overwritten the 50 original records. 

Does that help?

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

Re: A question about the select count(*) performance and the InnoDB engine

Reply via email to