Use "iostat -x" while the query is running. You are likely I/O bound doing a
table scan on the big table.

There has been a lot of discussion on this thread about RAM and CPU as they
relate to performance, but nothing about disk I/O. If you're going to put
tens of millions of records in a database and expect fast random access to
them (i.e. not just reads of the most recently inserted records), then you
need lots of disks and a good volume manager. Don't try to compensate for a
lack of disks with lots of RAM; eventually the DB will be too big to budget
that much RAM.

Plan the disk requirements before anything else. A good rule of thumb is
that a 10K RPM SCSI disk can do about 200 random I/Os per second. You can
verify your disks using bonnie (http://www.textuality.com/bonnie/) or the
like (make sure you are looking at random I/O, not sequential I/O).
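If bonnie isn't handy, the same kind of sanity check can be sketched in a few
lines of Python. Everything here (file size, read count) is made up for
illustration, and because the scratch file is small and freshly written, the
OS page cache will inflate the numbers wildly compared to a cold 10K disk:

```python
# Crude random-read timing sketch (a rough stand-in for bonnie).
# CAUTION: for a real measurement use a file much larger than RAM
# (and ideally O_DIRECT); these numbers will be cache-inflated.
import os
import random
import tempfile
import time

PAGE = 16 * 1024               # InnoDB page size
FILE_SIZE = 8 * 1024 * 1024    # tiny, for illustration only
N_READS = 200

# Create and fill a scratch file to read from.
fd, path = tempfile.mkstemp()
os.write(fd, os.urandom(FILE_SIZE))
os.fsync(fd)

# Time N_READS page-sized reads at random offsets.
start = time.perf_counter()
for _ in range(N_READS):
    os.lseek(fd, random.randrange(FILE_SIZE - PAGE), os.SEEK_SET)
    os.read(fd, PAGE)
elapsed = max(time.perf_counter() - start, 1e-9)

print(f"~{N_READS / elapsed:.0f} random {PAGE // 1024}K reads/sec (cache-inflated)")
os.close(fd)
os.remove(path)
```

Treat the printed rate as indicative only; a dedicated tool gives you control
over caching and queue depth that this sketch does not.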

Next, compute the rate of I/O needed. If you are using InnoDB (which you
should; otherwise MyISAM's coarse-grained locking will introduce
non-linearities into the performance that are too hard to model), then you
know I/O is done in 16K pages. How many such pages need to be read per
second? Look at all your queries and determine their access plans. Focus on
the ones that will dominate the I/O requirements. Understanding how InnoDB
lays out data is critical for modeling: each secondary index is in its own
B-Tree with leaves containing the value of the primary key, and the data
itself is in another B-Tree keyed off the primary key. (Very similar to
Oracle with index-organized tables.)

Unless you can benefit from clustering around the primary key, it is safest
to assume one I/O per leaf item (secondary or primary) as the tables and
indexes get large. For back-of-the-envelope calculations it's okay to ignore
the non-leaf pages and assume they are cached.

Hence, for a table with primary and secondary keys named PK and SK
respectively:
"SELECT * FROM FOO WHERE PK = ?" = 1 I/O (primary B-Tree leaf)
"SELECT * FROM FOO WHERE SK = ?" = 2 I/Os (secondary leaf, then primary leaf)
"SELECT FOO.* FROM FOO, BAR WHERE FOO.SK = BAR.PK AND BAR.SK = ?" = 3 I/Os

Now take all the queries and determine the number of their occurrences for
some unit of work. That unit of work should correspond to some externally
observable event, like a user login. Say you have 3 queries in the system
like this:

Query   # I/Os   Occurrences Per Unit of Work
  Q1       3       3
  Q2       4       2
  Q3       1       3

Now you can say that on average it takes 3*3 + 4*2 + 1*3 = 20 I/Os per unit
of work. With one disk you can do 200/20 = 10 units of work / second / disk.
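That arithmetic is trivial to script; the query mix below is just the
hypothetical Q1-Q3 table above:

```python
# Back-of-the-envelope throughput estimate from the example numbers above.
IOS_PER_DISK = 200  # rule-of-thumb random I/Os per second for one 10K disk

# (I/Os per query, occurrences per unit of work) for Q1, Q2, Q3
query_mix = [(3, 3), (4, 2), (1, 3)]

ios_per_unit = sum(ios * occurrences for ios, occurrences in query_mix)
units_per_disk = IOS_PER_DISK / ios_per_unit

print(ios_per_unit)    # → 20
print(units_per_disk)  # → 10.0
```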

To ensure that performance scales with disks, stripe across the disks. This
is where volume management becomes key: MySQL/InnoDB fills each datafile
sequentially, so don't just put each datafile on its own disk; that will
only create hotspots, and you will be bound by the performance of a single
disk.

This is the general idea of capacity planning with any database. The above
only covers equality matches via an index, but you can work out the
difference for tablescans (consider how many rows fit into a page) and range
scans (consider the effect of the primary key as a clustered index).
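For the tablescan case, here is a quick sketch of the "rows per page"
reasoning. The 200-byte row size and ~70% page fill factor are assumptions
for illustration, not measured values; also note that a tablescan is largely
sequential I/O, so the 200-random-I/Os-per-second figure doesn't apply
directly:

```python
# Estimate how many 16K pages a full tablescan must read.
PAGE_SIZE = 16 * 1024
ROW_SIZE = 200      # assumed average row size in bytes -- illustration only
FILL_FACTOR = 0.7   # assume leaf pages are ~70% full

rows_per_page = int(PAGE_SIZE * FILL_FACTOR // ROW_SIZE)
rows = 2_500_000    # e.g. the 2.5M-row table from the question below
pages = -(-rows // rows_per_page)   # ceiling division

print(rows_per_page)  # → 57
print(pages)          # → 43860
```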

-Wayne


-----Original Message-----


I am not sure. Does anyone know of any real examples of
MySQL handling a huge database and still performing
well? I have been having performance problems with
MySQL's LEFT JOIN recently. A big table (about 2.5
million records) left-joined to a small table (about
350K records) generally takes 2 minutes to finish. I
checked the EXPLAIN output, and the primary key index
on the small table was indeed used for the join. My
system is Red Hat Linux 7.3 with 4 GB of memory. I
also tried replacing the default my.cnf with
my-huge.cnf. It didn't help at all.

Another thing: on some Linux systems there is a file
size limit. MySQL seems to store each of its tables
as a single file, so you need to choose a filesystem
without that limit.

Qunfeng Dong
--- "B.G. Mahesh" <[EMAIL PROTECTED]>
wrote:
>
> hi
>
> We are evaluating a few databases for developing an
> application with the
> following specs:
>
> 1.    OS not very important. Leaning towards Linux
>
> 2.    Currently the database has about 5 million
> records but it will grow
> to 120 million records.
>
> 3.    The tables will have billing information for a
> telecom company.
> Nothing complex.
>
> 4.    Back office staff will use the data in the
> database to create
> invoices to be sent to customers. This data is not
> connected to the
> live telecom system [e.g. switches etc]. We get the
> data every day
> from the telecom company.
>
> 5.    Staff may perform queries on the database to get
> reports like
> "busiest hour of the day" etc etc. I don't see too
> many concurrent
> users using the system, however the system needs to
> be stable.
>
> 6.    Need to create excel, pdf files from the data in
> the database. This
> I think has nothing to do with the database, however
> this is a requirement.
>
> 7.    Needless to say, good security is a must which
> will also be built
> into the front end application.
>
> We are considering the following databases,
>
> 1.    MYSQL
> 2.    Postgres
> 3.    Oracle
> 4.    MSQL
>
> If MYSQL or Postgres can do the job I prefer not to
> spend the money on
> Oracle/MSQL. However, if Oracle/MSQL are required
> for getting good
> reports and scalability, so be it. We will use
> Oracle/MSQL.
>
> Any pointers/advice is appreciated
>
>
> --
> --
> B.G. Mahesh
> mailto:[EMAIL PROTECTED]
> http://www.indiainfo.com/
> India's first ISO certified portal
>
>
> ---------------------------------------------------------------------
> Before posting, please check:
>    http://www.mysql.com/manual.php   (the manual)
>    http://lists.mysql.com/           (the list
> archive)
>
> To request this thread, e-mail
> <[EMAIL PROTECTED]>
> To unsubscribe, e-mail
> <[EMAIL PROTECTED]>
> Trouble unsubscribing? Try:
> http://lists.mysql.com/php/unsubscribe.php
>





