R: R: R: why CPU is high while disks are idle in a table scan???

2004-06-22 Thread Leonardo Francalanci
Don't worry, I found: now my table has 64,000,000 rows...
there is no way it can cache it in memory...
Now the problem I have is that connectorJ crashes
(hasn't crashed before, when the table had 16,000,000 rows)

I'm going to write another email...

thank you anyway

> -Messaggio originale-
> Da: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Inviato: martedi 22 giugno 2004 15.22
> A: Leonardo Francalanci
> Cc: Mysql
> Oggetto: Re: R: R: why CPU is high while disks are idle in a table
> scan???
>
>
>
> Well, this is a very unique question. How can we force MySQL to IGNORE an
> index.  Is this so you can test your disk systems data throughput? I am
> quite interested in why you need to see the disk activity in order to
> understand if it's working or not.
>
> I have concentrated so hard and so long on making sure that it DOES use
> indices whenever they are available I am not sure how to answer that
> question. I know there is a setting that controls the size of the "key
> buffer" perhaps if you tune it to be really small it will force MySQL to
> read the indices straight off of the disk (or maybe not, I am not sure).
>
> Have you tried specifying a "wrong" index in your select statement (so as
> to force a table scan)  with the USE keyword?
>
>  Another solution could be to just get rid of the indices. Then
> ALL lookups
> would be table scans. You could replace each primary key with a unique
> constraint (not a unique index) to keep out the duplicates.
>
> Or search on a column that is not part of an index.
>
> Another solution could be to restart the server just before each test.
> Doing that will ensure that the cache is empty and that any data
> that needs
> to come from disk will do so (including indices). Merely flushing
> the cache
> while running will commit changes to disk but I don't believe that it will
> clear the cache.
>
> The excellent people on the MySQL internals discussion may be better
> equipped to answer this.
>
> Curiously yours,
> Shawn Green
> Database Administrator
> Unimin Corporation - Spruce Pine
>
>
>
>
>
>
>   "Leonardo
>
>   Francalanci"     To:       "Mysql"
> <[EMAIL PROTECTED]>
>   <[EMAIL PROTECTED]cc:
>
>   tel.ie>  Fax to:
>
>Subject:  R: R:
> why CPU is high while disks are idle in a table scan???
>   06/22/2004 03:55
>
>   AM
>
>
>
>
>
>
>
>
>
> First: thank you for sparing so much time with me.
>
> > There are 3 major factors determining your performance: 1) the
> > speed of the
> > CPU, 2) the size if your RAM, and 3) the data transfer speed of
> > your disks.
>
> I thought that the size of tables in the db would have made a difference.
> From the analysis you did and from my tests it doesn't look so...
>
> > 5) If your application is smart enough, you can split your data over
> > several servers. This would be a SERIOUS performance increase.
>
> In fact I'm writing a jdbc driver in java that I hope will be included
> in c-jdbc (http://c-jdbc.objectweb.org/), which is a java database
> cluster middleware (check it out, it seems very interesting)
>
> > 2) you can move parts of your data to other disks. Using multiple I/O
> > channels should be faster than a single channel.
>
> That is what I am trying right now, but I am having problems because
> it seems that there is no disk I/O if I access by index on my tables,
> even if I access most of the table data (while with a table scan
> disks work as expected).
>
> Does it happen because I have 1Gb of Ram? If yes: I know that this
> sounds like a silly question but... How can I "fill" the memory of my
> machine so it doesn't cache data? (I can't just remove memory, I don't
> have physical access to the machine)
>
>
> Thank you
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
>
>
>
>
>
>
>


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: R: R: why CPU is high while disks are idle in a table scan???

2004-06-22 Thread SGreen

Well, this is a very unique question. How can we force MySQL to IGNORE an
index.  Is this so you can test your disk systems data throughput? I am
quite interested in why you need to see the disk activity in order to
understand if it's working or not.

I have concentrated so hard and so long on making sure that it DOES use
indices whenever they are available I am not sure how to answer that
question. I know there is a setting that controls the size of the "key
buffer" perhaps if you tune it to be really small it will force MySQL to
read the indices straight off of the disk (or maybe not, I am not sure).

Have you tried specifying a "wrong" index in your select statement (so as
to force a table scan)  with the USE keyword?

 Another solution could be to just get rid of the indices. Then ALL lookups
would be table scans. You could replace each primary key with a unique
constraint (not a unique index) to keep out the duplicates.

Or search on a column that is not part of an index.

Another solution could be to restart the server just before each test.
Doing that will ensure that the cache is empty and that any data that needs
to come from disk will do so (including indices). Merely flushing the cache
while running will commit changes to disk but I don't believe that it will
clear the cache.

The excellent people on the MySQL internals discussion may be better
equipped to answer this.

Curiously yours,
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine




   

  "Leonardo

  Francalanci" To:   "Mysql" <[EMAIL PROTECTED]>   

  <[EMAIL PROTECTED]cc:
 
  tel.ie>  Fax to: 
    
                               Subject:  R: R: why CPU is high while 
disks are idle in a table scan??? 
  06/22/2004 03:55 

  AM   

   

   





First: thank you for sparing so much time with me.

> There are 3 major factors determining your performance: 1) the
> speed of the
> CPU, 2) the size if your RAM, and 3) the data transfer speed of
> your disks.

I thought that the size of tables in the db would have made a difference.
>From the analysis you did and from my tests it doesn't look so...

> 5) If your application is smart enough, you can split your data over
> several servers. This would be a SERIOUS performance increase.

In fact I'm writing a jdbc driver in java that I hope will be included
in c-jdbc (http://c-jdbc.objectweb.org/), which is a java database
cluster middleware (check it out, it seems very interesting)

> 2) you can move parts of your data to other disks. Using multiple I/O
> channels should be faster than a single channel.

That is what I am trying right now, but I am having problems because
it seems that there is no disk I/O if I access by index on my tables,
even if I access most of the table data (while with a table scan
disks work as expected).

Does it happen because I have 1Gb of Ram? If yes: I know that this
sounds like a silly question but... How can I "fill" the memory of my
machine so it doesn't cache data? (I can't just remove memory, I don't
have physical access to the machine)


Thank you


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]







-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



R: R: why CPU is high while disks are idle in a table scan???

2004-06-22 Thread Leonardo Francalanci
First: thank you for sparing so much time with me.

> There are 3 major factors determining your performance: 1) the 
> speed of the
> CPU, 2) the size if your RAM, and 3) the data transfer speed of 
> your disks.

I thought that the size of tables in the db would have made a difference.
>From the analysis you did and from my tests it doesn't look so...

> 5) If your application is smart enough, you can split your data over
> several servers. This would be a SERIOUS performance increase.

In fact I'm writing a jdbc driver in java that I hope will be included
in c-jdbc (http://c-jdbc.objectweb.org/), which is a java database
cluster middleware (check it out, it seems very interesting)

> 2) you can move parts of your data to other disks. Using multiple I/O
> channels should be faster than a single channel.

That is what I am trying right now, but I am having problems because
it seems that there is no disk I/O if I access by index on my tables,
even if I access most of the table data (while with a table scan 
disks work as expected).

Does it happen because I have 1Gb of Ram? If yes: I know that this
sounds like a silly question but... How can I "fill" the memory of my
machine so it doesn't cache data? (I can't just remove memory, I don't
have physical access to the machine)


Thank you


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: R: why CPU is high while disks are idle in a table scan???

2004-06-21 Thread SGreen
  
  tel.ie>  Fax to: 
         
                       Subject:  R: why CPU is high while 
disks are idle in a table scan??? 
  06/21/2004 01:20 
 
  PM   
 
   
 
   
 




Sorry, now it runs as espected (but I don't know what happened, maybe doing
other queries
I emptied out the cache)

Thank you anyway for your replies.

And: could somebody answer my previous question?

I wrote:


I have to deal with pretty big data, such a table (say BIG_TABLE) with
16.000.000 of rows.
The table has just one primary index (a simple int).

I thought that splitting data among multiple tables would have boosted
performance,
so I put my data in 16 different tables (everything is on one disk).
The first table had IDs from 1 to 1.500.000, the second the range
151-300 and so on.
When I made performance tests, I found that I didn't get any performance
gain.

So I tried (just for test purpose) to create a table of 10 rows (say
SMALL_TABLE) with some data
(IDs from 1.000.000 to 1.100.000) from my big table BIG_TABLE.
Tests were like follows: (pseudocode)

for 10 times
 read a column from BIG_TABLE where ID=


for 10 times
 read a column from SMALL_TABLE where ID=


The results are the same!
There is no noticeable difference between a table with 16M rows and a table
with 100K rows!
Is that because my IDs are simple int from 1 to n (without gaps)?

I've tried to do 10 inserts in both tables, and BIG_TABLE takes 45
seconds while
SMALL_TABLE takes 43 secs (not such a big difference for a table 160 times
smaller!)

The only big performance gain I got is from a SUM on a big range, because
MySql in this case
does a full scan on the SMALL_TABLE which is faster than an access by index
on the BIG_TABLE.

Am I doing something wrong? Isn't splitting supposed to improve
performance?
Or it is true
only if you can split data on multiple disks?


Thank you

Leonardo


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:
http://lists.mysql.com/[EMAIL PROTECTED]


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]







-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: R: R: why CPU is high while disks are idle in a table scan???

2004-06-21 Thread Jeremy Zawodny
On Mon, Jun 21, 2004 at 07:37:50PM +0200, Leonardo Francalanci wrote:
> > Oh.  You didn't say thta. :-)
> 
> Well, I wrote
> " Doing a simple "SELECT SUM(MYFIELD) FROM MYTABLE" I noticed "
> but don't worry ;)

Oh.  Damned Mondays.  I never liked 'em. :-(

> > How much RAM does your machine have?  If it's nearly all cached,
> > you'll be CPU bound rather than disk bound.
> 
> 1Gb.
> 
> I'm using Solaris 8.
> 
> The SUM() works as expected now (disks works a lot), but accessing by key
> is very cpu-intense and still 0% work on disk.
> 
> To sum up: table scans are very disk-intense (which make sense),
> accessing by key leaves disk on idle (I see that it should not work
> as with a table scan, but at least a little!)

Hmm.
-- 
Jeremy D. Zawodny |  Perl, Web, MySQL, Linux Magazine, Yahoo!
<[EMAIL PROTECTED]>  |  http://jeremy.zawodny.com/

[book] High Performance MySQL -- http://highperformancemysql.com/

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



R: R: why CPU is high while disks are idle in a table scan???

2004-06-21 Thread Leonardo Francalanci
> Oh.  You didn't say thta. :-)

Well, I wrote
" Doing a simple "SELECT SUM(MYFIELD) FROM MYTABLE" I noticed "
but don't worry ;)


> How much RAM does your machine have?  If it's nearly all cached,
> you'll be CPU bound rather than disk bound.

1Gb.

I'm using Solaris 8.

The SUM() works as expected now (disks works a lot), but accessing by key
is very cpu-intense and still 0% work on disk.

To sum up: table scans are very disk-intense (which make sense), accessing
by
key leaves disk on idle (I see that it should not work as with a table scan,
but at least a little!)


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Re: R: why CPU is high while disks are idle in a table scan???

2004-06-21 Thread Jeremy Zawodny
On Mon, Jun 21, 2004 at 07:10:59PM +0200, Leonardo Francalanci wrote:
> > Hard to say, but in a table scan the CPU does have a lot of work to
> > do.  It needs to do about 16,000,000 comparisons (based on your info).
> 
> Why comparison? It's a sum...

Oh.  You didn't say thta. :-)

> And the table is not small: 272,000,000 bytes!
> 
> And disk is very low (almost 0%)

How much RAM does your machine have?  If it's nearly all cached,
you'll be CPU bound rather than disk bound.

Jeremy
-- 
Jeremy D. Zawodny |  Perl, Web, MySQL, Linux Magazine, Yahoo!
<[EMAIL PROTECTED]>  |  http://jeremy.zawodny.com/

[book] High Performance MySQL -- http://highperformancemysql.com/

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



R: why CPU is high while disks are idle in a table scan???

2004-06-21 Thread Leonardo Francalanci
Sorry, now it runs as espected (but I don't know what happened, maybe doing
other queries
I emptied out the cache)

Thank you anyway for your replies.

And: could somebody answer my previous question?

I wrote:


I have to deal with pretty big data, such a table (say BIG_TABLE) with
16.000.000 of rows.
The table has just one primary index (a simple int).

I thought that splitting data among multiple tables would have boosted
performance,
so I put my data in 16 different tables (everything is on one disk).
The first table had IDs from 1 to 1.500.000, the second the range
151-300 and so on.
When I made performance tests, I found that I didn't get any performance
gain.

So I tried (just for test purpose) to create a table of 10 rows (say
SMALL_TABLE) with some data
(IDs from 1.000.000 to 1.100.000) from my big table BIG_TABLE.
Tests were like follows: (pseudocode)

for 10 times
read a column from BIG_TABLE where ID=


for 10 times
read a column from SMALL_TABLE where ID=


The results are the same!
There is no noticeable difference between a table with 16M rows and a table
with 100K rows!
Is that because my IDs are simple int from 1 to n (without gaps)?

I've tried to do 10 inserts in both tables, and BIG_TABLE takes 45
seconds while
SMALL_TABLE takes 43 secs (not such a big difference for a table 160 times
smaller!)

The only big performance gain I got is from a SUM on a big range, because
MySql in this case
does a full scan on the SMALL_TABLE which is faster than an access by index
on the BIG_TABLE.

Am I doing something wrong? Isn't splitting supposed to improve performance?
Or it is true
only if you can split data on multiple disks?


Thank you

Leonardo


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



R: why CPU is high while disks are idle in a table scan???

2004-06-21 Thread Leonardo Francalanci
> Hard to say, but in a table scan the CPU does have a lot of work to
> do.  It needs to do about 16,000,000 comparisons (based on your info).

Why comparison? It's a sum...
And the table is not small: 272,000,000 bytes!

And disk is very low (almost 0%)

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]