R: R: R: why CPU is high while disks are idle in a table scan???
Don't worry, I found it: my table now has 64,000,000 rows... there is no way it can all be cached in memory. The problem I have now is that Connector/J crashes (it didn't crash before, when the table had 16,000,000 rows). I'm going to write another email about that. Thank you anyway.

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, 22 June 2004, 15:22
> To: Leonardo Francalanci
> Cc: Mysql
> Subject: Re: R: R: why CPU is high while disks are idle in a table scan???

[quoted reply trimmed; it repeats Shawn Green's message, reproduced in full further down the thread]

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
Re: R: R: why CPU is high while disks are idle in a table scan???
Well, this is a very unusual question: how can we force MySQL to IGNORE an index? Is this so you can test your disk system's data throughput? I am quite interested in why you need to see the disk activity in order to understand whether it's working or not.

I have concentrated so hard and so long on making sure that it DOES use indices whenever they are available that I am not sure how to answer. I know there is a setting that controls the size of the "key buffer"; perhaps if you tune it to be really small it will force MySQL to read the indices straight off the disk (or maybe not, I am not sure).

Have you tried specifying a "wrong" index in your SELECT statement (so as to force a table scan) with the USE keyword?

Another solution could be to just get rid of the indices. Then ALL lookups would be table scans. You could replace each primary key with a unique constraint (not a unique index) to keep out the duplicates. Or search on a column that is not part of an index.

Another option is to restart the server just before each test. Doing that will ensure that the cache is empty and that any data that needs to come from disk will do so (including indices). Merely flushing the cache while running will commit changes to disk, but I don't believe it will clear the cache.

The excellent people on the MySQL internals discussion list may be better equipped to answer this.

Curiously yours,
Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

> From: "Leonardo Francalanci", 06/22/2004 03:55 AM
> To: "Mysql"
> Subject: R: R: why CPU is high while disks are idle in a table scan???

[quoted message trimmed; it repeats Leonardo's message, reproduced in full below]
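Shawn's "USE keyword" idea can be written with MySQL's index hints. A sketch, assuming a table `big_table` with an integer primary key `id` and a numeric column `myfield` (both names hypothetical; check that your server version supports these hints):

```sql
-- Tell the optimizer to skip the primary key, so the query
-- degrades to a full table scan and actually hits the disk:
SELECT SUM(myfield)
FROM big_table IGNORE INDEX (PRIMARY)
WHERE id BETWEEN 1000000 AND 1100000;

-- The reverse hint insists on the index even when a scan would win:
SELECT SUM(myfield)
FROM big_table FORCE INDEX (PRIMARY)
WHERE id BETWEEN 1000000 AND 1100000;
```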
R: R: why CPU is high while disks are idle in a table scan???
First: thank you for sparing so much time for me.

> There are 3 major factors determining your performance: 1) the speed of the
> CPU, 2) the size of your RAM, and 3) the data transfer speed of your disks.

I thought that the size of the tables in the db would have made a difference. From the analysis you did and from my tests it doesn't look so.

> 5) If your application is smart enough, you can split your data over
> several servers. This would be a SERIOUS performance increase.

In fact I'm writing a JDBC driver in Java that I hope will be included in C-JDBC (http://c-jdbc.objectweb.org/), which is a Java database cluster middleware (check it out, it seems very interesting).

> 2) you can move parts of your data to other disks. Using multiple I/O
> channels should be faster than a single channel.

That is what I am trying right now, but I am having problems because it seems that there is no disk I/O when I access my tables by index, even when I access most of the table's data (while with a table scan the disks work as expected).

Does this happen because I have 1 GB of RAM? If yes: I know this sounds like a silly question, but... how can I "fill" the memory of my machine so it doesn't cache the data? (I can't just remove memory; I don't have physical access to the machine.)

Thank you
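One blunt way to "fill" memory so the OS page cache has less room for table data is to keep a large allocation resident in a separate process while the tests run. A sketch under the assumption of Leonardo's 1 GB machine (the 900 MB figure is a guess to be tuned; note this squeezes the OS file cache but not MySQL's own key buffer, which is sized separately):

```python
import time

def hog_memory(mb, hold_seconds=600):
    """Allocate `mb` megabytes and touch every page so the OS must back
    the allocation with real RAM, leaving less room for the page cache."""
    chunk = bytearray(mb * 1024 * 1024)       # one big allocation
    for i in range(0, len(chunk), 4096):      # touch each 4 KiB page
        chunk[i] = 1
    time.sleep(hold_seconds)                  # stay resident while tests run
    return len(chunk)

# On a 1 GB box one might run, e.g.:
#   hog_memory(900)   # leave ~100 MB for mysqld and the OS
```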
Re: R: R: why CPU is high while disks are idle in a table scan???
On Mon, Jun 21, 2004 at 07:37:50PM +0200, Leonardo Francalanci wrote:
> > Oh. You didn't say that. :-)
>
> Well, I wrote
> " Doing a simple "SELECT SUM(MYFIELD) FROM MYTABLE" I noticed "
> but don't worry ;)

Oh. Damned Mondays. I never liked 'em. :-(

> > How much RAM does your machine have? If it's nearly all cached,
> > you'll be CPU bound rather than disk bound.
>
> 1 GB. I'm using Solaris 8.
>
> The SUM() works as expected now (the disks work a lot), but accessing by key
> is very CPU-intensive and still shows 0% work on disk.
>
> To sum up: table scans are very disk-intensive (which makes sense), while
> accessing by key leaves the disk idle (I understand it should not work as
> hard as with a table scan, but at least a little!)

Hmm.

--
Jeremy D. Zawodny | Perl, Web, MySQL, Linux Magazine, Yahoo!
<[EMAIL PROTECTED]> | http://jeremy.zawodny.com/
[book] High Performance MySQL -- http://highperformancemysql.com/
R: R: why CPU is high while disks are idle in a table scan???
> Oh. You didn't say that. :-)

Well, I wrote
" Doing a simple "SELECT SUM(MYFIELD) FROM MYTABLE" I noticed "
but don't worry ;)

> How much RAM does your machine have? If it's nearly all cached,
> you'll be CPU bound rather than disk bound.

1 GB. I'm using Solaris 8.

The SUM() works as expected now (the disks work a lot), but accessing by key is very CPU-intensive and still shows 0% work on disk.

To sum up: table scans are very disk-intensive (which makes sense), while accessing by key leaves the disk idle (I understand it should not work as hard as with a table scan, but at least a little!)
Re: R: why CPU is high while disks are idle in a table scan???
On Mon, Jun 21, 2004 at 07:10:59PM +0200, Leonardo Francalanci wrote:
> > Hard to say, but in a table scan the CPU does have a lot of work to
> > do. It needs to do about 16,000,000 comparisons (based on your info).
>
> Why comparisons? It's a sum...

Oh. You didn't say that. :-)

> And the table is not small: 272,000,000 bytes!
>
> And disk usage is very low (almost 0%)

How much RAM does your machine have? If it's nearly all cached, you'll be CPU bound rather than disk bound.

Jeremy
--
Jeremy D. Zawodny | Perl, Web, MySQL, Linux Magazine, Yahoo!
<[EMAIL PROTECTED]> | http://jeremy.zawodny.com/
[book] High Performance MySQL -- http://highperformancemysql.com/
R: why CPU is high while disks are idle in a table scan???
Sorry, now it runs as expected (but I don't know what happened; maybe by doing other queries I emptied the cache). Thank you anyway for your replies.

And: could somebody answer my previous question? I wrote:

I have to deal with pretty big data, such as a table (say BIG_TABLE) with 16,000,000 rows. The table has just one primary index (a simple int). I thought that splitting the data among multiple tables would boost performance, so I put my data in 16 different tables (everything is on one disk). The first table had IDs from 1 to 1,500,000, the second the range 151-300, and so on.

When I ran performance tests, I found that I didn't get any performance gain. So I tried (just for test purposes) to create a table of 100,000 rows (say SMALL_TABLE) with some data (IDs from 1,000,000 to 1,100,000) from my big table BIG_TABLE. The tests went like this (pseudocode):

    for 10 times: read a column from BIG_TABLE where ID=
    for 10 times: read a column from SMALL_TABLE where ID=

The results are the same! There is no noticeable difference between a table with 16M rows and a table with 100K rows! Is that because my IDs are simple ints from 1 to n (without gaps)?

I also tried doing 10 inserts into both tables: BIG_TABLE takes 45 seconds while SMALL_TABLE takes 43 (not such a big difference for a table 160 times smaller!).

The only big performance gain I got is from a SUM over a big range, because MySQL in that case does a full scan on SMALL_TABLE, which is faster than access by index on BIG_TABLE.

Am I doing something wrong? Isn't splitting supposed to improve performance? Or is that true only if you can split the data over multiple disks?

Thank you
Leonardo
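The flat lookup times measured above are what a B-tree index predicts: the cost of a point lookup grows with the logarithm of the row count, so a table 160 times larger needs only a few extra comparisons. A small illustration, using binary-search depth as a rough stand-in for a B-tree descent:

```python
import math

def search_steps(rows):
    """Comparisons needed by a binary search over `rows` sorted keys --
    a rough stand-in for the depth of a B-tree index lookup."""
    return max(1, math.ceil(math.log2(rows)))

print(search_steps(100_000))     # small table: 17 comparisons
print(search_steps(16_000_000))  # big table:   24 comparisons
```

Real B-tree pages hold hundreds of keys, so both tables resolve in roughly the same handful of node reads. That is why splitting one table into 16 pieces on the same disk buys almost nothing for lookups by ID, while it can help range scans (the SUM case) and pays off mainly when the pieces land on different disks or servers.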
R: why CPU is high while disks are idle in a table scan???
> Hard to say, but in a table scan the CPU does have a lot of work to
> do. It needs to do about 16,000,000 comparisons (based on your info).

Why comparisons? It's a sum...

And the table is not small: 272,000,000 bytes!

And disk usage is very low (almost 0%).