Hi Chris, > A HEX editor? :-) Finding out what the developers at MySQL AB like to > drink and sending them a few megalitres of it? :-)
Heh heh :-) > In all seriousness, you haven't got a great many options (although I > could be completely wrong, in which case some kind person on this list > will enlighten / flame me), but using OPTIMIZE TABLE on MyISAM tables > and ALTER TABLE TYPE = InnoDB on InnoDB tables will defrag your table > for you, possibly helping matters. I'm pretty sure that the new tables I created for the test would have been defraged since I ALTER'd them a couple of times to drop indices. > InnoDB looks at tablespace files and raw partitions in the same way and > manages space within them in a method related to that used by the > Berkeley Fast File System. Sounds like it is already very efficient then. > It *might* have some more > options to play with if you wrap a few INSERTs in a transaction block. That's a good idea - I could certainly try that. A future version of InnoDb may also take advantage of this I guess :) > That may help. One thing you may want to do is re-run the above with an > index on the revision column for both table types with and without > BLOBs. I did try adding indices, and you are correct, it brings the times right down, so it is only table scan performance that is at fault here. > You have to be impressed with InnoDB though - it's a full-on > multiversioned storage engine and it manages to keep pace with MyISAM > (and outperform it on occassion). Completely. I'm impressed with the whole lot of MySQL - it's brilliant. > Using the FORCE INDEX(date) predicate, the query time drops to about 2 seconds. I've had to do this in the past too. I'll be looking out for slow queries in the logs and then EXPLAINing them to see where I can make this improvement. Thankyou for all your help. While there is no simple solution, I'm a lot more sure of my options now :) Cheers, Mike > -----Original Message----- > From: Chris Nolan [mailto:[EMAIL PROTECTED] > Sent: 12 February 2004 17:28 > To: Michael McTernan > Cc: Mysql; Benoit St-Jean > Subject: Re: InnoDb Table Performance problem > > > Michael McTernan wrote: > > >Hi there, > > > > > > > >>I would try to sum it up like this: > >> > >>1. Discs work best when used as purely sequential devices. > >>2. Inserting those BLOBs takes up space somewhere. Strictly speaking, on > >>the disc they may very well be in between certain rows. > >> > >> > > > >Good points - I guess there is no way for the database to > control where on > >the file system the data ends up. I wonder if using a raw > parition for the > >database under InnoDB would fix this? I guess that would give > the storage > >engine the opportunity to put the BLOB data in a heap at one end > of a disk > >while keeping the table data together at the other, although this sounds > >like a lot of work for the storage engine. What would be good would be a > >tool to show how 'fragmented' a database table is on disk? > > > > > > > A HEX editor? :-) Finding out what the developers at MySQL AB like to > drink and sending them a few megalitres of it? :-) > > In all seriousness, you haven't got a great many options (although I > could be completely wrong, in which case some kind person on this list > will enlighten / flame me), but using OPTIMIZE TABLE on MyISAM tables > and ALTER TABLE TYPE = InnoDB on InnoDB tables will defrag your table > for you, possibly helping matters. In each case, the table has a SHARE > MODE lock set (read only), copies all existing rows to a new table such > that they are positioned optimally, drops the old table and renames the > new table to the original name. > > InnoDB looks at tablespace files and raw partitions in the same way and > manages space within them in a method related to that used by the > Berkeley Fast File System. The only advantage of raw partition usage is > bypassing filesystem logic which is very beneficial on some OSes (not > Linux or any of the BSDs to my knowledge though - their FS layers are > damned quick!). > > >>3. As you may need to read over them, the disc has to seek much further > >>and depending on the way BLOBs are placed inside the tablespace, seeking > >>may be all over the place. > >> > >> > > > >Another good point. It maybe that by creating the > 'filemetadata' table in > >one go I've just ensured that most of the columns are linear on the disc. > >In which case re-creating the 'files' table and then adding the BLOB data > >back as a second step might fix this. My experiments show this > is not the > >case though... > > > > > If you're doing INSERTs in AUTO_COMMIT mode, then InnoDB will have no > choice but to put the BLOBs where it can. It *might* have some more > options to play with if you wrap a few INSERTs in a transaction block. > > > > > > >>Have you tried the following to see the results: > >> > >>ALTER TABLE files TYPE = MyISAM; > >>(Remove all indexes) > >>SELECT SUM(revision) FROM files; > >> > >> > > > >Okay. I've made two new tables, one called 'myfiles' and one called > >'inofiles' which are both duplicates of 'files' without any > indices except > >for the primary key. Here are the tables I'm comparing: > > > >CREATE TABLE `myfiles` ( > > `id` int(10) unsigned NOT NULL auto_increment, > > `revision` int(10) unsigned NOT NULL default '0', > > `filenameid` int(10) unsigned NOT NULL default '0', > > `pathid` int(10) unsigned NOT NULL default '0', > > `extensionid` int(10) unsigned NOT NULL default '0', > > `isDeleted` enum('0','1') NOT NULL default '0', > > `filepathname` tinytext NOT NULL, > > `contentsGz` longblob NOT NULL, > > PRIMARY KEY (`id`) > >) TYPE=MyISAM; > > > >CREATE TABLE `inofiles` ( > > `id` int(10) unsigned NOT NULL auto_increment, > > `revision` int(10) unsigned NOT NULL default '0', > > `filenameid` int(10) unsigned NOT NULL default '0', > > `pathid` int(10) unsigned NOT NULL default '0', > > `extensionid` int(10) unsigned NOT NULL default '0', > > `isDeleted` enum('0','1') NOT NULL default '0', > > `filepathname` tinytext NOT NULL, > > `contentsGz` longblob NOT NULL, > > PRIMARY KEY (`id`) > >) TYPE=InnoDB; > > > > > >Here's the results for the select statement you suggest: > > > >[Restart MySQL] > >mysql> SELECT SUM(revision) FROM inofiles; > >+---------------+ > >| SUM(revision) | > >+---------------+ > >| 561401 | > >+---------------+ > >1 row in set (33.61 sec) > > > >mysql> SELECT SUM(revision) from inofiles; > >+---------------+ > >| SUM(revision) | > >+---------------+ > >| 561401 | > >+---------------+ > >1 row in set (31.79 sec) > > > >[Restart MySQL] > >mysql> SELECT SUM(revision) FROM myfiles; > >+---------------+ > >| SUM(revision) | > >+---------------+ > >| 561401 | > >+---------------+ > >1 row in set (1 min 7.86 sec) > > > > > >mysql> SELECT SUM(revision) from myfiles; > >+---------------+ > >| SUM(revision) | > >+---------------+ > >| 561401 | > >+---------------+ > >1 row in set (1 min 7.83 sec) > > > >I restarted MySQL before running each test, and then run the test twice, > >being careful to change the case of the SELECT statements to > avoid hitting > >the query cache. Strangely InnoDb is more than twice as fast... > > > >Now, if I drop the BLOB column 'contentsGz' from both 'inofiles' and > >'myfiles' and repeat: > > > >[Restart database] > >mysql> SELECT SUM(revision) FROM myfiles; > >+---------------+ > >| SUM(revision) | > >+---------------+ > >| 561401 | > >+---------------+ > >1 row in set (0.34 sec) > > > >mysql> SELECT SUM(revision) from myfiles; > >+---------------+ > >| SUM(revision) | > >+---------------+ > >| 561401 | > >+---------------+ > >1 row in set (0.18 sec) > > > >[Restart database] > >mysql> SELECT SUM(revision) FROM inofiles; > >+---------------+ > >| SUM(revision) | > >+---------------+ > >| 561401 | > >+---------------+ > >1 row in set (0.29 sec) > > > >mysql> SELECT SUM(revision) from inofiles; > >+---------------+ > >| SUM(revision) | > >+---------------+ > >| 561401 | > >+---------------+ > >1 row in set (0.10 sec) > > > >So I guess the conclusion is that BLOBs are *really* bad for > table scans in > >MyISAM _and_ InnoDB, although perhaps slightly less so for > InnoDB. I guess > >the BLOB data must get interleaved with the non-BLOB data on > disk, and that > >putting the BLOBs into a different table may cause some sort of > partitioning > >between the tables on disk. > > > > > That may help. One thing you may want to do is re-run the above with an > index on the revision column for both table types with and without > BLOBs. I'm betting that you'll see little deviation between the > BLOB-equipped tables and the BLOB-less ones. One thing you might want to > investigate is storing your BLOBs in a MyISAM table and your other data > in an InnoDB table. > > You have to be impressed with InnoDB though - it's a full-on > multiversioned storage engine and it manages to keep pace with MyISAM > (and outperform it on occassion). Considering MyISAM's design brief was > to be sleek, fast and space efficient (which gives it further speed > advantages), InnoDB's comparitive overhead is hidden extremely well. > Unfortunately, I really started using MySQL well after NuSphere buggered > up everything and retired the Gemini table type - it would have been > interesting to see all 4 table types be compared with all the nice speed > improvements 4.0.x has brough along. > > >I'm guessing that the real problem here is that some of my queries are > >secretly doing table scans when they shouldn't, and that is > causing a huge > >slowdown, although I might split the BLOB column into a > different table such > >that table scans don't give me such a bad hit if they do happen. > > > > > There is a table I query often, with two ENUM columns, a TIMESTAMP > column and a TEXT column. I often need to throw queries at it that look > for substrings in the TEXT column for rows between two dates with > certain values for the ENUM columns. There are indexes on the ENUM, > TIMESTAMP and TEXT columns (prefix indexes of course). Unless I specify > FORCE INDEX(date), MySQL does a full table scan, resulting in queries > that take 2 minutes to complete. Using the FORCE INDEX(date) predicate, > the query time drops to about 2 seconds. > > >Thanks, > > > >Mike > > > > > > > You're welcome! I hope that this helps you out! > > Regards, > > Chris > > >>-----Original Message----- > >>From: Chris Nolan [mailto:[EMAIL PROTECTED] > >>Sent: 12 February 2004 12:23 > >>To: Michael McTernan > >>Cc: Benoit St-Jean; Mysql > >>Subject: Re: InnoDb Table Performance problem > >> > >> > >>Michael McTernan wrote: > >> > >> > >> > >>>Hi, > >>> > >>> > >>> > >>> > >>> > >>>>SELECT COUNT(*) for InnoDB tables is a know problem... The table > >>>>handler (for InnoDB) has to do a table scan to count all rows... This > >>>>particular case is optimized with MyISAM ... > >>>> > >>>> > >>>> > >>>> > >>>Sure. But why is the tablescan ~100 times faster for the table > >>> > >>> > >>without the > >> > >> > >>>BLOB column? > >>> > >>> > >>> > >>> > >>I would try to sum it up like this: > >> > >>1. Discs work best when used as purely sequential devices. > >>2. Inserting those BLOBs takes up space somewhere. Strictly speaking, on > >>the disc they may very well be in between certain rows. > >>3. As you may need to read over them, the disc has to seek much further > >>and depending on the way BLOBs are placed inside the tablespace, seeking > >>may be all over the place. > >> > >>Have you tried the following to see the results: > >> > >>ALTER TABLE files TYPE = MyISAM; > >>(Remove all indexes) > >>SELECT SUM(revision) FROM files; > >> > >>You may want to execute the above select on your InnoDB version of the > >>table as well. > >> > >>Generally speaking, the execution time of COUNT(*) doesn't tell you very > >>much unless you are a bit more specific: > >> > >>SELECT COUNT(revision) FROM files WHERE revision BETWEEN '1' AND '2600'; > >> > >>I am prepared to be shot down for being totally incorrect! > >> > >>Regards, > >> > >>Chris > >> > >> > >> > >>>This goes back to my original assumption that I thought the > BLOB contents > >>>would be stored outside of the normal column data, meaning that > >>> > >>> > >>you have to > >> > >> > >>>do an additional seek to retrieve a BLOB value, but that a BLOB column > >>>doesn't inflate the data over which a table scan has to iterate and so > >>>doesn't impact tablescan performance. Is this wrong? Or is > >>> > >>> > >>this different > >> > >> > >>>for InnoDB vs MyISAM? > >>> > >>>Of course, I'm aiming not to use table scans at all though :) > >>> > >>>Thanks, > >>> > >>>Mike > >>> > >>> > >>> > >>> > >>> > >>>>-----Original Message----- > >>>>From: Benoit St-Jean [mailto:[EMAIL PROTECTED] > >>>>Sent: 11 February 2004 22:47 > >>>>To: Michael McTernan > >>>>Cc: Mysql > >>>>Subject: Re: InnoDb Table Performance problem > >>>> > >>>> > >>>>Michael McTernan wrote: > >>>> > >>>> > >>>> > >>>> > >>>> > >>>>>Hi there, > >>>>> > >>>>>I'm making something similar to a file revision control > >>>>> > >>>>> > >>system, and using > >> > >> > >>>>>MySQL on Linux as the database to drive it. Almost all my tables are > >>>>>InnoDB, and generally it is going very well, with the exception > >>>>> > >>>>> > >>>>> > >>>>> > >>>>of one table > >>>> > >>>> > >>>> > >>>> > >>>>>that is always very slow. > >>>>> > >>>>>This table holds the files within the database. It is defined > >>>>> > >>>>> > >>>>> > >>>>> > >>>>as follows: > >>>> > >>>> > >>>> > >>>> > >>>>>CREATE TABLE `files` ( > >>>>>`id` int(10) unsigned NOT NULL auto_increment, > >>>>>`revision` int(10) unsigned NOT NULL default '0', > >>>>>`filenameid` int(10) unsigned NOT NULL default '0', > >>>>>`pathid` int(10) unsigned NOT NULL default '0', > >>>>>`extensionid` int(10) unsigned NOT NULL default '0', > >>>>>`isDeleted` enum('0','1') NOT NULL default '0', > >>>>>`filepathname` tinytext NOT NULL, > >>>>>`contentsGz` longblob NOT NULL, > >>>>>PRIMARY KEY (`id`), > >>>>>UNIQUE KEY `revision` (`revision`,`filepathname`(255)), > >>>>>KEY `fpeindex` (`filenameid`,`pathid`,`extensionid`,`revision`), > >>>>>KEY `filepathname` (`filepathname`(255)), > >>>>>CONSTRAINT `0_3570` FOREIGN KEY (`filenameid`) REFERENCES `filenames` > >>>>>(`id`), > >>>>>CONSTRAINT `0_3571` FOREIGN KEY (`extensionid`) REFERENCES > >>>>>`fileextensions` (`id`), > >>>>>CONSTRAINT `0_3572` FOREIGN KEY (`pathid`) REFERENCES > >>>>> > >>>>> > >>>>> > >>>>> > >>>>`filepaths` (`id`) > >>>> > >>>> > >>>> > >>>> > >>>>>) TYPE=InnoDB; > >>>>> > >>>>>The 'contentsGz' column will have the contents of the file and will > >>>>>typically be a couple of hundred kilobytes, but in some rare > >>>>> > >>>>> > >>>>> > >>>>> > >>>>cases as large > >>>> > >>>> > >>>> > >>>> > >>>>>as 20 Megabytes. > >>>>> > >>>>>Selects on this table always go very slowly. I've used EXPLAIN > >>>>> > >>>>> > >>>>> > >>>>> > >>>>to look at > >>>> > >>>> > >>>> > >>>> > >>>>>what is going on, and carefully added a couple of multi-column > >>>>> > >>>>> > >>>>> > >>>>> > >>>>indexes that > >>>> > >>>> > >>>> > >>>> > >>>>>have improved SELECT performance (this table is updated rarely, > >>>>> > >>>>> > >>>>> > >>>>> > >>>>so I'm not > >>>> > >>>> > >>>> > >>>> > >>>>>too worried about INSERT performance). However, the > >>>>> > >>>>> > >>performance is still > >> > >> > >>>>>really bad. > >>>>> > >>>>>I tried creating an identical table with the exception that it > >>>>> > >>>>> > >>>>> > >>>>> > >>>>doesn't have > >>>> > >>>> > >>>> > >>>> > >>>>>the 'contentsGz' column: > >>>>> > >>>>>CREATE TABLE `filemetadata` ( > >>>>>`id` int(10) unsigned NOT NULL auto_increment, > >>>>>`revision` int(10) unsigned NOT NULL default '0', > >>>>>`filenameid` int(10) unsigned NOT NULL default '0', > >>>>>`pathid` int(10) unsigned NOT NULL default '0', > >>>>>`extensionid` int(10) unsigned NOT NULL default '0', > >>>>>`isDeleted` enum('0','1') NOT NULL default '0', > >>>>>`filepathname` tinytext NOT NULL, > >>>>>PRIMARY KEY (`id`), > >>>>>UNIQUE KEY `revision` (`revision`,`filepathname`(255)), > >>>>>KEY `fpeindex` (`filenameid`,`pathid`,`extensionid`,`revision`), > >>>>>KEY `filepathname` (`filepathname`(255)), > >>>>>CONSTRAINT `0_3651` FOREIGN KEY (`filenameid`) REFERENCES `filenames` > >>>>>(`id`), > >>>>>CONSTRAINT `0_3652` FOREIGN KEY (`extensionid`) REFERENCES > >>>>>`fileextensions` (`id`), > >>>>>CONSTRAINT `0_3653` FOREIGN KEY (`pathid`) REFERENCES > >>>>> > >>>>> > >>>>> > >>>>> > >>>>`filepaths` (`id`) > >>>> > >>>> > >>>> > >>>> > >>>>>) TYPE=InnoDB; > >>>>> > >>>>>I used UPDATE ... SELECT to copy all data from the 'files' table to > >>>>>'filemetadata'. > >>>>> > >>>>>Here is something I found suprising: > >>>>> > >>>>>mysql> SELECT COUNT(1) FROM files; > >>>>>+----------+ > >>>>>| COUNT(1) | > >>>>>+----------+ > >>>>>| 101013 | > >>>>>+----------+ > >>>>>1 row in set (32.42 sec) > >>>>> > >>>>>mysql> SELECT COUNT(1) FROM filemetadata; > >>>>>+----------+ > >>>>>| COUNT(1) | > >>>>>+----------+ > >>>>>| 101013 | > >>>>>+----------+ > >>>>>1 row in set (0.29 sec) > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>SELECT COUNT(*) for InnoDB tables is a know problem... The table > >>>>handler (for InnoDB) has to do a table scan to count all rows... This > >>>>particular case is optimized with MyISAM ... > >>>> > >>>> > >>>> > >>>> > >>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>-- > >>>>MySQL General Mailing List > >>>>For list archives: http://lists.mysql.com/mysql > >>>>To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>> > >>> > >>> > >>> > >>> > >> > >> > >> > >> > >> > > > > > > > > > > > > > -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]