Hi there,A HEX editor? :-) Finding out what the developers at MySQL AB like to drink and sending them a few megalitres of it? :-)
I would try to sum it up like this:
1. Discs work best when used as purely sequential devices.
2. Inserting those BLOBs takes up space somewhere. Strictly speaking, on
the disc they may very well be in between certain rows.
Good points - I guess there is no way for the database to control where on the file system the data ends up. I wonder if using a raw parition for the database under InnoDB would fix this? I guess that would give the storage engine the opportunity to put the BLOB data in a heap at one end of a disk while keeping the table data together at the other, although this sounds like a lot of work for the storage engine. What would be good would be a tool to show how 'fragmented' a database table is on disk?
In all seriousness, you haven't got a great many options (although I could be completely wrong, in which case some kind person on this list will enlighten / flame me), but using OPTIMIZE TABLE on MyISAM tables and ALTER TABLE TYPE = InnoDB on InnoDB tables will defrag your table for you, possibly helping matters. In each case, the table has a SHARE MODE lock set (read only), copies all existing rows to a new table such that they are positioned optimally, drops the old table and renames the new table to the original name.
InnoDB looks at tablespace files and raw partitions in the same way and manages space within them in a method related to that used by the Berkeley Fast File System. The only advantage of raw partition usage is bypassing filesystem logic which is very beneficial on some OSes (not Linux or any of the BSDs to my knowledge though - their FS layers are damned quick!).
If you're doing INSERTs in AUTO_COMMIT mode, then InnoDB will have no choice but to put the BLOBs where it can. It *might* have some more options to play with if you wrap a few INSERTs in a transaction block.3. As you may need to read over them, the disc has to seek much further
and depending on the way BLOBs are placed inside the tablespace, seeking
may be all over the place.
Another good point. It maybe that by creating the 'filemetadata' table in
one go I've just ensured that most of the columns are linear on the disc.
In which case re-creating the 'files' table and then adding the BLOB data
back as a second step might fix this. My experiments show this is not the
case though...
That may help. One thing you may want to do is re-run the above with an index on the revision column for both table types with and without BLOBs. I'm betting that you'll see little deviation between the BLOB-equipped tables and the BLOB-less ones. One thing you might want to investigate is storing your BLOBs in a MyISAM table and your other data in an InnoDB table.
Have you tried the following to see the results:
ALTER TABLE files TYPE = MyISAM;
(Remove all indexes)
SELECT SUM(revision) FROM files;
Okay. I've made two new tables, one called 'myfiles' and one called 'inofiles' which are both duplicates of 'files' without any indices except for the primary key. Here are the tables I'm comparing:
CREATE TABLE `myfiles` ( `id` int(10) unsigned NOT NULL auto_increment, `revision` int(10) unsigned NOT NULL default '0', `filenameid` int(10) unsigned NOT NULL default '0', `pathid` int(10) unsigned NOT NULL default '0', `extensionid` int(10) unsigned NOT NULL default '0', `isDeleted` enum('0','1') NOT NULL default '0', `filepathname` tinytext NOT NULL, `contentsGz` longblob NOT NULL, PRIMARY KEY (`id`) ) TYPE=MyISAM;
CREATE TABLE `inofiles` ( `id` int(10) unsigned NOT NULL auto_increment, `revision` int(10) unsigned NOT NULL default '0', `filenameid` int(10) unsigned NOT NULL default '0', `pathid` int(10) unsigned NOT NULL default '0', `extensionid` int(10) unsigned NOT NULL default '0', `isDeleted` enum('0','1') NOT NULL default '0', `filepathname` tinytext NOT NULL, `contentsGz` longblob NOT NULL, PRIMARY KEY (`id`) ) TYPE=InnoDB;
Here's the results for the select statement you suggest:
[Restart MySQL] mysql> SELECT SUM(revision) FROM inofiles; +---------------+ | SUM(revision) | +---------------+ | 561401 | +---------------+ 1 row in set (33.61 sec)
mysql> SELECT SUM(revision) from inofiles; +---------------+ | SUM(revision) | +---------------+ | 561401 | +---------------+ 1 row in set (31.79 sec)
[Restart MySQL] mysql> SELECT SUM(revision) FROM myfiles; +---------------+ | SUM(revision) | +---------------+ | 561401 | +---------------+ 1 row in set (1 min 7.86 sec)
mysql> SELECT SUM(revision) from myfiles; +---------------+ | SUM(revision) | +---------------+ | 561401 | +---------------+ 1 row in set (1 min 7.83 sec)
I restarted MySQL before running each test, and then run the test twice, being careful to change the case of the SELECT statements to avoid hitting the query cache. Strangely InnoDb is more than twice as fast...
Now, if I drop the BLOB column 'contentsGz' from both 'inofiles' and 'myfiles' and repeat:
[Restart database] mysql> SELECT SUM(revision) FROM myfiles; +---------------+ | SUM(revision) | +---------------+ | 561401 | +---------------+ 1 row in set (0.34 sec)
mysql> SELECT SUM(revision) from myfiles; +---------------+ | SUM(revision) | +---------------+ | 561401 | +---------------+ 1 row in set (0.18 sec)
[Restart database] mysql> SELECT SUM(revision) FROM inofiles; +---------------+ | SUM(revision) | +---------------+ | 561401 | +---------------+ 1 row in set (0.29 sec)
mysql> SELECT SUM(revision) from inofiles; +---------------+ | SUM(revision) | +---------------+ | 561401 | +---------------+ 1 row in set (0.10 sec)
So I guess the conclusion is that BLOBs are *really* bad for table scans in
MyISAM _and_ InnoDB, although perhaps slightly less so for InnoDB. I guess
the BLOB data must get interleaved with the non-BLOB data on disk, and that
putting the BLOBs into a different table may cause some sort of partitioning
between the tables on disk.
You have to be impressed with InnoDB though - it's a full-on multiversioned storage engine and it manages to keep pace with MyISAM (and outperform it on occassion). Considering MyISAM's design brief was to be sleek, fast and space efficient (which gives it further speed advantages), InnoDB's comparitive overhead is hidden extremely well. Unfortunately, I really started using MySQL well after NuSphere buggered up everything and retired the Gemini table type - it would have been interesting to see all 4 table types be compared with all the nice speed improvements 4.0.x has brough along.
I'm guessing that the real problem here is that some of my queries areThere is a table I query often, with two ENUM columns, a TIMESTAMP column and a TEXT column. I often need to throw queries at it that look for substrings in the TEXT column for rows between two dates with certain values for the ENUM columns. There are indexes on the ENUM, TIMESTAMP and TEXT columns (prefix indexes of course). Unless I specify FORCE INDEX(date), MySQL does a full table scan, resulting in queries that take 2 minutes to complete. Using the FORCE INDEX(date) predicate, the query time drops to about 2 seconds.
secretly doing table scans when they shouldn't, and that is causing a huge
slowdown, although I might split the BLOB column into a different table such
that table scans don't give me such a bad hit if they do happen.
Thanks,You're welcome! I hope that this helps you out!
Mike
Regards,
Chris
-----Original Message----- From: Chris Nolan [mailto:[EMAIL PROTECTED] Sent: 12 February 2004 12:23 To: Michael McTernan Cc: Benoit St-Jean; Mysql Subject: Re: InnoDb Table Performance problem
Michael McTernan wrote:
Hi,without the
Sure. But why is the tablescan ~100 times faster for the tableSELECT COUNT(*) for InnoDB tables is a know problem... The table handler (for InnoDB) has to do a table scan to count all rows... This particular case is optimized with MyISAM ...
BLOB column?I would try to sum it up like this:
1. Discs work best when used as purely sequential devices. 2. Inserting those BLOBs takes up space somewhere. Strictly speaking, on the disc they may very well be in between certain rows. 3. As you may need to read over them, the disc has to seek much further and depending on the way BLOBs are placed inside the tablespace, seeking may be all over the place.
Have you tried the following to see the results:
ALTER TABLE files TYPE = MyISAM; (Remove all indexes) SELECT SUM(revision) FROM files;
You may want to execute the above select on your InnoDB version of the table as well.
Generally speaking, the execution time of COUNT(*) doesn't tell you very much unless you are a bit more specific:
SELECT COUNT(revision) FROM files WHERE revision BETWEEN '1' AND '2600';
I am prepared to be shot down for being totally incorrect!
Regards,
Chris
This goes back to my original assumption that I thought the BLOB contentsyou have to
would be stored outside of the normal column data, meaning that
do an additional seek to retrieve a BLOB value, but that a BLOB columnthis different
doesn't inflate the data over which a table scan has to iterate and so
doesn't impact tablescan performance. Is this wrong? Or is
for InnoDB vs MyISAM?system, and using
Of course, I'm aiming not to use table scans at all though :)
Thanks,
Mike
-----Original Message----- From: Benoit St-Jean [mailto:[EMAIL PROTECTED] Sent: 11 February 2004 22:47 To: Michael McTernan Cc: Mysql Subject: Re: InnoDb Table Performance problem
Michael McTernan wrote:
Hi there,
I'm making something similar to a file revision control
performance is stillof one tableMySQL on Linux as the database to drive it. Almost all my tables are InnoDB, and generally it is going very well, with the exception
that is always very slow.as follows:
This table holds the files within the database. It is defined
`filepaths` (`id`)CREATE TABLE `files` ( `id` int(10) unsigned NOT NULL auto_increment, `revision` int(10) unsigned NOT NULL default '0', `filenameid` int(10) unsigned NOT NULL default '0', `pathid` int(10) unsigned NOT NULL default '0', `extensionid` int(10) unsigned NOT NULL default '0', `isDeleted` enum('0','1') NOT NULL default '0', `filepathname` tinytext NOT NULL, `contentsGz` longblob NOT NULL, PRIMARY KEY (`id`), UNIQUE KEY `revision` (`revision`,`filepathname`(255)), KEY `fpeindex` (`filenameid`,`pathid`,`extensionid`,`revision`), KEY `filepathname` (`filepathname`(255)), CONSTRAINT `0_3570` FOREIGN KEY (`filenameid`) REFERENCES `filenames` (`id`), CONSTRAINT `0_3571` FOREIGN KEY (`extensionid`) REFERENCES `fileextensions` (`id`), CONSTRAINT `0_3572` FOREIGN KEY (`pathid`) REFERENCES
) TYPE=InnoDB;cases as large
The 'contentsGz' column will have the contents of the file and will typically be a couple of hundred kilobytes, but in some rare
as 20 Megabytes.to look at
Selects on this table always go very slowly. I've used EXPLAIN
what is going on, and carefully added a couple of multi-columnindexes that
have improved SELECT performance (this table is updated rarely,so I'm not
too worried about INSERT performance). However, the
really bad.doesn't have
I tried creating an identical table with the exception that it
the 'contentsGz' column:`filepaths` (`id`)
CREATE TABLE `filemetadata` ( `id` int(10) unsigned NOT NULL auto_increment, `revision` int(10) unsigned NOT NULL default '0', `filenameid` int(10) unsigned NOT NULL default '0', `pathid` int(10) unsigned NOT NULL default '0', `extensionid` int(10) unsigned NOT NULL default '0', `isDeleted` enum('0','1') NOT NULL default '0', `filepathname` tinytext NOT NULL, PRIMARY KEY (`id`), UNIQUE KEY `revision` (`revision`,`filepathname`(255)), KEY `fpeindex` (`filenameid`,`pathid`,`extensionid`,`revision`), KEY `filepathname` (`filepathname`(255)), CONSTRAINT `0_3651` FOREIGN KEY (`filenameid`) REFERENCES `filenames` (`id`), CONSTRAINT `0_3652` FOREIGN KEY (`extensionid`) REFERENCES `fileextensions` (`id`), CONSTRAINT `0_3653` FOREIGN KEY (`pathid`) REFERENCES
) TYPE=InnoDB;
I used UPDATE ... SELECT to copy all data from the 'files' table to 'filemetadata'.
Here is something I found suprising:
mysql> SELECT COUNT(1) FROM files; +----------+ | COUNT(1) | +----------+ | 101013 | +----------+ 1 row in set (32.42 sec)
mysql> SELECT COUNT(1) FROM filemetadata; +----------+ | COUNT(1) | +----------+ | 101013 | +----------+ 1 row in set (0.29 sec)
SELECT COUNT(*) for InnoDB tables is a know problem... The table handler (for InnoDB) has to do a table scan to count all rows... This particular case is optimized with MyISAM ...
-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]