Re: InnoDb Table Performance problem

Chris Nolan Thu, 12 Feb 2004 09:28:54 -0800

Michael McTernan wrote:

Hi there,

I would try to sum it up like this:

1. Discs work best when used as purely sequential devices. 2. Inserting those BLOBs takes up space somewhere. Strictly speaking, on the disc they may very well be in between certain rows.
Good points - I guess there is no way for the database to control where on
the file system the data ends up.  I wonder if using a raw parition for the
database under InnoDB would fix this?  I guess that would give the storage
engine the opportunity to put the BLOB data in a heap at one end of a disk
while keeping the table data together at the other, although this sounds
like a lot of work for the storage engine.  What would be good would be a
tool to show how 'fragmented' a database table is on disk?

A HEX editor? :-) Finding out what the developers at MySQL AB like to drink and sending them a few megalitres of it? :-)

In all seriousness, you haven't got a great many options (although I could be completely wrong, in which case some kind person on this list will enlighten / flame me), but using OPTIMIZE TABLE on MyISAM tables and ALTER TABLE TYPE = InnoDB on InnoDB tables will defrag your table for you, possibly helping matters. In each case, the table has a SHARE MODE lock set (read only), copies all existing rows to a new table such that they are positioned optimally, drops the old table and renames the new table to the original name.

InnoDB looks at tablespace files and raw partitions in the same way and manages space within them in a method related to that used by the Berkeley Fast File System. The only advantage of raw partition usage is bypassing filesystem logic which is very beneficial on some OSes (not Linux or any of the BSDs to my knowledge though - their FS layers are damned quick!).

3. As you may need to read over them, the disc has to seek much further and depending on the way BLOBs are placed inside the tablespace, seeking may be all over the place.

Another good point. It maybe that by creating the 'filemetadata' table in one go I've just ensured that most of the columns are linear on the disc. In which case re-creating the 'files' table and then adding the BLOB data back as a second step might fix this. My experiments show this is not the case though...

If you're doing INSERTs in AUTO_COMMIT mode, then InnoDB will have no choice but to put the BLOBs where it can. It *might* have some more options to play with if you wrap a few INSERTs in a transaction block.

Have you tried the following to see the results:

ALTER TABLE files TYPE = MyISAM; (Remove all indexes) SELECT SUM(revision) FROM files;


Okay.  I've made two new tables, one called 'myfiles' and one called
'inofiles' which are both duplicates of 'files' without any indices except
for the primary key.  Here are the tables I'm comparing:

CREATE TABLE `myfiles` (
 `id` int(10) unsigned NOT NULL auto_increment,
 `revision` int(10) unsigned NOT NULL default '0',
 `filenameid` int(10) unsigned NOT NULL default '0',
 `pathid` int(10) unsigned NOT NULL default '0',
 `extensionid` int(10) unsigned NOT NULL default '0',
 `isDeleted` enum('0','1') NOT NULL default '0',
 `filepathname` tinytext NOT NULL,
 `contentsGz` longblob NOT NULL,
 PRIMARY KEY  (`id`)
) TYPE=MyISAM;

CREATE TABLE `inofiles` (
 `id` int(10) unsigned NOT NULL auto_increment,
 `revision` int(10) unsigned NOT NULL default '0',
 `filenameid` int(10) unsigned NOT NULL default '0',
 `pathid` int(10) unsigned NOT NULL default '0',
 `extensionid` int(10) unsigned NOT NULL default '0',
 `isDeleted` enum('0','1') NOT NULL default '0',
 `filepathname` tinytext NOT NULL,
 `contentsGz` longblob NOT NULL,
 PRIMARY KEY  (`id`)
) TYPE=InnoDB;

Here's the results for the select statement you suggest:

[Restart MySQL]
mysql> SELECT SUM(revision) FROM inofiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (33.61 sec)

mysql> SELECT SUM(revision) from inofiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (31.79 sec)

[Restart MySQL]
mysql> SELECT SUM(revision) FROM myfiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (1 min 7.86 sec)


mysql> SELECT SUM(revision) from myfiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (1 min 7.83 sec)

I restarted MySQL before running each test, and then run the test twice,
being careful to change the case of the SELECT statements to avoid hitting
the query cache.  Strangely InnoDb is more than twice as fast...

Now, if I drop the BLOB column 'contentsGz' from both 'inofiles' and
'myfiles' and repeat:

[Restart database]
mysql> SELECT SUM(revision) FROM myfiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (0.34 sec)

mysql> SELECT SUM(revision) from myfiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (0.18 sec)

[Restart database]
mysql> SELECT SUM(revision) FROM inofiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (0.29 sec)

mysql> SELECT SUM(revision) from inofiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (0.10 sec)

So I guess the conclusion is that BLOBs are *really* bad for table scans in MyISAM _and_ InnoDB, although perhaps slightly less so for InnoDB. I guess the BLOB data must get interleaved with the non-BLOB data on disk, and that putting the BLOBs into a different table may cause some sort of partitioning between the tables on disk.

That may help. One thing you may want to do is re-run the above with an index on the revision column for both table types with and without BLOBs. I'm betting that you'll see little deviation between the BLOB-equipped tables and the BLOB-less ones. One thing you might want to investigate is storing your BLOBs in a MyISAM table and your other data in an InnoDB table.

You have to be impressed with InnoDB though - it's a full-on multiversioned storage engine and it manages to keep pace with MyISAM (and outperform it on occassion). Considering MyISAM's design brief was to be sleek, fast and space efficient (which gives it further speed advantages), InnoDB's comparitive overhead is hidden extremely well. Unfortunately, I really started using MySQL well after NuSphere buggered up everything and retired the Gemini table type - it would have been interesting to see all 4 table types be compared with all the nice speed improvements 4.0.x has brough along.

I'm guessing that the real problem here is that some of my queries are secretly doing table scans when they shouldn't, and that is causing a huge slowdown, although I might split the BLOB column into a different table such that table scans don't give me such a bad hit if they do happen.

There is a table I query often, with two ENUM columns, a TIMESTAMP column and a TEXT column. I often need to throw queries at it that look for substrings in the TEXT column for rows between two dates with certain values for the ENUM columns. There are indexes on the ENUM, TIMESTAMP and TEXT columns (prefix indexes of course). Unless I specify FORCE INDEX(date), MySQL does a full table scan, resulting in queries that take 2 minutes to complete. Using the FORCE INDEX(date) predicate, the query time drops to about 2 seconds.

Thanks,

Mike

You're welcome! I hope that this helps you out!

Regards,

Chris

-----Original Message-----
From: Chris Nolan [mailto:[EMAIL PROTECTED]
Sent: 12 February 2004 12:23
To: Michael McTernan
Cc: Benoit St-Jean; Mysql
Subject: Re: InnoDb Table Performance problem

Michael McTernan wrote:

Hi,

SELECT COUNT(*) for InnoDB tables is a know problem...  The table
handler (for InnoDB) has to do a table scan to count all rows...  This
particular case is optimized with MyISAM ...

Sure. But why is the tablescan ~100 times faster for the table

without the

BLOB column?

I would try to sum it up like this:

1. Discs work best when used as purely sequential devices.
2. Inserting those BLOBs takes up space somewhere. Strictly speaking, on
the disc they may very well be in between certain rows.
3. As you may need to read over them, the disc has to seek much further
and depending on the way BLOBs are placed inside the tablespace, seeking
may be all over the place.

Have you tried the following to see the results:

ALTER TABLE files TYPE = MyISAM;
(Remove all indexes)
SELECT SUM(revision) FROM files;

You may want to execute the above select on your InnoDB version of the
table as well.

Generally speaking, the execution time of COUNT(*) doesn't tell you very
much unless you are a bit more specific:

SELECT COUNT(revision) FROM files WHERE revision BETWEEN '1' AND '2600';

I am prepared to be shot down for being totally incorrect!

Regards,

Chris

This goes back to my original assumption that I thought the BLOB contents would be stored outside of the normal column data, meaning that

you have to

do an additional seek to retrieve a BLOB value, but that a BLOB column doesn't inflate the data over which a table scan has to iterate and so doesn't impact tablescan performance. Is this wrong? Or is

this different

for InnoDB vs MyISAM?

Of course, I'm aiming not to use table scans at all though :)

Thanks,

Mike
-----Original Message-----
From: Benoit St-Jean [mailto:[EMAIL PROTECTED]
Sent: 11 February 2004 22:47
To: Michael McTernan
Cc: Mysql
Subject: Re: InnoDb Table Performance problem
Michael McTernan wrote:

Hi there,

I'm making something similar to a file revision control

system, and using

MySQL on Linux as the database to drive it.  Almost all my tables are
InnoDB, and generally it is going very well, with the exception

of one table

that is always very slow.

This table holds the files within the database. It is defined

as follows:

CREATE TABLE `files` (
`id` int(10) unsigned NOT NULL auto_increment,
`revision` int(10) unsigned NOT NULL default '0',
`filenameid` int(10) unsigned NOT NULL default '0',
`pathid` int(10) unsigned NOT NULL default '0',
`extensionid` int(10) unsigned NOT NULL default '0',
`isDeleted` enum('0','1') NOT NULL default '0',
`filepathname` tinytext NOT NULL,
`contentsGz` longblob NOT NULL,
PRIMARY KEY  (`id`),
UNIQUE KEY `revision` (`revision`,`filepathname`(255)),
KEY `fpeindex` (`filenameid`,`pathid`,`extensionid`,`revision`),
KEY `filepathname` (`filepathname`(255)),
CONSTRAINT `0_3570` FOREIGN KEY (`filenameid`) REFERENCES `filenames`
(`id`),
CONSTRAINT `0_3571` FOREIGN KEY (`extensionid`) REFERENCES
`fileextensions` (`id`),
CONSTRAINT `0_3572` FOREIGN KEY (`pathid`) REFERENCES

`filepaths` (`id`)

) TYPE=InnoDB;

The 'contentsGz' column will have the contents of the file and will
typically be a couple of hundred kilobytes, but in some rare

cases as large

as 20 Megabytes.

Selects on this table always go very slowly. I've used EXPLAIN

to look at

what is going on, and carefully added a couple of multi-column

indexes that

have improved SELECT performance (this table is updated rarely,

so I'm not

too worried about INSERT performance). However, the

performance is still

really bad.

I tried creating an identical table with the exception that it

doesn't have

the 'contentsGz' column:

CREATE TABLE `filemetadata` (
`id` int(10) unsigned NOT NULL auto_increment,
`revision` int(10) unsigned NOT NULL default '0',
`filenameid` int(10) unsigned NOT NULL default '0',
`pathid` int(10) unsigned NOT NULL default '0',
`extensionid` int(10) unsigned NOT NULL default '0',
`isDeleted` enum('0','1') NOT NULL default '0',
`filepathname` tinytext NOT NULL,
PRIMARY KEY  (`id`),
UNIQUE KEY `revision` (`revision`,`filepathname`(255)),
KEY `fpeindex` (`filenameid`,`pathid`,`extensionid`,`revision`),
KEY `filepathname` (`filepathname`(255)),
CONSTRAINT `0_3651` FOREIGN KEY (`filenameid`) REFERENCES `filenames`
(`id`),
CONSTRAINT `0_3652` FOREIGN KEY (`extensionid`) REFERENCES
`fileextensions` (`id`),
CONSTRAINT `0_3653` FOREIGN KEY (`pathid`) REFERENCES

`filepaths` (`id`)

) TYPE=InnoDB;

I used UPDATE ... SELECT to copy all data from the 'files' table to
'filemetadata'.

Here is something I found suprising:

mysql> SELECT COUNT(1) FROM files;
+----------+
| COUNT(1) |
+----------+
|   101013 |
+----------+
1 row in set (32.42 sec)

mysql> SELECT COUNT(1) FROM filemetadata;
+----------+
| COUNT(1) |
+----------+
|   101013 |
+----------+
1 row in set (0.29 sec)

SELECT COUNT(*) for InnoDB tables is a know problem...  The table
handler (for InnoDB) has to do a table scan to count all rows...  This
particular case is optimized with MyISAM ...

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: InnoDb Table Performance problem

Reply via email to