Michael McTernan wrote:

Hi there,



I would try to sum it up like this:

1. Discs work best when used as purely sequential devices.
2. Inserting those BLOBs takes up space somewhere. Strictly speaking, on
the disc they may very well be in between certain rows.



Good points - I guess there is no way for the database to control where on the file system the data ends up. I wonder if using a raw parition for the database under InnoDB would fix this? I guess that would give the storage engine the opportunity to put the BLOB data in a heap at one end of a disk while keeping the table data together at the other, although this sounds like a lot of work for the storage engine. What would be good would be a tool to show how 'fragmented' a database table is on disk?



A HEX editor? :-) Finding out what the developers at MySQL AB like to drink and sending them a few megalitres of it? :-)

In all seriousness, you haven't got a great many options (although I could be completely wrong, in which case some kind person on this list will enlighten / flame me), but using OPTIMIZE TABLE on MyISAM tables and ALTER TABLE TYPE = InnoDB on InnoDB tables will defrag your table for you, possibly helping matters. In each case, the table has a SHARE MODE lock set (read only), copies all existing rows to a new table such that they are positioned optimally, drops the old table and renames the new table to the original name.

InnoDB looks at tablespace files and raw partitions in the same way and manages space within them in a method related to that used by the Berkeley Fast File System. The only advantage of raw partition usage is bypassing filesystem logic which is very beneficial on some OSes (not Linux or any of the BSDs to my knowledge though - their FS layers are damned quick!).

3. As you may need to read over them, the disc has to seek much further
and depending on the way BLOBs are placed inside the tablespace, seeking
may be all over the place.



Another good point. It maybe that by creating the 'filemetadata' table in
one go I've just ensured that most of the columns are linear on the disc.
In which case re-creating the 'files' table and then adding the BLOB data
back as a second step might fix this. My experiments show this is not the
case though...


If you're doing INSERTs in AUTO_COMMIT mode, then InnoDB will have no choice but to put the BLOBs where it can. It *might* have some more options to play with if you wrap a few INSERTs in a transaction block.



Have you tried the following to see the results:

ALTER TABLE files TYPE = MyISAM;
(Remove all indexes)
SELECT SUM(revision) FROM files;



Okay. I've made two new tables, one called 'myfiles' and one called 'inofiles' which are both duplicates of 'files' without any indices except for the primary key. Here are the tables I'm comparing:

CREATE TABLE `myfiles` (
 `id` int(10) unsigned NOT NULL auto_increment,
 `revision` int(10) unsigned NOT NULL default '0',
 `filenameid` int(10) unsigned NOT NULL default '0',
 `pathid` int(10) unsigned NOT NULL default '0',
 `extensionid` int(10) unsigned NOT NULL default '0',
 `isDeleted` enum('0','1') NOT NULL default '0',
 `filepathname` tinytext NOT NULL,
 `contentsGz` longblob NOT NULL,
 PRIMARY KEY  (`id`)
) TYPE=MyISAM;

CREATE TABLE `inofiles` (
 `id` int(10) unsigned NOT NULL auto_increment,
 `revision` int(10) unsigned NOT NULL default '0',
 `filenameid` int(10) unsigned NOT NULL default '0',
 `pathid` int(10) unsigned NOT NULL default '0',
 `extensionid` int(10) unsigned NOT NULL default '0',
 `isDeleted` enum('0','1') NOT NULL default '0',
 `filepathname` tinytext NOT NULL,
 `contentsGz` longblob NOT NULL,
 PRIMARY KEY  (`id`)
) TYPE=InnoDB;


Here's the results for the select statement you suggest:


[Restart MySQL]
mysql> SELECT SUM(revision) FROM inofiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (33.61 sec)

mysql> SELECT SUM(revision) from inofiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (31.79 sec)

[Restart MySQL]
mysql> SELECT SUM(revision) FROM myfiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (1 min 7.86 sec)


mysql> SELECT SUM(revision) from myfiles; +---------------+ | SUM(revision) | +---------------+ | 561401 | +---------------+ 1 row in set (1 min 7.83 sec)

I restarted MySQL before running each test, and then run the test twice,
being careful to change the case of the SELECT statements to avoid hitting
the query cache.  Strangely InnoDb is more than twice as fast...

Now, if I drop the BLOB column 'contentsGz' from both 'inofiles' and
'myfiles' and repeat:

[Restart database]
mysql> SELECT SUM(revision) FROM myfiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (0.34 sec)

mysql> SELECT SUM(revision) from myfiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (0.18 sec)

[Restart database]
mysql> SELECT SUM(revision) FROM inofiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (0.29 sec)

mysql> SELECT SUM(revision) from inofiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (0.10 sec)

So I guess the conclusion is that BLOBs are *really* bad for table scans in
MyISAM _and_ InnoDB, although perhaps slightly less so for InnoDB. I guess
the BLOB data must get interleaved with the non-BLOB data on disk, and that
putting the BLOBs into a different table may cause some sort of partitioning
between the tables on disk.


That may help. One thing you may want to do is re-run the above with an index on the revision column for both table types with and without BLOBs. I'm betting that you'll see little deviation between the BLOB-equipped tables and the BLOB-less ones. One thing you might want to investigate is storing your BLOBs in a MyISAM table and your other data in an InnoDB table.

You have to be impressed with InnoDB though - it's a full-on multiversioned storage engine and it manages to keep pace with MyISAM (and outperform it on occassion). Considering MyISAM's design brief was to be sleek, fast and space efficient (which gives it further speed advantages), InnoDB's comparitive overhead is hidden extremely well. Unfortunately, I really started using MySQL well after NuSphere buggered up everything and retired the Gemini table type - it would have been interesting to see all 4 table types be compared with all the nice speed improvements 4.0.x has brough along.

I'm guessing that the real problem here is that some of my queries are
secretly doing table scans when they shouldn't, and that is causing a huge
slowdown, although I might split the BLOB column into a different table such
that table scans don't give me such a bad hit if they do happen.


There is a table I query often, with two ENUM columns, a TIMESTAMP column and a TEXT column. I often need to throw queries at it that look for substrings in the TEXT column for rows between two dates with certain values for the ENUM columns. There are indexes on the ENUM, TIMESTAMP and TEXT columns (prefix indexes of course). Unless I specify FORCE INDEX(date), MySQL does a full table scan, resulting in queries that take 2 minutes to complete. Using the FORCE INDEX(date) predicate, the query time drops to about 2 seconds.

Thanks,

Mike



You're welcome! I hope that this helps you out!

Regards,

Chris

-----Original Message-----
From: Chris Nolan [mailto:[EMAIL PROTECTED]
Sent: 12 February 2004 12:23
To: Michael McTernan
Cc: Benoit St-Jean; Mysql
Subject: Re: InnoDb Table Performance problem


Michael McTernan wrote:




Hi,





SELECT COUNT(*) for InnoDB tables is a know problem...  The table
handler (for InnoDB) has to do a table scan to count all rows...  This
particular case is optimized with MyISAM ...




Sure. But why is the tablescan ~100 times faster for the table


without the


BLOB column?




I would try to sum it up like this:

1. Discs work best when used as purely sequential devices.
2. Inserting those BLOBs takes up space somewhere. Strictly speaking, on
the disc they may very well be in between certain rows.
3. As you may need to read over them, the disc has to seek much further
and depending on the way BLOBs are placed inside the tablespace, seeking
may be all over the place.

Have you tried the following to see the results:

ALTER TABLE files TYPE = MyISAM;
(Remove all indexes)
SELECT SUM(revision) FROM files;

You may want to execute the above select on your InnoDB version of the
table as well.

Generally speaking, the execution time of COUNT(*) doesn't tell you very
much unless you are a bit more specific:

SELECT COUNT(revision) FROM files WHERE revision BETWEEN '1' AND '2600';

I am prepared to be shot down for being totally incorrect!

Regards,

Chris



This goes back to my original assumption that I thought the BLOB contents
would be stored outside of the normal column data, meaning that


you have to


do an additional seek to retrieve a BLOB value, but that a BLOB column
doesn't inflate the data over which a table scan has to iterate and so
doesn't impact tablescan performance. Is this wrong? Or is


this different


for InnoDB vs MyISAM?

Of course, I'm aiming not to use table scans at all though :)

Thanks,

Mike





-----Original Message-----
From: Benoit St-Jean [mailto:[EMAIL PROTECTED]
Sent: 11 February 2004 22:47
To: Michael McTernan
Cc: Mysql
Subject: Re: InnoDb Table Performance problem


Michael McTernan wrote:






Hi there,

I'm making something similar to a file revision control


system, and using


MySQL on Linux as the database to drive it.  Almost all my tables are
InnoDB, and generally it is going very well, with the exception




of one table




that is always very slow.

This table holds the files within the database. It is defined




as follows:




CREATE TABLE `files` (
`id` int(10) unsigned NOT NULL auto_increment,
`revision` int(10) unsigned NOT NULL default '0',
`filenameid` int(10) unsigned NOT NULL default '0',
`pathid` int(10) unsigned NOT NULL default '0',
`extensionid` int(10) unsigned NOT NULL default '0',
`isDeleted` enum('0','1') NOT NULL default '0',
`filepathname` tinytext NOT NULL,
`contentsGz` longblob NOT NULL,
PRIMARY KEY  (`id`),
UNIQUE KEY `revision` (`revision`,`filepathname`(255)),
KEY `fpeindex` (`filenameid`,`pathid`,`extensionid`,`revision`),
KEY `filepathname` (`filepathname`(255)),
CONSTRAINT `0_3570` FOREIGN KEY (`filenameid`) REFERENCES `filenames`
(`id`),
CONSTRAINT `0_3571` FOREIGN KEY (`extensionid`) REFERENCES
`fileextensions` (`id`),
CONSTRAINT `0_3572` FOREIGN KEY (`pathid`) REFERENCES




`filepaths` (`id`)




) TYPE=InnoDB;

The 'contentsGz' column will have the contents of the file and will
typically be a couple of hundred kilobytes, but in some rare




cases as large




as 20 Megabytes.

Selects on this table always go very slowly. I've used EXPLAIN




to look at




what is going on, and carefully added a couple of multi-column




indexes that




have improved SELECT performance (this table is updated rarely,




so I'm not




too worried about INSERT performance). However, the


performance is still


really bad.

I tried creating an identical table with the exception that it




doesn't have




the 'contentsGz' column:

CREATE TABLE `filemetadata` (
`id` int(10) unsigned NOT NULL auto_increment,
`revision` int(10) unsigned NOT NULL default '0',
`filenameid` int(10) unsigned NOT NULL default '0',
`pathid` int(10) unsigned NOT NULL default '0',
`extensionid` int(10) unsigned NOT NULL default '0',
`isDeleted` enum('0','1') NOT NULL default '0',
`filepathname` tinytext NOT NULL,
PRIMARY KEY  (`id`),
UNIQUE KEY `revision` (`revision`,`filepathname`(255)),
KEY `fpeindex` (`filenameid`,`pathid`,`extensionid`,`revision`),
KEY `filepathname` (`filepathname`(255)),
CONSTRAINT `0_3651` FOREIGN KEY (`filenameid`) REFERENCES `filenames`
(`id`),
CONSTRAINT `0_3652` FOREIGN KEY (`extensionid`) REFERENCES
`fileextensions` (`id`),
CONSTRAINT `0_3653` FOREIGN KEY (`pathid`) REFERENCES




`filepaths` (`id`)




) TYPE=InnoDB;

I used UPDATE ... SELECT to copy all data from the 'files' table to
'filemetadata'.

Here is something I found suprising:

mysql> SELECT COUNT(1) FROM files;
+----------+
| COUNT(1) |
+----------+
|   101013 |
+----------+
1 row in set (32.42 sec)

mysql> SELECT COUNT(1) FROM filemetadata;
+----------+
| COUNT(1) |
+----------+
|   101013 |
+----------+
1 row in set (0.29 sec)






SELECT COUNT(*) for InnoDB tables is a know problem...  The table
handler (for InnoDB) has to do a table scan to count all rows...  This
particular case is optimized with MyISAM ...










--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

























-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]



Reply via email to