Hi Chris,

> A HEX editor? :-) Finding out what the developers at MySQL AB like to
> drink and sending them a few megalitres of it? :-)

Heh heh :-)

> In all seriousness, you haven't got a great many options (although I
> could be completely wrong, in which case some kind person on this list
> will enlighten / flame me), but using OPTIMIZE TABLE on MyISAM tables
> and ALTER TABLE TYPE = InnoDB on InnoDB tables will defrag your table
> for you, possibly helping matters.

I'm pretty sure that the new tables I created for the test would have been
defraged since I ALTER'd them a couple of times to drop indices.

> InnoDB looks at tablespace files and raw partitions in the same way and
> manages space within them in a method related to that used by the
> Berkeley Fast File System.

Sounds like it is already very efficient then.

> It *might* have some more
> options to play with if you wrap a few INSERTs in a transaction block.

That's a good idea - I could certainly try that.  A future version of InnoDb
may also take advantage of this I guess :)

> That may help. One thing you may want to do is re-run the above with an
> index on the revision column for both table types with and without
> BLOBs.

I did try adding indices, and you are correct, it brings the times right
down, so it is only table scan performance that is at fault here.

> You have to be impressed with InnoDB though - it's a full-on
> multiversioned storage engine and it manages to keep pace with MyISAM
> (and outperform it on occassion).

Completely.  I'm impressed with the whole lot of MySQL - it's brilliant.

> Using the FORCE INDEX(date) predicate, the query time drops to about 2
seconds.

I've had to do this in the past too.  I'll be looking out for slow queries
in the logs and then EXPLAINing them to see where I can make this
improvement.

Thankyou for all your help.  While there is no simple solution, I'm a lot
more sure of my options now :)

Cheers,

Mike

> -----Original Message-----
> From: Chris Nolan [mailto:[EMAIL PROTECTED]
> Sent: 12 February 2004 17:28
> To: Michael McTernan
> Cc: Mysql; Benoit St-Jean
> Subject: Re: InnoDb Table Performance problem
>
>
> Michael McTernan wrote:
>
> >Hi there,
> >
> >
> >
> >>I would try to sum it up like this:
> >>
> >>1. Discs work best when used as purely sequential devices.
> >>2. Inserting those BLOBs takes up space somewhere. Strictly speaking, on
> >>the disc they may very well be in between certain rows.
> >>
> >>
> >
> >Good points - I guess there is no way for the database to
> control where on
> >the file system the data ends up.  I wonder if using a raw
> parition for the
> >database under InnoDB would fix this?  I guess that would give
> the storage
> >engine the opportunity to put the BLOB data in a heap at one end
> of a disk
> >while keeping the table data together at the other, although this sounds
> >like a lot of work for the storage engine.  What would be good would be a
> >tool to show how 'fragmented' a database table is on disk?
> >
> >
> >
> A HEX editor? :-) Finding out what the developers at MySQL AB like to
> drink and sending them a few megalitres of it? :-)
>
> In all seriousness, you haven't got a great many options (although I
> could be completely wrong, in which case some kind person on this list
> will enlighten / flame me), but using OPTIMIZE TABLE on MyISAM tables
> and ALTER TABLE TYPE = InnoDB on InnoDB tables will defrag your table
> for you, possibly helping matters. In each case, the table has a SHARE
> MODE lock set (read only), copies all existing rows to a new table such
> that they are positioned optimally, drops the old table and renames the
> new table to the original name.
>
> InnoDB looks at tablespace files and raw partitions in the same way and
> manages space within them in a method related to that used by the
> Berkeley Fast File System. The only advantage of raw partition usage is
> bypassing filesystem logic which is very beneficial on some OSes (not
> Linux or any of the BSDs to my knowledge though - their FS layers are
> damned quick!).
>
> >>3. As you may need to read over them, the disc has to seek much further
> >>and depending on the way BLOBs are placed inside the tablespace, seeking
> >>may be all over the place.
> >>
> >>
> >
> >Another good point.  It maybe that by creating the
> 'filemetadata' table in
> >one go I've just ensured that most of the columns are linear on the disc.
> >In which case re-creating the 'files' table and then adding the BLOB data
> >back as a second step might fix this.  My experiments show this
> is not the
> >case though...
> >
> >
> If you're doing INSERTs in AUTO_COMMIT mode, then InnoDB will have no
> choice but to put the BLOBs where it can. It *might* have some more
> options to play with if you wrap a few INSERTs in a transaction block.
>
> >
> >
> >>Have you tried the following to see the results:
> >>
> >>ALTER TABLE files TYPE = MyISAM;
> >>(Remove all indexes)
> >>SELECT SUM(revision) FROM files;
> >>
> >>
> >
> >Okay.  I've made two new tables, one called 'myfiles' and one called
> >'inofiles' which are both duplicates of 'files' without any
> indices except
> >for the primary key.  Here are the tables I'm comparing:
> >
> >CREATE TABLE `myfiles` (
> >  `id` int(10) unsigned NOT NULL auto_increment,
> >  `revision` int(10) unsigned NOT NULL default '0',
> >  `filenameid` int(10) unsigned NOT NULL default '0',
> >  `pathid` int(10) unsigned NOT NULL default '0',
> >  `extensionid` int(10) unsigned NOT NULL default '0',
> >  `isDeleted` enum('0','1') NOT NULL default '0',
> >  `filepathname` tinytext NOT NULL,
> >  `contentsGz` longblob NOT NULL,
> >  PRIMARY KEY  (`id`)
> >) TYPE=MyISAM;
> >
> >CREATE TABLE `inofiles` (
> >  `id` int(10) unsigned NOT NULL auto_increment,
> >  `revision` int(10) unsigned NOT NULL default '0',
> >  `filenameid` int(10) unsigned NOT NULL default '0',
> >  `pathid` int(10) unsigned NOT NULL default '0',
> >  `extensionid` int(10) unsigned NOT NULL default '0',
> >  `isDeleted` enum('0','1') NOT NULL default '0',
> >  `filepathname` tinytext NOT NULL,
> >  `contentsGz` longblob NOT NULL,
> >  PRIMARY KEY  (`id`)
> >) TYPE=InnoDB;
> >
> >
> >Here's the results for the select statement you suggest:
> >
> >[Restart MySQL]
> >mysql> SELECT SUM(revision) FROM inofiles;
> >+---------------+
> >| SUM(revision) |
> >+---------------+
> >|        561401 |
> >+---------------+
> >1 row in set (33.61 sec)
> >
> >mysql> SELECT SUM(revision) from inofiles;
> >+---------------+
> >| SUM(revision) |
> >+---------------+
> >|        561401 |
> >+---------------+
> >1 row in set (31.79 sec)
> >
> >[Restart MySQL]
> >mysql> SELECT SUM(revision) FROM myfiles;
> >+---------------+
> >| SUM(revision) |
> >+---------------+
> >|        561401 |
> >+---------------+
> >1 row in set (1 min 7.86 sec)
> >
> >
> >mysql> SELECT SUM(revision) from myfiles;
> >+---------------+
> >| SUM(revision) |
> >+---------------+
> >|        561401 |
> >+---------------+
> >1 row in set (1 min 7.83 sec)
> >
> >I restarted MySQL before running each test, and then run the test twice,
> >being careful to change the case of the SELECT statements to
> avoid hitting
> >the query cache.  Strangely InnoDb is more than twice as fast...
> >
> >Now, if I drop the BLOB column 'contentsGz' from both 'inofiles' and
> >'myfiles' and repeat:
> >
> >[Restart database]
> >mysql> SELECT SUM(revision) FROM myfiles;
> >+---------------+
> >| SUM(revision) |
> >+---------------+
> >|        561401 |
> >+---------------+
> >1 row in set (0.34 sec)
> >
> >mysql> SELECT SUM(revision) from myfiles;
> >+---------------+
> >| SUM(revision) |
> >+---------------+
> >|        561401 |
> >+---------------+
> >1 row in set (0.18 sec)
> >
> >[Restart database]
> >mysql> SELECT SUM(revision) FROM inofiles;
> >+---------------+
> >| SUM(revision) |
> >+---------------+
> >|        561401 |
> >+---------------+
> >1 row in set (0.29 sec)
> >
> >mysql> SELECT SUM(revision) from inofiles;
> >+---------------+
> >| SUM(revision) |
> >+---------------+
> >|        561401 |
> >+---------------+
> >1 row in set (0.10 sec)
> >
> >So I guess the conclusion is that BLOBs are *really* bad for
> table scans in
> >MyISAM _and_ InnoDB, although perhaps slightly less so for
> InnoDB.  I guess
> >the BLOB data must get interleaved with the non-BLOB data on
> disk, and that
> >putting the BLOBs into a different table may cause some sort of
> partitioning
> >between the tables on disk.
> >
> >
> That may help. One thing you may want to do is re-run the above with an
> index on the revision column for both table types with and without
> BLOBs. I'm betting that you'll see little deviation between the
> BLOB-equipped tables and the BLOB-less ones. One thing you might want to
> investigate is storing your BLOBs in a MyISAM table and your other data
> in an InnoDB table.
>
> You have to be impressed with InnoDB though - it's a full-on
> multiversioned storage engine and it manages to keep pace with MyISAM
> (and outperform it on occassion). Considering MyISAM's design brief was
> to be sleek, fast and space efficient (which gives it further speed
> advantages), InnoDB's comparitive overhead is hidden extremely well.
> Unfortunately, I really started using MySQL well after NuSphere buggered
> up everything and retired the Gemini table type - it would have been
> interesting to see all 4 table types be compared with all the nice speed
> improvements 4.0.x has brough along.
>
> >I'm guessing that the real problem here is that some of my queries are
> >secretly doing table scans when they shouldn't, and that is
> causing a huge
> >slowdown, although I might split the BLOB column into a
> different table such
> >that table scans don't give me such a bad hit if they do happen.
> >
> >
> There is a table I query often, with two ENUM columns, a TIMESTAMP
> column and a TEXT column. I often need to throw queries at it that look
> for substrings in the TEXT column for rows between two dates with
> certain values for the ENUM columns. There are indexes on the ENUM,
> TIMESTAMP and TEXT columns (prefix indexes of course). Unless I specify
> FORCE INDEX(date), MySQL does a full table scan, resulting in queries
> that take 2 minutes to complete. Using the FORCE INDEX(date) predicate,
> the query time drops to about 2 seconds.
>
> >Thanks,
> >
> >Mike
> >
> >
> >
> You're welcome! I hope that this helps you out!
>
> Regards,
>
> Chris
>
> >>-----Original Message-----
> >>From: Chris Nolan [mailto:[EMAIL PROTECTED]
> >>Sent: 12 February 2004 12:23
> >>To: Michael McTernan
> >>Cc: Benoit St-Jean; Mysql
> >>Subject: Re: InnoDb Table Performance problem
> >>
> >>
> >>Michael McTernan wrote:
> >>
> >>
> >>
> >>>Hi,
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>>SELECT COUNT(*) for InnoDB tables is a know problem...  The table
> >>>>handler (for InnoDB) has to do a table scan to count all rows...  This
> >>>>particular case is optimized with MyISAM ...
> >>>>
> >>>>
> >>>>
> >>>>
> >>>Sure.  But why is the tablescan ~100 times faster for the table
> >>>
> >>>
> >>without the
> >>
> >>
> >>>BLOB column?
> >>>
> >>>
> >>>
> >>>
> >>I would try to sum it up like this:
> >>
> >>1. Discs work best when used as purely sequential devices.
> >>2. Inserting those BLOBs takes up space somewhere. Strictly speaking, on
> >>the disc they may very well be in between certain rows.
> >>3. As you may need to read over them, the disc has to seek much further
> >>and depending on the way BLOBs are placed inside the tablespace, seeking
> >>may be all over the place.
> >>
> >>Have you tried the following to see the results:
> >>
> >>ALTER TABLE files TYPE = MyISAM;
> >>(Remove all indexes)
> >>SELECT SUM(revision) FROM files;
> >>
> >>You may want to execute the above select on your InnoDB version of the
> >>table as well.
> >>
> >>Generally speaking, the execution time of COUNT(*) doesn't tell you very
> >>much unless you are a bit more specific:
> >>
> >>SELECT COUNT(revision) FROM files WHERE revision BETWEEN '1' AND '2600';
> >>
> >>I am prepared to be shot down for being totally incorrect!
> >>
> >>Regards,
> >>
> >>Chris
> >>
> >>
> >>
> >>>This goes back to my original assumption that I thought the
> BLOB contents
> >>>would be stored outside of the normal column data, meaning that
> >>>
> >>>
> >>you have to
> >>
> >>
> >>>do an additional seek to retrieve a BLOB value, but that a BLOB column
> >>>doesn't inflate the data over which a table scan has to iterate and so
> >>>doesn't impact tablescan performance.  Is this wrong?  Or is
> >>>
> >>>
> >>this different
> >>
> >>
> >>>for InnoDB vs MyISAM?
> >>>
> >>>Of course, I'm aiming not to use table scans at all though :)
> >>>
> >>>Thanks,
> >>>
> >>>Mike
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>>-----Original Message-----
> >>>>From: Benoit St-Jean [mailto:[EMAIL PROTECTED]
> >>>>Sent: 11 February 2004 22:47
> >>>>To: Michael McTernan
> >>>>Cc: Mysql
> >>>>Subject: Re: InnoDb Table Performance problem
> >>>>
> >>>>
> >>>>Michael McTernan wrote:
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>Hi there,
> >>>>>
> >>>>>I'm making something similar to a file revision control
> >>>>>
> >>>>>
> >>system, and using
> >>
> >>
> >>>>>MySQL on Linux as the database to drive it.  Almost all my tables are
> >>>>>InnoDB, and generally it is going very well, with the exception
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>of one table
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>that is always very slow.
> >>>>>
> >>>>>This table holds the files within the database.  It is defined
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>as follows:
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>CREATE TABLE `files` (
> >>>>>`id` int(10) unsigned NOT NULL auto_increment,
> >>>>>`revision` int(10) unsigned NOT NULL default '0',
> >>>>>`filenameid` int(10) unsigned NOT NULL default '0',
> >>>>>`pathid` int(10) unsigned NOT NULL default '0',
> >>>>>`extensionid` int(10) unsigned NOT NULL default '0',
> >>>>>`isDeleted` enum('0','1') NOT NULL default '0',
> >>>>>`filepathname` tinytext NOT NULL,
> >>>>>`contentsGz` longblob NOT NULL,
> >>>>>PRIMARY KEY  (`id`),
> >>>>>UNIQUE KEY `revision` (`revision`,`filepathname`(255)),
> >>>>>KEY `fpeindex` (`filenameid`,`pathid`,`extensionid`,`revision`),
> >>>>>KEY `filepathname` (`filepathname`(255)),
> >>>>>CONSTRAINT `0_3570` FOREIGN KEY (`filenameid`) REFERENCES `filenames`
> >>>>>(`id`),
> >>>>>CONSTRAINT `0_3571` FOREIGN KEY (`extensionid`) REFERENCES
> >>>>>`fileextensions` (`id`),
> >>>>>CONSTRAINT `0_3572` FOREIGN KEY (`pathid`) REFERENCES
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>`filepaths` (`id`)
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>) TYPE=InnoDB;
> >>>>>
> >>>>>The 'contentsGz' column will have the contents of the file and will
> >>>>>typically be a couple of hundred kilobytes, but in some rare
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>cases as large
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>as 20 Megabytes.
> >>>>>
> >>>>>Selects on this table always go very slowly.  I've used EXPLAIN
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>to look at
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>what is going on, and carefully added a couple of multi-column
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>indexes that
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>have improved SELECT performance (this table is updated rarely,
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>so I'm not
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>too worried about INSERT performance).  However, the
> >>>>>
> >>>>>
> >>performance is still
> >>
> >>
> >>>>>really bad.
> >>>>>
> >>>>>I tried creating an identical table with the exception that it
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>doesn't have
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>the 'contentsGz' column:
> >>>>>
> >>>>>CREATE TABLE `filemetadata` (
> >>>>>`id` int(10) unsigned NOT NULL auto_increment,
> >>>>>`revision` int(10) unsigned NOT NULL default '0',
> >>>>>`filenameid` int(10) unsigned NOT NULL default '0',
> >>>>>`pathid` int(10) unsigned NOT NULL default '0',
> >>>>>`extensionid` int(10) unsigned NOT NULL default '0',
> >>>>>`isDeleted` enum('0','1') NOT NULL default '0',
> >>>>>`filepathname` tinytext NOT NULL,
> >>>>>PRIMARY KEY  (`id`),
> >>>>>UNIQUE KEY `revision` (`revision`,`filepathname`(255)),
> >>>>>KEY `fpeindex` (`filenameid`,`pathid`,`extensionid`,`revision`),
> >>>>>KEY `filepathname` (`filepathname`(255)),
> >>>>>CONSTRAINT `0_3651` FOREIGN KEY (`filenameid`) REFERENCES `filenames`
> >>>>>(`id`),
> >>>>>CONSTRAINT `0_3652` FOREIGN KEY (`extensionid`) REFERENCES
> >>>>>`fileextensions` (`id`),
> >>>>>CONSTRAINT `0_3653` FOREIGN KEY (`pathid`) REFERENCES
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>`filepaths` (`id`)
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>) TYPE=InnoDB;
> >>>>>
> >>>>>I used UPDATE ... SELECT to copy all data from the 'files' table to
> >>>>>'filemetadata'.
> >>>>>
> >>>>>Here is something I found suprising:
> >>>>>
> >>>>>mysql> SELECT COUNT(1) FROM files;
> >>>>>+----------+
> >>>>>| COUNT(1) |
> >>>>>+----------+
> >>>>>|   101013 |
> >>>>>+----------+
> >>>>>1 row in set (32.42 sec)
> >>>>>
> >>>>>mysql> SELECT COUNT(1) FROM filemetadata;
> >>>>>+----------+
> >>>>>| COUNT(1) |
> >>>>>+----------+
> >>>>>|   101013 |
> >>>>>+----------+
> >>>>>1 row in set (0.29 sec)
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>SELECT COUNT(*) for InnoDB tables is a know problem...  The table
> >>>>handler (for InnoDB) has to do a table scan to count all rows...  This
> >>>>particular case is optimized with MyISAM ...
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>--
> >>>>MySQL General Mailing List
> >>>>For list archives: http://lists.mysql.com/mysql
> >>>>To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >>
> >>
> >
> >
> >
> >
>
>
>
>
>



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to