Hi there,

> I would try to sum it up like this:
>
> 1. Discs work best when used as purely sequential devices.
> 2. Inserting those BLOBs takes up space somewhere. Strictly speaking, on
> the disc they may very well be in between certain rows.

Good points - I guess there is no way for the database to control where on
the file system the data ends up.  I wonder if using a raw partition for
the database under InnoDB would fix this?  I guess that would give the
storage engine the opportunity to put the BLOB data in a heap at one end
of the disk while keeping the table data together at the other, although
this sounds like a lot of work for the storage engine.  What would be good
is a tool to show how 'fragmented' a database table is on disk.
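
(The closest thing I can think of to such a tool is SHOW TABLE STATUS plus
OPTIMIZE TABLE - Data_free gives a rough idea of how much dead space a
MyISAM data file is carrying, and OPTIMIZE TABLE rewrites it contiguously.
I'm not sure how much either really tells you about an InnoDB table living
in the shared tablespace, so treat this as a rough check only:)

SHOW TABLE STATUS LIKE 'files';  -- compare Data_length against Data_free
OPTIMIZE TABLE files;            -- MyISAM: rewrites the data file contiguously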

> 3. As you may need to read over them, the disc has to seek much further
> and depending on the way BLOBs are placed inside the tablespace, seeking
> may be all over the place.

Another good point.  It may be that by creating the 'filemetadata' table in
one go I've just ensured that its row data is laid out sequentially on
disk.  In which case re-creating the 'files' table and then adding the BLOB
data back as a second step (along the lines of the sketch below) might fix
this.  My experiments suggest this is not the case though...
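
(Concretely, the two-step rebuild I mean is something like the following,
where 'files2' is a hypothetical table with the same definition as 'files',
and the multi-table UPDATE needs MySQL 4.0.2 or later:)

-- Step 1: copy everything except the BLOBs, so the row data gets written
-- out contiguously (contentsGz is left at its implicit empty default):
INSERT INTO files2 (id, revision, filenameid, pathid, extensionid,
                    isDeleted, filepathname)
  SELECT id, revision, filenameid, pathid, extensionid,
         isDeleted, filepathname
    FROM files;

-- Step 2: add the BLOB data back afterwards:
UPDATE files2, files
   SET files2.contentsGz = files.contentsGz
 WHERE files2.id = files.id;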

> Have you tried the following to see the results:
>
> ALTER TABLE files TYPE = MyISAM;
> (Remove all indexes)
> SELECT SUM(revision) FROM files;

Okay.  I've made two new tables, one called 'myfiles' and one called
'inofiles' which are both duplicates of 'files' without any indices except
for the primary key.  Here are the tables I'm comparing:

CREATE TABLE `myfiles` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `revision` int(10) unsigned NOT NULL default '0',
  `filenameid` int(10) unsigned NOT NULL default '0',
  `pathid` int(10) unsigned NOT NULL default '0',
  `extensionid` int(10) unsigned NOT NULL default '0',
  `isDeleted` enum('0','1') NOT NULL default '0',
  `filepathname` tinytext NOT NULL,
  `contentsGz` longblob NOT NULL,
  PRIMARY KEY  (`id`)
) TYPE=MyISAM;

CREATE TABLE `inofiles` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `revision` int(10) unsigned NOT NULL default '0',
  `filenameid` int(10) unsigned NOT NULL default '0',
  `pathid` int(10) unsigned NOT NULL default '0',
  `extensionid` int(10) unsigned NOT NULL default '0',
  `isDeleted` enum('0','1') NOT NULL default '0',
  `filepathname` tinytext NOT NULL,
  `contentsGz` longblob NOT NULL,
  PRIMARY KEY  (`id`)
) TYPE=InnoDB;
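
(Both were populated with a straight copy from 'files' - the column sets
match, so something along these lines:)

INSERT INTO myfiles  SELECT * FROM files;
INSERT INTO inofiles SELECT * FROM files;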


Here are the results for the SELECT statement you suggested:

[Restart MySQL]
mysql> SELECT SUM(revision) FROM inofiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (33.61 sec)

mysql> SELECT SUM(revision) from inofiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (31.79 sec)

[Restart MySQL]
mysql> SELECT SUM(revision) FROM myfiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (1 min 7.86 sec)


mysql> SELECT SUM(revision) from myfiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (1 min 7.83 sec)

I restarted MySQL before running each test, and then ran each test twice,
being careful to change the case of the SELECT statements to avoid hitting
the query cache.  Strangely, InnoDB is more than twice as fast...
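
(If I've understood the query cache correctly, adding SQL_NO_CACHE would be
a tidier way to sidestep it than fiddling with the case, e.g.:)

SELECT SQL_NO_CACHE SUM(revision) FROM inofiles;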

Now, if I drop the BLOB column 'contentsGz' from both 'inofiles' and
'myfiles' and repeat:

[Restart database]
mysql> SELECT SUM(revision) FROM myfiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (0.34 sec)

mysql> SELECT SUM(revision) from myfiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (0.18 sec)

[Restart database]
mysql> SELECT SUM(revision) FROM inofiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (0.29 sec)

mysql> SELECT SUM(revision) from inofiles;
+---------------+
| SUM(revision) |
+---------------+
|        561401 |
+---------------+
1 row in set (0.10 sec)

So I guess the conclusion is that BLOBs are *really* bad for table scans in
MyISAM _and_ InnoDB, although noticeably less so for InnoDB.  I guess the
BLOB data must get interleaved with the non-BLOB row data on disk, so
putting the BLOBs into a separate table should keep the two apart on disk
(along the lines of the sketch below).
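
(Roughly what I have in mind for the split - the 'filecontents' name, the
constraint name and the id value are purely illustrative:)

CREATE TABLE `filecontents` (
  `fileid` int(10) unsigned NOT NULL default '0',
  `contentsGz` longblob NOT NULL,
  PRIMARY KEY  (`fileid`),
  CONSTRAINT `fc_file` FOREIGN KEY (`fileid`) REFERENCES `files` (`id`)
) TYPE=InnoDB;

-- Fetch the contents only when a single file is actually wanted:
SELECT contentsGz FROM filecontents WHERE fileid = 12345;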

I'm guessing that the real problem here is that some of my queries are
silently doing table scans when they shouldn't, and that is causing a huge
slowdown.  Even so, I might split the BLOB column into a separate table so
that any table scans that do still happen don't hurt so badly.
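
(EXPLAIN should flag those - the WHERE clause here is just a made-up
example against the fpeindex columns.  'type: ALL' would mean a full table
scan, whereas 'ref' or 'range' on fpeindex is what I'd want to see:)

EXPLAIN SELECT id, revision
  FROM files
 WHERE filenameid = 42 AND pathid = 7;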

Thanks,

Mike

> -----Original Message-----
> From: Chris Nolan [mailto:[EMAIL PROTECTED]
> Sent: 12 February 2004 12:23
> To: Michael McTernan
> Cc: Benoit St-Jean; Mysql
> Subject: Re: InnoDb Table Performance problem
>
>
> Michael McTernan wrote:
>
> >Hi,
> >
> >
> >
> >>SELECT COUNT(*) for InnoDB tables is a known problem...  The table
> >>handler (for InnoDB) has to do a table scan to count all rows...  This
> >>particular case is optimized with MyISAM ...
> >>
> >>
> >
> >Sure.  But why is the tablescan ~100 times faster for the table
> without the
> >BLOB column?
> >
> >
> I would try to sum it up like this:
>
> 1. Discs work best when used as purely sequential devices.
> 2. Inserting those BLOBs takes up space somewhere. Strictly speaking, on
> the disc they may very well be in between certain rows.
> 3. As you may need to read over them, the disc has to seek much further
> and depending on the way BLOBs are placed inside the tablespace, seeking
> may be all over the place.
>
> Have you tried the following to see the results:
>
> ALTER TABLE files TYPE = MyISAM;
> (Remove all indexes)
> SELECT SUM(revision) FROM files;
>
> You may want to execute the above select on your InnoDB version of the
> table as well.
>
> Generally speaking, the execution time of COUNT(*) doesn't tell you very
> much unless you are a bit more specific:
>
> SELECT COUNT(revision) FROM files WHERE revision BETWEEN '1' AND '2600';
>
> I am prepared to be shot down for being totally incorrect!
>
> Regards,
>
> Chris
>
> >This goes back to my original assumption that I thought the BLOB contents
> >would be stored outside of the normal column data, meaning that
> you have to
> >do an additional seek to retrieve a BLOB value, but that a BLOB column
> >doesn't inflate the data over which a table scan has to iterate and so
> >doesn't impact tablescan performance.  Is this wrong?  Or is
> this different
> >for InnoDB vs MyISAM?
> >
> >Of course, I'm aiming not to use table scans at all though :)
> >
> >Thanks,
> >
> >Mike
> >
> >
> >
> >>-----Original Message-----
> >>From: Benoit St-Jean [mailto:[EMAIL PROTECTED]
> >>Sent: 11 February 2004 22:47
> >>To: Michael McTernan
> >>Cc: Mysql
> >>Subject: Re: InnoDb Table Performance problem
> >>
> >>
> >>Michael McTernan wrote:
> >>
> >>
> >>
> >>>Hi there,
> >>>
> >>>I'm making something similar to a file revision control
> system, and using
> >>>MySQL on Linux as the database to drive it.  Almost all my tables are
> >>>InnoDB, and generally it is going very well, with the exception
> >>>
> >>>
> >>of one table
> >>
> >>
> >>>that is always very slow.
> >>>
> >>>This table holds the files within the database.  It is defined
> >>>
> >>>
> >>as follows:
> >>
> >>
> >>>CREATE TABLE `files` (
> >>> `id` int(10) unsigned NOT NULL auto_increment,
> >>> `revision` int(10) unsigned NOT NULL default '0',
> >>> `filenameid` int(10) unsigned NOT NULL default '0',
> >>> `pathid` int(10) unsigned NOT NULL default '0',
> >>> `extensionid` int(10) unsigned NOT NULL default '0',
> >>> `isDeleted` enum('0','1') NOT NULL default '0',
> >>> `filepathname` tinytext NOT NULL,
> >>> `contentsGz` longblob NOT NULL,
> >>> PRIMARY KEY  (`id`),
> >>> UNIQUE KEY `revision` (`revision`,`filepathname`(255)),
> >>> KEY `fpeindex` (`filenameid`,`pathid`,`extensionid`,`revision`),
> >>> KEY `filepathname` (`filepathname`(255)),
> >>> CONSTRAINT `0_3570` FOREIGN KEY (`filenameid`) REFERENCES `filenames`
> >>>(`id`),
> >>> CONSTRAINT `0_3571` FOREIGN KEY (`extensionid`) REFERENCES
> >>>`fileextensions` (`id`),
> >>> CONSTRAINT `0_3572` FOREIGN KEY (`pathid`) REFERENCES
> >>>
> >>>
> >>`filepaths` (`id`)
> >>
> >>
> >>>) TYPE=InnoDB;
> >>>
> >>>The 'contentsGz' column will have the contents of the file and will
> >>>typically be a couple of hundred kilobytes, but in some rare
> >>>
> >>>
> >>cases as large
> >>
> >>
> >>>as 20 Megabytes.
> >>>
> >>>Selects on this table always go very slowly.  I've used EXPLAIN
> >>>
> >>>
> >>to look at
> >>
> >>
> >>>what is going on, and carefully added a couple of multi-column
> >>>
> >>>
> >>indexes that
> >>
> >>
> >>>have improved SELECT performance (this table is updated rarely,
> >>>
> >>>
> >>so I'm not
> >>
> >>
> >>>too worried about INSERT performance).  However, the
> performance is still
> >>>really bad.
> >>>
> >>>I tried creating an identical table with the exception that it
> >>>
> >>>
> >>doesn't have
> >>
> >>
> >>>the 'contentsGz' column:
> >>>
> >>>CREATE TABLE `filemetadata` (
> >>> `id` int(10) unsigned NOT NULL auto_increment,
> >>> `revision` int(10) unsigned NOT NULL default '0',
> >>> `filenameid` int(10) unsigned NOT NULL default '0',
> >>> `pathid` int(10) unsigned NOT NULL default '0',
> >>> `extensionid` int(10) unsigned NOT NULL default '0',
> >>> `isDeleted` enum('0','1') NOT NULL default '0',
> >>> `filepathname` tinytext NOT NULL,
> >>> PRIMARY KEY  (`id`),
> >>> UNIQUE KEY `revision` (`revision`,`filepathname`(255)),
> >>> KEY `fpeindex` (`filenameid`,`pathid`,`extensionid`,`revision`),
> >>> KEY `filepathname` (`filepathname`(255)),
> >>> CONSTRAINT `0_3651` FOREIGN KEY (`filenameid`) REFERENCES `filenames`
> >>>(`id`),
> >>> CONSTRAINT `0_3652` FOREIGN KEY (`extensionid`) REFERENCES
> >>>`fileextensions` (`id`),
> >>> CONSTRAINT `0_3653` FOREIGN KEY (`pathid`) REFERENCES
> >>>
> >>>
> >>`filepaths` (`id`)
> >>
> >>
> >>>) TYPE=InnoDB;
> >>>
> >>>I used INSERT ... SELECT to copy all data from the 'files' table to
> >>>'filemetadata'.
> >>>
> >>>Here is something I found surprising:
> >>>
> >>>mysql> SELECT COUNT(1) FROM files;
> >>>+----------+
> >>>| COUNT(1) |
> >>>+----------+
> >>>|   101013 |
> >>>+----------+
> >>>1 row in set (32.42 sec)
> >>>
> >>>mysql> SELECT COUNT(1) FROM filemetadata;
> >>>+----------+
> >>>| COUNT(1) |
> >>>+----------+
> >>>|   101013 |
> >>>+----------+
> >>>1 row in set (0.29 sec)
> >>>
> >>>
> >>>
> >>>
> >>SELECT COUNT(*) for InnoDB tables is a known problem...  The table
> >>handler (for InnoDB) has to do a table scan to count all rows...  This
> >>particular case is optimized with MyISAM ...



-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
