Hi Peter,

and sorry that this response to your Big Mail comes some 12 days late.
We have put together the release 3.23.34a and that has occupied
my mind.

>HT> In pure insertions, copying a table to another, I have measured that
>HT> Innobase is about as fast as MyISAM. In aggregate queries from joins
>HT> Innobase is typically 2 x faster than MyISAM.
>
>I hope so :)  Currently I see some of the queries are really faster:
>
>MYISAM: Time for min_max_on_key (70000): 167 wallclock secs (14.68 usr  2.68 sys +  0.00 cusr  0.00 csys =  0.00 CPU)
>INNOBASE: Time for min_max_on_key (70000): 92 wallclock secs (14.51 usr  2.59 sys +  0.00 cusr  0.00 csys =  0.00 CPU)
>
>But others are not:
>INNOBASE:
>Time for select_big (70:5574):  5 wallclock secs ( 0.03 usr  0.01 sys +  0.00 cusr  0.00 csys =  0.00 CPU)
>Time for select_range (410:852237): 164 wallclock secs ( 5.53 usr  2.52 sys +  0.00 cusr  0.00 csys =  0.00 CPU)
>Estimated time for count_on_key (50000): 1301 wallclock secs (16.79 usr  2.49 sys +  0.00 cusr  0.00 csys =  0.00 CPU)
>MYISAM:
>Time for select_big (70:17207):  1 wallclock secs ( 0.08 usr  0.04 sys +  0.00 cusr  0.00 csys =  0.00 CPU)
>Time for select_range (410:1057904): 186 wallclock secs ( 6.96 usr  2.87 sys +  0.00 cusr  0.00 csys =  0.00 CPU)
>Time for count_on_key (50000): 412 wallclock secs (12.75 usr  2.12 sys +  0.00 cusr  0.00 csys =  0.00 CPU)
>
>The good thing is Innobase is already winning on some queries, even though
>it is not as optimized for use within MySQL as MyISAM is.

Some queries in sql-bench are a bit artificial, because much of what they
measure lies outside the database engine itself. If you select single rows
from the database, the CPU time is usually dominated by the communications
overhead between your client and the database server. If the database is
very big, it may instead be dominated by the disk seek needed to fetch the row.

My basic test has been a COUNT(*) on a big join in main memory. That
measures the speed of the database server kernel, and communications overhead is
negligible. The TPC-H query benchmark contains many such queries.
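
To give a concrete idea, the queries I have used are of the following type
(the table and column names here are just invented for illustration):

  SELECT COUNT(*)
  FROM orders o, customers c
  WHERE o.customer_id = c.id AND c.country = 'FI';

Since only a single count travels back to the client, practically all of the
time is spent inside the database server kernel.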

I have to look at the MySQL query optimizer. It does not always pick
the best table access strategy for Innobase, maybe because the table
statistics are calculated in a somewhat different way.
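
One way to see which strategy the optimizer picks is EXPLAIN, for example
(again with invented table names):

  EXPLAIN SELECT COUNT(*)
  FROM orders o, customers c
  WHERE o.customer_id = c.id AND c.country = 'FI';

If the statistics Innobase reports are off, the plan shown here can differ
from the plan chosen for the same tables in MyISAM format.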

>It also makes me happy that it's as quick as MyISAM with insertions
>(these are multiple-row inserts, so transactions do not add their
>overhead). The slight difference is only found in inserting duplicates.
>
>MYISAM:
>
>Inserting 100000 multiple-value rows in order
>Inserting 100000 multiple-value rows in reverse order
>Inserting 100000 multiple-value rows in random order
>Time for insert (300000): 25 wallclock secs ( 5.82 usr  0.08 sys +  0.00 cusr  0.00 csys =  0.00 CPU)
>Testing insert of duplicates
>Time for insert_duplicates (100000): 15 wallclock secs ( 2.71 usr  2.49 sys +  0.00 cusr  0.00 csys =  0.00 CPU)
>
>INNOBASE:
>Inserting 100000 multiple-value rows in order
>Inserting 100000 multiple-value rows in reverse order
>Inserting 100000 multiple-value rows in random order
>Time for insert (300000): 25 wallclock secs ( 5.83 usr  0.11 sys +  0.00 cusr  0.00 csys =  0.00 CPU)
>Testing insert of duplicates
>Time for insert_duplicates (100000): 17 wallclock secs ( 3.24 usr  2.73 sys +  0.00 cusr  0.00 csys =  0.00 CPU)

Also, some insert tests are artificial. I have typically used a table
with a primary key, and possibly a couple of non-unique secondary indexes.
Rows usually come in ascending order of the primary key. When performing
inserts, a transactional database has the extra burden of writing log records
for the inserts. The log is needed to provide rollback and recovery, of course.
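
A sketch of the kind of test table I have used (the names are made up):

  CREATE TABLE instest (
    id INT NOT NULL,
    a  INT,
    b  CHAR(20),
    PRIMARY KEY (id),
    INDEX (a),
    INDEX (b)
  ) TYPE = INNOBASE;

  INSERT INTO instest VALUES (1, 27, 'aaa'), (2, 13, 'bbb'), (3, 99, 'ccc');

When the rows arrive in ascending id order, the inserts always go to the
right-hand edge of the primary key tree, which is the cheap case; the
secondary indexes and the log records are the extra cost.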

>HT> If several tables were stored in a single tree, then of course that
>HT> tree would occupy at the minimum only one database page.
>
>So you mean Innobase can handle several tables in one internal tree?

Yes, it could handle them, but currently you cannot define such tables
in MySQL.

>Is a clustered index page the same as a data page?

Yes, all data is stored in the pages of the clustered index.
In Oracle such tables are called index-organized tables, while Sybase
uses the term clustered tables. The idea is that the data is always
kept in the order of the primary key (the cluster key). This makes accesses
through the cluster key faster, since you do not need to latch and
access a separate data page when you fetch a row.
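
To illustrate with a made-up example:

  CREATE TABLE CUSTOMER (A INT NOT NULL, B CHAR(20), PRIMARY KEY (A))
  TYPE = INNOBASE;

  SELECT B FROM CUSTOMER WHERE A = 123;

The SELECT only has to descend the primary key B-tree: the row itself is in
the leaf page where the search ends, so there is no separate data page to
latch and read.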

>OK, I see. Innobase hash indexes are something like a cache for a B-tree
>index. So if I do many queries like "select A from TEST where
>name='peter'" but do not have an index on the name column, Innobase can't
>help here.

Right, it is up to the DBA to specify sufficient indexes for the table.
The hashing method is just an internal optimization, not a wizard
that creates new indexes on the fly.
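
So for the query you quoted you would create the index yourself, for example
(the index name is just an example):

  CREATE INDEX name_ind ON TEST (name);

  SELECT A FROM TEST WHERE name = 'peter';

After that, if the same lookup is repeated often, the adaptive hashing can
build hash entries on top of that B-tree index, but it never replaces the
need for the index.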

>HT> What do you mean with a 200 by(te) string? As a key?
>Yes.
>HT> Yes, some types of hash indexes fit in a very small space, but Innobase
>HT> does not have such indexes. All disk-based indexes in Innobase are B-trees.
>OK. Why did you decide not to have those?  The one problem I've had
>with MySQL was when I needed to search for rows in a table by a key which
>consists of 4 parts and which is about 600 bytes long.  The solution
>I used with MySQL was to add a hash column to the table and make an
>index on it, and then to compute a hash function for each
>inserted/updated row and also use it in the search. This gave me a
>significant performance improvement as well as better disk utilization.
>I think hash indexes are an extremely good feature even if the hash
>values are really stored in a B-tree.

It is best to do that in the application. Adding a new access method to
a database server is a major task, requiring 3 man-months of work, at
the minimum :).
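
A sketch of the trick you describe, done at the application level (the table
and column names are made up, and MD5() is just one possible hash function):

  ALTER TABLE t ADD key_hash CHAR(32) NOT NULL, ADD INDEX (key_hash);

  INSERT INTO t (p1, p2, p3, p4, key_hash)
  VALUES ('a', 'b', 'c', 'd', MD5(CONCAT('a', 'b', 'c', 'd')));

  SELECT * FROM t
  WHERE key_hash = MD5(CONCAT('a', 'b', 'c', 'd'))
    AND p1 = 'a' AND p2 = 'b' AND p3 = 'c' AND p4 = 'd';

The 32-byte hash keeps the index small, and comparing the real key columns
weeds out hash collisions.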

>I meant: what will happen if I specify a wrong size in my.cnf?

When Innobase starts, it checks that the existing data files are of
the size specified in my.cnf.

>HT>> Yes, in Oracle you can add a data file while the database is up
>HT>> (ALTER TABLESPACE ADD DATAFILE), and it is easily implementable.
>>>Here we come to one problem again - you can't do this as you have no
>>>place to store the information - You can't just change datafile.
>
>HT> ALTER TABLESPACE has to edit also the .cnf file.
>That's impossible because you can't find the exact place where you got
>the value - it could have been specified just as a command line parameter. Also,
>the problem is that my.cnf is usually only readable by the mysql user and not
>writable by it (that would be a security hole). Also, in my case several
>machines share one my.cnf over NFS. So if you don't like to
>have a system tablespace and system table, it would be nice to have a
>special database config file in the innobase directory which is updated by
>mysql. This is the same way mysql does it with replication - the current
>status is stored in a special file.

Monty wanted me to put the Innobase config parameters in my.cnf. Maybe
we should make Innobase remember the data files internally, so that at
startup it would remind you to edit the my.cnf file if you have used
ALTER TABLESPACE ADD DATAFILE online.

>HT> I just checked the MS SQL Server manual. They do not have a repair tool
>HT> either. With DBCC commands you can check the database integrity, but
>HT> the DBCC REPAIR command has been dropped. This suggests that a repair
>HT> tool is not needed in mainstream transactional databases: recovering
>HT> from a backup is the better method.
>
>That's the thing I know people are crying about a lot.  There are a number of
>problems:
>1) Some people just do not back up, but they would not like the
>database to give them no chance to recover anything if
>something wrong happens.
>2) The second one is much more important. In addition to the media
>(HDD) which may fail, you have hardware and software (even the OS)
>which may fail. In this case, for example if your memory gets a problem,
>you can get random damage - so even the logfiles will be damaged and
>recovery from backup will not help. That's, for example, why chkdsk is
>needed for NTFS in most cases :)  Don't say you should have reliable
>hardware and an OS for your database - no hardware or OS is totally
>reliable, and also many people are just using what they have for their
>database. The last thing is - are you sure Innobase is totally
>reliable and will not corrupt its database in any case? That's another
>reason why a repair tool should exist.

We have to look into this in depth. The problem with a repair tool is
that it really cannot repair damaged data: if data in some rows has been
overwritten, the tool cannot guess what the original data was. Of course,
if the repair tool notices an inconsistency in a row, it can ask the
user to type the data in by hand. If possible, you should of course put
the log files on a different disk from the data files; then if either disk
fails, all is not lost.

I just checked the Solid Server DBA manual from the Solid website, and I
did not find any reference to a repair tool for Solid either. I know that
Solid's predecessor VIA DRE database had a repair tool, but I think
VIA was a buggy database. I conclude from the examples of Oracle,
MS SQL Server and Solid Server that a repair tool is probably not
the best option to obtain safety.

>This thing is quite important for me, because I won't use a
>database/filesystem in a production environment when I don't have tools
>which will help to save the data in an emergency.

But if your disk breaks, then the only way to save your data is
to have replication, or a backup and logs on a different disk. A repair
tool cannot help in the case of a disk failure. Because MySQL writes
its archived log at a high level (= a logical log), your data is
quite safe in the log, since an error in a simple write of an ASCII
string is much less likely than a bug in a much more complicated
binary log.
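
By a logical log I mean a log that essentially contains the SQL statements
themselves, something like (contents invented here):

  INSERT INTO messages VALUES (1001, 'peter', 'message text ...');
  UPDATE messages SET body = 'corrected text' WHERE id = 1001;

Such statements can be replayed against a restored backup, and a damaged
statement is easy for a human to spot and fix, unlike a damaged binary
record.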

>HT> Hmm... even from a corrupted database you probably can read several
>HT> tables and recover them. The data to a database usually comes from an outside
>HT> source, and some data can be recovered from those outside sources.
>
>Well, you can't say that.  If the damage is in the system area, it could
>just fail to get the list of tables. These (metadata damages) are the
>worst thing for a filesystem, and here it should be the same.

I agree, and repairing corrupted metadata can be very difficult too.
A good thing about MySQL is that it keeps metadata also in separate
.frm files. One could use them to repair the Innobase internal data dictionary
if it breaks.

Some file systems are log-based, i.e., they use the same recovery mechanism
as databases do. I am not sure if NTFS is such a file system.

>About outside sources - look at real usage on the web (at least) -
>for example, we are using a database as a forum backend - if the data is
>lost, will you be able to call the people and ask them to write their
>messages again?

It may be best to write a log also at the application level to get
the maximum safety. Writing a simple text log to a separate disk is
very fast and should not be a bottleneck.

Sorry, I must say that I disagree on the repair tool question. But
possibly practical experience will show that such a tool should be written.
What I am afraid of is that writing a general repair tool, and getting it
to work correctly so that it does not itself corrupt data, is a big task.
Maybe we should look at what corruption problems arise in practical
applications and write a specific repair tool for them. Currently,
I am personally the best repair tool :), because I have to make sure
beta testers can use Innobase without problems.

Also I have to add to the online manual a chapter on backups and
recovery.

The comments and critique you give are the best feedback for Innobase,
and they will find their way into future releases.

Best regards,

Heikki

>Best regards,
> Peter                            mailto:[EMAIL PROTECTED]

