Re: Innodb and unbuffered raw io on linux?

2002-07-19 Thread Heikki Tuuri

Christian,

the patch below may fix the problem, provided the failure happens while
InnoDB is initializing the raw device with zeros.

InnoDB does all normal i/o between the data files and memory through
buffers whose addresses are aligned to UNIV_PAGE_SIZE. This is because
earlier versions used the Windows native AIO, which requires aligned memory
addresses in i/o operations. But since the initialization of the files used
normal i/o, not AIO, I did not remember to align the buffer there.
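
As a rough illustration of the alignment idea described above (allocate one page more than needed, then round the pointer up to a page boundary), here is a minimal standalone C sketch. The names `PAGE_SIZE_BYTES`, `align_up` and `alloc_aligned_pages` are hypothetical and are not InnoDB identifiers; the 16 KB page size is an assumption inferred from the "8 MB buffer" of 512 pages mentioned in the patch comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed page size (8 MB / 512 pages); stands in for UNIV_PAGE_SIZE. */
#define PAGE_SIZE_BYTES ((size_t)16384)

/* Round ptr up to the next multiple of alignment (a power of two). */
static void *align_up(void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    uintptr_t addr = (uintptr_t)ptr;
    uintptr_t mask = (uintptr_t)alignment - 1;
    return (void *)((addr + mask) & ~mask);
}

/* Allocate n_pages pages plus one spare page, so that an aligned window
   of n_pages always fits inside the allocation. *raw_out receives the
   unaligned pointer, which is the one that must be passed to free(). */
static unsigned char *alloc_aligned_pages(size_t n_pages, void **raw_out)
{
    void *raw = malloc((n_pages + 1) * PAGE_SIZE_BYTES);
    *raw_out = raw;
    if (raw == NULL) {
        return NULL;
    }
    return (unsigned char *)align_up(raw, PAGE_SIZE_BYTES);
}
```

The caller does the i/o through the returned aligned pointer but frees the raw one, which mirrors the buf/buf2 pair in the patch.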

InnoDB has its own read-ahead for the buffer pool. Thus the read-ahead of
the OS is not necessarily needed.

Best regards,

Heikki Tuuri
Innobase Oy
---
InnoDB - transactions, hot backup, and foreign key support for MySQL
See http://www.innodb.com, download MySQL-Max from http://www.mysql.com

ChangeSet
  1.1097 02/07/19 08:33:52 [EMAIL PROTECTED] +1 -0
  os0file.c:
Align the buffer used in initing a data file to zero; this may be needed
if the data file is actually a raw device

  innobase/os/os0file.c
1.39 02/07/19 08:33:36 [EMAIL PROTECTED] +6 -2
Align the buffer used in initing a data file to zero; this may be needed
if the data file is actually a raw device

# This is a BitKeeper patch.  What follows are the unified diffs for the
# set of deltas contained in the patch.  The rest of the patch, the part
# that BitKeeper cares about, is below these diffs.
# User: heikki
# Host: hundin.mysql.fi
# Root: /home/heikki/mysql3

--- 1.38/innobase/os/os0file.c Mon Jul  8 19:28:42 2002
+++ 1.39/innobase/os/os0file.c Fri Jul 19 08:33:36 2002
@@ -690,6 +690,7 @@
  ulint   n_bytes;
  ibool ret;
  byte*   buf;
+ byte*   buf2;
  ulint   i;

  ut_a(size == (size & 0xFFFFFFFF));
@@ -697,7 +698,10 @@
  /* We use a very big 8 MB buffer in writing because Linux may be
  extremely slow in fsync on 1 MB writes */

- buf = ut_malloc(UNIV_PAGE_SIZE * 512);
+ buf2 = ut_malloc(UNIV_PAGE_SIZE * 513);
+
+ /* Align the buffer for possible raw i/o */
+ buf = ut_align(buf2, UNIV_PAGE_SIZE);

  /* Write buffer full of zeros */
  for (i = 0; i < UNIV_PAGE_SIZE * 512; i++) {
@@ -725,7 +729,7 @@
  offset += n_bytes;
  }

- ut_free(buf);
+ ut_free(buf2);

  ret = os_file_flush(file);
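
For readers without the InnoDB tree at hand, the overall shape of such a zero-initialization loop can be sketched in standalone C. This is not the InnoDB code: it swaps in `posix_memalign` for the `ut_malloc` plus `ut_align` pair, and `zero_fill`, `CHUNK_ALIGN` and `CHUNK_BYTES` are hypothetical names with assumed sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_ALIGN ((size_t)16384)          /* assumed page size        */
#define CHUNK_BYTES ((size_t)512 * CHUNK_ALIGN) /* 8 MB write buffer     */

/* Write `size` zero bytes to the start of fd in chunks of at most
   CHUNK_BYTES, then flush. Returns 0 on success, -1 on failure.
   On a raw device, `size` would also have to be a multiple of the
   sector size; a regular file does not care. */
static int zero_fill(int fd, size_t size)
{
    assert(fd >= 0);

    void *buf = NULL;
    if (posix_memalign(&buf, CHUNK_ALIGN, CHUNK_BYTES) != 0) {
        return -1;
    }
    memset(buf, 0, CHUNK_BYTES);

    int rc = 0;
    size_t offset = 0;
    while (offset < size) {
        size_t remaining = size - offset;
        size_t n = remaining < CHUNK_BYTES ? remaining : CHUNK_BYTES;
        ssize_t written = pwrite(fd, buf, n, (off_t)offset);
        if (written <= 0) {
            rc = -1;
            break;
        }
        offset += (size_t)written;
    }

    free(buf);
    if (rc == 0 && fsync(fd) != 0) {
        rc = -1;
    }
    return rc;
}
```

Because `posix_memalign` returns a pointer that can be handed directly to `free`, the sketch does not need a second pointer the way the patch does.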

----- Original Message -----
From: Christian Jaeger [EMAIL PROTECTED]
Newsgroups: mailing.database.mysql
Sent: Friday, July 19, 2002 6:37 AM
Subject: Innodb and unbuffered raw io on linux?



Innodb and unbuffered raw io on linux?

2002-07-18 Thread Christian Jaeger

Hello Heikki and all,

I already asked about this a year ago, but didn't finish my
investigations then.

What's the status of InnoDB and *unbuffered raw* io on Linux?

The manual describes the use of the newraw and raw options, and I
know these work on disk block devices (like /dev/sda8). But this isn't
raw io: it is still cached by the kernel, so it takes up RAM in addition
to InnoDB's own cache (as well as a bit of CPU to copy the data between
kernel and user space). If you want to do direct io, you need to use
the 'raw' tool to set up a 'raw character device' mapped to the
disk block device:

cd /dev
mkdir raw
umask 077
mknod rawctl u 162 0
umask 007
mknod raw/raw1 u 162 1
mknod raw/raw2 u 162 2
chgrp mysql raw/raw1
 # ^- I'm not sure whether the access rights of the mapped device
 # take precedence over those of the original block device, though
raw raw/raw1 sda8

I've tried MySQL with this config:
#innodb_data_file_path=/dev/sda8:1906Mraw  - did work, but buffered
innodb_data_file_path=/dev/raw/raw1:1906Mraw

020719 00:59:24  mysqld started
InnoDB: Operating system error number 22 in a file operation.
InnoDB: See http://www.innodb.com/ibman.html for installation help.
InnoDB: Look from section 13.2 at http://www.innodb.com/ibman.html
InnoDB: what the error number means or use the perror program of MySQL.
InnoDB: Cannot continue operation.
020719 00:59:25  mysqld ended

perror 22
Error code  22:  Invalid argument

This error code is typical when buffers are not aligned to
sector-sized memory boundaries, which is required for unbuffered io
to work on Linux.
I've written an experimental program that shows this and put it here:
http://pflanze.mine.nu/~chris/mysql/o_direct.c
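
To make the alignment requirement concrete, here is a small independent C sketch (it is not the program linked above). It uses `O_DIRECT` on an ordinary path as a stand-in for a raw character device, which is an assumption; `BLOCK_BYTES`, `is_aligned`, `direct_write` and `compare_writes` are hypothetical names, and 4096 bytes is an assumed alignment unit.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0           /* fallback: the write is then simply buffered */
#endif

#define BLOCK_BYTES ((size_t)4096)   /* assumed alignment unit */

/* Nonzero if ptr is a multiple of alignment. */
static int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0);
    return ((uintptr_t)ptr % alignment) == 0;
}

/* Write len bytes from buf to the start of path, bypassing the page
   cache where supported. Returns 0 on success, otherwise the errno
   of the failing call (EIO for a short write). */
static int direct_write(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (fd < 0) {
        return errno;
    }
    int err = 0;
    ssize_t n = write(fd, buf, len);
    if (n < 0) {
        err = errno;
    } else if ((size_t)n != len) {
        err = EIO;
    }
    close(fd);
    return err;
}

/* Attempt one aligned and one deliberately misaligned write. */
static void compare_writes(const char *path, int *aligned_err,
                           int *misaligned_err)
{
    void *buf = NULL;
    *aligned_err = *misaligned_err = ENOMEM;
    if (posix_memalign(&buf, BLOCK_BYTES, 2 * BLOCK_BYTES) != 0) {
        return;
    }
    memset(buf, 0, 2 * BLOCK_BYTES);
    *aligned_err = direct_write(path, buf, BLOCK_BYTES);
    *misaligned_err = direct_write(path, (unsigned char *)buf + 1,
                                   BLOCK_BYTES);
    free(buf);
}
```

Where the kernel enforces alignment, one would expect the misaligned write to come back with EINVAL (22) while the aligned one succeeds. The result depends on the filesystem and kernel, so treat it as an experiment rather than a guarantee.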

So I guess InnoDB is not ready for unbuffered io. I'm also guessing
that it's probably not that easy to achieve good performance with
unbuffered io, since you would probably have to do read-ahead and so
on yourself.

I'm also unsure about the current status of raw io in Linux (2.4).
In the FAQ at http://oss.sgi.com/projects/rawio/ they say that they
have a better implementation than the one from Stephen Tweedie/Red Hat.
But the code in kernel 2.4 seems to be only the one from
Stephen Tweedie.
This is what the source code of the 'dd' tool (as found in
Debian/testing) says, btw:
 /* ...
  The page alignment is necessary on any linux system that supports
  either the SGI raw I/O patch or Stephen Tweedies raw I/O patch.
  It is necessary when accessing raw (i.e. character special) disk
  devices on Unixware or other SVR4-derived system.  */


Hope this helps a bit.
What do you think about it?
I could put a bit of time aside for testing (or maybe more, but who 
would pay me?...:)

Cheers,
Christian.
-- 
Christian Jaeger  Programmer & System Engineer  +41 1 430 45 26
ETHLife CMS Project - www.ethlife.ethz.ch/newcms - www.ethlife.ethz.ch
