Christian,

The patch below may fix the problem, if the failure happens while InnoDB
initializes the raw device by writing zeros to it.

InnoDB does all normal i/o between the data files and memory buffers whose
addresses are aligned to UNIV_PAGE_SIZE. This is because earlier versions
used the Windows native AIO, which requires aligned memory addresses in i/o
operations. But since the initialization of the files used normal i/o, not
AIO, I did not remember to align the buffer there.
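
For readers who do not know the trick the patch uses, here is a minimal
standalone sketch of it in plain C: allocate one page more than needed, then
round the pointer up to the next page boundary. The names PAGE_SIZE,
align_up and alloc_aligned_pages are made up for this illustration and are
not InnoDB code; PAGE_SIZE is only an assumed stand-in for UNIV_PAGE_SIZE,
and ut_malloc/ut_align are replaced by malloc and ordinary pointer
arithmetic.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 16384u  /* assumed stand-in for UNIV_PAGE_SIZE */

/* Round ptr up to the next multiple of align; align must be a power of two. */
static void *align_up(void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t p = (uintptr_t)ptr;
    return (void *)((p + align - 1) & ~((uintptr_t)align - 1));
}

/* Allocate n_pages zero-filled pages that start on a PAGE_SIZE boundary.
   One extra page is allocated so that rounding up never runs past the end.
   *raw_out receives the unaligned pointer that must be passed to free(). */
static unsigned char *alloc_aligned_pages(size_t n_pages, void **raw_out)
{
    void *raw = malloc((size_t)PAGE_SIZE * (n_pages + 1));
    if (raw == NULL) {
        *raw_out = NULL;
        return NULL;
    }

    unsigned char *buf = align_up(raw, PAGE_SIZE);
    memset(buf, 0, (size_t)PAGE_SIZE * n_pages);

    *raw_out = raw;
    return buf;
}
```

Note that the aligned pointer is the one to hand to the write call, while
the original unaligned pointer is the one that must be freed. That is why
the patch keeps both buf and buf2.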

InnoDB has its own read-ahead for the buffer pool, so the read-ahead of
the OS is not necessarily needed.

Best regards,

Heikki Tuuri
Innobase Oy
---
InnoDB - transactions, hot backup, and foreign key support for MySQL
See http://www.innodb.com, download MySQL-Max from http://www.mysql.com

ChangeSet
  1.1097 02/07/19 08:33:52 [EMAIL PROTECTED] +1 -0
  os0file.c:
    Align the buffer used in initing a data file to zero; this may be needed
    if the data file is actually a raw device

  innobase/os/os0file.c
    1.39 02/07/19 08:33:36 [EMAIL PROTECTED] +6 -2
    Align the buffer used in initing a data file to zero; this may be needed
    if the data file is actually a raw device

# This is a BitKeeper patch.  What follows are the unified diffs for the
# set of deltas contained in the patch.  The rest of the patch, the part
# that BitKeeper cares about, is below these diffs.
# User: heikki
# Host: hundin.mysql.fi
# Root: /home/heikki/mysql3

--- 1.38/innobase/os/os0file.c Mon Jul  8 19:28:42 2002
+++ 1.39/innobase/os/os0file.c Fri Jul 19 08:33:36 2002
@@ -690,6 +690,7 @@
  ulint   n_bytes;
  ibool ret;
  byte*   buf;
+ byte*   buf2;
  ulint   i;

  ut_a(size == (size & 0xFFFFFFFF));
@@ -697,7 +698,10 @@
  /* We use a very big 8 MB buffer in writing because Linux may be
  extremely slow in fsync on 1 MB writes */

- buf = ut_malloc(UNIV_PAGE_SIZE * 512);
+ buf2 = ut_malloc(UNIV_PAGE_SIZE * 513);
+
+ /* Align the buffer for possible raw i/o */
+ buf = ut_align(buf2, UNIV_PAGE_SIZE);

  /* Write buffer full of zeros */
  for (i = 0; i < UNIV_PAGE_SIZE * 512; i++) {
@@ -725,7 +729,7 @@
          offset += n_bytes;
  }

- ut_free(buf);
+ ut_free(buf2);

  ret = os_file_flush(file);

----- Original Message -----
From: "Christian Jaeger" <[EMAIL PROTECTED]>
Newsgroups: mailing.database.mysql
Sent: Friday, July 19, 2002 6:37 AM
Subject: Innodb and unbuffered raw io on linux?


> Hello Heikki and all,
>
> I've already asked about this a year ago, but didn't finish my
> investigations then.
>
> What's the status with InnoDB and *unbuffered raw* io on linux?
>
> The manual describes the use of the "newraw" and "raw" options, and I
> know these work on disk devices (like /dev/sda8), but this isn't raw
> io: it's still cached by the kernel and so takes up RAM in addition to
> the cache from InnoDB (as well as a bit of CPU to copy the data
> between kernel and user space). If you want to do direct IO, you need
> to use the 'raw' tool to set up a 'raw character device' mapped to
> the disk block device:
>
> cd /dev
> mkdir raw
> umask 077
> mknod rawctl u 162 0
> umask 007
> mknod raw/raw1 u 162 1
> mknod raw/raw2 u 162 2
> chgrp mysql raw/raw1
>      # ^- I'm not sure whether the access rights of the mapped device
>      # take precedence over those of the original block device, though
> raw raw/raw1 sda8
>
> I've tried MySQL with this config:
> #innodb_data_file_path=/dev/sda8:1906Mraw  <- did work, but buffered
> innodb_data_file_path=/dev/raw/raw1:1906Mraw
>
> 020719 00:59:24  mysqld started
> InnoDB: Operating system error number 22 in a file operation.
> InnoDB: See http://www.innodb.com/ibman.html for installation help.
> InnoDB: Look from section 13.2 at http://www.innodb.com/ibman.html
> InnoDB: what the error number means or use the perror program of MySQL.
> InnoDB: Cannot continue operation.
> 020719 00:59:25  mysqld ended
>
> perror 22
> Error code  22:  Invalid argument
>
> This error code is typical when buffers are not aligned to
> sector-sized memory boundaries, which is necessary for unbuffered io
> to work on linux.
> I've written an experimental program that shows this and put it here:
> http://pflanze.mine.nu/~chris/mysql/o_direct.c
>
> So I guess InnoDB is not ready for unbuffered io. I'm also guessing
> that it's probably not that easy to achieve good performance with
> unbuffered io, since you would probably have to do readahead and so
> on yourself.
>
> I'm also unsure about the current status of rawio in linux (2.4).
> Reading on http://oss.sgi.com/projects/rawio/ (under the FAQ), they
> say that they have a "better" implementation than the one from
> Stephen Tweedie/Redhat. But the code in kernel 2.4 seems to be only
> the one from Stephen Tweedie.
> This is what the source code of the 'dd' tool (as found in
> Debian/testing) shows, btw:
>      /* ...
>       The page alignment is necessary on any linux system that supports
>       either the SGI raw I/O patch or Stephen Tweedies raw I/O patch.
>       It is necessary when accessing raw (i.e. character special) disk
>       devices on Unixware or other SVR4-derived system.  */
>
>
> Hope this helps a bit.
> What do you think about it?
> I could put a bit of time aside for testing (or maybe more, but who
> would pay me?...:)
>
> Cheers,
> Christian.
> --
> Christian Jaeger  Programmer & System Engineer  +41 1 430 45 26
> ETHLife CMS Project - www.ethlife.ethz.ch/newcms - www.ethlife.ethz.ch
>
> ---------------------------------------------------------------------
> Before posting, please check:
>    http://www.mysql.com/manual.php   (the manual)
>    http://lists.mysql.com/           (the list archive)
>
> To request this thread, e-mail <[EMAIL PROTECTED]>
> To unsubscribe, e-mail <[EMAIL PROTECTED]>
> Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
>


