Re: scsi vs ide performance on fsync's

2001-03-05 Thread Douglas Gilbert

Since the intention of fsync and fdatasync seems to be
to write dirty fs buffers to persistent storage (i.e.
the "oxide") then the best time is not necessarily
the objective. Given the IDE times that people have 
been reporting, it is very unlikely that any of those
IDE disks were really doing 2000 discrete IO operations
involving waiting for those buffers to be written
to the "oxide". [Reason: it should take at least 2000 
revolutions of the disk to do it, since most of the
4KB writes are going to the same disk address as the
prior write.]
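
As a rough check on the revolution argument, the lower bound can be
worked out from the spindle speed alone. The sketch below is only an
illustration and is not part of xlog: the function name is made up, and
it assumes each synchronous write to the same address costs at least one
full revolution. The rpm figures and elapsed times are the ones quoted
in this message.

```python
def min_sync_seconds(writes, rpm):
    """Lower bound for `writes` same-address writes that each reach the platter.

    Assumes every write has to wait at least one full revolution, so the
    bound is writes * (60 / rpm) seconds. Seek and transfer time are ignored.
    """
    return writes * 60.0 / rpm


if __name__ == "__main__":
    # (label, rpm, measured seconds for 2000 write+fsync pairs, from this post)
    runs = [
        ("IBM DCHS04U SCSI", 7200, 33.131),
        ("Quantum Fireball ST3.2A IDE", 3600, 5.737),
    ]
    for label, rpm, measured in runs:
        bound = min_sync_seconds(2000, rpm)
        verdict = "plausible" if measured >= bound else "too fast to be hitting the oxide"
        print(f"{label}: bound {bound:.2f}s, measured {measured:.3f}s -> {verdict}")
```

With these figures the SCSI time sits above its bound, while the IDE
time is roughly a sixth of its bound.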

As it stands, the Linux SCSI subsystem has no mechanism 
to force a disk cache write through. The SCSI WRITE(10)
command has a Force Unit Access bit (FUA) to do exactly
that, but we don't use it. Do the fs/block layers flag
that they wish buffers written to the oxide?
The measurements that showed SCSI disks were taking a lot 
longer with the "xlog" test were more luck than good 
management.
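
For reference, the FUA bit lives in byte 1 of the 10-byte WRITE(10)
command descriptor block. The sketch below only builds the CDB bytes to
show where the bit sits; it does not send anything to a device, and the
helper name and its arguments are invented for illustration.

```python
import struct

WRITE_10 = 0x2A   # WRITE(10) operation code
FUA = 0x08        # byte 1, bit 3: Force Unit Access
DPO = 0x10        # byte 1, bit 4: Disable Page Out


def build_write10_cdb(lba, blocks, fua=False, dpo=False):
    """Return the 10-byte WRITE(10) CDB for `blocks` logical blocks at `lba`."""
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("LBA must fit in 32 bits")
    if not 0 <= blocks <= 0xFFFF:
        raise ValueError("transfer length must fit in 16 bits")
    flags = (FUA if fua else 0) | (DPO if dpo else 0)
    # opcode, flags, 32-bit LBA, byte 6 (left zero), 16-bit length, control
    return struct.pack(">BBIBHB", WRITE_10, flags, lba, 0, blocks, 0)


if __name__ == "__main__":
    # 8 blocks of 512 bytes = one 4 KB filesystem block, written through the cache
    print(build_write10_cdb(lba=1000, blocks=8, fua=True).hex(" "))
```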

Here are some tests showing that the IDE versus SCSI "xlog"
comparison is very similar between FreeBSD 4.2 and
lk 2.4.2 on the same hardware:

# IBM DCHS04U SCSI disk 7200 rpm
[root@free /var]# time /root/xlog tst.txt
real    0m0.043s
[root@free /var]# time /root/xlog tst.txt fsync
real    0m33.131s

# Quantum Fireball ST3.2A IDE disk 3600 rpm
[root@free dos]# time /root/xlog tst.txt
real    0m0.034s
[root@free dos]# time /root/xlog tst.txt fsync
real    0m5.737s


# IBM DCHS04U SCSI disk 7200 rpm
[root@tvilling extra]# time /root/xlog tst.txt
0:00.00elapsed 125%CPU
[root@tvilling spare]# time /root/xlog tst.txt fsync
0:33.15elapsed 0%CPU

# Quantum Fireball ST3.2A IDE disk 3600 rpm
[root@tvilling /root]# time /root/xlog tst.txt
0:00.02elapsed 43%CPU
[root@tvilling /root]# time /root/xlog tst.txt fsync
0:05.99elapsed 69%CPU


Notes: FreeBSD doesn't have fdatasync() so I changed xlog 
to use fsync(). Linux timings were the same with fsync() 
and fdatasync(). The xlog program crashed immediately in
FreeBSD; it needed some sanity checks on its arguments.
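
Since the xlog source is not reproduced in this thread, here is a
minimal Python stand-in for the kind of loop being timed, with the
argument checks mentioned above. Everything in it is an assumption: the
record layout (a 4-byte counter followed by "Blah " filler, 56 bytes in
total) is guessed from the strace output posted in this thread, the
count of 2000 comes from the figure quoted above, and the fallback to
fsync() mirrors the FreeBSD change described here.

```python
import os
import struct
import sys

RECORD_SIZE = 56  # matches the 56-byte write() calls in the strace output
# Use fdatasync() where the platform has it, otherwise fall back to fsync().
_sync = getattr(os, "fdatasync", os.fsync)


def xlog(path, count=2000, sync=True):
    """Write `count` fixed-size records to `path`, syncing after each one."""
    filler = (b"Blah " * 11)[: RECORD_SIZE - 4]
    # Assumption: an existing file is overwritten in place (no O_TRUNC);
    # what the original program did here is not known.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        for i in range(count):
            os.write(fd, struct.pack("<I", i) + filler)
            if sync:
                _sync(fd)
    finally:
        os.close(fd)


def main(argv):
    """Command-line entry point with the argument check the original lacked."""
    if len(argv) not in (2, 3):
        print(f"usage: {argv[0]} FILE [fsync]", file=sys.stderr)
        return 2
    xlog(argv[1], sync=(len(argv) == 3 and argv[2] == "fsync"))
    return 0
```

Saved as a script that ends with sys.exit(main(sys.argv)) under the
usual __main__ guard, it can be timed the same way as xlog, with or
without the trailing "fsync" argument.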

One further note: I wrote:
> [snip] 
> So writing more data to the SCSI disk speeds it up!
> I suspect the critical point in the "20*200" test is
> that the same sequence of 8 512 byte sectors are being
> written to disk 200 times. BTW That disk spins at
> 15K rpm so one rotation takes 4 ms and it has a
> 4 MB cache.

A clarification: by "same sequence" I meant written
to the same disk address. If the 4 KB lies on the same
track, then a delay of one disk revolution would be
expected before you could write the next 4 KB to the 
"oxide" at the same address.

Doug Gilbert

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




RE: scsi vs ide performance on fsync's

2001-03-05 Thread Chris Delaney

Hello,

Michael Widenius wrote on Monday, March 05, 2001:
>
> I wonder from where the fdatasync() is coming;  MySQL is not doing
> those (if you are not running mysqld with --flush)

The call is either an fsync or an fdatasync that is done by Berkeley DB on the
transaction log.

Regards,
Chris Delaney






Re: scsi vs ide performance on fsync's

2001-03-05 Thread Michael Widenius


Hi!

> "Mike" == Mike Black <[EMAIL PROTECTED]> writes:

Mike> Here's a strace -r on IDE:
Mike>  0.001488 write(3, "\214\1\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
Mike>  0.000516 fdatasync(0x3)= 0
Mike>  0.001530 write(3, "\215\1\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
Mike>  0.000513 fdatasync(0x3)= 0
Mike>  0.001555 write(3, "\216\1\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
Mike>  0.000517 fdatasync(0x3)= 0
Mike>  0.001494 write(3, "\217\1\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
Mike>  0.000515 fdatasync(0x3)= 0
Mike>  0.001495 write(3, "\220\1\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
Mike>  0.000522 fdatasync(0x3)= 0

Mike> Here it is on SCSI:
Mike>  0.049285 write(3, "\3\0\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
Mike>  0.000689 fdatasync(0x3)= 0
Mike>  0.049148 write(3, "\4\0\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
Mike>  0.000516 fdatasync(0x3)= 0
Mike>  0.049318 write(3, "\5\0\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
Mike>  0.000516 fdatasync(0x3)= 0
Mike>  0.049343 write(3, "\6\0\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56

Mike> Looks like a constant 50ms delay on each fdatasync() on SCSI vs .5ms for
Mike> IDE.  Maybe IDE isn't really doing a sync??  I find .5ms to be a little too
Mike> good.

I wonder from where the fdatasync() is coming;  MySQL is not doing
those (if you are not running mysqld with --flush)

Mike> I did this on 4 different machines with different SCSI cards (including RAID5
Mike> and non-RAID), disks, and IDE drives with the same behavior.

Regards,
Monty





Re: scsi vs ide performance on fsync's

2001-03-05 Thread Ishikawa
Douglas Gilbert wrote:

> There is definitely something strange going on here.
> As the bonnie test below shows, the SCSI disk used
> for my tests should vastly outperform the old IDE one:

First, thank you and others for helping with my clueless investigation into
the module loading under Debian GNU/Linux. (I should have known
that Debian uses a very special module setup.)

Anyway, I used to think SCSI is better than IDE in general, and
the post was quite surprising.
So I ran the test on my PC.
On my systems too, IDE beats SCSI hands down with the test case.

BTW, has anyone noticed that
the elapsed time of the SCSI case is TWICE as long if
we let the previous output of the test program stay before
running the second test? (I suspect fdatasync
takes time proportional to the (then current) file size, but
why the SCSI case takes so long is still beyond me.)

Eg.

ishikawa@duron$ ls -l /tmp/t.out
ls: /tmp/t.out: No such file or directory
ishikawa@duron$ time ./xlog /tmp/t.out fsync

real    0m38.673s   <=== my scsi disk is a slow one to begin with...
user    0m0.050s
sys     0m0.140s
ishikawa@duron$ ls -l /tmp/t.out
-rw-r--r--    1 ishikawa users  112000 Mar  5 06:19 /tmp/t.out
ishikawa@duron$ time ./xlog /tmp/t.out fsync

real    1m16.928s   <=== See TWICE as long!
user    0m0.060s
sys     0m0.160s
ishikawa@duron$ ls -l /tmp/t.out
-rw-r--r--    1 ishikawa users  112000 Mar  5 06:20 /tmp/t.out
ishikawa@duron$ rm /tmp/t.out   <=== REMOVE the file and try again.
ishikawa@duron$ time ./xlog /tmp/t.out fsync

real    0m40.667s   <=== Half as long and back to original.
user    0m0.040s
sys     0m0.120s
ishikawa@duron$ time ./xlog /tmp/t.out xxx

real    0m0.012s    <=== very fast without fdatasync as it should be.
user    0m0.010s
sys     0m0.010s
ishikawa@duron$





Re: scsi vs ide performance on fsync's

2001-03-05 Thread Douglas Gilbert

There is definitely something strange going on here.
As the bonnie test below shows, the SCSI disk used
for my tests should vastly outperform the old IDE one:

              -------Sequential Output-------- ---Sequential Input-- --Random--
Seagate       -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
ST318451LW MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
SCSI      200 21544 96.8 51367 51.4 11141 16.3 17729 58.2 40968 40.4 602.9  5.4

Quantum       -------Sequential Output-------- ---Sequential Input-- --Random--
Fireball      -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
ST3.2A     MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
IDE       200  3884 72.8  4513 86.0  1781 36.4  3144 89.9  4052 95.3 131.5  0.9

I used a program based on Mike Black's "Blah Blah" test
(shown below) in which 200 write()+fdatasync()s are 
performed. Each write() outputs either 20 or 4096 bytes.

On my Celeron 533 MHz, 128 MB RAM hardware with an ext2 fs,
the "block" size that is seen by the sd driver for each 
fdatasync() is 4096 bytes. lk 2.4.2 is being used. The 
fs/buffer.c __wait_on_buffer() routine waits for IO 
completion in response to fdatasync(). Timings have been 
done with Andrew Morton's timepegs (units are microseconds). 
Here are the IDE results:

IDE 20*200     Destination    Count        Min        Max    Average       Total
enter __wait_on_buffer:0 ->
  leave __wait_on_buffer:0      200   1,037.23   6,487.72   1,252.19  250,439.80
leave __wait_on_buffer:0 ->
  enter __wait_on_buffer:0      199       7.32      21.05       7.82    1,557.05

IDE 4096*200   Destination    Count        Min        Max    Average       Total
enter __wait_on_buffer:0 ->
  leave __wait_on_buffer:0      200   1,037.06   7,354.21   1,243.78  248,756.64
leave __wait_on_buffer:0 ->
  enter __wait_on_buffer:0      199      23.01      67.32      37.03    7,370.51


So the size of each transfer doesn't matter to this IDE
disk. Now the same test for the SCSI disk:

SCSI(20*200)   Destination    Count        Min        Max    Average       Total
enter __wait_on_buffer:0 ->
  enter sd_init_command:0       200       1.86      13.27       2.05      411.48
enter sd_init_command:0 ->
  enter rw_intr:0               200     320.87   5,398.56   3,417.30  683,461.25
enter rw_intr:0 ->
  leave __wait_on_buffer:0      200       4.04      15.81       4.42      885.73
leave __wait_on_buffer:0 ->
  enter __wait_on_buffer:0      199       8.78      14.39       9.26    1,844.23

SCSI(4096*200) Destination    Count        Min        Max    Average       Total
enter __wait_on_buffer:0 ->
  enter sd_init_command:0       200       1.97      13.20       2.21      443.52
enter sd_init_command:0 ->
  enter rw_intr:0               200     109.53  13,997.50   1,327.47  265,495.87
enter rw_intr:0 ->
  leave __wait_on_buffer:0      200       4.37      22.50       4.75      951.44
leave __wait_on_buffer:0 ->
  enter __wait_on_buffer:0      199      22.40      42.20      24.27    4,831.34

The extra timepegs inside the SCSI subsystem show that 
the IO transaction to that disk really did take that 
long. [Initially I suspected a "plugging" type
elevator bug, but that isn't supported by the above
and various other timepegs not shown.]
Since there is a wait on completion for every write,
tagged queuing should not be involved.

So writing more data to the SCSI disk speeds it up!
I suspect the critical point in the "20*200" test is
that the same sequence of 8 512 byte sectors are being 
written to disk 200 times. BTW That disk spins at
15K rpm so one rotation takes 4 ms and it has a
4 MB cache.

Even though the SCSI disk's "cache" mode page indicates
that the write cache is on, it would seem that writing 
the same sectors continually causes flushes to the medium 
(and hence the associated delay). Here is scu's output 
of the "cache" mode page:

$ scu -f /dev/sda show page cache
Cache Control Parameters (Page 0x8 - Current Values):

Mode Parameter Header:

  Mode Data Length: 31
   Medium Type: 0 (Default Medium Type)
 Device Specific Parameter: 0x10 (Supports DPO & FUA bits)
   Block Descriptor Length: 8

Mode Parameter Block Descriptor:

  Density Code: 0x2
  Number of Logical Blocks: 2289239 (1117.792 megabytes)
  Logical Block Length: 512

Page Header / Data:
 Page Code: 0x8
Parameters Savable: Yes
   Page Length: 18
  Read Cache Disable (RCD): No
Multiplication Factor (MF): Off
  Write Cache Enable (WCE): Yes
  Cache Segment Size Enable (SIZE): Off
  Discontinuity (DISC): On
  Caching Analysis Permitted (CAP): Disabled
Abort Pre-Fetch (ABPF): Off
 Initiator Control Enable (IC): Off
  Write Retention Priority: 0 (Not distiguished)
Demand Read Retention Priority: 0 (Not distiguished)
  Disable 

Re: scsi vs ide performance on fsync's

2001-03-02 Thread alec . cawley

This is just a guess - I have significant experience of Scsi drives but
none of Unix internals. To do a good sync, you have to force the data
from the CPU to the disk, and from the disk ram onto the disk oxide. IDE
disks are not very clever, and I do not think that they cache unwritten
data. If, therefore, the data has left the CPU, it will have hit the
oxide. Scsi disks, however, play considerable tricks, which may include
delayed writeback. If I were writing a Scsi disk driver, I would be
strongly tempted to put a Scsi Rezero command into the sync operation.
This has the effect of flushing all cached data to disk - amongst other
things.

It is the "amongst other things" which is the catch. Some disk
manufacturers just do a simple reset of the disk's seek logic, which
would only take a few milliseconds. Others treat a Rezero command as an
instruction to do a full thermal recalibrate, which may take 250
milliseconds. This means that drivers tested on one brand of disk will
show essentially no performance hit from doing a sync with Rezero,
whilst a different brand would show a colossal hit. Yours appears to
fall between the two extremes.

 Alec Cawley







Re: scsi vs ide performance on fsync's

2001-03-02 Thread Mike Black

Here's a strace -r on IDE:
 0.001488 write(3, "\214\1\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
 0.000516 fdatasync(0x3)= 0
 0.001530 write(3, "\215\1\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
 0.000513 fdatasync(0x3)= 0
 0.001555 write(3, "\216\1\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
 0.000517 fdatasync(0x3)= 0
 0.001494 write(3, "\217\1\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
 0.000515 fdatasync(0x3)= 0
 0.001495 write(3, "\220\1\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
 0.000522 fdatasync(0x3)= 0

Here it is on SCSI:
 0.049285 write(3, "\3\0\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
 0.000689 fdatasync(0x3)= 0
 0.049148 write(3, "\4\0\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
 0.000516 fdatasync(0x3)= 0
 0.049318 write(3, "\5\0\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
 0.000516 fdatasync(0x3)= 0
 0.049343 write(3, "\6\0\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56

Looks like a constant 50ms delay on each fdatasync() on SCSI vs .5ms for
IDE.  Maybe IDE isn't really doing a sync??  I find .5ms to be a little too
good.

I did this on 4 different machines with different SCSI cards (including RAID5
and non-RAID), disks, and IDE drives with the same behavior.
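
One caveat when reading these numbers: strace documents -r as the time
between the start of successive system calls, so the figure printed on a
line is the time since the previous call began rather than the cost of
the call on that line. A small sketch that credits each delta to the
preceding call is below. The regex and function name are illustrative,
and the write() arguments are abbreviated to "..." in the sample data.

```python
import re
from collections import defaultdict

# "<relative time> <syscall>(" at the start of an strace -r line
LINE = re.compile(r"^\s*(\d+\.\d+)\s+(\w+)\(")


def mean_time_per_call(trace):
    """Average seconds per syscall, crediting each delta to the previous line's call."""
    samples = defaultdict(list)
    prev = None
    for line in trace.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        delta, name = float(m.group(1)), m.group(2)
        if prev is not None:
            samples[prev].append(delta)
        prev = name
    return {name: sum(v) / len(v) for name, v in samples.items()}


IDE = """
 0.001488 write(3, "...", 56) = 56
 0.000516 fdatasync(0x3)= 0
 0.001530 write(3, "...", 56) = 56
 0.000513 fdatasync(0x3)= 0
 0.001555 write(3, "...", 56) = 56
 0.000517 fdatasync(0x3)= 0
 0.001494 write(3, "...", 56) = 56
 0.000515 fdatasync(0x3)= 0
 0.001495 write(3, "...", 56) = 56
 0.000522 fdatasync(0x3)= 0
"""

SCSI = """
 0.049285 write(3, "...", 56) = 56
 0.000689 fdatasync(0x3)= 0
 0.049148 write(3, "...", 56) = 56
 0.000516 fdatasync(0x3)= 0
 0.049318 write(3, "...", 56) = 56
 0.000516 fdatasync(0x3)= 0
 0.049343 write(3, "...", 56) = 56
"""

if __name__ == "__main__":
    for label, trace in (("IDE", IDE), ("SCSI", SCSI)):
        for name, secs in sorted(mean_time_per_call(trace).items()):
            print(f"{label} {name}: {secs * 1000:.2f} ms")
```

Read this way, fdatasync() would cost about 1.5 ms on IDE and about
49 ms on SCSI, with write() itself near 0.5 ms on both.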



Michael D. Black   Principal Engineer
[EMAIL PROTECTED]  321-676-2923,x203
http://www.csihq.com  Computer Science Innovations
http://www.csihq.com/~mike  My home page
FAX 321-676-2355
- Original Message -
From: "Jeremy Hansen" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Friday, March 02, 2001 11:27 AM
Subject: scsi vs ide performance on fsync's



We're doing some mysql benchmarking.  For some reason it seems that ide
drives are currently beating a scsi raid array and it seems to be related
to fsync's.  Bonnie stats show the scsi array to blow away ide as
expected, but mysql tests still have the ide beating it on plain insert
speeds.  Can anyone explain how this is possible, or perhaps explain how
our testing may be flawed?

Here's the bonnie stats:

IDE Drive:

Version 1.00g       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
jeremy         300M  9026  94 17524  12  8173   9  7269  83 23678   7 102.9   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   469  98  1476  98 16855  89   459  98  7132  99   688  25


SCSI Array:

Version 1.00g       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
orville        300M  8433 100 134143  99 127982  99  8016 100 374457  99 1583.4   6
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   503  13 +++++ +++   538  13   490  13 +++++ +++   428  11

So...obviously from bonnie stats, the scsi array blows away the ide...but
here's what we get for fsync stats using the little c program I've
attached:

IDE Drive:

jeremy:~# time ./xlog file.out fsync

real    0m1.850s
user    0m0.000s
sys     0m0.220s

SCSI Array:

[root@orville mysql_data]# time /root/xlog file.out fsync

real    0m23.586s
user    0m0.010s
sys     0m0.110s


I would appreciate any help understanding what I'm seeing here and any
suggestions on how to improve the performance.

The SCSI adapter on the raid array is an Adaptec 39160, the raid
controller is a CMD-7040.  Kernel 2.4.0 using XFS for the filesystem on
the raid array, kernel 2.2.18 on ext2 on the IDE drive.  The filesystem is
not the problem, as I get almost the exact same results running this on
ext2 on the raid array.

Thanks
-jeremy

--
this is my sig.








