Re: scsi vs ide performance on fsync's
Since the intention of fsync() and fdatasync() seems to be to write
dirty fs buffers to persistent storage (i.e. the "oxide"), the best
time is not necessarily the objective. Given the IDE times that people
have been reporting, it is very unlikely that any of those IDE disks
were really doing 2000 discrete IO operations, each waiting for its
buffers to be written to the "oxide". [Reason: it should take at least
2000 revolutions of the disk to do it, since most of the 4 KB writes
are going to the same disk address as the prior write.]

As it stands, the Linux SCSI subsystem has no mechanism to force a
disk cache write-through. The SCSI WRITE(10) command has a Force Unit
Access (FUA) bit to do exactly that, but we don't use it. Do the
fs/block layers flag that they wish buffers written to the oxide?

The measurements that showed SCSI disks were taking a lot longer with
the "xlog" test were more luck than good management.

Here are some tests showing that the IDE versus SCSI "xlog" comparison
gives very similar results on FreeBSD 4.2 and lk 2.4.2 on the same
hardware:

# IBM DCHS04U SCSI disk 7200 rpm <>
[root@free /var]# time /root/xlog tst.txt
real    0m0.043s
[root@free /var]# time /root/xlog tst.txt fsync
real    0m33.131s

# Quantum Fireball ST3.2A IDE disk 3600 rpm <>
[root@free dos]# time /root/xlog tst.txt
real    0m0.034s
[root@free dos]# time /root/xlog tst.txt fsync
real    0m5.737s

# IBM DCHS04U SCSI disk 7200 rpm <>
[root@tvilling extra]# time /root/xlog tst.txt
0:00.00elapsed 125%CPU
[root@tvilling spare]# time /root/xlog tst.txt fsync
0:33.15elapsed 0%CPU

# Quantum Fireball ST3.2A IDE disk 3600 rpm <>
[root@tvilling /root]# time /root/xlog tst.txt
0:00.02elapsed 43%CPU
[root@tvilling /root]# time /root/xlog tst.txt fsync
0:05.99elapsed 69%CPU

Notes:
FreeBSD doesn't have fdatasync(), so I changed xlog to use fsync().
Linux timings were the same with fsync() and fdatasync().
The xlog program crashed immediately in FreeBSD; it needed some
sanity checks on its arguments.
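The "at least 2000 revolutions" argument can be checked with a little arithmetic. The sketch below is only an illustration: the function names are made up for this note, and the figure of 2000 sync operations per run and the spindle speeds are taken from the message above as reported, not measured independently.

```python
def revolution_ms(rpm):
    """Time for one platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def min_seconds_for_rewrites(count, rpm):
    """Rough lower bound for `count` writes that must each reach the same
    on-disk address: about one full revolution per write."""
    return count * revolution_ms(rpm) / 1000.0


SYNCS = 2000  # number of write+sync pairs per xlog run, as stated in the post

# (label, rpm as reported, measured "real" time with fsync in seconds)
runs = [
    ("IBM DCHS04U SCSI", 7200, 33.131),
    ("Quantum Fireball ST3.2A IDE", 3600, 5.737),
]

for label, rpm, measured in runs:
    bound = min_seconds_for_rewrites(SYNCS, rpm)
    per_sync_ms = measured * 1000.0 / SYNCS
    verdict = "plausible" if measured >= bound else "faster than the platter allows"
    print(f"{label}: bound {bound:.2f}s, measured {measured:.2f}s "
          f"({per_sync_ms:.2f} ms per sync, {verdict})")
```

Under these assumptions the SCSI figure works out to roughly two revolutions per sync, while the IDE figure is well below the one-revolution-per-sync bound, which is consistent with the suspicion that the IDE drive is acknowledging writes from its cache.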
One further note:

I wrote:
> [snip]
> So writing more data to the SCSI disk speeds it up!
> I suspect the critical point in the "20*200" test is
> that the same sequence of 8 512 byte sectors are being
> written to disk 200 times. BTW That disk spins at
> 15K rpm so one rotation takes 4 ms and it has a
> 4 MB cache.

A clarification: by "same sequence" I meant written to the same disk
address. If the 4 KB lies on the same track, then a delay of one disk
revolution would be expected before you could write the next 4 KB to
the "oxide" at the same address.

Doug Gilbert

-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing?  Try: http://lists.mysql.com/php/unsubscribe.php
RE: scsi vs ide performance on fsync's
Hello,

Michael Widenius wrote on Monday, March 05, 2001:
>
> I wonder from where the fdatasync() is coming; MySQL is not doing
> those (if you are not running mysqld with --flush)

The call is either an fsync() or an fdatasync() that Berkeley DB issues
on the transaction log.

Regards,
Chris Delaney
Re: scsi vs ide performance on fsync's
Hi!

> "Mike" == Mike Black <[EMAIL PROTECTED]> writes:

Mike> Here's a strace -r on IDE:
Mike>      0.001488 write(3, "\214\1\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
Mike>      0.000516 fdatasync(0x3) = 0
Mike>      0.001530 write(3, "\215\1\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
Mike>      0.000513 fdatasync(0x3) = 0
Mike>      0.001555 write(3, "\216\1\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
Mike>      0.000517 fdatasync(0x3) = 0
Mike>      0.001494 write(3, "\217\1\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
Mike>      0.000515 fdatasync(0x3) = 0
Mike>      0.001495 write(3, "\220\1\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
Mike>      0.000522 fdatasync(0x3) = 0

Mike> Here it is on SCSI:
Mike>      0.049285 write(3, "\3\0\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
Mike>      0.000689 fdatasync(0x3) = 0
Mike>      0.049148 write(3, "\4\0\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
Mike>      0.000516 fdatasync(0x3) = 0
Mike>      0.049318 write(3, "\5\0\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
Mike>      0.000516 fdatasync(0x3) = 0
Mike>      0.049343 write(3, "\6\0\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56

Mike> Looks like a constant 50ms delay on each fdatasync() on SCSI vs .5ms for
Mike> IDE. Maybe IDE isn't really doing a sync?? I find .5ms to be a little too
Mike> good.

I wonder from where the fdatasync() is coming; MySQL is not doing
those (if you are not running mysqld with --flush)

Mike> I did this on 4 different machines with different SCSI cards (include RAID5
Mike> and non-RAID), disks, and IDE drives with the same behavior.

Regards,
Monty
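When reading the quoted trace it helps to remember how `strace -r` timestamps are documented: each line carries the time elapsed since the previous system call was entered, so the large value printed on a write() line is really the cost of the fdatasync() before it. The sketch below shifts each delta back onto the call it belongs to. The helper names are hypothetical, and the sample is just the SCSI excerpt from the message.

```python
import re

# Matches lines such as:  0.049285 write(3, "...", 56) = 56
LINE = re.compile(r"^\s*(\d+\.\d+)\s+(\w+)\(")

SCSI_SAMPLE = r'''
     0.049285 write(3, "\3\0\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
     0.000689 fdatasync(0x3) = 0
     0.049148 write(3, "\4\0\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
     0.000516 fdatasync(0x3) = 0
     0.049318 write(3, "\5\0\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
     0.000516 fdatasync(0x3) = 0
     0.049343 write(3, "\6\0\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
'''


def attribute_deltas(strace_text):
    """Return (syscall, seconds) pairs where each relative timestamp is
    credited to the call on the preceding line."""
    calls = []
    for line in strace_text.splitlines():
        match = LINE.match(line)
        if match:
            calls.append((float(match.group(1)), match.group(2)))
    # The delta printed on line i+1 is the time since call i was entered.
    return [(calls[i][1], calls[i + 1][0]) for i in range(len(calls) - 1)]


def mean_by_call(pairs):
    """Average the attributed time per syscall name."""
    grouped = {}
    for name, seconds in pairs:
        grouped.setdefault(name, []).append(seconds)
    return {name: sum(vals) / len(vals) for name, vals in grouped.items()}


for name, mean in mean_by_call(attribute_deltas(SCSI_SAMPLE)).items():
    print(f"{name}: {mean * 1000:.3f} ms on average")
```

Read this way, the SCSI excerpt attributes about 49 ms to each fdatasync(). Applying the same reading to the IDE excerpt would put its fdatasync() nearer 1.5 ms than 0.5 ms, which is still far below the SCSI figure.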
Re: scsi vs ide performance on fsync's
Douglas Gilbert wrote:

> There is definitely something strange going on here.
> As the bonnie test below shows, the SCSI disk used
> for my tests should vastly outperform the old IDE one:

First, thank you and the others for helping with my clueless
investigation of module loading under Debian GNU/Linux.
(I should have known that Debian uses a very special module setup.)

Anyway, I used to think SCSI is better than IDE in general, so the post
was quite surprising. I ran the test on my PC. On my systems too, IDE
beats SCSI hands down with this test case.

BTW, has anyone noticed that the elapsed time in the SCSI case is TWICE
as long if we let the previous output of the test program stay in place
before running the second test? (I suspect fdatasync() takes time
proportional to the (then current) file size, but why the SCSI case is
so slow is still beyond me.)

E.g.

ishikawa@duron$ ls -l /tmp/t.out
ls: /tmp/t.out: No such file or directory
ishikawa@duron$ time ./xlog /tmp/t.out fsync

real    0m38.673s    <=== my scsi disk is a slow one to begin with...
user    0m0.050s
sys     0m0.140s
ishikawa@duron$ ls -l /tmp/t.out
-rw-r--r--    1 ishikawa users      112000 Mar  5 06:19 /tmp/t.out
ishikawa@duron$ time ./xlog /tmp/t.out fsync

real    1m16.928s    <=== See, TWICE as long!
user    0m0.060s
sys     0m0.160s
ishikawa@duron$ ls -l /tmp/t.out
-rw-r--r--    1 ishikawa users      112000 Mar  5 06:20 /tmp/t.out
ishikawa@duron$ rm /tmp/t.out    <=== REMOVE the file and try again.
ishikawa@duron$ time ./xlog /tmp/t.out fsync

real    0m40.667s    <=== Half as long and back to the original.
user    0m0.040s
sys     0m0.120s
ishikawa@duron$ time ./xlog /tmp/t.out xxx

real    0m0.012s     <=== very fast without fdatasync, as it should be.
user    0m0.010s
sys     0m0.010s
ishikawa@duron$
Re: scsi vs ide performance on fsync's
There is definitely something strange going on here.
As the bonnie test below shows, the SCSI disk used
for my tests should vastly outperform the old IDE one:

Seagate ST318451LW
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
           MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
SCSI      200 21544 96.8 51367 51.4 11141 16.3 17729 58.2 40968 40.4 602.9  5.4

Quantum Fireball ST3.2A
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
           MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
IDE       200  3884 72.8  4513 86.0  1781 36.4  3144 89.9  4052 95.3 131.5  0.9

I used a program based on Mike Black's "Blah Blah" test (shown below)
in which 200 write()+fdatasync()s are performed. Each write() outputs
either 20 or 4096 bytes. On my Celeron 533 MHz, 128 MB RAM hardware
with an ext2 fs, the "block" size that is seen by the sd driver for
each fdatasync() is 4096 bytes. lk 2.4.2 is being used.

The fs/buffer.c __wait_on_buffer() routine waits for IO completion in
response to fdatasync(). Timings have been done with Andrew Morton's
timepegs (units are microseconds). Here are the IDE results:

IDE 20*200
Destination                                           Count        Min        Max    Average       Total
enter __wait_on_buffer:0 -> leave __wait_on_buffer:0    200   1,037.23   6,487.72   1,252.19  250,439.80
leave __wait_on_buffer:0 -> enter __wait_on_buffer:0    199       7.32      21.05       7.82    1,557.05

IDE 4096*200
Destination                                           Count        Min        Max    Average       Total
enter __wait_on_buffer:0 -> leave __wait_on_buffer:0    200   1,037.06   7,354.21   1,243.78  248,756.64
leave __wait_on_buffer:0 -> enter __wait_on_buffer:0    199      23.01      67.32      37.03    7,370.51

So the size of each transfer doesn't matter to this IDE disk.
Now the same test for the SCSI disk:

SCSI(20*200)
Destination                                           Count        Min        Max    Average       Total
enter __wait_on_buffer:0 -> enter sd_init_command:0     200       1.86      13.27       2.05      411.48
enter sd_init_command:0 -> enter rw_intr:0              200     320.87   5,398.56   3,417.30  683,461.25
enter rw_intr:0 -> leave __wait_on_buffer:0             200       4.04      15.81       4.42      885.73
leave __wait_on_buffer:0 -> enter __wait_on_buffer:0    199       8.78      14.39       9.26    1,844.23

SCSI(4096*200)
Destination                                           Count        Min        Max    Average       Total
enter __wait_on_buffer:0 -> enter sd_init_command:0     200       1.97      13.20       2.21      443.52
enter sd_init_command:0 -> enter rw_intr:0              200     109.53  13,997.50   1,327.47  265,495.87
enter rw_intr:0 -> leave __wait_on_buffer:0             200       4.37      22.50       4.75      951.44
leave __wait_on_buffer:0 -> enter __wait_on_buffer:0    199      22.40      42.20      24.27    4,831.34

The extra timepegs inside the SCSI subsystem show that the IO
transaction to that disk really did take that long. [Initially I
suspected a "plugging" type elevator bug, but that isn't supported by
the above and various other timepegs not shown.] Since there is a wait
on completion for every write, tagged queuing should not be involved.

So writing more data to the SCSI disk speeds it up! I suspect the
critical point in the "20*200" test is that the same sequence of
8 512 byte sectors are being written to disk 200 times. BTW That disk
spins at 15K rpm so one rotation takes 4 ms and it has a 4 MB cache.

Even though the SCSI disk's "cache" mode page indicates that the write
cache is on, it would seem that writing the same sectors continually
causes flushes to the medium (and hence the associated delay).
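To put the SCSI timepeg averages next to the 4 ms rotation time mentioned above, the following sketch expresses the mean time between sd_init_command and rw_intr as a fraction of one revolution, and cross-checks that Count times Average agrees with the reported Total. The helper names are invented for this note; all numbers are copied from the tables above.

```python
# One revolution at 15K rpm, in microseconds (the post states 4 ms).
ROTATION_US = 60_000_000 / 15_000

# Mean time from "enter sd_init_command" to "enter rw_intr", in microseconds.
scsi_mean_us = {"20*200": 3417.30, "4096*200": 1327.47}
scsi_total_us = {"20*200": 683461.25, "4096*200": 265495.87}


def revolutions(mean_us, rotation_us=ROTATION_US):
    """Express a mean IO time as a fraction of one platter revolution."""
    return mean_us / rotation_us


def check_total(count, average, total, tolerance=0.001):
    """True if count * average matches the reported total within tolerance."""
    return abs(count * average - total) <= tolerance * total


for test, mean in scsi_mean_us.items():
    consistent = check_total(200, mean, scsi_total_us[test])
    print(f"SCSI {test}: {revolutions(mean):.2f} revolutions per IO, "
          f"table self-consistent: {consistent}")
```

On these figures the 20-byte case costs a large part of a revolution per IO on average and the 4096-byte case about a third of one, which fits the suggestion that rewriting the same sectors forces a wait on the medium, though it does not prove it.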
Here is scu's output of the "cache" mode page:

$ scu -f /dev/sda show page cache

Cache Control Parameters (Page 0x8 - Current Values):

Mode Parameter Header:
                  Mode Data Length: 31
                       Medium Type: 0 (Default Medium Type)
         Device Specific Parameter: 0x10 (Supports DPO & FUA bits)
           Block Descriptor Length: 8

Mode Parameter Block Descriptor:
                      Density Code: 0x2
          Number of Logical Blocks: 2289239 (1117.792 megabytes)
              Logical Block Length: 512

Page Header / Data:
                         Page Code: 0x8
                Parameters Savable: Yes
                       Page Length: 18
          Read Cache Disable (RCD): No
        Multiplication Factor (MF): Off
          Write Cache Enable (WCE): Yes
  Cache Segment Size Enable (SIZE): Off
              Discontinuity (DISC): On
  Caching Analysis Permitted (CAP): Disabled
             Abort Pre-Fetch (ABPF): Off
     Initiator Control Enable (IC): Off
          Write Retention Priority: 0 (Not distiguished)
    Demand Read Retention Priority: 0 (Not distiguished)
Disable
Re: scsi vs ide performance on fsync's
This is just a guess - I have significant experience of SCSI drives but
none of Unix internals.

To do a good sync, you have to force the data from the CPU to the disk,
and from the disk RAM onto the disk oxide. IDE disks are not very
clever, and I do not think that they cache unwritten data. If,
therefore, the data has left the CPU, it will have hit the oxide. SCSI
disks, however, play considerable tricks, which may include delayed
writeback.

If I were writing a SCSI disk driver, I would be strongly tempted to
put a SCSI Rezero command into the sync operation. This has the effect
of flushing all cached data to disk - amongst other things. It is the
"amongst other things" which is the catch. Some disk manufacturers just
do a simple reset of the disk's seek logic, which would only take a few
milliseconds. Others treat a Rezero command as an instruction to do a
full thermal recalibration, which may take 250 milliseconds. This means
that drivers tested on one brand of disk will show essentially no
performance hit from doing a sync with Rezero, whilst a different brand
would show a colossal hit. Yours appears to fall between the two
extremes.

Alec Cawley
Re: scsi vs ide performance on fsync's
Here's a strace -r on IDE:

     0.001488 write(3, "\214\1\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
     0.000516 fdatasync(0x3) = 0
     0.001530 write(3, "\215\1\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
     0.000513 fdatasync(0x3) = 0
     0.001555 write(3, "\216\1\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
     0.000517 fdatasync(0x3) = 0
     0.001494 write(3, "\217\1\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
     0.000515 fdatasync(0x3) = 0
     0.001495 write(3, "\220\1\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
     0.000522 fdatasync(0x3) = 0

Here it is on SCSI:

     0.049285 write(3, "\3\0\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
     0.000689 fdatasync(0x3) = 0
     0.049148 write(3, "\4\0\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
     0.000516 fdatasync(0x3) = 0
     0.049318 write(3, "\5\0\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56
     0.000516 fdatasync(0x3) = 0
     0.049343 write(3, "\6\0\0\0Blah Blah Blah Blah Blah Bla"..., 56) = 56

Looks like a constant 50ms delay on each fdatasync() on SCSI vs .5ms
for IDE. Maybe IDE isn't really doing a sync?? I find .5ms to be a
little too good.

I did this on 4 different machines with different SCSI cards (including
RAID5 and non-RAID), disks, and IDE drives, with the same behavior.

Michael D. Black   Principal Engineer
[EMAIL PROTECTED]  321-676-2923,x203
http://www.csihq.com  Computer Science Innovations
http://www.csihq.com/~mike  My home page
FAX 321-676-2355

----- Original Message -----
From: "Jeremy Hansen" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Friday, March 02, 2001 11:27 AM
Subject: scsi vs ide performance on fsync's

We're doing some mysql benchmarking. For some reason it seems that ide
drives are currently beating a scsi raid array, and it seems to be
related to fsync's. Bonnie stats show the scsi array blowing away ide
as expected, but mysql tests still have the ide beating it on plain
insert speeds. Can anyone explain how this is possible, or perhaps
explain how our testing may be flawed?
Here's the bonnie stats:

IDE Drive:

Version 1.00g   ------Sequential Output------ --Sequential Input- --Random-
                -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine    Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
jeremy     300M  9026  94 17524  12  8173   9  7269  83 23678   7 102.9   0
                ------Sequential Create------ --------Random Create--------
                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
          files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
             16   469  98  1476  98 16855  89   459  98  7132  99   688  25

SCSI Array:

Version 1.00g   ------Sequential Output------ --Sequential Input- --Random-
                -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine    Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
orville    300M  8433 100 134143 99 127982 99  8016 100 374457 99 1583.4  6
                ------Sequential Create------ --------Random Create--------
                -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
          files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
             16   503  13     + +++   538  13   490  13     + +++   428  11

So...obviously from the bonnie stats, the scsi array blows away the
ide...but here's what we get for fsync stats using the little c program
I've attached:

IDE Drive:

jeremy:~# time ./xlog file.out fsync

real    0m1.850s
user    0m0.000s
sys     0m0.220s

SCSI Array:

[root@orville mysql_data]# time /root/xlog file.out fsync

real    0m23.586s
user    0m0.010s
sys     0m0.110s

I would appreciate any help understanding what I'm seeing here and any
suggestions on how to improve the performance.

The SCSI adapter on the raid array is an Adaptec 39160; the raid
controller is a CMD-7040. Kernel 2.4.0 using XFS for the filesystem on
the raid array, kernel 2.2.18 on ext2 on the IDE drive. The filesystem
is not the problem, as I get almost exactly the same results running
this on ext2 on the raid array.

Thanks
-jeremy

--
this is my sig.
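The attached xlog program is not reproduced in this archive. For readers who want to repeat the measurement, here is a rough stand-in, not the original: it assumes 2000 records of 56 bytes (the write() size seen in the strace output, and 2000 * 56 = 112000 matches the file size reported elsewhere in the thread), a 4-byte little-endian counter followed by filler text, and that the file is truncated on open. The open flags and record layout of the real program are not shown in the thread, so those parts are guesses.

```python
import os
import struct
import sys
import time

RECORD_SIZE = 56  # write() size seen in the strace output
COUNT = 2000      # assumed: 2000 * 56 = 112000 bytes, the reported file size


def make_record(seq):
    """4-byte little-endian sequence number plus filler, RECORD_SIZE bytes."""
    header = struct.pack("<I", seq)
    filler = (b"Blah " * 11)[: RECORD_SIZE - len(header)]
    return header + filler


def run(path, count=COUNT, sync=True):
    """Write `count` records, optionally syncing after each; return seconds."""
    # fdatasync() is missing on some platforms, so fall back to fsync().
    sync_fn = getattr(os, "fdatasync", os.fsync)
    # Assumption: truncate on open; the original's flags are unknown.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    start = time.perf_counter()
    try:
        for seq in range(count):
            os.write(fd, make_record(seq))
            if sync:
                sync_fn(fd)
    finally:
        os.close(fd)
    return time.perf_counter() - start


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: xlog_sketch.py FILE [fsync]")
    else:
        # Mirror the thread's usage: a second argument of "fsync" enables syncing.
        use_sync = len(sys.argv) > 2 and sys.argv[2] == "fsync"
        elapsed = run(sys.argv[1], sync=use_sync)
        print(f"{COUNT} writes, sync={use_sync}: {elapsed:.3f}s")
```

Run as `python xlog_sketch.py file.out fsync` and again without the last argument to compare. Absolute numbers from an interpreted stand-in should not be compared directly with the C timings above; only the ratio between the synced and unsynced runs on one machine is meaningful.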