Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
Bob Friesenhahn wrote:
> On Tue, 7 Jul 2009, Joerg Schilling wrote:
> > posix_fadvise seems to be _very_ new for Solaris and even though I am
> > frequently reading/writing the POSIX standards mailing list, I was not
> > aware of it.
> >
> > From my tests with star, I cannot see a significant performance
> > increase, but it may have a 3% effect.
>
> Based on the prior discussions of using mmap() with ZFS and the way
> ZFS likes to work, my guess is that POSIX_FADV_NOREUSE does nothing at
> all and POSIX_FADV_DONTNEED probably does not work either. These are
> pretty straightforward to implement with UFS since UFS benefits from
> the existing working madvise() functionality.

I did run my tests on UFS...

> ZFS seems to want to cache all read data in the ARC, period.

And this is definitely a conceptual mistake, as there are applications like star that benefit from read-ahead but that don't want to trash caches.

Jörg

--
EMail: jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       j...@cs.tu-berlin.de (uni)
       joerg.schill...@fokus.fraunhofer.de (work)
Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Tue, 7 Jul 2009, Joerg Schilling wrote:
> posix_fadvise seems to be _very_ new for Solaris and even though I am
> frequently reading/writing the POSIX standards mailing list, I was not
> aware of it.
>
> From my tests with star, I cannot see a significant performance
> increase, but it may have a 3% effect.

Based on the prior discussions of using mmap() with ZFS and the way ZFS likes to work, my guess is that POSIX_FADV_NOREUSE does nothing at all and POSIX_FADV_DONTNEED probably does not work either. These are pretty straightforward to implement with UFS since UFS benefits from the existing working madvise() functionality.

ZFS seems to want to cache all read data in the ARC, period.

Bob

--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Mon, Jul 06, 2009 at 04:54:16PM +0100, Andrew Gabriel wrote:
> Andre van Eyssen wrote:
> > On Mon, 6 Jul 2009, Gary Mills wrote:
> >
> > > As for a business case, we just had an extended and catastrophic
> > > performance degradation that was the result of two ZFS bugs. If we
> > > have another one like that, our director is likely to instruct us to
> > > throw away all our Solaris toys and convert to Microsoft products.
> >
> > If you change platform every time you get two bugs in a product, you
> > must cycle platforms on a pretty regular basis!
>
> You often find the change is towards Windows. That very rarely has the
> same rules applied, so things then stick there.

There's a more general principle in operation here. Organizations do sometimes change platforms for peculiar reasons, but once they do, they're not going to do it again for a long time. That's why they disregard problems with the new platform.

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
James Andrewartha wrote:
> Joerg Schilling wrote:
> > I would be interested to see an open(2) flag that tells the system
> > that I will read a file that I opened exactly once in native order.
> > This could tell the system to do read-ahead and to later mark the
> > pages as immediately reusable. This would make star even faster than
> > it is now.
>
> Are you aware of posix_fadvise(2) and madvise(2)?

I have of course been aware of madvise() since December 1987, but it is an interface that does not play nicely with a highly portable program like star. posix_fadvise seems to be _very_ new for Solaris, and even though I frequently read and write on the POSIX standards mailing list, I was not aware of it.

From my tests with star, I cannot see a significant performance increase, but it may have a 3% effect.

Jörg

--
EMail: jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       j...@cs.tu-berlin.de (uni)
       joerg.schill...@fokus.fraunhofer.de (work)
Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
Joerg Schilling wrote:
> I would be interested to see an open(2) flag that tells the system
> that I will read a file that I opened exactly once in native order.
> This could tell the system to do read-ahead and to later mark the
> pages as immediately reusable. This would make star even faster than
> it is now.

Are you aware of posix_fadvise(2) and madvise(2)?

--
James Andrewartha
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
If the CPU seems to be idle, the tool latencytop can probably give you some clue. It is developed for OpenSolaris, but Solaris 10 should work too (with glib 2.14 installed). You can get a copy of v0.1 at http://opensolaris.org/os/project/latencytop/

To use latencytop, open a terminal and start "latencytop -s -k 2". The tool will show a window with activities that are being blocked in the system. Then you can launch your application in another terminal to reproduce the performance problem, switch back to the latencytop window, and use "<" and ">" to find your process. The list will tell you which function is causing the delay. After a couple of minutes you may press "q" to exit latencytop.

When it ends, a log file /var/log/latencytop.log will be created. It includes the stack traces of waits on I/O, semaphores, etc. while latencytop was running. If you post the log here, I can probably extract a list of the worst delays in the ZFS source code, and other experts may comment.

--
This message posted from opensolaris.org
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
Bob,

Catching up late on this thread. Would it be possible for you to collect the following data:

- /usr/sbin/lockstat -CcwP -n 5 -D 20 -s 40 sleep 5
- /usr/sbin/lockstat -HcwP -n 5 -D 20 -s 40 sleep 5
- /usr/sbin/lockstat -kIW -i 977 -D 20 -s 40 sleep 5

Or, if you have access to the GUDS tool, please collect data using that. We need to understand what role the ARC plays here.

Thanks and regards,
Sanjeev.

On Sat, Jul 04, 2009 at 02:49:05PM -0500, Bob Friesenhahn wrote:
> [quoted lockstat output trimmed; it duplicates Bob's earlier message in
> this thread]
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
Andre van Eyssen wrote:
> On Mon, 6 Jul 2009, Gary Mills wrote:
> > As for a business case, we just had an extended and catastrophic
> > performance degradation that was the result of two ZFS bugs. If we
> > have another one like that, our director is likely to instruct us to
> > throw away all our Solaris toys and convert to Microsoft products.
>
> If you change platform every time you get two bugs in a product, you
> must cycle platforms on a pretty regular basis!

You often find the change is towards Windows. That very rarely has the same rules applied, so things then stick there.

--
Andrew
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
+--
| On 2009-07-07 01:29:11, Andre van Eyssen wrote:
|
| On Mon, 6 Jul 2009, Gary Mills wrote:
|
| > As for a business case, we just had an extended and catastrophic
| > performance degradation that was the result of two ZFS bugs. If we
| > have another one like that, our director is likely to instruct us to
| > throw away all our Solaris toys and convert to Microsoft products.
|
| If you change platform every time you get two bugs in a product, you
| must cycle platforms on a pretty regular basis!

Given that policy, I don't imagine Windows will last very long anyway.

--
bda
cyberpunk is dead. long live cyberpunk.
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Mon, 6 Jul 2009, Gary Mills wrote:
> As for a business case, we just had an extended and catastrophic
> performance degradation that was the result of two ZFS bugs. If we
> have another one like that, our director is likely to instruct us to
> throw away all our Solaris toys and convert to Microsoft products.

If you change platform every time you get two bugs in a product, you must cycle platforms on a pretty regular basis!

--
Andre van Eyssen.
mail: an...@purplecow.org            jabber: an...@interact.purplecow.org
purplecow.org: UNIX for the masses   http://www2.purplecow.org
purplecow.org: PCOWpix               http://pix.purplecow.org
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Sat, Jul 04, 2009 at 07:18:45PM +0100, Phil Harman wrote:
> Gary Mills wrote:
> > On Sat, Jul 04, 2009 at 08:48:33AM +0100, Phil Harman wrote:
> > > ZFS doesn't mix well with mmap(2). This is because ZFS uses the ARC
> > > instead of the Solaris page cache. But mmap() uses the latter. So if
> > > anyone maps a file, ZFS has to keep the two caches in sync.
> >
> > That's the first I've heard of this issue. Our e-mail server runs
> > Cyrus IMAP with mailboxes on ZFS filesystems. Cyrus uses mmap(2)
> > extensively. I understand that Solaris has an excellent
> > implementation of mmap(2). ZFS has many advantages, snapshots for
> > example, for mailbox storage. Is there anything that we can do to
> > optimize the two caches in this environment? Will mmap(2) one day
> > play nicely with ZFS?
> [..]
> Software engineering is always about prioritising resource. Nothing
> prioritises performance tuning attention quite like compelling
> competitive data. When Bart Smaalders and I wrote libMicro we generated
> a lot of very compelling data. I also coined the phrase "If Linux is
> faster, it's a Solaris bug". You will find quite a few (mostly fixed)
> bugs with the synopsis "linux is faster than solaris at ...".
>
> So, if mmap(2) playing nicely with ZFS is important to you, probably
> the best thing you can do to help that along is to provide data that
> will help build the business case for spending engineering resource on
> the issue.

First of all, how significant is the double caching in terms of performance? If the effect is small, I won't worry about it anymore. What sort of data do you need? Would a list of software products that utilize mmap(2) extensively and could benefit from ZFS be suitable?

As for a business case, we just had an extended and catastrophic performance degradation that was the result of two ZFS bugs. If we have another one like that, our director is likely to instruct us to throw away all our Solaris toys and convert to Microsoft products.

--
-Gary Mills-    -Unix Support-    -U of M Academic Computing and Networking-
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Mon, 6 Jul 2009, Boyd Adamson wrote:
> Probably this is encouraged by documentation like this:
>
> > The memory mapping interface is described in Memory Management
> > Interfaces. Mapping files is the most efficient form of file I/O for
> > most applications run under the SunOS platform.
>
> Found at: http://docs.sun.com/app/docs/doc/817-4415/fileio-2?l=en&a=view

People often think of the main benefit of mmap() as reducing CPU consumption and buffer copies, but the mmap() family of programming interfaces is much richer than low-level read/write, pread/pwrite, or stdio, because madvise() provides the ability to schedule I/O and to flush stale data from memory. In recent Solaris, it also includes provisions which allow applications to improve their performance on NUMA systems.

Bob

--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
Phil Harman writes:
> Gary Mills wrote:
> The Solaris implementation of mmap(2) is functionally correct, but the
> wait for a 64 bit address space rather moved the attention of
> performance tuning elsewhere. I must admit I was surprised to see so
> much code out there that still uses mmap(2) for general I/O (rather
> than just to support dynamic linking).

Probably this is encouraged by documentation like this:

> The memory mapping interface is described in Memory Management
> Interfaces. Mapping files is the most efficient form of file I/O for
> most applications run under the SunOS platform.

Found at: http://docs.sun.com/app/docs/doc/817-4415/fileio-2?l=en&a=view

Boyd.
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Sat, 4 Jul 2009, Phil Harman wrote:
> > This is not a new problem. It seems that I have been banging my head
> > against this from the time I started using zfs.
>
> I'd like to see mpstat 1 for each case, on an otherwise idle system,
> but then there's probably a whole lot of dtrace I'd like to do ... but
> I'm just off on vacation for a week, and this will probably have to be
> my last post on this thread until I'm back.

Shame on you for taking well-earned vacation in my time of need. :-)

'mpstat 1' output when I/O is good (column alignment was lost in the archive):

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
00 00 1700 247 2187 11 214 110 102702 5 0 93
10 00 14785 2812 18 241 100 184242 4 0 94
20 01 12100 2392 60 185 190 3019275 28 0 67
30 00 3242 2320 2028 60 18190 2225003 24 0 73
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
00 00 1862 244 25549 23160 28802 3 0 95
10 00 11581 2055 17 22170 44791 3 0 96
20 00 10370 2051 65 186 140 2502114 24 0 73
30 00 3037 2167 2101 62 186 110 2513934 25 0 71

'mpstat 1' output when I/O is bad:

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
00 00 859 243 10065 10600 207332 3 0 95
10 00 504 15 942 12 8460 740093 6 0 91
20 00 1920 3380 4800380 1 0 99
30 00 549 376 5221 3600 1350 2 0 98

Notice how intensely unbusy the CPU cores are when I/O is bad.

Bob

--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
Bob Friesenhahn wrote:
> On Sat, 4 Jul 2009, Phil Harman wrote:
> > > However, it seems that memory mapping is not responsible for the
> > > problem I am seeing here. Memory mapping may make the problem seem
> > > worse, but it is clearly not the cause.
> >
> > mmap(2) is what brings ZFS files into the page cache. I think you've
> > shown us that once you've copied files with cp(1) - which does use
> > mmap(2) - that anything that uses read(2) on the same files is
> > impacted.
>
> The problem is observed with cpio, which does not use mmap. This is
> immediately after a reboot or unmount/mount of the filesystem.

Sorry, I didn't get to your other post ...

> Ok, here is the scoop on the dire Solaris 10 (Generic_141415-03)
> performance bug on my Sun Ultra 40-M2 attached to a StorageTek 2540
> with the latest firmware. [...]
> zpool iostat averaged over 60 seconds reported that the first run
> through the files read the data at 251 MB/s and the second run only
> achieved 68 MB/s. [...]
> This is not a new problem. It seems that I have been banging my head
> against this from the time I started using zfs.

I'd like to see mpstat 1 for each case, on an otherwise idle system, but then there's probably a whole lot of dtrace I'd like to do ... but I'm just off on vacation for a week, and this will probably have to be my last post on this thread until I'm back.

Cheers,
Phil
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Sat, 4 Jul 2009 13:03:52 -0500 (CDT)
Bob Friesenhahn wrote:
> On Sat, 4 Jul 2009, Joerg Schilling wrote:
> > Did you try to use highly performant software like star?
>
> No, because I don't want to tarnish your software's stellar
> reputation. I am focusing on Solaris 10 bugs today.

Blunt.

--
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | nevada / OpenSolaris 2009.06 release
+ All that's really worth doing is what we do for others (Lewis Carrol)
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Sat, 4 Jul 2009, Jonathan Edwards wrote:
> this is only going to help if you've got problems in zfetch .. you'd
> probably see this better by looking for high lock contention in zfetch
> with lockstat

This is what lockstat says when performance is poor:

Adaptive mutex spin: 477 events in 30.019 seconds (16 events/sec)

Count indv cuml rcnt     nsec Lock               Caller
-------------------------------------------------------------------------
   47  10%  10% 0.00     5813 0x80256000         untimeout+0x24
   46  10%  19% 0.00     2223 0xb0a2f200         taskq_thread+0xe3
   38   8%  27% 0.00     2252 0xb0a2f200         cv_wait+0x70
   29   6%  34% 0.00     1115 0x80256000         callout_execute+0xeb
   26   5%  39% 0.00     3006 0xb0a2f200         taskq_dispatch+0x1b8
   22   5%  44% 0.00     1200 0xa06158c0         post_syscall+0x206
   18   4%  47% 0.00     3858 arc_eviction_mtx   arc_do_user_evicts+0x76
   16   3%  51% 0.00     1352 arc_eviction_mtx   arc_buf_add_ref+0x2d
   15   3%  54% 0.00     5376 0xb1adac28         taskq_thread+0xe3
   11   2%  56% 0.00     2520 0xb1adac28         taskq_dispatch+0x1b8
    9   2%  58% 0.00     2158 0xbb909e20         pollwakeup+0x116
    9   2%  60% 0.00     2431 0xb1adac28         cv_wait+0x70
    8   2%  62% 0.00     3912 0x80259000         untimeout+0x24
    7   1%  63% 0.00     3679 0xb10dfbc0         polllock+0x3f
    7   1%  65% 0.00     2171 0xb0a2f2d8         cv_wait+0x70
    6   1%  66% 0.00      771 0xb3f23708         pcache_delete_fd+0xac
    6   1%  67% 0.00     4679 0xb0a2f2d8         taskq_dispatch+0x1b8
    5   1%  68% 0.00      500 0xbe555040         fifo_read+0xf8
    5   1%  69% 0.00    15838 0x8025c000         untimeout+0x24
    4   1%  70% 0.00     1213 0xac44b558         sd_initpkt_for_buf+0x110
    4   1%  71% 0.00      638 0xa28722a0         polllock+0x3f
    4   1%  72% 0.00      610 0x80259000         timeout_common+0x39
    4   1%  73% 0.00    10691 0x80256000         timeout_common+0x39
    3   1%  73% 0.00     1559 htable_mutex+0x78  htable_release+0x8a
    3   1%  74% 0.00     3610 0xbb909e20         cv_timedwait_sig+0x1c1
    3   1%  74% 0.00     1636 0xa240d410         ohci_allocate_periodic_in_resource+0x71
    2   0%  75% 0.00     5959 0xbe555040         fifo_read+0x5c
    2   0%  75% 0.00     3744 0xbe555040         polllock+0x3f
    2   0%  76% 0.00      635 0xb3f23708         pollwakeup+0x116
    2   0%  76% 0.00      709 0xb3f23708         cv_timedwait_sig+0x1c1
    2   0%  77% 0.00      831 0xb3dd2070         pcache_insert+0x13d
    2   0%  77% 0.00     5976 0xb3dd2070         pollwakeup+0x116
    2   0%  77% 0.00     1339 0xb1eb9b80         metaslab_group_alloc+0x136
    2   0%  78% 0.00     1514 0xb0a2f2d8         taskq_thread+0xe3
    2   0%  78% 0.00     4042 0xb0a22988         vdev_queue_io_done+0xc3
    2   0%  79% 0.00     3428 0xb0a21f08         vdev_queue_io_done+0xc3
    2   0%  79% 0.00     1002 0xac44b558         sd_core_iostart+0x37
    2   0%  79% 0.00     1387 0xa8c56d80         xbuf_iostart+0x7d
    2   0%  80% 0.00      698 0xa58a3318         sd_return_command+0x11b
    2   0%  80% 0.00      385 0xa58a3318         sd_start_cmds+0x115
    2   0%  81% 0.00      562 0xa5647800         ssfcp_scsi_start+0x30
    2   0%  81% 0.00     1620 0xa4162d58         ssfcp_scsi_init_pkt+0x1be
    2   0%  82% 0.00      897 0xa4162d58         ssfcp_scsi_start+0x42
    2   0%  82% 0.00      475 0xa4162b78         ssfcp_scsi_start+0x42
    2   0%  82% 0.00      697 0xa40fb158         sd_start_cmds+0x115
    2   0%  83% 0.00    10901 0xa28722a0         fifo_write+0x5b
    2   0%  83% 0.00     4379 0xa28722a0         fifo_read+0xf8
    2   0%  84% 0.00     1534 0xa2638390         emlxs_tx_get+0x38
    2   0%  84% 0.00     1601 0xa2638350         emlxs_issue_iocb_cmd+0xc1
    2   0%  84% 0.00     6697 0xa2503f08         vdev_queue_io_done+0x7b
    2   0%  85% 0.00     4113 0xa24040b0         gcpu_ntv_mca_poll_wrapper+0x64
    2   0%  85% 0.00      928 0xfe85dc140658     pollwakeup+0x116
    1   0%  86% 0.00      404 iommulib_lock      lookup_cache+0x2c
    1   0%  86% 0.00     4867 pidlock            thread_exit+0x6f
    1   0%  86% 0.00     1245 plocks+0x3c0       pollhead_delete+0x23
    1   0%  86% 0.00     2452 plocks+0x3c0       pollhead_insert+0x35
    1   0%  86% 0.00      882 htable_mutex+0x3c0 htable_lookup+0x83
    1   0%  87% 0.00    28547 htable_mutex+0x3c0 htable_create+0xe3
    1   0%  87% 0.00    21173 htable_mutex+0x3c0 htable_release+0x8a
    1   0%  87% 0.00     1235 htable_mutex+0x370 htable_lookup+0x83
    1   0%  87% 0.00     3212 htable_mutex+0x370 htable_release+0x8a
    1   0%  87% 0.00      793 htable_mutex+0x78  htable_lookup+0x83
    1   0% [remainder of lockstat output truncated in the archive]
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Sat, 4 Jul 2009, Phil Harman wrote:
> > However, it seems that memory mapping is not responsible for the
> > problem I am seeing here. Memory mapping may make the problem seem
> > worse, but it is clearly not the cause.
>
> mmap(2) is what brings ZFS files into the page cache. I think you've
> shown us that once you've copied files with cp(1) - which does use
> mmap(2) - that anything that uses read(2) on the same files is
> impacted.

The problem is observed with cpio, which does not use mmap. This is immediately after a reboot or unmount/mount of the filesystem.

Bob

--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
Bob Friesenhahn wrote:
> On Sat, 4 Jul 2009, Phil Harman wrote:
> > However, this is only part of the problem. The fundamental issue is
> > that ZFS has its own ARC apart from the Solaris page cache, so
> > whenever mmap() is used, all I/O to that file has to make sure that
> > the two caches are in sync. Hence, a read(2) on a file which has
> > sometime been mapped will be impacted, even if the file is no longer
> > mapped.
>
> However, it seems that memory mapping is not responsible for the
> problem I am seeing here. Memory mapping may make the problem seem
> worse, but it is clearly not the cause.

mmap(2) is what brings ZFS files into the page cache. I think you've shown us that once you've copied files with cp(1) - which does use mmap(2) - that anything that uses read(2) on the same files is impacted.

--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
Phil Harman wrote:
> I think Solaris (if you count SunOS 4.0, which was part of Solaris 1.0)
> was the first UNIX to get a working implementation of mmap(2) for files
> (if I recall correctly, BSD 4.3 had a manpage but no implementation for
> files). From that we got a whole lot of cool stuff, not least dynamic
> linking with ld.so (which has made it just about everywhere).

Well, on BSD you could mmap() devices, but since there was no useful address space management, you had to first malloc() the amount of space, forcing you to have the same amount of memory available as swap. Later, the device was mapped on top of the allocated memory, making the underlying swap space inaccessible. Back then at Berthold AG, we had to add expensive amounts of swap in order to be able to mmap the 256 MB of RAM from our image processor.

> The Solaris implementation of mmap(2) is functionally correct, but the
> wait for a 64 bit address space rather moved the attention of
> performance tuning elsewhere. I must admit I was surprised to see so
> much code out there that still uses mmap(2) for general I/O (rather
> than just to support dynamic linking).

When the new memory management architecture was introduced with SunOS-4.0, things became better, although the now unified and partially anonymous address space made it hard to implement "limit memoryuse" (rlimit with RLIMIT_RSS). I made a working implementation for SunOS-4.0, but it did not make it into SunOS.

There are still related performance issues. If you, e.g., store a CD/DVD/BluRay image in /tmp that is bigger than the amount of RAM in the machine, you will observe a buffer underrun while writing with cdrecord unless you use driveropts=burnfree, because paging in is slow on tmpfs.

> Software engineering is always about prioritising resource. Nothing
> prioritises performance tuning attention quite like compelling
> competitive data. When Bart Smaalders and I wrote libMicro we generated
> a lot of very compelling data. I also coined the phrase "If Linux is
> faster, it's a Solaris bug". You will find quite a few (mostly fixed)
> bugs with the synopsis "linux is faster than solaris at ...".

Fortunately, Linux is slower at most tasks ;-)

In 1988, the effect of mmap() was much more visible than it is now. 20 years ago, CPU speed limited copy operations, making pipes, copyout() and similar slow. This changed with modern CPUs, and for this reason the demand for using mmap() is lower than it was 20 years ago.

> So, if mmap(2) playing nicely with ZFS is important to you, probably
> the best thing you can do to help that along is to provide data that
> will help build the business case for spending engineering resource on
> the issue.

I would be interested to see an open(2) flag that tells the system that I will read a file that I opened exactly once in native order. This could tell the system to do read-ahead and to later mark the pages as immediately reusable. This would make star even faster than it is now.

Jörg

--
EMail: jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       j...@cs.tu-berlin.de (uni)
       joerg.schill...@fokus.fraunhofer.de (work)
Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Sat, 4 Jul 2009, Phil Harman wrote:
> However, this is only part of the problem. The fundamental issue is
> that ZFS has its own ARC apart from the Solaris page cache, so whenever
> mmap() is used, all I/O to that file has to make sure that the two
> caches are in sync. Hence, a read(2) on a file which has sometime been
> mapped will be impacted, even if the file is no longer mapped.

However, it seems that memory mapping is not responsible for the problem I am seeing here. Memory mapping may make the problem seem worse, but it is clearly not the cause.

Bob

--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
Ok, here is the scoop on the dire Solaris 10 (Generic_141415-03) performance bug on my Sun Ultra 40-M2 attached to a StorageTek 2540 with the latest firmware. I rebooted the system, used cpio to send the input files to /dev/null, and then immediately used cpio a second time to send the input files to /dev/null. Note that the amount of file data (243 GB) is plenty sufficient to purge any file data from the ARC (which has a cap of 10 GB).

% time cat dpx-files.txt | cpio -o > /dev/null
495713288 blocks
cat dpx-files.txt  0.00s user 0.00s system 0% cpu 1.573 total
cpio -o > /dev/null  78.92s user 360.55s system 43% cpu 16:59.48 total

% time cat dpx-files.txt | cpio -o > /dev/null
495713288 blocks
cat dpx-files.txt  0.00s user 0.00s system 0% cpu 0.198 total
cpio -o > /dev/null  79.92s user 358.75s system 11% cpu 1:01:05.88 total

zpool iostat averaged over 60 seconds reported that the first run through the files read the data at 251 MB/s and the second run only achieved 68 MB/s. It seems clear that there is something really bad about Solaris 10 zfs's file caching code which is causing it to go into the weeds.

I don't think that the results mean much, but I have attached output from 'hotkernel' while a subsequent cpio copy is taking place. It shows that the kernel is mostly sleeping.

This is not a new problem. It seems that I have been banging my head against this from the time I started using zfs.

Bob

--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/

Sampling... Hit Ctrl-C to end.

FUNCTION                                          COUNT   PCNT
unix`SHA1Update                                       1   0.0%
unix`page_unlock                                      1   0.0%
unix`lwp_segregs_save                                 1   0.0%
rootnex`rootnex_dma_allochdl                          1   0.0%
unix`mutex_delay_default                              1   0.0%
emlxs`emlxs_initialize_pkt                            1   0.0%
genunix`pid_lookup                                    1   0.0%
TS`ts_setrun                                          1   0.0%
fcp`ssfcp_adjust_cmd                                  1   0.0%
genunix`strrput                                       1   0.0%
genunix`cyclic_softint                                1   0.0%
genunix`fop_poll                                      1   0.0%
sd`sd_xbuf_strategy                                   1   0.0%
ohci`ohci_state_is_operational                        1   0.0%
zfs`SHA256Transform                                   1   0.0%
unix`cpu_resched                                      1   0.0%
nvidia`_nv006110rm                                    1   0.0%
genunix`lwp_timer_timeout                             1   0.0%
genunix`realtime_timeout                              1   0.0%
fcp`ssfcp_scsi_destroy_pkt                            1   0.0%
nvidia`nvidia_pci_check_config_space                  1   0.0%
genunix`closef                                        1   0.0%
sd`sd_setup_rw_pkt                                    1   0.0%
unix`vsnprintf                                        1   0.0%
zfs`vdev_dtl_contains                                 1   0.0%
genunix`siginfo_kto32                                 1   0.0%
iommulib`iommulib_nex_open                            1   0.0%
genunix`vn_has_cached_data                            1   0.0%
ohci`ohci_sendup_td_message                           1   0.0%
scsi_vhci`vhci_scsi_destroy_pkt                       1   0.0%
genunix`avl_add                                       1   0.0%
unix`page_create_va                                   1   0.0%
genunix`savectx                                       1   0.0%
ohci`ohci_root_hub_allocate_intr_pipe_resource        1   0.0%
unix`page_add                                         1   0.0%
zfs`zfs_unix_to_v4                                    1   0.0%
genunix`set_qend                                      1   0.0%
zfs`vdev_queue_io_done                                1   0.0%
unix`set_idle_cpu                                     1   0.0%
zfs`vdev_cache_read                                   1   0.0%
nvidia`_nv002998rm                                    1   0.0%
ohci`ohci_do_intrs_stats                              1   0.0%
genunix`putq                                          1   0.0%
genunix`strput                                        1   0.0%
zfs`zio_buf_alloc                                     1   0.0%
sockfs`socktpi_poll                                   1   0.0%
sockfs`so_update_attrs                                1   0.0%
sockfs`so_unlock_read [remainder of hotkernel output truncated in the archive]
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
Gary Mills wrote: On Sat, Jul 04, 2009 at 08:48:33AM +0100, Phil Harman wrote: ZFS doesn't mix well with mmap(2). This is because ZFS uses the ARC instead of the Solaris page cache. But mmap() uses the latter. So if anyone maps a file, ZFS has to keep the two caches in sync. That's the first I've heard of this issue. Our e-mail server runs Cyrus IMAP with mailboxes on ZFS filesystems. Cyrus uses mmap(2) extensively. I understand that Solaris has an excellent implementation of mmap(2). ZFS has many advantages, snapshots for example, for mailbox storage. Is there anything that we can do to optimize the two caches in this environment? Will mmap(2) one day play nicely with ZFS? I think Solaris (if you count SunOS 4.0, which was part of Solaris 1.0) was the first UNIX to get a working implementation of mmap(2) for files (if I recall correctly, BSD 4.3 had a manpage but no implementation for files). From that we got a whole lot of cool stuff, not least dynamic linking with ld.so (which has made it just about everywhere). The Solaris implementation of mmap(2) is functionally correct, but the wait for a 64 bit address space rather moved the attention of performance tuning elsewhere. I must admit I was surprised to see so much code out there that still uses mmap(2) for general I/O (rather than just to support dynamic linking). Software engineering is always about prioritising resource. Nothing prioritises performance tuning attention quite like compelling competitive data. When Bart Smaalders and I wrote libMicro we generated a lot of very compelling data. I also coined the phrase "If Linux is faster, it's a Solaris bug". You will find quite a few (mostly fixed) bugs with the synopsis "linux is faster than solaris at ...". So, if mmap(2) playing nicely with ZFS is important to you, probably the best thing you can do to help that along is to provide data that will help build the business case for spending engineering resource on the issue. 
Cheers, Phil
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Jul 4, 2009, at 11:57 AM, Bob Friesenhahn wrote: This brings me to the absurd conclusion that the system must be rebooted immediately prior to each use. see Phil's later email .. an export/import of the pool or a remount of the filesystem should clear the page cache - with mmap'd files you're essentially keeping them both in the page cache and in the ARC .. then invalidations in the page cache are going to have effects on dirty data in the cache /etc/system tunables are currently:

set zfs:zfs_arc_max = 0x28000
set zfs:zfs_write_limit_override = 0xea60
set zfs:zfs_vdev_max_pending = 5

if you're on x86 - i'd also increase maxphys to 128K .. we still have a 56KB default value in there which is still a bad thing (IMO) --- .je
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
Bob Friesenhahn wrote: > On Sat, 4 Jul 2009, Joerg Schilling wrote: > >> by more than half. Based on yesterday's experience, that may diminish > >> to only 33 MB/s. > > > > "star -copy -no-fsync bs=8m fs=256m -C from-dir . to-dir" > > > > is nearly 40% faster than > > > > "find . | cpio -pdum to-dir" > > > > Did you try to use highly performant software like star? > > No, because I don't want to tarnish your software's stellar > reputation. I am focusing on Solaris 10 bugs today. I've seen more professional replies. In the end it is your decision to ignore helpful advice. BTW: if star on ZFS were not faster than cpio, this would just be a hint of a problem in ZFS that needs to be fixed. Jörg -- EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin j...@cs.tu-berlin.de(uni) joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/ URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
Bob Friesenhahn wrote: On Sat, 4 Jul 2009, Phil Harman wrote: If you reboot, your cpio(1) tests will probably go fast again, until someone uses mmap(2) on the files again. I think tar(1) uses read(2), but from my iPod I can't be sure. It would be interesting to see how tar(1) performs if you run that test before cp(1) on a freshly rebooted system. Ok, I just rebooted the system. Now 'zpool iostat Sun_2540 60' shows that the cpio read rate has increased from (the most recently observed) 33 MB/second to as much as 132 MB/second. To some this may not seem significant but to me it looks a whole lot different. ;-) Thanks, that's really useful data. I wasn't near a machine at the time, so I couldn't do it for myself. I answered your initial question based on what I understood of the implementation, and it's very satisfying to have the data to back it up. I have done some work with the ZFS team towards a fix, but it is only currently in OpenSolaris. Hopefully the fix is very very good. It is difficult to displace the many years of SunOS training that using mmap is the path to best performance. Mmap provides many tools to improve application performance which are just not available via traditional I/O. The part of the problem I highlighted was ... 6699438 zfs induces crosscall storm under heavy mapped sequential read This has been fixed in OpenSolaris, and should be fixed in Solaris 10 update 8. However, this is only part of the problem. The fundamental issue is that ZFS has its own ARC apart from the Solaris page cache, so whenever mmap() is used, all I/O to that file has to make sure that the two caches are in sync. Hence, a read(2) on a file which has at some time been mapped will be impacted, even if the file is no longer mapped. I'm sure the data and interest from this thread will be useful to the ZFS team in prioritising further performance enhancements. So thanks again. And if there's any more useful data you can add, please do so. 
If you have a support contract, you might also consider logging a call and even raising an escalation request. Cheers, Phil
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Sat, 4 Jul 2009, Joerg Schilling wrote: by more than half. Based on yesterday's experience, that may diminish to only 33 MB/s. "star -copy -no-fsync bs=8m fs=256m -C from-dir . to-dir" is nearly 40% faster than "find . | cpio -pdum to-dir" Did you try to use highly performant software like star? No, because I don't want to tarnish your software's stellar reputation. I am focusing on Solaris 10 bugs today. Bob
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
Joerg Schilling wrote: Phil Harman wrote: ZFS doesn't mix well with mmap(2). This is because ZFS uses the ARC instead of the Solaris page cache. But mmap() uses the latter. So if anyone maps a file, ZFS has to keep the two caches in sync. cp(1) uses mmap(2). When you use cp(1) it brings pages of the files it copies into the Solaris page cache. As long as they remain there ZFS will be slow for those files, even if you subsequently use read(2) to access them. If you reboot, your cpio(1) tests will probably go fast again, until Do you believe that reboot is the only way to reset this? No, but from my iPod I didn't have the patience to write a fuller explanation :) See ... http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/zfs_vnops.c#514 We take the long path if the vnode has any pages cached in the page cache. So instead of a reboot, you should also be able to export/import the pool or unmount/mount the filesystem. Also, if you didn't touch the file for a long time, and had lots of other page cache churn, the file might eventually get expunged from the page cache. Phil
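[Editor's note] Phil's two non-reboot resets can be sketched as a short script. This is not from the thread: the pool and dataset names are hypothetical placeholders, and the whole thing is guarded so it does nothing on a host without the ZFS tools.

```shell
# Sketch: drop a ZFS filesystem's cached pages without a reboot, per
# Phil's two suggestions: remount the filesystem, or export/import the
# whole pool. POOL and FS are hypothetical names, not from this thread.
POOL=tank
FS=tank/data

if command -v zfs >/dev/null 2>&1; then
    # Option 1: remount just the one filesystem (least disruptive)
    zfs umount "$FS" && zfs mount "$FS"
    # Option 2: export/import the pool (takes every dataset offline):
    #   zpool export "$POOL" && zpool import "$POOL"
    RESET_MSG="page-cache reset attempted for $FS"
else
    RESET_MSG="no zfs tools on this host; nothing to do"
fi
echo "$RESET_MSG"
```

On a real Solaris 10 box you would substitute names from `zpool list`; the third reset Phil mentions, waiting for page-cache churn to evict the file, needs no command at all.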
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
Bob Friesenhahn wrote: > A tar pipeline still provides terrible file copy performance. Read > bandwidth is only 26 MB/s. So I stopped the tar copy and re-tried the > cpio copy. > > A second copy with the cpio results in a read/write data rate of only > 54.9 MB/s (vs the just experienced 132 MB/s). Performance is reduced > by more than half. Based on yesterday's experience, that may diminish > to only 33 MB/s. "star -copy -no-fsync bs=8m fs=256m -C from-dir . to-dir" is nearly 40% faster than "find . | cpio -pdum to-dir" Did you try to use highly performant software like star? Jörg
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
A tar pipeline still provides terrible file copy performance. Read bandwidth is only 26 MB/s. So I stopped the tar copy and re-tried the cpio copy. A second copy with the cpio results in a read/write data rate of only 54.9 MB/s (vs the just experienced 132 MB/s). Performance is reduced by more than half. Based on yesterday's experience, that may diminish to only 33 MB/s. The amount of data being copied is much larger than any cache, yet somehow reading a file a second time is less than 1/2 as fast. This brings me to the absurd conclusion that the system must be rebooted immediately prior to each use. /etc/system tunables are currently:

set zfs:zfs_arc_max = 0x28000
set zfs:zfs_write_limit_override = 0xea60
set zfs:zfs_vdev_max_pending = 5

Bob
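[Editor's note] Whether a zfs_arc_max cap in /etc/system actually took hold can be checked against the live ARC statistics. A hedged sketch: the kstat names are the standard zfs:0:arcstats ones on Solaris, and the guard makes the script a no-op elsewhere.

```shell
# Sketch: compare the running ARC against the /etc/system cap by reading
# the zfs:0:arcstats kstats (Solaris-only; guarded for other hosts).
if command -v kstat >/dev/null 2>&1; then
    arc_size=$(kstat -p zfs:0:arcstats:size 2>/dev/null | awk '{print $2}')
    arc_cap=$(kstat -p zfs:0:arcstats:c_max 2>/dev/null | awk '{print $2}')
    ARC_REPORT="ARC size ${arc_size:-unknown} bytes, cap ${arc_cap:-unknown} bytes"
else
    ARC_REPORT="kstat not available; cannot inspect the ARC here"
fi
echo "$ARC_REPORT"
```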
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Sat, Jul 04, 2009 at 08:48:33AM +0100, Phil Harman wrote: > ZFS doesn't mix well with mmap(2). This is because ZFS uses the ARC > instead of the Solaris page cache. But mmap() uses the latter. So if > anyone maps a file, ZFS has to keep the two caches in sync. That's the first I've heard of this issue. Our e-mail server runs Cyrus IMAP with mailboxes on ZFS filesystems. Cyrus uses mmap(2) extensively. I understand that Solaris has an excellent implementation of mmap(2). ZFS has many advantages, snapshots for example, for mailbox storage. Is there anything that we can do to optimize the two caches in this environment? Will mmap(2) one day play nicely with ZFS? -- -Gary Mills--Unix Support--U of M Academic Computing and Networking-
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Sat, 4 Jul 2009, Phil Harman wrote: If you reboot, your cpio(1) tests will probably go fast again, until someone uses mmap(2) on the files again. I think tar(1) uses read(2), but from my iPod I can't be sure. It would be interesting to see how tar(1) performs if you run that test before cp(1) on a freshly rebooted system. Ok, I just rebooted the system. Now 'zpool iostat Sun_2540 60' shows that the cpio read rate has increased from (the most recently observed) 33 MB/second to as much as 132 MB/second. To some this may not seem significant but to me it looks a whole lot different. ;-) I have done some work with the ZFS team towards a fix, but it is only currently in OpenSolaris. Hopefully the fix is very very good. It is difficult to displace the many years of SunOS training that using mmap is the path to best performance. Mmap provides many tools to improve application performance which are just not available via traditional I/O. Bob
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Sat, 4 Jul 2009, Phil Harman wrote: ZFS doesn't mix well with mmap(2). This is because ZFS uses the ARC instead of the Solaris page cache. But mmap() uses the latter. So if anyone maps a file, ZFS has to keep the two caches in sync. cp(1) uses mmap(2). When you use cp(1) it brings pages of the files it copies into the Solaris page cache. As long as they remain there ZFS will be slow for those files, even if you subsequently use read(2) to access them. This is very interesting information and certainly can explain a lot. My application has a choice of using mmap or traditional I/O. I often use mmap. From what you are saying, using mmap is poison to subsequent performance. On June 29th I tested my application (which was set to use mmap) shortly after a reboot and got this overall initial runtime:

real 2:24:25.675
user 4:38:57.837
sys  14:30.823

By June 30th (with no intermediate reboot) the overall runtime had increased to

real 3:08:58.941
user 4:38:38.192
sys  15:44.197

which seems like quite a large change. If you reboot, your cpio(1) tests will probably go fast again, until someone uses mmap(2) on the files again. I think tar(1) uses read(2), but from my iPod I can't be sure. I will test. The other thing that slows you down is that ZFS only flushes to disk every 5 seconds if there are no synchronous writes. It would be interesting to see iostat -xnz 1 while you are running your tests. You may find the disks are writing very efficiently for one second in every five. Actually I found that the disks were writing flat out for five seconds at a time, which stalled all other pool I/O (and dependent CPU) for at least three seconds (see earlier discussion). So at the moment I have zfs_write_limit_override set to 2684354560 so that the write cycle is more on the order of one second in every five. 
Bob
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Sat, 4 Jul 2009, Jonathan Edwards wrote: somehow i don't think that reading the first 64MB (presumably) off a raw disk device 3 times and picking the middle value is going to give you much useful information on the overall state of the disks .. i believe this was more of a quick hack to just validate that there's nothing too far out of the norm, but with that said - what's the c2 and c3 device above? you've got to be caching the heck out of that to get that unbelievable 13 GB/s - so you're really only seeing memory speeds there Agreed. It is just a quick sanity check. I think that the c2 and c3 devices are speedy USB drives. more useful information would be something more like the old taz or some of the disk IO latency tools when you're driving a workload. What I see from 'iostat -cx' is low latency (<= 4 ms) and a low workload while the data is being read, and then (periodically) a burst of write data with much higher latency (40-64ms svc_t). The write burst does not take long, so it is clear that reading is the bottleneck. if you're using LUNs off an array - this might be another case of the zfs_vdev_max_pending being tuned more for direct attach drives .. you could be trying to queue up too much I/O against the RAID controller, particularly if the RAID controller is also trying to prefetch out of its cache. I have played with zfs_vdev_max_pending before. It does dial down the latency pretty linearly during the write phase (e.g. 35 queued I/Os results in 64 ms svc_t). you might want to dtrace this to break down where the latency is occurring .. eg: is this a DNLC caching problem, ARC problem, or device level problem also - is this really coming off a 2540? if so - you should probably investigate the array throughput numbers and what's happening on the RAID controller .. 
i typically find it helpful to understand what the raw hardware is capable of (hence tools like vdbench to drive an anticipated load before i configure anything) - and then attempting to configure the various tunables to match after that Yes, this comes off of a 2540. I used iozone for testing and see that through zfs, the hardware is able to write a 64GB file at 380 MB/s and read at 551 MB/s. Unfortunately, this does not seem to translate well for the actual task. Bob
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
Phil Harman wrote: > ZFS doesn't mix well with mmap(2). This is because ZFS uses the ARC > instead of the Solaris page cache. But mmap() uses the latter. So if > anyone maps a file, ZFS has to keep the two caches in sync. > > cp(1) uses mmap(2). When you use cp(1) it brings pages of the files it > copies into the Solaris page cache. As long as they remain there ZFS > will be slow for those files, even if you subsequently use read(2) to > access them. > > If you reboot, your cpio(1) tests will probably go fast again, until Do you believe that reboot is the only way to reset this? > someone uses mmap(2) on the files again. I think tar(1) uses read(2), > but from my iPod I can't be sure. It would be interesting to see how > tar(1) performs if you run that test before cp(1) on a freshly > rebooted system. There are many tar implementations. The oldest is the UNIX tar implementation from around 1978, the next was star from 1982, then there is GNU tar from 1987. Star forks into two processes that are connected via shared memory in order to speed up things. If you compare the copy speed of star and cp on UFS, and if you tell star to be as unreliable as cp (by specifying the star option -no-fsync), star will do the job 30% faster than cp does, even though star does not use mmap. Copying with Sun's tar is a tick faster than using cp, and it is a bit more accurate. GNU tar is not better than Sun's tar. If you are looking for the best speed, use:

star -copy -no-fsync -C from-dir . to-dir

and set up e.g. bs=1m fs=128m. Jörg
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Jul 4, 2009, at 03:48, Phil Harman wrote: The other thing that slows you down is that ZFS only flushes to disk every 5 seconds if there are no synchronous writes. It would be interesting to see iostat -xnz 1 while you are running your tests. You may find the disks are writing very efficiently for one second in every five. The value of 5 seconds is no longer a hard stop: since snv_87 (and S10u6) it can be up to 30 seconds (but it does shoot for 5 seconds): http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429205 See the 20-Mar-2008 change for txg.c for details.
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Jul 4, 2009, at 12:03 AM, Bob Friesenhahn wrote:

% ./diskqual.sh
c1t0d0 130 MB/sec
c1t1d0 130 MB/sec
c2t202400A0B83A8A0Bd31 13422 MB/sec
c3t202500A0B83A8A0Bd31 13422 MB/sec
c4t600A0B80003A8A0B096A47B4559Ed0 191 MB/sec
c4t600A0B80003A8A0B096E47B456DAd0 192 MB/sec
c4t600A0B80003A8A0B096147B451BEd0 192 MB/sec
c4t600A0B80003A8A0B096647B453CEd0 192 MB/sec
c4t600A0B80003A8A0B097347B457D4d0 212 MB/sec
c4t600A0B800039C9B50A9C47B4522Dd0 191 MB/sec
c4t600A0B800039C9B50AA047B4529Bd0 192 MB/sec
c4t600A0B800039C9B50AA447B4544Fd0 192 MB/sec
c4t600A0B800039C9B50AA847B45605d0 191 MB/sec
c4t600A0B800039C9B50AAC47B45739d0 191 MB/sec
c4t600A0B800039C9B50AB047B457ADd0 191 MB/sec
c4t600A0B800039C9B50AB447B4595Fd0 191 MB/sec

somehow i don't think that reading the first 64MB (presumably) off a raw disk device 3 times and picking the middle value is going to give you much useful information on the overall state of the disks .. i believe this was more of a quick hack to just validate that there's nothing too far out of the norm, but with that said - what's the c2 and c3 device above? you've got to be caching the heck out of that to get that unbelievable 13 GB/s - so you're really only seeing memory speeds there more useful information would be something more like the old taz or some of the disk IO latency tools when you're driving a workload. 
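[Editor's note] For readers without the original script, the quick hack being critiqued can be sketched roughly as follows. This version runs against a scratch file so it is portable; pointing the target at /dev/rdsk/... on Solaris would approximate the original. The repeat reads are cache-warm, which is exactly Jonathan's objection and how 13 GB/s class numbers appear for cached devices.

```shell
# Rough sketch of a diskqual-style check: read 64 MB three times and
# report the middle value. The target here is a scratch file; on
# Solaris, a raw disk device would be used instead.
target=/tmp/diskqual.sample.$$
dd if=/dev/zero of="$target" bs=1024k count=64 2>/dev/null

rates=""
for pass in 1 2 3; do
    t0=$(date +%s)
    dd if="$target" of=/dev/null bs=1024k count=64 2>/dev/null
    secs=$(( $(date +%s) - t0 ))
    [ "$secs" -lt 1 ] && secs=1          # 1-second timer granularity
    rates="$rates $(( 64 / secs ))"
done
MID_RATE=$(printf '%s\n' $rates | sort -n | sed -n 2p)   # middle of 3
echo "$target: $MID_RATE MB/sec (middle of 3 runs)"
rm -f "$target"
```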
% arc_summary.pl

System Memory:
  Physical RAM: 20470 MB
  Free Memory:  2371 MB
  LotsFree:     312 MB

ZFS Tunables (/etc/system):
  * set zfs:zfs_arc_max = 0x3
  set zfs:zfs_arc_max = 0x28000
  * set zfs:zfs_arc_max = 0x2

ARC Size:
  Current Size:           9383 MB (arcsize)
  Target Size (Adaptive): 10240 MB (c)
  Min Size (Hard Limit):  1280 MB (zfs_arc_min)
  Max Size (Hard Limit):  10240 MB (zfs_arc_max)

ARC Size Breakdown:
  Most Recently Used Cache Size:    6%  644 MB (p)
  Most Frequently Used Cache Size: 93%  9595 MB (c-p)

ARC Efficiency:
  Cache Access Total: 674638362
  Cache Hit Ratio:  91%  615586988 [Defined State for buffer]
  Cache Miss Ratio:  8%  59051374 [Undefined State for Buffer]
  REAL Hit Ratio:   87%  590314508 [MRU/MFU Hits Only]
  Data Demand Efficiency:   96%
  Data Prefetch Efficiency:  7%

  CACHE HITS BY CACHE LIST:
    Anon:                        2%  13626529 [ New Customer, First Cache Hit ]
    Most Recently Used:         78%  480379752 (mru) [ Return Customer ]
    Most Frequently Used:       17%  109934756 (mfu) [ Frequent Customer ]
    Most Recently Used Ghost:    0%  5180256 (mru_ghost) [ Return Customer Evicted, Now Back ]
    Most Frequently Used Ghost:  1%  6465695 (mfu_ghost) [ Frequent Customer Evicted, Now Back ]
  CACHE HITS BY DATA TYPE:
    Demand Data:       78%  485431759
    Prefetch Data:      0%  3045442
    Demand Metadata:   16%  103900170
    Prefetch Metadata:  3%  23209617
  CACHE MISSES BY DATA TYPE:
    Demand Data:       30%  18109355
    Prefetch Data:     60%  35633374
    Demand Metadata:    6%  3806177
    Prefetch Metadata:  2%  1502468

Prefetch seems to be performing badly. Ben Rockwood's blog entry at http://www.cuddletech.com/blog/pivot/entry.php?id=1040 discusses prefetch. The sample DTrace script on that page only shows cache misses:

vdev_cache_read: 6507827833451031357 read 131072 bytes at offset 6774849536: MISS
vdev_cache_read: 6507827833451031357 read 131072 bytes at offset 6774980608: MISS

Unfortunately, the file-level prefetch DTrace sample script from the same page seems to have a syntax error. 
if you're using LUNs off an array - this might be another case of the zfs_vdev_max_pending being tuned more for direct attach drives .. you could be trying to queue up too much I/O against the RAID controller, particularly if the RAID controller is also trying to prefetch out of its cache. I tried disabling file level prefetch (zfs_prefetch_disable=1) but did not observe any change in behavior. this is only going to help if you've got problems in zfetch .. you'd probably see this better by looking for high lock contention in zfetch with lockstat

# kstat -p zfs:0:vdev_cache_stats
zfs:0:vdev_cache_stats:class misc
zfs:0:vdev_cache_stats:crtime 130.61298275
zfs:0:vdev_cache_stats:delegations 754287
zfs:0:vdev_cache_stats:hits 3973496
zfs:0:vdev_cache_stats:misses 2154959
zfs:0:vdev_cache_stats:snaptime 451955.55419545

Performance when copying 236 GB of files (each file is 5537792 bytes, with 20001 files per directory) from one directory to another:

Copy Method                           Data Rate
==
cpio -pdum                            75 MB/s
cp -r                                 32 MB/s
tar -cf - . | (cd dest && tar -xf -)  26 MB/s
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
Mattias Pantzare wrote: > > Performance when copying 236 GB of files (each file is 5537792 bytes, with > > 20001 files per directory) from one directory to another: > > > > Copy Method Data Rate > > == > > cpio -pdum 75 MB/s > > cp -r 32 MB/s > > tar -cf - . | (cd dest && tar -xf -) 26 MB/s > > > > I would expect data copy rates approaching 200 MB/s. > > > > What happens if you run two copies at the same time? (On different data) Before you do things like this, you should first start using tests that may give you useful results. None of the programs above has been written for decent performance. I know that "cp" on Solaris is a partial exception for single file copies, but that does not help us if we like to compare _apparent_ performance. Let me first introduce other programs:

sdd   A dd(1) replacement that was first written in 1984 and that includes built-in speed metering since July 1988.

star  A tar(1) replacement that was first written in 1982 and that supports much better performance by using a shared-memory-based FIFO.

Note that most speed tests that are run on Linux do not result in useful values, as you don't know what's happening during the observation time. If you like to meter read performance, I recommend using a filesystem that was mounted directly before doing the test, or using files that are big enough not to fit into memory. Use e.g.:

sdd if=file-name bs=64k -onull -time

If you like to meter write performance, I recommend writing big enough files to avoid using wrong numbers as a result of caching. Use e.g.:

sdd -inull bs=64k count=some-number of=file-name -time

Use an appropriate value for "some-number". For copying files, I recommend:

star -copy bs=1m fs=128m -time -C from-dir . to-dir

It makes sense to run another test using the option -no-fsync in addition. On Solaris with UFS, using -no-fsync speeds up things by approx. 10%. On Linux with a local filesystem, using -no-fsync speeds up things by approx. 400%. 
This is why you get useless high numbers from using GNU tar for copy tests on Linux. Jörg
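[Editor's note] Jörg's read-metering recipe translates to something like the sketch below. dd is used as a portable stand-in for sdd (which is part of the schily tools and may not be installed), and the scratch file is deliberately tiny; on a real run it must, as he says, exceed memory or sit on a freshly mounted filesystem.

```shell
# Sketch of the read-metering principle: time one large-block sequential
# pass over a file. A real measurement needs a file bigger than RAM (or
# a freshly mounted filesystem) so the numbers aren't cache artifacts.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=64k count=256 2>/dev/null    # 16 MB sample
t0=$(date +%s)
dd if="$f" of=/dev/null bs=64k 2>/dev/null              # ~ sdd if=$f bs=64k -onull -time
READ_SECS=$(( $(date +%s) - t0 ))
echo "sequential read of 16 MB in ${READ_SECS}s"
rm -f "$f"
```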
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Sat, Jul 4, 2009 at 06:03, Bob Friesenhahn wrote: > I am still trying to determine why Solaris 10 (Generic_141415-03) ZFS > performs so terribly on my system. I blew a good bit of personal life > savings on this set-up but am not seeing performance anywhere near what is > expected. Testing with iozone shows that bulk I/O performance is good. > Testing with Jeff Bonwick's 'diskqual.sh' shows expected disk performance. > The problem is that actual observed application performance sucks, and > could often be satisfied by portable USB drives rather than high-end SAS > drives. It could be satisfied by just one SAS disk drive. Behavior is as > if zfs is very slow to read data since disks are read at only 2 or 3 > MB/second followed by an intermittent write on a long cycle. Drive lights > blink slowly. It is as if ZFS does no successful sequential read-ahead on > the files (see Prefetch Data hit rate of 0% and Prefetch Data cache miss of > 60% below), or there is a semaphore bottleneck somewhere (but CPU use is > very low). > > Observed behavior is very program dependent. > > # zpool status Sun_2540 > pool: Sun_2540 > state: ONLINE > status: The pool is formatted using an older on-disk format. The pool can > still be used, but some features are unavailable. > action: Upgrade the pool using 'zpool upgrade'. Once this is done, the > pool will no longer be accessible on older software versions. 
> scrub: scrub completed after 0h46m with 0 errors on Mon Jun 29 05:06:33 > 2009 > config: > > NAME STATE READ WRITE CKSUM > Sun_2540 ONLINE 0 0 0 > mirror ONLINE 0 0 0 > c4t600A0B80003A8A0B096A47B4559Ed0 ONLINE 0 0 0 > c4t600A0B800039C9B50AA047B4529Bd0 ONLINE 0 0 0 > mirror ONLINE 0 0 0 > c4t600A0B80003A8A0B096E47B456DAd0 ONLINE 0 0 0 > c4t600A0B800039C9B50AA447B4544Fd0 ONLINE 0 0 0 > mirror ONLINE 0 0 0 > c4t600A0B80003A8A0B096147B451BEd0 ONLINE 0 0 0 > c4t600A0B800039C9B50AA847B45605d0 ONLINE 0 0 0 > mirror ONLINE 0 0 0 > c4t600A0B80003A8A0B096647B453CEd0 ONLINE 0 0 0 > c4t600A0B800039C9B50AAC47B45739d0 ONLINE 0 0 0 > mirror ONLINE 0 0 0 > c4t600A0B80003A8A0B097347B457D4d0 ONLINE 0 0 0 > c4t600A0B800039C9B50AB047B457ADd0 ONLINE 0 0 0 > mirror ONLINE 0 0 0 > c4t600A0B800039C9B50A9C47B4522Dd0 ONLINE 0 0 0 > c4t600A0B800039C9B50AB447B4595Fd0 ONLINE 0 0 0 > > errors: No known data errors > > > Prefetch seems to be performing badly. The Ben Rockwood's blog entry at > http://www.cuddletech.com/blog/pivot/entry.php?id=1040 discusses prefetch. > The sample Dtrace script on that page only shows cache misses: > > vdev_cache_read: 6507827833451031357 read 131072 bytes at offset 6774849536: > MISS > vdev_cache_read: 6507827833451031357 read 131072 bytes at offset 6774980608: > MISS > > Unfortunately, the file-level prefetch DTrace sample script from the same > page seems to have a syntax error. > > I tried disabling file level prefetch (zfs_prefetch_disable=1) but did not > observe any change in behavior. 
> > # kstat -p zfs:0:vdev_cache_stats > zfs:0:vdev_cache_stats:class misc > zfs:0:vdev_cache_stats:crtime 130.61298275 > zfs:0:vdev_cache_stats:delegations 754287 > zfs:0:vdev_cache_stats:hits 3973496 > zfs:0:vdev_cache_stats:misses 2154959 > zfs:0:vdev_cache_stats:snaptime 451955.55419545 > > Performance when copying 236 GB of files (each file is 5537792 bytes, with > 20001 files per directory) from one directory to another: > > Copy Method Data Rate > == > cpio -pdum 75 MB/s > cp -r 32 MB/s > tar -cf - . | (cd dest && tar -xf -) 26 MB/s > > I would expect data copy rates approaching 200 MB/s. > What happens if you run two copies at the same time? (On different data) Your test is very bad at using striping, as reads are done sequentially. Prefetch can only help within a file, and your files are only 5 MB.
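[Editor's note] Mattias' two-copies experiment can be scripted. The sketch below uses tiny scratch trees in place of the real 236 GB data set, so on the 2540 one would substitute the actual source directories; it falls back to cp when cpio is absent.

```shell
# Sketch: run two copies at the same time on different data, as Mattias
# suggests, and time the pair. Scratch trees stand in for the real data.
src1=$(mktemp -d); src2=$(mktemp -d)
dst1=$(mktemp -d); dst2=$(mktemp -d)
for i in 1 2 3 4 5; do
    dd if=/dev/zero of="$src1/f$i" bs=64k count=16 2>/dev/null
    dd if=/dev/zero of="$src2/f$i" bs=64k count=16 2>/dev/null
done

start=$(date +%s)
if command -v cpio >/dev/null 2>&1; then
    ( cd "$src1" && find . | cpio -pdum "$dst1" 2>/dev/null ) &
    ( cd "$src2" && find . | cpio -pdum "$dst2" 2>/dev/null ) &
else
    cp -r "$src1/." "$dst1" &            # fallback when cpio is absent
    cp -r "$src2/." "$dst2" &
fi
wait
COPY_SECS=$(( $(date +%s) - start ))
COPIED=$(ls "$dst1" "$dst2" | grep -c '^f')
echo "two parallel copies: $COPIED files in ${COPY_SECS}s"
rm -rf "$src1" "$src2" "$dst1" "$dst2"
```

On real data, comparing the combined rate from zpool iostat against the single-copy 75 MB/s would show whether the pool has spare bandwidth that one sequential reader cannot reach.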
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
ZFS doesn't mix well with mmap(2). This is because ZFS uses the ARC instead
of the Solaris page cache, while mmap() uses the latter. So if anyone maps a
file, ZFS has to keep the two caches in sync.

cp(1) uses mmap(2), so it brings pages of the files it copies into the
Solaris page cache. As long as they remain there, ZFS will be slow for those
files, even if you subsequently use read(2) to access them. If you reboot,
your cpio(1) tests will probably go fast again, until someone uses mmap(2)
on the files again.

I think tar(1) uses read(2), but from my iPod I can't be sure. It would be
interesting to see how tar(1) performs if you run that test before cp(1) on
a freshly rebooted system.

I have done some work with the ZFS team towards a fix, but it is currently
only in OpenSolaris.

The other thing that slows you down is that ZFS only flushes to disk every
5 seconds if there are no synchronous writes. It would be interesting to
watch 'iostat -xnz 1' while you are running your tests. You may find the
disks are writing very efficiently for one second in every five.

Hope this helps,

Phil
blogs.sun.com/pgdh

Sent from my iPod

On 4 Jul 2009, at 05:26, Bob Friesenhahn wrote:

On Fri, 3 Jul 2009, Bob Friesenhahn wrote:

Copy Method                              Data Rate
==================================================
cpio -pdum                               75 MB/s
cp -r                                    32 MB/s
tar -cf - . | (cd dest && tar -xf -)     26 MB/s

It seems that the above should be amended. Running the cpio-based copy again
results in zpool iostat reporting a read bandwidth of only 33 MB/second. The
system seems to get slower and slower as it runs.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
On Fri, 3 Jul 2009, Bob Friesenhahn wrote:

Copy Method                              Data Rate
==================================================
cpio -pdum                               75 MB/s
cp -r                                    32 MB/s
tar -cf - . | (cd dest && tar -xf -)     26 MB/s

It seems that the above should be amended. Running the cpio-based copy again
results in zpool iostat reporting a read bandwidth of only 33 MB/second. The
system seems to get slower and slower as it runs.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
[zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
I am still trying to determine why Solaris 10 (Generic_141415-03) ZFS
performs so terribly on my system. I blew a good bit of personal life
savings on this set-up but am not seeing performance anywhere near what is
expected.

Testing with iozone shows that bulk I/O performance is good. Testing with
Jeff Bonwick's 'diskqual.sh' shows expected disk performance. The problem is
that actual observed application performance is terrible, and could often be
satisfied by portable USB drives rather than high-end SAS drives; it could
be satisfied by just one SAS disk drive.

Behavior is as if ZFS is very slow to read data: disks are read at only 2 or
3 MB/second, followed by an intermittent write on a long cycle, and the
drive lights blink slowly. It is as if ZFS does no successful sequential
read-ahead on the files (see the Prefetch Data hit rate of 0% and Prefetch
Data cache miss of 60% below), or there is a semaphore bottleneck somewhere
(but CPU use is very low). Observed behavior is very program dependent.

# zpool status Sun_2540
  pool: Sun_2540
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: scrub completed after 0h46m with 0 errors on Mon Jun 29 05:06:33 2009
config:

        NAME                                     STATE     READ WRITE CKSUM
        Sun_2540                                 ONLINE       0     0     0
          mirror                                 ONLINE       0     0     0
            c4t600A0B80003A8A0B096A47B4559Ed0    ONLINE       0     0     0
            c4t600A0B800039C9B50AA047B4529Bd0    ONLINE       0     0     0
          mirror                                 ONLINE       0     0     0
            c4t600A0B80003A8A0B096E47B456DAd0    ONLINE       0     0     0
            c4t600A0B800039C9B50AA447B4544Fd0    ONLINE       0     0     0
          mirror                                 ONLINE       0     0     0
            c4t600A0B80003A8A0B096147B451BEd0    ONLINE       0     0     0
            c4t600A0B800039C9B50AA847B45605d0    ONLINE       0     0     0
          mirror                                 ONLINE       0     0     0
            c4t600A0B80003A8A0B096647B453CEd0    ONLINE       0     0     0
            c4t600A0B800039C9B50AAC47B45739d0    ONLINE       0     0     0
          mirror                                 ONLINE       0     0     0
            c4t600A0B80003A8A0B097347B457D4d0    ONLINE       0     0     0
            c4t600A0B800039C9B50AB047B457ADd0    ONLINE       0     0     0
          mirror                                 ONLINE       0     0     0
            c4t600A0B800039C9B50A9C47B4522Dd0    ONLINE       0     0     0
            c4t600A0B800039C9B50AB447B4595Fd0    ONLINE       0     0     0

errors: No known data errors

% ./diskqual.sh
c1t0d0 130 MB/sec
c1t1d0 130 MB/sec
c2t202400A0B83A8A0Bd31 13422 MB/sec
c3t202500A0B83A8A0Bd31 13422 MB/sec
c4t600A0B80003A8A0B096A47B4559Ed0 191 MB/sec
c4t600A0B80003A8A0B096E47B456DAd0 192 MB/sec
c4t600A0B80003A8A0B096147B451BEd0 192 MB/sec
c4t600A0B80003A8A0B096647B453CEd0 192 MB/sec
c4t600A0B80003A8A0B097347B457D4d0 212 MB/sec
c4t600A0B800039C9B50A9C47B4522Dd0 191 MB/sec
c4t600A0B800039C9B50AA047B4529Bd0 192 MB/sec
c4t600A0B800039C9B50AA447B4544Fd0 192 MB/sec
c4t600A0B800039C9B50AA847B45605d0 191 MB/sec
c4t600A0B800039C9B50AAC47B45739d0 191 MB/sec
c4t600A0B800039C9B50AB047B457ADd0 191 MB/sec
c4t600A0B800039C9B50AB447B4595Fd0 191 MB/sec

% arc_summary.pl
System Memory:
        Physical RAM:  20470 MB
        Free Memory :   2371 MB
        LotsFree:        312 MB

ZFS Tunables (/etc/system):
        * set zfs:zfs_arc_max = 0x3
        set zfs:zfs_arc_max = 0x28000
        * set zfs:zfs_arc_max = 0x2

ARC Size:
        Current Size:             9383 MB (arcsize)
        Target Size (Adaptive):  10240 MB (c)
        Min Size (Hard Limit):    1280 MB (zfs_arc_min)
        Max Size (Hard Limit):   10240 MB (zfs_arc_max)

ARC Size Breakdown:
        Most Recently Used Cache Size:    6%   644 MB (p)
        Most Frequently Used Cache Size: 93%  9595 MB
(c-p)

ARC Efficency:
        Cache Access Total:         674638362
        Cache Hit Ratio:      91%   615586988   [Defined State for buffer]
        Cache Miss Ratio:      8%    59051374   [Undefined State for Buffer]
        REAL Hit Ratio:       87%   590314508   [MRU/MFU Hits Only]

        Data Demand Efficiency:    96%
        Data Prefetch Efficiency:   7%

        CACHE HITS BY CACHE LIST:
          Anon:                2%    13626529   [ New Customer, First Cache Hit ]
          Most Recently Used: