Re: [zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
NAME                   PROPERTY  VALUE     SOURCE
MirrorPool             sync      disabled  local
MirrorPool/CCIT        sync      disabled  local
MirrorPool/EX01        sync      disabled  inherited from MirrorPool
MirrorPool/EX02       sync      disabled  inherited from MirrorPool
MirrorPool/FileStore1  sync      disabled  inherited from MirrorPool

Sync was disabled on the main pool and then left to inherit to everything else. The reason for disabling this in the first place was to fix bad NFS write performance (even with the ZIL on an X25-E SSD it was under 1MB/s). I've also tried setting logbias to both throughput and latency, but they perform at around the same level.

Thanks
-Matt

-----Original Message-----
From: Andrew Gabriel [mailto:andrew.gabr...@oracle.com]
Sent: Wednesday, 27 April 2011 3:41 PM
To: Matthew Anderson
Cc: 'zfs-discuss@opensolaris.org'
Subject: Re: [zfs-discuss] No write coalescing after upgrade to Solaris 11 Express

Matthew Anderson wrote:
> Hi All,
>
> I've run into a massive performance problem after upgrading to Solaris 11
> Express from oSol 134.
>
> Previously the server was performing a batch write every 10-15 seconds and
> the client servers (connected via NFS and iSCSI) had very low wait times. Now
> I'm seeing constant writes to the array with a very low throughput and high
> wait times on the client servers. Zil is currently disabled.

How/Why?

> There is currently one failed disk that is being replaced shortly.
>
> Is there any ZFS tunable to revert Solaris 11 back to the behaviour of oSol
> 134?

What does "zfs get sync" report?

--
Andrew Gabriel

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
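The NFS penalty behind disabling sync, discussed above, can be illustrated with a small self-contained sketch (nothing here is from the thread; the file, block size, and iteration count are arbitrary): a writer that waits for stable storage after every write — which is what NFS commit semantics force with sync=standard and no fast SLOG — pays a per-operation flush cost that buffered writes avoid.

```python
# Sketch: per-write fsync (analogous to a synchronous NFS commit hitting
# the ZIL) vs. buffered writes (what sync=disabled effectively allows).
# All parameters are arbitrary illustration values.
import os
import tempfile
import time

def write_blocks(n, size, do_fsync):
    """Write n blocks of `size` bytes, optionally fsync-ing each one."""
    data = b"x" * size
    fd, path = tempfile.mkstemp()
    try:
        t0 = time.perf_counter()
        for _ in range(n):
            os.write(fd, data)
            if do_fsync:
                os.fsync(fd)  # wait for stable storage, like a ZIL commit
        return time.perf_counter() - t0
    finally:
        os.close(fd)
        os.unlink(path)

buffered = write_blocks(200, 4096, do_fsync=False)
synced = write_blocks(200, 4096, do_fsync=True)
print(f"buffered: {buffered:.4f}s  fsync-per-write: {synced:.4f}s")
```

The gap between the two times on spinning disks is typically orders of magnitude, which is why a fast SLOG device (or, at the cost of safety, sync=disabled) matters so much for NFS.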
[zfs-discuss] No write coalescing after upgrade to Solaris 11 Express
Hi All,

I've run into a massive performance problem after upgrading to Solaris 11 Express from oSol 134.

Previously the server was performing a batch write every 10-15 seconds and the client servers (connected via NFS and iSCSI) had very low wait times. Now I'm seeing constant writes to the array with very low throughput and high wait times on the client servers. The ZIL is currently disabled. There is currently one failed disk that is being replaced shortly.

Is there any ZFS tunable to revert Solaris 11 back to the behaviour of oSol 134? I attempted to remove Sol 11 and reinstall 134 but it keeps freezing during install, which is probably another issue entirely...

IOstat output is below. When running iostat -v 2, the write OPs and throughput stay very constant at this level.

                  capacity     operations    bandwidth
pool            alloc   free   read  write   read  write
--------------  -----  -----  -----  -----  -----  -----
MirrorPool      12.2T  4.11T    153  4.63K  6.06M  33.6M
  mirror        1.04T   325G     11    416   400K  2.80M
    c7t0d0          -      -      5    114   163K  2.80M
    c7t1d0          -      -      6    114   237K  2.80M
  mirror        1.04T   324G     10    374   426K  2.79M
    c7t2d0          -      -      5    108   190K  2.79M
    c7t3d0          -      -      5    107   236K  2.79M
  mirror        1.04T   324G     15    425   537K  3.15M
    c7t4d0          -      -      7    115   290K  3.15M
    c7t5d0          -      -      8    116   247K  3.15M
  mirror        1.04T   325G     13    412   572K  3.00M
    c7t6d0          -      -      7    115   313K  3.00M
    c7t7d0          -      -      6    116   259K  3.00M
  mirror        1.04T   324G     13    381   580K  2.85M
    c7t8d0          -      -      7    111   362K  2.85M
    c7t9d0          -      -      5    111   219K  2.85M
  mirror        1.04T   325G     15    408   654K  3.10M
    c7t10d0         -      -      7    122   336K  3.10M
    c7t11d0         -      -      7    123   318K  3.10M
  mirror        1.04T   325G     14    461   681K  3.22M
    c7t12d0         -      -      8    130   403K  3.22M
    c7t13d0         -      -      6    132   278K  3.22M
  mirror         749G   643G      1    279   140K  1.07M
    c4t14d0         -      -      0      0      0      0
    c7t15d0         -      -      1     83   140K  1.07M
  mirror        1.05T   319G     18    333   672K  2.74M
    c7t16d0         -      -     11     96   406K  2.74M
    c7t17d0         -      -      7     96   266K  2.74M
  mirror        1.04T   323G     13    353   540K  2.85M
    c7t18d0         -      -      7     98   279K  2.85M
    c7t19d0         -      -      6    100   261K  2.85M
  mirror        1.04T   324G     12    459   543K  2.99M
    c7t20d0         -      -      7    118   285K  2.99M
    c7t21d0         -      -      4    119   258K  2.99M
  mirror        1.04T   324G     11    431   465K  3.04M
    c7t22d0         -      -      5    116   195K  3.04M
    c7t23d0         -      -      6    117   272K  3.04M
  c8t2d0            0  29.5G      0      0      0      0
cache               -      -      -      -      -      -
  c8t3d0        59.4G  3.88M    113     64  6.51M  7.31M
  c8t1d0        59.5G    48K     95     69  5.69M  8.08M

Thanks
-Matt
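A quick back-of-the-envelope check of the pool-level numbers in the iostat sample above (a sketch; the two figures are read off that output) shows why this looks like the batching has gone away — the average write is only a few KB, rather than large transaction-group flushes every 10-15 seconds:

```python
# Pool-level figures from the iostat sample above.
write_ops_per_s = 4630            # 4.63K write ops/s on MirrorPool
write_bw_kb_per_s = 33.6 * 1024   # 33.6 MB/s expressed in KB/s

avg_write_kb = write_bw_kb_per_s / write_ops_per_s
print(f"average write size ~= {avg_write_kb:.1f} KB")  # ~7.4 KB per write
```

With coalescing working, the same 33.6 MB/s delivered in periodic batches would show up as far fewer, far larger writes per interval.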
[zfs-discuss] NTFS on NFS and iSCSI always generates small IO's
Hi All,

I've run into a problem with my OpenSolaris system and NTFS, and I can't seem to make sense of it. The server running the virtual machines is Ubuntu Server 10.04 running KVM. Storage is presented via NFS over InfiniBand. ZFS is not running compression or dedup. The ZIL is also currently disabled because it was causing terrible NFS performance. ESXi also displayed the same behaviour when running Windows VMs.

The OpenSolaris box is running -
OS b147
Xeon 5520
24GB DDR3
24x 1.5TB 7200rpm SATA drives
24-bay expander connected to an LSI RAID card via a 4-lane SAS cable
2x 64GB Intel X25-M L2ARC
2x 32GB Intel X25-M SLOG

Host - 4KB reads & writes are full speed. Tested using dd. Also tested using Samba writing to the NFS share (bare-metal PC -> Samba -> NFS) and it was writing as fast as the server could dish out the data (50MB/s).

Test VM 1 - OS is Ubuntu Server 10.10 on ext4. Reads & writes are full speed at 4KB, tested using dd. Sequential read/write seems fine. Tested using Samba also, same results as above (50MB/s).

Test VM 2 - OS is Windows Server 2008 on NTFS with a 4KB cluster size. 4KB reads & writes are 30MB/s, tested using the ATTO disk benchmark. Sequential reads and writes also max out at 30MB/s and generate a lot of IO on the storage box. IOStat on the storage box shows 3k+ IOPS and only 30MB/s throughput.

I also tested a bare-metal system running iSCSI over gigabit, with NTFS (default 4KB cluster size) directly on the ZFS block volume. Results are the same as Test VM 2: lots of small IO (3k+ IOPS) and small throughput (30MB/s). I also did the same test using SRP; 4KB reads and writes show the same 30MB/s. Any IO at or above 128KB picks up speed drastically, easily 700MB/s.

I have a feeling it's to do with ZFS's recordsize property but haven't been able to find any solid testing done with NTFS. I'm going to do some testing using smaller record sizes tonight to see if that helps the issue. At the moment I'm surviving on cache and am quickly running out of capacity.

Can anyone suggest any further tests or have any idea about what's going on?

Thanks
-Matt
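One way to see why the recordsize hypothesis above is plausible (a sketch under assumed values, not a measurement from this system): if the dataset or zvol uses the default 128 KB record while NTFS issues 4 KB cluster-sized I/O, each small write can force a read-modify-write of an entire record.

```python
# Assumed sizes: ZFS default recordsize vs. NTFS default cluster size.
recordsize = 128 * 1024  # 128 KB ZFS record (default, assumed here)
cluster = 4 * 1024       # 4 KB NTFS cluster

# Worst case: every 4 KB write rewrites one whole 128 KB record, so up
# to 32x more data moves on the backend than the client actually sent.
amplification = recordsize // cluster
print(f"worst-case write amplification: {amplification}x")
```

In practice caching and write aggregation soften this considerably, which is why the usual experiment is exactly what's proposed above: shrink the volume block size toward the NTFS cluster size (or format NTFS with a larger cluster) and re-run the 4 KB benchmark.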
[zfs-discuss] SCSI timeouts with rPool on usb
I'm currently having a few problems with my storage server. Server specs are -
OpenSolaris snv_134
Supermicro X8DTi motherboard
Intel Xeon 5520
6x 4GB DDR3
LSI RAID card - running 24x 1.5TB SATA drives
Adaptec 2405 - running 4x Intel X25-E SSDs
Boots from an 8GB USB flash drive

The initial problem started after a while, when the console showed SCSI timeouts and the whole server locked up. After rebooting, the USB boot drive failed to show in the BIOS; a power cycle fixed the problem and the server worked flawlessly for another 30-40 days until it locked up again. I inserted a second 8GB USB key and mirrored the rpool; again it only lasted 30 days and then locked up. It definitely seems as though the USB drives are at fault — the other disks have so far performed perfectly.

The next step I was going to take is to convert the rpool to a 2.5-inch SATA laptop drive. The problem is that when I attempted to run the SSDs from the onboard SATA controller, they lasted about 30 minutes before the OS set them as degraded due to checksum errors. Running them from the 2405 controller has been error-free so far, so I'm very wary of using the onboard controller. I was thinking I could probably slice up the SSDs to take over the rpool; the two read-cache SSDs are 60GB each, and missing 8GB from each wouldn't cause an issue.

Can anyone help out with the process to swing the rpool from the USB key to SSD? Or does anyone have a suggestion on what I could possibly do to sort out the problem?

Thanks
-Matt
Re: [zfs-discuss] COMSTAR dropouts with dedup enabled
Thanks Brandon,

This system has 24GB of RAM and currently no L2ARC. The total deduplicated data was about 250GB, so I wouldn't have thought I would be out of RAM. I've removed the LUN for the time being, so I can't get the DDT size at the moment.

I have some X25-E's to go in as L2ARC and SLOG, so I'll revisit dedup soon to see if that helps the issue.

-Matt
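A rough dedup-table sizing sketch supports the "shouldn't be out of RAM" intuition above. The per-entry cost and average block size here are assumptions (a commonly quoted rule of thumb, not figures from this system); only the 250GB comes from the message:

```python
# Back-of-envelope DDT sizing. Per-entry cost and average block size
# are assumed rule-of-thumb values, not measured from this system.
deduped_bytes = 250 * 2**30   # ~250 GB of deduplicated data (from above)
avg_block = 64 * 2**10        # assumed 64 KB average block size
bytes_per_entry = 320         # assumed in-core cost per unique block

entries = deduped_bytes // avg_block
ddt_gib = entries * bytes_per_entry / 2**30
print(f"~{entries:,} entries -> ~{ddt_gib:.2f} GiB of DDT")
```

Even with a smaller average block size the table stays well under 24GB here, so the dropouts look more like DDT I/O churn than outright RAM exhaustion — though without the actual DDT stats (zdb -DD) that remains a guess.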
[zfs-discuss] COMSTAR dropouts with dedup enabled
Hi All,

I currently use b134 and COMSTAR to deploy SRP targets for virtual machine storage (VMware ESXi 4) and have run into some unusual behaviour when dedup is enabled for a particular LUN. The target seems to lock up (ESX reports it as unavailable) when writing large amounts of data or overwriting data; reads are unaffected. The easiest way for me to replicate the problem was to restore a 2GB SQL database inside a VM. The dropouts lasted anywhere from 3 seconds to a few minutes, and when connectivity is restored the other LUNs (without dedup) drop out for a few seconds. The problem didn't seem to occur with only a small amount of data on the LUN (<50GB) and happened more frequently as the LUN filled up. I've since moved all data to non-dedup LUNs and I haven't seen a dropout for over a month.

Does anyone know why this is happening? I've also seen the behaviour when exporting iSCSI targets with COMSTAR. I haven't had a chance to install the SSDs for L2ARC and SLOG yet, so I'm unsure if that will help the issue.

System specs are -
Single Xeon 5620
24GB DDR3
24x 1.5TB 7200rpm
LSI RAID card

Thanks
-Matt