Re: [zfs-discuss] Adaptec AAC driver
Thanks, it was what I had to do. Bruno

On 29-3-2010 19:12, Cyril Plisko wrote:
On Mon, Mar 29, 2010 at 4:57 PM, Bruno Sousa bso...@epinfante.com wrote:

pkg uninstall aac
Creating Plan
pkg: Cannot remove 'pkg://opensolaris.org/driver/storage/a...@0.5.11,5.11-0.134:20100302T021758Z' due to the following packages that depend on it: pkg://opensolaris.org/storage/storage-ser...@0.1,5.11-0.134:20100302T050950Z

pkg uninstall aac storage-server
pkg: Requested uninstall operation would affect files that cannot be modified in live image. Please retry this operation on an alternate boot environment.

So... how can I remove the driver from my environment?

beadm create newbe
beadm mount newbe /newbe
pkg -R /newbe uninstall aac
beadm umount newbe
beadm activate newbe
reboot

A bit long, but it ultimately gives you a fully recoverable setup in case something goes south.

Thanks in advance, Bruno

On 29-3-2010 15:50, Cyril Plisko wrote:
On Mon, Mar 29, 2010 at 4:25 PM, Bruno Sousa bso...@epinfante.com wrote:

Hello all, currently I'm evaluating a system with an Adaptec 52445 RAID HBA, and the driver supplied by OpenSolaris doesn't support JBOD drives. I'm running snv_134, but when I try to uninstall the SUNWaac driver I get the following error:

pkgrm SUNWaac
The following package is currently installed: SUNWaac Adaptec AdvanceRaid Controller SCSI HBA Driver (i386) 11.11,REV=2010.02.17.03.06
Do you want to remove this package? [y,n,?,q] y
pkgrm: ERROR: unable to change current working directory to /var/sadm/pkg/SUNWac/install
Removal of SUNWaac failed (internal error). No changes were made to the system.

Does anyone know how I can replace the OpenSolaris aac driver with the Adaptec aac driver?

On OpenSolaris you shouldn't use the SVR4 pkg* commands - it uses pkg(5) (aka IPS). Try 'pkg uninstall aac' or use packagemanager to remove the aac driver package.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
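For what it's worth, here is the whole alternate-BE sequence in one place (the BE name "newbe" is arbitrary, and the pkg list line is only an optional sanity check):

# beadm create newbe               # clone the current boot environment
# beadm mount newbe /newbe         # mount the clone on an alternate root
# pkg -R /newbe uninstall aac      # remove the driver from the clone instead of the live image
# pkg -R /newbe list aac           # optional: should now report no installed aac package
# beadm umount newbe
# beadm activate newbe             # boot from the clone by default
# reboot

If the new BE misbehaves, the old one is still selectable from the GRUB menu.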
[zfs-discuss] broken zfs root
I'm running Solaris 10 SPARC with rather updated patches (as of ~30 days ago?) on a Netra X1. I had set up ZFS root with two IDE 40 GB hard disks. All was fine until my secondary master died - no read/write errors, just dead. No matter what I try (booting with the dead drive in place, booting with an identical but working drive in its place, booting with no secondary master, booting in single-user mode), the system panics while trying to come up. What are we supposed to do when a ZFS root loses its disk? This behavior is just bizarre.

Boot device: disk File and args:
SunOS Release 5.10 Version Generic_142900-07 64-bit
Copyright 1983-2010 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
panic[cpu0]/thread=180e000: assertion failed: nvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_TXG, txg) == 0, file: ../../common/fs/zfs/spa.c, line: 2218
0180b3f0 genunix:assfail+74 (130a570, 130a5b0, 8aa, 183fc00, 1265c00, 0)
  %l0-3: 0600106e6000 0004 4000 0600106ea000
  %l4-7: 01265c00 01888400
0180b4a0 zfs:spa_check_rootconf+54 (130a400, 130a400, 0, 180b610, 1137c00, 0)
  %l0-3: 0021 000a 0002 0020
  %l4-7: 01137df8 0008 0008 0130a400
0180b560 zfs:spa_get_rootconf+1d0 (2, 8, 180b718, 180b610, 180b620, 0)
  %l0-3: 0130a788 0001 0001 0130a400
  %l4-7: 0130a400
0180b660 zfs:spa_import_rootpool+10 (183dc30, 0, 18db400, 18c1800, 9, 0)
  %l0-3: 002c 01872400 002c 01815000
  %l4-7: 002b 0002 0180e000
0180b720 zfs:zfs_mountroot+6c (189b3a0, 0, 0, 708, 0, 33d6898)
  %l0-3: 018cf800 01877800 011b8c00 018cb000
  %l4-7: 01877800 011b8c00 018bd800 01877800
0180b7e0 swapgeneric:rootconf+1b0 (0, 183dc00, 1872a20, 0, 183dc30, 1872f68)
  %l0-3: 01872800 03005a60 0304b208
  %l4-7: 018c1800 018ca800
0180b890 unix:stubs_common_code+70 (324d000, 0, 4, 0, 324d000, 2fff)
  %l0-3: 0180b149 0180b211 03338dd0
  %l4-7: 01818a70 0001
0180b950 genunix:vfs_mountroot+60 (800, 200, 0, 1872800, 189b400, 18cac00)
  %l0-3: 010bf800 010bf920 01878360 011f2400
  %l4-7: 011f2400 018cd000 0600 0200
0180ba10 genunix:main+9c (0, 180c000, 1838130, 1815358, 181b738, 18bd800)
  %l0-3: 0180c000 0180c000 70002000
  %l4-7: 0183d400 0180c000 0001
skipping system dump - no dump device configured
rebooting... Resetting ...
LOM event: +0h1m59s host reset

-- Jeremy Kister http://jeremy.kister.net./ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] sharing a ssd between rpool and l2arc
Thanks for the reply. I didn't get very much further. Yes, ZFS loves raw devices. If I had two devices I wouldn't be in this mess: I would simply install OpenSolaris on the first disk and add the second SSD to the data pool with a "zpool add mpool cache cxtydz". Notice that no slices or partitions were used. But I don't have space for two devices, so I have to deal with slices and partitions. I did another clean install in a 12 GB partition, leaving 18 GB free. I tried parted to resize the partition, but it said that resizing (solaris2) partitions wasn't implemented. I tried fdisk, but no luck either. I tried send and receive: create new partition and slices, restore rpool into slice 0, run installgrub - but it wouldn't boot anymore. Can anybody give a summary of commands/steps on how to accomplish a bootable rpool and L2ARC on one SSD? Preferably for the x86 platform. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] sharing a ssd between rpool and l2arc
F. Wessels wrote: Thanks for the reply. I didn't get very much further. Yes, ZFS loves raw devices. When I had two devices I wouldn't be in this mess. I would simply install opensolaris on the first disk and add the second ssd to the data pool with a zpool add mpool cache cxtydz Notice that no slices or partitions were used. But I don't have space for two devices. So I have to deal with slices and partitions. I did another clean install in 12Gb partition leaving 18Gb free. I tried parted to resize the partition, but it said that resizing (solaris2) partitions wasn't implemented. I tried fdisk but no luck either. I tried the send and receive, create new partition and slices, restore rpool in slice0, do installgrub but it wouldn't boot anymore. Can anybody give a summary of commands/steps howto accomplish a bootable rpool and l2arc on a ssd. Preferably for the x86 platform. Look up zvols, as this is what you want to use, NOT partitions (for the many reasons you've encountered). In essence, do a normal install, using the ENTIRE disk for your rpool. Then create a zvol in the rpool: # zfs create -V 8GB rpool/zvolname Add this zvol as the cache device (L2arc) for your other pool # zpool create tank mirror c1t0d0 c1t1d0s0 cache rpool/zvolname the default block size for zvols is 8k, which I'd be interested in having someone test out other sizes, to see which would be best for an L2ARC device. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] sharing a ssd between rpool and l2arc
On 30/03/2010 10:05, Erik Trimble wrote: F. Wessels wrote: Thanks for the reply. I didn't get very much further. Yes, ZFS loves raw devices. When I had two devices I wouldn't be in this mess. I would simply install opensolaris on the first disk and add the second ssd to the data pool with a zpool add mpool cache cxtydz Notice that no slices or partitions were used. But I don't have space for two devices. So I have to deal with slices and partitions. I did another clean install in 12Gb partition leaving 18Gb free. I tried parted to resize the partition, but it said that resizing (solaris2) partitions wasn't implemented. I tried fdisk but no luck either. I tried the send and receive, create new partition and slices, restore rpool in slice0, do installgrub but it wouldn't boot anymore. Can anybody give a summary of commands/steps howto accomplish a bootable rpool and l2arc on a ssd. Preferably for the x86 platform. Look up zvols, as this is what you want to use, NOT partitions (for the many reasons you've encountered). In this case partitions is the only way this will work. In essence, do a normal install, using the ENTIRE disk for your rpool. Then create a zvol in the rpool: # zfs create -V 8GB rpool/zvolname Add this zvol as the cache device (L2arc) for your other pool # zpool create tank mirror c1t0d0 c1t1d0s0 cache rpool/zvolname That won't work L2ARC devices can not be a ZVOL of another pool, they can't be a file either. An L2ARC device must be a physical device. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] sharing a ssd between rpool and l2arc
Darren J Moffat wrote: On 30/03/2010 10:05, Erik Trimble wrote: F. Wessels wrote: Thanks for the reply. I didn't get very much further. Yes, ZFS loves raw devices. When I had two devices I wouldn't be in this mess. I would simply install opensolaris on the first disk and add the second ssd to the data pool with a zpool add mpool cache cxtydz Notice that no slices or partitions were used. But I don't have space for two devices. So I have to deal with slices and partitions. I did another clean install in 12Gb partition leaving 18Gb free. I tried parted to resize the partition, but it said that resizing (solaris2) partitions wasn't implemented. I tried fdisk but no luck either. I tried the send and receive, create new partition and slices, restore rpool in slice0, do installgrub but it wouldn't boot anymore. Can anybody give a summary of commands/steps howto accomplish a bootable rpool and l2arc on a ssd. Preferably for the x86 platform. Look up zvols, as this is what you want to use, NOT partitions (for the many reasons you've encountered). In this case partitions is the only way this will work. In essence, do a normal install, using the ENTIRE disk for your rpool. Then create a zvol in the rpool: # zfs create -V 8GB rpool/zvolname Add this zvol as the cache device (L2arc) for your other pool # zpool create tank mirror c1t0d0 c1t1d0s0 cache rpool/zvolname That won't work L2ARC devices can not be a ZVOL of another pool, they can't be a file either. An L2ARC device must be a physical device. I could have sworn I did this with a zvol awhile ago. Maybe that was for something else... -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] sharing a ssd between rpool and l2arc
On 30/03/2010 10:13, Erik Trimble wrote: Add this zvol as the cache device (L2arc) for your other pool # zpool create tank mirror c1t0d0 c1t1d0s0 cache rpool/zvolname That won't work L2ARC devices can not be a ZVOL of another pool, they can't be a file either. An L2ARC device must be a physical device. I could have sworn I did this with a zvol awhile ago. Maybe that was for something else... The check for the L2ARC device being a block device has always been there. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] sharing a ssd between rpool and l2arc
Darren J Moffat wrote: On 30/03/2010 10:13, Erik Trimble wrote: Add this zvol as the cache device (L2arc) for your other pool # zpool create tank mirror c1t0d0 c1t1d0s0 cache rpool/zvolname That won't work L2ARC devices can not be a ZVOL of another pool, they can't be a file either. An L2ARC device must be a physical device. I could have sworn I did this with a zvol awhile ago. Maybe that was for something else... The check for the L2ARC device being a block device has always been there. I just tried a couple of things on one of my test machines, and I think I know where I was mis-remembering it from: you can add a file or zvol as a ZIL device. Frankly, I'm a little confused by this. I would think that you would have consistent behavior between ZIL and L2ARC devices - either they both can be a file/zvol, or neither can. Not the current behavior. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
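For anyone trying to reproduce the asymmetry, a minimal sketch of the ZIL side (pool and zvol names invented; whether the equivalent cache add is accepted apparently varies by build, per the rest of this thread):

# zfs create -V 1G rpool/slogtest
# zpool add tank log /dev/zvol/dsk/rpool/slogtest   # a zvol (or even a plain file) is accepted as a log device
# zpool status tank                                 # the zvol shows up under "logs"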
Re: [zfs-discuss] sharing a ssd between rpool and l2arc
Thank you Erik for the reply. I misunderstood Dan's suggestion about the zvol in the first place; now you make the same suggestion as well. Doesn't ZFS prefer raw devices? When following this route, the zvol used as cache device for tank makes use of the ARC of rpool, which doesn't seem right - or is there some setting to prevent this? It's because of ZFS's preference for raw devices that I looked for a way to use a slice as cache. Regards, Frederik -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Sun Flash Accelerator F20 numbers
Hi, I did some tests on a Sun Fire X4540 with an external J4500 array (connected via two HBA ports), i.e. there are 96 disks in total, configured as seven 12-disk raidz2 vdevs (plus system, spares, unused disks) providing a ~63 TB pool with fletcher4 checksums. The system was recently equipped with a Sun Flash Accelerator F20 with 4 FMod modules to be used as log devices (ZIL). I was using the latest snv_134 software release. Here are some first performance numbers for the extraction of an uncompressed 50 MB tarball on a Linux (CentOS 5.4 x86_64) NFS client which mounted the test filesystem (no compression or dedup) via NFSv3 (rsize=wsize=32k,sync,tcp,hard).

standard ZIL:        7m40s  (ZFS default)
1x SSD ZIL:          4m07s  (Flash Accelerator F20)
2x SSD ZIL:          2m42s  (Flash Accelerator F20)
2x SSD mirrored ZIL: 3m59s  (Flash Accelerator F20)
3x SSD ZIL:          2m47s  (Flash Accelerator F20)
4x SSD ZIL:          2m57s  (Flash Accelerator F20)
disabled ZIL:        0m15s  (local extraction: 0m0.269s)

I was not so much interested in the absolute numbers but rather in the relative performance differences between the standard ZIL, the SSD ZIL and the disabled ZIL cases. Any opinions on the results? I wish the SSD ZIL performance was closer to the disabled ZIL case than it is right now. ATM I tend to use two F20 FMods for the log and the two other FMods as L2ARC cache devices (although the system has lots of system memory, i.e. the L2ARC is not really necessary). But the speedup of disabling the ZIL altogether is appealing (and would probably be acceptable in this environment). -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
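For reference, the tested slog layouts correspond to commands along these lines (the cXtYdZ names for the F20 FMods are made up, and "tank" stands for the data pool):

# zpool add tank log c5t0d0                  # 1x SSD ZIL (single FMod slog)
# zpool add tank log c5t0d0 c5t1d0           # 2x SSD ZIL (two FMods, striped)
# zpool add tank log mirror c5t0d0 c5t1d0    # 2x SSD mirrored ZIL

The "disabled ZIL" case is the old global tunable in /etc/system (it affects every pool on the box):

set zfs:zil_disable = 1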
Re: [zfs-discuss] sharing a ssd between rpool and l2arc
Thank you Darren. So no zvols as L2ARC cache device. That leaves partitions and slices. When I tried to add a second partition as cache device (the first partition contained the slices with the root pool), zpool refused; it reported that the device cXtYdZp2 (note p2) wasn't supported. Perhaps I did something wrong with fdisk, but both partitions were there, also in parted. The other option is to use a slice in the first partition as cache device, but I messed up my boot environment with that, and I'm not sure about the correct resizing procedure. Any suggestions? Regards, Frederik. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] sharing a ssd between rpool and l2arc
F. Wessels wrote: Thank you Erik for the reply. I misunderstood Dan's suggestion about the zvol in the first place. Now you make the same suggestion also. Doesn't zfs prefer raw devices? When following this route the zvol used as cache device for tank makes use of the ARC of rpool what doesn't seem right. Or is there some setting to prevent this. It's zfs preference for raw devices that I looked for a way to use a slice as cache. Regards, Frederik As Darren pointed out, you can't use anything but a block device for the L2ARC device. So my suggestion doesn't work. You can use a file or zvol as a ZIL device, but not an L2ARC device. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs diff
On Mon, Mar 29, 2010 at 5:39 PM, Nicolas Williams nicolas.willi...@sun.com wrote: One really good use for zfs diff would be: as a way to index zfs send backups by contents. Or to generate the list of files for incremental backups via NetBackup or similar. This is especially important for file systems with millions of files and relatively few changes.

+1 The reason zfs send is so fast is not raw throughput; it's that it does not need any time to index, compare and analyze which files have changed since the last snapshot or increment. If the zfs diff command could generate the list of changed files and you fed that into tar or whatever, then these 3rd-party backup tools would suddenly become much more effective - able to rival the performance of zfs send. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
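Something like the following is what's being asked for (purely illustrative - zfs diff is not available yet at this point, so the dataset names and the output format are guesses; the point is only that the snapshot-to-snapshot file list comes from ZFS instead of a full tree walk):

# zfs diff tank/data@monday tank/data@tuesday > /tmp/changed.txt
# awk '$1 != "-" {print $2}' /tmp/changed.txt | gtar -czf /backup/data-incr.tgz -T -

(skip the "-" entries, i.e. removed files, and hand everything else to GNU tar as an include list)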
Re: [zfs-discuss] sharing a ssd between rpool and l2arc
Just clarifying Darren's comment - we got bitten by this pretty badly so I figure it's worth saying again here. ZFS will *allow* you to use a ZVOL of one pool as a ZDEV in another pool, but it results in race conditions and an unstable system. (At least on Solaris 10 update 8). We tried to use a ZVOL from rpool (on fast 15k rpm drives) as a cache device for another pool (on slower 7.2k rpm drives). It worked great up until it hit the race condition and hung the system. It would have been nice if zfs had issued a warning, or at least if this fact was better documented. Scott Duckworth, Systems Programmer II Clemson University School of Computing On Tue, Mar 30, 2010 at 5:09 AM, Darren J Moffat darr...@opensolaris.orgwrote: On 30/03/2010 10:05, Erik Trimble wrote: F. Wessels wrote: Thanks for the reply. I didn't get very much further. Yes, ZFS loves raw devices. When I had two devices I wouldn't be in this mess. I would simply install opensolaris on the first disk and add the second ssd to the data pool with a zpool add mpool cache cxtydz Notice that no slices or partitions were used. But I don't have space for two devices. So I have to deal with slices and partitions. I did another clean install in 12Gb partition leaving 18Gb free. I tried parted to resize the partition, but it said that resizing (solaris2) partitions wasn't implemented. I tried fdisk but no luck either. I tried the send and receive, create new partition and slices, restore rpool in slice0, do installgrub but it wouldn't boot anymore. Can anybody give a summary of commands/steps howto accomplish a bootable rpool and l2arc on a ssd. Preferably for the x86 platform. Look up zvols, as this is what you want to use, NOT partitions (for the many reasons you've encountered). In this case partitions is the only way this will work. In essence, do a normal install, using the ENTIRE disk for your rpool. Then create a zvol in the rpool: # zfs create -V 8GB rpool/zvolname Add this zvol as the cache device (L2arc) for your other pool # zpool create tank mirror c1t0d0 c1t1d0s0 cache rpool/zvolname That won't work L2ARC devices can not be a ZVOL of another pool, they can't be a file either. An L2ARC device must be a physical device. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] sharing a ssd between rpool and l2arc
you can't use anything but a block device for the L2ARC device.

Sure you can... http://mail.opensolaris.org/pipermail/zfs-discuss/2010-March/039228.html It even lives through a reboot (rpool is mounted before other pools):

zpool create -f test c9t3d0s0 c9t4d0s0
zfs create -V 3G rpool/cache
zpool add test cache /dev/zvol/dsk/rpool/cache
reboot

If you're asking for an L2ARC on rpool itself, well, yeah, it's not mounted soon enough, but the point is to put rpool, swap, and the L2ARC for your storage pool all on a single SSD. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool split problem?
OK, I see what the problem is: the /etc/zfs/zpool.cache file. When the pool was split, the zpool.cache file was also split - and the split happens prior to the config file being updated. So, after booting off the split side of the mirror, zfs attempts to mount rpool based on the information in the zpool.cache file (which still shows it as a mirror of c0t0d0s0 and c0t1d0s0). The fix would be to remove the appropriate entry from the split-off pool's zpool.cache file. Easy to say, not so easy to do. I have filed CR 6939334 to track this issue. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
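A sketch of the kind of manual fix implied, until that CR is addressed (entirely untested, and the pool/BE names are invented - the idea is only to remove the stale zpool.cache from the split-off root before booting from it):

# zpool split -R /a rpool rpool2          # split the mirror and import the new pool under /a
# zfs mount rpool2/ROOT/opensolaris       # mount the split-off boot environment (name will differ)
# rm /a/etc/zfs/zpool.cache               # drop the cache file inherited from the mirrored pool
# zpool export rpool2

The cache file should then be rebuilt the next time that side is booted and imported.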
Re: [zfs-discuss] zfs recreate questions
Thanks for the details Edward, that is good to know. Another quick question. In my test setup I created the pool using snv_134 because I wanted to see how things would run as the next release is supposed to be based off of snv_134 (from my understanding). However, I recently read that the 2010.03 release time is unknown and that things are kind of uncertain (is this true? Is there a link that provides info about the release schedule?). Anyway, my question is, I created the pool with snv_134 and since I need to get this hardware to production, I can't wait for the new release and have to move forward with 2009.06. However, as expected I can't import it because the pool was created with a newer version of ZFS. What options are there to import? Like I said, I don't need the data, so I can blow out the pool and start over. However, I was curious to see how ZFS could handle this situation. Thanks guys, I really appreciate all the info. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
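If the data really is expendable, the usual way out is to recreate the pool at an on-disk version the older release understands; 2009.06 reports its maximum with zpool upgrade -v (pool version 14, if I remember correctly). A sketch, with an invented vdev layout:

On the 2009.06 box:
# zpool upgrade -v            # shows the highest pool version this release supports

On the snv_134 box:
# zpool destroy tank
# zpool create -o version=14 tank raidz c1t0d0 c1t1d0 c1t2d0

A pool created that way can then be exported and imported on 2009.06.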
Re: [zfs-discuss] sharing a ssd between rpool and l2arc
On Mar 29, 2010, at 1:10 PM, F. Wessels wrote: Hi, as Richard Elling wrote earlier: For more background, low-cost SSDs intended for the boot market are perfect candidates. Take a X-25V @ 40GB and use 15-20 GB for root and the rest for an L2ARC. For small form factor machines or machines with max capacity of 8GB of RAM (a typical home system) this can make a pleasant improvement over a HDD-only implementation. For the upcoming 2010.03 release and now testing with a b134. What is the most appropiate way to accomplish this? The most appropriate (supportable by Oracle) is to use the automated installer. An example of the manifest is: http://dlc.sun.com/osol/docs/content/dev/AIinstall/customai.html#ievtoc The caiman installer allows you to control the size of the partition on the boot disk but it doesn't allow you (at least I couldn't figure out how) to control the size of the slices. So you end with slice0 filling the entire partition. Now this leaves you with two options, create a second partition or start a complex process of backing up the root pool, reslicing the first partition, restore the root pool and pray that the system will boot again. I tried the first, knowing that multiple partitions isn't recommended. I couldn't get zfs to add the second partition as L2ARC. It simply said that it wasn't supported. Before I try the second option perhaps somebody can give some directions howto accomplish a shared rpool and l2arc on a (ss)disk. There are perhaps a half dozen ways to do this. As others have mentioned, using fdisk partitions can be done and is particularly easy when using the text-based installer. However, with that option you need a smarter partition editor than fdisk (eg. gparted) And, of course, you can fake out the installer altogether... or even change the source code... -- richard ZFS storage and performance consulting at http://www.RichardElling.com ZFS training on deduplication, NexentaStor, and NAS performance Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] sharing a ssd between rpool and l2arc
F. Wessels wrote: Hi, as Richard Elling wrote earlier: For more background, low-cost SSDs intended for the boot market are perfect candidates. Take a X-25V @ 40GB and use 15-20 GB for root and the rest for an L2ARC. For small form factor machines or machines with max capacity of 8GB of RAM (a typical home system) this can make a pleasant improvement over a HDD-only implementation. For the upcoming 2010.03 release and now testing with a b134. What is the most appropiate way to accomplish this? The caiman installer allows you to control the size of the partition on the boot disk but it doesn't allow you (at least I couldn't figure out how) to control the size of the slices. So you end with slice0 filling the entire partition. Now this leaves you with two options, create a second partition or start a complex process of backing up the root pool, reslicing the first partition, restore the root pool and pray that the system will boot again. I tried the first, knowing that multiple partitions isn't recommended. I couldn't get zfs to add the second partition as L2ARC. It simply said that it wasn't supported. Before I try the second option perhaps somebody can give some directions howto accomplish a shared rpool and l2arc on a (ss)disk. Regards, Frederik As I think was possibly mentioned before on this thread, what you probably want to do is either: (a) create a zvol inside the existing rpool, then add the zvol as an L2ARC or (b) create a file in one of the rpool filesystems, and add that as the L2ARC Likely, (a) is the better option. So, go ahead and give the entire boot SSD to the installer to create a rpool of the entire disk, then zvol off a section to be used as the L2ARC. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On 3/30/2010 2:44 PM, Adam Leventhal wrote: Hey Karsten, Very interesting data. Your test is inherently single-threaded so I'm not surprised that the benefits aren't more impressive -- the flash modules on the F20 card are optimized more for concurrent IOPS than single-threaded latency. Yes it would be interesting to see the Avg numbers for 10 or more clients (or jobs on one client) all performing that same test. -Kyle Adam On Mar 30, 2010, at 3:30 AM, Karsten Weiss wrote: Hi, I did some tests on a Sun Fire x4540 with an external J4500 array (connected via two HBA ports). I.e. there are 96 disks in total configured as seven 12-disk raidz2 vdevs (plus system, spares, unused disks) providing a ~ 63 TB pool with fletcher4 checksums. The system was recently equipped with a Sun Flash Accelerator F20 with 4 FMod modules to be used as log devices (ZIL). I was using the latest snv_134 software release. Here are some first performance numbers for the extraction of an uncompressed 50 MB tarball on a Linux (CentOS 5.4 x86_64) NFS-client which mounted the test filesystem (no compression or dedup) via NFSv3 (rsize=wsize=32k,sync,tcp,hard). standard ZIL: 7m40s (ZFS default) 1x SSD ZIL: 4m07s (Flash Accelerator F20) 2x SSD ZIL: 2m42s (Flash Accelerator F20) 2x SSD mirrored ZIL: 3m59s (Flash Accelerator F20) 3x SSD ZIL: 2m47s (Flash Accelerator F20) 4x SSD ZIL: 2m57s (Flash Accelerator F20) disabled ZIL: 0m15s (local extraction0m0.269s) I was not so much interested in the absolute numbers but rather in the relative performance differences between the standard ZIL, the SSD ZIL and the disabled ZIL cases. Any opinions on the results? I wish the SSD ZIL performance was closer to the disabled ZIL case than it is right now. ATM I tend to use two F20 FMods for the log and the two other FMods as L2ARC cache devices (although the system has lots of system memory i.e. the L2ARC is not really necessary). But the speedup of disabling the ZIL altogether is appealing (and would probably be acceptable in this environment). -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Adam Leventhal, Fishworkshttp://blogs.sun.com/ahl ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cannot replace a replacing device
Thanks - have run it and returns pretty quickly. Given the output (attached) what action can I take? Thanks James -- This message posted from opensolaris.orgDirty time logs: tank outage [300718,301073] length 356 outage [301138,301139] length 2 outage [301149,301149] length 1 outage [301151,301153] length 3 outage [301155,301155] length 1 outage [301157,301158] length 2 outage [301182,301182] length 1 outage [301262,301262] length 1 outage [301911,301916] length 6 outage [304063,304063] length 1 outage [304791,304796] length 6 raidz outage [300718,301073] length 356 outage [301138,301139] length 2 outage [301149,301149] length 1 outage [301151,301153] length 3 outage [301155,301155] length 1 outage [301157,301158] length 2 outage [301182,301182] length 1 outage [301262,301262] length 1 outage [301911,301916] length 6 outage [304063,304063] length 1 outage [304791,304796] length 6 /dev/ad4 /dev/ad6 replacing outage [300718,301073] length 356 outage [301138,301139] length 2 outage [301149,301149] length 1 outage [301151,301153] length 3 outage [301155,301155] length 1 outage [301157,301158] length 2 outage [301182,301182] length 1 outage [301262,301262] length 1 outage [301911,301916] length 6 outage [304063,304063] length 1 outage [304791,304796] length 6 /dev/ad7/old outage [300718,301073] length 356 outage [301138,301139] length 2 outage [301149,301149] length 1 outage [301151,301153] length 3 outage [301155,301155] length 1 outage [301157,301158] length 2 outage [301182,301182] length 1 outage [301262,301262] length 1 outage [301911,301916] length 6 outage [304063,304063] length 1 outage [304791,304796] length 6 /dev/ad7 outage [300718,301073] length 356 outage [301138,301139] length 2 outage [301149,301149] length 1 outage [301151,301153] length 3 outage [301155,301155] length 1 outage [301157,301158] length 2 outage [301182,301182] length 1 outage [301262,301262] length 1 outage [301911,301916] length 6 outage [304063,304063] length 1 outage [304791,304796] length 6 Metaslabs: vdev 0 0 26 20.0M offset spacemapfree -- 4 52166M 8 56 2.66G c 65 12.4M 10 66 20.7M 14 69 29.1M 18 73 29.7M 1c 77 29.6M 20 81 79.2M 24 91 87.9M 28 92 63.2M 2c 94 94.2M 30 99123M 34 103523M 38 107 50.9M 3c 111117M 40 116 54.3M 44 119 60.2M 48 123 97.4M 4c 126 1.20G 50 129 48.5M 54 132106M 58 137 27.4M 5c 140 39.6M 60 146 45.3M 64 149 34.9M 68 151544M 6c 154 36.6M 70 156 19.4M 74 160 35.7M 78 162 41.2M 7c 166 23.1M 9c 74 14.1M a0 78 15.2M a4 88 28.1M a8 174 23.3M ac 178 24.2M b0 181 26.3M b4 100 43.4M b8 104 33.6M bc 108 30.6M c0 113 59.8M c4 115 53.9M c8 120 30.8M cc 124 82.2M d0 127 36.9M d4 130 76.2M d8 133 39.7M
Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows
Hello, I wanted to know if there are any updates on this topic. Regards, Robert -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Simultaneous failure recovery
I have a pool (on an X4540 running S10U8) in which a disk failed, and the hot spare kicked in. That's perfect. I'm happy. Then a second disk fails. Now, I've replaced the first failed disk, and it's resilvered and I have my hot spare back. But: why hasn't it used the spare to cover the other failed drive? And can I hotspare it manually? I could do a straight replace, but that isn't quite the same thing. -- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] sharing a ssd between rpool and l2arc
Hi all, yes, it works with partitions. I think I made a typo during the initial testing of adding a partition as cache - probably swapped the 0 for an o. Tested with the b134 GUI and text installer on the x86 platform. So here it goes (a condensed recap follows below):

Install OpenSolaris into a partition and leave some space for the L2ARC. This will remove all partitions from the disk! After the installation, log in.

# fdisk /dev/rdsk/[boot disk]p0
- select option 1 to create a partition
- select option 1 for a SOLARIS2 partition type
- specify a size
- answer no, do not make this partition active
- when satisfied with the result, write your changes by choosing 6 and exit fdisk

Finally, add your cache device to your data pool:

# zpool add mpool cache /dev/rdsk/cXtYdZp2

That's it. Some notes:
- you CAN remove the cache device from the pool.
- you CAN import the pool with a missing cache device. (Remember you CAN'T import a pool with a missing slog!)

Open questions:
- What happens when one cache device fails in case of several striped cache devices? Will this disable the entire L2ARC, or will it continue to function minus the faulty device?
- The alignment mess. I know that the (Intel) SSDs are sensitive to misalignment. fdisk only allows you to enter cylinders, NOT LBA addresses. You can probably work around that with parted.
- Does anybody know more about the recently announced flash-aware sd driver? This was on storage-discuss a couple of days ago.

Does anybody have any tips to squeeze the most out of the SSDs? Thank you all for your time and interest. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
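Condensed into one sketch (assuming the SSD is c4t0d0 and the new partition comes out as p2 - adjust to taste):

# fdisk /dev/rdsk/c4t0d0p0
   (option 1 to create a partition, type SOLARIS2, give it the leftover size,
    answer "no" to making it active, then 6 to write the table and exit)
# zpool add mpool cache /dev/rdsk/c4t0d0p2
# zpool status mpool          # the p2 partition should now be listed under "cache"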
Re: [zfs-discuss] Simultaneous failure recovery
On 03/31/10 10:39 AM, Peter Tribble wrote: I have a pool (on an X4540 running S10U8) in which a disk failed, and the hot spare kicked in. That's perfect. I'm happy. Then a second disk fails. Now, I've replaced the first failed disk, and it's resilvered and I have my hot spare back. But: why hasn't it used the spare to cover the other failed drive? And can I hotspare it manually? I could do a straight replace, but that isn't quite the same thing.

Was the spare still available when the second drive failed? If not, I don't think it will get used. My understanding is that spares are added when the drive is faulted, so it's an event- rather than level-driven action. At least I'm not the only one seeing multiple drive failures this week! -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
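If the spare now shows as AVAIL again, it can be brought in by hand with a replace (device names invented):

# zpool status tank                    # note the FAULTED disk and the AVAIL spare
# zpool replace tank c2t5d0 c4t7d0     # attach spare c4t7d0 in place of failed disk c2t5d0

Once the failed disk is physically replaced and resilvered, a "zpool detach" of the spare returns it to the spare list.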
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On Mar 30, 2010, at 2:50 PM, Jeroen Roodhart wrote: Hi Karsten. Adam, List, Adam Leventhal wrote: Very interesting data. Your test is inherently single-threaded so I'm not surprised that the benefits aren't more impressive -- the flash modules on the F20 card are optimized more for concurrent IOPS than single-threaded latency. Well, I actually wanted to do a bit more bottleneck searching, but let me weigh in with some measurements of our own :) We're om a single X4540 with quad-core CPUs so we're on the older hypertransport bus. Connected it up to two X2200-s running Centos 5, each on its own 1Gb link. Switched write caching off with the following addition to the /kernel/drv/sd.conf file (Karsten: if you didn't do this already, you _really_ want to :) ): # http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes # Add whitespace to make the vendor ID (VID) 8 ... and Product ID (PID) 16 characters long... sd-config-list = ATA MARVELL SD88SA02,cache-nonvolatile; cache-nonvolatile=1, 0x4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1; If you are going to trick the system into thinking a volatile cache is nonvolatile, you might as well disable the ZIL -- the data corruption potential is the same. As test we've found that untarring an eclipse sourcetar file is a good use case. So we use that. Called from a shell script that creates a directory, pushes directory and does the unpacking, for 40 times on each machine. Now for the interesting bit: When we use one vmod, both machines are finished in about 6min45, zilstat maxes out at about 4200 IOPS. Using four vmods it takes about 6min55, zilstat maxes out at 2200 IOPS. In both cases, probing the hyper transport bus seems to show no bottleneck there (although I'd like to see the biderectional flow, but I know we can't :) ). Network stays comfortably under the 400Mbits/s and that's peak load when using 1 vmod. Looking at the IO-connection architecture, it figures that in this set we traverse the different HT busses quite a lot. So we've also placed an Intel dual 1Gb NIC in another PCIE slot, so that ZIL traffic should only have to use 1 HT bus (not counting offloading intelligence). That helped a bit, but not much: Around 6min35 using one vmod and 6min45 using four vmod-s. It made looking at the HT-dtrace more telling though. Since the outgoing HT-bus to the F20 (and the e1000-s) is now, expectedly, a better indication of the ZIL traffic. We didn't do the 40 x 2 untar test whilst not using a SSD device. As an indication: unpacking a single tarbal then takes about 1min30. In case it means anything, single tarbal unpack no_zil, 1vmod, 1vmod_Intel, 4vmod-s, 4vmod_Intel measures around (decimals only used as indication!): 4s, 12s,11.2s, 12.5s, 11.6s Taking this all in account, I still don't see what's holding it up. Interestingly enough, the client side times are close within about 10 secs, but zilstat shows something different. Hypothesis: Zilstat shows only one vmod andwere capped in a layer above the ZIL? Can't rule out networking just yet, but my gut tells me we're not network bound here. That leaves the ZFS ZPL/VFS layer? The difference between writing to the ZIL and not writing to the ZIL is perhaps thousands of CPU cycles. For a latency-sensitive workload this will be noticed. -- richard I'm very open to suggestions on how to proceed... 
:) With kind regards, Jeroen -- Jeroen Roodhart ICT Consultant University of Amsterdam j.r.roodhart uva.nl Informatiseringscentrum Technical support/ATG -- See http://www.science.uva.nl/~jeroen for openPGP public key -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ZFS storage and performance consulting at http://www.RichardElling.com ZFS training on deduplication, NexentaStor, and NAS performance Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] VMware client solaris 10, RAW physical disk and zfs snapshots problem - all created snapshots are equal to zero.
I'm running Windows 7 64-bit and VMware Player 3 with Solaris 10 64-bit as a client. I have added an additional hard drive to the virtual Solaris 10 as a physical (raw) drive. Solaris 10 can see and use the already created zpool without problems. I could also create an additional zpool on the other mounted raw device. I can also synchronize a zfs file system to the other physical disk and the other zpool with zfs send/receive. All those physical disks are presented in Windows 7 and not initialized. The problem that I have now is that each created snapshot is always equal to zero... zfs is just not storing the changes that I have made to the file system before making a snapshot. Before each snapshot I added or deleted different files, so the snapshot should reflect those differences, but this is not the case. :-( My zfs list now looks like this:

r...@sl-node01:~# zfs list
NAME                            USED  AVAIL  REFER  MOUNTPOINT
mypool01                       91.9G   137G    23K  /mypool01
mypool01/storage01             91.9G   137G  91.7G  /mypool01/storage01
mypool01/storag...@30032010-1      0      -  91.9G  -
mypool01/storag...@30032010-2      0      -  91.9G  -
mypool01/storag...@30032010-3      0      -  91.7G  -
mypool02                       91.9G   137G    24K  /mypool02
mypool02/copies                  23K   137G    23K  /mypool02/copies
mypool02/storage01             91.9G   137G  91.9G  /mypool02/storage01
mypool02/storag...@30032010-1      0      -  91.9G  -
mypool02/storag...@30032010-2      0      -  91.9G  -

As you can see, each snapshot is equal to zero despite the changes that I have made to the filesystem content. I would like to understand what makes it impossible for the virtual Solaris 10 to track those changes when creating a snapshot. Is it a problem with Windows, VMware Player, or with the raw device mounted into the virtual machine? Does someone have experience with this issue? Regards, Vladimir ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
If you are going to trick the system into thinking a volatile cache is nonvolatile, you might as well disable the ZIL -- the data corruption potential is the same.

I'm sorry? I believe the F20 has a supercap or the like? The advice on http://wikis.sun.com/display/Performance/Tuning+ZFS+for+the+F5100#TuningZFSfortheF5100-ZFSF5100 is to disable write caching altogether. We opted not to do _that_ though... :) Are you sure that disabling the write cache on the F20 is a bad thing to do? With kind regards, Jeroen -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On Mar 30, 2010, at 3:32 PM, Jeroen Roodhart wrote: If you are going to trick the system into thinking a volatile cache is nonvolatile, you might as well disable the ZIL -- the data corruption potential is the same. I'm sorry? I believe the F20 has a supercap or the like? The advise on: You are correct, I misread the Marvell (as in F20) and X4540 (as in not X4500) combination. http://wikis.sun.com/display/Performance/Tuning+ZFS+for+the+F5100#TuningZFSfortheF5100-ZFSF5100 Is to disable write caching altogether. We opted not to do _that_ though... :) Good idea. That recommendation is flawed for the general case and only applies when all devices have nonvolatile caches. Are you sure about disabling write cache on the F20 is a bad thing to do? I agree that it is a reasonable choice. For this case, what is the average latency to the F20? -- richard ZFS storage and performance consulting at http://www.RichardElling.com ZFS training on deduplication, NexentaStor, and NAS performance Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
Richard Elling wrote: On Mar 30, 2010, at 3:32 PM, Jeroen Roodhart wrote: If you are going to trick the system into thinking a volatile cache is nonvolatile, you might as well disable the ZIL -- the data corruption potential is the same. I'm sorry? I believe the F20 has a supercap or the like? The advise on: You are correct, I misread the Marvell (as in F20) and X4540 (as in not X4500) combination. http://wikis.sun.com/display/Performance/Tuning+ZFS+for+the+F5100#TuningZFSfortheF5100-ZFSF5100 Is to disable write caching altogether. We opted not to do _that_ though... :) Good idea. That recommendation is flawed for the general case and only applies when all devices have nonvolatile caches. Are you sure about disabling write cache on the F20 is a bad thing to do? I agree that it is a reasonable choice. For those following along at home, I'm pretty sure that the terminology being used is confusing at best, and just plain wrong at worst. The write cache is _not_ being disabled. The write cache is being marked as non-volatile. By marking the write cache as non-volatile, one is telling ZFS to not issue cache flush commands. BTW, why is a Sun/Oracle branded product not properly respecting the NV bit in the cache flush command? This seems remarkably broken, and leads to the amazingly bad advice given on the wiki referenced above. -- Carson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] How to destroy iscsi dataset?
Our backup system has a couple of datasets used for iscsi that have somehow lost their baseline snapshots with the live system. In fact zfs list -t snapshots doesn't show any snapshots at all for them. We rotate backup and live every now and then, so these datasets have been shared at some time. Therefore an incremental zfs send/recv will fail for these datasets. The send script automatically uses a non-incremental send if the target dataset is missing, so all I need to do is somehow destroy them. # svcs -a | grep iscsi disabled 18:50:21 svc:/network/iscsi_initiator:default disabled 18:50:34 svc:/network/iscsi/target:default disabled 18:50:38 svc:/system/iscsitgt:default disabled 18:50:39 svc:/network/iscsi/initiator:default # zfs list space/os-vdisks/osolx86 NAME USED AVAIL REFER MOUNTPOINT space/os-vdisks/osolx8620G 657G 14.9G - # zfs get shareiscsi space/os-vdisks/osolx86 NAME PROPERTYVALUE SOURCE space/os-vdisks/osolx86 shareiscsi off local # zfs destroy -f space/os-vdisks/osolx86 cannot destroy 'space/os-vdisks/osolx86': dataset is busy AFAIK they aren't shared in any way now. How to delete these datasets, or find out why they are busy? Thanks ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
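A few things worth checking when a zvol claims to be busy even though all the iSCSI services are disabled (the GUID and dataset name below are placeholders):

# swap -l                                   # make sure the zvol isn't in use as a swap device
# dumpadm                                   # ...or as the dump device
# stmfadm list-lu -v                        # COMSTAR logical units, if COMSTAR is configured
# sbdadm list-lu                            # same, from the sbd side
# sbdadm delete-lu 600144F0...              # if an LU still maps to the zvol, delete it first
# zfs destroy space/os-vdisks/osolx86       # then retry the destroy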
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
But the speedup of disabling the ZIL altogether is appealing (and would probably be acceptable in this environment). Just to make sure you know ... if you disable the ZIL altogether, and you have a power interruption, failed cpu, or kernel halt, then you're likely to have a corrupt unusable zpool, or at least data corruption. If that is indeed acceptable to you, go nuts. ;-) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
standard ZIL: 7m40s (ZFS default) 1x SSD ZIL: 4m07s (Flash Accelerator F20) 2x SSD ZIL: 2m42s (Flash Accelerator F20) 2x SSD mirrored ZIL: 3m59s (Flash Accelerator F20) 3x SSD ZIL: 2m47s (Flash Accelerator F20) 4x SSD ZIL: 2m57s (Flash Accelerator F20) disabled ZIL: 0m15s (local extraction0m0.269s) I was not so much interested in the absolute numbers but rather in the relative performance differences between the standard ZIL, the SSD ZIL and the disabled ZIL cases. Oh, one more comment. If you don't mirror your ZIL, and your unmirrored SSD goes bad, you lose your whole pool. Or at least suffer data corruption. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] VMware client solaris 10, RAW physical disk and zfs snapshots problem - all created snapshots are equal to zero.
What size is the gz file if you do an incremental send to a file? Something like: zfs send -i sn...@vol sn...@vol | gzip > /someplace/somefile.gz -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] VMware client solaris 10, RAW physical disk and zfs snapshots problem - all created snapshots are equal to zero.
The problem that I have now is that each created snapshot is always equal to zero... zfs just not storing changes that I have made to the file system before making a snapshot. r...@sl-node01:~# zfs list NAME USED AVAIL REFER MOUNTPOINT mypool01 91.9G 137G 23K /mypool01 mypool01/storage01 91.9G 137G 91.7G /mypool01/storage01 mypool01/storag...@30032010-1 0 - 91.9G - mypool01/storag...@30032010-2 0 - 91.9G - mypool01/storag...@30032010-3 0 - 91.7G - mypool02 91.9G 137G 24K /mypool02 mypool02/copies 23K 137G 23K /mypool02/copies mypool02/storage01 91.9G 137G 91.9G /mypool02/storage01 mypool02/storag...@30032010-1 0 - 91.9G - mypool02/storag...@30032010-2 0 - 91.9G - Try this: zfs snapshot mypool01/storag...@30032010-4 dd if=/dev/urandom of=/mypool01/storage01/randomfile bs=1024k count=1024 zfs snapshot mypool01/storag...@30032010-5 rm /mypool01/storage01/randomfile zfs snapshot mypool01/storag...@30032010-6 zfs list And see what happens. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On Tue, 30 Mar 2010, Edward Ned Harvey wrote: But the speedup of disabling the ZIL altogether is appealing (and would probably be acceptable in this environment). Just to make sure you know ... if you disable the ZIL altogether, and you have a power interruption, failed cpu, or kernel halt, then you're likely to have a corrupt unusable zpool, or at least data corruption. If that is indeed acceptable to you, go nuts. ;-) I believe that the above is wrong information as long as the devices involved do flush their caches when requested to. Zfs still writes data in order (at the TXG level) and advances to the next transaction group when the devices written to affirm that they have flushed their cache. Without the ZIL, data claimed to be synchronously written since the previous transaction group may be entirely lost. If the devices don't flush their caches appropriately, the ZIL is irrelevant to pool corruption. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
Again, we can't get a straight answer on this one.. (or at least not 1 straight answer...) Since the ZIL logs are committed atomically they are either committed in FULL, or NOT at all (by way of rollback of incomplete ZIL applies at zpool mount time / or transaction rollbacks if things go exceptionally bad), the only LOST data would be what hasn't been transferred from ZIL to the primary pool.. But the pool should be sane. If this is true ... Suppose you shutdown a system, remove the ZIL device, and power back on again. What will happen? I'm informed that with current versions of solaris, you simply can't remove a zil device once it's added to a pool. (That's changed in recent versions of opensolaris) ... but in any system where removing the zil isn't allowed, what happens if the zil is removed? I have to assume something which isn't quite sane happens. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
On Tue, 30 Mar 2010, Edward Ned Harvey wrote: If this is true ... Suppose you shutdown a system, remove the ZIL device, and power back on again. What will happen? I'm informed that with current versions of solaris, you simply can't remove a zil device once it's added to a pool. (That's changed in recent versions of opensolaris) ... but in any system where removing the zil isn't allowed, what happens if the zil is removed? If the ZIL device goes away then zfs might refuse to use the pool without user affirmation (due to potential loss of uncommitted transactions), but if the dedicated ZIL device is gone, zfs will use disks in the main pool for the ZIL. This has been clarified before on the list by top zfs developers. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs recreate questions
Anyway, my question is, [...] as expected I can't import it because the pool was created with a newer version of ZFS. What options are there to import? I'm quite sure there is no option to import or receive or downgrade a zfs filesystem from a later version. I'm pretty sure your only option is something like tar ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
If the ZIL device goes away then zfs might refuse to use the pool without user affirmation (due to potential loss of uncommitted transactions), but if the dedicated ZIL device is gone, zfs will use disks in the main pool for the ZIL. This has been clarified before on the list by top zfs developers. Here's a snippet from man zpool. (Latest version available today in solaris) zpool remove pool device ... Removes the specified device from the pool. This command currently only supports removing hot spares and cache devices. Devices that are part of a mirrored configura- tion can be removed using the zpool detach command. Non-redundant and raidz devices cannot be removed from a pool. So you think it would be ok to shutdown, physically remove the log device, and then power back on again, and force import the pool? So although there may be no live way to remove a log device from a pool, it might still be possible if you offline the pool to ensure writes are all completed before removing the device? If it were really just that simple ... if zfs only needed to stop writing to the log device and ensure the cache were flushed, and then you could safely remove the log device ... doesn't it seem silly that there was ever a time when that wasn't implemented? Like ... Today. (Still not implemented in solaris, only opensolaris.) I know I am not going to put the health of my pool on the line, assuming this line of thought. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
if you disable the ZIL altogether, and you have a power interruption, failed cpu, or kernel halt, then you're likely to have a corrupt unusable zpool

The pool will always be fine, no matter what.

or at least data corruption.

Yeah, it's a good bet that data sent to your file or zvol will not be there when the box comes back, even though your program had finished seconds before the crash. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
So you think it would be ok to shutdown, physically remove the log device, and then power back on again, and force import the pool? So although there may be no live way to remove a log device from a pool, it might still be possible if you offline the pool to ensure writes are all completed before removing the device? If it were really just that simple ... if zfs only needed to stop writing to the log device and ensure the cache were flushed, and then you could safely remove the log device ... doesn't it seem silly that there was ever a time when that wasn't implemented? Like ... Today. (Still not implemented in solaris, only opensolaris.) Allow me to clarify a little further, why I care about this so much. I have a solaris file server, with all the company jewels on it. I had a pair of intel X.25 SSD mirrored log devices. One of them failed. The replacement device came with a newer version of firmware on it. Now, instead of appearing as 29.802 Gb, it appears at 29.801 Gb. I cannot zpool attach. New device is too small. So apparently I'm the first guy this happened to. Oracle is caught totally off guard. They're pulling their inventory of X25's from dispatch warehouses, and inventorying all the firmware versions, and trying to figure it all out. Meanwhile, I'm still degraded. Or at least, I think I am. Nobody knows any way for me to remove my unmirrored log device. Nobody knows any way for me to add a mirror to it (until they can locate a drive with the correct firmware.) All the support people I have on the phone are just as scared as I am. Well we could upgrade the firmware of your existing drive, but that'll reduce it by 0.001 Gb, and that might just create a time bomb to destroy your pool at a later date. So we don't do it. Nobody has suggested that I simply shutdown and remove my unmirrored SSD, and power back on. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
Just to make sure you know ... if you disable the ZIL altogether, and you have a power interruption, failed cpu, or kernel halt, then you're likely to have a corrupt unusable zpool, or at least data corruption. If that is indeed acceptable to you, go nuts. ;-) I believe that the above is wrong information as long as the devices involved do flush their caches when requested to. Zfs still writes data in order (at the TXG level) and advances to the next transaction group when the devices written to affirm that they have flushed their cache. Without the ZIL, data claimed to be synchronously written since the previous transaction group may be entirely lost. If the devices don't flush their caches appropriately, the ZIL is irrelevant to pool corruption. I stand corrected. You don't lose your pool. You don't have corrupted filesystem. But you lose whatever writes were not yet completed, so if those writes happen to be things like database transactions, you could have corrupted databases or files, or missing files if you were creating them at the time, and stuff like that. AKA, data corruption. But not pool corruption, and not filesystem corruption. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss