[zfs-discuss] Setting up for zfsboot
Hi everyone,

Now that zfsboot is becoming available, I'm wondering how to put it to use. Imagine a system with 4 identical disks. Of course I'd like to use raidz, but zfsboot doesn't do raidz.

What if I were to partition the drives, such that I have 4 small partitions that make up a zfsboot pool (4-way mirror), and the remainder of each drive becomes part of a raidz? Do I still have the advantages of having the whole disk 'owned' by ZFS, even though it's split into two parts?

Swap would probably have to go on a zvol - would that be best placed on the n-way mirror, or on the raidz?

Regards, Paul Boven.
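For illustration, a layout along those lines might look like the following. This is only a rough sketch, not zfsboot install instructions: the device names, slice numbering (s0 small, s1 the remainder) and the 4 GB swap size are all made up.

    # 4-way mirror of the small slices for the boot pool
    zpool create rootpool mirror c0t0d0s0 c0t1d0s0 c0t2d0s0 c0t3d0s0
    # raidz across the large remainder slices for data
    zpool create datapool raidz c0t0d0s1 c0t1d0s1 c0t2d0s1 c0t3d0s1
    # swap on a zvol, carved from whichever pool you prefer
    zfs create -V 4g rootpool/swap
    swap -a /dev/zvol/dsk/rootpool/swap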
Re: [zfs-discuss] Gzip compression for ZFS
From: Darren J Moffat [EMAIL PROTECTED]
...
> The other problem is that you basically need a global unique registry
> anyway so that compress algorithm 1 is always lzjb, 2 is gzip, 3 is etc etc.
> Similarly for crypto and any other transform.

I've two thoughts on that:

1) if there is to be a registry, it should be hosted by OpenSolaris and be open to all, and

2) there should be provision for a private number space so that people can implement their own whatever, so long as they understand that the filesystem will not work if plugged into something else.

Case in point for (2): if I wanted to make a bzip2 version of ZFS at home then I should be able to, and in doing so choose a number for it that I know will be safe for my playing at home. I shouldn't have to come to zfs-discuss@opensolaris.org to pick a number.

Darren
Re: [zfs-discuss] Gzip compression for ZFS
> From: Darren J Moffat [EMAIL PROTECTED]
> ...
>> The other problem is that you basically need a global unique registry
>> anyway so that compress algorithm 1 is always lzjb, 2 is gzip, 3 is etc etc.
>> Similarly for crypto and any other transform.
>
> I've two thoughts on that:
>
> 1) if there is to be a registry, it should be hosted by OpenSolaris and be open to all, and
>
> 2) there should be provision for a private number space so that people can implement their own whatever, so long as they understand that the filesystem will not work if plugged into something else.
>
> Case in point for (2): if I wanted to make a bzip2 version of ZFS at home then I should be able to, and in doing so choose a number for it that I know will be safe for my playing at home. I shouldn't have to come to zfs-discuss@opensolaris.org to pick a number.

I'm not sure we really need a registry or a number space. Algorithms should have names, not numbers. The zpool should contain a table:

    1   lzjb
    2   gzip
    3   ...

but it could just as well be:

    1   gzip
    2   ...
    3   lzjb

The zpool would simply not load if it cannot find the algorithm(s) used to store data in the zpool (or would return I/O errors on the files/metadata it can't decompress).

Global registries seem like a bad idea; names can be made arbitrarily long to ensure uniqueness. There's no reason why an algorithm can't be renamed after creating the pool should a clash occur; renumbering would be much harder.

Casper
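For context, the administrative interface already works by name today; roughly like this, assuming a build that includes the gzip compression support being discussed in this thread (pool and dataset names are placeholders):

    zfs set compression=lzjb tank/data
    zfs set compression=gzip tank/logs
    zfs get compression tank/data tank/logs

The question above is only about how those names map to the on-disk identifiers, not about this user-visible interface.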
[zfs-discuss] Is it possible to see USED and SIZE properties in bytes of zpool?
I can get USED in bytes for file systems in a zfs pool, but I do not know how to get USED in bytes for pools. I need such an exact value of the used size on a pool to measure system overheads while using snapshots. Any information about this?
Re: [zfs-discuss] Is it possible to see USED and SIZE properties in bytes of zpool?
Hello Viktor,

Wednesday, April 4, 2007, 1:17:58 PM, you wrote:

VT> I can get USED in bytes of file systems in zfs pool but i do not
VT> know how to get USED in bytes for pools.
VT> I need such exact value of used size on pool to get system
VT> overheads while using snapshots. Any information about this?

If you have more than one snapshot you won't be able to get actual and accurate numbers for the space used by a given snapshot.

--
Best regards,
Robert          mailto:[EMAIL PROTECTED]
                http://milek.blogspot.com
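If the immediate need is just exact byte counts rather than the rounded human-readable values, something like the following may work on builds whose zfs get supports parseable output; the -p flag and the pool name "tank" are assumptions here, so check your zfs(1M) man page:

    # exact 'used' of the pool's root dataset, in bytes
    zfs get -Hp -o value used tank
    # per-dataset breakdown, including snapshots
    zfs get -rHp -o name,value used tank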
Re: [zfs-discuss] Setting up for zfsboot
Hi,

> Now that zfsboot is becoming available, I'm wondering how to put it to
> use. Imagine a system with 4 identical disks. Of course I'd like to use

you lucky one :).

> raidz, but zfsboot doesn't do raidz. What if I were to partition the
> drives, such that I have 4 small partitions that make up a zfsboot
> partition (4 way mirror), and the remainder of each drive becomes part
> of a raidz?

Sounds good. Performance will suffer a bit, as ZFS thinks it has two pools with 4 spindles each, but it should still perform better than the same setup on a UFS basis.

You may also want to have two 2-way mirrors and keep the second one for other purposes, such as scratch space for zfs migration or as spare disks for other stuff.

> Do I still have the advantages of having the whole disk 'owned' by zfs,
> even though it's split into two parts?

I'm pretty sure that this is not the case:

- ZFS has no guarantee that nobody will do something else with that other partition, so it can't assume the right to turn on the disk cache for the whole disk.

- Yes, it could be smart and realize that it does have the whole disk, only split up across two pools, but I assume that this is not your typical enterprise-class configuration, so it probably didn't get implemented that way.

I'd say that not being able to benefit from the disk drive's cache is not as bad in the face of ZFS' other advantages, so you can probably live with that.

> Swap would probably have to go on a zvol - would that be best placed on
> the n-way mirror, or on the raidz?

I'd place it onto the mirror for performance reasons. Also, it feels cleaner to have all your OS stuff on one pool and all your user/app/data stuff on another. This is also recommended by the ZFS Best Practices Wiki on www.solarisinternals.com.

Now back to the 4-disk RAID-Z: Does it have to be RAID-Z? Maybe you might want to reconsider and use two 2-way mirrors instead:

- RAID-Z is slow when writing; you basically get only one disk's bandwidth. (Yes, with variable block sizes this might be slightly better...)

- RAID-Z is _very_ slow when one disk is broken.

- Using mirrors is more convenient for growing the pool: you run out of space, you add two disks, and you get better performance too. No need to buy 4 extra disks for another RAID-Z set.

- When using disks, you need to consider availability, performance and space. Of the three, space is the cheapest. Therefore it's best to sacrifice space, and you'll get better availability and better performance.

Hope this helps,
   Constantin

--
Constantin Gonzalez                       Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering   http://www.sun.de/
Tel.: +49 89/4 60 08-25 91                http://blogs.sun.com/constantin/
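To make the mirror alternative concrete, the data pool on the remainder slices could be built as two 2-way mirrors, and grown later by adding another pair; a minimal sketch with made-up device names:

    # two 2-way mirrors instead of a 4-disk raidz
    zpool create data mirror c0t0d0s1 c0t1d0s1 mirror c0t2d0s1 c0t3d0s1
    # later, growing the pool only needs one more pair of disks
    zpool add data mirror c0t4d0 c0t5d0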
Re: [zfs-discuss] Setting up for zfsboot
Constantin Gonzalez wrote:
>> Do I still have the advantages of having the whole disk 'owned' by zfs,
>> even though it's split into two parts?
>
> I'm pretty sure that this is not the case:
>
> - ZFS has no guarantee that nobody will do something else with that
>   other partition, so it can't assume the right to turn on the disk
>   cache for the whole disk.

Can the write cache not be turned on manually, as the user is sure that it is only ZFS that is using the entire disk?

-Manoj
[zfs-discuss] Re: zfs blocks numbers for small files
Frederic Payet - Availability Services wrote:
> Hi gurus,
>
> When creating some small files in a ZFS directory, the number of used
> blocks is not what could be expected:
>
> hinano# zfs list
> NAME                   USED  AVAIL  REFER  MOUNTPOINT
> pool2                  702K  16.5G  26.5K  /pool2
> pool2/new              604K  16.5G    34K  /pool2/new
> pool2/new/fs2          570K  16.5G   286K  /pool2/new/fs2
> pool2/new/fs2/subfs2   284K  16.5G   284K  /pool2/new/fs2/subfs2
>
> hinano# pwd
> /pool2/new/fred
>
> hinano# zfs get all pool2/new
> NAME       PROPERTY       VALUE                  SOURCE
> pool2/new  type           filesystem             -
> pool2/new  creation       Tue Mar 20 13:27 2007  -
> pool2/new  used           603K                   -
> pool2/new  available      16.5G                  -
> pool2/new  referenced     33.5K                  -
> pool2/new  compressratio  1.00x                  -
> pool2/new  mounted        yes                    -
> pool2/new  quota          none                   default
> pool2/new  reservation    none                   default
> pool2/new  recordsize     128K                   default
> pool2/new  mountpoint     /pool2/new             default
> pool2/new  sharenfs       off                    default
> pool2/new  checksum       on                     default
> pool2/new  compression    off                    default
> pool2/new  atime          on                     default
> pool2/new  devices        on                     default
> pool2/new  exec           on                     default
> pool2/new  setuid         on                     default
> pool2/new  readonly       off                    default
> pool2/new  zoned          off                    default
> pool2/new  snapdir        hidden                 default
> pool2/new  aclmode        groupmask              default
> pool2/new  aclinherit     secure                 default
>
> hinano# mkfile 9 file9bytes
> hinano# mkfile 520 file520bytes
> hinano# mkfile 1025 file1025bytes
> hinano# mkfile 1023 file1023bytes
> hinano# mkfile 10 file10bytes
>
> hinano# ls -ls
> total 14
>    3 -rw------T   1 root  root  1023 Apr  4 13:34 file1023bytes
>    4 -rw------T   1 root  root  1025 Apr  4 13:34 file1025bytes
>    1 -rw------T   1 root  root    10 Apr  4 13:38 file10bytes
>    3 -rw------T   1 root  root   520 Apr  4 13:33 file520bytes
>    2 -rw------T   1 root  root     9 Apr  4 13:33 file9bytes
>
> After 2 seconds:
>
> hinano# ls -ls
> total 13
>    3 -rw------T   1 root  root  1023 Apr  4 13:34 file1023bytes
>    4 -rw------T   1 root  root  1025 Apr  4 13:34 file1025bytes
>    2 -rw------T   1 root  root    10 Apr  4 13:38 file10bytes
>    3 -rw------T   1 root  root   520 Apr  4 13:33 file520bytes
>    2 -rw------T   1 root  root     9 Apr  4 13:33 file9bytes
>
> 2 questions:
>
> - Could somebody explain why a file of 9 bytes takes 2 512b blocks?

One block for the znode (the meta-data), one block for the data.

> - Why has the block count of file10bytes changed after a while, doing
>   nothing more than 'ls -ls'?

The block count reflects the actual allocated storage on disk. The first time you did an 'ls' the data block had not yet been allocated (i.e., the data was still in transit to the disk).

> Please reply to me directly as I'm not on this alias.
>
> Best,
> fred
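As an illustration of that last point: rather than waiting a few seconds for the pending transaction group to reach the disk on its own, forcing a sync should make ls -ls report the settled block counts immediately. This is only a sketch of the idea, following the same mkfile example:

    hinano# mkfile 10 file10bytes
    hinano# sync            # push pending writes out to disk
    hinano# ls -ls file10bytes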
Re: [zfs-discuss] Setting up for zfsboot
Hi,

Manoj Joseph wrote:
> Can the write cache not be turned on manually, as the user is sure that
> it is only ZFS that is using the entire disk?

Yes, it can be turned on. But I don't know if ZFS would then know about it.

I'd still feel more comfortable with it being turned off unless ZFS itself does it. But maybe someone from the ZFS team can clarify this.

Cheers,
   Constantin

--
Constantin Gonzalez                       Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering   http://www.sun.de/
Tel.: +49 89/4 60 08-25 91                http://blogs.sun.com/constantin/
Re[2]: [zfs-discuss] Setting up for zfsboot
Hello Constantin,

Wednesday, April 4, 2007, 3:34:13 PM, you wrote:

CG> - RAID-Z is slow when writing; you basically get only one disk's bandwidth.
CG>   (Yes, with variable block sizes this might be slightly better...)

No, it's not. It's actually very fast for writing; in many cases it will be faster than raid-10 (both made of 4 disks). Doing random reads, on the other hand, is slow...

--
Best regards,
Robert          mailto:[EMAIL PROTECTED]
                http://milek.blogspot.com
[zfs-discuss] Re: Re: today panic ...
> Gino, I just had a similar experience and was able to import the pool
> when I added the readonly option (zpool import -f -o ro)

No way... we still get a panic :(

gino
Re: [zfs-discuss] Setting up for zfsboot
On Wed, Apr 04, 2007 at 03:34:13PM +0200, Constantin Gonzalez wrote:
> - RAID-Z is _very_ slow when one disk is broken.

Do you have data on this? The reconstruction should be relatively cheap, especially when compared with the initial disk access.

Adam

--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
Re: [zfs-discuss] Setting up for zfsboot
On Wed, Apr 04, 2007 at 10:08:07AM -0700, Adam Leventhal wrote:
> On Wed, Apr 04, 2007 at 03:34:13PM +0200, Constantin Gonzalez wrote:
>> - RAID-Z is _very_ slow when one disk is broken.
>
> Do you have data on this? The reconstruction should be relatively cheap,
> especially when compared with the initial disk access.

Also, what is your definition of broken? Does this mean the device appears as FAULTED in the pool status, or that the drive is present and not responding? If it's the latter, this will be fixed by my upcoming FMA work.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Re: [zfs-discuss] Setting up for zfsboot
On Wed, Apr 04, 2007 at 10:08:07AM -0700, Adam Leventhal wrote:
> On Wed, Apr 04, 2007 at 03:34:13PM +0200, Constantin Gonzalez wrote:
>> - RAID-Z is _very_ slow when one disk is broken.
>
> Do you have data on this? The reconstruction should be relatively cheap,
> especially when compared with the initial disk access.

RAID-Z has to be slower when there is a lot of bitrot, but it shouldn't be slower when a disk has read errors or is gone. Or are we talking about write performance (does RAID-Z wait too long for a disk that won't respond?)?
Re: [zfs-discuss] Setting up for zfsboot
>>> Can the write cache not be turned on manually, as the user is sure that
>>> it is only ZFS that is using the entire disk?
>>
>> Yes, it can be turned on. But I don't know if ZFS would then know about it.
>>
>> I'd still feel more comfortable with it being turned off unless ZFS
>> itself does it. But maybe someone from the ZFS team can clarify this.

I think it's true that ZFS would not know about the write cache and thus you wouldn't get the benefit of it. At some point, we'd like to implement code that recognizes that zfs owns the entire disk even though the disk has multiple slices, and turns on write caching anyway. I haven't done much looking into this, though.

Some further comments on the proposed configuration (root mirrored across all four disks, the rest of each disk going into a RAID-Z pool):

1. I suggest you make your root pool big enough to hold several boot environments so that you can try out clone-and-upgrade tricks like this: http://blogs.sun.com/timf/entry/an_easy_way_to_manage

2. If root is mirrored across all four disks, that means that swapping will take place to all four disks. I'm wondering if that's a problem, or if not a problem, maybe not optimal.

Lori
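The clone-and-upgrade trick behind point 1 boils down to snapshotting and cloning the root filesystem within the pool; a very rough sketch, with made-up dataset names (the exact layout depends on how your zfsboot setup was created, so treat this only as the shape of the idea):

    zfs snapshot rootpool/rootfs@before_upgrade
    zfs clone rootpool/rootfs@before_upgrade rootpool/rootfs_new
    # upgrade the clone, then make it the dataset you boot from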
[zfs-discuss] zfs blocks numbers for small files
Resent, for Fred...

Hi gurus,

When creating some small files in a ZFS directory, the number of used blocks is not what could be expected:

hinano# zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
pool2                  702K  16.5G  26.5K  /pool2
pool2/new              604K  16.5G    34K  /pool2/new
pool2/new/fs2          570K  16.5G   286K  /pool2/new/fs2
pool2/new/fs2/subfs2   284K  16.5G   284K  /pool2/new/fs2/subfs2

hinano# pwd
/pool2/new/fred

hinano# zfs get all pool2/new
NAME       PROPERTY       VALUE                  SOURCE
pool2/new  type           filesystem             -
pool2/new  creation       Tue Mar 20 13:27 2007  -
pool2/new  used           603K                   -
pool2/new  available      16.5G                  -
pool2/new  referenced     33.5K                  -
pool2/new  compressratio  1.00x                  -
pool2/new  mounted        yes                    -
pool2/new  quota          none                   default
pool2/new  reservation    none                   default
pool2/new  recordsize     128K                   default
pool2/new  mountpoint     /pool2/new             default
pool2/new  sharenfs       off                    default
pool2/new  checksum       on                     default
pool2/new  compression    off                    default
pool2/new  atime          on                     default
pool2/new  devices        on                     default
pool2/new  exec           on                     default
pool2/new  setuid         on                     default
pool2/new  readonly       off                    default
pool2/new  zoned          off                    default
pool2/new  snapdir        hidden                 default
pool2/new  aclmode        groupmask              default
pool2/new  aclinherit     secure                 default

hinano# mkfile 9 file9bytes
hinano# mkfile 520 file520bytes
hinano# mkfile 1025 file1025bytes
hinano# mkfile 1023 file1023bytes
hinano# mkfile 10 file10bytes

hinano# ls -ls
total 14
   3 -rw------T   1 root  root  1023 Apr  4 13:34 file1023bytes
   4 -rw------T   1 root  root  1025 Apr  4 13:34 file1025bytes
   1 -rw------T   1 root  root    10 Apr  4 13:38 file10bytes
   3 -rw------T   1 root  root   520 Apr  4 13:33 file520bytes
   2 -rw------T   1 root  root     9 Apr  4 13:33 file9bytes

After 2 seconds:

hinano# ls -ls
total 13
   3 -rw------T   1 root  root  1023 Apr  4 13:34 file1023bytes
   4 -rw------T   1 root  root  1025 Apr  4 13:34 file1025bytes
   2 -rw------T   1 root  root    10 Apr  4 13:38 file10bytes
   3 -rw------T   1 root  root   520 Apr  4 13:33 file520bytes
   2 -rw------T   1 root  root     9 Apr  4 13:33 file9bytes

2 questions:

- Could somebody explain why a file of 9 bytes takes 2 512b blocks?

- Why has the block count of file10bytes changed after a while, doing nothing more than 'ls -ls'?

Please reply to me directly as I'm not on this alias.

Best,
fred
[zfs-discuss] ZFS boot: a new heads-up
It has been pointed out to me that if you have set up a zfs boot configuration using the old-style prototype code (where you had to have a ufs boot slice), and you BFU that system with a version of the Solaris archives that contains the new zfsboot support, your system will panic.

So until we figure out a reasonable migration path (and I'm not sure that one exists), assume that you cannot BFU an old-style zfs boot setup. You must do a fresh install.

A formal heads-up mail will go out later, but I wanted to get the word out on this immediately.

Lori Alt
Re[2]: [zfs-discuss] Setting up for zfsboot
Hello Adam,

Wednesday, April 4, 2007, 7:08:07 PM, you wrote:

AL> On Wed, Apr 04, 2007 at 03:34:13PM +0200, Constantin Gonzalez wrote:
AL>> - RAID-Z is _very_ slow when one disk is broken.
AL>
AL> Do you have data on this? The reconstruction should be relatively cheap,
AL> especially when compared with the initial disk access.

If I stop all activity to an x4500 with a pool made of several raidz2 vdevs and then issue a spare attach, I get really poor performance (1-2MB/s) on a pool with lots of relatively small files.

--
Best regards,
Robert          mailto:[EMAIL PROTECTED]
                http://milek.blogspot.com
Re: [zfs-discuss] Setting up for zfsboot
On Wed, Apr 04, 2007 at 11:04:06PM +0200, Robert Milkowski wrote:
> If I stop all activity to an x4500 with a pool made of several raidz2
> vdevs and then issue a spare attach, I get really poor performance
> (1-2MB/s) on a pool with lots of relatively small files.

Does that mean the spare is resilvering when you collect the performance data? I think a fair test would be to compare the performance of a fully functional RAID-Z stripe against one with a missing (absent) device.

Adam

--
Adam Leventhal, Solaris Kernel Development       http://blogs.sun.com/ahl
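One rough way to run that comparison, with made-up pool/device names and a large sequential read standing in for a real workload:

    # baseline read throughput on the healthy pool
    dd if=/tank/bigfile of=/dev/null bs=128k
    # take one raidz member offline, repeat the read, then bring it back
    zpool offline tank c2d0
    dd if=/tank/bigfile of=/dev/null bs=128k
    zpool online tank c2d0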
Re: [zfs-discuss] Re: zfs destroy snapshot takes hours
Matthew Ahrens wrote:
> Miroslav Pendev wrote:
>> I did some more testing, here is what I found:
>>
>> - I can destroy older and newer snapshots, just not that particular snapshot
>>
>> - I added some more memory, total 1GB. Now after I start the destroy
>>   command, ~500MB RAM is taken right away; there is still ~200MB or so left.
>>   o The machine is responsive,
>>   o If I run 'zfs list' it shows the snapshot as destroyed (it is gone
>>     from the list).
>>   o There is some zfs activity for about 20 seconds - I can see the
>>     lights of the HDDs of the pool blinking, then it stops
>
> Can you take a crash dump when the system is hung (ie. after there is no
> more disk activity), and make it available to me?

Miro supplied the dump, which I examined, and filed bug 6542681. The root cause is that the machine is out of memory (in this case, kernel virtual address space). As a workaround, you can change kernelbase to allow the kernel to use more virtual address space.

--matt
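On x86 the kernelbase change is usually made with eeprom and takes effect on the next reboot; the value below is only an illustration of lowering kernelbase (which gives the kernel more virtual address space at the expense of user processes), not a recommendation for any particular system:

    # run as root, then reboot
    eeprom kernelbase=0x80000000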
Re: [zfs-discuss] Setting up for zfsboot
Lori Alt wrote:
>>> Can the write cache not be turned on manually, as the user is sure
>>> that it is only ZFS that is using the entire disk?
>>
>> Yes, it can be turned on. But I don't know if ZFS would then know about it.
>>
>> I'd still feel more comfortable with it being turned off unless ZFS
>> itself does it. But maybe someone from the ZFS team can clarify this.
>
> I think it's true that ZFS would not know about the write cache and
> thus you wouldn't get the benefit of it.

Actually, all that matters is that the write cache is on -- it doesn't matter whether ZFS turned it on or you did it manually. (However, make sure that the write cache doesn't turn itself back off when you reboot / lose power...)

--matt
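For reference, turning the write cache on by hand is normally done from the expert mode of format; the exact menus can vary by drive and driver, so treat this as a sketch rather than a recipe:

    # format -e
    # ... select the disk, then:
    format> cache
    cache> write_cache
    write_cache> enable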
[zfs-discuss] Excessive checksum errors...
After replacing a bad disk and waiting for the resilver to complete, I started a scrub of the pool. Currently, I have the pool mounted readonly, yet almost a quarter of the I/O is writes to the new disk. In fact, it looks like there are so many checksum errors that zpool doesn't even list them properly:

  pool: p
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress, 18.71% done, 2h17m to go
config:

        NAME        STATE     READ WRITE CKSUM
        p           ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
            c3d0    ONLINE       0     0     0
            c5d0    ONLINE       0     0     0
            c4d0    ONLINE       0     0 231.5

errors: No known data errors

I assume that that should be followed by a K. Is my brand new replacement disk really returning gigabyte after gigabyte of silently corrupted data? I find that quite amazing, and I thought that I would inquire here.

This is on snv_60.

Chris
[zfs-discuss] Re: ZFS boot: a new heads-up
So, is there any date on when the install utility will support a zfs root fresh install? I almost can't wait for that.

Cheers,
Ivan.
Re: [zfs-discuss] Re: ZFS boot: a new heads-up
Ivan Wang wrote:
> So, is there any date on when the install utility will support a zfs
> root fresh install? I almost can't wait for that.

Hi Ivan,

There's no firm date for this yet, though the install team are working *really* hard at getting this to happen as soon as humanly possible.

James C. McPherson
--
Solaris kernel software engineer, system admin and troubleshooter
              http://www.jmcp.homeunix.com/blog
Find me on LinkedIn @ http://www.linkedin.com/in/jamescmcpherson