[zfs-discuss] ZFS send/receive VS. scp
Hi there, I've been comparing the ZFS send/receive function over SSH to simply scp'ing the contents of a snapshot, and have found that for me scp is about 2x faster. Has anyone else noticed ZFS send/receive to be noticeably slower? Best Regards, Jason ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] performance question
Listman, What's the average size of your files? Do you have many file deletions/moves going on? I'm not that familiar with how Perforce handles moving files around. XFS is bad at small files (worse than most file systems), as SGI optimized it for larger files (> 64K). You might see a performance enhancement with ZFS. You'll definitely see a management benefit from ZFS. -J
On 11/14/06, listman <[EMAIL PROTECTED]> wrote: hi all, i'm considering using ZFS for a Perforce server where the repository might have the following characteristics:
  Number of branches: 68
  Number of changes: 85,987
  Total number of files (at head revision): 2,675,545
  Total number of users: 36
  Total number of clients: 3,219
  Perforce depot size: 15 GB
I'm being told that raid 0/1 XFS on linux would be the most efficient way to manage this repository, I was wondering if the list thought that ZFS would be a good choice? Thx! ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] performance question
On Nov 15, 2006, at 1:09 AM, Jason J. W. Williams wrote: What's the average size of your files? Do you have many file deletions/moves going on? I'm not that familiar with how Perforce handles moving files around. average size of my files seems to be around 4k, there can be thousands of files being moved at times... the hierarchy is kind of strange for this data. we have maybe 4000 directories at the top level, each of which contains 2 subdirectories, which in turn contain 3 files each. XFS is bad at small files (worse than most file systems), as SGI optimized it for larger files (> 64K). You might see a performance enhancement with ZFS. You'll definitely see a management benefit from ZFS. interesting, any other comments on this? I assume this is all going to be much better than the Solaris 8 filesystem. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Tomas, Apologies for the delayed response... Tomas Ögren wrote: Interesting ! So, it is not the ARC which is consuming too much memory. It is some other piece (not sure if it belongs to ZFS) which is causing the crunch... Or the other possibility is that the ARC ate up too much and caused a near-crunch situation and the kmem hit back and caused the ARC to free up its buffers (hence the no_grow flag enabled). So, it (ARC) could be oscillating between large caching and then purging the caches. You might want to keep track of these values (ARC size and no_grow flag) and see how they change over a period of time. This would help us understand the pattern. I would guess it grows after boot until it hits some max and then stays there.. but I can check it out.. No, that is not true. It shrinks when there is memory pressure. The values of 'c' and 'p' are adjusted accordingly. And if we know it is the ARC which is causing the crunch we could manually change the value of c_max to a comfortable value and that would limit the size of the ARC. But in the ZFS world, DNLC is part of the ARC, right? Not really... ZFS uses the regular DNLC for lookup optimization. However, the metadata/data is cached in the ARC. My original question was how to get rid of "data cache", but keep "metadata cache" (such as DNLC)... This is a good question. AFAIK the ARC does not really differentiate between metadata and data. So, I am not sure if we can control it. However, as I mentioned above ZFS still uses the DNLC caching. However, I would suggest that you try it out on a non-production machine first. By default, c_max is set to 75% of physmem and that is the hard limit. "c" is the soft limit and the ARC would try and grow up to "c". The value of "c" is adjusted when there is a need to cache more but it will never exceed "c_max". Regarding the huge number of reads, I am sure you have already tried disabling the VDEV prefetch. If not, it is worth a try. That was part of my original question, how? :) Apologies :-) I was digging around the code and I find that zfs_vdev_cache_bshift is the one which would control the amount that is read. Currently it is set to 16. So, we should be able to modify this and reduce the prefetch. However, I will have to double check with more people and get back to you. Thanks and regards, Sanjeev. /Tomas -- Solaris Revenue Products Engineering, India Engineering Center, Sun Microsystems India Pvt Ltd. Tel:x27521 +91 80 669 27521 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
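For anyone wanting to track the values Sanjeev mentions, a minimal sketch would be to sample the ARC state periodically with mdb. This assumes the zfs mdb module in this build provides the ::arc dcmd (and that the 'arc' symbol is visible for the fallback); verify on a test box first:

    # Dump ARC parameters (size, p, c, c_max, no_grow); run this repeatedly,
    # e.g. from cron or a shell loop, to see whether the ARC is oscillating
    # between growing and purging.
    echo "::arc" | mdb -k

    # Possible fallback on builds without the dcmd: print the arc struct fields.
    echo "arc::print -d" | mdb -k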
Fwd: [zfs-discuss] Thoughts on patching + zfs root
On 14/11/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: >Actually, we have considered this. On both SPARC and x86, there will be >a way to specify the root file system (i.e., the bootable dataset) to be >booted, >at either the GRUB prompt (for x86) or the OBP prompt (for SPARC). >If no root file system is specified, the current default 'bootfs' specified >in the root pool's metadata will be booted. But it will be possible to >override the default, which will provide that "fallback" boot capability. I was thinking of some automated mechanism such as: - BIOS which, when reset during POST, will switch to safe defaults and enter setup - Windows which, when reset during boot, will offer safe mode at the next boot. I was thinking of something that on activation of a new boot environment would automatically fallback on catastrophic failure. Multiple GRUB entries would mitigate most risks (you can already define multiple boot archives pointing at different zfs root filesystems, it's just not automated). I suppose it depends how 'catastrophic' the failure is, but if it's very low level, booting another root probably won't help, and if it's too high level, how will you detect it (i.e. you've booted the kernel, but it is buggy). -- Rasputin :: Jack of All Trades - Master of Nuns http://number9.hellooperator.net/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on patching + zfs root
On 15/11/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: >I suppose it depends how 'catastrophic' the failture is, but if it's >very low level, >booting another root probabyl won't help, and if it's too high level, how will >you detect it (i.e. you've booted the kernel, but it is buggy). If it panics (but not too early) or fails to come up properly? Detecting 'come up properly' sounds hard (as in 'turing test hard') to me. -- Rasputin :: Jack of All Trades - Master of Nuns http://number9.hellooperator.net/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS send/receive VS. scp
Jason J. W. Williams wrote: Hi there, I've been comparing using the ZFS send/receive function over SSH to simply scp'ing the contents of snapshot, and have found for me the performance is 2x faster for scp. Can you give some more details about the configuration of the two machines involved and the ssh config. For example, is compression used in the file system and/or with the ssh connection. How much data are you transferring? Is it lots of small files or a small number of large files? Can you give some actual numbers. Has anyone else noticed ZFS send/receive to be noticeably slower? I haven't but then I haven't done much benchmarking since it was fast enough for what I needed. This is actually a very good pair of things to compare. It would also be interesting to compare rsync over ssh as well. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on patching + zfs root
Dick Davies wrote: On 15/11/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: >I suppose it depends how 'catastrophic' the failture is, but if it's >very low level, >booting another root probabyl won't help, and if it's too high level, how will >you detect it (i.e. you've booted the kernel, but it is buggy). If it panics (but not too early) or fails to come up properly? Detecting 'come up properly' sounds hard (as in 'turing test hard') to me. It isn't as hard as that. In fact I don't think it is actually that hard at all in the current Solaris architecture. SMF helps a huge amount in this area because it has lots of failure detection and milestone/dependency concepts built in. I think we first need to define what state "up" actually is. Is it the kernel booted ? Is it the root file system mounted ? Is it we reached milestone all ? Is it we reached milestone all with no services in maintenance ? Is it no services in maintenance that weren't on the last boot ? -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on patching + zfs root
>On 15/11/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: >> >> >I suppose it depends how 'catastrophic' the failture is, but if it's >> >very low level, >> >booting another root probabyl won't help, and if it's too high level, how >> >will >> >you detect it (i.e. you've booted the kernel, but it is buggy). >> >> If it panics (but not too early) or fails to come up properly? > >Detecting 'come up properly' sounds hard >(as in 'turing test hard') to me. Yep. For one, I don't think you can ever detect this for failures before the root filesystem is mounted read-write. Of course, since Sun is a systems company, that bit can conceivably be solved by changing the ILOM/ALOM to detect those. But for those systems it is less of an issue because they can be remotely managed/powercycled/etc. Casper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on patching + zfs root
>I think we first need to define what state "up" actually is. Is it the >kernel booted ? Is it the root file system mounted ? Is it we reached >milestone all ? Is it we reached milestone all with no services in >maintenance ? Is it no services in maintenance that weren't on the last > boot ? I think that's fairly simple: "up is the state when the milestone we are booting to has been actually reached". What should SMF do when it finds that it cannot reach that milestone? Harder is: What if the system does not come up quickly enough? What if the system hangs before SMF even starts? What if the system panics during boot or shortly after we reach our desired milestone? And then, of course, "define shortly and quickly". Casper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
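As a rough illustration of that definition of "up" (default milestone reached, nothing unexpectedly in maintenance), a check along these lines could be scripted; the milestone FMRI is only an example and would need to match whatever milestone the system is actually booting to:

    #!/bin/sh
    # Did we reach the default milestone?
    svcs -H -o state svc:/milestone/multi-user-server:default
    # Is anything sitting in maintenance?
    svcs -H -o fmri,state | grep maintenance
    # svcs -x also summarizes services that are impaired and why.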
[zfs-discuss] Re: [install-discuss] Re: Caiman Architecture document posted for review
Previously I wrote: >I still don't like forcing ZFS on people, though; I've found that ZFS >does not work on 1GB SPARC systems; I find that a rather high lower limit. > >(Whenever the NFS find runs over the zpool, the system hangs) It appears that this is a regression in build 52 or 51, I filed 6493923. It may be particular to the fact that this system contains a small zpool (25GB, 1.5GB in use) but with one particular feature: a directory with 80 files. Casper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Snapshot of a clone?
Hi, Is it possible to create snapshots off ZFS clones and further clones off those snapshots recursively? thanks, Prashanth ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Snapshot of a clone?
Prashanth Radhakrishnan wrote: Hi, Is it possible to create snapshots off ZFS clones and further clones off those snapshots recursively? Yes. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
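For anyone who wants to see it in action, the sequence is simply this (pool and dataset names here are made up):

    zfs snapshot tank/fs@v1
    zfs clone tank/fs@v1 tank/clone1          # clone of the snapshot
    zfs snapshot tank/clone1@v2               # snapshot of the clone
    zfs clone tank/clone1@v2 tank/clone2      # clone of that snapshot, and so on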
Re: [zfs-discuss] bogus zfs error message on boot
On 11/15/06, Frank Cusack <[EMAIL PROTECTED]> wrote: After swapping some hardware and rebooting: SUNW-MSG-ID: ZFS-8000-CS, TYPE: Fault, VER: 1, SEVERITY: Major EVENT-TIME: Tue Nov 14 21:37:55 PST 2006 PLATFORM: SUNW,Sun-Fire-T1000, CSN: -, HOSTNAME: SOURCE: zfs-diagnosis, REV: 1.0 EVENT-ID: 60b31acc-0de8-c1f3-84ec-935574615804 DESC: A ZFS pool failed to open. Refer to http://sun.com/msg/ZFS-8000-CS for more information. AUTO-RESPONSE: No automated response will occur. IMPACT: The pool data is unavailable REC-ACTION: Run 'zpool status -x' and either attach the missing device or restore from backup. # zpool status -x all pools are healthy And in fact they are. What gives? This message occurs on every boot now. It didn't occur before I changed the hardware. Sounds like an opportunity for enhancement. At the very least the ZFS :: FMA interaction should include the component (pool in this case) which was noted to be marginal/faulty/dead. Does zpool status -xv show anything that zpool status -x doesn't? James C. McPherson -- Solaris kernel software engineer, system admin and troubleshooter http://www.jmcp.homeunix.com/blog Find me on LinkedIn @ http://www.linkedin.com/in/jamescmcpherson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] bogus zfs error message on boot
On 11/15/06, Frank Cusack <[EMAIL PROTECTED]> wrote: After swapping some hardware and rebooting: SUNW-MSG-ID: ZFS-8000-CS, TYPE: Fault, VER: 1, SEVERITY: Major EVENT-TIME: Tue Nov 14 21:37:55 PST 2006 PLATFORM: SUNW,Sun-Fire-T1000, CSN: -, HOSTNAME: SOURCE: zfs-diagnosis, REV: 1.0 EVENT-ID: 60b31acc-0de8-c1f3-84ec-935574615804 DESC: A ZFS pool failed to open. Refer to http://sun.com/msg/ZFS-8000-CS for more information. AUTO-RESPONSE: No automated response will occur. IMPACT: The pool data is unavailable REC-ACTION: Run 'zpool status -x' and either attach the missing device or restore from backup. # zpool status -x all pools are healthy How about this? zpool export zpool import -f And in fact they are. What gives? This message occurs on every boot now. It didn't occur before I changed the hardware. I had replaced the FC card with a fw800 card, then I changed it back. (the fw800 card didn't work) -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] bogus zfs error message on boot
On November 16, 2006 1:18:22 AM +1100 James McPherson <[EMAIL PROTECTED]> wrote: On 11/15/06, Frank Cusack <[EMAIL PROTECTED]> wrote: After swapping some hardware and rebooting: SUNW-MSG-ID: ZFS-8000-CS, TYPE: Fault, VER: 1, SEVERITY: Major EVENT-TIME: Tue Nov 14 21:37:55 PST 2006 PLATFORM: SUNW,Sun-Fire-T1000, CSN: -, HOSTNAME: SOURCE: zfs-diagnosis, REV: 1.0 EVENT-ID: 60b31acc-0de8-c1f3-84ec-935574615804 DESC: A ZFS pool failed to open. Refer to http://sun.com/msg/ZFS-8000-CS for more information. AUTO-RESPONSE: No automated response will occur. IMPACT: The pool data is unavailable REC-ACTION: Run 'zpool status -x' and either attach the missing device or restore from backup. # zpool status -x all pools are healthy And in fact they are. What gives? This message occurs on every boot now. It didn't occur before I changed the hardware. Sounds like an opportunity for enhancement. At the very least the ZFS :: FMA interaction should include the component (pool in this case) which was noted to be marginal/faulty/dead. Does zpool status -xv show anything that zpool status -x doesn't? Nope. But I see that my raid array (3511) is now beeping like crazy, playing a song really. I think there must be some delay that is causing the disks not to be available early in the boot? Then they become available and get imported? (I do notice that unlike scsi disks, if I add a disk to the 3511 it is noticed immediately on the host.) -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: performance question
listman <[EMAIL PROTECTED]> writes: >> What's the average size of your files? Do you have many file >> deletions/moves going on? I'm not that familiar with how Perforce >> handles moving files around. >> >average size of my files seems to be around 4k, there can be >thousands of files being moved at times.. If the original question was for a Perforce server, not a client, then the kind of files stored in the repository should not matter. If I recall correctly, Perforce uses a database and not a profusion of small files so XFS should do quite well here (and probably ZFS too). ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: performance question
On November 15, 2006 4:40:36 PM + Mattias Engdegård <[EMAIL PROTECTED]> wrote: listman <[EMAIL PROTECTED]> writes: What's the average size of your files? Do you have many file deletions/moves going on? I'm not that familiar with how Perforce handles moving files around. average size of my files seems to be around 4k, there can be thousands of files being moved at times.. If the original question was for a Perforce server, not a client, then what kind of files stored in the repository should not matter. If I recall correctly, Perforce uses a database and not a profusion of small files so XFS should do quite well here (and probably ZFS too). There are 3 components to p4 performance:
1. branching complexity (addressed with raw cpu power)
2. database performance (addressed with RAM)
3. file xfer performance ("asynchronous" wrt db updates as long as the db is on a different disk from the files, so it doesn't affect concurrency; but if lots of files are transferred to clients they may complain it's slow if the disk subsystem is especially slow). In any event this is generally the least important aspect of p4 performance.
This is why I challenged the question originally. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] bogus zfs error message on boot
This is likely a variation of: 6401126 FM reports 'pool data unavailable' because of timing between FM and mounting of file systems Basically, what's happening is that ZFS is trying to open the pool before the underlying device backing the vdev is available. My guess is that your new hardware is loading later in boot for some reason. When you actually come all the way up, the device is available and your pool is fine. The precise fix is a little complicated. As a workaround, you may be able to add a forceload directive to /etc/system to force the driver associated with your hardware to attach earlier in boot. - Eric On Tue, Nov 14, 2006 at 09:54:50PM -0800, Frank Cusack wrote: > After swapping some hardware and rebooting: > > SUNW-MSG-ID: ZFS-8000-CS, TYPE: Fault, VER: 1, SEVERITY: Major > EVENT-TIME: Tue Nov 14 21:37:55 PST 2006 > PLATFORM: SUNW,Sun-Fire-T1000, CSN: -, HOSTNAME: > SOURCE: zfs-diagnosis, REV: 1.0 > EVENT-ID: 60b31acc-0de8-c1f3-84ec-935574615804 > DESC: A ZFS pool failed to open. Refer to http://sun.com/msg/ZFS-8000-CS > for more information. > AUTO-RESPONSE: No automated response will occur. > IMPACT: The pool data is unavailable > REC-ACTION: Run 'zpool status -x' and either attach the missing device or >restore from backup. > > # zpool status -x > all pools are healthy > > And in fact they are. What gives? This message occurs on every boot now. > It didn't occur before I changed the hardware. > > I had replaced the FC card with a fw800 card, then I changed it back. > (the fw800 card didn't work) > > -frank > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Eric Schrock, Solaris Kernel Development http://blogs.sun.com/eschrock ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
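As an illustration of the workaround Eric describes, the /etc/system entry would look something like the following; the module name (qlc, a QLogic FC HBA driver) is only an example and must match whichever driver is attaching late on your particular system:

    * Example only: force the HBA driver to load early in boot so the
    * pool's devices exist before ZFS tries to open the pool.
    forceload: drv/qlc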
[zfs-discuss] Re: performance question
Frank Cusack <[EMAIL PROTECTED]> writes: >1. branching complexity (addressed with raw cpu power) >2. database performance (addressed with RAM) >3. file xfer performance ("asynchronous" wrt db updates as long as the > db is on a different disk from the files, so doesn't affect concurrency > but if lots of files are xferd to clients they may complain it's slow > if the disk subsystem is especially slow). in any event this is > generally the least important aspect of p4 performance. I agree completely, but file system performance for small files should in any case not matter for a p4 server. >This is why I challenged the question originally. I've heard that red ones are supposed to be faster than other cars. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS send/receive VS. scp
Hi Darren, The copy is going between these two machines:
  Source: SunFire X4100, dual 2.2GHz Opteron (single core), 2GB RAM - SAN attached (STK FLX210) - ZFS RAID-Z zpool
  Destination: SunFire T2000, 8-core 1.2GHz T1, 8GB RAM - SAN attached (STK FLX210) - ZFS RAID-Z zpool
No compression is used in either filesystem, and the majority of the files are between 120MB and 4GB. It's a set of about 20 files. Total transfer is 22GB. Both systems are attached via GigE on the same switch. The zfs send/receive via SSH averages about 3MB/s. The scp averages about 6MB/s. Hope this helps explain what's going on. Best Regards, Jason
On 11/15/06, Darren J Moffat <[EMAIL PROTECTED]> wrote: Jason J. W. Williams wrote: > Hi there, > > I've been comparing using the ZFS send/receive function over SSH to > simply scp'ing the contents of snapshot, and have found for me the > performance is 2x faster for scp. Can you give some more details about the configuration of the two machines involved and the ssh config. For example is compression used in the file system and/or with the ssh connection. How much data are you transfering ? Is is lots of small files or a few number of large files ? Can you give some actual numbers. > Has anyone else noticed ZFS send/receive to be noticeably slower? I haven't but then I haven't done much benchmarking since it was fast enough for what I needed. This is actually a very good pair of things to compare. It would also be interesting to compare rsync over ssh as well. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
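For context, the two transfer methods being compared look roughly like this; the host, pool and snapshot names are invented for the example, and the scp path assumes the snapshot directory is visible under .zfs:

    # zfs send/receive over SSH
    zfs send tank/data@snap1 | ssh t2000 zfs recv tank/recv/data

    # scp of the snapshot contents
    scp -r /tank/data/.zfs/snapshot/snap1/ t2000:/tank/incoming/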
Re: [zfs-discuss] # devices in raidz.
Torrey McMahon wrote: Richard Elling - PAE wrote: Torrey McMahon wrote: Robert Milkowski wrote: Hello Torrey, Friday, November 10, 2006, 11:31:31 PM, you wrote: [SNIP] Tunable in a form of pool property, with default 100%. On the other hand maybe the simple algorithm Veritas has used is good enough - a simple delay between scrubbing/resilvering some data. I think a not-too-convoluted algorithm as people have suggested would be ideal and then let people override it as necessary. I would think a 100% default might be a call generator but I'm up for debate. ("Hey my array just went crazy. All the lights are blinking but my application isn't doing any I/O. What gives?") I'll argue that *any* random % is bogus. What you really want to do is prioritize activity where resources are constrained. From a RAS perspective, idle systems are the devil's playground :-). ZFS already does prioritize I/O that it knows about. Prioritizing on CPU might have some merit, but integrating into Solaris' resource management system might bring some added system admin complexity which is unwanted. I agree but the problem as I see it is that nothing has an overview of the entire environment. ZFS knows what I/O is coming in and what it's sending out but that's it. Even if we had an easy-to-use resource management framework across all the Sun applications and devices we'd still run into non-Sun bits that place demands on shared components like networking, SAN, arrays, etc. Anything that can be auto-tuned is great but I'm afraid we're still going to need manual tuning in some cases. I think this is reason #7429 why I hate SANs: no meaningful QoS. Related to reason #85823 why I hate SANs: ssd_max_throttle is a butt-ugly hack :-) -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
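For readers who haven't met the tunable being grumbled about: per-LUN throttling on SAN-attached Solaris hosts is typically done with an /etc/system setting along these lines, where the value is purely illustrative and the right number depends on the array vendor's recommendation:

    * Example only: cap outstanding commands per LUN for the ssd driver.
    set ssd:ssd_max_throttle = 32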
[zfs-discuss] Re: ZFS Mirror Configuration??
Hi Tomas, Thanks for that. Got to get out of my Veritas way of looking at mirrors. Another reason for wanting it this way is that IF the functionality of splitting mirrors comes into ZFS, whereby we can import them into a new pool or even a new LDOM, then we know exactly the boundary to split them on. Anyways thanks for that. Regards OAB ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on patching + zfs root
On Wed, Nov 15, 2006 at 12:10:30PM +0100, [EMAIL PROTECTED] wrote: > > >I think we first need to define what state "up" actually is. Is it the > >kernel booted ? Is it the root file system mounted ? Is it we reached > >milestone all ? Is it we reached milestone all with no services in > >maintenance ? Is it no services in maintenance that weren't on the last > > boot ? > > I think that's fairly simple: "up is the state when the milestone we > are booting to has been actually reached". > > What should SMF do when it finds that it cannot reach that milestone? Another question might be: how do I fix it when it's broken? > Harder is: > > What is the system does not come up quickly enough? > What if the system hangs before SMF is even starts? > What if the system panics during boot or shortly after we > reach our desired milestone? > > And then, of course, "define shortly and quickly". Such definitions need to consider net and SAN booting. Personally I think if a system is hosed then the best way to fail-safe is to either panic or drop to single-user rather than trying to be clever and booting some other kernel. Ceri -- That must be wonderful! I don't understand it at all. -- Moliere ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on patching + zfs root
On Tue, Nov 14, 2006 at 07:32:08PM +0100, [EMAIL PROTECTED] wrote: > > >Actually, we have considered this. On both SPARC and x86, there will be > >a way to specify the root file system (i.e., the bootable dataset) to be > >booted, > >at either the GRUB prompt (for x86) or the OBP prompt (for SPARC). > >If no root file system is specified, the current default 'bootfs' specified > >in the root pool's metadata will be booted. But it will be possible to > >override the default, which will provide that "fallback" boot capability. > > > I was thinking of some automated mechanism such as: > > - BIOS which, when reset during POST, will switch to safe > defaults and enter setup > - Windows which, when reset during boot, will offer safe mode > at the next boot. > > I was thinking of something that on activation of a new boot environment > would automatically fallback on catastrophic failure. Provide at least two GRUB menu entries: one for the latest BE, whatever that is, and one for the last BE known to have booted to a multi-user milestone. Then have an SMF service update a ZFS property of the booted BE to mark it as the last one to have booted. This property should be a date, so only the currently booted BE's need be updated. Nico -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
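A sketch of what Nico's suggestion might look like in menu.lst, with two entries pointing at different root datasets; the dataset names, the property used to record the last good boot, and the exact ZFS-boot GRUB syntax are all assumptions here, since ZFS boot support was still under development at the time:

    title Solaris (latest BE)
    bootfs rpool/ROOT/be-new
    kernel /platform/i86pc/kernel/unix -B $ZFS-BOOTFS
    module /platform/i86pc/boot_archive

    title Solaris (last BE known to reach multi-user)
    bootfs rpool/ROOT/be-lastgood
    kernel /platform/i86pc/kernel/unix -B $ZFS-BOOTFS
    module /platform/i86pc/boot_archive

The SMF-started service could then record a date on the running BE, e.g. 'zfs set local:lastgoodboot=2006-11-15 rpool/ROOT/be-new' (assuming user-defined dataset properties are available).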
Re: [zfs-discuss] Thoughts on patching + zfs root
On Wed, Nov 15, 2006 at 11:00:01AM +, Darren J Moffat wrote: > I think we first need to define what state "up" actually is. Is it the > kernel booted ? Is it the root file system mounted ? Is it we reached > milestone all ? Is it we reached milestone all with no services in > maintenance ? Is it no services in maintenance that weren't on the last > boot ? It is that the default milestone has been reached, IMO (as opposed to a milestone passed to the kernel as a boot argument). ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on patching + zfs root
On Wed, Nov 15, 2006 at 09:58:35PM +, Ceri Davies wrote: > On Wed, Nov 15, 2006 at 12:10:30PM +0100, [EMAIL PROTECTED] wrote: > > > > >I think we first need to define what state "up" actually is. Is it the > > >kernel booted ? Is it the root file system mounted ? Is it we reached > > >milestone all ? Is it we reached milestone all with no services in > > >maintenance ? Is it no services in maintenance that weren't on the last > > > boot ? > > > > I think that's fairly simple: "up is the state when the milestone we > > are booting to has been actually reached". > > > > What should SMF do when it finds that it cannot reach that milestone? > > Another question might be: how do I fix it when it's broken? That's for monitoring systems. The issue here is how to best select a BE at boot time. IMO the last booted BE to have reached its default milestone should be that BE. > > Harder is: > > > > What is the system does not come up quickly enough? The user may note this and reboot the system. BEs that once booted but now don't will still be selected at the GRUB menu as the last known-to-boot BEs, so we may want the ZFS boot code to reset the property of the BE's used for making this selection. > > What if the system hangs before SMF is even starts? > > What if the system panics during boot or shortly after we > > reach our desired milestone? > > > > And then, of course, "define shortly and quickly". > > Such definitions to consider net and SAN booting. If you're netbooting then you're not doing a ZFS boot, so the point is moot (this thread is about how to best select a BE to boot at boot time). If you're booting ZFS where the boot pool [or one or more or all of its mirror vdevs] is on a SAN then all the foregoing should apply. Nico -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on patching + zfs root
On Wed, Nov 15, 2006 at 04:23:18PM -0600, Nicolas Williams wrote: > On Wed, Nov 15, 2006 at 09:58:35PM +, Ceri Davies wrote: > > On Wed, Nov 15, 2006 at 12:10:30PM +0100, [EMAIL PROTECTED] wrote: > > > > > > >I think we first need to define what state "up" actually is. Is it the > > > >kernel booted ? Is it the root file system mounted ? Is it we reached > > > >milestone all ? Is it we reached milestone all with no services in > > > >maintenance ? Is it no services in maintenance that weren't on the last > > > > boot ? > > > > > > I think that's fairly simple: "up is the state when the milestone we > > > are booting to has been actually reached". > > > > > > What should SMF do when it finds that it cannot reach that milestone? > > > > Another question might be: how do I fix it when it's broken? > > That's for monitoring systems. The issue here is how to best select a > BE at boot time. IMO the last booted BE to have reached its default > milestone should be that BE. What I'm trying to say (and this is the only part that you didn't quote :)) is that there is no way I want the BE programmatically selected. > > > Harder is: > > > > > > What is the system does not come up quickly enough? > > The user may note this and reboot the system. BEs that once booted but > now don't will still be selected at the GRUB menu as the last > known-to-boot BEs, so we may want the ZFS boot code to reset the > property of the BE's used for making this selection. Not my text, but wtf? Booting the wrong BE because my NIS server is down (or whatever) isn't really acceptable (or likely to resolve anything). I think that's what "not quickly enough" was getting at. > If you're netbooting then you're not doing a ZFS boot, so the point is > moot (this thread is about how to best select a BE to boot at boot > time). I believe I could have /usr or /var on NFS still. Ceri -- That must be wonderful! I don't understand it at all. -- Moliere ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on patching + zfs root
On Tue, Nov 14, 2006 at 07:32:08PM +0100, [EMAIL PROTECTED] wrote: > > >Actually, we have considered this. On both SPARC and x86, there will be > >a way to specify the root file system (i.e., the bootable dataset) to be > >booted, > >at either the GRUB prompt (for x86) or the OBP prompt (for SPARC). > >If no root file system is specified, the current default 'bootfs' specified > >in the root pool's metadata will be booted. But it will be possible to > >override the default, which will provide that "fallback" boot capability. > > > I was thinking of some automated mechanism such as: > > - BIOS which, when reset during POST, will switch to safe > defaults and enter setup > - Windows which, when reset during boot, will offer safe mode > at the next boot. > > I was thinking of something that on activation of a new boot environment > would automatically fallback on catastrophic failure. I don't wish to sound ungrateful or unconstructive but there's no other way to say this: I liked ZFS better when it was a filesystem + volume manager rather than the one-tool-fits-all monster that it seems to be turning into. I'm very concerned about bolting some flavour of boot loader on to the side, particularly one that's automatic. I'm not doubting that the concept is way cool, but I want predictable behaviour every time; not way cool. Ceri -- That must be wonderful! I don't understand it at all. -- Moliere ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thoughts on patching + zfs root
Ceri Davies wrote: On Tue, Nov 14, 2006 at 07:32:08PM +0100, [EMAIL PROTECTED] wrote: Actually, we have considered this. On both SPARC and x86, there will be a way to specify the root file system (i.e., the bootable dataset) to be booted, at either the GRUB prompt (for x86) or the OBP prompt (for SPARC). If no root file system is specified, the current default 'bootfs' specified in the root pool's metadata will be booted. But it will be possible to override the default, which will provide that "fallback" boot capability. I was thinking of some automated mechanism such as: - BIOS which, when reset during POST, will switch to safe defaults and enter setup - Windows which, when reset during boot, will offer safe mode at the next boot. I was thinking of something that on activation of a new boot environment would automatically fallback on catastrophic failure. I don't wish to sound ungrateful or unconstructive but there's no other way to say this: I liked ZFS better when it was a filesystem + volume manager rather than the one-tool-fits-all monster that it seems to be heading in. I'm very concerned about bolting some flavour of boot loader on to the side, particularly one that's automatic. I'm not doubting that the concept is way cool, but I want predictable behaviour every time; not way cool. All of these ideas about automated recovery are just ideas. I don't think we've reached monsterdom just yet. For right now, the planned behavior is more predictable: there is one dataset specified as the 'default bootable dataset' for the pool. You will have to take explicit action (something like luactivate) to change that default. You will always have a failsafe archive to boot if something goes terribly wrong and you need to fix your menu.lst or set a different default bootable dataset. You will also be able to have multiple entries in the menu.lst file, corresponding to multiple BEs, but that will be optional. But I'm open to these ideas of automatic recovery. It's an interesting thing to consider. Ultimately, it might need to be something that is optional, so that we could also get behavior that is more predictable. Lori ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
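For concreteness, the 'default bootable dataset' Lori describes is recorded in the root pool itself, so switching it is expected to look roughly like this once ZFS boot support lands (pool and BE names are invented):

    # Show the current default bootable dataset for the root pool
    zpool get bootfs rpool
    # Point the pool at a different boot environment
    zpool set bootfs=rpool/ROOT/newBE rpool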
[zfs-discuss] ZFS patches for FreeBSD.
Just to let you know that the first set of patches for FreeBSD is now available: http://lists.freebsd.org/pipermail/freebsd-fs/2006-November/002385.html -- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss