zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]
This case was approved in today's PSARC meeting. -tim
zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]
Darren Reed wrote:
> On 14/09/09 03:03 PM, Tim Haley wrote:
>> Victor Latushkin wrote:
>>> On 10.09.09 07:40, Tim Haley wrote:
>>>> I am sponsoring the following fast-track for myself. This case
>>>> introduces additional zpool sub-command options to support pool
>>>> recovery. The case is requesting micro/patch binding. Timeout is
>>>> 09/16/2009.
>>>>
>>>> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
>>>> This information is Copyright 2009 Sun Microsystems
>>>> 1. Introduction
>>>>     1.1. Project/Component Working Name:
>>>>          zpool recovery support
>>>>     1.2. Name of Document Author/Supplier:
>>>>          Author: Timothy Haley
>>>>     1.3  Date of This Document:
>>>>          09 September, 2009
>>>> 4. Technical Description
>>>>
>>>> OVERVIEW:
>>>>
>>>>     Uncooperative or deceptive hardware, combined with power
>>>>     failures or sudden lack of access to devices, can result in
>>>>     zpools without redundancy being non-importable. ZFS'
>>>>     copy-on-write and Merkle tree properties will sometimes allow
>>>>     us to recover from these problems. Only ad-hoc means currently
>>>>     exist to take advantage of this recoverability. This proposal
>>>>     aims to rectify that short-coming.
>>>>
>>>> PROPOSED SOLUTION:
>>>>
>>>>     This fast-track proposes two new command line flags each for
>>>>     the 'zpool clear' and 'zpool import' sub-commands.
>>>
>>> 'zpool clear' is becoming more and more overloaded in meaning.
>>> Currently it is used to clear error counters (original use) and
>>> recover from a faulted slog device or suspended state (though there's
>>> no mention of it in the man page). This is confusing users and has
>>> been brought up several times (at least) on zfs-discuss.
>>>
>>> Isn't it better to introduce another subcommand 'recover' or
>>> something to handle all sorts of recovery?
>>>
>> "Better" is subjective. For the limited recovery we are going to
>> support at the moment, the single flag to clear or import is probably
>> sufficient. The confusion of what to run to recover should hopefully
>> be abated by failed imports and 'zpool status' directing the
>> administrator exactly what to run to perform a recovery.
>>
>> Having the flag now does not preclude us from adding a recover
>> subcommand in the future for more advanced recovery.
>
> If this is a limited recovery mechanism then why not make this a variant
> of the recover subcommand, rather than clear?
>
> That seems more obvious to me, in terms of usability.
>
> However, it should be noted that "zpool clear" does fit the "svcadm
> clear" operational model.
>
> In light of that, if a "zpool recover" is added at some point in the
> future, would a "zpool clear" automatically do whatever extended
> recovery was possible to enable the pool to be mounted?
>
> Given that "zpool clear" is a recovery operation (of sorts) and that
> you're hinting at there being thought about a more advanced recovery
> option, I think it would be beneficial to understand more about what the
> project team intends to do that requires us to have recovery performed
> by two different subcommands.
>
You've read a lot more into my statement than I ever meant to imply. I
only meant that if we thought up something exotic in the longer term, we
could still consider a new sub-command. I doubt it would be necessary,
though.

-tim

> For example, how will I know when it is appropriate to use "zpool clear"
> vs "zpool recover"?
> Is user confusion likely from having two subcommands that do similar but
> different things, depending on the circumstances at hand?
>
> I appreciate that you haven't formally presented us with a case that
> mentions "zpool recover", but your email here hints that there is more
> to follow and that might help us put this case in better perspective.
>
> Darren
>
zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]
On 10.09.09 07:40, Tim Haley wrote:
> I am sponsoring the following fast-track for myself. This case
> introduces additional zpool sub-command options to support pool
> recovery. The case is requesting micro/patch binding. Timeout is
> 09/16/2009.
>
> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
> This information is Copyright 2009 Sun Microsystems
> 1. Introduction
>     1.1. Project/Component Working Name:
>          zpool recovery support
>     1.2. Name of Document Author/Supplier:
>          Author: Timothy Haley
>     1.3  Date of This Document:
>          09 September, 2009
> 4. Technical Description
>
> OVERVIEW:
>
>     Uncooperative or deceptive hardware, combined with power
>     failures or sudden lack of access to devices, can result in
>     zpools without redundancy being non-importable. ZFS'
>     copy-on-write and Merkle tree properties will sometimes allow
>     us to recover from these problems. Only ad-hoc means currently
>     exist to take advantage of this recoverability. This proposal
>     aims to rectify that short-coming.
>
> PROPOSED SOLUTION:
>
>     This fast-track proposes two new command line flags each for
>     the 'zpool clear' and 'zpool import' sub-commands.

'zpool clear' is becoming more and more overloaded in meaning.
Currently it is used to clear error counters (original use) and
recover from a faulted slog device or suspended state (though there's
no mention of it in the man page). This is confusing users and has
been brought up several times (at least) on zfs-discuss.

Isn't it better to introduce another subcommand 'recover' or
something to handle all sorts of recovery?

victor
zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]
On 14/09/09 03:03 PM, Tim Haley wrote:
> Victor Latushkin wrote:
>> On 10.09.09 07:40, Tim Haley wrote:
>>> I am sponsoring the following fast-track for myself. This case
>>> introduces additional zpool sub-command options to support pool
>>> recovery. The case is requesting micro/patch binding. Timeout is
>>> 09/16/2009.
>>>
>>> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
>>> This information is Copyright 2009 Sun Microsystems
>>> 1. Introduction
>>>     1.1. Project/Component Working Name:
>>>          zpool recovery support
>>>     1.2. Name of Document Author/Supplier:
>>>          Author: Timothy Haley
>>>     1.3  Date of This Document:
>>>          09 September, 2009
>>> 4. Technical Description
>>>
>>> OVERVIEW:
>>>
>>>     Uncooperative or deceptive hardware, combined with power
>>>     failures or sudden lack of access to devices, can result in
>>>     zpools without redundancy being non-importable. ZFS'
>>>     copy-on-write and Merkle tree properties will sometimes allow
>>>     us to recover from these problems. Only ad-hoc means currently
>>>     exist to take advantage of this recoverability. This proposal
>>>     aims to rectify that short-coming.
>>>
>>> PROPOSED SOLUTION:
>>>
>>>     This fast-track proposes two new command line flags each for
>>>     the 'zpool clear' and 'zpool import' sub-commands.
>>
>> 'zpool clear' is becoming more and more overloaded in meaning.
>> Currently it is used to clear error counters (original use) and
>> recover from a faulted slog device or suspended state (though there's
>> no mention of it in the man page). This is confusing users and has
>> been brought up several times (at least) on zfs-discuss.
>>
>> Isn't it better to introduce another subcommand 'recover' or
>> something to handle all sorts of recovery?
>>
> "Better" is subjective. For the limited recovery we are going to
> support at the moment, the single flag to clear or import is probably
> sufficient. The confusion of what to run to recover should hopefully
> be abated by failed imports and 'zpool status' directing the
> administrator exactly what to run to perform a recovery.
>
> Having the flag now does not preclude us from adding a recover
> subcommand in the future for more advanced recovery.

If this is a limited recovery mechanism then why not make this a variant
of the recover subcommand, rather than clear?

That seems more obvious to me, in terms of usability.

However, it should be noted that "zpool clear" does fit the "svcadm
clear" operational model.

In light of that, if a "zpool recover" is added at some point in the
future, would a "zpool clear" automatically do whatever extended
recovery was possible to enable the pool to be mounted?

Given that "zpool clear" is a recovery operation (of sorts) and that
you're hinting at there being thought about a more advanced recovery
option, I think it would be beneficial to understand more about what the
project team intends to do that requires us to have recovery performed
by two different subcommands.

For example, how will I know when it is appropriate to use "zpool clear"
vs "zpool recover"?
Is user confusion likely from having two subcommands that do similar but
different things, depending on the circumstances at hand?

I appreciate that you haven't formally presented us with a case that
mentions "zpool recover", but your email here hints that there is more
to follow and that might help us put this case in better perspective.

Darren
zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]
Victor Latushkin wrote:
> On 10.09.09 07:40, Tim Haley wrote:
>> I am sponsoring the following fast-track for myself. This case
>> introduces additional zpool sub-command options to support pool
>> recovery. The case is requesting micro/patch binding. Timeout is
>> 09/16/2009.
>>
>> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
>> This information is Copyright 2009 Sun Microsystems
>> 1. Introduction
>>     1.1. Project/Component Working Name:
>>          zpool recovery support
>>     1.2. Name of Document Author/Supplier:
>>          Author: Timothy Haley
>>     1.3  Date of This Document:
>>          09 September, 2009
>> 4. Technical Description
>>
>> OVERVIEW:
>>
>>     Uncooperative or deceptive hardware, combined with power
>>     failures or sudden lack of access to devices, can result in
>>     zpools without redundancy being non-importable. ZFS'
>>     copy-on-write and Merkle tree properties will sometimes allow
>>     us to recover from these problems. Only ad-hoc means currently
>>     exist to take advantage of this recoverability. This proposal
>>     aims to rectify that short-coming.
>>
>> PROPOSED SOLUTION:
>>
>>     This fast-track proposes two new command line flags each for
>>     the 'zpool clear' and 'zpool import' sub-commands.
>
> 'zpool clear' is becoming more and more overloaded in meaning. Currently
> it is used to clear error counters (original use) and recover from a
> faulted slog device or suspended state (though there's no mention of it
> in the man page). This is confusing users and has been brought up
> several times (at least) on zfs-discuss.
>
> Isn't it better to introduce another subcommand 'recover' or something
> to handle all sorts of recovery?
>
"Better" is subjective. For the limited recovery we are going to support
at the moment, the single flag to clear or import is probably
sufficient. The confusion of what to run to recover should hopefully be
abated by failed imports and 'zpool status' directing the administrator
exactly what to run to perform a recovery.

Having the flag now does not preclude us from adding a recover
subcommand in the future for more advanced recovery.

-tim
zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]
Tim Haley wrote:
> I am sponsoring the following fast-track for myself. This case
> introduces additional zpool sub-command options to support pool
> recovery. The case is requesting micro/patch binding. Timeout is
> 09/16/2009.
>
> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
> This information is Copyright 2009 Sun Microsystems
> 1. Introduction
>     1.1. Project/Component Working Name:
>          zpool recovery support
>     1.2. Name of Document Author/Supplier:
>          Author: Timothy Haley
>     1.3  Date of This Document:
>          09 September, 2009
> 4. Technical Description

I'm happy with the case as specified so it gets my +1.

I'm going on the assumption that there are spa history records written
for this - but didn't expect to see that as part of the ARC material
since their format isn't an interface and many of them are Internal
taxonomy anyway.

--
Darren J Moffat
zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]
On Wed, 2009-09-09 at 21:40 -0600, Tim Haley wrote:
> I am sponsoring the following fast-track for myself. This case
> introduces additional zpool sub-command options to support pool
> recovery. The case is requesting micro/patch binding. Timeout is
> 09/16/2009.

+1

-Seb
zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]
Scott Rotondo wrote:
> Tim Haley wrote:
>>
>> PROPOSED SOLUTION:
>>
>>     This fast-track proposes two new command line flags each for
>>     the 'zpool clear' and 'zpool import' sub-commands.
>>
>>     Both sub-commands will now accept a '-F' recovery mode flag.
>>     When specified, a determination is made if discarding the last
>>     few transactions performed in an unopenable or non-importable
>>     pool will return the pool to a usable state. If so, the
>>     transactions are irreversibly discarded, and the pool
>>     imported. If the pool is usable or already imported and this
>>     flag is specified, the flag is ignored and no transactions are
>>     discarded.
>>
>>     Both sub-commands will now also accept a '-n' flag. This flag
>>     is only meaningful in conjunction with the '-F' flag. When
>>     specified, an attempt is made to see if discarding transactions
>>     will return the pool to a usable state, but no transactions are
>>     actually discarded.
>
> Here's a usability suggestion. Whenever clear or import fails, why not
> automatically do the equivalent of -F -n (i.e. tell the user
> if recovery is possible)? If so, the user can invoke with -F if desired.
> There would be no need to create a -n option.
>
That is exactly how it works in the prototype.

The -n is still useful for reconfirming.

-tim

> Scott
>
>
zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]
Tim Haley wrote:
> Scott Rotondo wrote:
>> Tim Haley wrote:
>>>
>>> PROPOSED SOLUTION:
>>>
>>>     This fast-track proposes two new command line flags each for
>>>     the 'zpool clear' and 'zpool import' sub-commands.
>>>
>>>     Both sub-commands will now accept a '-F' recovery mode flag.
>>>     When specified, a determination is made if discarding the last
>>>     few transactions performed in an unopenable or non-importable
>>>     pool will return the pool to a usable state. If so, the
>>>     transactions are irreversibly discarded, and the pool
>>>     imported. If the pool is usable or already imported and this
>>>     flag is specified, the flag is ignored and no transactions are
>>>     discarded.
>>>
>>>     Both sub-commands will now also accept a '-n' flag. This flag
>>>     is only meaningful in conjunction with the '-F' flag. When
>>>     specified, an attempt is made to see if discarding transactions
>>>     will return the pool to a usable state, but no transactions are
>>>     actually discarded.
>>
>> Here's a usability suggestion. Whenever clear or import fails, why not
>> automatically do the equivalent of -F -n (i.e. tell the user
>> if recovery is possible)? If so, the user can invoke with -F if
>> desired. There would be no need to create a -n option.
>>
> That is exactly how it works in the prototype.
>
> The -n is still useful for reconfirming.
>
> -tim
>
OK, good. I'm less concerned about removing the -n than I am about
making sure we automatically tell the user when he should try -F.

Scott

--
Scott Rotondo
Principal Engineer, Solaris Security Technologies
President, Trusted Computing Group
Phone/FAX: +1 408 850 3655 (Internal x68278)
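
The flow discussed in this exchange (a failed import followed by an automatic "-F -n" check that tells the administrator whether to retry with -F) could be sketched roughly as below. This is an illustration only, not the prototype's code: the function names, the injected runner, the returned strings, and above all the assumption that a zero exit status from "zpool import -F -n" means "recovery is possible" are hypothetical and are not stated in the case materials.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes an argv list and returns an exit status (hypothetical helper type).
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit status."""
    return subprocess.run(list(argv)).returncode


def import_argv(pool: str, recover: bool = False, dry_run: bool = False) -> List[str]:
    """Build a 'zpool import' command line using the proposed -F and -n flags."""
    argv = ["zpool", "import"]
    if recover:
        argv.append("-F")
        if dry_run:
            # -n is only meaningful together with -F, so it is added only then.
            argv.append("-n")
    argv.append(pool)
    return argv


def import_with_hint(pool: str, runner: Runner = run_command) -> str:
    """Try a plain import; on failure, check (without discarding) whether -F would help."""
    if runner(import_argv(pool)) == 0:
        return "imported"
    # ASSUMPTION: exit status 0 from the dry run means discarding transactions would work.
    if runner(import_argv(pool, recover=True, dry_run=True)) == 0:
        return "recovery possible: rerun with -F"
    return "not recoverable with -F"
```

Note that the sketch never issues "-F" without "-n" on its own; the irreversible step is left to the administrator, which matches the concern raised above about telling the user when to try -F rather than doing it silently.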
zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]
I am sponsoring the following fast-track for myself. This case
introduces additional zpool sub-command options to support pool
recovery. The case is requesting micro/patch binding. Timeout is
09/16/2009.

Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
    1.1. Project/Component Working Name:
         zpool recovery support
    1.2. Name of Document Author/Supplier:
         Author: Timothy Haley
    1.3  Date of This Document:
         09 September, 2009
4. Technical Description

OVERVIEW:

    Uncooperative or deceptive hardware, combined with power
    failures or sudden lack of access to devices, can result in
    zpools without redundancy being non-importable. ZFS'
    copy-on-write and Merkle tree properties will sometimes allow
    us to recover from these problems. Only ad-hoc means currently
    exist to take advantage of this recoverability. This proposal
    aims to rectify that short-coming.

PROPOSED SOLUTION:

    This fast-track proposes two new command line flags each for
    the 'zpool clear' and 'zpool import' sub-commands.

    Both sub-commands will now accept a '-F' recovery mode flag.
    When specified, a determination is made if discarding the last
    few transactions performed in an unopenable or non-importable
    pool will return the pool to a usable state. If so, the
    transactions are irreversibly discarded, and the pool
    imported. If the pool is usable or already imported and this
    flag is specified, the flag is ignored and no transactions are
    discarded.

    Both sub-commands will now also accept a '-n' flag. This flag
    is only meaningful in conjunction with the '-F' flag. When
    specified, an attempt is made to see if discarding transactions
    will return the pool to a usable state, but no transactions are
    actually discarded.
PROPOSED CHANGES to ZPOOL(1M) PAGE:

--- zpool.1m.rogi Thu Aug 27 09:59:14 2009
+++ zpool.1m Wed Sep 9 21:02:25 2009
@@ -18,7 +18,7 @@
      zpool attach [-f] pool device new_device
-     zpool clear pool [device]
+     zpool clear [-n] [-F] pool [device]
      zpool create [-fn] [-o property=value] ... [-O file-system-property=value]
@@ -44,11 +44,11 @@
      zpool import [-o mntopts] [-p property=value] ... [-d dir | -c cachefile]
-         [-D] [-f] [-R root] -a
+         [-D] [-f] [-R root] [-n] [-F] -a
      zpool import [-o mntopts] [-o property=value] ... [-d dir | -c cachefile]
-         [-D] [-f] [-R root] pool |id [newpool]
+         [-D] [-f] [-R root] [-n] [-F] pool |id [newpool]
      zpool iostat [-v] [pool] ... [interval[count]]
@@ -761,7 +761,7 @@
-     zpool clear pool [device] ...
+     zpool clear [-n] [-F] pool [device] ...
          Clears device errors in a pool. If no arguments are
          specified, all device errors within the pool are
@@ -769,7 +769,18 @@
          errors associated with the specified device or devices
          are cleared.
+         -F  Initiates recovery mode for an unopenable pool.
+             Attempts to discard the last few transactions in the
+             pool to return it to an openable state. Not all
+             damaged pools can be recovered by using this option.
+             If successful, the data from the discarded transactions
+             is irreversibly lost.
+         -n  Used in combination with the -F flag. Check if
+             discarding transactions would make the pool openable,
+             but do not actually discard any transactions.
+
+
      zpool create [-fn] [-o property=value] ... [-O file-system-
      property=value] ... [-m mountpoint] [-R root] pool vdev ...
@@ -1016,7 +1027,7 @@
      zpool import [-o mntopts] [ -o property=value] ... [-d dir |
-     -c cachefile] [-D] [-f] [-R root] -a
+     -c cachefile] [-D] [-f] [-n] [-F] [-R root] -a
          Imports all pools found in the search directories.
          Identical to the previous command, except that all pools
@@ -1075,6 +1086,17 @@
              appears to be potentially active.
+         -F  Recovery mode for a non-importable pool.
+             Attempt to return the pool to an
+             importable state by discarding the last
+             few transactions. Not all damaged pools
+             can be recovered by using this option.
+             If successful, the data from the
+             discarded transactions is irreversibly
+             lost. This option is ignored if the pool
+             is importable or already imported.
+
+
          -a  Searches for and imports all pools found.
@@ -1083,
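
As a reading aid, the -F and -n rules described in the proposal can be modelled as a small decision function. This is a toy sketch, not ZFS code: the PoolState class, its fields, and the result strings are all hypothetical names introduced for illustration, and the real determination of whether discarding transactions helps happens inside ZFS.

```python
from dataclasses import dataclass


@dataclass
class PoolState:
    """Toy stand-in for a pool; every field here is hypothetical."""
    openable: bool           # the pool can be opened or imported as-is
    rewind_helps: bool       # discarding the last few transactions would make it usable
    discarded: bool = False  # True once transactions have actually been discarded


def apply_flags(pool: PoolState, recover: bool = False, dry_run: bool = False) -> str:
    """Model the described behaviour of -F (recover) and -n (dry_run)."""
    if pool.openable:
        # A usable or already imported pool: -F is ignored, nothing is discarded.
        return "ok"
    if not recover:
        # Without -F no recovery is attempted; -n alone is not meaningful.
        return "failed"
    if not pool.rewind_helps:
        # Not all damaged pools can be recovered by using -F.
        return "unrecoverable"
    if dry_run:
        # -F with -n: report that recovery would work, but discard nothing.
        return "recoverable"
    # -F alone: the discard is irreversible in the real tool.
    pool.discarded = True
    pool.openable = True
    return "recovered"
```

For example, calling apply_flags on a damaged but recoverable pool with recover=True and dry_run=True leaves the pool untouched, while a following call with recover=True alone marks the transactions as discarded.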
zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]
Tim Haley wrote:
>
> PROPOSED SOLUTION:
>
>     This fast-track proposes two new command line flags each for
>     the 'zpool clear' and 'zpool import' sub-commands.
>
>     Both sub-commands will now accept a '-F' recovery mode flag.
>     When specified, a determination is made if discarding the last
>     few transactions performed in an unopenable or non-importable
>     pool will return the pool to a usable state. If so, the
>     transactions are irreversibly discarded, and the pool
>     imported. If the pool is usable or already imported and this
>     flag is specified, the flag is ignored and no transactions are
>     discarded.
>
>     Both sub-commands will now also accept a '-n' flag. This flag
>     is only meaningful in conjunction with the '-F' flag. When
>     specified, an attempt is made to see if discarding transactions
>     will return the pool to a usable state, but no transactions are
>     actually discarded.

Here's a usability suggestion. Whenever clear or import fails, why not
automatically do the equivalent of -F -n (i.e. tell the user if recovery
is possible)? If so, the user can invoke with -F if desired. There would
be no need to create a -n option.

Scott

--
Scott Rotondo
Principal Engineer, Solaris Security Technologies
President, Trusted Computing Group
Phone/FAX: +1 408 850 3655 (Internal x68278)