zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]
This case was approved in today's PSARC meeting. -tim
zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]
Victor Latushkin wrote:
> On 10.09.09 07:40, Tim Haley wrote:
>> I am sponsoring the following fast-track for myself. This case introduces additional zpool sub-command options to support pool recovery. The case is requesting micro/patch binding. Timeout is 09/16/2009.
>>
>> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
>> This information is Copyright 2009 Sun Microsystems
>> 1. Introduction
>> 1.1. Project/Component Working Name: zpool recovery support
>> 1.2. Name of Document Author/Supplier: Author: Timothy Haley
>> 1.3 Date of This Document: 09 September, 2009
>> 4. Technical Description
>>
>> OVERVIEW:
>> Uncooperative or deceptive hardware, combined with power failures or sudden lack of access to devices, can result in zpools without redundancy being non-importable. ZFS' copy-on-write and Merkle tree properties will sometimes allow us to recover from these problems. Only ad-hoc means currently exist to take advantage of this recoverability. This proposal aims to rectify that shortcoming.
>>
>> PROPOSED SOLUTION:
>> This fast-track proposes two new command line flags each for the 'zpool clear' and 'zpool import' sub-commands.
>
> 'zpool clear' is becoming more and more overloaded in meaning. Currently it is used to clear error counters (original use) and to recover from a faulted slog device or a suspended state (though there's no mention of it in the man page). This is confusing users and has been brought up several times (at least) on zfs-discuss.
>
> Isn't it better to introduce another subcommand, 'recover' or something, to handle all sorts of recovery?

Better is subjective. For the limited recovery we are going to support at the moment, the single flag to clear or import is probably sufficient. The confusion about what to run to recover should hopefully be abated by failed imports and 'zpool status' directing the administrator to exactly what to run to perform a recovery.

Having the flag now does not preclude us from adding a recover subcommand in the future for more advanced recovery.

-tim
zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]
On 14/09/09 03:03 PM, Tim Haley wrote:
> Victor Latushkin wrote:
>> On 10.09.09 07:40, Tim Haley wrote:
>>> I am sponsoring the following fast-track for myself. This case introduces additional zpool sub-command options to support pool recovery. The case is requesting micro/patch binding. Timeout is 09/16/2009.
>>>
>>> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
>>> This information is Copyright 2009 Sun Microsystems
>>> 1. Introduction
>>> 1.1. Project/Component Working Name: zpool recovery support
>>> 1.2. Name of Document Author/Supplier: Author: Timothy Haley
>>> 1.3 Date of This Document: 09 September, 2009
>>> 4. Technical Description
>>>
>>> OVERVIEW:
>>> Uncooperative or deceptive hardware, combined with power failures or sudden lack of access to devices, can result in zpools without redundancy being non-importable. ZFS' copy-on-write and Merkle tree properties will sometimes allow us to recover from these problems. Only ad-hoc means currently exist to take advantage of this recoverability. This proposal aims to rectify that shortcoming.
>>>
>>> PROPOSED SOLUTION:
>>> This fast-track proposes two new command line flags each for the 'zpool clear' and 'zpool import' sub-commands.
>>
>> 'zpool clear' is becoming more and more overloaded in meaning. Currently it is used to clear error counters (original use) and to recover from a faulted slog device or a suspended state (though there's no mention of it in the man page). This is confusing users and has been brought up several times (at least) on zfs-discuss.
>>
>> Isn't it better to introduce another subcommand, 'recover' or something, to handle all sorts of recovery?
>
> Better is subjective. For the limited recovery we are going to support at the moment, the single flag to clear or import is probably sufficient. The confusion about what to run to recover should hopefully be abated by failed imports and 'zpool status' directing the administrator to exactly what to run to perform a recovery.
>
> Having the flag now does not preclude us from adding a recover subcommand in the future for more advanced recovery.

If this is a limited recovery mechanism, then why not make this a variant of the recover subcommand, rather than clear? That seems more obvious to me, in terms of usability.

However, it should be noted that zpool clear does fit the svcadm clear operational model. In light of that, if a zpool recover is added at some point in the future, would a zpool clear automatically do whatever extended recovery was possible to enable the pool to be mounted?

Given that zpool clear is a recovery operation (of sorts) and that you're hinting at there being thought about a more advanced recovery option, I think it would be beneficial to understand more about what the project team intends to do that requires us to have recovery performed by two different subcommands. For example, how will I know when it is appropriate to use zpool clear vs zpool recover? Is user confusion likely from having two subcommands that do similar but different things, depending on the circumstances at hand?

I appreciate that you haven't formally presented us with a case that mentions zpool recover, but your email here hints that there is more to follow, and that might help us put this case in better perspective.

Darren
zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]
Tim Haley wrote:
> I am sponsoring the following fast-track for myself. This case introduces additional zpool sub-command options to support pool recovery. The case is requesting micro/patch binding. Timeout is 09/16/2009.
>
> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
> This information is Copyright 2009 Sun Microsystems
> 1. Introduction
> 1.1. Project/Component Working Name: zpool recovery support
> 1.2. Name of Document Author/Supplier: Author: Timothy Haley
> 1.3 Date of This Document: 09 September, 2009
> 4. Technical Description

I'm happy with the case as specified so it gets my +1.

I'm going on the assumption that there are spa history records written for this - but didn't expect to see that as part of the ARC material since their format isn't an interface and many of them are Internal taxonomy anyway.

--
Darren J Moffat
zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]
On Wed, 2009-09-09 at 21:40 -0600, Tim Haley wrote:
> I am sponsoring the following fast-track for myself. This case introduces additional zpool sub-command options to support pool recovery. The case is requesting micro/patch binding. Timeout is 09/16/2009.

+1

-Seb
zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]
I am sponsoring the following fast-track for myself. This case introduces additional zpool sub-command options to support pool recovery. The case is requesting micro/patch binding. Timeout is 09/16/2009.

Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name: zpool recovery support
1.2. Name of Document Author/Supplier: Author: Timothy Haley
1.3 Date of This Document: 09 September, 2009
4. Technical Description

OVERVIEW:
Uncooperative or deceptive hardware, combined with power failures or sudden lack of access to devices, can result in zpools without redundancy being non-importable. ZFS' copy-on-write and Merkle tree properties will sometimes allow us to recover from these problems. Only ad-hoc means currently exist to take advantage of this recoverability. This proposal aims to rectify that shortcoming.

PROPOSED SOLUTION:
This fast-track proposes two new command line flags each for the 'zpool clear' and 'zpool import' sub-commands.

Both sub-commands will now accept a '-F' recovery mode flag. When specified, a determination is made if discarding the last few transactions performed in an unopenable or non-importable pool will return the pool to a usable state. If so, the transactions are irreversibly discarded, and the pool imported. If the pool is usable or already imported and this flag is specified, the flag is ignored and no transactions are discarded.

Both sub-commands will now also accept a '-n' flag. This flag is only meaningful in conjunction with the '-F' flag. When specified, an attempt is made to see if discarding transactions will return the pool to a usable state, but no transactions are actually discarded.
PROPOSED CHANGES to ZPOOL(1M) PAGE:

--- zpool.1m.rogi   Thu Aug 27 09:59:14 2009
+++ zpool.1m        Wed Sep  9 21:02:25 2009
@@ -18,7 +18,7 @@
      zpool attach [-f] pool device new_device
-     zpool clear pool [device]
+     zpool clear [-n] [-F] pool [device]
      zpool create [-fn] [-o property=value] ... [-O file-system-property=value]
@@ -44,11 +44,11 @@
      zpool import [-o mntopts] [-p property=value] ... [-d dir | -c cachefile]
-         [-D] [-f] [-R root] -a
+         [-D] [-f] [-R root] [-n] [-F] -a
      zpool import [-o mntopts] [-o property=value] ... [-d dir | -c cachefile]
-         [-D] [-f] [-R root] pool |id [newpool]
+         [-D] [-f] [-R root] [-n] [-F] pool |id [newpool]
      zpool iostat [-v] [pool] ... [interval[count]]
@@ -761,7 +761,7 @@
-     zpool clear pool [device] ...
+     zpool clear [-n] [-F] pool [device] ...
          Clears device errors in a pool. If no arguments are
          specified, all device errors within the pool are
@@ -769,7 +769,18 @@
          errors associated with the specified device or devices
          are cleared.
+         -F  Initiates recovery mode for an unopenable pool.
+             Attempts to discard the last few transactions in the
+             pool to return it to an openable state. Not all
+             damaged pools can be recovered by using this option.
+             If successful, the data from the discarded transactions
+             is irreversibly lost.
+         -n  Used in combination with the -F flag. Check if
+             discarding transactions would make the pool openable,
+             but do not actually discard any transactions.
+
+
      zpool create [-fn] [-o property=value] ... [-O file-system-
      property=value] ... [-m mountpoint] [-R root] pool vdev ...
@@ -1016,7 +1027,7 @@
      zpool import [-o mntopts] [ -o property=value] ... [-d dir |
-     -c cachefile] [-D] [-f] [-R root] -a
+     -c cachefile] [-D] [-f] [-n] [-F] [-R root] -a
          Imports all pools found in the search directories.
          Identical to the previous command, except that all pools
@@ -1075,6 +1086,17 @@
              appears to be potentially active.
+         -F  Recovery mode for a non-importable pool.
+             Attempt to return the pool to an
+             importable state by discarding the last
+             few transactions. Not all damaged pools
+             can be recovered by using this option.
+             If successful, the data from the
+             discarded transactions is irreversibly
+             lost. This option is ignored if the pool
+             is importable or already imported.
+
+
          -a  Searches for and imports all pools found.
@@
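
Editorial sketch: the '-F' and '-n' semantics described in the proposal above can be modelled as a small decision function. This is only an illustration in Python, not ZFS code. The names PoolState, Outcome and open_pool are hypothetical, and the two boolean fields stand in for checks that the real implementation performs against on-disk state.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    OPENED = "pool opened normally, nothing discarded"
    FAILED = "pool cannot be opened"
    WOULD_RECOVER = "recovery is possible, nothing discarded"
    RECOVERED = "last few transactions discarded, pool opened"


@dataclass
class PoolState:
    # Hypothetical model fields, not real ZFS structures.
    usable: bool        # pool opens without discarding anything
    rewind_helps: bool  # discarding the last few transactions would make it usable


def open_pool(pool: PoolState, recovery: bool = False, dry_run: bool = False) -> Outcome:
    """Model 'zpool clear/import [-n] [-F]' as worded in the proposal.

    recovery corresponds to -F, dry_run corresponds to -n.
    """
    if pool.usable:
        # -F is ignored for a usable or already imported pool.
        return Outcome.OPENED
    if not recovery:
        # -n is only meaningful together with -F.
        return Outcome.FAILED
    if not pool.rewind_helps:
        # Not all damaged pools can be recovered this way.
        return Outcome.FAILED
    if dry_run:
        # Report that recovery would work, but discard nothing.
        return Outcome.WOULD_RECOVER
    # The discard is irreversible in the real system; here it is just a state flip.
    pool.usable = True
    return Outcome.RECOVERED
```

In this model, open_pool(pool, recovery=True, dry_run=True) leaves the pool untouched, which mirrors the "check but do not discard" wording of the '-n' description.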
zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]
Tim Haley wrote:
> PROPOSED SOLUTION:
> This fast-track proposes two new command line flags each for the 'zpool clear' and 'zpool import' sub-commands.
>
> Both sub-commands will now accept a '-F' recovery mode flag. When specified, a determination is made if discarding the last few transactions performed in an unopenable or non-importable pool will return the pool to a usable state. If so, the transactions are irreversibly discarded, and the pool imported. If the pool is usable or already imported and this flag is specified, the flag is ignored and no transactions are discarded.
>
> Both sub-commands will now also accept a '-n' flag. This flag is only meaningful in conjunction with the '-F' flag. When specified, an attempt is made to see if discarding transactions will return the pool to a usable state, but no transactions are actually discarded.

Here's a usability suggestion. Whenever clear or import fails, why not automatically do the equivalent of command -F -n (i.e. tell the user if recovery is possible)? If so, the user can invoke with -F if desired. There would be no need to create a -n option.

Scott

--
Scott Rotondo
Principal Engineer, Solaris Security Technologies
President, Trusted Computing Group
Phone/FAX: +1 408 850 3655 (Internal x68278)
zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]
Scott Rotondo wrote:
> Tim Haley wrote:
>> PROPOSED SOLUTION:
>> This fast-track proposes two new command line flags each for the 'zpool clear' and 'zpool import' sub-commands.
>>
>> Both sub-commands will now accept a '-F' recovery mode flag. When specified, a determination is made if discarding the last few transactions performed in an unopenable or non-importable pool will return the pool to a usable state. If so, the transactions are irreversibly discarded, and the pool imported. If the pool is usable or already imported and this flag is specified, the flag is ignored and no transactions are discarded.
>>
>> Both sub-commands will now also accept a '-n' flag. This flag is only meaningful in conjunction with the '-F' flag. When specified, an attempt is made to see if discarding transactions will return the pool to a usable state, but no transactions are actually discarded.
>
> Here's a usability suggestion. Whenever clear or import fails, why not automatically do the equivalent of command -F -n (i.e. tell the user if recovery is possible)? If so, the user can invoke with -F if desired. There would be no need to create a -n option.

That is exactly how it works in the prototype. The -n is still useful for reconfirming.

-tim

> Scott
zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]
Tim Haley wrote:
> Scott Rotondo wrote:
>> Tim Haley wrote:
>>> PROPOSED SOLUTION:
>>> This fast-track proposes two new command line flags each for the 'zpool clear' and 'zpool import' sub-commands.
>>>
>>> Both sub-commands will now accept a '-F' recovery mode flag. When specified, a determination is made if discarding the last few transactions performed in an unopenable or non-importable pool will return the pool to a usable state. If so, the transactions are irreversibly discarded, and the pool imported. If the pool is usable or already imported and this flag is specified, the flag is ignored and no transactions are discarded.
>>>
>>> Both sub-commands will now also accept a '-n' flag. This flag is only meaningful in conjunction with the '-F' flag. When specified, an attempt is made to see if discarding transactions will return the pool to a usable state, but no transactions are actually discarded.
>>
>> Here's a usability suggestion. Whenever clear or import fails, why not automatically do the equivalent of command -F -n (i.e. tell the user if recovery is possible)? If so, the user can invoke with -F if desired. There would be no need to create a -n option.
>
> That is exactly how it works in the prototype. The -n is still useful for reconfirming.
>
> -tim

OK, good. I'm less concerned about removing the -n than I am about making sure we automatically tell the user when he should try -F.

Scott

--
Scott Rotondo
Principal Engineer, Solaris Security Technologies
President, Trusted Computing Group
Phone/FAX: +1 408 850 3655 (Internal x68278)
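
Editorial sketch: the behaviour agreed on in this exchange (a failed import automatically performs the equivalent of '-F -n' and tells the administrator whether '-F' is worth trying) might look roughly like the following. This is an illustrative Python model only. The function try_import, its boolean parameters, the pool name and the message wording are all assumptions made for the example; the thread does not show the prototype's actual output.

```python
from typing import Tuple


def try_import(pool_name: str, usable: bool, rewind_helps: bool) -> Tuple[bool, str]:
    """Model a plain import that hints at -F when the import fails.

    usable and rewind_helps are hypothetical stand-ins for the checks the
    real command would make against the pool's on-disk state.
    """
    if usable:
        return True, f"imported '{pool_name}'"

    # The import failed: do the equivalent of a '-F -n' dry run so the
    # administrator is told whether recovery is possible.
    if rewind_helps:
        hint = (
            f"recovery is possible; run 'zpool import -F {pool_name}' "
            "to discard the last few transactions"
        )
    else:
        hint = "discarding the last few transactions would not make the pool importable"
    return False, f"cannot import '{pool_name}': {hint}"


if __name__ == "__main__":
    # 'tank' is a placeholder pool name.
    ok, message = try_import("tank", usable=False, rewind_helps=True)
    print(message)
```

Nothing is discarded on the failing path; the hint only names the command the administrator can choose to run next, which keeps the irreversible step an explicit decision.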