zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]

2009-09-16 Thread Tim Haley
This case was approved in today's PSARC meeting.

-tim



zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]

2009-09-14 Thread Tim Haley
Victor Latushkin wrote:
> On 10.09.09 07:40, Tim Haley wrote:
>> I am sponsoring the following fast-track for myself.  This case
>> introduces additional zpool sub-command options to support pool
>> recovery.  The case is requesting micro/patch binding.  Timeout is
>> 09/16/2009.
>>
>> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
>> This information is Copyright 2009 Sun Microsystems
>> 1. Introduction
>> 1.1. Project/Component Working Name:
>>  zpool recovery support
>> 1.2. Name of Document Author/Supplier:
>>  Author:  Timothy Haley
>> 1.3  Date of This Document:
>> 09 September, 2009
>> 4. Technical Description
>>
>> OVERVIEW:
>>
>> Uncooperative or deceptive hardware, combined with power
>> failures or sudden lack of access to devices, can result in
>> zpools without redundancy being non-importable.  ZFS'
>> copy-on-write and Merkle tree properties will sometimes allow
>> us to recover from these problems. Only ad-hoc means currently
>> exist to take advantage of this recoverability. This proposal
>> aims to rectify that short-coming.
>>
>> PROPOSED SOLUTION:
>>
>> This fast-track proposes two new command line flags each for
>> the 'zpool clear' and 'zpool import' sub-commands.
>
> 'zpool clear' is becoming more and more overloaded in meaning. Currently
> it is used to clear error counters (original use) and to recover from a
> faulted slog device or a suspended state (though there's no mention of it
> in the man page). This is confusing users and has been brought up
> several times (at least) on zfs-discuss.
>
> Isn't it better to introduce another subcommand 'recover' or something
> to handle all sorts of recovery?
 
Better is subjective.  For the limited recovery we are going to support at
the moment, the single flag to clear or import is probably sufficient.  The
confusion over what to run to recover should hopefully be abated by failed
imports and 'zpool status' telling the administrator exactly what to run to
perform a recovery.

Having the flag now does not preclude us from adding a recover subcommand in 
the future for more advanced recovery.

-tim





zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]

2009-09-14 Thread Darren Reed
On 14/09/09 03:03 PM, Tim Haley wrote:
> Victor Latushkin wrote:
>> On 10.09.09 07:40, Tim Haley wrote:
>>> I am sponsoring the following fast-track for myself.  This case
>>> introduces additional zpool sub-command options to support pool
>>> recovery.  The case is requesting micro/patch binding.  Timeout is
>>> 09/16/2009.
>>>
>>> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
>>> This information is Copyright 2009 Sun Microsystems
>>> 1. Introduction
>>> 1.1. Project/Component Working Name:
>>>  zpool recovery support
>>> 1.2. Name of Document Author/Supplier:
>>>  Author:  Timothy Haley
>>> 1.3  Date of This Document:
>>> 09 September, 2009
>>> 4. Technical Description
>>>
>>> OVERVIEW:
>>>
>>> Uncooperative or deceptive hardware, combined with power
>>> failures or sudden lack of access to devices, can result in
>>> zpools without redundancy being non-importable.  ZFS'
>>> copy-on-write and Merkle tree properties will sometimes allow
>>> us to recover from these problems. Only ad-hoc means currently
>>> exist to take advantage of this recoverability. This proposal
>>> aims to rectify that short-coming.
>>>
>>> PROPOSED SOLUTION:
>>>
>>> This fast-track proposes two new command line flags each for
>>> the 'zpool clear' and 'zpool import' sub-commands.
>>
>> 'zpool clear' is becoming more and more overloaded in meaning.
>> Currently it is used to clear error counters (original use) and to
>> recover from a faulted slog device or a suspended state (though there's
>> no mention of it in the man page). This is confusing users and has
>> been brought up several times (at least) on zfs-discuss.
>>
>> Isn't it better to introduce another subcommand 'recover' or
>> something to handle all sorts of recovery?
>
> Better is subjective.  For the limited recovery we are going to
> support at the moment, the single flag to clear or import is probably
> sufficient.  The confusion over what to run to recover should hopefully
> be abated by failed imports and 'zpool status' telling the
> administrator exactly what to run to perform a recovery.
>
> Having the flag now does not preclude us from adding a recover
> subcommand in the future for more advanced recovery.

If this is a limited recovery mechanism then why not make this a variant
of the recover subcommand, rather than clear?

That seems more obvious to me, in terms of usability.

However, it should be noted that zpool clear does fit the svcadm 
clear operational model.

In light of that, if a zpool recover is added at some point in the
future, would a zpool clear automatically do whatever extended
recovery was possible to enable the pool to be mounted?

Given that zpool clear is a recovery operation (of sorts) and that 
you're hinting at there being thought about a more advanced recovery 
option, I think it would be beneficial to understand more about what the 
project team intends to do that requires us to have recovery performed 
by two different subcommands.

For example, how will I know when it is appropriate to use zpool clear 
vs zpool recover?
Is user confusion likely from having two subcommands that do similar but 
different things, depending on the circumstances at hand?

I appreciate that you haven't formally presented us with a case that 
mentions zpool recover, but your email here hints that there is more 
to follow and that might help us put this case in better perspective.

Darren



zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]

2009-09-10 Thread Darren J Moffat
Tim Haley wrote:
> I am sponsoring the following fast-track for myself.  This case
> introduces additional zpool sub-command options to support pool
> recovery.  The case is requesting micro/patch binding.  Timeout is
> 09/16/2009.
>
> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
> This information is Copyright 2009 Sun Microsystems
> 1. Introduction
> 1.1. Project/Component Working Name:
>  zpool recovery support
> 1.2. Name of Document Author/Supplier:
>  Author:  Timothy Haley
> 1.3  Date of This Document:
>  09 September, 2009
> 4. Technical Description

I'm happy with the case as specified so it gets my +1.

I'm going on the assumption that there are spa history records written 
for this - but didn't expect to see that as part of the ARC material 
since their format isn't an interface and many of them are Internal 
taxonomy anyway.

-- 
Darren J Moffat


zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]

2009-09-10 Thread Sebastien Roy
On Wed, 2009-09-09 at 21:40 -0600, Tim Haley wrote:
> I am sponsoring the following fast-track for myself.  This case
> introduces additional zpool sub-command options to support pool
> recovery.  The case is requesting micro/patch binding.  Timeout is
> 09/16/2009.

+1

-Seb




zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]

2009-09-09 Thread Tim Haley
I am sponsoring the following fast-track for myself.  This case
introduces additional zpool sub-command options to support pool
recovery.  The case is requesting micro/patch binding.  Timeout is
09/16/2009.

Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name:
 zpool recovery support
1.2. Name of Document Author/Supplier:
 Author:  Timothy Haley
1.3  Date of This Document:
09 September, 2009
4. Technical Description

OVERVIEW:

Uncooperative or deceptive hardware, combined with power
failures or sudden lack of access to devices, can result in
zpools without redundancy being non-importable.  ZFS'
copy-on-write and Merkle tree properties will sometimes allow
us to recover from these problems. Only ad-hoc means currently
exist to take advantage of this recoverability. This proposal
aims to rectify that short-coming.

PROPOSED SOLUTION:

This fast-track proposes two new command line flags each for
the 'zpool clear' and 'zpool import' sub-commands.

Both sub-commands will now accept a '-F' recovery mode flag.
When specified, a determination is made if discarding the last
few transactions performed in an unopenable or non-importable
pool will return the pool to a usable state.  If so, the
transactions are irreversibly discarded, and the pool
imported.  If the pool is usable or already imported and this
flag is specified, the flag is ignored and no transactions are
discarded.

Both sub-commands will now also accept a '-n' flag.  This flag
is only meaningful in conjunction with the '-F' flag.  When
specified, an attempt is made to see if discarding transactions
will return the pool to a usable state, but no transactions are
actually discarded.
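
As a rough illustration of the '-F' and '-n' semantics described above,
the following Python sketch models the decision flow.  It is a toy model
only: the Pool class, its fields, the recover() function, and the returned
strings are hypothetical names invented for this example and are not part
of ZFS or of the zpool command.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Toy stand-in for a pool (hypothetical; not a ZFS structure)."""
    usable: bool             # pool can already be opened/imported
    recoverable: bool        # discarding recent transactions would help
    discarded: bool = False  # set once transactions have been dropped


def recover(pool: Pool, force: bool = False, dry_run: bool = False) -> str:
    """Model '-F' (force) and '-n' (dry_run) as the case describes them."""
    if pool.usable:
        return "no-op"          # -F is ignored for a usable pool
    if not force:
        return "failed"         # -n is only meaningful together with -F
    if not pool.recoverable:
        return "unrecoverable"  # not all damaged pools can be recovered
    if dry_run:
        return "would-recover"  # check only; nothing is discarded
    pool.discarded = True       # irreversible in the real command
    pool.usable = True
    return "recovered"
```

For example, recover(Pool(usable=False, recoverable=True), force=True,
dry_run=True) reports that recovery would succeed while leaving the toy
pool untouched, which mirrors running a sub-command with '-F -n'.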

PROPOSED CHANGES to ZPOOL(1M) PAGE:

--- zpool.1m.rogi   Thu Aug 27 09:59:14 2009
+++ zpool.1m   Wed Sep  9 21:02:25 2009
@@ -18,7 +18,7 @@
  zpool attach [-f] pool device new_device
 
 
- zpool clear pool [device]
+ zpool clear [-n] [-F] pool [device]
 
 
  zpool create [-fn] [-o property=value] ... [-O file-system-property=value]
@@ -44,11 +44,11 @@
 
 
  zpool import [-o mntopts] [-p property=value] ... [-d dir | -c cachefile]
-  [-D] [-f] [-R root] -a
+  [-D] [-f] [-R root] [-n] [-F] -a
 
 
  zpool import [-o mntopts] [-o property=value] ... [-d dir | -c cachefile]
-  [-D] [-f] [-R root] pool |id [newpool]
+  [-D] [-f] [-R root] [-n] [-F] pool |id [newpool]
 
 
  zpool iostat [-v] [pool] ... [interval[count]]
@@ -761,7 +761,7 @@
 
 
 
- zpool clear pool [device] ...
+ zpool clear [-n] [-F] pool [device] ...
 
  Clears device errors in a  pool.  If  no  arguments  are
  specified,   all  device  errors  within  the  pool  are
@@ -769,7 +769,18 @@
  errors  associated  with the specified device or devices
  are cleared.
 
+ -F   Initiates recovery mode for an unopenable pool.
+      Attempts to discard the last few transactions in the
+      pool to return it to an openable state.  Not all
+      damaged pools can be recovered by using this option.
+      If successful, the data from the discarded transactions
+      is irreversibly lost.
 
+ -n   Used in combination with the -F flag.  Check if
+      discarding transactions would make the pool openable,
+      but do not actually discard any transactions.
+
+
  zpool create [-fn] [-o property=value] ... [-O file-system-
  property=value] ... [-m mountpoint] [-R root] pool vdev ...
 
@@ -1016,7 +1027,7 @@
 
 
  zpool import [-o mntopts] [ -o property=value] ... [-d dir |
- -c cachefile] [-D] [-f] [-R root] -a
+ -c cachefile] [-D] [-f] [-n] [-F] [-R root] -a
 
  Imports all  pools  found  in  the  search  directories.
  Identical to the previous command, except that all pools
@@ -1075,6 +1086,17 @@
   appears to be potentially active.
 
 
+ -F   Recovery mode for a non-importable pool.
+  Attempt to return the pool to an
+  importable state by discarding the last
+  few transactions.  Not all damaged pools
+  can be recovered by using this option.
+  If successful, the data from the
+  discarded transactions is irreversibly
+  lost.  This option is ignored if the pool
+  is importable or already imported.
+
+
  -a   Searches for and imports all  pools
   found.
 
@@ 

zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]

2009-09-09 Thread Scott Rotondo
Tim Haley wrote:
>
> PROPOSED SOLUTION:
>
>   This fast-track proposes two new command line flags each for
>   the 'zpool clear' and 'zpool import' sub-commands.
>
>   Both sub-commands will now accept a '-F' recovery mode flag.
>   When specified, a determination is made if discarding the last
>   few transactions performed in an unopenable or non-importable
>   pool will return the pool to a usable state.  If so, the
>   transactions are irreversibly discarded, and the pool
>   imported.  If the pool is usable or already imported and this
>   flag is specified, the flag is ignored and no transactions are
>   discarded.
>
>   Both sub-commands will now also accept a '-n' flag.  This flag
>   is only meaningful in conjunction with the '-F' flag.  When
>   specified, an attempt is made to see if discarding transactions
>   will return the pool to a usable state, but no transactions are
>   actually discarded.

Here's a usability suggestion. Whenever clear or import fails, why not 
automatically do the equivalent of command -F -n (i.e. tell the user 
if recovery is possible)? If so, the user can invoke with -F if desired. 
There would be no need to create a -n option.

Scott


-- 
Scott Rotondo
Principal Engineer, Solaris Security Technologies
President, Trusted Computing Group
Phone/FAX: +1 408 850 3655 (Internal x68278)


zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]

2009-09-09 Thread Tim Haley
Scott Rotondo wrote:
> Tim Haley wrote:
>>
>> PROPOSED SOLUTION:
>>
>> This fast-track proposes two new command line flags each for
>> the 'zpool clear' and 'zpool import' sub-commands.
>>
>> Both sub-commands will now accept a '-F' recovery mode flag.
>> When specified, a determination is made if discarding the last
>> few transactions performed in an unopenable or non-importable
>> pool will return the pool to a usable state.  If so, the
>> transactions are irreversibly discarded, and the pool
>> imported.  If the pool is usable or already imported and this
>> flag is specified, the flag is ignored and no transactions are
>> discarded.
>>
>> Both sub-commands will now also accept a '-n' flag.  This flag
>> is only meaningful in conjunction with the '-F' flag.  When
>> specified, an attempt is made to see if discarding transactions
>> will return the pool to a usable state, but no transactions are
>> actually discarded.
>
> Here's a usability suggestion. Whenever clear or import fails, why not
> automatically do the equivalent of command -F -n (i.e. tell the user
> if recovery is possible)? If so, the user can invoke with -F if desired.
> There would be no need to create a -n option.
 
That is exactly how it works in the prototype.

The -n is still useful for reconfirming.
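
The behavior confirmed above (a failed clear or import automatically
performs the equivalent of '-F -n' and tells the user whether recovery is
possible) could be sketched roughly as follows.  This is an illustration
under stated assumptions, not the prototype's code: try_import and
probe_recovery are hypothetical callables standing in for the normal
import path and the '-F -n' check, and the message wording is invented.

```python
from typing import Callable


def import_with_hint(pool: str,
                     try_import: Callable[[str], bool],
                     probe_recovery: Callable[[str], bool]) -> str:
    """Attempt an import; on failure, probe recovery without changing anything."""
    if try_import(pool):
        return f"{pool}: imported"
    # Equivalent of '-F -n': only check whether discarding the last few
    # transactions would help; nothing is discarded here.
    if probe_recovery(pool):
        return (f"{pool}: import failed; recovery may be possible, "
                f"try: zpool import -F {pool}")
    return f"{pool}: import failed; discarding recent transactions would not help"
```

The probe never modifies the pool, so the administrator still has to opt
in to the irreversible step by rerunning the command with '-F'.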

-tim


> Scott


zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]

2009-09-09 Thread Scott Rotondo
Tim Haley wrote:
> Scott Rotondo wrote:
>> Tim Haley wrote:
>>>
>>> PROPOSED SOLUTION:
>>>
>>> This fast-track proposes two new command line flags each for
>>> the 'zpool clear' and 'zpool import' sub-commands.
>>>
>>> Both sub-commands will now accept a '-F' recovery mode flag.
>>> When specified, a determination is made if discarding the last
>>> few transactions performed in an unopenable or non-importable
>>> pool will return the pool to a usable state.  If so, the
>>> transactions are irreversibly discarded, and the pool
>>> imported.  If the pool is usable or already imported and this
>>> flag is specified, the flag is ignored and no transactions are
>>> discarded.
>>>
>>> Both sub-commands will now also accept a '-n' flag.  This flag
>>> is only meaningful in conjunction with the '-F' flag.  When
>>> specified, an attempt is made to see if discarding transactions
>>> will return the pool to a usable state, but no transactions are
>>> actually discarded.
>>
>> Here's a usability suggestion. Whenever clear or import fails, why not
>> automatically do the equivalent of command -F -n (i.e. tell the user
>> if recovery is possible)? If so, the user can invoke with -F if
>> desired. There would be no need to create a -n option.
>
> That is exactly how it works in the prototype.
>
> The -n is still useful for reconfirming.
>
> -tim
 

OK, good. I'm less concerned about removing the -n than I am about 
making sure we automatically tell the user when he should try -F.

Scott
