zpool recovery support [PSARC/2009/479 FastTrack timeout 09/16/2009]

2009-09-16 Thread Tim Haley
This case was approved in today's PSARC meeting.

-tim



2009-09-15 Thread Tim Haley
Darren Reed wrote:
> On 14/09/09 03:03 PM, Tim Haley wrote:
>> Victor Latushkin wrote:
>>> On 10.09.09 07:40, Tim Haley wrote:
>>>> I am sponsoring the following fast-track for myself.  This case
>>>> introduces additional zpool sub-command options to support pool
>>>> recovery.  The case is requesting micro/patch binding.  Timeout is
>>>> 09/16/2009.
>>>>
>>>> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
>>>> This information is Copyright 2009 Sun Microsystems
>>>> 1. Introduction
>>>> 1.1. Project/Component Working Name:
>>>>  zpool recovery support
>>>> 1.2. Name of Document Author/Supplier:
>>>>  Author:  Timothy Haley
>>>> 1.3  Date of This Document:
>>>> 09 September, 2009
>>>> 4. Technical Description
>>>>
>>>> OVERVIEW:
>>>>
>>>> Uncooperative or deceptive hardware, combined with power
>>>> failures or sudden lack of access to devices, can result in
>>>> zpools without redundancy being non-importable.  ZFS'
>>>> copy-on-write and Merkle tree properties will sometimes allow
>>>> us to recover from these problems. Only ad-hoc means currently
>>>> exist to take advantage of this recoverability. This proposal
>>>> aims to rectify that short-coming.
>>>>
>>>> PROPOSED SOLUTION:
>>>>
>>>> This fast-track proposes two new command line flags each for
>>>> the 'zpool clear' and 'zpool import' sub-commands.
>>>
>>> 'zpool clear' is becoming more and more overloaded in meaning. 
>>> Currently it is used to clear error counters (original use) and 
>>> recover from faulted slog device or suspended state (though there's 
>>> no mention of it in the man page). This is confusing users and has 
>>> been brought up several times (at least) on zfs-discuss.
>>>
>>> Isn't it better to introduce another subcommand 'recover' or 
>>> something to handle all sorts of recovery?
>>>
>> "Better" is subjective.  For the limited recovery we are going to 
>> support at the moment, the single flag to clear or import is probably 
>> sufficient.  The confusion of what to run to recover should hopefully 
>> be abated by failed imports and 'zpool status' directing the 
>> administrator exactly what to run to perform a recovery.
>>
>> Having the flag now does not preclude us from adding a recover 
>> subcommand in the future for more advanced recovery.
> 
> If this is a limited recovery mechanism then why not make this a variant 
> of the recover subcommand, rather than clear?
> 
> That seems more obvious to me, in terms of usability.
> 
> However, it should be noted that "zpool clear" does fit the "svcadm 
> clear" operational model.
> 
> In light of that, if a "zpool recover" is added at some point in the 
> future, would a "zpool clear" automatically do whatever extended 
> recovery was possible to enable the pool to be mounted?
> 
> Given that "zpool clear" is a recovery operation (of sorts) and that 
> you're hinting at there being thought about a more advanced recovery 
> option, I think it would be beneficial to understand more about what the 
> project team intends to do that requires us to have recovery performed 
> by two different subcommands.
> 
You've read a lot more into my statement than I ever meant to imply.  I only 
meant that if we thought up something exotic in the longer term, we could 
still consider a new sub-command.  I doubt it would be necessary, though.

-tim

> For example, how will I know when it is appropriate to use "zpool clear" 
> vs "zpool recover"?
> Is user confusion likely from having two subcommands that do similar but 
> different things, depending on the circumstances at hand?
> 
> I appreciate that you haven't formally presented us with a case that 
> mentions "zpool recover", but your email here hints that there is more 
> to follow and that might help us put this case in better perspective.
> 
> Darren
> 



2009-09-15 Thread Victor Latushkin
On 10.09.09 07:40, Tim Haley wrote:
> I am sponsoring the following fast-track for myself.  This case
> introduces additional zpool sub-command options to support pool
> recovery.  The case is requesting micro/patch binding.  Timeout is
> 09/16/2009.
> 
> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
> This information is Copyright 2009 Sun Microsystems
> 1. Introduction
> 1.1. Project/Component Working Name:
>   zpool recovery support
> 1.2. Name of Document Author/Supplier:
>   Author:  Timothy Haley
> 1.3  Date of This Document:
>   09 September, 2009
> 4. Technical Description
> 
> OVERVIEW:
> 
>   Uncooperative or deceptive hardware, combined with power
>   failures or sudden lack of access to devices, can result in
>   zpools without redundancy being non-importable.  ZFS'
>   copy-on-write and Merkle tree properties will sometimes allow
>   us to recover from these problems. Only ad-hoc means currently
>   exist to take advantage of this recoverability. This proposal
>   aims to rectify that short-coming.
> 
> PROPOSED SOLUTION:
> 
>   This fast-track proposes two new command line flags each for
>   the 'zpool clear' and 'zpool import' sub-commands.

'zpool clear' is becoming more and more overloaded in meaning. Currently it is
used to clear error counters (original use) and to recover from a faulted slog
device or a suspended state (though there's no mention of it in the man page).
This is confusing users and has been brought up several times (at least) on
zfs-discuss.

Isn't it better to introduce another subcommand 'recover' or something to
handle all sorts of recovery?

victor


2009-09-14 Thread Darren Reed
On 14/09/09 03:03 PM, Tim Haley wrote:
> Victor Latushkin wrote:
>> On 10.09.09 07:40, Tim Haley wrote:
>>> I am sponsoring the following fast-track for myself.  This case
>>> introduces additional zpool sub-command options to support pool
>>> recovery.  The case is requesting micro/patch binding.  Timeout is
>>> 09/16/2009.
>>>
>>> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
>>> This information is Copyright 2009 Sun Microsystems
>>> 1. Introduction
>>> 1.1. Project/Component Working Name:
>>>  zpool recovery support
>>> 1.2. Name of Document Author/Supplier:
>>>  Author:  Timothy Haley
>>> 1.3  Date of This Document:
>>> 09 September, 2009
>>> 4. Technical Description
>>>
>>> OVERVIEW:
>>>
>>> Uncooperative or deceptive hardware, combined with power
>>> failures or sudden lack of access to devices, can result in
>>> zpools without redundancy being non-importable.  ZFS'
>>> copy-on-write and Merkle tree properties will sometimes allow
>>> us to recover from these problems. Only ad-hoc means currently
>>> exist to take advantage of this recoverability. This proposal
>>> aims to rectify that short-coming.
>>>
>>> PROPOSED SOLUTION:
>>>
>>> This fast-track proposes two new command line flags each for
>>> the 'zpool clear' and 'zpool import' sub-commands.
>>
>> 'zpool clear' is becoming more and more overloaded in meaning. 
>> Currently it is used to clear error counters (original use) and 
>> recover from faulted slog device or suspended state (though there's 
>> no mention of it in the man page). This is confusing users and has 
>> been brought up several times (at least) on zfs-discuss.
>>
>> Isn't it better to introduce another subcommand 'recover' or 
>> something to handle all sorts of recovery?
>>
> "Better" is subjective.  For the limited recovery we are going to 
> support at the moment, the single flag to clear or import is probably 
> sufficient.  The confusion of what to run to recover should hopefully 
> be abated by failed imports and 'zpool status' directing the 
> administrator exactly what to run to perform a recovery.
>
> Having the flag now does not preclude us from adding a recover 
> subcommand in the future for more advanced recovery.

If this is a limited recovery mechanism then why not make this a variant 
of the recover subcommand, rather than clear?

That seems more obvious to me, in terms of usability.

However, it should be noted that "zpool clear" does fit the "svcadm 
clear" operational model.

In light of that, if a "zpool recover" is added at some point in the 
future, would a "zpool clear" automatically do whatever extended 
recovery was possible to enable the pool to be mounted?

Given that "zpool clear" is a recovery operation (of sorts) and that 
you're hinting at there being thought about a more advanced recovery 
option, I think it would be beneficial to understand more about what the 
project team intends to do that requires us to have recovery performed 
by two different subcommands.

For example, how will I know when it is appropriate to use "zpool clear" 
vs "zpool recover"?
Is user confusion likely from having two subcommands that do similar but 
different things, depending on the circumstances at hand?

I appreciate that you haven't formally presented us with a case that 
mentions "zpool recover", but your email here hints that there is more 
to follow and that might help us put this case in better perspective.

Darren



2009-09-14 Thread Tim Haley
Victor Latushkin wrote:
> On 10.09.09 07:40, Tim Haley wrote:
>> I am sponsoring the following fast-track for myself.  This case
>> introduces additional zpool sub-command options to support pool
>> recovery.  The case is requesting micro/patch binding.  Timeout is
>> 09/16/2009.
>>
>> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
>> This information is Copyright 2009 Sun Microsystems
>> 1. Introduction
>> 1.1. Project/Component Working Name:
>>  zpool recovery support
>> 1.2. Name of Document Author/Supplier:
>>  Author:  Timothy Haley
>> 1.3  Date of This Document:
>> 09 September, 2009
>> 4. Technical Description
>>
>> OVERVIEW:
>>
>> Uncooperative or deceptive hardware, combined with power
>> failures or sudden lack of access to devices, can result in
>> zpools without redundancy being non-importable.  ZFS'
>> copy-on-write and Merkle tree properties will sometimes allow
>> us to recover from these problems. Only ad-hoc means currently
>> exist to take advantage of this recoverability. This proposal
>> aims to rectify that short-coming.
>>
>> PROPOSED SOLUTION:
>>
>> This fast-track proposes two new command line flags each for
>> the 'zpool clear' and 'zpool import' sub-commands.
> 
> 'zpool clear' is becoming more and more overloaded in meaning. Currently 
> it is used to clear error counters (original use) and recover from 
> faulted slog device or suspended state (though there's no mention of it 
> in the man page). This is confusing users and has been brought up 
> several times (at least) on zfs-discuss.
> 
> Isn't it better to introduce another subcommand 'recover' or something 
> to handle all sorts of recovery?
> 
"Better" is subjective.  For the limited recovery we are going to support at 
the moment, the single flag to clear or import is probably sufficient.  The 
confusion of what to run to recover should hopefully be abated by failed 
imports and 'zpool status' directing the administrator exactly what to run to 
perform a recovery.

Having the flag now does not preclude us from adding a recover subcommand in 
the future for more advanced recovery.

-tim





2009-09-10 Thread Darren J Moffat
Tim Haley wrote:
> I am sponsoring the following fast-track for myself.  This case
> introduces additional zpool sub-command options to support pool
> recovery.  The case is requesting micro/patch binding.  Timeout is
> 09/16/2009.
> 
> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
> This information is Copyright 2009 Sun Microsystems
> 1. Introduction
> 1.1. Project/Component Working Name:
>   zpool recovery support
> 1.2. Name of Document Author/Supplier:
>   Author:  Timothy Haley
> 1.3  Date of This Document:
>   09 September, 2009
> 4. Technical Description

I'm happy with the case as specified so it gets my +1.

I'm going on the assumption that there are spa history records written 
for this - but didn't expect to see that as part of the ARC material 
since their format isn't an interface and many of them are Internal 
taxonomy anyway.

-- 
Darren J Moffat


2009-09-10 Thread Sebastien Roy
On Wed, 2009-09-09 at 21:40 -0600, Tim Haley wrote:
> I am sponsoring the following fast-track for myself.  This case
> introduces additional zpool sub-command options to support pool
> recovery.  The case is requesting micro/patch binding.  Timeout is
> 09/16/2009.

+1

-Seb




2009-09-09 Thread Tim Haley
Scott Rotondo wrote:
> Tim Haley wrote:
>>
>> PROPOSED SOLUTION:
>>
>> This fast-track proposes two new command line flags each for
>> the 'zpool clear' and 'zpool import' sub-commands.
>>
>> Both sub-commands will now accept a '-F' recovery mode flag.
>> When specified, a determination is made if discarding the last
>> few transactions performed in an unopenable or non-importable
>> pool will return the pool to a usable state.  If so, the
>> transactions are irreversibly discarded, and the pool
>> imported.  If the pool is usable or already imported and this
>> flag is specified, the flag is ignored and no transactions are
>> discarded.
>>
>> Both sub-commands will now also accept a '-n' flag.  This flag
>> is only meaningful in conjunction with the '-F' flag.  When
>> specified, an attempt is made to see if discarding transactions
>> will return the pool to a usable state, but no transactions are
>> actually discarded.
> 
> Here's a usability suggestion. Whenever clear or import fails, why not 
> automatically do the equivalent of  -F -n (i.e. tell the user 
> if recovery is possible)? If so, the user can invoke with -F if desired. 
> There would be no need to create a -n option.
> 
That is exactly how it works in the prototype.

The -n is still useful for reconfirming.
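
The behaviour agreed on here (a failed clear or import automatically performs the equivalent of a -F -n check and tells the administrator whether -F is worth trying) can be sketched roughly as follows. This is only an illustration, not the prototype's code: the function name, the two callbacks, and the message strings are all hypothetical.

```python
from typing import Callable


def import_with_hint(try_import: Callable[[], bool],
                     rewind_would_help: Callable[[], bool]) -> str:
    """Attempt a plain import; on failure, run a -F -n style check.

    Both callbacks are hypothetical stand-ins: try_import() reports whether
    the ordinary import succeeded, and rewind_would_help() reports whether
    discarding the last few transactions would make the pool usable,
    without discarding anything.
    """
    if try_import():
        return "imported"
    # Failure path: check non-destructively whether recovery could work,
    # so the administrator is told when -F is worth trying.
    if rewind_would_help():
        return "import failed; recovery appears possible, retry with -F"
    return "import failed; discarding transactions would not help"


# A pool that fails a plain import but could be recovered.
print(import_with_hint(lambda: False, lambda: True))
```

An explicit -n remains useful in this model for re-running the same non-destructive check on demand before committing to -F.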

-tim


> Scott
> 
> 


2009-09-09 Thread Scott Rotondo
Tim Haley wrote:
> Scott Rotondo wrote:
>> Tim Haley wrote:
>>>
>>> PROPOSED SOLUTION:
>>>
>>> This fast-track proposes two new command line flags each for
>>> the 'zpool clear' and 'zpool import' sub-commands.
>>>
>>> Both sub-commands will now accept a '-F' recovery mode flag.
>>> When specified, a determination is made if discarding the last
>>> few transactions performed in an unopenable or non-importable
>>> pool will return the pool to a usable state.  If so, the
>>> transactions are irreversibly discarded, and the pool
>>> imported.  If the pool is usable or already imported and this
>>> flag is specified, the flag is ignored and no transactions are
>>> discarded.
>>>
>>> Both sub-commands will now also accept a '-n' flag.  This flag
>>> is only meaningful in conjunction with the '-F' flag.  When
>>> specified, an attempt is made to see if discarding transactions
>>> will return the pool to a usable state, but no transactions are
>>> actually discarded.
>>
>> Here's a usability suggestion. Whenever clear or import fails, why not 
>> automatically do the equivalent of  -F -n (i.e. tell the user 
>> if recovery is possible)? If so, the user can invoke with -F if 
>> desired. There would be no need to create a -n option.
>>
> That is exactly how it works in the prototype.
> 
> The -n is still useful for reconfirming.
> 
> -tim
> 

OK, good. I'm less concerned about removing the -n than I am about 
making sure we automatically tell the user when he should try -F.

Scott

-- 
Scott Rotondo
Principal Engineer, Solaris Security Technologies
President, Trusted Computing Group
Phone/FAX: +1 408 850 3655 (Internal x68278)


2009-09-09 Thread Tim Haley
I am sponsoring the following fast-track for myself.  This case
introduces additional zpool sub-command options to support pool
recovery.  The case is requesting micro/patch binding.  Timeout is
09/16/2009.

Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name:
 zpool recovery support
1.2. Name of Document Author/Supplier:
 Author:  Timothy Haley
1.3  Date of This Document:
09 September, 2009
4. Technical Description

OVERVIEW:

Uncooperative or deceptive hardware, combined with power
failures or sudden lack of access to devices, can result in
zpools without redundancy being non-importable.  ZFS'
copy-on-write and Merkle tree properties will sometimes allow
us to recover from these problems. Only ad-hoc means currently
exist to take advantage of this recoverability. This proposal
aims to rectify that short-coming.

PROPOSED SOLUTION:

This fast-track proposes two new command line flags each for
the 'zpool clear' and 'zpool import' sub-commands.

Both sub-commands will now accept a '-F' recovery mode flag.
When specified, a determination is made if discarding the last
few transactions performed in an unopenable or non-importable
pool will return the pool to a usable state.  If so, the
transactions are irreversibly discarded, and the pool
imported.  If the pool is usable or already imported and this
flag is specified, the flag is ignored and no transactions are
discarded.

Both sub-commands will now also accept a '-n' flag.  This flag
is only meaningful in conjunction with the '-F' flag.  When
specified, an attempt is made to see if discarding transactions
will return the pool to a usable state, but no transactions are
actually discarded.
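
The interaction of the two flags described above can be modelled with a small sketch. This is not ZFS code: `Pool`, `recover`, and the `usable_after_rewind` field are hypothetical names invented for illustration, and the real decision about whether discarding transactions helps is made inside ZFS.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Toy stand-in for a pool; every field here is hypothetical."""
    usable: bool                # pool opens or imports without recovery
    usable_after_rewind: bool   # discarding the last few transactions would help
    discarded: bool = False     # set once transactions were actually discarded


def recover(pool: Pool, force: bool = False, dry_run: bool = False) -> str:
    """Model the proposed -F (force) and -n (dry_run) semantics."""
    if pool.usable:
        # -F is ignored for a usable or already imported pool.
        return "ok: nothing discarded"
    if not force:
        # Without -F no recovery is attempted; -n alone is not meaningful.
        return "failed: pool is not usable"
    if not pool.usable_after_rewind:
        # Not all damaged pools can be recovered this way.
        return "failed: recovery not possible"
    if dry_run:
        # -F -n: report that recovery would work, but discard nothing.
        return "would recover: nothing discarded"
    # -F alone: irreversibly discard the transactions and open the pool.
    pool.discarded = True
    pool.usable = True
    return "recovered: transactions discarded"


damaged = Pool(usable=False, usable_after_rewind=True)
print(recover(damaged, force=True, dry_run=True))  # check only
print(recover(damaged, force=True))                # actually recover
```

The dry run leaves the toy pool untouched, so the check can be repeated before committing to the irreversible step.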

PROPOSED CHANGES to ZPOOL(1M) PAGE:

--- zpool.1m.rogi   Thu Aug 27 09:59:14 2009
+++ zpool.1m   Wed Sep  9 21:02:25 2009
@@ -18,7 +18,7 @@
  zpool attach [-f] pool device new_device
 
 
- zpool clear pool [device]
+ zpool clear [-n] [-F] pool [device]
 
 
  zpool create [-fn] [-o property=value] ... [-O file-system-property=value]
@@ -44,11 +44,11 @@
 
 
  zpool import [-o mntopts] [-p property=value] ... [-d dir | -c cachefile]
-  [-D] [-f] [-R root] -a
+  [-D] [-f] [-R root] [-n] [-F] -a
 
 
  zpool import [-o mntopts] [-o property=value] ... [-d dir | -c cachefile]
-  [-D] [-f] [-R root] pool |id [newpool]
+  [-D] [-f] [-R root] [-n] [-F] pool |id [newpool]
 
 
  zpool iostat [-v] [pool] ... [interval[count]]
@@ -761,7 +761,7 @@
 
 
 
- zpool clear pool [device] ...
+ zpool clear [-n] [-F] pool [device] ...
 
  Clears device errors in a  pool.  If  no  arguments  are
  specified,   all  device  errors  within  the  pool  are
@@ -769,7 +769,18 @@
  errors  associated  with the specified device or devices
  are cleared.
 
+ -F   Initiates recovery mode for an unopenable pool.
+      Attempts to discard the last few transactions in the
+      pool to return it to an openable state.  Not all
+      damaged pools can be recovered by using this option.
+      If successful, the data from the discarded transactions
+      is irreversibly lost.
 
+ -n   Used in combination with the -F flag.  Check if
+      discarding transactions would make the pool openable,
+      but do not actually discard any transactions.
+
+
  zpool create [-fn] [-o property=value] ... [-O file-system-
  property=value] ... [-m mountpoint] [-R root] pool vdev ...
 
@@ -1016,7 +1027,7 @@
 
 
  zpool import [-o mntopts] [ -o property=value] ... [-d dir |
- -c cachefile] [-D] [-f] [-R root] -a
+ -c cachefile] [-D] [-f] [-n] [-F] [-R root] -a
 
  Imports all  pools  found  in  the  search  directories.
  Identical to the previous command, except that all pools
@@ -1075,6 +1086,17 @@
   appears to be potentially active.
 
 
+ -F   Recovery mode for a non-importable pool.
+  Attempt to return the pool to an
+  importable state by discarding the last
+  few transactions.  Not all damaged pools
+  can be recovered by using this option.
+  If successful, the data from the
+  discarded transactions is irreversibly
+  lost.  This option is ignored if the pool
+  is importable or already imported.
+
+
  -a   Searches for and imports all  pools
   found.
 
@@ -1083,

2009-09-09 Thread Scott Rotondo
Tim Haley wrote:
> 
> PROPOSED SOLUTION:
> 
>   This fast-track proposes two new command line flags each for
>   the 'zpool clear' and 'zpool import' sub-commands.
> 
>   Both sub-commands will now accept a '-F' recovery mode flag.
>   When specified, a determination is made if discarding the last
>   few transactions performed in an unopenable or non-importable
>   pool will return the pool to a usable state.  If so, the
>   transactions are irreversibly discarded, and the pool
>   imported.  If the pool is usable or already imported and this
>   flag is specified, the flag is ignored and no transactions are
>   discarded.
> 
>   Both sub-commands will now also accept a '-n' flag.  This flag
>   is only meaningful in conjunction with the '-F' flag.  When
>   specified, an attempt is made to see if discarding transactions
>   will return the pool to a usable state, but no transactions are
>   actually discarded.

Here's a usability suggestion. Whenever clear or import fails, why not 
automatically do the equivalent of  -F -n (i.e. tell the user 
if recovery is possible)? If so, the user can invoke with -F if desired. 
There would be no need to create a -n option.

Scott


-- 
Scott Rotondo
Principal Engineer, Solaris Security Technologies
President, Trusted Computing Group
Phone/FAX: +1 408 850 3655 (Internal x68278)