Re: [zfs-discuss] FS Reliability WAS: about btrfs and zfs

2011-10-21 Thread Paul Kraus
Recently someone posted to this list about that _exact_ situation: they loaded
an OS onto a pair of drives while a different pair of drives containing an OS
was still attached. The zpool on the first pair ended up unimportable and
corrupted. I can post more info when I am back in the office on Monday.

On Friday, October 21, 2011, Fred Liu  wrote:
>
>> 3. Do NOT let a system see drives with more than one OS zpool at the
>> same time (I know you _can_ do this safely, but I have seen too many
>> horror stories on this list that I just avoid it).
>>
>
> Can you elaborate #3? In what situation will it happen?
>
>
> Thanks.
>
> Fred
>

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] FS Reliability WAS: about btrfs and zfs

2011-10-21 Thread Mike Gerdts
On Fri, Oct 21, 2011 at 8:02 PM, Fred Liu  wrote:
>
>> 3. Do NOT let a system see drives with more than one OS zpool at the
>> same time (I know you _can_ do this safely, but I have seen too many
>> horror stories on this list that I just avoid it).
>>
>
> Can you elaborate #3? In what situation will it happen?

Some people have trained their fingers to use the -f option on every
command that supports it, to force the operation.  For instance, how
often do you run rm -rf rather than rm -r and answer the prompt for
every file?

If various zpool commands (import, create, replace, etc.) are used
against the wrong disk with a force option, you can clobber a zpool
that is in active use by another system.  In a previous job, my lab
environment had a bunch of LUNs presented to multiple boxes.  This was
done for convenience in an environment where there would be little
impact if an errant command were issued.  I'd never do that in
production without some form of I/O fencing in place.
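
As a rough illustration (hypothetical host and pool names), the dangerous
pattern looks something like this:

  # Pool "tank" is still imported and in active use on host B.
  # Without -f, zpool refuses because the pool appears active elsewhere:
  hostA# zpool import tank
  # A reflexive -f overrides that safety check, and both hosts end up
  # writing to the same disks, corrupting the pool:
  hostA# zpool import -f tank

The same goes for zpool create -f or zpool replace -f aimed at the wrong
disk: the force flag skips exactly the check that would have saved you.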

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] FS Reliability WAS: about btrfs and zfs

2011-10-21 Thread Fred Liu

> 3. Do NOT let a system see drives with more than one OS zpool at the
> same time (I know you _can_ do this safely, but I have seen too many
> horror stories on this list that I just avoid it).
> 

Can you elaborate #3? In what situation will it happen?


Thanks.

Fred
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Growing CKSUM errors with no READ/WRITE errors

2011-10-21 Thread Eric Sproul
On Thu, Oct 20, 2011 at 7:55 AM, Edward Ned Harvey
 wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Jim Klimov
>>
>> new CKSUM errors
>> are being found. There are zero READ or WRITE error counts,
>> though.
>>
>> Should we be worried about replacing the ex-hotspare drive
>> ASAP as well?
>
> You should not be seeing increasing CKSUM errors.  There is something wrong.
> I cannot say it's necessarily the fault of the drive, but probably it is.  When
> some threshold is reached, ZFS should mark the drive as faulted due to too
> many cksum errors.  I don't recommend waiting for it.

It probably indicates something else faulty in the I/O path, which
could include RAM, HBA or integrated controller chip, loose or
defective cabling, etc.  If RAM is ECC-capable, it seems unlikely to
be the issue.  I'd make sure all cables are fully seated and not
kinked or otherwise damaged.
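
For what it's worth, the sort of thing I'd run while chasing this is all
standard tooling (the pool name "tank" is just a placeholder):

  # Which vdev is accumulating the checksum errors, and which files are hit:
  zpool status -v tank
  # Any underlying device or transport errors logged by FMA:
  fmdump -eV | less
  # After reseating/replacing cables, clear the counters and scrub to re-check:
  zpool clear tank
  zpool scrub tank

If the counters climb again after a clear and a scrub, the drive itself
moves up the suspect list.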

Eric
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] commercial zfs-based storage replication software?

2011-10-21 Thread Fajar A. Nugraha
2011/10/20 Jim Klimov :
> 2011-10-19 17:54, Fajar A. Nugraha wrote:
>>
>> On Wed, Oct 19, 2011 at 7:52 PM, Jim Klimov  wrote:
>>>
>>> Well, just for the sake of completeness: most of our systems are using the
>>> zfs-auto-snap service, including Solaris 10 systems dating from Sol10u6.
>>> Installation of the relevant packages from SXCE (ranging snv_117-snv_130) was
>>> trivial, but some script-patching was in order: I think it was replacing the
>>> ksh interpreter with ksh93.
>>
>> Yes, I remembered reading about that.
>>
>
> Actually, I re-checked the systems: the scripts are kept in their original
> form, but those Sol10 servers where ksh93 was absent got a symlink:
>
> /usr/bin/ksh93 -> ../dt/bin/dtksh

For the sake of completeness, I tried using zfs-auto-snapshot 0.12
(the last version before it was obsoleted by time-slider). I used opencsw's
ksh93 (http://www.opencsw.org/package/ksh/) and symlinked
/opt/csw/bin/ksh to /usr/bin/ksh93.
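
Roughly (from memory, so the exact pkgutil invocation may differ):

  # Install opencsw's ksh and point /usr/bin/ksh93 at it:
  pkgutil -y -i ksh
  ln -s /opt/csw/bin/ksh /usr/bin/ksh93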

It mostly works when used only for automated snapshots. But when I used the
zfs/backup-save-cmd property (to make the remote system do a "zfs receive
-d -F"), problems started to happen: it complains that the snapshot already
exists. Probably the result of interaction between the
com.sun:auto-snapshot user property, child datasets, and "zfs send -R".
If only "zfs receive" had an option to "force-delete the snapshot if
it already exists, roll back to the previous one, and retry the
incremental receive" :)
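
What I end up doing by hand is roughly the following (dataset and snapshot
names are made up), which is exactly what I wish "zfs receive" could do on
its own:

  # On the receiving side, roll back to the newest snapshot both sides still
  # share, destroying the conflicting later ones, then retry the incremental:
  ssh backuphost zfs rollback -r backup/tank@last-common
  zfs send -R -i @last-common tank@latest | ssh backuphost zfs receive -d -F backup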

Since zfs-auto-snapshot has been obsoleted (and is obviously
unsupported), it wouldn't make sense for me to try to fix this. I
might revisit this option later using time-slider when Solaris 11
is released, though.

Thanks for the info so far.

-- 
Fajar
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss