On Jun 18, 2013, at 5:34 AM, Sebastian Gabler <sequoiamo...@gmx.net> wrote:

> On 18.06.2013 06:15, openindiana-discuss-requ...@openindiana.org wrote:
>> Message: 7
>> Date: Mon, 17 Jun 2013 17:00:37 -0700
>> From: Richard Elling <richard.ell...@richardelling.com>
>> To: Discussion list for OpenIndiana
>>      <openindiana-discuss@openindiana.org>
>> Subject: Re: [OpenIndiana-discuss] HBA failover
>> Message-ID: <b4bf5130-a0a2-4d91-ada4-cf7f86616...@richardelling.com>
>> Content-Type: text/plain; charset=us-ascii
>> 
>> On Jun 17, 2013, at 1:36 PM, Sebastian Gabler <sequoiamo...@gmx.net> wrote:
>> 
>>> Dear Bill, Peter, Richard, and Saso,
>>>
>>> Thanks for the great comments.
>>>
>>> Now, changing to reverse gear, isn't it more likely to lose data by
>>> having a pool that spans across multiple HBAs than if you connect all
>>> drives to a single HBA?
>> That has an easy answer: yes. But you were asking about data availability,
>> which is not the same as data loss.
> Good point. That helps to sort out my thinking.
>> 
>>> I mean, unless you make sure that no leaf VDEV ever has more drives served
>>> by a single HBA (single-ported SATA drives) than its redundancy can
>>> tolerate, a VDEV in the pool could become unavailable upon HBA failure,
>>> ultimately leading to loss of the whole pool?
>> In ZFS, if a pool's redundancy cannot be assured, then the pool goes into
>> the FAULTED state and I/O is suspended until the fault is cleared. In many
>> cases, there is no data loss, even though the data is unavailable while the
>> pool is FAULTED.
>> 
>> In general, HBAs do not have long-lived persistent storage. The data passes
>> through them to the disks. ZFS can recover from the typical failure modes
>> where data is lost in-flight without a commitment to media (see the many
>> posts on cache flush behaviour).
> The underlying aspect to my thought is whether a pool fault is immediately
> guaranteed to follow from a hardware failure.

"immediate" is a word we don't use in the reliability business because "if a 
tree immediately
falls in the forest, does it make a sound?" :-) Until an I/O fails, it is not 
known that the device
or the datapath to the device is operational. At some point after determining 
that an I/O failed
and all other redundancy attempts fail (multipathing, reset device, retry, RAID 
etc) then the 
pool can be changed to a FAULTED state.
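
To make this concrete, a sketch of what you might see once the retries are
exhausted (illustrative output; the pool name "tank" and the exact messages
are examples, not from a real system):

  # zpool status -x
    pool: tank
   state: FAULTED
  status: One or more devices are faulted in response to IO failures.
  action: Make sure the affected devices are connected, then run 'zpool clear'.

Once the hardware problem is repaired, 'zpool clear tank' clears the errors
and allows I/O to resume.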

> There is still the fact that ZFS allocates writes dynamically across vdevs,
> depending on the workload.

ZFS will accept writes until the pool is FAULTED. Discussion of failmode >
/dev/null for now.
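
For the record, since the failmode discussion got redirected to /dev/null
above: the pool's failmode property controls what happens while a pool is
faulted. A quick sketch, with "tank" as a placeholder pool name:

  # zpool get failmode tank
  NAME  PROPERTY  VALUE     SOURCE
  tank  failmode  wait      default

  # zpool set failmode=continue tank

"wait" (the default) suspends I/O until the fault is cleared, "continue"
returns EIO to new writes while still allowing reads from healthy devices,
and "panic" panics the host.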

> So, it may occur that not all writes hit every vdev.

This is possible, but unlikely in the real world.

> If that happens, ZFS might only notice later that one of the vdevs has gone
> away in the meantime. That is what I thought I had seen when my pool died
> slowly. Again, I could be wrong about that.
> OTOH, it's not a surprise that a mirrored rpool would be resilient, because
> that is a design offering redundancy at the root node, so it doesn't matter
> if one side falls away due to a disk, link, or HBA failure. A pool of
> concatenated, redundant leaves can't have redundancy at the root node, as
> ZFS doesn't support nested vdevs.

For terminology:
        top-level vdevs are striped
        vdevs can be protected

ZFS does not do concatenation, and "nested vdevs" is not a term we use because
it is confusing.
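
For example, a pool built from two mirrors stripes writes across the two
top-level vdevs, while each mirror carries its own protection (the pool name
and the cXtYdZ device names below are placeholders):

  # zpool create tank mirror c0t0d0 c0t1d0 mirror c1t0d0 c1t1d0

Losing one disk in either mirror degrades that vdev; losing both disks of the
same mirror faults the top-level vdev, and with it the pool.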

For some, it is easier to think of the pool structure as a tree with top-level
vdevs for pool, log, and cache devices. If a fault occurs in a pool device, the
status is propagated up the tree. The device status can become faulted, setting
the parent to degraded. If the top-level vdev is not redundant and copies are
also not working, then the top-level pool vdev is faulted. If a top-level pool
vdev is faulted, then the state of the pool is changed to faulted.
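
An illustrative status for the degraded-but-not-faulted case, showing the
propagation up the tree (sketch output, not from a live system):

  # zpool status tank
    pool: tank
   state: DEGRADED
  config:

          NAME        STATE     READ WRITE CKSUM
          tank        DEGRADED      0     0     0
            mirror-0  DEGRADED      0     0     0
              c0t0d0  ONLINE        0     0     0
              c0t1d0  FAULTED       0     0     0
            mirror-1  ONLINE        0     0     0
              c1t0d0  ONLINE        0     0     0
              c1t1d0  ONLINE        0     0     0

The faulted disk degrades its parent mirror, which in turn degrades the pool,
but I/O continues because mirror-0 still has a healthy side.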

For log and cache top-level vdevs, a fault changes the pool status to degraded,
not faulted, and I/O continues without using the log or cache.
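
For completeness, log and cache top-level vdevs are added with 'zpool add'
(device names are placeholders again):

  # zpool add tank log mirror c2t0d0 c2t1d0
  # zpool add tank cache c3t0d0

Mirroring the log device is worth considering because its contents matter
after a crash; cache device contents do not, so cache devices cannot be
mirrored.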

If any device is faulted, then the pool state is degraded. Thus the status of
the pool reflects the status of all of the devices in the pool.
 -- richard

--

richard.ell...@richardelling.com
+1-760-896-4422



_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss
