Re: Just in case it is of general interest: ZFS mirroring was the culprit in our case

2007-11-13 Thread David Carter
On Tue, 13 Nov 2007, Pascal Gienger wrote:

> Our latency problems went away like a miracle when we detached one half 
> of the mirror (so it is no more a mirror).
>
> Read-Rates are doubled (not per device, the total read rate!), latency 
> is cut off. No more latency problems.
>
> When attaching the volume again, resilvering puts the system to a halt -
> reads and writes do block for seconds (!).

Definitely of interest to those of us keeping one eye on ZFS. Thanks. Can 
someone else running ZFS confirm this behaviour?

-- 
David Carter                             Email: [EMAIL PROTECTED]
University Computing Service,            Phone: (01223) 334502
New Museums Site, Pembroke Street,       Fax:   (01223) 334679
Cambridge UK. CB2 3QH.

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: Just in case it is of general interest: ZFS mirroring was the culprit in our case

2007-11-13 Thread Dale Ghent

Interesting. What's your kernel patch level?

We're running on 125101-10 with the exact same configuration as you  
(mirrored to two arrays, in separate buildings even) and haven't seen  
this problem.

/dale


On Nov 13, 2007, at 1:23 AM, Pascal Gienger wrote:

> Our latency problems went away like a miracle when we detached one half of
> the mirror (so it is no more a mirror).
>
> Read-Rates are doubled (not per device, the total read rate!), latency is
> cut off. No more latency problems.
>
> When attaching the volume again, resilvering puts the system to a halt -
> reads and writes do block for seconds (!). We will go on directly with Sun
> to solve the problem. Their "lowest I/O-priority to resilver disks" does
> not seem to be effective. It really blocks the kernel and you end up with
> thousand locks in "zfs_zget".
>
> We have two SAN volumes in different buildings which are NOT the
> bottleneck, tests show it.
>
> Pascal

--
Dale Ghent
Specialist, Storage and UNIX Systems
UMBC - Office of Information Technology
ECS 201 - x51705






Re: Just in case it is of general interest: ZFS mirroring was the culprit in our case

2007-11-13 Thread Rob Banz

...though we have seen super-greediness from ZFS when resilvering. ;)

On Nov 13, 2007, at 09:17, Dale Ghent wrote:

>
> Interesting. What's your kernel patch level?
>
> We're running on 125101-10 with the exact same configuration as you
> (mirrored to two arrays, in separate buildings even) and haven't seen
> this problem.
>
> /dale
>
>
> On Nov 13, 2007, at 1:23 AM, Pascal Gienger wrote:
>
>> Our latency problems went away like a miracle when we detached one half of
>> the mirror (so it is no more a mirror).
>>
>> Read-Rates are doubled (not per device, the total read rate!), latency is
>> cut off. No more latency problems.
>>
>> When attaching the volume again, resilvering puts the system to a halt -
>> reads and writes do block for seconds (!). We will go on directly with Sun
>> to solve the problem. Their "lowest I/O-priority to resilver disks" does
>> not seem to be effective. It really blocks the kernel and you end up with
>> thousand locks in "zfs_zget".
>>
>> We have two SAN volumes in different buildings which are NOT the
>> bottleneck, tests show it.
>>
>> Pascal
>
> --
> Dale Ghent
> Specialist, Storage and UNIX Systems
> UMBC - Office of Information Technology
> ECS 201 - x51705




Re: Just in case it is of general interest: ZFS mirroring was the culprit in our case

2007-11-13 Thread Vincent Fox

Can you expand on this, like a LOT?

I recall a while ago you brought up some performance issues and
said you had found hacks for them.  Were those issues actually unresolved
or are you talking about something else?  I don't see any recent posts by
you about problems with your Cyrus install.

I'm struggling to see the mechanism by which mirroring creates a problem.
Were you resilvering at the time?

Pascal Gienger wrote:
> Our latency problems went away like a miracle when we detached one half of 
> the mirror (so it is no more a mirror).
>
> Read-Rates are doubled (not per device, the total read rate!), latency is 
> cut off. No more latency problems.
>
> When attaching the volume again, resilvering puts the system to a halt - 
> reads and writes do block for seconds (!). We will go on directly with Sun 
> to solve the problem. Their "lowest I/O-priority to resilver disks" does 
> not seem to be effective. It really blocks the kernel and you end up with 
> thousand locks in "zfs_zget".
>
> We have two SAN volumes in different buildings which are NOT the 
> bottleneck, tests show it.
>
> Pascal

