Re: Just in case it is of general interest: ZFS mirroring was the culprit in our case

2007-11-13 Thread David Carter
On Tue, 13 Nov 2007, Pascal Gienger wrote:

 Our latency problems went away like a miracle when we detached one half
 of the mirror (so it is no longer a mirror).

 Read rates have doubled (not per device, the total read rate!) and
 latency has dropped. No more latency problems.

 When attaching the volume again, resilvering brings the system to a halt:
 reads and writes block for seconds (!).

Definitely of interest to those of us keeping one eye on ZFS. Thanks. Can 
someone else running ZFS confirm this behaviour?

-- 
David Carter                             Email: [EMAIL PROTECTED]
University Computing Service,            Phone: (01223) 334502
New Museums Site, Pembroke Street,       Fax:   (01223) 334679
Cambridge UK. CB2 3QH.

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: Just in case it is of general interest: ZFS mirroring was the culprit in our case

2007-11-13 Thread Dale Ghent

Interesting. What's your kernel patch level?

We're running on 125101-10 with the exact same configuration as you  
(mirrored to two arrays, in separate buildings even) and haven't seen  
this problem.

/dale
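
For anyone comparing patch levels as suggested above, here is a minimal Python sketch. It assumes the usual Solaris patch ID form of a base number and a revision joined by a hyphen (as in 125101-10), and that only revisions of the same base are meaningfully comparable. The function names are illustrative, not part of any Solaris tool. The installed ID would typically be read by hand from `uname -v` or `showrev -p`.

```python
def parse_patch_id(patch_id):
    """Split a patch ID such as '125101-10' into (base, revision) integers."""
    base, sep, rev = patch_id.strip().partition("-")
    if not sep or not base.isdigit() or not rev.isdigit():
        raise ValueError("not a patch ID of the form BASE-REV: %r" % patch_id)
    return int(base), int(rev)


def at_least(installed, reference):
    """True if `installed` is the same patch as `reference` at an equal or later revision."""
    inst_base, inst_rev = parse_patch_id(installed)
    ref_base, ref_rev = parse_patch_id(reference)
    if inst_base != ref_base:
        # Different base numbers are different patches; ordering them is meaningless.
        raise ValueError("cannot compare different patches: %s vs %s" % (installed, reference))
    return inst_rev >= ref_rev
```

A call such as `at_least("125101-09", "125101-10")` would tell you whether a host is behind the level Dale reports running without problems.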


On Nov 13, 2007, at 1:23 AM, Pascal Gienger wrote:

 Our latency problems went away like a miracle when we detached one half
 of the mirror (so it is no longer a mirror).

 Read rates have doubled (not per device, the total read rate!) and
 latency has dropped. No more latency problems.

 When attaching the volume again, resilvering brings the system to a halt:
 reads and writes block for seconds (!). We will take this up directly
 with Sun to solve the problem. Running the resilver at their lowest I/O
 priority does not seem to be effective. It really blocks the kernel and
 you end up with thousands of locks in zfs_zget.

 We have two SAN volumes in different buildings which are NOT the
 bottleneck, as our tests show.

 Pascal
 


--
Dale Ghent
Specialist, Storage and UNIX Systems
UMBC - Office of Information Technology
ECS 201 - x51705






Re: Just in case it is of general interest: ZFS mirroring was the culprit in our case

2007-11-13 Thread Rob Banz

...though, we have seen super-greediness from ZFS when resilvering. ;)

On Nov 13, 2007, at 09:17, Dale Ghent wrote:


 Interesting. What's your kernel patch level?

 We're running on 125101-10 with the exact same configuration as you
 (mirrored to two arrays, in separate buildings even) and haven't seen
 this problem.

 /dale


 On Nov 13, 2007, at 1:23 AM, Pascal Gienger wrote:

 Our latency problems went away like a miracle when we detached one half
 of the mirror (so it is no longer a mirror).

 Read rates have doubled (not per device, the total read rate!) and
 latency has dropped. No more latency problems.

 When attaching the volume again, resilvering brings the system to a halt:
 reads and writes block for seconds (!). We will take this up directly
 with Sun to solve the problem. Running the resilver at their lowest I/O
 priority does not seem to be effective. It really blocks the kernel and
 you end up with thousands of locks in zfs_zget.

 We have two SAN volumes in different buildings which are NOT the
 bottleneck, as our tests show.

 Pascal
 


 --
 Dale Ghent
 Specialist, Storage and UNIX Systems
 UMBC - Office of Information Technology
 ECS 201 - x51705



 




Re: Just in case it is of general interest: ZFS mirroring was the culprit in our case

2007-11-13 Thread Vincent Fox

Can you expand on this, like a LOT?

I recall a while ago you brought up some performance issues and
said you had found hacks for them.  Were those issues actually unresolved
or are you talking about something else?  I don't see any recent posts by
you about problems with your Cyrus install.

I'm struggling to see the mechanism by which mirroring creates a problem.
Were you resilvering at the time?

Pascal Gienger wrote:
 Our latency problems went away like a miracle when we detached one half
 of the mirror (so it is no longer a mirror).

 Read rates have doubled (not per device, the total read rate!) and
 latency has dropped. No more latency problems.

 When attaching the volume again, resilvering brings the system to a halt:
 reads and writes block for seconds (!). We will take this up directly
 with Sun to solve the problem. Running the resilver at their lowest I/O
 priority does not seem to be effective. It really blocks the kernel and
 you end up with thousands of locks in zfs_zget.

 We have two SAN volumes in different buildings which are NOT the
 bottleneck, as our tests show.

 Pascal
 
   




Just in case it is of general interest: ZFS mirroring was the culprit in our case

2007-11-12 Thread Pascal Gienger
Our latency problems went away like a miracle when we detached one half
of the mirror (so it is no longer a mirror).

Read rates have doubled (not per device, the total read rate!) and
latency has dropped. No more latency problems.

When attaching the volume again, resilvering brings the system to a halt:
reads and writes block for seconds (!). We will take this up directly
with Sun to solve the problem. Running the resilver at their lowest I/O
priority does not seem to be effective. It really blocks the kernel and
you end up with thousands of locks in zfs_zget.

We have two SAN volumes in different buildings which are NOT the
bottleneck, as our tests show.

Pascal
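
For readers who want to try the detach and re-attach sequence described in this message, here is a minimal Python sketch that only builds the `zpool` command lines involved and does not run them. The pool and device names in the usage note are hypothetical placeholders, not Pascal's actual configuration, and the function name is illustrative. `zpool detach`, `zpool attach`, `zpool status` and `zpool iostat` are the standard subcommands.

```python
def mirror_cycle(pool, keep_device, other_device, interval_s=5):
    """Return the zpool command lines for breaking and re-forming a two-way mirror.

    Nothing is executed here; the caller decides whether and when to run them.
    """
    steps = [
        # Remove one side of the mirror; the pool keeps running on keep_device.
        ["zpool", "detach", pool, other_device],
        # Watch per-device read/write rates while running unmirrored.
        ["zpool", "iostat", "-v", pool, str(interval_s)],
        # Re-attach the second side; this starts a resilver onto other_device.
        ["zpool", "attach", pool, keep_device, other_device],
        # Check resilver progress.
        ["zpool", "status", pool],
    ]
    return [" ".join(argv) for argv in steps]
```

Printing `mirror_cycle("mailpool", "c2t0d0", "c3t0d0")` line by line gives a checklist to run by hand, which makes it easier to note latency before and after each step.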
