Re: Just in case it is of general interest: ZFS mirroring was the culprit in our case
On Tue, 13 Nov 2007, Pascal Gienger wrote:

> Our latency problems went away like a miracle when we detached one half
> of the mirror (so it is no longer a mirror). Read rates doubled (not per
> device, the total read rate!) and latency dropped. No more latency
> problems.
>
> When we attach the volume again, resilvering brings the system to a
> halt: reads and writes block for seconds (!).

Definitely of interest to those of us keeping one eye on ZFS. Thanks.

Can someone else running ZFS confirm this behaviour?

--
David Carter                             Email: [EMAIL PROTECTED]
University Computing Service,            Phone: (01223) 334502
New Museums Site, Pembroke Street,       Fax:   (01223) 334679
Cambridge UK. CB2 3QH.

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Re: Just in case it is of general interest: ZFS mirroring was the culprit in our case
Interesting. What's your kernel patch level? We're running on 125101-10 with the exact same configuration as you (mirrored to two arrays, in separate buildings even) and haven't seen this problem.

/dale

On Nov 13, 2007, at 1:23 AM, Pascal Gienger wrote:

> Our latency problems went away like a miracle when we detached one half
> of the mirror (so it is no longer a mirror). Read rates doubled (not per
> device, the total read rate!) and latency dropped. No more latency
> problems.
>
> When we attach the volume again, resilvering brings the system to a
> halt: reads and writes block for seconds (!).
>
> We will take this up directly with Sun. Their lowest I/O priority for
> resilvering disks does not seem to be effective. Resilvering really
> blocks the kernel, and you end up with a thousand locks in zfs_zget.
>
> We have two SAN volumes in different buildings, which are NOT the
> bottleneck; our tests show it.
>
> Pascal

--
Dale Ghent
Specialist, Storage and UNIX Systems
UMBC - Office of Information Technology
ECS 201 - x51705
Re: Just in case it is of general interest: ZFS mirroring was the culprit in our case
...though, we have seen super-greediness of ZFS when resilvering. ;)

On Nov 13, 2007, at 09:17, Dale Ghent wrote:

> Interesting. What's your kernel patch level? We're running on 125101-10
> with the exact same configuration as you (mirrored to two arrays, in
> separate buildings even) and haven't seen this problem.
>
> /dale
>
> On Nov 13, 2007, at 1:23 AM, Pascal Gienger wrote:
>
>> Our latency problems went away like a miracle when we detached one
>> half of the mirror (so it is no longer a mirror). Read rates doubled
>> (not per device, the total read rate!) and latency dropped. No more
>> latency problems.
>>
>> When we attach the volume again, resilvering brings the system to a
>> halt: reads and writes block for seconds (!).
>>
>> We will take this up directly with Sun. Their lowest I/O priority for
>> resilvering disks does not seem to be effective. Resilvering really
>> blocks the kernel, and you end up with a thousand locks in zfs_zget.
>>
>> We have two SAN volumes in different buildings, which are NOT the
>> bottleneck; our tests show it.
>>
>> Pascal
Re: Just in case it is of general interest: ZFS mirroring was the culprit in our case
Can you expand on this, like a LOT?

I recall that a while ago you brought up some performance issues and said you had found hacks for them. Were those issues actually unresolved, or are you talking about something else? I don't see any recent posts by you about problems with your Cyrus install.

I'm struggling to see the mechanism by which mirroring creates a problem. Were you resilvering at the time?

Pascal Gienger wrote:

> Our latency problems went away like a miracle when we detached one half
> of the mirror (so it is no longer a mirror). Read rates doubled (not per
> device, the total read rate!) and latency dropped. No more latency
> problems.
>
> When we attach the volume again, resilvering brings the system to a
> halt: reads and writes block for seconds (!).
>
> We will take this up directly with Sun. Their lowest I/O priority for
> resilvering disks does not seem to be effective. Resilvering really
> blocks the kernel, and you end up with a thousand locks in zfs_zget.
>
> We have two SAN volumes in different buildings, which are NOT the
> bottleneck; our tests show it.
>
> Pascal
Just in case it is of general interest: ZFS mirroring was the culprit in our case
Our latency problems went away like a miracle when we detached one half of the mirror (so it is no longer a mirror). Read rates doubled (not per device, the total read rate!) and latency dropped. No more latency problems.

When we attach the volume again, resilvering brings the system to a halt: reads and writes block for seconds (!).

We will take this up directly with Sun. Their lowest I/O priority for resilvering disks does not seem to be effective. Resilvering really blocks the kernel, and you end up with a thousand locks in zfs_zget.

We have two SAN volumes in different buildings, which are NOT the bottleneck; our tests show it.

Pascal
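
For anyone who wants to check whether they see the same behaviour, one rough way to put numbers on "reads block for seconds" is to time small reads against a large file on the pool, once with the mirror attached and once with one half detached. The sketch below is only an illustration and not what was run on the systems in this thread; the function names `read_latencies` and `summarize`, the 8 KB block size, and the sample count are all assumptions chosen for the example. Reads served from cache will look unrealistically fast, so the test file should be much larger than memory or not recently read.

```python
import os
import statistics
import sys
import time


def read_latencies(path, block_size=8192, samples=200):
    """Time `samples` reads of `block_size` bytes at scattered offsets.

    Returns the per-read latencies in seconds. (Hypothetical helper,
    not part of any ZFS or Cyrus tooling.)
    """
    size = os.path.getsize(path)
    if size < block_size:
        raise ValueError("file is smaller than one block")
    latencies = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for i in range(samples):
            # Scatter offsets deterministically across the file.
            offset = (i * 7919 * block_size) % (size - block_size + 1)
            start = time.perf_counter()
            os.pread(fd, block_size, offset)
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies (seconds) to median, p99 and max in ms."""
    ordered = sorted(latencies)
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "median_ms": statistics.median(ordered) * 1000,
        "p99_ms": ordered[p99_index] * 1000,
        "max_ms": ordered[-1] * 1000,
    }


if __name__ == "__main__":
    # Usage: python probe.py /path/to/large/file/on/the/pool
    print(summarize(read_latencies(sys.argv[1])))
```

Comparing the `max_ms` and `p99_ms` figures from the mirrored and unmirrored runs (and from a run during resilvering) would show whether the multi-second stalls are reproducible on another installation. The median alone is unlikely to reveal them.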