Re: [ceph-users] where does 100% RBD utilization come from?
Hi Philip,

I'm not sure we're talking about the same thing, but I was also confused when I didn't see 100% OSD drive utilization during my first RBD write benchmark. Since then I have been collecting all my confusion here: https://yourcmc.ru/wiki/Ceph_performance :)

100% RBD utilization means that, at every moment, something is waiting for at least one I/O operation on this device to complete. This "something" (the client software) can't issue more I/O operations while it waits for the previous ones to complete, which is why it can't saturate your OSDs or your network.

Likewise, OSDs can't send more write requests to the drives until they have finished calculating object states on the CPU, or while they're busy with network I/O. That's why the OSDs can't saturate the drives.

Simply put: Ceph is slow. Partly because of the network round trips (you have 3 of them: client -> iscsi -> primary osd -> secondary osds), partly because it's just slow.

Of course it's not TERRIBLY slow, so software that can send I/O requests in batches (i.e. uses async I/O) feels fine. But software that sends I/Os one by one (because of transactional requirements, or just stupidity, like Oracle) runs very slowly.

> Also.. "It seems like your RBD can't flush its I/O fast enough" implies that there is some particular measure of "fast enough", that is a tunable value somewhere.
> If my network cards aren't blocked, and my OSDs aren't blocked... then doesn't that mean that I can and should "turn that knob" up?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
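To put rough numbers on the "one by one" versus "in batches" point: by Little's law, the number of requests in flight equals throughput times latency, so a client that keeps a fixed number of requests outstanding cannot exceed queue depth divided by per-request latency. The sketch below only illustrates that arithmetic. The 1 ms latency is an assumed figure, not something measured on the cluster in this thread, and `max_iops` is a made-up helper name.

```python
def max_iops(queue_depth: int, latency_s: float) -> float:
    """Upper bound on IOPS for a client that keeps `queue_depth`
    requests in flight, each taking `latency_s` seconds end to end.

    This is Little's law rearranged: in_flight = rate * latency.
    """
    if queue_depth < 1:
        raise ValueError("queue_depth must be at least 1")
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return queue_depth / latency_s


if __name__ == "__main__":
    # Assumed, illustrative latency: 1 ms per replicated write.
    latency_s = 0.001
    for queue_depth in (1, 4, 32):
        print(f"queue depth {queue_depth:>2}: "
              f"at most {max_iops(queue_depth, latency_s):.0f} IOPS")
```

In every one of these cases the block device has at least one request outstanding all the time, so it would show as 100% utilized even though the achievable IOPS differ by the queue-depth factor.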
Re: [ceph-users] where does 100% RBD utilization come from?
Also.. "It seems like your RBD can't flush its I/O fast enough" implies that there is some particular measure of "fast enough", that is a tunable value somewhere.
If my network cards aren't blocked, and my OSDs aren't blocked... then doesn't that mean that I can and should "turn that knob" up?

- Original Message -
From: "Wido den Hollander"
To: "Philip Brown", "ceph-users"
Sent: Tuesday, January 14, 2020 12:42:48 AM
Subject: Re: [ceph-users] where does 100% RBD utilization come from?

The util is calculated based on average waits, see: https://coderwall.com/p/utc42q/understanding-iostat

Improving performance isn't just a matter of turning a knob. It seems like your RBD can't flush its I/O fast enough and that causes the iowait to go up.

This can be all kinds of things:

- Network (latency)
- CPU on the OSDs

Wido
Re: [ceph-users] where does 100% RBD utilization come from?
The odd thing is: the network interfaces on the gateways don't seem to be at 100% capacity, and the OSD disks don't seem to be at 100% utilization, so I'm confused about where this could be getting held up.

- Original Message -
From: "Wido den Hollander"
To: "Philip Brown", "ceph-users"
Sent: Tuesday, January 14, 2020 12:42:48 AM
Subject: Re: [ceph-users] where does 100% RBD utilization come from?

The util is calculated based on average waits, see: https://coderwall.com/p/utc42q/understanding-iostat

Improving performance isn't just a matter of turning a knob. It seems like your RBD can't flush its I/O fast enough and that causes the iowait to go up.

This can be all kinds of things:

- Network (latency)
- CPU on the OSDs

Wido
Re: [ceph-users] where does 100% RBD utilization come from?
On 1/10/20 7:43 PM, Philip Brown wrote:
> Surprisingly, a Google search didn't seem to find the answer on this, so I guess I should ask here:
>
> What determines if an RBD is "100% busy"?
>
> I have some backend OSDs, and an iSCSI gateway, serving out some RBDs.
>
> iostat on the gateway says rbd is 100% utilized.
>
> iostat on individual OSDs only goes as high as about 60% on a per-device basis.
> CPU is idle.
> Doesn't seem like the network interface is capped either.
>
> So.. how do I improve RBD throughput?

The util is calculated based on average waits, see: https://coderwall.com/p/utc42q/understanding-iostat

Improving performance isn't just a matter of turning a knob. It seems like your RBD can't flush its I/O fast enough and that causes the iowait to go up.

This can be all kinds of things:

- Network (latency)
- CPU on the OSDs

Wido
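For reference, as far as I understand it, iostat derives %util from the kernel's per-device "time spent doing I/Os" counter in /proc/diskstats: it takes the increase in that counter over the sampling interval and divides it by the interval length. A minimal sketch of that calculation follows. It assumes the standard /proc/diskstats field layout, and the two sample lines and the device name `rbd0` are made up for illustration.

```python
def io_ticks_ms(diskstats_text: str, device: str) -> int:
    """Return the 'time spent doing I/Os' counter (milliseconds) for `device`.

    In /proc/diskstats the first three columns are major, minor and device
    name; the counter we want is the 10th statistic after the name, i.e.
    whitespace-separated column index 12.
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 13 and fields[2] == device:
            return int(fields[12])
    raise KeyError(f"device {device!r} not found")


def util_percent(before: str, after: str, device: str, interval_s: float) -> float:
    """Percentage of the interval during which the device was busy,
    meaning it had at least one request in flight."""
    busy_ms = io_ticks_ms(after, device) - io_ticks_ms(before, device)
    return min(100.0, 100.0 * busy_ms / (interval_s * 1000.0))


if __name__ == "__main__":
    # Hypothetical snapshots taken one second apart (not real measurements).
    before = " 252 0 rbd0 100 0 800 50 2000 0 16000 9000 1 5000 9050"
    after = " 252 0 rbd0 100 0 800 50 3000 0 24000 13500 1 6000 14050"
    print(util_percent(before, after, "rbd0", interval_s=1.0))
```

On a live system the two snapshots would come from reading /proc/diskstats twice. Note what the number does and does not say: 100% means the device never sat idle during the interval, not that it could not have served more requests in parallel.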
[ceph-users] where does 100% RBD utilization come from?
Surprisingly, a Google search didn't seem to find the answer on this, so I guess I should ask here:

What determines if an RBD is "100% busy"?

I have some backend OSDs, and an iSCSI gateway, serving out some RBDs.

iostat on the gateway says rbd is 100% utilized.

iostat on individual OSDs only goes as high as about 60% on a per-device basis.
CPU is idle.
Doesn't seem like the network interface is capped either.

So.. how do I improve RBD throughput?

--
Philip Brown | Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310 | Fax 714.918.1325
pbr...@medata.com | www.medata.com