Re: [ceph-users] where does 100% RBD utilization come from?

2020-01-14 Thread vitalif

Hi Philip,

I'm not sure if we're talking about the same thing, but I was also 
confused when I didn't see 100% OSD drive utilization during my first 
RBD write benchmark. Since then I've been collecting everything that 
confused me here: https://yourcmc.ru/wiki/Ceph_performance :)


100% RBD utilization means that, at every moment, something is waiting 
for at least one I/O op on this device to complete. It does not mean 
that the device is saturated.
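
To make that concrete, here is a minimal sketch of how a tool like iostat 
arrives at the number. It assumes the Linux /proc/diskstats layout, where 
the 13th column of each line is the number of milliseconds during which the 
device had at least one I/O in flight. The function names and the "rbd0" 
device name are made up for illustration.

```python
import time


def io_ticks_ms(diskstats_text, device):
    """Return the 'milliseconds spent doing I/O' counter for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Columns: major, minor, device name, then the per-device counters.
        # Index 12 is the time during which at least one I/O was in flight.
        if len(fields) >= 13 and fields[2] == device:
            return int(fields[12])
    raise KeyError(device)


def util_percent(ticks_before, ticks_after, interval_ms):
    """Share of the interval during which the device had I/O in flight."""
    return min(100.0, 100.0 * (ticks_after - ticks_before) / interval_ms)


def sample_util(device, interval_s=1.0, path="/proc/diskstats"):
    """Read the counter twice and report %util over the interval."""
    with open(path) as f:
        before = io_ticks_ms(f.read(), device)
    time.sleep(interval_s)
    with open(path) as f:
        after = io_ticks_ms(f.read(), device)
    return util_percent(before, after, interval_s * 1000.0)


if __name__ == "__main__":
    # "rbd0" is a hypothetical device name; substitute your own.
    print(sample_util("rbd0"))
```

A single outstanding request is enough to keep that counter ticking, so a 
client that always has exactly one request in flight shows 100% even when 
everything behind it is mostly idle.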


This "something" (the client software) can't issue more I/O operations 
while it's waiting for the previous ones to complete, which is why it 
can't saturate your OSDs or your network.


OSDs can't send more write requests to the drives until they have 
finished calculating object states on the CPU, or while they're busy with 
network I/O. That's why the OSDs can't saturate the drives.


Simply put: Ceph is slow. Partly because of the network round trips (you 
have 3 of them: client -> iSCSI -> primary OSD -> secondary OSDs), 
partly because the OSD code itself adds latency to every request.
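
As a rough sketch of how those hops add up for a single write: every 
number below is a made-up placeholder, not a measurement, and the function 
name is hypothetical.

```python
def write_latency_us(hop_round_trips_us, osd_processing_us):
    """Latency floor for one write: every hop plus OSD software overhead."""
    return sum(hop_round_trips_us) + osd_processing_us


# Hypothetical per-hop round-trip times in microseconds.
hops_us = {
    "client -> iscsi": 200,
    "iscsi -> primary osd": 200,
    # The secondaries are written in parallel, so this counts as one hop.
    "primary osd -> secondary osds": 200,
}

if __name__ == "__main__":
    total = write_latency_us(hops_us.values(), osd_processing_us=400)
    print(f"one write takes at least {total} us")
```

No single link or disk has to be busy for this total to be large. The 
write simply cannot complete before every hop has answered.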


Of course it's not TERRIBLY slow, so software that can keep many I/O 
requests in flight at once (i.e. uses async I/O) feels fine. But software 
that sends I/Os one by one, waiting for each to complete (because of 
transactional requirements or just stupidity, like Oracle), runs very 
slowly.
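
A back-of-the-envelope sketch of why: with a fixed per-operation latency, 
the achievable IOPS is roughly the number of requests in flight divided by 
that latency (Little's law). The 1 ms latency is a made-up round number, 
not a figure from any real cluster, and the model ignores the point where 
the backend itself finally saturates.

```python
def max_iops(queue_depth, latency_ms):
    """Upper bound on IOPS with `queue_depth` requests always in flight."""
    return queue_depth * 1000 / latency_ms


if __name__ == "__main__":
    for depth in (1, 4, 32):
        # Hypothetical 1 ms latency per write.
        print(f"queue depth {depth}: about {max_iops(depth, 1.0):.0f} IOPS")
```

At queue depth 1 the client is always waiting on its one request, so the 
RBD device reports 100% utilization while delivering only a small fraction 
of what the cluster could do with more requests in flight.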




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] where does 100% RBD utilization come from?

2020-01-14 Thread Philip Brown
Also...

"It seems like your RBD can't flush its I/O fast enough"
implies that there is some particular measure of "fast enough" that is a 
tunable value somewhere.
If my network cards aren't blocked, and my OSDs aren't blocked...
then doesn't that mean that I can and should "turn that knob" up?


- Original Message -
From: "Wido den Hollander" 
To: "Philip Brown" , "ceph-users" 
Sent: Tuesday, January 14, 2020 12:42:48 AM
Subject: Re: [ceph-users] where does 100% RBD utilization come from?


The util is calculated based on average waits, see:
https://coderwall.com/p/utc42q/understanding-iostat

Improving performance isn't just a matter of turning a knob.
It seems like your RBD can't flush its I/O fast enough and that causes
the iowait to go up.

This can be all kinds of things:

- Network (latency)
- CPU on the OSDs

Wido

> 
> 
> --
> Philip Brown| Sr. Linux System Administrator | Medata, Inc. 
> 5 Peters Canyon Rd Suite 250 
> Irvine CA 92606 
> Office 714.918.1310| Fax 714.918.1325 
> pbr...@medata.com| www.medata.com


Re: [ceph-users] where does 100% RBD utilization come from?

2020-01-14 Thread Philip Brown
The odd thing is:
the network interfaces on the gateways don't seem to be at 100% capacity,
and the OSD disks don't seem to be at 100% utilization,
so I'm confused about where this could be getting held up.






Re: [ceph-users] where does 100% RBD utilization come from?

2020-01-14 Thread Wido den Hollander



On 1/10/20 7:43 PM, Philip Brown wrote:
> Surprisingly, a Google search didn't turn up the answer to this, so I guess 
> I should ask here:
> 
> What determines if an RBD is "100% busy"?
> 
> I have some backend OSDs, and an iSCSI gateway, serving out some RBDs.
> 
> iostat on the gateway says the RBD is 100% utilized.
> 
> iostat on the individual OSDs only goes as high as about 60% on a 
> per-device basis.
> CPU is idle.
> The network interface doesn't seem to be capped either.
> 
> So... how do I improve RBD throughput?
> 

The util is calculated based on average waits, see:
https://coderwall.com/p/utc42q/understanding-iostat

Improving performance isn't just a matter of turning a knob.
It seems like your RBD can't flush its I/O fast enough and that causes
the iowait to go up.

This can be all kinds of things:

- Network (latency)
- CPU on the OSDs

Wido

> 
> 
> --
> Philip Brown| Sr. Linux System Administrator | Medata, Inc. 
> 5 Peters Canyon Rd Suite 250 
> Irvine CA 92606 
> Office 714.918.1310| Fax 714.918.1325 
> pbr...@medata.com| www.medata.com