[ceph-users] Re: High IO utilization for bstore_kv_sync

2024-02-22 Thread Anthony D'Atri



>  you can sometimes find really good older drives like Intel P4510s on eBay 
> for reasonable prices.  Just watch out for how much write wear they have on 
> them.

Also be sure to update to the latest firmware before use, then issue a Secure 
Erase.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High IO utilization for bstore_kv_sync

2024-02-22 Thread Mark Nelson
The biggest improvement would be to put all of the OSDs on SSDs with 
PLP.  Next would be to put the WAL/DB on drives with PLP.  If price is a 
concern, you can sometimes find really good older drives like Intel 
P4510s on eBay for reasonable prices.  Just watch out for how much write 
wear they have on them.



I had an experimental PR that I was playing with to see if I could queue 
up more IO at once in the bstore_kv_sync thread here:


https://github.com/ceph/ceph/pull/50610


I didn't have the proper gear to test it, though, so it just kind of 
languished and was closed by the bot.  The idea was just a proof of 
concept to see if we could reduce the number of fdatasyncs by manually 
introducing latency and letting more IOs accumulate before doing a flush.
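
To make the idea concrete, here is a minimal, self-contained sketch of that
kind of batching.  To be clear, this is not the code from the PR; the names
PendingTxn, kv_sync_loop, and batch_window are made up for illustration.  The
sync thread deliberately waits a short window so that several queued
transactions end up covered by a single fdatasync:

#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

#include <fcntl.h>
#include <unistd.h>

// Hypothetical stand-in for a queued KV transaction.
struct PendingTxn { std::string payload; };

std::mutex mtx;
std::condition_variable cv;
std::deque<PendingTxn> pending;
bool stop = false;

// Sketch of a bstore_kv_sync-style loop: wait briefly so more transactions
// accumulate, write them all, then issue one fdatasync for the whole batch.
void kv_sync_loop(int fd, std::chrono::microseconds batch_window) {
    while (true) {
        std::unique_lock<std::mutex> lk(mtx);
        cv.wait(lk, [] { return stop || !pending.empty(); });
        if (stop && pending.empty()) break;

        // The deliberate pause: trade a little latency for fewer flushes.
        lk.unlock();
        std::this_thread::sleep_for(batch_window);
        lk.lock();

        std::vector<PendingTxn> batch(pending.begin(), pending.end());
        pending.clear();
        lk.unlock();

        for (const auto& t : batch)
            (void)!write(fd, t.payload.data(), t.payload.size());

        fdatasync(fd);  // one flush covers every transaction in the batch
        std::printf("flushed %zu txns with one fdatasync\n", batch.size());
    }
}

int main() {
    int fd = open("wal-sketch.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return 1;

    std::thread syncer(kv_sync_loop, fd, std::chrono::microseconds(500));

    for (int i = 0; i < 1000; ++i) {
        { std::lock_guard<std::mutex> g(mtx); pending.push_back({"txn"}); }
        cv.notify_one();
    }
    { std::lock_guard<std::mutex> g(mtx); stop = true; }
    cv.notify_one();
    syncer.join();
    close(fd);
    return 0;
}

The trade-off is exactly what you'd expect: a slightly higher floor on commit
latency in exchange for far fewer flush operations when the device is the
bottleneck.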



Mark


On 2/22/24 11:29, Work Ceph wrote:

Thanks for the prompt response!

I see, and indeed some of them are consumer SSDs. Is there any 
parameter that we can change/tune to better handle the "fdatasync" call?


Maybe using NVMe devices for RocksDB?

On Thu, Feb 22, 2024 at 2:24 PM Mark Nelson  wrote:

Most likely you are seeing time spent waiting on fdatasync in
bstore_kv_sync if the drives you are using don't have power loss
protection and can't perform flushes quickly.  Some consumer-grade
drives are actually slower at this than HDDs.


Mark


On 2/22/24 11:04, Work Ceph wrote:
> Hello guys,
> We are running Ceph Octopus on Ubuntu 18.04, and we are noticing spikes of
> IO utilization for the bstore_kv_sync thread during processes such as
> adding a new pool and increasing/reducing the number of PGs in a pool.
>
> It is funny, though, that the IO utilization (reported with iotop) is
> 99.99%, but the reported R/W speeds are low. The devices where we are
> seeing these issues are all SSD systems. We are not using high-end SSD
> devices, though.
>
> Have you guys seen such behavior?
>
> Also, do you guys have any clues on why the IO utilization would be high
> when there is such a small amount of data being read and written to the
> OSD/disks?



--
Best Regards,
Mark Nelson
Head of Research and Development

Clyso GmbH
p: +49 89 21552391 12 | a: Minnesota, USA
w: https://clyso.com | e: mark.nel...@clyso.com

We are hiring: https://www.clyso.com/jobs/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High IO utilization for bstore_kv_sync

2024-02-22 Thread Work Ceph
Thanks for the prompt response!

I see, and indeed some of them are consumer SSDs. Is there any
parameter that we can change/tune to better handle the "fdatasync" call?

Maybe using NVMe devices for RocksDB?

On Thu, Feb 22, 2024 at 2:24 PM Mark Nelson  wrote:

> Most likely you are seeing time spent waiting on fdatasync in
> bstore_kv_sync if the drives you are using don't have power loss
> protection and can't perform flushes quickly.  Some consumer-grade
> drives are actually slower at this than HDDs.
>
>
> Mark
>
>
> On 2/22/24 11:04, Work Ceph wrote:
> > Hello guys,
> > We are running Ceph Octopus on Ubuntu 18.04, and we are noticing spikes of
> > IO utilization for the bstore_kv_sync thread during processes such as
> > adding a new pool and increasing/reducing the number of PGs in a pool.
> >
> > It is funny, though, that the IO utilization (reported with iotop) is
> > 99.99%, but the reported R/W speeds are low. The devices where we are
> > seeing these issues are all SSD systems. We are not using high-end SSD
> > devices, though.
> >
> > Have you guys seen such behavior?
> >
> > Also, do you guys have any clues on why the IO utilization would be high
> > when there is such a small amount of data being read and written to the
> > OSD/disks?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: High IO utilization for bstore_kv_sync

2024-02-22 Thread Mark Nelson
Most likely you are seeing time spent waiting on fdatasync in 
bstore_kv_sync if the drives you are using don't have power loss 
protection and can't perform flushes quickly.  Some consumer-grade 
drives are actually slower at this than HDDs.
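
If you want a quick way to see whether a given drive is slow at this, a rough
micro-check like the sketch below writes a small block and calls fdatasync in
a loop, then reports the average per-flush time (point the file at the device
you want to test).  This is only an illustration; a tool like fio will give
you more rigorous numbers.  Drives with PLP typically come back in well under
a millisecond, while slow consumer drives can take several milliseconds per
flush:

#include <chrono>
#include <cstdio>
#include <vector>

#include <fcntl.h>
#include <unistd.h>

int main(int argc, char** argv) {
    // Put this file on the device under test.
    const char* path = argc > 1 ? argv[1] : "flush_test.bin";
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { std::perror("open"); return 1; }

    std::vector<char> block(4096, 'x');
    const int iters = 200;

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        if (write(fd, block.data(), block.size()) < 0) { std::perror("write"); return 1; }
        if (fdatasync(fd) < 0) { std::perror("fdatasync"); return 1; }
    }
    auto elapsed = std::chrono::duration<double, std::milli>(
        std::chrono::steady_clock::now() - start).count();

    std::printf("%d write+fdatasync pairs, %.2f ms average per flush\n",
                iters, elapsed / iters);
    close(fd);
    return 0;
}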



Mark


On 2/22/24 11:04, Work Ceph wrote:

Hello guys,
We are running Ceph Octopus on Ubuntu 18.04, and we are noticing spikes of
IO utilization for the bstore_kv_sync thread during processes such as
adding a new pool and increasing/reducing the number of PGs in a pool.

It is funny, though, that the IO utilization (reported with iotop) is 99.99%,
but the reported R/W speeds are low. The devices where we are seeing these
issues are all SSD systems. We are not using high-end SSD devices, though.

Have you guys seen such behavior?

Also, do you guys have any clues on why the IO utilization would be high
when there is such a small amount of data being read and written to the
OSD/disks?


--
Best Regards,
Mark Nelson
Head of Research and Development

Clyso GmbH
p: +49 89 21552391 12 | a: Minnesota, USA
w: https://clyso.com | e: mark.nel...@clyso.com

We are hiring: https://www.clyso.com/jobs/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io