[ceph-users] Re: Unexpected slow read for HDD cluster (good write speed)

2023-03-28 Thread Marc
Yes, it pays off to know what to do before you do it, instead of after. If you 
complain about speed, is it a general unfounded complaint, or did you compare 
Ceph with similar solutions? I really have no idea what the standards are for 
these types of solutions. I remember asking at such a seminar that 'general' 
performance numbers should be published for every release, so people do not have 
to go through the ordeal of investigating on their own. However, I also accept 
that there is a technical performance limit to distributing your data like this.

I bookmarked this quite a while ago; if you are in dire need, you can do some 
external caching for RBDs:
https://docs.ceph.com/en/pacific/rbd/rbd-persistent-write-back-cache/
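
A rough sketch of what enabling it can look like, assuming a Pacific or newer 
librbd client with a local NVMe/SSD path on the client host; the pool name 
"mypool", the cache path and the size are placeholders, so check the linked 
docs for the exact options of your version:

    # load the write-back cache plugin for librbd clients
    ceph config set client rbd_plugins pwl_cache
    # SSD mode, backed by a local fast device on the client host
    ceph config set client rbd_persistent_cache_mode ssd
    ceph config set client rbd_persistent_cache_path /mnt/nvme/rbd-pwl-cache
    ceph config set client rbd_persistent_cache_size 10G
    # or enable it only for a specific pool instead of globally
    rbd config pool set mypool rbd_persistent_cache_mode ssd

Note that, as the name says, it caches writes rather than reads.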


> Yes, during my last adventure of trying to get any reasonable
> performance out of Ceph, I realized my testing methodology was wrong.
> Both the kernel client and QEMU have queues everywhere that make the
> numbers hard to interpret.
> 
> fio has RBD support, which gives more useful values:
> 
> https://subscription.packtpub.com/book/cloud-&-networking/9781784393502/10/ch10lvl1sec112/benchmarking-ceph-rbd-using-fio
> 
> Frustratingly, the numbers it reports are much lower, showing just how slow
> Ceph actually is.
> 
> On Sat, Mar 18, 2023 at 8:59 PM Rafael Weingartner
>  wrote:
> >
> > Hello guys!
> >
> > I would like to ask if somebody has already experienced a similar
> > situation. We have a new cluster with 5 nodes with the following setup:
> >
> >- 128 GB of RAM
> >- 2 Intel Xeon Silver 4210R CPUs
> >- 1 NVMe of 2 TB for RocksDB caching
> >- 5 HDDs of 14 TB
> >- 1 dual-port 25 Gb NIC in bond mode.
> >
> >
> > We are starting with a single dual-port NIC (the bond has 50 Gb in total).
> > The design has been prepared so that a new NIC can be added and a new bond
> > can be created, onto which we intend to offload the cluster network.
> > Therefore, logically speaking, we have already configured different VLANs
> > and networks for Ceph's public and cluster traffic.
> >
> >
> > We are using Ubuntu 20.04 with Ceph Octopus. It is a standard deployment
> > that we are used to. During our initial validations and evaluations of the
> > cluster, we are reaching write speeds between 250-300 MB/s, which would be
> > in the ballpark for this kind of setup of HDDs with NVMe as RocksDB cache
> > (in our experience). However, the issue is the read path. While reading,
> > we barely hit the mark of 100 MB/s; we would expect at least something
> > similar to the write speed. These tests are being performed in a pool with
> > a replication factor of 3.
> >
> >
> > We have already checked the disks, and they all seem to be reading just
> > fine. The network does not seem to be the bottleneck either (checked with
> > atop while reading from/writing to the cluster).
> >
> >
> > Have you guys ever encountered similar situations? Do you have any tips
> > for us to proceed with the troubleshooting?
> >
> >
> > We suspect that we are missing some small tuning detail, which is
> > affecting the read performance only, but so far we have not been able to
> > pinpoint it. Any help would be much appreciated :)


[ceph-users] Re: Unexpected slow read for HDD cluster (good write speed)

2023-03-28 Thread Arvid Picciani
Yes, during my last adventure of trying to get any reasonable
performance out of Ceph, I realized my testing methodology was wrong.
Both the kernel client and QEMU have queues everywhere that make the
numbers hard to interpret.

fio has RBD support, which gives more useful values:

https://subscription.packtpub.com/book/cloud-&-networking/9781784393502/10/ch10lvl1sec112/benchmarking-ceph-rbd-using-fio

Frustratingly, the numbers it reports are much lower, showing just how slow Ceph actually is.
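
For reference, a minimal fio job file for the rbd ioengine can look like the
sketch below (requires an fio build with rbd support). The pool, client and
image names are just placeholders, and the image has to exist beforehand
(e.g. rbd create -p rbd --size 100G fio-test):

    # sequential 4M reads straight through librbd, bypassing the
    # kernel-client and QEMU queues mentioned above
    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=fio-test
    bs=4M
    time_based
    runtime=60

    [seq-read]
    rw=read
    iodepth=16

Run it with "fio rbd-read.fio"; because it talks to librbd directly, the
numbers reflect what the cluster itself delivers rather than what the
guest's queues smooth over.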

On Sat, Mar 18, 2023 at 8:59 PM Rafael Weingartner
 wrote:
>
> Hello guys!
>
> I would like to ask if somebody has already experienced a similar
> situation. We have a new cluster with 5 nodes with the following setup:
>
>- 128 GB of RAM
>- 2 Intel Xeon Silver 4210R CPUs
>- 1 NVMe of 2 TB for RocksDB caching
>- 5 HDDs of 14 TB
>- 1 dual-port 25 Gb NIC in bond mode.
>
>
> We are starting with a single dual-port NIC (the bond has 50 Gb in total).
> The design has been prepared so that a new NIC can be added and a new bond
> can be created, onto which we intend to offload the cluster network.
> Therefore, logically speaking, we have already configured different VLANs
> and networks for Ceph's public and cluster traffic.
>
>
> We are using Ubuntu 20.04 with Ceph Octopus. It is a standard deployment
> that we are used to. During our initial validations and evaluations of the
> cluster, we are reaching write speeds between 250-300 MB/s, which would be
> in the ballpark for this kind of setup of HDDs with NVMe as RocksDB cache
> (in our experience). However, the issue is the read path. While reading,
> we barely hit the mark of 100 MB/s; we would expect at least something
> similar to the write speed. These tests are being performed in a pool with
> a replication factor of 3.
>
>
> We have already checked the disks, and they all seem to be reading just
> fine. The network does not seem to be the bottleneck either (checked with
> atop while reading from/writing to the cluster).
>
>
> Have you guys ever encountered similar situations? Do you have any tips
> for us to proceed with the troubleshooting?
>
>
> We suspect that we are missing some small tuning detail, which is
> affecting the read performance only, but so far we have not been able to
> pinpoint it. Any help would be much appreciated :)



-- 
+4916093821054


[ceph-users] Re: Unexpected slow read for HDD cluster (good write speed)

2023-03-20 Thread Janne Johansson
On Mon, 20 Mar 2023 at 09:45, Marc wrote:
>
> > While reading, we barely hit the mark of 100 MB/s; we would expect at
> > least something similar to the write speed. These tests are being
> > performed in a pool with a replication factor of 3.
> >
> >
>
> You don't even describe how you test. And why would you expect something
> like the write speed? As you described it, that is a totally different
> configuration.

Yes, writes hit caches and can be async, whereas reads (at least large,
relevant read tests) need to get the actual data off the drives, not
from caches.
Getting 100 MB/s on data that is not in caches from HDDs seems reasonable
for a simplistic test (i.e. one where you request a certain amount of
data, wait for it to arrive, then read a bit more and repeat for a
number of iterations).

If you instead spin up 10 guests and have all of them run read tests,
chances are you are going to see huge speedups. Not per guest, but in
the sum of IO that your cluster can handle, which is what Ceph is
aiming for: being a good cluster storage system for cluster usage.
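
To illustrate the point (a sketch only; the pool and image names are
assumptions, and the dataset should be larger than the OSD hosts' RAM so
caches stay out of the picture), you can see the same effect from a single
client by scaling fio's parallelism:

    # one reader at queue depth 1: roughly what a simplistic test measures
    fio --name=serial-read --ioengine=rbd --clientname=admin \
        --pool=rbd --rbdname=fio-test --rw=read --bs=4M --iodepth=1 \
        --time_based --runtime=60

    # many parallel readers: closer to the aggregate the cluster can deliver
    fio --name=parallel-read --ioengine=rbd --clientname=admin \
        --pool=rbd --rbdname=fio-test --rw=randread --bs=4M \
        --iodepth=16 --numjobs=8 --group_reporting \
        --time_based --runtime=60

The per-job numbers will not move much, but the group total should, which is
the point about aggregate IO above.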

-- 
May the most significant bit of your life be positive.


[ceph-users] Re: Unexpected slow read for HDD cluster (good write speed)

2023-03-20 Thread Marc
> While reading, we barely hit the mark of 100 MB/s; we would expect at
> least something similar to the write speed. These tests are being
> performed in a pool with a replication factor of 3.
> 
> 

You don't even describe how you test. And why would you expect something like 
the write speed? As you described it, that is a totally different configuration.
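
For what it is worth, an easy-to-describe and reproducible baseline is rados
bench (the pool name, runtime and concurrency below are just examples); for
cold-read numbers, drop the page cache on the OSD hosts between the write and
read phases:

    # write for 60s with 16 concurrent ops and keep the objects
    rados bench -p testpool 60 write -t 16 --no-cleanup

    # read the same objects back, sequentially and randomly
    rados bench -p testpool 60 seq -t 16
    rados bench -p testpool 60 rand -t 16

    # remove the benchmark objects afterwards
    rados -p testpool cleanup

That at least gives numbers other people on the list can compare against.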