Re: [ceph-users] Slow Ceph: Any plans on torrent-like transfers from OSDs ?

2018-09-14 Thread Maged Mokhtar



On 14/09/18 12:13, Alex Lupsa wrote:

Hi,
Thank you for the answer, Ronny. I did indeed try 2x RBD drives
(rbd-cache was already active), striped them, and got double
read/write speed instantly. So I am chalking this one up to KVM, which
seems to be single-threaded and not fully Ceph-aware. Although I can
see some threads talking about multi-threading coming to KVM.


I am however still of the opinion that all Ceph OSD replicas should be
read from in the future: the code is already there in the form of
recovery, so the implementation effort vs. the tremendous speeds should
be worth it!


About dm-cache or bcache on OSDs, which one would you recommend?
Alex


Hi Alex,

If the number of concurrent IO threads is higher than your total number
of OSDs, there is no point in dividing the load; it can actually reduce
performance, and the higher this ratio, the worse it gets. So in most
use cases, where you have many VMs doing a lot of IO, there would be no
benefit.


In the more specific case where you have more OSDs than concurrent IO
operations and your block sizes are not tiny, for example greater than
32k (you can find your average IO size by dividing cluster bandwidth by
IOPS), you may try the RBD striping feature and adjust your stripe
unit/count accordingly; this can be quite good, for example, for
low-concurrency streaming applications (see the sketch below). For low
IO concurrency with smaller block sizes, which is not a common use case,
probably go for RAID-level striping as you did, but maybe Ceph is not
ideal for such cases.
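
For reference, dividing bandwidth by IOPS from the rados bench output
quoted later in this thread gives 98.4 MB/s / 24 IOPS ≈ 4 MB, i.e. the
benchmark's own 4 MB object size. A minimal striping sketch, assuming a
pool named ceph_pool as in that output; the image name and the stripe
unit/count values are illustrative, not recommendations:

  # striping parameters can only be set at image creation time
  rbd create ceph_pool/vm-disk-1 --size 100G --image-format 2 \
      --stripe-unit 65536 --stripe-count 8

With a 64 KB stripe unit and a count of 8, a large sequential IO is
spread across 8 objects (and thus usually 8 OSDs) at once, instead of
filling one 4 MB object at a time.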


Maged



Re: [ceph-users] Slow Ceph: Any plans on torrent-like transfers from OSDs ?

2018-09-14 Thread Alex Lupsa
Hi,
Thank you for the answer, Ronny. I did indeed try 2x RBD drives (rbd-cache
was already active), striped them, and got double read/write speed
instantly. So I am chalking this one up to KVM, which seems to be
single-threaded and not fully Ceph-aware. Although I can see some threads
talking about multi-threading coming to KVM.

I am however still of the opinion that all Ceph OSD replicas should be read
from in the future: the code is already there in the form of recovery, so
the implementation effort vs. the tremendous speeds should be worth it!

About dm-cache or bcache on OSDs, which one would you recommend?

Alex






Re: [ceph-users] Slow Ceph: Any plans on torrent-like transfers from OSDs ?

2018-09-09 Thread Jarek
On Sun, 9 Sep 2018 11:20:01 +0200, Alex Lupsa wrote:

> Hi,
> Any ideas about the below?

Don't use consumer-grade SSDs for Ceph cache/block.db/bcache.
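
A quick way to see why: Ceph journals/WAL do synchronous writes, which
many consumer SSDs handle very poorly. A minimal check, assuming fio is
installed and /dev/sdX is the SSD under test (this writes to the raw
device, so it is destructive):

  fio --name=sync-write-test --filename=/dev/sdX --direct=1 --sync=1 \
      --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based

Consumer drives often drop to a few hundred IOPS on this test, while
enterprise SSDs with power-loss protection sustain tens of thousands.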



> Thanks,
> Alex
> 
> --
> Hi,
> I have a really small homelab 3-node Ceph cluster on consumer hardware -
> thanks to Proxmox for making it easy to deploy.
> The problem I am having is very, very bad transfer rates, i.e. 20 MB/s
> for both read and write on 17 OSDs with a cache layer.
> However, during recovery the speeds hover between 250 and 700 MB/s, which
> proves that the cluster IS capable of reaching way above those
> 20 MB/s in KVM.
> 
> Reading the documentation, I see that during recovery "nearly all OSDs
> participate in resilvering a new drive" - kind of a torrent of data
> incoming from multiple sources at once, causing a huge deluge.
> 
> However I believe this does not happen during normal transfers,
> so my question is simply - are there any hidden tunables I can enable
> for this, with the implied cost of network and heavy disk usage?
> If not, will there be in the future?
> 
> I have tried disabling cephx, upgrading the network to 10gbit, using
> bigger journals and more BlueStore cache, and disabling the debug logs
> as advised on the list. The only thing that did help a
> bit was cache tiering, but even that only helps somewhat, as ops do not
> get promoted unless I am very adamant about keeping programs in KVM
> open for very long times so that the writes/reads are promoted.
> To add insult to injury, once the cache gets full the whole 3-node
> cluster grinds to a halt until I start forcefully evicting data
> from the cache... manually!
> So I am therefore guessing a really bad misconfiguration on my side.
> 
> Next step would be removing the cache layer and using those SSDs as
> bcache instead, as that seems to yield 5x the results, even though it
> does add yet another layer of complexity and RAM requirements.
> 
> Full config details: https://pastebin.com/xUM7VF9k
> 
> rados bench -p ceph_pool 30 write
> Total time run:         30.983343
> Total writes made:      762
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     98.3754
> Stddev Bandwidth:       20.9586
> Max bandwidth (MB/sec): 132
> Min bandwidth (MB/sec): 16
> Average IOPS:           24
> Stddev IOPS:            5
> Max IOPS:               33
> Min IOPS:               4
> Average Latency(s):     0.645017
> Stddev Latency(s):      0.326411
> Max latency(s):         2.08067
> Min latency(s):         0.0355789
> Cleaning up (deleting benchmark objects)
> Removed 762 objects
> Clean up completed and total clean up time: 3.925631
> 
> Thanks,
> Alex



-- 
Regards,
Jarosław Mociak - Nettelekom GK Sp. z o.o.





Re: [ceph-users] Slow Ceph: Any plans on torrent-like transfers from OSDs ?

2018-09-09 Thread Ronny Aasen

Ceph is a distributed system; it scales through concurrent access to many nodes.

Generally a single client will access a single OSD at a time; in other
words, the maximum possible single-thread read is the read speed of one
drive, and the maximum possible write is roughly a single drive's write
speed divided by (replication size - 1) - e.g. a 120 MB/s drive with
replication size 3 tops out around 60 MB/s for one writer. But when you
have many VMs accessing the same cluster, the load is spread all over
(just like when you see recovery running).


A single spinning disk should be able to do 100-150 MB/s depending on 
make and model, even with the overhead of Ceph and networking, so I still 
think 20 MB/s is a bit on the low side, depending on how you benchmark.


I would start by going through this benchmarking guide and seeing if you 
find some issues:

https://tracker.ceph.com/projects/ceph/wiki/Benchmark_Ceph_Cluster_Performance
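
In particular, the guide's rados bench tests are worth running for reads
as well as writes; a sketch, assuming the pool is named ceph_pool as in
your output (--no-cleanup keeps the benchmark objects around so the read
tests have data to read):

  rados bench -p ceph_pool 30 write --no-cleanup
  rados bench -p ceph_pool 30 seq     # sequential reads
  rados bench -p ceph_pool 30 rand    # random reads
  rados -p ceph_pool cleanup          # remove the benchmark objects

Comparing the seq read bandwidth against a single VM's read speed shows
how much is lost between librados and the guest.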


In order to get more single-thread performance out of Ceph you must get 
faster individual parts (NVMe disks, fast RAM and processors, fast 
network, etc.), or you can cheat by spreading the load over more disks: 
e.g. you can use RBD's fancy striping, or attach multiple disks with 
individual controllers in the VM and stripe across them (a sketch 
follows below), or use caching and/or readahead.
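
The multiple-disks approach is just RAID-0 inside the guest; a minimal
sketch, assuming two extra RBD-backed disks show up in the VM as
/dev/vdb and /dev/vdc (hypothetical device names):

  # stripe two RBD-backed virtio disks inside the guest
  mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/vdb /dev/vdc
  mkfs.xfs /dev/md0
  # readahead also helps single-threaded sequential reads
  blockdev --setra 8192 /dev/md0   # in 512-byte sectors, i.e. 4 MB

Each member disk gets its own virtio device, so a single sequential
stream in the guest can become two parallel streams against the cluster.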



When it comes to cache tiering I would remove that; it does not get the 
love it needs, and Red Hat has even stopped supporting it in deployments.

But you can use dm-cache or bcache on the OSDs
and/or rbd-cache on the KVM clients (a sketch of both follows below).
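
Neither of those is Ceph-specific configuration, but as a rough sketch
of what both look like - the device names and cache sizes here are
illustrative assumptions, not recommendations:

  # bcache: SSD partition as cache, spinner as backing device; the OSD
  # is then built on top of /dev/bcache0 (destroys existing data)
  make-bcache -C /dev/nvme0n1p1 -B /dev/sdb

  # rbd-cache: client-side settings in ceph.conf on the KVM hosts
  [client]
  rbd cache = true
  rbd cache size = 67108864                  # 64 MB per image
  rbd cache max dirty = 50331648             # writeback throttles past 48 MB
  rbd cache writethrough until flush = true  # stays safe until the guest flushes

Note that rbd cache mainly helps by coalescing small writes; it will not
do much for large sequential reads.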


good luck
Ronny Aasen


On 09.09.2018 11:20, Alex Lupsa wrote:

Hi,
Any ideas about the below?

Thanks,
Alex

--
Hi,
I have a really small homelab 3-node Ceph cluster on consumer hardware -
thanks to Proxmox for making it easy to deploy.
The problem I am having is very, very bad transfer rates, i.e. 20 MB/s for
both read and write on 17 OSDs with a cache layer.
However, during recovery the speeds hover between 250 and 700 MB/s, which
proves that the cluster IS capable of reaching way above those 20 MB/s in
KVM.

Reading the documentation, I see that during recovery "nearly all OSDs
participate in resilvering a new drive" - kind of a torrent of data
incoming from multiple sources at once, causing a huge deluge.

However I believe this does not happen during normal transfers, so my
question is simply - are there any hidden tunables I can enable for this,
with the implied cost of network and heavy disk usage? If not, will there
be in the future?

I have tried disabling cephx, upgrading the network to 10gbit, using bigger
journals and more BlueStore cache, and disabling the debug logs as advised
on the list. The only thing that did help a bit was cache tiering, but even
that only helps somewhat, as ops do not get promoted unless I am very
adamant about keeping programs in KVM open for very long times so that the
writes/reads are promoted.
To add insult to injury, once the cache gets full the whole 3-node cluster
grinds to a halt until I start forcefully evicting data from the cache...
manually!
So I am therefore guessing a really bad misconfiguration on my side.

Next step would be removing the cache layer and using those SSDs as bcache
instead, as that seems to yield 5x the results, even though it does add yet
another layer of complexity and RAM requirements.

Full config details:
https://pastebin.com/xUM7VF9k

rados bench -p ceph_pool 30 write
Total time run:         30.983343
Total writes made:      762
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     98.3754
Stddev Bandwidth:       20.9586
Max bandwidth (MB/sec): 132
Min bandwidth (MB/sec): 16
Average IOPS:           24
Stddev IOPS:            5
Max IOPS:               33
Min IOPS:               4
Average Latency(s):     0.645017
Stddev Latency(s):      0.326411
Max latency(s):         2.08067
Min latency(s):         0.0355789
Cleaning up (deleting benchmark objects)
Removed 762 objects
Clean up completed and total clean up time: 3.925631

Thanks,
Alex





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com