Hi Peter,

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Peter Kerdisle
> Sent: 02 May 2016 08:17
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Erasure pool performance expectations
> 
> Hi guys,
> 
> I am currently testing the performance of RBD using a cache pool and a 4/2
> erasure profile pool.
> 
> I have two SSD cache servers (2 SSDs for journals, 7 SSDs for data) with
> 2x10Gbit bonded each and a six OSD nodes with a 10Gbit public and 10Gbit
> cluster network for the erasure pool (10x3TB without separate journal). This
> is all on Jewel.
> 
> What I would like to know is if the performance I'm seeing is to be expected
> and if there is some way to test this in a more qualifiable way.
> 
> Everything works as expected if the files are present on the cache pool,
> however when things need to be retrieved from the cache pool I see
> performance degradation. I'm trying to simulate real usage as much as
> possible and trying to retrieve files from the RBD volume over FTP from a
> client server. What I'm seeing is that the FTP transfer will stall for 
> seconds at a
> time and then get some more data which results in an average speed of
> 200KB/s. From the cache this is closer to 10MB/s. Is this the expected
> behaviour from a erasure coded tier with cache in front?

Unfortunately, yes. The erasure-coded pool + cache tier combination only really 
works well if the data in the EC tier is accessed infrequently; otherwise the 
overhead of cache promotion/flushing quickly brings the cluster to its knees. 
However, it looks as though you are mainly doing reads, which means you can 
probably alter your cache settings so that reads don't promote so aggressively, 
since reads can be proxied through to the EC tier instead of triggering a 
promotion. This should reduce the number of cache promotions required.

Can you try setting min_read_recency_for_promote to something higher?
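
Something along these lines should do it (substitute your actual cache pool 
name; a value of 2 is just a starting point to experiment with, not a 
recommendation):

  ceph osd pool set <cache-pool> min_read_recency_for_promote 2

With that set, an object is only promoted on read once it has appeared in that 
many recent hit sets, so one-off reads get proxied straight through to the EC 
tier rather than forcing a promotion.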

Also, can you check what your hit_set_period and hit_set_count are currently 
set to?
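
You can check them with:

  ceph osd pool get <cache-pool> hit_set_count
  ceph osd pool get <cache-pool> hit_set_period

The recency setting only has something to work against if you are keeping 
several hit sets, so if hit_set_count is 0 or 1 it will need raising as well 
before the setting above makes any difference.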


> Right now I'm unsure how to scientifically test the performance retrieving
> files when there is a cache miss. If somebody could point me towards a
> better way of doing that I would appreciate the help.
> 
> An other thing is that I'm seeing a lot of messages popping up in dmesg on
> my client server on which the RBD volumes are mounted. (IPs removed)
> 
> [685881.477383] libceph: osd50 :6800 socket closed (con state OPEN)
> [685895.597733] libceph: osd54 :6808 socket closed (con state OPEN)
> [685895.663971] libceph: osd54 :6808 socket closed (con state OPEN)
> [685895.710424] libceph: osd54 :6808 socket closed (con state OPEN)
> [685895.749417] libceph: osd54 :6808 socket closed (con state OPEN)
> [685896.517778] libceph: osd54 :6808 socket closed (con state OPEN)
> [685906.690445] libceph: osd74 :6824 socket closed (con state OPEN)
> 
> Is this a symptom of something?

This is just stale connections to the OSDs timing out after the idle period 
and is nothing to worry about.

> 
> Thanks in advance,
> 
> Peter

