Hey Nick,

Thanks for taking the time to answer my questions. Some in-line comments below.
On Tue, May 3, 2016 at 10:51 AM, Nick Fisk <n...@fisk.me.uk> wrote:
> Hi Peter,
>
> > -----Original Message-----
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> > Peter Kerdisle
> > Sent: 02 May 2016 08:17
> > To: ceph-users@lists.ceph.com
> > Subject: [ceph-users] Erasure pool performance expectations
> >
> > Hi guys,
> >
> > I am currently testing the performance of RBD using a cache pool and a 4/2
> > erasure profile pool.
> >
> > I have two SSD cache servers (2 SSDs for journals, 7 SSDs for data) with
> > 2x10Gbit bonded each, and six OSD nodes with a 10Gbit public and 10Gbit
> > cluster network for the erasure pool (10x3TB without separate journals).
> > This is all on Jewel.
> >
> > What I would like to know is whether the performance I'm seeing is to be
> > expected, and if there is some way to test this in a more quantifiable way.
> >
> > Everything works as expected if the files are present on the cache pool;
> > however, when things need to be retrieved from the erasure pool I see
> > performance degradation. I'm trying to simulate real usage as much as
> > possible by retrieving files from the RBD volume over FTP from a client
> > server. What I'm seeing is that the FTP transfer will stall for seconds at
> > a time and then get some more data, which results in an average speed of
> > 200KB/s. From the cache this is closer to 10MB/s. Is this the expected
> > behaviour from an erasure coded tier with a cache in front?
>
> Unfortunately yes. The whole erasure/cache thing only really works well if
> the data in the EC tier is accessed only infrequently; otherwise the
> overhead of cache promotion/flushing quickly brings the cluster to its
> knees. However, it looks as though you are mainly doing reads, which means
> you can probably alter your cache settings to not promote so aggressively
> on reads, as reads can be proxied through to the EC tier instead of
> promoting. This should reduce the number of required cache promotions.

You are correct that reads have a lower priority for being cached; ideally they should only be promoted when they are accessed very frequently.

> Can you try setting min_read_recency_for_promote to something higher?

I looked into that setting before, but I must admit its exact purpose still eludes me. Would it be correct to simplify it as "min_read_recency_for_promote determines the number of times a piece of data would have to be read in a certain interval (set by hit_set_period) in order to promote it to the caching tier"?

> Also can you check what your hit_set_period and hit_set_count are currently
> set to.

hit_set_count is set to 1 and hit_set_period to 1800. What would increasing the hit_set_count do exactly? (I've put the commands I use to check and change these settings at the end of this mail.)

> > Right now I'm unsure how to scientifically test the performance of
> > retrieving files when there is a cache miss. If somebody could point me
> > towards a better way of doing that I would appreciate the help.
> >
> > Another thing is that I'm seeing a lot of messages popping up in dmesg on
> > my client server, on which the RBD volumes are mounted.
(IPs removed)

> > [685881.477383] libceph: osd50 :6800 socket closed (con state OPEN)
> > [685895.597733] libceph: osd54 :6808 socket closed (con state OPEN)
> > [685895.663971] libceph: osd54 :6808 socket closed (con state OPEN)
> > [685895.710424] libceph: osd54 :6808 socket closed (con state OPEN)
> > [685895.749417] libceph: osd54 :6808 socket closed (con state OPEN)
> > [685896.517778] libceph: osd54 :6808 socket closed (con state OPEN)
> > [685906.690445] libceph: osd74 :6824 socket closed (con state OPEN)
> >
> > Is this a symptom of something?
>
> These are just stale connections to the OSDs timing out after the idle
> period and are nothing to worry about.

Glad to hear that; I was afraid something might be wrong.

Thanks again.

Peter
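P.S. For anyone following this thread, this is roughly how I've been checking and changing the cache-tier settings discussed above. "cachepool" is just a placeholder for the name of my cache pool, and the numbers are examples rather than recommendations:

  # Show the current hit set and read-promotion settings on the cache tier
  ceph osd pool get cachepool hit_set_count
  ceph osd pool get cachepool hit_set_period
  ceph osd pool get cachepool min_read_recency_for_promote

  # Example: keep 4 hit sets of 10 minutes each, and only promote an object
  # on read once it appears in at least 2 of the recent hit sets
  ceph osd pool set cachepool hit_set_count 4
  ceph osd pool set cachepool hit_set_period 600
  ceph osd pool set cachepool min_read_recency_for_promote 2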