Re: [ceph-users] requests are blocked - problem
On 08/19/2015 03:41 PM, Nick Fisk wrote:
> Although you may get some benefit from tweaking parameters, I suspect
> you are nearing the performance ceiling of the current implementation
> of the tiering code. Could you post all the variables you set for the
> tiering, including target_max_bytes and the dirty/full ratios?

Sure, all the parameters are set like this (a sketch of how they are
applied follows below):

hit_set_type bloom
hit_set_count 1
hit_set_period 3600
target_max_bytes 65498264640
target_max_objects 100
cache_target_full_ratio 0.95
cache_min_flush_age 600
cache_min_evict_age 1800
cache_target_dirty_ratio 0.75

> Since you are doing maildirs, which will have lots of small files, you
> might also want to try making the object size of the RBD smaller. This
> will mean less data needs to be shifted on each promotion/flush.

I'll try that - thanks!

J

--
Jacek Jarosiewicz
SUPERMEDIA Sp. z o.o.
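For reference, cache tier parameters like the ones above are set per
pool with "ceph osd pool set". A minimal sketch, assuming a cache pool
named "cachepool" (a placeholder, substitute your own pool name):

  ceph osd pool set cachepool hit_set_type bloom
  ceph osd pool set cachepool hit_set_count 1
  ceph osd pool set cachepool hit_set_period 3600
  ceph osd pool set cachepool target_max_bytes 65498264640
  ceph osd pool set cachepool cache_target_dirty_ratio 0.75
  ceph osd pool set cachepool cache_target_full_ratio 0.95

  # read a value back to confirm it took effect
  ceph osd pool get cachepool cache_target_full_ratio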
Re: [ceph-users] requests are blocked - problem
On 08/20/2015 03:07 AM, Christian Balzer wrote:
> For a realistic comparison with your current setup, a total rebuild
> would be in order. Provided your cluster is testing only at this point.
> Given your current HW, that means the same 2-3 HDDs per storage node
> and 1 SSD as journal.
>
> What exact maker/model are your SSDs?
>
> Again, more HDDs means more (sustainable) IOPS, so unless your space
> requirements (data and physical) are very demanding, double the amount
> of 3TB HDDs would be noticeably better.
>
> Christian

The cluster is a test setup; eventually we will have many more HDDs (we
will probably fill both chassis with drives - 44 drives each), we are
just starting with this number for testing purposes.

For the SSDs we use the Intel DC S3710 (we ran into O_DSYNC problems
with Intel 530 series drives before, so we switched to the recommended
ones).

By total rebuild do you mean reinitializing the whole cluster
(monitors/OSDs) and starting from scratch?

J

--
Jacek Jarosiewicz
SUPERMEDIA Sp. z o.o.
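The O_DSYNC weakness of consumer drives like the 530 shows up in the
usual journal dd test. A rough sketch; note that it writes to the device
directly and destroys data, so /dev/sdX (a placeholder) must be a
scratch disk:

  # sequential 4k synchronous writes, the same pattern a Ceph journal
  # produces; journal-class SSDs sustain this at speed, the 530 crawls
  dd if=/dev/zero of=/dev/sdX bs=4k count=100000 oflag=direct,dsync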
Re: [ceph-users] requests are blocked - problem
-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jacek Jarosiewicz
Sent: 20 August 2015 07:31
To: Nick Fisk <n...@fisk.me.uk>; ceph-us...@ceph.com
Subject: Re: [ceph-users] requests are blocked - problem

> On 08/19/2015 03:41 PM, Nick Fisk wrote:
>> Although you may get some benefit from tweaking parameters, I suspect
>> you are nearing the performance ceiling of the current implementation
>> of the tiering code. Could you post all the variables you set for the
>> tiering, including target_max_bytes and the dirty/full ratios?
>
> Sure, all the parameters are set like this:
>
> hit_set_type bloom
> hit_set_count 1
> hit_set_period 3600
> target_max_bytes 65498264640
> target_max_objects 100
> cache_target_full_ratio 0.95
> cache_min_flush_age 600
> cache_min_evict_age 1800
> cache_target_dirty_ratio 0.75

That pretty much looks OK to me; the only thing I can suggest is maybe
to lower the full ratio a bit (see the sketch below). The full ratio is
based on a percentage across the whole pool, but the actual eviction
occurs at a percentage at the PG level. I think this may mean that in
certain cases a PG can block whilst it evicts, even though it appears
the pool hasn't reached the full target.

>> Since you are doing maildirs, which will have lots of small files, you
>> might also want to try making the object size of the RBD smaller. This
>> will mean less data needs to be shifted on each promotion/flush.
>
> I'll try that - thanks!
>
> J
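Nick's suggestion above translates to something like this ("cachepool"
is a placeholder, and how far to lower the ratio is a judgment call):

  # leave PG-level eviction more headroom before writes start blocking
  ceph osd pool set cachepool cache_target_full_ratio 0.8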
Re: [ceph-users] requests are blocked - problem
Hello,

On Thu, 20 Aug 2015 08:25:16 +0200 Jacek Jarosiewicz wrote:

> On 08/20/2015 03:07 AM, Christian Balzer wrote:
>> For a realistic comparison with your current setup, a total rebuild
>> would be in order. Provided your cluster is testing only at this
>> point. Given your current HW, that means the same 2-3 HDDs per
>> storage node and 1 SSD as journal.
>>
>> What exact maker/model are your SSDs?
>>
>> Again, more HDDs means more (sustainable) IOPS, so unless your space
>> requirements (data and physical) are very demanding, double the
>> amount of 3TB HDDs would be noticeably better.
>
> The cluster is a test setup; eventually we will have many more HDDs
> (we will probably fill both chassis with drives - 44 drives each), we
> are just starting with this number for testing purposes.
>
> For the SSDs we use the Intel DC S3710 (we ran into O_DSYNC problems
> with Intel 530 series drives before, so we switched to the recommended
> ones).

Ah yes. The S3710s are 200GB, I guess the 240GB came from the 530s. ^o^

Also, for use as journals the older S3700 is actually better suited, as
it has a higher sequential write rate. In everything else the S3710 is
faster.

> By total rebuild do you mean reinitializing the whole cluster
> (monitors/OSDs) and starting from scratch?

Well, not the monitors, but starting from scratch might actually be the
fastest way. Unless you have another 10 HDDs and 4 SSDs (1 per storage
node) for journals lying around, to get a fair comparison to your
current install. Realistically you would of course compare something
that is about the same size and cost, so more like 16 HDDs.

Christian

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
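If you do decide to wipe everything and the cluster was stood up with
ceph-deploy, teardown is short. A sketch, assuming the original deploy
directory is still on the admin node and using node names from this
thread (cf01-cf03; adjust for your fourth node):

  ceph-deploy purge cf01 cf02 cf03       # remove ceph packages
  ceph-deploy purgedata cf01 cf02 cf03   # wipe /var/lib/ceph and /etc/ceph
  ceph-deploy forgetkeys                 # drop old keys before redeploying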
Re: [ceph-users] requests are blocked - problem
Hi,

On 08/19/2015 11:01 AM, Christian Balzer wrote:
> Hello,
>
> That's a pretty small cluster all things considered, so your rather
> intensive test setup is likely to run into any or all of the following
> issues:
>
> 1) The amount of data you're moving around is going to cause a lot of
> promotions from and to the cache tier. This is expensive and slow.
>
> 2) EC coded pools are slow. You may actually have better results with
> a Ceph classic approach, 2-4 HDDs per journal SSD. Also, 6TB HDDs
> combined with EC may look nice to you from a cost/density perspective,
> but more HDDs means more IOPS and thus speed.
>
> 3) Scrubbing (unless configured very aggressively down) will impact
> your performance on top of the items above.
>
> 4) You already noted the kernel versus userland bit.
>
> 5) Having all your storage in a single JBOD chassis strikes me as
> ill-advised, though I don't think it's an actual bottleneck at
> 4x12Gb/s.

We use two of these (I forgot to mention that). Each chassis has two
internal controllers, both exposing all the disks to the connected
hosts. There are two OSD nodes connected to each chassis.

> When you ran the fio tests I assume nothing else was going on and the
> dataset size would have fit easily into the cache pool, right?
>
> Look at your nodes with atop or iostat, I venture all your HDDs are at
> 100%.
>
> Christian

Yes, the problem was a full cache pool. I'm currently wondering how to
tune the cache pool parameters so that the whole cluster doesn't slow
down that much when the cache is full...

I'm thinking of doing some tests on a pool without the cache tier so I
can compare the results. Any suggestions would be greatly appreciated.

J

--
Jacek Jarosiewicz
SUPERMEDIA Sp. z o.o.
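Christian's saturation check is quick to run, for example on each OSD
node while the rsync/dd workload is active:

  # extended per-device stats every 2 seconds; a %util column pinned
  # near 100 on the spinning OSD drives confirms they are the bottleneck
  iostat -x 2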
Re: [ceph-users] requests are blocked - problem
On 08/19/2015 10:58 AM, Nick Fisk wrote:
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jacek Jarosiewicz
> Sent: 19 August 2015 09:29
> To: ceph-us...@ceph.com
> Subject: [ceph-users] requests are blocked - problem
>
> I would suggest running the fio tests again, just to make sure that
> there isn't a problem with your newer config, but I suspect you will
> see equally bad performance with the fio tests now that the cache tier
> has begun to be more populated.

OK, I did the tests and you're right - the full cache was the problem.
After flushing the cache and re-running fio, the results are good (fast)
again.

Is there a way to tune the cache parameters so that the whole cluster
doesn't slow down that much and doesn't block requests? We use the
defaults for the cache pool from the documentation:

hit_set_period 3600
cache_min_flush_age 600
cache_min_evict_age 1800

J

--
Jacek Jarosiewicz
SUPERMEDIA Sp. z o.o.
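For what it's worth, the flush Jacek describes can be done in one
command. A sketch, with "cachepool" again a placeholder; it walks every
object in the tier, so it can take a long time on a full cache:

  rados -p cachepool cache-flush-evict-all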
Re: [ceph-users] requests are blocked - problem
-----Original Message-----
From: Jacek Jarosiewicz [mailto:jjarosiew...@supermedia.pl]
Sent: 19 August 2015 14:28
To: Nick Fisk <n...@fisk.me.uk>; ceph-us...@ceph.com
Subject: Re: [ceph-users] requests are blocked - problem

> On 08/19/2015 10:58 AM, Nick Fisk wrote:
>> I would suggest running the fio tests again, just to make sure that
>> there isn't a problem with your newer config, but I suspect you will
>> see equally bad performance with the fio tests now that the cache
>> tier has begun to be more populated.
>
> OK, I did the tests and you're right - the full cache was the problem.
> After flushing the cache and re-running fio, the results are good
> (fast) again.
>
> Is there a way to tune the cache parameters so that the whole cluster
> doesn't slow down that much and doesn't block requests? We use the
> defaults for the cache pool from the documentation:
>
> hit_set_period 3600
> cache_min_flush_age 600
> cache_min_evict_age 1800

Although you may get some benefit from tweaking parameters, I suspect
you are nearing the performance ceiling of the current implementation
of the tiering code. Could you post all the variables you set for the
tiering, including target_max_bytes and the dirty/full ratios?

Since you are doing maildirs, which will have lots of small files, you
might also want to try making the object size of the RBD smaller (see
the sketch below). This will mean less data needs to be shifted on each
promotion/flush.

> J
>
> --
> Jacek Jarosiewicz
> SUPERMEDIA Sp. z o.o.
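On hammer, the RBD object size is fixed at image creation time via
--order (the object size is 2^order bytes; order 22, i.e. 4 MB, is the
default). A sketch of Nick's suggestion, with hypothetical pool and
image names:

  # 2^20 = 1 MB objects instead of the default 4 MB, so each
  # promotion/flush moves a quarter of the data
  rbd create rbd/maildir-test --size 102400 --order 20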
Re: [ceph-users] requests are blocked - problem
Hello,

On Wed, 19 Aug 2015 15:27:29 +0200 Jacek Jarosiewicz wrote:

> Hi,
>
> On 08/19/2015 11:01 AM, Christian Balzer wrote:
>> Hello,
>>
>> That's a pretty small cluster all things considered, so your rather
>> intensive test setup is likely to run into any or all of the
>> following issues:
>>
>> 1) The amount of data you're moving around is going to cause a lot of
>> promotions from and to the cache tier. This is expensive and slow.
>>
>> 2) EC coded pools are slow. You may actually have better results with
>> a Ceph classic approach, 2-4 HDDs per journal SSD. Also, 6TB HDDs
>> combined with EC may look nice to you from a cost/density
>> perspective, but more HDDs means more IOPS and thus speed.
>>
>> 3) Scrubbing (unless configured very aggressively down) will impact
>> your performance on top of the items above.
>>
>> 4) You already noted the kernel versus userland bit.
>>
>> 5) Having all your storage in a single JBOD chassis strikes me as
>> ill-advised, though I don't think it's an actual bottleneck at
>> 4x12Gb/s.
>
> We use two of these (I forgot to mention that). Each chassis has two
> internal controllers, both exposing all the disks to the connected
> hosts. There are two OSD nodes connected to each chassis.

Ah, so you have the dual controller version.

>> When you ran the fio tests I assume nothing else was going on and the
>> dataset size would have fit easily into the cache pool, right?
>>
>> Look at your nodes with atop or iostat, I venture all your HDDs are
>> at 100%.
>
> Yes, the problem was a full cache pool. I'm currently wondering how to
> tune the cache pool parameters so that the whole cluster doesn't slow
> down that much when the cache is full...

Nick already gave you some advice on this; however, with the current
versions of Ceph, cache tiering is simply expensive and slow.

> I'm thinking of doing some tests on a pool without the cache tier so I
> can compare the results. Any suggestions would be greatly appreciated.

For a realistic comparison with your current setup, a total rebuild
would be in order. Provided your cluster is testing only at this point.
Given your current HW, that means the same 2-3 HDDs per storage node
and 1 SSD as journal.

What exact maker/model are your SSDs?

Again, more HDDs means more (sustainable) IOPS, so unless your space
requirements (data and physical) are very demanding, double the amount
of 3TB HDDs would be noticeably better.

Christian

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
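Whatever layout the rebuilt cluster ends up with, running the same
benchmark before and after is what makes the comparison fair. A sketch
using rados bench against a hypothetical test pool named "testpool":

  # 60s of object writes with 16 concurrent ops; keep the objects
  # so the sequential-read pass has something to read back
  rados bench -p testpool 60 write -t 16 --no-cleanup
  rados bench -p testpool 60 seq -t 16
  # remove the benchmark objects afterwards
  rados -p testpool cleanup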
Re: [ceph-users] requests are blocked - problem
-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jacek Jarosiewicz
Sent: 19 August 2015 09:29
To: ceph-us...@ceph.com
Subject: [ceph-users] requests are blocked - problem

> Hi,
>
> Our setup is this:
>
> 4 x OSD nodes:
>   E5-1630 CPU
>   32 GB RAM
>   Mellanox MT27520 56Gbps network cards
>   SATA controller LSI Logic SAS3008
>
> Storage nodes are connected to a SuperMicro chassis: 847E1C-R1K28JBOD.
> Each node has 2-3 spinning OSDs (6TB drives) and 2 SSD OSDs (240GB
> drives), 3 monitors running on the OSD nodes, ceph hammer 0.94.2,
> Ubuntu 14.04, cache tier with an EC pool (3+1).
>
> We've run some tests on the cluster and the results were promising -
> speeds, IOPS etc. as expected - but now we have tried to use more than
> one client for testing and ran into some problems:

When you ran this test did you fill the RBD enough that it would have
been flushing the dirty contents down to the base tier?

> We've created a couple of RBD images and mapped them on clients
> (kernel rbd), running two rsync processes and one dd on a large number
> of files (~250 GB of maildirs rsync'ed from one RBD image to the
> other, and a dd process writing one big 2TB file to another RBD
> image).
>
> The speeds now are less than OK, plus we get a lot of "requests are
> blocked" warnings. We left the processes running overnight, but this
> morning none of them had finished - they are able to write data, but
> at a very, very slow rate.

Once you have written to a block once, if the underlying object exists
on the EC pool, it will have to be promoted to the cache pool before it
can be written to. This can have a severe impact on performance,
especially if you are hitting lots of different blocks and the tiering
agent can't keep up with the promotion requests.

> Please help me diagnose this problem. Everything seems to work, just
> very, very slowly... When we ran the tests with fio (librbd engine)
> everything seemed fine. I know the kernel implementation is slower,
> but is this normal? I can't understand why so many requests are
> blocked.

I would suggest running the fio tests again (a sketch follows after the
quoted diagnostics below), just to make sure that there isn't a problem
with your newer config, but I suspect you will see equally bad
performance with the fio tests now that the cache tier has begun to be
more populated.
> Some diagnostic data:
>
> root@cf01:/var/log/ceph# ceph -s
>     cluster 3469081f-9852-4b6e-b7ed-900e77c48bb5
>      health HEALTH_WARN
>             31 requests are blocked > 32 sec
>      monmap e1: 3 mons at
>             {cf01=10.4.10.211:6789/0,cf02=10.4.10.212:6789/0,cf03=10.4.10.213:6789/0}
>             election epoch 202, quorum 0,1,2 cf01,cf02,cf03
>      osdmap e1319: 18 osds: 18 up, 18 in
>       pgmap v933010: 2112 pgs, 19 pools, 10552 GB data, 2664 kobjects
>             14379 GB used, 42812 GB / 57192 GB avail
>                 2111 active+clean
>                    1 active+clean+scrubbing
>   client io 0 B/s rd, 12896 kB/s wr, 35 op/s
>
> root@cf01:/var/log/ceph# ceph health detail
> HEALTH_WARN 23 requests are blocked > 32 sec; 6 osds have slow requests
> 1 ops are blocked > 131.072 sec
> 22 ops are blocked > 65.536 sec
> 1 ops are blocked > 65.536 sec on osd.2
> 1 ops are blocked > 65.536 sec on osd.3
> 1 ops are blocked > 65.536 sec on osd.4
> 1 ops are blocked > 131.072 sec on osd.7
> 18 ops are blocked > 65.536 sec on osd.10
> 1 ops are blocked > 65.536 sec on osd.12
> 6 osds have slow requests
>
> root@cf01:/var/log/ceph# grep WRN ceph.log | tail -50
> 2015-08-19 10:23:34.505669 osd.14 10.4.10.213:6810/21207 17942 : cluster [WRN] 2 slow requests, 2 included below; oldest blocked for > 30.575870 secs
> 2015-08-19 10:23:34.505796 osd.14 10.4.10.213:6810/21207 17943 : cluster [WRN] slow request 30.575870 seconds old, received at 2015-08-19 10:23:03.929722: osd_op(client.9203.1:22822591 rbd_data.37e02ae8944a.000180ca [set-alloc-hint object_size 4194304 write_size 4194304,write 0~462848] 5.1c2aff5f ondisk+write e1319) currently waiting for blocked object
> 2015-08-19 10:23:34.505803 osd.14 10.4.10.213:6810/21207 17944 : cluster [WRN] slow request 30.560009 seconds old, received at 2015-08-19 10:23:03.945583: osd_op(client.9203.1:22822592 rbd_data.37e02ae8944a.000180ca [set-alloc-hint object_size 4194304 write_size 4194304,write 462848~524288] 5.1c2aff5f ondisk+write e1319) currently waiting for blocked object
> 2015-08-19 10:23:35.489927 osd.1 10.4.10.211:6812/9198 7921 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.112783 secs
> 2015-08-19 10:23:35.490326 osd.1 10.4.10.211:6812/9198 7922 : cluster [WRN] slow request 30.112783 seconds old, received at 2015-08-19 10:23:05.376339: osd_op(osd.14.1299:731492 rbd_data.37e02ae8944a.00017a90 [copy-from ver 61293] 4.5aa9fb69 ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e1319) currently commit_sent
> 2015-08-19 10:23:36.799861 osd.6 10.4.10.213:6806/22569 22663 : cluster [WRN] 2 slow requests, 1 included below; oldest
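Re-running the fio test Nick suggests might look like this -- a sketch
using the librbd engine, where the pool, image and client names are
placeholders:

  fio --name=cache-retest --ioengine=rbd --clientname=admin \
      --pool=rbd --rbdname=testimg \
      --rw=randwrite --bs=4k --iodepth=32 --runtime=60 --time_based

The health output above also names the worst offenders, so the blocked
ops can be inspected via the admin socket on the node hosting them --
for example osd.10, which carried 18 of the blocked ops:

  ceph daemon osd.10 dump_ops_in_flight   # ops currently stuck, with their state
  ceph daemon osd.10 dump_historic_ops    # slowest recently completed ops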
Re: [ceph-users] requests are blocked - problem
Hello,

That's a pretty small cluster all things considered, so your rather
intensive test setup is likely to run into any or all of the following
issues:

1) The amount of data you're moving around is going to cause a lot of
promotions from and to the cache tier. This is expensive and slow.

2) EC coded pools are slow. You may actually have better results with a
Ceph classic approach, 2-4 HDDs per journal SSD. Also, 6TB HDDs combined
with EC may look nice to you from a cost/density perspective, but more
HDDs means more IOPS and thus speed.

3) Scrubbing (unless configured very aggressively down) will impact your
performance on top of the items above - see the throttling sketch after
the quoted diagnostics below.

4) You already noted the kernel versus userland bit.

5) Having all your storage in a single JBOD chassis strikes me as
ill-advised, though I don't think it's an actual bottleneck at 4x12Gb/s.

When you ran the fio tests I assume nothing else was going on and the
dataset size would have fit easily into the cache pool, right?

Look at your nodes with atop or iostat, I venture all your HDDs are at
100%.

Christian

On Wed, 19 Aug 2015 10:29:28 +0200 Jacek Jarosiewicz wrote:

> Hi,
>
> Our setup is this:
>
> 4 x OSD nodes:
>   E5-1630 CPU
>   32 GB RAM
>   Mellanox MT27520 56Gbps network cards
>   SATA controller LSI Logic SAS3008
>
> Storage nodes are connected to a SuperMicro chassis: 847E1C-R1K28JBOD.
> Each node has 2-3 spinning OSDs (6TB drives) and 2 SSD OSDs (240GB
> drives), 3 monitors running on the OSD nodes, ceph hammer 0.94.2,
> Ubuntu 14.04, cache tier with an EC pool (3+1).
>
> We've run some tests on the cluster and the results were promising -
> speeds, IOPS etc. as expected - but now we have tried to use more than
> one client for testing and ran into some problems:
>
> We've created a couple of RBD images and mapped them on clients
> (kernel rbd), running two rsync processes and one dd on a large number
> of files (~250 GB of maildirs rsync'ed from one RBD image to the
> other, and a dd process writing one big 2TB file to another RBD
> image).
>
> The speeds now are less than OK, plus we get a lot of "requests are
> blocked" warnings. We left the processes running overnight, but this
> morning none of them had finished - they are able to write data, but
> at a very, very slow rate.
>
> Please help me diagnose this problem. Everything seems to work, just
> very, very slowly... When we ran the tests with fio (librbd engine)
> everything seemed fine. I know the kernel implementation is slower,
> but is this normal? I can't understand why so many requests are
> blocked.
> Some diagnostic data:
>
> root@cf01:/var/log/ceph# ceph -s
>     cluster 3469081f-9852-4b6e-b7ed-900e77c48bb5
>      health HEALTH_WARN
>             31 requests are blocked > 32 sec
>      monmap e1: 3 mons at
>             {cf01=10.4.10.211:6789/0,cf02=10.4.10.212:6789/0,cf03=10.4.10.213:6789/0}
>             election epoch 202, quorum 0,1,2 cf01,cf02,cf03
>      osdmap e1319: 18 osds: 18 up, 18 in
>       pgmap v933010: 2112 pgs, 19 pools, 10552 GB data, 2664 kobjects
>             14379 GB used, 42812 GB / 57192 GB avail
>                 2111 active+clean
>                    1 active+clean+scrubbing
>   client io 0 B/s rd, 12896 kB/s wr, 35 op/s
>
> root@cf01:/var/log/ceph# ceph health detail
> HEALTH_WARN 23 requests are blocked > 32 sec; 6 osds have slow requests
> 1 ops are blocked > 131.072 sec
> 22 ops are blocked > 65.536 sec
> 1 ops are blocked > 65.536 sec on osd.2
> 1 ops are blocked > 65.536 sec on osd.3
> 1 ops are blocked > 65.536 sec on osd.4
> 1 ops are blocked > 131.072 sec on osd.7
> 18 ops are blocked > 65.536 sec on osd.10
> 1 ops are blocked > 65.536 sec on osd.12
> 6 osds have slow requests
>
> root@cf01:/var/log/ceph# grep WRN ceph.log | tail -50
> 2015-08-19 10:23:34.505669 osd.14 10.4.10.213:6810/21207 17942 : cluster [WRN] 2 slow requests, 2 included below; oldest blocked for > 30.575870 secs
> 2015-08-19 10:23:34.505796 osd.14 10.4.10.213:6810/21207 17943 : cluster [WRN] slow request 30.575870 seconds old, received at 2015-08-19 10:23:03.929722: osd_op(client.9203.1:22822591 rbd_data.37e02ae8944a.000180ca [set-alloc-hint object_size 4194304 write_size 4194304,write 0~462848] 5.1c2aff5f ondisk+write e1319) currently waiting for blocked object
> 2015-08-19 10:23:34.505803 osd.14 10.4.10.213:6810/21207 17944 : cluster [WRN] slow request 30.560009 seconds old, received at 2015-08-19 10:23:03.945583: osd_op(client.9203.1:22822592 rbd_data.37e02ae8944a.000180ca [set-alloc-hint object_size 4194304 write_size 4194304,write 462848~524288] 5.1c2aff5f ondisk+write e1319) currently waiting for blocked object
> 2015-08-19 10:23:35.489927 osd.1 10.4.10.211:6812/9198 7921 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.112783 secs
> 2015-08-19 10:23:35.490326 osd.1 10.4.10.211:6812/9198 7922 : cluster [WRN] slow request 30.112783 seconds old, received at 2015-08-19 10:23:05.376339: osd_op(osd.14.1299:731492 rbd_data.37e02ae8944a.00017a90 [copy-from ver 61293] 4.5aa9fb69
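On the scrubbing point (3) above, the scrub knobs can be adjusted at
runtime without restarting the OSDs. A sketch; the values are
illustrative assumptions, not recommendations, and whatever works should
be persisted in the [osd] section of ceph.conf:

  # make each scrub chunk pause briefly so client I/O can interleave
  ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'
  # skip starting new scrubs when the node is already loaded
  ceph tell osd.* injectargs '--osd_scrub_load_threshold 0.5'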