Re: [ceph-users] requests are blocked - problem

2015-08-20 Thread Jacek Jarosiewicz

On 08/19/2015 03:41 PM, Nick Fisk wrote:

Although you may get some benefit from tweaking parameters, I suspect you are 
nearer the performance ceiling for the current implementation of the tiering 
code. Could you post all the variables you set for the tiering, including 
target_max_bytes and the dirty/full ratios?



sure, all the parameters set are like this:

hit_set_type bloom
hit_set_count 1
hit_set_period 3600
target_max_bytes 65498264640
target_max_objects 100
cache_target_full_ratio 0.95
cache_min_flush_age 600
cache_min_evict_age 1800
cache_target_dirty_ratio 0.75
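
For reference, these values are normally applied with "ceph osd pool set"; a 
minimal sketch, assuming the cache pool is simply named "cache":

ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache target_max_bytes 65498264640
ceph osd pool set cache target_max_objects 100
ceph osd pool set cache cache_target_full_ratio 0.95
ceph osd pool set cache cache_min_flush_age 600
ceph osd pool set cache cache_min_evict_age 1800
ceph osd pool set cache cache_target_dirty_ratio 0.75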



Since you are doing maildirs, which will have lots of small files, you might 
also want to try making the object size of the RBD smaller. This will mean less 
data needs to be shifted on each promotion/flush.



I'll try that - thanks!
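
For reference, a minimal sketch of creating an image with smaller objects 
(pool and image names here are made up; on hammer the object size is fixed at 
creation time via --order, so an existing image would need to be recreated or 
migrated):

rbd create --pool rbd --size 102400 --order 20 maildir-test   # order 20 = 1 MiB objects
rbd info rbd/maildir-test                                     # should report "order 20"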

J

--
Jacek Jarosiewicz
IT Systems Administrator




Re: [ceph-users] requests are blocked - problem

2015-08-20 Thread Jacek Jarosiewicz

On 08/20/2015 03:07 AM, Christian Balzer wrote:

For a realistic comparison with your current setup, a total rebuild would 
be in order, provided your cluster is for testing only at this point.

Given your current HW, that means the same 2-3 HDDs per storage node and 1
SSD as journal.

What exact maker/model are your SSDs?

Again, more HDDs means more (sustainable) IOPS, so unless your space
requirements (data and physical) are very demanding, twice the number of 
3TB HDDs would be noticeably better.

Christian



The cluster is a test setup; eventually we will have many more HDDs 
(we will probably fill both chassis with drives - 44 drives each). We are 
just starting with this number for testing purposes.


For the SSDs we use the Intel DC S3710 (we ran into dsync problems with 
Intel 530 series drives before, so we switched to the recommended ones).


By total rebuild do you mean reinitializing the whole cluster 
(monitors/OSDs) and starting from scratch?


J

--
Jacek Jarosiewicz
IT Systems Administrator




Re: [ceph-users] requests are blocked - problem

2015-08-20 Thread Nick Fisk




 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 Jacek Jarosiewicz
 Sent: 20 August 2015 07:31
 To: Nick Fisk n...@fisk.me.uk; ceph-us...@ceph.com
 Subject: Re: [ceph-users] requests are blocked - problem
 
 On 08/19/2015 03:41 PM, Nick Fisk wrote:
  Although you may get some benefit from tweaking parameters, I suspect
 you are nearer the performance ceiling for the current implementation of
 the tiering code. Could you post all the variables you set for the tiering,
 including target_max_bytes and the dirty/full ratios?
 
 
 sure, all the parameters set are like this:
 
 hit_set_type bloom
 hit_set_count 1
 hit_set_period 3600
 target_max_bytes 65498264640
 target_max_objects 100
 cache_target_full_ratio 0.95
 cache_min_flush_age 600
 cache_min_evict_age 1800
 cache_target_dirty_ratio 0.75

That pretty much looks OK to me; the only thing I can suggest is maybe lowering 
the full_ratio a bit. The full ratio is based on the percentage used across the 
whole pool, but the actual eviction occurs at the PG level. I think this may 
mean that in certain cases a PG may block whilst it evicts, even though it 
appears the pool hasn't reached the full target.
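
If you want to try that, it is just another pool set (a sketch; "cache" is a 
placeholder for the cache pool name):

ceph osd pool set cache cache_target_full_ratio 0.8
ceph osd pool set cache cache_target_dirty_ratio 0.6   # optionally start flushing earlier too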

 
 
  Since you are doing maildirs, which will have lots of small files, you might
 also want to try making the object size of the RBD smaller. This will mean
 less data needs to be shifted on each promotion/flush.
 
 
 I'll try that - thanks!
 
 J
 
 --
 Jacek Jarosiewicz
 IT Systems Administrator
 
 






Re: [ceph-users] requests are blocked - problem

2015-08-20 Thread Christian Balzer

Hello,

On Thu, 20 Aug 2015 08:25:16 +0200 Jacek Jarosiewicz wrote:

 On 08/20/2015 03:07 AM, Christian Balzer wrote:
  For a realistic comparison with your current setup, a total rebuild
  would be in order, provided your cluster is for testing only at this point.
 
  Given your current HW, that means the same 2-3 HDDs per storage node
  and 1 SSD as journal.
 
  What exact maker/model are your SSDs?
 
  Again, more HDDs means more (sustainable) IOPS, so unless your space
  requirements (data and physical) are very demanding, twice the number
  of 3TB HDDs would be noticeably better.
 
  Christian
 
 
 The cluster is a test setup; eventually we will have many more HDDs 
 (we will probably fill both chassis with drives - 44 drives each). We are 
 just starting with this number for testing purposes.
 
 For the SSDs we use the Intel DC S3710 (we ran into dsync problems with 
 Intel 530 series drives before, so we switched to the recommended ones).
 
Ah yes. The S3710s are 200GB, I guess the 240GB came from the 530s. ^o^
Also, for use as journals the older S3700 is actually better suited, as it
has a higher sequential write rate.
In everything else the S3710 is faster.

 By total rebuild do you mean reinitializing the whole cluster 
 (monitors/OSDs) and starting from scratch?
 
Well, not the monitors, but starting from scratch might actually be the
fastest way, unless you have another 10 HDDs and 4 SSDs (1 per storage node)
for journals lying around, to get a fair comparison to your current install.

Realistically you would of course compare something that is about the same
size and cost, so more like 16 HDDs.

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/


Re: [ceph-users] requests are blocked - problem

2015-08-19 Thread Jacek Jarosiewicz

Hi,

On 08/19/2015 11:01 AM, Christian Balzer wrote:


Hello,

That's a pretty small cluster all things considered, so your rather
intensive test setup is likely to run into any or all of the following
issues:

1) The amount of data you're moving around is going to cause a lot of
promotions from and to the cache tier. This is expensive and slow.
2) Erasure-coded (EC) pools are slow. You may actually have better results with
a classic Ceph approach, 2-4 HDDs per journal SSD. Also, 6TB HDDs combined
with EC may look nice to you from a cost/density perspective, but more HDDs
means more IOPS and thus more speed.
3) Scrubbing (unless tuned down very aggressively) will
impact your performance on top of the items above.
4) You already noted the kernel versus userland bit.
5) Having all your storage in a single JBOD chassis strikes me as ill-advised,
though I don't think it's an actual bottleneck at 4x12Gb/s.



We use two of these (I forgot to mention that).
Each chassis has two internal controllers, both exposing all the disks 
to the connected hosts. There are two OSD nodes connected to each chassis.



When you ran the fio tests I assume nothing else was going on and the
dataset size would have fit easily into the cache pool, right?

Look at your nodes with atop or iostat, I venture all your HDDs are at
100%.

Christian



Yes, the problem was a full cache pool. I'm currently wondering how to 
tune the cache pool parameters so that the whole cluster doesn't slow 
down that much when the cache is full...
I'm thinking of doing some tests on a pool without the cache tier so I can 
compare the results. Any suggestions would be greatly appreciated.
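
A rough sketch of how such a comparison could be set up (names are 
placeholders; note that with hammer RBD cannot be used directly on an EC pool, 
so the no-cache pool would have to be replicated):

ceph osd pool create plaintest 512 512 replicated
rbd create --pool plaintest --size 102400 plainimg
rbd map plaintest/plainimg    # then repeat the rsync/dd or fio runs against it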


J

--
Jacek Jarosiewicz
IT Systems Administrator




Re: [ceph-users] requests are blocked - problem

2015-08-19 Thread Jacek Jarosiewicz

On 08/19/2015 10:58 AM, Nick Fisk wrote:

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Jacek Jarosiewicz
Sent: 19 August 2015 09:29
To: ceph-us...@ceph.com
Subject: [ceph-users] requests are blocked - problem


I would suggest running the fio tests again, just to make sure that there isn't 
a problem with your newer config, but I suspect you will see equally bad 
performance with the fio tests now that the cache tier has begun to be more 
populated.



OK, I did the tests and you're right - the full cache was the problem. 
After flushing the cache and re-running fio, the results are good (fast) again.


Is there a way to tune the cache parameters so that the whole cluster 
doesn't slow down that much and doesn't block requests?


We use defaults for the cache pool from the documentation:
hit_set_period 3600
cache_min_flush_age 600
cache_min_evict_age 1800

J

--
Jacek Jarosiewicz
IT Systems Administrator




Re: [ceph-users] requests are blocked - problem

2015-08-19 Thread Nick Fisk




 -Original Message-
 From: Jacek Jarosiewicz [mailto:jjarosiew...@supermedia.pl]
 Sent: 19 August 2015 14:28
 To: Nick Fisk n...@fisk.me.uk; ceph-us...@ceph.com
 Subject: Re: [ceph-users] requests are blocked - problem
 
 On 08/19/2015 10:58 AM, Nick Fisk wrote:
  -Original Message-
  From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
  Of Jacek Jarosiewicz
  Sent: 19 August 2015 09:29
  To: ceph-us...@ceph.com
  Subject: [ceph-users] requests are blocked - problem
 
  I would suggest running the fio tests again, just to make sure that there
 isn't a problem with your newer config, but I suspect you will see equally bad
 performance with the fio tests now that the cache tier has begun to be more
 populated.
 
 
  OK, I did the tests and you're right - the full cache was the problem.
  After flushing the cache and re-running fio, the results are good (fast) again.
 
 Is there a way to tune the cache parameters so that the whole cluster
 doesn't slow down that much and doesn't block requests?
 
 We use defaults for the cache pool from the documentation:
 hit_set_period 3600
 cache_min_flush_age 600
 cache_min_evict_age 1800
 

Although you may get some benefit from tweaking parameters, I suspect you are 
nearer the performance ceiling for the current implementation of the tiering 
code. Could you post all the variables you set for the tiering, including 
target_max_bytes and the dirty/full ratios?

Since you are doing maildirs, which will have lots of small files, you might 
also want to try making the object size of the RBD smaller. This will mean less 
data needs to be shifted on each promotion/flush.

 J
 
 --
 Jacek Jarosiewicz
 IT Systems Administrator
 
 


Re: [ceph-users] requests are blocked - problem

2015-08-19 Thread Christian Balzer

Hello,

On Wed, 19 Aug 2015 15:27:29 +0200 Jacek Jarosiewicz wrote:

 Hi,
 
 On 08/19/2015 11:01 AM, Christian Balzer wrote:
 
  Hello,
 
  That's a pretty small cluster all things considered, so your rather
  intensive test setup is likely to run into any or all of the following
  issues:
 
  1) The amount of data you're moving around is going to cause a lot of
  promotions from and to the cache tier. This is expensive and slow.
  2) Erasure-coded (EC) pools are slow. You may actually have better results
  with a classic Ceph approach, 2-4 HDDs per journal SSD. Also, 6TB HDDs
  combined with EC may look nice to you from a cost/density perspective,
  but more HDDs means more IOPS and thus more speed.
  3) Scrubbing (unless tuned down very aggressively) will
  impact your performance on top of the items above.
  4) You already noted the kernel versus userland bit.
  5) Having all your storage in a single JBOD chassis strikes me as
  ill-advised, though I don't think it's an actual bottleneck at 4x12Gb/s.
 
 
 We use two of these (I forgot to mention that).
 Each chassis has two internal controllers, both exposing all the disks 
 to the connected hosts. There are two OSD nodes connected to each chassis.

Ah, so you have the dual controller version.
 
  When you ran the fio tests I assume nothing else was going on and the
  dataset size would have fit easily into the cache pool, right?
 
  Look at your nodes with atop or iostat, I venture all your HDDs are at
  100%.
 
  Christian
 
 
 Yes, the problem was a full cache pool. I'm currently wondering how 
 to tune the cache pool parameters so that the whole cluster doesn't slow 
 down that much when the cache is full...

Nick already gave you some advice on this; however, with the current
versions of Ceph, cache tiering is simply expensive and slow.

 I'm thinking of doing some tests on a pool without the cache tier so I can 
 compare the results. Any suggestions would be greatly appreciated.
 
For a realistic comparison with your current setup, a total rebuild would
be in order, provided your cluster is for testing only at this point.

Given your current HW, that means the same 2-3 HDDs per storage node and 1
SSD as journal.

What exact maker/model are your SSDs?

Again, more HDDs means more (sustainable) IOPS, so unless your space
requirements (data and physical) are very demanding, twice the number of
3TB HDDs would be noticeably better. 

Christian
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/


Re: [ceph-users] requests are blocked - problem

2015-08-19 Thread Nick Fisk
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 Jacek Jarosiewicz
 Sent: 19 August 2015 09:29
 To: ceph-us...@ceph.com
 Subject: [ceph-users] requests are blocked - problem
 
 Hi,
 
 Our setup is this:
 
 4 x OSD nodes:
 E5-1630 CPU
 32 GB RAM
 Mellanox MT27520 56Gbps network cards
 SATA controller LSI Logic SAS3008
 Storage nodes are connected to SuperMicro chassis: 847E1C-R1K28JBOD
 Each node has 2-3 spinning OSDs (6TB drives) and 2 SSD OSDs (240GB drives)
 3 monitors running on OSD nodes
 ceph hammer 0.94.2
 Ubuntu 14.04
 cache tier with ecpool (3+1)
 
 We've run some tests on the cluster and the results were promising - speeds,
 IOPS etc. as expected - but now we have tried to use more than one client for
 testing and ran into some problems:

When you ran this test did you fill the RBD enough that it would have been 
flushing the dirty contents down to the base tier?
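
One way to check (a sketch; "cache" stands in for the actual cache pool name) 
is to look at the dirty object count for the cache pool:

ceph df detail      # should show a DIRTY column: objects not yet flushed to the base tier
rados -p cache df   # per-pool object and usage counters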

 
 We've created a couple of RBD images and mapped them on clients (kernel
 rbd), running two rsync processes and one dd on a large number of files
 (~250 GB of maildirs rsync'ed from one RBD image to the other, and a dd
 process writing one big 2TB file on another RBD image).
 
 The speeds now are less than OK, plus we get a lot of "requests blocked"
 warnings. We left the processes running overnight, but when I came in this
 morning none of them had finished - they are able to write data, but at a
 very, very slow rate.

Once you have written to a block, if the underlying object already exists on 
the EC pool, it will have to be promoted to the cache pool before it can be 
written to again. This can have a severe impact on performance, especially if 
you are hitting lots of different blocks and the tiering agent can't keep up 
with the promotion requests.
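
If you want to see how hard the tiering agent is working, the OSD admin 
sockets expose promotion/flush/eviction counters (a sketch; run on the node 
hosting the OSD, and counter names can vary between releases):

ceph daemon osd.10 perf dump | grep -E '"tier_|"agent_'
# look for tier_promote, tier_flush, tier_evict and the agent_* counters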

 
 Please help me diagnose this problem; everything seems to work, just very,
 very slowly... When we ran the tests with fio (librbd engine), everything
 seemed fine. I know that the kernel implementation is slower, but is this
 normal? I can't understand why so many requests are blocked.

I would suggest running the fio tests again, just to make sure that there isn't 
a problem with your newer config, but I suspect you will see equally bad 
performance with the fio tests now that the cache tier has begun to be more 
populated.
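
For the re-test, a minimal librbd fio job could look like this (a sketch only; 
the pool/image names and the 4k random-write pattern are placeholders, and the 
image must already exist):

[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=fiotest
rw=randwrite
bs=4k
iodepth=32
runtime=300
time_based

[rbd-test]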

 
 Some diagnostic data:
 
 root@cf01:/var/log/ceph# ceph -s
  cluster 3469081f-9852-4b6e-b7ed-900e77c48bb5
   health HEALTH_WARN
  31 requests are blocked > 32 sec
   monmap e1: 3 mons at
 {cf01=10.4.10.211:6789/0,cf02=10.4.10.212:6789/0,cf03=10.4.10.213:6789/0}
  election epoch 202, quorum 0,1,2 cf01,cf02,cf03
   osdmap e1319: 18 osds: 18 up, 18 in
pgmap v933010: 2112 pgs, 19 pools, 10552 GB data, 2664 kobjects
  14379 GB used, 42812 GB / 57192 GB avail
  2111 active+clean
 1 active+clean+scrubbing
client io 0 B/s rd, 12896 kB/s wr, 35 op/s
 
 
 root@cf01:/var/log/ceph# ceph health detail
 HEALTH_WARN 23 requests are blocked > 32 sec; 6 osds have slow requests
 1 ops are blocked > 131.072 sec
 22 ops are blocked > 65.536 sec
 1 ops are blocked > 65.536 sec on osd.2
 1 ops are blocked > 65.536 sec on osd.3
 1 ops are blocked > 65.536 sec on osd.4
 1 ops are blocked > 131.072 sec on osd.7
 18 ops are blocked > 65.536 sec on osd.10
 1 ops are blocked > 65.536 sec on osd.12
 6 osds have slow requests
 
 
 root@cf01:/var/log/ceph# grep WRN ceph.log | tail -50
 2015-08-19 10:23:34.505669 osd.14 10.4.10.213:6810/21207 17942 : cluster
 [WRN] 2 slow requests, 2 included below; oldest blocked for > 30.575870 secs
 2015-08-19 10:23:34.505796 osd.14 10.4.10.213:6810/21207 17943 : cluster
 [WRN] slow request 30.575870 seconds old, received at 2015-08-19
 10:23:03.929722: osd_op(client.9203.1:22822591
 rbd_data.37e02ae8944a.000180ca [set-alloc-hint object_size
 4194304 write_size 4194304,write 0~462848] 5.1c2aff5f ondisk+write
 e1319) currently waiting for blocked object
 2015-08-19 10:23:34.505803 osd.14 10.4.10.213:6810/21207 17944 : cluster
 [WRN] slow request 30.560009 seconds old, received at 2015-08-19
 10:23:03.945583: osd_op(client.9203.1:22822592
 rbd_data.37e02ae8944a.000180ca [set-alloc-hint object_size
 4194304 write_size 4194304,write 462848~524288] 5.1c2aff5f ondisk+write
 e1319) currently waiting for blocked object
 2015-08-19 10:23:35.489927 osd.1 10.4.10.211:6812/9198 7921 : cluster
 [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.112783 secs
 2015-08-19 10:23:35.490326 osd.1 10.4.10.211:6812/9198 7922 : cluster
 [WRN] slow request 30.112783 seconds old, received at 2015-08-19
 10:23:05.376339: osd_op(osd.14.1299:731492
 rbd_data.37e02ae8944a.00017a90 [copy-from ver 61293] 4.5aa9fb69
 ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e1319)
 currently commit_sent
 2015-08-19 10:23:36.799861 osd.6 10.4.10.213:6806/22569 22663 : cluster
 [WRN] 2 slow requests, 1 included below; oldest
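
To see exactly what those blocked requests are stuck on, the admin socket of 
the affected OSD can be queried on the node hosting it (a sketch; osd.7 and 
osd.10 are taken from the health detail above):

ceph daemon osd.7 dump_ops_in_flight    # in-progress ops with their age and current state
ceph daemon osd.10 dump_historic_ops    # recently completed slow ops with per-step timings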

Re: [ceph-users] requests are blocked - problem

2015-08-19 Thread Christian Balzer

Hello,

That's a pretty small cluster all things considered, so your rather
intensive test setup is likely to run into any or all of the following
issues:

1) The amount of data you're moving around is going to cause a lot of
promotions from and to the cache tier. This is expensive and slow.
2) Erasure-coded (EC) pools are slow. You may actually have better results with
a classic Ceph approach, 2-4 HDDs per journal SSD. Also, 6TB HDDs combined
with EC may look nice to you from a cost/density perspective, but more HDDs
means more IOPS and thus more speed.
3) Scrubbing (unless tuned down very aggressively) will
impact your performance on top of the items above.
4) You already noted the kernel versus userland bit.
5) Having all your storage in a single JBOD chassis strikes me as ill-advised,
though I don't think it's an actual bottleneck at 4x12Gb/s.

When you ran the fio tests I assume nothing else was going on and the
dataset size would have fit easily into the cache pool, right?

Look at your nodes with atop or iostat, I venture all your HDDs are at
100%.
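
For example (standard tools, nothing Ceph-specific; watch the disks backing 
the OSDs):

iostat -xm 5    # %util and await per device
atop 5          # per-disk busy percentage in the DSK lines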

Christian

On Wed, 19 Aug 2015 10:29:28 +0200 Jacek Jarosiewicz wrote:

 Hi,
 
 Our setup is this:
 
 4 x OSD nodes:
 E5-1630 CPU
 32 GB RAM
 Mellanox MT27520 56Gbps network cards
 SATA controller LSI Logic SAS3008
 Storage nodes are connected to SuperMicro chassis: 847E1C-R1K28JBOD
 Each node has 2-3 spinning OSDs (6TB drives) and 2 SSD OSDs (240GB drives)
 3 monitors running on OSD nodes
 ceph hammer 0.94.2
 Ubuntu 14.04
 cache tier with ecpool (3+1)
 
 We've run some tests on the cluster and the results were promising - speeds, 
 IOPS etc. as expected - but now we have tried to use more than one client for 
 testing and ran into some problems:
 
 We've created a couple of RBD images and mapped them on clients (kernel 
 rbd), running two rsync processes and one dd on a large number of files 
 (~250 GB of maildirs rsync'ed from one RBD image to the other, and a dd 
 process writing one big 2TB file on another RBD image).
 
 The speeds now are less than OK, plus we get a lot of "requests blocked" 
 warnings. We left the processes running overnight, but when I came in this 
 morning none of them had finished - they are able to write data, 
 but at a very, very slow rate.
 
 Please help me diagnose this problem; everything seems to work, just 
 very, very slowly... When we ran the tests with fio (librbd engine), 
 everything seemed fine. I know that the kernel implementation is slower, but 
 is this normal? I can't understand why so many requests are blocked.
 
 Some diagnostic data:
 
 root@cf01:/var/log/ceph# ceph -s
  cluster 3469081f-9852-4b6e-b7ed-900e77c48bb5
   health HEALTH_WARN
  31 requests are blocked > 32 sec
   monmap e1: 3 mons at 
 {cf01=10.4.10.211:6789/0,cf02=10.4.10.212:6789/0,cf03=10.4.10.213:6789/0}
  election epoch 202, quorum 0,1,2 cf01,cf02,cf03
   osdmap e1319: 18 osds: 18 up, 18 in
pgmap v933010: 2112 pgs, 19 pools, 10552 GB data, 2664 kobjects
  14379 GB used, 42812 GB / 57192 GB avail
  2111 active+clean
 1 active+clean+scrubbing
client io 0 B/s rd, 12896 kB/s wr, 35 op/s
 
 
 root@cf01:/var/log/ceph# ceph health detail
 HEALTH_WARN 23 requests are blocked > 32 sec; 6 osds have slow requests
 1 ops are blocked > 131.072 sec
 22 ops are blocked > 65.536 sec
 1 ops are blocked > 65.536 sec on osd.2
 1 ops are blocked > 65.536 sec on osd.3
 1 ops are blocked > 65.536 sec on osd.4
 1 ops are blocked > 131.072 sec on osd.7
 18 ops are blocked > 65.536 sec on osd.10
 1 ops are blocked > 65.536 sec on osd.12
 6 osds have slow requests
 
 
 root@cf01:/var/log/ceph# grep WRN ceph.log | tail -50
 2015-08-19 10:23:34.505669 osd.14 10.4.10.213:6810/21207 17942 : cluster 
 [WRN] 2 slow requests, 2 included below; oldest blocked for > 30.575870
 secs 2015-08-19 10:23:34.505796 osd.14 10.4.10.213:6810/21207 17943 :
 cluster [WRN] slow request 30.575870 seconds old, received at 2015-08-19 
 10:23:03.929722: osd_op(client.9203.1:22822591 
 rbd_data.37e02ae8944a.000180ca [set-alloc-hint object_size 
 4194304 write_size 4194304,write 0~462848] 5.1c2aff5f ondisk+write 
 e1319) currently waiting for blocked object
 2015-08-19 10:23:34.505803 osd.14 10.4.10.213:6810/21207 17944 : cluster 
 [WRN] slow request 30.560009 seconds old, received at 2015-08-19 
 10:23:03.945583: osd_op(client.9203.1:22822592 
 rbd_data.37e02ae8944a.000180ca [set-alloc-hint object_size 
 4194304 write_size 4194304,write 462848~524288] 5.1c2aff5f ondisk+write 
 e1319) currently waiting for blocked object
 2015-08-19 10:23:35.489927 osd.1 10.4.10.211:6812/9198 7921 : cluster 
 [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.112783
 secs 2015-08-19 10:23:35.490326 osd.1 10.4.10.211:6812/9198 7922 :
 cluster [WRN] slow request 30.112783 seconds old, received at 2015-08-19 
 10:23:05.376339: osd_op(osd.14.1299:731492 
 rbd_data.37e02ae8944a.00017a90 [copy-from ver 61293] 4.5aa9fb69