Hi all,

I'm experiencing a problem when running ceph with cache tiering. Can someone 
help take a look? Any comments are appreciated, thanks!

===Configuration===
I have two pools: 'rbd' and 'cache'. Pool 'cache' is the cache tier of pool 
'rbd'. The 'rbd' pool is backed by 16 HDD OSDs, and the 'cache' pool by 8 SSD 
OSDs.
Here is the OSD tree:
# id    weight  type name       up/down reweight
-6      8       root ssd
-4      4               host zqw-s1-ssd
16      1                       osd.16  up      1
17      1                       osd.17  up      1
18      1                       osd.18  up      1
19      1                       osd.19  up      1
-5      4               host zqw-s2-ssd
20      1                       osd.20  up      1
21      1                       osd.21  up      1
22      1                       osd.22  up      1
23      1                       osd.23  up      1
-1      14.56   root default
-2      7.28            host zqw-s1
0       0.91                    osd.0   up      1
1       0.91                    osd.1   up      1
2       0.91                    osd.2   up      1
3       0.91                    osd.3   up      1
4       0.91                    osd.4   up      1
5       0.91                    osd.5   up      1
6       0.91                    osd.6   up      1
7       0.91                    osd.7   up      1
-3      7.28            host zqw-s2
8       0.91                    osd.8   up      1
9       0.91                    osd.9   up      1
10      0.91                    osd.10  up      1
11      0.91                    osd.11  up      1
12      0.91                    osd.12  up      1
13      0.91                    osd.13  up      1
14      0.91                    osd.14  up      1
15      0.91                    osd.15  up      1
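
For completeness, the tiering itself was created with the usual commands, 
roughly like the sketch below (the pg counts, CRUSH rule name and hit_set 
settings here are illustrative placeholders, not necessarily my exact values):

# CRUSH rule that keeps the cache pool on the 'ssd' root
ceph osd crush rule create-simple ssd_rule ssd host
ceph osd pool create cache 512 512
ceph osd pool set cache crush_ruleset <id-of-ssd_rule>

# attach 'cache' as a writeback tier in front of 'rbd'
ceph osd tier add rbd cache
ceph osd tier cache-mode cache writeback
ceph osd tier set-overlay rbd cache
ceph osd pool set cache hit_set_type bloom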

===Problem===
After running IO for a while, I start to see 'slow request' warnings in the 
ceph log:

2014-05-26 14:53:23.606138 osd.20 10.44.44.6:6823/61598 14 : [WRN] 6 slow 
requests, 6 included below; oldest blocked for > 30.921077 secs
2014-05-26 14:53:23.606149 osd.20 10.44.44.6:6823/61598 15 : [WRN] slow request 
30.921077 seconds old, received at 2014-05-26 14:52:52.684957: 
osd_op(client.42161.0:10162 rb.0.93ff.238e1f29.00000000028f [set-alloc-hint 
object_size 4194304 write_size 4194304,write 0~524288] 22.7ec98e46 
ack+ondisk+write e16745) v4 currently waiting for subops from 16
...
2014-05-26 14:53:23.483175 osd.21 10.44.44.6:6825/61597 37 : [WRN] 6 slow 
requests, 6 included below; oldest blocked for > 30.664500 secs
2014-05-26 14:53:23.483189 osd.21 10.44.44.6:6825/61597 38 : [WRN] slow request 
30.664500 seconds old, received at 2014-05-26 14:52:52.799480: 
osd_op(client.42177.0:10020 rb.0.93f2.238e1f29.000000000239 [set-alloc-hint 
object_size 4194304 write_size 4194304,write 0~524288] 22.a3cdd2df 
ack+ondisk+write e16745) v4 currently waiting for subops from 16
...
2014-05-26 14:53:23.483164 osd.23 10.44.44.6:6816/61595 36 : [WRN] 6 slow 
requests, 6 included below; oldest blocked for > 30.740852 secs
2014-05-26 14:53:23.483178 osd.23 10.44.44.6:6816/61595 37 : [WRN] slow request 
30.740852 seconds old, received at 2014-05-26 14:52:52.737293: 
osd_op(client.42171.0:9359 rb.0.93fd.238e1f29.00000000022b [set-alloc-hint 
object_size 4194304 write_size 4194304,write 0~524288] 22.5e9da689 
ack+ondisk+write e16745) v4 currently waiting for subops from 16
...
2014-05-26 14:53:23.808628 osd.22 10.44.44.6:6813/61596 24 : [WRN] 4 slow 
requests, 4 included below; oldest blocked for > 30.571487 secs
2014-05-26 14:53:23.808638 osd.22 10.44.44.6:6813/61596 25 : [WRN] slow request 
30.571487 seconds old, received at 2014-05-26 14:52:53.237048: 
osd_op(client.42174.0:9872 rb.0.93f7.238e1f29.00000000023f [set-alloc-hint 
object_size 4194304 write_size 4194304,write 0~524288] 22.8d495fcc 
ack+ondisk+write e16745) v4 currently waiting for subops from 16
...
2014-05-26 14:53:23.553157 osd.16 10.44.44.5:6855/72950 62 : [WRN] 6 slow 
requests, 6 included below; oldest blocked for > 30.930690 secs
2014-05-26 14:53:23.553168 osd.16 10.44.44.5:6855/72950 63 : [WRN] slow request 
30.930690 seconds old, received at 2014-05-26 14:52:52.586582: 
osd_op(client.42175.0:9650 rb.0.93f1.2ae8944a.000000000236 [set-alloc-hint 
object_size 4194304 write_size 4194304,write 0~524288] 22.3dec4f3b 
ack+ondisk+write e16745) v4 currently waiting for subops from 20
...
2014-05-26 14:53:35.553099 mon.0 10.44.44.5:6789/0 1348 : [INF] osd.16 
10.44.44.5:6855/72950 failed (6 reports from 4 peers after 27.877781 >= grace 
26.858363)
2014-05-26 14:53:35.871891 mon.0 10.44.44.5:6789/0 1350 : [INF] osdmap e16746: 
24 osds: 23 up, 24 in
...
2014-05-26 14:53:38.168716 osd.16 10.44.44.5:6855/72950 152 : [WRN] map e16747 
wrongly marked me down

===Analysis===
From the above log, it seems something went wrong on OSD 16. OSD 16 is on 
server 'zqw-s1', and the 4 SSD OSDs (20, 21, 22, 23) on the other server, 
'zqw-s2', are all waiting for sub-operations on OSD 16. Eventually the monitor 
decides that OSD 16 has failed and marks it down, even though OSD 16 itself 
disagrees ('wrongly marked me down').

Looking at the logs of OSDs 20, 21, 22 and 23 around the time the 'slow 
request' warnings appeared, these OSDs did not get a reply from OSD 16 during 
the heartbeat check:
2014-05-26 14:53:27.606927 7f163d0c8700 -1 osd.20 16745 heartbeat_check: no 
reply from osd.16 since back 2014-05-26 14:53:07.369988 front 2014-05-26 
14:53:07.369988 (cutoff 2014-05-26 14:53:07.606925)
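
The cutoff in that message is just the log time minus osd_heartbeat_grace 
(14:53:27.6 - 20s = 14:53:07.6, i.e. the default grace of 20 seconds). In case 
it helps, the effective heartbeat settings can be read from an OSD's admin 
socket, for example (default socket path assumed, adjust if yours differs):

# heartbeat grace/interval as seen by osd.20
ceph --admin-daemon /var/run/ceph/ceph-osd.20.asok config get osd_heartbeat_grace
ceph --admin-daemon /var/run/ceph/ceph-osd.20.asok config get osd_heartbeat_interval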

And in the log of OSD 16, around the same time, there are lots of entries 
like this:
2014-05-26 14:53:23.772145 7fb39e35a700  1 heartbeat_map is_healthy 'OSD::op_tp 
thread 0x7fb39a352700' had timed out after 15
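
The '15' here appears to be osd_op_thread_timeout (15 seconds by default), 
i.e. an OSD::op_tp worker thread was stuck on a single operation for longer 
than that. A sketch of what could be run against osd.16's admin socket the 
next time this happens, to see what that thread is blocked on (default socket 
path assumed; dump_historic_ops may not exist on older releases):

# operations currently being processed / recently completed by osd.16
ceph --admin-daemon /var/run/ceph/ceph-osd.16.asok dump_ops_in_flight
ceph --admin-daemon /var/run/ceph/ceph-osd.16.asok dump_historic_ops
# the op-thread timeout that produced the 'timed out after 15' message
ceph --admin-daemon /var/run/ceph/ceph-osd.16.asok config get osd_op_thread_timeout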

===Questions===
1. From the simple analysis above, it looks like the slow requests happen 
while waiting for replication sub-ops, and the heartbeat fails at the same 
time. Could this be because the cluster network is too busy? I'm currently 
using a 10Gb public network and a 10Gb cluster network. If the network were 
the problem, though, why don't I see slow requests caused by other OSDs at 
the same time? (A rough way to check this is sketched after these questions.)
2. I run into this problem quite often when running ceph with cache tiering, 
but I don't see it when running without it. Is it possible that OSD 16 is 
stuck in some cache-tiering-specific code path?
3. Are there any other reasons that could cause 'slow request' warnings like 
this?
4. I noticed from 'iostat' that when the 'slow request' happened, the cluster 
stalled client IO for about 60 seconds. Is that expected?
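
Regarding question 1, the rough network check mentioned above would look 
something like this (interface and host names are placeholders for my actual 
cluster-network setup; sar comes from the sysstat package):

# per-interface throughput on the cluster network, sampled every second
sar -n DEV 1 | grep eth1           # eth1 = cluster-network interface (placeholder)
# driver-level drop/error counters
ethtool -S eth1 | grep -iE 'drop|err'
# raw bandwidth between the two OSD hosts over the cluster network
iperf -s                           # on zqw-s2
iperf -c zqw-s2 -t 30              # on zqw-s1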