Re: osd suicide timeout

2015-07-13 Thread Gregory Farnum
On Fri, Jul 10, 2015 at 10:45 PM, Deneau, Tom tom.den...@amd.com wrote:
 I have an osd log file from an osd that hit a suicide timeout (with the 
 previous 1 events logged).
 (On this node I have also seen this suicide timeout happen once before and
 also a sync_entry timeout.)

 I can see that 6 minutes or so before that osd died, other osds on the same 
 node were logging
 messages such as
 heartbeat_check: no reply from osd.8
 so it appears that osd.8 stopped responding quite some time before it died.

It does this deliberately when its disk threads look like they might be stuck.
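
(As a sketch of what "stuck" means here, assuming a Hammer-era OSD with the
stock admin socket location: each worker thread has an internal heartbeat
grace plus a much larger "suicide" grace that aborts the whole OSD. The osd
id and grep pattern below are just illustrative.)

    # thread heartbeat grace vs. the suicide grace that aborts the whole OSD
    ceph daemon osd.8 config show | grep -E 'op_thread_timeout|suicide_timeout'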


 I'm wondering if there is enough information in the osd.8 log file to deduce 
 why osd.8 stopped responding?
 I don't know enough to figure it out myself.

 Is there any expert who would be willing to take a look at the log file?

The logs will have a backtrace in them; if you can include that and
the last hundred lines or so prior in a pastebin that you email the
list with, several people can give you a pretty good idea of what's
going on.

In general though, it's just going to be that the disk can't keep up
with the load being applied to it. That could be because it's failing,
or because you're pushing too much work on to it in some fashion.
-Greg
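
(A sketch of how one might pull that together, assuming the default log path;
the osd id, log path, device name, and grep patterns are illustrative only
and should be adjusted to your layout.)

    # backtrace plus surrounding context from the crashed OSD's log
    grep -n -B100 -A40 'suicide\|FAILED assert\|Caught signal' \
        /var/log/ceph/ceph-osd.8.log
    # quick checks on whether the disk itself is the bottleneck
    ceph osd perf             # per-OSD commit/apply latency
    iostat -x 5               # await / %util for the backing devices
    smartctl -a /dev/sdX      # failing-drive indicators for the OSD's device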


RE: osd suicide timeout

2015-07-13 Thread Deneau, Tom
Greg --

Thanks.  I put the osd.log file at

https://drive.google.com/file/d/0B_rfwWh40kPwQjZ3OXdjLUZNRVU/view?usp=sharing

I noticed the following in the journalctl output from around that time, showing 
that other nodes were complaining they could not reach osd.8:

Jul 09 15:53:04 seattle-04-ausisv bash[8486]: 2015-07-09 15:53:03.905386 3ffa0d9efd0 -1 osd.9 2487 heartbeat_check: no reply from osd.8 since back 2015-07-09 15:52:43.256581 front 2015-07-09 15:52:43.256581 (cutoff 2015-07-09 15:52:43.905384)
Jul 09 15:53:06 seattle-04-ausisv bash[1060]: 2015-07-09 15:53:06.784069 3ff916fefd0 -1 osd.7 2487 heartbeat_check: no reply from osd.8 since back 2015-07-09 15:52:46.474273 front 2015-07-09 15:52:46.474273 (cutoff 2015-07-09 15:52:46.784066)
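
(For reference, a hedged way to pull those records back out of the journal;
the exact time window is illustrative and unit/wrapper names differ per
deployment:)

    journalctl --since '2015-07-09 15:50' --until '2015-07-09 15:55' \
        | grep 'heartbeat_check: no reply from osd.8'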

and here is some sar data for the disk that osd.8 was controlling (sde1=journal 
partition, sde2=data partition)

03:40:02 PM   DEV    tps  rd_sec/s   wr_sec/s  avgrq-sz  avgqu-sz   await  svctm  %util
03:50:17 PM  sde1   9.67      0.00   19046.40   1970.32      0.35   35.86  30.62  29.60
03:50:17 PM  sde2  60.47   1524.27   14597.67    266.63     24.45  404.30   8.54  51.67
03:50:32 PM  sde1  12.13      0.00   18158.93   1496.62      0.25   20.66  17.58  21.33
03:50:32 PM  sde2  28.00   1550.93   17958.33    696.76     10.54  376.50  13.52  37.87
03:50:47 PM  sde1  12.73      0.00   25446.40   1998.41      0.31   24.19  22.30  28.40
03:50:47 PM  sde2  51.60    338.67   18091.73    357.18     13.05  252.91   8.02  41.40
03:51:02 PM  sde1  12.27      0.00   18790.40   1531.83      0.31   25.33  18.53  22.73
03:51:02 PM  sde2  33.13   2635.20   18026.67    623.60      5.02  151.57  10.99  36.40
03:51:17 PM  sde1  10.13      0.00   14557.87   1436.63      0.16   16.18  12.76  12.93
03:51:17 PM  sde2  46.73   1107.73   12067.00    281.91      8.55  182.88   5.46  25.53
03:51:32 PM  sde1  11.93      0.00   18594.13   1558.17      0.35   29.27  16.42  19.60
03:51:32 PM  sde2  22.20    555.20   18834.33    873.40      4.24  191.08  13.51  30.00
03:51:47 PM  sde1  18.00      0.00   13926.40    773.69      0.19   10.78  10.07  18.13
03:51:47 PM  sde2  47.27   1652.80   10775.53    262.94     12.24  259.01   6.66  31.47
03:52:02 PM  sde1  21.60      0.00   10845.87    502.12      0.24   11.08   9.75  21.07
03:52:02 PM  sde2  34.33   1652.80    9089.13    312.87      7.43  216.41   8.45  29.00
03:52:17 PM  sde1  19.87      0.00   20198.40   1016.70      0.33   16.85  13.46  26.73
03:52:17 PM  sde2  35.60   2752.53   16355.53    536.74     11.90  333.33  10.90  38.80
03:52:32 PM  sde1  22.54      0.00    8434.04    374.18      0.15    6.67   6.17  13.90
03:52:32 PM  sde2  35.84   2738.30    4586.30    204.38      2.01   28.11   6.53  23.40
03:52:47 PM  sde1   0.00      0.00       0.00      0.00      0.00    0.00   0.00   0.00
03:52:47 PM  sde2  13.37     35.83    1101.80     85.09      1.87  218.65   5.75   7.69
03:53:02 PM  sde1   0.00      0.00       0.00      0.00      0.00    0.00   0.00   0.00
03:53:02 PM  sde2   0.00      0.00       0.00      0.00      0.00    0.00   0.00   0.00
03:53:17 PM  sde1   0.00      0.00       0.00      0.00      0.00    0.00   0.00   0.00
03:53:17 PM  sde2   0.13      0.00       0.20      1.50      0.00   20.00  20.00   0.27
03:53:32 PM  sde1   0.00      0.00       0.00      0.00      0.00    0.00   0.00   0.00
03:53:32 PM  sde2   0.00      0.00       0.00      0.00      0.00    0.00   0.00   0.00
03:53:47 PM  sde1   0.00      0.00       0.00      0.00      0.00    0.00   0.00   0.00
03:53:47 PM  sde2   0.13      0.00       0.20      1.50      0.00    5.00   5.00   0.07
03:54:02 PM  sde1   0.00      0.00       0.00      0.00      0.00    0.00   0.00   0.00
03:54:02 PM  sde2   0.00      0.00       0.00      0.00      0.00    0.00   0.00   0.00
03:54:17 PM  sde1   0.00      0.00       0.00      0.00      0.00    0.00   0.00   0.00
03:54:17 PM  sde2   0.00      0.00       0.00      0.00      0.00    0.00   0.00   0.00
03:54:32 PM  sde1   0.00      0.00       0.00      0.00      0.00    0.00   0.00   0.00
03:54:32 PM  sde2   0.00      0.00       0.00      0.00      0.00    0.00   0.00   0.00
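
(For reference, columns in this shape come from sysstat's extended device
stats; a sketch, assuming an RHEL-style sysstat layout where that day's data
lands in /var/log/sa/sa09:)

    sar -d -p 15                                            # live 15-second samples
    sar -d -p -f /var/log/sa/sa09 -s 15:50:00 -e 15:55:00   # replay the window above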

-- Tom


Re: osd suicide timeout

2015-07-13 Thread Gregory Farnum
heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x3ff6eb0efd0' had suicide
timed out after 150

So that's the OSD's op thread, which is the one that does most of the work.
You often see the FileStore::op_tp when it's the disk or filesystem breaking,
but I do see the line

waiting 51 > 50 ops || 57248008 > 104857600

which leaves me feeling pretty confident that the disk is just getting more
work than it can keep up with. It looks like there was some rebalancing
happening around this time?
-Greg
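
(A note on the numbers in that "waiting" line: 50 ops and 104857600 bytes
match the stock FileStore queue throttle of that era, so the OSD's op queue
was simply full. A sketch for inspecting and, if the disk actually has
headroom, temporarily raising it; the values are illustrative only, and a
bigger throttle will not help a disk that is already saturated:)

    ceph daemon osd.8 config show | grep -E 'filestore_queue_max_(ops|bytes)'
    ceph tell osd.8 injectargs '--filestore_queue_max_ops 500 --filestore_queue_max_bytes 209715200'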


RE: osd suicide timeout

2015-07-13 Thread Deneau, Tom
Greg --

Not sure how to tell whether rebalancing occurred at that time.
I do see complaints in the other OSD logs that they did not get a reply
from osd.8 starting around 15:52 that day.

I see the deep-scrub of pool 14, but that was almost 30 minutes earlier.

-- Tom


 -Original Message-
 From: Gregory Farnum [mailto:g...@gregs42.com]
 Sent: Monday, July 13, 2015 11:45 AM
 To: Deneau, Tom
 Cc: ceph-devel
 Subject: Re: osd suicide timeout
 
 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x3ff6eb0efd0' had suicide
 timed out after 150
 
 So that's the OSD's op thread, which is the one that does most of the work.
 You often see the FileStore::op_tp when it's the disk or filesystem breaking,
 but I do see the line
 
 waiting 51 > 50 ops || 57248008 > 104857600
 
 which leaves me feeling pretty confident that the disk is just getting more
 work than it can keep up with. It looks like there was some rebalancing
 happening around this time?
 -Greg
 
 
 

Re: osd suicide timeout

2015-07-13 Thread huang jun
Hi, Deneau --
We set filestore_max_queue_ops=1000, but unfortunately we also hit the OSD
suicide. We were testing random 4K writes, and the osd_op_tp thread still hit
the suicide timeout even with osd_op_thread_timeout=120.
The threads in osd_op_tp reset their timeout handles after submitting an op
to the FileJournal's work queue, so we have no idea why the thread was blocked
for so long. The debug log shows that, during that time, the other threads in
osd_op_tp were working fine.
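
(One hedged way to see what the blocked op thread was doing at the time,
assuming admin-socket access on the test box and that attaching gdb to the
OSD is acceptable there; the osd id is illustrative and <pid> stands for the
real process id:)

    ceph daemon osd.8 dump_ops_in_flight     # what the OSD thinks is outstanding
    ceph daemon osd.8 dump_historic_ops      # recent slow ops with per-stage timings
    # very verbose, but shows where each op stalls
    ceph tell osd.8 injectargs '--debug_osd 20 --debug_filestore 20 --debug_journal 20'
    # stacks of every thread, including the one holding up osd_op_tp
    gdb -p <pid> -batch -ex 'thread apply all bt'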
