Re: [ceph-users] Part 2: ssd osd fails often with FAILED assert(soid < scrubber.start || soid >= scrubber.end)

2015-01-26 Thread Irek Fasikhov
Hi All, Loic,
I have exactly the same error. Do I understand correctly that the fix will
be in 0.80.9? Thank you.



On Sat Jan 17 2015 at 2:21:09 AM, Loic Dachary l...@dachary.org wrote:



 On 14/01/2015 18:33, Udo Lembke wrote:
  Hi Loic,
  thanks for the answer. I hope it's not like
  http://tracker.ceph.com/issues/8747, where the issue happens even with a
  patched version, if I understand right.

 http://tracker.ceph.com/issues/8747 is a duplicate of
 http://tracker.ceph.com/issues/8011 indeed :-)
 
  So I only have to wait a few months ;-) for a backport...
 
  Udo
 
  On 14.01.2015 09:40, Loic Dachary wrote:
  Hi,
 
  This is http://tracker.ceph.com/issues/8011 which is being
  backported.
 
  Cheers
 
 

 --
 Loïc Dachary, Artisan Logiciel Libre



[ceph-users] Part 2: ssd osd fails often with FAILED assert(soid < scrubber.start || soid >= scrubber.end)

2015-01-14 Thread Udo Lembke
Hi again,
sorry for not replying in-thread, but my last email didn't come back via
the mailing list (I often miss some posts!).

Just after sending the last mail, another SSD failed for the first time -
in this case a cheap one, but with the same error:

root@ceph-04:/var/log/ceph# more ceph-osd.62.log
2015-01-13 16:40:55.712967 7fb29cfd3700  0 log [INF] : 17.2 scrub ok
2015-01-13 17:54:35.548361 7fb29dfd5700  0 log [INF] : 17.3 scrub ok
2015-01-13 17:54:38.007014 7fb29dfd5700  0 log [INF] : 17.5 scrub ok
2015-01-13 17:54:41.215558 7fb29d7d4700  0 log [INF] : 17.f scrub ok
2015-01-13 17:54:42.277585 7fb29dfd5700  0 log [INF] : 17.a scrub ok
2015-01-13 17:54:48.961582 7fb29d7d4700  0 log [INF] : 17.6 scrub ok
2015-01-13 20:15:08.749597 7fb292337700  0 -- 192.168.3.14:6824/9185 >> 192.168.3.15:6824/11735 pipe(0x107d9680 sd=307 :6824 s=2 pgs=2 cs=1 l=0 c=0x124a09a0).fault, initiating reconnect
2015-01-13 20:15:08.750803 7fb296dbe700  0 -- 192.168.3.14:0/9185 >> 192.168.3.15:6825/11735 pipe(0xd011180 sd=42 :0 s=1 pgs=0 cs=0 l=1 c=0x8d19760).fault
2015-01-13 20:15:08.750804 7fb292b3f700  0 -- 192.168.3.14:0/9185 >> 172.20.2.15:6837/11735 pipe(0x1210f900 sd=66 :0 s=1 pgs=0 cs=0 l=1 c=0xbeae840).fault
2015-01-13 20:15:08.751056 7fb291d31700  0 -- 192.168.3.14:6824/9185 >> 192.168.3.15:6824/11735 pipe(0x107d9680 sd=29 :6824 s=1 pgs=2 cs=2 l=0 c=0x124a09a0).fault
2015-01-13 20:15:27.035342 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:07.035339)
2015-01-13 20:15:28.036773 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:08.036769)
2015-01-13 20:15:28.945179 7fb29b7d0700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:08.945178)
2015-01-13 20:15:29.037016 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:09.037014)
2015-01-13 20:15:30.037204 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:10.037202)
2015-01-13 20:15:30.645491 7fb29b7d0700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:10.645483)
2015-01-13 20:15:31.037326 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:11.037323)
2015-01-13 20:15:32.037442 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:12.037439)
2015-01-13 20:15:33.037641 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:13.037637)
2015-01-13 20:15:34.037843 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:14.037839)
2015-01-13 21:39:35.241153 7fb29dfd5700  0 log [INF] : 17.d scrub ok
2015-01-13 21:39:39.293113 7fb29a7ce700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)' thread 7fb29a7ce700 time 2015-01-13 21:39:39.279799
osd/ReplicatedPG.cc: 5306: FAILED assert(soid < scrubber.start || soid >= scrubber.end)

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: (ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int,
bool)+0x1320) [0x9296b0]
 2: (ReplicatedPG::try_flush_mark_clean(boost::shared_ptr<ReplicatedPG::FlushOp>)+0x5f6) [0x92b076]
 3: (ReplicatedPG::finish_flush(hobject_t, unsigned long, int)+0x296)
[0x92b876]
 4: (C_Flush::finish(int)+0x86) [0x986226]
 5: (Context::complete(int)+0x9) [0x78f449]
 6: (Finisher::finisher_thread_entry()+0x1c8) [0xad5a18]
 7: (()+0x6b50) [0x7fb2b94ceb50]
 8: (clone()+0x6d) [0x7fb2b80dc7bd]
 NOTE: a copy of the executable, or `objdump -rdS executable` is
needed to interpret this.

--- begin dump of recent events ---
  -127 2015-01-10 19:39:41.861724 7fb2b9faa780  5 asok(0x28e4230)
register_command perfcounters_dump hook 0x28d4010
  -126 2015-01-10 19:39:41.861749 7fb2b9faa780  5 asok(0x28e4230)
register_command 1 hook 0x28d4010
  -125 2015-01-10 19:39:41.861753 7fb2b9faa780  5 asok(0x28e4230)
register_command perf dump hook 0x28d4010
  -124 2015-01-10 19:39:41.861756 7fb2b9faa780  5 asok(0x28e4230)
register_command perfcounters_schema hook 0x28d4010
  -123 2015-01-10 19:39:41.861759 7fb2b9faa780  5 asok(0x28e4230)
register_command 2 hook 0x28d4010
  -122 2015-01-10 19:39:41.861762 7fb2b9faa780  
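
The assert that kills the OSD is a scrub-range invariant in
ReplicatedPG::finish_ctx(): an op (here the completion of a cache-tier
flush, per try_flush_mark_clean in the backtrace) may only finish for an
object lying outside the chunk the scrubber is currently working on. A
minimal standalone sketch of that invariant, not Ceph's actual code
(hobject_t is reduced to a plain integer key for illustration):

// Minimal sketch (not Ceph's actual code) of the invariant behind
// "FAILED assert(soid < scrubber.start || soid >= scrubber.end)":
// hobject_t is reduced to a plain integer key, and the scrubber state
// to the boundaries of the chunk currently being scrubbed.
#include <cassert>

struct Scrubber {
    long start;  // first object of the chunk under scrub (inclusive)
    long end;    // one past the last object of that chunk
};

// Stand-in for ReplicatedPG::finish_ctx(): the object an op touches must
// lie outside the half-open scrub chunk [start, end).
void finish_ctx(long soid, const Scrubber &scrubber) {
    assert(soid < scrubber.start || soid >= scrubber.end);
    // ... apply the op's transaction ...
}

int main() {
    Scrubber s{100, 200};
    finish_ctx(50, s);    // below the chunk: fine
    finish_ctx(200, s);   // at end, already past the chunk: fine
    // finish_ctx(150, s); // inside [100, 200): trips the assert -- the
    //                     // crash above is a flush finishing for an
    //                     // object the scrubber was mid-way through
    return 0;
}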

Re: [ceph-users] Part 2: ssd osd fails often with FAILED assert(soid < scrubber.start || soid >= scrubber.end)

2015-01-14 Thread Loic Dachary
Hi,

This is http://tracker.ceph.com/issues/8011 which is being backported.

Cheers
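
For anyone reading along before the backport lands: the general shape of
such a fix, as a purely hypothetical sketch (the helper name and the
requeue logic below are illustrative, not the actual patch tracked in
issue 8011), is to defer ops that target an object inside the active
scrub chunk rather than letting them reach the assert:

// Hypothetical sketch of the mitigation's general shape, not the actual
// patch for http://tracker.ceph.com/issues/8011: ops touching an object
// inside the active scrub chunk are deferred and retried once the chunk
// advances, instead of reaching the assert in finish_ctx().
#include <deque>
#include <initializer_list>
#include <iostream>

struct ScrubRange { long start, end; };  // chunk [start, end) under scrub

// Illustrative helper: is this object inside the chunk being scrubbed?
bool blocked_by_scrub(long soid, const ScrubRange &r) {
    return soid >= r.start && soid < r.end;
}

int main() {
    ScrubRange r{100, 200};
    std::deque<long> deferred;  // ops to requeue after the chunk advances
    for (long soid : {50L, 150L, 250L}) {
        if (blocked_by_scrub(soid, r))
            deferred.push_back(soid);  // defer rather than assert
        else
            std::cout << "apply op on object " << soid << "\n";
    }
    std::cout << deferred.size() << " op(s) deferred until scrub passes\n";
    return 0;
}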

On 13/01/2015 22:00, Udo Lembke wrote:
 Hi again,
 sorry for not replying in-thread, but my last email didn't come back via
 the mailing list (I often miss some posts!).
 
 Just after sending the last mail, another SSD failed for the first time -
 in this case a cheap one, but with the same error:
 
 root@ceph-04:/var/log/ceph# more ceph-osd.62.log
 2015-01-13 16:40:55.712967 7fb29cfd3700  0 log [INF] : 17.2 scrub ok
 2015-01-13 17:54:35.548361 7fb29dfd5700  0 log [INF] : 17.3 scrub ok
 2015-01-13 17:54:38.007014 7fb29dfd5700  0 log [INF] : 17.5 scrub ok
 2015-01-13 17:54:41.215558 7fb29d7d4700  0 log [INF] : 17.f scrub ok
 2015-01-13 17:54:42.277585 7fb29dfd5700  0 log [INF] : 17.a scrub ok
 2015-01-13 17:54:48.961582 7fb29d7d4700  0 log [INF] : 17.6 scrub ok
 2015-01-13 20:15:08.749597 7fb292337700  0 -- 192.168.3.14:6824/9185 >> 192.168.3.15:6824/11735 pipe(0x107d9680 sd=307 :6824 s=2 pgs=2 cs=1 l=0 c=0x124a09a0).fault, initiating reconnect
 2015-01-13 20:15:08.750803 7fb296dbe700  0 -- 192.168.3.14:0/9185 >> 192.168.3.15:6825/11735 pipe(0xd011180 sd=42 :0 s=1 pgs=0 cs=0 l=1 c=0x8d19760).fault
 2015-01-13 20:15:08.750804 7fb292b3f700  0 -- 192.168.3.14:0/9185 >> 172.20.2.15:6837/11735 pipe(0x1210f900 sd=66 :0 s=1 pgs=0 cs=0 l=1 c=0xbeae840).fault
 2015-01-13 20:15:08.751056 7fb291d31700  0 -- 192.168.3.14:6824/9185 >> 192.168.3.15:6824/11735 pipe(0x107d9680 sd=29 :6824 s=1 pgs=2 cs=2 l=0 c=0x124a09a0).fault
 2015-01-13 20:15:27.035342 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:07.035339)
 2015-01-13 20:15:28.036773 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:08.036769)
 2015-01-13 20:15:28.945179 7fb29b7d0700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:08.945178)
 2015-01-13 20:15:29.037016 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:09.037014)
 2015-01-13 20:15:30.037204 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:10.037202)
 2015-01-13 20:15:30.645491 7fb29b7d0700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:10.645483)
 2015-01-13 20:15:31.037326 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:11.037323)
 2015-01-13 20:15:32.037442 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:12.037439)
 2015-01-13 20:15:33.037641 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:13.037637)
 2015-01-13 20:15:34.037843 7fb2b3edd700 -1 osd.62 116422 heartbeat_check: no reply from osd.61 since back 2015-01-13 20:15:06.843259 front 2015-01-13 20:15:06.843259 (cutoff 2015-01-13 20:15:14.037839)
 2015-01-13 21:39:35.241153 7fb29dfd5700  0 log [INF] : 17.d scrub ok
 2015-01-13 21:39:39.293113 7fb29a7ce700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)' thread 7fb29a7ce700 time 2015-01-13 21:39:39.279799
 osd/ReplicatedPG.cc: 5306: FAILED assert(soid < scrubber.start || soid >= scrubber.end)
 
  ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
  1: (ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int,
 bool)+0x1320) [0x9296b0]
  2: (ReplicatedPG::try_flush_mark_clean(boost::shared_ptr<ReplicatedPG::FlushOp>)+0x5f6) [0x92b076]
  3: (ReplicatedPG::finish_flush(hobject_t, unsigned long, int)+0x296)
 [0x92b876]
  4: (C_Flush::finish(int)+0x86) [0x986226]
  5: (Context::complete(int)+0x9) [0x78f449]
  6: (Finisher::finisher_thread_entry()+0x1c8) [0xad5a18]
  7: (()+0x6b50) [0x7fb2b94ceb50]
  8: (clone()+0x6d) [0x7fb2b80dc7bd]
  NOTE: a copy of the executable, or `objdump -rdS executable` is
 needed to interpret this.
 
 --- begin dump of recent events ---
   -127 2015-01-10 19:39:41.861724 7fb2b9faa780  5 asok(0x28e4230)
 register_command perfcounters_dump hook 0x28d4010
   -126 2015-01-10 19:39:41.861749 7fb2b9faa780  5 asok(0x28e4230)
 register_command 1 hook 0x28d4010
   -125 2015-01-10 19:39:41.861753 7fb2b9faa780  5 asok(0x28e4230)
 register_command perf dump hook 0x28d4010
   -124 2015-01-10 19:39:41.861756