Hi,
since last Thursday we have had an SSD pool (cache tier) in front of an
EC pool, and we are filling the pools with data via rsync (approx. 50 MB/s).
The SSD pool has three disks, and one of them (a DC S3700) has failed four
times since then.
I simply start the OSD again; the pool is rebuilt and works again
for some hours up to some days.

I swapped the Ceph node and the ssh-adapter, but this didn't solve the
issue.
There weren't any messages in syslog/messages, and an fsck ran without
trouble, so I guess the problem is not OS-related.

I found this issue http://tracker.ceph.com/issues/8747, but my
Ceph version is newer (Debian: ceph version 0.80.7
(6c0127fcb58008793d3c8b62d925bc91963672a3)),
and it looks like I can reproduce this issue within 1-3 days.

The OSD is ext4-formatted. All other OSDs (62) run without trouble.

# more ceph-osd.61.log
2015-01-13 16:29:26.494458 7fedf9a3d700  0 log [INF] : 17.0 scrub ok
2015-01-13 17:29:03.988530 7fedf823a700  0 log [INF] : 17.16 scrub ok
2015-01-13 17:30:31.901032 7fedf8a3b700  0 log [INF] : 17.18 scrub ok
2015-01-13 17:31:58.983736 7fedf823a700  0 log [INF] : 17.9 scrub ok
2015-01-13 17:32:30.780308 7fedf9a3d700  0 log [INF] : 17.c scrub ok
2015-01-13 17:32:33.311433 7fedf8a3b700  0 log [INF] : 17.11 scrub ok
2015-01-13 17:37:22.237214 7fedf9a3d700  0 log [INF] : 17.7 scrub ok
2015-01-13 20:15:07.874376 7fedf6236700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)' thread 7fedf6236700 time 2015-01-13 20:15:07.853440
osd/ReplicatedPG.cc: 5306: FAILED assert(soid < scrubber.start || soid >= scrubber.end)

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: (ReplicatedPG::finish_ctx(ReplicatedPG::OpContext*, int, bool)+0x1320) [0x9296b0]
 2: (ReplicatedPG::try_flush_mark_clean(boost::shared_ptr<ReplicatedPG::FlushOp>)+0x5f6) [0x92b076]
 3: (ReplicatedPG::finish_flush(hobject_t, unsigned long, int)+0x296) [0x92b876]
 4: (C_Flush::finish(int)+0x86) [0x986226]
 5: (Context::complete(int)+0x9) [0x78f449]
 6: (Finisher::finisher_thread_entry()+0x1c8) [0xad5a18]
 7: (()+0x6b50) [0x7fee152f6b50]
 8: (clone()+0x6d) [0x7fee13f047bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -70> 2015-01-11 19:54:47.962164 7fee15dd4780  5 asok(0x2f56230) register_command perfcounters_dump hook 0x2f44010
   -69> 2015-01-11 19:54:47.962190 7fee15dd4780  5 asok(0x2f56230) register_command 1 hook 0x2f44010
   -68> 2015-01-11 19:54:47.962195 7fee15dd4780  5 asok(0x2f56230) register_command perf dump hook 0x2f44010
   -67> 2015-01-11 19:54:47.962201 7fee15dd4780  5 asok(0x2f56230) register_command perfcounters_schema hook 0x2f44010
   -66> 2015-01-11 19:54:47.962203 7fee15dd4780  5 asok(0x2f56230) register_command 2 hook 0x2f44010
   -65> 2015-01-11 19:54:47.962207 7fee15dd4780  5 asok(0x2f56230) register_command perf schema hook 0x2f44010
   -64> 2015-01-11 19:54:47.962209 7fee15dd4780  5 asok(0x2f56230) register_command config show hook 0x2f44010
   -63> 2015-01-11 19:54:47.962214 7fee15dd4780  5 asok(0x2f56230) register_command config set hook 0x2f44010
   -62> 2015-01-11 19:54:47.962219 7fee15dd4780  5 asok(0x2f56230) register_command config get hook 0x2f44010
   -61> 2015-01-11 19:54:47.962223 7fee15dd4780  5 asok(0x2f56230) register_command log flush hook 0x2f44010
   -60> 2015-01-11 19:54:47.962226 7fee15dd4780  5 asok(0x2f56230) register_command log dump hook 0x2f44010
   -59> 2015-01-11 19:54:47.962229 7fee15dd4780  5 asok(0x2f56230) register_command log reopen hook 0x2f44010
   -58> 2015-01-11 19:54:47.965000 7fee15dd4780  0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process ceph-osd, pid 11735
   -57> 2015-01-11 19:54:47.967362 7fee15dd4780  1 finished global_init_daemonize
   -56> 2015-01-11 19:54:47.971666 7fee15dd4780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-61) detect_features: FIEMAP ioctl is supported and appears to work
   -55> 2015-01-11 19:54:47.971682 7fee15dd4780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-61) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
   -54> 2015-01-11 19:54:47.973281 7fee15dd4780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-61) detect_features: syscall(SYS_syncfs, fd) fully supported
   -53> 2015-01-11 19:54:47.975393 7fee15dd4780  0 filestore(/var/lib/ceph/osd/ceph-61) limited size xattrs
   -52> 2015-01-11 19:54:48.013905 7fee15dd4780  0 filestore(/var/lib/ceph/osd/ceph-61) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
   -51> 2015-01-11 19:54:49.245360 7fee15dd4780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-61) detect_features: FIEMAP ioctl is supported and appears to work
   -50> 2015-01-11 19:54:49.245370 7fee15dd4780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-61) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
   -49> 2015-01-11 19:54:49.247017 7fee15dd4780  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-61) detect_features: syscall(SYS_syncfs, fd) fully supported
   -48> 2015-01-11 19:54:49.248912 7fee15dd4780  0 filestore(/var/lib/ceph/osd/ceph-61) limited size xattrs
   -47> 2015-01-11 19:54:49.251863 7fee15dd4780  0 filestore(/var/lib/ceph/osd/ceph-61) mount: WRITEAHEAD journal mode explicitly enabled in conf
   -46> 2015-01-11 19:54:49.362965 7fee15dd4780  0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
   -45> 2015-01-11 19:54:49.387439 7fee15dd4780  0 osd.61 116417 crush map has features 2303210029056, adjusting msgr requires for clients
   -44> 2015-01-11 19:54:49.387458 7fee15dd4780  0 osd.61 116417 crush map has features 2578087936000 was 8705, adjusting msgr requires for mons
   -43> 2015-01-11 19:54:49.387470 7fee15dd4780  0 osd.61 116417 crush map has features 2578087936000, adjusting msgr requires for osds
   -42> 2015-01-11 19:54:49.387484 7fee15dd4780  0 osd.61 116417 load_pgs
   -41> 2015-01-11 19:54:50.206711 7fee15dd4780  0 osd.61 116417 load_pgs opened 32 pgs
   -40> 2015-01-11 19:54:50.211664 7fee02a4f700  0 osd.61 116417 ignoring osdmap until we have initialized
   -39> 2015-01-11 19:54:50.211752 7fee02a4f700  0 osd.61 116417 ignoring osdmap until we have initialized
   -38> 2015-01-11 19:54:50.212428 7fee15dd4780  0 osd.61 116417 done with init, starting boot process
....
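
To compare the four crashes, the OSD log could be scanned for each assert failure together with the last PG that reported "scrub ok" before it. What follows is only a rough sketch: the function name `find_asserts` is made up for illustration, and it assumes the log layout shown above (timestamp at the start of the line, `log [INF] : <pg> scrub ok` messages, and a `FAILED assert(` line). It also assumes that repeated copies of the assert inside the "recent events" dump should count as one crash, so hits are de-duplicated by timestamp.

```python
import re

# Timestamp at the very start of a ceph-osd log line,
# e.g. "2015-01-13 20:15:07.874376 ...". Indented dump lines
# ("   -70> 2015-...") deliberately do not match.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) ")
# Scrub completion message, e.g. "log [INF] : 17.7 scrub ok".
SCRUB_OK = re.compile(r"log \[INF\] : (\S+) scrub ok")


def find_asserts(lines):
    """Return [(crash_timestamp, last_scrubbed_pg), ...] in log order.

    crash_timestamp is the most recent line-start timestamp seen before
    a "FAILED assert(" line; last_scrubbed_pg is the PG id from the most
    recent "scrub ok" message (None if there was none).
    """
    crashes = []
    seen = set()      # timestamps already reported (assumed de-dup rule)
    last_ts = None
    last_pg = None
    for line in lines:
        ts_match = TS.match(line)
        if ts_match:
            last_ts = ts_match.group(1)
        scrub_match = SCRUB_OK.search(line)
        if scrub_match:
            last_pg = scrub_match.group(1)
        if "FAILED assert(" in line and last_ts not in seen:
            seen.add(last_ts)
            crashes.append((last_ts, last_pg))
    return crashes
```

It could be called as `find_asserts(open("ceph-osd.61.log"))`. If every crash turns out to follow a scrub of a PG in pool 17, that would support a link to the scrub/flush race suggested by the assert, but the script itself proves nothing about the cause.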

Udo
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com