I've been running CephFS successfully on my Debian Jessie machines for a while,
but one day, after a power outage, the MDS wasn't happy.  It kept crashing
after it was done loading, with memory utilization increasing quite a bit.  I
was running Infernalis 9.2.0, which I had upgraded from Hammer without
problems, so I thought I might have hit a bug and decided to try 9.2.1.
9.2.1 was not happy that my journal didn't have permissions for user ceph, so
I corrected that.  After that, none of my OSDs would start; they all fail with
messages similar to the ones below.  I then upgraded to Jewel, since I didn't
see much added complexity in upgrading from Infernalis, and I am still seeing
these errors.
2016-04-15 22:47:04.897500 7f65fbbb0800  0 set uid:gid to 1001:1001 (ceph:ceph)
2016-04-15 22:47:04.897635 7f65fbbb0800  0 ceph version 10.1.2 (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4), process ceph-osd, pid 1284
2016-04-15 22:47:04.900585 7f65fbbb0800  0 pidfile_write: ignore empty --pid-file
2016-04-15 22:47:05.467530 7f65fbbb0800  0 filestore(/var/lib/ceph/osd/ceph-3) backend xfs (magic 0x58465342)
2016-04-15 22:47:05.477912 7f65fbbb0800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2016-04-15 22:47:05.477999 7f65fbbb0800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2016-04-15 22:47:05.478091 7f65fbbb0800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: splice is supported
2016-04-15 22:47:05.494593 7f65fbbb0800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2016-04-15 22:47:05.494785 7f65fbbb0800  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_feature: extsize is disabled by conf
2016-04-15 22:47:05.596738 7f65fbbb0800  1 leveldb: Recovering log #20899
2016-04-15 22:47:05.825914 7f65fbbb0800  1 leveldb: Delete type=0 #20899
2016-04-15 22:47:05.826089 7f65fbbb0800  1 leveldb: Delete type=3 #20898
2016-04-15 22:47:05.900058 7f65fbbb0800  0 filestore(/var/lib/ceph/osd/ceph-3) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2016-04-15 22:47:06.377878 7f65fbbb0800  1 journal _open /var/lib/ceph/osd/ceph-3/journal fd 18: 14998831104 bytes, block size 4096 bytes, directio = 1, aio = 1
2016-04-15 22:47:06.381738 7f65fbbb0800  1 journal _open /var/lib/ceph/osd/ceph-3/journal fd 18: 14998831104 bytes, block size 4096 bytes, directio = 1, aio = 1
2016-04-15 22:47:06.384954 7f65fbbb0800  1 filestore(/var/lib/ceph/osd/ceph-3) upgrade
2016-04-15 22:47:06.415851 7f65fbbb0800  0 <cls> cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
2016-04-15 22:47:06.419654 7f65fbbb0800  0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
2016-04-15 22:47:06.498512 7f65fbbb0800 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f65fbbb0800 time 2016-04-15 22:47:06.494680
osd/OSD.h: 885: FAILED assert(ret)

 ceph version 10.1.2 (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x82) [0x7f65fb6364f2]
 2: (OSDService::get_map(unsigned int)+0x3d) [0x7f65fafbd83d]
 3: (OSD::init()+0x1862) [0x7f65faf6ba52]
 4: (main()+0x2b05) [0x7f65faed1735]
 5: (__libc_start_main()+0xf5) [0x7f65f7a67b45]
 6: (()+0x337197) [0x7f65faf1c197]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -78> 2016-04-15 22:47:04.873688 7f65fbbb0800  5 asok(0x7f660689a000) register_command perfcounters_dump hook 0x7f66067e2030
   -77> 2016-04-15 22:47:04.873771 7f65fbbb0800  5 asok(0x7f660689a000) register_command 1 hook 0x7f66067e2030
   -76> 2016-04-15 22:47:04.873804 7f65fbbb0800  5 asok(0x7f660689a000) register_command perf dump hook 0x7f66067e2030
   -75> 2016-04-15 22:47:04.873834 7f65fbbb0800  5 asok(0x7f660689a000) register_command perfcounters_schema hook 0x7f66067e2030
   -74> 2016-04-15 22:47:04.873860 7f65fbbb0800  5 asok(0x7f660689a000) register_command 2 hook 0x7f66067e2030
   -73> 2016-04-15 22:47:04.873886 7f65fbbb0800  5 asok(0x7f660689a000) register_command perf schema hook 0x7f66067e2030
   -72> 2016-04-15 22:47:04.873916 7f65fbbb0800  5 asok(0x7f660689a000) register_command perf reset hook 0x7f66067e2030
   -71> 2016-04-15 22:47:04.873943 7f65fbbb0800  5 asok(0x7f660689a000) register_command config show hook 0x7f66067e2030
   -70> 2016-04-15 22:47:04.873974 7f65fbbb0800  5 asok(0x7f660689a000) register_command config set hook 0x7f66067e2030
   -69> 2016-04-15 22:47:04.874000 7f65fbbb0800  5 asok(0x7f660689a000) register_command config get hook 0x7f66067e2030
   -68> 2016-04-15 22:47:04.874029 7f65fbbb0800  5 asok(0x7f660689a000) register_command config diff hook 0x7f66067e2030
   -67> 2016-04-15 22:47:04.874055 7f65fbbb0800  5 asok(0x7f660689a000) register_command log flush hook 0x7f66067e2030
   -66> 2016-04-15 22:47:04.874082 7f65fbbb0800  5 asok(0x7f660689a000) register_command log dump hook 0x7f66067e2030
   -65> 2016-04-15 22:47:04.874109 7f65fbbb0800  5 asok(0x7f660689a000) register_command log reopen hook 0x7f66067e2030
   -64> 2016-04-15 22:47:04.897500 7f65fbbb0800  0 set uid:gid to 1001:1001 (ceph:ceph)
   -63> 2016-04-15 22:47:04.897635 7f65fbbb0800  0 ceph version 10.1.2 (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4), process ceph-osd, pid 1284
   -62> 2016-04-15 22:47:04.900224 7f65fbbb0800  1 -- 192.168.1.31:0/0 learned my addr 192.168.1.31:0/0
   -61> 2016-04-15 22:47:04.900286 7f65fbbb0800  1 accepter.accepter.bind my_inst.addr is 192.168.1.31:6802/1284 need_addr=0
   -60> 2016-04-15 22:47:04.900350 7f65fbbb0800  1 -- 192.168.2.31:0/0 learned my addr 192.168.2.31:0/0
   -59> 2016-04-15 22:47:04.900375 7f65fbbb0800  1 accepter.accepter.bind my_inst.addr is 192.168.2.31:6802/1284 need_addr=0
   -58> 2016-04-15 22:47:04.900443 7f65fbbb0800  1 -- 192.168.2.31:0/0 learned my addr 192.168.2.31:0/0
   -57> 2016-04-15 22:47:04.900475 7f65fbbb0800  1 accepter.accepter.bind my_inst.addr is 192.168.2.31:6803/1284 need_addr=0
   -56> 2016-04-15 22:47:04.900538 7f65fbbb0800  1 -- 192.168.1.31:0/0 learned my addr 192.168.1.31:0/0
   -55> 2016-04-15 22:47:04.900562 7f65fbbb0800  1 accepter.accepter.bind my_inst.addr is 192.168.1.31:6803/1284 need_addr=0
   -54> 2016-04-15 22:47:04.900585 7f65fbbb0800  0 pidfile_write: ignore empty --pid-file
   -53> 2016-04-15 22:47:04.909743 7f65fbbb0800  5 asok(0x7f660689a000) init /var/run/ceph/ceph-osd.3.asok
   -52> 2016-04-15 22:47:04.909792 7f65fbbb0800  5 asok(0x7f660689a000) bind_and_listen /var/run/ceph/ceph-osd.3.asok
   -51> 2016-04-15 22:47:04.909891 7f65fbbb0800  5 asok(0x7f660689a000) register_command 0 hook 0x7f66067de0d8
   -50> 2016-04-15 22:47:04.909928 7f65fbbb0800  5 asok(0x7f660689a000) register_command version hook 0x7f66067de0d8
   -49> 2016-04-15 22:47:04.909955 7f65fbbb0800  5 asok(0x7f660689a000) register_command git_version hook 0x7f66067de0d8
   -48> 2016-04-15 22:47:04.909988 7f65fbbb0800  5 asok(0x7f660689a000) register_command help hook 0x7f66067e21e0
   -47> 2016-04-15 22:47:04.910015 7f65fbbb0800  5 asok(0x7f660689a000) register_command get_command_descriptions hook 0x7f66067e21f0
   -46> 2016-04-15 22:47:04.910205 7f65f43c9700  5 asok(0x7f660689a000) entry start
   -45> 2016-04-15 22:47:04.910330 7f65fbbb0800 10 monclient(hunting): build_initial_monmap
   -44> 2016-04-15 22:47:04.939070 7f65fbbb0800  5 adding auth protocol: cephx
   -43> 2016-04-15 22:47:04.939118 7f65fbbb0800  5 adding auth protocol: cephx
   -42> 2016-04-15 22:47:04.939986 7f65fbbb0800  5 asok(0x7f660689a000) register_command objecter_requests hook 0x7f66067e22b0
   -41> 2016-04-15 22:47:04.940256 7f65fbbb0800  1 -- 192.168.1.31:6802/1284 messenger.start
   -40> 2016-04-15 22:47:04.940413 7f65fbbb0800  1 -- :/0 messenger.start
   -39> 2016-04-15 22:47:04.940557 7f65fbbb0800  1 -- 192.168.1.31:6803/1284 messenger.start
   -38> 2016-04-15 22:47:04.940686 7f65fbbb0800  1 -- 192.168.2.31:6803/1284 messenger.start
   -37> 2016-04-15 22:47:04.940798 7f65fbbb0800  1 -- 192.168.2.31:6802/1284 messenger.start
   -36> 2016-04-15 22:47:04.940899 7f65fbbb0800  1 -- :/0 messenger.start
   -35> 2016-04-15 22:47:04.941223 7f65fbbb0800  2 osd.3 0 mounting /var/lib/ceph/osd/ceph-3 /var/lib/ceph/osd/ceph-3/journal
   -34> 2016-04-15 22:47:05.467530 7f65fbbb0800  0 filestore(/var/lib/ceph/osd/ceph-3) backend xfs (magic 0x58465342)
   -33> 2016-04-15 22:47:05.477912 7f65fbbb0800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
   -32> 2016-04-15 22:47:05.477999 7f65fbbb0800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
   -31> 2016-04-15 22:47:05.478091 7f65fbbb0800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: splice is supported
   -30> 2016-04-15 22:47:05.494593 7f65fbbb0800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
   -29> 2016-04-15 22:47:05.494785 7f65fbbb0800  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-3) detect_feature: extsize is disabled by conf
   -28> 2016-04-15 22:47:05.596738 7f65fbbb0800  1 leveldb: Recovering log #20899
   -27> 2016-04-15 22:47:05.825914 7f65fbbb0800  1 leveldb: Delete type=0 #20899
   -26> 2016-04-15 22:47:05.826089 7f65fbbb0800  1 leveldb: Delete type=3 #20898
   -25> 2016-04-15 22:47:05.900058 7f65fbbb0800  0 filestore(/var/lib/ceph/osd/ceph-3) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
   -24> 2016-04-15 22:47:06.377715 7f65fbbb0800  2 journal open /var/lib/ceph/osd/ceph-3/journal fsid 4f86a418-6c67-4cb4-83a1-6c123c890036 fs_op_seq 9829589
   -23> 2016-04-15 22:47:06.377878 7f65fbbb0800  1 journal _open /var/lib/ceph/osd/ceph-3/journal fd 18: 14998831104 bytes, block size 4096 bytes, directio = 1, aio = 1
   -22> 2016-04-15 22:47:06.379811 7f65fbbb0800  2 journal open advancing committed_seq 9829584 to fs op_seq 9829589
   -21> 2016-04-15 22:47:06.380757 7f65fbbb0800  2 journal read_entry 2537717760 : seq 9829585 29509 bytes
   -20> 2016-04-15 22:47:06.380996 7f65fbbb0800  2 journal read_entry 2537750528 : seq 9829586 8134 bytes
   -19> 2016-04-15 22:47:06.381091 7f65fbbb0800  2 journal read_entry 2537762816 : seq 9829587 3064 bytes
   -18> 2016-04-15 22:47:06.381155 7f65fbbb0800  2 journal read_entry 2537766912 : seq 9829588 7647 bytes
   -17> 2016-04-15 22:47:06.381219 7f65fbbb0800  2 journal read_entry 2537775104 : seq 9829589 4737 bytes
   -16> 2016-04-15 22:47:06.381257 7f65fbbb0800  2 journal No further valid entries found, journal is most likely valid
   -15> 2016-04-15 22:47:06.381287 7f65fbbb0800  2 journal No further valid entries found, journal is most likely valid
   -14> 2016-04-15 22:47:06.381302 7f65fbbb0800  3 journal journal_replay: end of journal, done.
   -13> 2016-04-15 22:47:06.381738 7f65fbbb0800  1 journal _open /var/lib/ceph/osd/ceph-3/journal fd 18: 14998831104 bytes, block size 4096 bytes, directio = 1, aio = 1
   -12> 2016-04-15 22:47:06.384954 7f65fbbb0800  1 filestore(/var/lib/ceph/osd/ceph-3) upgrade
   -11> 2016-04-15 22:47:06.385071 7f65fbbb0800  2 osd.3 0 boot
   -10> 2016-04-15 22:47:06.415253 7f65fbbb0800  1 <cls> cls/statelog/cls_statelog.cc:306: Loaded log class!
    -9> 2016-04-15 22:47:06.415851 7f65fbbb0800  0 <cls> cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
    -8> 2016-04-15 22:47:06.418172 7f65fbbb0800  1 <cls> cls/version/cls_version.cc:228: Loaded version class!
    -7> 2016-04-15 22:47:06.419654 7f65fbbb0800  0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
    -6> 2016-04-15 22:47:06.426520 7f65fbbb0800  1 <cls> cls/refcount/cls_refcount.cc:232: Loaded refcount class!
    -5> 2016-04-15 22:47:06.427217 7f65fbbb0800  1 <cls> cls/user/cls_user.cc:375: Loaded user class!
    -4> 2016-04-15 22:47:06.428364 7f65fbbb0800  1 <cls> cls/replica_log/cls_replica_log.cc:141: Loaded replica log class!
    -3> 2016-04-15 22:47:06.428970 7f65fbbb0800  1 <cls> cls/timeindex/cls_timeindex.cc:259: Loaded timeindex class!
    -2> 2016-04-15 22:47:06.430177 7f65fbbb0800  1 <cls> cls/log/cls_log.cc:317: Loaded log class!
    -1> 2016-04-15 22:47:06.438063 7f65fbbb0800  1 <cls> cls/rgw/cls_rgw.cc:3206: Loaded rgw class!
     0> 2016-04-15 22:47:06.498512 7f65fbbb0800 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f65fbbb0800 time 2016-04-15 22:47:06.494680
osd/OSD.h: 885: FAILED assert(ret)

 ceph version 10.1.2 (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x82) [0x7f65fb6364f2]
 2: (OSDService::get_map(unsigned int)+0x3d) [0x7f65fafbd83d]
 3: (OSD::init()+0x1862) [0x7f65faf6ba52]
 4: (main()+0x2b05) [0x7f65faed1735]
 5: (__libc_start_main()+0xf5) [0x7f65f7a67b45]
 6: (()+0x337197) [0x7f65faf1c197]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 newstore
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 kinetic
   1/ 5 fuse
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.3.log
--- end dump of recent events ---
2016-04-15 22:47:06.509080 7f65fbbb0800 -1 *** Caught signal (Aborted) **
 in thread 7f65fbbb0800 thread_name:ceph-osd

 ceph version 10.1.2 (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4)
 1: (()+0x949117) [0x7f65fb52e117]
 2: (()+0xf8d0) [0x7f65f9a318d0]
 3: (gsignal()+0x37) [0x7f65f7a7b067]
 4: (abort()+0x148) [0x7f65f7a7c448]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x7f65fb6366c6]
 6: (OSDService::get_map(unsigned int)+0x3d) [0x7f65fafbd83d]
 7: (OSD::init()+0x1862) [0x7f65faf6ba52]
 8: (main()+0x2b05) [0x7f65faed1735]
 9: (__libc_start_main()+0xf5) [0x7f65f7a67b45]
 10: (()+0x337197) [0x7f65faf1c197]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
     0> 2016-04-15 22:47:06.509080 7f65fbbb0800 -1 *** Caught signal (Aborted) **
 in thread 7f65fbbb0800 thread_name:ceph-osd

 ceph version 10.1.2 (4a2a6f72640d6b74a3bbd92798bb913ed380dcd4)
 1: (()+0x949117) [0x7f65fb52e117]
 2: (()+0xf8d0) [0x7f65f9a318d0]
 3: (gsignal()+0x37) [0x7f65f7a7b067]
 4: (abort()+0x148) [0x7f65f7a7c448]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x7f65fb6366c6]
 6: (OSDService::get_map(unsigned int)+0x3d) [0x7f65fafbd83d]
 7: (OSD::init()+0x1862) [0x7f65faf6ba52]
 8: (main()+0x2b05) [0x7f65faed1735]
 9: (__libc_start_main()+0xf5) [0x7f65f7a67b45]
 10: (()+0x337197) [0x7f65faf1c197]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 newstore
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   1/ 5 kinetic
   1/ 5 fuse
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.3.log
--- end dump of recent events ---

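On the journal permission fix mentioned at the top: here is a minimal Python
sketch of the kind of ownership check involved, in case it helps anyone
reproduce it.  It assumes the daemon runs as user ceph (as the "set uid:gid"
line in the log shows) and uses the osd.3 paths from the log.  The helper names
owned_by_uid and find_not_owned are made up for illustration; this is not a
Ceph tool, and it only reports ownership, it does not change anything.

```python
import os
import pwd


def owned_by_uid(path, uid):
    """Return True if `path` is owned by `uid` (symlinks are followed)."""
    # os.stat follows symlinks, so a journal symlink resolves to its target.
    return os.stat(path).st_uid == uid


def find_not_owned(paths, uid):
    """Return the existing entries of `paths` that are not owned by `uid`."""
    return [p for p in paths if os.path.exists(p) and not owned_by_uid(p, uid)]


if __name__ == "__main__":
    # Assumption: the OSD runs as user "ceph"; paths are the osd.3 ones from the log.
    osd_dir = "/var/lib/ceph/osd/ceph-3"
    try:
        ceph_uid = pwd.getpwnam("ceph").pw_uid
    except KeyError:
        print("no 'ceph' user on this host")
    else:
        candidates = [osd_dir, os.path.join(osd_dir, "journal")]
        for p in find_not_owned(candidates, ceph_uid):
            print("not owned by ceph:", p)
```

Paths that do not exist are skipped rather than reported, so an empty result
on a host without that OSD directory does not mean the ownership is correct.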
What can I try to get this OSD back online?  I saw some similar issues on
Google, but I wasn't sure whether they were actually the same issue.
If I run into the MDS issue again after resolving this, I'll send out another
message. =)
Thanks all!
Regards,
Hong
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
