Dear all,

One specific OSD crashes unrecoverably every time I try to restart it. It first happened on firefly 0.80.8; I upgraded to 0.80.10, but the crash persists.

Because of this failure, several PGs are stuck in down+peering and do not recover even after I mark the OSD out.

Could someone help me? Is it possible to edit/rebuild the leveldb-based log that seems to be causing the problem?
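
To make the question more concrete: as far as I understand LevelDB's documented log format, the write-ahead .log file is a series of 32 KiB blocks, and each physical record starts with a 7-byte header (4-byte CRC32C, 2-byte little-endian length, 1-byte type, where 1=FULL, 2=FIRST, 3=MIDDLE, 4=LAST). Below is a minimal read-only sketch that walks those headers and flags records whose type or length looks wrong. The function name scan_log is hypothetical, the script does not verify checksums, and a clean result would not prove the log is healthy. It only reads the file it is given, and I would run it against a copy of the log, not the live file.

```python
import struct
import sys

BLOCK_SIZE = 32768   # LevelDB log block size (32 KiB)
HEADER_SIZE = 7      # crc32c (4 bytes) + length (2 bytes, LE) + type (1 byte)
TYPE_NAMES = {1: "FULL", 2: "FIRST", 3: "MIDDLE", 4: "LAST"}


def scan_log(fileobj):
    """Yield (file_offset, type_name, length, ok) for each physical record.

    Hypothetical helper: checks structure only, checksums are NOT verified.
    """
    block_start = 0
    while True:
        block = fileobj.read(BLOCK_SIZE)
        if not block:
            return
        pos = 0
        # A block tail shorter than a header is padding and is skipped.
        while len(block) - pos >= HEADER_SIZE:
            _crc, length, rtype = struct.unpack_from("<IHB", block, pos)
            if rtype == 0 and length == 0:
                break  # zero-filled region; nothing more in this block
            fits = pos + HEADER_SIZE + length <= len(block)
            ok = rtype in TYPE_NAMES and fits
            yield block_start + pos, TYPE_NAMES.get(rtype, "type=%d" % rtype), length, ok
            if not ok:
                break  # length cannot be trusted; resume at the next block
            pos += HEADER_SIZE + length
        block_start += len(block)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for offset, name, length, ok in scan_log(f):
            status = "ok" if ok else "SUSPECT"
            print("%12d  %-8s len=%5d  %s" % (offset, name, length, status))
```

It would be invoked with the path to a copy of the .log file as the only argument. A long run of FIRST/MIDDLE records with no LAST, or a SUSPECT line, would at least show where the reader is likely to get into trouble.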

Here is what the log file shows:

[(12:54:45) root@spcsnp2 ~]# service ceph start osd.31
=== osd.31 ===
create-or-move updated item name 'osd.31' weight 2.73 at location {host=spcsnp2,root=default} to crush map
Starting Ceph osd.31 on spcsnp2...
starting osd.31 at :/0 osd_data /var/lib/ceph/osd/ceph-31 /var/lib/ceph/osd/ceph-31/journal
2015-08-07 12:55:12.916880 7fd614c8f780 0 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70), process ceph-osd, pid 23260
[(12:55:12) root@spcsnp2 ~]#
2015-08-07 12:55:12.928614 7fd614c8f780 0 filestore(/var/lib/ceph/osd/ceph-31) mount detected xfs (libxfs)
2015-08-07 12:55:12.928622 7fd614c8f780 1 filestore(/var/lib/ceph/osd/ceph-31) disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-08-07 12:55:12.931410 7fd614c8f780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-31) detect_features: FIEMAP ioctl is supported and appears to work
2015-08-07 12:55:12.931419 7fd614c8f780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-31) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-08-07 12:55:12.939290 7fd614c8f780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-31) detect_features: syscall(SYS_syncfs, fd) fully supported
2015-08-07 12:55:12.939326 7fd614c8f780 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-31) detect_feature: extsize is disabled by conf
2015-08-07 12:55:45.587019 7fd614c8f780 -1 *** Caught signal (Aborted) **
 in thread 7fd614c8f780

 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70)
 1: /usr/bin/ceph-osd() [0xab7562]
 2: (()+0xf030) [0x7fd6141ce030]
 3: (gsignal()+0x35) [0x7fd612d41475]
 4: (abort()+0x180) [0x7fd612d446f0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fd61359689d]
 6: (()+0x63996) [0x7fd613594996]
 7: (()+0x639c3) [0x7fd6135949c3]
 8: (()+0x63bee) [0x7fd613594bee]
 9: (tc_new()+0x48e) [0x7fd614414aee]
 10: (std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&)+0x59) [0x7fd6135f0999]
 11: (std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long)+0x28) [0x7fd6135f1708]
 12: (std::string::reserve(unsigned long)+0x30) [0x7fd6135f17f0]
 13: (std::string::append(char const*, unsigned long)+0xb5) [0x7fd6135f1ab5]
 14: (leveldb::log::Reader::ReadRecord(leveldb::Slice*, std::string*)+0x2a2) [0x7fd614670fa2]
 15: (leveldb::DBImpl::RecoverLogFile(unsigned long, leveldb::VersionEdit*, unsigned long*)+0x180) [0x7fd614669360]
 16: (leveldb::DBImpl::Recover(leveldb::VersionEdit*)+0x5c2) [0x7fd61466bdf2]
 17: (leveldb::DB::Open(leveldb::Options const&, std::string const&, leveldb::DB**)+0xff) [0x7fd61466c11f]
 18: (LevelDBStore::do_open(std::ostream&, bool)+0xd8) [0xa123a8]
 19: (FileStore::mount()+0x18e0) [0x9b7080]
 20: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x78f52a]
 21: (main()+0x2234) [0x7331c4]
 22: (__libc_start_main()+0xfd) [0x7fd612d2dead]
 23: /usr/bin/ceph-osd() [0x736e99]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -56> 2015-08-07 12:55:12.915675 7fd614c8f780 5 asok(0x1a20230) register_command perfcounters_dump hook 0x1a10010
   -55> 2015-08-07 12:55:12.915697 7fd614c8f780 5 asok(0x1a20230) register_command 1 hook 0x1a10010
   -54> 2015-08-07 12:55:12.915700 7fd614c8f780 5 asok(0x1a20230) register_command perf dump hook 0x1a10010
   -53> 2015-08-07 12:55:12.915704 7fd614c8f780 5 asok(0x1a20230) register_command perfcounters_schema hook 0x1a10010
   -52> 2015-08-07 12:55:12.915706 7fd614c8f780 5 asok(0x1a20230) register_command 2 hook 0x1a10010
   -51> 2015-08-07 12:55:12.915709 7fd614c8f780 5 asok(0x1a20230) register_command perf schema hook 0x1a10010
   -50> 2015-08-07 12:55:12.915711 7fd614c8f780 5 asok(0x1a20230) register_command config show hook 0x1a10010
   -49> 2015-08-07 12:55:12.915714 7fd614c8f780 5 asok(0x1a20230) register_command config set hook 0x1a10010
   -48> 2015-08-07 12:55:12.915716 7fd614c8f780 5 asok(0x1a20230) register_command config get hook 0x1a10010
   -47> 2015-08-07 12:55:12.915718 7fd614c8f780 5 asok(0x1a20230) register_command log flush hook 0x1a10010
   -46> 2015-08-07 12:55:12.915721 7fd614c8f780 5 asok(0x1a20230) register_command log dump hook 0x1a10010
   -45> 2015-08-07 12:55:12.915723 7fd614c8f780 5 asok(0x1a20230) register_command log reopen hook 0x1a10010
   -44> 2015-08-07 12:55:12.916880 7fd614c8f780 0 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70), process ceph-osd, pid 23260
   -43> 2015-08-07 12:55:12.918156 7fd614c8f780 1 -- 10.17.0.6:0/0 learned my addr 10.17.0.6:0/0
   -42> 2015-08-07 12:55:12.918164 7fd614c8f780 1 accepter.accepter.bind my_inst.addr is 10.17.0.6:6812/23260 need_addr=0
   -41> 2015-08-07 12:55:12.918178 7fd614c8f780 1 -- 10.18.0.6:0/0 learned my addr 10.18.0.6:0/0
   -40> 2015-08-07 12:55:12.918180 7fd614c8f780 1 accepter.accepter.bind my_inst.addr is 10.18.0.6:6810/23260 need_addr=0
   -39> 2015-08-07 12:55:12.918191 7fd614c8f780 1 -- 10.18.0.6:0/0 learned my addr 10.18.0.6:0/0
   -38> 2015-08-07 12:55:12.918192 7fd614c8f780 1 accepter.accepter.bind my_inst.addr is 10.18.0.6:6811/23260 need_addr=0
   -37> 2015-08-07 12:55:12.918202 7fd614c8f780 1 -- 10.17.0.6:0/0 learned my addr 10.17.0.6:0/0
   -36> 2015-08-07 12:55:12.918204 7fd614c8f780 1 accepter.accepter.bind my_inst.addr is 10.17.0.6:6815/23260 need_addr=0
   -35> 2015-08-07 12:55:12.918214 7fd614c8f780 1 -- 10.17.0.6:0/0 learned my addr 10.17.0.6:0/0
   -34> 2015-08-07 12:55:12.918216 7fd614c8f780 1 accepter.accepter.bind my_inst.addr is 10.17.0.6:6816/23260 need_addr=0
   -33> 2015-08-07 12:55:12.925154 7fd614c8f780 1 finished global_init_daemonize
   -32> 2015-08-07 12:55:12.927746 7fd614c8f780 5 asok(0x1a20230) init /var/run/ceph/ceph-osd.31.asok
   -31> 2015-08-07 12:55:12.927760 7fd614c8f780 5 asok(0x1a20230) bind_and_listen /var/run/ceph/ceph-osd.31.asok
   -30> 2015-08-07 12:55:12.927828 7fd614c8f780 5 asok(0x1a20230) register_command 0 hook 0x1a0e0b0
   -29> 2015-08-07 12:55:12.927837 7fd614c8f780 5 asok(0x1a20230) register_command version hook 0x1a0e0b0
   -28> 2015-08-07 12:55:12.927840 7fd614c8f780 5 asok(0x1a20230) register_command git_version hook 0x1a0e0b0
   -27> 2015-08-07 12:55:12.927843 7fd614c8f780 5 asok(0x1a20230) register_command help hook 0x1a100b0
   -26> 2015-08-07 12:55:12.927845 7fd614c8f780 5 asok(0x1a20230) register_command get_command_descriptions hook 0x1a10150
   -25> 2015-08-07 12:55:12.927861 7fd61094c700 5 asok(0x1a20230) entry start
   -24> 2015-08-07 12:55:12.928614 7fd614c8f780 0 filestore(/var/lib/ceph/osd/ceph-31) mount detected xfs (libxfs)
   -23> 2015-08-07 12:55:12.928622 7fd614c8f780 1 filestore(/var/lib/ceph/osd/ceph-31) disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
   -22> 2015-08-07 12:55:12.931410 7fd614c8f780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-31) detect_features: FIEMAP ioctl is supported and appears to work
   -21> 2015-08-07 12:55:12.931419 7fd614c8f780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-31) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
   -20> 2015-08-07 12:55:12.939290 7fd614c8f780 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-31) detect_features: syscall(SYS_syncfs, fd) fully supported
   -19> 2015-08-07 12:55:12.939326 7fd614c8f780 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-31) detect_feature: extsize is disabled by conf
   -18> 2015-08-07 12:55:16.785686 7fd61094c700 5 asok(0x1a20230) AdminSocket: request 'get_command_descriptions' '' to 0x1a10150 returned 1164 bytes
   -17> 2015-08-07 12:55:16.788515 7fd61094c700 1 do_command 'config get' 'format:json var:fsid
   -16> 2015-08-07 12:55:16.788546 7fd61094c700 1 do_command 'config get' 'format:json var:fsid result is 47 bytes
   -15> 2015-08-07 12:55:16.788549 7fd61094c700 5 asok(0x1a20230) AdminSocket: request 'config get' '' to 0x1a10010 returned 47 bytes
   -14> 2015-08-07 12:55:16.788748 7fd61094c700 5 asok(0x1a20230) AdminSocket: request 'get_command_descriptions' '' to 0x1a10150 returned 1164 bytes
   -13> 2015-08-07 12:55:16.790540 7fd61094c700 5 asok(0x1a20230) AdminSocket: request 'version' '' to 0x1a0e0b0 returned 21 bytes
   -12> 2015-08-07 12:55:26.022803 7fd61094c700 5 asok(0x1a20230) AdminSocket: request 'get_command_descriptions' '' to 0x1a10150 returned 1164 bytes
   -11> 2015-08-07 12:55:26.025710 7fd61094c700 1 do_command 'config get' 'format:json var:fsid
   -10> 2015-08-07 12:55:26.025725 7fd61094c700 1 do_command 'config get' 'format:json var:fsid result is 47 bytes
    -9> 2015-08-07 12:55:26.025727 7fd61094c700 5 asok(0x1a20230) AdminSocket: request 'config get' '' to 0x1a10010 returned 47 bytes
    -8> 2015-08-07 12:55:26.025883 7fd61094c700 5 asok(0x1a20230) AdminSocket: request 'get_command_descriptions' '' to 0x1a10150 returned 1164 bytes
    -7> 2015-08-07 12:55:26.027690 7fd61094c700 5 asok(0x1a20230) AdminSocket: request 'version' '' to 0x1a0e0b0 returned 21 bytes
    -6> 2015-08-07 12:55:36.291878 7fd61094c700 5 asok(0x1a20230) AdminSocket: request 'get_command_descriptions' '' to 0x1a10150 returned 1164 bytes
    -5> 2015-08-07 12:55:36.294711 7fd61094c700 1 do_command 'config get' 'format:json var:fsid
    -4> 2015-08-07 12:55:36.294729 7fd61094c700 1 do_command 'config get' 'format:json var:fsid result is 47 bytes
    -3> 2015-08-07 12:55:36.294732 7fd61094c700 5 asok(0x1a20230) AdminSocket: request 'config get' '' to 0x1a10010 returned 47 bytes
    -2> 2015-08-07 12:55:36.294936 7fd61094c700 5 asok(0x1a20230) AdminSocket: request 'get_command_descriptions' '' to 0x1a10150 returned 1164 bytes
    -1> 2015-08-07 12:55:36.296827 7fd61094c700 5 asok(0x1a20230) AdminSocket: request 'version' '' to 0x1a0e0b0 returned 21 bytes
     0> 2015-08-07 12:55:45.587019 7fd614c8f780 -1 *** Caught signal (Aborted) **
 in thread 7fd614c8f780

 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70)
 1: /usr/bin/ceph-osd() [0xab7562]
 2: (()+0xf030) [0x7fd6141ce030]
 3: (gsignal()+0x35) [0x7fd612d41475]
 4: (abort()+0x180) [0x7fd612d446f0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fd61359689d]
 6: (()+0x63996) [0x7fd613594996]
 7: (()+0x639c3) [0x7fd6135949c3]
 8: (()+0x63bee) [0x7fd613594bee]
 9: (tc_new()+0x48e) [0x7fd614414aee]
 10: (std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&)+0x59) [0x7fd6135f0999]
 11: (std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long)+0x28) [0x7fd6135f1708]
 12: (std::string::reserve(unsigned long)+0x30) [0x7fd6135f17f0]
 13: (std::string::append(char const*, unsigned long)+0xb5) [0x7fd6135f1ab5]
 14: (leveldb::log::Reader::ReadRecord(leveldb::Slice*, std::string*)+0x2a2) [0x7fd614670fa2]
 15: (leveldb::DBImpl::RecoverLogFile(unsigned long, leveldb::VersionEdit*, unsigned long*)+0x180) [0x7fd614669360]
 16: (leveldb::DBImpl::Recover(leveldb::VersionEdit*)+0x5c2) [0x7fd61466bdf2]
 17: (leveldb::DB::Open(leveldb::Options const&, std::string const&, leveldb::DB**)+0xff) [0x7fd61466c11f]
 18: (LevelDBStore::do_open(std::ostream&, bool)+0xd8) [0xa123a8]
 19: (FileStore::mount()+0x18e0) [0x9b7080]
 20: (OSD::do_convertfs(ObjectStore*)+0x1a) [0x78f52a]
 21: (main()+0x2234) [0x7331c4]
 22: (__libc_start_main()+0xfd) [0x7fd612d2dead]
 23: /usr/bin/ceph-osd() [0x736e99]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.31.log
--- end dump of recent events ---

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
