I kind of "fixed" it by creating a new journal in a file instead of the
separate partition, which probably caused some data loss but at least allowed
the OSD to start and join the cluster. Backfilling is now in progress; the
rough steps are sketched below.
The old journal is still there on the separate device, in case it helps with
the investigation. But this "malloc -> ENOMEM/OOM killer -> corrupted journal
-> try to recover -> ENOMEM/OOM killer ..." cycle looks like a bug.
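
For the record, this is roughly what I did for osd.3 (from memory, so the
exact commands and paths may be slightly off on other setups):

# stop the OSD (Hammer on EL7 still uses the sysvinit script here)
service ceph stop osd.3
# in my setup the journal in the OSD data dir is a symlink to the partition;
# set the symlink aside so the partition itself stays untouched
cd /var/lib/ceph/osd/ceph-3
mv journal journal.partition-link.bak
# create and initialize a fresh file-based journal, then start the OSD
ceph-osd -i 3 --mkjournal
service ceph start osd.3

Skipping the flush is presumably where the data loss comes from: whatever was
still sitting in the old journal never makes it to the filestore.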

2015-08-19 0:13 GMT+03:00 Евгений Д. <ineu.m...@gmail.com>:

> Hello.
>
> I have a small Ceph cluster running 9 OSDs, using XFS on separate disks
> for data and dedicated partitions on the system disk for journals.
> After creation it worked fine for a while. Then suddenly one of the OSDs
> stopped and wouldn't start, so I had to recreate it, and recovery started.
> After a few days of recovery, an OSD on another machine also stopped. When
> I try to start it, it runs for a few minutes and dies; it looks like it is
> not able to recover its journal.
> According to strace, it tries to allocate too much memory and fails with
> ENOMEM. Sometimes it is killed by the kernel's OOM killer.
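>
> For reference, the strace run was roughly this (options from memory, adjust
> as needed):
>
> strace -f -e trace=memory -o /tmp/ceph-osd-3.strace ceph-osd -i 3 -f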
>
> I tried flushing the journal manually with `ceph-osd -i 3 --flush-journal`,
> but that didn't work either. The error log is as follows:
>
> [root@assets-2 ~]# ceph-osd -i 3 --flush-journal
> SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0d 00 00 00 00
> 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 2015-08-18 23:00:37.956714 7ff102040880 -1
> filestore(/var/lib/ceph/osd/ceph-3) could not find
> 225eff8c/default.4323.18_22783306dc51892b40b49e3e26f79baf_55c38b331700002600566c46_s.jpeg/head//8
> in index: (2) No such file or directory
> 2015-08-18 23:00:37.956741 7ff102040880 -1
> filestore(/var/lib/ceph/osd/ceph-3) could not find
> 235eff8c/default.4323.16_3018ff7c6066bddc0c867b293724d7b1_dolar7_106_m.jpg/head//8
> in index: (2) No such file or directory
> <skipped>
> 2015-08-18 23:00:37.958424 7ff102040880 -1
> filestore(/var/lib/ceph/osd/ceph-3) could not find c//head//8 in index: (2)
> No such file or directory
> tcmalloc: large alloc 1073741824 bytes == 0x66b10000 @  0x7ff10115ae6a
> 0x7ff10117ad64 0x7ff0ffd4fc29 0x7ff0ffd5086b 0x7ff0ffd50914 0x7ff0ffd50b7f
> 0x968a0f 0xa572b3 0xa5c6b1 0xa5f762 0x9018ba 0x90238e 0x911b2c 0x915064
> 0x92d7cb 0x8ff890 0x642239 0x7ff0ff3daaf5 0x65cdc9 (nil)
> tcmalloc: large alloc 2147483648 bytes == 0xbf490000 @  0x7ff10115ae6a
> 0x7ff10117ad64 0x7ff0ffd4fc29 0x7ff0ffd5086b 0x7ff0ffd50914 0x7ff0ffd50b7f
> 0x968a0f 0xa572b3 0xa5c6b1 0xa5f762 0x9018ba 0x90238e 0x911b2c 0x915064
> 0x92d7cb 0x8ff890 0x642239 0x7ff0ff3daaf5 0x65cdc9 (nil)
> tcmalloc: large alloc 4294967296 bytes == 0x16e320000 @  0x7ff10115ae6a
> 0x7ff10117ad64 0x7ff0ffd4fc29 0x7ff0ffd5086b 0x7ff0ffd50914 0x7ff0ffd50b7f
> 0x968a0f 0xa572b3 0xa5c6b1 0xa5f762 0x9018ba 0x90238e 0x911b2c 0x915064
> 0x92d7cb 0x8ff890 0x642239 0x7ff0ff3daaf5 0x65cdc9 (nil)
> tcmalloc: large alloc 8589934592 bytes == (nil) @  0x7ff10115ae6a
> 0x7ff10117ad64 0x7ff0ffd4fc29 0x7ff0ffd5086b 0x7ff0ffd50914 0x7ff0ffd50b7f
> 0x968a0f 0xa572b3 0xa5c6b1 0xa5f762 0x9018ba 0x90238e 0x911b2c 0x915064
> 0x92d7cb 0x8ff890 0x642239 0x7ff0ff3daaf5 0x65cdc9 (nil)
> terminate called after throwing an instance of 'std::bad_alloc'
>   what():  std::bad_alloc
> *** Caught signal (Aborted) **
>  in thread 7ff102040880
>  ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
>  1: ceph-osd() [0xac5642]
>  2: (()+0xf130) [0x7ff1009d4130]
>  3: (gsignal()+0x37) [0x7ff0ff3ee5d7]
>  4: (abort()+0x148) [0x7ff0ff3efcc8]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7ff0ffcf29b5]
>  6: (()+0x5e926) [0x7ff0ffcf0926]
>  7: (()+0x5e953) [0x7ff0ffcf0953]
>  8: (()+0x5eb73) [0x7ff0ffcf0b73]
>  9: (()+0x15d3e) [0x7ff10115ad3e]
>  10: (tc_new()+0x1e0) [0x7ff10117ade0]
>  11: (std::string::_Rep::_S_create(unsigned long, unsigned long,
> std::allocator<char> const&)+0x59) [0x7ff0ffd4fc29]
>  12: (std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned
> long)+0x1b) [0x7ff0ffd5086b]
>  13: (std::string::reserve(unsigned long)+0x44) [0x7ff0ffd50914]
>  14: (std::string::append(char const*, unsigned long)+0x4f)
> [0x7ff0ffd50b7f]
>  15: (LevelDBStore::LevelDBTransactionImpl::rmkeys_by_prefix(std::string
> const&)+0xdf) [0x968a0f]
>  16:
> (DBObjectMap::clear_header(std::tr1::shared_ptr<DBObjectMap::_Header>,
> std::tr1::shared_ptr<KeyValueDB::TransactionImpl>)+0xd3) [0xa572b3]
>  17: (DBObjectMap::_clear(std::tr1::shared_ptr<DBObjectMap::_Header>,
> std::tr1::shared_ptr<KeyValueDB::TransactionImpl>)+0xa1) [0xa5c6b1]
>  18: (DBObjectMap::clear(ghobject_t const&, SequencerPosition
> const*)+0x202) [0xa5f762]
>  19: (FileStore::lfn_unlink(coll_t, ghobject_t const&, SequencerPosition
> const&, bool)+0x16a) [0x9018ba]
>  20: (FileStore::_remove(coll_t, ghobject_t const&, SequencerPosition
> const&)+0x9e) [0x90238e]
>  21: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long,
> int, ThreadPool::TPHandle*)+0x252c) [0x911b2c]
>  22: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*,
> std::allocator<ObjectStore::Transaction*> >&, unsigned long,
> ThreadPool::TPHandle*)+0x64) [0x915064]
>  23: (JournalingObjectStore::journal_replay(unsigned long)+0x5db)
> [0x92d7cb]
>  24: (FileStore::mount()+0x3730) [0x8ff890]
>  25: (main()+0xec9) [0x642239]
>  26: (__libc_start_main()+0xf5) [0x7ff0ff3daaf5]
>  27: ceph-osd() [0x65cdc9]
> 2015-08-18 23:02:38.167194 7ff102040880 -1 *** Caught signal (Aborted) **
>  in thread 7ff102040880
>
>
>
> I can recreate the filesystem on this OSD's disk and recreate the OSD, but
> I'm not sure this won't happen to another OSD on this or another machine,
> and that I won't eventually lose all my data because recovery never
> finishes before all the OSDs fail.
> This is CentOS 7, kernel 3.10.0-229.11.1.el7.x86_64, Ceph version 0.94.
> There are three machines, each with 16 GB of memory.
>
> Is there some way to bring the OSD back to life, or at least to investigate
> the problem? Is this a Ceph bug, a kernel issue, or a hardware problem?
>
> Thanks
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com