Hi,

While trying to bring an OSD back into the test cluster (it had dropped out for an
unknown reason), we see a RocksDB segmentation fault during "compaction".
I increased debugging to 20/20 for OSD / RocksDB; the settings and part of the
logfile are below:
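Roughly how the debug levels were raised; this is a sketch assuming they go into
the [osd] section of ceph.conf on this node (they could just as well be injected
at runtime):

    [osd]
        debug osd = 20/20
        debug rocksdb = 20/20

The relevant part of the logfile around the crash: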

... 49477, 49476, 49475, 49474, 49473, 49472, 49471, 49470, 49469, 49468, 49467], "files_L1": [49465], "score": 1138.25, "input_data_size": 82872298}
    -1> 2018-01-12 08:48:23.915753 7f91eaf89e40  1 freelist init
     0> 2018-01-12 08:48:45.630418 7f91eaf89e40 -1 *** Caught signal (Segmentation fault) **
 in thread 7f91eaf89e40 thread_name:ceph-osd

 ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
 1: (()+0xa65824) [0x55a124693824]
 2: (()+0x11390) [0x7f91e9238390]
 3: (()+0x1f8af) [0x7f91eab658af]
 4: (rocksdb::BlockBasedTable::PutDataBlockToCache(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Cache*, rocksdb::Cache*, rocksdb::ReadOptions const&, rocksdb::ImmutableCFOptions const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, rocksdb::Block*, unsigned int, rocksdb::Slice const&, unsigned long, bool, rocksdb::Cache::Priority)+0x1d9) [0x55a124a64e49]
 5: (rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::Slice, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool)+0x3b7) [0x55a124a66827]
 6: (rocksdb::BlockBasedTable::NewDataBlockIterator(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockIter*, bool, rocksdb::Status)+0x2ac) [0x55a124a66b6c]
 7: (rocksdb::BlockBasedTable::BlockEntryIteratorState::NewSecondaryIterator(rocksdb::Slice const&)+0x97) [0x55a124a6f2e7]
 8: (()+0xe6c48e) [0x55a124a9a48e]
 9: (()+0xe6ca06) [0x55a124a9aa06]
 10: (rocksdb::MergingIterator::Seek(rocksdb::Slice const&)+0x126) [0x55a124a7bc86]
 11: (rocksdb::DBIter::Seek(rocksdb::Slice const&)+0x20a) [0x55a124b1bdaa]
 12: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::lower_bound(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x46) [0x55a1245d4676]
 13: (BitmapFreelistManager::init(unsigned long)+0x2dc) [0x55a12463976c]
 14: (BlueStore::_open_fm(bool)+0xc00) [0x55a124526c50]
 15: (BlueStore::_mount(bool)+0x3dc) [0x55a12459aa1c]
 16: (OSD::init()+0x3e2) [0x55a1241064e2]
 17: (main()+0x2f07) [0x55a1240181d7]
 18: (__libc_start_main()+0xf0) [0x7f91e81be830]
 19: (_start()+0x29) [0x55a1240a37f9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
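
In case anyone wants to look closer at the raw addresses: a sketch of how we
would try to symbolize them, assuming the matching ceph-osd 12.2.2 binary and
its debug symbols (ceph-dbg / ceph-debuginfo package) are installed:

    # full disassembly with source interleaved, as the NOTE suggests
    objdump -rdS /usr/bin/ceph-osd > ceph-osd.dis
    # or map one frame's offset to a function/source line, e.g. frame 1 above
    addr2line -Cfe /usr/bin/ceph-osd 0xa65824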

The disk in question is very old (powered on for ~8 years), so it may well be that
part of the data on it is corrupt. Would RocksDB throw an error like this in
that case?
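
Either way, this is roughly how we intend to check the disk and the BlueStore
data on the stopped OSD; a sketch only, the device name and OSD id are
placeholders for our setup:

    # SMART health of the (old) disk backing this OSD
    smartctl -a /dev/sdX
    # consistency check of the BlueStore / RocksDB data, default OSD data path
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-<id>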

Gr. Stefan

P.s. We're trying to learn as much as possible when things do not go according
to plan. There is way more debug info available in case anyone is interested. 



-- 
| BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                   +31 318 648 688 / i...@bit.nl