Hi there,
We hit a monitor crash bug in our production clusters while adding more
nodes to one of them.
The stack trace looks like this:
lc 25431444     0> 2017-11-23 15:41:16.688046 7f93883f2700 -1 error_msg
mon/OSDMonitor.cc: In function 'MOSDMap*
OSDMonitor::build_incremental(epoch_t, epoch_t)' thread 7f93883f2700 time
2017-11-23 15:41:16.683525
mon/OSDMonitor.cc: 2123: FAILED assert(0)

ceph version 0.94.5.9 (e92a4716ae7404566753964959ddd84411b5dd18)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x85) [0x7b4735]
2: (OSDMonitor::build_incremental(unsigned int, unsigned int)+0x9ab)
[0x5e2e5b]
3: (OSDMonitor::send_incremental(unsigned int, MonSession*, bool)+0xb1)
[0x5e85b1]
4: (OSDMonitor::check_sub(Subscription*)+0x217) [0x5e8c17]
5: (Monitor::handle_subscribe(MMonSubscribe*)+0x440) [0x571810]
6: (Monitor::dispatch(MonSession*, Message*, bool)+0x3eb) [0x592d5b]
7: (Monitor::_ms_dispatch(Message*)+0x1a6) [0x593716]
8: (Monitor::ms_dispatch(Message*)+0x23) [0x5b2ac3]
9: (DispatchQueue::entry()+0x62a) [0x8a44aa]
10: (DispatchQueue::DispatchThread::entry()+0xd) [0x79c97d]
11: (()+0x7dc5) [0x7f93ad51ddc5]
12: (clone()+0x6d) [0x7f93ac00176d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.

The exact assert failure is here:
MOSDMap *OSDMonitor::build_incremental(epoch_t from, epoch_t to)
{
  dout(10) << "build_incremental [" << from << ".." << to << "]" << dendl;
  MOSDMap *m = new MOSDMap(mon->monmap->fsid);
  m->oldest_map = get_first_committed();
  m->newest_map = osdmap.get_epoch();

  for (epoch_t e = to; e >= from && e > 0; e--) {
    bufferlist bl;
    int err = get_version(e, bl);
    if (err == 0) {
      assert(bl.length());
      // if (get_version(e, bl) > 0) {
      dout(20) << "build_incremental    inc " << e << " "
               << bl.length() << " bytes" << dendl;
      m->incremental_maps[e] = bl;
    } else {
      assert(err == -ENOENT);
      assert(!bl.length());
      get_version_full(e, bl);
      if (bl.length() > 0) {
        //else if (get_version("full", e, bl) > 0) {
        dout(20) << "build_incremental   full " << e << " "
                 << bl.length() << " bytes" << dendl;
        m->maps[e] = bl;
      } else {
        assert(0);  // we should have all maps.   <======= assert failed
      }
    }
  }
  return m;
}

We checked the code and found a possible race condition between the
MonitorDBStore read path and the osdmap trim operation. The crash scenario
looks like this: while the monitor is trimming old osdmaps, a newly added
OSD concurrently requests osdmaps, which invokes
OSDMonitor::build_incremental(). If a requested epoch has already been
trimmed, get_version_full() cannot read that osdmap from the MonitorDBStore,
and the assert(0) is triggered. Although we ran into this issue on hammer,
we checked the latest master branch and believe the race condition is still
there. Can anyone confirm this?
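
To make the window easier to see outside of Ceph, below is a minimal
standalone sketch (a toy model, not Ceph code; all names are hypothetical)
where a trimmer thread erases old epochs while a reader walks [from, to]
using a lower bound it picked before the trim ran:

// race_sketch.cpp -- standalone toy model of the trim-vs-read window.
// Not Ceph code; all names here are hypothetical.
// Build: g++ -std=c++11 -pthread race_sketch.cpp -o race_sketch
#include <cstdio>
#include <map>
#include <mutex>
#include <string>
#include <thread>

std::mutex store_lock;
std::map<unsigned, std::string> store;  // epoch -> blob, stands in for MonitorDBStore
unsigned first_committed = 1;

// Models build_incremental(): assumes every epoch in [from, to] is readable.
void read_range(unsigned from, unsigned to) {
  for (unsigned e = to; e >= from && e > 0; e--) {
    std::lock_guard<std::mutex> l(store_lock);
    if (store.find(e) == store.end()) {
      // The epoch was trimmed after 'from' was chosen -- the same
      // situation in which build_incremental() hits assert(0).
      printf("epoch %u missing -- this is where the monitor asserts\n", e);
      return;
    }
  }
  printf("read [%u..%u] ok\n", from, to);
}

// Models the osdmap trim: delete old epochs and advance first_committed.
void trim_to(unsigned new_first) {
  std::lock_guard<std::mutex> l(store_lock);
  store.erase(store.begin(), store.lower_bound(new_first));
  first_committed = new_first;
}

int main() {
  for (unsigned e = 1; e <= 1000; e++)
    store[e] = "osdmap";

  // The reader picks 'from' from first_committed *before* the trim runs;
  // that stale lower bound is the window.
  unsigned from = first_committed;

  std::thread trimmer(trim_to, 500u);
  std::thread reader(read_range, from, 1000u);
  trimmer.join();
  reader.join();
  return 0;
}

Because the interleaving is timing-dependent, a given run may or may not
hit the missing epoch, which is what makes this kind of window hard to
reproduce.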

BTW, we think this is a dup of http://tracker.ceph.com/issues/11332 and
added a comment there, but there has been no response so far.
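
In case it helps the discussion, here is a sketch of one possible
direction: make the else-branch of build_incremental() tolerate a trimmed
epoch instead of asserting. This is only a sketch for discussion, not a
tested patch; it reuses the names from the code quoted above:

      get_version_full(e, bl);
      if (bl.length() > 0) {
        dout(20) << "build_incremental   full " << e << " "
                 << bl.length() << " bytes" << dendl;
        m->maps[e] = bl;
      } else {
        // Epoch e was trimmed between the caller's range check and this
        // read. Refresh oldest_map and send what we already gathered; the
        // client can then re-request from the new oldest_map.
        derr << "build_incremental: epoch " << e
             << " not found, trimmed under us?" << dendl;
        m->oldest_map = get_first_committed();
        break;
      }

Whether breaking out with a partial range is actually safe depends on how
subscribers handle it, so please treat this only as the shape of a fix.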

zhongyan