Yes, CentOS 7.2. It happened twice in a row, both times shortly after a restart, so I expected to be able to reproduce it. However, I've now tried a bunch of times and it's not happening again.
In any case I have glibc + ceph-debuginfo installed so we can get more info if it does happen.

Thanks!

On Wed, Apr 27, 2016 at 8:40 PM, Brad Hubbard <bhubb...@redhat.com> wrote:

> ----- Original Message -----
> > From: "Karol Mroz" <km...@suse.com>
> > To: "Ben Hines" <bhi...@gmail.com>
> > Cc: "ceph-users" <ceph-users@lists.ceph.com>
> > Sent: Wednesday, 27 April, 2016 7:06:56 PM
> > Subject: Re: [ceph-users] radosgw crash - Infernalis
> >
> > On Tue, Apr 26, 2016 at 10:17:31PM -0700, Ben Hines wrote:
> > [...]
> > > --> 10.30.1.6:6800/10350 -- osd_op(client.44852756.0:79
> > > default.42048218.<redacted> [getxattrs,stat,read 0~524288] 12.aa730416
> > > ack+read+known_if_redirected e100207) v6 -- ?+0 0x7f49c41880b0 con
> > > 0x7f49c4145eb0
> > > 0> 2016-04-26 22:07:59.685615 7f49a07f0700 -1 *** Caught signal
> > > (Segmentation fault) **
> > > in thread 7f49a07f0700
> > >
> > > ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
> > > 1: (()+0x30b0a2) [0x7f4c4907f0a2]
> > > 2: (()+0xf100) [0x7f4c44f7a100]
> > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> > > to interpret this.
> >
> > Hi Ben,
> >
> > I sense a pretty badly corrupted stack. From the radosgw-9.2.1 (obtained from
> > a downloaded rpm):
> >
> > 000000000030a810 <_Z13pidfile_writePK11md_config_t@@Base>:
> > ...
> > 30b09d: e8 0e 40 e4 ff callq 14f0b0 <backtrace@plt>
> > 30b0a2: 4c 89 ef mov %r13,%rdi
> > -------
> > ...
> >
> > So either we tripped backtrace() code from pidfile_write() _or_ we can't
> > trust the stack. From the log snippet, it looks that we're far past the point
> > at which we would write a pidfile to disk (ie. at process start during
> > global_init()).
> > Rather, we're actually handling a request and outputting some bit of debug
> > message
> > via MSDOp::print() and beyond...
>
> It would help to know what binary this is and what OS.
>
> We know the offset into the function is 0x30b0a2 but we don't know which
> function yet AFAICT. Karol, how did you arrive at pidfile_write? Purely from
> the offset? I'm not sure that would be reliable...
>
> This is a segfault so the address of the frame where we crashed should be the
> exact instruction where we crashed. I don't believe a mov from one register to
> another that does not involve a dereference ((%r13) as opposed to %r13) can
> cause a segfault so I don't think we are on the right instruction but then, as
> you say, the stack may be corrupt.
>
> >
> > Is this something you're able to easily reproduce? More logs with higher log
> > levels
> > would be helpful... a coredump with radosgw compiled with -g would be
> > excellent :)
>
> Agreed, although if this is an rpm based system it should be sufficient to
> run the following.
>
> # debuginfo-install ceph glibc
>
> That may give us the name of the function depending on where we are (if we are
> in a library it may require the debuginfo for that library be loaded).
>
> Karol is right that a coredump would be a good idea in this case and will give
> us maximum information about the issue you are seeing.
>
> Cheers,
> Brad
>
> >
> > --
> > Regards,
> > Karol
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
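
For reference, the question of which binary the 0x30b0a2 offset belongs to can be narrowed down from the two frames in the quoted log. The following is only a sketch: it assumes that, for frames with an empty symbol name, the `+0x...` value is an offset from the load address of the containing module (which is how glibc's backtrace_symbols() reports such frames). The names `FRAMES`, `FRAME_RE` and `module_base` are made up for this illustration.

```python
import re

# Frame lines copied from the quoted log:
# "N: (symbol+offset) [absolute address]", here with an empty symbol.
FRAMES = [
    "1: (()+0x30b0a2) [0x7f4c4907f0a2]",
    "2: (()+0xf100) [0x7f4c44f7a100]",
]

# Captures the hex offset and the hex absolute address of one frame.
FRAME_RE = re.compile(r"\+0x([0-9a-f]+)\)\s*\[0x([0-9a-f]+)\]")


def module_base(frame: str) -> int:
    """Return absolute address minus offset for one backtrace frame.

    Assumption: the offset is relative to the module's load address,
    so the result is the address the module was mapped at.
    """
    match = FRAME_RE.search(frame)
    if match is None:
        raise ValueError(f"unrecognised frame: {frame!r}")
    offset = int(match.group(1), 16)
    address = int(match.group(2), 16)
    return address - offset


if __name__ == "__main__":
    for frame in FRAMES:
        print(frame, "-> base", hex(module_base(frame)))
```

Under that assumption the two frames yield different, page-aligned bases, which suggests they sit in two different mapped objects. Comparing those bases against /proc/<pid>/maps of a running radosgw, or against `info sharedlibrary` in gdb on a core, should show which file each offset has to be looked up in before trusting any disassembly of it.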