Yes, CentOS 7.2. It happened twice in a row, both times shortly after a
restart, so I expected I'd be able to reproduce it. However, I've now tried
a bunch of times and it's not happening again.

In any case, I have glibc + ceph-debuginfo installed, so we can get more info
if it does happen.
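
In the meantime, the two frames in the quoted trace below can at least be split into load base and offset. This is a minimal sketch, assuming each frame has the form `N: (symbol+0xOFFSET) [0xADDRESS]` and that the offset is relative to the start of the mapped module, since no symbol was resolved. The helper name `parse_frame` is made up for illustration.

```python
import re

# Frame lines copied from the quoted trace below.
FRAMES = [
    " 1: (()+0x30b0a2) [0x7f4c4907f0a2]",
    " 2: (()+0xf100) [0x7f4c44f7a100]",
]

# Matches "N: (symbol+0xOFFSET) [0xADDRESS]"; the symbol is "()" when unresolved.
FRAME_RE = re.compile(
    r"^\s*(\d+):\s*\((.*?)\+0x([0-9a-fA-F]+)\)\s*\[0x([0-9a-fA-F]+)\]\s*$"
)


def parse_frame(line):
    """Split one backtrace line into index, symbol, offset, address and base.

    Assumption: the offset is relative to the start of the mapped module,
    so base = address - offset.
    """
    match = FRAME_RE.match(line)
    if match is None:
        raise ValueError("unrecognised frame line: %r" % line)
    index, symbol, offset_hex, address_hex = match.groups()
    offset = int(offset_hex, 16)
    address = int(address_hex, 16)
    return {
        "index": int(index),
        "symbol": symbol,
        "offset": offset,
        "address": address,
        "base": address - offset,
    }


if __name__ == "__main__":
    for frame_line in FRAMES:
        frame = parse_frame(frame_line)
        print("frame %d: base=%#x offset=%#x"
              % (frame["index"], frame["base"], frame["offset"]))
```

Under that assumption the two bases differ (0x7f4c48d74000 and 0x7f4c44f6b000), so the frames would belong to two different mapped objects. With debuginfo installed, an offset can then be looked up with something like `addr2line -f -C -e <executable> 0x30b0a2`.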

thanks!

On Wed, Apr 27, 2016 at 8:40 PM, Brad Hubbard <bhubb...@redhat.com> wrote:

> ----- Original Message -----
> > From: "Karol Mroz" <km...@suse.com>
> > To: "Ben Hines" <bhi...@gmail.com>
> > Cc: "ceph-users" <ceph-users@lists.ceph.com>
> > Sent: Wednesday, 27 April, 2016 7:06:56 PM
> > Subject: Re: [ceph-users] radosgw crash - Infernalis
> >
> > On Tue, Apr 26, 2016 at 10:17:31PM -0700, Ben Hines wrote:
> > [...]
> > > --> 10.30.1.6:6800/10350 -- osd_op(client.44852756.0:79
> > > default.42048218.<redacted> [getxattrs,stat,read 0~524288] 12.aa730416
> > > ack+read+known_if_redirected e100207) v6 -- ?+0 0x7f49c41880b0 con
> > > 0x7f49c4145eb0
> > >      0> 2016-04-26 22:07:59.685615 7f49a07f0700 -1 *** Caught signal
> > > (Segmentation fault) **
> > >  in thread 7f49a07f0700
> > >
> > >  ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
> > >  1: (()+0x30b0a2) [0x7f4c4907f0a2]
> > >  2: (()+0xf100) [0x7f4c44f7a100]
> > >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> > > to interpret this.
> >
> > Hi Ben,
> >
> > I sense a pretty badly corrupted stack. From the radosgw-9.2.1 (obtained from
> > a downloaded rpm):
> >
> > 000000000030a810 <_Z13pidfile_writePK11md_config_t@@Base>:
> > ...
> >   30b09d:       e8 0e 40 e4 ff          callq  14f0b0 <backtrace@plt>
> >   30b0a2:       4c 89 ef                mov    %r13,%rdi
> >   -------
> > ...
> >
> > So either we tripped backtrace() code from pidfile_write() _or_ we can't
> > trust the stack. From the log snippet, it looks like we're far past the point
> > at which we would write a pidfile to disk (i.e. at process start during
> > global_init()).
> > Rather, we're actually handling a request and outputting some bit of debug
> > message via MOSDOp::print() and beyond...
>
> It would help to know what binary this is and what OS.
>
> We know the offset into the function is 0x30b0a2 but we don't know which
> function yet AFAICT. Karol, how did you arrive at pidfile_write? Purely from
> the offset? I'm not sure that would be reliable...
>
> This is a segfault so the address of the frame where we crashed should be the
> exact instruction where we crashed. I don't believe a mov from one register to
> another that does not involve a dereference ((%r13) as opposed to %r13) can
> cause a segfault so I don't think we are on the right instruction but then, as
> you say, the stack may be corrupt.
>
> >
> > Is this something you're able to easily reproduce? More logs with higher log
> > levels would be helpful... a coredump with radosgw compiled with -g would be
> > excellent :)
>
> Agreed, although if this is an rpm based system it should be sufficient to
> run the following.
>
> # debuginfo-install ceph glibc
>
> That may give us the name of the function depending on where we are (if we are
> in a library it may require the debuginfo for that library to be loaded).
>
> Karol is right that a coredump would be a good idea in this case and will give
> us maximum information about the issue you are seeing.
>
> Cheers,
> Brad
>
> >
> > --
> > Regards,
> > Karol
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
