Aha, I see how to use the debuginfo - trying it by running through gdb.
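For reference, this is roughly what I'm trying - attaching gdb to the running radosgw and letting it run until it faults. A rough sketch, assuming a single radosgw process and that gdb plus the ceph/glibc debuginfo packages are already installed:

    # attach to the running daemon (assumes one radosgw process)
    gdb -p $(pidof radosgw)
    (gdb) handle SIGSEGV stop print   # stop in gdb when the segfault fires
    (gdb) continue
    ... wait for the crash, then:
    (gdb) thread apply all bt         # backtrace from every thread

With the debuginfo in place, the frames should come out as function names and source lines rather than the bare offsets in the logs below.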
On Wed, Apr 27, 2016 at 10:09 PM, Ben Hines <bhi...@gmail.com> wrote:
> Got it again - however, the stack is exactly the same, no symbols -
> debuginfo didn't resolve. Do I need to do something to enable that?
>
> The server is in 'debug ms=10' this time, so there is a bit more spew:
>
>    -14> 2016-04-27 21:59:58.811919 7f9e817fa700  1 -- 10.30.1.8:0/3291985349 --> 10.30.2.13:6805/27519 -- osd_op(client.44936150.0:223 obj_delete_at_hint.0000000055 [call timeindex.list] 10.2c88dbcf ack+read+known_if_redirected e100564) v6 -- ?+0 0x7f9f140dc5f0 con 0x7f9f1410ed10
>    -13> 2016-04-27 21:59:58.812039 7f9e3fa6b700 10 -- 10.30.1.8:0/3291985349 >> 10.30.2.13:6805/27519 pipe(0x7f9f14110010 sd=153 :10861 s=2 pgs=725914 cs=1 l=1 c=0x7f9f1410ed10).writer: state = open policy.server=0
>    -12> 2016-04-27 21:59:58.812096 7f9e3fa6b700 10 -- 10.30.1.8:0/3291985349 >> 10.30.2.13:6805/27519 pipe(0x7f9f14110010 sd=153 :10861 s=2 pgs=725914 cs=1 l=1 c=0x7f9f1410ed10).writer: state = open policy.server=0
>    -11> 2016-04-27 21:59:58.814343 7f9e3f96a700 10 -- 10.30.1.8:0/3291985349 >> 10.30.2.13:6805/27519 pipe(0x7f9f14110010 sd=153 :10861 s=2 pgs=725914 cs=1 l=1 c=0x7f9f1410ed10).reader wants 211 from dispatch throttler 0/104857600
>    -10> 2016-04-27 21:59:58.814375 7f9e3f96a700 10 -- 10.30.1.8:0/3291985349 >> 10.30.2.13:6805/27519 pipe(0x7f9f14110010 sd=153 :10861 s=2 pgs=725914 cs=1 l=1 c=0x7f9f1410ed10).aborted = 0
>     -9> 2016-04-27 21:59:58.814405 7f9e3f96a700 10 -- 10.30.1.8:0/3291985349 >> 10.30.2.13:6805/27519 pipe(0x7f9f14110010 sd=153 :10861 s=2 pgs=725914 cs=1 l=1 c=0x7f9f1410ed10).reader got message 2 0x7f9ec0009250 osd_op_reply(223 obj_delete_at_hint.0000000055 [call] v0'0 uv1448004 ondisk = 0) v6
>     -8> 2016-04-27 21:59:58.814428 7f9e3f96a700  1 -- 10.30.1.8:0/3291985349 <== osd.6 10.30.2.13:6805/27519 2 ==== osd_op_reply(223 obj_delete_at_hint.0000000055 [call] v0'0 uv1448004 ondisk = 0) v6 ==== 196+0+15 (3849172018 0 2149983739) 0x7f9ec0009250 con 0x7f9f1410ed10
>     -7> 2016-04-27 21:59:58.814472 7f9e3f96a700 10 -- 10.30.1.8:0/3291985349 dispatch_throttle_release 211 to dispatch throttler 211/104857600
>     -6> 2016-04-27 21:59:58.814470 7f9e3fa6b700 10 -- 10.30.1.8:0/3291985349 >> 10.30.2.13:6805/27519 pipe(0x7f9f14110010 sd=153 :10861 s=2 pgs=725914 cs=1 l=1 c=0x7f9f1410ed10).writer: state = open policy.server=0
>     -5> 2016-04-27 21:59:58.814511 7f9e3fa6b700 10 -- 10.30.1.8:0/3291985349 >> 10.30.2.13:6805/27519 pipe(0x7f9f14110010 sd=153 :10861 s=2 pgs=725914 cs=1 l=1 c=0x7f9f1410ed10).write_ack 2
>     -4> 2016-04-27 21:59:58.814528 7f9e3fa6b700 10 -- 10.30.1.8:0/3291985349 >> 10.30.2.13:6805/27519 pipe(0x7f9f14110010 sd=153 :10861 s=2 pgs=725914 cs=1 l=1 c=0x7f9f1410ed10).writer: state = open policy.server=0
>     -3> 2016-04-27 21:59:58.814607 7f9e817fa700  1 -- 10.30.1.8:0/3291985349 --> 10.30.2.13:6805/27519 -- osd_op(client.44936150.0:224 obj_delete_at_hint.0000000055 [call lock.unlock] 10.2c88dbcf ondisk+write+known_if_redirected e100564) v6 -- ?+0 0x7f9f140dc5f0 con 0x7f9f1410ed10
>     -2> 2016-04-27 21:59:58.814718 7f9e3fa6b700 10 -- 10.30.1.8:0/3291985349 >> 10.30.2.13:6805/27519 pipe(0x7f9f14110010 sd=153 :10861 s=2 pgs=725914 cs=1 l=1 c=0x7f9f1410ed10).writer: state = open policy.server=0
>     -1> 2016-04-27 21:59:58.814778 7f9e3fa6b700 10 -- 10.30.1.8:0/3291985349 >> 10.30.2.13:6805/27519 pipe(0x7f9f14110010 sd=153 :10861 s=2 pgs=725914 cs=1 l=1 c=0x7f9f1410ed10).writer: state = open policy.server=0
>      0> 2016-04-27 21:59:58.826494 7f9e7e7f4700 -1 *** Caught signal (Segmentation fault) **
> in thread 7f9e7e7f4700
>
> ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
> 1: (()+0x30b0a2) [0x7fa11c5030a2]
> 2: (()+0xf100) [0x7fa1183fe100]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- logging levels ---
> <snip>
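While waiting for it to hit again, I also tried mapping that raw frame 1 offset by hand. A rough sketch - it assumes the offset is relative to the radosgw binary itself (which, per the discussion below, may not hold if the stack is corrupt) and that ceph-debuginfo is installed so there are symbols to resolve against:

    # map the offset from frame "1: (()+0x30b0a2)" to a function and source line
    addr2line -Cfie /usr/bin/radosgw 0x30b0a2

    # or disassemble around it, as the crash note itself suggests
    objdump -rdS /usr/bin/radosgw | grep -B5 -A5 '30b0a2:'

If the address actually belongs to a shared library rather than the binary, the same commands can be pointed at that library instead.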
On Wed, Apr 27, 2016 at 9:39 PM, Ben Hines <bhi...@gmail.com> wrote:
>> Yes, CentOS 7.2. Happened twice in a row, both times shortly after a
>> restart, so I expect I'll be able to reproduce it. However, I've now
>> tried a bunch of times and it's not happening again.
>>
>> In any case I have glibc + ceph-debuginfo installed, so we can get more
>> info if it does happen.
>>
>> thanks!
>>
>> On Wed, Apr 27, 2016 at 8:40 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
>>
>>> ----- Original Message -----
>>> > From: "Karol Mroz" <km...@suse.com>
>>> > To: "Ben Hines" <bhi...@gmail.com>
>>> > Cc: "ceph-users" <ceph-users@lists.ceph.com>
>>> > Sent: Wednesday, 27 April, 2016 7:06:56 PM
>>> > Subject: Re: [ceph-users] radosgw crash - Infernalis
>>> >
>>> > On Tue, Apr 26, 2016 at 10:17:31PM -0700, Ben Hines wrote:
>>> > [...]
>>> > > --> 10.30.1.6:6800/10350 -- osd_op(client.44852756.0:79
>>> > > default.42048218.<redacted> [getxattrs,stat,read 0~524288] 12.aa730416
>>> > > ack+read+known_if_redirected e100207) v6 -- ?+0 0x7f49c41880b0 con
>>> > > 0x7f49c4145eb0
>>> > > 0> 2016-04-26 22:07:59.685615 7f49a07f0700 -1 *** Caught signal
>>> > > (Segmentation fault) **
>>> > > in thread 7f49a07f0700
>>> > >
>>> > > ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
>>> > > 1: (()+0x30b0a2) [0x7f4c4907f0a2]
>>> > > 2: (()+0xf100) [0x7f4c44f7a100]
>>> > > NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> > > needed to interpret this.
>>> >
>>> > Hi Ben,
>>> >
>>> > I sense a pretty badly corrupted stack. From the radosgw-9.2.1 binary
>>> > (obtained from a downloaded rpm):
>>> >
>>> > 000000000030a810 <_Z13pidfile_writePK11md_config_t@@Base>:
>>> > ...
>>> >   30b09d: e8 0e 40 e4 ff    callq  14f0b0 <backtrace@plt>
>>> >   30b0a2: 4c 89 ef          mov    %r13,%rdi
>>> > -------
>>> > ...
>>> >
>>> > So either we tripped backtrace() code from pidfile_write() _or_ we
>>> > can't trust the stack. From the log snippet, it looks like we're far
>>> > past the point at which we would write a pidfile to disk (ie. at
>>> > process start during global_init()). Rather, we're actually handling
>>> > a request and outputting some bit of debug message via
>>> > MOSDOp::print() and beyond...
>>>
>>> It would help to know what binary this is and what OS.
>>>
>>> We know the offset into the function is 0x30b0a2, but we don't know
>>> which function yet AFAICT. Karol, how did you arrive at pidfile_write?
>>> Purely from the offset? I'm not sure that would be reliable...
>>>
>>> This is a segfault, so the address of the frame where we crashed should
>>> be the exact instruction where we crashed. I don't believe a mov from
>>> one register to another that does not involve a dereference ((%r13) as
>>> opposed to %r13) can cause a segfault, so I don't think we are on the
>>> right instruction - but then, as you say, the stack may be corrupt.
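Following up on Brad's point here: rather than guessing from the offset, once a core exists (see below on capturing one) gdb itself can settle which mapped object 0x7fa11c5030a2 falls in, and the nearest symbol if debuginfo is loaded. A rough sketch, with the core path as a placeholder:

    gdb /usr/bin/radosgw /path/to/core
    (gdb) info sharedlibrary           # which library's range covers 0x7fa11c5030a2?
    (gdb) info symbol 0x7fa11c5030a2   # nearest symbol in that object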
>>>
>>> > Is this something you're able to easily reproduce? More logs with
>>> > higher log levels would be helpful... a coredump with radosgw
>>> > compiled with -g would be excellent :)
>>>
>>> Agreed, although if this is an rpm-based system it should be sufficient
>>> to run the following:
>>>
>>> # debuginfo-install ceph glibc
>>>
>>> That may give us the name of the function, depending on where we are
>>> (if we are in a library it may require the debuginfo for that library
>>> to be loaded).
>>>
>>> Karol is right that a coredump would be a good idea in this case and
>>> will give us maximum information about the issue you are seeing.
>>>
>>> Cheers,
>>> Brad
>>>
>>> > --
>>> > Regards,
>>> > Karol
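In the meantime I'm making sure the next crash actually leaves a core behind. A sketch of the setup - the core_pattern path is my own choice rather than any ceph default, and under systemd setting LimitCORE=infinity in the unit file replaces the ulimit call:

    # allow unlimited-size cores for daemons started from this shell
    ulimit -c unlimited

    # write cores somewhere predictable (the directory must already exist)
    echo '/var/crash/core.%e.%p' > /proc/sys/kernel/core_pattern

    # after the next segfault, open the core against the same binary
    gdb /usr/bin/radosgw /var/crash/core.radosgw.<pid>
    (gdb) thread apply all bt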
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com