Re: [ceph-users] 1 osd Segmentation fault in test cluster
> > Is this useful for someone?
>
> Yes! See http://tracker.ceph.com/issues/21259
>
> The latest luminous branch (which you can get from
> https://shaman.ceph.com/builds/ceph/luminous/) has some additional
> debugging on OSD shutdown that should help me figure out what is
> causing this. If this is something you can reproduce on your cluster,
> please install the latest luminous and set 'osd debug shutdown = true'
> in the [osd] section of your config, and then ceph-post-file the log
> after a crash.

I don't know whether this fix was backported to 12.2.2 or not, but today one of my OSDs crashed:

Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]: *** Caught signal (Segmentation fault) **
Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]:  in thread 7f8f44b72700 thread_name:bstore_mempool
Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]:  ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]:  1: (()+0xa339e1) [0x5629c08799e1]
Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]:  2: (()+0xf5e0) [0x7f8f4f63a5e0]
Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]:  3: (BlueStore::TwoQCache::_trim(unsigned long, unsigned long)+0x2df) [0x5629c074665f]
Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]:  4: (BlueStore::Cache::trim(unsigned long, float, float, float)+0x1d1) [0x5629c0718d71]
Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]:  5: (BlueStore::MempoolThread::entry()+0x14d) [0x5629c071f4ad]
Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]:  6: (()+0x7e25) [0x7f8f4f632e25]
Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]:  7: (clone()+0x6d) [0x7f8f4e72634d]
Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]:      0> 2017-12-15 06:23:57.714362 7f8f44b72700 -1 *** Caught signal (Segmentation fault) **
Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]:  in thread 7f8f44b72700 thread_name:bstore_mempool
Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]:  ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]:  1: (()+0xa339e1) [0x5629c08799e1]
Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]:  2: (()+0xf5e0) [0x7f8f4f63a5e0]
Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]:  3: (BlueStore::TwoQCache::_trim(unsigned long, unsigned long)+0x2df) [0x5629c074665f]
Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]:  4: (BlueStore::Cache::trim(unsigned long, float, float, float)+0x1d1) [0x5629c0718d71]
Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]:  5: (BlueStore::MempoolThread::entry()+0x14d) [0x5629c071f4ad]
Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]:  6: (()+0x7e25) [0x7f8f4f632e25]
Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]:  7: (clone()+0x6d) [0x7f8f4e72634d]
Dec 15 06:23:57 ceph-osd0 ceph-osd[89499]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

k
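The NOTE at the end of the trace means the raw frame offsets (for example (()+0xa339e1)) still have to be resolved against the exact ceph-osd binary that crashed. A minimal sketch of how one might do that; the debuginfo-install tooling, package name, and binary path are assumptions for a CentOS-style install, so adjust for your distro:

    # Install debug symbols matching the running ceph version
    # (debuginfo-install ships with yum-utils; your build may use a
    # differently named debuginfo package):
    sudo debuginfo-install ceph-osd

    # Resolve one frame's offset to a function name and source line:
    addr2line -C -f -e /usr/bin/ceph-osd 0xa339e1

    # Or disassemble with interleaved source, as the log itself suggests:
    objdump -rdS /usr/bin/ceph-osd > ceph-osd.asm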
Re: [ceph-users] 1 osd Segmentation fault in test cluster
On Sat, 30 Sep 2017, Marc Roos wrote:
> Is this useful for someone?

Yes!

> 1: (()+0xa29511) [0x7f762e5b2511]
> 2: (()+0xf370) [0x7f762afa5370]
> 3: (BlueStore::TwoQCache::_trim(unsigned long, unsigned long)+0x2df) [0x7f762e481a2f]
> 4: (BlueStore::Cache::trim(unsigned long, float, float, float)+0x1d1) [0x7f762e4543e1]
> 5: (BlueStore::MempoolThread::entry()+0x14d) [0x7f762e45a71d]

See http://tracker.ceph.com/issues/21259

The latest luminous branch (which you can get from
https://shaman.ceph.com/builds/ceph/luminous/) has some additional
debugging on OSD shutdown that should help me figure out what is causing
this. If this is something you can reproduce on your cluster, please
install the latest luminous and set 'osd debug shutdown = true' in the
[osd] section of your config, and then ceph-post-file the log after a
crash.

Thanks!
sage
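For reference, a minimal sketch of the procedure Sage describes; the OSD id, systemd unit name, and log path below are placeholders for a typical deployment, so substitute your own:

    # In /etc/ceph/ceph.conf, add under the [osd] section:
    #
    #   [osd]
    #   osd debug shutdown = true
    #
    # then restart the OSD (osd.5 here is a placeholder) and, after the
    # next crash, upload the log so the developers can look at it:
    sudo systemctl restart ceph-osd@5
    ceph-post-file /var/log/ceph/ceph-osd.5.log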
Re: [ceph-users] 1 osd Segmentation fault in test cluster
Looks like there is one already. http://tracker.ceph.com/issues/21259

On Tue, Oct 3, 2017 at 1:15 AM, Gregory Farnum wrote:
> Please file a tracker ticket with all the info you have for stuff like this.
> They’re a lot harder to lose than emails are. ;)
>
> On Sat, Sep 30, 2017 at 8:31 AM Marc Roos wrote:
>>
>> Is this useful for someone?
>>
>> [Sat Sep 30 15:51:11 2017] libceph: osd5 192.168.10.113:6809 socket closed (con state OPEN)
>> [Sat Sep 30 15:51:11 2017] libceph: osd5 192.168.10.113:6809 socket closed (con state CONNECTING)
>> [Sat Sep 30 15:51:11 2017] libceph: osd5 down
>> [Sat Sep 30 15:51:11 2017] libceph: osd5 down
>> [Sat Sep 30 15:52:52 2017] libceph: osd5 up
>> [Sat Sep 30 15:52:52 2017] libceph: osd5 up
>>
>> 2017-09-30 15:48:08.542202 7f7623ce9700 0 log_channel(cluster) log [WRN] : slow request 31.456482 seconds old, received at 2017-09-30 15:47:37.085589: osd_op(mds.0.9227:1289186 20.2b 20.9af42b6b (undecoded) ondisk+write+known_if_redirected+full_force e15675) currently queued_for_pg
>> 2017-09-30 15:48:08.542207 7f7623ce9700 0 log_channel(cluster) log [WRN] : slow request 31.456086 seconds old, received at 2017-09-30 15:47:37.085984: osd_op(mds.0.9227:1289190 20.13 20.e44f3f53 (undecoded) ondisk+write+known_if_redirected+full_force e15675) currently queued_for_pg
>> 2017-09-30 15:48:08.542212 7f7623ce9700 0 log_channel(cluster) log [WRN] : slow request 31.456005 seconds old, received at 2017-09-30 15:47:37.086065: osd_op(mds.0.9227:1289194 20.2b 20.6733bdeb (undecoded) ondisk+write+known_if_redirected+full_force e15675) currently queued_for_pg
>> 2017-09-30 15:51:12.592490 7f7611cc5700 0 log_channel(cluster) log [DBG] : 20.3f scrub starts
>> 2017-09-30 15:51:24.514602 7f76214e4700 -1 *** Caught signal (Segmentation fault) **
>>  in thread 7f76214e4700 thread_name:bstore_mempool
>>
>>  ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
>>  1: (()+0xa29511) [0x7f762e5b2511]
>>  2: (()+0xf370) [0x7f762afa5370]
>>  3: (BlueStore::TwoQCache::_trim(unsigned long, unsigned long)+0x2df) [0x7f762e481a2f]
>>  4: (BlueStore::Cache::trim(unsigned long, float, float, float)+0x1d1) [0x7f762e4543e1]
>>  5: (BlueStore::MempoolThread::entry()+0x14d) [0x7f762e45a71d]
>>  6: (()+0x7dc5) [0x7f762af9ddc5]
>>  7: (clone()+0x6d) [0x7f762a09176d]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> --- begin dump of recent events ---
>> -10000> 2017-09-30 15:51:05.105915 7f76284ac700 5 -- 192.168.10.113:0/27661 >> 192.168.10.111:6810/6617 conn(0x7f766b736000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=19 cs=1 l=1). rx osd.0 seq 19546 0x7f76a2daf000 osd_ping(ping_reply e15675 stamp 2017-09-30 15:51:05.105439) v4
>>  -9999> 2017-09-30 15:51:05.105963 7f760fcc1700 1 -- 10.0.0.13:0/27661 --> 10.0.0.11:6805/6491 -- osd_ping(ping e15675 stamp 2017-09-30 15:51:05.105439) v4 -- 0x7f7683e98a00 con 0
>>  -9998> 2017-09-30 15:51:05.105960 7f76284ac700 1 -- 192.168.10.113:0/27661 <== osd.0 192.168.10.111:6810/6617 19546 osd_ping(ping_reply e15675 stamp 2017-09-30 15:51:05.105439) v4 2004+0+0 (1212154800 0 0) 0x7f76a2daf000 con 0x7f766b736000
>>  -9997> 2017-09-30 15:51:05.105961 7f76274aa700 5 -- 10.0.0.13:0/27661 >> 10.0.0.11:6808/6646 conn(0x7f766b745800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=24 cs=1 l=1). rx osd.3 seq 19546 0x7f769b95f200 osd_ping(ping_reply e15675 stamp 2017-09-30 15:51:05.105439) v4
>>  -9996> 2017-09-30 15:51:05.105983 7f760fcc1700 1 -- 192.168.10.113:0/27661 --> 192.168.10.111:6805/6491 -- osd_ping(ping e15675 stamp 2017-09-30 15:51:05.105439) v4 -- 0x7f7683e97600 con 0
>>  -9995> 2017-09-30 15:51:05.106001 7f76274aa700 1 -- 10.0.0.13:0/27661 <== osd.3 10.0.0.11:6808/6646 19546 osd_ping(ping_reply e15675 stamp 2017-09-30 15:51:05.105439) v4 2004+0+0 (1212154800 0 0) 0x7f769b95f200 con 0x7f766b745800
>>  -9994> 2017-09-30 15:51:05.106015 7f760fcc1700 1 -- 10.0.0.13:0/27661 --> 10.0.0.11:6807/6470 -- osd_ping(ping e15675 stamp 2017-09-30 15:51:05.105439) v4 -- 0x7f7683e99800 con 0
>>  -9993> 2017-09-30 15:51:05.106035 7f760fcc1700 1 -- 192.168.10.113:0/27661 --> 192.168.10.111:6808/6470 -- osd_ping(ping e15675 stamp 2017-09-30 15:51:05.105439) v4 -- 0x7f763b72a200 con 0
>>  -9992> 2017-09-30 15:51:05.106072 7f760fcc1700 1 -- 10.0.0.13:0/27661 --> 10.0.0.11:6809/6710 -- osd_ping(ping e15675 stamp 2017-09-30 15:51:05.105439) v4 -- 0x7f768633dc00 con 0
>>  -9991> 2017-09-30 15:51:05.106093 7f760fcc1700 1 -- 192.168.10.113:0/27661 --> 192.168.10.111:6804/6710 -- osd_ping(ping e15675 stamp 2017-09-30 15:51:05.105439) v4 -- 0x7f76667d3600 con 0
>>  -9990> 2017-09-30 15:51:05.106114 7f760fcc1700 1 -- 10.0.0.13:0/27661 --> 10.0.0.12:6805/1949 -- osd_ping(ping e15675 stamp 2017-09-30 15:51:05.105439) v4 -- 0x7f768fcd6200 con
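Apart from the crash itself, the slow request warnings quoted above can be inspected live on the affected OSD through its admin socket. A brief sketch, assuming osd.5 is the affected daemon and the default admin socket path:

    # Run on the host where osd.5 lives. Show requests currently
    # blocked or in flight:
    ceph daemon osd.5 dump_ops_in_flight

    # Show recently completed slow ops with per-stage timings:
    ceph daemon osd.5 dump_historic_ops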
Re: [ceph-users] 1 osd Segmentation fault in test cluster
Please file a tracker ticket with all the info you have for stuff like
this. They’re a lot harder to lose than emails are. ;)

On Sat, Sep 30, 2017 at 8:31 AM Marc Roos wrote:
>
> Is this useful for someone?
>
> [Sat Sep 30 15:51:11 2017] libceph: osd5 192.168.10.113:6809 socket closed (con state OPEN)
> [Sat Sep 30 15:51:11 2017] libceph: osd5 192.168.10.113:6809 socket closed (con state CONNECTING)
> [Sat Sep 30 15:51:11 2017] libceph: osd5 down
> [Sat Sep 30 15:51:11 2017] libceph: osd5 down
> [Sat Sep 30 15:52:52 2017] libceph: osd5 up
> [Sat Sep 30 15:52:52 2017] libceph: osd5 up
>
> 2017-09-30 15:48:08.542202 7f7623ce9700 0 log_channel(cluster) log [WRN] : slow request 31.456482 seconds old, received at 2017-09-30 15:47:37.085589: osd_op(mds.0.9227:1289186 20.2b 20.9af42b6b (undecoded) ondisk+write+known_if_redirected+full_force e15675) currently queued_for_pg
> 2017-09-30 15:48:08.542207 7f7623ce9700 0 log_channel(cluster) log [WRN] : slow request 31.456086 seconds old, received at 2017-09-30 15:47:37.085984: osd_op(mds.0.9227:1289190 20.13 20.e44f3f53 (undecoded) ondisk+write+known_if_redirected+full_force e15675) currently queued_for_pg
> 2017-09-30 15:48:08.542212 7f7623ce9700 0 log_channel(cluster) log [WRN] : slow request 31.456005 seconds old, received at 2017-09-30 15:47:37.086065: osd_op(mds.0.9227:1289194 20.2b 20.6733bdeb (undecoded) ondisk+write+known_if_redirected+full_force e15675) currently queued_for_pg
> 2017-09-30 15:51:12.592490 7f7611cc5700 0 log_channel(cluster) log [DBG] : 20.3f scrub starts
> 2017-09-30 15:51:24.514602 7f76214e4700 -1 *** Caught signal (Segmentation fault) **
>  in thread 7f76214e4700 thread_name:bstore_mempool
>
>  ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
>  1: (()+0xa29511) [0x7f762e5b2511]
>  2: (()+0xf370) [0x7f762afa5370]
>  3: (BlueStore::TwoQCache::_trim(unsigned long, unsigned long)+0x2df) [0x7f762e481a2f]
>  4: (BlueStore::Cache::trim(unsigned long, float, float, float)+0x1d1) [0x7f762e4543e1]
>  5: (BlueStore::MempoolThread::entry()+0x14d) [0x7f762e45a71d]
>  6: (()+0x7dc5) [0x7f762af9ddc5]
>  7: (clone()+0x6d) [0x7f762a09176d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- begin dump of recent events ---
> -10000> 2017-09-30 15:51:05.105915 7f76284ac700 5 -- 192.168.10.113:0/27661 >> 192.168.10.111:6810/6617 conn(0x7f766b736000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=19 cs=1 l=1). rx osd.0 seq 19546 0x7f76a2daf000 osd_ping(ping_reply e15675 stamp 2017-09-30 15:51:05.105439) v4
>  -9999> 2017-09-30 15:51:05.105963 7f760fcc1700 1 -- 10.0.0.13:0/27661 --> 10.0.0.11:6805/6491 -- osd_ping(ping e15675 stamp 2017-09-30 15:51:05.105439) v4 -- 0x7f7683e98a00 con 0
>  -9998> 2017-09-30 15:51:05.105960 7f76284ac700 1 -- 192.168.10.113:0/27661 <== osd.0 192.168.10.111:6810/6617 19546 osd_ping(ping_reply e15675 stamp 2017-09-30 15:51:05.105439) v4 2004+0+0 (1212154800 0 0) 0x7f76a2daf000 con 0x7f766b736000
>  -9997> 2017-09-30 15:51:05.105961 7f76274aa700 5 -- 10.0.0.13:0/27661 >> 10.0.0.11:6808/6646 conn(0x7f766b745800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=24 cs=1 l=1). rx osd.3 seq 19546 0x7f769b95f200 osd_ping(ping_reply e15675 stamp 2017-09-30 15:51:05.105439) v4
>  -9996> 2017-09-30 15:51:05.105983 7f760fcc1700 1 -- 192.168.10.113:0/27661 --> 192.168.10.111:6805/6491 -- osd_ping(ping e15675 stamp 2017-09-30 15:51:05.105439) v4 -- 0x7f7683e97600 con 0
>  -9995> 2017-09-30 15:51:05.106001 7f76274aa700 1 -- 10.0.0.13:0/27661 <== osd.3 10.0.0.11:6808/6646 19546 osd_ping(ping_reply e15675 stamp 2017-09-30 15:51:05.105439) v4 2004+0+0 (1212154800 0 0) 0x7f769b95f200 con 0x7f766b745800
>  -9994> 2017-09-30 15:51:05.106015 7f760fcc1700 1 -- 10.0.0.13:0/27661 --> 10.0.0.11:6807/6470 -- osd_ping(ping e15675 stamp 2017-09-30 15:51:05.105439) v4 -- 0x7f7683e99800 con 0
>  -9993> 2017-09-30 15:51:05.106035 7f760fcc1700 1 -- 192.168.10.113:0/27661 --> 192.168.10.111:6808/6470 -- osd_ping(ping e15675 stamp 2017-09-30 15:51:05.105439) v4 -- 0x7f763b72a200 con 0
>  -9992> 2017-09-30 15:51:05.106072 7f760fcc1700 1 -- 10.0.0.13:0/27661 --> 10.0.0.11:6809/6710 -- osd_ping(ping e15675 stamp 2017-09-30 15:51:05.105439) v4 -- 0x7f768633dc00 con 0
>  -9991> 2017-09-30 15:51:05.106093 7f760fcc1700 1 -- 192.168.10.113:0/27661 --> 192.168.10.111:6804/6710 -- osd_ping(ping e15675 stamp 2017-09-30 15:51:05.105439) v4 -- 0x7f76667d3600 con 0
>  -9990> 2017-09-30 15:51:05.106114 7f760fcc1700 1 -- 10.0.0.13:0/27661 --> 10.0.0.12:6805/1949 -- osd_ping(ping e15675 stamp 2017-09-30 15:51:05.105439) v4 -- 0x7f768fcd6200 con 0
>  -9989> 2017-09-30 15:51:05.106134 7f760fcc1700 1 -- 192.168.10.113:0/27661 --> 192.168.10.112:6805/1949 -- osd_ping(ping e15675 stamp 2017-09-30 15:51:05.105439) v4 -- 0x7f765f27a800 con 0
>
> ...
>
> -29> 2017