[ceph-users] Re: osds crash and restart in octopus

2021-09-03 Thread Amudhan P
I also have a similar problem: in my case the OSDs start and then stop after a
few minutes, and there is not much in the log.

I have filed a bug report and am waiting for a reply to confirm whether it is a
bug or some other issue.

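For the tracker ticket it usually helps to attach the crash metadata kept by the
mgr crash module (`ceph crash ls`, `ceph crash info <id>`). Below is a minimal
sketch of collecting it, assuming admin credentials on the node and assuming the
first column of `ceph crash ls` is the crash id (the table layout is an
assumption, not something shown in this thread):

#!/usr/bin/env python3
# Sketch: dump recent crash reports recorded by the mgr crash module so they
# can be attached to a bug report. Requires a readable ceph.conf and keyring.
import subprocess

def ceph(*args: str) -> str:
    # Run a ceph CLI command and return its stdout; raises on failure.
    return subprocess.run(["ceph", *args], capture_output=True,
                          text=True, check=True).stdout

listing = ceph("crash", "ls")
print(listing)

# Fetch the full JSON metadata (including the backtrace) for each crash id.
# Skipping the first line as a header is an assumption about the table layout.
for line in listing.splitlines()[1:]:
    if not line.strip():
        continue
    crash_id = line.split()[0]
    print(ceph("crash", "info", crash_id))
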
On Fri, Sep 3, 2021 at 5:21 PM mahnoosh shahidi wrote:

> We still have this problem. Does anybody have any ideas about this?
>
> On Mon, Aug 23, 2021 at 9:53 AM mahnoosh shahidi wrote:
>
> > Hi everyone,
> >
> > We have a problem with Octopus 15.2.12: the OSDs randomly crash and restart
> > with the following backtrace in the log.
> >
> > -8> 2021-08-20T15:01:03.165+0430 7f2d10fd7700 10 monclient:
> > handle_auth_request added challenge on 0x55a3fc654400
> > -7> 2021-08-20T15:01:03.201+0430 7f2d02960700  2 osd.202 1145364
> > ms_handle_reset con 0x55a548087000 session 0x55a4be8a4940
> > -6> 2021-08-20T15:01:03.209+0430 7f2d02960700  2 osd.202 1145364
> > ms_handle_reset con 0x55a52aab2800 session 0x55a4497dd0c0
> > -5> 2021-08-20T15:01:03.213+0430 7f2d02960700  2 osd.202 1145364
> > ms_handle_reset con 0x55a548084800 session 0x55a3fca0f860
> > -4> 2021-08-20T15:01:03.217+0430 7f2d02960700  2 osd.202 1145364
> > ms_handle_reset con 0x55a3c5e50800 session 0x55a51c1b7680
> > -3> 2021-08-20T15:01:03.217+0430 7f2d02960700  2 osd.202 1145364
> > ms_handle_reset con 0x55a3c5e52000 session 0x55a4055932a0
> > -2> 2021-08-20T15:01:03.225+0430 7f2d02960700  2 osd.202 1145364
> > ms_handle_reset con 0x55a4b835f800 session 0x55a51c1b90c0
> > -1> 2021-08-20T15:01:03.225+0430 7f2d107d6700 10 monclient:
> > handle_auth_request added challenge on 0x55a3c5e52000
> >  0> 2021-08-20T15:01:03.233+0430 7f2d0ffd5700 -1 *** Caught signal
> > (Segmentation fault) **
> >  in thread 7f2d0ffd5700 thread_name:msgr-worker-2
> >
> >  ceph version 15.2.12 (ce065eabfa5ce81323b009786bdf5bb03127cbe1) octopus
> > (stable)
> >  1: (()+0x12980) [0x7f2d144b0980]
> >  2: (AsyncConnection::_stop()+0x9c) [0x55a37bf56cdc]
> >  3: (ProtocolV2::stop()+0x8b) [0x55a37bf8016b]
> >  4: (ProtocolV2::_fault()+0x6b) [0x55a37bf8030b]
> >  5: (ProtocolV2::handle_read_frame_preamble_main(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int)+0x1d1) [0x55a37bf97d51]
> >  6: (ProtocolV2::run_continuation(Ct&)+0x34) [0x55a37bf80e64]
> >  7: (AsyncConnection::process()+0x5fc) [0x55a37bf59e0c]
> >  8: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x7dd) [0x55a37bda9a2d]
> >  9: (()+0x11d45a8) [0x55a37bdaf5a8]
> >  10: (()+0xbd6df) [0x7f2d13b886df]
> >  11: (()+0x76db) [0x7f2d144a56db]
> >  12: (clone()+0x3f) [0x7f2d1324571f]
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> > to interpret this.
> >
> > Our cluster has 220 HDDs and 200 SSDs. The HDD OSDs keep their DB on
> > separate NVMe devices, and the bucket indexes are on separate SSDs as well.
> > Does anybody have any idea what the problem could be?
> >
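The frames in the backtrace above have the form symbol()+offset [address]; as
the NOTE says, the matching ceph-osd executable (or its `objdump -rdS`
disassembly) is needed to map them to source lines. One way to do that, sketched
below with gdb's `info line *(symbol+offset)`, assumes debug symbols for this
exact 15.2.12 build are installed and that the binary lives at
/usr/bin/ceph-osd (both assumptions, not something stated in the thread):

#!/usr/bin/env python3
# Sketch: resolve backtrace frames such as "AsyncConnection::_stop()+0x9c"
# to source lines with gdb. Assumes gdb and the ceph-osd debug symbols are
# installed; the binary path is an assumption and may differ per distro.
import re
import subprocess

CEPH_OSD_BIN = "/usr/bin/ceph-osd"   # assumed location

# Frames copied from the crash log above; the argument lists are stripped,
# since gdb only needs the plain symbol name plus the offset.
FRAMES = [
    "AsyncConnection::_stop()+0x9c",
    "ProtocolV2::stop()+0x8b",
    "ProtocolV2::_fault()+0x6b",
    "AsyncConnection::process()+0x5fc",
]

def resolve(frame: str) -> str:
    # "Foo::bar(args)+0x9c" -> "Foo::bar+0x9c"; overloaded names may still
    # need the full signature for gdb to disambiguate.
    expr = re.sub(r"\(.*\)\+", "+", frame)
    result = subprocess.run(
        ["gdb", "-batch", "-nx", "-ex", f"info line *({expr})", CEPH_OSD_BIN],
        capture_output=True, text=True,
    )
    return (result.stdout or result.stderr).strip()

for frame in FRAMES:
    print(frame, "->", resolve(frame))

Without the matching debug package gdb will only report that no line number
information is available, so installing the dbg/debuginfo package for 15.2.12
first is the main prerequisite.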
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
