Re: [ceph-users] OSD crashed during the fio test
If it is only this one OSD I'd be inclined to take a hard look at the underlying hardware and how it behaves/performs compared to the hardware backing identical OSDs. The less likely possibility is that you have some sort of "hot spot" causing resource contention for that OSD. To investigate that further you could look at whether the pattern of CPU and RAM usage of that daemon varies significantly compared to the other OSD daemons in the cluster. You could also compare perf dumps between daemons.

On Wed, Oct 2, 2019 at 1:46 PM Sasha Litvak wrote:
>
> I updated firmware and kernel and am running torture tests. So far no assert, but I still noticed this on the same OSD as yesterday:
>
> Oct 01 19:35:13 storage2n2-la ceph-osd-34[11188]: 2019-10-01 19:35:13.721 7f8d03150700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f8cd05d7700' had timed out after 60
> Oct 01 19:35:13 storage2n2-la ceph-osd-34[11188]: 2019-10-01 19:35:13.721 7f8d03150700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f8cd0dd8700' had timed out after 60
> Oct 01 19:35:13 storage2n2-la ceph-osd-34[11188]: 2019-10-01 19:35:13.721 7f8d03150700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f8cd2ddc700' had timed out after 60
> Oct 01 19:35:13 storage2n2-la ceph-osd-34[11188]: 2019-10-01 19:35:13.721 7f8d03150700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f8cd35dd700' had timed out after 60
> Oct 01 19:35:13 storage2n2-la ceph-osd-34[11188]: 2019-10-01 19:35:13.721 7f8d03150700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f8cd3dde700' had timed out after 60
>
> The spike of latency on this OSD is 6 seconds at that time. Any ideas?
>
> On Tue, Oct 1, 2019 at 8:03 AM Sasha Litvak wrote:
>>
>> It was hardware indeed. The Dell server reported a disk being reset with power on. I am checking the usual suspects, i.e. controller firmware, controller event log (if I can get one), and drive firmware.
>> I will report more when I get a better idea.
>>
>> Thank you!
>>
>> On Tue, Oct 1, 2019 at 2:33 AM Brad Hubbard wrote:
>>>
>>> Removed ceph-de...@vger.kernel.org and added d...@ceph.io
>>>
>>> On Tue, Oct 1, 2019 at 4:26 PM Alex Litvak wrote:
>>> >
>>> > Hello everyone,
>>> >
>>> > Can you shed some light on the cause of the crash? Could a client request actually trigger it?
>>> >
>>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:52:58.867 7f093d71e700 -1 bdev(0x55b72c156000 /var/lib/ceph/osd/ceph-17/block) aio_submit retries 16
>>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:52:58.867 7f093d71e700 -1 bdev(0x55b72c156000 /var/lib/ceph/osd/ceph-17/block) aio submit got (11) Resource temporarily unavailable
>>>
>>> The KernelDevice::aio_submit function has tried to submit I/O 16 times (a hard-coded limit) and received an error each time, causing it to assert. Can you check the status of the underlying device(s)?
>>>
>>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/os/bluestore/KernelDevice.cc: In fun
>>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/os/bluestore/KernelDevice.cc: 757: F
>>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
>>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x55b71f668cf4]
>>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55b71f668ec2]
>>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 3: (KernelDevice::aio_submit(IOContext*)+0x701) [0x55b71fd61ca1]
>>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 4: (BlueStore::_txc_aio_submit(BlueStore::TransContext*)+0x42) [0x55b71fc29892]
>>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 5: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0x42b) [0x55b71fc496ab]
>>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 6: (BlueStore::queue_transactions(boost::intrusive_ptr&, std::vector std::allocator >&, boost::intrusive_ptr, ThreadPool::T
>>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 7: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector std::allocator >&, boost::intrusive_ptr)+0x54) [0x55b71f9b1b84]
>>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 8:
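Brad's suggestion to compare perf dumps between daemons can be sketched as a small script. The sketch below assumes dumps shaped like `ceph daemon osd.N perf dump` output with an `osd`/`op_latency` counter carrying `avgcount` and `sum` fields; treat those field names and the inline sample data as illustrative, and verify them against a real dump from your own cluster.

```python
import statistics

def avg_op_latency(dump):
    """Average op latency in seconds from a perf-dump-style dict.

    Field names ("osd", "op_latency", "avgcount", "sum") mirror the
    usual `ceph daemon osd.N perf dump` layout but are assumptions here.
    """
    c = dump["osd"]["op_latency"]
    return c["sum"] / c["avgcount"] if c["avgcount"] else 0.0

def find_outliers(dumps, factor=2.0):
    """Return OSD ids whose average op latency exceeds factor x the median."""
    lat = {osd: avg_op_latency(d) for osd, d in dumps.items()}
    med = statistics.median(lat.values())
    return sorted(osd for osd, v in lat.items() if med and v > factor * med)

# Hypothetical dumps for three OSDs; in practice you would collect these
# with e.g. json.loads(subprocess.check_output(
#     ["ceph", "daemon", "osd.17", "perf", "dump"])).
dumps = {
    17: {"osd": {"op_latency": {"avgcount": 1000, "sum": 6000.0}}},  # ~6 s
    18: {"osd": {"op_latency": {"avgcount": 1000, "sum": 20.0}}},    # ~20 ms
    19: {"osd": {"op_latency": {"avgcount": 1000, "sum": 25.0}}},    # ~25 ms
}
print(find_outliers(dumps))  # -> [17]
```

Comparing one counter across all OSDs on a host is usually enough to confirm whether a single daemon is the outlier, as suspected above.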
Re: [ceph-users] OSD crashed during the fio test
I updated firmware and kernel and am running torture tests. So far no assert, but I still noticed this on the same OSD as yesterday:

Oct 01 19:35:13 storage2n2-la ceph-osd-34[11188]: 2019-10-01 19:35:13.721 7f8d03150700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f8cd05d7700' had timed out after 60
Oct 01 19:35:13 storage2n2-la ceph-osd-34[11188]: 2019-10-01 19:35:13.721 7f8d03150700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f8cd0dd8700' had timed out after 60
Oct 01 19:35:13 storage2n2-la ceph-osd-34[11188]: 2019-10-01 19:35:13.721 7f8d03150700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f8cd2ddc700' had timed out after 60
Oct 01 19:35:13 storage2n2-la ceph-osd-34[11188]: 2019-10-01 19:35:13.721 7f8d03150700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f8cd35dd700' had timed out after 60
Oct 01 19:35:13 storage2n2-la ceph-osd-34[11188]: 2019-10-01 19:35:13.721 7f8d03150700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f8cd3dde700' had timed out after 60

The spike of latency on this OSD is 6 seconds at that time. Any ideas?

On Tue, Oct 1, 2019 at 8:03 AM Sasha Litvak wrote:
>
> It was hardware indeed. The Dell server reported a disk being reset with power on. I am checking the usual suspects, i.e. controller firmware, controller event log (if I can get one), and drive firmware.
> I will report more when I get a better idea.
>
> Thank you!
>
> On Tue, Oct 1, 2019 at 2:33 AM Brad Hubbard wrote:
>>
>> Removed ceph-de...@vger.kernel.org and added d...@ceph.io
>>
>> On Tue, Oct 1, 2019 at 4:26 PM Alex Litvak wrote:
>> >
>> > Hello everyone,
>> >
>> > Can you shed some light on the cause of the crash? Could a client request actually trigger it?
>> >
>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:52:58.867 7f093d71e700 -1 bdev(0x55b72c156000 /var/lib/ceph/osd/ceph-17/block) aio_submit retries 16
>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:52:58.867 7f093d71e700 -1 bdev(0x55b72c156000 /var/lib/ceph/osd/ceph-17/block) aio submit got (11) Resource temporarily unavailable
>>
>> The KernelDevice::aio_submit function has tried to submit I/O 16 times (a hard-coded limit) and received an error each time, causing it to assert. Can you check the status of the underlying device(s)?
>>
>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/os/bluestore/KernelDevice.cc: In fun
>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/os/bluestore/KernelDevice.cc: 757: F
>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x55b71f668cf4]
>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55b71f668ec2]
>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 3: (KernelDevice::aio_submit(IOContext*)+0x701) [0x55b71fd61ca1]
>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 4: (BlueStore::_txc_aio_submit(BlueStore::TransContext*)+0x42) [0x55b71fc29892]
>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 5: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0x42b) [0x55b71fc496ab]
>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 6: (BlueStore::queue_transactions(boost::intrusive_ptr&, std::vector std::allocator >&, boost::intrusive_ptr, ThreadPool::T
>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 7: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector std::allocator >&, boost::intrusive_ptr)+0x54) [0x55b71f9b1b84]
>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 8: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr std::default_delete >&&, eversion_t const&, eversion_t const&, s
>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 9: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0xf12) [0x55b71f90e322]
>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 10: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0xfae) [0x55b71f969b7e]
>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 11: (PrimaryLogPG::do_op(boost::intrusive_ptr&)+0x3965) [0x55b71f96de15]
>> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 12: (PrimaryLogPG::do_request(boost::intrusive_ptr&,
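The heartbeat_map timeouts above are easiest to reason about in aggregate: how many times did each `osd_op_tp` thread trip the 60-second timeout, and on which daemons? A small sketch that tallies journald-style lines like the ones quoted above (the two sample lines are taken directly from the log excerpt):

```python
import re
from collections import Counter

# Matches the `heartbeat_map is_healthy ... had timed out` lines emitted
# by the OSD, capturing the worker thread id and the timeout value.
TIMEOUT_RE = re.compile(
    r"heartbeat_map is_healthy 'OSD::osd_op_tp thread (0x[0-9a-f]+)' "
    r"had timed out after (\d+)")

def count_timeouts(lines):
    """Tally timeout events per osd_op_tp thread id."""
    hits = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            hits[m.group(1)] += 1
    return hits

sample = [
    "Oct 01 19:35:13 storage2n2-la ceph-osd-34[11188]: 2019-10-01 "
    "19:35:13.721 7f8d03150700 1 heartbeat_map is_healthy "
    "'OSD::osd_op_tp thread 0x7f8cd05d7700' had timed out after 60",
    "Oct 01 19:35:13 storage2n2-la ceph-osd-34[11188]: 2019-10-01 "
    "19:35:13.721 7f8d03150700 1 heartbeat_map is_healthy "
    "'OSD::osd_op_tp thread 0x7f8cd0dd8700' had timed out after 60",
]
print(count_timeouts(sample))
```

Feeding it `journalctl -u ceph-osd@34` output over a longer window makes it obvious whether the stalls cluster on one daemon, which supports the single-device theory.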
Re: [ceph-users] OSD crashed during the fio test
It was hardware indeed. The Dell server reported a disk being reset with power on. I am checking the usual suspects, i.e. controller firmware, controller event log (if I can get one), and drive firmware.
I will report more when I get a better idea.

Thank you!

On Tue, Oct 1, 2019 at 2:33 AM Brad Hubbard wrote:
>
> Removed ceph-de...@vger.kernel.org and added d...@ceph.io
>
> On Tue, Oct 1, 2019 at 4:26 PM Alex Litvak wrote:
> >
> > Hello everyone,
> >
> > Can you shed some light on the cause of the crash? Could a client request actually trigger it?
> >
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:52:58.867 7f093d71e700 -1 bdev(0x55b72c156000 /var/lib/ceph/osd/ceph-17/block) aio_submit retries 16
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:52:58.867 7f093d71e700 -1 bdev(0x55b72c156000 /var/lib/ceph/osd/ceph-17/block) aio submit got (11) Resource temporarily unavailable
>
> The KernelDevice::aio_submit function has tried to submit I/O 16 times (a hard-coded limit) and received an error each time, causing it to assert. Can you check the status of the underlying device(s)?
>
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/os/bluestore/KernelDevice.cc: In fun
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/os/bluestore/KernelDevice.cc: 757: F
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x55b71f668cf4]
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55b71f668ec2]
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 3: (KernelDevice::aio_submit(IOContext*)+0x701) [0x55b71fd61ca1]
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 4: (BlueStore::_txc_aio_submit(BlueStore::TransContext*)+0x42) [0x55b71fc29892]
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 5: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0x42b) [0x55b71fc496ab]
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 6: (BlueStore::queue_transactions(boost::intrusive_ptr&, std::vector std::allocator >&, boost::intrusive_ptr, ThreadPool::T
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 7: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector std::allocator >&, boost::intrusive_ptr)+0x54) [0x55b71f9b1b84]
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 8: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr std::default_delete >&&, eversion_t const&, eversion_t const&, s
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 9: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0xf12) [0x55b71f90e322]
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 10: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0xfae) [0x55b71f969b7e]
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 11: (PrimaryLogPG::do_op(boost::intrusive_ptr&)+0x3965) [0x55b71f96de15]
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 12: (PrimaryLogPG::do_request(boost::intrusive_ptr&, ThreadPool::TPHandle&)+0xbd4) [0x55b71f96f8a4]
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 13: (OSD::dequeue_op(boost::intrusive_ptr, boost::intrusive_ptr, ThreadPool::TPHandle&)+0x1a9) [0x55b71f7a9ea9]
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 14: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x62) [0x55b71fa475d2]
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x9f4) [0x55b71f7c6ef4]
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) [0x55b71fdc5ce3]
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55b71fdc8d80]
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 18: (()+0x7dd5) [0x7f0971da9dd5]
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 19: (clone()+0x6d) [0x7f0970c7002d]
> > Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:52:58.879 7f093d71e700 -1
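Since the root cause turned out to be a disk reset reported by the server, a quick scan of the kernel log for reset and I/O-error events is a reasonable first check on other nodes too. A minimal sketch; the sample lines below are hypothetical, and in practice you would feed in real `dmesg -T` or `journalctl -k` output:

```python
import re

# Keywords that commonly appear in kernel-log lines when a disk or HBA
# is resetting or throwing errors; extend to taste.
SUSPECT = re.compile(r"(reset|I/O error|hard resetting)", re.I)

def suspect_lines(log_lines):
    """Return the kernel-log lines that mention a reset or I/O error."""
    return [line for line in log_lines if SUSPECT.search(line)]

# Hypothetical dmesg-style sample; not taken from the cluster above.
sample = [
    "[Mon Sep 30 22:52:40 2019] megaraid_sas 0000:18:00.0: resetting fusion adapter",
    "[Mon Sep 30 22:52:41 2019] sd 0:2:3:0: device reset",
    "[Mon Sep 30 22:53:10 2019] EXT4-fs (sda1): mounted filesystem",
]
for line in suspect_lines(sample):
    print(line)
```

Correlating the timestamps of such lines with the OSD's assert time is usually enough to confirm or rule out the hardware explanation.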
Re: [ceph-users] OSD crashed during the fio test
Removed ceph-de...@vger.kernel.org and added d...@ceph.io

On Tue, Oct 1, 2019 at 4:26 PM Alex Litvak wrote:
>
> Hello everyone,
>
> Can you shed some light on the cause of the crash? Could a client request actually trigger it?
>
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:52:58.867 7f093d71e700 -1 bdev(0x55b72c156000 /var/lib/ceph/osd/ceph-17/block) aio_submit retries 16
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:52:58.867 7f093d71e700 -1 bdev(0x55b72c156000 /var/lib/ceph/osd/ceph-17/block) aio submit got (11) Resource temporarily unavailable

The KernelDevice::aio_submit function has tried to submit I/O 16 times (a hard-coded limit) and received an error each time, causing it to assert. Can you check the status of the underlying device(s)?

> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/os/bluestore/KernelDevice.cc: In fun
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/os/bluestore/KernelDevice.cc: 757: F
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x55b71f668cf4]
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55b71f668ec2]
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 3: (KernelDevice::aio_submit(IOContext*)+0x701) [0x55b71fd61ca1]
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 4: (BlueStore::_txc_aio_submit(BlueStore::TransContext*)+0x42) [0x55b71fc29892]
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 5: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0x42b) [0x55b71fc496ab]
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 6: (BlueStore::queue_transactions(boost::intrusive_ptr&, std::vector std::allocator >&, boost::intrusive_ptr, ThreadPool::T
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 7: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector std::allocator >&, boost::intrusive_ptr)+0x54) [0x55b71f9b1b84]
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 8: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr std::default_delete >&&, eversion_t const&, eversion_t const&, s
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 9: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0xf12) [0x55b71f90e322]
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 10: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0xfae) [0x55b71f969b7e]
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 11: (PrimaryLogPG::do_op(boost::intrusive_ptr&)+0x3965) [0x55b71f96de15]
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 12: (PrimaryLogPG::do_request(boost::intrusive_ptr&, ThreadPool::TPHandle&)+0xbd4) [0x55b71f96f8a4]
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 13: (OSD::dequeue_op(boost::intrusive_ptr, boost::intrusive_ptr, ThreadPool::TPHandle&)+0x1a9) [0x55b71f7a9ea9]
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 14: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x62) [0x55b71fa475d2]
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x9f4) [0x55b71f7c6ef4]
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) [0x55b71fdc5ce3]
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55b71fdc8d80]
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 18: (()+0x7dd5) [0x7f0971da9dd5]
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 19: (clone()+0x6d) [0x7f0970c7002d]
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:52:58.879 7f093d71e700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/os/bluestore/KernelDevice.cc: 757: F
> Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: ceph version
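One software-side cause worth ruling out alongside the device checks: `io_submit(2)` can return EAGAIN, the "(11) Resource temporarily unavailable" in the log above, when the system-wide count of in-flight AIO requests runs into the kernel's `fs.aio-max-nr` limit, a limit that hosts with many OSDs are sometimes advised to raise via sysctl. A minimal, Linux-only sketch that reads the standard procfs counters:

```python
import os

def read_first_int(path):
    """Read the first whitespace-separated integer from a procfs file."""
    with open(path) as f:
        return int(f.read().split()[0])

def aio_headroom(nr_path="/proc/sys/fs/aio-nr",
                 max_path="/proc/sys/fs/aio-max-nr"):
    """Return (in_flight, limit, fraction_used) for kernel AIO contexts."""
    in_flight = read_first_int(nr_path)
    limit = read_first_int(max_path)
    return in_flight, limit, in_flight / limit

if os.path.exists("/proc/sys/fs/aio-nr"):  # procfs is Linux-only
    used, limit, frac = aio_headroom()
    print(f"aio-nr={used} aio-max-nr={limit} ({frac:.1%} used)")
```

If the fraction is near 1.0 when the assert fires, raising `fs.aio-max-nr` is the usual remedy; if there is ample headroom, as was the case here, the hardware explanation becomes much more likely.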
[ceph-users] OSD crashed during the fio test
Hello everyone,

Can you shed some light on the cause of the crash? Could a client request actually trigger it?

Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:52:58.867 7f093d71e700 -1 bdev(0x55b72c156000 /var/lib/ceph/osd/ceph-17/block) aio_submit retries 16
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:52:58.867 7f093d71e700 -1 bdev(0x55b72c156000 /var/lib/ceph/osd/ceph-17/block) aio submit got (11) Resource temporarily unavailable
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/os/bluestore/KernelDevice.cc: In fun
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/os/bluestore/KernelDevice.cc: 757: F
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x55b71f668cf4]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55b71f668ec2]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 3: (KernelDevice::aio_submit(IOContext*)+0x701) [0x55b71fd61ca1]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 4: (BlueStore::_txc_aio_submit(BlueStore::TransContext*)+0x42) [0x55b71fc29892]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 5: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0x42b) [0x55b71fc496ab]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 6: (BlueStore::queue_transactions(boost::intrusive_ptr&, std::vector std::allocator >&, boost::intrusive_ptr, ThreadPool::T
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 7: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector std::allocator >&, boost::intrusive_ptr)+0x54) [0x55b71f9b1b84]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 8: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr std::default_delete >&&, eversion_t const&, eversion_t const&, s
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 9: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0xf12) [0x55b71f90e322]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 10: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0xfae) [0x55b71f969b7e]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 11: (PrimaryLogPG::do_op(boost::intrusive_ptr&)+0x3965) [0x55b71f96de15]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 12: (PrimaryLogPG::do_request(boost::intrusive_ptr&, ThreadPool::TPHandle&)+0xbd4) [0x55b71f96f8a4]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 13: (OSD::dequeue_op(boost::intrusive_ptr, boost::intrusive_ptr, ThreadPool::TPHandle&)+0x1a9) [0x55b71f7a9ea9]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 14: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x62) [0x55b71fa475d2]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x9f4) [0x55b71f7c6ef4]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) [0x55b71fdc5ce3]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55b71fdc8d80]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 18: (()+0x7dd5) [0x7f0971da9dd5]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 19: (clone()+0x6d) [0x7f0970c7002d]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:52:58.879 7f093d71e700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/os/bluestore/KernelDevice.cc: 757: F
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x55b71f668cf4]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55b71f668ec2]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 3: (KernelDevice::aio_submit(IOContext*)+0x701)
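The "aio_submit retries 16" line reflects a bounded-retry pattern: the submission is retried a fixed number of times on EAGAIN, and the daemon gives up loudly (here, asserts) if it never succeeds. A sketch of that pattern in Python rather than the actual C++ in KernelDevice.cc; `submit` is any callable returning the number of I/Os accepted or a negative errno, mimicking `io_submit(2)`:

```python
import errno
import time

def submit_with_retries(submit, max_retries=16, backoff=0.0):
    """Retry a submission on EAGAIN up to max_retries times, then fail.

    Returns the number of retries it took to succeed; raises on any
    other error or when the retry budget is exhausted.
    """
    for attempt in range(max_retries + 1):
        r = submit()
        if r >= 0:
            return attempt
        if r != -errno.EAGAIN:
            raise OSError(-r, "aio submit failed")
        if backoff:
            time.sleep(backoff)  # optionally ease off a saturated queue
    raise RuntimeError(f"aio submit still EAGAIN after {max_retries} retries")

# Fake device that rejects the first two attempts, like a briefly
# saturated submission queue, then accepts.
attempts = {"n": 0}
def flaky_submit():
    attempts["n"] += 1
    return -errno.EAGAIN if attempts["n"] <= 2 else 1

print(submit_with_retries(flaky_submit))  # -> 2 (retries before success)
```

The key point for the crash above: a transient shortage recovers within the retry budget, while a device that has dropped off the bus keeps returning EAGAIN until the limit is hit and the assert fires.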