I'm hitting this same issue on 12.2.5. Upgraded one node to 12.2.10 and it didn't clear.

6 OSDs flapping with this error. I know this is an older issue, but are traces still needed? I don't see a resolution available.

Thanks,
Dan

On Wed, Sep 6, 2017 at 10:30 PM Brad Hubbard <bhubb...@redhat.com> wrote:
> These error logs look like they are being generated here,
>
> https://github.com/ceph/ceph/blob/master/src/os/bluestore/BlueStore.cc#L8987-L8993
>
> or possibly here,
>
> https://github.com/ceph/ceph/blob/master/src/os/bluestore/BlueStore.cc#L9230-L9236
>
> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
> 2017-09-05 17:02:58.686723 7fe1871ac700 -1
> bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2) No
> such file or directory not handled on operation 15 (op 0, counting
> from 0)
>
> The table of operations is here,
> https://github.com/ceph/ceph/blob/master/src/os/ObjectStore.h#L370
>
> Operation 15 is OP_SETATTRS, so it appears to be some extended
> attribute operation that is failing.
>
> Can someone run the ceph-osd under strace and find the last system
> call (probably a call that manipulates xattrs) that returns -2 in the
> thread that crashes (or that outputs the above messages)?
>
> strace -fvttyyTo /tmp/strace.out -s 1024 ceph-osd [system specific arguments]
>
> Capturing logs with "debug_bluestore = 20" may tell us more as well.
>
> We need to work out what resource it is trying to access when it
> receives the error '2' (No such file or directory).
>
> On Thu, Sep 7, 2017 at 12:13 AM, Thomas Coelho
> <coe...@th.physik.uni-frankfurt.de> wrote:
> > Hi,
> >
> > I have the same problem. A bug [1] has been open for months, but
> > unfortunately it is not fixed yet. I hope that if more people are having
> > this problem the developers can reproduce and fix it.
> >
> > I was using Kernel-RBD with a Cache Tier.
> >
> > so long
> > Thomas Coelho
> >
> > [1] http://tracker.ceph.com/issues/20222
> >
> > Am 06.09.2017 um 15:41 schrieb Henrik Korkuc:
> >> On 17-09-06 16:24, Jean-Francois Nadeau wrote:
> >>> Hi,
> >>>
> >>> On a 4-node / 48-OSD Luminous cluster I'm giving RBD on EC
> >>> pools + BlueStore a try.
> >>>
> >>> Setup went fine, but after a few bench runs several OSDs are failing and
> >>> many won't even restart.
> >>>
> >>> ceph osd erasure-code-profile set myprofile \
> >>>     k=2 \
> >>>     m=1 \
> >>>     crush-failure-domain=host
> >>> ceph osd pool create mypool 1024 1024 erasure myprofile
> >>> ceph osd pool set mypool allow_ec_overwrites true
> >>> rbd pool init mypool
> >>> ceph -s
> >>> ceph health detail
> >>> ceph osd pool create metapool 1024 1024 replicated
> >>> rbd create --size 1024G --data-pool mypool --image metapool/test1
> >>> rbd bench -p metapool test1 --io-type write --io-size 8192 --io-pattern rand --io-total 10G
> >>> ...
> >>>
> >>> One of many failing OSD logs:
> >>>
> >>> Sep 05 17:02:54 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]:
> >>> Started Ceph object storage daemon osd.12.
> >>> Sep 05 17:02:54 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: > >>> starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12 > >>> /var/lib/ceph/osd/ceph-12/journal > >>> Sep 05 17:02:56 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: > >>> 2017-09-05 17:02:56.627301 7fe1a2e42d00 -1 osd.12 2219 log_to_monitors > >>> {default=true} > >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: > >>> 2017-09-05 17:02:58.686723 7fe1871ac700 -1 > >>> bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transac > >>> tion error (2) No such file or directory not handled on operation 15 > >>> (op 0, counting from 0) > >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: > >>> 2017-09-05 17:02:58.686742 7fe1871ac700 -1 > >>> bluestore(/var/lib/ceph/osd/ceph-12) unexpected error > >>> code > >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: > >>> > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/ > >>> > centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc: > >>> In function 'void BlueStore::_txc_add_transaction(Blu > >>> eStore::TransContext*, ObjectStore::Transaction*)' thread 7fe1871ac700 > >>> time 2017-09-05 17:02:58.686821 > >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: > >>> > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/ > >>> > centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc: > >>> 9282: FAILED assert(0 == "unexpected error") > >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: > >>> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) > >>> luminous (rc) > >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 1: > >>> (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>> const*)+0x110) [0x7fe1a38bf510] > 
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 2: > >>> (BlueStore::_txc_add_transaction(BlueStore::TransContext*, > >>> ObjectStore::Transaction*)+0x1487) > >>> [0x7fe1a3796057] > >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 3: > >>> (BlueStore::queue_transactions(ObjectStore::Sequencer*, > >>> std::vector<ObjectStore::Transaction, > >>> std::allocator<ObjectStore::Transaction> >&, > >>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0) > >>> [0x7fe1a37970a0] > >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 4: > >>> > (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, > std::allocator<Object > >>> Store::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x65) > >>> [0x7fe1a3508745] > >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 5: > >>> (ECBackend::handle_sub_write(pg_shard_t, > >>> boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTrace > >>> r::Trace const&, Context*)+0x631) [0x7fe1a3628711] > >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 6: > >>> (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327) > >>> [0x7fe1a36392b7] > >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 7: > >>> (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>> [0x7fe1a353da10] > >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 8: > >>> (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>> ThreadPool::TPHandle&)+0x58e) [0x > >>> 7fe1a34a9a7e] > >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 9: > >>> (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHan > >>> dle&)+0x3f9) [0x7fe1a333c729] > >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: > >>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>> const&)+0x57) [0x7fe1a35ac1 > >>> 
97] > >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: > >>> 11: (OSD::ShardedOpWQ::_process(unsigned int, > >>> ceph::heartbeat_handle_d*)+0xfce) [0x7fe1a3367c8e] > >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: > >>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) > >>> [0x7fe1a38c5029] > >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: > >>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) > [0x7fe1a38c6fc0] > >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: > >>> 14: (()+0x7dc5) [0x7fe1a0484dc5] > >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: > >>> 15: (clone()+0x6d) [0x7fe19f57876d] > >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: > >>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>> needed to interpret this. > >>> [root@r72-k7-06-01 ~]# journalctl -u ceph-osd@12 --no-pager -n 100 > >>> -- Logs begin at Wed 2017-08-30 10:26:26 UTC, end at Tue 2017-09-05 > >>> 22:06:19 UTC. -- > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 15: (clone()+0x6d) [0x7f0160c9d76d] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>> needed to interpret this. 
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> -5496> 2017-09-05 17:08:03.460844 7f0164567d00 -1 osd.12 2362 > >>> log_to_monitors {default=true} > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> -74> 2017-09-05 17:08:05.837648 7f01488d1700 -1 > >>> bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2) No > >>> such file or directory not handled on operation 15 (op 0, counting > from 0) > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> -73> 2017-09-05 17:08:05.837670 7f01488d1700 -1 > >>> bluestore(/var/lib/ceph/osd/ceph-12) unexpected error code > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 0> > >>> 2017-09-05 17:08:05.843218 7f01488d1700 -1 > >>> > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc: > >>> In function 'void > >>> BlueStore::_txc_add_transaction(BlueStore::TransContext*, > >>> ObjectStore::Transaction*)' thread 7f01488d1700 time 2017-09-05 > >>> 17:08:05.837770 > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc: > >>> 9282: FAILED assert(0 == "unexpected error") > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) > >>> luminous (rc) > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 1: > >>> (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>> const*)+0x110) [0x7f0164fe4510] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2: > >>> 
(BlueStore::_txc_add_transaction(BlueStore::TransContext*, > >>> ObjectStore::Transaction*)+0x1487) [0x7f0164ebb057] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 3: > >>> (BlueStore::queue_transactions(ObjectStore::Sequencer*, > >>> std::vector<ObjectStore::Transaction, > >>> std::allocator<ObjectStore::Transaction> >&, > >>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0) > >>> [0x7f0164ebc0a0] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 4: > >>> > (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, > std::allocator<ObjectStore::Transaction> > >>> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x7f0164c2d745] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 5: > >>> (ECBackend::handle_sub_write(pg_shard_t, > >>> boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, > >>> Context*)+0x631) [0x7f0164d4d711] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 6: > >>> (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327) > >>> [0x7f0164d5e2b7] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 7: > >>> (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>> [0x7f0164c62a10] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 8: > >>> (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>> ThreadPool::TPHandle&)+0x58e) [0x7f0164bcea7e] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 9: > >>> (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) > >>> [0x7f0164a61729] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>> const&)+0x57) [0x7f0164cd1197] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 11: 
(OSD::ShardedOpWQ::_process(unsigned int, > >>> ceph::heartbeat_handle_d*)+0xfce) [0x7f0164a8cc8e] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) > >>> [0x7f0164fea029] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) > [0x7f0164febfc0] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 14: (()+0x7dc5) [0x7f0161ba9dc5] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 15: (clone()+0x6d) [0x7f0160c9d76d] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>> needed to interpret this. > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> *** Caught signal (Aborted) ** > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: in > >>> thread 7f01488d1700 thread_name:tp_osd_tp > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) > >>> luminous (rc) > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 1: > >>> (()+0xa23b21) [0x7f0164fa5b21] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2: > >>> (()+0xf370) [0x7f0161bb1370] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 3: > >>> (gsignal()+0x37) [0x7f0160bdb1d7] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 4: > >>> (abort()+0x148) [0x7f0160bdc8c8] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 5: > >>> (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>> const*)+0x284) [0x7f0164fe4684] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 6: > >>> 
(BlueStore::_txc_add_transaction(BlueStore::TransContext*, > >>> ObjectStore::Transaction*)+0x1487) [0x7f0164ebb057] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 7: > >>> (BlueStore::queue_transactions(ObjectStore::Sequencer*, > >>> std::vector<ObjectStore::Transaction, > >>> std::allocator<ObjectStore::Transaction> >&, > >>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0) > >>> [0x7f0164ebc0a0] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 8: > >>> > (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, > std::allocator<ObjectStore::Transaction> > >>> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x7f0164c2d745] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 9: > >>> (ECBackend::handle_sub_write(pg_shard_t, > >>> boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, > >>> Context*)+0x631) [0x7f0164d4d711] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 10: > >>> (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327) > >>> [0x7f0164d5e2b7] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 11: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>> [0x7f0164c62a10] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 12: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>> ThreadPool::TPHandle&)+0x58e) [0x7f0164bcea7e] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) > >>> [0x7f0164a61729] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 14: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>> const&)+0x57) [0x7f0164cd1197] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 15: 
(OSD::ShardedOpWQ::_process(unsigned int, > >>> ceph::heartbeat_handle_d*)+0xfce) [0x7f0164a8cc8e] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) > >>> [0x7f0164fea029] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) > [0x7f0164febfc0] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 18: (()+0x7dc5) [0x7f0161ba9dc5] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 19: (clone()+0x6d) [0x7f0160c9d76d] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 2017-09-05 17:08:05.883240 7f01488d1700 -1 *** Caught signal (Aborted) > ** > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: in > >>> thread 7f01488d1700 thread_name:tp_osd_tp > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) > >>> luminous (rc) > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 1: > >>> (()+0xa23b21) [0x7f0164fa5b21] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2: > >>> (()+0xf370) [0x7f0161bb1370] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 3: > >>> (gsignal()+0x37) [0x7f0160bdb1d7] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 4: > >>> (abort()+0x148) [0x7f0160bdc8c8] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 5: > >>> (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>> const*)+0x284) [0x7f0164fe4684] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 6: > >>> (BlueStore::_txc_add_transaction(BlueStore::TransContext*, > >>> ObjectStore::Transaction*)+0x1487) [0x7f0164ebb057] > >>> Sep 05 17:08:05 
r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 7: > >>> (BlueStore::queue_transactions(ObjectStore::Sequencer*, > >>> std::vector<ObjectStore::Transaction, > >>> std::allocator<ObjectStore::Transaction> >&, > >>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0) > >>> [0x7f0164ebc0a0] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 8: > >>> > (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, > std::allocator<ObjectStore::Transaction> > >>> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x7f0164c2d745] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 9: > >>> (ECBackend::handle_sub_write(pg_shard_t, > >>> boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, > >>> Context*)+0x631) [0x7f0164d4d711] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 10: > >>> (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327) > >>> [0x7f0164d5e2b7] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 11: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>> [0x7f0164c62a10] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 12: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>> ThreadPool::TPHandle&)+0x58e) [0x7f0164bcea7e] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) > >>> [0x7f0164a61729] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 14: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>> const&)+0x57) [0x7f0164cd1197] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 15: (OSD::ShardedOpWQ::_process(unsigned int, > >>> ceph::heartbeat_handle_d*)+0xfce) [0x7f0164a8cc8e] > >>> Sep 05 17:08:05 
r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) > >>> [0x7f0164fea029] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) > [0x7f0164febfc0] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 18: (()+0x7dc5) [0x7f0161ba9dc5] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 19: (clone()+0x6d) [0x7f0160c9d76d] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>> needed to interpret this. > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 0> > >>> 2017-09-05 17:08:05.883240 7f01488d1700 -1 *** Caught signal (Aborted) > ** > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: in > >>> thread 7f01488d1700 thread_name:tp_osd_tp > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) > >>> luminous (rc) > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 1: > >>> (()+0xa23b21) [0x7f0164fa5b21] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2: > >>> (()+0xf370) [0x7f0161bb1370] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 3: > >>> (gsignal()+0x37) [0x7f0160bdb1d7] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 4: > >>> (abort()+0x148) [0x7f0160bdc8c8] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 5: > >>> (ceph::__ceph_assert_fail(char const*, char const*, int, char > >>> const*)+0x284) [0x7f0164fe4684] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 6: > >>> (BlueStore::_txc_add_transaction(BlueStore::TransContext*, > >>> ObjectStore::Transaction*)+0x1487) 
[0x7f0164ebb057] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 7: > >>> (BlueStore::queue_transactions(ObjectStore::Sequencer*, > >>> std::vector<ObjectStore::Transaction, > >>> std::allocator<ObjectStore::Transaction> >&, > >>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0) > >>> [0x7f0164ebc0a0] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 8: > >>> > (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, > std::allocator<ObjectStore::Transaction> > >>> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x7f0164c2d745] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 9: > >>> (ECBackend::handle_sub_write(pg_shard_t, > >>> boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, > >>> Context*)+0x631) [0x7f0164d4d711] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 10: > >>> (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327) > >>> [0x7f0164d5e2b7] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 11: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) > >>> [0x7f0164c62a10] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 12: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, > >>> ThreadPool::TPHandle&)+0x58e) [0x7f0164bcea7e] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>, > >>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) > >>> [0x7f0164a61729] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 14: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> > >>> const&)+0x57) [0x7f0164cd1197] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 15: (OSD::ShardedOpWQ::_process(unsigned int, > >>> ceph::heartbeat_handle_d*)+0xfce) [0x7f0164a8cc8e] > >>> Sep 05 
17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) > >>> [0x7f0164fea029] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) > [0x7f0164febfc0] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 18: (()+0x7dc5) [0x7f0161ba9dc5] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> 19: (clone()+0x6d) [0x7f0160c9d76d] > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: > >>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is > >>> needed to interpret this. > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: > >>> ceph-osd@12.service: main process exited, code=killed, status=6/ABRT > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: Unit > >>> ceph-osd@12.service entered failed state. > >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: > >>> ceph-osd@12.service failed. > >>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: > >>> ceph-osd@12.service holdoff time over, scheduling restart. > >>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: start > >>> request repeated too quickly for ceph-osd@12.service > >>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: Failed > >>> to start Ceph object storage daemon osd.12. > >>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: Unit > >>> ceph-osd@12.service entered failed state. > >>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: > >>> ceph-osd@12.service failed. > >>> > >>> > >>> > >>> Drives are OK, FS mounted and no suspicious kernel logs. > >>> > >>> What's next to troubleshoot ? 
> >>>
> >> While I am unable to assist you with this problem, I think it would be
> >> useful to create a tracker.ceph.com issue and attach OSD logs, as the
> >> crash dump includes recent events too.
> >>
> >> BTW, I have a few open issues related to OSD stability/reliability on a
> >> similar setup - http://tracker.ceph.com/issues/21259 and
> >> http://tracker.ceph.com/issues/21180
> >> Can you grep your logs to see if you experienced something similar?
> >>
> >> I also got "FAILED assert(i->prior_version == last || i->is_error())",
> >> "FAILED assert(p.second->shared_blob_set.empty())" and an abort on
> >> thread_name:tp_peering today, which I haven't had a chance to report yet.
> >>
> >>> thanks!
> >>>
> >>> _______________________________________________
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> > --
> > Dr. Thomas Coelho
> > HRZ Service Center Riedberg
> >
> > Goethe Universität Frankfurt | Campus Riedberg
> > Max-von-Laue Str. 1 | 60438 Frankfurt am Main
> > Room 1.128
> >
> > Phone: +49-(0)69 798 40026
> > E-Mail: coe...@th.physik.uni-frankfurt.de
>
> --
> Cheers,
> Brad
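Once a trace has been captured with the strace invocation Brad gives above, the interesting lines can be pulled out with grep. A minimal sketch, assuming the trace was written to /tmp/strace.out; the sample line below is fabricated purely so the filter has something to match, and the exact line format will differ on a real system:

```shell
# Fabricated example of what a failing xattr syscall might look like in the
# strace output (real data would come from:
#   strace -fvttyyTo /tmp/strace.out -s 1024 ceph-osd [system specific arguments]).
cat > /tmp/strace.out <<'EOF'
4775  17:02:58.686700 getxattr("/var/lib/ceph/osd/ceph-12/fsid", "user.ceph._", 0x0, 0) = -1 ENOENT (No such file or directory)
EOF

# Count xattr-related calls that returned -2 (ENOENT); any hits are the
# calls Brad is asking about.
grep -E '(set|get|remove|list)xattr' /tmp/strace.out | grep -c 'ENOENT'
```

On the fabricated sample this prints 1; on a real trace, feed the matching lines (with their thread IDs and timestamps) back to the list.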
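For Brad's `debug_bluestore = 20` suggestion, one way to apply it is a ceph.conf fragment like the following (a sketch; section placement and the exact 20/20 log/memory split are assumptions):

```ini
# /etc/ceph/ceph.conf fragment: raise BlueStore logging before reproducing
# the crash, then restart the affected OSD and collect its log.
[osd]
debug bluestore = 20/20
```

The same effect can be had at runtime, without a restart, via `ceph tell osd.12 injectargs '--debug-bluestore 20/20'` (osd.12 taken from the logs above; substitute your failing OSD id).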