I'm hitting this same issue on 12.2.5. I upgraded one node to 12.2.10 and it
didn't clear.

Six OSDs are flapping with this error. I know this is an older issue, but are
traces still needed? I don't see a resolution available.

Thanks,

Dan

On Wed, Sep 6, 2017 at 10:30 PM Brad Hubbard <bhubb...@redhat.com> wrote:

> These error logs look like they are being generated here:
>
> https://github.com/ceph/ceph/blob/master/src/os/bluestore/BlueStore.cc#L8987-L8993
>
> or possibly here:
>
> https://github.com/ceph/ceph/blob/master/src/os/bluestore/BlueStore.cc#L9230-L9236
>
> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
> 2017-09-05 17:02:58.686723 7fe1871ac700 -1
> bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2) No
> such file or directory not handled on operation 15 (op 0, counting
> from 0)
>
> The table of operations is here,
> https://github.com/ceph/ceph/blob/master/src/os/ObjectStore.h#L370
>
> Operation 15 is OP_SETATTRS, so it appears to be an extended
> attribute operation that is failing.
>
> Can someone run ceph-osd under strace and find the last system
> call (probably one that manipulates xattrs) that returns -2 in the
> thread that crashes (or that outputs the above messages)?
>
> strace -fvttyyTo /tmp/strace.out -s 1024 ceph-osd [system specific
> arguments]
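>
> Once an OSD has crashed under strace, filtering the output along these
> lines should surface the failing calls (a rough sketch; the exact
> syscall names in your trace may differ):
>
> grep -nE 'xattr' /tmp/strace.out | grep ENOENT
> grep -n 'ENOENT' /tmp/strace.out | tail -20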
>
> Capturing logs with "debug_bluestore = 20" may tell us more as well.
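>
> If editing ceph.conf is inconvenient, something like this should bump
> the logging on a running OSD (a sketch; adjust the OSD id):
>
> ceph tell osd.12 injectargs '--debug_bluestore 20'
>
> or persistently, in the [osd] section of ceph.conf:
>
> [osd]
> debug bluestore = 20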
>
> We need to work out what resource it is trying to access when it
> receives the error '2' (No such file or directory).
>
>
> On Thu, Sep 7, 2017 at 12:13 AM, Thomas Coelho
> <coe...@th.physik.uni-frankfurt.de> wrote:
> > Hi,
> >
> > I have the same problem. A bug [1] has been reported for months, but
> > unfortunately it has not been fixed yet. I hope that if more people are
> > having this problem, the developers can reproduce and fix it.
> >
> > I was using Kernel-RBD with a Cache Tier.
> >
> > so long
> > Thomas Coelho
> >
> > [1] http://tracker.ceph.com/issues/20222
> >
> >
> > On 06.09.2017 at 15:41, Henrik Korkuc wrote:
> >> On 17-09-06 16:24, Jean-Francois Nadeau wrote:
> >>> Hi,
> >>>
> >>> On a 4-node / 48-OSD Luminous cluster I'm trying out RBD on EC
> >>> pools + BlueStore.
> >>>
> >>> Setup went fine, but after a few bench runs several OSDs are failing
> >>> and many won't even restart.
> >>>
> >>> ceph osd erasure-code-profile set myprofile \
> >>>    k=2 \
> >>>    m=1 \
> >>>    crush-failure-domain=host
> >>> ceph osd pool create mypool 1024 1024 erasure myprofile
> >>> ceph osd pool set mypool allow_ec_overwrites true
> >>> rbd pool init mypool
> >>> ceph -s
> >>> ceph health detail
> >>> ceph osd pool create metapool 1024 1024 replicated
> >>> rbd create --size 1024G --data-pool mypool --image metapool/test1
> >>> rbd bench -p metapool test1 --io-type write --io-size 8192
> >>> --io-pattern rand --io-total 10G
> >>> ...
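> >>>
> >>> To confirm the overwrite flag actually took and to watch OSDs drop
> >>> during the bench, something like this should work (a sketch, assuming
> >>> the pool name above):
> >>>
> >>> ceph osd pool get mypool allow_ec_overwrites
> >>> watch -n 1 ceph osd stat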
> >>>
> >>>
> >>> One of many OSD failing logs
> >>>
> >>> Sep 05 17:02:54 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]:
> >>> Started Ceph object storage daemon osd.12.
> >>> Sep 05 17:02:54 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
> >>> starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12
> >>> /var/lib/ceph/osd/ceph-12/journal
> >>> Sep 05 17:02:56 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
> >>> 2017-09-05 17:02:56.627301 7fe1a2e42d00 -1 osd.12 2219 log_to_monitors
> >>> {default=true}
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
> >>> 2017-09-05 17:02:58.686723 7fe1871ac700 -1
> >>> bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2)
> >>> No such file or directory not handled on operation 15 (op 0,
> >>> counting from 0)
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
> >>> 2017-09-05 17:02:58.686742 7fe1871ac700 -1
> >>> bluestore(/var/lib/ceph/osd/ceph-12) unexpected error
> >>>  code
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
> >>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc:
> >>> In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*,
> >>> ObjectStore::Transaction*)' thread 7fe1871ac700
> >>> time 2017-09-05 17:02:58.686821
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
> >>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc:
> >>> 9282: FAILED assert(0 == "unexpected error")
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
> >>> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c)
> >>> luminous (rc)
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 1:
> >>> (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>> const*)+0x110) [0x7fe1a38bf510]
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 2:
> >>> (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
> >>> ObjectStore::Transaction*)+0x1487) [0x7fe1a3796057]
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 3:
> >>> (BlueStore::queue_transactions(ObjectStore::Sequencer*,
> >>> std::vector<ObjectStore::Transaction,
> >>>  std::allocator<ObjectStore::Transaction> >&,
> >>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0)
> >>> [0x7fe1a37970a0]
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 4:
> >>> (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction,
> >>> std::allocator<ObjectStore::Transaction> >&,
> >>> boost::intrusive_ptr<OpRequest>)+0x65) [0x7fe1a3508745]
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 5:
> >>> (ECBackend::handle_sub_write(pg_shard_t,
> >>> boost::intrusive_ptr<OpRequest>, ECSubWrite&,
> >>> ZTracer::Trace const&, Context*)+0x631) [0x7fe1a3628711]
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 6:
> >>> (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327)
> >>> [0x7fe1a36392b7]
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 7:
> >>> (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>> [0x7fe1a353da10]
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 8:
> >>> (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>> ThreadPool::TPHandle&)+0x58e) [0x7fe1a34a9a7e]
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 9:
> >>> (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>> boost::intrusive_ptr<OpRequest>,
> >>> ThreadPool::TPHandle&)+0x3f9) [0x7fe1a333c729]
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
> >>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>> const&)+0x57) [0x7fe1a35ac197]
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
> >>> 11: (OSD::ShardedOpWQ::_process(unsigned int,
> >>> ceph::heartbeat_handle_d*)+0xfce) [0x7fe1a3367c8e]
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
> >>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839)
> >>> [0x7fe1a38c5029]
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
> >>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fe1a38c6fc0]
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
> >>> 14: (()+0x7dc5) [0x7fe1a0484dc5]
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
> >>> 15: (clone()+0x6d) [0x7fe19f57876d]
> >>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
> >>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>> needed to interpret this.
> >>> [root@r72-k7-06-01 ~]# journalctl -u  ceph-osd@12 --no-pager -n 100
> >>> -- Logs begin at Wed 2017-08-30 10:26:26 UTC, end at Tue 2017-09-05
> >>> 22:06:19 UTC. --
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 15: (clone()+0x6d) [0x7f0160c9d76d]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>> needed to interpret this.
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> -5496> 2017-09-05 17:08:03.460844 7f0164567d00 -1 osd.12 2362
> >>> log_to_monitors {default=true}
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> -74> 2017-09-05 17:08:05.837648 7f01488d1700 -1
> >>> bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2) No
> >>> such file or directory not handled on operation 15 (op 0, counting from 0)
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> -73> 2017-09-05 17:08:05.837670 7f01488d1700 -1
> >>> bluestore(/var/lib/ceph/osd/ceph-12) unexpected error code
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 0>
> >>> 2017-09-05 17:08:05.843218 7f01488d1700 -1
> >>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc:
> >>> In function 'void
> >>> BlueStore::_txc_add_transaction(BlueStore::TransContext*,
> >>> ObjectStore::Transaction*)' thread 7f01488d1700 time 2017-09-05
> >>> 17:08:05.837770
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc:
> >>> 9282: FAILED assert(0 == "unexpected error")
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c)
> >>> luminous (rc)
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 1:
> >>> (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>> const*)+0x110) [0x7f0164fe4510]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2:
> >>> (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
> >>> ObjectStore::Transaction*)+0x1487) [0x7f0164ebb057]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 3:
> >>> (BlueStore::queue_transactions(ObjectStore::Sequencer*,
> >>> std::vector<ObjectStore::Transaction,
> >>> std::allocator<ObjectStore::Transaction> >&,
> >>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0)
> >>> [0x7f0164ebc0a0]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 4:
> >>> (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction,
> >>> std::allocator<ObjectStore::Transaction> >&,
> >>> boost::intrusive_ptr<OpRequest>)+0x65) [0x7f0164c2d745]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 5:
> >>> (ECBackend::handle_sub_write(pg_shard_t,
> >>> boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&,
> >>> Context*)+0x631) [0x7f0164d4d711]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 6:
> >>> (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327)
> >>> [0x7f0164d5e2b7]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 7:
> >>> (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>> [0x7f0164c62a10]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 8:
> >>> (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>> ThreadPool::TPHandle&)+0x58e) [0x7f0164bcea7e]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 9:
> >>> (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9)
> >>> [0x7f0164a61729]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>> const&)+0x57) [0x7f0164cd1197]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 11: (OSD::ShardedOpWQ::_process(unsigned int,
> >>> ceph::heartbeat_handle_d*)+0xfce) [0x7f0164a8cc8e]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839)
> >>> [0x7f0164fea029]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f0164febfc0]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 14: (()+0x7dc5) [0x7f0161ba9dc5]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 15: (clone()+0x6d) [0x7f0160c9d76d]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>> needed to interpret this.
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> *** Caught signal (Aborted) **
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: in
> >>> thread 7f01488d1700 thread_name:tp_osd_tp
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c)
> >>> luminous (rc)
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 1:
> >>> (()+0xa23b21) [0x7f0164fa5b21]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2:
> >>> (()+0xf370) [0x7f0161bb1370]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 3:
> >>> (gsignal()+0x37) [0x7f0160bdb1d7]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 4:
> >>> (abort()+0x148) [0x7f0160bdc8c8]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 5:
> >>> (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>> const*)+0x284) [0x7f0164fe4684]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 6:
> >>> (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
> >>> ObjectStore::Transaction*)+0x1487) [0x7f0164ebb057]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 7:
> >>> (BlueStore::queue_transactions(ObjectStore::Sequencer*,
> >>> std::vector<ObjectStore::Transaction,
> >>> std::allocator<ObjectStore::Transaction> >&,
> >>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0)
> >>> [0x7f0164ebc0a0]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 8:
> >>> (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction,
> >>> std::allocator<ObjectStore::Transaction> >&,
> >>> boost::intrusive_ptr<OpRequest>)+0x65) [0x7f0164c2d745]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 9:
> >>> (ECBackend::handle_sub_write(pg_shard_t,
> >>> boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&,
> >>> Context*)+0x631) [0x7f0164d4d711]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 10:
> >>> (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327)
> >>> [0x7f0164d5e2b7]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 11: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>> [0x7f0164c62a10]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 12: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>> ThreadPool::TPHandle&)+0x58e) [0x7f0164bcea7e]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9)
> >>> [0x7f0164a61729]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 14: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>> const&)+0x57) [0x7f0164cd1197]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 15: (OSD::ShardedOpWQ::_process(unsigned int,
> >>> ceph::heartbeat_handle_d*)+0xfce) [0x7f0164a8cc8e]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839)
> >>> [0x7f0164fea029]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f0164febfc0]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 18: (()+0x7dc5) [0x7f0161ba9dc5]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 19: (clone()+0x6d) [0x7f0160c9d76d]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 2017-09-05 17:08:05.883240 7f01488d1700 -1 *** Caught signal (Aborted) **
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: in
> >>> thread 7f01488d1700 thread_name:tp_osd_tp
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c)
> >>> luminous (rc)
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 1:
> >>> (()+0xa23b21) [0x7f0164fa5b21]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2:
> >>> (()+0xf370) [0x7f0161bb1370]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 3:
> >>> (gsignal()+0x37) [0x7f0160bdb1d7]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 4:
> >>> (abort()+0x148) [0x7f0160bdc8c8]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 5:
> >>> (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>> const*)+0x284) [0x7f0164fe4684]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 6:
> >>> (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
> >>> ObjectStore::Transaction*)+0x1487) [0x7f0164ebb057]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 7:
> >>> (BlueStore::queue_transactions(ObjectStore::Sequencer*,
> >>> std::vector<ObjectStore::Transaction,
> >>> std::allocator<ObjectStore::Transaction> >&,
> >>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0)
> >>> [0x7f0164ebc0a0]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 8:
> >>> (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction,
> >>> std::allocator<ObjectStore::Transaction> >&,
> >>> boost::intrusive_ptr<OpRequest>)+0x65) [0x7f0164c2d745]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 9:
> >>> (ECBackend::handle_sub_write(pg_shard_t,
> >>> boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&,
> >>> Context*)+0x631) [0x7f0164d4d711]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 10:
> >>> (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327)
> >>> [0x7f0164d5e2b7]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 11: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>> [0x7f0164c62a10]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 12: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>> ThreadPool::TPHandle&)+0x58e) [0x7f0164bcea7e]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9)
> >>> [0x7f0164a61729]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 14: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>> const&)+0x57) [0x7f0164cd1197]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 15: (OSD::ShardedOpWQ::_process(unsigned int,
> >>> ceph::heartbeat_handle_d*)+0xfce) [0x7f0164a8cc8e]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839)
> >>> [0x7f0164fea029]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f0164febfc0]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 18: (()+0x7dc5) [0x7f0161ba9dc5]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 19: (clone()+0x6d) [0x7f0160c9d76d]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>> needed to interpret this.
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 0>
> >>> 2017-09-05 17:08:05.883240 7f01488d1700 -1 *** Caught signal (Aborted) **
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: in
> >>> thread 7f01488d1700 thread_name:tp_osd_tp
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c)
> >>> luminous (rc)
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 1:
> >>> (()+0xa23b21) [0x7f0164fa5b21]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2:
> >>> (()+0xf370) [0x7f0161bb1370]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 3:
> >>> (gsignal()+0x37) [0x7f0160bdb1d7]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 4:
> >>> (abort()+0x148) [0x7f0160bdc8c8]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 5:
> >>> (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>> const*)+0x284) [0x7f0164fe4684]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 6:
> >>> (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
> >>> ObjectStore::Transaction*)+0x1487) [0x7f0164ebb057]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 7:
> >>> (BlueStore::queue_transactions(ObjectStore::Sequencer*,
> >>> std::vector<ObjectStore::Transaction,
> >>> std::allocator<ObjectStore::Transaction> >&,
> >>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0)
> >>> [0x7f0164ebc0a0]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 8:
> >>> (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction,
> >>> std::allocator<ObjectStore::Transaction> >&,
> >>> boost::intrusive_ptr<OpRequest>)+0x65) [0x7f0164c2d745]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 9:
> >>> (ECBackend::handle_sub_write(pg_shard_t,
> >>> boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&,
> >>> Context*)+0x631) [0x7f0164d4d711]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 10:
> >>> (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327)
> >>> [0x7f0164d5e2b7]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 11: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
> >>> [0x7f0164c62a10]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 12: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> >>> ThreadPool::TPHandle&)+0x58e) [0x7f0164bcea7e]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> >>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9)
> >>> [0x7f0164a61729]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 14: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
> >>> const&)+0x57) [0x7f0164cd1197]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 15: (OSD::ShardedOpWQ::_process(unsigned int,
> >>> ceph::heartbeat_handle_d*)+0xfce) [0x7f0164a8cc8e]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839)
> >>> [0x7f0164fea029]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f0164febfc0]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 18: (()+0x7dc5) [0x7f0161ba9dc5]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> 19: (clone()+0x6d) [0x7f0160c9d76d]
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
> >>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >>> needed to interpret this.
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]:
> >>> ceph-osd@12.service: main process exited, code=killed, status=6/ABRT
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: Unit
> >>> ceph-osd@12.service entered failed state.
> >>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]:
> >>> ceph-osd@12.service failed.
> >>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]:
> >>> ceph-osd@12.service holdoff time over, scheduling restart.
> >>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: start
> >>> request repeated too quickly for ceph-osd@12.service
> >>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: Failed
> >>> to start Ceph object storage daemon osd.12.
> >>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: Unit
> >>> ceph-osd@12.service entered failed state.
> >>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]:
> >>> ceph-osd@12.service failed.
> >>>
> >>>
> >>>
> >>> The drives are OK, the filesystems are mounted, and there are no
> >>> suspicious kernel logs.
> >>>
> >>> What's next to troubleshoot?
> >>>
> >> While I am unable to assist you with this problem, I think it would be
> >> useful to create a tracker.ceph.com issue and attach the OSD logs, since
> >> they include a dump of recent events too.
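> >>
> >> Something along these lines should capture both the journal and the
> >> on-disk log for the tracker (a sketch, assuming the default log paths
> >> and osd.12 from above):
> >>
> >> journalctl -u ceph-osd@12 --no-pager > osd.12.journal.txt
> >> cp /var/log/ceph/ceph-osd.12.log osd.12.log.txt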
> >>
> >> BTW, I have a few open issues too, related to OSD stability/reliability
> >> on a similar setup - http://tracker.ceph.com/issues/21259 and
> >> http://tracker.ceph.com/issues/21180
> >> Can you grep your logs to see if you experienced something similar?
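> >>
> >> For example (a sketch, assuming the default log location):
> >>
> >> grep 'FAILED assert' /var/log/ceph/ceph-osd.*.log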
> >>
> >> And I got "FAILED assert(i->prior_version == last || i->is_error())",
> >> "FAILED assert(p.second->shared_blob_set.empty())", and an abort in
> >> thread_name:tp_peering today, which I haven't gotten around to reporting yet.
> >>
> >>
> >>>
> >>> thanks!
> >>>
> >>>
> >>>
> >
> > --
> > Dr. Thomas Coelho
> > HRZ Service Center Riedberg
> >
> > Goethe Universität Frankfurt | Campus Riedberg
> > Max-von-Laue Str. 1 | 60438 Frankfurt am Main
> > Room 1.128
> >
> > Phone:  +49-(0)69 798 40026
> > E-Mail: coe...@th.physik.uni-frankfurt.de
>
>
>
> --
> Cheers,
> Brad
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
