I have my servers on a UPS and shut them down manually, the way I usually
turn them off. There was still enough power in the UPS after the servers
were shut down, because it continued to beep. Anyway, I will wipe the OSD
and re-add it. Thanks for your reply.
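For anyone who finds this thread later: as I understand the Jewel "Adding/Removing OSDs" documentation, manual removal means marking the OSD out, removing it from CRUSH, deleting its auth key, and then removing the OSD id. Below is a minimal Python sketch that only builds and prints those `ceph` commands. It does not execute anything. The OSD id 3 is hypothetical, and the sketch assumes the crashing daemon (here, its Docker container) is already stopped. The documented procedure also waits for recovery to finish after `ceph osd out` before removing anything. Re-adding depends on how the Docker image prepares OSDs, so that part is not shown.

```python
import shlex
from typing import List


def osd_removal_commands(osd_id: int) -> List[List[str]]:
    """Build the manual OSD-removal steps as argv lists (nothing is executed).

    Assumes the OSD daemon (here, its Docker container) is already stopped.
    """
    if osd_id < 0:
        raise ValueError("OSD id must be non-negative")
    name = f"osd.{osd_id}"
    return [
        ["ceph", "osd", "out", str(osd_id)],       # stop mapping PGs to this OSD
        ["ceph", "osd", "crush", "remove", name],  # drop it from the CRUSH map
        ["ceph", "auth", "del", name],             # delete its cephx key
        ["ceph", "osd", "rm", str(osd_id)],        # remove it from the OSD map
    ]


def print_plan(osd_id: int) -> None:
    """Print the commands in order for manual review.

    Between 'ceph osd out' and the removal steps, wait until the cluster has
    recovered (all PGs active+clean); this sketch does not check that.
    """
    for argv in osd_removal_commands(osd_id):
        print(shlex.join(argv))


if __name__ == "__main__":
    print_plan(3)  # hypothetical OSD id; only prints the commands
```

The sketch prints the commands instead of running them because the wait between the first step and the rest is a judgment call that should stay with the operator.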

/Peter

On Mon, 19 June 2017 at 09:11, Wido den Hollander <w...@42on.com> wrote:

>
> > On 18 June 2017 at 16:21, Peter Rosell <peter.ros...@gmail.com> wrote:
> >
> >
> > Hi,
> > I have a small cluster with only three nodes, 4 OSDs + 3 OSDs. I have been
> > running version 0.87.2 (Giant) for over 2.5 years, but a couple of days
> > ago I upgraded to 0.94.10 (Hammer) and then to 10.2.7 (Jewel). Both
> > upgrades went great: I started with the monitors, then the OSDs and
> > finally the MDS. The log shows all 448 PGs active+clean. I'm running all
> > daemons inside Docker, with ceph version 10.2.7
> > (50e863e0f4bc8f4b9e31156de690d765af245185).
> >
> > Today I had a power outage and had to take the servers down. Now that I
> > have started the servers again, one OSD daemon doesn't start properly; it
> > keeps crashing.
> >
> > I noticed that the first two restarts of the OSD daemon crashed with this
> > error:
> > FAILED assert(rollback_info_trimmed_to_riter == log.rbegin())
> >
> > After that it always fails with "FAILED assert(i.first <= i.last)"
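Some context on the second assert. Based on my reading of the Jewel source, `pg_interval_t::check_new_interval()` closes the PG's current interval when a new OSD map starts a new one. It sets `first` to the PG's recorded `same_interval_since` and sets `last` to the epoch just before the new map. It then asserts `first <= last`. The Python below is a simplified, hypothetical model of that check. The class, function names, and epochs are mine, not Ceph's API. If the model is accurate, the crash means the PG's stored history claims its interval started at or after the map epoch being processed. That would fit on-disk metadata getting out of step during the power loss, but I have not verified it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PastInterval:
    """Simplified stand-in for Ceph's pg_interval_t: an inclusive epoch range."""
    first: int
    last: int


def close_interval(same_interval_since: int, new_map_epoch: int) -> PastInterval:
    """Close the current interval because the map at new_map_epoch starts a new one.

    The closed interval ends at the epoch *before* the new map, so it is only
    well-formed if first <= last; otherwise raise, like the failing assert.
    """
    interval = PastInterval(first=same_interval_since, last=new_map_epoch - 1)
    if interval.first > interval.last:
        raise AssertionError(
            f"FAILED assert(i.first <= i.last): first={interval.first}, "
            f"last={interval.last}"
        )
    return interval


if __name__ == "__main__":
    print(close_interval(100, 120))  # well-formed: epochs 100..119
    try:
        close_interval(120, 120)     # interval would "start" at the new map itself
    except AssertionError as err:
        print(err)
```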
> >
> > I have 15 logs like this one:
> > Jun 18 08:56:18 island sh[27991]: 2017-06-18 08:56:18.300641 7f5c5e0ff8c0 -1 log_channel(cluster) log [ERR] : 2.38 log bound mismatch, info (19544'666742,19691'671046] actual [19499'665843,19691'671046]
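Ceph prints log positions as epoch'version. My reading of the Jewel PGLog code is that `info (tail,head]` is the range the PG's metadata says the log covers, and `actual [first,last]` is the oldest and newest entry actually read from disk. Treat that interpretation as an assumption. The small Python parser below makes the comparison explicit. The class and function names are hypothetical. For PG 2.38, the heads agree (19691'671046), but the on-disk log starts at 19499'665843, which is older than the recorded tail 19544'666742.

```python
import re
from typing import NamedTuple

# Log positions are printed as epoch'version, e.g. 19691'671046.
EVERSION = r"(\d+)'(\d+)"
MISMATCH = re.compile(
    rf"(?P<pg>\S+) log bound mismatch, info \({EVERSION},{EVERSION}\] "
    rf"actual \[{EVERSION},{EVERSION}\]"
)


class Eversion(NamedTuple):
    """Tuple ordering compares epoch first, then version."""
    epoch: int
    version: int

    def __str__(self) -> str:
        return f"{self.epoch}'{self.version}"


class LogBoundMismatch(NamedTuple):
    pg: str
    info_tail: Eversion    # tail recorded in the PG info (open bound)
    info_head: Eversion    # head recorded in the PG info
    actual_tail: Eversion  # oldest entry found in the on-disk log (assumed)
    actual_head: Eversion  # newest entry found in the on-disk log (assumed)


def parse(line: str) -> LogBoundMismatch:
    """Extract the PG id and the four bounds from a 'log bound mismatch' line."""
    m = MISMATCH.search(line)
    if m is None:
        raise ValueError("not a 'log bound mismatch' line")
    pg, *nums = m.groups()  # named group 'pg' comes first, then 8 numbers
    bounds = [Eversion(int(e), int(v)) for e, v in zip(nums[0::2], nums[1::2])]
    return LogBoundMismatch(pg, *bounds)


if __name__ == "__main__":
    sample = ("log [ERR] : 2.38 log bound mismatch, info "
              "(19544'666742,19691'671046] actual [19499'665843,19691'671046]")
    r = parse(sample)
    head = "matches" if r.info_head == r.actual_head else "differs"
    print(f"pg {r.pg}: head {head}; on-disk log starts at {r.actual_tail}, "
          f"info says tail is {r.info_tail}")
```

Running the parser over all 15 such lines would show whether every affected PG has the same pattern, with a matching head and an older on-disk tail.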
> >
> > I removed the <pg_id>_head directories, but that only made these error
> > messages go away. It still crashes.
> >
> > Does anyone have any suggestions on how to make it start up correctly? Of
> > course I can remove the OSD from the cluster and re-add it, but this feels
> > like a bug.
>
> Are you sure? Since you had a power failure, it could be that certain parts
> weren't committed to disk/FS properly when the power failed. That really
> depends on the hardware and configuration.
>
> Please, do not try to repair this OSD. Wipe it and re-add it to the
> cluster.
>
> Wido
>
> > A small snippet from the logs is included below. I didn't include the
> > event list; if it helps, I can send it too.
> >
> > Jun 18 13:52:23 island sh[7068]: osd/osd_types.cc: In function 'static bool pg_interval_t::check_new_interval(int, int, const std::vector<int>&, const std::vector<int>&, int, int, const std::vector<int>&, const std::vector<int>&, epoch_t, epoch_t, OSDMapRef, OSDMapRef, pg_t, IsPGRecoverablePredicate*, std::map<unsigned int, pg_interval_t>*, std::ostream*)' thread 7f4fc2500700 time 2017-06-18 13:52:23.593991
> > Jun 18 13:52:23 island sh[7068]: osd/osd_types.cc: 3132: FAILED assert(i.first <= i.last)
> > Jun 18 13:52:23 island sh[7068]:  ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
> > Jun 18 13:52:23 island sh[7068]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x559fe4c14360]
> > Jun 18 13:52:23 island sh[7068]:  2: (pg_interval_t::check_new_interval(int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, unsigned int, unsigned int, std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, pg_t, IsPGRecoverablePredicate*, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > >*, std::ostream*)+0x72c) [0x559fe47f723c]
> > Jun 18 13:52:23 island sh[7068]:  3: (PG::start_peering_interval(std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> > const&, int, std::vector<int, std::allocator<int> > const&, int, ObjectStore::Transaction*)+0x3ff) [0x559fe461439f]
> > Jun 18 13:52:23 island sh[7068]:  4: (PG::RecoveryState::Reset::react(PG::AdvMap const&)+0x478) [0x559fe4615828]
> > Jun 18 13:52:23 island sh[7068]:  5: (boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x176) [0x559fe4645b86]
> > Jun 18 13:52:23 island sh[7068]:  6: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x69) [0x559fe4626d49]
> > Jun 18 13:52:23 island sh[7068]:  7: (PG::handle_advance_map(std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PG::RecoveryCtx*)+0x49e) [0x559fe45fa5ae]
> > Jun 18 13:52:23 island sh[7068]:  8: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*)+0x2f2) [0x559fe452c042]
> > Jun 18 13:52:23 island sh[7068]:  9: (OSD::process_peering_events(std::__cxx11::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x214) [0x559fe4546d34]
> > Jun 18 13:52:23 island sh[7068]:  10: (ThreadPool::BatchWorkQueue<PG>::_void_process(void*, ThreadPool::TPHandle&)+0x25) [0x559fe458f8e5]
> > Jun 18 13:52:23 island sh[7068]:  11: (ThreadPool::worker(ThreadPool::WorkThread*)+0xdb1) [0x559fe4c06531]
> > Jun 18 13:52:23 island sh[7068]:  12: (ThreadPool::WorkThread::entry()+0x10) [0x559fe4c07630]
> > Jun 18 13:52:23 island sh[7068]:  13: (()+0x76fa) [0x7f4fe256b6fa]
> > Jun 18 13:52:23 island sh[7068]:  14: (clone()+0x6d) [0x7f4fe05e3b5d]
> > Jun 18 13:52:23 island sh[7068]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> > Jun 18 13:52:23 island sh[7068]: --- begin dump of recent events ---
> > Jun 18 13:52:23 island sh[7068]:  -2051> 2017-06-18 13:50:36.086036 7f4fe36bb8c0  5 asok(0x559fef2d6000) register_command perfcounters_dump hook 0x559fef216030
> >
> >
> > /Peter
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>