----- Message from Craig Lewis <cle...@centraldesktop.com> ---------
Date: Fri, 18 Apr 2014 14:59:25 -0700
From: Craig Lewis <cle...@centraldesktop.com>
Subject: Re: [ceph-users] OSD distribution unequally
To: ceph-users@lists.ceph.com
When you increase the number of PGs, don't jump straight to the final value; step up to it. You'll want to end up around 2048, so go 400 -> 512, wait for that to finish, then -> 1024, wait, then -> 2048.
Thanks, I changed it to 512 and am also doing the reweight-by-utilization.
While doing this, some osds crashed a few times; I thought it might be
because they were almost full.
When this finished, some pgs were still in the active+remapped state.
I read in another mail on the list to try ceph osd crush tunables
optimal, so that is running now. But after a few hours, 17 out of 42
osds had crashed again. (I don't know whether the crashing is connected
with the reweight and the pgs stuck in active+remapped.)
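(For what it's worth, this is roughly how I have been checking the stuck pgs and bringing crashed osds back up; osd.12 is just an example id, and the sysvinit script is what our nodes use:)

    # show which pgs are stuck and why
    ceph health detail
    ceph pg dump_stuck unclean

    # restart a crashed osd (adjust for upstart/systemd as needed)
    service ceph start osd.12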
From the log file:
2014-04-24 03:46:57.442110 7f4968a11700 -1 osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0u>::my_context)' thread 7f4968a11700 time 2014-04-24 03:46:57.366010
osd/PG.cc: 5298: FAILED assert(0 == "we got a bad state machine event")
ceph version 0.79-209-g924064f (924064f83b7fb5d4f0961ee712d410ed1855cba0)
1: (PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context)+0x12f) [0x7a99ff]
2: (boost::statechart::detail::inner_constructor<boost::mpl::l_item<mpl_::long_<1l>, PG::RecoveryState::Crashed, boost::mpl::l_end>, boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator> >::construct(boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>* const&, boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>&)+0x26) [0x7ed146]
3: (boost::statechart::simple_state<PG::RecoveryState::Started, PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Start, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xfa) [0x7f6faa]
4: (boost::statechart::simple_state<PG::RecoveryState::ReplicaActive, PG::RecoveryState::Started, PG::RecoveryState::RepNotRecovering, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x161) [0x802461]
5: (boost::statechart::simple_state<PG::RecoveryState::RepRecovering, PG::RecoveryState::ReplicaActive, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x140) [0x7f5860]
6: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x4b) [0x7f7e6b]
7: (PG::handle_peering_event(std::tr1::shared_ptr<PG::CephPeeringEvt>, PG::RecoveryCtx*)+0x32f) [0x7b335f]
8: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x330) [0x65e190]
9: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x16) [0x69bed6]
10: (ThreadPool::worker(ThreadPool::WorkThread*)+0x551) [0x9d1e11]
11: (ThreadPool::WorkThread::entry()+0x10) [0x9d4e50]
12: (()+0x79d1) [0x7f49840d79d1]
13: (clone()+0x6d) [0x7f4982df8b6d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
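(For reference, that disassembly would be produced with something like the following, assuming the packaged binary lives at /usr/bin/ceph-osd:)

    # disassemble the osd binary so the addresses in the trace can be resolved
    objdump -rdS /usr/bin/ceph-osd > ceph-osd.objdump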
Many thanks!
Kenneth
Also remember that you don't need a lot of PGs if you don't have
much data in the pools. My .rgw.buckets pool has 2k PGs, but the
RGW metadata pools only have a couple of MB and 32 PGs each.
Craig Lewis
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com
On 4/18/14 05:04, Tyler Brekke wrote:
That is rather low; increasing the PG count should help with the
data distribution.
The documentation recommends starting with (100 * (number of OSDs))
/ (replicas), rounded up to the nearest power of two.
https://ceph.com/docs/master/rados/operations/placement-groups/
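For this cluster that works out to (100 * 42) / 3 = 1400, and the
nearest power of two above that is 2048, which matches the ~2048
target mentioned above.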
On Fri, Apr 18, 2014 at 4:54 AM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:
----- Message from Tyler Brekke <tyler.bre...@inktank.com> ---------
Date: Fri, 18 Apr 2014 04:37:26 -0700
From: Tyler Brekke <tyler.bre...@inktank.com>
Subject: Re: [ceph-users] OSD distribution unequally
To: Dan Van Der Ster <daniel.vanders...@cern.ch>
Cc: Kenneth Waegeman <kenneth.waege...@ugent.be>, ceph-users
<ceph-users@lists.ceph.com>
How many placement groups do you have in your pool containing the
data, and what is the replication level of that pool?
400 pgs per pool, replication factor is 3
Looks like you have too few placement groups, which is causing the
data distribution to be off.
-Tyler
On Fri, Apr 18, 2014 at 4:12 AM, Dan Van Der Ster
<daniel.vanders...@cern.ch> wrote:
ceph osd reweight-by-utilization
Is that still in 0.79?
I'd start with reweight-by-utilization 200 and then adjust that
number down until you get to 120 or so.
Cheers, Dan
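Concretely, something like this, letting the cluster settle between
runs (the 150 middle step is just an example):

    # first pass: only touch OSDs above 200% of the average utilization
    ceph osd reweight-by-utilization 200
    # then tighten the threshold stepwise once rebalancing finishes
    ceph osd reweight-by-utilization 150
    ceph osd reweight-by-utilization 120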
On Apr 18, 2014 12:49 PM, Kenneth Waegeman
<kenneth.waege...@ugent.be> wrote:
Hi,
Some osds of our cluster filled up:
     health HEALTH_ERR 1 full osd(s); 4 near full osd(s)
     monmap e1: 3 mons at {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, election epoch 96, quorum 0,1,2 ceph001,ceph002,ceph003
     mdsmap e93: 1/1/1 up {0=ceph001.cubone.os=up:active}, 1 up:standby
     osdmap e1974: 42 osds: 42 up, 42 in
            flags full
     pgmap v286626: 1200 pgs, 3 pools, 31096 GB data, 26259 kobjects
           94270 GB used, 40874 GB / 131 TB avail
                  1 active+clean+scrubbing+deep
               1199 active+clean
I know it is never really uniform, but the differences between the
OSDs seem very big: one OSD is at 96% usage while another is only at
48%, which is about 1.8 TB of difference:
/dev/sdc 3.7T 1.9T 1.8T 51% /var/lib/ceph/osd/sdc
/dev/sdd 3.7T 2.5T 1.2T 68% /var/lib/ceph/osd/sdd
/dev/sde 3.7T 2.3T 1.5T 61% /var/lib/ceph/osd/sde
/dev/sdf 3.7T 2.7T 975G 74% /var/lib/ceph/osd/sdf
/dev/sdg 3.7T 3.2T 491G 87% /var/lib/ceph/osd/sdg
/dev/sdh 3.7T 2.0T 1.8T 53% /var/lib/ceph/osd/sdh
/dev/sdi 3.7T 2.3T 1.4T 63% /var/lib/ceph/osd/sdi
/dev/sdj 3.7T 3.4T 303G 92% /var/lib/ceph/osd/sdj
/dev/sdk 3.7T 2.8T 915G 76% /var/lib/ceph/osd/sdk
/dev/sdl 3.7T 1.8T 2.0T 48% /var/lib/ceph/osd/sdl
/dev/sdm 3.7T 2.8T 917G 76% /var/lib/ceph/osd/sdm
/dev/sdn 3.7T 3.5T 186G 96% /var/lib/ceph/osd/sdn
We are running 0.79 (well, precisely a patched version of it, with an
MDS fix from another thread :-) )
I remember hearing something about the hashpspool flag having an
effect on this, but I read that it is already enabled by default on
the latest versions. osd_pool_default_flag_hashpspool is indeed set
to true, but I don't know how to check this for a specific pool.
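(The closest I have found is grepping the pool lines out of the osd
dump, where the flags appear to be listed per pool:)

    # each pool line should show its flags, e.g. "flags hashpspool"
    ceph osd dump | grep '^pool'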
Is this behaviour normal, or is something wrong?
Thanks!
Kind regards,
Kenneth Waegeman
----- End message from Tyler Brekke <tyler.bre...@inktank.com> -----
--
With kind regards,
Kenneth Waegeman
----- End message from Craig Lewis <cle...@centraldesktop.com> -----
--
With kind regards,
Kenneth Waegeman
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com