Thanks for the response, problem solved. I added "osd crush update on start = false" to my ceph.conf under the [osd] section. I decided to go this way as this environment is just not big enough to need custom hooks. After restarting the OSDs and then injecting my crushmap, the recovery started and everything is running fine now. I didn't even know that config switch existed, or that anything was automated regarding the crushmap. Next time I should read the whole document, not just the bottom part ^^.
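In case it helps anyone searching the archives later, this is roughly what the fix looked like on my side (file paths and the crushmap file names are from my setup and may differ on yours):

```
# /etc/ceph/ceph.conf on each OSD node
[osd]
osd crush update on start = false
```

Then restart the OSDs and re-inject the custom map, i.e. compile the edited text map with "crushtool -c crushmap.txt -o crushmap.bin" and load it with "ceph osd setcrushmap -i crushmap.bin".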
As additional info which I didn't mention before: I am running ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90).

Thank you Jean-Charles

Best Regards
Jonas

> Hi Jonas,
>
> In your current CRUSH map your root ssd contains 2 nodes, but those two nodes contain no OSDs, and this is causing the problem.
>
> It looks like you forgot to set the parameter osd_crush_update_on_start = false before applying your special CRUSH map. Hence, when you restarted the OSDs, they went back to the default behaviour of attaching themselves to the host they run on.
>
> To get back to healthy for now, set the parameter above in your ceph.conf on your OSD nodes, restart your OSDs, then re-apply your customized CRUSH map.
>
> As an alternative you can also use the CRUSH location hook to automate the placement of your OSDs (http://docs.ceph.com/docs/master/rados/operations/crush-map/#custom-location-hooks).
>
> Regards
>
> JC
>
> On 24 Jan 2017, at 07:42, Jonas Stunkat jonas.stun...@miarka24.de wrote:
>
> All OSDs and Monitors are up from what I can see.
> I read through the troubleshooting for PGs as described in the ceph documentation and came to the conclusion that nothing there would help me, so I didn't try anything - except restarting / rebooting OSDs and Monitors.
>
> How do I recover from this? It looks to me that the data itself should be safe for now, but why is it not recovering?
> I guess the problem may be the crushmap.
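(Side note for the archives: the custom location hook JC links to is just an executable whose stdout tells ceph where the OSD belongs in the CRUSH map; as far as I understand it, ceph-osd invokes it with --cluster, --id and --type and uses the printed location on startup. A minimal sketch of what one could look like for a cluster shaped like mine - the script path and the id-to-bucket mapping below are my own assumptions, only the bucket names come from my map:)

```shell
#!/bin/sh
# Hypothetical CRUSH location hook, e.g. /usr/local/bin/crush-location.sh,
# referenced from ceph.conf. ceph-osd calls it roughly as:
#   crush-location.sh --cluster ceph --id <osd-id> --type osd
# and uses its stdout as the OSD's CRUSH location.

# Map an OSD id to a location string. In this cluster OSDs 0-3 are the
# SSDs and 4-7 the platters, matching the buckets in my intended map.
crush_location() {
    osd_id="$1"
    host="$2"
    if [ "$osd_id" -le 3 ]; then
        echo "host=${host}-ssd root=ssd"
    else
        echo "host=${host}-platter root=platter"
    fi
}

# Parse the --id argument passed by ceph-osd; default to 0 if absent.
id=0
while [ $# -gt 0 ]; do
    case "$1" in
        --id) shift; id="$1" ;;
    esac
    shift
done

crush_location "$id" "$(hostname -s)"
```

With a hook like this, the OSDs would re-attach themselves to the right ssd/platter buckets on every start instead of to the plain host buckets.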
> Here are some outputs:
>
> # ceph health detail
> HEALTH_WARN 475 pgs degraded; 640 pgs stale; 475 pgs stuck degraded; 640 pgs stuck stale; 640 pgs stuck unclean; 475 pgs stuck undersized; 475 pgs undersized; recovery 104812/279550 objects degraded (37.493%); recovery 69926/279550 objects misplaced (25.014%)
> pg 3.ec is stuck unclean for 3326815.935321, current state stale+active+remapped, last acting [7,6]
> pg 3.ed is stuck unclean for 3288818.682456, current state stale+active+remapped, last acting [6,7]
> pg 3.ee is stuck unclean for 409973.052061, current state stale+active+undersized+degraded, last acting [7]
> pg 3.ef is stuck unclean for 3357894.554762, current state stale+active+undersized+degraded, last acting [7]
> pg 3.e8 is stuck unclean for 384815.518837, current state stale+active+undersized+degraded, last acting [6]
> pg 3.e9 is stuck unclean for 3274554.591000, current state stale+active+remapped, last acting [6,7]
> ......
>
> ################################################################################
>
> This is the crushmap I created, intended to use, and thought I had been using for the past 2 months:
> - pvestorage1-ssd and pvestorage1-platter are the same host; it seems this is not possible, but I never noticed
> - likewise with pvestorage2
>
> # begin crush map
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
> tunable straw_calc_version 1
>
> # devices
> device 0 osd.0
> device 1 osd.1
> device 2 osd.2
> device 3 osd.3
> device 4 osd.4
> device 5 osd.5
> device 6 osd.6
> device 7 osd.7
>
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 region
> type 10 root
>
> # buckets
> host pvestorage1-ssd {
>         id -2           # do not change unnecessarily
>         # weight 1.740
>         alg straw
>         hash 0  # rjenkins1
>         item osd.0 weight 0.870
>         item osd.1 weight 0.870
> }
> host pvestorage2-ssd {
>         id -3           # do not change unnecessarily
>         # weight 1.740
>         alg straw
>         hash 0  # rjenkins1
>         item osd.2 weight 0.870
>         item osd.3 weight 0.870
> }
> host pvestorage1-platter {
>         id -4           # do not change unnecessarily
>         # weight 4
>         alg straw
>         hash 0  # rjenkins1
>         item osd.4 weight 2.000
>         item osd.5 weight 2.000
> }
> host pvestorage2-platter {
>         id -5           # do not change unnecessarily
>         # weight 4
>         alg straw
>         hash 0  # rjenkins1
>         item osd.6 weight 2.000
>         item osd.7 weight 2.000
> }
>
> root ssd {
>         id -1           # do not change unnecessarily
>         # weight 3.480
>         alg straw
>         hash 0  # rjenkins1
>         item pvestorage1-ssd weight 1.740
>         item pvestorage2-ssd weight 1.740
> }
>
> root platter {
>         id -6           # do not change unnecessarily
>         # weight 8
>         alg straw
>         hash 0  # rjenkins1
>         item pvestorage1-platter weight 4.000
>         item pvestorage2-platter weight 4.000
> }
>
> # rules
> rule ssd {
>         ruleset 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take ssd
>         step chooseleaf firstn 0 type host
>         step emit
> }
>
> rule platter {
>         ruleset 1
>         type replicated
>         min_size 1
>         max_size 10
>         step take platter
>         step chooseleaf firstn 0 type host
>         step emit
> }
> # end crush map
>
> ################################################################################
>
> This is what ceph made of that crushmap, and the one that is actually in use right now; I never looked -_- :
>
> # begin crush map
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
> tunable straw_calc_version 1
>
> # devices
> device 0 osd.0
> device 1 osd.1
> device 2 osd.2
> device 3 osd.3
> device 4 osd.4
> device 5 osd.5
> device 6 osd.6
> device 7 osd.7
>
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 region
> type 10 root
>
> # buckets
> host pvestorage1-ssd {
>         id -2           # do not change unnecessarily
>         # weight 0.000
>         alg straw
>         hash 0  # rjenkins1
> }
> host pvestorage2-ssd {
>         id -3           # do not change unnecessarily
>         # weight 0.000
>         alg straw
>         hash 0  # rjenkins1
> }
> root ssd {
>         id -1           # do not change unnecessarily
>         # weight 0.000
>         alg straw
>         hash 0  # rjenkins1
>         item pvestorage1-ssd weight 0.000
>         item pvestorage2-ssd weight 0.000
> }
> host pvestorage1-platter {
>         id -4           # do not change unnecessarily
>         # weight 0.000
>         alg straw
>         hash 0  # rjenkins1
> }
> host pvestorage2-platter {
>         id -5           # do not change unnecessarily
>         # weight 0.000
>         alg straw
>         hash 0  # rjenkins1
> }
> root platter {
>         id -6           # do not change unnecessarily
>         # weight 0.000
>         alg straw
>         hash 0  # rjenkins1
>         item pvestorage1-platter weight 0.000
>         item pvestorage2-platter weight 0.000
> }
> host pvestorage1 {
>         id -7           # do not change unnecessarily
>         # weight 5.740
>         alg straw
>         hash 0  # rjenkins1
>         item osd.5 weight 2.000
>         item osd.4 weight 2.000
>         item osd.1 weight 0.870
>         item osd.0 weight 0.870
> }
> host pvestorage2 {
>         id -9           # do not change unnecessarily
>         # weight 5.740
>         alg straw
>         hash 0  # rjenkins1
>         item osd.3 weight 0.870
>         item osd.2 weight 0.870
>         item osd.6 weight 2.000
>         item osd.7 weight 2.000
> }
> root default {
>         id -8           # do not change unnecessarily
>         # weight 11.480
>         alg straw
>         hash 0  # rjenkins1
>         item pvestorage1 weight 5.740
>         item pvestorage2 weight 5.740
> }
>
> # rules
> rule ssd {
>         ruleset 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take ssd
>         step chooseleaf firstn 0 type host
>         step emit
> }
> rule platter {
>         ruleset 1
>         type replicated
>         min_size 1
>         max_size 10
>         step take platter
>         step chooseleaf firstn 0 type host
>         step emit
> }
>
> # end crush map
>
> ################################################################################
>
> How do I recover from this?
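(Note for the archives: a map like this can be sanity-checked offline before injecting it, which would have shown the empty ssd root immediately. Assuming the edited text map is in crushmap.txt, something along these lines:)

```
crushtool -c crushmap.txt -o crushmap.bin
crushtool -i crushmap.bin --test --rule 0 --num-rep 2 --show-mappings
```

--show-mappings prints which OSDs the rule selects for sample inputs; with the broken map above, the ssd rule's mappings should come back empty because its host buckets contain no OSDs.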
> Best Regards
> Jonas
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com