Both of those say they want to talk to osd.115.

I see from the recovery_state and past_intervals sections that you have
flapping OSDs.  osd.140 will drop out, then come back.  osd.115 will drop
out, then come back.  osd.80 will drop out, then come back.
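
If you want to see that in your own output, something like this pulls out the
relevant bits (just a sketch; the PG ID is only an example, and the JSON
layout varies a bit between versions, so adjust as needed):

    # dump the full peering history for one of the stuck PGs
    ceph pg 3.7d0 query > /tmp/pg-3.7d0.json

    # recovery_state shows what peering is blocked on; past_intervals inside
    # it lists every up/acting set the PG has been through, so a very long
    # list means OSDs have been bouncing in and out of the cluster
    jq '.recovery_state' /tmp/pg-3.7d0.json
    grep -c maybe_went_rw /tmp/pg-3.7d0.json    # rough count of past intervals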

So really, you need to solve the OSD flapping.  That will likely solve your
incompleteness.

Any idea why the OSDs are flapping?  Any errors in ceph-osd.140.log ?
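
A few quick things worth checking for the flapping (a sketch -- log file
names and message wording vary a bit by version, so treat the grep patterns
as starting points):

    # on a monitor: how often have these OSDs been failed or marked down?
    grep -E 'osd\.(140|115|80)\b' /var/log/ceph/ceph.log | grep -iE 'failed|marked.*down' | tail -20

    # on the OSD host: did osd.140 think it was wrongly marked down?
    grep -i 'wrongly marked me down' /var/log/ceph/ceph-osd.140.log | tail -20

    # heartbeat and suicide timeouts are the usual suspects
    grep -iE 'heartbeat_check|suicide' /var/log/ceph/ceph-osd.140.log | tail -20

    # and check the kernel log for memory pressure or filesystem complaints
    dmesg | grep -iE 'xfs|out of memory|oom' | tail -20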



The very long past_intervals list suggests you might be hitting something I
saw before.  I was having problems with the suicide timeout: the OSDs failed
and restarted so many times that they couldn't apply all of the map changes
before they hit the timeout.  Sage gave me some suggestions.  Give this a
try: https://www.mail-archive.com/ceph-devel@vger.kernel.org/msg18862.html
That process solved my suicide timeouts, with one caveat: when I followed
it, I filled up /var/log/ceph/ and the recovery failed.  I had to manually
run each OSD in debugging mode until it completed the map update.  Aside
from that, I followed the procedure as written.
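
For the "run each OSD manually" part, something along these lines works
(a sketch only -- the service commands depend on your init system, and debug
logging is very chatty, so make sure /var/log/ceph has room first):

    # stop the OSD, then run it in the foreground with debug logging so you
    # can watch it chew through the osdmap epochs
    service ceph stop osd.140
    ceph-osd -i 140 -f --debug-osd 10 --debug-ms 1

    # compare the epoch it is processing against the cluster's current epoch
    ceph osd stat

    # once it has caught up, Ctrl-C it and start it normally again
    service ceph start osd.140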


That's a symptom though, not the cause.  Once I got the OSDs to stop
flapping, the flapping would come back every couple of weeks.  I eventually
determined that the real cause was an XFS memory allocation issue, because
I had formatted the OSDs with

[osd]
  osd mkfs type = xfs
  osd mkfs options xfs = -l size=1024m -n size=64k -i size=2048 -s size=4096

Changing it to

[osd]
  osd mkfs type = xfs
  osd mkfs options xfs = -s size=4096

and reformatting all disks avoided the XFS deadlock.  Before that change,
whenever free memory got low, OSDs would get marked out; after a few hours
it got to the point that the OSDs would suicide.
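
If you do end up reformatting, it can be done one OSD at a time once the
cluster is otherwise healthy.  Roughly, per OSD (a sketch only -- the
recreate step depends on how your OSDs were deployed and on whether your
tooling picks up the osd mkfs options from ceph.conf; otherwise run
mkfs.xfs -s size=4096 yourself):

    # drain and remove one OSD (osd.80 is just an example)
    ceph osd out 80
    # wait for backfill to finish (watch ceph -s), then:
    service ceph stop osd.80
    ceph osd crush remove osd.80
    ceph auth del osd.80
    ceph osd rm 80

    # recreate it; with the simpler settings in ceph.conf the filesystem is
    # built with just -s size=4096
    ceph-disk prepare /dev/sdX
    ceph-disk activate /dev/sdX1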



On Wed, Apr 1, 2015 at 12:17 PM, Karan Singh <karan.si...@csc.fi> wrote:

> Any pointers to fix the incomplete PGs would be appreciated.
>
>
> I tried the following with no success.
>
> pg scrub
> pg deep scrub
> pg repair
> osd out , down , rm , in
> osd lost
>
>
>
> *# ceph -s*
>     cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
>      health HEALTH_WARN 7 pgs down; 20 pgs incomplete; 1 pgs recovering;
> 20 pgs stuck inactive; 21 pgs stuck unclean; 4 requests are blocked > 32
> sec; recovery 201/986658 objects degraded (0.020%); 133/328886 unfound
> (0.040%)
>      monmap e3: 3 mons at {pouta-s01=xx.xx.xx.1:6789/0,pouta-s02=xx.xx.xx
> .2:6789/0,pouta-s03=xx.xx.xx.3:6789/0}, election epoch 1920, quorum 0,1,2
> pouta-s01,pouta-s02,pouta-s03
>      osdmap e262813: 239 osds: 239 up, 239 in
>       pgmap v588073: 18432 pgs, 13 pools, 2338 GB data, 321 kobjects
>             19094 GB used, 849 TB / 868 TB avail
>             201/986658 objects degraded (0.020%); 133/328886 unfound
> (0.040%)
>                   * 7 down+incomplete*
>                18411 active+clean
>                   *13 incomplete*
>                    1 active+recovering
>
>
>
> *# ceph pg dump_stuck inactive*
> ok
> pg_stat objects mip degr unf bytes log disklog state state_stamp v
> reported up up_primary acting acting_primary last_scrub scrub_stamp
> last_deep_scrub deep_scrub_stamp
> 10.70 0 0 0 0 0 0 0 incomplete 2015-04-01 21:21:16.152179 0'0 262813:163
> [213,88,80] 213 [213,88,80] 213 0'0 2015-03-12 17:59:43.275049 0'0 2015-03-09
> 17:55:58.745662
> 3.dde 68 66 0 66 552861709 297 297 down+incomplete 2015-04-01
> 21:21:16.161066 33547'297 262813:230683 [174,5,179] 174 [174,5,179] 174
> 33547'297 2015-03-12 14:19:15.261595 28522'43 2015-03-11 14:19:13.894538
> 5.a2 0 0 0 0 0 0 0 incomplete 2015-04-01 21:21:16.145329 0'0 262813:150
> [168,182,201] 168 [168,182,201] 168 0'0 2015-03-12 17:58:29.257085 0'0 
> 2015-03-09
> 17:55:07.684377
> 13.1b6 0 0 0 0 0 0 0 incomplete 2015-04-01 21:21:16.139062 0'0 262813:2974
> [0,176,131] 0 [0,176,131] 0 0'0 2015-03-12 18:00:13.286920 0'0 2015-03-09
> 17:56:18.715208
> 7.25b 0 0 0 0 0 0 0 incomplete 2015-04-01 21:21:16.113876 0'0 262813:167
> [111,26,108] 111 [111,26,108] 111 27666'16 2015-03-12 17:59:06.357864
> 2330'3 2015-03-09 17:55:30.754522
> 5.19 0 0 0 0 0 0 0 down+incomplete 2015-04-01 21:21:16.199712 0'0
> 262813:27605 [212,43,131] 212 [212,43,131] 212 0'0 2015-03-12
> 13:51:37.777026 0'0 2015-03-11 13:51:35.406246
> 3.a2f 68 0 0 0 543686693 302 302 incomplete 2015-04-01 21:21:16.141368
> 33531'302 262813:3731 [149,224,33] 149 [149,224,33] 149 33531'302 2015-03-12
> 14:17:43.045627 28564'54 2015-03-11 14:17:40.314189
> 7.298 0 0 0 0 0 0 0 incomplete 2015-04-01 21:21:16.108523 0'0 262813:166
> [221,154,225] 221 [221,154,225] 221 27666'13 2015-03-12 17:59:10.308423
> 2330'4 2015-03-09 17:55:35.750109
> 1.1e7 0 0 0 0 0 0 0 incomplete 2015-04-01 21:21:16.192711 0'0 262813:162
> [215,232] 215 [215,232] 215 0'0 2015-03-12 17:55:45.203232 0'0 2015-03-09
> 17:53:49.694822
> 3.774 79 0 0 0 645136397 339 339 down+incomplete 2015-04-01
> 21:21:16.207131 33570'339 262813:168986 [162,39,161] 162 [162,39,161] 162
> 33570'339 2015-03-12 14:49:03.869447 2226'2 2015-03-09 13:46:49.783950
> 3.7d0 78 0 0 0 609222686 376 376 down+incomplete 2015-04-01
> 21:21:16.135599 33538'376 262813:185045 [117,118,177] 117 [117,118,177]
> 117 33538'376 2015-03-12 13:51:03.984454 28394'62 2015-03-11
> 13:50:58.196288
> 3.d60 0 0 0 0 0 0 0 incomplete 2015-04-01 21:21:16.158179 0'0 262813:169
> [60,56,220] 60 [60,56,220] 60 33552'321 2015-03-12 13:44:43.502907
> 28356'39 2015-03-11 13:44:41.663482
> 4.1fc 0 0 0 0 0 0 0 incomplete 2015-04-01 21:21:16.217291 0'0 262813:163
> [144,58,153] 144 [144,58,153] 144 0'0 2015-03-12 17:58:19.254170 0'0 
> 2015-03-09
> 17:54:55.720479
> 3.e02 72 0 0 0 585105425 304 304 down+incomplete 2015-04-01
> 21:21:16.099150 33568'304 262813:169744 [15,102,147] 15 [15,102,147] 15
> 33568'304 2015-03-16 10:04:19.894789 2246'4 2015-03-09 11:43:44.176331
> 8.1d4 0 0 0 0 0 0 0 down+incomplete 2015-04-01 21:21:16.218644 0'0
> 262813:21867 [126,43,174] 126 [126,43,174] 126 0'0 2015-03-12
> 14:34:35.258338 0'0 2015-03-12 14:34:35.258338
> 4.2f4 0 0 0 0 0 0 0 down+incomplete 2015-04-01 21:21:16.117515 0'0
> 262813:116150 [181,186,13] 181 [181,186,13] 181 0'0 2015-03-12
> 14:59:03.529264 0'0 2015-03-09 13:46:40.601301
> 3.e5a 76 70 0 0 623902741 325 325 incomplete 2015-04-01 21:21:16.043300
> 33569'325 262813:73426 [97,22,62] 97 [97,22,62] 97 33569'325 2015-03-12
> 13:58:05.813966 28433'44 2015-03-11 13:57:53.909795
> 8.3a0 0 0 0 0 0 0 0 incomplete 2015-04-01 21:21:16.056437 0'0
> 262813:175168 [62,14,224] 62 [62,14,224] 62 0'0 2015-03-12 13:52:44.546418
> 0'0 2015-03-12 13:52:44.546418
> 3.24e 0 0 0 0 0 0 0 incomplete 2015-04-01 21:21:16.130831 0'0 262813:165
> [39,202,90] 39 [39,202,90] 39 33556'272 2015-03-13 11:44:41.263725 2327'4 
> 2015-03-09
> 17:54:43.675552
> 5.f7 0 0 0 0 0 0 0 incomplete 2015-04-01 21:21:16.145298 0'0 262813:153
> [54,193,123] 54 [54,193,123] 54 0'0 2015-03-12 17:58:30.257371 0'0 2015-03-09
> 17:55:11.725629
> [root@pouta-s01 ceph]#
>
>
> ##########  Example 1 : PG 10.70 ###########
>
>
> *10.70 0 0 0 0 0 0 0 incomplete 2015-04-01 21:21:16.152179 0'0 262813:163
> [213,88,80] 213 [213,88,80] 213 0'0 2015-03-12 17:59:43.275049 0'0
> 2015-03-09 17:55:58.745662*
>
>
> This is how I found the location of each OSD:
>
> [root@pouta-s01 ceph]# *ceph osd find 88*
>
> { "osd": 88,
>   "ip": "10.100.50.3:7079\/916853",
>   "crush_location": { "host": "pouta-s03",
>       "root": "default"}}
> [root@pouta-s01 ceph]#
>
>
> When I manually check the current/pg_head directory, the data is not present
> here (i.e. the data is lost from all of the copies).
>
>
> [root@pouta-s04 current]# ls -l
> /var/lib/ceph/osd/ceph-80/current/10.70_head
> *total 0*
> [root@pouta-s04 current]#
>
>
> On some of the OSDs, the head directory does not even exist:
>
> [root@pouta-s03 ~]# ls -l /var/lib/ceph/osd/ceph-88/current/10.70_head
> *ls: cannot access /var/lib/ceph/osd/ceph-88/current/10.70_head: No such
> file or directory*
> [root@pouta-s03 ~]#
>
> [root@pouta-s02 ~]# ls -l /var/lib/ceph/osd/ceph-213/current/10.70_head
> *total 0*
> [root@pouta-s02 ~]#
>
>
> # ceph pg 10.70 query  --->  *http://paste.ubuntu.com/10719840/*
>
>
> ##########  Example 2 : PG 3.7d0 ###########
>
> *3.7d0 78 0 0 0 609222686 376 376 down+incomplete 2015-04-01
> 21:21:16.135599 33538'376 262813:185045 [117,118,177] 117 [117,118,177] 117
> 33538'376 2015-03-12 13:51:03.984454 28394'62 2015-03-11 13:50:58.196288*
>
>
> [root@pouta-s04 current]# ceph pg map 3.7d0
> osdmap e262813 pg 3.7d0 (3.7d0) -> up [117,118,177] acting [117,118,177]
> [root@pouta-s04 current]#
>
>
> *Data is present here, so 1 copy is present out of 3*
>
> *[root@pouta-s04 current]# ls -l
> /var/lib/ceph/osd/ceph-117/current/3.7d0_head/ | wc -l*
> *63*
> *[root@pouta-s04 current]#*
>
>
>
> [root@pouta-s03 ~]#  ls -l /var/lib/ceph/osd/ceph-118/current/3.7d0_head/
> *total 0*
> [root@pouta-s03 ~]#
>
>
> [root@pouta-s01 ceph]# ceph osd find 177
> { "osd": 177,
>   "ip": "10.100.50.2:7062\/777799",
>   "crush_location": { "host": "pouta-s02",
>       "root": "default"}}
> [root@pouta-s01 ceph]#
>
> *Even the directory is not present here*
>
> [root@pouta-s02 ~]#  ls -l /var/lib/ceph/osd/ceph-177/current/3.7d0_head/
> *ls: cannot access /var/lib/ceph/osd/ceph-177/current/3.7d0_head/: No such
> file or directory*
> [root@pouta-s02 ~]#
>
>
> *# ceph pg 3.7d0 query  --->  http://paste.ubuntu.com/10720107/*
>
>
> - Karan -
>
> On 20 Mar 2015, at 22:43, Craig Lewis <cle...@centraldesktop.com> wrote:
>
> > osdmap e261536: 239 osds: 239 up, 238 in
>
> Why is that last OSD not IN?  The history you need is probably there.
>
> Run  ceph pg <pgid> query on some of the stuck PGs.  Look for
> the recovery_state section.  That should tell you what Ceph needs to
> complete the recovery.
>
>
> If you need more help, post the output of a couple pg queries.
>
>
>
> On Fri, Mar 20, 2015 at 4:22 AM, Karan Singh <karan.si...@csc.fi> wrote:
>
>> Hello Guys
>>
>> My Ceph cluster lost data and it is not recovering.  This problem
>> occurred when Ceph performed recovery while one of the nodes was down.
>> Now all the nodes are up, but Ceph is showing PGs as incomplete, unclean,
>> and recovering.
>>
>>
>> I have tried several things to recover them: *scrub, deep-scrub,
>> pg repair, changing primary affinity and then scrubbing,
>> osd_pool_default_size, etc. BUT NO LUCK*
>>
>> Could you please advise how to recover the PGs and achieve HEALTH_OK?
>>
>> # ceph -s
>>     cluster 2bd3283d-67ef-4316-8b7e-d8f4747eae33
>>      health *HEALTH_WARN 19 pgs incomplete; 3 pgs recovering; 20 pgs
>> stuck inactive; 23 pgs stuck unclean*; 2 requests are blocked > 32 sec;
>> recovery 531/980676 objects degraded (0.054%); 243/326892 unfound (0.074%)
>>      monmap e3: 3 mons at
>> {xxx=xxxx:6789/0,xxx=xxxx:6789:6789/0,xxx=xxxx:6789:6789/0}, election epoch
>> 1474, quorum 0,1,2 xx,xx,xx
>>      osdmap e261536: 239 osds: 239 up, 238 in
>>       pgmap v415790: 18432 pgs, 13 pools, 2330 GB data, 319 kobjects
>>             20316 GB used, 844 TB / 864 TB avail
>>             531/980676 objects degraded (0.054%); 243/326892 unfound
>> (0.074%)
>>                    1 creating
>>                18409 active+clean
>>                    3 active+recovering
>>                   19 incomplete
>>
>>
>>
>>
>> # ceph pg dump_stuck unclean
>> ok
>> pg_stat objects mip degr unf bytes log disklog state state_stamp v
>> reported up up_primary acting acting_primary last_scrub scrub_stamp
>> last_deep_scrub deep_scrub_stamp
>> 10.70 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.534911 0'0 261536:1015
>> [153,140,80] 153 [153,140,80] 153 0'0 2015-03-12 17:59:43.275049 0'0 
>> 2015-03-09
>> 17:55:58.745662
>> 3.dde 68 66 0 66 552861709 297 297 incomplete 2015-03-20 12:19:49.584839
>> 33547'297 261536:228352 [174,5,179] 174 [174,5,179] 174 33547'297 2015-03-12
>> 14:19:15.261595 28522'43 2015-03-11 14:19:13.894538
>> 5.a2 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.560756 0'0 261536:897
>> [214,191,170] 214 [214,191,170] 214 0'0 2015-03-12 17:58:29.257085 0'0 
>> 2015-03-09
>> 17:55:07.684377
>> 13.1b6 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.846253 0'0
>> 261536:1050 [0,176,131] 0 [0,176,131] 0 0'0 2015-03-12 18:00:13.286920
>> 0'0 2015-03-09 17:56:18.715208
>> 7.25b 16 0 0 0 67108864 16 16 incomplete 2015-03-20 12:19:49.639102
>> 27666'16 261536:4777 [194,145,45] 194 [194,145,45] 194 27666'16 2015-03-12
>> 17:59:06.357864 2330'3 2015-03-09 17:55:30.754522
>> 5.19 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.742698 0'0 261536:25410
>> [212,43,131] 212 [212,43,131] 212 0'0 2015-03-12 13:51:37.777026 0'0 
>> 2015-03-11
>> 13:51:35.406246
>> 3.a2f 0 0 0 0 0 0 0 creating 2015-03-20 12:42:15.586372 0'0 0:0 [] -1 []
>> -1 0'0 0.000000 0'0 0.000000
>> 7.298 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.566966 0'0 261536:900
>> [187,95,225] 187 [187,95,225] 187 27666'13 2015-03-12 17:59:10.308423
>> 2330'4 2015-03-09 17:55:35.750109
>> 3.a5a 77 87 261 87 623902741 325 325 active+recovering 2015-03-20
>> 10:54:57.443670 33569'325 261536:182464 [150,149,181] 150 [150,149,181]
>> 150 33569'325 2015-03-12 13:58:05.813966 28433'44 2015-03-11
>> 13:57:53.909795
>> 1.1e7 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.610547 0'0 261536:772
>> [175,182] 175 [175,182] 175 0'0 2015-03-12 17:55:45.203232 0'0 2015-03-09
>> 17:53:49.694822
>> 3.774 79 0 0 0 645136397 339 339 incomplete 2015-03-20 12:19:49.821708
>> 33570'339 261536:166857 [162,39,161] 162 [162,39,161] 162 33570'339 
>> 2015-03-12
>> 14:49:03.869447 2226'2 2015-03-09 13:46:49.783950
>> 3.7d0 78 0 0 0 609222686 376 376 incomplete 2015-03-20 12:19:49.534004
>> 33538'376 261536:182810 [117,118,177] 117 [117,118,177] 117 33538'376 
>> 2015-03-12
>> 13:51:03.984454 28394'62 2015-03-11 13:50:58.196288
>> 3.d60 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.647196 0'0 261536:833
>> [154,172,1] 154 [154,172,1] 154 33552'321 2015-03-12 13:44:43.502907
>> 28356'39 2015-03-11 13:44:41.663482
>> 4.1fc 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.610103 0'0 261536:1069
>> [70,179,58] 70 [70,179,58] 70 0'0 2015-03-12 17:58:19.254170 0'0 2015-03-09
>> 17:54:55.720479
>> 3.e02 72 0 0 0 585105425 304 304 incomplete 2015-03-20 12:19:49.564768
>> 33568'304 261536:167428 [15,102,147] 15 [15,102,147] 15 33568'304 2015-03-16
>> 10:04:19.894789 2246'4 2015-03-09 11:43:44.176331
>> 8.1d4 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.614727 0'0
>> 261536:19611 [126,43,174] 126 [126,43,174] 126 0'0 2015-03-12
>> 14:34:35.258338 0'0 2015-03-12 14:34:35.258338
>> 4.2f4 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.595109 0'0
>> 261536:113791 [181,186,13] 181 [181,186,13] 181 0'0 2015-03-12
>> 14:59:03.529264 0'0 2015-03-09 13:46:40.601301
>> 3.52c 65 23 69 23 543162368 290 290 active+recovering 2015-03-20
>> 10:51:43.664734 33553'290 261536:8431 [212,100,219] 212 [212,100,219] 212
>> 33553'290 2015-03-13 11:44:26.396514 29686'103 2015-03-11 17:18:33.452616
>> 3.e5a 76 70 0 0 623902741 325 325 incomplete 2015-03-20 12:19:49.552071
>> 33569'325 261536:71248 [97,22,62] 97 [97,22,62] 97 33569'325 2015-03-12
>> 13:58:05.813966 28433'44 2015-03-11 13:57:53.909795
>> 8.3a0 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.615728 0'0
>> 261536:173184 [62,14,178] 62 [62,14,178] 62 0'0 2015-03-12
>> 13:52:44.546418 0'0 2015-03-12 13:52:44.546418
>> 3.24e 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.591282 0'0 261536:1026
>> [103,14,90] 103 [103,14,90] 103 33556'272 2015-03-13 11:44:41.263725
>> 2327'4 2015-03-09 17:54:43.675552
>> 5.f7 0 0 0 0 0 0 0 incomplete 2015-03-20 12:19:49.667823 0'0 261536:853
>> [73,44,123] 73 [73,44,123] 73 0'0 2015-03-12 17:58:30.257371 0'0 2015-03-09
>> 17:55:11.725629
>> 3.ae8 77 67 201 67 624427024 342 342 active+recovering 2015-03-20
>> 10:50:01.693979 33516'342 261536:149258 [122,144,218] 122 [122,144,218]
>> 122 33516'342 2015-03-12 17:11:01.899062 29638'134 2015-03-11
>> 17:10:59.966372
>> #
>>
>>
>> PG data is there on multiple OSDs, but Ceph is not recovering the PG.
>> For example:
>>
>> # ceph pg map 7.25b
>> osdmap e261536 pg 7.25b (7.25b) -> up [194,145,45] acting [194,145,45]
>>
>>
>> # ls -l /var/lib/ceph/osd/ceph-194/current/7.25b_head | wc -l
>> 17
>>
>> # ls -l /var/lib/ceph/osd/ceph-145/current/7.25b_head | wc -l
>> 0
>> #
>>
>> # ls -l /var/lib/ceph/osd/ceph-45/current/7.25b_head | wc -l
>> 17
>>
>>
>>
>>
>>
>> Some of the PGs are completely lost, i.e. they don't have any data.  For
>> example:
>>
>> # ceph pg map 10.70
>> osdmap e261536 pg 10.70 (10.70) -> up [153,140,80] acting [153,140,80]
>>
>>
>> # ls -l /var/lib/ceph/osd/ceph-140/current/10.70_head | wc -l
>> 0
>>
>> # ls -l /var/lib/ceph/osd/ceph-153/current/10.70_head | wc -l
>> 0
>>
>> # ls -l /var/lib/ceph/osd/ceph-80/current/10.70_head | wc -l
>> 0
>>
>>
>>
>> - Karan -
>>
>>
>>
>>
>>
>
>