Re: [ceph-users] Oeps: lost cluster with: ceph osd require-osd-release luminous

2017-09-12 Thread Josh Durgin
Could you post your crushmap? PGs mapping to no OSDs is a symptom of something 
wrong there.
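
For example (the rule number and replica count below are just
placeholders), you could dump and decompile the map, and check which
OSDs a rule actually maps to:

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
# "bad mapping" lines here are PGs that get no OSDs at all
crushtool -i crush.bin --test --rule 1 --num-rep 3 --show-bad-mappings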


You can stop the osds from changing position at startup with 'osd crush update 
on start = false':


http://docs.ceph.com/docs/master/rados/operations/crush-map/#crush-location
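
Roughly, in ceph.conf (the OSD id and the example location below are
just placeholders for your own layout):

[osd]
# keep OSDs wherever the crushmap puts them
osd crush update on start = false

# or pin a specific OSD's location explicitly
[osd.80]
crush location = root=ssd host=ceph-node1-ssd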


Josh

Sent from Nine

From: Jan-Willem Michels 
Sent: Sep 11, 2017 23:50
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Oeps: lost cluster with: ceph osd require-osd-release 
luminous

[ceph-users] Oeps: lost cluster with: ceph osd require-osd-release luminous

2017-09-11 Thread Jan-Willem Michels

We have a Kraken cluster, at the time newly built, with BlueStore enabled.
It is 8 systems, each with 10 x 10 TB disks, and each computer has one
2 TB NVMe disk.

3 monitors, etc.
About 700 TB in total, with 300 TB used. Mainly S3 object store.

Of course there is more to the story: we have one strange thing in our
cluster.
We tried to create two pools of storage, default and ssd, and created a
new CRUSH rule for it.
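
For reference, the usual pre-Luminous way to do this is a separate CRUSH
root for the SSDs plus a rule that takes that root. A rough sketch of the
decompiled crushmap (bucket names, ids and weights below are made up, not
our real map; the ceph-nodeX-ssd host buckets would be declared above
this):

root ssd {
        id -20
        alg straw
        hash 0  # rjenkins1
        item ceph-node1-ssd weight 1.819
        item ceph-node2-ssd weight 1.819
}

rule ssd {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take ssd
        step chooseleaf firstn 0 type host
        step emit
}

The ssd pool then points at that rule, e.g. "ceph osd pool set <pool>
crush_ruleset 1" (crush_rule on Luminous).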

It worked without problems for months.
But when we restarted a computer / NVMe OSD, it would "forget" that the
NVMe should be attached to the SSD pool (for that particular computer).

Since we don't restart systems, we didn't notice that.
The NVMe would reappear under the default pool.
When we re-applied the same CRUSH rule, it would go back to the SSD
pool.
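
"Re-applying" the rule here means re-injecting the edited crushmap, or
moving an OSD back under the ssd root by hand; roughly, with made-up
ids, weights and bucket names:

# compile and inject the edited text map
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new

# or move a single OSD back directly
ceph osd crush create-or-move osd.80 1.81 root=ssd host=ceph-node1-ssd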

All the while, data on the NVMe disks kept working.

Clearly something is not ideal there. And Luminous has a different 
approach to separating SSD from HDD.
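
As far as I understand it, the Luminous way would be device classes
instead of a separate root, something like this (the OSD id, rule and
pool names are made up):

# may need "ceph osd crush rm-device-class osd.80" first if a class
# was already auto-assigned
ceph osd crush set-device-class ssd osd.80
ceph osd crush rule create-replicated ssd-rule default host ssd
ceph osd pool set somepool crush_rule ssd-rule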

So we thought we would first go to Luminous 12.2.0 and later see how we
fix this.

We did the upgrade to Luminous and that went well. That requires a
reboot / restart of the OSDs, so all NVMe devices were back in the
default pool.

Re-applying the CRUSH rule brought them back to the SSD pool.
Also, while doing the upgrade, we commented out in ceph.conf the line
"enable experimental unrecoverable data corrupting features = bluestore",
since in Luminous that is no longer a problem.


Everything was working fine.
In ceph -s we had this health warning:

all OSDs are running luminous or later but
require_osd_release < luminous
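
A quick way to double-check that state would be something like:

ceph versions                             # every daemon should report 12.2.0
ceph osd dump | grep require_osd_release  # still shows the old release here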


So I thought I would set the minimum OSD release to Luminous with:

ceph osd require-osd-release luminous

To us that seemed to be nothing more than a minimum software version
that was required to connect to the cluster.

The system answered back:

recovery_deletes is set

And that was it; the same second, ceph -s went to "0":

 ceph -s
  cluster:
id: 5bafad08-31b2-4716-be77-07ad2e2647eb
health: HEALTH_WARN
noout flag(s) set
Reduced data availability: 3248 pgs inactive
Degraded data redundancy: 3248 pgs unclean

  services:
mon: 3 daemons, quorum Ceph-Mon1,Ceph-Mon2,Ceph-Mon3
mgr: Ceph-Mon2(active), standbys: Ceph-Mon3, Ceph-Mon1
osd: 88 osds: 88 up, 88 in; 297 remapped pgs
 flags noout

  data:
pools:   26 pools, 3248 pgs
objects: 0 objects, 0 bytes
usage:   0 kB used, 0 kB / 0 kB avail
pgs: 100.000% pgs unknown
 3248 unknown
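
From what I can tell, that "recovery_deletes is set" reply just refers
to a new flag in the OSD map; it should show up with something like:

ceph osd dump | grep flags
# expect something like: flags noout,sortbitwise,recovery_deletes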

And before that it was something like this. The errors you see (apart
from the scrub error) were from the upgrade / restarts, and I would
expect them to go away very quickly.


ceph -s
  cluster:
id: 5bafad08-31b2-4716-be77-07ad2e2647eb
health: HEALTH_ERR
385 pgs backfill_wait
5 pgs backfilling
135 pgs degraded
1 pgs inconsistent
1 pgs peering
4 pgs recovering
131 pgs recovery_wait
98 pgs stuck degraded
525 pgs stuck unclean
recovery 119/612465488 objects degraded (0.000%)
recovery 24/612465488 objects misplaced (0.000%)
1 scrub errors
noout flag(s) set
all OSDs are running luminous or later but 
require_osd_release < luminous


  services:
mon: 3 daemons, quorum Ceph-Mon1,Ceph-Mon2,Ceph-Mon3
mgr: Ceph-Mon2(active), standbys: Ceph-Mon1, Ceph-Mon3
osd: 88 osds: 88 up, 88 in; 387 remapped pgs
 flags noout

  data:
pools:   26 pools, 3248 pgs
objects: 87862k objects, 288 TB
usage:   442 TB used, 300 TB / 742 TB avail
pgs: 0.031% pgs not active
 119/612465488 objects degraded (0.000%)
 24/612465488 objects misplaced (0.000%)
 2720 active+clean
 385  active+remapped+backfill_wait
 131  active+recovery_wait+degraded
 5    active+remapped+backfilling
 4    active+recovering+degraded
 1    active+clean+inconsistent
 1    peering
 1    active+clean+scrubbing+deep

  io:
client:   34264 B/s rd, 2091 kB/s wr, 38 op/s rd, 48 op/s wr
recovery: 4235 kB/s, 6 objects/s

Current ceph health detail:

HEALTH_WARN noout flag(s) set; Reduced data availability: 3248 pgs 
inactive; Degraded data redundancy: 3248 pgs unclean

OSDMAP_FLAGS noout flag(s) set
PG_AVAILABILITY Reduced data availability: 3248 pgs inactive
pg 15.7cd is stuck inactive for 24780.157341, current state 
unknown, last acting []
pg 15.7ce is stuck inactive for 24780.157341, current state 
unknown, last acting []
pg 15.7cf is stuck inactive for 24780.157341, current state 
unknown, last acting []

..
pg 15.7ff is stuck inactive for 24728.059692, current state 
unknown, last acting []

PG_DEGRADED Degraded data redundancy: 3248 pgs unclean
pg 15.7cd is stuck unclean for 24728.059692, current state unknown, 
last acting []
pg 15.7ce is stuck unclean for 24728.059692, current state unknown, 
last acting []