[ceph-users] Re: Building new cluster had a couple of questions

2023-12-22 Thread Johan Hattne

On 2023-12-22 03:28, Robert Sander wrote:

Hi,

On 22.12.23 11:41, Albert Shih wrote:


for n in 1-100
   Take the OSDs on server n offline
   Uninstall docker on server n
   Install podman on server n
   Redeploy on server n
end


Yep, that's basically the procedure.

But first try it on a test cluster.

Regards


For reference, this was also discussed about two years ago:

  https://www.spinics.net/lists/ceph-users/msg70108.html

Worked for me.
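
In case it is useful, here is a rough sketch of one iteration as I 
remember it.  The fsid and OSD id are placeholders, the package commands 
assume a Debian-style host, and the redeploy step may differ in your 
setup, so treat this as an outline rather than a recipe (and, as Robert 
says, try it on a test cluster first):

  # ceph osd set noout
  # systemctl stop ceph-<fsid>@osd.<id>.service
  # apt-get remove --purge docker.io
  # apt-get install podman
  # ceph orch daemon redeploy osd.<id>
  # ceph osd unset noout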

// Johan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Misplaced objects greater than 100%

2023-04-05 Thread Johan Hattne
I think this is resolved—and you're right about the 0-weight of the root 
bucket being strange. I had created the rack buckets with


# ceph osd crush add-bucket rack-0 rack

whereas I should have used something like

# ceph osd crush add-bucket rack-0 rack root=default

There's a bit in the documentation 
(https://docs.ceph.com/en/quincy/rados/operations/crush-map) that says 
"Not all keys need to be specified" (in a different context, I admit).


I might have saved a second or two by omitting "root=default" and maybe 
half a minute by not checking the CRUSH map carefully afterwards.  It 
was not worth it.
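
If anyone else ends up here: right after creating the buckets, the 
placement is quick to sanity-check with something along these lines 
(assuming the root is named "default"):

  # ceph osd crush ls default
  # ceph osd tree | head

Both should show the new rack buckets under the default root rather than 
next to it.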


// J

On 2023-04-05 12:01, c...@elchaka.de wrote:

I guess this is related to your crush rules...
Unfortunately I don't know much about creating the rules...

But someone could give more insight if you also provide

crush rule dump

 your "-1 0 root default" is a bit strange


On 1 April 2023 at 01:01:39 CEST, Johan Hattne wrote:

Here goes:

# ceph -s
   cluster:
 id: e1327a10-8b8c-11ed-88b9-3cecef0e3946
 health: HEALTH_OK

   services:
 mon: 5 daemons, quorum 
bcgonen-a,bcgonen-b,bcgonen-c,bcgonen-r0h0,bcgonen-r0h1 (age 16h)
 mgr: bcgonen-b.furndm(active, since 8d), standbys: bcgonen-a.qmmqxj
 mds: 1/1 daemons up, 2 standby
 osd: 36 osds: 36 up (since 16h), 36 in (since 3d); 1041 remapped pgs

   data:
 volumes: 1/1 healthy
 pools:   3 pools, 1041 pgs
 objects: 5.42M objects, 6.5 TiB
 usage:   19 TiB used, 428 TiB / 447 TiB avail
 pgs: 27087125/16252275 objects misplaced (166.667%)
  1039 active+clean+remapped
      2 active+clean+remapped+scrubbing+deep

# ceph osd tree
ID   CLASS  WEIGHT TYPE NAME  STATUS  REWEIGHT  PRI-AFF
-14 149.02008  rack rack-1
  -7 149.02008  host bcgonen-r1h0
  20    hdd   14.55269  osd.20 up   1.0  1.0
  21    hdd   14.55269  osd.21 up   1.0  1.0
  22    hdd   14.55269  osd.22 up   1.0  1.0
  23    hdd   14.55269  osd.23 up   1.0  1.0
  24    hdd   14.55269  osd.24 up   1.0  1.0
  25    hdd   14.55269  osd.25 up   1.0  1.0
  26    hdd   14.55269  osd.26 up   1.0  1.0
  27    hdd   14.55269  osd.27 up   1.0  1.0
  28    hdd   14.55269  osd.28 up   1.0  1.0
  29    hdd   14.55269  osd.29 up   1.0  1.0
  34    ssd    1.74660  osd.34 up   1.0  1.0
  35    ssd    1.74660  osd.35 up   1.0  1.0
-13 298.04016  rack rack-0
  -3 149.02008  host bcgonen-r0h0
   0    hdd   14.55269  osd.0  up   1.0  1.0
   1    hdd   14.55269  osd.1  up   1.0  1.0
   2    hdd   14.55269  osd.2  up   1.0  1.0
   3    hdd   14.55269  osd.3  up   1.0  1.0
   4    hdd   14.55269  osd.4  up   1.0  1.0
   5    hdd   14.55269  osd.5  up   1.0  1.0
   6    hdd   14.55269  osd.6  up   1.0  1.0
   7    hdd   14.55269  osd.7  up   1.0  1.0
   8    hdd   14.55269  osd.8  up   1.0  1.0
   9    hdd   14.55269  osd.9  up   1.0  1.0
  30    ssd    1.74660  osd.30 up   1.0  1.0
  31    ssd    1.74660  osd.31 up   1.0  1.0
  -5 149.02008  host bcgonen-r0h1
  10    hdd   14.55269  osd.10 up   1.0  1.0
  11    hdd   14.55269  osd.11 up   1.0  1.0
  12    hdd   14.55269  osd.12 up   1.0  1.0
  13    hdd   14.55269  osd.13 up   1.0  1.0
  14    hdd   14.55269  osd.14 up   1.0  1.0
  15    hdd   14.55269  osd.15 up   1.0  1.0
  16    hdd   14.55269  osd.16 up   1.0  1.0
  17    hdd   14.55269  osd.17 up   1.0  1.0
  18    hdd   14.55269  osd.18 up   1.0  1.0
  19    hdd   14.55269  osd.19 up   1.0  1.0
  32    ssd    1.74660  osd.32 up   1.0  1.0
  33    ssd    1.74660  osd.33 up   1.0  1.0
  -1 0  root default

# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_

[ceph-users] Re: Misplaced objects greater than 100%

2023-04-03 Thread Johan Hattne
Thanks Mehmet; I took a closer look at what I sent you and the problem 
appears to be in the CRUSH map.  At some point since anything was last 
rebooted, I created rack buckets and moved the OSD nodes in under them:


  # ceph osd crush add-bucket rack-0 rack
  # ceph osd crush add-bucket rack-1 rack

  # ceph osd crush move bcgonen-r0h0 rack=rack-0
  # ceph osd crush move bcgonen-r0h1 rack=rack-0
  # ceph osd crush move bcgonen-r1h0 rack=rack-1

All seemed fine at the time; it was not until bcgonen-r1h0 was rebooted 
that stuff got weird.  But as per "ceph osd tree" output, those rack 
buckets were sitting next to the default root as opposed to under it.


Now that's fixed, and the cluster is backfilling remapped PGs.
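
In concrete terms, the fix was just moving the rack buckets under the 
default root, i.e. something along the lines of (exact invocation from 
memory):

  # ceph osd crush move rack-0 root=default
  # ceph osd crush move rack-1 root=default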

// J

On 2023-03-31 16:01, Johan Hattne wrote:

Here goes:

# ceph -s
   cluster:
     id: e1327a10-8b8c-11ed-88b9-3cecef0e3946
     health: HEALTH_OK

   services:
     mon: 5 daemons, quorum 
bcgonen-a,bcgonen-b,bcgonen-c,bcgonen-r0h0,bcgonen-r0h1 (age 16h)

     mgr: bcgonen-b.furndm(active, since 8d), standbys: bcgonen-a.qmmqxj
     mds: 1/1 daemons up, 2 standby
     osd: 36 osds: 36 up (since 16h), 36 in (since 3d); 1041 remapped pgs

   data:
     volumes: 1/1 healthy
     pools:   3 pools, 1041 pgs
     objects: 5.42M objects, 6.5 TiB
     usage:   19 TiB used, 428 TiB / 447 TiB avail
     pgs: 27087125/16252275 objects misplaced (166.667%)
  1039 active+clean+remapped
  2    active+clean+remapped+scrubbing+deep

# ceph osd tree
ID   CLASS  WEIGHT TYPE NAME  STATUS  REWEIGHT  PRI-AFF
-14 149.02008  rack rack-1
  -7 149.02008  host bcgonen-r1h0
  20    hdd   14.55269  osd.20 up   1.0  1.0
  21    hdd   14.55269  osd.21 up   1.0  1.0
  22    hdd   14.55269  osd.22 up   1.0  1.0
  23    hdd   14.55269  osd.23 up   1.0  1.0
  24    hdd   14.55269  osd.24 up   1.0  1.0
  25    hdd   14.55269  osd.25 up   1.0  1.0
  26    hdd   14.55269  osd.26 up   1.0  1.0
  27    hdd   14.55269  osd.27 up   1.0  1.0
  28    hdd   14.55269  osd.28 up   1.0  1.0
  29    hdd   14.55269  osd.29 up   1.0  1.0
  34    ssd    1.74660  osd.34 up   1.0  1.0
  35    ssd    1.74660  osd.35 up   1.0  1.0
-13 298.04016  rack rack-0
  -3 149.02008  host bcgonen-r0h0
   0    hdd   14.55269  osd.0  up   1.0  1.0
   1    hdd   14.55269  osd.1  up   1.0  1.0
   2    hdd   14.55269  osd.2  up   1.0  1.0
   3    hdd   14.55269  osd.3  up   1.0  1.0
   4    hdd   14.55269  osd.4  up   1.0  1.0
   5    hdd   14.55269  osd.5  up   1.0  1.0
   6    hdd   14.55269  osd.6  up   1.0  1.0
   7    hdd   14.55269  osd.7  up   1.0  1.0
   8    hdd   14.55269  osd.8  up   1.0  1.0
   9    hdd   14.55269  osd.9  up   1.0  1.0
  30    ssd    1.74660  osd.30 up   1.0  1.0
  31    ssd    1.74660  osd.31 up   1.0  1.0
  -5 149.02008  host bcgonen-r0h1
  10    hdd   14.55269  osd.10 up   1.0  1.0
  11    hdd   14.55269  osd.11 up   1.0  1.0
  12    hdd   14.55269  osd.12 up   1.0  1.0
  13    hdd   14.55269  osd.13 up   1.0  1.0
  14    hdd   14.55269  osd.14 up   1.0  1.0
  15    hdd   14.55269  osd.15 up   1.0  1.0
  16    hdd   14.55269  osd.16 up   1.0  1.0
  17    hdd   14.55269  osd.17 up   1.0  1.0
  18    hdd   14.55269  osd.18 up   1.0  1.0
  19    hdd   14.55269  osd.19 up   1.0  1.0
  32    ssd    1.74660  osd.32 up   1.0  1.0
  33    ssd    1.74660  osd.33 up   1.0  1.0
  -1 0  root default

# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash 
rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 31 flags 
hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
pool 2 'cephfs.cephfs.meta' replicated size 3 min_size 2 crush_rule 2 
object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change 
9833 lfor 0/0/584 flags hashpspool stripe_width 0 pg_autoscale_bias 4 
pg_num_min 16 recovery_priority 5 application cephfs
pool 3 &

[ceph-users] Re: Misplaced objects greater than 100%

2023-03-31 Thread Johan Hattne

Here goes:

# ceph -s
  cluster:
id: e1327a10-8b8c-11ed-88b9-3cecef0e3946
health: HEALTH_OK

  services:
mon: 5 daemons, quorum 
bcgonen-a,bcgonen-b,bcgonen-c,bcgonen-r0h0,bcgonen-r0h1 (age 16h)

mgr: bcgonen-b.furndm(active, since 8d), standbys: bcgonen-a.qmmqxj
mds: 1/1 daemons up, 2 standby
osd: 36 osds: 36 up (since 16h), 36 in (since 3d); 1041 remapped pgs

  data:
volumes: 1/1 healthy
pools:   3 pools, 1041 pgs
objects: 5.42M objects, 6.5 TiB
usage:   19 TiB used, 428 TiB / 447 TiB avail
pgs: 27087125/16252275 objects misplaced (166.667%)
 1039 active+clean+remapped
     2 active+clean+remapped+scrubbing+deep

# ceph osd tree
ID   CLASS  WEIGHT TYPE NAME  STATUS  REWEIGHT  PRI-AFF
-14 149.02008  rack rack-1
 -7 149.02008  host bcgonen-r1h0
 20    hdd   14.55269  osd.20 up   1.0  1.0
 21    hdd   14.55269  osd.21 up   1.0  1.0
 22    hdd   14.55269  osd.22 up   1.0  1.0
 23    hdd   14.55269  osd.23 up   1.0  1.0
 24    hdd   14.55269  osd.24 up   1.0  1.0
 25    hdd   14.55269  osd.25 up   1.0  1.0
 26    hdd   14.55269  osd.26 up   1.0  1.0
 27    hdd   14.55269  osd.27 up   1.0  1.0
 28    hdd   14.55269  osd.28 up   1.0  1.0
 29    hdd   14.55269  osd.29 up   1.0  1.0
 34    ssd    1.74660  osd.34 up   1.0  1.0
 35    ssd    1.74660  osd.35 up   1.0  1.0
-13 298.04016  rack rack-0
 -3 149.02008  host bcgonen-r0h0
  0    hdd   14.55269  osd.0  up   1.0  1.0
  1    hdd   14.55269  osd.1  up   1.0  1.0
  2    hdd   14.55269  osd.2  up   1.0  1.0
  3    hdd   14.55269  osd.3  up   1.0  1.0
  4    hdd   14.55269  osd.4  up   1.0  1.0
  5    hdd   14.55269  osd.5  up   1.0  1.0
  6    hdd   14.55269  osd.6  up   1.0  1.0
  7    hdd   14.55269  osd.7  up   1.0  1.0
  8    hdd   14.55269  osd.8  up   1.0  1.0
  9    hdd   14.55269  osd.9  up   1.0  1.0
 30    ssd    1.74660  osd.30 up   1.0  1.0
 31    ssd    1.74660  osd.31 up   1.0  1.0
 -5 149.02008  host bcgonen-r0h1
 10    hdd   14.55269  osd.10 up   1.0  1.0
 11    hdd   14.55269  osd.11 up   1.0  1.0
 12    hdd   14.55269  osd.12 up   1.0  1.0
 13    hdd   14.55269  osd.13 up   1.0  1.0
 14    hdd   14.55269  osd.14 up   1.0  1.0
 15    hdd   14.55269  osd.15 up   1.0  1.0
 16    hdd   14.55269  osd.16 up   1.0  1.0
 17    hdd   14.55269  osd.17 up   1.0  1.0
 18    hdd   14.55269  osd.18 up   1.0  1.0
 19    hdd   14.55269  osd.19 up   1.0  1.0
 32    ssd    1.74660  osd.32 up   1.0  1.0
 33    ssd    1.74660  osd.33 up   1.0  1.0
 -1 0  root default

# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash 
rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 31 flags 
hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
pool 2 'cephfs.cephfs.meta' replicated size 3 min_size 2 crush_rule 2 
object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change 
9833 lfor 0/0/584 flags hashpspool stripe_width 0 pg_autoscale_bias 4 
pg_num_min 16 recovery_priority 5 application cephfs
pool 3 'cephfs.cephfs.data' replicated size 3 min_size 2 crush_rule 1 
object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode on 
last_change 7630 lfor 0/1831/6544 flags hashpspool,bulk stripe_width 0 
application cephfs


crush_rules 1 and 2 are just used to assign the data and meta pool to 
HDD and SSD, respectively (failure domain: host).
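
Rules like that can be created directly from device classes; generically 
it is something like the following (rule names here are purely 
illustrative, not necessarily what I used):

  # ceph osd crush rule create-replicated replicated-hdd default host hdd
  # ceph osd crush rule create-replicated replicated-ssd default host ssd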


// J

On 2023-03-31 15:37, c...@elchaka.de wrote:

Need to know some more about your cluster...

Ceph -s
Ceph osd df tree
Replica or ec?
...

Perhaps this can give us some insight
Mehmet

On 31 March 2023 at 18:08:38 CEST, Johan Hattne wrote:

Dear all;

Up until a few hours ago, I had a seemingly normally-behaving cluster 
(Quincy, 17.2.5) with 36 OSDs, evenly distributed across 3 of its 6 nodes.  The 
cluster is only used for CephFS and the only non-standard configuration I can 
t

[ceph-users] Misplaced objects greater than 100%

2023-03-31 Thread Johan Hattne

Dear all;

Up until a few hours ago, I had a seemingly normally-behaving cluster 
(Quincy, 17.2.5) with 36 OSDs, evenly distributed across 3 of its 6 
nodes.  The cluster is only used for CephFS and the only non-standard 
configuration I can think of is that I had 2 active MDSs, but only 1 
standby.  I had also doubled mds_cache_memory limit to 8 GB (all OSD 
hosts have 256 G of RAM) at some point in the past.


Then I rebooted one of the OSD nodes.  The rebooted node held one of the 
active MDSs.  Now the node is back up: ceph -s says the cluster is 
healthy, but all PGs are in an active+clean+remapped state and 166.67% of 
the objects are misplaced (dashboard: -66.66% healthy).


The data pool is a threefold replica with 5.4M objects; the number of 
misplaced objects is reported as 27087410/16252446.  The denominator in 
the ratio makes sense to me (16.2M / 3 = 5.4M), but the numerator does 
not.  I also note that the ratio is *exactly* 5 / 3.  The filesystem is 
still mounted and appears to be usable, but df reports it as 100% full; 
I suspect it would say 167% but that is capped somewhere.
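
Spelled out: the denominator is 3 x 5417482 = 16252446 (three replicas of 
the 5.4M objects), and the numerator is 5 x 5417482 = 27087410, hence the 
exact 5/3.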


Any ideas about what is going on?  Any suggestions for recovery?

// Best wishes; Johan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD failed to load OSD map for epoch

2021-07-28 Thread Johan Hattne
OK, thanks!  This is the same package as in the Octopus images, so I 
would expect Pacific to fail just as spectacularly.


What's the best way to have this fixed?  New issue on the Ceph tracker? 
 I understand the Ceph images use CentOS packages, so should they be 
poked as well?


// Best wishes; Johan

On 2021-07-27 23:48, Eugen Block wrote:

Alright, it's great that you could fix it!

In my one-node test cluster (Pacific) I see this smartctl version:

[ceph: root@pacific /]# rpm -q smartmontools
smartmontools-7.1-1.el8.x86_64



Quoting Johan Hattne:

Thanks a lot, Eugen!  I had not found those threads, but I did 
eventually recover; details below.  And yes, this is a toy size-2 
cluster with two OSDs, but I suspect I would have seen the same problem on 
a more reasonable setup since this whole mess was caused by Octopus's 
smartmontools not playing nice with the NVMes.


Just as in the previous thread Eugen provided, I got an OSD map from 
the monitors:


  # ceph osd getmap 4372 > /tmp/osd_map_4372

copied it to the OSD hosts and imported it:

  # CEPH_ARGS="--bluestore-ignore-data-csum" ceph-objectstore-tool 
--data-path /var/lib/ceph/osd/ceph-0/ --op set-osdmap --file 
/tmp/osd_map_4372


Given the initial cause of the error, I removed the WAL devices:

  # ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-0 --devs-source 
/var/lib/ceph/osd/ceph-0/block.wal --dev-target 
/var/lib/ceph/osd/ceph-0/block --command bluefs-bdev-migrate

  # ceph-volume lvm zap /var/lib/ceph/osd/ceph-0/block.wal

Here I got bitten by what looks like #49554, so

  # lvchange --deltag "ceph.wal_device=/dev/ceph-wal/wal-0" --deltag 
"ceph.wal_uuid=G7Z5qA-OaJQ-Spvs-X4ec-0SvX-vT2C-C0Dbpe" 
/dev/ceph-block-0/block-0


And analogously for osd1.  After restarting the OSDs, deep scrubbing, 
and a bit of manual repair, the cluster is healthy again.


The reason for the crash turns out to be a known problem with 
smartmontools <7.2 and the Micron 2200 NVMes that were used to back 
the WAL (https://www.smartmontools.org/ticket/1404).  Unfortunately, 
the Octopus image ships with smartmontools 7.1, which will crash the 
kernel on e.g. "smartctl -a /dev/nvme0".  Before switching to Octopus 
containers, I was using smartmontools from Debian backports, which 
does not have this problem.


Does Pacific have newer smartmontools?

// Best wishes; Johan

On 2021-07-27 06:35, Eugen Block wrote:

Hi,

did you read this thread [1] reporting a similar issue? It refers to 
a solution described in [2] but the OP in [1] recreated all OSDs, so 
it's not clear what the root cause was.
Can you start the OSD with more verbose (debug) output and share 
that? Does your cluster really have only two OSDs? Are you running it 
with size 2 pools?


[1] 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/EUFDKK3HEA5DPTUVJ5LBNQSWAKZH5ZM7/ 
[2] 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-August/036592.html 




Quoting Johan Hattne:


Dear all;

We have a 3-node cluster with two OSDs on separate nodes, each 
with wal on NVMe.  It's been running fine for quite some time, 
albeit under very light load.  This week, we moved from 
package-based Octopus to container-based ditto (15.2.13, all on 
Debian stable).  Within a few hours of that change, both OSDs 
crashed and dmesg filled up with stuff like:


  DMAR: DRHD: handling fault status reg 2
  DMAR: [DMA Read] Request device [06:00.0] PASID  fault 
addr ffbc [fault reason 06] PTE Read access is not set


where 06:00.0 is the NVMe with the wal.  This happened at the same 
time on *both* OSD nodes, but I'll worry about why this happened 
later.  I would first like to see if I can get the cluster back up.


From cephadm shell, I activate OSD 1 and try to start it (I did 
create a minimal /etc/ceph/ceph.conf with global "fsid" and "mon 
host" for that purpose):


  # ceph-volume lvm activate 1 cce125b2-2597-4be9-bd17-23eb059d2778 
--no-systemd

  # ceph-osd -d --cluster ceph --id 1

This gives "osd.1 0 OSD::init() : unable to read osd superblock", 
and the subsequent output indicates that this is due to checksum 
errors.  So ignore checksum mismatches and try again:


  # CEPH_ARGS="--bluestore-ignore-data-csum" ceph-osd -d --cluster 
ceph --id 1


which results in "osd.1 0 failed to load OSD map for epoch 4372, got 
0 bytes".  The monitors are at 4378, as per:


  # ceph osd stat
  2 osds: 0 up (since 47h), 1 in (since 47h); epoch: e4378

Is there any way to get past this?  For instance, could I coax the 
OSDs into epoch 4378?  This is the first time I have dealt with a ceph 
disaster, so there may be all kinds of obvious things I'm missing.


// Best wishes; Johan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: OSD failed to load OSD map for epoch

2021-07-27 Thread Johan Hattne
Thanks a lot, Eugen!  I had not found those threads, but I did 
eventually recover; details below.  And yes, this is a toy size-2 
cluster with two OSDs, but I suspect I would have seen the same problem on a 
more reasonable setup since this whole mess was caused by Octopus's 
smartmontools not playing nice with the NVMes.


Just as in the previous thread Eugen provided, I got an OSD map from the 
monitors:


  # ceph osd getmap 4372 > /tmp/osd_map_4372

copied it to the OSD hosts and imported it:

  # CEPH_ARGS="--bluestore-ignore-data-csum" ceph-objectstore-tool 
--data-path /var/lib/ceph/osd/ceph-0/ --op set-osdmap --file 
/tmp/osd_map_4372


Given the initial cause of the error, I removed the WAL devices:

  # ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-0 --devs-source 
/var/lib/ceph/osd/ceph-0/block.wal --dev-target 
/var/lib/ceph/osd/ceph-0/block --command bluefs-bdev-migrate

  # ceph-volume lvm zap /var/lib/ceph/osd/ceph-0/block.wal

Here I got bitten by what looks like #49554, so

  # lvchange --deltag "ceph.wal_device=/dev/ceph-wal/wal-0" --deltag 
"ceph.wal_uuid=G7Z5qA-OaJQ-Spvs-X4ec-0SvX-vT2C-C0Dbpe" 
/dev/ceph-block-0/block-0


And analogously for osd1.  After restarting the OSDs, deep scrubbing, 
and a bit of manual repair, the cluster is healthy again.
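
In concrete terms, that last part was roughly the following (OSD and PG 
ids as appropriate; a sketch from memory, not a recipe):

  # ceph osd deep-scrub 0
  # ceph osd deep-scrub 1
  # ceph pg repair <pgid>
  # ceph -s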


The reason for the crash turns out to be a known problem with 
smartmontools <7.2 and the Micron 2200 NVMes that were used to back the 
WAL (https://www.smartmontools.org/ticket/1404).  Unfortunately, the 
Octopus image ships with smartmontools 7.1, which will crash the kernel 
on e.g. "smartctl -a /dev/nvme0".  Before switching to Octopus 
containers, I was using smartmontools from Debian backports, which does 
not have this problem.


Does Pacific have newer smartmontools?
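
For anyone wanting to check their own image, something like this should 
tell (assuming the usual CentOS-based container):

  # cephadm shell -- rpm -q smartmontools
  # cephadm shell -- smartctl --version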

// Best wishes; Johan

On 2021-07-27 06:35, Eugen Block wrote:

Hi,

did you read this thread [1] reporting a similar issue? It refers to a 
solution described in [2] but the OP in [1] recreated all OSDs, so it's 
not clear what the root cause was.
Can you start the OSD with more verbose (debug) output and share that? 
Does your cluster really have only two OSDs? Are you running it with 
size 2 pools?


[1] 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/EUFDKK3HEA5DPTUVJ5LBNQSWAKZH5ZM7/ 

[2] 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-August/036592.html



Quoting Johan Hattne:


Dear all;

We have a 3-node cluster with two OSDs on separate nodes, each with 
wal on NVMe.  It's been running fine for quite some time, albeit under 
very light load.  This week, we moved from package-based Octopus to 
container-based ditto (15.2.13, all on Debian stable).  Within a few 
hours of that change, both OSDs crashed and dmesg filled up with stuff 
like:


  DMAR: DRHD: handling fault status reg 2
  DMAR: [DMA Read] Request device [06:00.0] PASID  fault addr 
ffbc [fault reason 06] PTE Read access is not set


where 06:00.0 is the NVMe with the wal.  This happened at the same 
time on *both* OSD nodes, but I'll worry about why this happened 
later.  I would first like to see if I can get the cluster back up.


From cephadm shell, I activate OSD 1 and try to start it (I did create 
a minimal /etc/ceph/ceph.conf with global "fsid" and "mon host" for 
that purpose):


  # ceph-volume lvm activate 1 cce125b2-2597-4be9-bd17-23eb059d2778 
--no-systemd

  # ceph-osd -d --cluster ceph --id 1

This gives "osd.1 0 OSD::init() : unable to read osd superblock", and 
the subsequent output indicates that this is due to checksum errors.  So 
ignore checksum mismatches and try again:


  # CEPH_ARGS="--bluestore-ignore-data-csum" ceph-osd -d --cluster 
ceph --id 1


which results in "osd.1 0 failed to load OSD map for epoch 4372, got 0 
bytes".  The monitors are at 4378, as per:


  # ceph osd stat
  2 osds: 0 up (since 47h), 1 in (since 47h); epoch: e4378

Is there any way to get past this?  For instance, could I coax the 
OSDs into epoch 4378?  This is the first time I have dealt with a ceph disaster, 
so there may be all kinds of obvious things I'm missing.


// Best wishes; Johan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] OSD failed to load OSD map for epoch

2021-07-23 Thread Johan Hattne

Dear all;

We have a 3-node cluster with two OSDs on separate nodes, each with 
wal on NVMe.  It's been running fine for quite some time, albeit under 
very light load.  This week, we moved from package-based Octopus to 
container-based ditto (15.2.13, all on Debian stable).  Within a few 
hours of that change, both OSDs crashed and dmesg filled up with stuff like:


  DMAR: DRHD: handling fault status reg 2
  DMAR: [DMA Read] Request device [06:00.0] PASID  fault addr 
ffbc [fault reason 06] PTE Read access is not set


where 06:00.0 is the NVMe with the wal.  This happened at the same time 
on *both* OSD nodes, but I'll worry about why this happened later.  I 
would first like to see if I can get the cluster back up.


From cephadm shell, I activate OSD 1 and try to start it (I did create 
a minimal /etc/ceph/ceph.conf with global "fsid" and "mon host" for that 
purpose):


  # ceph-volume lvm activate 1 cce125b2-2597-4be9-bd17-23eb059d2778 
--no-systemd

  # ceph-osd -d --cluster ceph --id 1

This gives "osd.1 0 OSD::init() : unable to read osd superblock", and 
the subsequent output indicates that this is due to checksum errors.  So 
ignore checksum mismatches and try again:


  # CEPH_ARGS="--bluestore-ignore-data-csum" ceph-osd -d --cluster ceph 
--id 1


which results in "osd.1 0 failed to load OSD map for epoch 4372, got 0 
bytes".  The monitors are at 4378, as per:


  # ceph osd stat
  2 osds: 0 up (since 47h), 1 in (since 47h); epoch: e4378

Is there any way to get past this?  For instance, could I coax the OSDs 
into epoch 4378?  This is the first time I have dealt with a ceph disaster, so 
there may be all kinds of obvious things I'm missing.


// Best wishes; Johan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io