[ceph-users] Re: ceph_volume.process hangs after reboot with missing osds lockbox.keyring dm-crypt osd luks [solved]

2022-12-23 Thread Jelle de Jong

Hello everybody,

I solved the issue: there was a network misconfiguration and some of 
the OSD nodes were not able to communicate with the monitor node.
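
For anyone hitting the same symptom, a quick connectivity check from an OSD 
node is sketched below; the monitor address is a placeholder, and the default 
msgr v2/v1 ports are assumed:

# check that the monitor ports answer from the OSD node
nc -zv <mon-host> 3300
nc -zv <mon-host> 6789

# with connectivity restored, this returns promptly instead of hanging
ceph -s --connect-timeout 10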


Kind regards,

Jelle de Jong

On 12/23/22 17:49, Jelle de Jong wrote:

[quoted original message snipped; it appears in full below]
[ceph-users] ceph_volume.process hangs after reboot with missing osds lockbox.keyring dm-crypt osd luks

2022-12-23 Thread Jelle de Jong

Hello everybody,

After a reboot of my ceph-osd node none of the OSDs came back online.

I suspect an issue with the Ubuntu focal packages in combination 
with Ceph 15.2.16-1focal.


I tried upgrading all Ceph packages to 15.2.17-1bionic and downgrading 
to a few other versions as well, but I cannot get them back online.


Am I missing something needed for the OSDs to start, like LUKS crypto keys 
of some sort?


Can someone point me in the right direction on debugging this and 
getting the OSDs back up?


I attached some of the logs and system information.

Is there something specific to Ubuntu focal and Ceph 15.2 after recent 
updates?


Kind regards,

Jelle de Jong


[2022-12-23 16:29:05,881][ceph_volume.process][INFO  ] Running command: 
/usr/bin/ceph --cluster ceph --name 
client.osd-lockbox.2e3c25c2-bfe5-4ecf-b402-5bd75872c568 --keyring 
/var/lib/ceph/osd/ceph-96/lockbox.keyring config-key get 
dm-crypt/osd/2e3c25c2-bfe5-4ecf-b402-5bd75872c568/luks
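
To see where it hangs, the logged command can be re-run by hand with a timeout 
and client-side message debugging; a sketch, reusing the fsid, keyring path and 
key name from the log line above:

# re-run the hanging ceph-volume step manually, bounded to 30 seconds
timeout 30 /usr/bin/ceph --cluster ceph \
    --name client.osd-lockbox.2e3c25c2-bfe5-4ecf-b402-5bd75872c568 \
    --keyring /var/lib/ceph/osd/ceph-96/lockbox.keyring \
    --connect-timeout 10 --debug-ms 1 \
    config-key get dm-crypt/osd/2e3c25c2-bfe5-4ecf-b402-5bd75872c568/luks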


# ls -hal /var/lib/ceph/osd/*/*
-rw------- 1 ceph ceph 106 Dec 23 16:29 /var/lib/ceph/osd/ceph-100/lockbox.keyring
-rw------- 1 ceph ceph 106 Dec 23 16:29 /var/lib/ceph/osd/ceph-106/lockbox.keyring
-rw------- 1 ceph ceph 106 Dec 23 16:29 /var/lib/ceph/osd/ceph-110/lockbox.keyring
-rw------- 1 ceph ceph 106 Dec 23 16:29 /var/lib/ceph/osd/ceph-114/lockbox.keyring
-rw------- 1 ceph ceph 106 Dec 23 16:29 /var/lib/ceph/osd/ceph-118/lockbox.keyring
-rw------- 1 ceph ceph 106 Dec 23 16:29 /var/lib/ceph/osd/ceph-81/lockbox.keyring
-rw------- 1 ceph ceph 106 Dec 23 16:29 /var/lib/ceph/osd/ceph-85/lockbox.keyring
-rw------- 1 ceph ceph 106 Dec 23 16:29 /var/lib/ceph/osd/ceph-89/lockbox.keyring
-rw------- 1 ceph ceph 106 Dec 23 16:29 /var/lib/ceph/osd/ceph-92/lockbox.keyring
-rw------- 1 ceph ceph 106 Dec 23 16:29 /var/lib/ceph/osd/ceph-96/lockbox.keyring
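
The keyrings themselves can be sanity-checked locally before involving the 
monitor; a sketch using ceph-authtool:

# confirm a lockbox keyring parses and show the entity it contains
ceph-authtool --list /var/lib/ceph/osd/ceph-96/lockbox.keyring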



# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.5 LTS"

# cat /etc/apt/sources.list.d/eu_ceph_com_debian_octopus.list
#deb http://eu.ceph.com//debian-octopus focal main
deb http://eu.ceph.com/debian-15.2.16/ focal main


# dpkg -l *ceph*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                          Version        Architecture Description
+++-=============================-==============-============-===========================================
ii  ceph                          15.2.16-1focal amd64        distributed storage and file system
ii  ceph-base                     15.2.16-1focal amd64        common ceph daemon libraries and management tools
un  ceph-client-tools             <none>         <none>       (no description available)
ii  ceph-common                   15.2.16-1focal amd64        common utilities to mount and interact with a ceph storage cluster
un  ceph-fs-common                <none>         <none>       (no description available)
ii  ceph-fuse                     15.2.16-1focal amd64        FUSE-based client for the Ceph distributed file system
ii  ceph-mds                      15.2.16-1focal amd64        metadata server for the ceph distributed file system
ii  ceph-mgr                      15.2.16-1focal amd64        manager for the ceph distributed storage system
ii  ceph-mgr-cephadm              15.2.16-1focal all          cephadm orchestrator module for ceph-mgr
ii  ceph-mgr-dashboard            15.2.16-1focal all          dashboard module for ceph-mgr
ii  ceph-mgr-diskprediction-cloud 15.2.16-1focal all          diskprediction-cloud module for ceph-mgr
ii  ceph-mgr-diskprediction-local 15.2.16-1focal all          diskprediction-local module for ceph-mgr
ii  ceph-mgr-k8sevents            15.2.16-1focal all          kubernetes events module for ceph-mgr
ii  ceph-mgr-modules-core         15.2.16-1focal all          ceph manager modules which are always enabled
ii  ceph-mon                      15.2.16-1focal amd64        monitor server for the ceph storage system
ii  ceph-osd                      15.2.16-1focal amd64        OSD server for the ceph storage system
un  ceph-test                     <none>         <none>       (no description available)
ii  cephadm                       15.2.16-1focal amd64        cephadm utility to bootstrap ceph daemons with systemd and containers
un  libceph                       <none>         <none>       (no description available)
un  libceph1                      <none>         <none>       (no description available)
un  libcephfs                     <none>         <none>       (no description available)
ii  libcephfs2                    15.2.16-1focal amd64        Ceph distributed file system client library
un  python-ceph                   <none>         <none>       (no description available)
ii  python3-ceph-argparse         15.2.16-1focal all          Python 3 utility libraries for Ceph CLI
ii  python3-ceph-common           15.2.16-1focal all          Python 3 utility
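
Given the resolution posted in the follow-up above (a network misconfiguration 
between OSD nodes and the monitor), recovery after fixing the network is likely 
just a matter of retrying activation; a sketch for ceph-volume-managed OSDs:

# re-activate every OSD that ceph-volume can discover on this node
ceph-volume lvm activate --all

# or restart a single OSD unit, e.g. osd.96
systemctl restart ceph-osd@96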

[ceph-users] Re: S3 Deletes in Multisite Sometimes Not Syncing

2022-12-23 Thread Matthew Darwin

Hi Alex,

We also have a multi-site setup (17.2.5). I just deleted a bunch of 
files from one side; some files got deleted on the other side, but 
not others. I waited 10 hours to see if the files would delete. I 
didn't do an exhaustive test like yours, but it seems like a similar 
issue. In our case, like yours, the two Ceph sites are geographically 
separated.


We don't have versioning enabled.

I would love to hear from anyone who has replication working perfectly.
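
For anyone comparing notes, the multisite replication state can be inspected 
per zone and per bucket; a sketch with standard radosgw-admin commands (the 
bucket name is a placeholder):

# overall sync state of the local zone against its peers
radosgw-admin sync status

# sync state of a single bucket
radosgw-admin bucket sync status --bucket=<bucket>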

On 2022-12-22 07:17, Alex Hussein-Kershaw (HE/HIM) wrote:

Hi Folks,

Have made a strange observation on one of our Storage Clusters.

   *   Running Ceph 15.2.13.
   *   Set up as a multisite pair of siteA and siteB. The two sites are geographically separated.
   *   We are using S3 with a bucket in versioning-suspended state (we previously had versioning on but decided it’s not required).
   *   We’re using pubsub in conjunction with our S3 usage; I don’t think this is relevant but figured I should mention it just in case.

We wrote 2413 small objects (no more than a few MB each) into the cluster via 
S3 on siteA. Then we deleted those objects via the S3 interface on siteA. Once 
the deletion was complete, we had 11 of the 2413 objects in a strange state on 
siteB but not siteA.

On both sites the objects were set to zero size; I think this is expected. On 
siteA, where the deletes were sent, the objects were marked with 
“delete-marker”. On siteB, the objects were not marked with “delete-marker”. 
“DELETE_MARKER_CREATE” pubsub events were generated on siteA for these objects, 
but not on siteB (so I expect the problem is not at the pubsub level).

I followed a specific object through the logs and saw the following:

   *   Object created: 00:11:16
   *   Object deleted: 01:04:02
   *   Pubsub on siteB generated “OBJECT_CREATE” events at 00:11:31, 00:11:34, and 01:04:18.


My observations from this are:

   *   There is plenty of time between the create and the delete for this not to be some niche timing issue.
   *   The final “OBJECT_CREATE” event is after the delete, so I expect it is a result of the multisite sync informing siteB of the change.
   *   I expect this final event to be a “DELETE_MARKER_CREATE” event, not an “OBJECT_CREATE”.

We can manually delete the objects from siteB to clean up, but this is painful 
and makes us look a bit silly when we get support calls from customers for this 
sort of thing, so I’m keen to find a better solution.

I’ve failed to find a reason why this would occur due to us doing something 
wrong in our setup; it seems this is not the intended behaviour, given that it’s 
only affecting a small number of the objects (most are marked as deleted on 
both sites as expected).

   *   Has anyone else experienced this sort of thing?
   *   I wonder if it’s related to our versioning-suspended state.
   *   How well tested is this scenario, i.e., multisite + bucket versioning together?
   *   Is there something we can do to mitigate it? As I understand it, we can’t return to a versioning-disabled state for this bucket. (A clean-up sketch follows below.)
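
As a stopgap for the manual clean-up mentioned earlier, the stragglers on siteB 
can be located by listing delete markers from both endpoints and diffing; a 
sketch with the AWS CLI, where the endpoints and bucket name are placeholders:

# list keys that carry a delete marker, as seen via each site's RGW endpoint
aws --endpoint-url https://rgw.siteA.example s3api list-object-versions \
    --bucket <bucket> --query 'DeleteMarkers[].Key' --output text \
    | tr '\t' '\n' | sort > siteA.txt
aws --endpoint-url https://rgw.siteB.example s3api list-object-versions \
    --bucket <bucket> --query 'DeleteMarkers[].Key' --output text \
    | tr '\t' '\n' | sort > siteB.txt

# keys present on siteA but missing on siteB are the objects that did not sync
comm -23 siteA.txt siteB.txt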

Thanks, and Season’s Greetings 

Alex Kershaw | alex...@microsoft.com
Software Engineer | Azure for Operators


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Recent ceph.io Performance Blog Posts

2022-12-23 Thread Dhairya Parmar
If this is the same issue that affected a couple of PRs in the last few 
weeks, then rebasing the PR onto the latest fetch of the main branch and 
force-pushing it should solve the problem.
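
In concrete terms that would be something like the sketch below, assuming the 
fork remote is named origin and upstream points at ceph/ceph:

# rebase the topic branch onto the freshly fetched main and force-push it
git fetch upstream
git rebase upstream/main
git push --force-with-lease origin <topic-branch>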
- Dhairya


On Fri, Dec 23, 2022 at 7:12 PM Stefan Kooman  wrote:

> [quoted message snipped; it appears in full below]
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Recent ceph.io Performance Blog Posts

2022-12-23 Thread Stefan Kooman

On 12/19/22 10:26, Stefan Kooman wrote:

On 12/14/22 19:04, Mark Nelson wrote:



This is great work!  Would you consider making a PR against main for 
the change to ceph-volume?  Given that you have performance data it 
sounds like good justification.  I'm not sure who's merging changes to 
ceph-volume these days, but I can try to find out if no one is biting.




Yes, will do.


https://github.com/ceph/ceph/pull/49554

Gr. Stefan

P.S. The failed "make check" does not seem to be related to my changes, AFAICT.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Rocky9 support for ceph ?? What is the official word ?

2022-12-23 Thread zRiemann Contact




The reason for me to stick to older versions as long as possible is so that 
others have solved such problems for me by the time I do the upgrade.

I am still in my testing phase of el9/centos9stream. I have a custom repo of 
my own added on all servers; this is where I put packages that are not 
available. If you can find a SRPM, you are most likely able to build it on el9 
and add it.

I just did a quick search and there seem to be el9 packages.
https://copr.fedorainfracloud.org/coprs/ceph/el9/build/2355701/

I would approach it like this.
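
[For reference, the quoted suggestion would look roughly like the sketch below; 
it assumes the dnf Copr plugin is available, and the SRPM file name is a 
placeholder:]

# option 1: enable the (unofficial) Copr repo and install from it
dnf install -y dnf-plugins-core
dnf copr enable ceph/el9
dnf install ceph

# option 2: rebuild a source RPM locally on el9
rpmbuild --rebuild ceph-<version>.el9.src.rpm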


Marc,

this is an unofficial repo. Basing a production infrastructure on an 
unofficial repository that welcomes you with:


"Enabling a Copr repository. Please note that this repository is not part
of the main distribution, and quality may vary.
The Fedora Project does not exercise any power over the contents of
this repository beyond the rules outlined in the Copr FAQ"

...is what I call bad practice.

My two questions from the other email remain unanswered: how has the Ceph 
team tested RPM installations of Ceph on RHEL 9?


FPG

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS: Isolating folders for different users

2022-12-23 Thread Kai Stian Olstad

On 22.12.2022 15:47, Jonas Schwab wrote:

Now the question: Since I established this setup more or less through
trial and error, I was wondering if there is a more elegant/better
approach than what is outlined above?


You can use namespaces so you don't need separate pools.
Unfortunately the documentation is sparse on the subject; I use it with 
subvolumes like this:



# Create a subvolume

ceph fs subvolume create <volume> <subvolume> --pool_layout <data pool> --namespace-isolated


The subvolume is created with namespace fsvolumens_<subvolume>.
You can also find the name with:

ceph fs subvolume info <volume> <subvolume> | jq -r .pool_namespace



# Create a user with access to the subvolume and the namespace

## First find the path to the subvolume

ceph fs subvolume getpath <volume> <subvolume>

## Create the user

ceph auth get-or-create client.<user> mon 'allow r' mds 'allow rw path=<subvolume path>' osd 'allow rw pool=<data pool> namespace=fsvolumens_<subvolume>'



I have found this by looking at how Openstack does it and some trial and 
error.
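
To verify the isolation afterwards, the new client can mount the subvolume and 
the RADOS namespace can be listed directly; a sketch, with angle-bracket names 
as placeholders:

# mount the subvolume as the restricted client
ceph-fuse -n client.<user> -r <subvolume path> /mnt/<subvolume>

# list the objects stored in the subvolume's namespace
rados -p <data pool> --namespace fsvolumens_<subvolume> ls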



--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io