[ceph-users] Re: Question about erasure coding on cephfs
Hi Erich, about a similar problem I asked some months ago, Frank Schilder published this on the list (December 6, 2023) and it may be helpful for your setup. I've not tested it yet, my cluster is still in deployment state.

To provide some first-hand experience, I was operating a pool with a 6+2 EC profile on 4 hosts for a while (until we got more hosts) and the "subdivide a physical host into 2 crush-buckets" approach is actually working best (I basically tried all the approaches described in the linked post and they all had pitfalls). Procedure is more or less (see the worked example after this message):

- add a second (logical) host bucket for each physical host by suffixing the host name with "-B" (ceph osd crush add-bucket <hostname>-B host)
- move half the OSDs per host to this new host bucket (ceph osd crush move osd.ID host=HOSTNAME-B)
- make this location persist across reboots of the OSDs (ceph config set osd.ID crush_location "host=HOSTNAME-B")

This will allow you to move OSDs back easily when you get more hosts and can afford the recommended 1 shard per host. It will also show which OSDs are moved, and where, with a simple "ceph config dump | grep crush_location". Best of all, you don't have to fiddle around with crush maps and hope they do what you want. Just use failure domain host and you are good. No more than 2 host buckets per physical host means no more than 2 shards per physical host with default placement rules.

I was operating this set-up with min_size=6 and feeling bad about it due to the reduced maintainability (risk of data loss during maintenance). It's not great really, but sometimes there is no way around it. I was happy when I got the extra hosts.

Patrick

Le 02/03/2024 à 16:37, Erich Weiler a écrit :
Hi Y'all,

We have a new ceph cluster online that looks like this:

md-01 : monitor, manager, mds
md-02 : monitor, manager, mds
md-03 : monitor, manager
store-01 : twenty 30TB NVMe OSDs
store-02 : twenty 30TB NVMe OSDs

The cephfs storage is using erasure coding at 4:2. The crush domain is set to "osd". (I know that's not optimal, but let me get to that in a minute.)

We have a current regular single NFS server (nfs-01) with the same storage as the OSD servers above (twenty 30TB NVMe disks). We want to wipe the NFS server and integrate it into the above ceph cluster as "store-03". When we do that, we would then have three OSD servers. We would then switch the crush domain to "host".

My question is this: given that we have 4:2 erasure coding, would the data rebalance evenly across the three OSD servers after we add store-03, such that if a single OSD server went down, the other two would be enough to keep the system online? Like, with 4:2 erasure coding, would 2 shards go on store-01, then 2 shards on store-02, and then 2 shards on store-03? Is that how I understand it?

Thanks for any insight!

-erich
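For illustration, the full sequence for one hypothetical physical host "store-01" carrying OSDs 0-3 would look roughly like this (host name and OSD ids are examples, not taken from this cluster, so adapt before use):

  ceph osd crush add-bucket store-01-B host              # second logical host bucket
  ceph osd crush move store-01-B root=default            # hang it under the default root
  ceph osd crush move osd.2 host=store-01-B              # move half the OSDs over
  ceph osd crush move osd.3 host=store-01-B
  ceph config set osd.2 crush_location "host=store-01-B" # persist across OSD restarts
  ceph config set osd.3 crush_location "host=store-01-B"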
[ceph-users] Re: cephadm bootstrap on 3 network clusters
Hi Sebastian, as you say "more than 3 public networks": did you manage to get Ceph daemons listening on multiple public interfaces? I'm looking for such a possibility, as the daemons seem to be bound to one interface only, but I cannot find any how-to.

Thanks

Patrick

Le 03/01/2024 à 21:31, Sebastian a écrit :
Hi,
check the routing table and default gateway and eventually fix it. Use an IP instead of a DNS name.
I have a more complicated situation :D I have more than 3 public networks and cluster networks…
BR,
Sebastian

On Jan 3, 2024, at 16:40, Luis Domingues wrote:
Why? The public network should not have any restrictions between the Ceph nodes. Same with the cluster network.
Internal policies and network rules.
Luis Domingues
Proton AG

On Wednesday, 3 January 2024 at 16:15, Robert Sander wrote:
Hi Luis,
On 1/3/24 16:12, Luis Domingues wrote:
My issue is that mon1 cannot connect via SSH to itself using the pub network, and bootstrap fails at the end when cephadm tries to add mon1 to the list of hosts.
Why? The public network should not have any restrictions between the Ceph nodes. Same with the cluster network.
Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin
https://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
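One avenue I plan to explore (an untested assumption on my side, not something confirmed in this thread): public_network accepts a comma-separated list of subnets, so several public networks can at least be declared; whether each daemon then actually binds to more than one interface is another matter. The subnets below are placeholders:

  ceph config set global public_network "192.168.1.0/24,10.10.0.0/24"
  ceph config set global cluster_network "172.16.0.0/24"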
[ceph-users] Re: EC Profiles & DR
Le 06/12/2023 à 16:21, Frank Schilder a écrit :
Hi, the post linked in the previous message is a good source for different approaches.

To provide some first-hand experience, I was operating a pool with a 6+2 EC profile on 4 hosts for a while (until we got more hosts) and the "subdivide a physical host into 2 crush-buckets" approach is actually working best (I basically tried all the approaches described in the linked post and they all had pitfalls). Procedure is more or less:

- add a second (logical) host bucket for each physical host by suffixing the host name with "-B" (ceph osd crush add-bucket <hostname>-B host)
- move half the OSDs per host to this new host bucket (ceph osd crush move osd.ID host=HOSTNAME-B)
- make this location persist across reboots of the OSDs (ceph config set osd.ID crush_location "host=HOSTNAME-B")

This will allow you to move OSDs back easily when you get more hosts and can afford the recommended 1 shard per host. It will also show which OSDs are moved, and where, with a simple "ceph config dump | grep crush_location". Best of all, you don't have to fiddle around with crush maps and hope they do what you want. Just use failure domain host and you are good. No more than 2 host buckets per physical host means no more than 2 shards per physical host with default placement rules.

I was operating this set-up with min_size=6 and feeling bad about it due to the reduced maintainability (risk of data loss during maintenance). It's not great really, but sometimes there is no way around it. I was happy when I got the extra hosts.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

From: Curt
Sent: Wednesday, December 6, 2023 3:56 PM
To: Patrick Begou
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: EC Profiles & DR

Hi Patrick,
Yes, K and M are chunks, but the default crush map is a chunk per host, which is probably the best way to do it, but I'm no expert. I'm not sure why you would want to do a crush map with 2 chunks per host and min size 4, as it's just asking for trouble at some point, in my opinion. Anyway, take a look at this post if you're interested in doing 2 chunks per host; it will give you an idea of the crushmap setup: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NB3M22GNAC7VNWW7YBVYTH6TBZOYLTWA/
Regards,
Curt

Thanks all for these details, which clarify many things for me.

Rich, yes I'm starting with 5 nodes and 4 HDD/node to set up the first Ceph cluster in the laboratory, and my goal is to increase this cluster (maybe up to 10 nodes) and to add storage in the nodes (up to 12 OSDs per node). It is a starting point for capacity storage connected to my two clusters (400 cores + 256 cores).

Thanks Frank for these details; as a newbie I would never have thought of this strategy. In my mind, this is the best way for starting the first setup and moving to a more standard configuration later. I have the full template now, I just have to dive deeper into the details to build it.

Patrick
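For the archives: the crushmap setup described in the linked thread boils down to a rule of roughly this shape (a sketch reconstructed from the discussion, not copied from the post; the id and name are placeholders). With a 6+2 profile it selects 4 hosts and 2 OSDs on each:

  rule ec62_two_per_host {
      id 2
      type erasure
      step set_chooseleaf_tries 5
      step set_choose_tries 100
      step take default
      step choose indep 4 type host        # pick 4 distinct hosts
      step chooseleaf indep 2 type osd     # 2 OSDs (chunks) on each host
      step emit
  }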
[ceph-users] Re: EC Profiles & DR
Le 06/12/2023 à 00:11, Rich Freeman a écrit :
On Tue, Dec 5, 2023 at 6:35 AM Patrick Begou wrote:
Ok, so I've misunderstood the meaning of failure domain. If there is no way to request using 2 OSDs/node with node as failure domain, then with 5 nodes k=3+m=1 is not secure enough and I will have to use k=2+m=2, so like a RAID1 setup. A little bit better than replication from the point of view of global storage capacity.

I'm not sure what you mean by requesting 2 OSDs/node. If the failure domain is set to the host, then by default k/m refer to hosts, and the PGs will be spread across all OSDs on all hosts, but with any particular PG only being present on one OSD on each host. You can get fancy with device classes and crush rules and such and be more specific with how they're allocated, but that would be the typical behavior.

Since k/m refer to hosts, k+m must be less than or equal to the number of hosts, or you'll have a degraded pool because there won't be enough hosts to allocate them all. It won't ever stack them across multiple OSDs on the same host with that configuration.

k=2, m=2 with min_size=3 would require at least 4 hosts (k+m), and would allow you to operate degraded with a single host down; with two hosts down the PGs would become inactive but would still be recoverable. While strictly speaking only 4 hosts are required, you'd do better to have more than that, since then the cluster can immediately recover from a loss, assuming you have sufficient space. As you say, it is no more space-efficient than RAID1 or size=2, and it suffers write amplification for modifications, but it does allow recovery after the loss of up to two hosts, and you can operate degraded with one host down, which allows for somewhat high availability.

Hi Rich,

My understanding was that k and m were EC chunks, not hosts. Of course, if k and m are hosts, the best choice would be k=2 and m=2.

When Christian wrote:
"For example if you run an EC=4+2 profile on 3 hosts you can structure your crushmap so that you have 2 chunks per host. This means even if one host is down you are still guaranteed to have 4 chunks available."
this is what I had thought before (using 5 nodes instead of 3 as in Christian's example). But it does not match what you explain if k and m are nodes. I'm a little bit confused with crushmap settings.

Patrick
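To pin down the terminology in this exchange: k and m always count chunks; the crush-failure-domain of the EC profile decides what the chunks are spread across, and with failure domain host the default rule places at most one chunk per host. A minimal sketch, with made-up profile and pool names:

  # 4+2 chunks, at most one chunk per host (the usual recommendation)
  ceph osd erasure-code-profile set ec42host k=4 m=2 crush-failure-domain=host
  # 4+2 chunks, spread over OSDs regardless of host
  ceph osd erasure-code-profile set ec42osd k=4 m=2 crush-failure-domain=osd
  # create a pool using the host-based profile
  ceph osd pool create ecpool 32 32 erasure ec42host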
[ceph-users] Re: EC Profiles & DR
Ok, so I've misunderstood the meaning of failure domain. If there is no way to request using 2 OSDs/node with node as failure domain, then with 5 nodes k=3+m=1 is not secure enough and I will have to use k=2+m=2, so like a RAID1 setup. A little bit better than replication from the point of view of global storage capacity.

Patrick

Le 05/12/2023 à 12:19, David C. a écrit :
Hi,
To return to my comparison with SANs: on a SAN you have spare disks to repair a failed disk. On Ceph, you therefore need at least one more host (k+m+1). If we take into consideration the formalities and delivery times of a new server, k+m+2 is not a luxury (depending on the growth of your volume).
Cordialement,
David CASIER

Le mar. 5 déc. 2023 à 11:17, Patrick Begou a écrit :
Hi Robert,
Le 05/12/2023 à 10:05, Robert Sander a écrit :
> On 12/5/23 10:01, duluxoz wrote:
>> Thanks David, I knew I had something wrong :-)
>>
>> Just for my own edification: why is k=2, m=1 not recommended for
>> production? Considered too "fragile", or something else?
>
> It is the same as a replicated pool with size=2. Only one host can go
> down. After that you risk losing data.
>
> Erasure coding is possible with a cluster size of 10 nodes or more.
> With smaller clusters you have to go with replicated pools.
>
Could you explain why 10 nodes are required for EC?
On my side, I'm working on building my first (small) Ceph cluster using E.C. and I was thinking about 5 nodes and k=4 m=2. With a failure domain of host and several OSDs per node, in my mind this setup may run degraded with 3 nodes using 2 distinct OSDs per node, with the ultimate possibility of losing an additional node without losing data. Of course with sufficient free storage available.
Am I totally wrong in my first ceph approach?

Patrick
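As a concrete cross-check of the k=2+m=2 numbers above (pool name hypothetical): min_size for an EC pool defaults to k+1 = 3, so the pool stays active with one host down (3 shards reachable) and goes inactive, but still recoverable, with two hosts down:

  ceph osd pool get ecpool min_size
  ceph osd pool set ecpool min_size 3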
[ceph-users] Re: EC Profiles & DR
Hi Robert,

Le 05/12/2023 à 10:05, Robert Sander a écrit :
On 12/5/23 10:01, duluxoz wrote:
Thanks David, I knew I had something wrong :-)
Just for my own edification: why is k=2, m=1 not recommended for production? Considered too "fragile", or something else?

It is the same as a replicated pool with size=2. Only one host can go down. After that you risk losing data.
Erasure coding is possible with a cluster size of 10 nodes or more. With smaller clusters you have to go with replicated pools.

Could you explain why 10 nodes are required for EC?

On my side, I'm working on building my first (small) Ceph cluster using E.C. and I was thinking about 5 nodes and k=4 m=2. With a failure domain of host and several OSDs per node, in my mind this setup may run degraded with 3 nodes using 2 distinct OSDs per node, with the ultimate possibility of losing an additional node without losing data. Of course with sufficient free storage available.

Am I totally wrong in my first ceph approach?

Patrick
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
Hi all,

First of all, I apologize if I've not done things correctly, but these are some test results.

1) I've compiled the main branch in a fresh podman container (Alma Linux 8) and installed it. Successful!

2) I have made a copy of the /etc/ceph directory of the host (member of the ceph cluster in Pacific 16.2.14) into this container (good or bad idea?)

3) "ceph-volume inventory" works, but with some error messages:

[root@74285dcfa91f etc]# ceph-volume inventory
 stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
 stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
 stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
 stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.
 stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or /sys expected.

Device Path    Size       Device nodes  rotates  available  Model name
/dev/sdc       232.83 GB  sdc           True     True       SAMSUNG HE253GJ
/dev/sda       232.83 GB  sda           True     False      SAMSUNG HE253GJ
/dev/sdb       465.76 GB  sdb           True     False      WDC WD5003ABYX-1

4) ceph version shows:

[root@74285dcfa91f etc]# ceph -v
ceph version 18.0.0-6846-g2706ecac4a9 (2706ecac4a90447420904e42d6e0445134dff2be) reef (dev)

5) lsblk works (container launched with the "--privileged" flag):

[root@74285dcfa91f etc]# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    1 232.9G  0 disk
|-sda1   8:1        3.9G  0 part
|-sda2   8:2    1   3.9G  0 part [SWAP]
`-sda3   8:3    1   225G  0 part
sdb      8:16   1 465.8G  0 disk
sdc      8:32   1 232.9G  0 disk

But some commands do not work (my setup or Ceph?):

[root@74285dcfa91f etc]# ceph orch device zap mostha1.legi.grenoble-inp.fr /dev/sdc --force
Error EINVAL: Device path '/dev/sdc' not found on host 'mostha1.legi.grenoble-inp.fr'
[root@74285dcfa91f etc]#
[root@74285dcfa91f etc]# ceph orch device ls
[root@74285dcfa91f etc]#

Patrick

Le 24/10/2023 à 22:43, Zack Cerza a écrit :
That's correct - it's the removable flag that's causing the disks to be excluded. I actually just merged this PR last week: https://github.com/ceph/ceph/pull/49954
One of the changes it made was to enable removable (but not USB) devices, as there are vendors that report hot-swappable drives as removable. Patrick, it looks like this may resolve your issue as well.

On Tue, Oct 24, 2023 at 5:57 AM Eugen Block wrote:
Hi,
> Maybe because they are hot-swappable hard drives.
yes, that's my assumption as well.

Zitat von Patrick Begou :
Hi Eugen,
Yes Eugen, all the devices /dev/sd[abc] have the removable flag set to 1. Maybe because they are hot-swappable hard drives. I have contacted the commit author Zack Cerza and he asked me for some additional tests too this morning. I add him in copy to this mail.
Patrick

Le 24/10/2023 à 12:57, Eugen Block a écrit :
Hi,
just to confirm, could you check that the disk which is *not* discovered by 16.2.11 has a "removable" flag?
cat /sys/block/sdX/removable
I could reproduce it as well on a test machine with a USB thumb drive (live distro) which is excluded in 16.2.11 but is shown in 16.2.10. Although I'm not a developer, I tried to understand what changes were made in https://github.com/ceph/ceph/pull/46375/files#diff-330f9319b0fe352dff0486f66d3c4d6a6a3d48efd900b2ceb86551cfd88dc4c4R771 and there's this line:
if get_file_contents(os.path.join(_sys_block_path, dev, 'removable')) == "1":
    continue
The thumb drive is removable, of course; apparently that is filtered here.
Regards,
Eugen

Zitat von Patrick Begou :
Le 23/10/2023 à 03:04, 544463...@qq.com a écrit :
I think you can try to roll back this part of the python code and wait for your good news :)
Not so easy:

[root@e9865d9a7f41 ceph]# git revert 4fc6bc394dffaf3ad375ff29cbb0a3eb9e4dbefc
Auto-merging src/ceph-volume/ceph_volume/tests/util/test_device.py
CONFLICT (content): Merge conflict in src/ceph-volume/ceph_volume/tests/util/test_device.py
Auto-merging src/ceph-volume/ceph_volume/util/device.py
CONFLICT (content): Merge conflict in src/ceph-volume/ceph_volume/util/device.py
Auto-merging src/ceph-volume/ceph_volume/util/disk.py
CONFLICT (content): Merge conflict in src/ceph-volume/ceph_volume/util/disk.py
error: could not revert 4fc6bc394df... ceph-volume: Optionally consume loop devices

Patrick
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
Hi,

Running git pull this morning I saw the patch on the main branch and tried to compile it, but it fails with cython on rbd.pyx. I have many similar errors:

rbd.pyx:760:44: Cannot assign type 'int (*)(uint64_t, uint64_t, void *) except? -1' to 'librbd_progress_fn_t'. Exception values are incompatible. Suggest adding 'noexcept' to type 'int (uint64_t, uint64_t, void *) except? -1'.
rbd.pyx:763:23: Cannot assign type 'int (*)(uint64_t, uint64_t, void *) except? -1 nogil' to 'librbd_progress_fn_t'. Exception values are incompatible. Suggest adding 'noexcept' to type 'int (uint64_t, uint64_t, void *) except? -1 nogil'.
rbd.pyx:868:44: Cannot assign type 'int (*)(uint64_t, uint64_t, void *) except? -1' to 'librbd_progress_fn_t'. Exception values are incompatible. Suggest adding 'noexcept' to type 'int (uint64_t, uint64_t, void *) except? -1'.

I don't know cython at all. I've just run:

./install-deps.sh
./do_cmake.sh
cd build
ninja

# gcc --version
gcc (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9)

Any suggestions?

Thanks

Patrick

Le 24/10/2023 à 22:43, Zack Cerza a écrit :
That's correct - it's the removable flag that's causing the disks to be excluded. I actually just merged this PR last week: https://github.com/ceph/ceph/pull/49954
One of the changes it made was to enable removable (but not USB) devices, as there are vendors that report hot-swappable drives as removable. Patrick, it looks like this may resolve your issue as well.

On Tue, Oct 24, 2023 at 5:57 AM Eugen Block wrote:
Hi,
> Maybe because they are hot-swappable hard drives.
yes, that's my assumption as well.

Zitat von Patrick Begou :
Hi Eugen,
Yes Eugen, all the devices /dev/sd[abc] have the removable flag set to 1. Maybe because they are hot-swappable hard drives. I have contacted the commit author Zack Cerza and he asked me for some additional tests too this morning. I add him in copy to this mail.
Patrick

Le 24/10/2023 à 12:57, Eugen Block a écrit :
Hi,
just to confirm, could you check that the disk which is *not* discovered by 16.2.11 has a "removable" flag?
cat /sys/block/sdX/removable
I could reproduce it as well on a test machine with a USB thumb drive (live distro) which is excluded in 16.2.11 but is shown in 16.2.10. Although I'm not a developer, I tried to understand what changes were made in https://github.com/ceph/ceph/pull/46375/files#diff-330f9319b0fe352dff0486f66d3c4d6a6a3d48efd900b2ceb86551cfd88dc4c4R771 and there's this line:
if get_file_contents(os.path.join(_sys_block_path, dev, 'removable')) == "1":
    continue
The thumb drive is removable, of course; apparently that is filtered here.
Regards,
Eugen

Zitat von Patrick Begou :
Le 23/10/2023 à 03:04, 544463...@qq.com a écrit :
I think you can try to roll back this part of the python code and wait for your good news :)
Not so easy:

[root@e9865d9a7f41 ceph]# git revert 4fc6bc394dffaf3ad375ff29cbb0a3eb9e4dbefc
Auto-merging src/ceph-volume/ceph_volume/tests/util/test_device.py
CONFLICT (content): Merge conflict in src/ceph-volume/ceph_volume/tests/util/test_device.py
Auto-merging src/ceph-volume/ceph_volume/util/device.py
CONFLICT (content): Merge conflict in src/ceph-volume/ceph_volume/util/device.py
Auto-merging src/ceph-volume/ceph_volume/util/disk.py
CONFLICT (content): Merge conflict in src/ceph-volume/ceph_volume/util/disk.py
error: could not revert 4fc6bc394df...
ceph-volume: Optionally consume loop devices

Patrick
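In case someone hits the same wall: these "Suggest adding 'noexcept'" errors are what Cython 3.x emits when compiling bindings written for the 0.29 series, so one thing to try (an assumption on my side, not verified against this exact tree) is pinning an older Cython in the build environment before re-running do_cmake.sh:

  python3 -m pip install 'cython<3'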
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
Some tests: if, in Pacific 16.2.14, I disable lines 804 and 805 in /usr/lib/python3.6/site-packages/ceph_volume/util/disk.py:

804     if get_file_contents(os.path.join(_sys_block_path, dev, 'removable')) == "1":
805         continue

the command "ceph-volume inventory" works as in Octopus or as in Pacific < 16.2.11:

[ceph: root@mostha1 /]# ceph-volume inventory

Device Path    Size       Device nodes  rotates  available  Model name
/dev/sdc       232.83 GB  sdc           True     True       SAMSUNG HE253GJ
/dev/sda       232.83 GB  sda           True     False      SAMSUNG HE253GJ
/dev/sdb       465.76 GB  sdb           True     False      WDC WD5003ABYX-1

But:

1) "ceph orch device ls" still returns nothing.

2) I cannot zap the /dev/sdc device:

[ceph: root@mostha1 /]# ceph orch device zap mostha1.legi.grenoble-inp.fr /dev/sdc --force
Error EINVAL: Device path '/dev/sdc' not found on host 'mostha1.legi.grenoble-inp.fr'

3) I cannot manually add the sdc device as an OSD:

[ceph: root@mostha1 /]# ceph orch daemon add osd mostha1.legi.grenoble-inp.fr:/dev/sdc
Created no osd(s) on host mostha1.legi.grenoble-inp.fr; already created?

Even if the device is present and unused:

[ceph: root@mostha1 /]# lsblk
NAME              MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                 8:0    1 232.9G  0 disk
|-sda1              8:1    1   3.9G  0 part /rootfs/boot
|-sda2              8:2    1   3.9G  0 part [SWAP]
`-sda3              8:3    1   225G  0 part
  |-al8vg-rootvol 253:0    0  48.8G  0 lvm  /rootfs
  |-al8vg-homevol 253:2    0   9.8G  0 lvm  /rootfs/home
  |-al8vg-tmpvol  253:3    0   9.8G  0 lvm  /rootfs/tmp
  `-al8vg-varvol  253:4    0  79.8G  0 lvm  /rootfs/var
sdb                 8:16   1 465.8G  0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 253:1 0 465.8G 0 lvm
sdc                 8:32   1 232.9G  0 disk

Patrick

Le 24/10/2023 à 13:38, Patrick Begou a écrit :
Hi Eugen,
Yes Eugen, all the devices /dev/sd[abc] have the removable flag set to 1. Maybe because they are hot-swappable hard drives. I have contacted the commit author Zack Cerza and he asked me for some additional tests too this morning. I add him in copy to this mail.
Patrick

Le 24/10/2023 à 12:57, Eugen Block a écrit :
Hi,
just to confirm, could you check that the disk which is *not* discovered by 16.2.11 has a "removable" flag?
cat /sys/block/sdX/removable
I could reproduce it as well on a test machine with a USB thumb drive (live distro) which is excluded in 16.2.11 but is shown in 16.2.10. Although I'm not a developer, I tried to understand what changes were made in https://github.com/ceph/ceph/pull/46375/files#diff-330f9319b0fe352dff0486f66d3c4d6a6a3d48efd900b2ceb86551cfd88dc4c4R771 and there's this line:
if get_file_contents(os.path.join(_sys_block_path, dev, 'removable')) == "1":
    continue
The thumb drive is removable, of course; apparently that is filtered here.
Regards,
Eugen

Zitat von Patrick Begou :
Le 23/10/2023 à 03:04, 544463...@qq.com a écrit :
I think you can try to roll back this part of the python code and wait for your good news :)
Not so easy:

[root@e9865d9a7f41 ceph]# git revert 4fc6bc394dffaf3ad375ff29cbb0a3eb9e4dbefc
Auto-merging src/ceph-volume/ceph_volume/tests/util/test_device.py
CONFLICT (content): Merge conflict in src/ceph-volume/ceph_volume/tests/util/test_device.py
Auto-merging src/ceph-volume/ceph_volume/util/device.py
CONFLICT (content): Merge conflict in src/ceph-volume/ceph_volume/util/device.py
Auto-merging src/ceph-volume/ceph_volume/util/disk.py
CONFLICT (content): Merge conflict in src/ceph-volume/ceph_volume/util/disk.py
error: could not revert 4fc6bc394df...
ceph-volume: Optionally consume loop devices

Patrick
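For completeness, a workaround worth trying while the orchestrator refuses to see the device (untested on this cluster, so treat it as a sketch): bypass the orchestrator and let ceph-volume zap the device directly on the host, then retry the orch commands:

  cephadm ceph-volume -- lvm zap /dev/sdc --destroy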
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
Hi Eugen,

Yes Eugen, all the devices /dev/sd[abc] have the removable flag set to 1. Maybe because they are hot-swappable hard drives. I have contacted the commit author Zack Cerza and he asked me for some additional tests too this morning. I add him in copy to this mail.

Patrick

Le 24/10/2023 à 12:57, Eugen Block a écrit :
Hi,
just to confirm, could you check that the disk which is *not* discovered by 16.2.11 has a "removable" flag?
cat /sys/block/sdX/removable
I could reproduce it as well on a test machine with a USB thumb drive (live distro) which is excluded in 16.2.11 but is shown in 16.2.10. Although I'm not a developer, I tried to understand what changes were made in https://github.com/ceph/ceph/pull/46375/files#diff-330f9319b0fe352dff0486f66d3c4d6a6a3d48efd900b2ceb86551cfd88dc4c4R771 and there's this line:
if get_file_contents(os.path.join(_sys_block_path, dev, 'removable')) == "1":
    continue
The thumb drive is removable, of course; apparently that is filtered here.
Regards,
Eugen

Zitat von Patrick Begou :
Le 23/10/2023 à 03:04, 544463...@qq.com a écrit :
I think you can try to roll back this part of the python code and wait for your good news :)
Not so easy:

[root@e9865d9a7f41 ceph]# git revert 4fc6bc394dffaf3ad375ff29cbb0a3eb9e4dbefc
Auto-merging src/ceph-volume/ceph_volume/tests/util/test_device.py
CONFLICT (content): Merge conflict in src/ceph-volume/ceph_volume/tests/util/test_device.py
Auto-merging src/ceph-volume/ceph_volume/util/device.py
CONFLICT (content): Merge conflict in src/ceph-volume/ceph_volume/util/device.py
Auto-merging src/ceph-volume/ceph_volume/util/disk.py
CONFLICT (content): Merge conflict in src/ceph-volume/ceph_volume/util/disk.py
error: could not revert 4fc6bc394df... ceph-volume: Optionally consume loop devices

Patrick
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
Le 23/10/2023 à 03:04, 544463...@qq.com a écrit :
I think you can try to roll back this part of the python code and wait for your good news :)

Not so easy:

[root@e9865d9a7f41 ceph]# git revert 4fc6bc394dffaf3ad375ff29cbb0a3eb9e4dbefc
Auto-merging src/ceph-volume/ceph_volume/tests/util/test_device.py
CONFLICT (content): Merge conflict in src/ceph-volume/ceph_volume/tests/util/test_device.py
Auto-merging src/ceph-volume/ceph_volume/util/device.py
CONFLICT (content): Merge conflict in src/ceph-volume/ceph_volume/util/device.py
Auto-merging src/ceph-volume/ceph_volume/util/disk.py
CONFLICT (content): Merge conflict in src/ceph-volume/ceph_volume/util/disk.py
error: could not revert 4fc6bc394df... ceph-volume: Optionally consume loop devices

Patrick
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
Hi all,

ending with git bisect just now shows:

4fc6bc394dffaf3ad375ff29cbb0a3eb9e4dbefc is the first bad commit
commit 4fc6bc394dffaf3ad375ff29cbb0a3eb9e4dbefc
Author: Zack Cerza
Date: Tue May 17 11:29:02 2022 -0600

    ceph-volume: Optionally consume loop devices

    A similar proposal was rejected in #24765; I understand the logic behind the rejection, but this will allow us to run Ceph clusters on machines that lack disk resources for testing purposes. We just need to make it impossible to accidentally enable, and make it clear it is unsupported.

    Signed-off-by: Zack Cerza
    (cherry picked from commit c7f017b21ade3762ba5b7b9688bed72c6b60dc0e)

 .../ceph_volume/tests/util/test_device.py  | 17 +++
 src/ceph-volume/ceph_volume/util/device.py | 14 +++--
 src/ceph-volume/ceph_volume/util/disk.py   | 59 ++
 3 files changed, 78 insertions(+), 12 deletions(-)

I will try to investigate next week, but if some Ceph expert developers can have a look at this commit ;-)

Have a nice week-end

Patrick

Le 18/10/2023 à 13:48, Patrick Begou a écrit :
Hi all,
I'm trying to catch the faulty commit. I'm able to build Ceph from the git repo in a fresh podman container, but at this time the lsblk command returns nothing in my container. In ceph containers lsblk works. So something is wrong with launching my podman container (or different from launching ceph containers) and I cannot find what.
Any help about this step?
Thanks
Patrick

Le 13/10/2023 à 09:18, Eugen Block a écrit :
Trying to resend with the attachment. I can't really find anything suspicious, ceph-volume (16.2.11) does recognize /dev/sdc though:

[2023-10-12 08:58:14,135][ceph_volume.process][INFO ] stdout NAME="sdc" KNAME="sdc" PKNAME="" MAJ:MIN="8:32" FSTYPE="" MOUNTPOINT="" LABEL="" UUID="" RO="0" RM="1" MODEL="SAMSUNG HE253GJ " SIZE="232.9G" STATE="running" OWNER="root" GROUP="disk" MODE="brw-rw" ALIGNMENT="0" PHY-SEC="512" LOG-SEC="512" ROTA="1" SCHED="mq-deadline" TYPE="disk" DISC-ALN="0" DISC-GRAN="0B" DISC-MAX="0B" DISC-ZERO="0" PKNAME="" PARTLABEL=""
[2023-10-12 08:58:14,139][ceph_volume.util.system][INFO ] Executable pvs found on the host, will use /sbin/pvs
[2023-10-12 08:58:14,140][ceph_volume.process][INFO ] Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/pvs --noheadings --readonly --units=b --nosuffix --separator=";" -o pv_name,vg_name,pv_count,lv_count,vg_attr,vg_extent_count,vg_free_count,vg_extent_size

But apparently it just stops after that. I already tried to find a debug log-level for ceph-volume but it's not applicable to all subcommands. The cephadm.log also just stops without even finishing the "copying blob", which makes me wonder if it actually pulls the entire image? I assume you have enough free disk space (otherwise I would expect a message "failed to pull target image"); do you see any other warnings in syslog or something? Or are the logs incomplete? Maybe someone else finds any clues in the logs...
Regards,
Eugen

Zitat von Patrick Begou :
Hi Eugen,
You will find in attachment cephadm.log and ceph-volume.log. Each contains the outputs for the 2 versions. Either v16.2.10-20220920 is really more verbose, or v16.2.11-20230125 does not execute the whole detection process.
Patrick

Le 12/10/2023 à 09:34, Eugen Block a écrit :
Good catch, and I found the thread I had in my mind, it was this exact one. :-D Anyway, can you share the ceph-volume.log from the working and the not working attempt?
I tried to look for something significant in the pacific release notes for 16.2.11, and there were some changes to ceph-volume, but I'm not sure what it could be.

Zitat von Patrick Begou :
I've run additional tests with Pacific releases, and with "ceph-volume inventory" things went wrong with the first v16.2.11 release (v16.2.11-20230125).

=== Ceph v16.2.10-20220920 ===
Device Path    Size       rotates  available  Model name
/dev/sdc       232.83 GB  True     True       SAMSUNG HE253GJ
/dev/sda       232.83 GB  True     False      SAMSUNG HE253GJ
/dev/sdb       465.76 GB  True     False      WDC WD5003ABYX-1

=== Ceph v16.2.11-20230125 ===
Device Path    Size       Device nodes  rotates  available  Model name

May be this could help to see what has changed?
Patrick

Le 11/10/2023 à 17:38, Eugen Block a écrit :
That's really strange. Just out of curiosity, have you tried Quincy (
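For readers wanting to reproduce this kind of result, the bisect run looks roughly like the following (a sketch; the good/bad tags match the versions tested earlier in this thread):

  git bisect start
  git bisect bad v16.2.11
  git bisect good v16.2.10
  # at each step: build, run "ceph-volume inventory", then mark
  git bisect good    # or: git bisect bad
  git bisect reset   # when finished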
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
Hi all,

I'm trying to catch the faulty commit. I'm able to build Ceph from the git repo in a fresh podman container, but at this time the lsblk command returns nothing in my container. In ceph containers lsblk works. So something is wrong with launching my podman container (or different from launching ceph containers) and I cannot find what.

Any help about this step?

Thanks

Patrick

Le 13/10/2023 à 09:18, Eugen Block a écrit :
Trying to resend with the attachment. I can't really find anything suspicious, ceph-volume (16.2.11) does recognize /dev/sdc though:

[2023-10-12 08:58:14,135][ceph_volume.process][INFO ] stdout NAME="sdc" KNAME="sdc" PKNAME="" MAJ:MIN="8:32" FSTYPE="" MOUNTPOINT="" LABEL="" UUID="" RO="0" RM="1" MODEL="SAMSUNG HE253GJ " SIZE="232.9G" STATE="running" OWNER="root" GROUP="disk" MODE="brw-rw" ALIGNMENT="0" PHY-SEC="512" LOG-SEC="512" ROTA="1" SCHED="mq-deadline" TYPE="disk" DISC-ALN="0" DISC-GRAN="0B" DISC-MAX="0B" DISC-ZERO="0" PKNAME="" PARTLABEL=""
[2023-10-12 08:58:14,139][ceph_volume.util.system][INFO ] Executable pvs found on the host, will use /sbin/pvs
[2023-10-12 08:58:14,140][ceph_volume.process][INFO ] Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/pvs --noheadings --readonly --units=b --nosuffix --separator=";" -o pv_name,vg_name,pv_count,lv_count,vg_attr,vg_extent_count,vg_free_count,vg_extent_size

But apparently it just stops after that. I already tried to find a debug log-level for ceph-volume but it's not applicable to all subcommands. The cephadm.log also just stops without even finishing the "copying blob", which makes me wonder if it actually pulls the entire image? I assume you have enough free disk space (otherwise I would expect a message "failed to pull target image"); do you see any other warnings in syslog or something? Or are the logs incomplete? Maybe someone else finds any clues in the logs...
Regards,
Eugen

Zitat von Patrick Begou :
Hi Eugen,
You will find in attachment cephadm.log and ceph-volume.log. Each contains the outputs for the 2 versions. Either v16.2.10-20220920 is really more verbose, or v16.2.11-20230125 does not execute the whole detection process.
Patrick

Le 12/10/2023 à 09:34, Eugen Block a écrit :
Good catch, and I found the thread I had in my mind, it was this exact one. :-D Anyway, can you share the ceph-volume.log from the working and the not working attempt?
Zitat von Patrick Begou : Hi Eugen, [root@mostha1 ~]# rpm -q cephadm cephadm-16.2.14-0.el8.noarch Log associated to the 2023-10-11 16:16:02,167 7f820515fb80 DEBUG cephadm ['gather-facts'] 2023-10-11 16:16:02,208 7f820515fb80 DEBUG /bin/podman: 4.4.1 2023-10-11 16:16:02,313 7f820515fb80 DEBUG sestatus: SELinux status: disabled 2023-10-11 16:16:02,317 7f820515fb80 DEBUG sestatus: SELinux status: disabled 2023-10-11 16:16:02,322 7f820515fb80 DEBUG sestatus: SELinux status: disabled 2023-10-11 16:16:02,326 7f820515fb80 DEBUG sestatus: SELinux status: disabled 2023-10-11 16:16:02,329 7f820515fb80 DEBUG sestatus: SELinux status: disabled 2023-10-11 16:16:02,333 7f820515fb80 DEBUG sestatus: SELinux status: disabled 2023-10-11 16:16:0
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
Hi Johan,

So it is not O.S. related, as you are running Debian and I am running Alma Linux. But I'm surprised that so few people hit this bug.

Patrick

Le 13/10/2023 à 17:38, Johan a écrit :
At home I'm running a small cluster, Ceph v17.2.6, Debian 11 Bullseye. I have recently added a new server to the cluster but face the same problem as Patrick: I can't add any HDD, Ceph doesn't recognise them. I have run the same tests as Patrick, using Ceph v14-v18, and as Patrick showed, the problem appears in Ceph v16.2.11-20230125.

=== Ceph v16.2.10-20220920 ===
$ sudo cephadm --image quay.io/ceph/ceph:v16.2.10-20220920 ceph-volume inventory
Inferring fsid 5592891c-30e4-11ed-b720-f02f741f58ac

Device Path    Size       rotates  available  Model name
/dev/nvme0n1   931.51 GB  False    False      KINGSTON SNV2S1000G
/dev/nvme1n1   931.51 GB  False    False      KINGSTON SNV2S1000G
/dev/sda       3.64 TB    True     False      WDC WD4003FFBX-6
/dev/sdb       5.46 TB    True     False      WDC WD6003FFBX-6
/dev/sdc       7.28 TB    True     False      ST8000NE001-2M71
/dev/sdd       7.28 TB    True     False      WDC WD8003FFBX-6

=== Ceph v16.2.11-20230125 ===
$ sudo cephadm --image quay.io/ceph/ceph:v16.2.11-20230125 ceph-volume inventory
Inferring fsid 5592891c-30e4-11ed-b720-f02f741f58ac

Device Path    Size       Device nodes         rotates  available  Model name
/dev/md0       9.30 GB    nvme1n1p2,nvme0n1p2  False    False
/dev/md1       59.57 GB   nvme0n1p3,nvme1n1p3  False    False
/dev/md2       279.27 GB  nvme1n1p4,nvme0n1p4  False    False
/dev/nvme0n1   931.51 GB  nvme0n1              False    False      KINGSTON SNV2S1000G
/dev/nvme1n1   931.51 GB  nvme1n1              False    False      KINGSTON SNV2S1000G

/Johan
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
The server has enough available storage:

[root@mostha1 log]# df -h
Sys. de fichiers           Taille Utilisé Dispo Uti% Monté sur
devtmpfs                      24G       0   24G   0% /dev
tmpfs                         24G     84K   24G   1% /dev/shm
tmpfs                         24G    195M   24G   1% /run
tmpfs                         24G       0   24G   0% /sys/fs/cgroup
/dev/mapper/al8vg-rootvol     49G    6,5G   43G  14% /
/dev/sda1                    3,8G    412M  3,2G  12% /boot
/dev/mapper/al8vg-varvol      20G    9,7G   11G  49% /var
/dev/mapper/al8vg-tmpvol     9,8G    103M  9,7G   2% /tmp
/dev/mapper/al8vg-homevol    9,8G    103M  9,7G   2% /home
tmpfs                        4,7G       0  4,7G   0% /run/user/0
overlay                       20G    9,7G   11G  49% /var/lib/containers/storage/overlay/b8769720357497ebdbf68768753da154b3d63cfbef254036441af60a91649127/merged
overlay                       20G    9,7G   11G  49% /var/lib/containers/storage/overlay/2eed15daec130da50530621740025655ecd961e1b1855f35922f03561960d999/merged
overlay                       20G    9,7G   11G  49% /var/lib/containers/storage/overlay/4d0b4f0b4063cce3f983beda80bac78dd3b5f30379d2eb96daefef8ddfaf/merged
overlay                       20G    9,7G   11G  49% /var/lib/containers/storage/overlay/129c5d3e070f80f17a79c1f172b60c2fc0f30a84b51b07ea207dc5868cd1d7f0/merged
overlay                       20G    9,7G   11G  49% /var/lib/containers/storage/overlay/c41d6bdaf941d16fd80326ef5dae6a02524d3f41bcb64cb29bda2bd5816fee9a/merged
overlay                       20G    9,7G   11G  49% /var/lib/containers/storage/overlay/1b6c1c893e7ed2c128378bdf2af408f3a834f3453a0505ac042099d6f484dc9b/merged
overlay                       20G    9,7G   11G  49% /var/lib/containers/storage/overlay/962e5c1380a60e9a54ac29eccb71667f13a5f9047b2ee98e6303a5fea613162f/merged
overlay                       20G    9,7G   11G  49% /var/lib/containers/storage/overlay/3578d0f5a70afce839017dec888908dead82fb50f90834e5b040e9fd2ada9fba/merged
overlay                       20G    9,7G   11G  49% /var/lib/containers/storage/overlay/7d9c35751388325c3da54f03981770aa49599a657c2dfe3ba9527884864f177d/merged

When I was testing different versions, I removed the tested images each time with "podman rmi":

for i in v16.2.10-20220920 v16.2.11-20230125 v16.2.11-20230209 v16.2.11-20230316; do
  echo "=== Ceph $i ==="
  cephadm --image quay.io/ceph/ceph:$i ceph-volume inventory
  id=$(podman images | grep " $i " | cut -c 46-59)
  podman rmi $id
done | tee trace.ceph16.2.txt

I do not know how to investigate; maybe with a "git bisect" between the 2 releases to catch the faulty commit in a podman container context. I'm not so familiar with containers and ceph.

Patrick

Le 13/10/2023 à 09:18, Eugen Block a écrit :
Trying to resend with the attachment. I can't really find anything suspicious, ceph-volume (16.2.11) does recognize /dev/sdc though:

[2023-10-12 08:58:14,135][ceph_volume.process][INFO ] stdout NAME="sdc" KNAME="sdc" PKNAME="" MAJ:MIN="8:32" FSTYPE="" MOUNTPOINT="" LABEL="" UUID="" RO="0" RM="1" MODEL="SAMSUNG HE253GJ " SIZE="232.9G" STATE="running" OWNER="root" GROUP="disk" MODE="brw-rw" ALIGNMENT="0" PHY-SEC="512" LOG-SEC="512" ROTA="1" SCHED="mq-deadline" TYPE="disk" DISC-ALN="0" DISC-GRAN="0B" DISC-MAX="0B" DISC-ZERO="0" PKNAME="" PARTLABEL=""
[2023-10-12 08:58:14,139][ceph_volume.util.system][INFO ] Executable pvs found on the host, will use /sbin/pvs
[2023-10-12 08:58:14,140][ceph_volume.process][INFO ] Running command: nsenter --mount=/rootfs/proc/1/ns/mnt --ipc=/rootfs/proc/1/ns/ipc --net=/rootfs/proc/1/ns/net --uts=/rootfs/proc/1/ns/uts /sbin/pvs --noheadings --readonly --units=b --nosuffix --separator=";" -o pv_name,vg_name,pv_count,lv_count,vg_attr,vg_extent_count,vg_free_count,vg_extent_size

But apparently it just stops after that. I already tried to find a debug log-level for ceph-volume but it's not applicable to all subcommands.
The cephadm.log also just stops without even finishing the "copying blob", which makes me wonder if it actually pulls the entire image? I assume you have enough free disk space (otherwise I would expect a message "failed to pull target image"); do you see any other warnings in syslog or something? Or are the logs incomplete? Maybe someone else finds any clues in the logs...
Regards,
Eugen

Zitat von Patrick Begou :
Hi Eugen,
You will find in attachment cephadm.log and ceph-volume.log. Each contains the outputs for the 2 versions.
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
Hi Eugen,

You will find in attachment cephadm.log and ceph-volume.log. Each contains the outputs for the 2 versions. Either v16.2.10-20220920 is really more verbose, or v16.2.11-20230125 does not execute the whole detection process.

Patrick

Le 12/10/2023 à 09:34, Eugen Block a écrit :
Good catch, and I found the thread I had in my mind, it was this exact one. :-D Anyway, can you share the ceph-volume.log from the working and the not working attempt? I tried to look for something significant in the pacific release notes for 16.2.11, and there were some changes to ceph-volume, but I'm not sure what it could be.

Zitat von Patrick Begou :
I've run additional tests with Pacific releases, and with "ceph-volume inventory" things went wrong with the first v16.2.11 release (v16.2.11-20230125).

=== Ceph v16.2.10-20220920 ===
Device Path    Size       rotates  available  Model name
/dev/sdc       232.83 GB  True     True       SAMSUNG HE253GJ
/dev/sda       232.83 GB  True     False      SAMSUNG HE253GJ
/dev/sdb       465.76 GB  True     False      WDC WD5003ABYX-1

=== Ceph v16.2.11-20230125 ===
Device Path    Size       Device nodes  rotates  available  Model name

May be this could help to see what has changed?
Patrick

Le 11/10/2023 à 17:38, Eugen Block a écrit :
That's really strange. Just out of curiosity, have you tried Quincy (and/or Reef) as well? I don't recall what inventory does in the background exactly, I believe Adam King mentioned that in some thread, maybe that can help here. I'll search for that thread tomorrow.

Zitat von Patrick Begou :
Hi Eugen,

[root@mostha1 ~]# rpm -q cephadm
cephadm-16.2.14-0.el8.noarch

Log associated to the

2023-10-11 16:16:02,167 7f820515fb80 DEBUG cephadm ['gather-facts']
2023-10-11 16:16:02,208 7f820515fb80 DEBUG /bin/podman: 4.4.1
2023-10-11 16:16:02,313 7f820515fb80 DEBUG sestatus: SELinux status: disabled
2023-10-11 16:16:02,317 7f820515fb80 DEBUG sestatus: SELinux status: disabled
2023-10-11 16:16:02,322 7f820515fb80 DEBUG sestatus: SELinux status: disabled
2023-10-11 16:16:02,326 7f820515fb80 DEBUG sestatus: SELinux status: disabled
2023-10-11 16:16:02,329 7f820515fb80 DEBUG sestatus: SELinux status: disabled
2023-10-11 16:16:02,333 7f820515fb80 DEBUG sestatus: SELinux status: disabled
2023-10-11 16:16:04,474 7ff2a5c08b80 DEBUG cephadm ['ceph-volume', 'inventory']
2023-10-11 16:16:04,516 7ff2a5c08b80 DEBUG /usr/bin/podman: 4.4.1
2023-10-11 16:16:04,520 7ff2a5c08b80 DEBUG Using default config: /etc/ceph/ceph.conf
2023-10-11 16:16:04,573 7ff2a5c08b80 DEBUG /usr/bin/podman: 0d28d71358d7,445.8MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 2084faaf4d54,13.27MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 61073c53805d,512.7MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 6b9f0b72d668,361.1MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 7493a28808ad,163.7MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: a89672a3accf,59.22MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: b45271cc9726,54.24MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: e00ec13ab138,707.3MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: fcb1e1a6b08d,35.55MB / 50.32GB
2023-10-11 16:16:04,630 7ff2a5c08b80 DEBUG /usr/bin/podman: 0d28d71358d7,1.28%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 2084faaf4d54,0.00%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 61073c53805d,1.19%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 6b9f0b72d668,1.03%
2023-10-11
16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 7493a28808ad,0.78%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: a89672a3accf,0.11%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: b45271cc9726,1.35%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: e00ec13ab138,0.43%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: fcb1e1a6b08d,0.02%
2023-10-11 16:16:04,634 7ff2a5c08b80 INFO Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
2023-10-11 16:16:04,691 7ff2a5c08b80 DEBUG /usr/bin/podman: quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
I've run additional tests with Pacific releases, and with "ceph-volume inventory" things went wrong with the first v16.2.11 release (v16.2.11-20230125).

=== Ceph v16.2.10-20220920 ===
Device Path    Size       rotates  available  Model name
/dev/sdc       232.83 GB  True     True       SAMSUNG HE253GJ
/dev/sda       232.83 GB  True     False      SAMSUNG HE253GJ
/dev/sdb       465.76 GB  True     False      WDC WD5003ABYX-1

=== Ceph v16.2.11-20230125 ===
Device Path    Size       Device nodes  rotates  available  Model name

May be this could help to see what has changed?

Patrick

Le 11/10/2023 à 17:38, Eugen Block a écrit :
That's really strange. Just out of curiosity, have you tried Quincy (and/or Reef) as well? I don't recall what inventory does in the background exactly, I believe Adam King mentioned that in some thread, maybe that can help here. I'll search for that thread tomorrow.

Zitat von Patrick Begou :
Hi Eugen,

[root@mostha1 ~]# rpm -q cephadm
cephadm-16.2.14-0.el8.noarch

Log associated to the

2023-10-11 16:16:02,167 7f820515fb80 DEBUG cephadm ['gather-facts']
2023-10-11 16:16:02,208 7f820515fb80 DEBUG /bin/podman: 4.4.1
2023-10-11 16:16:02,313 7f820515fb80 DEBUG sestatus: SELinux status: disabled
2023-10-11 16:16:02,317 7f820515fb80 DEBUG sestatus: SELinux status: disabled
2023-10-11 16:16:02,322 7f820515fb80 DEBUG sestatus: SELinux status: disabled
2023-10-11 16:16:02,326 7f820515fb80 DEBUG sestatus: SELinux status: disabled
2023-10-11 16:16:02,329 7f820515fb80 DEBUG sestatus: SELinux status: disabled
2023-10-11 16:16:02,333 7f820515fb80 DEBUG sestatus: SELinux status: disabled
2023-10-11 16:16:04,474 7ff2a5c08b80 DEBUG cephadm ['ceph-volume', 'inventory']
2023-10-11 16:16:04,516 7ff2a5c08b80 DEBUG /usr/bin/podman: 4.4.1
2023-10-11 16:16:04,520 7ff2a5c08b80 DEBUG Using default config: /etc/ceph/ceph.conf
2023-10-11 16:16:04,573 7ff2a5c08b80 DEBUG /usr/bin/podman: 0d28d71358d7,445.8MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 2084faaf4d54,13.27MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 61073c53805d,512.7MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 6b9f0b72d668,361.1MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 7493a28808ad,163.7MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: a89672a3accf,59.22MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: b45271cc9726,54.24MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: e00ec13ab138,707.3MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: fcb1e1a6b08d,35.55MB / 50.32GB
2023-10-11 16:16:04,630 7ff2a5c08b80 DEBUG /usr/bin/podman: 0d28d71358d7,1.28%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 2084faaf4d54,0.00%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 61073c53805d,1.19%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 6b9f0b72d668,1.03%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 7493a28808ad,0.78%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: a89672a3accf,0.11%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: b45271cc9726,1.35%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: e00ec13ab138,0.43%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: fcb1e1a6b08d,0.02%
2023-10-11 16:16:04,634 7ff2a5c08b80 INFO Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
2023-10-11 16:16:04,691 7ff2a5c08b80 DEBUG /usr/bin/podman: quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
2023-10-11 16:16:04,692 7ff2a5c08b80 DEBUG /usr/bin/podman: quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca
2023-10-11 16:16:04,692 7ff2a5c08b80 DEBUG /usr/bin/podman: docker.io/ceph/ceph@sha256:056637972a107df4096f10951e4216b21fcd8ae0b9fb4552e628d35df3f61139
2023-10-11 16:16:04,694 7ff2a5c08b80 INFO Using recent ceph image quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
2023-10-11 16:16:05,094 7ff2a5c08b80 DEBUG stat: 167 167
2023-10-11 16:16:05,903 7ff2a5c08b80 DEBUG Acquiring lock 140679815723776 on /run/cephadm/250f9864-0142-11ee-8e5f-00266cf8869c.lock
2023-10-11 16:16:05,903 7ff2a5c08b80 DEBUG Lock 140679815723776 acquired on /run/cephadm/250f9864-0142-11ee-8e5f-00266cf8869c.lock
2023-10-11 16:16:05,929 7ff2a5c08b80 DEBUG sestatus: SELinux status: disabled
2023-10-11 16:16:05,933 7ff2a5c08b80 DEBU
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
This afternoon I had a look at the python file, but I do not manage to see how it works with containers, as I am only a Fortran HPC programmer... but I found that "cephadm gather-facts" shows all the HDDs in Pacific.

Some quick tests show:

== Nautilus ==
[root@mostha1 ~]# cephadm --image quay.io/ceph/ceph:v14 ceph-volume inventory
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c

Device Path    Size       rotates  available  Model name
/dev/sdc       232.83 GB  True     True       SAMSUNG HE253GJ
/dev/sda       232.83 GB  True     False      SAMSUNG HE253GJ
/dev/sdb       465.76 GB  True     False      WDC WD5003ABYX-1

== Octopus ==
[root@mostha1 ~]# cephadm --image quay.io/ceph/ceph:v15 ceph-volume inventory
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c

Device Path    Size       rotates  available  Model name
/dev/sdc       232.83 GB  True     True       SAMSUNG HE253GJ
/dev/sda       232.83 GB  True     False      SAMSUNG HE253GJ
/dev/sdb       465.76 GB  True     False      WDC WD5003ABYX-1

== Pacific ==
[root@mostha1 ~]# cephadm --image quay.io/ceph/ceph:v16 ceph-volume inventory
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c

Device Path    Size       Device nodes  rotates  available  Model name

== Quincy ==
[root@mostha1 ~]# cephadm --image quay.io/ceph/ceph:v17 ceph-volume inventory
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c

Device Path    Size       Device nodes  rotates  available  Model name

== Reef ==
[root@mostha1 ~]# cephadm --image quay.io/ceph/ceph:v18 ceph-volume inventory
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c

Device Path    Size       Device nodes  rotates  available  Model name

Could it be related to deprecated hardware support in Ceph with SATA drives?

Patrick

Le 11/10/2023 à 17:38, Eugen Block a écrit :
That's really strange. Just out of curiosity, have you tried Quincy (and/or Reef) as well? I don't recall what inventory does in the background exactly, I believe Adam King mentioned that in some thread, maybe that can help here. I'll search for that thread tomorrow.
Zitat von Patrick Begou :
Hi Eugen,

[root@mostha1 ~]# rpm -q cephadm
cephadm-16.2.14-0.el8.noarch

Log associated to the

2023-10-11 16:16:02,167 7f820515fb80 DEBUG cephadm ['gather-facts']
2023-10-11 16:16:02,208 7f820515fb80 DEBUG /bin/podman: 4.4.1
2023-10-11 16:16:02,313 7f820515fb80 DEBUG sestatus: SELinux status: disabled
2023-10-11 16:16:02,317 7f820515fb80 DEBUG sestatus: SELinux status: disabled
2023-10-11 16:16:02,322 7f820515fb80 DEBUG sestatus: SELinux status: disabled
2023-10-11 16:16:02,326 7f820515fb80 DEBUG sestatus: SELinux status: disabled
2023-10-11 16:16:02,329 7f820515fb80 DEBUG sestatus: SELinux status: disabled
2023-10-11 16:16:02,333 7f820515fb80 DEBUG sestatus: SELinux status: disabled
2023-10-11 16:16:04,474 7ff2a5c08b80 DEBUG cephadm ['ceph-volume', 'inventory']
2023-10-11 16:16:04,516 7ff2a5c08b80 DEBUG /usr/bin/podman: 4.4.1
2023-10-11 16:16:04,520 7ff2a5c08b80 DEBUG Using default config: /etc/ceph/ceph.conf
2023-10-11 16:16:04,573 7ff2a5c08b80 DEBUG /usr/bin/podman: 0d28d71358d7,445.8MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 2084faaf4d54,13.27MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 61073c53805d,512.7MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 6b9f0b72d668,361.1MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: 7493a28808ad,163.7MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: a89672a3accf,59.22MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: b45271cc9726,54.24MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: e00ec13ab138,707.3MB / 50.32GB
2023-10-11 16:16:04,574 7ff2a5c08b80 DEBUG /usr/bin/podman: fcb1e1a6b08d,35.55MB / 50.32GB
2023-10-11 16:16:04,630 7ff2a5c08b80 DEBUG /usr/bin/podman: 0d28d71358d7,1.28%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 2084faaf4d54,0.00%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 61073c53805d,1.19%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 6b9f0b72d668,1.03%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: 7493a28808ad,0.78%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: a89672a3accf,0.11%
2023-10-11 16:16:04,631 7ff2a5c08b80 DEBUG /usr/bin/podman: b45271cc9726,1.35%
2023-10-11 16:16:04,631 7f
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
467cf31b80 DEBUG Using default config: /etc/ceph/ceph.conf 2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman: 0d28d71358d7,452.1MB / 50.32GB 2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman: 2084faaf4d54,13.27MB / 50.32GB 2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman: 61073c53805d,513.6MB / 50.32GB 2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman: 6b9f0b72d668,322.4MB / 50.32GB 2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman: 7493a28808ad,164MB / 50.32GB 2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman: a89672a3accf,58.5MB / 50.32GB 2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman: b45271cc9726,54.69MB / 50.32GB 2023-10-11 16:21:36,067 7f467cf31b80 DEBUG /usr/bin/podman: e00ec13ab138,707.1MB / 50.32GB 2023-10-11 16:21:36,068 7f467cf31b80 DEBUG /usr/bin/podman: fcb1e1a6b08d,36.28MB / 50.32GB 2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman: 0d28d71358d7,1.27% 2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman: 2084faaf4d54,0.00% 2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman: 61073c53805d,1.16% 2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman: 6b9f0b72d668,1.02% 2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman: 7493a28808ad,0.78% 2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman: a89672a3accf,0.11% 2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman: b45271cc9726,1.35% 2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman: e00ec13ab138,0.41% 2023-10-11 16:21:36,125 7f467cf31b80 DEBUG /usr/bin/podman: fcb1e1a6b08d,0.02% 2023-10-11 16:21:36,128 7f467cf31b80 INFO Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c 2023-10-11 16:21:36,186 7f467cf31b80 DEBUG /usr/bin/podman: quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e 2023-10-11 16:21:36,187 7f467cf31b80 DEBUG /usr/bin/podman: quay.io/ceph/ceph@sha256:c08064dde4bba4e72a1f55d90ca32df9ef5aafab82efe2e0a0722444a5aaacca 2023-10-11 16:21:36,187 7f467cf31b80 DEBUG /usr/bin/podman: docker.io/ceph/ceph@sha256:056637972a107df4096f10951e4216b21fcd8ae0b9fb4552e628d35df3f61139 2023-10-11 16:21:36,189 7f467cf31b80 INFO Using recent ceph image quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e 2023-10-11 16:21:36,549 7f467cf31b80 DEBUG stat: 167 167 2023-10-11 16:21:36,942 7f467cf31b80 DEBUG Acquiring lock 139940396923424 on /run/cephadm/250f9864-0142-11ee-8e5f-00266cf8869c.lock 2023-10-11 16:21:36,942 7f467cf31b80 DEBUG Lock 139940396923424 acquired on /run/cephadm/250f9864-0142-11ee-8e5f-00266cf8869c.lock 2023-10-11 16:21:36,969 7f467cf31b80 DEBUG sestatus: SELinux status: disabled 2023-10-11 16:21:36,972 7f467cf31b80 DEBUG sestatus: SELinux status: disabled 2023-10-11 16:21:37,749 7f467cf31b80 DEBUG /usr/bin/podman: 2023-10-11 16:21:37,750 7f467cf31b80 DEBUG /usr/bin/podman: Device Path Size Device nodes rotates available Model name Patrick Le 11/10/2023 à 15:59, Eugen Block a écrit : Can you check which cephadm version is installed on the host? And then please add (only the relevant) output from the cephadm.log when you run the inventory (without the --image ). Sometimes the version mismatch on the host and the one the orchestrator uses can cause some disruptions. You could try the same with the latest cephadm you have in /var/lib/ceph/${fsid}/ (ls -lrt /var/lib/ceph/${fsid}/cephadm.*). I mentioned that in this thread [1]. 
So you could try the following:

$ chmod +x /var/lib/ceph/{fsid}/cephadm.{latest}
$ python3 /var/lib/ceph/{fsid}/cephadm.{latest} ceph-volume inventory

Does the output differ? Paste the relevant cephadm.log from that attempt as well.

[1] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/LASBJCSPFGDYAWPVE2YLV2ZLF3HC5SLS/

Zitat von Patrick Begou :
Hi Eugen, first many thanks for the time spent on this problem. "ceph osd purge 2 --force --yes-i-really-mean-it" works and cleans up all the bad status.

[root@mostha1 ~]# cephadm shell
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Using recent ceph image quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e

[ceph: root@mostha1 /]# ceph osd purge 2 --force --yes-i-really-mean-it
purged osd.2

[ceph: root@mostha1 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME        STATUS  REWEIGHT  PRI-AFF
-1         1.72823  root default
-5         0.45477      host dean
 0    hdd  0.22739          osd.0        up  1.0       1.0
 4    hdd  0.22739          osd.4        up  1.0       1.0
-9         0.22739      host ekman
 6    hdd  0.22739          osd.6        up  1.0       1.0
-7         0.45479      host mostha1
 5    hdd  0.45479          osd.5        up  1.0       1.0
-3         0.59128      host mostha2
 1    hdd  0.22739          osd.1        up  1.0       1.0
 3    hdd  0.
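A concrete form of Eugen's suggestion above, with the placeholders filled in for this cluster (untested sketch; the fsid is the one from this thread, and the newest cephadm.* file is picked automatically):

# Find the most recent cephadm binary deployed for this cluster,
# make it executable, and run the inventory with it:
fsid=250f9864-0142-11ee-8e5f-00266cf8869c
latest=$(ls -1rt /var/lib/ceph/${fsid}/cephadm.* | tail -n 1)
chmod +x "${latest}"
python3 "${latest}" ceph-volume inventory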
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
Hi Eugen, first many thanks for the time spent on this problem. "ceph osd purge 2 --force --yes-i-really-mean-it" works and cleans up all the bad status.

[root@mostha1 ~]# cephadm shell
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Using recent ceph image quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e

[ceph: root@mostha1 /]# ceph osd purge 2 --force --yes-i-really-mean-it
purged osd.2

[ceph: root@mostha1 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME        STATUS  REWEIGHT  PRI-AFF
-1         1.72823  root default
-5         0.45477      host dean
 0    hdd  0.22739          osd.0        up  1.0       1.0
 4    hdd  0.22739          osd.4        up  1.0       1.0
-9         0.22739      host ekman
 6    hdd  0.22739          osd.6        up  1.0       1.0
-7         0.45479      host mostha1
 5    hdd  0.45479          osd.5        up  1.0       1.0
-3         0.59128      host mostha2
 1    hdd  0.22739          osd.1        up  1.0       1.0
 3    hdd  0.36389          osd.3        up  1.0       1.0

[ceph: root@mostha1 /]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 232.9G 0 disk
|-sda1 8:1 1 3.9G 0 part /rootfs/boot
|-sda2 8:2 1 3.9G 0 part [SWAP]
`-sda3 8:3 1 225G 0 part
  |-al8vg-rootvol 253:0 0 48.8G 0 lvm /rootfs
  |-al8vg-homevol 253:2 0 9.8G 0 lvm /rootfs/home
  |-al8vg-tmpvol 253:3 0 9.8G 0 lvm /rootfs/tmp
  `-al8vg-varvol 253:4 0 19.8G 0 lvm /rootfs/var
sdb 8:16 1 465.8G 0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 253:1 0 465.8G 0 lvm
sdc 8:32 1 232.9G 0 disk

"cephadm ceph-volume inventory" returns nothing:

[root@mostha1 ~]# cephadm ceph-volume inventory
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Using recent ceph image quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
Device Path  Size  Device nodes  rotates  available  Model name
[root@mostha1 ~]#

But running the same command within cephadm 15.2.17 works:

[root@mostha1 ~]# cephadm --image 93146564743f ceph-volume inventory
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Device Path  Size       rotates  available  Model name
/dev/sdc     232.83 GB  True     True       SAMSUNG HE253GJ
/dev/sda     232.83 GB  True     False      SAMSUNG HE253GJ
/dev/sdb     465.76 GB  True     False      WDC WD5003ABYX-1
[root@mostha1 ~]#

[root@mostha1 ~]# podman images -a
REPOSITORY         TAG       IMAGE ID      CREATED        SIZE
quay.io/ceph/ceph  v16.2.14  f13d80acdbb5  2 weeks ago    1.21 GB
quay.io/ceph/ceph  v15.2.17  93146564743f  14 months ago  1.24 GB

Patrick

Le 11/10/2023 à 15:14, Eugen Block a écrit :
Your response is a bit confusing since it seems to be mixed up with the previous answer. So you still need to remove the OSD properly, i.e. purge it from the crush tree: ceph osd purge 2 --force --yes-i-really-mean-it (only in a test cluster!) If everything is clean (OSD has been removed, disk has been zapped, lsblk shows no LVs for that disk) you can check the inventory: cephadm ceph-volume inventory. Please also add the output of 'ceph orch ls osd --export'.

Zitat von Patrick Begou :
Hi Eugen, - the OS is Alma Linux 8 with latest updates. - this morning I've worked with ceph-volume but it ended in a strange final state. I was connected on host mostha1 where /dev/sdc was not recognized.
These are the steps I followed, based on the ceph-volume documentation I've read:

[root@mostha1 ~]# cephadm shell
[ceph: root@mostha1 /]# ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring
[ceph: root@mostha1 /]# ceph-volume lvm prepare --bluestore --data /dev/sdc

Now the lsblk command shows sdc as an osd:

sdb 8:16 1 465.8G 0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 253:1 0 465.8G 0 lvm
sdc 8:32 1 232.9G 0 disk
`-ceph--b27d7a07--278d--4ee2--b84e--53256ef8de4c-osd--block--45c8e92c--caf9--4fe7--9a42--7b45a0794632 253:5 0 232.8G 0 lvm

Then I've tried to activate this osd, but it fails since inside podman I have no access to systemctl:

[ceph: root@mostha1 /]# ceph-volume lvm activate 2 45c8e92c-caf9-4fe7-9a42-7b45a0794632
...
Running command: /usr/bin/systemctl start ceph-osd@2
stderr: Failed to connect to bus: No such file or directory
--> RuntimeError: command returned non-zero exit status: 1

And now I have a strange status for this osd.2:

[ceph: root@mostha1 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME        STATUS  REWEIGHT  PRI-AFF
-1         1.72823  root default
-5         0.45477      host dean
 0    hdd  0.22739
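Side note: inside the cephadm shell there is no systemd, which is why the activate step fails. On Pacific and later the orchestrator offers an activation command that runs outside the container; a hedged sketch, to be run from an admin node:

# Ask cephadm to scan the host and activate any prepared-but-inactive
# OSDs it finds there:
ceph cephadm osd activate mostha1.legi.grenoble-inp.fr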
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
Hi Eugen, sorry for posting twice, my zimbra server returned an error on the first attempt. My initial problem is that ceph cannot detect these HDDs since Pacific. So I have deployed Octopus, where "ceph orch apply osd --all-available-devices" works fine, and then upgraded to Pacific. But during the upgrade, 2 OSDs went "out" and "down", and I'm looking for a solution to manually re-integrate these 2 HDDs in the cluster, as Pacific is not able to do this automatically with "ceph orch..." like Octopus was. But it is a test cluster to understand and get basic knowledge of Ceph (and I'm allowed to break everything).

Patrick

Le 11/10/2023 à 14:35, Eugen Block a écrit :
Don't use ceph-volume manually to deploy OSDs if your cluster is managed by cephadm. I just wanted to point out that you hadn't wiped the disk properly to be able to re-use it. Let the orchestrator handle the OSD creation and activation. I recommend to remove the OSD again, wipe it properly (cephadm ceph-volume lvm zap --destroy /dev/sdc) and then let the orchestrator add it as an OSD. Depending on your drivegroup configuration it will happen automatically (if "all-available-devices" is enabled or your osd specs are already applied). If it doesn't happen automatically, deploy it with 'ceph orch daemon add osd <host>:<device>' [1].

[1] https://docs.ceph.com/en/quincy/cephadm/services/osd/#deploy-osds

Zitat von Patrick Begou :
Hi Eugen, - the OS is Alma Linux 8 with latest updates. - this morning I've worked with ceph-volume but it ended in a strange final state. I was connected on host mostha1 where /dev/sdc was not recognized. These are the steps I followed, based on the ceph-volume documentation I've read:

[root@mostha1 ~]# cephadm shell
[ceph: root@mostha1 /]# ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring
[ceph: root@mostha1 /]# ceph-volume lvm prepare --bluestore --data /dev/sdc

[ceph: root@mostha1 /]# ceph-volume lvm list
== osd.2 ===
  [block] /dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632
  block device  /dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632
  block uuid    Pq0XeH-LJct-t4yH-f56F-d5jk-JzGQ-zITfhE
  cephx lockbox secret
  cluster fsid  250f9864-0142-11ee-8e5f-00266cf8869c
  cluster name  ceph
  crush device class
  encrypted     0
  osd fsid      45c8e92c-caf9-4fe7-9a42-7b45a0794632
  osd id        2
  osdspec affinity
  type          block
  vdo           0
  devices       /dev/sdc

Now the lsblk command shows sdc as an osd:

sdb 8:16 1 465.8G 0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 253:1 0 465.8G 0 lvm
sdc 8:32 1 232.9G 0 disk
`-ceph--b27d7a07--278d--4ee2--b84e--53256ef8de4c-osd--block--45c8e92c--caf9--4fe7--9a42--7b45a0794632 253:5 0 232.8G 0 lvm

But this osd.2 is "down" and "out" with a strange status (no related cluster host, no weight) and I cannot activate it, as within the podman container systemctl is not working.
[ceph: root@mostha1 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME        STATUS  REWEIGHT  PRI-AFF
-1         1.72823  root default
-5         0.45477      host dean
 0    hdd  0.22739          osd.0        up  1.0       1.0
 4    hdd  0.22739          osd.4        up  1.0       1.0
-9         0.22739      host ekman
 6    hdd  0.22739          osd.6        up  1.0       1.0
-7         0.45479      host mostha1
 5    hdd  0.45479          osd.5        up  1.0       1.0
-3         0.59128      host mostha2
 1    hdd  0.22739          osd.1        up  1.0       1.0
 3    hdd  0.36389          osd.3        up  1.0       1.0
 2         0                osd.2      down  0         1.0

My attempt to activate the osd:

[ceph: root@mostha1 /]# ceph-volume lvm activate 2 45c8e92c-caf9-4fe7-9a42-7b45a0794632
Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632 --path /var/lib/ceph/osd/ceph-2 --no-mon-config
Running command: /usr/bin/ln -snf /dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632 /var/lib/ceph/osd/ceph-2/block
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-2/block
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-1
Running command: /usr/bin/chown -R ceph:ceph /var/li
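Put together, the procedure Eugen describes in this thread boils down to three commands; a hedged sketch for this particular host and device (test cluster only):

# 1. remove the half-created OSD from the cluster,
# 2. wipe the LVs and the partition table on the disk,
# 3. let the orchestrator re-create the OSD from the clean device:
ceph osd purge 2 --force --yes-i-really-mean-it
cephadm ceph-volume lvm zap --destroy /dev/sdc
ceph orch daemon add osd mostha1.legi.grenoble-inp.fr:/dev/sdc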
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
0.59128 host mostha2 1 hdd 0.22739 osd.1 up 1.0 1.0 3 hdd 0.36389 osd.3 up 1.0 1.0 2 0 osd.2 down 0 1.0 * * *[ceph: root@mostha1 /]# lsblk** *NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 1 232.9G 0 disk |-sda1 8:1 1 3.9G 0 part /rootfs/boot |-sda2 8:2 1 3.9G 0 part [SWAP] `-sda3 8:3 1 225G 0 part |-al8vg-rootvol 253:0 0 48.8G 0 lvm /rootfs |-al8vg-homevol 253:3 0 9.8G 0 lvm /rootfs/home |-al8vg-tmpvol 253:4 0 9.8G 0 lvm /rootfs/tmp `-al8vg-varvol 253:5 0 19.8G 0 lvm /rootfs/var sdb 8:16 1 465.8G 0 disk `-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 253:2 0 465.8G 0 lvm *sdc * Patrick Le 11/10/2023 à 11:00, Eugen Block a écrit : Hi, just wondering if 'ceph-volume lvm zap --destroy /dev/sdc' would help here. From your previous output you didn't specify the --destroy flag. Which cephadm version is installed on the host? Did you also upgrade the OS when moving to Pacific? (Sorry if I missed that. Zitat von Patrick Begou : Le 02/10/2023 à 18:22, Patrick Bégou a écrit : Hi all, still stuck with this problem. I've deployed octopus and all my HDD have been setup as osd. Fine. I've upgraded to pacific and 2 osd have failed. They have been automatically removed and upgrade finishes. Cluster Health is finaly OK, no data loss. But now I cannot re-add these osd with pacific (I had previous troubles on these old HDDs, lost one osd in octopus and was able to reset and re-add it). I've tried manually to add the first osd on the node where it is located, following https://docs.ceph.com/en/pacific/rados/operations/bluestore-migration/ (not sure it's the best idea...) but it fails too. This node was the one used for deploying the cluster. [ceph: root@mostha1 /]# *ceph-volume lvm zap /dev/sdc* --> Zapping: /dev/sdc --> --destroy was not specified, but zapping a whole device will remove the partition table Running command: /usr/bin/dd if=/dev/zero of=/dev/sdc bs=1M count=10 conv=fsync stderr: 10+0 records in 10+0 records out 10485760 bytes (10 MB, 10 MiB) copied, 0.663425 s, 15.8 MB/s --> Zapping successful for: [ceph: root@mostha1 /]# *ceph-volume lvm create --bluestore --data /dev/sdc* Running command: /usr/bin/ceph-authtool --gen-print-key Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 9f1eb8ee-41e6-4350-ad73-1be21234ec7c stderr: 2023-10-02T16:09:29.855+ 7fb4eb8c0700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory stderr: 2023-10-02T16:09:29.855+ 7fb4eb8c0700 -1 AuthRegistry(0x7fb4e405c4d8) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx stderr: 2023-10-02T16:09:29.856+ 7fb4eb8c0700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory stderr: 2023-10-02T16:09:29.856+ 7fb4eb8c0700 -1 AuthRegistry(0x7fb4e40601d0) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx stderr: 2023-10-02T16:09:29.857+ 7fb4eb8c0700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory stderr: 2023-10-02T16:09:29.857+ 7fb4eb8c0700 -1 AuthRegistry(0x7fb4eb8bee90) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx stderr: 2023-10-02T16:09:29.858+ 7fb4e965c700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1] stderr: 2023-10-02T16:09:29.858+ 7fb4e9e5d700 -1 monclient(hunting): 
handle_auth_bad_method server allowed_methods [2] but i only support [1] stderr: 2023-10-02T16:09:29.858+ 7fb4e8e5b700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1] stderr: 2023-10-02T16:09:29.858+ 7fb4eb8c0700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication stderr: [errno 13] RADOS permission denied (error connecting to the cluster) --> RuntimeError: Unable to create a new OSD id Any idea of what is wrong ? Thanks Patrick ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io I'm still trying to understand what can be wrong or how to debug this situation where Ceph cannot see the devices. The device :dev/sdc exists: [root@mostha1 ~]# cephadm shell lsmcli ldl Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c Using recent ceph image quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e Path | SCSI VPD 0x83 | Link Type | Serial Number | Health Status -
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
Hi Eugen, - the OS is Alma Linux 8 with latest updates. - this morning I've worked with ceph-volume but it ended in a strange final state. I was connected on host mostha1 where /dev/sdc was not recognized. These are the steps I followed, based on the ceph-volume documentation I've read:

[root@mostha1 ~]# cephadm shell
[ceph: root@mostha1 /]# ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring
[ceph: root@mostha1 /]# ceph-volume lvm prepare --bluestore --data /dev/sdc

[ceph: root@mostha1 /]# ceph-volume lvm list
== osd.2 ===
  [block] /dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632
  block device  /dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632
  block uuid    Pq0XeH-LJct-t4yH-f56F-d5jk-JzGQ-zITfhE
  cephx lockbox secret
  cluster fsid  250f9864-0142-11ee-8e5f-00266cf8869c
  cluster name  ceph
  crush device class
  encrypted     0
  osd fsid      45c8e92c-caf9-4fe7-9a42-7b45a0794632
  osd id        2
  osdspec affinity
  type          block
  vdo           0
  devices       /dev/sdc

Now the lsblk command shows sdc as an osd:

sdb 8:16 1 465.8G 0 disk
`-ceph--08827fdc--136e--4070--97e9--e5e8b3970766-osd--block--7dec1808--d6f4--4f90--ac74--75a4346e1df5 253:1 0 465.8G 0 lvm
sdc 8:32 1 232.9G 0 disk
`-ceph--b27d7a07--278d--4ee2--b84e--53256ef8de4c-osd--block--45c8e92c--caf9--4fe7--9a42--7b45a0794632 253:5 0 232.8G 0 lvm

But this osd.2 is "down" and "out" with a strange status (no related cluster host, no weight) and I cannot activate it, as within the podman container systemctl is not working.

[ceph: root@mostha1 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME        STATUS  REWEIGHT  PRI-AFF
-1         1.72823  root default
-5         0.45477      host dean
 0    hdd  0.22739          osd.0        up  1.0       1.0
 4    hdd  0.22739          osd.4        up  1.0       1.0
-9         0.22739      host ekman
 6    hdd  0.22739          osd.6        up  1.0       1.0
-7         0.45479      host mostha1
 5    hdd  0.45479          osd.5        up  1.0       1.0
-3         0.59128      host mostha2
 1    hdd  0.22739          osd.1        up  1.0       1.0
 3    hdd  0.36389          osd.3        up  1.0       1.0
 2         0                osd.2      down  0         1.0

My attempt to activate the osd:

[ceph: root@mostha1 /]# ceph-volume lvm activate 2 45c8e92c-caf9-4fe7-9a42-7b45a0794632
Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632 --path /var/lib/ceph/osd/ceph-2 --no-mon-config
Running command: /usr/bin/ln -snf /dev/ceph-b27d7a07-278d-4ee2-b84e-53256ef8de4c/osd-block-45c8e92c-caf9-4fe7-9a42-7b45a0794632 /var/lib/ceph/osd/ceph-2/block
Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-2/block
Running command: /usr/bin/chown -R ceph:ceph /dev/dm-1
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /usr/bin/systemctl enable ceph-volume@lvm-2-45c8e92c-caf9-4fe7-9a42-7b45a0794632
stderr: Created symlink /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-2-45c8e92c-caf9-4fe7-9a42-7b45a0794632.service -> /usr/lib/systemd/system/ceph-volume@.service.
Running command: /usr/bin/systemctl enable --runtime ceph-osd@2
stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@2.service -> /usr/lib/systemd/system/ceph-osd@.service.
Running command: /usr/bin/systemctl start ceph-osd@2
stderr: Failed to connect to bus: No such file or directory
--> RuntimeError: command returned non-zero exit status: 1

Patrick

Le 11/10/2023 à 11:00, Eugen Block a écrit :
Hi, just wondering if 'ceph-volume lvm zap --destroy /dev/sdc' would help here. From your previous output you didn't specify the --destroy flag. Which cephadm version is installed on the host? Did you also upgrade the OS when moving to Pacific? (Sorry if I missed that.)

Zitat von Patrick Begou :
Le 02/10/2023 à 18:22, Patrick Bégou a écrit :
Hi all, still stuck with this problem. I've deployed Octopus and all my HDDs have been set up as OSDs. Fine. I've upgraded to Pacific and 2 OSDs have failed. They have been automatically removed and the upgrade finished. Cluster health is finally OK, no data loss. But now I cannot re-add these OSDs with Pacific (I had previous troubles on these old HDDs, lost
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
Le 02/10/2023 à 18:22, Patrick Bégou a écrit : Hi all, still stuck with this problem. I've deployed octopus and all my HDD have been setup as osd. Fine. I've upgraded to pacific and 2 osd have failed. They have been automatically removed and upgrade finishes. Cluster Health is finaly OK, no data loss. But now I cannot re-add these osd with pacific (I had previous troubles on these old HDDs, lost one osd in octopus and was able to reset and re-add it). I've tried manually to add the first osd on the node where it is located, following https://docs.ceph.com/en/pacific/rados/operations/bluestore-migration/ (not sure it's the best idea...) but it fails too. This node was the one used for deploying the cluster. [ceph: root@mostha1 /]# *ceph-volume lvm zap /dev/sdc* --> Zapping: /dev/sdc --> --destroy was not specified, but zapping a whole device will remove the partition table Running command: /usr/bin/dd if=/dev/zero of=/dev/sdc bs=1M count=10 conv=fsync stderr: 10+0 records in 10+0 records out 10485760 bytes (10 MB, 10 MiB) copied, 0.663425 s, 15.8 MB/s --> Zapping successful for: [ceph: root@mostha1 /]# *ceph-volume lvm create --bluestore --data /dev/sdc* Running command: /usr/bin/ceph-authtool --gen-print-key Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 9f1eb8ee-41e6-4350-ad73-1be21234ec7c stderr: 2023-10-02T16:09:29.855+ 7fb4eb8c0700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory stderr: 2023-10-02T16:09:29.855+ 7fb4eb8c0700 -1 AuthRegistry(0x7fb4e405c4d8) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx stderr: 2023-10-02T16:09:29.856+ 7fb4eb8c0700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory stderr: 2023-10-02T16:09:29.856+ 7fb4eb8c0700 -1 AuthRegistry(0x7fb4e40601d0) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx stderr: 2023-10-02T16:09:29.857+ 7fb4eb8c0700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory stderr: 2023-10-02T16:09:29.857+ 7fb4eb8c0700 -1 AuthRegistry(0x7fb4eb8bee90) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx stderr: 2023-10-02T16:09:29.858+ 7fb4e965c700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1] stderr: 2023-10-02T16:09:29.858+ 7fb4e9e5d700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1] stderr: 2023-10-02T16:09:29.858+ 7fb4e8e5b700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1] stderr: 2023-10-02T16:09:29.858+ 7fb4eb8c0700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication stderr: [errno 13] RADOS permission denied (error connecting to the cluster) --> RuntimeError: Unable to create a new OSD id Any idea of what is wrong ? Thanks Patrick ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io I'm still trying to understand what can be wrong or how to debug this situation where Ceph cannot see the devices. 
The device /dev/sdc exists:

[root@mostha1 ~]# cephadm shell lsmcli ldl
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Using recent ceph image quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
Path     | SCSI VPD 0x83    | Link Type | Serial Number   | Health Status
-
/dev/sda | 50024e92039e4f1c | PATA/SATA | S2B5J90ZA10142  | Good
/dev/sdb | 50014ee0ad5953c9 | PATA/SATA | WD-WMAYP0982329 | Good
/dev/sdc | 50024e920387fa2c | PATA/SATA | S2B5J90ZA02494  | Good

But I cannot do anything with it since I moved from Octopus to Pacific:

[root@mostha1 ~]# cephadm shell ceph orch device zap mostha1.legi.grenoble-inp.fr /dev/sdc --force
Inferring fsid 250f9864-0142-11ee-8e5f-00266cf8869c
Using recent ceph image quay.io/ceph/ceph@sha256:f30bf50755d7087f47c6223e6a921caf5b12e86401b3d49220230c84a8302a1e
Error EINVAL: Device path '/dev/sdc' not found on host 'mostha1.legi.grenoble-inp.fr'

Patrick
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
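For what it's worth, the "no keyring found ... disabling cephx" errors quoted above come from running ceph-volume without the bootstrap-osd key; exporting it first inside the shell, as done elsewhere in this thread, avoids that particular failure (a sketch only, it does not address the device-detection problem):

# Inside the cephadm shell: give ceph-volume the bootstrap-osd
# credentials before asking it to create an OSD:
ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring
ceph-volume lvm create --bluestore --data /dev/sdc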
[ceph-users] Re: After power outage, osd do not restart
Hi Eneko, I have not worked on the ceph cluster since my last email (doing some user support) and now osd.2 is back in the cluster:

-7 0.68217 host mostha1
 2 hdd 0.22739 osd.2 up 1.0 1.0
 5 hdd 0.45479 osd.5 up 1.0 1.0

Maybe the reboot suggested by Igor? I will try to solve my last problem now. While upgrading from 15.2.13 to 15.2.17 I hit a memory problem on one node (these are old computers used to learn Ceph). Upgrading one of the OSDs failed and this locked the upgrade, as Ceph did not accept to stop and upgrade the next OSD in the cluster. But Ceph started rebalancing the data and magically finished the upgrade. A last OSD is still down and out, and it is a daemon problem, as smartctl reports good health for the HDD. I've changed the faulty memory DIMMs and the node is back in the cluster. So this is my new challenge. Using old hardware (2011) for learning seems a fine way to investigate Ceph reliability: many problems show up, but at no risk!

Patrick

Le 21/09/2023 à 16:31, Eneko Lacunza a écrit :
Hi Patrick, It seems your disk or controller are damaged. Are other disks connected to the same controller working OK? If so, I'd say the disk is dead. Cheers

El 21/9/23 a las 16:17, Patrick Begou escribió:
Hi Igor, a "systemctl reset-failed" doesn't restart the osd. I rebooted the node and now it shows some errors on the HDD:

[ 107.716769] ata3.00: exception Emask 0x0 SAct 0x80 SErr 0x0 action 0x0
[ 107.716782] ata3.00: irq_stat 0x4008
[ 107.716787] ata3.00: failed command: READ FPDMA QUEUED
[ 107.716791] ata3.00: cmd 60/00:b8:00:a8:08/08:00:0e:00:00/40 tag 23 ncq dma 1048576 in res 41/40:00:c2:ad:08/00:00:0e:00:00/40 Emask 0x409 (media error)
[ 107.716802] ata3.00: status: { DRDY ERR }
[ 107.716806] ata3.00: error: { UNC }
[ 107.728547] ata3.00: configured for UDMA/133
[ 107.728575] sd 2:0:0:0: [sda] tag#23 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=1s
[ 107.728581] sd 2:0:0:0: [sda] tag#23 Sense Key : Medium Error [current]
[ 107.728585] sd 2:0:0:0: [sda] tag#23 Add. Sense: Unrecovered read error - auto reallocate failed
[ 107.728590] sd 2:0:0:0: [sda] tag#23 CDB: Read(10) 28 00 0e 08 a8 00 00 08 00 00
[ 107.728592] I/O error, dev sda, sector 235449794 op 0x0:(READ) flags 0x80700 phys_seg 6 prio class 2
[ 107.728623] ata3: EH complete
[ 109.203256] ata3.00: exception Emask 0x0 SAct 0x2000 SErr 0x0 action 0x0
[ 109.203268] ata3.00: irq_stat 0x4008
[ 109.203274] ata3.00: failed command: READ FPDMA QUEUED
[ 109.203277] ata3.00: cmd 60/08:e8:48:ad:08/00:00:0e:00:00/40 tag 29 ncq dma 4096 in res 41/40:00:48:ad:08/00:00:0e:00:00/40 Emask 0x409 (media error)
[ 109.203289] ata3.00: status: { DRDY ERR }
[ 109.203292] ata3.00: error: { UNC }

I think the storage is corrupted and I have to reset it all.

Patrick

Le 21/09/2023 à 13:32, Igor Fedotov a écrit :
Maybe execute systemctl reset-failed <...> or even restart the node?

On 21/09/2023 14:26, Patrick Begou wrote:
Hi Igor, the ceph-osd.2.log remains empty on the node where this osd is located. This is what I get when manually restarting the osd.

[root@mostha1 250f9864-0142-11ee-8e5f-00266cf8869c]# systemctl restart ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service
Job for ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service failed because a timeout was exceeded.
See "systemctl status ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service" and "journalctl -xe" for details.
[root@mostha1 250f9864-0142-11ee-8e5f-00266cf8869c]# journalctl -xe
sept.
21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 5728 (podman) in control group while starting unit. Ignoring. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 5882 (bash) in control group while starting unit. Ignoring. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 5884 (podman) in control group while starting unit. Ignoring. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 6031 (bash) in control
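On the stuck-upgrade episode mentioned above: a cephadm-managed upgrade can be inspected and paused or resumed from the orchestrator; a hedged sketch (these subcommands exist from Octopus onwards):

# Check which daemons have been upgraded and which are still pending:
ceph orch upgrade status
# Hold the upgrade while repairing a node, then continue:
ceph orch upgrade pause
ceph orch upgrade resume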
[ceph-users] Re: After power outage, osd do not restart
Hi Igor, a "systemctl reset-failed" doesn't restart the osd. I reboot the node and now it show some error on the HDD: [ 107.716769] ata3.00: exception Emask 0x0 SAct 0x80 SErr 0x0 action 0x0 [ 107.716782] ata3.00: irq_stat 0x4008 [ 107.716787] ata3.00: failed command: READ FPDMA QUEUED [ 107.716791] ata3.00: cmd 60/00:b8:00:a8:08/08:00:0e:00:00/40 tag 23 ncq dma 1048576 in res 41/40:00:c2:ad:08/00:00:0e:00:00/40 Emask 0x409 (media error) [ 107.716802] ata3.00: status: { DRDY ERR } [ 107.716806] ata3.00: error: { UNC } [ 107.728547] ata3.00: configured for UDMA/133 [ 107.728575] sd 2:0:0:0: [sda] tag#23 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=1s [ 107.728581] sd 2:0:0:0: [sda] tag#23 Sense Key : Medium Error [current] [ 107.728585] sd 2:0:0:0: [sda] tag#23 Add. Sense: Unrecovered read error - auto reallocate failed [ 107.728590] sd 2:0:0:0: [sda] tag#23 CDB: Read(10) 28 00 0e 08 a8 00 00 08 00 00 [ 107.728592] I/O error, dev sda, sector 235449794 op 0x0:(READ) flags 0x80700 phys_seg 6 prio class 2 [ 107.728623] ata3: EH complete [ 109.203256] ata3.00: exception Emask 0x0 SAct 0x2000 SErr 0x0 action 0x0 [ 109.203268] ata3.00: irq_stat 0x4008 [ 109.203274] ata3.00: failed command: READ FPDMA QUEUED [ 109.203277] ata3.00: cmd 60/08:e8:48:ad:08/00:00:0e:00:00/40 tag 29 ncq dma 4096 in res 41/40:00:48:ad:08/00:00:0e:00:00/40 Emask 0x409 (media error) [ 109.203289] ata3.00: status: { DRDY ERR } [ 109.203292] ata3.00: error: { UNC } I think the storage is corrupted and I have te reset it all. Patrick Le 21/09/2023 à 13:32, Igor Fedotov a écrit : May be execute systemctl reset-failed <...> or even restart the node? On 21/09/2023 14:26, Patrick Begou wrote: Hi Igor, the ceph-osd.2.log remains empty on the node where this osd is located. This is what I get when manualy restarting the osd. [root@mostha1 250f9864-0142-11ee-8e5f-00266cf8869c]# systemctl restart ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service Job for ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service failed because a timeout was exceeded. See "systemctl status ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service" and "journalctl -xe" for details. [root@mostha1 250f9864-0142-11ee-8e5f-00266cf8869c]# journalctl -xe sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 5728 (podman) in control group while starting unit. Ignoring. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 5882 (bash) in control group while starting unit. Ignoring. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 5884 (podman) in control group while starting unit. Ignoring. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 6031 (bash) in control group while starting unit. Ignoring. sept. 
21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 6033 (podman) in control group while starting unit. Ignoring. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 6185 (bash) in control group while starting unit. Ignoring. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 6187 (podman) in control group while starting unit. Ignoring. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. sept. 21 13:22:39 mo
[ceph-users] Re: After power outage, osd do not restart
legi.grenoble-inp.fr systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 15171 (podman) in control group while starting unit. Ignoring. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 15646 (bash) in control group while starting unit. Ignoring. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 15648 (podman) in control group while starting unit. Ignoring. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 15792 (bash) in control group while starting unit. Ignoring. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 15794 (podman) in control group while starting unit. Ignoring. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 25561 (bash) in control group while starting unit. Ignoring. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 25563 (podman) in control group while starting unit. Ignoring. sept. 21 13:22:39 mostha1.legi.grenoble-inp.fr systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. Patrick Le 21/09/2023 à 12:44, Igor Fedotov a écrit : Hi Patrick, please share osd restart log to investigate that. Thanks, Igor On 21/09/2023 13:41, Patrick Begou wrote: Hi, After a power outage on my test ceph cluster, 2 osd fail to restart. The log file show: 8e5f-00266cf8869c@osd.2.service: Failed with result 'timeout'. Sep 21 11:55:02 mostha1 systemd[1]: Failed to start Ceph osd.2 for 250f9864-0142-11ee-8e5f-00266cf8869c. Sep 21 11:55:12 mostha1 systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Service RestartSec=10s expired, scheduling restart. Sep 21 11:55:12 mostha1 systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Scheduled restart job, restart counter is at 2. Sep 21 11:55:12 mostha1 systemd[1]: Stopped Ceph osd.2 for 250f9864-0142-11ee-8e5f-00266cf8869c. Sep 21 11:55:12 mostha1 systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 1858 (bash) in control group while starting unit. Ignoring. 
Sep 21 11:55:12 mostha1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. Sep 21 11:55:12 mostha1 systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 2815 (podman) in control group while starting unit. Ignoring. This is not critical as it is a test cluster and it is actually rebalancing on other osd but I would like to know how to return to HEALTH_OK status. Smartctl show the HDD are OK. So is there a way to recover the osd from this state ? Version is 15.2.17 (juste moved from 15.2.13 to 15.2.17 yesterday, will try to move to latest versions as soon as this problem is solved) Thanks Patrick ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] After power outage, osd do not restart
Hi, After a power outage on my test ceph cluster, 2 OSDs fail to restart. The log file shows:

8e5f-00266cf8869c@osd.2.service: Failed with result 'timeout'.
Sep 21 11:55:02 mostha1 systemd[1]: Failed to start Ceph osd.2 for 250f9864-0142-11ee-8e5f-00266cf8869c.
Sep 21 11:55:12 mostha1 systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Service RestartSec=10s expired, scheduling restart.
Sep 21 11:55:12 mostha1 systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Scheduled restart job, restart counter is at 2.
Sep 21 11:55:12 mostha1 systemd[1]: Stopped Ceph osd.2 for 250f9864-0142-11ee-8e5f-00266cf8869c.
Sep 21 11:55:12 mostha1 systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 1858 (bash) in control group while starting unit. Ignoring.
Sep 21 11:55:12 mostha1 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 21 11:55:12 mostha1 systemd[1]: ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service: Found left-over process 2815 (podman) in control group while starting unit. Ignoring.

This is not critical, as it is a test cluster and it is actually rebalancing on other OSDs, but I would like to know how to return to HEALTH_OK status. Smartctl shows the HDDs are OK. So is there a way to recover the OSDs from this state? Version is 15.2.17 (just moved from 15.2.13 to 15.2.17 yesterday, will try to move to later versions as soon as this problem is solved). Thanks

Patrick
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
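One way to attack the "left-over process in control group" symptom, combining the reset-failed hint from this thread with a manual cleanup; a hedged sketch for the unit named in the logs (check what the leftover PIDs actually are before killing anything):

# Stop the unit, clear its failed state, then try a clean start:
systemctl stop ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service
systemctl reset-failed ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service
# If podman/bash leftovers are still listed in the unit's cgroup,
# kill them first (verify the PIDs!), then:
systemctl start ceph-250f9864-0142-11ee-8e5f-00266cf8869c@osd.2.service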
[ceph-users] Re: No snap_schedule module in Octopus
Hi Patrick, I agree that learning Ceph today with Octopus is not a good idea, but, as a newbie with this tool, I was not able to solve the HDD detection problem, and my post about it on this forum did not bring any help (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/OPMWHJ4ZFCOOPUY6ST4WAJ4G4ASJFALM/). I've also looked for a list of newly unsupported hardware between Octopus and Pacific, without success. I've also received a private mail from a Swedish user reading the forum last week and having the same HDD detection problem with 17.2.6. He asked if I had solved it and told me he would try to debug it. In my mind, an old version of Ceph on old hardware had a better chance of being stable and bug-free too.

Yes, I have created a file system (datacfs) and I can create a snapshot by hand using cephadm. I've just tested:

# ceph fs set datacfs allow_new_snaps true
# ceph-fuse /mnt
# mkdir /mnt/.snap/$(TZ=CET date +%Y-%m-%d:%H-%M-%S)

and I have a snapshot. I can remove it too.

Maybe today my goal should be:
1- try to undo "ceph mgr module enable snap_schedule --force" (always a bad idea in my mind to use options like "--force"; a sketch follows after this message)
2- launch the update to Pacific now that all HDDs are configured. In my Ceph learning process there is also the step to test update procedures.
3- try again to use snap_schedule

Thanks for the time spent on my problem

Patrick

Le 19/09/2023 à 19:46, Patrick Donnelly a écrit :
I'm not sure off-hand. The module did have several changes as recently as Pacific, so it's possible something is broken. Perhaps you don't have a file system created yet? I would still expect to see the commands, however... I suggest you figure out why Ceph Pacific+ can't detect your hard disk drives (???). That seems more productive than debugging a long-EOLed release.

On Tue, Sep 19, 2023 at 8:49 AM Patrick Begou wrote:
Hi Patrick, sorry for the bad copy/paste. As it was not working I have also tried with the module name

[ceph: root@mostha1 /]# ceph fs snap-schedule
no valid command found; 10 closest matches:
fs status []
fs volume ls
fs volume create []
fs volume rm []
fs subvolumegroup ls
fs subvolumegroup create [] [] [] []
fs subvolumegroup rm [--force]
fs subvolume ls []
fs subvolume create [] [] [] [] [] [] [--namespace-isolated]
fs subvolume rm [] [--force] [--retain-snapshots]
Error EINVAL: invalid command

I'm reading the same documentation, but for Octopus: https://docs.ceph.com/en/octopus/cephfs/snap-schedule/# I think that if "ceph mgr module enable snap_schedule" was not working without the "--force" option, it was because something was wrong in my Ceph install.

Patrick

Le 19/09/2023 à 14:29, Patrick Donnelly a écrit :
https://docs.ceph.com/en/quincy/cephfs/snap-schedule/#usage ceph fs snap-schedule (note the hyphen!)

On Tue, Sep 19, 2023 at 8:23 AM Patrick Begou wrote:
Hi, still some problems with snap_schedule, as the ceph fs snap-schedule namespace is not available on my nodes.
[ceph: root@mostha1 /]# ceph mgr module ls | jq -r '.enabled_modules []' cephadm dashboard iostat prometheus restful snap_schedule [ceph: root@mostha1 /]# ceph fs snap_schedule no valid command found; 10 closest matches: fs status [] fs volume ls fs volume create [] fs volume rm [] fs subvolumegroup ls fs subvolumegroup create [] [] [] [] fs subvolumegroup rm [--force] fs subvolume ls [] fs subvolume create [] [] [] [] [] [] [--namespace-isolated] fs subvolume rm [] [--force] [--retain-snapshots] Error EINVAL: invalid command I think I need your help to go further Patrick Le 19/09/2023 à 10:23, Patrick Begou a écrit : Hi, bad question, sorry. I've just run ceph mgr module enable snap_schedule --force to solve this problem. I was just afraid to use "--force" but as I can break this test configuration.... Patrick Le 19/09/2023 à 09:47, Patrick Begou a écrit : Hi, I'm working on a small POC for a ceph setup on 4 old C6100 power-edge. I had to install Octopus since latest versions were unable to detect the HDD (too old hardware ??). No matter, this is only for training and understanding Ceph environment. My installation is based on https://download.ceph.com/rpm-15.2.12/el8/noarch/cephadm-15.2.12-0.el8.noarch.rpm bootstrapped. I'm reaching the point to automate the snapshots (I can create snapshot by hand without any problem). The documentation https://download.ceph.com/rpm-15.2.12/el8/noarch/cephadm-15.2.12-0.el8.noarch.rpm says to use the snap_schedule module but this module does not exist. # ceph mgr module ls | jq -r '.enabled_modules []' cephadm dashboard iostat prometheus restful Have I missed something ? Is there some additional install steps to do for this module ? Thanks for your help. Patrick ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io _
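Regarding step 1 of the plan above (undoing the forced enable): disabling a manager module is symmetric to enabling it; a minimal sketch:

# Disable the force-enabled module; it can be re-enabled later
# (without --force) on a release that actually ships it:
ceph mgr module disable snap_schedule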
[ceph-users] Re: No snap_schedule module in Octopus
Hi Patrick, sorry for the bad copy/paste. As it was not working I have also tried with the module name [ceph: root@mostha1 /]# ceph fs snap-schedule no valid command found; 10 closest matches: fs status [] fs volume ls fs volume create [] fs volume rm [] fs subvolumegroup ls fs subvolumegroup create [] [] [] [] fs subvolumegroup rm [--force] fs subvolume ls [] fs subvolume create [] [] [] [] [] [] [--namespace-isolated] fs subvolume rm [] [--force] [--retain-snapshots] Error EINVAL: invalid command I'm reading the same documentation, but for Octopus: https://docs.ceph.com/en/octopus/cephfs/snap-schedule/# I think that if "ceph mgr module enable snap_schedule" was not working without the "--force" option, it was because something was wrong in my Ceph install. Patrick Le 19/09/2023 à 14:29, Patrick Donnelly a écrit : https://docs.ceph.com/en/quincy/cephfs/snap-schedule/#usage ceph fs snap-schedule (note the hyphen!) On Tue, Sep 19, 2023 at 8:23 AM Patrick Begou wrote: Hi, still some problems with snap_schedule as as the ceph fs snap-schedule namespace is not available on my nodes. [ceph: root@mostha1 /]# ceph mgr module ls | jq -r '.enabled_modules []' cephadm dashboard iostat prometheus restful snap_schedule [ceph: root@mostha1 /]# ceph fs snap_schedule no valid command found; 10 closest matches: fs status [] fs volume ls fs volume create [] fs volume rm [] fs subvolumegroup ls fs subvolumegroup create [] [] [] [] fs subvolumegroup rm [--force] fs subvolume ls [] fs subvolume create [] [] [] [] [] [] [--namespace-isolated] fs subvolume rm [] [--force] [--retain-snapshots] Error EINVAL: invalid command I think I need your help to go further Patrick Le 19/09/2023 à 10:23, Patrick Begou a écrit : Hi, bad question, sorry. I've just run ceph mgr module enable snap_schedule --force to solve this problem. I was just afraid to use "--force" but as I can break this test configuration Patrick Le 19/09/2023 à 09:47, Patrick Begou a écrit : Hi, I'm working on a small POC for a ceph setup on 4 old C6100 power-edge. I had to install Octopus since latest versions were unable to detect the HDD (too old hardware ??). No matter, this is only for training and understanding Ceph environment. My installation is based on https://download.ceph.com/rpm-15.2.12/el8/noarch/cephadm-15.2.12-0.el8.noarch.rpm bootstrapped. I'm reaching the point to automate the snapshots (I can create snapshot by hand without any problem). The documentation https://download.ceph.com/rpm-15.2.12/el8/noarch/cephadm-15.2.12-0.el8.noarch.rpm says to use the snap_schedule module but this module does not exist. # ceph mgr module ls | jq -r '.enabled_modules []' cephadm dashboard iostat prometheus restful Have I missed something ? Is there some additional install steps to do for this module ? Thanks for your help. Patrick ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
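For completeness, on a release that ships the module, the hyphenated command family Patrick Donnelly points at is used like this; a hedged sketch (the path and retention values are made-up examples):

# Snapshot the file system root every hour and keep 24 hourly snapshots:
ceph fs snap-schedule add / 1h
ceph fs snap-schedule retention add / h 24
ceph fs snap-schedule status /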
[ceph-users] Re: No snap_schedule module in Octopus
Hi, still some problems with snap_schedule, as the ceph fs snap-schedule namespace is not available on my nodes.

[ceph: root@mostha1 /]# ceph mgr module ls | jq -r '.enabled_modules []'
cephadm
dashboard
iostat
prometheus
restful
snap_schedule

[ceph: root@mostha1 /]# ceph fs snap_schedule
no valid command found; 10 closest matches:
fs status []
fs volume ls
fs volume create []
fs volume rm []
fs subvolumegroup ls
fs subvolumegroup create [] [] [] []
fs subvolumegroup rm [--force]
fs subvolume ls []
fs subvolume create [] [] [] [] [] [] [--namespace-isolated]
fs subvolume rm [] [--force] [--retain-snapshots]
Error EINVAL: invalid command

I think I need your help to go further

Patrick

Le 19/09/2023 à 10:23, Patrick Begou a écrit :
Hi, bad question, sorry. I've just run ceph mgr module enable snap_schedule --force to solve this problem. I was just afraid to use "--force", but I can break this test configuration

Patrick

Le 19/09/2023 à 09:47, Patrick Begou a écrit :
Hi, I'm working on a small POC for a ceph setup on 4 old C6100 PowerEdge servers. I had to install Octopus since later versions were unable to detect the HDDs (too old hardware??). No matter, this is only for training and understanding the Ceph environment. My installation is based on https://download.ceph.com/rpm-15.2.12/el8/noarch/cephadm-15.2.12-0.el8.noarch.rpm bootstrapped. I'm reaching the point to automate the snapshots (I can create snapshots by hand without any problem). The documentation https://download.ceph.com/rpm-15.2.12/el8/noarch/cephadm-15.2.12-0.el8.noarch.rpm says to use the snap_schedule module but this module does not exist.

# ceph mgr module ls | jq -r '.enabled_modules []'
cephadm
dashboard
iostat
prometheus
restful

Have I missed something? Are there additional install steps for this module? Thanks for your help.

Patrick
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: No snap_schedule module in Octopus
Hi, bad question, sorry. I've just run ceph mgr module enable snap_schedule --force to solve this problem. I was just afraid to use "--force" but as I can break this test configuration Patrick Le 19/09/2023 à 09:47, Patrick Begou a écrit : Hi, I'm working on a small POC for a ceph setup on 4 old C6100 power-edge. I had to install Octopus since latest versions were unable to detect the HDD (too old hardware ??). No matter, this is only for training and understanding Ceph environment. My installation is based on https://download.ceph.com/rpm-15.2.12/el8/noarch/cephadm-15.2.12-0.el8.noarch.rpm bootstrapped. I'm reaching the point to automate the snapshots (I can create snapshot by hand without any problem). The documentation https://download.ceph.com/rpm-15.2.12/el8/noarch/cephadm-15.2.12-0.el8.noarch.rpm says to use the snap_schedule module but this module does not exist. # ceph mgr module ls | jq -r '.enabled_modules []' cephadm dashboard iostat prometheus restful Have I missed something ? Is there some additional install steps to do for this module ? Thanks for your help. Patrick ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] No snap_schedule module in Octopus
Hi, I'm working on a small POC for a ceph setup on 4 old C6100 PowerEdge servers. I had to install Octopus since later versions were unable to detect the HDDs (too old hardware??). No matter, this is only for training and understanding the Ceph environment. My installation is based on https://download.ceph.com/rpm-15.2.12/el8/noarch/cephadm-15.2.12-0.el8.noarch.rpm bootstrapped. I'm reaching the point to automate the snapshots (I can create snapshots by hand without any problem). The documentation https://download.ceph.com/rpm-15.2.12/el8/noarch/cephadm-15.2.12-0.el8.noarch.rpm says to use the snap_schedule module but this module does not exist.

# ceph mgr module ls | jq -r '.enabled_modules []'
cephadm
dashboard
iostat
prometheus
restful

Have I missed something? Are there additional install steps for this module? Thanks for your help.

Patrick
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Seeking feedback on Improving cephadm bootstrap process
I'm a new ceph user and I have some trouble with bootstrapping with cephadm: using Pacific or Quincy, no hard drives are detected by Ceph; using Octopus, all the hard drives are detected. As I do not know how to really clean up even a successful (but not functional) install, each test requires a full reinstall of the node (it is a test node, no problem except the time needed). A detailed (and working) cleaning/uninstalling method (or command) for a Ceph deployment would be very helpful for a Ceph newbie (a sketch follows after the quoted discussion below). About how to do this: I'm using Proxmox for virtualization, and removing a VM via the web interface requires typing the ID of the VM again. Maybe Ceph could require the user to provide the cluster ID when running such a command? Either always create a different ID when building a new cluster, or ask for it again while the command is running, as a double check.

Best regards,

Patrick

Le 30/05/2023 à 11:23, Frank Schilder a écrit :
What I'm having in mind is if the command is already in history. A wrong history reference can execute a command with "--yes-i-really-mean-it" even though you really don't mean it. Been there. For an OSD this is maybe tolerable, but for an entire cluster ... not really. Some things need to be hard to limit the blast radius of a typo (or attacker). For example, when issuing such a command the first time, the cluster could print a nonce that needs to be included in such a command to make it happen and which is only valid once for this exact command, so one actually needs to type something new every time to destroy stuff. An exception could be if a "safe-to-destroy" query for any daemon (pool etc.) returns true. I would still not allow an entire cluster to be wiped with a single command. In a single step, only allow to destroy what could be recovered in some way (there has to be some form of undo). And there should be notifications to all admins about what is going on to be able to catch malicious execution of destructive commands.

Best regards,
= Frank Schilder AIT Risø Campus Bygning 109, rum S14

From: Nico Schottelius Sent: Tuesday, May 30, 2023 10:51 AM To: Frank Schilder Cc: Nico Schottelius; Redouane Kachach; ceph-users@ceph.io Subject: Re: [ceph-users] Re: Seeking feedback on Improving cephadm bootstrap process

Hey Frank, in regards to destroying a cluster, I'd suggest to reuse the old --yes-i-really-mean-it parameter, as it is already in use by ceph osd destroy [0]. Then it doesn't matter whether it's prod or not, if you really mean it ... ;-)

Best regards, Nico

[0] https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/

Frank Schilder writes:
Hi, I would like to second Nico's comment. What happened to the idea that a deployment tool should be idempotent? The most natural option would be: 1) start install -> something fails 2) fix problem 3) repeat exact same deploy command -> deployment picks up at current state (including cleaning up failed state markers) and tries to continue until next issue (go to 2). I'm not sure (meaning: it's a terrible idea) if it's a good idea to provide a single command to wipe a cluster. Just for the fat finger syndrome. This seems safe only if it would be possible to mark a cluster as production somehow (must be sticky, that is, cannot be unset), which prevents a cluster destroy command (or any too dangerous command) from executing. I understand the test case in the tracker, but having such test-case utils that can run on a production cluster and destroy everything seems a bit dangerous.
I think destroying a cluster should be a manual and tedious process, and figuring out how to do it should be part of the learning experience. So my answer to "how do I start over" would be "go figure it out, it's an important lesson".

Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

From: Nico Schottelius Sent: Friday, May 26, 2023 10:40 PM To: Redouane Kachach Cc: ceph-users@ceph.io Subject: [ceph-users] Re: Seeking feedback on Improving cephadm bootstrap process

Hello Redouane, much appreciated kick-off for improving cephadm. I was wondering why cephadm does not use an approach similar to Rook, in the sense of "repeat until it is fixed"? For background: Rook uses a controller that checks the state of the cluster, the state of the monitors, whether there are disks to be added, etc. It periodically restarts the checks and, when needed, shifts monitors, creates OSDs, and so on. My question is: why not have a daemon or checker subcommand of cephadm that a) checks what the current cluster status is (i.e. cephadm verify-cluster) and b) fixes the situation (i.e. cephadm verify-and-fix-cluster)? I think that option would be much more beneficial than the other two suggested ones.

Best regards, Nico

-- Sustainable and modern Infrastructures by ungleich.ch
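For what it's worth, recent cephadm already ships a teardown command that behaves roughly the way Patrick suggests: it refuses to run unless the cluster fsid is spelled out. A minimal sketch, assuming a cephadm new enough to have rm-cluster and the --zap-osds flag (older releases may lack the latter):

    # List local daemons to find the fsid (also shown by "ceph -s").
    cephadm ls | grep fsid

    # Tear down the cluster on this host; the fsid acts as the double check.
    # --zap-osds additionally wipes the OSD devices (assumption: recent cephadm).
    cephadm rm-cluster --fsid <fsid> --force --zap-osds

As far as I know, rm-cluster only cleans up the host it runs on, so on a multi-host cluster it would need to be repeated on each node.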
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
Hi Michel, I do not notice anything strange in the log files (looking for errors or warnings). The hardware is a DELL C6100 sled (from 2011) running AlmaLinux 8, up to date. It uses 3 SATA disks. Is there a way to force OSD installation by hand, providing the device (/dev/sdc for example)? A "do what I say" approach... Would it be a good idea to deploy Octopus on the nodes, configure the OSDs (even if podman 4.2.0 is not validated for Octopus) and then upgrade to Pacific? Could this be a workaround for this sort of regression from Octopus to Pacific? Maybe updating the BIOS from 1.7.1 to 1.8.1? All this is a little bit confusing for me as I'm just discovering Ceph. Thanks, Patrick

Le 26/05/2023 à 17:19, Michel Jouvin a écrit :

Hi Patrick, it is weird: we have a couple of clusters deployed with cephadm and running Pacific or Quincy, and "ceph orch device ls" works well. Have you looked at the cephadm logs (ceph log last cephadm)? Unless you are using very specific hardware, I suspect Ceph is suffering from a problem outside of it... Cheers, Michel Sent from my mobile

Le 26 mai 2023 17:02:50 Patrick Begou a écrit :

Hi, I'm back working on this problem. First of all, I saw that I had a hardware memory error, so I had to solve that first. It's done. I've tested several different Ceph deployments, each time starting from a full OS re-install (it requires some time for each test). Using Octopus, the devices are found:

    dnf -y install \
      https://download.ceph.com/rpm-15.2.12/el8/noarch/cephadm-15.2.12-0.el8.noarch.rpm
    monip=$(getent ahostsv4 mostha1 | head -n 1 | awk '{ print $1 }')
    cephadm bootstrap --mon-ip $monip --initial-dashboard-password x \
      --allow-fqdn-hostname

    [ceph: root@mostha1 /]# ceph orch device ls
    Hostname                      Path      Type  Serial           Size  Health   Ident  Fault  Available
    mostha1.legi.grenoble-inp.fr  /dev/sda  hdd   S2B5J90ZA02494   250G  Unknown  N/A    N/A    Yes
    mostha1.legi.grenoble-inp.fr  /dev/sdc  hdd   WD-WMAYP0982329  500G  Unknown  N/A    N/A    Yes

But with Pacific or Quincy the command returns nothing. With Pacific:

    dnf -y install \
      https://download.ceph.com/rpm-16.2.13/el8/noarch/cephadm-16.2.13-0.el8.noarch.rpm
    monip=$(getent ahostsv4 mostha1 | head -n 1 | awk '{ print $1 }')
    cephadm bootstrap --mon-ip $monip --initial-dashboard-password x \
      --allow-fqdn-hostname

"ceph orch device ls" doesn't return anything, but "cephadm shell lsmcli ldl" lists all the devices.
    [ceph: root@mostha1 /]# ceph orch device ls --wide
    [ceph: root@mostha1 /]# lsblk
    NAME                 MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
    sda                    8:0    1 232.9G  0 disk
    |-sda1                 8:1    1   3.9G  0 part /rootfs/boot
    |-sda2                 8:2    1  78.1G  0 part
    | `-osvg-rootvol     253:0    0  48.8G  0 lvm  /rootfs
    |-sda3                 8:3    1   3.9G  0 part [SWAP]
    `-sda4                 8:4    1 146.9G  0 part
      |-secretvg-homevol 253:1    0   9.8G  0 lvm  /rootfs/home
      |-secretvg-tmpvol  253:2    0   9.8G  0 lvm  /rootfs/tmp
      `-secretvg-varvol  253:3    0   9.8G  0 lvm  /rootfs/var
    sdb                    8:16   1 232.9G  0 disk
    sdc                    8:32   1 465.8G  0 disk
    [ceph: root@mostha1 /]# exit
    [root@mostha1 ~]# cephadm ceph-volume inventory
    Inferring fsid 2e3e85a8-fbcf-11ed-84e5-00266cf8869c
    Using ceph image with id '0dc91bca92c2' and tag 'v17' created on 2023-05-25 16:26:31 + UTC
    quay.io/ceph/ceph@sha256:b8df01a568f4dec7bac6d5040f9391dcca14e00ec7f4de8a3dcf3f2a6502d3a9
    Device Path Size Device nodes rotates available Model name
    [root@mostha1 ~]# cephadm shell lsmcli ldl
    Inferring fsid 4d54823c-fb05-11ed-aecf-00266cf8869c
    Inferring config /var/lib/ceph/4d54823c-fb05-11ed-aecf-00266cf8869c/mon.mostha1/config
    Using ceph image with id 'c9a1062f7289' and tag 'v17' created on 2023-04-25 16:04:33 + UTC
    quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e
    Path     | SCSI VPD 0x83    | Link Type | Serial Number   | Health Status
    /dev/sda | 50024e92039e4f1c | PATA/SATA | S2B5J90ZA10142  | Good
    /dev/sdc | 50014ee0ad5953c9 | PATA/SATA | WD-WMAYP0982329 | Good
    /dev/sdb | 50024e920387fa2c | PATA/SATA | S2B5J90ZA02494  | Good

Could it be a bug in ceph-volume? Adam suggests looking at the underlying commands (lsblk, blkid, udevadm, lvs, or pvs), but I'm not very comfortable with blkid and udevadm. Is there a "debug flag" to make Ceph more verbose? Thanks, Patrick

Le 15/05/2023 à 21:20, Adam King a écrit :

As you already seem to have figured out, "ceph orch device ls" is populated with the results from "ceph-volume inventory".
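On Patrick's question about forcing OSD creation by hand on a given device: orchestrator can be pointed at an explicit host:device pair, bypassing the automatic inventory-driven placement. A sketch, assuming the host name matches the one shown by "ceph orch host ls" and that /dev/sdc carries no partitions or LVM state:

    # "Do what I say": create an OSD on an explicit host:device pair.
    ceph orch daemon add osd mostha1.legi.grenoble-inp.fr:/dev/sdc

    # Alternative: drive ceph-volume directly through cephadm on the host.
    cephadm ceph-volume -- lvm create --data /dev/sdc

If ceph-volume itself cannot see the device, both commands should fail with a similar error, which would at least confirm where the problem sits.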
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
That inventory in turn bases its report on underlying commands such as lsblk, blkid, udevadm, lvs, or pvs. Also, if you want to see if it's an issue with a certain version of ceph-volume, you can use different versions by passing the image flag to cephadm. E.g. "cephadm --image quay.io/ceph/ceph:v17.2.6 ceph-volume -- inventory" would use the 17.2.6 version of ceph-volume for the inventory. It works by running ceph-volume through the container, so you don't have to worry about installing different packages to try them, and it should pull the container image on its own if it isn't on the machine already (but note that means the command will take longer as it pulls the image the first time).

On Sat, May 13, 2023 at 4:34 AM Patrick Begou wrote:

Hi Joshua, I've tried these commands but it looks like Ceph is unable to see and configure these HDDs.

    [root@mostha1 ~]# cephadm ceph-volume inventory
    Inferring fsid 4b7a6504-f0be-11ed-be1a-00266cf8869c
    Using recent ceph image quay.io/ceph/ceph@sha256:e6919776f0ff8331a8e9c4b18d36c5e9eed31e1a80da62ae8454e42d10e95544
    Device Path Size Device nodes rotates available Model name
    [root@mostha1 ~]# cephadm shell
    [ceph: root@mostha1 /]# ceph orch apply osd --all-available-devices
    Scheduled osd.all-available-devices update...
    [ceph: root@mostha1 /]# ceph orch device ls
    [ceph: root@mostha1 /]# ceph-volume lvm zap /dev/sdb
    --> Zapping: /dev/sdb
    --> --destroy was not specified, but zapping a whole device will remove the partition table
    Running command: /usr/bin/dd if=/dev/zero of=/dev/sdb bs=1M count=10 conv=fsync
     stderr: 10+0 records in
    10+0 records out
    10485760 bytes (10 MB, 10 MiB) copied, 0.10039 s, 104 MB/s
    --> Zapping successful for: <Raw Device: /dev/sdb>

I can check that /dev/sdb1 has been erased, so the previous command was successful.

    [ceph: root@mostha1 ceph]# lsblk
    NAME                 MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
    sda                    8:0    1 232.9G  0 disk
    |-sda1                 8:1    1   3.9G  0 part /rootfs/boot
    |-sda2                 8:2    1  78.1G  0 part
    | `-osvg-rootvol     253:0    0  48.8G  0 lvm  /rootfs
    |-sda3                 8:3    1   3.9G  0 part [SWAP]
    `-sda4                 8:4    1 146.9G  0 part
      |-secretvg-homevol 253:1    0   9.8G  0 lvm  /rootfs/home
      |-secretvg-tmpvol  253:2    0   9.8G  0 lvm  /rootfs/tmp
      `-secretvg-varvol  253:3    0   9.8G  0 lvm  /rootfs/var
    sdb                    8:16   1 465.8G  0 disk
    sdc                    8:32   1 232.9G  0 disk

But still no visible HDD:

    [ceph: root@mostha1 ceph]# ceph orch apply osd --all-available-devices
    Scheduled osd.all-available-devices update...
    [ceph: root@mostha1 ceph]# ceph orch device ls
    [ceph: root@mostha1 ceph]#

Maybe I have done something bad at install time, as in the container I unintentionally ran:

    dnf -y install https://download.ceph.com/rpm-16.2.13/el8/noarch/cephadm-16.2.13-0.el8.noarch.rpm

(an awful copy/paste launching the command). Can this break the container? I do not know what should be available as Ceph packages in the container to remove this install properly (no dnf.log file in the container). Patrick

Le 12/05/2023 à 21:38, Beaman, Joshua a écrit :

> The most significant point I see there, is you have no OSD service
> spec to tell orchestrator how to deploy OSDs. The easiest fix for
> that would be "ceph orch apply osd --all-available-devices".
>
> This will create a simple spec that should work for a test
> environment. Most likely it will collocate the block, block.db, and
> WAL all on the same device. Not ideal for prod environments, but fine
> for practice and testing.
>
> The other command I should have had you try is "cephadm ceph-volume
> inventory".
> That should show you the devices available for OSD
> deployment, and hopefully matches up to what your "lsblk" shows. If
> you need to zap HDDs and orchestrator is still not seeing them, you
> can try "cephadm ceph-volume lvm zap /dev/sdb".
>
> Thank you,
>
> Josh Beaman
>
> From: Patrick Begou
> Date: Friday, May 12, 2023 at 2:22 PM
> To: Beaman, Joshua , ceph-users
> Subject: Re: [EXTERNAL] [ceph-users] [Pacific] ceph orch device ls
> do not returns any HDD
>
> Hi Joshua and thanks for this quick reply.
>
> At this step I have only one node. I was checking what ceph was
> returning with different commands on this host before adding new
> hosts. Just to compare with my first Octopus install.
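Adam's --image trick also makes it easy to bisect this kind of regression between releases without installing anything on the host. A sketch, assuming the quay.io tags below exist and are pullable from the node:

    # Inventory as seen by an Octopus ceph-volume (the version that worked here).
    cephadm --image quay.io/ceph/ceph:v15.2.17 ceph-volume -- inventory

    # The same inventory with Pacific and Quincy, to see where detection breaks.
    cephadm --image quay.io/ceph/ceph:v16.2.13 ceph-volume -- inventory
    cephadm --image quay.io/ceph/ceph:v17.2.6 ceph-volume -- inventory

Each run pulls the image on first use, so the first invocation per version is slow.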
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
Hi Joshua, I've tried these commands but it looks like Ceph is unable to see and configure these HDDs.

    [root@mostha1 ~]# cephadm ceph-volume inventory
    Inferring fsid 4b7a6504-f0be-11ed-be1a-00266cf8869c
    Using recent ceph image quay.io/ceph/ceph@sha256:e6919776f0ff8331a8e9c4b18d36c5e9eed31e1a80da62ae8454e42d10e95544
    Device Path Size Device nodes rotates available Model name
    [root@mostha1 ~]# cephadm shell
    [ceph: root@mostha1 /]# ceph orch apply osd --all-available-devices
    Scheduled osd.all-available-devices update...
    [ceph: root@mostha1 /]# ceph orch device ls
    [ceph: root@mostha1 /]# ceph-volume lvm zap /dev/sdb
    --> Zapping: /dev/sdb
    --> --destroy was not specified, but zapping a whole device will remove the partition table
    Running command: /usr/bin/dd if=/dev/zero of=/dev/sdb bs=1M count=10 conv=fsync
     stderr: 10+0 records in
    10+0 records out
    10485760 bytes (10 MB, 10 MiB) copied, 0.10039 s, 104 MB/s
    --> Zapping successful for: <Raw Device: /dev/sdb>

I can check that /dev/sdb1 has been erased, so the previous command was successful.

    [ceph: root@mostha1 ceph]# lsblk
    NAME                 MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
    sda                    8:0    1 232.9G  0 disk
    |-sda1                 8:1    1   3.9G  0 part /rootfs/boot
    |-sda2                 8:2    1  78.1G  0 part
    | `-osvg-rootvol     253:0    0  48.8G  0 lvm  /rootfs
    |-sda3                 8:3    1   3.9G  0 part [SWAP]
    `-sda4                 8:4    1 146.9G  0 part
      |-secretvg-homevol 253:1    0   9.8G  0 lvm  /rootfs/home
      |-secretvg-tmpvol  253:2    0   9.8G  0 lvm  /rootfs/tmp
      `-secretvg-varvol  253:3    0   9.8G  0 lvm  /rootfs/var
    sdb                    8:16   1 465.8G  0 disk
    sdc                    8:32   1 232.9G  0 disk

But still no visible HDD:

    [ceph: root@mostha1 ceph]# ceph orch apply osd --all-available-devices
    Scheduled osd.all-available-devices update...
    [ceph: root@mostha1 ceph]# ceph orch device ls
    [ceph: root@mostha1 ceph]#

Maybe I have done something bad at install time, as in the container I unintentionally ran:

    dnf -y install https://download.ceph.com/rpm-16.2.13/el8/noarch/cephadm-16.2.13-0.el8.noarch.rpm

(an awful copy/paste launching the command). Can this break the container? I do not know what should be available as Ceph packages in the container to remove this install properly (no dnf.log file in the container). Patrick

Le 12/05/2023 à 21:38, Beaman, Joshua a écrit :

The most significant point I see there, is you have no OSD service spec to tell orchestrator how to deploy OSDs. The easiest fix for that would be "ceph orch apply osd --all-available-devices". This will create a simple spec that should work for a test environment. Most likely it will collocate the block, block.db, and WAL all on the same device. Not ideal for prod environments, but fine for practice and testing. The other command I should have had you try is "cephadm ceph-volume inventory".
    [root@mostha1 ~]# cephadm check-host
    podman (/usr/bin/podman) version 4.2.0 is present
    systemctl is present
    lvcreate is present
    Unit chronyd.service is enabled and running
    Host looks OK

    [ceph: root@mostha1 /]# ceph -s
      cluster:
        id:     4b7a6504-f0be-11ed-be1a-00266cf8869c
        health: HEALTH_WARN
                OSD count 0 < osd_pool_default_size 3
      services:
        mon: 1 daemons, quorum mostha1.legi.grenoble-inp.fr (age 5h)
        mgr: mostha1.legi.grenoble-inp.fr.hogwuz(active, since 5h)
        osd: 0 osds: 0 up, 0 in
      data:
        pools:   0 pools, 0 pgs
        objects: 0 objects, 0 B
        usage:   0 B used, 0 B / 0 B avail
        pgs:

    [ceph: root@mostha1 /]# ceph orch ls
    NAME           PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
    alertmanager   ?:9093,9094  1/1      6m ago     6h   count:1
    crash                       1/1      6m ago     6h   *
    grafana        ?:3000       1/1      6m ago     6h   count:1
    mgr                         1/2      6m ago     6h   count:2
    mon                         1/5      6m ago     6h   count:5
    node-exporter  ?:9100       1/1      6m ago     6h   *
    prometheus     ?:9095       1/1      6m ago     6h   count:1
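Regarding Patrick's earlier question about a debug flag: ceph-volume writes its own log on the host, and its inventory can emit JSON, which includes the per-device availability checks and reject reasons. A sketch, assuming default cephadm log paths and a ceph-volume build that supports the json-pretty format:

    # Machine-readable inventory, including "rejected_reasons" per device.
    cephadm ceph-volume -- inventory --format json-pretty

    # cephadm's own log, plus ceph-volume's log for the cluster fsid.
    less /var/log/ceph/cephadm.log
    less /var/log/ceph/<fsid>/ceph-volume.log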
[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
Hi Joshua and thanks for this quick reply. At this step I have only one node. I was checking what Ceph was returning with different commands on this host before adding new hosts, just to compare with my first Octopus install. As this hardware is for testing only, it remains easy for me to break everything and reinstall again.

    [root@mostha1 ~]# cephadm check-host
    podman (/usr/bin/podman) version 4.2.0 is present
    systemctl is present
    lvcreate is present
    Unit chronyd.service is enabled and running
    Host looks OK

    [ceph: root@mostha1 /]# ceph -s
      cluster:
        id:     4b7a6504-f0be-11ed-be1a-00266cf8869c
        health: HEALTH_WARN
                OSD count 0 < osd_pool_default_size 3
      services:
        mon: 1 daemons, quorum mostha1.legi.grenoble-inp.fr (age 5h)
        mgr: mostha1.legi.grenoble-inp.fr.hogwuz(active, since 5h)
        osd: 0 osds: 0 up, 0 in
      data:
        pools:   0 pools, 0 pgs
        objects: 0 objects, 0 B
        usage:   0 B used, 0 B / 0 B avail
        pgs:

    [ceph: root@mostha1 /]# ceph orch ls
    NAME           PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
    alertmanager   ?:9093,9094  1/1      6m ago     6h   count:1
    crash                       1/1      6m ago     6h   *
    grafana        ?:3000       1/1      6m ago     6h   count:1
    mgr                         1/2      6m ago     6h   count:2
    mon                         1/5      6m ago     6h   count:5
    node-exporter  ?:9100       1/1      6m ago     6h   *
    prometheus     ?:9095       1/1      6m ago     6h   count:1

    [ceph: root@mostha1 /]# ceph orch ls osd --export
    No services reported
    [ceph: root@mostha1 /]# ceph orch host ls
    HOST                          ADDR           LABELS  STATUS
    mostha1.legi.grenoble-inp.fr  194.254.66.34  _admin
    1 hosts in cluster
    [ceph: root@mostha1 /]# ceph log last cephadm
    ...
    2023-05-12T15:19:58.754655+ mgr.mostha1.legi.grenoble-inp.fr.hogwuz (mgr.44098) 1876 : cephadm [INF] Zap device mostha1.legi.grenoble-inp.fr:/dev/sdb
    2023-05-12T15:19:58.756639+ mgr.mostha1.legi.grenoble-inp.fr.hogwuz (mgr.44098) 1877 : cephadm [ERR] Device path '/dev/sdb' not found on host 'mostha1.legi.grenoble-inp.fr'
    Traceback (most recent call last):
      File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 125, in wrapper
        return OrchResult(f(*args, **kwargs))
      File "/usr/share/ceph/mgr/cephadm/module.py", line 2275, in zap_device
        f"Device path '{path}' not found on host '{host}'")
    orchestrator._interface.OrchestratorError: Device path '/dev/sdb' not found on host 'mostha1.legi.grenoble-inp.fr'

    [ceph: root@mostha1 /]# ls -l /dev/sdb
    brw-rw 1 root disk 8, 16 May 12 15:16 /dev/sdb
    [ceph: root@mostha1 /]# lsblk /dev/sdb
    NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
    sdb      8:16   1 465.8G  0 disk
    `-sdb1   8:17   1 465.8G  0 part

I have created a full partition on /dev/sdb (for testing) and /dev/sdc has no partition table (removed). But all seems fine with these commands.

Patrick

Le 12/05/2023 à 20:19, Beaman, Joshua a écrit :

I don't quite understand why that zap would not work. But, here's where I'd start.

1. cephadm check-host: run this on each of your hosts to make sure cephadm, podman and all other prerequisites are installed and recognized.
2. ceph orch ls: this should show at least a mon, mgr, and osd spec deployed.
3. ceph orch ls osd --export: this will show the OSD placement service specifications that orchestrator uses to identify devices to deploy as OSDs.
4. ceph orch host ls: this will list the hosts that have been added to orchestrator's inventory, and what labels are applied, which correlate to the service placement labels.
5. ceph log last cephadm: this will show you what orchestrator has been trying to do, and how it may be failing.

Also, it's never un-helpful to have a look at "ceph -s" and "ceph health detail", particularly for any people trying to help you without access to your systems.
Best of luck,

Josh Beaman

From: Patrick Begou
Date: Friday, May 12, 2023 at 10:45 AM
To: ceph-users
Subject: [EXTERNAL] [ceph-users] [Pacific] ceph orch device ls do not returns any HDD

Hi everyone, I'm new to Ceph: just a French four-day training session with Octopus on VMs, which convinced me to build my first cluster. At this time I have 4 old identical nodes for testing, with 3 HDDs each and 2 network interfaces, running AlmaLinux 8 (el8). I tried to replay the training session but it failed, breaking the web interface because of some problems with podman 4.2 not being compatible with Octopus. So I tried to deploy Pacific with the cephadm tool on my first node (mostha1), to also enable testing an upgrade later.

    dnf -y install https://download.ceph.com/rpm-16.2.13/el8/noarch/cephadm-16.2.13-0.el8.noarch.rpm
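Since ceph-volume builds its inventory from standard host tools, running those directly shows what it has to work with; a device that looks wrong here will usually explain an empty "ceph orch device ls". A sketch of the underlying commands Adam mentioned, using Patrick's device names:

    # Kernel view of the disks: name, size, rotational flag, device type.
    lsblk -o NAME,SIZE,ROTA,TYPE,MOUNTPOINT

    # Filesystem/partition signatures that can mark a disk as "in use".
    blkid /dev/sdb /dev/sdc

    # udev properties for one device; ceph-volume consults several of these.
    udevadm info --query=property --name=/dev/sdb

    # Existing LVM state that could be claiming the disks.
    pvs && lvs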
[ceph-users] [Pacific] ceph orch device ls do not returns any HDD
Hi everyone, I'm new to Ceph: just a French four-day training session with Octopus on VMs, which convinced me to build my first cluster. At this time I have 4 old identical nodes for testing, with 3 HDDs each and 2 network interfaces, running AlmaLinux 8 (el8). I tried to replay the training session but it failed, breaking the web interface because of some problems with podman 4.2 not being compatible with Octopus. So I tried to deploy Pacific with the cephadm tool on my first node (mostha1), to also enable testing an upgrade later.

    dnf -y install https://download.ceph.com/rpm-16.2.13/el8/noarch/cephadm-16.2.13-0.el8.noarch.rpm
    monip=$(getent ahostsv4 mostha1 | head -n 1 | awk '{ print $1 }')
    cephadm bootstrap --mon-ip $monip --initial-dashboard-password x \
      --initial-dashboard-user admceph \
      --allow-fqdn-hostname --cluster-network 10.1.0.0/16

This was successful. But running "ceph orch device ls" does not show any HDD, even though I have /dev/sda (used by the OS), /dev/sdb and /dev/sdc. The web interface shows a raw capacity which is an aggregate of the sizes of the 3 HDDs of the node. I've also tried to reset /dev/sdb, but cephadm does not see it:

    [ceph: root@mostha1 /]# ceph orch device zap mostha1.legi.grenoble-inp.fr /dev/sdb --force
    Error EINVAL: Device path '/dev/sdb' not found on host 'mostha1.legi.grenoble-inp.fr'

On my first attempt with Octopus, I was able to list the available HDDs with this command line. Before moving to Pacific, the OS on this node was reinstalled from scratch. Any advice for a Ceph beginner? Thanks, Patrick

___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
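When "ceph orch device zap" refuses with "Device path not found", the disk can still be cleared with plain host tools before asking orchestrator to rescan. A sketch under the assumption that /dev/sdb really is the disposable test disk (the first two commands are destructive):

    # Remove filesystem, RAID and LVM signatures from the device (destructive).
    wipefs --all /dev/sdb

    # Wipe the partition table, including the GPT backup copy (destructive).
    sgdisk --zap-all /dev/sdb

    # Ask orchestrator to refresh its device inventory and check again.
    ceph orch device ls --refresh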