[ceph-users] About ceph osd slow ops

2023-11-30 Thread VÔ VI
Hi community,

My cluster is running with 10 nodes, and when 2 nodes go down the log
sometimes shows slow ops. What is the root cause?
My OSDs are HDDs, with block.db and WAL on a 500GB SSD per OSD.

Health check update: 13 slow ops, oldest one blocked for 167 sec, osd.10
has slow ops (SLOW_OPS)
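
So far I have only looked at the health output; I guess the next step is
something like the following on the node hosting osd.10 (assuming the admin
socket is reachable there) to see what the blocked requests are waiting on:

  ceph health detail
  ceph daemon osd.10 dump_ops_in_flight
  ceph daemon osd.10 dump_historic_slow_ops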

Thanks to the community.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Space reclaim doesn't happening in nautilus RBD pool

2023-11-30 Thread Szabo, Istvan (Agoda)
Trash is empty.


Istvan Szabo
Staff Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---




From: Ilya Dryomov 
Sent: Thursday, November 30, 2023 6:27 PM
To: Szabo, Istvan (Agoda) 
Cc: Ceph Users 
Subject: Re: [ceph-users] Space reclaim doesn't happening in nautilus RBD pool


On Thu, Nov 30, 2023 at 8:25 AM Szabo, Istvan (Agoda)
 wrote:
>
> Hi,
>
> Is there any config in Ceph that blocks / does not perform space reclaim?
> I tested on one pool which has only one image, 1.8 TiB in use.
>
>
> rbd $p du im/root
> warning: fast-diff map is not enabled for root. operation may be slow.
> NAME  PROVISIONED  USED
> root  2.2 TiB      1.8 TiB
>
>
> I already removed all snapshots and now the pool has only this one image.
> I ran both fstrim over the filesystem (XFS) and rbd sparsify im/root
> (I don't know exactly what it does, but it mentions reclaiming something).
> The pool still shows 6.9 TiB used, which does not make sense, right? It
> should be at most 3.6 TiB (1.8 * 2) according to its replica count.

Hi Istvan,

Have you checked RBD trash?

$ rbd trash ls -p im

Thanks,

Ilya


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph/daemon container lvm tools don’t work

2023-11-30 Thread Gaël THEROND
Is there anyone using containerized CEPH over CentOS Stream 9 Hosts already?

I think there is a pretty big issue here if Ceph images are built on
CentOS but never tested against it.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Recommended architecture

2023-11-30 Thread Anthony D'Atri
I try to address these ideas in 
https://www.amazon.com/Learning-Ceph-scalable-reliable-solution-ebook/dp/B01NBP2D9I

though as with any tech topic the details change over time.

It's difficult to interpret the table the OP included, but I think it shows a 3 
node cluster.  When you only have 3 nodes, you don't really have a choice about 
segregating daemons.

Since you mention VMs, should we assume that this is just a sandbox cluster?


> On Nov 30, 2023, at 13:39, Janne Johansson  wrote:
> 
> Den tors 30 nov. 2023 kl 17:35 skrev Francisco Arencibia Quesada <
> arencibia.franci...@gmail.com>:
> 
>> Hello again guys,
>> 
>> Can you recommend me a book that explains best practices with Ceph,
>> for example is it okay to have mon,mgr, osd in the same virtual machine,
>> 
> 
> OSDs can need a lot of RAM during recovery, after crashes and the like.
> In such a case, it might be suboptimal to co-host other services on
> that host, since those would be starved for memory if/when the OSD balloons
> to a huge size for a short while.
> 
> As for recommendations, this needs more information on what to achieve,
> what budget you have, what other choices one must make (OS, virtualization
> or not) and so on. There is no one single solution for "storage", not even
> just within ceph.
> 
> -- 
> May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Recommended architecture

2023-11-30 Thread Janne Johansson
Den tors 30 nov. 2023 kl 17:35 skrev Francisco Arencibia Quesada <
arencibia.franci...@gmail.com>:

> Hello again guys,
>
> Can you recommend me a book that explains best practices with Ceph,
> for example is it okay to have mon,mgr, osd in the same virtual machine,
>

OSDs can need a lot of RAM during recovery, after crashes and the like.
In such a case, it might be suboptimal to co-host other services on
that host, since those would be starved for memory if/when the OSD balloons
to a huge size for a short while.

As for recommendations, this needs more information on what to achieve,
what budget you have, what other choices one must make (OS, virtualization
or not) and so on. There is no one single solution for "storage", not even
just within ceph.
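
For what it's worth, the memory ceiling the OSDs aim for can be tuned — a
sketch, assuming BlueStore OSDs on a reasonably recent release:

  ceph config set osd osd_memory_target 4294967296   # ~4 GiB per OSD

but recovery can still overshoot that target for a while, so it softens the
problem rather than removing it.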

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Public/private network

2023-11-30 Thread John Jasen
cluster_network is an optional add-on to handle some of the internal ceph
traffic. Your mon address needs to be accessible/routable for anything
outside your ceph cluster that wants to consume it. That should also be in
your public_network range.
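
For what it's worth, the pattern that has worked for me looks roughly like
this (the subnets are placeholders, and I haven't re-checked it against
17.2.7):

  cephadm bootstrap --mon-ip PUBLIC_IP --cluster-network PRIVATE_SUB
  ceph config set mon public_network PUBLIC_SUB
  ceph config set global cluster_network PRIVATE_SUB

i.e. bootstrap with a mon IP from the public range and keep the cluster
network for the OSD back-side traffic only.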

I stumbled over this a few times in figuring out how to deploy ceph. Should
the documentation be cleaned up?



On Thu, Nov 30, 2023 at 12:37 PM Albert Shih  wrote:

> Hi everyone.
>
> Status : Installing a ceph cluster
> Version : 17.2.7 Quincy
> OS : Debian 11.
>
>
> Each of my servers has two IP addresses: one public and one private.
>
> When I try to deploy my cluster on a server
>
>   server1 (the hostname)
>
> with
>
>   cephadm bootstrap --mon-id hostname --mon-ip IP_PRIVATE
> --cluster-network PRIVATE_SUB
>
> I end up with the private network for
>
>   ceph config get mon public_network
>
> So I tried to change it with
>
>   ceph config set mon public_network PUBLIC_SUB
>
> but with lsof -i | grep -i listen I still get
>
> ceph-mgr  31427  ceph   49u  IPv4 119937  0t0  TCP
> server1-ceph.private.:7150 (LISTEN)
> node_expo 31572nobody3u  IPv6  65495  0t0  TCP *:9100
> (LISTEN)
> alertmana 31573nobody3u  IPv6  21377  0t0  TCP *:9094
> (LISTEN)
> alertmana 31573nobody8u  IPv6 136298  0t0  TCP *:9093
> (LISTEN)
> prometheu 31757nobody7u  IPv6 109680  0t0  TCP *:9095
> (LISTEN)
> grafana   31758 node-exporter   11u  IPv6 100726  0t0  TCP *:3000
> (LISTEN)
> ceph-mon  31850  ceph   27u  IPv4 139664  0t0  TCP
> server1-ceph.private.:3300 (LISTEN)
> ceph-mon  31850  ceph   28u  IPv4 139665  0t0  TCP
> server1-ceph.private.:6789 (LISTEN)
>
> So the ceph-mon listens on the private interface.
>
> Is this normal? Because according to
>
>
> https://access.redhat.com/documentation/fr-fr/red_hat_ceph_storage/5/html/configuration_guide/ceph-network-configuration
>
> only the OSDs should listen on the private network.
>
> Is there any way to configure both public_network and cluster_network
> with cephadm bootstrap?
>
> Regards.
>
>
> --
> Albert SHIH 嶺 
> France
> Heure locale/Local time:
> jeu. 30 nov. 2023 18:27:08 CET
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Public/private network

2023-11-30 Thread Albert Shih
Hi everyone. 

Status : Installing a ceph cluster
Version : 17.2.7 Quincy
OS : Debian 11.


Each of my servers has two IP addresses: one public and one private.

When I try to deploy my cluster on a server

  server1 (the hostname)

with

  cephadm bootstrap --mon-id hostname --mon-ip IP_PRIVATE --cluster-network 
PRIVATE_SUB

I end up with the private network for

  ceph config get mon public_network

So I tried to change it with

  ceph config set mon public_network PUBLIC_SUB

but with lsof -i | grep -i listen I still get

ceph-mgr  31427  ceph   49u  IPv4 119937  0t0  TCP 
server1-ceph.private.:7150 (LISTEN)
node_expo 31572nobody3u  IPv6  65495  0t0  TCP *:9100 (LISTEN)
alertmana 31573nobody3u  IPv6  21377  0t0  TCP *:9094 (LISTEN)
alertmana 31573nobody8u  IPv6 136298  0t0  TCP *:9093 (LISTEN)
prometheu 31757nobody7u  IPv6 109680  0t0  TCP *:9095 (LISTEN)
grafana   31758 node-exporter   11u  IPv6 100726  0t0  TCP *:3000 (LISTEN)
ceph-mon  31850  ceph   27u  IPv4 139664  0t0  TCP 
server1-ceph.private.:3300 (LISTEN)
ceph-mon  31850  ceph   28u  IPv4 139665  0t0  TCP 
server1-ceph.private.:6789 (LISTEN)

So the ceph-mon listens on the private interface.

Is this normal? Because according to

  
https://access.redhat.com/documentation/fr-fr/red_hat_ceph_storage/5/html/configuration_guide/ceph-network-configuration

only the OSDs should listen on the private network.

Is there any way to configure both public_network and cluster_network
with cephadm bootstrap?

Regards.


-- 
Albert SHIH 嶺 
France
Heure locale/Local time:
jeu. 30 nov. 2023 18:27:08 CET
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Recommended architecture

2023-11-30 Thread Francisco Arencibia Quesada
Hello again guys,

Can you recommend a book that explains best practices with Ceph?
For example, is it okay to have mon, mgr, and osd in the same virtual machine?
What is the recommended architecture, in your experience?

Because by default it is doing this:

Cluster Ceph:

  node01.jotelulu.space (10.0.0.52)  : OSD, Monitor Daemon, Manager Daemon
  node02.jotelulu.space (10.0.0.194) : OSD, Monitor Daemon, Manager Daemon (standby)
  node03.jotelulu.space (10.0.0.229) : OSD, Monitor Daemon
-- 


Regards
*Francisco Arencibia Quesada.*
*DevOps Engineer*
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-30 Thread Yuri Weinstein
The fs PRs:
https://github.com/ceph/ceph/pull/54407
https://github.com/ceph/ceph/pull/54677
were approved/tested and are ready for merge.

What is the status/plan for https://tracker.ceph.com/issues/63618?

On Wed, Nov 29, 2023 at 10:51 AM Igor Fedotov  wrote:
>
> https://tracker.ceph.com/issues/63618 to be considered as a blocker for
> the next Reef release.
>
> On 07/11/2023 00:30, Yuri Weinstein wrote:
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/63443#note-1
> >
> > Seeking approvals/reviews for:
> >
> > smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLE failures)
> > rados - Neha, Radek, Travis, Ernesto, Adam King
> > rgw - Casey
> > fs - Venky
> > orch - Adam King
> > rbd - Ilya
> > krbd - Ilya
> > upgrade/quincy-x (reef) - Laura PTL
> > powercycle - Brad
> > perf-basic - Laura, Prashant (POOL_APP_NOT_ENABLE failures)
> >
> > Please reply to this email with approval and/or trackers of known
> > issues/PRs to address them.
> >
> > TIA
> > YuriW
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS_DAMAGE in 17.2.7 / Cannot delete affected files

2023-11-30 Thread Sebastian Knust

Hi Patrick,

On 30.11.23 03:58, Patrick Donnelly wrote:


I've not yet fully reviewed the logs but it seems there is a bug in
the detection logic which causes a spurious abort. This does not
appear to be actually new damage.


We are accessing the metadata (read-only) daily. The issue only popped 
up after updating to 17.2.7. Of course, this does not mean that there 
was no damage there before, only that it was not detected.


Are you using postgres?

Not on top of CephFS, no. We do use postgres on some RBD volumes.



If you can share details about your snapshot
workflow and general workloads that would be helpful (privately if
desired).


Our CephFS root looks like this:
/archive
/homes
/no-snapshot
/other-snapshot
/scratch

We are running snapshots on /homes and /other-snapshot with the same
schedule. We mount the filesystem with a kernel client on one of the Ceph
hosts (not running the MDS) and mkdir / rmdir the snapshots as needed (a
rough sketch of what that looks like follows the schedule below):
- daily between 06:00 and 19:45 UTC (inclusive): create a snapshot every
15 minutes; one hour later delete it unless it is an hourly (xx:00) one
- daily on the full hour: create a snapshot; delete the 24-hour-old
snapshot unless it is the midnight one
- daily at midnight: delete the snapshot from 14 days ago unless it is a Sunday
- every Sunday at midnight: delete the snapshot from 8 weeks ago
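
The sketch (mount point and snapshot naming here are only for illustration):

  mkdir /mnt/cephfs/homes/.snap/2023-11-30_1215   # create a snapshot via the .snap directory
  rmdir /mnt/cephfs/homes/.snap/2023-11-30_1215   # drop it again when the schedule says so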

Workload is two main Samba servers (one only sharing a subdirectory 
which is generally not accessed on the other). Client access to those 
servers is limited to 1GBit/s each. Until Tuesday, we also had a 
mailserver with Dovecot running on top of CephFS. This was migrated on 
Tuesday to an RBD volume as we had some issues with hanging access to 
some files / directories (interestingly only in the main tree, in 
snapshots access was without issue). Additionally, we have a Nextcloud 
instance with ~200 active users storing data in CephFS as well as some 
other Kernel clients with little / sporadic traffic, some running Samba, 
some NFS, some interactive SSH / x2go servers with direct user access, 
some specialised web applications (notably OMERO).


We run daily incremental backups of most of the CephFS content with 
Bareos running on a dedicated server which has the whole CephFS tree 
mounted read-only. For most data a full backup is performed every two 
months, for some data only every six months. The affected area is 
contained in this "every six months" full backup portion of the file 
system tree.



Two weeks ago we deleted a folder structure of about 6 TB, with an average
file size in the range of 1 GB. The structure was under /other-snapshot as
well. This led to severe load on the MDS, especially starting at midnight.
In conjunction with the Ubuntu kernel mount, we also had issues with
unreleased capabilities preventing read access to the /other-snapshot
part.


To combat these lingering problems, we deleted all snapshots in 
/other-snapshot which led to a half a dozen PGs stuck in snaptrim state 
(and a few hundred in snaptrim_wait). Updating from 17.2.6 to 17.2.7 
solved that issue quickly, the affected PGs became unstuck and the whole 
cluster was in active+clean a few hours later.






For now, I'll hold off on running first-damage.py to try to remove the
affected files / inodes. Ultimately however, this seems to be the most
sensible solution to me, at least with regards to cluster downtime.


Please give me another day to review, then feel free to use
first-damage.py to clean up. If you see new damage, please upload the
logs.

We are in no hurry and will probably run first-damage.py sometime next 
week. I will report new damage if it comes in.
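
(I assume the right way to keep an eye on this is listing the entries with
something like

  ceph tell mds.<fsname>:0 damage ls

assuming a single active rank — please correct me if the new detection logic
needs anything more than that.)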


Cheers
Sebastian

--
Dr. Sebastian Knust  | Bielefeld University
IT Administrator | Faculty of Physics
Office: D2-110   | Universitätsstr. 25
Phone: +49 521 106 5234  | 33615 Bielefeld
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] rook-ceph RAW USE / DATA size difference reported after osd resize operation

2023-11-30 Thread merp
Hi,

I am about to resize the OSDs in my Ceph cluster to extend overall cluster
capacity by adding 40GB to each disk. I noticed that after the disk resize
and OSD restart, RAW USE grows in proportion to the new size (e.g. by 20GB)
while DATA remains the same, which makes the new space not readily
available. Here is the OSD output of the cluster:

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META      AVAIL    %USE   VAR   PGS  STATUS
 1    hdd  0.09769   1.0      100 GiB   83 GiB   82 GiB  164 MiB   891 MiB   17 GiB  82.79  1.00   77      up
 3    hdd  0.09769   1.0      100 GiB   83 GiB   82 GiB  355 MiB   772 MiB   17 GiB  82.74  1.00   84      up
 2    hdd  0.09769   1.0      100 GiB   84 GiB   82 GiB  337 MiB   1.3 GiB   16 GiB  83.88  1.01   82      up
 4    hdd  0.09769   1.0      140 GiB  125 GiB   84 GiB  148 MiB   919 MiB   15 GiB  89.24  1.07   80      up
 6    hdd  0.09769   1.0      140 GiB  106 GiB  104 GiB  333 MiB  1015 MiB   34 GiB  75.47  0.91  107      up
 7    hdd  0.09769   1.0      140 GiB  118 GiB   97 GiB  351 MiB   1.2 GiB   22 GiB  84.48  1.02  101      up
                     TOTAL    720 GiB  598 GiB  531 GiB  1.6 GiB   6.1 GiB  122 GiB  83.10
MIN/MAX VAR: 0.91/1.07  STDDEV: 4.06

The OSDs I have extended so far are 7, 6 and 4. Only OSD 6 picked up the new
size without inflating RAW USE; OSDs 7 and 4 show a gap between RAW USE and DATA.

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph df
--- RAW STORAGE ---
CLASS SIZE    AVAIL USED  RAW USED  %RAW USED
hdd    720 GiB  122 GiB  598 GiB   598 GiB  83.10
TOTAL  720 GiB  122 GiB  598 GiB   598 GiB  83.10

--- POOLS ---
POOL   ID  PGS   STORED  OBJECTS USED  %USED  MAX AVAIL
.mgr    1    1  449 KiB    2  1.3 MiB  0 16 GiB
cephfs-metadata 2   16  832 MiB  245.62k  2.4 GiB   4.80 16 GiB
cephfs-replicated   3  128  176 GiB  545.23k  530 GiB  91.63 16 GiB
replicapool 4   32 19 B    2   12 KiB  0 16 GiB


This reports nearly 600GB used, while it should be more like 530GB, which is
what the cephfs-replicated pool reports as its usage. Any idea why this is
happening? Should I continue extending all OSDs to 140GB to see if that makes
a difference?
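
One thing I plan to try, based on the BlueStore docs: telling the OSD to grow
into the enlarged device explicitly, roughly

  ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-4   # with the OSD stopped

with the path adjusted to wherever rook mounts the OSD directory — though I'm
not sure yet how well that plays with rook-managed OSDs, so treat it as a
sketch rather than a recipe.
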
Br,
merp.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: error deploying ceph

2023-11-30 Thread Adam King
That message in the `ceph orch device ls` output is just why the device is
unavailable for an OSD. The reason it now has insufficient space in this case
is that you've already put an OSD on it, so it's really just telling you
you can't place another one. So you can expect to see something like that
for each device you place an OSD on, and it's nothing to worry about. It's
useful information if, for example, you remove the OSD associated with the
device but forget to zap the device afterward, and are wondering why you can't
put another OSD on it later.
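
If you ever do want to reuse a device like that for a fresh OSD (after the old
one has been removed), the orchestrator can wipe it with something like
`ceph orch device zap node1-ceph /dev/xvdb --force` — destructive, so only run
it against a device you really mean to clear.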

On Thu, Nov 30, 2023 at 8:00 AM Francisco Arencibia Quesada <
arencibia.franci...@gmail.com> wrote:

> Thanks again guys,
>
> The cluster is healthy now; is this normal? Everything looks good except for
> this output:
> *Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected*
>
> root@node1-ceph:~# cephadm shell -- ceph status
> Inferring fsid 209a7bf0-8f6d-11ee-8828-23977d76b74f
> Inferring config
> /var/lib/ceph/209a7bf0-8f6d-11ee-8828-23977d76b74f/mon.node1-ceph/config
> Using ceph image with id '921993c4dfd2' and tag 'v17' created on
> 2023-11-22 16:03:22 + UTC
>
> quay.io/ceph/ceph@sha256:dad2876c2916b732d060b71320f97111bc961108f9c249f4daa9540957a2b6a2
>   cluster:
> id: 209a7bf0-8f6d-11ee-8828-23977d76b74f
> health: HEALTH_OK
>
>   services:
> mon: 3 daemons, quorum node1-ceph,node2-ceph,node3-ceph (age 2h)
> mgr: node1-ceph.peedpx(active, since 2h), standbys: node2-ceph.ykkvho
> osd: 3 osds: 3 up (since 2h), 3 in (since 2h)
>
>   data:
> pools:   2 pools, 33 pgs
> objects: 7 objects, 449 KiB
> usage:   873 MiB used, 299 GiB / 300 GiB avail
> pgs: 33 active+clean
>
> root@node1-ceph:~# cephadm shell -- ceph orch device ls --wide
> Inferring fsid 209a7bf0-8f6d-11ee-8828-23977d76b74f
> Inferring config
> /var/lib/ceph/209a7bf0-8f6d-11ee-8828-23977d76b74f/mon.node1-ceph/config
> Using ceph image with id '921993c4dfd2' and tag 'v17' created on
> 2023-11-22 16:03:22 + UTC
>
> quay.io/ceph/ceph@sha256:dad2876c2916b732d060b71320f97111bc961108f9c249f4daa9540957a2b6a2
> HOST        PATH       TYPE  TRANSPORT  RPM  DEVICE ID  SIZE  HEALTH  IDENT  FAULT  AVAILABLE  REFRESHED  REJECT REASONS
> node1-ceph  /dev/xvdb  ssd   100G  N/A  N/A  No  27m ago  Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
> node2-ceph  /dev/xvdb  ssd   100G  N/A  N/A  No  27m ago  Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
> node3-ceph  /dev/xvdb  ssd   100G  N/A  N/A  No  27m ago  Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
> root@node1-ceph:~#
>
> On Wed, Nov 29, 2023 at 10:38 PM Adam King  wrote:
>
>> To run a `ceph orch...` (or really any command to the cluster) you should
>> first open a shell with `cephadm shell`. That will put you in a bash shell
>> inside a container that has the ceph packages matching the ceph version in
>> your cluster. If you just want a single command rather than an interactive
>> shell, you can also do `cephadm shell -- ceph orch...`. Also, this might
>> not turn out to be an issue, but just thinking ahead, the devices cephadm
>> will typically allow you to put an OSD on should match what's output by
>> `ceph orch device ls` (which is populated by `cephadm ceph-volume --
>> inventory --format=json-pretty` if you want to look further). So I'd
>> generally say to always check that before making any OSDs through the
>> orchestrator. I also generally like to recommend setting up OSDs through
>> drive group specs (
>> https://docs.ceph.com/en/latest/cephadm/services/osd/#advanced-osd-service-specifications)
>> over using `ceph orch daemon add osd...` although that's a tangent to what
>> you're trying to do now.
>>
>> On Wed, Nov 29, 2023 at 4:14 PM Francisco Arencibia Quesada <
>> arencibia.franci...@gmail.com> wrote:
>>
>>> Thanks so much Adam, that worked great, however I can not add any
>>> storage with:
>>>
>>> sudo cephadm ceph orch daemon add osd node2-ceph:/dev/nvme1n1
>>>
>>> root@node1-ceph:~# ceph status
>>>   cluster:
>>> id: 9d8f1112-8ef9-11ee-838e-a74e679f7866
>>> health: HEALTH_WARN
>>> Failed to apply 1 service(s): osd.all-available-devices
>>> 2 failed cephadm daemon(s)
>>> OSD count 0 < osd_pool_default_size 3
>>>
>>>   services:
>>> mon: 1 daemons, quorum node1-ceph (age 18m)
>>> mgr: node1-ceph.jitjfd(active, since 17m)
>>> osd: 0 osds: 0 up, 0 in (since 6m)
>>>
>>>   data:
>>> pools:   0 pools, 0 pgs
>>> objects: 0 objects, 0 B
>>> usage:   0 B used, 0 B / 0 B avail
>>> pgs:
>>>
>>> root@node1-ceph:~#
>>>
>>> Regards
>>>
>>>
>>>
>>> On Wed, Nov 29, 2023 at 5:45 PM Adam King  wrote:
>>>
 I think I remember a bug that happened when there was a small mismatch
 

[ceph-users] Re: error deploying ceph

2023-11-30 Thread Francisco Arencibia Quesada
Thanks again guys,

The cluster is healthy now; is this normal? Everything looks good except for
this output:
*Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected*

root@node1-ceph:~# cephadm shell -- ceph status
Inferring fsid 209a7bf0-8f6d-11ee-8828-23977d76b74f
Inferring config
/var/lib/ceph/209a7bf0-8f6d-11ee-8828-23977d76b74f/mon.node1-ceph/config
Using ceph image with id '921993c4dfd2' and tag 'v17' created on 2023-11-22
16:03:22 + UTC
quay.io/ceph/ceph@sha256:dad2876c2916b732d060b71320f97111bc961108f9c249f4daa9540957a2b6a2
  cluster:
id: 209a7bf0-8f6d-11ee-8828-23977d76b74f
health: HEALTH_OK

  services:
mon: 3 daemons, quorum node1-ceph,node2-ceph,node3-ceph (age 2h)
mgr: node1-ceph.peedpx(active, since 2h), standbys: node2-ceph.ykkvho
osd: 3 osds: 3 up (since 2h), 3 in (since 2h)

  data:
pools:   2 pools, 33 pgs
objects: 7 objects, 449 KiB
usage:   873 MiB used, 299 GiB / 300 GiB avail
pgs: 33 active+clean

root@node1-ceph:~# cephadm shell -- ceph orch device ls --wide
Inferring fsid 209a7bf0-8f6d-11ee-8828-23977d76b74f
Inferring config
/var/lib/ceph/209a7bf0-8f6d-11ee-8828-23977d76b74f/mon.node1-ceph/config
Using ceph image with id '921993c4dfd2' and tag 'v17' created on 2023-11-22
16:03:22 + UTC
quay.io/ceph/ceph@sha256:dad2876c2916b732d060b71320f97111bc961108f9c249f4daa9540957a2b6a2
HOST        PATH       TYPE  TRANSPORT  RPM  DEVICE ID  SIZE  HEALTH  IDENT  FAULT  AVAILABLE  REFRESHED  REJECT REASONS
node1-ceph  /dev/xvdb  ssd   100G  N/A  N/A  No  27m ago  Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
node2-ceph  /dev/xvdb  ssd   100G  N/A  N/A  No  27m ago  Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
node3-ceph  /dev/xvdb  ssd   100G  N/A  N/A  No  27m ago  Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
root@node1-ceph:~#

On Wed, Nov 29, 2023 at 10:38 PM Adam King  wrote:

> To run a `ceph orch...` (or really any command to the cluster) you should
> first open a shell with `cephadm shell`. That will put you in a bash shell
> inside a container that has the ceph packages matching the ceph version in
> your cluster. If you just want a single command rather than an interactive
> shell, you can also do `cephadm shell -- ceph orch...`. Also, this might
> not turn out to be an issue, but just thinking ahead, the devices cephadm
> will typically allow you to put an OSD on should match what's output by
> `ceph orch device ls` (which is populated by `cephadm ceph-volume --
> inventory --format=json-pretty` if you want to look further). So I'd
> generally say to always check that before making any OSDs through the
> orchestrator. I also generally like to recommend setting up OSDs through
> drive group specs (
> https://docs.ceph.com/en/latest/cephadm/services/osd/#advanced-osd-service-specifications)
> over using `ceph orch daemon add osd...` although that's a tangent to what
> you're trying to do now.
>
> On Wed, Nov 29, 2023 at 4:14 PM Francisco Arencibia Quesada <
> arencibia.franci...@gmail.com> wrote:
>
>> Thanks so much Adam, that worked great, however I can not add any storage
>> with:
>>
>> sudo cephadm ceph orch daemon add osd node2-ceph:/dev/nvme1n1
>>
>> root@node1-ceph:~# ceph status
>>   cluster:
>> id: 9d8f1112-8ef9-11ee-838e-a74e679f7866
>> health: HEALTH_WARN
>> Failed to apply 1 service(s): osd.all-available-devices
>> 2 failed cephadm daemon(s)
>> OSD count 0 < osd_pool_default_size 3
>>
>>   services:
>> mon: 1 daemons, quorum node1-ceph (age 18m)
>> mgr: node1-ceph.jitjfd(active, since 17m)
>> osd: 0 osds: 0 up, 0 in (since 6m)
>>
>>   data:
>> pools:   0 pools, 0 pgs
>> objects: 0 objects, 0 B
>> usage:   0 B used, 0 B / 0 B avail
>> pgs:
>>
>> root@node1-ceph:~#
>>
>> Regards
>>
>>
>>
>> On Wed, Nov 29, 2023 at 5:45 PM Adam King  wrote:
>>
>>> I think I remember a bug that happened when there was a small mismatch
>>> between the cephadm version being used for bootstrapping and the container.
>>> In this case, the cephadm binary used for bootstrap knows about the
>>> ceph-exporter service and the container image being used does not. The
>>> ceph-exporter was removed from quincy between 17.2.6 and 17.2.7 so I'd
>>> guess the cephadm binary here is a bit older and it's pulling hte 17.2.7
>>> image. For now, I'd say just workaround this by running bootstrap with
>>> `--skip-monitoring-stack` flag. If you want the other services in the
>>> monitoring stack after bootstrap you can just run `ceph orch apply
>>> ` for services alertmanager, prometheus, node-exporter, and
>>> grafana and it would get you in the same spot as if you didn't provide the
>>> flag and weren't hitting the issue.
>>>
>>> For an extra note, this 

[ceph-users] Re: Space reclaim doesn't happening in nautilus RBD pool

2023-11-30 Thread Ilya Dryomov
On Thu, Nov 30, 2023 at 8:25 AM Szabo, Istvan (Agoda)
 wrote:
>
> Hi,
>
> Is there any config in Ceph that blocks / does not perform space reclaim?
> I tested on one pool which has only one image, 1.8 TiB in use.
>
>
> rbd $p du im/root
> warning: fast-diff map is not enabled for root. operation may be slow.
> NAME  PROVISIONED  USED
> root  2.2 TiB      1.8 TiB
>
>
> I already removed all snapshots and now the pool has only this one image.
> I ran both fstrim over the filesystem (XFS) and rbd sparsify im/root
> (I don't know exactly what it does, but it mentions reclaiming something).
> The pool still shows 6.9 TiB used, which does not make sense, right? It
> should be at most 3.6 TiB (1.8 * 2) according to its replica count.

Hi Istvan,

Have you checked RBD trash?

$ rbd trash ls -p im
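
If anything shows up there, it can be removed per image with "rbd trash rm"
or flushed entirely with "rbd trash purge -p im" — keeping in mind that
purging is irreversible.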

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io