[ceph-users] Re: rbd online sparsify image

2023-01-29 Thread Jiatong Shen
Hello Ilya,

   Thank you very much for the clarification! Another question: for
historical reasons, there are still some Luminous clients around. Is it
dangerous to sparsify an image that is still in use by a Luminous client?

  Thank you very much for informing me that N/O are both retired. We are
definitely going to look into upgrading.

Best,

Jiatong Shen



On Sun, Jan 29, 2023 at 6:55 PM Ilya Dryomov  wrote:

> On Sun, Jan 29, 2023 at 11:29 AM Jiatong Shen 
> wrote:
> >
> > Hello community experts,
> >
> >I would like to know the status of rbd image sparsify. From the
> > website, it should have been added in Nautilus
> > (https://docs.ceph.com/en/latest/releases/nautilus/, PR 26226), but it
> > is listed again in the Octopus release notes
> > (https://docs.ceph.com/en/latest/releases/octopus/). Is it still the
> > same commit, or are there any updates/bug fixes that were not
> > backported to 14.x?
>
> Hi Jiatong,
>
> "rbd sparsify" for replicated pools [1] was released in Nautilus
> (14.2.0).  Support for EC pools [2] was added a bit later, targeted for
> Octopus (15.2.0), but it was also made available in an early point
> release of Nautilus (14.2.2).
>
> Please note that both Nautilus and Octopus releases are EOL and no
> longer supported though.
>
> >   I am also curious why it is named re-sparsify in the pr title?
>
> By default, all RBD images are sparse.  "rbd sparsify" can bring some
> of that sparseness back, hence the re- prefix.
>
> [1] https://github.com/ceph/ceph/pull/26226
> [2] https://github.com/ceph/ceph/pull/27268
>
> Thanks,
>
> Ilya
>


-- 

Best Regards,

Jiatong Shen
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Debian update to 16.2.11-1~bpo11+1 failing

2023-01-29 Thread maebi
In the meantime this problem has been fixed by the Ceph team; Pacific upgrades 
and installations on Debian are now working as expected!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Real memory usage of the osd(s)

2023-01-29 Thread Szabo, Istvan (Agoda)
Hello,

If buffered_io is enabled, is there a way to know exactly how much physical 
memory is used by each OSD?

What I've found is dump_mempools, whose last entries are the following, but 
are these bytes the real physical memory usage?

"total": {
"items": 60005205,
"bytes": 995781359

Also, which metric exposes this value? I haven't found one.
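
For illustration, a rough sketch of one way to compare the mempool
accounting with the process RSS (assuming jq is available and the OSD admin
socket is reachable; the JSON layout and the way to locate the PID may
differ per release/deployment, and with buffered_io the kernel page cache
shows up in neither number):

# Mempool accounting as reported by the OSD itself (bytes).
ceph daemon osd.0 dump_mempools | jq '.mempool.total.bytes'

# The memory target the OSD autotuner is aiming for.
ceph config get osd.0 osd_memory_target

# Resident set size of the ceph-osd process, for comparison.
pid=$(pgrep -f 'ceph-osd .*--id 0')   # hypothetical way to locate the PID
grep VmRSS /proc/"$pid"/status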

Thank you


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-29 Thread Victor Rodriguez


Looks like this is going to take a few days. I hope to keep acceptable 
performance for the VMs by tuning osd_snap_trim_sleep_ssd.


I'm wondering: after that long snaptrim process you went through, was 
your cluster stable again, and did snapshots/snaptrims work properly?
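
For reference, a minimal sketch of throttling snaptrim along those lines
(option names as in recent releases; osd.12 and the values are placeholders,
not recommendations):

# Pause snaptrim cluster-wide first.
ceph osd set nosnaptrim

# Keep OSDs from trimming once the flag is cleared, and slow them down.
ceph config set osd osd_max_trimming_pgs 0
ceph config set osd osd_snap_trim_sleep_ssd 1.0

# Clear the flag, then re-enable trimming on a few OSDs at a time.
ceph osd unset nosnaptrim
ceph config set osd.12 osd_max_trimming_pgs 1   # repeat per OSD, slowly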



On 1/29/23 16:01, Matt Vandermeulen wrote:
I should have explicitly stated that during the recovery, it was still 
quite bumpy for customers.  Some snaptrims were very quick, some took 
what felt like a really long time.  This was however a cluster with a 
very large number of volumes and a long, long history of snapshots.  
I'm not sure what the difference will be from our case versus a single 
large volume with a big snapshot.




On 2023-01-28 20:45, Victor Rodriguez wrote:

On 1/29/23 00:50, Matt Vandermeulen wrote:
I've observed a similar horror when upgrading a cluster from 
Luminous to Nautilus, which had the same effect of an overwhelming 
amount of snaptrim making the cluster unusable.


In our case, we held its hand by setting all OSDs to have zero max 
trimming PGs, unsetting nosnaptrim, and then slowly enabling 
snaptrim a few OSDs at a time.  It was painful to babysit but it 
allowed the cluster to catch up without falling over.



That's an interesting approach! Thanks!

On preliminary tests it seems that just running snaptrim on a single PG 
of a single OSD still makes the cluster barely usable. I have to 
increase osd_snap_trim_sleep_ssd to ~1 so the cluster remains usable, at 
the cost of about a third of its performance. After a while, a few PGs 
got trimmed, and it feels like some of them are harder to trim than 
others, as some need a higher osd_snap_trim_sleep_ssd value for the 
cluster to keep performing.


I don't know how long this is going to take... Maybe recreating the 
OSDs and dealing with the rebalance is a better option?


There's something ugly going on here... I would really like to put my 
finger on it.




On 2023-01-28 19:43, Victor Rodriguez wrote:

After some investigation this is what I'm seeing:

- OSD processes get stuck at 100% CPU (or more) if I run "ceph osd unset 
nosnaptrim". They stay at 100% CPU even if I run "ceph osd set 
nosnaptrim" again; they stayed like that for at least 26 hours. Some 
quick benchmarks don't show a reduction in the performance of the cluster.


- Restarting an OSD lowers its CPU usage to typical levels, as 
expected, but it also usually brings some other OSD on a different 
host back to typical levels.


- All OSDs in this cluster take quite a while to start: between 35 and 
70 seconds depending on the OSD, clearly much longer than any OSD in 
any of my other clusters.


- I believe that the size of the RocksDB database is dumped in the 
OSD log when an automatic compaction is triggered. The "sum" 
sizes of these OSDs range between 2.5 and 5.1 GB. That's way bigger 
than in any other cluster I have.


- ceph daemon osd.* calc_objectstore_db_histogram gives values 
for num_pgmeta_omap (I don't know what it is) that are way bigger than 
those in any of my other clusters for some OSDs. Also, the values are 
not similar among the OSDs which hold the same PGs.


osd.0:     "num_pgmeta_omap": 17526766,
osd.1:     "num_pgmeta_omap": 2653379,
osd.2:     "num_pgmeta_omap": 12358703,
osd.3:     "num_pgmeta_omap": 6404975,
osd.6:     "num_pgmeta_omap": 19845318,
osd.7:     "num_pgmeta_omap": 6043083,
osd.12:    "num_pgmeta_omap": 18666776,
osd.13:    "num_pgmeta_omap": 615846,
osd.14:    "num_pgmeta_omap": 13190188,

- Compacting the OSD barely reduces rocksdb size and does not 
reduce num_pgmeta_omap at all.


- This is the only cluster I have where there are some RBD images 
that I mount directly from some clients, that is, they are not 
disks for QEMU/Proxmox VMs. Maybe I have something misconfigured 
related to this?  This cluster is at least two and a half years old 
and never had this issue with snaptrims.


Thanks in advance!
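
For reference, a short sketch of the commands referred to above for
inspecting and compacting an OSD's RocksDB (osd.0 is a placeholder;
compaction adds load and can take a while):

# Per-OSD histogram that includes num_pgmeta_omap.
ceph daemon osd.0 calc_objectstore_db_histogram

# BlueFS / RocksDB space usage counters.
ceph daemon osd.0 perf dump | jq '.bluefs'

# Ask the OSD to compact its RocksDB.
ceph tell osd.0 compact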

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



--
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Replacing OSD with containerized deployment

2023-01-29 Thread David Orman
What does "ceph orch osd rm status" show before you try the zap? Is your 
cluster still backfilling to the other OSDs for the PGs that were on the failed 
disk?

David
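
For reference, a minimal sketch of the usual cephadm replacement flow
(host name, device path and the --force flag are examples and may vary by
release):

# Mark the OSD for replacement; this keeps the OSD id reserved.
ceph orch osd rm 232 --replace

# Watch draining/removal progress.
ceph orch osd rm status

# After swapping the physical disk, wipe the new device so the
# orchestrator will pick it up, then let the OSD spec redeploy it.
ceph orch device zap node-x /dev/sdm --force
ceph orch device ls node-x --refresh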

On Fri, Jan 27, 2023, at 03:25, mailing-lists wrote:
> Dear Ceph-Users,
>
> I am struggling to replace a disk. My Ceph cluster is not replacing the
> old OSD even though I did:
>
> ceph orch osd rm 232 --replace
>
> The OSD 232 is still shown in the OSD list, but the new HDD will be
> placed as a new OSD. I wouldn't mind this much if the OSD were also
> placed on the BlueStore DB / NVMe, but it isn't.
>
>
> My steps:
>
> "ceph orch osd rm 232 --replace"
>
> remove the failed hdd.
>
> add the new one.
>
> Convert the disk within the server's BIOS, so that the node has
> direct access to it.
>
> It shows up as /dev/sdt,
>
> enter maintenance mode
>
> reboot server
>
> drive is now /dev/sdm (which the old drive had)
>
> "ceph orch device zap node-x /dev/sdm "
>
> A new OSD is placed on the cluster.
>
>
> Can you give me a hint where I took a wrong turn? Why is the disk
> not being used as OSD 232?
>
>
> Best
>
> Ken
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: All pgs unknown

2023-01-29 Thread Josh Baergen
This often indicates that something is up with your mgr process. Based
on ceph status, it looks like both the mgr and mon had recently
restarted. Is that expected?

Josh
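
A quick sketch of what could be checked first (the mgr daemon name is taken
from the ceph -s output below and is only an example):

# Force a mgr failover so the PG map can be rebuilt.
ceph mgr fail

# With cephadm, the daemon can also be restarted explicitly.
ceph orch daemon restart mgr.flucky-server.cupbak

# Confirm the mgr came back and its modules are loaded.
ceph -s
ceph mgr module ls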

On Sun, Jan 29, 2023 at 3:36 AM Daniel Brunner  wrote:
>
> Hi,
>
> my ceph cluster started to show HEALTH_WARN: there are no healthy PGs left,
> all are unknown, but it seems my CephFS is still readable. How can I
> investigate this further?
>
> $ sudo ceph -s
>   cluster:
> id: ddb7ebd8-65b5-11ed-84d7-22aca0408523
> health: HEALTH_WARN
> failed to probe daemons or devices
> noout flag(s) set
> Reduced data availability: 339 pgs inactive
>
>   services:
> mon: 1 daemons, quorum flucky-server (age 3m)
> mgr: flucky-server.cupbak(active, since 3m)
> mds: 1/1 daemons up
> osd: 18 osds: 18 up (since 26h), 18 in (since 7w)
>  flags noout
> rgw: 1 daemon active (1 hosts, 1 zones)
>
>   data:
> volumes: 1/1 healthy
> pools:   11 pools, 339 pgs
> objects: 0 objects, 0 B
> usage:   0 B used, 0 B / 0 B avail
> pgs: 100.000% pgs unknown
>  339 unknown
>
>
>
> $ sudo ceph fs status
> cephfs - 2 clients
> ==
> RANK  STATE   MDS ACTIVITY DNSINOS
> DIRS   CAPS
>  0active  cephfs.flucky-server.ldzavv  Reqs:0 /s  61.9k  61.9k
>  17.1k  54.5k
>   POOL TYPE USED  AVAIL
> cephfs_metadata  metadata 0  0
>   cephfs_data  data   0  0
> MDS version: ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757)
> quincy (stable)
>
>
>
> $ docker logs ceph-ddb7ebd8-65b5-11ed-84d7-22aca0408523-mon-flucky-server
> cluster 2023-01-27T12:15:30.437140+ mgr.flucky-server.cupbak
> (mgr.144098) 200 : cluster [DBG] pgmap v189: 339 pgs: 339 unknown; 0 B
> data, 0 B used, 0 B / 0 B avail
>
>
> debug 2023-01-27T12:15:31.995+ 7fa90b3f7700  1
> mon.flucky-server@0(leader).osd
> e50043 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 348127232
> full_alloc: 348127232 kv_alloc: 322961408
>
>
> cluster 2023-01-27T12:15:32.437854+ mgr.flucky-server.cupbak
> (mgr.144098) 201 : cluster [DBG] pgmap v190: 339 pgs: 339 unknown; 0 B
> data, 0 B used, 0 B / 0 B avail
>
>
> cluster 2023-01-27T12:15:32.373735+ osd.9 (osd.9) 123948 : cluster
> [DBG] 9.a deep-scrub starts
>
>
>
> cluster 2023-01-27T12:15:33.013990+ osd.2 (osd.2) 41797 : cluster [DBG]
> 5.6 scrub starts
>
>
>
> cluster 2023-01-27T12:15:33.402881+ osd.9 (osd.9) 123949 : cluster
> [DBG] 9.13 scrub starts
>
>
>
> cluster 2023-01-27T12:15:34.438591+ mgr.flucky-server.cupbak
> (mgr.144098) 202 : cluster [DBG] pgmap v191: 339 pgs: 339 unknown; 0 B
> data, 0 B used, 0 B / 0 B avail
>
>
> cluster 2023-01-27T12:15:35.461575+ osd.9 (osd.9) 123950 : cluster
> [DBG] 7.16 deep-scrub starts
>
>
>
> debug 2023-01-27T12:15:37.005+ 7fa90b3f7700  1
> mon.flucky-server@0(leader).osd
> e50043 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 348127232
> full_alloc: 348127232 kv_alloc: 322961408
>
>
> cluster 2023-01-27T12:15:36.439416+ mgr.flucky-server.cupbak
> (mgr.144098) 203 : cluster [DBG] pgmap v192: 339 pgs: 339 unknown; 0 B
> data, 0 B used, 0 B / 0 B avail
>
>
> cluster 2023-01-27T12:15:36.925368+ osd.2 (osd.2) 41798 : cluster [DBG]
> 7.15 deep-scrub starts
>
>
>
> cluster 2023-01-27T12:15:37.960907+ osd.2 (osd.2) 41799 : cluster [DBG]
> 6.6 scrub starts
>
>
>
> cluster 2023-01-27T12:15:38.440099+ mgr.flucky-server.cupbak
> (mgr.144098) 204 : cluster [DBG] pgmap v193: 339 pgs: 339 unknown; 0 B
> data, 0 B used, 0 B / 0 B avail
>
>
> cluster 2023-01-27T12:15:38.482333+ osd.9 (osd.9) 123951 : cluster
> [DBG] 2.2 scrub starts
>
>
>
> cluster 2023-01-27T12:15:38.959557+ osd.2 (osd.2) 41800 : cluster [DBG]
> 9.47 scrub starts
>
>
>
> cluster 2023-01-27T12:15:39.519980+ osd.9 (osd.9) 123952 : cluster
> [DBG] 4.b scrub starts
>
>
>
> cluster 2023-01-27T12:15:40.440711+ mgr.flucky-server.cupbak
> (mgr.144098) 205 : cluster [DBG] pgmap v194: 339 pgs: 339 unknown; 0 B
> data, 0 B used, 0 B / 0 B avail
>
>
> debug 2023-01-27T12:15:42.012+ 7fa90b3f7700  1
> mon.flucky-server@0(leader).osd
> e50043 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 348127232
> full_alloc: 348127232 kv_alloc: 322961408
>
>
> cluster 2023-01-27T12:15:41.536421+ osd.9 (osd.9) 123953 : cluster
> [DBG] 2.7 scrub starts
>
>
>
> cluster 2023-01-27T12:15:42.441314+ mgr.flucky-server.cupbak
> (mgr.144098) 206 : cluster [DBG] pgmap v195: 339 pgs: 339 unknown; 0 B
> data, 0 B used, 0 B / 0 B avail
>
>
> cluster 2023-01-27T12:15:43.954128+ osd.2 (osd.2) 41801 : cluster [DBG]
> 9.4f scrub starts
>
>
>
> cluster 2023-01-27T12:15:44.441897+ mgr.flucky-server.cupbak
> (mgr.144098) 207 : cluster [DBG] pgmap v196: 339 pgs: 339 unknown; 0 B
> data, 0 B used, 0 B / 0 B avail
>
>
> cluster 2023-01-27T12:15:45.944038+ osd.2 (osd.2) 41802 : cluster [DBG]
> 1.1f deep-scrub starts
>
>
>
> debug 

[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-29 Thread Matt Vandermeulen
I should have explicitly stated that during the recovery, it was still 
quite bumpy for customers.  Some snaptrims were very quick, some took 
what felt like a really long time.  This was however a cluster with a 
very large number of volumes and a long, long history of snapshots.  I'm 
not sure what the difference will be from our case versus a single large 
volume with a big snapshot.




On 2023-01-28 20:45, Victor Rodriguez wrote:

On 1/29/23 00:50, Matt Vandermeulen wrote:
I've observed a similar horror when upgrading a cluster from Luminous 
to Nautilus, which had the same effect of an overwhelming amount of 
snaptrim making the cluster unusable.


In our case, we held its hand by setting all OSDs to have zero max 
trimming PGs, unsetting nosnaptrim, and then slowly enabling snaptrim 
a few OSDs at a time.  It was painful to babysit but it allowed the 
cluster to catch up without falling over.



That's an interesting approach! Thanks!

On preliminary tests it seems that just running snaptrim on a single PG of 
a single OSD still makes the cluster barely usable. I have to increase 
osd_snap_trim_sleep_ssd to ~1 so the cluster remains usable, at the cost 
of about a third of its performance. After a while, a few PGs got trimmed, 
and it feels like some of them are harder to trim than others, as some 
need a higher osd_snap_trim_sleep_ssd value for the cluster to keep performing.


I don't know how long this is going to take... Maybe recreating the 
OSDs and dealing with the rebalance is a better option?


There's something ugly going on here... I would really like to put my 
finger on it.




On 2023-01-28 19:43, Victor Rodriguez wrote:

After some investigation this is what I'm seeing:

- OSD processes get stuck at 100% CPU (or more) if I run "ceph osd unset 
nosnaptrim". They stay at 100% CPU even if I run "ceph osd set nosnaptrim" 
again; they stayed like that for at least 26 hours. Some quick benchmarks 
don't show a reduction in the performance of the cluster.


- Restarting an OSD lowers its CPU usage to typical levels, as 
expected, but it also usually brings some other OSD on a different host 
back to typical levels.


- All OSDs in this cluster take quite a while to start: between 35 and 
70 seconds depending on the OSD, clearly much longer than any OSD in 
any of my other clusters.


- I believe that the size of the RocksDB database is dumped in the 
OSD log when an automatic compaction is triggered. The "sum" 
sizes of these OSDs range between 2.5 and 5.1 GB. That's way bigger 
than in any other cluster I have.


- ceph daemon osd.* calc_objectstore_db_histogram gives values 
for num_pgmeta_omap (I don't know what it is) that are way bigger than 
those in any of my other clusters for some OSDs. Also, the values are not 
similar among the OSDs which hold the same PGs.


osd.0:     "num_pgmeta_omap": 17526766,
osd.1:     "num_pgmeta_omap": 2653379,
osd.2:     "num_pgmeta_omap": 12358703,
osd.3:     "num_pgmeta_omap": 6404975,
osd.6:     "num_pgmeta_omap": 19845318,
osd.7:     "num_pgmeta_omap": 6043083,
osd.12:    "num_pgmeta_omap": 18666776,
osd.13:    "num_pgmeta_omap": 615846,
osd.14:    "num_pgmeta_omap": 13190188,

- Compacting the OSD barely reduces rocksdb size and does not reduce 
num_pgmeta_omap at all.


- This is the only cluster I have where there are some RBD images that 
I mount directly from some clients, that is, they are not disks for 
QEMU/Proxmox VMs. Maybe I have something misconfigured related to 
this?  This cluster is at least two and a half years old and never had 
this issue with snaptrims.


Thanks in advance!

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd online sparsify image

2023-01-29 Thread Ilya Dryomov
On Sun, Jan 29, 2023 at 11:29 AM Jiatong Shen  wrote:
>
> Hello community experts,
>
>I would like to know the status of rbd image sparsify. From the website,
> it should have been added in Nautilus
> (https://docs.ceph.com/en/latest/releases/nautilus/, PR 26226), but it is
> listed again in the Octopus release notes
> (https://docs.ceph.com/en/latest/releases/octopus/). Is it still the same
> commit, or are there any updates/bug fixes that were not backported to 14.x?

Hi Jiatong,

"rbd sparsify" for replicated pools [1] was released in Nautilus
(14.2.0).  Support for EC pools [2] was added a bit later, targeted for
Octopus (15.2.0), but it was also made available in an early point
release of Nautilus (14.2.2).

Please note that both Nautilus and Octopus releases are EOL and no
longer supported though.

>   I am also curious why it is named re-sparsify in the pr title?

By default, all RBD images are sparse.  "rbd sparsify" can bring some
of that sparseness back, hence the re- prefix.

[1] https://github.com/ceph/ceph/pull/26226
[2] https://github.com/ceph/ceph/pull/27268
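
A minimal usage sketch (mypool/myimage is a placeholder; the --sparse-size
option may not be available in every release):

# Reclaim fully zeroed extents of an existing image.
rbd sparsify mypool/myimage

# Optionally control the granularity used when scanning for zeroes.
rbd sparsify --sparse-size 4096 mypool/myimage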

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] rbd-mirror replication speed is very slow - but initial replication is fast

2023-01-29 Thread ankit raikwar
Hello Team,
Please help me. I have deployed two Ceph clusters, each with a 6-node 
configuration and almost 800 TB of capacity, arranged in a DC-DR setup for 
high data availability. I enabled RGW and RBD block device mirroring for 
data replication, and we have a 10 Gbps fiber replication network.
When we first started rbd-mirror from our DC to DR and were replicating our 
existing data, we were getting almost 8 Gbps replication speed and it worked 
fine. Once all the existing image data was replicated, we started facing a 
replication speed issue: now we only get 5 to 10 Mbps. We also looked at 
options like rbd_journal_max_payload_bytes and 
rbd_mirror_journal_max_fetch_bytes; we tried increasing the max payload 
size but saw no change in speed. The rbd_mirror_journal_max_fetch_bytes 
option is not available in our Ceph version. I also tried modifying and 
increasing some other values, such as
rbd_mirror_memory_target
rbd_mirror_memory_cache_min

You can also find some references regarding these values for increasing performance.

Eugen

[1]

https://tracker.ceph.com/projects/ceph/repository/revisions/1ef12ea0d29f955…

[2]

https://github.com/ceph/ceph/pull/27670

Information about my Ceph cluster:

Version: ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy 
(stable)

rbd-mirror daemon version: 17.2.5

Mirror mode: pool

Max images mirrored at a time: 5

Replication network: 10 Gbps (dedicated)

Client: on the DC cluster we continuously write 50 to 400 Mbps of data, but
replication is only 5 to 10 Mbps.

Issue: we only get 4 to 5 Mbps, even though we have 10 Gbps of replication 
network bandwidth.

Note: I also tried to find the option rbd_mirror_journal_max_fetch_bytes, but 
I'm not able to find this option in the configuration. Also, when I try to set 
it from the command line, it shows an error:

command:
ceph config set client.rbd rbd_mirror_journal_max_fetch_bytes 33554432

error:
Error EINVAL: unrecognized config option 'rbd_mirror_journal_max_fetch_bytes'
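
A small sketch of how to check which mirror-related options actually exist
in a given release before trying to set them (the value below is an example
only, not a recommendation):

# List the rbd mirror options known to this Ceph version.
ceph config ls | grep rbd_mirror

# Show the description and default of a specific option.
ceph config help rbd_mirror_memory_target

# Set an option for client/rbd-mirror daemons (example value: 4 GiB).
ceph config set client rbd_mirror_memory_target 4294967296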
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] All pgs unknown

2023-01-29 Thread Daniel Brunner
Hi,

my ceph cluster started to show HEALTH_WARN: there are no healthy PGs left,
all are unknown, but it seems my CephFS is still readable. How can I
investigate this further?

$ sudo ceph -s
  cluster:
id: ddb7ebd8-65b5-11ed-84d7-22aca0408523
health: HEALTH_WARN
failed to probe daemons or devices
noout flag(s) set
Reduced data availability: 339 pgs inactive

  services:
mon: 1 daemons, quorum flucky-server (age 3m)
mgr: flucky-server.cupbak(active, since 3m)
mds: 1/1 daemons up
osd: 18 osds: 18 up (since 26h), 18 in (since 7w)
 flags noout
rgw: 1 daemon active (1 hosts, 1 zones)

  data:
volumes: 1/1 healthy
pools:   11 pools, 339 pgs
objects: 0 objects, 0 B
usage:   0 B used, 0 B / 0 B avail
pgs: 100.000% pgs unknown
 339 unknown



$ sudo ceph fs status
cephfs - 2 clients
==
RANK  STATE   MDS ACTIVITY DNSINOS
DIRS   CAPS
 0active  cephfs.flucky-server.ldzavv  Reqs:0 /s  61.9k  61.9k
 17.1k  54.5k
  POOL TYPE USED  AVAIL
cephfs_metadata  metadata 0  0
  cephfs_data  data   0  0
MDS version: ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757)
quincy (stable)



$ docker logs ceph-ddb7ebd8-65b5-11ed-84d7-22aca0408523-mon-flucky-server
cluster 2023-01-27T12:15:30.437140+ mgr.flucky-server.cupbak
(mgr.144098) 200 : cluster [DBG] pgmap v189: 339 pgs: 339 unknown; 0 B
data, 0 B used, 0 B / 0 B avail


debug 2023-01-27T12:15:31.995+ 7fa90b3f7700  1
mon.flucky-server@0(leader).osd
e50043 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 348127232
full_alloc: 348127232 kv_alloc: 322961408


cluster 2023-01-27T12:15:32.437854+ mgr.flucky-server.cupbak
(mgr.144098) 201 : cluster [DBG] pgmap v190: 339 pgs: 339 unknown; 0 B
data, 0 B used, 0 B / 0 B avail


cluster 2023-01-27T12:15:32.373735+ osd.9 (osd.9) 123948 : cluster
[DBG] 9.a deep-scrub starts



cluster 2023-01-27T12:15:33.013990+ osd.2 (osd.2) 41797 : cluster [DBG]
5.6 scrub starts



cluster 2023-01-27T12:15:33.402881+ osd.9 (osd.9) 123949 : cluster
[DBG] 9.13 scrub starts



cluster 2023-01-27T12:15:34.438591+ mgr.flucky-server.cupbak
(mgr.144098) 202 : cluster [DBG] pgmap v191: 339 pgs: 339 unknown; 0 B
data, 0 B used, 0 B / 0 B avail


cluster 2023-01-27T12:15:35.461575+ osd.9 (osd.9) 123950 : cluster
[DBG] 7.16 deep-scrub starts



debug 2023-01-27T12:15:37.005+ 7fa90b3f7700  1
mon.flucky-server@0(leader).osd
e50043 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 348127232
full_alloc: 348127232 kv_alloc: 322961408


cluster 2023-01-27T12:15:36.439416+ mgr.flucky-server.cupbak
(mgr.144098) 203 : cluster [DBG] pgmap v192: 339 pgs: 339 unknown; 0 B
data, 0 B used, 0 B / 0 B avail


cluster 2023-01-27T12:15:36.925368+ osd.2 (osd.2) 41798 : cluster [DBG]
7.15 deep-scrub starts



cluster 2023-01-27T12:15:37.960907+ osd.2 (osd.2) 41799 : cluster [DBG]
6.6 scrub starts



cluster 2023-01-27T12:15:38.440099+ mgr.flucky-server.cupbak
(mgr.144098) 204 : cluster [DBG] pgmap v193: 339 pgs: 339 unknown; 0 B
data, 0 B used, 0 B / 0 B avail


cluster 2023-01-27T12:15:38.482333+ osd.9 (osd.9) 123951 : cluster
[DBG] 2.2 scrub starts



cluster 2023-01-27T12:15:38.959557+ osd.2 (osd.2) 41800 : cluster [DBG]
9.47 scrub starts



cluster 2023-01-27T12:15:39.519980+ osd.9 (osd.9) 123952 : cluster
[DBG] 4.b scrub starts



cluster 2023-01-27T12:15:40.440711+ mgr.flucky-server.cupbak
(mgr.144098) 205 : cluster [DBG] pgmap v194: 339 pgs: 339 unknown; 0 B
data, 0 B used, 0 B / 0 B avail


debug 2023-01-27T12:15:42.012+ 7fa90b3f7700  1
mon.flucky-server@0(leader).osd
e50043 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 348127232
full_alloc: 348127232 kv_alloc: 322961408


cluster 2023-01-27T12:15:41.536421+ osd.9 (osd.9) 123953 : cluster
[DBG] 2.7 scrub starts



cluster 2023-01-27T12:15:42.441314+ mgr.flucky-server.cupbak
(mgr.144098) 206 : cluster [DBG] pgmap v195: 339 pgs: 339 unknown; 0 B
data, 0 B used, 0 B / 0 B avail


cluster 2023-01-27T12:15:43.954128+ osd.2 (osd.2) 41801 : cluster [DBG]
9.4f scrub starts



cluster 2023-01-27T12:15:44.441897+ mgr.flucky-server.cupbak
(mgr.144098) 207 : cluster [DBG] pgmap v196: 339 pgs: 339 unknown; 0 B
data, 0 B used, 0 B / 0 B avail


cluster 2023-01-27T12:15:45.944038+ osd.2 (osd.2) 41802 : cluster [DBG]
1.1f deep-scrub starts



debug 2023-01-27T12:15:47.019+ 7fa90b3f7700  1
mon.flucky-server@0(leader).osd
e50043 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 348127232
full_alloc: 348127232 kv_alloc: 322961408


cluster 2023-01-27T12:15:46.442532+ mgr.flucky-server.cupbak
(mgr.144098) 208 : cluster [DBG] pgmap v197: 339 pgs: 339 unknown; 0 B
data, 0 B used, 0 B / 0 B avail


cluster 2023-01-27T12:15:47.543275+ osd.9 (osd.9) 123954 : cluster
[DBG] 2.3 scrub starts



cluster 

[ceph-users] Replacing OSD with containerized deployment

2023-01-29 Thread Ken D
Dear Ceph-Users,

I am struggling to replace a disk. My Ceph cluster is not replacing the old OSD 
even though I did:

ceph orch osd rm 232 --replace

The OSD 232 is still shown in the OSD list, but the new HDD will be placed as a 
new OSD. I wouldn't mind this much if the OSD were also placed on the 
BlueStore DB / NVMe, but it isn't.


My steps:

"ceph orch osd rm 232 --replace"

remove the failed hdd.

add the new one.

Convert the disk within the server's BIOS, so that the node has direct 
access to it.

It shows up as /dev/sdt,

enter maintenance mode

reboot server

drive is now /dev/sdm (which the old drive had)

"ceph orch device zap node-x /dev/sdm "

A new OSD is placed on the cluster.


Can you give me a hint where I took a wrong turn? Why is the disk not 
being used as OSD 232?


Best

Ken


P.S. Sorry for sending this message twice; somehow this mail address was no 
longer subscribed to the list.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Replacing OSD with containerized deployment

2023-01-29 Thread mailing-lists

Dear Ceph-Users,

I am struggling to replace a disk. My Ceph cluster is not replacing the 
old OSD even though I did:


ceph orch osd rm 232 --replace

The OSD 232 is still shown in the OSD list, but the new HDD will be 
placed as a new OSD. I wouldn't mind this much if the OSD were also 
placed on the BlueStore DB / NVMe, but it isn't.



My steps:

"ceph orch osd rm 232 --replace"

remove the failed hdd.

add the new one.

Convert the disk within the server's BIOS, so that the node has 
direct access to it.


It shows up as /dev/sdt,

enter maintenance mode

reboot server

drive is now /dev/sdm (which the old drive had)

"ceph orch device zap node-x /dev/sdm "

A new OSD is placed on the cluster.


Can you give me a hint where I took a wrong turn? Why is the disk 
not being used as OSD 232?



Best

Ken


P.S. Sorry for sending this message twice; somehow this mail address was no 
longer subscribed to the list.



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] rbd online sparsify image

2023-01-29 Thread Jiatong Shen
Hello community experts,

   I would like to know the status of rbd image sparsify. From the website,
it should have been added in Nautilus
(https://docs.ceph.com/en/latest/releases/nautilus/, PR 26226), but it is
listed again in the Octopus release notes
(https://docs.ceph.com/en/latest/releases/octopus/). Is it still the same
commit, or are there any updates/bug fixes that were not backported to 14.x?
  I am also curious why it is named re-sparsify in the PR title?

Thank you very much for the help.

-- 

Best Regards,

Jiatong Shen
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io