[ceph-users] Re: Cannot recreate monitor in upgrade from pacific to quincy (leveldb -> rocksdb)

2024-02-01 Thread Eugen Block
I might have a reproducer: the second rebuilt mon is not joining the
cluster either. I'll look into it and let you know if I find anything.


Quoting Eugen Block:


Hi,

Can anyone confirm that ancient (2017) leveldb database mons should
just accept ‘mon.$hostname’ names for mons, as well as ‘mon.$id’?


at some point you had or have to remove one of the mons to recreate  
it with a rocksdb backend, so the mismatch should not be an issue  
here. I can confirm that when I tried to reproduce it in a small  
test cluster with leveldb. So now I have two leveldb MONs and one  
rocksdb MON:


jewel:~ # cat  
/var/lib/ceph/b08424fa-8530-4080-876d-2821c916d26c/mon.jewel/kv_backend

rocksdb
jewel2:~ # cat  
/var/lib/ceph/b08424fa-8530-4080-876d-2821c916d26c/mon.jewel2/kv_backend

leveldb
jewel3:~ # cat  
/var/lib/ceph/b08424fa-8530-4080-876d-2821c916d26c/mon.jewel3/kv_backend

leveldb

And the cluster is healthy, although it took a minute or two for the  
rebuilt MON to sync (in a real cluster with some load etc. it might  
take longer):


jewel:~ # ceph -s
  cluster:
id: b08424fa-8530-4080-876d-2821c916d26c
health: HEALTH_OK

  services:
mon: 3 daemons, quorum jewel2,jewel3,jewel (age 3m)

I'm wondering if this could have to do with the insecure_global_id  
things. Can you send the output of:


ceph config get mon auth_allow_insecure_global_id_reclaim
ceph config get mon auth_expose_insecure_global_id_reclaim
ceph config get mon mon_warn_on_insecure_global_id_reclaim
ceph config get mon mon_warn_on_insecure_global_id_reclaim_allowed



Quoting Mark Schouten:


Hi,

I don’t have a fourth machine available, so that’s not an option,
unfortunately.


I did enable a lot of debugging earlier, but that shows no
information as to why stuff is not working as expected.


Proxmox just deploys the mons, nothing fancy there, no special cases.

Can anyone confirm that ancient (2017) leveldb database mons should
just accept ‘mon.$hostname’ names for mons, as well as ‘mon.$id’?


—
Mark Schouten
CTO, Tuxis B.V.
+31 318 200208 / m...@tuxis.nl


-- Original Message --
From "Eugen Block" 
To ceph-users@ceph.io
Date 31/01/2024, 13:02:04
Subject [ceph-users] Re: Cannot recreate monitor in upgrade from  
pacific to quincy (leveldb -> rocksdb)



Hi Mark,

as I'm not familiar with proxmox I'm not sure what happens under  
the  hood. There are a couple of things I would try, not  
necessarily in  this order:


- Check the troubleshooting guide [1], for example a clock skew  
could  be one reason, have you verified ntp/chronyd functionality?
- Inspect debug log output, maybe first on the probing mon and if   
those don't reveal the reason, enable debug logs for the other  
MONs as  well:

ceph config set mon.proxmox03 debug_mon 20
ceph config set mon.proxmox03 debug_paxos 20

or for all MONs:
ceph config set mon debug_mon 20
ceph config set mon debug_paxos 20

- Try to deploy an additional MON on a different server (if you  
have  more available) and see if that works.

- Does proxmox log anything?
- Maybe last resort, try to start a MON manually after adding it  
to  the monmap with the monmaptool, but only if you know what  
you're  doing. I wonder if the monmap doesn't get updated...
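For reference, a rough sketch of that last-resort path (hands-on and easy to get wrong, so back up the mon's data directory first; the name proxmox03 and the address 10.10.10.3 are simply taken from this thread):

systemctl stop ceph-mon@proxmox03                        # stop the probing mon
ceph mon getmap -o /tmp/monmap                           # fetch the current monmap from the quorum
monmaptool --print /tmp/monmap                           # check whether the new mon is listed at all
monmaptool --add proxmox03 10.10.10.3:6789 /tmp/monmap   # add it only if it is missing
ceph-mon -i proxmox03 --inject-monmap /tmp/monmap        # inject, then start the mon again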


Regards,
Eugen

[1]  
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/


Quoting Mark Schouten:


Hi,

During an upgrade from pacific to quincy, we needed to recreate  
the  mons because the mons were pretty old and still using leveldb.


So step one was to destroy one of the mons. After that we  
recreated  the monitor, and although it starts, it remains in  
state ‘probing’,  as you can see below.


No matter what I tried, it won’t come up. I’ve seen quite some   
messages that the MTU might be an issue, but that seems to be ok:

root@proxmox03:/var/log/ceph# fping -b 1472 10.10.10.{1..3} -M
10.10.10.1 is alive
10.10.10.2 is alive
10.10.10.3 is alive


Does anyone have an idea how to fix this? I’ve tried destroying  
and  recreating the mon a few times now. Could it be that the  
leveldb  mons only support mon.$id notation for the monitors?


root@proxmox03:/var/log/ceph# ceph daemon mon.proxmox03 mon_status
{
  "name": "proxmox03",
  "rank": 2,
  "state": "probing",
  "election_epoch": 0,
  "quorum": [],
  "features": {
      "required_con": "2449958197560098820",
      "required_mon": [
          "kraken",
          "luminous",
          "mimic",
          "osdmap-prune",
          "nautilus",
          "octopus",
          "pacific",
          "elector-pinging"
      ],
      "quorum_con": "0",
      "quorum_mon": []
  },
  "outside_quorum": [
      "proxmox03"
  ],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": {
      "epoch": 0,
      "fsid": "39b1e85c-7b47-4262-9f0a-47ae91042bac",
      "modified": "2024-01-23T21:02:12.631320Z",
      "created": "2017-03-15T14:54:55.743017Z",
      "min_mon_release": 16,
      "min_mon_release_name": "pacific",

[ceph-users] RADOSGW Multi-Site Sync Metrics

2024-02-01 Thread Rhys Powell
Hi All,

I am in the process of implementing a multi-site RGW setup and have
successfully set up a POC and confirmed the functionality.

I am working on metrics and alerting for this service, and I am not seeing
any metrics that correspond to the output shown by

radosgw-admin sync status --rgw-realm=<>


Sample output:


[@cepha-cn02 ~]# radosgw-admin sync status --rgw-realm=<>

  realm a207b396-8d1b-408b-851e-10ad545861b7 (realm-name)
  zonegroup 77e8924b-05e3-4d86-b887-aedd7fe5306c (zonegroup-name)
   zone a26c27b2-d6ac-4eab-a4ce-1036ce2d37dc (zone-name)
  metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
  data sync source: 8c7d69db-85ae-45f4-b4ec-f712fad4af07 (zone-name)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source


I'd like to measure, track, and alert on shard status during sync operations.

Is there a way to expose these metrics? I'm struggling to find guidance or 
details.
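In case it is useful until something better exists, here is a rough sketch of a workaround: scrape the plain-text status on a timer and hand a couple of gauges to node_exporter's textfile collector. The "behind"/"recovering" keywords, the textfile directory, and <realm> are assumptions or placeholders that may need adjusting for your version and setup; the RGW admin socket (ceph daemon <rgw> perf dump) may also already expose per-source-zone sync counters worth checking first.

#!/bin/sh
# Rough sketch: count sync-status lines that indicate lag and expose them to Prometheus.
OUT=$(radosgw-admin sync status --rgw-realm=<realm> 2>/dev/null)
LAGGING=$(printf '%s\n' "$OUT" | grep -c -E 'behind|recovering')
CAUGHT_UP=$(printf '%s\n' "$OUT" | grep -c 'caught up')
{
  echo "rgw_sync_lagging_lines ${LAGGING}"
  echo "rgw_sync_caught_up_lines ${CAUGHT_UP}"
} > /var/lib/node_exporter/textfile_collector/rgw_sync.prom   # match --collector.textfile.directory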


Thanks in advance


Rhys




Rhys Powell (He/Him)
KORE | Senior Systems Engineer
rpow...@korewireless.com

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Performance improvement suggestion

2024-02-01 Thread Anthony D'Atri
I'd totally defer to the RADOS folks.

One issue might be adding a separate code path, which can have all sorts of 
problems.

> On Feb 1, 2024, at 12:53, quag...@bol.com.br wrote:
> 
>  
>  
> Ok Anthony,
> 
> I understood what you said. I also believe in all the professional history 
> and experience you have.
> 
> Anyway, could there be a configuration flag to make this happen?
> 
> As well as those that already exist: "--yes-i-really-mean-it".
> 
> This way, the storage pattern would remain as it is. However, it would allow 
> situations like the one I mentioned to be possible.
> 
> This situation will permit some rules to be relaxed (even if they are not ok 
> at first).
> Likewise, there are already situations like lazyio that make some exceptions 
> to standard procedures.
> Remembering: it's just a suggestion.
> If this type of functionality is not interesting, it is ok.
> 
> 
> 
> Rafael.
>  
> 
> De: "Anthony D'Atri" 
> Enviada: 2024/02/01 12:10:30
> Para: quag...@bol.com.br
> Cc: ceph-users@ceph.io
> Assunto: [ceph-users] Re: Performance improvement suggestion
>  
> 
> 
> > I didn't say I would accept the risk of losing data.
> 
> That's implicit in what you suggest, though.
> 
> > I just said that it would be interesting if the objects were first recorded 
> > only in the primary OSD.
> 
> What happens when that host / drive smokes before it can replicate? What 
> happens if a secondary OSD gets a read op before the primary updates it? 
> Swift object storage users have to code around this potential. It's a 
> non-starter for block storage.
> 
> This is similar to why RoC HBAs (which are a badly outdated thing to begin 
> with) will only enter writeback mode if they have a BBU / supercap -- and of 
> course if their firmware and hardware isn't pervasively buggy. Guess how I 
> know this?
> 
> This way it would greatly increase performance (both for IOPS and throughput).
> 
> It might increase low-QD IOPS for a single client on slow media with certain 
> networking. Depending on media, it wouldn't increase throughput.
> 
> Consider QEMU drive-mirror. If you're doing RF=3 replication, you use 3x the 
> network resources between the client and the servers.
> 
> > Later (in the background), record the replicas. This situation would avoid 
> > leaving users/software waiting for the recording response from all replicas 
> > when the storage is overloaded.
> 
> If one makes the mistake of using HDDs, they're going to be overloaded no 
> matter how one slices and dices the ops. Ya just canna squeeze IOPS from a 
> stone. Throughput is going to be limited by the SATA interface and seeking no 
> matter what.
> 
> > Where I work, performance is very important and we don't have money to make 
> an entire cluster only with NVMe.
> 
> If there isn't money, then it isn't very important. But as I've written 
> before, NVMe clusters *do not cost appreciably more than spinners* unless 
> your procurement processes are bad. In fact they can cost significantly less. 
> This is especially true with object storage and archival where one can 
> leverage QLC.
> 
> * Buy generic drives from a VAR, not channel drives through a chassis brand. 
> Far less markup, and moreover you get the full 5 year warranty, not just 3 
> years. And you can painlessly RMA drives yourself - you don't have to spend 
> hours going back and forth with $chassisvendor's TAC arguing about every 
> single RMA. I've found that this is so bad that it is more economical to just 
> throw away a failed component worth < USD 500 than to RMA it. Do you pay for 
> extended warranty / support? That's expensive too.
> 
> * Certain chassis brands who shall remain nameless push RoC HBAs hard with 
> extreme markups. List prices as high as USD2000. Per server, eschewing those 
> abominations makes up for a lot of the drive-only unit economics
> 
> * But this is the part that lots of people don't get: You don't just stack up 
> the drives on a desk and use them. They go into *servers* that cost money and 
> *racks* that cost money. They take *power* that costs money.
> 
> * $ / IOPS are FAR better for ANY SSD than for HDDs
> 
> * RUs cost money, so do chassis and switches
> 
> * Drive failures cost money
> 
> * So does having your people and applications twiddle their thumbs waiting 
> for stuff to happen. I worked for a supercomputer company who put low-memory 
> low-end diskless workstations on engineer's desks. They spent lots of time 
> doing nothing waiting for their applications to respond. This company no 
> longer exists.
> 
> * So does the risk of taking *weeks* to heal from a drive failure
> 
> Punch honest numbers into https://www.snia.org/forums/cmsi/programs/TCOcalc
> 
> I walked through this with a certain global company. QLC SSDs were 
> demonstrated to have like 30% lower TCO than spinners. Part of the equation 
> is that they were accustomed to limiting HDD size to 8 TB because of the 
> bottlenecks, and thus requiring more servers, more 

[ceph-users] Re: Performance improvement suggestion

2024-02-01 Thread quag...@bol.com.br
 
 
Ok Anthony,

I understood what you said. I also believe in all the professional history and experience you have.

Anyway, could there be a configuration flag to make this happen?

As well as those that already exist: "--yes-i-really-mean-it".

This way, the storage pattern would remain as it is. However, it would allow situations like the one I mentioned to be possible.

This situation will permit some rules to be relaxed (even if they are not ok at first).
Likewise, there are already situations like lazyio that make some exceptions to standard procedures.
 
Remembering: it's just a suggestion.
If this type of functionality is not interesting, it is ok.


Rafael.
 


De: "Anthony D'Atri" 
Enviada: 2024/02/01 12:10:30
Para: quag...@bol.com.br
Cc:  ceph-users@ceph.io
Assunto:  [ceph-users] Re: Performance improvement suggestion
 


> I didn't say I would accept the risk of losing data.

That's implicit in what you suggest, though.

> I just said that it would be interesting if the objects were first recorded only in the primary OSD.

What happens when that host / drive smokes before it can replicate? What happens if a secondary OSD gets a read op before the primary updates it? Swift object storage users have to code around this potential. It's a non-starter for block storage.

This is similar to why RoC HBAs (which are a badly outdated thing to begin with) will only enter writeback mode if they have a BBU / supercap -- and of course if their firmware and hardware isn't pervasively buggy. Guess how I know this?

> This way it would greatly increase performance (both for IOPS and throughput).

It might increase low-QD IOPS for a single client on slow media with certain networking. Depending on media, it wouldn't increase throughput.

Consider QEMU drive-mirror. If you're doing RF=3 replication, you use 3x the network resources between the client and the servers.

> Later (in the background), record the replicas. This situation would avoid leaving users/software waiting for the recording response from all replicas when the storage is overloaded.

If one makes the mistake of using HDDs, they're going to be overloaded no matter how one slices and dices the ops. Ya just canna squeeze IOPS from a stone. Throughput is going to be limited by the SATA interface and seeking no matter what.

> Where I work, performance is very important and we don't have money to make an entire cluster only with NVMe.

If there isn't money, then it isn't very important. But as I've written before, NVMe clusters *do not cost appreciably more than spinners* unless your procurement processes are bad. In fact they can cost significantly less. This is especially true with object storage and archival where one can leverage QLC.

* Buy generic drives from a VAR, not channel drives through a chassis brand. Far less markup, and moreover you get the full 5 year warranty, not just 3 years. And you can painlessly RMA drives yourself - you don't have to spend hours going back and forth with $chassisvendor's TAC arguing about every single RMA. I've found that this is so bad that it is more economical to just throw away a failed component worth < USD 500 than to RMA it. Do you pay for extended warranty / support? That's expensive too.

* Certain chassis brands who shall remain nameless push RoC HBAs hard with extreme markups. List prices as high as USD2000. Per server, eschewing those abominations makes up for a lot of the drive-only unit economics

* But this is the part that lots of people don't get: You don't just stack up the drives on a desk and use them. They go into *servers* that cost money and *racks* that cost money. They take *power* that costs money.

* $ / IOPS are FAR better for ANY SSD than for HDDs

* RUs cost money, so do chassis and switches

* Drive failures cost money

* So does having your people and applications twiddle their thumbs waiting for stuff to happen. I worked for a supercomputer company who put low-memory low-end diskless workstations on engineer's desks. They spent lots of time doing nothing waiting for their applications to respond. This company no longer exists.

* So does the risk of taking *weeks* to heal from a drive failure

Punch honest numbers into https://www.snia.org/forums/cmsi/programs/TCOcalc

I walked through this with a certain global company. QLC SSDs were demonstrated to have like 30% lower TCO than spinners. Part of the equation is that they were accustomed to limiting HDD size to 8 TB because of the bottlenecks, and thus requiring more servers, more switch ports, more DC racks, more rack/stack time, more administrative overhead. You can fit 1.9 PB of raw SSD capacity in a 1U server. That same RU will hold at most 88 TB of the largest spinners you can get today. 22 TIMES the density. And since many applications can't even barely tolerate the spinner bottlenecks, capping spinner size at even 10T makes that like 40 TIMES better density with SSDs.


> However, I don't think it's interesting to lose 

[ceph-users] Re: Performance improvement suggestion

2024-02-01 Thread quag...@bol.com.br
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-01 Thread Ilya Dryomov
On Thu, Feb 1, 2024 at 5:23 PM Yuri Weinstein  wrote:
>
> Update.
> Seeking approvals/reviews for:
>
> rados - Radek, Laura, Travis, Adam King (see Laura's comments below)
> rgw - Casey approved
> fs - Venky approved
> rbd - Ilya

No issues in RBD, formal approval is pending on [1] which also spills
into RADOS (same job, I believe).

> krbd - Ilya

Approved.

>
> upgrade/nautilus-x (pacific) - fixed by Casey
> upgrade/octopus-x (pacific) - Adam King is looking
> https://tracker.ceph.com/issues/64279
>
> upgrade/pacific-x (quincy) - blocked by
> https://tracker.ceph.com/issues/64256 (Laura, Dan, Adam pls take a
> look)
>
> upgrade/pacific-p2p - Ilya PTL (maybe rbd related?)

I can answer only for test_librbd_python.sh failures -- 3 out of 5 jobs
there.  They popped up because these jobs are running tests from 16.2.7
against librbd from pacific-release (i.e. almost 16.2.15).  One of the
tests happened to pass an incorrect argument and was adjusted together
with the fix for the bug in question [2].

A different variation of this came up in [3] earlier.

I don't think there is any way to fix this other than to disable
test_rbd.TestImage.test_diff_iterate test in upgrades.  We can't
account for such version mismatches when writing tests.

[1] https://tracker.ceph.com/issues/64126
[2] https://tracker.ceph.com/issues/63846
[3] https://tracker.ceph.com/issues/63941

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-01 Thread Zakhar Kirpichenko
Hi,

Please consider not leaving this behind:
https://github.com/ceph/ceph/pull/55109

It's a serious bug, which potentially affects whole-node stability if the
affected mgr is colocated with OSDs. The bug has been known for quite a while
and really shouldn't be left unfixed.

/Z

On Thu, 1 Feb 2024 at 18:45, Nizamudeen A  wrote:

> Thanks Laura,
>
> Raised a PR for  https://tracker.ceph.com/issues/57386
> https://github.com/ceph/ceph/pull/55415
>
>
> On Thu, Feb 1, 2024 at 5:15 AM Laura Flores  wrote:
>
> > I reviewed the rados suite. @Adam King , @Nizamudeen
> A
> >  would appreciate a look from you, as there are some
> > orchestrator and dashboard trackers that came up.
> >
> > pacific-release, 16.2.15
> >
> > Failures:
> > 1. https://tracker.ceph.com/issues/62225
> > 2. https://tracker.ceph.com/issues/64278
> > 3. https://tracker.ceph.com/issues/58659
> > 4. https://tracker.ceph.com/issues/58658
> > 5. https://tracker.ceph.com/issues/64280 -- new tracker, worth a
> look
> > from Orch
> > 6. https://tracker.ceph.com/issues/63577
> > 7. https://tracker.ceph.com/issues/63894
> > 8. https://tracker.ceph.com/issues/64126
> > 9. https://tracker.ceph.com/issues/63887
> > 10. https://tracker.ceph.com/issues/61602
> > 11. https://tracker.ceph.com/issues/54071
> > 12. https://tracker.ceph.com/issues/57386
> > 13. https://tracker.ceph.com/issues/64281
> > 14. https://tracker.ceph.com/issues/49287
> >
> > Details:
> > 1. pacific upgrade test fails on 'ceph versions | jq -e' command -
> > Ceph - RADOS
> > 2. Unable to update caps for client.iscsi.iscsi.a - Ceph -
> Orchestrator
> > 3. mds_upgrade_sequence: failure when deploying node-exporter - Ceph
> -
> > Orchestrator
> > 4. mds_upgrade_sequence: Error: initializing source
> > docker://prom/alertmanager:v0.20.0 - Ceph - Orchestrator
> > 5. mgr-nfs-upgrade test times out from failed cephadm daemons - Ceph
> -
> > Orchestrator
> > 6. cephadm: docker.io/library/haproxy: toomanyrequests: You have
> > reached your pull rate limit. You may increase the limit by
> authenticating
> > and upgrading: https://www.docker.com/increase-rate-limit - Ceph -
> > Orchestrator
> > 7. qa: cephadm failed with an error code 1, alertmanager container
> not
> > found. - Ceph - Orchestrator
> > 8. ceph-iscsi build was retriggered and now missing
> > package_manager_version attribute - Ceph
> > 9. Starting alertmanager fails from missing container - Ceph -
> > Orchestrator
> > 10. pacific: cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do
> > not have an application enabled (POOL_APP_NOT_ENABLED) - Ceph - RADOS
> > 11. rados/cephadm/osds: Invalid command: missing required parameter
> > hostname() - Ceph - Orchestrator
> > 12. cephadm/test_dashboard_e2e.sh: Expected to find content:
> '/^foo$/'
> > within the selector: 'cd-modal .badge' but never did - Ceph - Mgr -
> > Dashboard
> > 13. Failed to download key at
> > http://download.ceph.com/keys/autobuild.asc: Request failed: error [Errno 101] Network is unreachable - Infrastructure
> > 14. podman: setting cgroup config for procHooks process caused: Unit
> > libpod-$hash.scope not found - Ceph - Orchestrator
> >
> > On Wed, Jan 31, 2024 at 1:41 PM Casey Bodley  wrote:
> >
> >> On Mon, Jan 29, 2024 at 4:39 PM Yuri Weinstein 
> >> wrote:
> >> >
> >> > Details of this release are summarized here:
> >> >
> >> > https://tracker.ceph.com/issues/64151#note-1
> >> >
> >> > Seeking approvals/reviews for:
> >> >
> >> > rados - Radek, Laura, Travis, Ernesto, Adam King
> >> > rgw - Casey
> >>
> >> rgw approved, thanks
> >>
> >> > fs - Venky
> >> > rbd - Ilya
> >> > krbd - in progress
> >> >
> >> > upgrade/nautilus-x (pacific) - Casey PTL (regweed tests failed)
> >> > upgrade/octopus-x (pacific) - Casey PTL (regweed tests failed)
> >> >
> >> > upgrade/pacific-x (quincy) - in progress
> >> > upgrade/pacific-p2p - Ilya PTL (maybe rbd related?)
> >> >
> >> > ceph-volume - Guillaume
> >> >
> >> > TIA
> >> > YuriW
> >> > ___
> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >> >
> >> ___
> >> Dev mailing list -- d...@ceph.io
> >> To unsubscribe send an email to dev-le...@ceph.io
> >>
> >
> >
> > --
> >
> > Laura Flores
> >
> > She/Her/Hers
> >
> > Software Engineer, Ceph Storage 
> >
> > Chicago, IL
> >
> > lflo...@ibm.com | lflo...@redhat.com 
> > M: +17087388804
> >
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-01 Thread Nizamudeen A
Thanks Laura,

Raised a PR for  https://tracker.ceph.com/issues/57386
https://github.com/ceph/ceph/pull/55415


On Thu, Feb 1, 2024 at 5:15 AM Laura Flores  wrote:

> I reviewed the rados suite. @Adam King , @Nizamudeen A
>  would appreciate a look from you, as there are some
> orchestrator and dashboard trackers that came up.
>
> pacific-release, 16.2.15
>
> Failures:
> 1. https://tracker.ceph.com/issues/62225
> 2. https://tracker.ceph.com/issues/64278
> 3. https://tracker.ceph.com/issues/58659
> 4. https://tracker.ceph.com/issues/58658
> 5. https://tracker.ceph.com/issues/64280 -- new tracker, worth a look
> from Orch
> 6. https://tracker.ceph.com/issues/63577
> 7. https://tracker.ceph.com/issues/63894
> 8. https://tracker.ceph.com/issues/64126
> 9. https://tracker.ceph.com/issues/63887
> 10. https://tracker.ceph.com/issues/61602
> 11. https://tracker.ceph.com/issues/54071
> 12. https://tracker.ceph.com/issues/57386
> 13. https://tracker.ceph.com/issues/64281
> 14. https://tracker.ceph.com/issues/49287
>
> Details:
> 1. pacific upgrade test fails on 'ceph versions | jq -e' command -
> Ceph - RADOS
> 2. Unable to update caps for client.iscsi.iscsi.a - Ceph - Orchestrator
> 3. mds_upgrade_sequence: failure when deploying node-exporter - Ceph -
> Orchestrator
> 4. mds_upgrade_sequence: Error: initializing source
> docker://prom/alertmanager:v0.20.0 - Ceph - Orchestrator
> 5. mgr-nfs-upgrade test times out from failed cephadm daemons - Ceph -
> Orchestrator
> 6. cephadm: docker.io/library/haproxy: toomanyrequests: You have
> reached your pull rate limit. You may increase the limit by authenticating
> and upgrading: https://www.docker.com/increase-rate-limit - Ceph -
> Orchestrator
> 7. qa: cephadm failed with an error code 1, alertmanager container not
> found. - Ceph - Orchestrator
> 8. ceph-iscsi build was retriggered and now missing
> package_manager_version attribute - Ceph
> 9. Starting alertmanager fails from missing container - Ceph -
> Orchestrator
> 10. pacific: cls/test_cls_sdk.sh: Health check failed: 1 pool(s) do
> not have an application enabled (POOL_APP_NOT_ENABLED) - Ceph - RADOS
> 11. rados/cephadm/osds: Invalid command: missing required parameter
> hostname() - Ceph - Orchestrator
> 12. cephadm/test_dashboard_e2e.sh: Expected to find content: '/^foo$/'
> within the selector: 'cd-modal .badge' but never did - Ceph - Mgr -
> Dashboard
> 13. Failed to download key at
> http://download.ceph.com/keys/autobuild.asc: Request failed: error [Errno 101] Network is unreachable - Infrastructure
> 14. podman: setting cgroup config for procHooks process caused: Unit
> libpod-$hash.scope not found - Ceph - Orchestrator
>
> On Wed, Jan 31, 2024 at 1:41 PM Casey Bodley  wrote:
>
>> On Mon, Jan 29, 2024 at 4:39 PM Yuri Weinstein 
>> wrote:
>> >
>> > Details of this release are summarized here:
>> >
>> > https://tracker.ceph.com/issues/64151#note-1
>> >
>> > Seeking approvals/reviews for:
>> >
>> > rados - Radek, Laura, Travis, Ernesto, Adam King
>> > rgw - Casey
>>
>> rgw approved, thanks
>>
>> > fs - Venky
>> > rbd - Ilya
>> > krbd - in progress
>> >
>> > upgrade/nautilus-x (pacific) - Casey PTL (regweed tests failed)
>> > upgrade/octopus-x (pacific) - Casey PTL (regweed tests failed)
>> >
>> > upgrade/pacific-x (quincy) - in progress
>> > upgrade/pacific-p2p - Ilya PTL (maybe rbd related?)
>> >
>> > ceph-volume - Guillaume
>> >
>> > TIA
>> > YuriW
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>> ___
>> Dev mailing list -- d...@ceph.io
>> To unsubscribe send an email to dev-le...@ceph.io
>>
>
>
> --
>
> Laura Flores
>
> She/Her/Hers
>
> Software Engineer, Ceph Storage 
>
> Chicago, IL
>
> lflo...@ibm.com | lflo...@redhat.com 
> M: +17087388804
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-01 Thread Yuri Weinstein
Update.
Seeking approvals/reviews for:

rados - Radek, Laura, Travis, Adam King (see Laura's comments below)
rgw - Casey approved
fs - Venky approved
rbd - Ilya
krbd - Ilya

upgrade/nautilus-x (pacific) - fixed by Casey
upgrade/octopus-x (pacific) - Adam King is looking
https://tracker.ceph.com/issues/64279

upgrade/pacific-x (quincy) - blocked by
https://tracker.ceph.com/issues/64256 (Laura, Dan, Adam pls take a
look)

upgrade/pacific-p2p - Ilya PTL (maybe rbd related?)

ceph-volume - Guillaume is fixing
https://tracker.ceph.com/issues/64248
https://github.com/ceph/ceph/pull/55376

In addition to all these issues, we are considering adding a fix for
https://tracker.ceph.com/issues/63425
(https://github.com/ceph/ceph/pull/54312).  Adam King and Ilya are
looking into this.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cannot recreate monitor in upgrade from pacific to quincy (leveldb -> rocksdb)

2024-02-01 Thread Eugen Block

Hi,

Can anyone confirm that ancient (2017) leveldb database mons should
just accept ‘mon.$hostname’ names for mons, as well as ‘mon.$id’?


at some point you had or have to remove one of the mons to recreate it  
with a rocksdb backend, so the mismatch should not be an issue here. I  
can confirm that when I tried to reproduce it in a small test cluster  
with leveldb. So now I have two leveldb MONs and one rocksdb MON:


jewel:~ # cat  
/var/lib/ceph/b08424fa-8530-4080-876d-2821c916d26c/mon.jewel/kv_backend

rocksdb
jewel2:~ # cat  
/var/lib/ceph/b08424fa-8530-4080-876d-2821c916d26c/mon.jewel2/kv_backend

leveldb
jewel3:~ # cat  
/var/lib/ceph/b08424fa-8530-4080-876d-2821c916d26c/mon.jewel3/kv_backend

leveldb

And the cluster is healthy, although it took a minute or two for the  
rebuilt MON to sync (in a real cluster with some load etc. it might  
take longer):


jewel:~ # ceph -s
  cluster:
id: b08424fa-8530-4080-876d-2821c916d26c
health: HEALTH_OK

  services:
mon: 3 daemons, quorum jewel2,jewel3,jewel (age 3m)

I'm wondering if this could have to do with the insecure_global_id  
things. Can you send the output of:


ceph config get mon auth_allow_insecure_global_id_reclaim
ceph config get mon auth_expose_insecure_global_id_reclaim
ceph config get mon mon_warn_on_insecure_global_id_reclaim
ceph config get mon mon_warn_on_insecure_global_id_reclaim_allowed



Quoting Mark Schouten:


Hi,

I don’t have a fourth machine available, so that’s not an option,
unfortunately.


I did enable a lot of debugging earlier, but that shows no
information as to why stuff is not working as expected.


Proxmox just deploys the mons, nothing fancy there, no special cases.

Can anyone confirm that ancient (2017) leveldb database mons should
just accept ‘mon.$hostname’ names for mons, as well as ‘mon.$id’?


—
Mark Schouten
CTO, Tuxis B.V.
+31 318 200208 / m...@tuxis.nl


-- Original Message --
From "Eugen Block" 
To ceph-users@ceph.io
Date 31/01/2024, 13:02:04
Subject [ceph-users] Re: Cannot recreate monitor in upgrade from  
pacific to quincy (leveldb -> rocksdb)



Hi Mark,

as I'm not familiar with proxmox I'm not sure what happens under  
the  hood. There are a couple of things I would try, not  
necessarily in  this order:


- Check the troubleshooting guide [1], for example a clock skew  
could  be one reason, have you verified ntp/chronyd functionality?
- Inspect debug log output, maybe first on the probing mon and if   
those don't reveal the reason, enable debug logs for the other MONs  
as  well:

ceph config set mon.proxmox03 debug_mon 20
ceph config set mon.proxmox03 debug_paxos 20

or for all MONs:
ceph config set mon debug_mon 20
ceph config set mon debug_paxos 20

- Try to deploy an additional MON on a different server (if you  
have  more available) and see if that works.

- Does proxmox log anything?
- Maybe last resort, try to start a MON manually after adding it to  
 the monmap with the monmaptool, but only if you know what you're   
doing. I wonder if the monmap doesn't get updated...


Regards,
Eugen

[1]  
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/


Quoting Mark Schouten:


Hi,

During an upgrade from pacific to quincy, we needed to recreate  
the  mons because the mons were pretty old and still using leveldb.


So step one was to destroy one of the mons. After that we  
recreated  the monitor, and although it starts, it remains in  
state ‘probing’,  as you can see below.


No matter what I tried, it won’t come up. I’ve seen quite some   
messages that the MTU might be an issue, but that seems to be ok:

root@proxmox03:/var/log/ceph# fping -b 1472 10.10.10.{1..3} -M
10.10.10.1 is alive
10.10.10.2 is alive
10.10.10.3 is alive


Does anyone have an idea how to fix this? I’ve tried destroying  
and  recreating the mon a few times now. Could it be that the  
leveldb  mons only support mon.$id notation for the monitors?


root@proxmox03:/var/log/ceph# ceph daemon mon.proxmox03 mon_status
{
  "name": "proxmox03",
  "rank": 2,
  "state": "probing",
  "election_epoch": 0,
  "quorum": [],
  "features": {
      "required_con": "2449958197560098820",
      "required_mon": [
          "kraken",
          "luminous",
          "mimic",
          "osdmap-prune",
          "nautilus",
          "octopus",
          "pacific",
          "elector-pinging"
      ],
      "quorum_con": "0",
      "quorum_mon": []
  },
  "outside_quorum": [
      "proxmox03"
  ],
  "extra_probe_peers": [],
  "sync_provider": [],
  "monmap": {
      "epoch": 0,
      "fsid": "39b1e85c-7b47-4262-9f0a-47ae91042bac",
      "modified": "2024-01-23T21:02:12.631320Z",
      "created": "2017-03-15T14:54:55.743017Z",
      "min_mon_release": 16,
      "min_mon_release_name": "pacific",
      "election_strategy": 1,
      "disallowed_leaders: ": "",
      "stretch_mode": false,
      "tiebreaker_mon": "",
      "removed_ranks: ":

[ceph-users] Re: 6 pgs not deep-scrubbed in time

2024-02-01 Thread Wesley Dillingham
I would just set noout for the duration of the reboot; no other flags are really
needed. There is a better option to limit that flag to just the host being
rebooted, which is "set-group noout <host>" where <host> is the server's
name in CRUSH. Just the global noout will suffice though.
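A minimal sketch of that host-scoped variant, assuming the host bucket in "ceph osd tree" is called ceph-osd3 (the name seen earlier in this thread):

ceph osd set-group noout ceph-osd3      # before the reboot
ceph osd unset-group noout ceph-osd3    # once the host and its OSDs are back up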

Anyways... your not-scrubbed-in-time warnings aren't going away anytime
soon until you finish the pg split. In fact, they will get more
numerous until the pg split finishes (did you start that?). If you want to
get rid of the "cosmetic" issue of the warning you can adjust the interval
after which the warning comes up, but I would suggest you leave it, since you
are trying to address the root of the situation and want to see the
resolution.
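If you do end up tuning the warning instead, these are roughly the knobs involved (a sketch; check the defaults on your release first, since the warning threshold is derived from the deep-scrub interval):

ceph config get osd osd_deep_scrub_interval               # default 604800 s, i.e. one week
ceph config get mon mon_warn_pg_not_deep_scrubbed_ratio   # how far past the interval a PG may be before warning
ceph config set osd osd_deep_scrub_interval 1209600       # e.g. stretch the interval to two weeks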



Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Thu, Feb 1, 2024 at 9:16 AM Michel Niyoyita  wrote:

> And as said before, it is still in warning state with pgs not deep-scrubbed
> in time. Hope this can be ignored, and I can set those two flags ("noout" and
> "nobackfill") and then reboot.
>
> Thank you again Sir
>
> On Thu, 1 Feb 2024, 16:11 Michel Niyoyita,  wrote:
>
>> Thank you very much Janne.
>>
>> On Thu, 1 Feb 2024, 15:21 Janne Johansson,  wrote:
>>
>>> pause and nodown is not a good option to set, that will certainly make
>>> clients stop IO. Pause will stop it immediately, and nodown will stop
>>> IO when the OSD processes stop running on this host.
>>>
>>> When we do service on a host, we set "noout" and "nobackfill", that is
>>> enough for reboots, OS upgrades and simple disk exchanges.
>>> The PGs on this one host will be degraded during the down period, but
>>> IO continues.
>>> Of course this is when the cluster was healthy to begin with (not
>>> counting "not scrubbed in time" warnings, they don't matter in this
>>> case.)
>>>
>>>
>>>
>>> Den tors 1 feb. 2024 kl 12:21 skrev Michel Niyoyita :
>>> >
>>> > Thanks Very much Wesley,
>>> >
>>> > We have decided to restart one host among three osds hosts. before
>>> doing
>>> > that I need the advices of the team . these are flags I want to set
>>> before
>>> > restart.
>>> >
>>> >  'ceph osd set noout'
>>> >  'ceph osd set nobackfill'
>>> >  'ceph osd set norecover'
>>> >  'ceph osd set norebalance'
>>> > 'ceph osd set nodown'
>>> >  'ceph osd set pause'
>>> > 'ceph osd set nodeep-scrub'
>>> > 'ceph osd set noscrub'
>>> >
>>> >
>>> > Would like to ask if this can be enough to set and restart the host
>>> safely
>>> > . the cluster has 3 as replicas.
>>> >
>>> > will the cluster still be accessible while restart the hosts? after
>>> > restarting I will unset the flags.
>>> >
>>> > Kindly advise.
>>> >
>>> > Michel
>>> >
>>> >
>>> > On Tue, 30 Jan 2024, 17:44 Wesley Dillingham, 
>>> wrote:
>>> >
>>> > > actually it seems the issue I had in mind was fixed in 16.2.11 so you
>>> > > should be fine.
>>> > >
>>> > > Respectfully,
>>> > >
>>> > > *Wes Dillingham*
>>> > > w...@wesdillingham.com
>>> > > LinkedIn 
>>> > >
>>> > >
>>> > > On Tue, Jan 30, 2024 at 10:34 AM Wesley Dillingham <
>>> w...@wesdillingham.com>
>>> > > wrote:
>>> > >
>>> > >> You may want to consider upgrading to 16.2.14 before you do the pg
>>> split.
>>> > >>
>>> > >> Respectfully,
>>> > >>
>>> > >> *Wes Dillingham*
>>> > >> w...@wesdillingham.com
>>> > >> LinkedIn 
>>> > >>
>>> > >>
>>> > >> On Tue, Jan 30, 2024 at 10:18 AM Michel Niyoyita >> >
>>> > >> wrote:
>>> > >>
>>> > >>> I tried that on one of my pool (pool id 3) but the number of pgs
>>> not
>>> > >>> deep-scrubbed in time increased also from 55 to 100 but the number
>>> of PGs
>>> > >>> was increased. I set also autoscale to off mode. before continue
>>> to other
>>> > >>> pools would like to ask if so far there is no negative impact.
>>> > >>>
>>> > >>> ceph -s
>>> > >>>   cluster:
>>> > >>> id: cb0caedc-eb5b-42d1-a34f-96facfda8c27
>>> > >>> health: HEALTH_WARN
>>> > >>> 100 pgs not deep-scrubbed in time
>>> > >>>
>>> > >>>   services:
>>> > >>> mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 11M)
>>> > >>> mgr: ceph-mon2(active, since 11M), standbys: ceph-mon3,
>>> ceph-mon1
>>> > >>> osd: 48 osds: 48 up (since 11M), 48 in (since 12M)
>>> > >>> rgw: 6 daemons active (6 hosts, 1 zones)
>>> > >>>
>>> > >>>   data:
>>> > >>> pools:   10 pools, 609 pgs
>>> > >>> objects: 6.03M objects, 23 TiB
>>> > >>> usage:   151 TiB used, 282 TiB / 433 TiB avail
>>> > >>> pgs: 603 active+clean
>>> > >>>  4   active+clean+scrubbing+deep
>>> > >>>  2   active+clean+scrubbing
>>> > >>>
>>> > >>>   io:
>>> > >>> client:   96 MiB/s rd, 573 MiB/s wr, 576 op/s rd, 648 op/s wr
>>> > >>>
>>> > >>> root@ceph-osd3:/var/log# ceph df
>>> > >>> --- RAW STORAGE ---
>>> > >>> CLASS SIZEAVAIL USED  RAW USED  %RAW USED
>>> > >>> hdd433 TiB  282 TiB  151 TiB   151 TiB  34.93
>>> > >>> 

[ceph-users] Re: Performance improvement suggestion

2024-02-01 Thread Anthony D'Atri


>  I didn't say I would accept the risk of losing data.

That's implicit in what you suggest, though.

>  I just said that it would be interesting if the objects were first 
> recorded only in the primary OSD.

What happens when that host / drive smokes before it can replicate?  What 
happens if a secondary OSD gets a read op before the primary updates it?  Swift 
object storage users have to code around this potential.  It's a non-starter 
for block storage.

This is similar to why RoC HBAs (which are a badly outdated thing to begin 
with) will only enter writeback mode if they have a BBU / supercap -- and of 
course if their firmware and hardware isn't pervasively buggy.  Guess how I 
know this?

>  This way it would greatly increase performance (both for IOPS and
> throughput).

It might increase low-QD IOPS for a single client on slow media with certain 
networking.  Depending on media, it wouldn't increase throughput.

Consider QEMU drive-mirror.  If you're doing RF=3 replication, you use 3x the 
network resources between the client and the servers.

>  Later (in the background), record the replicas. This situation would 
> avoid leaving users/software waiting for the recording response from all 
> replicas when the storage is overloaded.

If one makes the mistake of using HDDs, they're going to be overloaded no 
matter how one slices and dices the ops.  Ya just canna squeeze IOPS from a 
stone.  Throughput is going to be limited by the SATA interface and seeking no 
matter what.

>  Where I work, performance is very important and we don't have money to 
> make an entire cluster only with NVMe.

If there isn't money, then it isn't very important.  But as I've written 
before, NVMe clusters *do not cost appreciably more than spinners* unless your 
procurement processes are bad.  In fact they can cost significantly less.  This 
is especially true with object storage and archival where one can leverage QLC. 

* Buy generic drives from a VAR, not channel drives through a chassis brand.  
Far less markup, and moreover you get the full 5 year warranty, not just 3 
years.  And you can painlessly RMA drives yourself - you don't have to spend 
hours going back and forth with $chassisvendor's TAC arguing about every single 
RMA.  I've found that this is so bad that it is more economical to just throw 
away a failed component worth < USD 500 than to RMA it.  Do you pay for 
extended warranty / support?  That's expensive too.

* Certain chassis brands who shall remain nameless push RoC HBAs hard with 
extreme markups.  List prices as high as USD2000.  Per server, eschewing those 
abominations makes up for a lot of the drive-only unit economics

* But this is the part that lots of people don't get:  You don't just stack up 
the drives on a desk and use them.  They go into *servers* that cost money and 
*racks* that cost money.  They take *power* that costs money.

* $ / IOPS are FAR better for ANY SSD than for HDDs

* RUs cost money, so do chassis and switches

* Drive failures cost money

* So does having your people and applications twiddle their thumbs waiting for 
stuff to happen.  I worked for a supercomputer company who put low-memory 
low-end diskless workstations on engineer's desks.  They spent lots of time 
doing nothing waiting for their applications to respond.  This company no 
longer exists.

* So does the risk of taking *weeks* to heal from a drive failure

Punch honest numbers into https://www.snia.org/forums/cmsi/programs/TCOcalc

 I walked through this with a certain global company.  QLC SSDs were 
demonstrated to have like 30% lower TCO than spinners.  Part of the equation is 
that they were accustomed to limiting HDD size to 8 TB because of the 
bottlenecks, and thus requiring more servers, more switch ports, more DC racks, 
more rack/stack time, more administrative overhead.  You can fit 1.9 PB of raw 
SSD capacity in a 1U server.  That same RU will hold at most 88 TB of the 
largest spinners you can get today.  22 TIMES the density.  And since many 
applications can't even barely tolerate the spinner bottlenecks, capping 
spinner size at even 10T makes that like 40 TIMES better density with SSDs.


> However, I don't think it's interesting to lose the functionality of the 
> replicas.
>  I'm just suggesting another way to increase performance without losing 
> the functionality of replicas.
> 
> 
> Rafael.
>  
> 
> De: "Anthony D'Atri" 
> Enviada: 2024/01/31 17:04:08
> Para: quag...@bol.com.br
> Cc: ceph-users@ceph.io
> Assunto: Re: [ceph-users] Performance improvement suggestion
>  
> Would you be willing to accept the risk of data loss?
>  
>> 
>> On Jan 31, 2024, at 2:48 PM, quag...@bol.com.br wrote:
>>  
>> Hello everybody,
>>  I would like to make a suggestion for improving performance in Ceph 
>> architecture.
>>  I don't know if this group would be the best place or if my proposal is 
>> correct.
>> 
>>  My suggestion would be in the item 
>> 

[ceph-users] Re: cephfs inode backtrace information

2024-02-01 Thread Loïc Tortay

On 31/01/2024 20:13, Patrick Donnelly wrote:

On Tue, Jan 30, 2024 at 5:03 AM Dietmar Rieder
 wrote:


Hello,

I have a question regarding the default pool of a cephfs.

According to the docs it is recommended to use a fast ssd replicated
pool as default pool for cephfs. I'm asking what are the space
requirements for storing the inode backtrace information?


The actual recommendation is to use a replicated pool for the default
data pool. Regular hard drives are fine for the storage device.


Hello,
Is there a rule of thumb for the space requirements of the default pool
(depending on the number of POSIX objects)?


One of our CephFS clusters is configured with a replicated default pool,
but we find the space usage on that pool to be very high given the
(somewhat) moderate number of files:

[ceph: root@$NODE /]# ceph df
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd    5.9 PiB  3.9 PiB  2.0 PiB  2.0 PiB   33.53
ssd    51 TiB   35 TiB   15 TiB   15 TiB    30.08
TOTAL  5.9 PiB  3.9 PiB  2.0 PiB  2.0 PiB   33.50

--- POOLS ---
POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics  2   1     733 MiB  664      2.1 GiB  0      9.9 TiB
cephfs_EC_data         3   8192  1.5 PiB  515.56M  1.9 PiB  34.22  2.9 PiB
cephfs_metadata        4   128   86 GiB   10.28M   259 GiB  0.84   9.9 TiB
cephfs_default         5   512   4.2 TiB  131.17M  13 TiB   29.87  9.9 TiB
[...]

According to our statistics, there are about 132 million files and 
symlinks in the filesystem which is consistent with the number of 
objects for the "cephfs_default" pool.

(same for the metadata pool and the ~10 million directories)

But 4.2 TiB stored (~32 KiB per object) seems high; is this overhead
expected?


This is a Pacific cluster (16.2.14) if that matters.
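For anyone who wants to poke at where that space goes: as far as I understand it, each file whose data lives in the EC pool still gets a zero-length object in the default data pool whose "parent" xattr carries the backtrace. A quick way to sample one (the object picked is just whatever rados returns first):

OBJ=$(rados -p cephfs_default ls | head -1)
rados -p cephfs_default stat "$OBJ"                       # should report size 0
rados -p cephfs_default listxattr "$OBJ"                  # typically at least "parent"
rados -p cephfs_default getxattr "$OBJ" parent | wc -c    # encoded backtrace, usually well under 1 KiB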


Have a nice day,
Loïc.
--
|   Loïc Tortay  - IN2P3 Computing Centre  |
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Performance improvement suggestion

2024-02-01 Thread quag...@bol.com.br
 
Hi Janne, thanks for your reply.

I think that it would be good to maintain the number of configured replicas. I don't think it's interesting to decrease to size=1.

However, I think it is not necessary to write to all disks to release the client's request. Replicas could be recorded immediately in a second step.

Nowadays, more and more software implements parallel writes through specific libraries. Examples: MPI-IO, HDF5, pnetCDF, etc.

This way, even if the cluster has multiple disks, the objects will be written in PARALLEL. The greater the number of processes recording at the same time, the greater the storage load, regardless of the type of disk used (HDD, SSD or NVMe).

I think and suggest that it is very useful to have the initial recording only be done on one disk and the replicas be done after the client is released (asynchronously).

Rafael.
 



De: "Janne Johansson" 
Enviada: 2024/02/01 04:08:05
Para: anthony.da...@gmail.com
Cc:  acozy...@gmail.com, quag...@bol.com.br, ceph-users@ceph.io
Assunto:  Re: [ceph-users] Re: Performance improvement suggestion
 
> I’ve heard conflicting asserts on whether the write returns with min_size shards have been persisted, or all of them.

I think it waits until all replicas have written the data, but from
simplistic tests with fast network and slow drives, the extra time
taken to write many copies is not linear to what it takes to write the
first, so unless you do go min_size=1 (not recommended at all), the
extra copies are not slowing you down as much as you'd expect. At
least not if the other drives are not 100% busy.

I get that this thread started on having one bad drive, and that is
another scenario of course, but having repl=2 or repl=3 is not about
writes taking 100% - 200% more time than the single write, it is less.

--
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Performance improvement suggestion

2024-02-01 Thread quag...@bol.com.br
 
 
Hi Anthony,
 Thanks for your reply.

 I didn't say I would accept the risk of losing data.

 I just said that it would be interesting if the objects were first recorded only in the primary OSD.
 This way it would greatly increase performance (both for IOPS and throughput).
 Later (in the background), record the replicas. This situation would avoid leaving users/software waiting for the recording response from all replicas when the storage is overloaded.

 Where I work, performance is very important and we don't have money to make an entire cluster only with NVMe. However, I don't think it's interesting to lose the functionality of the replicas.
 I'm just suggesting another way to increase performance without losing the functionality of replicas.


Rafael.
 


De: "Anthony D'Atri" 
Enviada: 2024/01/31 17:04:08
Para: quag...@bol.com.br
Cc:  ceph-users@ceph.io
Assunto:  Re: [ceph-users] Performance improvement suggestion
 
Would you be willing to accept the risk of data loss?

 

On Jan 31, 2024, at 2:48 PM, quag...@bol.com.br wrote:
 

Hello everybody,
 I would like to make a suggestion for improving performance in Ceph architecture.
 I don't know if this group would be the best place or if my proposal is correct.

 My suggestion would be in the item https://docs.ceph.com/en/latest/architecture/, at the end of the topic "Smart Daemons Enable Hyperscale".

 The client needs to "wait" for the configured number of replicas to be written (so that the client receives an OK and continues). This way, if there is slowness on any of the disks on which the PG will be updated, the client is left waiting.
     
 It would be possible:
     
 1-) Only record on the primary OSD
 2-) Write the other replicas in the background (the same way as when an OSD fails and PGs go "degraded").

 This way, client has a faster response when writing to storage: improving latency and performance (throughput and IOPS).
     
 I would find it plausible to accept a period of time (seconds) until all replicas are ok (written asynchronously) at the expense of improving performance.
     
 Could you evaluate this scenario?


Rafael.

 ___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Performance improvement suggestion

2024-02-01 Thread quag...@bol.com.br
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 6 pgs not deep-scrubbed in time

2024-02-01 Thread Michel Niyoyita
And as said before, it is still in warning state with pgs not deep-scrubbed
in time. Hope this can be ignored, and I can set those two flags ("noout" and
"nobackfill") and then reboot.

Thank you again Sir

On Thu, 1 Feb 2024, 16:11 Michel Niyoyita,  wrote:

> Thank you very much Janne.
>
> On Thu, 1 Feb 2024, 15:21 Janne Johansson,  wrote:
>
>> pause and nodown is not a good option to set, that will certainly make
>> clients stop IO. Pause will stop it immediately, and nodown will stop
>> IO when the OSD processes stop running on this host.
>>
>> When we do service on a host, we set "noout" and "nobackfill", that is
>> enough for reboots, OS upgrades and simple disk exchanges.
>> The PGs on this one host will be degraded during the down period, but
>> IO continues.
>> Of course this is when the cluster was healthy to begin with (not
>> counting "not scrubbed in time" warnings, they don't matter in this
>> case.)
>>
>>
>>
>> Den tors 1 feb. 2024 kl 12:21 skrev Michel Niyoyita :
>> >
>> > Thanks Very much Wesley,
>> >
>> > We have decided to restart one host among three osds hosts. before doing
>> > that I need the advices of the team . these are flags I want to set
>> before
>> > restart.
>> >
>> >  'ceph osd set noout'
>> >  'ceph osd set nobackfill'
>> >  'ceph osd set norecover'
>> >  'ceph osd set norebalance'
>> > 'ceph osd set nodown'
>> >  'ceph osd set pause'
>> > 'ceph osd set nodeep-scrub'
>> > 'ceph osd set noscrub'
>> >
>> >
>> > Would like to ask if this can be enough to set and restart the host
>> safely
>> > . the cluster has 3 as replicas.
>> >
>> > will the cluster still be accessible while restart the hosts? after
>> > restarting I will unset the flags.
>> >
>> > Kindly advise.
>> >
>> > Michel
>> >
>> >
>> > On Tue, 30 Jan 2024, 17:44 Wesley Dillingham, 
>> wrote:
>> >
>> > > actually it seems the issue I had in mind was fixed in 16.2.11 so you
>> > > should be fine.
>> > >
>> > > Respectfully,
>> > >
>> > > *Wes Dillingham*
>> > > w...@wesdillingham.com
>> > > LinkedIn 
>> > >
>> > >
>> > > On Tue, Jan 30, 2024 at 10:34 AM Wesley Dillingham <
>> w...@wesdillingham.com>
>> > > wrote:
>> > >
>> > >> You may want to consider upgrading to 16.2.14 before you do the pg
>> split.
>> > >>
>> > >> Respectfully,
>> > >>
>> > >> *Wes Dillingham*
>> > >> w...@wesdillingham.com
>> > >> LinkedIn 
>> > >>
>> > >>
>> > >> On Tue, Jan 30, 2024 at 10:18 AM Michel Niyoyita 
>> > >> wrote:
>> > >>
>> > >>> I tried that on one of my pool (pool id 3) but the number of pgs not
>> > >>> deep-scrubbed in time increased also from 55 to 100 but the number
>> of PGs
>> > >>> was increased. I set also autoscale to off mode. before continue to
>> other
>> > >>> pools would like to ask if so far there is no negative impact.
>> > >>>
>> > >>> ceph -s
>> > >>>   cluster:
>> > >>> id: cb0caedc-eb5b-42d1-a34f-96facfda8c27
>> > >>> health: HEALTH_WARN
>> > >>> 100 pgs not deep-scrubbed in time
>> > >>>
>> > >>>   services:
>> > >>> mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 11M)
>> > >>> mgr: ceph-mon2(active, since 11M), standbys: ceph-mon3,
>> ceph-mon1
>> > >>> osd: 48 osds: 48 up (since 11M), 48 in (since 12M)
>> > >>> rgw: 6 daemons active (6 hosts, 1 zones)
>> > >>>
>> > >>>   data:
>> > >>> pools:   10 pools, 609 pgs
>> > >>> objects: 6.03M objects, 23 TiB
>> > >>> usage:   151 TiB used, 282 TiB / 433 TiB avail
>> > >>> pgs: 603 active+clean
>> > >>>  4   active+clean+scrubbing+deep
>> > >>>  2   active+clean+scrubbing
>> > >>>
>> > >>>   io:
>> > >>> client:   96 MiB/s rd, 573 MiB/s wr, 576 op/s rd, 648 op/s wr
>> > >>>
>> > >>> root@ceph-osd3:/var/log# ceph df
>> > >>> --- RAW STORAGE ---
>> > >>> CLASS SIZEAVAIL USED  RAW USED  %RAW USED
>> > >>> hdd433 TiB  282 TiB  151 TiB   151 TiB  34.93
>> > >>> TOTAL  433 TiB  282 TiB  151 TiB   151 TiB  34.93
>> > >>>
>> > >>> --- POOLS ---
>> > >>> POOL   ID  PGS   STORED  OBJECTS USED  %USED
>> MAX
>> > >>> AVAIL
>> > >>> device_health_metrics   11  1.1 MiB3  3.2 MiB  0
>>  72
>> > >>> TiB
>> > >>> .rgw.root   2   32  3.7 KiB8   96 KiB  0
>>  72
>> > >>> TiB
>> > >>> default.rgw.log 3  256  3.6 KiB  204  408 KiB  0
>>  72
>> > >>> TiB
>> > >>> default.rgw.control 4   32  0 B8  0 B  0
>>  72
>> > >>> TiB
>> > >>> default.rgw.meta5   32382 B2   24 KiB  0
>>  72
>> > >>> TiB
>> > >>> volumes 6  128   21 TiB5.74M   62 TiB  22.30
>>  72
>> > >>> TiB
>> > >>> images  7   32  878 GiB  112.50k  2.6 TiB   1.17
>>  72
>> > >>> TiB
>> > >>> backups 8   32  0 B0  0 B  0
>>  72
>> > >>> TiB
>> > >>> vms 9   32  870 GiB  170.73k  2.5 TiB   1.13
>>  72
>> > >>> TiB
>> > >>> 

[ceph-users] Re: 6 pgs not deep-scrubbed in time

2024-02-01 Thread Michel Niyoyita
Thank you very much Janne.

On Thu, 1 Feb 2024, 15:21 Janne Johansson,  wrote:

> pause and nodown is not a good option to set, that will certainly make
> clients stop IO. Pause will stop it immediately, and nodown will stop
> IO when the OSD processes stop running on this host.
>
> When we do service on a host, we set "noout" and "nobackfill", that is
> enough for reboots, OS upgrades and simple disk exchanges.
> The PGs on this one host will be degraded during the down period, but
> IO continues.
> Of course this is when the cluster was healthy to begin with (not
> counting "not scrubbed in time" warnings, they don't matter in this
> case.)
>
>
>
> Den tors 1 feb. 2024 kl 12:21 skrev Michel Niyoyita :
> >
> > Thanks Very much Wesley,
> >
> > We have decided to restart one host among three osds hosts. before doing
> > that I need the advices of the team . these are flags I want to set
> before
> > restart.
> >
> >  'ceph osd set noout'
> >  'ceph osd set nobackfill'
> >  'ceph osd set norecover'
> >  'ceph osd set norebalance'
> > 'ceph osd set nodown'
> >  'ceph osd set pause'
> > 'ceph osd set nodeep-scrub'
> > 'ceph osd set noscrub'
> >
> >
> > Would like to ask if this can be enough to set and restart the host
> safely
> > . the cluster has 3 as replicas.
> >
> > will the cluster still be accessible while restart the hosts? after
> > restarting I will unset the flags.
> >
> > Kindly advise.
> >
> > Michel
> >
> >
> > On Tue, 30 Jan 2024, 17:44 Wesley Dillingham, 
> wrote:
> >
> > > actually it seems the issue I had in mind was fixed in 16.2.11 so you
> > > should be fine.
> > >
> > > Respectfully,
> > >
> > > *Wes Dillingham*
> > > w...@wesdillingham.com
> > > LinkedIn 
> > >
> > >
> > > On Tue, Jan 30, 2024 at 10:34 AM Wesley Dillingham <
> w...@wesdillingham.com>
> > > wrote:
> > >
> > >> You may want to consider upgrading to 16.2.14 before you do the pg
> split.
> > >>
> > >> Respectfully,
> > >>
> > >> *Wes Dillingham*
> > >> w...@wesdillingham.com
> > >> LinkedIn 
> > >>
> > >>
> > >> On Tue, Jan 30, 2024 at 10:18 AM Michel Niyoyita 
> > >> wrote:
> > >>
> > >>> I tried that on one of my pool (pool id 3) but the number of pgs not
> > >>> deep-scrubbed in time increased also from 55 to 100 but the number
> of PGs
> > >>> was increased. I set also autoscale to off mode. before continue to
> other
> > >>> pools would like to ask if so far there is no negative impact.
> > >>>
> > >>> ceph -s
> > >>>   cluster:
> > >>> id: cb0caedc-eb5b-42d1-a34f-96facfda8c27
> > >>> health: HEALTH_WARN
> > >>> 100 pgs not deep-scrubbed in time
> > >>>
> > >>>   services:
> > >>> mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 11M)
> > >>> mgr: ceph-mon2(active, since 11M), standbys: ceph-mon3, ceph-mon1
> > >>> osd: 48 osds: 48 up (since 11M), 48 in (since 12M)
> > >>> rgw: 6 daemons active (6 hosts, 1 zones)
> > >>>
> > >>>   data:
> > >>> pools:   10 pools, 609 pgs
> > >>> objects: 6.03M objects, 23 TiB
> > >>> usage:   151 TiB used, 282 TiB / 433 TiB avail
> > >>> pgs: 603 active+clean
> > >>>  4   active+clean+scrubbing+deep
> > >>>  2   active+clean+scrubbing
> > >>>
> > >>>   io:
> > >>> client:   96 MiB/s rd, 573 MiB/s wr, 576 op/s rd, 648 op/s wr
> > >>>
> > >>> root@ceph-osd3:/var/log# ceph df
> > >>> --- RAW STORAGE ---
> > >>> CLASS SIZEAVAIL USED  RAW USED  %RAW USED
> > >>> hdd433 TiB  282 TiB  151 TiB   151 TiB  34.93
> > >>> TOTAL  433 TiB  282 TiB  151 TiB   151 TiB  34.93
> > >>>
> > >>> --- POOLS ---
> > >>> POOL                   ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
> > >>> device_health_metrics   1    1  1.1 MiB        3  3.2 MiB      0     72 TiB
> > >>> .rgw.root               2   32  3.7 KiB        8   96 KiB      0     72 TiB
> > >>> default.rgw.log         3  256  3.6 KiB      204  408 KiB      0     72 TiB
> > >>> default.rgw.control     4   32      0 B        8      0 B      0     72 TiB
> > >>> default.rgw.meta        5   32    382 B        2   24 KiB      0     72 TiB
> > >>> volumes                 6  128   21 TiB    5.74M   62 TiB  22.30     72 TiB
> > >>> images                  7   32  878 GiB  112.50k  2.6 TiB   1.17     72 TiB
> > >>> backups                 8   32      0 B        0      0 B      0     72 TiB
> > >>> vms                     9   32  870 GiB  170.73k  2.5 TiB   1.13     72 TiB
> > >>> testbench              10   32      0 B        0      0 B      0     72 TiB
> > >>>
> > >>> On Tue, Jan 30, 2024 at 5:05 PM Wesley Dillingham <
> w...@wesdillingham.com>
> > >>> wrote:
> > >>>
> >  It will take a couple weeks to a couple months to complete is my
> best
> >  guess on 10TB spinners at ~40% full. The cluster should be usable
> >  throughout the process.
> > 


[ceph-users] Re: 6 pgs not deep-scrubbed in time

2024-02-01 Thread Janne Johansson
pause and nodown are not good options to set; they will certainly make
clients stop IO. Pause stops it immediately, and nodown will stop
IO as soon as the OSD processes stop running on this host.

When we do service on a host, we set "noout" and "nobackfill"; that is
enough for reboots, OS upgrades and simple disk exchanges.
The PGs on this one host will be degraded during the down period, but
IO continues.
Of course this assumes the cluster was healthy to begin with (not
counting "not scrubbed in time" warnings, they don't matter in this
case).
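
In practice the sequence looks roughly like this (just a sketch for a
single-host reboot, adjust to taste):

ceph osd set noout          # prevent the down OSDs from being marked out and rebalanced away
ceph osd set nobackfill     # don't start backfill while the host is down
# ... reboot / service the host, wait for its OSDs to rejoin ...
ceph osd unset nobackfill
ceph osd unset noout
ceph -s                     # confirm the degraded PGs recover and HEALTH_OK returns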



On Thu, 1 Feb 2024 at 12:21, Michel Niyoyita wrote:
>
> Thanks Very much Wesley,
>
> We have decided to restart one host among three osds hosts. before doing
> that I need the advices of the team . these are flags I want to set before
> restart.
>
>  'ceph osd set noout'
>  'ceph osd set nobackfill'
>  'ceph osd set norecover'
>  'ceph osd set norebalance'
> 'ceph osd set nodown'
>  'ceph osd set pause'
> 'ceph osd set nodeep-scrub'
> 'ceph osd set noscrub'
>
>
> Would like to ask if this can be enough to set and restart the host safely
> . the cluster has 3 as replicas.
>
> will the cluster still be accessible while restart the hosts? after
> restarting I will unset the flags.
>
> Kindly advise.
>
> Michel
>
>
> On Tue, 30 Jan 2024, 17:44 Wesley Dillingham,  wrote:
>
> > actually it seems the issue I had in mind was fixed in 16.2.11 so you
> > should be fine.
> >
> > Respectfully,
> >
> > *Wes Dillingham*
> > w...@wesdillingham.com
> > LinkedIn 
> >
> >
> > On Tue, Jan 30, 2024 at 10:34 AM Wesley Dillingham 
> > wrote:
> >
> >> You may want to consider upgrading to 16.2.14 before you do the pg split.
> >>
> >> Respectfully,
> >>
> >> *Wes Dillingham*
> >> w...@wesdillingham.com
> >> LinkedIn 
> >>
> >>
> >> On Tue, Jan 30, 2024 at 10:18 AM Michel Niyoyita 
> >> wrote:
> >>
> >>> I tried that on one of my pool (pool id 3) but the number of pgs not
> >>> deep-scrubbed in time increased also from 55 to 100 but the number of PGs
> >>> was increased. I set also autoscale to off mode. before continue to other
> >>> pools would like to ask if so far there is no negative impact.
> >>>
> >>> ceph -s
> >>>   cluster:
> >>> id: cb0caedc-eb5b-42d1-a34f-96facfda8c27
> >>> health: HEALTH_WARN
> >>> 100 pgs not deep-scrubbed in time
> >>>
> >>>   services:
> >>> mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 11M)
> >>> mgr: ceph-mon2(active, since 11M), standbys: ceph-mon3, ceph-mon1
> >>> osd: 48 osds: 48 up (since 11M), 48 in (since 12M)
> >>> rgw: 6 daemons active (6 hosts, 1 zones)
> >>>
> >>>   data:
> >>> pools:   10 pools, 609 pgs
> >>> objects: 6.03M objects, 23 TiB
> >>> usage:   151 TiB used, 282 TiB / 433 TiB avail
> >>> pgs: 603 active+clean
> >>>  4   active+clean+scrubbing+deep
> >>>  2   active+clean+scrubbing
> >>>
> >>>   io:
> >>> client:   96 MiB/s rd, 573 MiB/s wr, 576 op/s rd, 648 op/s wr
> >>>
> >>> root@ceph-osd3:/var/log# ceph df
> >>> --- RAW STORAGE ---
> >>> CLASS SIZEAVAIL USED  RAW USED  %RAW USED
> >>> hdd433 TiB  282 TiB  151 TiB   151 TiB  34.93
> >>> TOTAL  433 TiB  282 TiB  151 TiB   151 TiB  34.93
> >>>
> >>> --- POOLS ---
> >>> POOL                   ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
> >>> device_health_metrics   1    1  1.1 MiB        3  3.2 MiB      0     72 TiB
> >>> .rgw.root               2   32  3.7 KiB        8   96 KiB      0     72 TiB
> >>> default.rgw.log         3  256  3.6 KiB      204  408 KiB      0     72 TiB
> >>> default.rgw.control     4   32      0 B        8      0 B      0     72 TiB
> >>> default.rgw.meta        5   32    382 B        2   24 KiB      0     72 TiB
> >>> volumes                 6  128   21 TiB    5.74M   62 TiB  22.30     72 TiB
> >>> images                  7   32  878 GiB  112.50k  2.6 TiB   1.17     72 TiB
> >>> backups                 8   32      0 B        0      0 B      0     72 TiB
> >>> vms                     9   32  870 GiB  170.73k  2.5 TiB   1.13     72 TiB
> >>> testbench              10   32      0 B        0      0 B      0     72 TiB
> >>>
> >>> On Tue, Jan 30, 2024 at 5:05 PM Wesley Dillingham 
> >>> wrote:
> >>>
>  It will take a couple weeks to a couple months to complete is my best
>  guess on 10TB spinners at ~40% full. The cluster should be usable
>  throughout the process.
> 
>  Keep in mind, you should disable the pg autoscaler on any pool which
>  you are manually adjusting the pg_num for. Increasing the pg_num is 
>  called
>  "pg splitting" you can google around for this to see how it will work 
>  etc.
> 
>  There are a few knobs to increase or decrease the aggressiveness of the
>  pg split, primarily these are osd_max_backfills and
> 

[ceph-users] Re: Syslog server log naming

2024-02-01 Thread Torkil Svensgaard



On 01/02/2024 12:47, Eugen Block wrote:

Hi Torkil,


Hi Eugen

cephadm does regular checks, for example some 'ceph-volume' stuff to see 
if all assigned disks have actually been deployed as OSDs and so on. 
That's why there are "random" containers created and destroyed. I don't 
have a complete list of checks, though. You should be able to match 
those timestamps with the /var/log/ceph/cephadm.log, here's an example 
from one of my clusters:


# cephadm.log
2024-02-01 00:05:16,491 7f1265f64740 DEBUG 

cephadm ['--image', 
'registry.domain/ebl/ceph-upstream@sha256:057e08bf8d2d20742173a571bc28b65674b055bebe5f4c6cd488c1a6fd51f685', '--timeout', '895', 'ceph-volume', '--fsid', '201a2fbc-ce7b-44a3-9ed7-39427972083b', '--', 'inventory', '--format=json-pretty', '--filter-for-batch']


# syslog:

2024-02-01T00:05:17.335128+01:00 nautilus2 podman[2526232]: 2024-02-01 
00:05:17.334873422 +0100 CET m=+0.210837970 container create 
1731d65c011e4178380d70ed662f4c36dd6fea6429adc382ff1978007be770a8 
(image=registry.domain/ebl/ceph-upstream@sha256:057e08bf8d2d20742173a571bc28b65674b055bebe5f4c6cd488c1a6fd51f685, name=cranky_hermann, GIT_BRANCH=HEAD, GIT_COMMIT=0396eef90bef641b676c164ec7a3876f45010308,

...

A couple of seconds later it checks "cephadm list-networks" and then 
"cephadm ls", and so on. Basically, it's a consistency check.


Aye, that's what we concluded as well. It would be nice if those checks 
were spawned with --name though, so we could have 
"cephadm-list-networks" as the program name instead of "practical_hypatia", 
or just "cephadm" if the container is used for more than one thing.


I've created a ticket for it on the tracker.

Mvh.

Torkil


Regards,
Eugen

Zitat von Torkil Svensgaard :

So it seems some ceph housekeeping spawn containers without giving 
them a name and that causes this in the journal:


"
Feb 01 04:10:07 dopey podman[766731]: 2024-02-01 04:10:07.967786606 
+0100 CET m=+0.043987882 container create 
95967a040795bd61588dcfdc6ba5daf92553cd2cb3ecd7318cd8b16c1b15782d 
(image=quay.io/ceph/ceph@sha256:1793ff3af6ae74527c86e1a0b22401e9c42dc08d0ebb8379653be07db17d0007, name=practical_hypatia, org.label-schema.vendor=CentOS, GIT_BRANCH=HEAD, maintainer=Guillaume Abrioux , io.buildah.version=1.29.1, org.label-schema.license=GPLv2, GIT_CLEAN=True, GIT_COMMIT=0396eef90bef641b676c164ec7a3876f45010308, ceph=True, RELEASE=HEAD, GIT_REPO=https://github.com/ceph/ceph-container.git, CEPH_POINT_RELEASE=-18.2.0, org.label-schema.build-date=20231212, org.label-schema.name=CentOS Stream 8 Base Image, org.label-schema.schema-version=1.0)

...
Feb 01 04:10:08 dopey practical_hypatia[766758]: 167 167
...
Feb 01 04:10:08 dopey systemd[1]: 
libpod-conmon-95967a040795bd61588dcfdc6ba5daf92553cd2cb3ecd7318cd8b16c1b15782d.scope: Deactivated successfully

"

Mvh.

Torkil

On 01/02/2024 08:24, Torkil Svensgaard wrote:
We have ceph (currently 18.2.0) log to an rsyslog server with the 
following file name format:


template (name="DynaFile" type="string" 
string="/tank/syslog/%fromhost-ip%/%hostname%/%programname%.log")



Around May 25th this year something changed so instead of getting the 
usual program log names we are also getting a lot of diffent logs 
with weird names. Here's an ls excerpt:


"
...
-rw---. 1 root root  517K Feb  1 05:42 hardcore_raman.log
-rw---. 1 root root  198K Feb  1 05:42 sleepy_moser.log
-rw---. 1 root root  203K Feb  1 05:42 friendly_gagarin.log
-rw---. 1 root root  1.1K Feb  1 05:42 goofy_hypatia.log
-rw---. 1 root root  164K Feb  1 05:42 kind_chebyshev.log
-rw---. 1 root root   11K Feb  1 05:42 magical_archimedes.log
-rw---. 1 root root  373K Feb  1 05:42 busy_bardeen.log
-rw---. 1 root root  178K Feb  1 05:42 trusting_euler.log
-rw---. 1 root root  526K Feb  1 05:42 inspiring_golick.log
-rw---. 1 root root  369K Feb  1 06:12 condescending_ganguly.log
-rw---. 1 root root  191K Feb  1 06:12 mystifying_torvalds.log
-rw---. 1 root root  475K Feb  1 06:12 charming_nash.log
-rw---. 1 root root  168K Feb  1 06:12 zealous_sinoussi.log
-rw---. 1 root root  325K Feb  1 06:12 amazing_booth.log
-rw---. 1 root root  516K Feb  1 06:12 great_ardinghelli.log
-rw---. 1 root root  313K Feb  1 06:12 magical_bell.log
-rw---. 1 root root   22K Feb  1 06:12 nifty_swartz.log
-rw---. 1 root root   426 Feb  1 06:12 upbeat_beaver.log
-rw---. 1 root root  166K Feb  1 06:13 funny_lederberg.log
-rw---. 1 root root  164K Feb  1 06:13 frosty_murdock.log
-rw---. 1 root root  374K Feb  1 06:13 elastic_banach.log
-rw---. 1 root root  308K Feb  1 06:13 inspiring_cohen.log
-rw---. 1 root root  176K Feb  1 06:13 angry_wu.log
-rw---. 1 root root   662 Feb  1 06:42 admiring_kalam.log
-rw---. 1 root root  3.1K Feb  1 06:43 thirsty_colden.log
-rw---. 1 root root  4.5M Feb  1 07:01 run-parts.log
-rw---. 1 root root   16M Feb  1 07:01 

[ceph-users] Re: Syslog server log naming

2024-02-01 Thread Eugen Block

Hi Torkil,

cephadm does regular checks, for example some 'ceph-volume' stuff to  
see if all assigned disks have actually been deployed as OSDs and so  
on. That's why there are "random" containers created and destroyed. I  
don't have a complete list of checks, though. You should be able to  
match those timestamps with the /var/log/ceph/cephadm.log, here's an  
example from one of my clusters:


# cephadm.log
2024-02-01 00:05:16,491 7f1265f64740 DEBUG  

cephadm ['--image',  
'registry.domain/ebl/ceph-upstream@sha256:057e08bf8d2d20742173a571bc28b65674b055bebe5f4c6cd488c1a6fd51f685', '--timeout', '895', 'ceph-volume', '--fsid', '201a2fbc-ce7b-44a3-9ed7-39427972083b', '--', 'inventory', '--format=json-pretty',  
'--filter-for-batch']


# syslog:

2024-02-01T00:05:17.335128+01:00 nautilus2 podman[2526232]: 2024-02-01  
00:05:17.334873422 +0100 CET m=+0.210837970 container create  
1731d65c011e4178380d70ed662f4c36dd6fea6429adc382ff1978007be770a8  
(image=registry.domain/ebl/ceph-upstream@sha256:057e08bf8d2d20742173a571bc28b65674b055bebe5f4c6cd488c1a6fd51f685, name=cranky_hermann, GIT_BRANCH=HEAD,  
GIT_COMMIT=0396eef90bef641b676c164ec7a3876f45010308,

...

A couple of seconds later it checks "cephadm list-networks" and then  
"cephadm ls", and so on. Basically, it's a consistency check.


Regards,
Eugen

Zitat von Torkil Svensgaard :

So it seems some ceph housekeeping spawn containers without giving  
them a name and that causes this in the journal:


"
Feb 01 04:10:07 dopey podman[766731]: 2024-02-01 04:10:07.967786606  
+0100 CET m=+0.043987882 container create  
95967a040795bd61588dcfdc6ba5daf92553cd2cb3ecd7318cd8b16c1b15782d  
(image=quay.io/ceph/ceph@sha256:1793ff3af6ae74527c86e1a0b22401e9c42dc08d0ebb8379653be07db17d0007, name=practical_hypatia, org.label-schema.vendor=CentOS, GIT_BRANCH=HEAD, maintainer=Guillaume Abrioux , io.buildah.version=1.29.1, org.label-schema.license=GPLv2, GIT_CLEAN=True, GIT_COMMIT=0396eef90bef641b676c164ec7a3876f45010308, ceph=True, RELEASE=HEAD, GIT_REPO=https://github.com/ceph/ceph-container.git, CEPH_POINT_RELEASE=-18.2.0, org.label-schema.build-date=20231212, org.label-schema.name=CentOS Stream 8 Base Image,  
org.label-schema.schema-version=1.0)

...
Feb 01 04:10:08 dopey practical_hypatia[766758]: 167 167
...
Feb 01 04:10:08 dopey systemd[1]:  
libpod-conmon-95967a040795bd61588dcfdc6ba5daf92553cd2cb3ecd7318cd8b16c1b15782d.scope: Deactivated  
successfully

"

Mvh.

Torkil

On 01/02/2024 08:24, Torkil Svensgaard wrote:
We have ceph (currently 18.2.0) log to an rsyslog server with the  
following file name format:


template (name="DynaFile" type="string"  
string="/tank/syslog/%fromhost-ip%/%hostname%/%programname%.log")



Around May 25th this year something changed so instead of getting  
the usual program log names we are also getting a lot of diffent  
logs with weird names. Here's an ls excerpt:


"
...
-rw---. 1 root root  517K Feb  1 05:42 hardcore_raman.log
-rw---. 1 root root  198K Feb  1 05:42 sleepy_moser.log
-rw---. 1 root root  203K Feb  1 05:42 friendly_gagarin.log
-rw---. 1 root root  1.1K Feb  1 05:42 goofy_hypatia.log
-rw---. 1 root root  164K Feb  1 05:42 kind_chebyshev.log
-rw---. 1 root root   11K Feb  1 05:42 magical_archimedes.log
-rw---. 1 root root  373K Feb  1 05:42 busy_bardeen.log
-rw---. 1 root root  178K Feb  1 05:42 trusting_euler.log
-rw---. 1 root root  526K Feb  1 05:42 inspiring_golick.log
-rw---. 1 root root  369K Feb  1 06:12 condescending_ganguly.log
-rw---. 1 root root  191K Feb  1 06:12 mystifying_torvalds.log
-rw---. 1 root root  475K Feb  1 06:12 charming_nash.log
-rw---. 1 root root  168K Feb  1 06:12 zealous_sinoussi.log
-rw---. 1 root root  325K Feb  1 06:12 amazing_booth.log
-rw---. 1 root root  516K Feb  1 06:12 great_ardinghelli.log
-rw---. 1 root root  313K Feb  1 06:12 magical_bell.log
-rw---. 1 root root   22K Feb  1 06:12 nifty_swartz.log
-rw---. 1 root root   426 Feb  1 06:12 upbeat_beaver.log
-rw---. 1 root root  166K Feb  1 06:13 funny_lederberg.log
-rw---. 1 root root  164K Feb  1 06:13 frosty_murdock.log
-rw---. 1 root root  374K Feb  1 06:13 elastic_banach.log
-rw---. 1 root root  308K Feb  1 06:13 inspiring_cohen.log
-rw---. 1 root root  176K Feb  1 06:13 angry_wu.log
-rw---. 1 root root   662 Feb  1 06:42 admiring_kalam.log
-rw---. 1 root root  3.1K Feb  1 06:43 thirsty_colden.log
-rw---. 1 root root  4.5M Feb  1 07:01 run-parts.log
-rw---. 1 root root   16M Feb  1 07:01 CROND.log
-rw---. 1 root root  109M Feb  1 07:06 python3.log
-rw---. 1 root root  3.4M Feb  1 07:29 systemd-journald.log
-rw---. 1 root root  596M Feb  1 07:34 sudo.log
-rw---. 1 root root   549 Feb  1 07:44 interesting_rosalind.log
-rw---. 1 root root  342K Feb  1 07:45 beautiful_hamilton.log
-rw---. 1 root root  348K Feb  1 07:45 

[ceph-users] Re: Syslog server log naming

2024-02-01 Thread Torkil Svensgaard
So it seems some ceph housekeeping spawns containers without giving them
a name, and that causes this in the journal:


"
Feb 01 04:10:07 dopey podman[766731]: 2024-02-01 04:10:07.967786606 
+0100 CET m=+0.043987882 container create 
95967a040795bd61588dcfdc6ba5daf92553cd2cb3ecd7318cd8b16c1b15782d 
(image=quay.io/ceph/ceph@sha256:1793ff3af6ae74527c86e1a0b22401e9c42dc08d0ebb8379653be07db17d0007, 
name=practical_hypatia, org.label-schema.vendor=CentOS, GIT_BRANCH=HEAD, 
maintainer=Guillaume Abrioux , 
io.buildah.version=1.29.1, org.label-schema.license=GPLv2, 
GIT_CLEAN=True, GIT_COMMIT=0396eef90bef641b676c164ec7a3876f45010308, 
ceph=True, RELEASE=HEAD, 
GIT_REPO=https://github.com/ceph/ceph-container.git, 
CEPH_POINT_RELEASE=-18.2.0, org.label-schema.build-date=20231212, 
org.label-schema.name=CentOS Stream 8 Base Image, 
org.label-schema.schema-version=1.0)

...
Feb 01 04:10:08 dopey practical_hypatia[766758]: 167 167
...
Feb 01 04:10:08 dopey systemd[1]: 
libpod-conmon-95967a040795bd61588dcfdc6ba5daf92553cd2cb3ecd7318cd8b16c1b15782d.scope: 
Deactivated successfully

"

Mvh.

Torkil

On 01/02/2024 08:24, Torkil Svensgaard wrote:
We have ceph (currently 18.2.0) log to an rsyslog server with the 
following file name format:


template (name="DynaFile" type="string" 
string="/tank/syslog/%fromhost-ip%/%hostname%/%programname%.log")



Around May 25th this year something changed so instead of getting the 
usual program log names we are also getting a lot of diffent logs with 
weird names. Here's an ls excerpt:


"
...
-rw---. 1 root root  517K Feb  1 05:42 hardcore_raman.log
-rw---. 1 root root  198K Feb  1 05:42 sleepy_moser.log
-rw---. 1 root root  203K Feb  1 05:42 friendly_gagarin.log
-rw---. 1 root root  1.1K Feb  1 05:42 goofy_hypatia.log
-rw---. 1 root root  164K Feb  1 05:42 kind_chebyshev.log
-rw---. 1 root root   11K Feb  1 05:42 magical_archimedes.log
-rw---. 1 root root  373K Feb  1 05:42 busy_bardeen.log
-rw---. 1 root root  178K Feb  1 05:42 trusting_euler.log
-rw---. 1 root root  526K Feb  1 05:42 inspiring_golick.log
-rw---. 1 root root  369K Feb  1 06:12 condescending_ganguly.log
-rw---. 1 root root  191K Feb  1 06:12 mystifying_torvalds.log
-rw---. 1 root root  475K Feb  1 06:12 charming_nash.log
-rw---. 1 root root  168K Feb  1 06:12 zealous_sinoussi.log
-rw---. 1 root root  325K Feb  1 06:12 amazing_booth.log
-rw---. 1 root root  516K Feb  1 06:12 great_ardinghelli.log
-rw---. 1 root root  313K Feb  1 06:12 magical_bell.log
-rw---. 1 root root   22K Feb  1 06:12 nifty_swartz.log
-rw---. 1 root root   426 Feb  1 06:12 upbeat_beaver.log
-rw---. 1 root root  166K Feb  1 06:13 funny_lederberg.log
-rw---. 1 root root  164K Feb  1 06:13 frosty_murdock.log
-rw---. 1 root root  374K Feb  1 06:13 elastic_banach.log
-rw---. 1 root root  308K Feb  1 06:13 inspiring_cohen.log
-rw---. 1 root root  176K Feb  1 06:13 angry_wu.log
-rw---. 1 root root   662 Feb  1 06:42 admiring_kalam.log
-rw---. 1 root root  3.1K Feb  1 06:43 thirsty_colden.log
-rw---. 1 root root  4.5M Feb  1 07:01 run-parts.log
-rw---. 1 root root   16M Feb  1 07:01 CROND.log
-rw---. 1 root root  109M Feb  1 07:06 python3.log
-rw---. 1 root root  3.4M Feb  1 07:29 systemd-journald.log
-rw---. 1 root root  596M Feb  1 07:34 sudo.log
-rw---. 1 root root   549 Feb  1 07:44 interesting_rosalind.log
-rw---. 1 root root  342K Feb  1 07:45 beautiful_hamilton.log
-rw---. 1 root root  348K Feb  1 07:45 cool_ride.log
-rw---. 1 root root   15G Feb  1 07:45 conmon.log
-rw---. 1 root root  395K Feb  1 07:45 compassionate_satoshi.log
-rw---. 1 root root   11K Feb  1 07:45 hardcore_noether.log
-rw---. 1 root root  223K Feb  1 07:45 wizardly_johnson.log
-rw---. 1 root root  270M Feb  1 07:49 sshd.log
-rw---. 1 root root  111M Feb  1 07:49 systemd-logind.log
-rw---. 1 root root  1.6G Feb  1 07:50 systemd.log
-rw---. 1 root root  119M Feb  1 07:54 rsyslogd.log
-rw---. 1 root root   94G Feb  1 07:55 
ceph-8ee2d228-ed21-4580-8bbf-064.log

-rw---. 1 root root  1.1G Feb  1 07:56 podman.log
-rw---. 1 root root  1.8G Feb  1 07:58 ceph-mgr.log
-rw---. 1 root root  213G Feb  1 07:58 ceph-osd.log
-rw---. 1 root root   48G Feb  1 07:58 ceph-mon.log

"

Those are container names or something like that? The file content seems 
to be assorted bits from the ceph disk tool:


"
# cat goofy_hypatia.log
Jun  7 04:12:43 dopey goofy_hypatia[3224681]: 167 167
Jun 24 09:00:08 dopey goofy_hypatia[2319188]: --> passed data devices: 
22 physical, 0 LVM

Jun 24 09:00:08 dopey goofy_hypatia[2319188]: --> relative data size: 1.0
Jun 24 09:00:08 dopey goofy_hypatia[2319188]: --> All data devices are 
unavailable

Jun 24 09:00:08 dopey goofy_hypatia[2319188]: []
Sep 13 14:22:10 dopey goofy_hypatia[2027428]: --> passed data devices: 
22 physical, 0 LVM

Sep 13 14:22:10 dopey goofy_hypatia[2027428]: --> 

[ceph-users] Re: 6 pgs not deep-scrubbed in time

2024-02-01 Thread Michel Niyoyita
Thanks very much Wesley,

We have decided to restart one host among the three OSD hosts. Before doing
that I need the advice of the team. These are the flags I want to set before
the restart:

'ceph osd set noout'
'ceph osd set nobackfill'
'ceph osd set norecover'
'ceph osd set norebalance'
'ceph osd set nodown'
'ceph osd set pause'
'ceph osd set nodeep-scrub'
'ceph osd set noscrub'


I would like to ask whether setting these is enough to restart the host safely;
the cluster has 3 replicas.

Will the cluster still be accessible while restarting the host? After
restarting I will unset the flags.

Kindly advise.

Michel


On Tue, 30 Jan 2024, 17:44 Wesley Dillingham,  wrote:

> actually it seems the issue I had in mind was fixed in 16.2.11 so you
> should be fine.
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn 
>
>
> On Tue, Jan 30, 2024 at 10:34 AM Wesley Dillingham 
> wrote:
>
>> You may want to consider upgrading to 16.2.14 before you do the pg split.
>>
>> Respectfully,
>>
>> *Wes Dillingham*
>> w...@wesdillingham.com
>> LinkedIn 
>>
>>
>> On Tue, Jan 30, 2024 at 10:18 AM Michel Niyoyita 
>> wrote:
>>
>>> I tried that on one of my pool (pool id 3) but the number of pgs not
>>> deep-scrubbed in time increased also from 55 to 100 but the number of PGs
>>> was increased. I set also autoscale to off mode. before continue to other
>>> pools would like to ask if so far there is no negative impact.
>>>
>>> ceph -s
>>>   cluster:
>>> id: cb0caedc-eb5b-42d1-a34f-96facfda8c27
>>> health: HEALTH_WARN
>>> 100 pgs not deep-scrubbed in time
>>>
>>>   services:
>>> mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 11M)
>>> mgr: ceph-mon2(active, since 11M), standbys: ceph-mon3, ceph-mon1
>>> osd: 48 osds: 48 up (since 11M), 48 in (since 12M)
>>> rgw: 6 daemons active (6 hosts, 1 zones)
>>>
>>>   data:
>>> pools:   10 pools, 609 pgs
>>> objects: 6.03M objects, 23 TiB
>>> usage:   151 TiB used, 282 TiB / 433 TiB avail
>>> pgs: 603 active+clean
>>>  4   active+clean+scrubbing+deep
>>>  2   active+clean+scrubbing
>>>
>>>   io:
>>> client:   96 MiB/s rd, 573 MiB/s wr, 576 op/s rd, 648 op/s wr
>>>
>>> root@ceph-osd3:/var/log# ceph df
>>> --- RAW STORAGE ---
>>> CLASS SIZEAVAIL USED  RAW USED  %RAW USED
>>> hdd433 TiB  282 TiB  151 TiB   151 TiB  34.93
>>> TOTAL  433 TiB  282 TiB  151 TiB   151 TiB  34.93
>>>
>>> --- POOLS ---
>>> POOL                   ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
>>> device_health_metrics   1    1  1.1 MiB        3  3.2 MiB      0     72 TiB
>>> .rgw.root               2   32  3.7 KiB        8   96 KiB      0     72 TiB
>>> default.rgw.log         3  256  3.6 KiB      204  408 KiB      0     72 TiB
>>> default.rgw.control     4   32      0 B        8      0 B      0     72 TiB
>>> default.rgw.meta        5   32    382 B        2   24 KiB      0     72 TiB
>>> volumes                 6  128   21 TiB    5.74M   62 TiB  22.30     72 TiB
>>> images                  7   32  878 GiB  112.50k  2.6 TiB   1.17     72 TiB
>>> backups                 8   32      0 B        0      0 B      0     72 TiB
>>> vms                     9   32  870 GiB  170.73k  2.5 TiB   1.13     72 TiB
>>> testbench              10   32      0 B        0      0 B      0     72 TiB
>>>
>>> On Tue, Jan 30, 2024 at 5:05 PM Wesley Dillingham 
>>> wrote:
>>>
 It will take a couple weeks to a couple months to complete is my best
 guess on 10TB spinners at ~40% full. The cluster should be usable
 throughout the process.

 Keep in mind, you should disable the pg autoscaler on any pool which
 you are manually adjusting the pg_num for. Increasing the pg_num is called
 "pg splitting" you can google around for this to see how it will work etc.

 There are a few knobs to increase or decrease the aggressiveness of the
 pg split, primarily these are osd_max_backfills and
 target_max_misplaced_ratio.

 You can monitor the progress of the split by looking at "ceph osd pool
 ls detail" for the pool you are splitting, for this pool pgp_num will
 slowly increase up until it reaches the pg_num / pg_num_target.

 IMO this blog post best covers the issue which you are looking to
 undertake:
 https://ceph.io/en/news/blog/2019/new-in-nautilus-pg-merging-and-autotuning/
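
For illustration, the commands involved look roughly like this (a
sketch only; the pool name "volumes" and the values are placeholders,
not a recommendation for this particular cluster):

ceph osd pool set volumes pg_autoscale_mode off     # disable the autoscaler on the pool first
ceph osd pool set volumes pg_num 256                # request the split, pgp_num follows gradually
ceph osd pool ls detail | grep volumes              # watch pgp_num climb towards pg_num
ceph config set osd osd_max_backfills 2             # throttle or speed up the backfill
ceph config set mgr target_max_misplaced_ratio .05  # how much misplacement is allowed at a time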

 Respectfully,

 *Wes Dillingham*
 w...@wesdillingham.com
 LinkedIn 


 On Tue, Jan 30, 2024 at 9:38 AM Michel Niyoyita 
 wrote:

> Thanks for your advices Wes, below is what ceph osd df tree shows ,
> the increase of pg_num of the production cluster will not affect the
> performance or crush ? how long it can takes to finish?

[ceph-users] Re: Understanding subvolumes

2024-02-01 Thread Kotresh Hiremath Ravishankar
Comments inline.

On Thu, Feb 1, 2024 at 4:51 AM Matthew Melendy  wrote:

> In our department we're getting starting with Ceph 'reef', using Ceph FUSE
> client for our Ubuntu workstations.
>
> So far so good, except I can't quite figure out one aspect of subvolumes.
>
> When I do the commands:
>
> [root@ceph1 ~]# ceph fs subvolumegroup create cephfs csvg
> [root@ceph1 ~]# ceph fs subvolume create cephfs staff csvg --size
> 2
>
> I get these results:
>
> - A subvolume group csvg is created on volume cephfs
> - A subvolume called staff is created in csvg subvolume (like
> /volumes/csvg/staff ) however there is no size limit set at this folder in
> the Ceph dashboard view
> - A folder with an random UUID name is created inside the subvolume staff
> ( like /volumes/csvg/staff/6a1b3de5-f6ab-4878-aea3-3c3c6f96ffcf ); this
> folder does have a size set on it of 2TB
>
> My questions are:
> - what's the purpose of this UUID, and is it a requirement?
>

The UUID directory is essentially the data directory where the user stores
data. The subvolume directory is used internally to store metadata related
to the subvolume, to support all the subvolume operations.

For more detailed information, please go through the following comment in
the code.
https://github.com/ceph/ceph/blob/main/src/pybind/mgr/volumes/fs/operations/versions/subvolume_v2.py#L19C1-L38C8


- which directory should be mounted for my clients, staff/ or staff/{UUID},
> for the size limit to take effect?
>

The quota (the size passed during subvolume creation, or set after creation)
is enforced on the uuid directory, not on the subvolume directory. So it
should be staff/{uuid}. The idea is to use the 'subvolume getpath' command
and use the returned path to mount; that should take care of everything.
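
For example, something along these lines (a sketch; the client name,
mount point and keyring location are assumptions, and the path is the
one 'getpath' returns below):

# ceph-fuse, keyring for client.staff assumed under /etc/ceph/
ceph-fuse -n client.staff --client_fs cephfs \
    -r /volumes/csvg/staff/6a1b3de5-f6ab-4878-aea3-3c3c6f96ffcf /mnt/staff

# or the equivalent /etc/fstab line using the fuse.ceph mount helper
none  /mnt/staff  fuse.ceph  ceph.id=staff,ceph.client_fs=cephfs,ceph.client_mountpoint=/volumes/csvg/staff/6a1b3de5-f6ab-4878-aea3-3c3c6f96ffcf,_netdev,defaults  0 0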

- is there any way to hide or disable this UUID for client mounts? (eg in
> /etc/fstab)
>

I didn't quite get this ?


> [root@ceph1 ~]# ceph fs subvolume getpath cephfs staff csvg
> /volumes/csvg/staff/6a1b3de5-f6ab-4878-aea3-3c3c6f96ffcf
>
> [root@ceph1 ~]# ceph fs subvolume ls cephfs csvg
> [
>  {
>  "name": "staff"
>  }
> ]
>
>
>
> --
> Sincerely,
>
>
> Matthew Melendy
>
> IT Services Specialist
> CS System Services Group
> FEC 3550, University of New Mexico


[ceph-users] Re: Understanding subvolumes

2024-02-01 Thread Neeraj Pratap Singh
Hi,
In reply to your question about the UUID: we do need it for CLONING a
subvolume, so yes, it is a requirement.
And the subvolume mount path is the entire directory path, including the UUID.

On Thu, Feb 1, 2024 at 4:51 AM Matthew Melendy  wrote:

> In our department we're getting starting with Ceph 'reef', using Ceph FUSE
> client for our Ubuntu workstations.
>
> So far so good, except I can't quite figure out one aspect of subvolumes.
>
> When I do the commands:
>
> [root@ceph1 ~]# ceph fs subvolumegroup create cephfs csvg
> [root@ceph1 ~]# ceph fs subvolume create cephfs staff csvg --size
> 2
>
> I get these results:
>
> - A subvolume group csvg is created on volume cephfs
> - A subvolume called staff is created in csvg subvolume (like
> /volumes/csvg/staff ) however there is no size limit set at this folder in
> the Ceph dashboard view
> - A folder with an random UUID name is created inside the subvolume staff
> ( like /volumes/csvg/staff/6a1b3de5-f6ab-4878-aea3-3c3c6f96ffcf ); this
> folder does have a size set on it of 2TB
>
> My questions are:
> - what's the purpose of this UUID, and is it a requirement?
> - which directory should be mounted for my clients, staff/ or
> staff/{UUID}, for the size limit to take effect?
> - is there any way to hide or disable this UUID for client mounts? (eg in
> /etc/fstab)
>
> [root@ceph1 ~]# ceph fs subvolume getpath cephfs staff csvg
> /volumes/csvg/staff/6a1b3de5-f6ab-4878-aea3-3c3c6f96ffcf
>
> [root@ceph1 ~]# ceph fs subvolume ls cephfs csvg
> [
>  {
>  "name": "staff"
>  }
> ]
>
>
>
> --
> Sincerely,
>
>
> Matthew Melendy
>
> IT Services Specialist
> CS System Services Group
> FEC 3550, University of New Mexico


[ceph-users] Snapshot automation/scheduling for rbd?

2024-02-01 Thread Jeremy Hansen
Can rbd image snapshotting be scheduled like CephFS snapshots? Maybe I missed 
it in the documentation but it looked like scheduling snapshots wasn’t a 
feature for block images. I’m still running Pacific. We’re trying to devise a 
sufficient backup plan for Cloudstack and other things residing in Ceph.

Thanks.
-jeremy





[ceph-users] Re: how can install latest dev release?

2024-02-01 Thread Christian Rohmann

On 31.01.24 11:33, garcetto wrote:
thank you, but seems related to quincy, there is nothing on latest 
vesions in the doc...maybe the doc is not updated?



I don't understand what you are missing. I just used a documentation 
link pointing to the Quincy version of this page, yes.
The "latest" documentation is at 
https://docs.ceph.com/en/latest/install/get-packages/#ceph-development-packages.
But it seems nothing has changed. There are dev packages available at 
the URLs mentioned there.



Regards


Christian


[ceph-users] Re: Throughput metrics missing iwhen updating Ceph Quincy to Reef

2024-02-01 Thread Christian Rohmann
This change is documented at 
https://docs.ceph.com/en/latest/mgr/prometheus/#ceph-daemon-performance-counters-metrics,
also mentioning the deployment of ceph-exporter which is now used to 
collect per-host metrics from the local daemons.


While this deployment is done by cephadm where cephadm is used, I am wondering
whether ceph-exporter [2] is also built and packaged via the ceph packages [3]
for installations that use them?




Regards


Christian





[1] 
https://docs.ceph.com/en/latest/mgr/prometheus/#ceph-daemon-performance-counters-metrics

[2] https://github.com/ceph/ceph/tree/main/src/exporter
[3] https://docs.ceph.com/en/latest/install/get-packages/






[ceph-users] Merging two ceph clusters

2024-02-01 Thread Nico Schottelius


Good morning,

in the spirit of the previous thread, I am wondering if anyone ever
succeeded in merging two separate ceph clusters into one?

Background from my side: we are running multiple ceph clusters in
k8s/rook, but we still have some Nautilus/Devuan based clusters that are
hard-to-impossible to upgrade.

One of the ideas was to shift over the existing workload of the older
clusters into the newer clusters. In the best case the whole process
would be possible while keeping everything running in the new and old
cluster.

One clear challenge is that the two clusters obviously have a different
FSID. The pool (names) in the two clusters differ, but the pool IDs are
overlapping.

If anyone has ever done this or has ideas on how to accomplish this, I
would be very interested in hearing your opinion.

Greetings from the Swiss mountains,

Nico



--
Sustainable and modern Infrastructures by ungleich.ch