[ceph-users] Re: RGW sync gets stuck every day

2024-09-11 Thread Matthew Darwin
I'm on quincy. I had lots of problems with RGW getting stuck. Once I dedicated a single RGW on each side to replication, my problems went away. Having a cluster of RGWs behind a load balancer seemed to be confusing things. I still have multiple RGWs for user-facing load, but a single RGW
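A minimal sketch of one way to set this split up with cephadm (the service names "rgw.public" and "rgw.sync" and the placement are examples, not from the original post): deploy a separate RGW service for replication and turn off the sync thread on the user-facing gateways, so only the dedicated daemon participates in multisite sync.

    # user-facing gateways: serve S3 clients only; the exact config section depends on how your daemons are named
    ceph config set client.rgw.public rgw_run_sync_thread false
    # dedicated replication gateway keeps the default (rgw_run_sync_thread = true)
    ceph orch apply rgw sync --placement="1 rgw-sync-host" --port=8081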

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-06 Thread Matthew Vernon
On 06/09/2024 10:27, Matthew Vernon wrote: On 06/09/2024 08:08, Redouane Kachach wrote: That makes sense. The ipv6 BUG can lead to the issue you described. In the current implementation whenever a mgr failover takes place, prometheus configuration (when using the monitoring stack deployed by

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-06 Thread Matthew Vernon
ly redeploy from time to time) that external monitoring could be pointed at? Regards, Matthew

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-05 Thread Matthew Vernon
On 05/09/2024 15:03, Matthew Vernon wrote: Hi, On 05/09/2024 12:49, Redouane Kachach wrote: The port 8765 is the "service discovery" (an internal server that runs in the mgr... you can change the port by changing the variable service_discovery_port of cephadm). Normally it is ope
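For reference, a sketch of how that variable might be changed, assuming it is exposed as a normal cephadm mgr module option (verify the exact option name with `ceph config ls` first):

    ceph config set mgr mgr/cephadm/service_discovery_port 8765
    ceph mgr fail    # fail over / restart the active mgr so the new port is picked up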

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-05 Thread Matthew Vernon
). Right; it wasn't running because I have an IPv6 deployment (that bug's fixed in 18.2.4 - https://tracker.ceph.com/issues/63448). Regards, Matthew

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-04 Thread Matthew Vernon
g. an external Prometheus scraper at the service discovery endpoint of any mgr and it would then tell Prometheus where to scrape metrics from (i.e. the active mgr)? Thanks, Matthew
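As a sanity check, an external Prometheus can be pointed at the mgr's service-discovery endpoint via http_sd_configs; the URL shape below is an assumption, so copy the exact path from a cephadm-generated prometheus.yml rather than trusting it:

    # query the service-discovery endpoint on any mgr; it returns scrape targets for the active mgr
    curl -sk "https://mgr-host.example.org:8765/sd/prometheus/sd-config?service=mgr-prometheus"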

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-03 Thread Matthew Vernon
rometheus config file (under "./etc) and see if there are irregularities there. It's not, it's the mgr container (I've enabled the prometheus mgr module, which makes an endpoint available from whence metrics can be scraped, rather than the prometheus container which r

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-03 Thread Matthew Vernon
I can tell from the docs it should just get started when you enable the prometheus endpoint (which does seem to be working)... Regards, Matthew

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-03 Thread Matthew Vernon
Hi, On 03/09/2024 11:46, Eugen Block wrote: Do you see the port definition in the unit.meta file? Oddly: "ports": [ 9283, 8765, 8765, 8765, 8765 ], which doesn't look right... Regards, Matthew

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-03 Thread Matthew Vernon
yment where it's only listening on v4. Thanks, Matthew

[ceph-users] Discovery (port 8765) service not starting

2024-09-02 Thread Matthew Vernon
get the service discovery endpoint working? Thanks, Matthew [0] https://docs.ceph.com/en/reef/cephadm/services/monitoring/#deploying-monitoring-without-cephadm

[ceph-users] Re: cephadm basic questions: image config, OS reimages

2024-08-27 Thread Matthew Vernon
warning message in that case... Thanks, Matthew

[ceph-users] multipart file in broken state

2024-07-03 Thread Matthew Darwin
When trying to clean up multi-part files, I get the following error: $ rclone backend cleanup s3:bucket-name 2024/07/04 02:42:19 ERROR : S3 bucket bucket-name: failed to remove pending multipart upload for bucket "bucket-name" key "0a424a15dee6fecb241130e9e4e49d99ed120f05/outputs/012149-0
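When rclone's cleanup fails like this, the pending uploads can also be inspected and aborted one at a time straight against RGW's S3 API; a sketch with the AWS CLI (endpoint URL and bucket name are placeholders):

    # list pending multipart uploads for the bucket
    aws --endpoint-url https://rgw.example.org s3api list-multipart-uploads --bucket bucket-name
    # abort a specific upload using the Key and UploadId returned above
    aws --endpoint-url https://rgw.example.org s3api abort-multipart-upload \
        --bucket bucket-name --key "path/to/object" --upload-id "UPLOAD_ID"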

[ceph-users] Re: ceph rgw zone create fails EINVAL

2024-06-25 Thread Matthew Vernon
On 24/06/2024 21:18, Matthew Vernon wrote: 2024-06-24T17:33:26.880065+00:00 moss-be2001 ceph-mgr[129346]: [rgw ERROR root] Non-zero return from ['radosgw-admin', '-k', '/var/lib/ceph/mgr/ceph-moss-be2001.qvwcaq/keyring', '-n', 'mgr.moss-be20

[ceph-users] Re: ceph rgw zone create fails EINVAL

2024-06-24 Thread Matthew Vernon
On 24/06/2024 20:49, Matthew Vernon wrote: On 19/06/2024 19:45, Adam King wrote: I think this is at least partially a code bug in the rgw module. Where ...the code path seems to have a bunch of places it might raise an exception; are those likely to result in some entry in a log-file? I'

[ceph-users] Re: ceph rgw zone create fails EINVAL

2024-06-24 Thread Matthew Vernon
rking out what the problem is quite challenging... Thanks, Matthew

[ceph-users] ceph rgw zone create fails EINVAL

2024-06-19 Thread Matthew Vernon
up the spec file, but it looks like the one in the docs[0]. Can anyone point me in the right direction, please? [if the underlying command emits anything useful, I can't find it in the logs] Thanks, Matthew [0] https://docs.ceph.com/en/reef/mgr/rgw/#realm-credentials-token

[ceph-users] Setting hostnames for zonegroups via cephadm / rgw mgr module?

2024-06-04 Thread Matthew Vernon
of the zonegroup (and thus control what hostname(s) the rgws are expecting to serve)? Have I missed something, or do I need to set up the realm/zonegroup/zone, extract the zonegroup json and edit hostnames by hand? Thanks, Matthew

[ceph-users] rgw mgr module not shipped? (in reef at least)

2024-05-31 Thread Matthew Vernon
of modules already, and the rgw one is effectively one small python file, I think... I'm using 18.2.2. Thanks, Matthew

[ceph-users] Re: ceph orch osd rm --zap --replace leaves cluster in odd state

2024-05-28 Thread Matthew Vernon
roy osd.35 ; echo $? OSD(s) 35 are safe to destroy without reducing data durability. 0 I should have said - this is a reef 18.2.2 cluster, cephadm deployed. Regards, Matthew

[ceph-users] ceph orch osd rm --zap --replace leaves cluster in odd state

2024-05-28 Thread Matthew Vernon
emoved. What did I do wrong? I don't much care about the OSD id (but obviously it's neater to not just incrementally increase OSD numbers every time a disk died), but I thought that telling ceph orch not to make new OSDs then using ceph orch osd rm to zap the disk and NVME lv
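For context, the sequence being described is roughly the one below (osd.35 is the OSD from this thread; the unmanaged step is the "telling ceph orch not to make new OSDs" part):

    ceph orch apply osd --all-available-devices --unmanaged=true   # stop cephadm creating new OSDs automatically
    ceph orch osd rm 35 --replace --zap                            # drain, mark "destroyed", zap the underlying devices
    ceph orch osd rm status                                        # watch progress of the removal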

[ceph-users] Re: cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-22 Thread Matthew Vernon
hope it's at least useful as a starter-for-ten: https://github.com/ceph/ceph/pull/57633 Thanks, Matthew

[ceph-users] Re: cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-21 Thread Matthew Vernon
" and similar for the others, but is there a way to have what I want done by cephadm bootstrap? Thanks, Matthew

[ceph-users] Re: cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-20 Thread Matthew Vernon
work, and vgdisplay on the vg that pvs tells me the nvme device is in shows 24 LVs... Thanks, Matthew

[ceph-users] Re: cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-20 Thread Matthew Vernon
Hi, On 20/05/2024 17:29, Anthony D'Atri wrote: On May 20, 2024, at 12:21 PM, Matthew Vernon wrote: This has left me with a single sad pg: [WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive pg 1.0 is stuck inactive for 33m, current state unknown, last acting [] .mgr

[ceph-users] cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-20 Thread Matthew Vernon
thing to want to do with cephadm? I'm running ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable) Thanks, Matthew

[ceph-users] cephadm basic questions: image config, OS reimages

2024-05-16 Thread Matthew Vernon
Ds and away you went; how does one do this in a cephadm cluster? [I presume involves telling cephadm to download a new image for podman to use and suchlike] Would the process be smoother if we arranged to leave /var/lib/ceph intact between reimages

[ceph-users] Re: Ceph reef and (slow) backfilling - how to speed it up

2024-05-10 Thread Matthew Darwin
We have had PGs get stuck in quincy (17.2.7). After changing to wpq, no such problems were observed. We're using a replicated (x3) pool. On 2024-05-02 10:02, Wesley Dillingham wrote: In our case it was with an EC pool as well. I believe the PG state was degraded+recovering / recovery_wait and
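For anyone wanting to try the same change, a sketch of the switch from the default scheduler back to wpq (the option is only read at OSD start-up, so the OSDs need a rolling restart afterwards):

    ceph config set osd osd_op_queue wpq
    ceph config get osd osd_op_queue    # confirm the new value, then restart OSDs one failure domain at a time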

[ceph-users] How to define a read-only sub-user?

2024-05-08 Thread Matthew Darwin
Hi, I'm new to bucket policies. I'm trying to create a sub-user that has only read-only access to all the buckets of the main user. I created the policy below; I can't create or delete files, but I can still create buckets using "rclone mkdir". Any idea what I'm doing wrong? I'm using ceph
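A sketch of the kind of per-bucket policy being described, applied with the AWS CLI; the principal ARN format for an RGW sub-user is an assumption here and should be checked against your release. Note that a bucket policy only governs the bucket it is attached to, so it cannot stop the sub-user from creating new buckets, which is likely why "rclone mkdir" still succeeds.

    cat > ro-policy.json <<'EOF'
    {
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": ["arn:aws:iam:::user/mainuser:subuser"]},
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": ["arn:aws:s3:::bucket-name", "arn:aws:s3:::bucket-name/*"]
      }]
    }
    EOF
    aws --endpoint-url https://rgw.example.org s3api put-bucket-policy --bucket bucket-name --policy file://ro-policy.json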

[ceph-users] Re: Reconstructing an OSD server when the boot OS is corrupted

2024-05-02 Thread Matthew Vernon
On 24/04/2024 13:43, Bailey Allison wrote: A simple ceph-volume lvm activate should get all of the OSDs back up and running once you install the proper packages/restore the ceph config file/etc., What's the equivalent procedure in a cephadm-managed cluster? Thanks, Ma
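The cephadm counterpart (per the "activate existing OSDs" section of the docs) is roughly: reinstall the OS, re-add the host to the orchestrator, then have cephadm adopt the intact OSD volumes; the hostname below is a placeholder.

    ceph cephadm osd activate ceph-node1    # scans the host and re-creates daemons for existing OSD LVs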

[ceph-users] Linux Laptop Losing CephFS mounts on Sleep/Hibernate

2024-03-25 Thread matthew
Hi All, So I've got a Ceph Reef Cluster (latest version) with a CephFS system set up with a number of directories on it. On a Laptop (running Rocky Linux (latest version)) I've used fstab to mount a number of those directories - all good, everything works, happy happy joy joy! :-) However, wh

[ceph-users] Mounting A RBD Image via Kernal Modules

2024-03-25 Thread matthew
Hi All, I'm looking for a bit of advice on the subject of this post. I've been "staring at the trees so long I can't see the forest any more". :-) Rocky Linux Client latest version. Ceph Reef latest version. I have read *all* the doco on the Ceph website. I have created a pool (my_pool) and an
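For completeness, the usual krbd workflow looks roughly like this ("my_image" and the mount point are placeholders; "my_pool" is the pool name from the post). Older kernels may also need unsupported image features disabled before the map succeeds.

    rbd create my_pool/my_image --size 100G
    # if the kernel refuses to map, disable features it does not support, e.g.:
    #   rbd feature disable my_pool/my_image object-map fast-diff deep-flatten
    rbd map my_pool/my_image              # prints the device node, typically /dev/rbd0
    mkfs.xfs /dev/rbd0                    # first use only
    mkdir -p /mnt/my_image && mount /dev/rbd0 /mnt/my_image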

[ceph-users] Re: Ceph Cluster Config File Locations?

2024-03-06 Thread matthew
Thanks Eugen, you pointed me in the right direction :-) Yes, the config files I mentioned were the ones in `/var/lib/ceph/{FSID}/mgr.{MGR}/config` - I wasn't aware there were others (well, I suspected there were, hence my Q). The `global public-network` was (re-)set to the old subnet, while the

[ceph-users] Re: Ceph-storage slack access

2024-03-06 Thread Matthew Vernon
e to https://docs.ceph.com/en/latest/start/get-involved/ which lacks the registration link. Regards, Matthew

[ceph-users] Ceph-storage slack access

2024-03-06 Thread Matthew Vernon
Hi, How does one get an invite to the ceph-storage slack, please? Thanks, Matthew

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-26 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
Glad to hear it all worked out for you! From: nguyenvand...@baoviet.com.vn At: 02/26/24 05:32:32 UTC-5:00To: ceph-users@ceph.io Subject: [ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering Dear Mr Eugen, Mr Matthew, Mr David, Mr Anthony My System is UP. Thank you so

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
Once recovery is underway, simply restarting the RGWs should be enough to reset them and get your object store back up. Bloomberg doesn’t use cephfs, so hopefully David’s suggestions work, or someone else in the community can chip in for that part. Sent from Bloomberg Professional for iPhone

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
: nguyenvand...@baoviet.com.vn To: ceph-users@ceph.io At: 02/24/24 16:14:12 UTC Thank you Matthew. I'm following guidance from Mr Anthony and now my recovery progress speed is much faster. I will update my case day by day. Thank you so much

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
Anthony is correct, this is what I was getting at as well when seeing your ceph -s output. More details in the Ceph docs here if you want to understand the details of why you need to balance your nodes. https://docs.ceph.com/en/quincy/rados/operations/monitoring-osd-pg/ But you need to get you

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-24 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
It looks like you have quite a few problems I’ll try and address them one by one. 1) Looks like you had a bunch of crashes, from the ceph -s it looks like you don’t have enough MDS daemons running for a quorum. So you’ll need to restart the crashed containers. 2) It looks like you might have

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-02-23 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
Can you send sudo ceph -s and sudo ceph health detail Sent from Bloomberg Professional for iPhone - Original Message - From: nguyenvand...@baoviet.com.vn To: ceph-users@ceph.io At: 02/23/24 20:27:53 UTC-05:00 Could you pls guide me more detail :( im very newbie in Ceph :(

[ceph-users] Re: Issue with Setting Public/Private Permissions for Bucket

2024-02-23 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
https://docs.aws.amazon.com/AmazonS3/latest/userguide/acl-overview.html From: asad.siddi...@rapidcompute.com At: 02/23/24 09:42:29 UTC-5:00To: ceph-users@ceph.io Subject: [ceph-users] Issue with Setting Public/Private Permissions for Bucket Hi Team, I'm currently working with Ceph object stora

[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-02-21 Thread Matthew Vernon
the dashboard); there is an MR to fix just the dashboard issue, which got merged into main. I've opened an MR to backport that change to Reef: https://github.com/ceph/ceph/pull/55689 I don't know what the devs' plans are for dealing with the broader pyO3 issue, but I'll ask on

[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-02-02 Thread Matthew Darwin
Chris, Thanks for all the investigations you are doing here. We're on quincy/debian11. Is there any working path at this point to reef/debian12? Ideally I want to go in two steps: upgrade ceph first or upgrade debian first, then do the upgrade to the other one. Most of our infra is already

[ceph-users] Understanding subvolumes

2024-01-31 Thread Matthew Melendy
3c3c6f96ffcf [root@ceph1 ~]# ceph fs subvolume ls cephfs csvg [ { "name": "staff" } ] -- Sincerely, Matthew Melendy IT Services Specialist CS System Services Group FEC 3550, University of New Mexico

[ceph-users] Re: v18.2.1 Reef released

2023-12-19 Thread Matthew Vernon
18.2.1 (whereas the reporter is still on 18.2.0)? i.e. one has to upgrade to 18.2.1 before this bug will be fixed and so the upgrade _to_ 18.2.1 is still affected. Regards, Matthew

[ceph-users] Re: Debian 12 support

2023-11-13 Thread Matthew Vernon
ctation is that the next point release of Reef (due soon!) will have Debian packages built as part of it. Regards, Matthew

[ceph-users] Re: v17.2.7 Quincy released

2023-11-12 Thread Matthew Darwin
It would be nice if the dashboard changes, which are very big, had been covered in the release notes, especially since they are not really backwards compatible. (See my previous messages on this topic.) On 2023-10-30 10:50, Yuri Weinstein wrote: We're happy to announce the 7th backport rel

[ceph-users] Re: Debian 12 support

2023-11-12 Thread Matthew Darwin
We are still waiting on debian 12 support. Currently our ceph is stuck on debian 11 due to the lack of debian 12 releases. On 2023-11-01 03:23, nessero karuzo wrote: Hi to all ceph community. I have a question about Debian 12 support for ceph 17. I didn’t find repo for that release at https://down

[ceph-users] Re: OSD fails to start after 17.2.6 to 17.2.7 update

2023-11-07 Thread Matthew Booth
I just discovered that rook is tracking this here: https://github.com/rook/rook/issues/13136 On Tue, 7 Nov 2023 at 18:09, Matthew Booth wrote: > On Tue, 7 Nov 2023 at 16:26, Matthew Booth wrote: > >> FYI I left rook as is and reverted to ceph 17.2.6 and the issue is >> resolv

[ceph-users] Re: OSD fails to start after 17.2.6 to 17.2.7 update

2023-11-07 Thread Matthew Booth
On Tue, 7 Nov 2023 at 16:26, Matthew Booth wrote: > FYI I left rook as is and reverted to ceph 17.2.6 and the issue is > resolved. > > The code change was added by > commit 2e52c029bc2b052bb96f4731c6bb00e30ed209be: > ceph-volume: fix broken workaround for atari partitions

[ceph-users] Re: OSD fails to start after 17.2.6 to 17.2.7 update

2023-11-07 Thread Matthew Booth
that regression. Fixes: https://tracker.ceph.com/issues/62001 Signed-off-by: Guillaume Abrioux (cherry picked from commit b3fd5b513176fb9ba1e6e0595ded4b41d401c68e) It feels like a regression to me. Matt On Tue, 7 Nov 2023 at 16:13, Matthew Booth wrote: > Firstly I'm rolli

[ceph-users] OSD fails to start after 17.2.6 to 17.2.7 update

2023-11-07 Thread Matthew Booth
self.list(args) File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root return func(*a, **kw) File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/list.py", line 122, in list report = self.generate(args.device) File "

[ceph-users] Re: Many pgs inactive after node failure

2023-11-06 Thread Matthew Booth
king I had enough space. Thanks! Matt > > Regards, > Eugen > > [1] https://docs.ceph.com/en/reef/cephadm/services/osd/#activate-existing-osds > > Zitat von Matthew Booth : > > > I have a 3 node ceph cluster in my home lab. One of the pools spans 3 > > hdds,

[ceph-users] Many pgs inactive after node failure

2023-11-04 Thread Matthew Booth
so I will most likely rebuild it. I'm running rook, and I will most likely delete the old node and create a new one with the same name. AFAIK, the OSDs are fine. When rook rediscovers the OSDs, will it add them back with data intact? If not, is there any way I can make it so it will? Thanks! --

[ceph-users] Re: 17.2.7 quincy dashboard issues

2023-11-02 Thread Matthew Darwin
ome filtering done with cluster id or something to properly identify it. FYI @Pedro Gonzalez Gomez @Ankush Behl @Aashish Sharma Regards, Nizam On Mon, Oct 30, 2023 at 11:05 PM Matthew Darwin w

[ceph-users] Re: 17.2.7 quincy dashboard issues

2023-10-30 Thread Matthew Darwin
t's why the utilization charts are empty because it relies on the prometheus info. And I raised a PR to disable the new dashboard in quincy. https://github.com/ceph/ceph/pull/54250 Regards, Nizam On Mon, Oct 30, 2023 at 6:09 PM Matthew Darwin wrote: Hello, We're not using

[ceph-users] Re: 17.2.7 quincy

2023-10-30 Thread Matthew Darwin
ff by default. "ceph dashboard feature disable dashboard" works to put the old dashboard back. Thanks. On 2023-10-30 00:09, Nizamudeen A wrote: Hi Matthew, Is the prometheus configured in the cluster? And is the PROMETHEUS_API_URL also set? You can set it manually by ceph dashboard set
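The command being referred to is presumably along these lines (the URL is a placeholder; cephadm's Prometheus typically listens on 9095):

    ceph dashboard set-prometheus-api-host http://prometheus-host.example.org:9095
    # or simply fall back to the old landing page, as noted above:
    ceph dashboard feature disable dashboard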

[ceph-users] 17.2.7 quincy

2023-10-29 Thread Matthew Darwin
Hi all, I see 17.2.7 quincy is published as debian-bullseye packages. So I tried it on a test cluster. I must say I was not expecting the big dashboard change in a patch release. Also, the "cluster utilization" numbers are all blank now (any way to fix that?), so the dashboard is much less

[ceph-users] Re: radosgw-admin sync error trim seems to do nothing

2023-10-03 Thread Matthew Darwin
2023-08-22 08:00, Matthew Darwin wrote: Thanks Rich, On quincy it seems that providing an end-date is an error. Any other ideas from anyone? $ radosgw-admin sync error trim --end-date="2023-08-20 23:00:00" end-date not allowed. On 2023-08-20 19:00, Richard Bade wrote: Hi Matthew, At

[ceph-users] Re: Debian/bullseye build for reef

2023-09-07 Thread Matthew Vernon
s... HTH, Matthew

[ceph-users] Re: Debian/bullseye build for reef

2023-09-04 Thread Matthew Vernon
yet offer much time] Regards, Matthew

[ceph-users] Re: radosgw-admin sync error trim seems to do nothing

2023-08-22 Thread Matthew Darwin
Thanks Rich, On quincy it seems that providing an end-date is an error. Any other ideas from anyone? $ radosgw-admin sync error trim --end-date="2023-08-20 23:00:00" end-date not allowed. On 2023-08-20 19:00, Richard Bade wrote: Hi Matthew, At least for nautilus (14.2.22) I have

[ceph-users] Re: Debian/bullseye build for reef

2023-08-21 Thread Matthew Darwin
Last few upgrades we upgraded ceph, then upgraded the O/S... it worked great... I was hoping we could do the same again this time. On 2023-08-21 12:18, Chris Palmer wrote: Ohhh.. so if I read that correctly we can't upgrade either debian or ceph until the dependency problem is resolved,

[ceph-users] radosgw-admin sync error trim seems to do nothing

2023-08-19 Thread Matthew Darwin
Hello all, "radosgw-admin sync error list" returns errors from 2022.  I want to clear those out. I tried "radosgw-admin sync error trim" but it seems to do nothing.  The man page seems to offer no suggestions https://docs.ceph.com/en/quincy/man/8/radosgw-admin/ Any ideas what I need to do

[ceph-users] Re: 1 PG stucked in "active+undersized+degraded for long time

2023-07-20 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
Assuming you're running systemd-managed OSDs, you can run the following command on the host that OSD 343 resides on: systemctl restart ceph-osd@343 From: siddhit.ren...@nxtgen.com At: 07/20/23 13:44:36 UTC-4:00 To: ceph-users@ceph.io Subject: [ceph-users] Re: 1 PG stucked in "active+undersized+degrad
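If the OSD is cephadm-managed rather than a packaged systemd unit, the equivalent restart would be:

    ceph orch daemon restart osd.343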

[ceph-users] Re: RBD with PWL cache shows poor performance compared to cache device

2023-07-10 Thread Matthew Booth
On Thu, 6 Jul 2023 at 12:54, Mark Nelson wrote: > > > On 7/6/23 06:02, Matthew Booth wrote: > > On Wed, 5 Jul 2023 at 15:18, Mark Nelson wrote: > >> I'm sort of amazed that it gave you symbols without the debuginfo > >> packages installed. I'

[ceph-users] Re: RBD with PWL cache shows poor performance compared to cache device

2023-07-06 Thread Matthew Booth
pping the number of tp_pwl > threads from 4 to 1 and see if that changes anything. Will do. Any idea how to do that? I don't see an obvious rbd config option. Thanks for looking into this, Matt -- Matthew Booth

[ceph-users] Re: RBD with PWL cache shows poor performance compared to cache device

2023-07-04 Thread Matthew Booth
On Tue, 4 Jul 2023 at 10:00, Matthew Booth wrote: > > On Mon, 3 Jul 2023 at 18:33, Ilya Dryomov wrote: > > > > On Mon, Jul 3, 2023 at 6:58 PM Mark Nelson wrote: > > > > > > > > > On 7/3/23 04:53, Matthew Booth wrote: > > > > On Thu, 2

[ceph-users] Re: RBD with PWL cache shows poor performance compared to cache device

2023-07-04 Thread Matthew Booth
On Tue, 4 Jul 2023 at 14:24, Matthew Booth wrote: > On Tue, 4 Jul 2023 at 10:45, Yin, Congmin wrote: > > > > Hi , Matthew > > > > I see "rbd with pwl cache: 5210112 ns", This latency is beyond my > > expectations and I believe it is unlikely to

[ceph-users] Re: RBD with PWL cache shows poor performance compared to cache device

2023-07-04 Thread Matthew Booth
On Tue, 4 Jul 2023 at 10:45, Yin, Congmin wrote: > > Hi , Matthew > > I see "rbd with pwl cache: 5210112 ns", This latency is beyond my > expectations and I believe it is unlikely to occur. In theory, this value > should be around a few hundred microseconds. But

[ceph-users] Re: RBD with PWL cache shows poor performance compared to cache device

2023-07-04 Thread Matthew Booth
On Mon, 3 Jul 2023 at 18:33, Ilya Dryomov wrote: > > On Mon, Jul 3, 2023 at 6:58 PM Mark Nelson wrote: > > > > > > On 7/3/23 04:53, Matthew Booth wrote: > > > On Thu, 29 Jun 2023 at 14:11, Mark Nelson wrote: > > >>>>> This contain

[ceph-users] Re: RBD with PWL cache shows poor performance compared to cache device

2023-07-03 Thread Matthew Booth
On Fri, 30 Jun 2023 at 08:50, Yin, Congmin wrote: > > Hi Matthew, > > Due to the latency of rbd layers, the write latency of the pwl cache is more > than ten times that of the Raw device. > I replied directly below the 2 questions. > > Best regards. > Congmin Yin

[ceph-users] Re: RBD with PWL cache shows poor performance compared to cache device

2023-07-03 Thread Matthew Booth
ntime=60 --time_based=1 > >>> > >>> And extracts sync.lat_ns.percentile["99.00"] > >> > >> Matthew, do you have the rest of the fio output captured? It would be > >> interesting to see if it's just the 99th percentile that is bad or

[ceph-users] Re: RBD with PWL cache shows poor performance compared to cache device

2023-06-29 Thread Matthew Booth
ing with fio. Specifically I am running a containerised test, >> executed with: >>podman run --volume .:/var/lib/etcd:Z quay.io/openshift-scale/etcd-perf >> >> This container runs: >>fio --rw=write --ioengine=sync --fdatasync=1 >> --directory=/var/lib/et

[ceph-users] Re: RBD with PWL cache shows poor performance compared to cache device

2023-06-27 Thread Matthew Booth
On Tue, 27 Jun 2023 at 18:20, Josh Baergen wrote: > > Hi Matthew, > > We've done a limited amount of work on characterizing the pwl and I think it > suffers the classic problem of some writeback caches in that, once the cache > is saturated, it's actually worse tha

[ceph-users] RBD with PWL cache shows poor performance compared to cache device

2023-06-27 Thread Matthew Booth
d: 180 MiB cached: 135 MiB dirty: 0 B free: 844 MiB hits_full: 1 / 0% hits_partial: 3 / 0% misses: 21952 hit_bytes: 6 KiB / 0% miss_bytes: 349 MiB -- Matthew Booth

[ceph-users] Re: Bucket sync policy

2023-04-24 Thread Matthew Darwin
I have basically given up relying on bucket sync to work properly in quincy.  I have been running a cron job to manually sync files between datacentres to catch the files that don't get replicated.  It's pretty inefficient, but at least all the files get to the backup datacentre. Would love to

[ceph-users] Re: Do not use SSDs with (small) SLC cache

2023-02-28 Thread Matthew Stroud
A bit late to the game, but I'm not sure if it is your drives. I had a very similar issue to yours on enterprise drives (not that means much outside of support). What I was seeing is that a rebuild would kick off, PGs would instantly start to become laggy and then our clients (openstack rbd) wo

[ceph-users] Re: Debian update to 16.2.11-1~bpo11+1 failing

2023-01-31 Thread Matthew Booth
-- Matthew Booth

[ceph-users] PSA: Potential problems in a recent kernel?

2023-01-27 Thread Matthew Booth
4-0.fc37 -> 2:4.17.4-2.fc37 selinux-policy 37.16-1.fc37 -> 37.17-1.fc37 selinux-policy-targeted 37.16-1.fc37 -> 37.17-1.fc37 tpm2-tss 3.2.0-3.fc37 -> 3.2.1-1.fc37 Removed: cracklib-dicts-2.9.7-30.fc37.x86_64 -- Matthew Booth

[ceph-users] Building Ceph containers

2023-01-16 Thread Matthew Vernon
o be going... Thanks, Matthew [0] https://docs.ceph.com/en/quincy/install/build-ceph/

[ceph-users] Mysterious HDD-Space Eating Issue

2023-01-16 Thread matthew
Hi Guys, I've got a funny one I'm hoping someone can point me in the right direction with: We've got three identical(?) Ceph nodes running 4 OSDs, Mon, Man, and iSCSI G/W each (we're only a small shop) on Rocky Linux 8 / Ceph Quincy. Everything is running fine, no bottle-necks (as far as we ca

[ceph-users] Laggy PGs on a fairly high performance cluster

2023-01-12 Thread Matthew Stroud
We have a 14 osd node all ssd cluster and for some reason we are continually getting laggy PGs and those seem to correlate to slow requests on Quincy (doesn't seem to happen on our Pacific clusters). These laggy pgs seem to shift between osds. The network seems solid, as in I'm not seeing errors

[ceph-users] Re: S3 Deletes in Multisite Sometimes Not Syncing

2022-12-23 Thread Matthew Darwin
Hi Alex, We also have a multi-site setup (17.2.5). I just deleted a bunch of files from one side and some files got deleted on the other side but not others. I waited 10 hours to see if the files would delete. I didn't do an exhaustive test like yours, but seems similar issues. In our case, l

[ceph-users] Re: Multi site alternative

2022-11-23 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
Hey Ivan, I think the answer would be multisite. I know there is a lot of effort currently to work out the last few kinks. This tracker might be of interest as it sounds like an already identified issue, https://tracker.ceph.com/issues/57562#change-228263 Matt From: istvan.sz...@agoda.com At:

[ceph-users] Re: strange OSD status when rebooting one server

2022-10-14 Thread Matthew Darwin
9, rum S14 ____ From: Matthew Darwin Sent: 14 October 2022 18:57:37 To:c...@elchaka.de;ceph-users@ceph.io Subject: [ceph-users] Re: strange OSD status when rebooting one server https://gist.githubusercontent.com/matthewdarwin/aec3c2b16ba5e74beb4af1d49e

[ceph-users] Re: strange OSD status when rebooting one server

2022-10-14 Thread Matthew Darwin
hint... Hth On 14 October 2022 18:45:40 MESZ, Matthew Darwin wrote: Hi, I am hoping someone can help explain this strange message. I took 1 physical server offline which contains 11 OSDs. "ceph -s" reports 11 osd down. Great. But on the next line it says "

[ceph-users] strange OSD status when rebooting one server

2022-10-14 Thread Matthew Darwin
Hi, I am hoping someone can help explain this strange message. I took 1 physical server offline which contains 11 OSDs. "ceph -s" reports 11 osd down. Great. But on the next line it says "4 hosts" are impacted. It should only be a single host? When I look at the manager dashboard all the O

[ceph-users] Re: Ceph iSCSI rbd-target.api Failed to Load

2022-09-09 Thread Matthew J Black
Hi Li, Yeah, that's what I thought (about having the api_secure), so I checked for the iscsi-gateway.cfg file and there's only one on the system, in the /etc/ceph/ folder. Any other ideas? Cheers On 09/09/2022 18:35, Xiubo Li wrote: On 07/09/2022 17:37, duluxoz wrot

[ceph-users] ceph -s command hangs with an authentication timeout - a reply

2022-08-08 Thread Matthew J Black
Hi Eneko, Sorry for the round-about way of getting back to you (I can't seem to work out how to reply/post to my original message - I'm obviously tired/stupid/whatever  :-) Problem solved (about 15 minutes ago) - turns out I had a typo (one of those small, hard to spot ones) - so a PBCAK (or

[ceph-users] Re: multi-site replication not syncing metadata

2022-07-04 Thread Matthew Darwin
I did manage to get this working. Not sure what exactly fixed it, but creating the pool "default.rgw.otp" helped.  Why are missing pools not automatically created? Also this: radosgw-admin sync status radosgw-admin metadata sync run On 2022-06-20 19:26, Matthew Darwin wrot
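Putting the steps from this thread together, the fix on the secondary zone looked roughly like this (the application-enable step is an assumption; normally RGW creates and tags its pools itself):

    ceph osd pool create default.rgw.otp
    ceph osd pool application enable default.rgw.otp rgw
    radosgw-admin sync status          # check metadata sync state
    radosgw-admin metadata sync run    # kick a full metadata sync against the master zone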

[ceph-users] Re: How to remove TELEMETRY_CHANGED( Telemetry requires re-opt-in) message

2022-06-24 Thread Matthew Darwin
Not sure.  Long enough to try the command and write this email, so at least 10 minutes. I expected it to disappear after 30 seconds or so. On 2022-06-24 10:34, Laura Flores wrote: Hi Matthew, About how long did the warning stay up after you ran the `ceph telemetry on` command? - Laura On

[ceph-users] Re: How to remove TELEMETRY_CHANGED( Telemetry requires re-opt-in) message

2022-06-24 Thread Matthew Darwin
Thanks Yaarit, The cluster I was using is just a test cluster with a few OSD and almost no data. Not sure why I have to re-opt in upgrading from 17.2.0 to 17.2.1 On 2022-06-24 09:41, Yaarit Hatuka wrote: Hi Matthew, Thanks for your update. How big is the cluster? Thanks for opting-in to

[ceph-users] Re: How to remove TELEMETRY_CHANGED( Telemetry requires re-opt-in) message

2022-06-23 Thread Matthew Darwin
Sorry. Eventually it goes away.  Just slower than I was expecting. On 2022-06-23 23:42, Matthew Darwin wrote: I just updated quincy from 17.2.0 to 17.2.1.  Ceph status reports "Telemetry requires re-opt-in". I then run $ ceph telemetry on $ ceph telemetry on --license sharing-

[ceph-users] How to remove TELEMETRY_CHANGED( Telemetry requires re-opt-in) message

2022-06-23 Thread Matthew Darwin
I just updated quincy from 17.2.0 to 17.2.1.  Ceph status reports "Telemetry requires re-opt-in". I then run $ ceph telemetry on $ ceph telemetry on --license sharing-1-0 Still the message "TELEMETRY_CHANGED( Telemetry requires re-opt-in) message" remains in the log. Any ideas how to get ri

[ceph-users] multi-site replication not syncing metadata

2022-06-20 Thread Matthew Darwin
Hi all, Running into some trouble. I just setup ceph multi-site replication.  Good news is that it is syncing the data. But the metadata is NOT syncing. I was trying to follow the instructions from here: https://docs.ceph.com/en/quincy/radosgw/multisite/#create-a-secondary-zone I see there

[ceph-users] Re: osd_disk_thread_ioprio_class deprecated?

2022-05-18 Thread Matthew H
See this PR https://github.com/ceph/ceph/pull/19973 From: Josh Baergen Sent: Wednesday, May 18, 2022 10:54 AM To: Richard Bade Cc: Ceph Users Subject: [ceph-users] Re: osd_disk_thread_ioprio_class deprecated? Hi Richard, > Could anyone confirm this? And whic

[ceph-users] Re: OS suggestion for further ceph installations (centos stream, rocky, ubuntu)?

2022-02-04 Thread Matthew Vernon
Ubuntu Cloud Archive helpful if you want a more recent Ceph than the version your release shipped with; it can also help you decouple Ceph upgrades from OS upgrades. HTH, Matthew

[ceph-users] Re: [RGW] bi_list(): (5) Input/output error blocking resharding

2022-01-10 Thread Matthew Vernon
Hi, On 07/01/2022 18:39, Gilles Mocellin wrote: Anyone who had that problem find a workaround ? Are you trying to reshard a bucket in a multisite setup? That isn't expected to work (and, IIRC, the changes to support doing so aren't going to make it into quincy). Regards
