[ceph-users] Re: mds terminated

2023-07-18 Thread Milind Changire
If possible, could you share the MDS logs at debug level 20? You'll need to set debug_mds = 20 in the conf file until the crash, and revert the level to the default after the MDS crash. On Tue, Jul 18, 2023 at 9:12 PM wrote: > hello. > I am using Rook Ceph and have 20 MDSs in use. 10 are in rank 0-9
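
A minimal sketch of how that could be done (assuming either a classic ceph.conf or the centralized config database; revert once the crash has been captured):

  # in ceph.conf, until the crash is captured:
  #   [mds]
  #   debug_mds = 20
  # or, with the centralized config database:
  ceph config set mds debug_mds 20
  # revert to the default afterwards:
  ceph config rm mds debug_mds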

[ceph-users] Re: Workload that delete 100 M object daily via lifecycle

2023-07-18 Thread Anthony D'Atri
Indeed, that's very useful. I improved the documentation for that not long ago; it took a while to sort out exactly what it was about. Normally LC only runs once a day, as I understand it; there's a debug option that compresses time so that it'll run more frequently, as having to wait for a day to
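
The debug option referred to here is presumably rgw_lc_debug_interval, which makes LC treat the given number of seconds as a "day"; a hedged sketch (test clusters only, value illustrative):

  # LC cycles run every 10 minutes instead of once per day
  ceph config set client.rgw rgw_lc_debug_interval 600
  # remove the override once testing is done
  ceph config rm client.rgw rgw_lc_debug_interval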

[ceph-users] Re: Workload that delete 100 M object daily via lifecycle

2023-07-18 Thread Hoan Nguyen Van
You can enable LC debugging to test and tune the RGW LC parameters.
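
On the tuning side, the LC knobs most often adjusted are the worker counts and the daily work window; a hedged sketch (option names as in recent releases, values illustrative):

  # more parallel LC worker / work-pool threads (defaults are 3 and 3)
  ceph config set client.rgw rgw_lc_max_worker 5
  ceph config set client.rgw rgw_lc_max_wp_worker 5
  # widen the window in which LC is allowed to run (default 00:00-06:00)
  ceph config set client.rgw rgw_lifecycle_work_time "00:00-23:59"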

[ceph-users] Re: Workload that delete 100 M object daily via lifecycle

2023-07-18 Thread Anthony D'Atri
Index pool on Aerospike? Building OSDs on PRAM might be a lot less work than trying to ensure consistency on backing storage while still servicing out of RAM and not syncing every transaction. > On Jul 18, 2023, at 14:31, Peter Grandi wrote: > > [...] S3 workload, that will need to

[ceph-users] Re: index object in shard begins with hex 80

2023-07-18 Thread Christopher Durham
Dan, Ok, I've discovered a few more things. None of the bucket index objects show up as type 'olh' in the bi list; they are all 'plain'. Since my objects begin with "<80>0_", this appears to be a bucket log index as per static std::string bucket_index_prefixes in cls_rgw.cc. One of these omapkey

[ceph-users] Re: librbd hangs during large backfill

2023-07-18 Thread Anthony D'Atri
I've seen this dynamic contribute to a hypervisor with many attachments running out of system-wide file descriptors. > On Jul 18, 2023, at 16:21, Konstantin Shalygin wrote: > > Hi, > > Check your libvirt limits for QEMU open files/sockets. It seems that when you added > new OSDs, your librbd client
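
A quick way to check whether the hypervisor is approaching the system-wide descriptor limit (standard Linux interfaces, nothing Ceph-specific):

  # allocated, free, and maximum file handles system-wide
  cat /proc/sys/fs/file-nr
  # the system-wide ceiling itself
  sysctl fs.file-max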

[ceph-users] Re: librbd hangs during large backfill

2023-07-18 Thread Konstantin Shalygin
Hi, Check your libvirt limits for QEMU open files/sockets. It seems that when you added new OSDs, your librbd client limit was reached. k Sent from my iPhone > On 18 Jul 2023, at 19:32, Wesley Dillingham wrote: > > Did your automation / process allow for stalls in between changes to allow > peering to
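
A hedged sketch of where those limits live (stock libvirt paths and option names; pick a value that covers your attachment count):

  # check the open-file limit of a running qemu process
  prlimit --nofile --pid "$(pgrep -o qemu)"
  # raise the limit libvirt applies to guests, in /etc/libvirt/qemu.conf:
  #   max_files = 65536
  # then restart libvirtd so newly started guests pick it up
  systemctl restart libvirtd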

[ceph-users] Re: index object in shard begins with hex 80

2023-07-18 Thread Christopher Durham
Dan, Don't worry, I won't do anything blindly. Thanks for the info, I hadn't thought to try --omap-key-file. -Chris On Tuesday, July 18, 2023 at 12:14:18 PM MDT, Dan van der Ster wrote: Hi Chris, Those objects are in the so-called "ugly namespace" of the rgw, used to prefix special
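
For reference, a hedged sketch of how such a key can be inspected despite the non-UTF-8 0x80 prefix (the key and object names below are placeholders):

  # write the raw key, including the leading 0x80 byte, to a file
  printf '\x800_EXAMPLEKEY' > /tmp/key.bin
  # fetch the omap value for that key without typing the binary byte on the command line
  rados -p pool.index getomapval .dir.zone.bucketid.xx.indexshardnumber --omap-key-file /tmp/key.bin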

[ceph-users] Re: Workload that delete 100 M object daily via lifecycle

2023-07-18 Thread Peter Grandi
[...] S3 workload, that will need to delete 100M files daily [...] >> [...] average (what about peaks?) around 1,200 committed >> deletions per second (across the traditional 3 metadata >> OSDs) sustained; that may not leave a lot of time for file > creation, writing or reading. :-) [...]
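
(For reference, the arithmetic behind that figure: 100,000,000 deletions spread over the 86,400 seconds in a day is roughly 1,160 per second sustained, which is where the "around 1,200 committed deletions per second" above comes from.)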

[ceph-users] Re: index object in shard begins with hex 80

2023-07-18 Thread Dan van der Ster
Hi Chris, Those objects are in the so-called "ugly namespace" of the rgw, used to prefix special bucket index entries. // No UTF-8 character can begin with 0x80, so this is a safe indicator // of a special bucket-index entry for the first byte. Note: although // it has no impact, the 2nd, 3rd,

[ceph-users] Re: cephadm does not redeploy OSD

2023-07-18 Thread Luis Domingues
That part looks quite good:

  "available": false,
  "ceph_device": true,
  "created": "2023-07-18T16:01:16.715487Z",
  "device_id": "SAMSUNG MZPLJ1T6HBJR-7_S55JNG0R600354",
  "human_readable_type": "ssd",
  "lsm_data": {},
  "lvs": [
    {

[ceph-users] Re: librbd hangs during large backfill

2023-07-18 Thread Wesley Dillingham
Did your automation / process allow for stalls in between changes to allow peering to complete? My hunch is you caused a very large peering storm (during peering a PG is inactive) which in turn caused your VMs to panic. If the RBDs are unmapped and re-mapped does it still continue to struggle?
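
A hedged sketch of what allowing for such stalls could look like, assuming CRUSH weights are being raised and using the PG summary to detect in-flight peering (OSD ids and the target weight are placeholders):

  for osd in 101 102 103; do
      ceph osd crush reweight "osd.${osd}" 0.25
      # wait until no PGs report peering or activating before touching the next OSD
      while ceph pg stat | grep -Eq 'peering|activating'; do
          sleep 10
      done
  done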

[ceph-users] index object in shard begins with hex 80

2023-07-18 Thread Christopher Durham
Hi, I am using ceph 17.2.6 on rocky linux 8. I got a large omap object warning today. OK, so I tracked it down to a shard for a bucket in the index pool of an s3 pool. However, when listing the omapkeys with: # rados -p pool.index listomapkeys .dir.zone.bucketid.xx.indexshardnumber it is clear
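
A quick way to confirm which shard is oversized is simply to count the keys per index object (the object name below is the same placeholder as above):

  rados -p pool.index listomapkeys .dir.zone.bucketid.xx.indexshardnumber | wc -l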

[ceph-users] Re: cephadm does not redeploy OSD

2023-07-18 Thread Adam King
in the "ceph orch device ls --format json-pretty" output, in the blob for that specific device, is the "ceph_device" field set? There was a bug where it wouldn't be set at all (https://tracker.ceph.com/issues/57100) and it would make it so you couldn't use a device serving as a db device for any

[ceph-users] Re: Workload that delete 100 M object daily via lifecycle

2023-07-18 Thread Anthony D'Atri
Index pool distributed over a large number of NVMe OSDs? Multiple, dedicated RGW instances that only run LC? > On Jul 18, 2023, at 12:08, Peter Grandi wrote: > On Mon, 17 Jul 2023 19:19:34 +0700, Ha Nguyen Van said: > >> [...] S3 workload, that will need to delete 100M file daily
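
One way to express that split (hedged; the instance names are placeholders and rgw_enable_lc_threads is assumed to be the relevant switch):

  # client-facing RGW instances: serve S3 traffic, never run LC
  ceph config set client.rgw.public rgw_enable_lc_threads false
  # dedicated LC instances: keep LC processing on
  ceph config set client.rgw.lc rgw_enable_lc_threads true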

[ceph-users] Re: Workload that delete 100 M object daily via lifecycle

2023-07-18 Thread Peter Grandi
>>> On Mon, 17 Jul 2023 19:19:34 +0700, Ha Nguyen Van said: > [...] S3 workload, that will need to delete 100M files daily [...] So many people seem to think that distributed (or even local) filesystems (and in particular their metadata servers) can sustain the same workload as high volume

[ceph-users] Re: CEPHADM_FAILED_SET_OPTION

2023-07-18 Thread Adam King
Someone hit what I think is this same issue the other day. Do you have a "config" section in your rgw spec that sets the "rgw_keystone_implicit_tenants" option to "True" or "true"? For them, changing the value to be 1 (which should be equivalent to "true" here) instead of "true" fixed it. Likely
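
A hedged sketch of the corresponding spec change (the service id "fra" is taken from the health warning in this thread; the key point is the unquoted 1 instead of "true"):

  # merge this "config" section into the existing rgw.fra spec, keeping placement etc. unchanged
  cat > rgw-fra.yaml <<'EOF'
  service_type: rgw
  service_id: fra
  config:
    rgw_keystone_implicit_tenants: 1
  EOF
  # re-apply the full, merged spec
  ceph orch apply -i rgw-fra.yaml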

[ceph-users] librbd hangs during large backfill

2023-07-18 Thread fb2cd0fc-933c-4cfe-b534-93d67045a088
Starting on Friday, as part of adding a new pod of 12 servers, we initiated a reweight on roughly 384 drives, from 0.1 to 0.25. Something about the resulting large backfill is causing librbd to hang, requiring server restarts. The volumes are showing buffer I/O errors when this happens. We are

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-07-18 Thread Gabriel Benhanokh
Greg, You are correct - the timeout was used during debugging to make sure the fast shutdown is indeed fast, but even if it completed after that timeout, everything should be perfectly fine. The timeout was set to 15 seconds, which is more than enough to complete shutdown on a valid system (in
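
For reference, the settings being discussed can be inspected like this (option names are an assumption based on recent releases):

  ceph config get osd osd_fast_shutdown          # whether fast shutdown is used at all
  ceph config get osd osd_fast_shutdown_timeout  # the 15-second timeout mentioned above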

[ceph-users] OSD crash after server reboot

2023-07-18 Thread pedro . martin
Hi, We are facing an error with an OSD crash after a reboot of the server where it is installed. We rebooted the servers in our ceph cluster for patching, and after rebooting, two OSDs were crashing. One of them finally recovered, but the other is still down. The cluster is currently rebalancing objects: #
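
A hedged first-pass triage for a crashing OSD after a reboot (standard commands; the journalctl unit name applies to non-cephadm deployments):

  ceph crash ls                      # list recorded crashes
  ceph crash info <crash-id>         # backtrace of the failing OSD
  journalctl -u ceph-osd@<id> -b     # unit log since the reboot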

[ceph-users] mds terminated

2023-07-18 Thread dxodnd
hello. I am using Rook Ceph and have 20 MDSs in use. 10 are in rank 0-9 and 10 are in standby. I have one ceph filesystem, and 2 MDSs are trimming. Under one FILESYSTEM, there are 6 MDSs in RESOLVE, 1 MDS in REPLAY, and 3 in ACTIVE. For some reason, since 36 hours ago, RESOLVE is stuck in
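
While it is stuck, the per-rank states can be watched with standard commands (nothing Rook-specific):

  ceph fs status       # shows each rank and its state (resolve/replay/active)
  ceph health detail   # any MDS-related warnings, e.g. behind on trimming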

[ceph-users] Re: cephadm upgrade 16.2.10 to 16.2.11: osds crash and get stuck restarting

2023-07-18 Thread letonphat1988
On my side, I saw the OSD container trying to map and start on another device-mapper device that was in read-only mode. You could check as follows. Step 1: check the folder that stores the OSD information; the path is /var/lib/ceph/{fsid}/osd.{id}/block. When we run `ls -lah {block}`, we get a symlink like this
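
Continuing that check, a hedged sketch of the next step (device names are placeholders):

  # Step 1, as described above: the block symlink points at an LVM/device-mapper device
  ls -lah /var/lib/ceph/{fsid}/osd.{id}/block
  # Step 2: verify whether that device-mapper device is read-only
  dmsetup info /dev/mapper/<vg>-<lv>   # "State: ACTIVE (READ-ONLY)" would confirm it
  blockdev --getro /dev/dm-<N>         # prints 1 if the device is read-only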

[ceph-users] replacing all disks in a stretch mode ceph cluster

2023-07-18 Thread Zoran Bošnjak
Hello ceph users, my ceph configuration is:
- ceph version 17.2.5 on ubuntu 20.04
- stretch mode
- 2 rooms with OSDs and monitors + additional room for the tiebreaker monitor
- 4 OSD servers in each room
- 6 OSDs per OSD server
- ceph installation/administration is manual (without ansible, orch...

[ceph-users] CEPHADM_FAILED_SET_OPTION

2023-07-18 Thread Arnoud de Jonge
Hi, After having set up RadosGW with keystone authentication, the cluster shows this warning:

  # ceph health detail
  HEALTH_WARN Failed to set 1 option(s)
  [WRN] CEPHADM_FAILED_SET_OPTION: Failed to set 1 option(s)
      Failed to set rgw.fra option rgw_keystone_implicit_tenants: config set failed:

[ceph-users] ceph-mgr ssh connections left open

2023-07-18 Thread Wyll Ingersoll
Every night at midnight, our ceph-mgr daemons open up ssh connections to the other nodes and then leave them open. Eventually they become zombies. I cannot figure out what module is causing this or how to turn it off. If left unchecked over days/weeks, the zombie ssh connections just keep
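
Two standard checks that might help narrow this down (hedged; if the cluster is cephadm-managed, the cephadm mgr module is the component that opens ssh connections to other hosts):

  # list ssh child processes with their parent pid (zombies show STAT "Z")
  ps -eo pid,ppid,stat,etime,cmd | grep '[s]sh'
  # see which mgr modules are enabled
  ceph mgr module ls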

[ceph-users] cephadm does not redeploy OSD

2023-07-18 Thread Luis Domingues
Hi, We are running a ceph cluster managed with cephadm v16.2.13. Recently we needed to change a disk, and we replaced it with: ceph orch osd rm 37 --replace. It worked fine; the disk was drained and the OSD marked as destroyed. However, after changing the disk, no OSD was created. Looking to
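
Two hedged checks for this situation (both standard orchestrator commands; the hostname is a placeholder):

  # confirm the replace operation actually finished
  ceph orch osd rm status
  # check whether the new disk is seen and considered available on that host
  ceph orch device ls <hostname>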

[ceph-users] Not all Bucket Shards being used

2023-07-18 Thread Christian Kugler
Hi, I have trouble with large OMAP objects in the RGW index pool of a cluster. Some background information about the cluster: there is CephFS and RBD usage on the main cluster, but for this issue I think only S3 is interesting. There is one realm, one zonegroup with two zones which have a
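
A hedged way to see how objects are spread over the index shards (the bucket name is a placeholder):

  # per-bucket report of objects per shard and fill_status
  radosgw-admin bucket limit check
  # object counts for a single bucket
  radosgw-admin bucket stats --bucket=<bucket>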