If possible, could you share the MDS logs at debug level 20?
You'll need to set debug_mds = 20 in the conf file until the crash, and
revert the level to the default after the MDS crash.
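A minimal sketch of doing this via the config subsystem instead of editing the conf
file, assuming a recent release; "mds.a" is a placeholder for the crashing daemon:

  # raise MDS logging cluster-wide, or target a single daemon with "ceph tell"
  ceph config set mds debug_mds 20
  ceph tell mds.a config set debug_mds 20
  # revert to the default once the crash has been captured
  ceph config set mds debug_mds 1/5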
On Tue, Jul 18, 2023 at 9:12 PM wrote:
> hello.
> I am using Rook Ceph and have 20 MDSs in use. 10 are in ranks 0-9
Indeed, that's very useful. I improved the documentation for that not long ago;
it took a while to sort out exactly what it was about.
Normally LC only runs once a day as I understand it; there's a debug option
that compresses time so that it'll run more frequently, as having to wait for a
day to
You can enable LC debugging to test and tune the RGW LC parameters.
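A hedged example of that time-compression option, assuming rgw_lc_debug_interval;
with it set, RGW treats that many seconds as one "day" for LC purposes, so rules can
be exercised quickly on a test cluster (the instance name below is a placeholder):

  # make LC treat 600 seconds as a day -- test clusters only
  ceph config set client.rgw.testgw rgw_lc_debug_interval 600
  # trigger a run and check progress
  radosgw-admin lc process
  radosgw-admin lc list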
Index pool on Aerospike?
Building OSDs on PRAM might be a lot less work than trying to ensure
consistency on backing storage while still servicing out of RAM and not syncing
every transaction.
> On Jul 18, 2023, at 14:31, Peter Grandi wrote:
>
> [...] S3 workload, that will need to
Dan,
OK, I've discovered a few more things.
None of the bucket index objects show up as type 'olh' in the bi list; they are
all 'plain'. Since my objects begin with "<80>0_", this appears to be a bucket
log index as per:
static std::string bucket_index_prefixes
in cls_rgw.cc.
One of these omapkey
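For reference, a hedged way to inspect those entries; the bucket and index-shard
names below are placeholders:

  # entry types ('plain', 'olh', 'instance') as reported by the bucket index
  radosgw-admin bi list --bucket=<bucket-name> | grep '"type"' | sort | uniq -c
  # raw omap keys of a single index shard
  rados -p <zone>.rgw.buckets.index listomapkeys .dir.<marker>.<shard>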
I've seen this dynamic contribute to a hypervisor with many attachments running
out of system-wide file descriptors.
> On Jul 18, 2023, at 16:21, Konstantin Shalygin wrote:
>
> Hi,
>
> Check your libvirt limits for QEMU open files/sockets. It seems that when you added
> new OSDs, your librbd client
Hi,
Check your libvirt limits for QEMU open files/sockets. It seems that when you added new
OSDs, your librbd client limit was reached
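A hedged sketch of where those limits usually live; paths and values are only
examples and depend on the distribution:

  # /etc/libvirt/qemu.conf -- per-QEMU-process limits enforced by libvirtd
  max_files = 32768
  max_processes = 8192

  # what a running qemu process is actually allowed (PID is a placeholder)
  grep 'open files' /proc/<qemu_pid>/limits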
k
Sent from my iPhone
> On 18 Jul 2023, at 19:32, Wesley Dillingham wrote:
>
> Did your automation / process allow for stalls in between changes to allow
> peering to
Dan,
Don't worry, I won't do something blind. Thanks for the info, I hadn't thought
to try --omap-key-file.
-Chris
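For the record, a hedged example of how --omap-key-file can be used for keys that
start with the non-printable 0x80 byte; pool, object and key bytes are placeholders,
and the exact argument order may vary by release:

  # write the raw key bytes to a file, then fetch the value stored under that key
  printf '\x800_<rest-of-key>' > /tmp/omapkey
  rados -p <index-pool> getomapval .dir.<marker>.<shard> --omap-key-file /tmp/omapkey /tmp/omapval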
On Tuesday, July 18, 2023 at 12:14:18 PM MDT, Dan van der Ster
wrote:
Hi Chris,
Those objects are in the so called "ugly namespace" of the rgw, used to prefix
special
[...] S3 workload, that will need to delete 100M file
daily [...]
>> [...] average (what about peaks?) around 1,200 committed
>> deletions per second (across the traditional 3 metadata
>> OSDs) sustained, that may not leave a lot of time for file
>> creation, writing or reading. :-) [...]
Hi Chris,
Those objects are in the so-called "ugly namespace" of RGW, used to
prefix special bucket index entries.
// No UTF-8 character can begin with 0x80, so this is a safe indicator
// of a special bucket-index entry for the first byte. Note: although
// it has no impact, the 2nd, 3rd,
That part looks quite good:
"available": false,
"ceph_device": true,
"created": "2023-07-18T16:01:16.715487Z",
"device_id": "SAMSUNG MZPLJ1T6HBJR-7_S55JNG0R600354",
"human_readable_type": "ssd",
"lsm_data": {},
"lvs": [
{
Did your automation / process allow for stalls in between changes to allow
peering to complete? My hunch is you caused a very large peering storm
(during peering a PG is inactive) which in turn caused your VMs to panic.
If the RBDs are unmapped and re-mapped does it still continue to struggle?
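A hedged sketch of what such a staged change could look like; OSD ids and the target
weight are placeholders, and the grep is just one way to detect unfinished peering:

  # reweight one OSD at a time and wait for PGs to settle before the next change
  for id in $(seq 100 111); do
      ceph osd crush reweight "osd.${id}" 0.25
      while ceph pg stat | grep -Eq 'peering|activating|inactive'; do
          sleep 10
      done
  done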
Hi,
I am using ceph 17.2.6 on rocky linux 8.
I got a large omap object warning today.
OK, so I tracked it down to a shard for a bucket in the index pool of an S3
pool.
However, when listing the omap keys with:
# rados -p pool.index listomapkeys .dir.zone.bucketid.xx.indexshardnumber
it is clear
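A hedged way to confirm which shard tripped the warning is to compare the per-shard
key counts; the shard range below is a placeholder:

  for shard in $(seq 0 10); do
      echo -n "shard ${shard}: "
      rados -p pool.index listomapkeys ".dir.zone.bucketid.xx.${shard}" | wc -l
  done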
in the "ceph orch device ls --format json-pretty" output, in the blob for
that specific device, is the "ceph_device" field set? There was a bug where
it wouldn't be set at all (https://tracker.ceph.com/issues/57100) and it
would make it so you couldn't use a device serving as a db device for any
Index pool distributed over a large number of NVMe OSDs? Multiple, dedicated
RGW instances that only run LC?
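A hedged sketch of how that split is sometimes configured, using the
rgw_enable_lc_threads / rgw_enable_gc_threads options; the instance names are
placeholders:

  # client-facing instances: no background LC/GC work
  [client.rgw.frontend1]
      rgw_enable_lc_threads = false
      rgw_enable_gc_threads = false
  # dedicated instance that only runs the background threads
  [client.rgw.lcworker1]
      rgw_enable_lc_threads = true
      rgw_enable_gc_threads = true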
> On Jul 18, 2023, at 12:08, Peter Grandi wrote:
>
> On Mon, 17 Jul 2023 19:19:34 +0700, Ha Nguyen Van said:
>
>> [...] S3 workload, that will need to delete 100M file daily
>>> On Mon, 17 Jul 2023 19:19:34 +0700, Ha Nguyen Van
>>> said:
> [...] S3 workload, that will need to delete 100M file daily [...]
So many people seem to think that distributed (or even local)
filesystems (and in particular their metadata servers) can
sustain the same workload as high volume
Someone hit what I think is this same issue the other day. Do you have a
"config" section in your rgw spec that sets the
"rgw_keystone_implicit_tenants" option to "True" or "true"? For them,
changing the value to be 1 (which should be equivalent to "true" here)
instead of "true" fixed it. Likely
Starting on Friday, as part of adding a new pod of 12 servers, we initiated a
reweight on roughly 384 drives, from 0.1 to 0.25. Something about the resulting
large backfill is causing librbd to hang, requiring server restarts. The
volumes are showing buffer I/O errors when this happens. We are
Greg,
You are correct - the timeout was used during debug to make sure the fast
shutdown is indeed fast, but even if it completed after that timeout everything
should be perfectly fine.
The timeout was set to 15 seconds, which is more than enough to complete a
shutdown on a valid system (in
Hi,
We are facing an error with an OSD crash after a reboot of the server where it is
installed.
We rebooted the servers in our Ceph cluster for patching, and after rebooting two
OSDs were crashing.
One of them finally recovered but the other is still down.
The cluster is currently rebalancing objects:
#
hello.
I am using Rook Ceph and have 20 MDSs in use. 10 are in ranks 0-9 and 10 are in
standby.
I have one Ceph filesystem, and 2 MDSs are trimming.
Under one filesystem, there are 6 MDSs in RESOLVE, 1 MDS in REPLAY, and 3 in
ACTIVE.
For some reason, for the last 36 hours, RESOLVE has been stuck in
On my side, I saw the OSD container trying to map and start on another device-mapper
device that was in read-only mode. You could check it as follows:
Step 1: check the folder storing the OSD information.
The path is /var/lib/ceph/{fsid}/osd.{id}/block; when we run `ls -lah {block}`,
we will get a symlink like this
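A hedged sketch of that check; device names are placeholders, and lsblk/dmsetup are
only one way to see the read-only flag:

  # which device-mapper target does the OSD's block symlink point at?
  ls -lah /var/lib/ceph/<fsid>/osd.<id>/block
  # is that device read-only?
  lsblk -o NAME,RO,SIZE,TYPE /dev/mapper/<lv_name>     # RO=1 means read-only
  dmsetup info /dev/mapper/<lv_name>                   # State: ACTIVE (READ-ONLY) if so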
Hello ceph users,
my ceph configuration is
- ceph version 17.2.5 on ubuntu 20.04
- stretch mode
- 2 rooms with OSDs and monitors + additional room for the tiebreaker monitor
- 4 OSD servers in each room
- 6 OSDs per OSD server
- ceph installation/administration is manual (without ansible, orch...
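For context, a hedged sketch of how such a two-room stretch cluster is typically
declared; monitor names, bucket names and the rule name below are placeholders:

  # place each monitor in its room/datacenter bucket
  ceph mon set_location a datacenter=room1
  ceph mon set_location b datacenter=room2
  ceph mon set_location e datacenter=room3      # tiebreaker
  # enable stretch mode with the tiebreaker monitor, a stretch CRUSH rule,
  # and the bucket type that divides the two data sites
  ceph mon enable_stretch_mode e stretch_rule datacenter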
Hi,
After having set up RadosGW with keystone authentication the cluster shows this
warning:
# ceph health detail
HEALTH_WARN Failed to set 1 option(s)
[WRN] CEPHADM_FAILED_SET_OPTION: Failed to set 1 option(s)
Failed to set rgw.fra option rgw_keystone_implicit_tenants: config set
failed:
Every night at midnight, our ceph-mgr daemons open up ssh connections to the
other nodes and then leave them open. Eventually they become zombies.
I cannot figure out what module is causing this or how to turn it off. If left
unchecked over days/weeks, the zombie ssh connections just keep
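A hedged starting point for narrowing that down; cephadm is the mgr module that
normally owns SSH sessions to the hosts, but the process check below is only a sketch:

  # which modules are enabled, and which mgr is currently active
  ceph mgr module ls
  ceph mgr stat
  # on the active mgr host, look for ssh children of the mgr process
  ps --ppid "$(pgrep -x ceph-mgr | head -1)" -o pid,etime,cmd | grep ssh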
Hi,
We are running a ceph cluster managed with cephadm v16.2.13. Recently we needed
to change a disk, and we replaced it with:
ceph orch osd rm 37 --replace.
It worked fine, the disk was drained and the OSD was marked as destroyed.
However, after changing the disk, no OSD was created. Looking to
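A hedged checklist for the usual causes; host and spec names are placeholders:

  # is the new disk seen and considered available by the orchestrator?
  ceph orch device ls <host> --format json-pretty     # check "available" and "rejected_reasons"
  # is there an OSD service spec that should pick it up, and is it unmanaged?
  ceph orch ls osd --export
  # does the destroyed OSD id still exist for --replace to re-use?
  ceph osd tree destroyed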
Hi,
I have trouble with large OMAP objects in the RGW index pool of a cluster. Some
background information about the cluster: There is CephFS and RBD usage on the
main cluster but for this issue I think only S3 is interesting.
There is one realm, one zonegroup with two zones which have a