[ceph-users] Re: octopus (15.2.16) OSDs crash or don't answer heartbeats (and get marked as down)

2022-04-26 Thread Boris Behrens
113, "filter_policy_name": "rocksdb.BuiltinBloomFilter"}} 2022-04-24T06:54:28.689+ 7f7c76c00700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7c5d130700' had timed out after 15 2022-04-24T06:54:28.689+ 7f7c75bfe700 1 heartbeat_map is_healthy 'OSD::os

[ceph-users] Ceph Octopus RGW - files vanished from rados while still in bucket index

2022-06-13 Thread Boris Behrens
ormed a month before the files' last MTIME. Is there ANY way this could happen in some correlation with the GC, restarting/adding/removing OSDs, resharding bucket indexes, OSD crashes or anything else? Anything that isn't "rados -p POOL rm OBJECT"? Cheers Boris
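(A sketch, not from the original mail, of how the bucket index can be cross-checked against rados; BUCKET, KEY and the data pool name are placeholders and depend on the zone setup.)
    # list the rados objects the bucket index expects to exist
    radosgw-admin bucket radoslist --bucket=BUCKET > /tmp/BUCKET.radoslist
    # spot-check one of the missing entries directly in the data pool
    rados -p default.rgw.buckets.data stat 'OBJECT_NAME_FROM_THE_LIST'
    # and what RGW itself thinks about the S3 object
    radosgw-admin object stat --bucket=BUCKET --object=KEY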

[ceph-users] Re: Ceph Octopus RGW - files vanished from rados while still in bucket index

2022-06-13 Thread Boris Behrens
Hmm.. I will check what the user is deleting. Maybe this is it. Do you know if this bug is new in 15.2.16? I can't share the data, but I can share the metadata: https://pastebin.com/raw/T1YYLuec For the missing files I have, the multipart file is not available in rados, but the 0 byte file is. Th

[ceph-users] Re: Ceph Octopus RGW - files vanished from rados while still in bucket index

2022-06-14 Thread Boris Behrens
any deletes of said files I am even more clueless. Thank you for your time and your reply. Cheers Boris On Tue., 14 June 2022 at 18:38, J. Eric Ivancich <ivanc...@redhat.com> wrote: > Hi Boris, > > I’m a little confused. The pastebin seems to show that you can stat "

[ceph-users] Possible customer impact on resharding radosgw bucket indexes?

2022-07-06 Thread Boris Behrens
time. So my question is: does this somehow affect customer workload, or do I put their data in danger when I reshard while they upload files? And how do you approach this problem? Do you have a very high default for all buckets, or do you just ignore the large omap objects message? Cheers Boris
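(For reference, a minimal manual reshard sketch with a placeholder bucket name; the usual rule of thumb is roughly 100k objects per shard. As far as I know, reads keep working during a reshard while writes to the bucket are blocked until the new index is in place, so low-traffic hours are the safer window.)
    # current shard and object counts
    radosgw-admin bucket stats --bucket=BUCKET | grep -E 'num_shards|num_objects'
    # reshard to a higher shard count
    radosgw-admin bucket reshard --bucket=BUCKET --num-shards=101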

[ceph-users] creating OSD partition on blockdb ssd

2022-07-23 Thread Boris Behrens
Hi, I would like to use some of the blockdb SSD space for OSDs. We provide some radosgw clusters with 8TB and 16TB rotational OSDs. We added 2TB SSDs and use one SSD per five 8TB OSDs or three 16TB OSDs. Now there is still space left on the devices, and I thought I could just create another LV of 100GB o
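(A sketch of the idea, assuming the SSD is already an LVM PV in a VG named ceph-db-vg; names and the size are placeholders. Worth keeping in mind that such an OSD shares its failure domain with the DB/WAL devices of the rotational OSDs on the same SSD.)
    # carve a 100G LV out of the leftover space on the db SSD
    lvcreate -L 100G -n osd-ssd-data ceph-db-vg
    # create a small all-flash OSD on it
    ceph-volume lvm create --data ceph-db-vg/osd-ssd-data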

[ceph-users] Ceph Octopus RGW 15.2.17 - files not available in rados while still in bucket index

2022-08-21 Thread Boris Behrens
this error. Hope someone can tell me what this is and how I can fix it. Cheers Boris Strange errors: 2022-08-18T22:04:29.538+0000 7f7ba9fcb700 0 req 9033182355071581504 183.407425780s s3:complete_multipart WARNING: failed to remove object sql-backup-de:_multipart_IM_DIFFERENTIA

[ceph-users] Re: Ceph Octopus RGW 15.2.17 - files not available in rados while still in bucket index

2022-08-21 Thread Boris Behrens
pressure from the cluster, when the GC goes nuts. Maybe this happens together. On Sun., 21 Aug. 2022 at 19:34, Boris Behrens wrote: > Cheers everybody, > > I had this issue some time ago, and we thought it was fixed, but it seems > to happen again. > We have files that get upl
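(Not from the thread, just a sketch of how one might watch the GC backlog while this happens.)
    # how much is queued for garbage collection, including not-yet-expired entries
    radosgw-admin gc list --include-all | head -n 50
    # run a GC pass outside the regular schedule
    radosgw-admin gc process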

[ceph-users] Re: Reserve OSDs exclusive for pool

2022-08-21 Thread Boris Behrens
Hi Anthony, oh that is cool. Does the OSD overwrite it after restarts? Anything I would need to know after doing this to persist it? Cheers Boris On Sun., 21 Aug. 2022 at 20:55, Anthony D'Atri <anthony.da...@gmail.com> wrote: > Set an arbitrary device class for t
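(A sketch of the device-class approach Anthony describes; the OSD ids, the class name "reserved" and the rule/pool names are made up. As far as I know a manually set class survives restarts, because the OSD only assigns a class when none is set, but verify on a test OSD first.)
    # clear the auto-assigned class, then set the custom one
    ceph osd crush rm-device-class osd.10 osd.11 osd.12
    ceph osd crush set-device-class reserved osd.10 osd.11 osd.12
    # a replicated rule that only picks OSDs of that class
    ceph osd crush rule create-replicated reserved-rule default host reserved
    # point the pool at the new rule
    ceph osd pool set mypool crush_rule reserved-rule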

[ceph-users] large omap object in .rgw.usage pool

2022-08-26 Thread Boris Behrens
at least I have not trimmed below two months). I also tried to increase the PGs on it, but this also did not help. For normal buckets I just reshard, but I haven't found any resharding options for the usage log. Does anyone have a solution for it?
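(The usage log cannot be resharded like a bucket index, but it can be trimmed; a sketch with placeholder dates and uid. The shard count of the usage log is a config option, rgw_usage_max_shards if I remember correctly, which would only help for newly written entries.)
    # drop usage entries older than a cutoff
    radosgw-admin usage trim --end-date=2022-06-01
    # or trim a single heavy user
    radosgw-admin usage trim --uid=customer1 --end-date=2022-06-01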

[ceph-users] Downside of many rgw bucket shards?

2022-08-29 Thread Boris Behrens
Hi there, I have some buckets that would require >100 shards and I would like to ask if there are any downsides to having this many shards on a bucket? Cheers Boris

[ceph-users] Re: Downside of many rgw bucket shards?

2022-08-30 Thread Boris Behrens
, and return them to the >> client. And it has to do this in batches of about 1000 at a time. >> > >> > It looks like you’re expecting on the order of 10,000,000 objects in >> these buckets, so I imagine you’re not going to be listing them with any >> regularity.

[ceph-users] laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
Hi, I need your help really badly. We are currently experiencing very bad cluster hangups that happen sporadically (once on 2022-09-08 at midday, 48 hrs after the upgrade, and once on 2022-09-12 in the evening). We use krbd without cephx for the qemu clients, and when the OSDs are getting laggy, the krbd con

[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
Because someone mentioned that the attachments did not go through, I created pastebin links: monlog: https://pastebin.com/jiNPUrtL osdlog: https://pastebin.com/dxqXgqDz On Tue., 13 Sept. 2022 at 11:43, Boris Behrens wrote: > Hi, I need your help really badly. > > we are

[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
I checked the cluster for other snaptrim operations and they happen all over the place, so for me it looks like they just happened to be done when the issue occurred, but were not the driving factor. On Tue., 13 Sept. 2022 at 12:04, Boris Behrens wrote: > Because someone mentioned that
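(A sketch of how one might check whether snaptrim is really involved; the sleep value is just an example.)
    # PGs currently trimming snapshots, or waiting to
    ceph pg ls snaptrim
    ceph pg ls snaptrim_wait
    # throttle snap trimming if it is hammering the OSDs
    ceph config set osd osd_snap_trim_sleep 3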

[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
HZ4CMJKMI5K/ > = > Frank Schilder > AIT Risø Campus > Bygning 109, rum S14 > > ____ > From: Boris Behrens > Sent: 13 September 2022 11:43:20 > To: ceph-users@ceph.io > Subject: [ceph-users] laggy OSDs and stalling krbd IO

[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
Frank Schilder wrote: > Hi Boris. > > > 3. wait some time (took around 5-20 minutes) > > Sounds short. Might just have been the compaction that the OSDs do > anyway on startup after upgrade. I don't know how to check for completed > format conversion. What I see in your MON
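(Compaction can also be forced per OSD while it is stopped; a sketch with osd.12 as a placeholder.)
    systemctl stop ceph-osd@12
    # compact the OSD's RocksDB offline
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-12 compact
    systemctl start ceph-osd@12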

[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
efs_buffered_io to true alleviated that issue. That was on a nautilus > cluster. > > Respectfully, > > *Wes Dillingham* > w...@wesdillingham.com > LinkedIn <http://www.linkedin.com/in/wesleydillingham> > > > On Tue, Sep 13, 2022 at 10:48 AM Boris Behrens wrote:
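(For reference, a sketch of checking and flipping bluefs_buffered_io as Wes suggests; if I remember correctly, later octopus point releases default it to true again, so check the running value first. osd.0 is a placeholder.)
    # what the OSD is actually running with
    ceph daemon osd.0 config get bluefs_buffered_io
    # set it for all OSDs (may need an OSD restart to take effect)
    ceph config set osd bluefs_buffered_io true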

[ceph-users] Public RGW access without any LB in front?

2022-09-16 Thread Boris Behrens
someone got experience with it and can share some insights? Cheers Boris -- The self-help group "UTF-8 problems" is meeting in the large hall this time, as an exception.

[ceph-users] Re: Traffic between public and cluster network

2022-09-29 Thread Boris Behrens
network. I hope I understood Ceph and your question correctly :) Cheers Boris On Thu., 29 Sept. 2022 at 04:11, Murilo Morais <mur...@evocorp.com.br> wrote: > Good evening everyone. > > I set up a cluster with three machines, each with two network interfaces, > one for the
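(A minimal ceph.conf sketch of the two-network split being discussed; the subnets are placeholders.)
    [global]
        # clients, MONs, MGRs and the OSDs' public side
        public_network = 192.168.10.0/24
        # OSD replication, recovery and backfill traffic only
        cluster_network = 192.168.20.0/24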

[ceph-users] octopus 15.2.17 RGW daemons begin to crash regularly

2022-10-01 Thread Boris Behrens
Hi, we are experiencing that the rgw daemons crash and I don't understand why. Maybe someone here can lead me to a point where I can dig further. { "backtrace": [ "(()+0x43090) [0x7f143ca06090]", "(gsignal()+0xcb) [0x7f143ca0600b]", "(abort()+0x12b) [0x7f143c9e5859]",

[ceph-users] Re: Convert mon kv backend to rocksdb

2022-10-04 Thread Boris Behrens
Cheers Reed, just saw this and checked on my own. Also had one mon that ran on leveldb. I just removed the mon, pulled the new monmap and deployed it. After that all was fine. Thanks for paging the ML, so I've read it :D Boris # assuming there is only one mon and you are connected to the
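(Roughly the procedure described above, as a sketch: MON is a placeholder for the mon id, and it assumes the remaining mons keep quorum while this one is redeployed.)
    # leveldb or rocksdb?
    cat /var/lib/ceph/mon/ceph-MON/kv_backend
    systemctl stop ceph-mon@MON
    ceph mon remove MON
    mv /var/lib/ceph/mon/ceph-MON /var/lib/ceph/mon/ceph-MON.bak
    # fetch the current monmap and mon keyring, then recreate the store (rocksdb by default)
    ceph mon getmap -o /tmp/monmap
    ceph auth get mon. -o /tmp/mon.keyring
    ceph-mon -i MON --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    chown -R ceph:ceph /var/lib/ceph/mon/ceph-MON
    systemctl start ceph-mon@MON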

[ceph-users] Re: octopus 15.2.17 RGW daemons begin to crash regularly

2022-10-06 Thread Boris Behrens
Any ideas on this? On Sun., 2 Oct. 2022 at 00:44, Boris Behrens wrote: > Hi, > we are experiencing that the rgw daemons crash and I don't understand why. > Maybe someone here can lead me to a point where I can dig further. > > { > "backtrace": [ >

[ceph-users] rgw multisite octopus - bucket can not be resharded after cancelling prior reshard process

2022-10-07 Thread Boris Behrens
}, "mtime": "2022-10-07T07:16:49.231685Z", "data": { "bucket_info": { "bucket": { "name": "bucket", "marker": "ff7a8b0c-07e6-463a-861b-78f0adeba8ad.2296

[ceph-users] Re: octopus 15.2.17 RGW daemons begin to crash regularly

2022-10-07 Thread Boris Behrens
Hi Casey, thanks a lot. I added the full stack trace from our ceph-client log. Cheers Boris On Thu., 6 Oct. 2022 at 19:21, Casey Bodley wrote: > hey Boris, > > that looks a lot like https://tracker.ceph.com/issues/40018 where an > exception was thrown when trying to rea

[ceph-users] Re: rgw multisite octopus - bucket can not be resharded after cancelling prior reshard process

2022-10-24 Thread Boris Behrens
Cheers again. I am still stuck at this. Does anyone have an idea how to fix it? On Fri., 7 Oct. 2022 at 11:30, Boris Behrens wrote: > Hi, > I just wanted to reshard a bucket but mistyped the number of shards. In a > reflex I hit ctrl-c and waited. It looked like the resharding did not
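(The reshard state I would look at for a bucket stuck like this, as a sketch; BUCKET is a placeholder.)
    # is the bucket still queued for resharding?
    radosgw-admin reshard list
    radosgw-admin reshard status --bucket=BUCKET
    # try to cancel the stale reshard entry again
    radosgw-admin reshard cancel --bucket=BUCKET
    # and check the index afterwards
    radosgw-admin bucket check --bucket=BUCKET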

[ceph-users] Re: rgw multisite octopus - bucket can not be resharded after cancelling prior reshard process

2022-10-25 Thread Boris Behrens
Opened a bug on the tracker for it: https://tracker.ceph.com/issues/57919 On Fri., 7 Oct. 2022 at 11:30, Boris Behrens wrote: > Hi, > I just wanted to reshard a bucket but mistyped the number of shards. In a > reflex I hit ctrl-c and waited. It looked like the resharding did not

[ceph-users] MAX AVAIL goes up when I reboot an OSD node

2020-05-28 Thread Boris Behrens
autilus. Cheers and thanks in advance Boris

[ceph-users] Re: MAX AVAIL goes up when I reboot an OSD node

2020-05-29 Thread Boris Behrens
Well, this happens when any OSD goes offline. (I stopped a single OSD service on one of our OSD nodes) On Fri, May 29, 2020 at 8:44 AM KervyN wrote: > > Hi Eugene, > no. The mgr services are located on our mon servers. > > This happens when I reboot any OSD node. > > >

[ceph-users] Re: MAX AVAIL goes up when I reboot an OSD node

2020-05-29 Thread Boris Behrens
Hi Sinan, this happens with any node, and any single OSD. On Fri, May 29, 2020 at 10:09 AM si...@turka.nl wrote: > > Does this happen with any random node or is it specific to one node? > > If specific to one node, does this node hold more data compared to other nodes > (ceph osd df)? > > Sinan Polat
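(Not from the thread, but a possible explanation: MAX AVAIL per pool is derived from the most-full OSD that the pool's CRUSH rule can place data on, so when that OSD goes down the bottleneck can shift and the number moves. A quick way to look at it:)
    # per-OSD utilisation grouped by CRUSH bucket
    ceph osd df tree
    # per-pool STORED / MAX AVAIL
    ceph df detail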

[ceph-users] enabling pg_autoscaler on a large production storage?

2020-06-16 Thread Boris Behrens
Hi, I would like to enable the pg_autoscaler on our nautilus cluster. Someone told me that I should be really, really careful NOT to have customer impact. Maybe someone can share some experience on this? The cluster has 455 OSDs on 19 hosts with ~17000 PGs and ~1 petabyte of raw storage, where ~600TB
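(A cautious way to start, as a sketch with a placeholder pool name: run the autoscaler in warn-only mode and review what it would do before letting it move anything.)
    # warn instead of acting, per pool
    ceph osd pool set mypool pg_autoscale_mode warn
    # review the proposed PG counts
    ceph osd pool autoscale-status
    # optionally steer it with expected pool sizes before switching to "on"
    ceph osd pool set mypool target_size_ratio 0.6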

[ceph-users] Re: enabling pg_autoscaler on a large production storage?

2020-06-16 Thread Boris Behrens
eak all the target_size_ratio or > target_size_bytes accordingly. > > BTW, do you have some feeling that your 17000 PGs are currently not > correctly proportioned for your cluster? > > -- Dan > > On Tue, Jun 16, 2020 at 11:31 AM Boris Behrens wrote: > > > > Hi, &

[ceph-users] Re: enabling pg_autoscaler on a large production storage?

2020-06-16 Thread Boris Behrens
See inline comments. On Tue., 16 June 2020 at 13:29, Zhenshi Zhou wrote: > > I did this on my cluster and there was a huge number of PGs rebalanced. > I think setting this option to 'on' is a good idea if it's a brand new > cluster. > On our new cluster we enabled them, but not on our primary

[ceph-users] Re: enabling pg_autoscaler on a large production storage?

2020-06-16 Thread Boris Behrens
67.21 96 TiB | pool 4: 7.9 MiB stored, 2.01k objects, 68 MiB used, 0 %USED, 96 TiB MAX AVAIL | pool 5: 19 B stored, 2 objects, 36 KiB used, 0 %USED, 96 TiB MAX AVAIL. On Tue., 16 June 2020 at 14:13, Dan van der Ster wrote: > > On Tue, Jun 16, 2020 at 2:00 PM Boris B

[ceph-users] Bluestore cache parameter precedence

2020-02-04 Thread Boris Epstein
a certain value. Thanks in advance. Boris.

[ceph-users] Re: Bluestore cache parameter precedence

2020-02-04 Thread Boris Epstein
be happy to correct that. Regards, Boris. On Tue, Feb 4, 2020 at 12:10 PM Igor Fedotov wrote: > Hi Boris, > > general settings (unless they are set to zero) override disk-specific > settings. > > I.e. bluestore_cache_size overrides both bluestore_cache_size_hdd and > blues
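(So, as I read Igor's answer: the per-media options only apply while the generic one stays at 0; a sketch, the byte values are just examples.)
    # a non-zero generic value wins for all OSDs
    ceph config set osd bluestore_cache_size 4294967296
    # these only take effect while bluestore_cache_size is left at its default of 0
    ceph config set osd bluestore_cache_size_hdd 2147483648
    ceph config set osd bluestore_cache_size_ssd 4294967296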
