[ceph-users] Re: unknown PGs after adding hosts in different subtree

2024-05-24 Thread Frank Schilder
have time, it would be great if you could collect information on (reproducing) the fatal peering problem. While remappings might be "unexpectedly expected" it is clearly a serious bug that incomplete and unknown PGs show up in the process of adding hosts at the root. Best regards, ==

[ceph-users] Re: unknown PGs after adding hosts in different subtree

2024-05-23 Thread Frank Schilder
e it happen separate from adding and not a total mess with everything in parallel. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Frank Schilder Sent: Thursday, May 23, 2024 6:32 PM To: Eugen Block Cc: ceph-u

[ceph-users] Re: unknown PGs after adding hosts in different subtree

2024-05-23 Thread Frank Schilder
unresolved. In case you need to file a tracker, please consider referring to the two above as well as "might be related" if you deem that they might be related. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: does the RBD client block write when the Watcher times out?

2024-05-23 Thread Frank Schilder
job. The rbd interface just provides the tools to do it, for example, you can attach information that helps you hunting down dead-looking clients and kill them proper before mapping an image somewhere else. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: unknown PGs after adding hosts in different subtree

2024-05-23 Thread Frank Schilder
rds, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: Thursday, May 23, 2024 1:26 PM To: Frank Schilder Cc: ceph-users@ceph.io Subject: Re: [ceph-users] Re: unknown PGs after adding hosts in different subtree Hi Fr

[ceph-users] Re: unknown PGs after adding hosts in different subtree

2024-05-23 Thread Frank Schilder
ocess. Can you please check if my interpretation is correct and describe at which step exactly things start diverging from my expectations. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: Thursday,

[ceph-users] Re: How network latency affects ceph performance really with NVME only storage?

2024-05-22 Thread Frank Schilder
Hi Stefan, ahh OK, misunderstood your e-mail. It sounded like it was a custom profile, not a standard one shipped with tuned. Thanks for the clarification! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Stefan Bauer Sent

[ceph-users] Re: How network latency affects ceph performance really with NVME only storage?

2024-05-22 Thread Frank Schilder
Hi Stefan, can you provide a link to or copy of the contents of the tuned-profile so others can also profit from it? Thanks! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Stefan Bauer Sent: Wednesday, May 22, 2024 10:51 AM

[ceph-users] Re: dkim on this mailing list

2024-05-21 Thread Frank Schilder
Hi Marc, in case you are working on the list server, at least for me the situation seems to have improved no more than 2-3 hours ago. My own e-mails to the list now pass. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: Please discuss about Slow Peering

2024-05-21 Thread Frank Schilder
a lot with IO, recovery, everything. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Anthony D'Atri Sent: Tuesday, May 21, 2024 3:06 PM To: 서민우 Cc: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users]

[ceph-users] Re: Please discuss about Slow Peering

2024-05-21 Thread Frank Schilder
, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: 서민우 Sent: Tuesday, May 21, 2024 11:25 AM To: Anthony D'Atri Cc: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Please discuss about Slow Peering We used the "kioxia kcd6xvul3t20"

[ceph-users] Re: Please discuss about Slow Peering

2024-05-16 Thread Frank Schilder
latencies for your drives when peering and look for something that sticks out. People on this list were reporting quite bad results for certain infamous NVMe brands. If you state your model numbers, someone else might recognize it. Best regards, = Frank Schilder AIT Risø Campus
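
A minimal way to collect the per-drive latencies mentioned above, assuming sysstat's iostat and that the OSD data devices are /dev/sd* (the device list is a placeholder):

    # watch extended device statistics once per second while PGs peer;
    # one device with much higher r_await/w_await than its siblings
    # usually points at the slow drive
    iostat -x 1 /dev/sd?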

[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-30 Thread Frank Schilder
still have the disk of the down OSD. Someone will send you the export/import commands within a short time. So stop worrying and just administrate your cluster with common storage admin sense. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14
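
A hedged sketch of the export/import procedure alluded to above; OSD ids, the PG id and the backup path are placeholders, and both OSDs must be stopped while the tool runs:

    # export the PG from the disk of the down OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-57 \
        --pgid 2.1a --op export --file /mnt/backup/pg-2.1a.export
    # import it into a healthy (stopped) OSD, then start that OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --op import --file /mnt/backup/pg-2.1a.export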

[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-30 Thread Frank Schilder
e a chance to recover data. Look at the manual of ddrescue why it is important to stop IO from a failing disk as soon as possible. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: Saturday, April 27, 202

[ceph-users] Re: Latest Doco Out Of Date?

2024-04-24 Thread Frank Schilder
an save time on the documentation, because it works like other stuff. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: Wednesday, April 24, 2024 9:02 AM To: ceph-users@ceph.io Subject: [ceph-u

[ceph-users] (deep-)scrubs blocked by backfill

2024-04-17 Thread Frank Schilder
when things will go back to normal. Thanks a lot and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: Have a problem with haproxy/keepalived/ganesha/docker

2024-04-16 Thread Frank Schilder
happen with this specific HA set-up in the original request, but a fail-over of the NFS server ought to be handled gracefully by starting a new one up with the IP of the down one. Or not? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: Performance improvement suggestion

2024-03-04 Thread Frank Schilder
>>> Fast write enabled would mean that the primary OSD sends #size copies to the >>> entire active set (including itself) in parallel and sends an ACK to the >>> client as soon as min_size ACKs have been received from the peers (including >>> itself). In this way, one can tolerate (size-min_size)
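
Worked out with illustrative numbers, the tolerance claim in the proposal reads:

    tolerated slow/failed peers at ACK time = size - min_size
    replicated pool, size=3, min_size=2:       3 - 2  = 1
    EC pool, k=8, m=3 (size=11), min_size=9:   11 - 9 = 2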

[ceph-users] Re: Performance improvement suggestion

2024-03-04 Thread Frank Schilder
part and external connections for the remote parts. It would be great to have similar ways of mitigating some penalties of the slow write paths to remote sites. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Peter Grandi

[ceph-users] Re: 6 pgs not deep-scrubbed in time

2024-01-29 Thread Frank Schilder
result will be. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Michel Niyoyita Sent: Monday, January 29, 2024 2:04 PM To: Janne Johansson Cc: Frank Schilder; E Taka; ceph-users Subject: Re: [ceph-users] Re: 6 pgs not deep

[ceph-users] Re: 6 pgs not deep-scrubbed in time

2024-01-29 Thread Frank Schilder
eps)/2 > 1. For spinners a consideration looking at the actually available drive performance is required, plus a few things more, like PG count, distribution etc. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Wesley Dil

[ceph-users] Re: 6 pgs not deep-scrubbed in time

2024-01-29 Thread Frank Schilder
increasing the PG count for pools with lots of data. This should already relax the situation somewhat. Then do the calc above and tune deep-scrub times per pool such that they match with disk performance. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14
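
A sketch of the per-pool tuning this refers to; the pool name and the 4-week interval (given in seconds) are illustrative only:

    ceph osd pool set ec-data deep_scrub_interval 2419200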

[ceph-users] Re: 1 clients failing to respond to cache pressure (quincy:17.2.6)

2024-01-27 Thread Frank Schilder
hen until it's fixed. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: 1 clients failing to respond to cache pressure (quincy:17.2.6)

2024-01-26 Thread Frank Schilder
, it has no performance or otherwise negative impact. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: Friday, January 26, 2024 10:05 AM To: Özkan Göksu Cc: ceph-users@ceph.io Subject: [ceph-users

[ceph-users] Re: Degraded PGs on EC pool when marking an OSD out

2024-01-24 Thread Frank Schilder
this resolves the PG. If so, there is a temporary condition that prevents the PGs from becoming clean when going through the standard peering procedure. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Bl

[ceph-users] List contents of stray buckets with octopus

2024-01-24 Thread Frank Schilder
://tracker.ceph.com/issues/57059 so a "dump tree" will not work. In addition, I clearly don't just need the entries in cache, I need a listing of everything. How can I get that? I'm willing to run rados commands and pipe through ceph-dencoder if necessary. Thanks and best regards, =

[ceph-users] Re: Degraded PGs on EC pool when marking an OSD out

2024-01-22 Thread Frank Schilder
valid mappings, you can pull the osdmap of your cluster and use osdmaptool to experiment with it without risk of destroying anything. It allows you to try different crush rules and failure scenarios on off-line but real cluster meta-data. Best regards, ===== Frank Schilder AIT Risø Cam
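
A minimal sketch of that offline workflow; the pool id and file paths are placeholders:

    ceph osd getmap -o /tmp/osdmap                    # pull the live osdmap
    osdmaptool /tmp/osdmap --test-map-pgs --pool 12   # current mappings
    osdmaptool /tmp/osdmap --export-crush /tmp/crush.bin
    crushtool -d /tmp/crush.bin -o /tmp/crush.txt     # edit rules here
    crushtool -c /tmp/crush.txt -o /tmp/crush.new
    osdmaptool /tmp/osdmap --import-crush /tmp/crush.new
    osdmaptool /tmp/osdmap --test-map-pgs --pool 12   # re-check mappings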

[ceph-users] Re: Adding OSD's results in slow ops, inactive PG's

2024-01-18 Thread Frank Schilder
, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: Thursday, January 18, 2024 9:46 AM To: ceph-users@ceph.io Subject: [ceph-users] Re: Adding OSD's results in slow ops, inactive PG's I'm glad to hear (or read) that it worked

[ceph-users] Re: Performance impact of Heterogeneous environment

2024-01-18 Thread Frank Schilder
, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Bailey Allison Sent: Thursday, January 18, 2024 12:36 AM To: ceph-users@ceph.io Subject: [ceph-users] Re: Performance impact of Heterogeneous environment +1 to this, great article

[ceph-users] Re: Recomand number of k and m erasure code

2024-01-15 Thread Frank Schilder
in it. We had no service outages during such operations. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Anthony D'Atri Sent: Saturday, January 13, 2024 5:36 PM To: Phong Tran Thanh Cc: ceph-users@ceph.io Subject

[ceph-users] Re: 3 DC with 4+5 EC not quite working

2024-01-12 Thread Frank Schilder
Is it maybe this here: https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon I always have to tweak the num-tries parameters. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14
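
Roughly the tweak from that doc section, assuming rule id 1 and a 4+5 EC profile (num-rep 9); all values are illustrative:

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    # in the affected rule, add as the first step:
    #     step set_choose_tries 100
    crushtool -c crush.txt -o crush.new
    crushtool -i crush.new --test --show-bad-mappings --rule 1 --num-rep 9
    ceph osd setcrushmap -i crush.new   # only once no bad mappings remain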

[ceph-users] Re: Rack outage test failing when nodes get integrated again

2024-01-11 Thread Frank Schilder
to file a tracker issue. I observed this with mimic, but since you report it for Pacific I'm pretty sure it's affecting all versions. My guess is that this is not part of the CI testing, at least not in a way that covers network cut-off. Best regards, = Frank Schilder AIT Risø Campus

[ceph-users] Re: How to configure something like osd_deep_scrub_min_interval?

2024-01-09 Thread Frank Schilder
waiting for some deep-scrub histograms to converge to equilibrium. This takes months for our large pools, but I would like to have the numbers for an example of how it should look. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: How to configure something like osd_deep_scrub_min_interval?

2023-12-15 Thread Frank Schilder
Hi all, another quick update: please use this link to download the script: https://github.com/frans42/ceph-goodies/blob/main/scripts/pool-scrub-report The one I sent originally does not follow latest. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: How to configure something like osd_deep_scrub_min_interval?

2023-12-13 Thread Frank Schilder
ful in general. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: increasing number of (deep) scrubs

2023-12-13 Thread Frank Schilder
Yes, octopus. -- Frank = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Szabo, Istvan (Agoda) Sent: Wednesday, December 13, 2023 6:13 AM To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: increasing number

[ceph-users] Re: increasing number of (deep) scrubs

2023-12-12 Thread Frank Schilder
ick update in the other thread, because the solution was not to increase the number of scrubs, but to tune parameters. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Frank Schilder Sent: Monday, January 9, 20

[ceph-users] Re: How to configure something like osd_deep_scrub_min_interval?

2023-12-12 Thread Frank Schilder
on.ceph-01 mon_warn_pg_not_deep_scrubbed_ratio=0.75 warn: 24.5d Best regards, merry Christmas and a happy new year to everyone! ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14
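
The quoted 24.5d follows from the warn-threshold arithmetic, assuming a 14-day osd_deep_scrub_interval:

    # warning fires after: osd_deep_scrub_interval * (1 + ratio)
    # 14d * (1 + 0.75) = 24.5d
    ceph config set mon mon_warn_pg_not_deep_scrubbed_ratio 0.75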

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-12-08 Thread Frank Schilder
Hi Xiubo, I will update the case. I'm afraid this will have to wait a little bit though. I'm too occupied for a while and also don't have a test cluster that would help speed things up. I will update you, please keep the tracker open. Best regards, = Frank Schilder AIT Risø

[ceph-users] Re: EC Profiles & DR

2023-12-06 Thread Frank Schilder
o way around it. I was happy when I got the extra hosts. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Curt Sent: Wednesday, December 6, 2023 3:56 PM To: Patrick Begou Cc: ceph-users@ceph.io Subject: [ceph-user

[ceph-users] ceph df reports incorrect stats

2023-12-06 Thread Frank Schilder
host ceph-20
-64   99.77657  host ceph-21
-66  103.56137  host ceph-22
 -1          0  root default
Best regards, =====

[ceph-users] Re: [ext] CephFS pool not releasing space after data deletion

2023-12-02 Thread Frank Schilder
Hi Mathias, have you made any progress on this? Did the capacity become available eventually? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Kuhring, Mathias Sent: Friday, October 27, 2023 3:52 PM To: ceph

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-12-01 Thread Frank Schilder
include the part executed on the second host explicitly in an ssh-command. Running your scripts alone in their current form will not reproduce the issue. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Xiubo Li

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-24 Thread Frank Schilder
to know the python and libc versions. We observe this only for newer versions of both. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Xiubo Li Sent: Thursday, November 23, 2023 3:47 AM To: Frank Schilder; Gregory

[ceph-users] Re: Full cluster outage when ECONNREFUSED is triggered

2023-11-24 Thread Frank Schilder
. If not, then there is something wrong with the down reporting that should be looked at. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: Friday, November 24, 2023 1:20 PM To: Denis Krienbühl; Burkhard

[ceph-users] Re: Full cluster outage when ECONNREFUSED is triggered

2023-11-24 Thread Frank Schilder
igated the relevant code lines, please update/create the tracker with your findings. Hope a dev looks at this. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Denis Krienbühl Sent: Friday, November 24, 2023 12:04 PM To: Bu

[ceph-users] Re: Full cluster outage when ECONNREFUSED is triggered

2023-11-24 Thread Frank Schilder
the connection error. I think the intention is to shut down fast the OSDs with connection refused (where timeouts are not required) and not other OSDs. A bug report with tracker seems warranted. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-11-22 Thread Frank Schilder
directories to ranks, all our problems disappeared and performance improved a lot. MDS load dropped from 130% average to 10-20%. So did memory consumption and cache recycling. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: How to use hardware

2023-11-20 Thread Frank Schilder
with large min_alloc_sizes has to be S3-like, only upload, download and delete are allowed. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Anthony D'Atri Sent: Saturday, November 18, 2023 3:24 PM To: Simon Kepp Cc: Albert

[ceph-users] Re: How to configure something like osd_deep_scrub_min_interval?

2023-11-16 Thread Frank Schilder
to. This will need 1-2 months observations and I will report back when significant changes show up. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: Wednesday, November 15, 2023 11:14 AM To: ceph

[ceph-users] How to configure something like osd_deep_scrub_min_interval?

2023-11-15 Thread Frank Schilder
for(pg in pgs) {
    split(pg_osds[pgs[pg]], osds)
    for(o in osds) if(osd[osds[o]]=="busy") osds_busy=1
    if(osds_busy) printf(" %s*", pgs[pg])
    if(!osds_

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-10 Thread Frank Schilder
the python >> findings above, is this something that should work on ceph or is it a python >> issue? > > Not sure yet. I need to understand what exactly shutil.copy does in kclient. Thanks! Will wait for further instructions. = Frank Schilder AIT Risø Campus B

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-09 Thread Frank Schilder
by setting: > [...] I can do a test with MDS logs on high level. Before I do that, looking at the python findings above, is this something that should work on ceph or is it a python issue? Thanks for your help! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 __

[ceph-users] Re: MDS stuck in rejoin

2023-11-09 Thread Frank Schilder
regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Xiubo Li Sent: Wednesday, November 8, 2023 1:38 AM To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: MDS stuck in rejoin Hi Frank, Recently I found

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-03 Thread Frank Schilder
0-bare These are easybuild python modules using different gcc versions to build. The default version of python referred to is Python 2.7.5. Is this a known problem with python3 and is there a patch we can apply? I wonder how python manages to break the file system so consistently. Thanks and best r

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-02 Thread Frank Schilder
to reboot later today the server where the file was written. Until then we can do diagnostics while the issue is visible. Please let us know what information we can provide. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From

[ceph-users] ceph fs (meta) data inconsistent

2023-11-01 Thread Frank Schilder
different numbers, I see a 0 length now everywhere for the moved folder. I'm pretty sure though that the file still is non-zero length. Thanks for any pointers. = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: find PG with large omap object

2023-10-31 Thread Frank Schilder
ting pools without asking. It would be great if you could add a sanity check that confirms that RGW services are actually present *before* executing any radosgw-admin command and exiting if none are present. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 10

[ceph-users] Combining masks in ceph config

2023-10-25 Thread Frank Schilder
a syntax error, but I'm also not sure it does the right thing. Does the above mean "class:hdd and datacenter:A" or does it mean "for OSDs with device class 'hdd,datacenter:A'"? Thanks and best regards, ===== Frank Schilder AIT Risø Campus
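
For reference, the two single-mask forms that are documented (values are placeholders); whether two masks can be combined in one specification is exactly the open question of this thread:

    ceph config set osd/class:hdd    osd_memory_target 4294967296
    ceph config set osd/datacenter:A osd_max_backfills 2
    # verify what a given daemon actually receives:
    ceph config get osd.0 osd_memory_target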

[ceph-users] Re: stuck MDS warning: Client HOST failing to respond to cache pressure

2023-10-19 Thread Frank Schilder
to show 2% CPU usage even though there was no file IO going on. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: Thursday, October 19, 2023 10:02 AM To: Stefan Kooman; ceph-users@ceph.io Subject

[ceph-users] Re: stuck MDS warning: Client HOST failing to respond to cache pressure

2023-10-19 Thread Frank Schilder
he client have so many caps allocated? Is there another way than open files that requires allocations? - Is there a way to find out what these caps are for? - We will look at the code (it's python+miniconda), any pointers what to look for? Thanks and best regards, = Frank Schilder AIT Risø

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-18 Thread Frank Schilder
Hi Zakhar, since it's a bit beyond the scope of the basics, could you please post the complete ceph.conf config section for these changes for reference? Thanks! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Zakhar Kirpichenko

[ceph-users] Re: stuck MDS warning: Client HOST failing to respond to cache pressure

2023-10-18 Thread Frank Schilder
ing caches for a bunch of users who are *not* running a job. I guess I have to wait for the jobs to end. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Loïc Tortay Sent: Tuesday, October 17, 2023 3:40 PM To: Frank Schi

[ceph-users] Re: stuck MDS warning: Client HOST failing to respond to cache pressure

2023-10-17 Thread Frank Schilder
, a user-level command that I could execute on the client without possibly affecting other users jobs. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Stefan Kooman Sent: Tuesday, October 17, 2023 11:13 AM

[ceph-users] stuck MDS warning: Client HOST failing to respond to cache pressure

2023-10-17 Thread Frank Schilder
t to evict/reboot just because of that. Thanks and best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: find PG with large omap object

2023-10-16 Thread Frank Schilder
issued a deep-scrub on this PG and the warning is resolved. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: Monday, October 16, 2023 2:41 PM To: Frank Schilder Cc: ceph-users@ceph.io

[ceph-users] Re: find PG with large omap object

2023-10-16 Thread Frank Schilder
I still don't know and can neither conclude which PG the warning originates from. As far as I can tell, the warning should not be there. Do you have an idea how to continue diagnosis from here apart from just trying a deep scrub on all PGs in the list from the log? Thanks and best regards, ==
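
One way to narrow this down, assuming default log locations and a placeholder PG id: the cluster log names the object and PG that tripped the warning, and a deep scrub of that one PG re-evaluates it:

    grep "Large omap object found" /var/log/ceph/ceph.log
    ceph pg deep-scrub 2.1a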

[ceph-users] find PG with large omap object

2023-10-16 Thread Frank Schilder
that ought to be part of "ceph health detail". Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: Please help collecting stats of Ceph monitor disk writes

2023-10-13 Thread Frank Schilder
l do. It's the start time of the log that's missing. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: Please help collecting stats of Ceph monitor disk writes

2023-10-13 Thread Frank Schilder
/s wr Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Frank Schilder
so provide extra endurance with SSDs with good controllers. I also think the recommendations on the ceph docs deserve a reality check. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Zakhar Kirpichenko Sent:

[ceph-users] Re: Ceph 16.2.x mon compactions, disk writes

2023-10-11 Thread Frank Schilder
MON store for a healthy cluster is 500M-1G, but we have seen this ballooning to 100+GB in degraded conditions. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Zakhar Kirpichenko Sent: Wednesday, October 11, 202

[ceph-users] Re: backfill_wait preventing deep scrubs

2023-09-21 Thread Frank Schilder
Thanks! Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Mykola Golub Sent: Thursday, September 21, 2023 4:53 PM To: Frank Schilder Cc: ceph-users@ceph.io Subject: Re: [ceph-users] backfill_wait preventing deep

[ceph-users] backfill_wait preventing deep scrubs

2023-09-21 Thread Frank Schilder
and IOP/s available to deep-scrub PGs on the side, but since the backfill started there is zero scrubbing/deep scrubbing going on and "PGs not deep scrubbed in time" messages are piling up. Is there a way to allow (deep) scrub in this situation? Thanks and best regards, =
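
The obvious knob to try in this situation (default false; whether it is advisable under heavy backfill is a judgment call):

    ceph config set osd osd_scrub_during_recovery true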

[ceph-users] Re: libceph: mds1 IP+PORT wrong peer at address

2023-09-20 Thread Frank Schilder
to reboot the client hosts. Please specify what is trying to reach the outdated OSD instances. Then a relevant developer is more likely to look at it. Since it's not MDS-kclient interaction it might be useful to open a new case. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 10

[ceph-users] Re: MDS daemons don't report any more

2023-09-12 Thread Frank Schilder
==== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Patrick Donnelly Sent: Monday, September 11, 2023 7:51 PM To: Frank Schilder Cc: ceph-users@ceph.io Subject: Re: [ceph-users] Re: MDS daemons don't report any more Hello Frank, On Mon, Sep 11, 2023

[ceph-users] Re: MDS daemons don't report any more

2023-09-11 Thread Frank Schilder
yet harmless bug? Thanks for any help! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: Sunday, September 10, 2023 12:39 AM To: ceph-users@ceph.io Subject: [ceph-users] MDS daemons don't report any more Hi all

[ceph-users] Re: MGR executes config rm all the time

2023-09-11 Thread Frank Schilder
Thanks for getting back. We are on octopus, looks like I can't do much about it. I hope it is just spam and harmless otherwise. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Mykola Golub Sent: Sunday

[ceph-users] MGR executes config rm all the time

2023-09-10 Thread Frank Schilder
eph-25' cmd=[{"prefix":"config rm","who":"mgr","name":"mgr/rbd_support/ceph-25/mirror_snapshot_schedule"}]: dispatch We don't have mirroring, it's a single cluster. What is going on here and how can I stop that? I already restarted all MGR daemons to no avail. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] MDS daemons don't report any more

2023-09-09 Thread Frank Schilder
covery: 8.7 GiB/s, 3.41k objects/s My first thought is that the status module failed. However, I don't manage to restart it (always on). An MGR fail-over did not help. Any ideas what is going on here? Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: Is it possible (or meaningful) to revive old OSDs?

2023-09-07 Thread Frank Schilder
norecover and norebalance until you see in the OSD log that they have the latest OSD map version. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Malte Stroem Sent: Wednesday, September 6, 2023 4:16 PM
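
A sketch of that sequence; watch the OSD logs in between to confirm the revived OSDs have caught up on osdmap epochs:

    ceph osd set norecover
    ceph osd set norebalance
    # start the old OSDs, wait until their logs show the current epoch
    ceph osd unset norecover
    ceph osd unset norebalance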

[ceph-users] Re: A couple OSDs not starting after host reboot

2023-08-29 Thread Frank Schilder
h-volume inventory" says and if you can manually activate/deactivate the OSD on disk (be careful to include the --no-systemd option everywhere to avoid unintended change of persistent configurations). Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: Client failing to respond to capability release

2023-08-23 Thread Frank Schilder
Hi Dhairya, this is the thing, the client appeared to be responsive and worked fine (file system was on-line and responsive as usual). There was something off though; see my response to Eugen. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: Client failing to respond to capability release

2023-08-23 Thread Frank Schilder
that out. On first inspection these clients looked OK. Only some deeper debugging revealed that something was off. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: Wednesday, August 23, 2023 8:55 AM

[ceph-users] Client failing to respond to capability release

2023-08-22 Thread Frank Schilder
quot;h":"sn352.hpc.ait.dtu.dk","addr":"client.145698301 v1:192.168.57.223:0/2146607320","fs":"/hpc/groups","caps":7,"req":0} We have mds_min_caps_per_client=4096, so it looks like the limit is well satisfied. Also, the file system is pretty idle at the moment. Why and what exactly is the MDS complaining about here? Thanks and best regards. = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: snaptrim number of objects

2023-08-21 Thread Frank Schilder
, rebuilding the OSDs might be the only way out. You should also confirm that all ceph-daemons are on the same version and that require-osd-release is reporting the same major version as well: ceph report | jq '.osdmap.require_osd_release' Best regards, = Frank Schilder AIT Risø Campus
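
The corresponding checks, with the release name as a placeholder:

    ceph versions                                    # one version for all daemons
    ceph report | jq '.osdmap.require_osd_release'   # should match the major release
    ceph osd require-osd-release octopus             # raise it if it lags behind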

[ceph-users] Re: MDS stuck in rejoin

2023-08-08 Thread Frank Schilder
, I'm eagerly waiting for this and another one. Any idea when they might show up in distro kernels? Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Xiubo Li Sent: Tuesday, August 8, 2023 2:57 AM

[ceph-users] Re: MDS stuck in rejoin

2023-08-07 Thread Frank Schilder
[Mon Jul 31 09:46:41 2023] ceph: mds0 reconnect denied
[Mon Jul 31 09:46:41 2023] ceph: mds5 reconnect denied
[Mon Jul 31 09:46:41 2023] ceph: mds3 reconnect denied
[Mon Jul 31 09:46:41 2023] ceph: mds7 reconnect denied
[Mon Jul 31 09:46:41 2023] ceph: mds4 reconnect denied
Best regards, ===== Frank Sc

[ceph-users] Re: MDS stuck in rejoin

2023-07-31 Thread Frank Schilder
regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Xiubo Li Sent: Monday, July 31, 2023 4:12 AM To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: MDS stuck in rejoin Hi Frank, On 7/30/23 16:52, Frank

[ceph-users] Re: MDS stuck in rejoin

2023-07-30 Thread Frank Schilder
it in this state until I return from holidays. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Xiubo Li Sent: Friday, July 28, 2023 11:37 AM To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: MDS stuck

[ceph-users] Re: MDS stuck in rejoin

2023-07-26 Thread Frank Schilder
ce its oldest_client_tid (16121616), 10 completed requests recorded in session Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: MDS stuck in rejoin

2023-07-24 Thread Frank Schilder
like our situation was of a more harmless nature. Still, the fail did not go entirely smooth. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Xiubo Li Sent: Friday, July 21, 2023 1:32 PM To: ceph-users@ceph.io

[ceph-users] MDS stuck in rejoin

2023-07-20 Thread Frank Schilder
ssage like the above, what is the clean way of getting the client clean again (version: 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable))? Thanks and best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: OSD memory usage after cephadm adoption

2023-07-17 Thread Frank Schilder
in even the latest docs: https://docs.ceph.com/en/quincy/rados/configuration/ceph-conf/#sections-and-masks . Would be great if someone could add this. Thanks! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Luis Domingues Sent

[ceph-users] Re: Cluster down after network outage

2023-07-13 Thread Frank Schilder
lying to my messages and for pointing me to osd_recovery_delay_start. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan van der Ster Sent: Wednesday, July 12, 2023 6:58 PM To: Frank Schilder Cc: ceph-users@ceph.io Subject: Re: [ce
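
For reference, the option mentioned takes a delay in seconds before recovery starts on a (re)booted OSD; the value is illustrative:

    ceph config set osd osd_recovery_delay_start 60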

[ceph-users] Re: Cluster down after network outage

2023-07-12 Thread Frank Schilder
Answering myself for posterity. The rebalancing list disappeared after waiting even longer. Might just have been an MGR that needed to catch up. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder

[ceph-users] Re: Cluster down after network outage

2023-07-12 Thread Frank Schilder
) [===.] Rebalancing after osd.142 marked in (2s) [===.] Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: Wednesday, July 12, 2023 9:53

[ceph-users] Cluster down after network outage

2023-07-12 Thread Frank Schilder
: client: 1.8 MiB/s rd, 18 MiB/s wr, 409 op/s rd, 796 op/s wr Thanks for any hints! = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: 1 pg inconsistent and does not recover

2023-06-28 Thread Frank Schilder
tly like what you are doing. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Stefan Kooman Sent: Wednesday, June 28, 2023 2:17 PM To: Frank Schilder; Alexander E. Patrakov; Niklas Hambüchen Cc: ceph-users@ceph.i

[ceph-users] Re: 1 pg inconsistent and does not recover

2023-06-28 Thread Frank Schilder
A repair always comes with a deep scrub. You can replace it if you want. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Niklas Hambüchen Sent: Wednesday, June 28, 2023 3:05 PM To: Frank Schilder; Alexander E
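
In command form (the PG id is a placeholder), either of these triggers a deep scrub, the first with automatic repair of inconsistencies:

    ceph pg repair 2.1a
    ceph pg deep-scrub 2.1a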
