[ceph-users] Re: quincy v17.2.4 QE Validation status

2022-09-13 Thread Casey Bodley
On Tue, Sep 13, 2022 at 4:03 PM Yuri Weinstein wrote: > > Details of this release are summarized here: > > https://tracker.ceph.com/issues/57472#note-1 > Release Notes - https://github.com/ceph/ceph/pull/48072 > > Seeking approvals for: > > rados - Neha, Travis, Ernesto, Adam > rgw - Casey rgw

[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
Hi, I just checked and all OSDs have it set to true. It also does not seem to be a problem with the snaptrim operation. In the last 7 days we had two occasions where nearly all OSDs logged a lot (around 3k times in 20 minutes) of these messages: 2022-09-12T20:27:19.146+0200 7f576de49700 -1 osd.9 786378
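A minimal sketch of how such a check can be done across a cluster, assuming the standard admin CLI is available; the option name bluefs_buffered_io and osd.9 come from the thread, the commands themselves are a generic illustration:

    # query the value currently in effect on every running OSD
    ceph tell 'osd.*' config get bluefs_buffered_io

    # or ask a single OSD through its admin socket (run on that OSD's host)
    ceph daemon osd.9 config get bluefs_buffered_io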

[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Wesley Dillingham
I haven't read through this entire thread, so forgive me if this was already mentioned: what is the parameter "bluefs_buffered_io" set to on your OSDs? We once saw a terrible slowdown on our OSDs during snaptrim events, and setting bluefs_buffered_io to true alleviated that issue. That was on a nautilus
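As a hedged illustration of this suggestion (not part of the original mail), checking and changing the option through the centralized config store would look roughly like this; defaults and runtime-change behaviour differ between releases, so verify against your version's documentation:

    # value the config store hands out to OSDs
    ceph config get osd bluefs_buffered_io

    # set it for all OSDs; on older releases a rolling OSD restart may be
    # needed for it to take effect
    ceph config set osd bluefs_buffered_io true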

[ceph-users] Re: 16.2.10 Cephfs with CTDB, Samba running on Ubuntu

2022-09-13 Thread Marco Pizzolo
Thanks so much Bailey, Tim, We've been pinned the past week and a half, but will look at reviewing the configuration provided this week or more likely next. Thanks again. Marco On Fri, Sep 9, 2022 at 10:29 AM Bailey Allison wrote: > Hi Tim, > > We've actually been having issues with ceph

[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
The cluster is SSD only with 2TB, 4TB, and 8TB disks. I would expect this to be done fairly quickly. For now I will recreate every OSD in the cluster and check if this helps. Do you experience slow ops (i.e. the cluster shows a message like "cluster [WRN] Health check update: 679 slow ops,
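For readers following along, a hedged sketch of how to confirm whether slow ops are being reported and to inspect one suspect daemon (standard Ceph commands; osd.9 is just an example id from earlier in the thread):

    # cluster-wide view of the SLOW_OPS warning
    ceph health detail | grep -i slow

    # drill into a single OSD via its admin socket (run on that OSD's host)
    ceph daemon osd.9 dump_ops_in_flight
    ceph daemon osd.9 dump_historic_slow_ops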

[ceph-users] Re: Increasing number of unscrubbed PGs

2022-09-13 Thread Wesley Dillingham
what does "ceph pg ls scrubbing" show? Do you have PGs that have been stuck in a scrubbing state for a long period of time (many hours,days,weeks etc). This will show in the "SINCE" column. Respectfully, *Wes Dillingham* w...@wesdillingham.com LinkedIn

[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Marc
> It might be possible that converting OSDs before setting require-osd-release=octopus leads to a broken state of the converted OSDs. I could not yet find a way out of this situation. We will soon perform a third upgrade test to test this hypothesis. So with upgrading one should
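For context, a short sketch of how the flag in question is normally checked and set once every daemon runs the new release (standard commands; whether this repairs OSDs that were already converted too early is exactly the open question above):

    # confirm all daemons actually run the new release first
    ceph versions

    # what the cluster currently requires
    ceph osd dump | grep require_osd_release

    # only once all OSDs are on Octopus
    ceph osd require-osd-release octopus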

[ceph-users] Re: Increasing number of unscrubbed PGs

2022-09-13 Thread Burkhard Linke
Hi Josh, thanks for the link. I'm not sure whether this is the root cause, since we did not use the noscrub and nodeep-scrub flags in the past. I've set them for a short period to test whether removing the flags triggers more backfilling. During that time no OSDs were restarted etc. But the
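For reference, a minimal sketch of toggling the scrub flags mentioned here (note that the CLI spells the second one nodeep-scrub):

    # pause scrubbing and deep scrubbing cluster-wide
    ceph osd set noscrub
    ceph osd set nodeep-scrub

    # re-enable them again
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub

    # currently set flags appear in the osdmap
    ceph osd dump | grep flags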

[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
Hi Frank, we converted the OSDs directly during the upgrade: 1. install the new ceph version, 2. restart all OSD daemons, 3. wait some time (took around 5-20 minutes), 4. all OSDs were online again. So I would expect that the OSDs are all upgraded correctly. I also checked when the trimming happens,
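A hedged sketch of what steps 1-4 might look like on a single package-based (non-cephadm) host; package and unit names depend on the distribution and deployment tool, so treat this as illustrative only:

    # 1. install the new packages (Debian/Ubuntu example)
    apt update && apt install -y ceph-osd

    # 2. restart all OSD daemons on this host
    systemctl restart ceph-osd.target

    # 3./4. wait, then verify versions and that all OSDs came back up
    ceph versions
    ceph osd stat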

[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
I checked the cluster for other snaptrim operations and they happen all over the place, so to me it looks like they just happened to be done when the issue occurred, but were not the driving factor. On Tue, Sep 13, 2022 at 12:04 Boris Behrens wrote: > Because someone mentioned that the
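As an illustrative aside (not taken from the original message), one way to see which PGs are in a snaptrim state at a given moment:

    # PGs actively trimming snapshots, and those queued for it
    ceph pg ls snaptrim
    ceph pg ls snaptrim_wait

    # quick count of snaptrim-related PG states (the awk/uniq part is just
    # one possible way to summarize the output)
    ceph pg dump pgs_brief 2>/dev/null | awk '$2 ~ /snaptrim/ {print $2}' | sort | uniq -c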

[ceph-users] Re: laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
Because someone mentioned that the attachments did not go through, I created pastebin links: monlog: https://pastebin.com/jiNPUrtL osdlog: https://pastebin.com/dxqXgqDz On Tue, Sep 13, 2022 at 11:43 Boris Behrens wrote: > Hi, I need your help really badly. > > We are currently

[ceph-users] laggy OSDs and stalling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Boris Behrens
Hi, I need your help really badly. We are currently experiencing very bad cluster hangups that happen sporadically (once on 2022-09-08 at midday, 48 hrs after the upgrade, and once on 2022-09-12 in the evening). We use krbd without cephx for the qemu clients, and when the OSDs are getting laggy, the krbd
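A hedged sketch of what one might check while such a hang is happening; these are generic commands and not taken from the attached logs:

    # kernel-side view on a qemu host: blocked krbd requests surface in dmesg
    dmesg -T | grep -iE 'rbd|libceph'

    # cluster-side: per-OSD commit/apply latency to spot the laggy OSDs
    ceph osd perf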

[ceph-users] Re: OSD Crash in recovery: SST file contains data beyond the point of corruption.

2022-09-13 Thread Igor Fedotov
Hi Benjamin, sorry for the confusion, this should be kSkipAnyCorruptedRecords, not kSkipAnyCorruptedRecord. Thanks, Igor On 9/12/2022 11:26 PM, Benjamin Naber wrote: Hi Igor, looks like the setting won't work; the container now starts with a different error message that the setting is an
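For readers landing here later: kSkipAnyCorruptedRecords is a RocksDB wal_recovery_mode value. A heavily hedged sketch of how such an option is commonly passed to a BlueStore OSD via bluestore_rocksdb_options; the exact string Igor recommended earlier in the thread is not reproduced here, osd.N is a placeholder, and the existing option string must be preserved when appending:

    # inspect the current option string first, so you append rather than replace
    ceph config get osd bluestore_rocksdb_options

    # example only: add the recovery mode for the single affected OSD,
    # then restart that OSD
    ceph config set osd.N bluestore_rocksdb_options \
      "<existing options>,wal_recovery_mode=kSkipAnyCorruptedRecords"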

[ceph-users] Re: Ceph on windows (wnbd) rbd.exe keeps crashing

2022-09-13 Thread Lucian Petrut
Hi, Thanks for bringing up this issue. The Pacific release doesn't support IPv6, but fortunately there's a backport that's likely going to merge soon: https://github.com/ceph/ceph/pull/47303. There are a few other fixes that we'd like to include before releasing new Pacific and Quincy MSIs.

[ceph-users] Re: just-rebuilt mon does not join the cluster

2022-09-13 Thread Jan Kasprzak
Hello, Stefan Kooman wrote: : Hi, : On 9/9/22 10:53, Frank Schilder wrote: : > Is there a chance you might have seen this https://tracker.ceph.com/issues/49231? : > Do you have network monitoring with packet reports? It is possible though that you have observed something new.