[ceph-users] Re: BlueStore fragmentation woes

2023-05-24 Thread Hector Martin
On 25/05/2023 01.40, 胡 玮文 wrote: > Hi Hector, > > Not related to fragmentation. But I see you mentioned CephFS, and your OSDs > are at high utilization. Is your pool NEAR FULL? CephFS write performance is > severely degraded if the pool is NEAR FULL. Buffered write will be disabled, > and

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-24 Thread Justin Li
Hi Patrick, Thanks for the instructions. We started the MDS recovery scan with the commands below, following the linked document. The first phase, scan_extents, has finished and we're waiting on scan_inodes. We probably shouldn't interrupt the process. If this procedure fails, I'll follow your steps and

[ceph-users] Re: MDS Upgrade from 17.2.5 to 17.2.6 not possible

2023-05-24 Thread Dhairya Parmar
On Wed, May 17, 2023 at 9:26 PM Henning Achterrath wrote: > Hi all, > > we did a major update from Pacific to Quincy (17.2.5) a month ago > without any problems. > > Now we have tried a minor update from 17.2.5 to 17.2.6 (ceph orch > upgrade). It gets stuck at the mds upgrade phase. At this point the
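When an orchestrated upgrade appears to hang at the MDS phase, a first step is usually to check where the orchestrator and the file system actually are (a minimal sketch; all three commands are standard, but the output format varies by release):

    ceph orch upgrade status
    ceph fs status
    ceph health detail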

[ceph-users] Re: MDS Upgrade from 17.2.5 to 17.2.6 not possible

2023-05-24 Thread 胡 玮文
Hi Henning, I think the increasing strays_created is normal. This is a counter that increases monotonically whenever any file is deleted, and it is only reset when the MDS is restarted. num_strays is the actual number of strays in your system, and they do not necessarily reside in memory.
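For reference, both counters are exposed as MDS perf counters and can be read through the admin socket on the host running the daemon (a minimal sketch; mds.<name> is a placeholder for your MDS daemon id):

    # run on the MDS host
    ceph daemon mds.<name> perf dump | grep -E '"num_strays"|"strays_created"'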

[ceph-users] Re: Ceph Tech Talk For May 2023: RGW Lua Scripting Code Walkthrough

2023-05-24 Thread Mike Perez
Hi everyone, Just a reminder that we will be starting at the next hour, 17:00 UTC. https://ceph.io/en/community/tech-talks/ On Tue, May 16, 2023 at 10:19 AM Mike Perez wrote: > Hello everyone, > > Join us on May 24th at 17:00 UTC for a long overdue Ceph Tech Talk! This > month, Yuval Lifshitz

[ceph-users] Re: BlueStore fragmentation woes

2023-05-24 Thread 胡 玮文
Hi Hector, Not related to fragmentation. But I see you mentioned CephFS, and your OSDs are at high utilization. Is your pool NEAR FULL? CephFS write performance is severely degraded if the pool is NEAR FULL. Buffered write will be disabled, and every single write() system call needs to wait
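A quick way to confirm whether a pool is near full is the standard CLI (a minimal sketch; the thresholds mentioned are the defaults and the exact output format varies by release):

    ceph health detail | grep -i full        # shows POOL_NEARFULL / OSD_NEARFULL warnings, if any
    ceph df                                  # per-pool usage and MAX AVAIL
    ceph osd dump | grep ratio               # full_ratio / backfillfull_ratio / nearfull_ratio (0.95 / 0.90 / 0.85 by default)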

[ceph-users] Unexpected behavior of directory mtime after being set explicitly

2023-05-24 Thread Sandip Divekar
Hi Team, I'm writing to bring to your attention an issue we have encountered with the "mtime" (modification time) behavior for directories in the Ceph filesystem. Upon observation, we have noticed that when the mtime of a directory (let's say: dir1) is explicitly changed in CephFS, subsequent
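A minimal way to reproduce the reported behavior on a CephFS mount (the mount point is a placeholder; plain touch/stat are used so the same steps can be compared against a local filesystem):

    cd /mnt/cephfs                               # placeholder CephFS mount point
    mkdir dir1
    touch -d '2020-01-01 00:00:00' dir1          # set the directory mtime explicitly
    stat -c '%y' dir1
    touch dir1/file1                             # adding an entry should normally bump the mtime
    stat -c '%y' dir1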

[ceph-users] Re: ceph Pacific - MDS activity freezes when one of the MDSs is restarted

2023-05-24 Thread Wesley Dillingham
There was a memory issue with standby-replay that may have been resolved since; the fix may be in 16.2.10 (not sure). The suggestion at the time was to avoid standby-replay. Perhaps a dev can chime in on that status. Your MDSs look pretty inactive. I would consider scaling them down (potentially to

[ceph-users] Re: Training on ceph fs

2023-05-24 Thread Stefan Kooman
On 5/24/23 14:03, Emmanuel Jaep wrote: Hi, I inherited a ceph fs cluster. Even though I have years of experience in systems management, I fail to fully grasp its logic. From what I found on the web, the documentation is either too "high level" or too detailed. Is this a setup

[ceph-users] Re: ceph Pacific - MDS activity freezes when one of the MDSs is restarted

2023-05-24 Thread Eugen Block
Hi, using standby-replay daemons is something to test, as it can have a negative impact; it really depends on the actual workload. We stopped using standby-replay in all clusters we (help) maintain; in one specific case with many active MDSs and a high load, the failover time decreased and

[ceph-users] Re: ceph Pacific - MDS activity freezes when one of the MDSs is restarted

2023-05-24 Thread Emmanuel Jaep
So I guess I'll end up doing: ceph fs set cephfs max_mds 4; ceph fs set cephfs allow_standby_replay true. On Wed, May 24, 2023 at 4:13 PM Hector Martin wrote: > Hi, > > On 24/05/2023 22.02, Emmanuel Jaep wrote: > > Hi Hector, > > > > thank you very much for the detailed explanation and link to

[ceph-users] Re: BlueStore fragmentation woes

2023-05-24 Thread Hector Martin
On 24/05/2023 22.07, Mark Nelson wrote: > Yep, bluestore fragmentation is an issue.  It's sort of a natural result > of using copy-on-write and never implementing any kind of > defragmentation scheme.  Adam and I have been talking about doing it > now, probably piggybacking on scrub or other

[ceph-users] Re: ceph Pacific - MDS activity freezes when one of the MDSs is restarted

2023-05-24 Thread Hector Martin
Hi, On 24/05/2023 22.02, Emmanuel Jaep wrote: > Hi Hector, > > thank you very much for the detailed explanation and link to the > documentation. > > Given our current situation (7 active MDSs and 1 standby MDS): > RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS > 0

[ceph-users] Re: MDS crashes to damaged metadata

2023-05-24 Thread Patrick Donnelly
On Wed, May 24, 2023 at 4:26 AM Stefan Kooman wrote: > > On 5/22/23 20:24, Patrick Donnelly wrote: > > > > > The original script is here: > > https://github.com/ceph/ceph/blob/main/src/tools/cephfs/first-damage.py > > > "# Suggested recovery sequence (for single MDS cluster): > # > # 1) Unmount

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-24 Thread Patrick Donnelly
Hello Justin, Please do: ceph config set mds debug_mds 20; ceph config set mds debug_ms 1. Then wait for a crash. Please upload the log. To restore your file system: ceph config set mds mds_abort_on_newly_corrupt_dentry false. Let the MDS purge the strays and then try: ceph config set mds

[ceph-users] Re: [EXTERN] Re: cephfs max_file_size

2023-05-24 Thread Gregory Farnum
On Tue, May 23, 2023 at 11:52 PM Dietmar Rieder wrote: > > On 5/23/23 15:58, Gregory Farnum wrote: > > On Tue, May 23, 2023 at 3:28 AM Dietmar Rieder > > wrote: > >> > >> Hi, > >> > >> can the cephfs "max_file_size" setting be changed at any point in the > >> lifetime of a cephfs? > >> Or is it

[ceph-users] Re: BlueStore fragmentation woes

2023-05-24 Thread Mark Nelson
Yep, bluestore fragmentation is an issue. It's sort of a natural result of using copy-on-write and never implementing any kind of defragmentation scheme. Adam and I have been talking about doing it now, probably piggybacking on scrub or other operations that are already reading all of the

[ceph-users] Re: ceph Pacific - MDS activity freezes when one of the MDSs is restarted

2023-05-24 Thread Emmanuel Jaep
Hi Hector, thank you very much for the detailed explanation and link to the documentation. Given our current situation (7 active MDSs and 1 standby MDS): RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS 0 active icadmin012 Reqs: 82 /s 2345k 2288k 97.2k 307k 1

[ceph-users] Re: ceph Pacific - MDS activity freezes when one of the MDSs is restarted

2023-05-24 Thread Hector Martin
On 24/05/2023 21.15, Emmanuel Jaep wrote: > Hi, > > we are currently running a ceph fs cluster at the following version: > MDS version: ceph version 16.2.10 > (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable) > > The cluster is composed of 7 active MDSs and 1 standby MDS: > RANK STATE

[ceph-users] Re: Slow recovery on Quincy

2023-05-24 Thread Sake Paulusma
Thanks, will keep an eye out for this version. Will report back to this thread about these options and the recovery time/number of objects per second for recovery. Again, thank you all for the information and answers!

[ceph-users] Re: Slow recovery on Quincy

2023-05-24 Thread Sridhar Seshasayee
Yes, the fix should be in the next quincy upstream version. The version I posted was the downstream one.

[ceph-users] Re: MDS Upgrade from 17.2.5 to 17.2.6 not possible

2023-05-24 Thread Henning Achterrath
Hello again, In two days the number has increased by about one and a half million, and the RAM usage of the MDS remains high at about 50 GB. We are very unsure whether this is normal behavior. Today: "num_strays": 53695, "num_strays_delayed": 4, "num_strays_enqueuing": 0,

[ceph-users] BlueStore fragmentation woes

2023-05-24 Thread Hector Martin
Hi, I've been seeing relatively large fragmentation numbers on all my OSDs: ceph daemon osd.13 bluestore allocator score block { "fragmentation_rating": 0.77251526920454427 } These aren't that old, as I recreated them all around July last year. They mostly hold CephFS data with erasure
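For comparison across a host's OSDs, the same admin-socket query can be looped over the local daemons (a sketch for a bare-metal deployment; socket naming and paths differ for containerized clusters):

    # print the fragmentation rating of every OSD with an admin socket on this host
    for sock in /var/run/ceph/ceph-osd.*.asok; do
        id=$(echo "$sock" | sed 's/.*osd\.\([0-9]*\)\.asok/\1/')
        echo -n "osd.$id: "
        ceph daemon "osd.$id" bluestore allocator score block | grep fragmentation_rating
    done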

[ceph-users] ceph Pacific - MDS activity freezes when one of the MDSs is restarted

2023-05-24 Thread Emmanuel Jaep
Hi, we are currently running a ceph fs cluster at the following version: MDS version: ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable) The cluster is composed of 7 active MDSs and 1 standby MDS: RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS

[ceph-users] Training on ceph fs

2023-05-24 Thread Emmanuel Jaep
Hi, I inherited a ceph fs cluster. Even though I have years of experience in systems management, I fail to fully grasp its logic. From what I found on the web, the documentation is either too "high level" or too detailed. Do you know any good resources to get fully acquainted with

[ceph-users] Re: Slow recovery on Quincy

2023-05-24 Thread Sake Paulusma
If I glance at the commits to the quincy branch, shouldn't the mentioned configurations be included in 17.2.7? The requested command output: [ceph: root@mgrhost1 /]# ceph version ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable) [ceph: root@mgrhost1 /]# ceph config

[ceph-users] Re: Troubleshooting "N slow requests are blocked > 30 secs" on Pacific

2023-05-24 Thread Emmanuel Jaep
Absolutely! :-) root@icadmin011:/tmp# ceph --cluster floki daemon mds.icadmin011 dump cache /tmp/dump.txt root@icadmin011:/tmp# ll total 48 drwxrwxrwt 12 root root 4096 May 24 13:23 ./ drwxr-xr-x 18 root root 4096 Jun 9 2022 ../ drwxrwxrwt 2 root root 4096 May 4 12:43 .ICE-unix/ drwxrwxrwt

[ceph-users] Re: Slow recovery on Quincy

2023-05-24 Thread Sridhar Seshasayee
I'm on 17.2.6, but the option "osd_mclock_max_sequential_bandwidth_hdd" > isn't available when I try to set it via "ceph config set osd.0 > osd_mclock_max_sequential_bandwidth_hdd 500Mi". > > Can you paste the output of 1. ceph version 2. ceph config show-with-defaults osd.0 | grep osd_mclock 3.

[ceph-users] Re: Troubleshooting "N slow requests are blocked > 30 secs" on Pacific

2023-05-24 Thread Milind Changire
I hope the daemon mds.icadmin011 is running on the same machine where you are looking for /tmp/dump.txt, since the file is created on the system that has the daemon running. On Wed, May 24, 2023 at 2:16 PM Emmanuel Jaep wrote: > Hi Milind, > > you are absolutely right. > > The

[ceph-users] [cephfs-data-scan] Estimate time for scanning extents and inodes

2023-05-24 Thread Justin Li
Dear All, I'm using the metadata repair tools to repair a damaged MDS, following the document below. My storage has about 276TB of data. cephfs-data-scan is using 32 workers. How long will it take to finish scanning extents? What about scanning inodes? It has run for 6 hours and the metadata pool has dropped by 1 GB. Is
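As a rough way to gauge progress while the scan runs, pool statistics can be watched over time (a minimal sketch; the pool names are placeholders for your CephFS data and metadata pools):

    watch -n 60 'ceph df | grep -E "cephfs_data|cephfs_metadata"'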

[ceph-users] Re: Slow recovery on Quincy

2023-05-24 Thread Sake Paulusma
I'm on 17.2.6, but the option "osd_mclock_max_sequential_bandwidth_hdd" isn't available when I try to set it via "ceph config set osd.0 osd_mclock_max_sequential_bandwidth_hdd 500Mi". I need to use large numbers for hdd, because it looks like the mclock scheduler isn't using the device class
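One way to check what the running build actually exposes, and a commonly suggested knob while recovery is slow, is the built-in mclock profile (a sketch; the profile name comes from the upstream mclock documentation, so please verify it against your release):

    ceph config show-with-defaults osd.0 | grep osd_mclock       # list the mclock options known to this OSD
    ceph config set osd osd_mclock_profile high_recovery_ops     # prioritize recovery over client I/O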

[ceph-users] RGW: How to restore the index and metadata of a bucket

2023-05-24 Thread huy nguyen
There is a test bucket; I have removed its index and metadata: radosgw-admin bi purge --bucket abccc --yes-i-really-mean-it; radosgw-admin metadata rm bucket.instance:abccc:17a4ce99-009e-40f2-a2d2-2afc218ebd9b.425824299.4 Now the index and metadata are gone, but how do I clean up its data? Or is there
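If the goal is simply to drop the orphaned data, one possible approach is to list the data-pool objects that still carry the old bucket marker prefix and remove them with rados (a sketch under the assumption that RGW data objects are prefixed with the bucket marker; the pool name and <bucket-marker> are placeholders, and the listing should be reviewed carefully before deleting anything):

    rados -p default.rgw.buckets.data ls | grep '^<bucket-marker>' > /tmp/orphaned-objs
    # review /tmp/orphaned-objs first, then:
    xargs -a /tmp/orphaned-objs -n1 rados -p default.rgw.buckets.data rm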

[ceph-users] Re: Slow recovery on Quincy

2023-05-24 Thread Sridhar Seshasayee
As someone in this thread noted, the cost-related config options are removed in the next version (ceph-17.2.6-45.el9cp). The cost parameters may not work in all cases due to the inherent differences in the underlying device types and other external factors. With the endeavor to achieve a more

[ceph-users] Re: Troubleshooting "N slow requests are blocked > 30 secs" on Pacific

2023-05-24 Thread Emmanuel Jaep
Hi Milind, you are absolutely right. The dump_ops_in_flight output gives a good hint about what's happening: { "ops": [ { "description": "internal op exportdir:mds.5:975673", "initiated_at": "2023-05-23T17:49:53.030611+0200", "age":

[ceph-users] Re: Troubleshooting "N slow requests are blocked > 30 secs" on Pacific

2023-05-24 Thread Milind Changire
Emmanuel, You probably missed the "daemon" keyword after the "ceph" command name. Here's the docs for pacific: https://docs.ceph.com/en/pacific/cephfs/troubleshooting/ So, your command should've been: # ceph daemon mds.icadmin011 dump cache /tmp/dump.txt You could also dump the ops in flight
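The in-flight and blocked operations can be dumped through the same admin socket on the MDS host (a minimal sketch; dump_blocked_ops is an additional command whose availability may vary by release):

    ceph daemon mds.icadmin011 dump_ops_in_flight
    ceph daemon mds.icadmin011 dump_blocked_ops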

[ceph-users] Re: MDS crashes to damaged metadata

2023-05-24 Thread Stefan Kooman
On 5/22/23 20:24, Patrick Donnelly wrote: The original script is here: https://github.com/ceph/ceph/blob/main/src/tools/cephfs/first-damage.py "# Suggested recovery sequence (for single MDS cluster): # # 1) Unmount all clients." Is this a hard requirement? This might not be feasible for an

[ceph-users] Troubleshooting "N slow requests are blocked > 30 secs" on Pacific

2023-05-24 Thread Emmanuel Jaep
Hi, we are running a cephfs cluster with the following version: ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable) Several MDSs are reporting slow requests: HEALTH_WARN 4 MDSs report slow requests [WRN] MDS_SLOW_REQUEST: 4 MDSs report slow requests

[ceph-users] Re: Encryption per user Howto

2023-05-24 Thread Stefan Kooman
On 5/22/23 17:28, huxia...@horebdata.cn wrote: Hi, Stefan, Thanks a lot for the message. It seems that client-side encryption (or per-user encryption) is still on the way and not ready for use today. Are there practical methods to implement encryption for CephFS with today's techniques, e.g. using LUKS

[ceph-users] Re: [EXTERN] Re: cephfs max_file_size

2023-05-24 Thread Dietmar Rieder
On 5/23/23 15:58, Gregory Farnum wrote: On Tue, May 23, 2023 at 3:28 AM Dietmar Rieder wrote: Hi, can the cephfs "max_file_size" setting be changed at any point in the lifetime of a cephfs? Or is it critical for existing data if it is changed after some time? Is there anything to consider
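For reference, the setting can be inspected and changed with the standard fs commands; the filesystem name is a placeholder and the value is in bytes (4398046511104 below is 4 TiB, used only as an example):

    ceph fs get <fsname> | grep max_file_size
    ceph fs set <fsname> max_file_size 4398046511104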

[ceph-users] Re: [EXTERN] Re: cephfs max_file_size

2023-05-24 Thread Dietmar Rieder
On 5/23/23 15:53, Konstantin Shalygin wrote: Hi, On 23 May 2023, at 13:27, Dietmar Rieder wrote: can the cephfs "max_file_size" setting be changed at any point in the lifetime of a cephfs? Or is it critical for existing data if it is changed after some time? Is there anything to consider

[ceph-users] Timeout in Dashboard

2023-05-24 Thread mailing-lists
Hey all, I'm facing a "minor" problem. I do not always get results when going to the dashboard under Block -> Images, in the Images or Namespaces tab. The little refresh button keeps spinning, and sometimes after several minutes it will finally show something. That is odd, because from the