[ceph-users] Re: [MDS] Pacific memory leak

2024-07-22 Thread Patrick Donnelly
Hi Adrien,

On Mon, Jul 22, 2024 at 5:17 AM Adrien Georget wrote:
>
> Hi,
>
> For the last 2 months, our MDS is frequently switching to another
> because of a sudden memory leak.
> The host has 128G RAM and most of the time the MDS occupies ~20% of
> memory. And in less than 3 minutes it increases to 100% and crashes with
> tcmalloc: allocation failed.
>
> We tried to run heap stats / perf dump on the host but we couldn't find
> any reason why the memory used by the MDS explodes so quickly.
> MDS log available here :
> https://filesender.renater.fr/?s=download=c1e60c3c-7f02-4f1e-b23e-f5b25c0cd2a8
>
>
> Any idea what could lead to this memory leak? Anything we can try to
> understand what happens or prevent this?
> We use Pacific 16.2.14.

It is probably an instance of this:

https://tracker.ceph.com/issues/66704

Check backports of an MDS fix here: https://tracker.ceph.com/issues/64977

If you can, use an older kernel, or wait until a release with the
backported fix is available.
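
If it helps with that decision, you can check which kernel versions your
clients report. A rough sketch (the fs name "cephfs" is a placeholder;
kernel clients report "kernel_version" in their session metadata, FUSE
clients will show null there):

ceph tell mds.cephfs:0 session ls | jq -r \
    '.[] | [.id, .client_metadata.hostname, .client_metadata.kernel_version] | @tsv'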

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERN] Re: Urgent help with degraded filesystem needed

2024-06-24 Thread Patrick Donnelly
6:23.812+ 7fe58d619b00  0 ceph version 18.2.2 
> (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable), process ceph-mds, 
> pid 2
> 2024-06-23T08:06:23.812+ 7fe58d619b00  1 main not setting numa affinity
> 2024-06-23T08:06:23.813+ 7fe58d619b00  0 pidfile_write: ignore empty 
> --pid-file
> 2024-06-23T08:06:23.814+ 7fe58226e700  1 mds.default.cephmon-01.cepqjp 
> Updating MDS map to version 8067 from mon.0
> 2024-06-23T08:06:24.772+ 7fe58226e700  1 mds.default.cephmon-01.cepqjp 
> Updating MDS map to version 8068 from mon.0
> 2024-06-23T08:06:24.772+ 7fe58226e700  1 mds.default.cephmon-01.cepqjp 
> Monitors have assigned me to become a standby.
> 2024-06-23T08:49:28.778+ 7fe584272700  1 mds.default.cephmon-01.cepqjp 
> asok_command: heap {heapcmd=stats,prefix=heap} (starting...)
> 2024-06-23T22:00:04.664+ 7fe583a71700 -1 received  signal: Hangup from 
> Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) 
> UID: 0
>
>
> Any ideas how to proceed?

Whatever you snipped from the log has the real error. The MDS tries to
recover from both "ERR" messages you are concerned about. It should
not go damaged.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Urgent help with degraded filesystem needed

2024-06-19 Thread Patrick Donnelly
> RANK      STATE                 MDS              DNS    INOS   DIRS  CAPS
>  0       resolve     default.cephmon-02.nyfook  12.3k  11.8k   3228     0
>  1    replay(laggy)  default.cephmon-02.duujba      0      0      0     0
>  2       resolve     default.cephmon-01.pvnqad  15.8k   3541   1409     0
>          POOL           TYPE     USED   AVAIL
> ssd-rep-metadata-pool  metadata   295G   63.5T
>   sdd-rep-data-pool      data    10.2T   84.6T
>    hdd-ec-data-pool      data     808T   1929T
> MDS version: ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
>
>
> The end log file of the  replay(laggy)  default.cephmon-02.duujba shows:
>
> [...]
> -11> 2024-06-19T07:12:38.980+ 7f90fd117700  1
> mds.1.journaler.pq(ro) _finish_probe_end write_pos = 8673820672 (header
> had 8623488918). recovered.
> -10> 2024-06-19T07:12:38.980+ 7f90fd117700  4 mds.1.purge_queue
> operator(): open complete
>  -9> 2024-06-19T07:12:38.980+ 7f90fd117700  4 mds.1.purge_queue
> operator(): recovering write_pos
>  -8> 2024-06-19T07:12:39.015+ 7f9104926700 10 monclient:
> get_auth_request con 0x55a93ef42c00 auth_method 0
>  -7> 2024-06-19T07:12:39.025+ 7f9105928700 10 monclient:
> get_auth_request con 0x55a93ef43400 auth_method 0
>  -6> 2024-06-19T07:12:39.038+ 7f90fd117700  4 mds.1.purge_queue
> _recover: write_pos recovered
>  -5> 2024-06-19T07:12:39.038+ 7f90fd117700  1
> mds.1.journaler.pq(ro) set_writeable
>  -4> 2024-06-19T07:12:39.044+ 7f9105127700 10 monclient:
> get_auth_request con 0x55a93ef43c00 auth_method 0
>  -3> 2024-06-19T07:12:39.113+ 7f9104926700 10 monclient:
> get_auth_request con 0x55a93ed97000 auth_method 0
>  -2> 2024-06-19T07:12:39.123+ 7f9105928700 10 monclient:
> get_auth_request con 0x55a93e903c00 auth_method 0
>  -1> 2024-06-19T07:12:39.236+ 7f90fa912700 -1
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/include/interval_set.h:
> In function 'void interval_set::erase(T, T, std::function T)>) [with T = inodeno_t; C = std::map]' thread 7f90fa912700 time
> 2024-06-19T07:12:39.235633+
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el8/BUILD/ceph-18.2.2/src/include/interval_set.h:
> 568: FAILED ceph_assert(p->first <= start)
>
>   ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef
> (stable)
>   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x135) [0x7f910c722e15]
>   2: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f910c722fdb]
>   3: (interval_set::erase(inodeno_t, inodeno_t,
> std::function)+0x2e5) [0x55a93c0de9a5]
>   4: (EMetaBlob::replay(MDSRank*, LogSegment*, int,
> MDPeerUpdate*)+0x4207) [0x55a93c3e76e7]
>   5: (EUpdate::replay(MDSRank*)+0x61) [0x55a93c3e9f81]
>   6: (MDLog::_replay_thread()+0x6c9) [0x55a93c3701d9]
>   7: (MDLog::ReplayThread::entry()+0x11) [0x55a93c01e2d1]
>   8: /lib64/libpthread.so.0(+0x81ca) [0x7f910b4c81ca]
>   9: clone()

Suggest following the recommendations by Xiubo.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS crashes to damaged metadata

2024-06-10 Thread Patrick Donnelly
You could try manually deleting the files from the directory
fragments, using `rados` commands. Make sure to flush your MDS journal
first and take the fs offline (`ceph fs fail`).
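
For reference, a rough sketch of that kind of surgery (the pool, object, and
dentry names below are purely illustrative; verify everything against your
own metadata pool before removing anything, and keep a backup):

# flush the MDS journal, then take the fs offline
ceph tell mds.<fs_name>:0 flush journal
ceph fs fail <fs_name>
# dirfrag objects live in the metadata pool; dentries are their omap keys
rados -p <metadata_pool> listomapkeys 10000000123.00000000
# remove the damaged dentry (keys are of the form "<filename>_head")
rados -p <metadata_pool> rmomapkey 10000000123.00000000 badfile_head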

On Tue, Jun 4, 2024 at 8:50 AM Stolte, Felix  wrote:
>
> Hi Patrick,
>
> it has been a year now and we did not have a single crash since upgrading to 
> 16.2.13. We still have the 19 corrupted files which are reported by 'damage 
> ls'. Is it now possible to delete the corrupted files without taking the 
> filesystem offline?
>
> On 22.05.2023 at 20:23, Patrick Donnelly wrote:
>
> Hi Felix,
>
> On Sat, May 13, 2023 at 9:18 AM Stolte, Felix  wrote:
>
> Hi Patrick,
>
> we have been running one daily snapshot since december and our cephfs crashed 
> 3 times because of this https://tracker.ceph.com/issues/38452
>
> We currently have 19 files with corrupt metadata found by your 
> first-damage.py script. We isolated these files from access by users and 
> are waiting for a fix before we remove them with your script (or maybe a new 
> way?)
>
> No other fix is anticipated at this time. Probably one will be
> developed after the cause is understood.
>
> Today we upgraded our cluster from 16.2.11 to 16.2.13. After upgrading the 
> mds servers, cluster health went to ERROR MDS_DAMAGE. 'ceph tell mds.0 
> damage ls' is showing me the same files as your script (initially only a 
> part, after a cephfs scrub all of them).
>
> This is expected. Once the dentries are marked damaged, the MDS won't
> allow operations on those files (like those triggering tracker
> #38452).
>
> I noticed "mds: catch damage to CDentry’s first member before persisting 
> (issue#58482, pr#50781, Patrick Donnelly)“ in the change logs for 16.2.13  
> and like to ask you the following questions:
>
> a) can we repair the damaged files online now instead of bringing down the 
> whole fs and using the python script?
>
> Not yet.
>
> b) should we set one of the new mds options in our specific case to avoid our 
> fileserver crashing because of the wrong snap ids?
>
> Have your MDS crashed or just marked the dentries damaged? If you can
> reproduce a crash with detailed logs (debug_mds=20), that would be
> incredibly helpful.
>
> c) will your patch prevent wrong snap ids in the future?
>
> It will prevent persisting the damage.
>
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Red Hat Partner Engineer
> IBM, Inc.
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>
>
> Kind regards,
> Felix Stolte
>
> IT-Services
> mailto: f.sto...@fz-juelich.de
> Tel: 02461-619243
>
> ------------------------------------------------------------------
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> Registered office: Juelich
> Registered in the commercial register of the Amtsgericht Dueren, No. HR B 3498
> Chairman of the Supervisory Board: MinDir Stefan Müller
> Management: Prof. Dr. Astrid Lambrecht (Chair),
> Karsten Beneke (Deputy Chairman), Dr. Ir. Pieter Jansens
> ------------------------------------------------------------------
>



-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS HA: mgr finish mon failed to return metadata for mds

2024-05-30 Thread Patrick Donnelly
The fix was actually backported to v18.2.3. The tracker was wrong.

On Wed, May 29, 2024 at 3:26 PM  wrote:
>
> Hi,
>
> we have a stretched cluster (Reef 18.2.1) with 5 nodes (2 nodes on each side 
> + witness). You can see our daemon placement below.
>
> [admin]
> ceph-admin01 labels="['_admin', 'mon', 'mgr']"
>
> [nodes]
> [DC1]
> ceph-node01 labels="['mon', 'mgr', 'mds', 'osd']"
> ceph-node02 labels="['mon', 'rgw', 'mds', 'osd']"
> [DC2]
> ceph-node03 labels="['mon', 'mgr', 'mds', 'osd']"
> ceph-node04 labels="['mon', 'rgw', 'mds', 'osd']"
>
> We have been testing CephFS HA and noticed when we have active MDS (we have 
> two active MDS daemons at all times) and active MGR (MGR is either on admin 
> node or in one of the DC's) in one DC and when we shutdown that site (DC) we 
> have a problem when one of the MDS metadata can't be retrieved thus showing 
> in logs as:
>
> "mgr finish mon failed to return metadata for mds"
>
> After we turn that site back on the problem persists and metadata of MDS in 
> question can't be retrieved with "ceph mds metadata"
>
> After I manually fail MDS daemon in question with "ceph mds fail" the problem 
> is solved and I can retrieve MDS metadata.
>
> My question is, would this be related to the following bug 
> (https://tracker.ceph.com/issues/63166) - I can see that it is showed as 
> backported to 18.2.1 but I can't find it in release notes for Reef.
>
> Second question is should this work in current configuration at all as MDS 
> and MGR are both at the same moment disconnected from the rest of the cluster?
>
> And final question would be what would be the solution here and is there any 
> loss of data when this happens?
>
> Any help is appreciated.


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Help needed! First MDs crashing, then MONs. How to recover ?

2024-05-30 Thread Patrick Donnelly
On Tue, May 28, 2024 at 8:54 AM Noe P.  wrote:
>
> Hi,
>
> we ran into a bigger problem today with our ceph cluster (Quincy,
> Alma8.9).
> We have 4 filesystems and a total of 6 MDs, the largest fs having
> two ranks assigned (i.e. one standby).
>
> Since we often have the problem of MDs lagging behind, we restart
> the MDs occasionally. Helps usually, the standby taking over.

Please do not routinely restart MDS. Starting MDS recovery may only
multiply your problems (as it has).

> Today however, the restart didn't work and the rank 1 MDs started to
> crash for unclear reasons. Rank 0 seemed ok.

Figure out why! You might have tried increasing debugging on the mds:

ceph config set mds.X debug_mds 20
ceph config set mds.X debug_ms 1

> We decided at some point to go back to one rank by setting max_mds to 1.

Doing this will have no positive effect. I've made a tracker ticket so
that folks don't do this:

https://tracker.ceph.com/issues/66301

> Due to the permanent crashing, the rank1 didn't stop however, and at some
> point we set it to failed and the fs not joinable.

The monitors will not stop rank 1 until the cluster is healthy again.
What do you mean "set it to failed"? Setting the fs as not joinable
will mean it never becomes healthy again.

Please do not flail around with administration commands without
understanding the effects.

> At this point it looked like this:
>  fs_cluster - 716 clients
>  ==========
>  RANK   STATE     MDS       ACTIVITY    DNS    INOS   DIRS   CAPS
>   0     active  cephmd6a  Reqs: 0 /s   13.1M  13.1M  1419k  79.2k
>   1     failed
>        POOL          TYPE     USED   AVAIL
>  fs_cluster_meta   metadata  1791G  54.2T
>  fs_cluster_data     data     421T  54.2T
>
> with rank1 still being listed.
>
> The next attempt was to remove that failed
>
>ceph mds rmfailed fs_cluster:1 --yes-i-really-mean-it
>
> which, after a short while brought down 3 out of five MONs.
> They keep crashing shortly after restart with stack traces like this:
>
> ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy 
> (stable)
> 1: /lib64/libpthread.so.0(+0x12cf0) [0x7ff8813adcf0]
> 2: gsignal()
> 3: abort()
> 4: /lib64/libstdc++.so.6(+0x9009b) [0x7ff8809bf09b]
> 5: /lib64/libstdc++.so.6(+0x9654c) [0x7ff8809c554c]
> 6: /lib64/libstdc++.so.6(+0x965a7) [0x7ff8809c55a7]
> 7: /lib64/libstdc++.so.6(+0x96808) [0x7ff8809c5808]
> 8: /lib64/libstdc++.so.6(+0x92045) [0x7ff8809c1045]
> 9: (MDSMonitor::maybe_resize_cluster(FSMap&, int)+0xa9e) [0x55f05d9a5e8e]
> 10: (MDSMonitor::tick()+0x18a) [0x55f05d9b18da]
> 11: (MDSMonitor::on_active()+0x2c) [0x55f05d99a17c]
> 12: (Context::complete(int)+0xd) [0x55f05d76c56d]
> 13: (void finish_contexts std::allocator > >(ceph::common::CephContext*, 
> std::__cxx11::list >&, int)+0x9d) [0x55f05
>d799d7d]
> 14: (Paxos::finish_round()+0x74) [0x55f05d8c5c24]
> 15: (Paxos::dispatch(boost::intrusive_ptr)+0x41b) 
> [0x55f05d8c7e5b]
> 16: (Monitor::dispatch_op(boost::intrusive_ptr)+0x123e) 
> [0x55f05d76a2ae]
> 17: (Monitor::_ms_dispatch(Message*)+0x406) [0x55f05d76a976]
> 18: (Dispatcher::ms_dispatch2(boost::intrusive_ptr const&)+0x5d) 
> [0x55f05d79b3ed]
> 19: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr 
> const&)+0x478) [0x7ff88367fed8]
> 20: (DispatchQueue::entry()+0x50f) [0x7ff88367d31f]
> 21: (DispatchQueue::DispatchThread::entry()+0x11) [0x7ff883747381]
> 22: /lib64/libpthread.so.0(+0x81ca) [0x7ff8813a31ca]
> 23: clone()
> NOTE: a copy of the executable, or `objdump -rdS ` is needed 
> to interpret this.
>
> The MDSMonitor::maybe_resize_cluster somehow suggests a connection to the 
> above MDs operation.

Yes, you've made a mess of things. I assume you ignored this warning:

"WARNING: this can make your filesystem inaccessible! Add
--yes-i-really-mean-it if you are sure you wish to continue."

:(

> Does anyone have an idea how to get this cluster back together again ? Like 
> manually fixing the
> MD ranks ?

You will probably need to bring the file system down but you've
clearly caused the mons to hit an assert where this will be difficult.
You need to increase debugging on the mons (in their
/etc/ceph/ceph.conf):

[mon]
   debug mon = 20
   debug ms = 1

and share the logs on this list or via ceph-post-file.
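
For example, something along these lines (the log path assumes a default
package install):

ceph-post-file -d "mon crash in MDSMonitor::maybe_resize_cluster" /var/log/ceph/ceph-mon.*.log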

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS Abort druing FS scrub

2024-05-30 Thread Patrick Donnelly
On Fri, May 24, 2024 at 7:09 PM Malcolm Haak  wrote:
>
> When running a cephfs scrub the MDS will crash with the following backtrace
>
> -1> 2024-05-25T09:00:23.028+1000 7ef2958006c0 -1
> /usr/src/debug/ceph/ceph-18.2.2/src/mds/MDSRank.cc: In function 'void
> MDSRank::abort(std::string_view)' thread 7ef2958006c0 time
> 2024-05-25T09:00:23.031373+1000
> /usr/src/debug/ceph/ceph-18.2.2/src/mds/MDSRank.cc: 938:
> ceph_abort_msg("abort() called")
> [...]

Do you have more of the logs you can share? Possibly using ceph-post-file?

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Question about PR merge

2024-04-17 Thread Patrick Donnelly
On Wed, Apr 17, 2024 at 11:36 AM Erich Weiler  wrote:
>
> Hello,
>
> We are tracking PR #56805:
>
> https://github.com/ceph/ceph/pull/56805
>
> And the resolution of this item would potentially fix a pervasive and
> ongoing issue that needs daily attention in our cephfs cluster.

Have you already shared information about this issue? Please do if not.

> I was
> wondering if it would be included in 18.2.3 which I *think* should be
> released soon?  Is there any way of knowing if that is true?

This PR is primarily a debugging tool. It will not make 18.2.3 as it's
not even merged to main yet.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Spam in log file

2024-03-25 Thread Patrick Donnelly
Nope.

On Mon, Mar 25, 2024 at 8:33 AM Albert Shih  wrote:
>
> On 25/03/2024 at 08:28:54 -0400, Patrick Donnelly wrote:
> Hi,
>
> >
> > The fix is in one of the next releases. Check the tracker ticket:
> > https://tracker.ceph.com/issues/63166
>
> Oh thanks. Didn't find it with google.
>
> Is there any risk/impact for the cluster?
>
> Regards.
> --
> Albert SHIH 嶺 
> France
> Heure locale/Local time:
> lun. 25 mars 2024 13:31:27 CET
>


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Spam in log file

2024-03-25 Thread Patrick Donnelly
Hi Albert,

The fix is in one of the next releases. Check the tracker ticket:
https://tracker.ceph.com/issues/63166

On Mon, Mar 25, 2024 at 8:23 AM Albert Shih  wrote:
>
> Hi everyone.
>
> On my cluster I got spam by my cluster with message like
>
> Mar 25 13:10:13 cthulhu2 ceph-mgr[2843]: mgr finish mon failed to return 
> metadata for mds.cephfs.cthulhu2.dqahyt: (2) No such file or directory
> Mar 25 13:10:13 cthulhu2 ceph-mgr[2843]: mgr finish mon failed to return 
> metadata for mds.cephfs.cthulhu3.xvboir: (2) No such file or directory
> Mar 25 13:10:13 cthulhu2 ceph-mgr[2843]: mgr finish mon failed to return 
> metadata for mds.cephfs.cthulhu5.kwmyyg: (2) No such file or directory
>
> I got 5 servers for the service (cthulhu 1->5) and indeed from cthulhu1 
> (or 2) when I try:
>
> something:
>
> root@cthulhu2:/etc/ceph# ceph mds metadata cephfs.cthulhu2.dqahyt
> {}
> Error ENOENT:
> root@cthulhu2:
>
> but that works on 1 or 4
>
> root@cthulhu2:/etc/ceph# ceph mds metadata cephfs.cthulhu1.sikvjf
> {
> "addr": 
> "[v2:145.238.187.184:6800/1315478297,v1:145.238.187.184:6801/1315478297]",
> "arch": "x86_64",
> "ceph_release": "quincy",
> "ceph_version": "ceph version 17.2.7 
> (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)",
> "ceph_version_short": "17.2.7",
> "container_hostname": "cthulhu1",
> "container_image": 
> "quay.io/ceph/ceph@sha256:62465e744a80832bde6a57120d3ba076613e8a19884b274f9cc82580e249f6e1",
> "cpu": "Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz",
> "distro": "centos",
> "distro_description": "CentOS Stream 8",
> "distro_version": "8",
> "hostname": "cthulhu1",
> "kernel_description": "#1 SMP Debian 5.10.209-2 (2024-01-31)",
> "kernel_version": "5.10.0-28-amd64",
> "mem_swap_kb": "16777212",
> "mem_total_kb": "263803496",
> "os": "Linux"
> }
> root@cthulhu2:/etc/ceph#
>
> I check the caps and don't see anything special.
>
> I got also (I don't know if it's related) those message :
>
> Mar 25 13:18:38 cthulhu2 ceph-mgr[2843]: mgr.server handle_open ignoring open 
> from mds.cephfs.cthulhu2.dqahyt v2:145.238.187.185:6800/2763465960; not ready 
> for session (expect reconnect)
> Mar 25 13:18:38 cthulhu2 ceph-mgr[2843]: mgr.server handle_open ignoring open 
> from mds.cephfs.cthulhu3.xvboir v2:145.238.187.186:6800/1297104944; not ready 
> for session (expect reconnect)
> Mar 25 13:18:38 cthulhu2 ceph-mgr[2843]: mgr.server handle_open ignoring open 
> from mds.cephfs.cthulhu5.kwmyyg v2:145.238.187.188:6800/449122091; not ready 
> for session (expect reconnect)
> Mar 25 13:18:39 cthulhu2 ceph-mgr[2843]: mgr.server handle_open ignoring open 
> from mds.cephfs.cthulhu3.xvboir v2:145.238.187.186:6800/1297104944; not ready 
> for session (expect reconnect)
> Mar 25 13:18:39 cthulhu2 ceph-mgr[2843]: mgr.server handle_open ignoring open 
> from mds.cephfs.cthulhu2.dqahyt v2:145.238.187.185:6800/2763465960; not ready 
> for session (expect reconnect)
> Mar 25 13:18:39 cthulhu2 ceph-mgr[2843]: mgr.server handle_open ignoring open 
> from mds.cephfs.cthulhu5.kwmyyg v2:145.238.187.188:6800/449122091; not ready 
> for session (expect reconnect)
>
> Regards.
> --
> Albert SHIH 嶺 
> France
> Heure locale/Local time:
> lun. 25 mars 2024 13:08:33 CET



-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephfs error state with one bad file

2024-03-15 Thread Patrick Donnelly
Hi Sake,

On Tue, Jan 2, 2024 at 4:02 AM Sake Ceph  wrote:
>
> Hi again, hopefully for the last time with problems.
>
> We had a MDS crash earlier with the MDS staying in failed state and used a 
> command to reset the filesystem (this was wrong, I know now, thanks Patrick 
> Donnelly for pointing this out). I did a full scrub on the filesystem and two 
> files were damaged. One of those got repaired, but the following file keeps 
> giving errors and can't be removed.
> What can I do now? Below some information.
>
> # ceph tell mds.atlassian-prod:0 damage ls
> [
> {
> "damage_type": "backtrace",
> "id": 224901,
> "ino": 1099534008829,
> "path": 
> "/app1/shared/data/repositories/11271/objects/41/8f82507a0737c611720ed224bcc8b7a24fda01"
> }
> ]
>
>
> Trying to repair the error (online research shows this should work for a 
> backtrace damage type)
> --
> # ceph tell mds.atlassian-prod:0 scrub start 
> /app1/shared/data/repositories/11271 recursive,repair,force
> {
> "return_code": 0,
> "scrub_tag": "d10ead42-5280-4224-971e-4f3022e79278",
> "mode": "asynchronous"
> }
>
>
> Cluster logs after this
> --
> 1/2/24 9:37:05 AM
> [INF]
> scrub summary: idle
>
> 1/2/24 9:37:02 AM
> [INF]
> scrub summary: idle+waiting paths [/app1/shared/data/repositories/11271]
>
> 1/2/24 9:37:01 AM
> [INF]
> scrub summary: active paths [/app1/shared/data/repositories/11271]
>
> 1/2/24 9:37:01 AM
> [INF]
> scrub summary: idle+waiting paths [/app1/shared/data/repositories/11271]
>
> 1/2/24 9:37:01 AM
> [INF]
> scrub queued for path: /app1/shared/data/repositories/11271
>
>
> But the error doesn't disappear and still can't remove the file.
>
>
> On the client trying to remove the file (we got a backup)
> --
> $ rm -f 
> /mnt/shared_disk-app1/shared/data/repositories/11271/objects/41/8f82507a0737c611720ed224bcc8b7a24fda01
> rm: cannot remove 
> '/mnt/shared_disk-app1/shared/data/repositories/11271/objects/41/8f82507a0737c611720ed224bcc8b7a24fda01':
>  Input/output error

Did you try `damage rm <damage_id>` after scrubbing?
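
For example, using the id from the damage table you posted (a sketch; re-run
`damage ls` first to confirm the id is still the same):

ceph tell mds.atlassian-prod:0 damage rm 224901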


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS subtree pinning

2024-03-15 Thread Patrick Donnelly
Hi Sake,

On Fri, Dec 22, 2023 at 7:44 AM Sake Ceph  wrote:
>
> Hi!
>
> As I'm reading through the documentation about subtree pinning, I was 
> wondering if the following is possible.
>
> We've got the following directory structure.
> /
>   /app1
>   /app2
>   /app3
>   /app4
>
> Can I pin /app1 to MDS rank 0 and 1,

This will be possible with: https://github.com/ceph/ceph/pull/52373
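
Until that lands, a directory can only be pinned to a single rank. A minimal
sketch with the existing export pin (the mount point is an assumption and
max_mds must be >= 2):

setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/app1
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/app2
# -v -1 removes the pin again
setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/app1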

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: reef 18.2.2 (hot-fix) QE validation status

2024-03-06 Thread Patrick Donnelly
On Wed, Mar 6, 2024 at 2:55 AM Venky Shankar  wrote:
>
> +Patrick Donnelly
>
> On Tue, Mar 5, 2024 at 9:18 PM Yuri Weinstein  wrote:
> >
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/64721#note-1
> > Release Notes - TBD
> > LRC upgrade - TBD
> >
> > Seeking approvals/reviews for:
> >
> > smoke - in progress
> > rados - Radek, Laura?
> > quincy-x - in progress
>
> I think
>
> https://github.com/ceph/ceph/pull/55669
>
> was supposed to be included in this hotfix (I recall Patrick
> mentioning this in last week's CLT). The change was merged into reef
> head last week.

That's fixing a bug in reef HEAD, not v18.2.1, so there's no need to get it into this hotfix.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Leadership Team Meeting, 2024-02-28 Minutes

2024-02-28 Thread Patrick Donnelly
Hi folks,

Today we discussed:


   - [casey] on dropping ubuntu focal support for squid
     - Discussion thread:
       https://lists.ceph.io/hyperkitty/list/d...@ceph.io/thread/ONAWOAE7MPMT7CP6KH7Y4NGWIP5SZ7XR/
     - Quincy doesn't build jammy packages, so quincy->squid upgrade tests
       have to run on focal
     - proposing to add jammy packages for quincy to enable that upgrade path
       (from 17.2.8+)
     - https://github.com/ceph/ceph-build/pull/2206
     - Need to indicate that Quincy clusters must upgrade to jammy before
       upgrading to Squid.
   - T release name: https://pad.ceph.com/p/t
     - Tentacle wins!
     - Patrick to do release kick-off
   - Cephalocon news?
     - Planning is in progress; no news as knowledgeable parties not present
       for this meeting.
   - Volunteers for compiling the Contributor Credits?
     - https://tracker.ceph.com/projects/ceph/wiki/Ceph_contributors_list_maintenance_guide
     - Laura will give it a try.
   - Plan for tagged vs. named Github milestones?
     - Continue using priority order for qa testing: exhaust testing on
       tagged milestone, then go to "release" catch-all milestone
   - v18.2.2 hotfix release next
     - Reef HEAD is still cooking with to-be-addressed upgrade issues.
   - v19.1.0 (first Squid RC)
     - two rgw features still waiting to go into squid
     - cephfs quiesce feature to be backported
     - Nightlies crontab to be updated by Patrick.
     - V19.1.0 milestone: https://github.com/ceph/ceph/milestone/21


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Direct ceph mount on desktops

2024-02-06 Thread Patrick Donnelly
On Tue, Feb 6, 2024 at 12:09 PM Tim Holloway  wrote:
>
> Back when I was battline Octopus, I had problems getting ganesha's NFS
> to work reliably. I resolved this by doing a direct (ceph) mount on my
> desktop machine instead of an NFS mount.
>
> I've since been plagued by ceph "laggy OSD" complaints that appear to
> be due to a non-responsive client and I'm suspecting that the client in
> question is the desktop machine when it's suspended while the ceph
> mount is in effect.

You should not see "laggy OSD" messages due to a client becoming unresponsive.

> So the question is: Should ceph native mounts be used on general client
> machines which may hibernate or otherwise go offline?

The mounts will eventually be evicted (generally) by the MDS if the
machine hibernates/suspends. There are mechanisms for the mount to
recover (see "recover_session" in the mount.ceph man page).  Any dirty
data would be lost.

As for whether you should have clients that hibernate, it's not ideal.
It could conceivably create problems if client machines hibernate
longer than the blocklist duration (after eviction by the MDS).
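
If you do keep a native mount on such a machine, a hedged example of the
mount option mentioned above (device string and client name are
placeholders):

# let the kernel client recreate its session after eviction/blocklisting
mount -t ceph :/ /mnt/cephfs -o name=desktop,recover_session=clean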

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs inode backtrace information

2024-01-31 Thread Patrick Donnelly
On Tue, Jan 30, 2024 at 5:03 AM Dietmar Rieder wrote:
>
> Hello,
>
> I have a question regarding the default pool of a cephfs.
>
> According to the docs it is recommended to use a fast ssd replicated
> pool as default pool for cephfs. I'm asking what are the space
> requirements for storing the inode backtrace information?

The actual recommendation is to use a replicated pool for the default
data pool. Regular hard drives are fine for the storage device.

> Let's say I have a 85 TiB replicated ssd pool (hot data) and as 3 PiB EC
> data pool (cold data).
>
> Does it make sense to create a third pool as default pool which only
> holds the inode backtrace information (what would be a good size), or is
> it OK to use the ssd pool as default pool?

Assuming your 85 TiB rep ssd pool is the default data pool already, use that.

(I am curious why this question is asked now when the file system
already has a significant amount of data? Are you thinking about
recreating the fs?)
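
For completeness, a rough sketch of that layout (pool and path names are
placeholders, not taken from your cluster):

# replicated pool as the default data pool, EC pool added for bulk data
ceph fs new cephfs cephfs_metadata ssd-rep-pool
ceph osd pool set hdd-ec-pool allow_ec_overwrites true
ceph fs add_data_pool cephfs hdd-ec-pool
# direct the cold data tree at the EC pool via a file layout
setfattr -n ceph.dir.layout.pool -v hdd-ec-pool /mnt/cephfs/cold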

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mds crashes after up:replay state

2024-01-05 Thread Patrick Donnelly
Hi Lars,

On Fri, Jan 5, 2024 at 9:53 AM Lars Köppel  wrote:
>
> Hello everyone,
>
> we are running a small cluster with 3 nodes and 25 osds per node. And Ceph
> version 17.2.6.
> Recently the active mds crashed and since then the new starting mds has
> always been in the up:replay state. In the output of the command 'ceph tell
> mds.cephfs:0 status' you can see that the journal is completely read in. As
> soon as it's finished, the mds crashes and the next one starts reading the
> journal.
>
> At the moment I have the journal inspection running ('cephfs-journal-tool
> --rank=cephfs:0 journal inspect').
>
> Does anyone have any further suggestions on how I can get the cluster
> running again as quickly as possible?

Please review:

https://docs.ceph.com/en/reef/cephfs/troubleshooting/#stuck-during-recovery

Note: your MDS is probably not failing in up:replay but shortly after
reaching one of the later states. Check the mon logs to see what the
FSMap changes were.
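
A quick way to see those transitions (the log path assumes a default package
install):

grep -i fsmap /var/log/ceph/ceph-mon.*.log | tail -n 100
ceph fs dump | head -n 40   # current epoch and rank states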


Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: FS down - mds degraded

2023-12-21 Thread Patrick Donnelly
On Thu, Dec 21, 2023 at 3:05 AM Sake Ceph  wrote:
>
> Hi David
>
> Reducing max_mds didn't work. So I executed a fs reset:
> ceph fs set atlassian-prod allow_standby_replay false
> ceph fs set atlassian-prod cluster_down true
> ceph mds fail atlassian-prod.pwsoel13142.egsdfl
> ceph mds fail atlassian-prod.pwsoel13143.qlvypn
> ceph fs reset atlassian-prod
> ceph fs reset atlassian-prod --yes-i-really-mean-it
>
> This brought the fs back online and the servers/applications are working 
> again.

This was not the right thing to do. You can mark the rank repaired. See end of:

https://docs.ceph.com/en/latest/cephfs/administration/#daemons

(ceph mds repaired <fs_name>:<rank>)

I admit that is not easy to find. I will add a ticket to improve the
documentation:

https://tracker.ceph.com/issues/63885

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: FS down - mds degraded

2023-12-21 Thread Patrick Donnelly
On Thu, Dec 21, 2023 at 2:49 AM David C.  wrote:
> I would start by decrementing max_mds by 1:
> ceph fs set atlassian-prod max_mds 2

This will have no positive effect. The monitors will not alter the
number of ranks (i.e. stop a rank) if the cluster is degraded.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: FS down - mds degraded

2023-12-21 Thread Patrick Donnelly
On Thu, Dec 21, 2023 at 2:11 AM Sake Ceph  wrote:
>
> Starting a new thread, forgot subject in the previous.
> So our FS down. Got the following error, what can I do?
>
> # ceph health detail
> HEALTH_ERR 1 filesystem is degraded; 1 mds daemon damaged
> [WRN] FS_DEGRADED: 1 filesystem is degraded
> fs atlassian/prod is degraded
> [ERR] MDS_DAMAGE: 1 mds daemon damaged
> fs atlassian-prod mds.1 is damaged

Identify what is damaged by reviewing the MDS logs. Increase mds
debugging and mark the rank repaired if there is insufficient
information (which assumes that whatever caused the MDS to become
damaged will reoccur when it restarts).
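
A hedged sketch of that sequence (the fs name and rank are taken from your
health output above; adjust if different):

# capture the next failure with more detail
ceph config set mds debug_mds 20
ceph config set mds debug_ms 1
# then clear the damaged flag so the rank is allowed to start again
ceph mds repaired atlassian-prod:1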

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mds.0.journaler.pq(ro) _finish_read got error -2

2023-12-12 Thread Patrick Donnelly
On Mon, Dec 11, 2023 at 6:38 AM Eugen Block  wrote:
>
> Hi,
>
> I'm trying to help someone with a broken CephFS. We managed to recover
> basic ceph functionality but the CephFS is still inaccessible
> (currently read-only). We went through the disaster recovery steps but
> to no avail. Here's a snippet from the startup logs:
>
> ---snip---
> mds.0.41 Booting: 2: waiting for purge queue recovered
> mds.0.journaler.pq(ro) _finish_probe_end write_pos = 14797504512
> (header had 14789452521). recovered.
> mds.0.purge_queue operator(): open complete
> mds.0.purge_queue operator(): recovering write_pos
> monclient: get_auth_request con 0x55c280bc5c00 auth_method 0
> monclient: get_auth_request con 0x55c280ee0c00 auth_method 0
> mds.0.journaler.pq(ro) _finish_read got error -2
> mds.0.purge_queue _recover: Error -2 recovering write_pos
> mds.0.purge_queue _go_readonly: going readonly because internal IO
> failed: No such file or directory
> mds.0.journaler.pq(ro) set_readonly
> mds.0.41 unhandled write error (2) No such file or directory, force
> readonly...
> mds.0.cache force file system read-only
> force file system read-only
> ---snip---
>
> I've added the dev mailing list, maybe someone can give some advice
> how to continue from here (we could try to recover with an empty
> metadata pool). Or is this FS lost?

Looks like one of the purge queue journal objects was lost? Were other
objects lost? It would be helpful to know more about the circumstances
of this "broken CephFS"? What Ceph version?
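
To get more detail on the purge queue itself, something like this may help
(rank 0 and the fs name are assumptions):

cephfs-journal-tool --rank=<fs_name>:0 --journal=purge_queue journal inspect
cephfs-journal-tool --rank=<fs_name>:0 --journal=purge_queue header get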

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS_DAMAGE in 17.2.7 / Cannot delete affected files

2023-11-29 Thread Patrick Donnelly
Hi Sebastian,

On Wed, Nov 29, 2023 at 3:11 PM Sebastian Knust wrote:
>
> Hello Patrick,
>
> On 27.11.23 19:05, Patrick Donnelly wrote:
> >
> > I would **really** love to see the debug logs from the MDS. Please
> > upload them using ceph-post-file [1]. If you can reliably reproduce,
> > turn on more debugging:
> >
> >> ceph config set mds debug_mds 20
> >> ceph config set mds debug_ms 1
> >
> > [1] https://docs.ceph.com/en/reef/man/8/ceph-post-file/
> >
>
> Uploaded debug log and core dump, see ceph-post-file:
> 02f78445-7136-44c9-a362-410de37a0b7d
> Unfortunately, we cannot easily shut down normal access to the cluster
> for these tests, therefore there is quite some clutter in the logs. The
> logs show three crashes, the last one with enabled core dumping (ulimits
> set to unlimited)
>
> A note on reproducibility: To recreate the crash, reading the contents
> of the file prior to removal seems necessary. Simply calling stat on the
> file and then performing the removal also yields an Input/output error
> but does not crash the MDS.
>
> Interestingly, the MDS_DAMAGE flag is reset on restart of the MDS and
> only comes back once the files in question are accessed (stat call is
> sufficient).

I've not yet fully reviewed the logs but it seems there is a bug in
the detection logic which causes a spurious abort. This does not
appear to be actually new damage.

Are you using postgres? If you can share details about your snapshot
workflow and general workloads that would be helpful (privately if
desired).

> For now, I'll hold off on running first-damage.py to try to remove the
> affected files / inodes. Ultimately however, this seems to be the most
> sensible solution to me, at least with regards to cluster downtime.

Please give me another day to review then feel free to use
first-damage.py to cleanup. If you see new damage please upload the
logs.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS_DAMAGE in 17.2.7 / Cannot delete affected files

2023-11-27 Thread Patrick Donnelly
9b2f35e32e0de30d70e9a4c060d2) quincy (stable)
> >1: 
> > (ceph::__ceph_abort(char const*, int, char const*, 
> > std::__cxx11::basic_string, 
> > std::allocator > const&)+0xd7) [0x7f3fe5a1cb03]
> >2: 
> > (MDSRank::abort(std::basic_string_view 
> > >)+0x7d) [0x5640f2e6fa2d]
> >3: 
> > (CDentry::check_corruption(bool)+0x740) [0x5640f30e4820]
> >4: 
> > (EMetaBlob::add_primary_dentry(EMetaBlob::dirlump&, CDentry*, CInode*, 
> > unsigned char)+0x47) [0x5640f2f41877]
> >5: 
> > (EOpen::add_clean_inode(CInode*)+0x121) [0x5640f2f49fc1]
> >6: 
> > (Locker::adjust_cap_wanted(Capability*, int, int)+0x426) [0x5640f305e036]
> >7: 
> > (Locker::process_request_cap_release(boost::intrusive_ptr&, 
> > client_t, ceph_mds_request_release const&, std::basic_string_view > std::char_traits >)+0x599) [0x5640f307f7e9]
> >8: 
> > (Server::handle_client_request(boost::intrusive_ptr 
> > const&)+0xc06) [0x5640f2f2a7c6]
> >9: 
> > (Server::dispatch(boost::intrusive_ptr const&)+0x13c) 
> > [0x5640f2f2ef6c]
> >10: 
> > (MDSRank::_dispatch(boost::intrusive_ptr const&, 
> > bool)+0x5db) [0x5640f2e7727b]
> >11: 
> > (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr 
> > const&)+0x5c) [0x5640f2e778bc]
> >12: 
> > (MDSDaemon::ms_dispatch2(boost::intrusive_ptr const&)+0x1bf) 
> > [0x5640f2e60c2f]
> >13: 
> > (Messenger::ms_deliver_dispatch(boost::intrusive_ptr 
> > const&)+0x478) [0x7f3fe5c97ed8]
> >14: 
> > (DispatchQueue::entry()+0x50f) [0x7f3fe5c9531f]
> >15: 
> > (DispatchQueue::DispatchThread::entry()+0x11) [0x7f3fe5d5f381]
> >16: 
> > /lib64/libpthread.so.0(+0x81ca) [0x7f3fe4a0b1ca]
> >17: clone()
>
> Deleting the file with cephfs-shell also does give Input/output error (5).
>
> Does anyone have an idea on how to proceed here? I am perfectly fine
> with losing the affected files, they can all be easily restored from
> backup.

I would **really** love to see the debug logs from the MDS. Please
upload them using ceph-post-file [1]. If you can reliably reproduce,
turn on more debugging:

> ceph config set mds debug_mds 20
> ceph config set mds debug_ms 1

[1] https://docs.ceph.com/en/reef/man/8/ceph-post-file/

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Does cephfs ensure close-to-open consistency after enabling lazyio?

2023-11-27 Thread Patrick Donnelly
No. You must call lazyio_synchronize.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Leadership Team Weekly Meeting Minutes 2023-11-08

2023-11-08 Thread Patrick Donnelly
Hello all,

Here are the minutes from today's meeting.

   - New time for CDM APAC to increase participation
     - 9.30 - 11.30 pm PT seems like the most popular based on
       https://doodle.com/meeting/participate/id/aM9XGZ3a/vote
     - One more week for more feedback; please ask more APAC folks to suggest
       their preferred times.
   - [Ernesto] Revamp Ansible/Ceph-Ansible for non-containerized users?
     - open nebula / proxmox
     - solicit maintainers for ceph-ansible on the ML
   - 18.2.1
     - yuri: approval email sent out a few days ago; waiting on some approvals
     - Blocker: https://tracker.ceph.com/issues/63391
     - lab upgrades (Laura will help Yuri coordinate)
   - Next Pacific release being worked on in background by Yuri.
     - https://pad.ceph.com/p/pacific_16.2.15
     - Try v16.2.15 milestone to help prune PRs:
       https://github.com/ceph/ceph/milestone/17
   - [Nizam] Ceph News Ticker - Ceph Dashboard
     - Notify when new release is available (display changelogs)
     - Display important ceph events
       - CVEs, critical bug fixes
       - Maybe newly added blog posts or information regarding the upcoming
         group meetings?
   - User + Dev meeting next week
     - Topics include migration between EC profiles and challenges related to
       RGW zone replication
     - Casey can attend end of meeting
   - open nebula folks planning to do webinar; looking for speakers


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: list cephfs dirfrags

2023-11-08 Thread Patrick Donnelly
On Mon, Nov 6, 2023 at 4:56 AM Ben  wrote:

> Hi,
> I used this but all returns "directory inode not in cache"
> ceph tell mds.* dirfrag ls path
>
> I would like to pin some subdirs to a rank after dynamic subtree
> partitioning. Before that, I need to know where are they exactly
>

If the dirfrag is not in cache on any rank then the dirfrag is "nowhere".
It's only pinned to a rank if it's in cache.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Specify priority for active MGR and MDS

2023-10-19 Thread Patrick Donnelly
Hello Nicolas,

On Wed, Sep 27, 2023 at 9:32 AM Nicolas FONTAINE  wrote:
>
> Hi everyone,
>
> Is there a way to specify which MGR and which MDS should be the active one?

With respect to the MDS, if your reason for asking is because you want
to have the better provisioned MDS as the active then I discourage you
from architecting your system that way. The standby should be equally
provisioned, because recovery is generally a resource-intensive process that
will usually consume more resources than the steady-state active. If
you underprovision memory, especially, then your standby will simply
not function (it will go OOM).

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: No snap_schedule module in Octopus

2023-09-19 Thread Patrick Donnelly
I'm not sure off-hand. The module did have several changes as recently
as pacific so it's possible something is broken. Perhaps you don't
have a file system created yet? I would still expect to see the
commands however...

I suggest you figure out why Ceph Pacific+ can't detect your hard disk
drives (???). That seems more productive than debugging a long-EOLed
release.

On Tue, Sep 19, 2023 at 8:49 AM Patrick Begou wrote:
>
> Hi Patrick,
>
> sorry for the bad copy/paste.  As it was not working I have also tried
> with the module name 
>
> [ceph: root@mostha1 /]# ceph fs snap-schedule
> no valid command found; 10 closest matches:
> fs status []
> fs volume ls
> fs volume create  []
> fs volume rm  []
> fs subvolumegroup ls 
> fs subvolumegroup create   []
> [] [] []
> fs subvolumegroup rm   [--force]
> fs subvolume ls  []
> fs subvolume create   [] []
> [] [] [] [] [--namespace-isolated]
> fs subvolume rm   [] [--force]
> [--retain-snapshots]
> Error EINVAL: invalid command
>
> I'm reading the same documentation, but for Octopus:
> https://docs.ceph.com/en/octopus/cephfs/snap-schedule/#
>
> I think that if  "ceph mgr module enable snap_schedule" was not working
> without the "--force" option, it was because something was wrong in my
> Ceph install.
>
> Patrick
>
> > On 19/09/2023 at 14:29, Patrick Donnelly wrote:
> > https://docs.ceph.com/en/quincy/cephfs/snap-schedule/#usage
> >
> > ceph fs snap-schedule
> >
> > (note the hyphen!)
> >
> > On Tue, Sep 19, 2023 at 8:23 AM Patrick Begou
> >  wrote:
> >> Hi,
> >>
> >> still some problems with snap_schedule as as the ceph fs snap-schedule
> >> namespace is not available on my nodes.
> >>
> >> [ceph: root@mostha1 /]# ceph mgr module ls | jq -r '.enabled_modules []'
> >> cephadm
> >> dashboard
> >> iostat
> >> prometheus
> >> restful
> >> snap_schedule
> >>
> >> [ceph: root@mostha1 /]# ceph fs snap_schedule
> >> no valid command found; 10 closest matches:
> >> fs status []
> >> fs volume ls
> >> fs volume create  []
> >> fs volume rm  []
> >> fs subvolumegroup ls 
> >> fs subvolumegroup create   []
> >> [] [] []
> >> fs subvolumegroup rm   [--force]
> >> fs subvolume ls  []
> >> fs subvolume create   [] []
> >> [] [] [] [] [--namespace-isolated]
> >> fs subvolume rm   [] [--force]
> >> [--retain-snapshots]
> >> Error EINVAL: invalid command
> >>
> >> I think I need your help to go further 
> >>
> >> Patrick
> >> On 19/09/2023 at 10:23, Patrick Begou wrote:
> >>> Hi,
> >>>
> >>> bad question, sorry.
> >>> I've just run
> >>>
> >>> ceph mgr module enable snap_schedule --force
> >>>
> >>> to solve this problem. I was just afraid to use "--force"   but as I
> >>> can break this test configuration
> >>>
> >>> Patrick
> >>>
> >>> On 19/09/2023 at 09:47, Patrick Begou wrote:
> >>>> Hi,
> >>>>
> >>>> I'm working on a small POC for a ceph setup on 4 old C6100
> >>>> power-edge. I had to install Octopus since latest versions were
> >>>> unable to detect the HDD (too old hardware ??).  No matter, this is
> >>>> only for training and understanding Ceph environment.
> >>>>
> >>>> My installation is based on
> >>>> https://download.ceph.com/rpm-15.2.12/el8/noarch/cephadm-15.2.12-0.el8.noarch.rpm
> >>>> bootstrapped.
> >>>>
> >>>> I'm reaching the point to automate the snapshots (I can create
> >>>> snapshot by hand without any problem). The documentation
> >>>> https://download.ceph.com/rpm-15.2.12/el8/noarch/cephadm-15.2.12-0.el8.noarch.rpm
> >>>> says to use the snap_schedule module but this module does not exist.
> >>>>
> >>>> # ceph mgr module ls | jq -r '.enabled_modules []'
> >>>> cephadm
> >>>> dashboard
> >>>> iostat
> >>>> prometheus
> >>>> restful
> >>>>
> >>>> Have I missed something ? Is there some additional install steps to
> >>>> do for this module ?
> >>>>
> >>>> Thanks for your help.
> >>>>
> >>>> Patrick
> >
> >
>


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: No snap_schedule module in Octopus

2023-09-19 Thread Patrick Donnelly
https://docs.ceph.com/en/quincy/cephfs/snap-schedule/#usage

ceph fs snap-schedule

(note the hyphen!)
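
For example, once the hyphenated command set is available (path, schedule,
and retention below are placeholders):

# snapshot the fs root every hour and keep 24 hourly snapshots
ceph fs snap-schedule add / 1h
ceph fs snap-schedule retention add / h 24
ceph fs snap-schedule status /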

On Tue, Sep 19, 2023 at 8:23 AM Patrick Begou wrote:
>
> Hi,
>
> still some problems with snap_schedule as as the ceph fs snap-schedule
> namespace is not available on my nodes.
>
> [ceph: root@mostha1 /]# ceph mgr module ls | jq -r '.enabled_modules []'
> cephadm
> dashboard
> iostat
> prometheus
> restful
> snap_schedule
>
> [ceph: root@mostha1 /]# ceph fs snap_schedule
> no valid command found; 10 closest matches:
> fs status []
> fs volume ls
> fs volume create  []
> fs volume rm  []
> fs subvolumegroup ls 
> fs subvolumegroup create   []
> [] [] []
> fs subvolumegroup rm   [--force]
> fs subvolume ls  []
> fs subvolume create   [] []
> [] [] [] [] [--namespace-isolated]
> fs subvolume rm   [] [--force]
> [--retain-snapshots]
> Error EINVAL: invalid command
>
> I think I need your help to go further 
>
> Patrick
> > On 19/09/2023 at 10:23, Patrick Begou wrote:
> > Hi,
> >
> > bad question, sorry.
> > I've just run
> >
> > ceph mgr module enable snap_schedule --force
> >
> > to solve this problem. I was just afraid to use "--force"   but as I
> > can break this test configuration
> >
> > Patrick
> >
> >> On 19/09/2023 at 09:47, Patrick Begou wrote:
> >> Hi,
> >>
> >> I'm working on a small POC for a ceph setup on 4 old C6100
> >> power-edge. I had to install Octopus since latest versions were
> >> unable to detect the HDD (too old hardware ??).  No matter, this is
> >> only for training and understanding Ceph environment.
> >>
> >> My installation is based on
> >> https://download.ceph.com/rpm-15.2.12/el8/noarch/cephadm-15.2.12-0.el8.noarch.rpm
> >> bootstrapped.
> >>
> >> I'm reaching the point to automate the snapshots (I can create
> >> snapshot by hand without any problem). The documentation
> >> https://download.ceph.com/rpm-15.2.12/el8/noarch/cephadm-15.2.12-0.el8.noarch.rpm
> >> says to use the snap_schedule module but this module does not exist.
> >>
> >> # ceph mgr module ls | jq -r '.enabled_modules []'
> >> cephadm
> >> dashboard
> >> iostat
> >> prometheus
> >> restful
> >>
> >> Have I missed something ? Is there some additional install steps to
> >> do for this module ?
> >>
> >> Thanks for your help.
> >>
> >> Patrick



-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS daemons don't report any more

2023-09-11 Thread Patrick Donnelly
6k op/s rd, 3.04k op/s wr
> recovery: 8.7 GiB/s, 3.41k objects/s
>
> My first thought is that the status module failed. However, I don't manage to 
> restart it (always on). An MGR fail-over did not help.
>
> Any ideas what is going on here?
>
> Thanks and best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Client failing to respond to capability release

2023-09-01 Thread Patrick Donnelly
Hello Frank,

On Tue, Aug 22, 2023 at 11:42 AM Frank Schilder  wrote:
>
> Hi all,
>
> I have this warning the whole day already (octopus latest cluster):
>
> HEALTH_WARN 4 clients failing to respond to capability release; 1 pgs not 
> deep-scrubbed in time
> [WRN] MDS_CLIENT_LATE_RELEASE: 4 clients failing to respond to capability 
> release
> mds.ceph-24(mds.1): Client sn352.hpc.ait.dtu.dk:con-fs2-hpc failing to 
> respond to capability release client_id: 145698301
> mds.ceph-24(mds.1): Client sn463.hpc.ait.dtu.dk:con-fs2-hpc failing to 
> respond to capability release client_id: 189511877
> mds.ceph-24(mds.1): Client sn350.hpc.ait.dtu.dk:con-fs2-hpc failing to 
> respond to capability release client_id: 189511887
> mds.ceph-24(mds.1): Client sn403.hpc.ait.dtu.dk:con-fs2-hpc failing to 
> respond to capability release client_id: 231250695
>
> If I look at the session info from mds.1 for these clients I see this:
>
> # ceph tell mds.1 session ls | jq -c '[.[] | {id: .id, h: 
> .client_metadata.hostname, addr: .inst, fs: .client_metadata.root, caps: 
> .num_caps, req: .request_load_avg}]|sort_by(.caps)|.[]' | grep -e 145698301 
> -e 189511877 -e 189511887 -e 231250695
> {"id":189511887,"h":"sn350.hpc.ait.dtu.dk","addr":"client.189511887 
> v1:192.168.57.221:0/4262844211","fs":"/hpc/groups","caps":2,"req":0}
> {"id":231250695,"h":"sn403.hpc.ait.dtu.dk","addr":"client.231250695 
> v1:192.168.58.18:0/1334540218","fs":"/hpc/groups","caps":3,"req":0}
> {"id":189511877,"h":"sn463.hpc.ait.dtu.dk","addr":"client.189511877 
> v1:192.168.58.78:0/3535879569","fs":"/hpc/groups","caps":4,"req":0}
> {"id":145698301,"h":"sn352.hpc.ait.dtu.dk","addr":"client.145698301 
> v1:192.168.57.223:0/2146607320","fs":"/hpc/groups","caps":7,"req":0}
>
> We have mds_min_caps_per_client=4096, so it looks like the limit is well 
> satisfied. Also, the file system is pretty idle at the moment.
>
> Why and what exactly is the MDS complaining about here?

These days, you'll generally see this because the client is "quiet"
and the MDS is opportunistically recalling caps to reduce future work
when shrinking its cache is necessary. This would be indicated by:

* The MDS is not complaining about an oversized cache.
* The session listing shows the session is quiet (the
"session_cache_liveness" is near 0).

However, the MDS should respect mds_min_caps_per_client by (a) not
recalling more caps than mds_min_caps_per_client and (b) not
complaining the client has caps < mds_min_caps_per_client when it's
quiet.

So, you may have found a bug. The next time this happens, a `ceph tell
mds.X config diff`, `ceph tell mds.X perf dump`, and a selection of the
relevant `ceph tell mds.X session ls` output will help debug this, I think.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: When to use the auth profiles simple-rados-client and profile simple-rados-client-with-blocklist?

2023-09-01 Thread Patrick Donnelly
Hello Christian,

On Tue, Aug 22, 2023 at 7:51 AM Christian Rohmann
 wrote:
>
> Hey ceph-users,
>
> 1) When configuring Gnocchi to use Ceph storage (see
> https://gnocchi.osci.io/install.html#ceph-requirements)
> I was wondering if one could use any of the auth profiles like
>   * simple-rados-client
>   * simple-rados-client-with-blocklist ?
>
> Or are those for different use cases?
>
> 2) I was also wondering why the documentation mentions "(Monitor only)"
> but then it says
> "Gives a user read-only permissions for monitor, OSD, and PG data."?
>
> 3) And are those profiles really for "read-only" users? Why don't they
> have "read-only" in their name like the rbd and the corresponding
> "rbd-read-only" profile?

I don't know anything about Gnocchi (except the food) but to answer
the question in $SUBJECT:

https://docs.ceph.com/en/reef/rados/api/libcephsqlite/#user

You would want to use the simple-rados-client-with-blocklist profile
for a libcephsqlite application.
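
For example, a sketch only (the pool name "gnocchi" is a placeholder
and the exact osd caps depend on what the application needs; see the
page above):

ceph auth get-or-create client.gnocchi \
    mon 'profile simple-rados-client-with-blocklist' \
    osd 'allow rwx pool=gnocchi'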

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.14 pacific QE validation status

2023-08-24 Thread Patrick Donnelly
On Wed, Aug 23, 2023 at 10:41 AM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/62527#note-1
> Release Notes - TBD
>
> Seeking approvals for:
>
> smoke - Venky
> rados - Radek, Laura
>   rook - Sébastien Han
>   cephadm - Adam K
>   dashboard - Ernesto
>
> rgw - Casey
> rbd - Ilya
> krbd - Ilya
> fs - Venky, Patrick

approved

https://tracker.ceph.com/projects/cephfs/wiki/Pacific#2023-August-22


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: help, ceph fs status stuck with no response

2023-08-14 Thread Patrick Donnelly
On Tue, Aug 8, 2023 at 1:18 AM Zhang Bao  wrote:
>
> Hi, thanks for your help.
>
> I am using ceph Pacific 16.2.7.
>
> Before my Ceph stuck at `ceph fs status fsname`, one of my cephfs became 
> readonly.

Probably the ceph-mgr is stuck (the "volumes" plugin) somehow talking
to the read-only CephFS. That's not a scenario we've tested well.

> The metadata pool of the readonly cephfs grew up from 10GB to 3TB. Then I 
> shut down the readonly mds.

Your metadata pool grew from 10GB to 3TB in read-only mode? That's unbelievable!

We would need a lot more information to help figure out the cause, such
as: `ceph tell mds.X perf dump`, `ceph tell mds.X status`, `ceph fs
dump`, and `ceph tell mgr.X perf dump` while this is occurring.


--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Leadership Team Meeting: 2023-08-09 Minutes

2023-08-09 Thread Patrick Donnelly
Today we discussed:

- Delegating more privileges for internal hardware to allow on-call
folks to fix issues.
- Maybe using CephFS for the teuthology VM /home directory (it became
full on Friday night)
- Preparation for Open Source Day: we are seeking "low-hanging-fruit"
tickets for new developers to try fixing.
- Reef is released! Time for blog posts. We are gathering options from PTLs.
- Ceph organization Github plan migration from the "bronze legacy
plan" to the FOSS "free" plan. There is some uncertainty about
surprise drawbacks, Ernesto is continuing his investigation.
- Casey is updating the contributors list to generate accurate credits
for the new reef release: https://github.com/ceph/ceph/pull/52868

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: help, ceph fs status stuck with no response

2023-08-07 Thread Patrick Donnelly
On Mon, Aug 7, 2023 at 6:12 AM Zhang Bao  wrote:
>
> Hi,
>
> I have a ceph stucked at `ceph --verbose stats fs fsname`. And in  the
> monitor log, I can found something like `audit [DBG] from='client.431973 -'
> entity='client.admin' cmd=[{"prefix": "fs status", "fs": "fsname",
> "target": ["mon-mgr", ""]}]: dispatch`.

`ceph fs status` goes through the ceph-mgr. If there are slowdowns
with that daemon, the command may also be slow. You can share more
information about your cluster to help diagnose that.

You can use `ceph fs dump` to get most of the same information from
the mons directly.
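
For scripting, the JSON output is handy (the jq path below is
illustrative; the exact layout differs a bit between releases):

ceph fs dump -f json-pretty
ceph fs dump -f json | jq '.filesystems[].mdsmap.info'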

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: snaptrim number of objects

2023-08-07 Thread Patrick Donnelly
On Fri, Aug 4, 2023 at 5:41 PM Angelo Höngens  wrote:
>
> Hey guys,
>
> I'm trying to figure out what's happening to my backup cluster that
> often grinds to a halt when cephfs automatically removes snapshots.

CephFS does not "automatically" remove snapshots. Do you mean the
snap_schedule mgr module?

> Almost all OSD's go to 100% CPU, ceph complains about slow ops, and
> CephFS stops doing client i/o.

What health warnings do you see? You can try configuring snap trim:

https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/#confval-osd_snap_trim_sleep
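
For example, something like this slows trimming down so client I/O gets
more room (the values are only illustrative, and there are also
per-device-class variants of the sleep option):

ceph config set osd osd_snap_trim_sleep 2
ceph config set osd osd_pg_max_concurrent_snap_trims 1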

> I'm graphing the cumulative value of the snaptrimq_len value, and that
> slowly decreases over time. One night it takes an hour, but other
> days, like today, my cluster has been down for almost 20 hours, and I
> think we're half way. Funny thing is that in both cases, the
> snaptrimq_len value initially goes to the same value, around 3000, and
> then slowly decreases, but my guess is that the number of objects that
> need to be trimmed varies hugely every day.
>
> Is there a way to show the size of cephfs snapshots, or get the number
> of objects or bytes that need snaptrimming?

Unfortunately, no.
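
You can keep graphing snaptrimq_len as you do now; one way to get a
cluster-wide total on the CLI (treat the jq path as a sketch, it may
differ between releases):

ceph pg dump pgs -f json 2>/dev/null | jq '[.pg_stats[].snaptrimq_len] | add'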

> Perhaps I can graph that
> and see where the differences are.
>
> That won't explain why my cluster bogs down, but at least it gives
> some visibility. Running 17.2.6 everywhere by the way.

Please let us know how configuring snaptrim helps or not.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs - unable to create new subvolume

2023-07-21 Thread Patrick Donnelly
Hello karon,

On Fri, Jun 23, 2023 at 4:55 AM karon karon  wrote:
>
> Hello,
>
> I recently use cephfs in version 17.2.6
> I have a pool named "*data*" and a fs "*kube*"
> it was working fine until a few days ago, now i can no longer create a new
> subvolume*, *it gives me the following error:
>
> Error EINVAL: invalid value specified for ceph.dir.subvolume

We have heard other reports of this. We don't know how, but it seems
something has erroneously set the subvolume flag on parent
directories. Please try:

setfattr -n ceph.dir.subvolume -v 0 /volumes/csi

Then check if it works. If still not:

setfattr -n ceph.dir.subvolume -v 0 /volumes/

try again, if still not:

setfattr -n ceph.dir.subvolume -v 0 /

Please let us know which directory fixed the issue for you.
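
If your kernel (or ceph-fuse) is new enough to read the vxattr back,
you can confirm the flag was cleared with, e.g. (path relative to your
mount point):

getfattr -n ceph.dir.subvolume /volumes/csi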

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS cache is too large and crashes

2023-07-21 Thread Patrick Donnelly
Hello Sake,

On Fri, Jul 21, 2023 at 3:43 AM Sake Ceph  wrote:
>
> At 01:27 this morning I received the first email about MDS cache is too large 
> (mailing happens every 15 minutes if something happens). Looking into it, it 
> was again a standby-replay host which stops working.
>
> At 01:00 a few rsync processes start in parallel on a client machine. This 
> copies data from a NFS share to Cephfs share to sync the latest changes. (we 
> want to switch to Cephfs in the near future).
>
> This crashing of the standby-replay mds happend a couple times now, so I 
> think it would be good to get some help. Where should I look next?

It's this issue: https://tracker.ceph.com/issues/48673

Sorry I'm still evaluating the fix for it before merging. Hope to be
done with it soon.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Leadership Team Meeting, 2023-07-19 Minutes

2023-07-19 Thread Patrick Donnelly
Forgot the link:

On Wed, Jul 19, 2023 at 2:20 PM Patrick Donnelly  wrote:
>
> Hi folks,
>
> Today we discussed:
>
> - Reef is almost ready! The remaining issues are tracked in [1]. In
> particular, an epel9 package is holding back the release.

[1] https://pad.ceph.com/p/reef_final_blockers


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Leadership Team Meeting, 2023-07-19 Minutes

2023-07-19 Thread Patrick Donnelly
Hi folks,

Today we discussed:

- Reef is almost ready! The remaining issues are tracked in [1]. In
particular, an epel9 package is holding back the release.

- Vincent Hsu, Storage Group CTO of IBM, presented a proposal outline
for a Ceph Foundation Client Council. This council would be composed
of 10-25 invited significant operators or users of Ceph. The function
of the council is to provide essential feedback on use-cases,
pain-points, and successes arising during their use of Ceph. This
feedback will be used to steer development and initiatives. More
information on this will be forthcoming once the proposal is
finalized.
  The monthly user <-> dev meeting will be reevaluated in light of
this, possibly continuing on as usual.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: immutable bit

2023-07-07 Thread Patrick Donnelly
Unfortunately I think this ticket got put on the backburner then
forgotten. I've asked the team if anyone wants to work on it.

On Fri, Jul 7, 2023 at 6:38 PM Angelo Höngens  wrote:
>
> Hey guys and girls,
>
> I noticed CephFS on my kinda default 17.2.6 CephFS volume, it does not
> support setting the immutable bit. (Want to start using it with the
> Veeam hardened repo that uses the immutable bit).
>
> I do see a lot of very, very old posts with technical details on how
> to implement it, but is there a way for me to use that yet?
>
> Angelo.
>
>
> see https://tracker.ceph.com/issues/10679
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDSs report slow metadata IOs

2023-07-07 Thread Patrick Donnelly
https://docs.ceph.com/en/quincy/cephfs/createfs/#creating-pools

As an additional note: it's recommended to put the metadata pool on a
dedicated set of SSDs to prevent client load from disrupting MDS
performance.
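
A sketch of one way to do that, assuming a device class of "ssd" and a
metadata pool named "cephfs_metadata" (both placeholders; note that
changing the rule moves the pool's data):

ceph osd crush rule create-replicated replicated-ssd default host ssd
ceph osd pool set cephfs_metadata crush_rule replicated-ssd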

On Fri, Jul 7, 2023 at 4:56 AM Ben  wrote:
>
> Hi,
>
> see many of this in cluster log channel. many are blocked with long period
> of seconds. It should hurt client access performance. Any ideas to get rid
> of them?
>
> Thanks,
> Ben
> -
> 7/7/23 4:48:50 PM
> [WRN]
> Health check update: 8 MDSs report slow metadata IOs (MDS_SLOW_METADATA_IO)
>
> 7/7/23 4:48:09 PM
> [WRN]
> Health check update: 7 MDSs report slow metadata IOs (MDS_SLOW_METADATA_IO)
>
> 7/7/23 4:48:09 PM
> [INF]
> MDS health message cleared (mds.?): 100+ slow metadata IOs are blocked > 30
> secs, oldest blocked for 559 secs
>
> 7/7/23 4:47:47 PM
> [WRN]
> Health check update: 8 MDSs report slow metadata IOs (MDS_SLOW_METADATA_IO)
>
> 7/7/23 4:47:11 PM
> [WRN]
> Health check update: 7 MDSs report slow metadata IOs (MDS_SLOW_METADATA_IO)
>
> 7/7/23 4:47:10 PM
> [INF]
> MDS health message cleared (mds.?): 100+ slow metadata IOs are blocked > 30
> secs, oldest blocked for 377 secs
>
> 7/7/23 4:46:22 PM
> [WRN]
> Health check update: 8 MDSs report slow metadata IOs (MDS_SLOW_METADATA_IO)
>
> 7/7/23 4:46:12 PM
> [WRN]
> Health check update: 7 MDSs report slow metadata IOs (MDS_SLOW_METADATA_IO)
>
> 7/7/23 4:45:40 PM
> [WRN]
> Health check update: 6 MDSs report slow metadata IOs (MDS_SLOW_METADATA_IO)
>
> 7/7/23 4:45:40 PM
> [INF]
> MDS health message cleared (mds.?): 100+ slow metadata IOs are blocked > 30
> secs, oldest blocked for 199 secs
>
> 7/7/23 4:45:12 PM
> [WRN]
> Health check update: 7 MDSs report slow metadata IOs (MDS_SLOW_METADATA_IO)
>
> 7/7/23 4:45:07 PM
> [WRN]
> Health check update: 6 MDSs report slow metadata IOs (MDS_SLOW_METADATA_IO)
>
> 7/7/23 4:44:58 PM
> [INF]
> MDS health message cleared (mds.?): 100+ slow metadata IOs are blocked > 30
> secs, oldest blocked for 565 secs
>
> 7/7/23 4:44:58 PM
> [WRN]
> Health check update: 7 MDSs report slow metadata IOs (MDS_SLOW_METADATA_IO)
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS metadata pool grows by two orders of magnitude while trimming (?) snapshots

2023-06-16 Thread Patrick Donnelly
Hi Janek,

On Mon, Jun 12, 2023 at 5:31 AM Janek Bevendorff
 wrote:
>
> Good news: We haven't had any new fill-ups so far. On the contrary, the
> pool size is as small as it's ever been (200GiB).

Great!

> Bad news: The MDS are still acting strangely. I have very uneven session
> load and I don't know where it comes from. ceph_mds_sessions_total_load
> reports a number of 1.4 million on mds.3, whereas all the others are
> mostly idle. I checked the client list on that rank, but the heaviest
> client has about 8k caps, which isn't very much at all. Most have 0 or
> 1. I don't see any blocked ops in flight. I don't think this is to do
> with the disabled balancer, because I've seen this pattern before.

That's interesting... I don't have an explanation.

> The event log size of 3/5 MDS is also very high, still. mds.1, mds.3,
> and mds.4 report between 4 and 5 million events, mds.0 around 1.4
> million and mds.2 between 0 and 200,000. The numbers have been constant
> since my last MDS restart four days ago.
>
> I ran your ceph-gather.sh script a couple of times, but dumps only
> mds.0. Should I modify it to dump mds.3 instead so you can have a look?

Yes, please.

> Janek
>
>
> On 10/06/2023 15:23, Patrick Donnelly wrote:
> > On Fri, Jun 9, 2023 at 3:27 AM Janek Bevendorff
> >  wrote:
> >> Hi Patrick,
> >>
> >>> I'm afraid your ceph-post-file logs were lost to the nether. AFAICT,
> >>> our ceph-post-file storage has been non-functional since the beginning
> >>> of the lab outage last year. We're looking into it.
> >> I have it here still. Any other way I can send it to you?
> > Nevermind, I found the machine it was stored on. It was a
> > misconfiguration caused by post-lab-outage rebuilds.
> >
> >>> Extremely unlikely.
> >> Okay, taking your word for it. But something seems to be stalling
> >> journal trimming. We had a similar thing yesterday evening, but at much
> >> smaller scale without noticeable pool size increase. I only got an alert
> >> that the ceph_mds_log_ev Prometheus metric starting going up again for a
> >> single MDS. It grew past 1M events, so I restarted it. I also restarted
> >> the other MDS and they all immediately jumped to above 5M events and
> >> stayed there. They are, in fact, still there and have decreased only
> >> very slightly in the morning. The pool size is totally within a normal
> >> range, though, at 290GiB.
> > Please keep monitoring it. I think you're not the only cluster to
> > experience this.
> >
> >>> So clearly (a) an incredible number of journal events are being logged
> >>> and (b) trimming is slow or unable to make progress. I'm looking into
> >>> why but you can help by running the attached script when the problem
> >>> is occurring so I can investigate. I'll need a tarball of the outputs.
> >> How do I send it to you if not via ceph-post-file?
> > It should work soon next week. We're moving the drop.ceph.com service
> > to a standalone VM soonish.
> >
> >>> Also, in the off-chance this is related to the MDS balancer, please
> >>> disable it since you're using ephemeral pinning:
> >>>
> >>> ceph config set mds mds_bal_interval 0
> >> Done.
> >>
> >> Thanks for your help!
> >> Janek
> >>
> >>
> >> --
> >>
> >> Bauhaus-Universität Weimar
> >> Bauhausstr. 9a, R308
> >> 99423 Weimar, Germany
> >>
> >> Phone: +49 3643 58 3577
> >> www.webis.de
> >>
> >
> --
>
> Bauhaus-Universität Weimar
> Bauhausstr. 9a, R308
> 99423 Weimar, Germany
>
> Phone: +49 3643 58 3577
> www.webis.de
>


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS metadata pool grows by two orders of magnitude while trimming (?) snapshots

2023-06-10 Thread Patrick Donnelly
On Fri, Jun 9, 2023 at 3:27 AM Janek Bevendorff
 wrote:
>
> Hi Patrick,
>
> > I'm afraid your ceph-post-file logs were lost to the nether. AFAICT,
> > our ceph-post-file storage has been non-functional since the beginning
> > of the lab outage last year. We're looking into it.
>
> I have it here still. Any other way I can send it to you?

Nevermind, I found the machine it was stored on. It was a
misconfiguration caused by post-lab-outage rebuilds.

> > Extremely unlikely.
>
> Okay, taking your word for it. But something seems to be stalling
> journal trimming. We had a similar thing yesterday evening, but at much
> smaller scale without noticeable pool size increase. I only got an alert
> that the ceph_mds_log_ev Prometheus metric starting going up again for a
> single MDS. It grew past 1M events, so I restarted it. I also restarted
> the other MDS and they all immediately jumped to above 5M events and
> stayed there. They are, in fact, still there and have decreased only
> very slightly in the morning. The pool size is totally within a normal
> range, though, at 290GiB.

Please keep monitoring it. I think you're not the only cluster to
experience this.

> > So clearly (a) an incredible number of journal events are being logged
> > and (b) trimming is slow or unable to make progress. I'm looking into
> > why but you can help by running the attached script when the problem
> > is occurring so I can investigate. I'll need a tarball of the outputs.
>
> How do I send it to you if not via ceph-post-file?

It should work soon next week. We're moving the drop.ceph.com service
to a standalone VM soonish.

> > Also, in the off-chance this is related to the MDS balancer, please
> > disable it since you're using ephemeral pinning:
> >
> > ceph config set mds mds_bal_interval 0
>
> Done.
>
> Thanks for your help!
> Janek
>
>
> --
>
> Bauhaus-Universität Weimar
> Bauhausstr. 9a, R308
> 99423 Weimar, Germany
>
> Phone: +49 3643 58 3577
> www.webis.de
>


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS metadata pool grows by two orders of magnitude while trimming (?) snapshots

2023-06-08 Thread Patrick Donnelly
On Mon, Jun 5, 2023 at 11:48 AM Janek Bevendorff
 wrote:
>
> Hi Patrick, hi Dan!
>
> I got the MDS back and I think the issue is connected to the "newly
> corrupt dentry" bug [1]. Even though I couldn't see any particular
> reason for the SIGABRT at first, I then noticed one of these awfully
> familiar stack traces.
>
> I rescheduled the two broken MDS ranks on two machines with 1.5TB RAM
> each (just to make sure it's not that) and then let them do their thing.
> The routine goes as follows: both replay the journal, then rank 4 goes
> into the "resolve" state, but as soon as rank 3 also starts resolving,
> they both crash.
>
> Then I set
>
> ceph config mds mds_abort_on_newly_corrupt_dentry false
> ceph config mds mds_go_bad_corrupt_dentry false
>
> and this time I was able to recover the ranks, even though "resolve" and
> "clientreplay" took forever. I uploaded a compressed log of rank 3 using
> ceph-post-file [2]. It's a log of several crash cycles, including the
> final successful attempt after changing the settings. The log
> decompresses to 815MB. I didn't censor any paths and they are not
> super-secret, but please don't share.

Probably only

ceph config mds mds_go_bad_corrupt_dentry false

was necessary for recovery. You don't have any logs showing it hit
those asserts?

I'm afraid your ceph-post-file logs were lost to the nether. AFAICT,
our ceph-post-file storage has been non-functional since the beginning
of the lab outage last year. We're looking into it.

> While writing this, the metadata pool size has reduced from 6TiB back to
> 440GiB. I am starting to think that the fill-ups may also be connected
> to the corruption issue.

Extremely unlikely.

> I also noticed that the ranks 3 and 4 always
> have huge journals. An inspection using ceph-journal-tool takes forever
> and consumes 50GB of memory in the process. Listing the events in the
> journal is impossible without running out of RAM. Ranks 0, 1, and 2
> don't have this problem and this wasn't a problem for ranks 3 and 4
> either before the fill-ups started happening.

So clearly (a) an incredible number of journal events are being logged
and (b) trimming is slow or unable to make progress. I'm looking into
why but you can help by running the attached script when the problem
is occurring so I can investigate. I'll need a tarball of the outputs.

Also, in the off-chance this is related to the MDS balancer, please
disable it since you're using ephemeral pinning:

ceph config set mds mds_bal_interval 0

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: slow mds requests with random read test

2023-06-07 Thread Patrick Donnelly
Are you using an EC pool?

On Wed, May 31, 2023 at 11:04 AM Ben  wrote:
>
> Thank you Patrick for help.
> The random write tests are performing well enough, though. Wonder why read 
> test is so poor with the same configuration(resulting read bandwidth about 
> 15MB/s vs 400MB/s of write).  especially the logs of slow requests are 
> irrelevant with testing ops. I am thinking it is something with cephfs kernel 
> client?
>
> Any other thoughts?
>
> On Wed, May 31, 2023 at 00:58, Patrick Donnelly  wrote:
>>
>> On Tue, May 30, 2023 at 8:42 AM Ben  wrote:
>> >
>> > Hi,
>> >
>> > We are performing couple performance tests on CephFS using fio. fio is run
>> > in k8s pod and 3 pods will be up running mounting the same pvc to CephFS
>> > volume. Here is command line for random read:
>> > fio -direct=1 -iodepth=128 -rw=randread -ioengine=libaio -bs=4k -size=1G
>> > -numjobs=5 -runtime=500 -group_reporting -directory=/tmp/cache
>> > -name=Rand_Read_Testing_$BUILD_TIMESTAMP
>> > The random read is performed very slow. Here is the cluster log from
>> > dashboard:
>> > [...]
>> > Any suggestions on the problem?
>>
>> Your random read workload is too extreme for your cluster of OSDs.
>> It's causing slow metadata ops for the MDS. To resolve this we would
>> normally suggest allocating a set of OSDs on SSDs for use by the
>> CephFS metadata pool to isolate the workloads.
>>
>> --
>> Patrick Donnelly, Ph.D.
>> He / Him / His
>> Red Hat Partner Engineer
>> IBM, Inc.
>> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>>


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: slow mds requests with random read test

2023-05-30 Thread Patrick Donnelly
On Tue, May 30, 2023 at 8:42 AM Ben  wrote:
>
> Hi,
>
> We are performing couple performance tests on CephFS using fio. fio is run
> in k8s pod and 3 pods will be up running mounting the same pvc to CephFS
> volume. Here is command line for random read:
> fio -direct=1 -iodepth=128 -rw=randread -ioengine=libaio -bs=4k -size=1G
> -numjobs=5 -runtime=500 -group_reporting -directory=/tmp/cache
> -name=Rand_Read_Testing_$BUILD_TIMESTAMP
> The random read is performed very slow. Here is the cluster log from
> dashboard:
> [...]
> Any suggestions on the problem?

Your random read workload is too extreme for your cluster of OSDs.
It's causing slow metadata ops for the MDS. To resolve this we would
normally suggest allocating a set of OSDs on SSDs for use by the
CephFS metadata pool to isolate the workloads.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS crashes to damaged metadata

2023-05-24 Thread Patrick Donnelly
On Wed, May 24, 2023 at 4:26 AM Stefan Kooman  wrote:
>
> On 5/22/23 20:24, Patrick Donnelly wrote:
>
> >
> > The original script is here:
> > https://github.com/ceph/ceph/blob/main/src/tools/cephfs/first-damage.py
> >
> "# Suggested recovery sequence (for single MDS cluster):
> #
> # 1) Unmount all clients."
>
> Is this a hard requirement? This might not be feasible for an MDS with >
> 1K sessions, where not all mounts are in control of the Ceph operator.
> Would it also suffice to blocklist these clients?

Only for repair. You could run the script to just do a read-only scan.


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-24 Thread Patrick Donnelly
Hello Justin,

Please do:

ceph config set mds debug_mds 20
ceph config set mds debug_ms 1

Then wait for a crash. Please upload the log.

To restore your file system:

ceph config set mds mds_abort_on_newly_corrupt_dentry false

Let the MDS purge the strays and then try:

ceph config set mds mds_abort_on_newly_corrupt_dentry true

On Tue, May 23, 2023 at 7:04 PM Justin Li  wrote:
>
> Hi Patrick,
>
> Sorry for keeping bothering you but I found that MDS service kept crashing 
> even cluster shows MDS is up. I attached another log of MDS server - eowyn at 
> below. Look forward to hearing more insights. Thanks a lot.
>
> https://drive.google.com/file/d/1nD_Ks7fNGQp0GE5Q_x8M57HldYurPhuN/view?usp=sharing
>
> MDS crashed:
> root@eowyn:~# systemctl status  ceph-mds@eowyn
> ● ceph-mds@eowyn.service - Ceph metadata server daemon
>  Loaded: loaded (/lib/systemd/system/ceph-mds@.service; enabled; vendor 
> preset: enabled)
>  Active: failed (Result: signal) since Wed 2023-05-24 08:55:12 AEST; 24s 
> ago
> Process: 44349 ExecStart=/usr/bin/ceph-mds -f --cluster ${CLUSTER} --id 
> eowyn --setuser ceph --setgroup ceph (code=kill>
>Main PID: 44349 (code=killed, signal=ABRT)
>
> May 24 08:55:12 eowyn systemd[1]: ceph-mds@eowyn.service: Scheduled restart 
> job, restart counter is at 3.
> May 24 08:55:12 eowyn systemd[1]: Stopped Ceph metadata server daemon.
> May 24 08:55:12 eowyn systemd[1]: ceph-mds@eowyn.service: Start request 
> repeated too quickly.
> May 24 08:55:12 eowyn systemd[1]: ceph-mds@eowyn.service: Failed with result 
> 'signal'.
> May 24 08:55:12 eowyn systemd[1]: Failed to start Ceph metadata server daemon.
>
>
> Part of MDS log on eowyn (MDS server):
>-2> 2023-05-24T08:55:11.854+1000 7f1f8ee93700 -1 log_channel(cluster) log 
> [ERR] : MDS abort because newly corrupt dentry to be committed: [dentry 
> #0x100/stray0/1005480d3ac [19ce,head] auth (dversion lock) pv=2154265085 
> v=2154265074 ino=0x1005480d3ac state=1342177316 | purging=1 0x55b04517ca00]
> -1> 2023-05-24T08:55:11.858+1000 7f1f8ee93700 -1 
> /build/ceph-16.2.13/src/mds/CDentry.cc: In function 'bool 
> CDentry::check_corruption(bool)' thread 7f1f8ee93700 time 
> 2023-05-24T08:55:11.858329+1000
> /build/ceph-16.2.13/src/mds/CDentry.cc: 697: ceph_abort_msg("abort() called")
>
>  ceph version 16.2.13 (5378749ba6be3a0868b51803968ee9cde4833a3e) pacific 
> (stable)
 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe0) [0x7f1f99404495]
>  2: (CDentry::check_corruption(bool)+0x86b) [0x55b02652991b]
>  3: (StrayManager::_purge_stray_purged(CDentry*, bool)+0xc64) [0x55b026480ed4]
>  4: (MDSContext::complete(int)+0x61) [0x55b026601471]
>  5: (MDSIOContextBase::complete(int)+0x4fc) [0x55b026601b9c]
>  6: (Finisher::finisher_thread_entry()+0x19d) [0x7f1f994b8c6d]
>  7: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f1f99146609]
>  8: clone()
>
>
>
>
> Justin Li
> Senior Technical Officer
> School of Information Technology
> Faculty of Science, Engineering and Built Environment
> For ICT Support please see https://www.deakin.edu.au/sebeicthelp
>
>
> Deakin University
> Melbourne Burwood Campus, 221 Burwood Highway, Burwood, VIC 3125
> +61 3 9246 8932
> justin...@deakin.edu.au
> http://www.deakin.edu.au/
> Deakin University CRICOS Provider Code 00113B
>
> Important Notice: The contents of this email are intended solely for the 
> named addressee and are confidential; any unauthorised use, reproduction or 
> storage of the contents is expressly prohibited. If you have received this 
> email in error, please delete it and any attachments immediately and advise 
> the sender by return email or telephone.
>
> Deakin University does not warrant that this email and any attachments are 
> error or virus free.
>
> -Original Message-
> From: Justin Li
> Sent: Wednesday, May 24, 2023 8:25 AM
> To: Patrick Donnelly 
> Cc: ceph-users@ceph.io
> Subject: RE: [ceph-users] [Help appreciated] ceph mds damaged
>
> Sorry Patrick, last email was restricted as attachment size. I attached a 
> link for you to download the log. Thanks.
> https://drive.google.com/drive/folders/1bV_X7vyma_-gTfLrPnEV27QzsdmgyK4g?usp=sharing
>
>
> Justin Li
> Senior Technical Officer
> School of Information Technology
> Faculty of Science, Engineering and Built Environment For ICT Support please 
> see https://www.deakin.edu.au/sebeicthelp
>
>
> Deakin University
> Melbourne Burwood Campus, 221 Burwood Highway, Burwood, VIC 3125
> +61 3 9246 8932
> justin...@deakin.edu.au
> http://www.deakin.edu.au/
> Deakin University CRICOS Provider Code 00113B
>

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-23 Thread Patrick Donnelly
Hello Justin,

On Tue, May 23, 2023 at 4:55 PM Justin Li  wrote:
>
> Dear All,
>
> After a unsuccessful upgrade to pacific, MDS were offline and could not get 
> back on. Checked the MDS log and found below. See cluster info from below as 
> well. Appreciate it if anyone can point me to the right direction. Thanks.
>
>
> MDS log:
>
> 2023-05-24T06:21:36.831+1000 7efe56e7d700  1 mds.0.cache.den(0x600 
> 1005480d3b2) loaded already corrupt dentry: [dentry #0x100/stray0/1005480d3b2 
> [19ce,head] rep@0,-2.0 NULL (dversion lock) pv=0 
> v=2154265030 ino=(nil) state=0 0x556433addb80]
>
> -5> 2023-05-24T06:21:36.831+1000 7efe56e7d700 -1 mds.0.damage 
> notify_dentry Damage to dentries in fragment * of ino 0x600is fatal because 
> it is a system directory for this rank
>
> -4> 2023-05-24T06:21:36.831+1000 7efe56e7d700  5 mds.beacon.posco 
> set_want_state: up:active -> down:damaged
>
> -3> 2023-05-24T06:21:36.831+1000 7efe56e7d700  5 mds.beacon.posco Sending 
> beacon down:damaged seq 5339
>
> -2> 2023-05-24T06:21:36.831+1000 7efe56e7d700 10 monclient: 
> _send_mon_message to mon.ceph-3 at v2:10.120.0.146:3300/0
>
> -1> 2023-05-24T06:21:37.659+1000 7efe60690700  5 mds.beacon.posco 
> received beacon reply down:damaged seq 5339 rtt 0.827966
>
>  0> 2023-05-24T06:21:37.659+1000 7efe56e7d700  1 mds.posco respawn!
>
>
> Cluster info:
> root@ceph-1:~# ceph -s
>   cluster:
> id: e2b93a76-2f97-4b34-8670-727d6ac72a64
> health: HEALTH_ERR
> 1 filesystem is degraded
> 1 filesystem is offline
> 1 mds daemon damaged
>
>   services:
> mon: 3 daemons, quorum ceph-1,ceph-2,ceph-3 (age 26h)
> mgr: ceph-3(active, since 15h), standbys: ceph-1, ceph-2
> mds: 0/1 daemons up, 3 standby
> osd: 135 osds: 133 up (since 10h), 133 in (since 2w)
>
>   data:
> volumes: 0/1 healthy, 1 recovering; 1 damaged
> pools:   4 pools, 4161 pgs
> objects: 230.30M objects, 276 TiB
> usage:   836 TiB used, 460 TiB / 1.3 PiB avail
> pgs: 4138 active+clean
>  13   active+clean+scrubbing
>  10   active+clean+scrubbing+deep
>
>
>
> root@ceph-1:~# ceph health detail
> HEALTH_ERR 1 filesystem is degraded; 1 filesystem is offline; 1 mds daemon 
> damaged
> [WRN] FS_DEGRADED: 1 filesystem is degraded
> fs cephfs is degraded
> [ERR] MDS_ALL_DOWN: 1 filesystem is offline
> fs cephfs is offline because no MDS is active for it.
> [ERR] MDS_DAMAGE: 1 mds daemon damaged
> fs cephfs mds.0 is damaged

Do you have a complete log you can share? Try:

https://docs.ceph.com/en/quincy/man/8/ceph-post-file/

To get your upgrade to complete, you may set:

ceph config set mds mds_go_bad_corrupt_dentry false

--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Deleting a CephFS volume

2023-05-22 Thread Patrick Donnelly
Hi Conrad,

On Wed, May 17, 2023 at 2:41 PM Conrad Hoffmann  wrote:
>
> On 5/17/23 18:07, Stefan Kooman wrote:
> > On 5/17/23 17:29, Conrad Hoffmann wrote:
> >> Hi all,
> >>
> >> I'm having difficulties removing a CephFS volume that I set up for
> >> testing. I've been through this with RBDs, so I do know about
> >> `mon_allow_pool_delete`. However, it doesn't help in this case.
> >>
> >> It is a cluster with 3 monitors. You can find a console log of me
> >> verifying that `mon_allow_pool_delete` is indeed true on all monitors
> >> but still fail to remove the volume here:
> >
> > That's not just a volume, that's the whole filesystem. If that's what
> > you want to do ... I see the MDS daemon is still up. IIRC there should
> > be no MDS running if you want to delete the fs. Can you stop the MDS
> > daemon and try again.
>
> That sort of got me in the right direction, but I am still confused. I
> don't think I understand the difference between a volume and a
> filesystem. I think I followed [1] when I set this up. It says to use
> `ceph fs volume create`. I went ahead and ran it again, and it certainly
> creates something that shows up in both `ceph fs ls` and `ceph fs volume
> ls`. Also, [2] says "FS volumes, an abstraction for CephFS file
> systems", so I guess they are the same thing?

Yes.

> At any rate, shutting down the MDS did _not_ help with `ceph fs volume
> rm` (it failed with the same error message), but it _did_ help with
> `ceph fs rm`, which then worked. Hard to make sense of, but I am pretty
> sure the error message I was seeing is pretty non-sensical in that
> context. Under what circumstance will `ceph fs volume rm` even work if
> it fails to delete a volume I just created?

`fs rm` just removes the file system from the monitor maps. You still
have the data pools lying around, which is what the `volume rm` command
is complaining about.

Try:

ceph config set global mon_allow_pool_delete true
ceph fs volume rm ...
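
(The volume rm command also asks for confirmation, so the full form
looks something like the following, with the name taken from
`ceph fs volume ls`:)

ceph fs volume rm <volume_name> --yes-i-really-mean-it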

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS crashes to damaged metadata

2023-05-22 Thread Patrick Donnelly
On Mon, May 15, 2023 at 8:55 AM Stefan Kooman  wrote:
>
> On 12/15/22 15:31, Stolte, Felix wrote:
> > Hi Patrick,
> >
> > we used your script to repair the damaged objects on the weekend and it 
> > went smoothly. Thanks for your support.
> >
> > We adjusted your script to scan for damaged files on a daily basis, runtime 
> > is about 6h. Until thursday last week, we had exactly the same 17 Files. On 
> > thursday at 13:05 a snapshot was created and our active mds crashed once at 
> > this time (snapshot was created):
>
> Are you willing to share this script? I would like to use it to scan our
> CephFS before upgrading to 16.2.13. Do you run this script when the
> filesystem is online / active?

The original script is here:
https://github.com/ceph/ceph/blob/main/src/tools/cephfs/first-damage.py

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS crashes to damaged metadata

2023-05-22 Thread Patrick Donnelly
Hi Felix,

On Sat, May 13, 2023 at 9:18 AM Stolte, Felix  wrote:
>
> Hi Patrick,
>
> we have been running one daily snapshot since december and our cephfs crashed 
> 3 times because of this https://tracker.ceph.com/issues/38452
>
> We currently have 19 files with corrupt metadata found by your 
> first-damage.py script. We isolated the these files from access by users and 
> are waiting for a fix before we remove them with your script (or maybe a new 
> way?)

No other fix is anticipated at this time. Probably one will be
developed after the cause is understood.

> Today we upgraded our cluster from 16.2.11 and 16.2.13. After Upgrading the 
> mds  servers, cluster health went to ERROR MDS_DAMAGE. 'ceph tells mds 0 
> damage ls‘ is showing me the same files as your script (initially only a 
> part, after a cephfs scrub all of them).

This is expected. Once the dentries are marked damaged, the MDS won't
allow operations on those files (like those triggering tracker
#38452).

> I noticed "mds: catch damage to CDentry’s first member before persisting 
> (issue#58482, pr#50781, Patrick Donnelly)“ in the change logs for 16.2.13  
> and like to ask you the following questions:
>
> a) can we repair the damaged files online now instead of bringing down the 
> whole fs and using the python script?

Not yet.

> b) should we set one of the new mds options in our specific case to avoid our 
> fileserver crashing because of the wrong snap ids?

Have your MDS crashed or just marked the dentries damaged? If you can
reproduce a crash with detailed logs (debug_mds=20), that would be
incredibly helpful.
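
The usual way to capture that:

ceph config set mds debug_mds 20
ceph config set mds debug_ms 1

and then `ceph config rm mds debug_mds` / `ceph config rm mds debug_ms`
once you have the crash logs.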

> c) will your patch prevent wrong snap ids in the future?

It will prevent persisting the damage.


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS "newly corrupt dentry" after patch version upgrade

2023-05-10 Thread Patrick Donnelly
Hi Janek,

All this indicates is that you have some files with binary keys that
cannot be decoded as utf-8. Unfortunately, the rados python library
assumes that omap keys can be decoded this way.

https://tracker.ceph.com/issues/59716

I hope to have a fix soon.

On Thu, May 4, 2023 at 3:15 AM Janek Bevendorff
 wrote:
>
> After running the tool for 11 hours straight, it exited with the
> following exception:
>
> Traceback (most recent call last):
>File "/home/webis/first-damage.py", line 156, in 
>  traverse(f, ioctx)
>File "/home/webis/first-damage.py", line 84, in traverse
>  for (dnk, val) in it:
>File "rados.pyx", line 1389, in rados.OmapIterator.__next__
>File "rados.pyx", line 318, in rados.decode_cstr
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 8:
> invalid start byte
>
> Does that mean that the last inode listed in the output file is corrupt?
> Any way I can fix it?
>
> The output file has 14 million lines. We have about 24.5 million objects
> in the metadata pool.
>
> Janek
>
>
> On 03/05/2023 14:20, Patrick Donnelly wrote:
> > On Wed, May 3, 2023 at 4:33 AM Janek Bevendorff
> >  wrote:
> >> Hi Patrick,
> >>
> >>> I'll try that tomorrow and let you know, thanks!
> >> I was unable to reproduce the crash today. Even with
> >> mds_abort_on_newly_corrupt_dentry set to true, all MDS booted up
> >> correctly (though they took forever to rejoin with logs set to 20).
> >>
> >> To me it looks like the issue has resolved itself overnight. I had run a
> >> recursive scrub on the file system and another snapshot was taken, in
> >> case any of those might have had an effect on this. It could also be the
> >> case that the (supposedly) corrupt journal entry has simply been
> >> committed now and hence doesn't trigger the assertion any more. Is there
> >> any way I can verify this?
> > You can run:
> >
> > https://github.com/ceph/ceph/blob/main/src/tools/cephfs/first-damage.py
> >
> > Just do:
> >
> > python3 first-damage.py --memo run.1 
> >
> > No need to do any of the other steps if you just want a read-only check.
> >
> --
>
> Bauhaus-Universität Weimar
> Bauhausstr. 9a, R308
> 99423 Weimar, Germany
>
> Phone: +49 3643 58 3577
> www.webis.de
>


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS Scrub Questions

2023-05-04 Thread Patrick Donnelly
On Thu, May 4, 2023 at 11:35 AM Chris Palmer  wrote:
>
> Hi
>
> Grateful if someone could clarify some things about CephFS Scrubs:
>
> 1) Am I right that a command such as "ceph tell mds.cephfs:0 scrub start
> / recursive" only triggers a forward scrub (not a backward scrub)?

The naming that has become conventional here is unfortunate. Forward
scrub really just means metadata scrub; there is no data integrity
checking.

cephfs-data-scan ("backward" scrub) is just attempting to recover
metadata from what's available on the data pool.

To answer your question: yes.

> 2) I couldn't find any reference to forward scrubs being done
> automatically and was wondering whether I should do them using cron? But
> then I saw an undated (but I think a little elderly) presentation by
> Greg Farnum that states that "forward scrub...runs continuously in the
> background". Is that still correct (for Quincy), and if so what controls
> the frequency?

He was probably referring to RADOS scrub. CephFS does not have any
continuous scrub and has no plans to introduce one.

> 3) Are backward scrubs always manual, using the 3 cephfs-data-scan phases?

Technically there are 5 phases with some other steps. Please check:
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#recovery-from-missing-metadata-objects

> 4) Are regular backward scrubs recommended, or only if there is
> indication of a problem? (With due regard to the amount of time they may
> take...)

cephfs-data-scan should only be employed for disaster recovery.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS "newly corrupt dentry" after patch version upgrade

2023-05-03 Thread Patrick Donnelly
On Wed, May 3, 2023 at 4:33 AM Janek Bevendorff
 wrote:
>
> Hi Patrick,
>
> > I'll try that tomorrow and let you know, thanks!
>
> I was unable to reproduce the crash today. Even with
> mds_abort_on_newly_corrupt_dentry set to true, all MDS booted up
> correctly (though they took forever to rejoin with logs set to 20).
>
> To me it looks like the issue has resolved itself overnight. I had run a
> recursive scrub on the file system and another snapshot was taken, in
> case any of those might have had an effect on this. It could also be the
> case that the (supposedly) corrupt journal entry has simply been
> committed now and hence doesn't trigger the assertion any more. Is there
> any way I can verify this?

You can run:

https://github.com/ceph/ceph/blob/main/src/tools/cephfs/first-damage.py

Just do:

python3 first-damage.py --memo run.1 

No need to do any of the other steps if you just want a read-only check.
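
(The trailing argument that got stripped above is the metadata pool
name; if you're unsure of it, `ceph fs ls` shows it, e.g.:)

ceph fs ls -f json | jq -r '.[].metadata_pool'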

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS "newly corrupt dentry" after patch version upgrade

2023-05-02 Thread Patrick Donnelly
On Tue, May 2, 2023 at 10:31 AM Janek Bevendorff
 wrote:
>
> Hi,
>
> After a patch version upgrade from 16.2.10 to 16.2.12, our rank 0 MDS
> fails start start. After replaying the journal, it just crashes with
>
> [ERR] : MDS abort because newly corrupt dentry to be committed: [dentry
> #0x1/storage [2,head] auth (dversion lock)
>
> Immediately after the upgrade, I had it running shortly, but then it
> decided to crash for unknown reasons and I cannot get it back up.
>
> We have five ranks in total, the other four seem to be fine. I backed up
> the journal and tried to run cephfs-journal-tool --rank=cephfs.storage:0
> event recover_dentries summary, but it never finishes only eats up a lot
> of RAM. I stopped it after an hour and 50GB RAM.
>
> Resetting the journal makes the MDS crash with a missing inode error on
> another top-level directory, so I re-imported the backed-up journal. Is
> there any way to recover from this without rebuilding the whole file system?

Please be careful resetting the journal. It was not necessary. You can
try to recover the missing inode using cephfs-data-scan [2].

Thanks for the report. Unfortunately this looks like a false positive.
You're not using snapshots, right?

In any case, if you can reproduce it again with:

> ceph config mds debug_mds 20
> ceph config mds debug_ms 1

and upload the logs using ceph-post-file [1], that would be helpful to
understand what happened.

After that you can disable the check as Dan pointed out:

ceph config set mds mds_abort_on_newly_corrupt_dentry false
ceph config set mds mds_go_bad_corrupt_dentry false

NOTE FOR OTHER READERS OF THIS MAIL: it is not recommended to blindly
set these configs as the MDS is trying to catch legitimate metadata
corruption.

[1] https://docs.ceph.com/en/quincy/man/8/ceph-post-file/
[2] https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Leadership Team Meeting, 2023-04-12 Minutes

2023-04-12 Thread Patrick Donnelly
Hi folks,

Today we discussed:

- Just short of 1 exabyte of Ceph storage reported to Telemetry.
Telemetry's data is public and viewable at:
https://telemetry-public.ceph.com/d/ZFYuv1qWz/telemetry?orgId=1
  If your cluster is not reporting to Telemetry, please consider it! :)

- A request from the Ceph Foundation Board to begin tracking component
(e.g. CephFS) roadmaps in docs (or somewhere else appropriate).
Concurrently, leads may also begin sending out status updates on a
~quarterly basis. To be discussed further.

- Cephalocon schedule is available: https://ceph2023.sched.com/

- A regression was reported for the exporter in 17.2.6:
https://github.com/ceph/ceph/pull/50718#issuecomment-1503376925
  A follow-up hotfix/announcement is planned.

- Next week's meeting is canceled due to Cephalocon/travel.

Meeting minutes available here as always:
https://pad.ceph.com/p/clt-weekly-minutes

--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph.v17 multi-mds ephemeral directory pinning: cannot set or retrieve extended attribute

2023-04-10 Thread Patrick Donnelly
On Sun, Apr 9, 2023 at 11:21 PM Ulrich Pralle
 wrote:
>
> Hi,
>
> we are using ceph version 17.2.5 on Ubuntu 22.04.1 LTS.
>
> We deployed multi-mds (max_mds=4, plus standby-replay mds).
> Currently we statically directory-pinned our user home directories (~50k).
> The cephfs' root directory is pinned to '-1', ./homes is pinned to "0".
> All user home directories below ./homes/ are pinned to -1, 1, 2, or 3
> depending on a simple hash algorithm.
> Cephfs is provided to our users as samba/cifs (clustered samba,ctdb).
>
> We want to try ephemeral directory pinning.
>
> We can successfully set the extended attribute
> "ceph.dir.pin.distributed" with setfattr(1), but cannot retrieve its
> setting afterwards.:
>
> # setfattr -n ceph.dir.pin.distributed -v 1 ./units
> # getfattr -n ceph.dir.pin.distributed ./units
> ./units: ceph.dir.pin.distributed: No such attribute
>
> strace setfattr reports success on setxattr
>
> setxattr("./units", "ceph.dir.pin.distributed", "1", 1, 0) = 0
>
> strace getfattr reports
>
> lstat("./units", {st_mode=S_IFDIR|0751, st_size=1, ...}) = 0
> getxattr("./units", "ceph.dir.pin.distributed", NULL, 0) = -1 ENODATA
> (No data available)
>
> The file system is mounted
> rw,noatime,,name=,mds_namespace=.acl,recover_session=clean.
> The cephfs mds caps are "allow rwps".
> "./units" has a ceph.dir.layout="stripe_unit=4194304 stripe_count=1
> object_size=4194304 pool=fs_data_units"
> Ubuntu's setfattr is version 2.4.48.
>
> Defining other cephfs extend attributes (like ceph.dir.pin,
> ceph.quota.max_bytes, etc.) works as expected.
>
> What are we missing?

Your kernel doesn't appear to support reading these virtual extended
attributes yet. Support for that should be in kernel 5.18 and later.

> Should we clear all static directory pinnings in advance?

Start by removing the pin on /home. Then remove a group of pins on
some users directories. Confirm /home looks something like:

ceph tell mds.:0 dump tree /home 0 | jq '.[0].dirfrags[] | .dir_auth'
"0"
"0"
"1"
"1"
"1"
"1"
"0"
"0"

Which tells you the dirfrags for /home are distributed across the
ranks (in this case, 0 and 1).

At that point, it should be fine to remove the rest of the manual pins.
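
(A manual pin is removed by resetting the xattr to -1, e.g., with the
path being wherever the directory sits under your mount:)

setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/homes/someuser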

> Are there any experience with ephemeral directory pinning?
> Or should one refrain from multi-mds at all?

It should work fine. Please give it a try and report back!

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS stuck in "up:replay"

2023-02-23 Thread Patrick Donnelly
Please use:

https://docs.ceph.com/en/quincy/man/8/ceph-post-file/

to share debug logs from the MDS.

On Wed, Feb 22, 2023 at 4:56 PM Thomas Widhalm  wrote:
>
> Ah, sorry. My bad.
>
> The MDS crashed and I restarted them. And I'm waiting for them to crash
> again.
>
> There's a tracker for this or a related issue:
> https://tracker.ceph.com/issues/58489
>
> Is there any place I can upload you anything from the logs? I'm still a
> bit new to Ceph but I guess, you'd like to have the crash logs?
>
> Thank you in advance. Any help is really appreciated. My filesystems are
> still completely down.
>
> Cheers,
> Thomas
>
> On 22.02.23 18:36, Patrick Donnelly wrote:
> > On Wed, Feb 22, 2023 at 12:10 PM Thomas Widhalm  
> > wrote:
> >>
> >> Hi,
> >>
> >> Thanks for the idea!
> >>
> >> I tried it immediately but still, MDS are in up:replay mode. So far they
> >> haven't crashed but this usually takes a few minutes.
> >>
> >> So no effect so far. :-(
> >
> > The commands I gave were for producing hopefully useful debug logs.
> > Not intended to fix the problem for you.
> >



-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS stuck in "up:replay"

2023-02-22 Thread Patrick Donnelly
On Wed, Feb 22, 2023 at 12:10 PM Thomas Widhalm  wrote:
>
> Hi,
>
> Thanks for the idea!
>
> I tried it immediately but still, MDS are in up:replay mode. So far they
> haven't crashed but this usually takes a few minutes.
>
> So no effect so far. :-(

The commands I gave were for producing hopefully useful debug logs.
Not intended to fix the problem for you.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Quincy] Module 'devicehealth' has failed: disk I/O error

2023-02-22 Thread Patrick Donnelly
Hello Satish,

On Thu, Feb 9, 2023 at 11:52 AM Satish Patel  wrote:
>
> Folks,
>
> Any idea what is going on, I am running 3 node quincy  version of openstack
> and today suddenly i noticed the following error. I found reference link
> but not sure if that is my issue or not
> https://tracker.ceph.com/issues/51974
>
> root@ceph1:~# ceph -s
>   cluster:
> id: cd748128-a3ea-11ed-9e46-c309158fad32
> health: HEALTH_ERR
>
> 1 mgr modules have recently crashed
>
>   services:
> mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 2d)
> mgr: ceph1.ckfkeb(active, since 6h), standbys: ceph2.aaptny
> osd: 9 osds: 9 up (since 2d), 9 in (since 2d)
>
>   data:
> pools:   4 pools, 128 pgs
> objects: 1.18k objects, 4.7 GiB
> usage:   17 GiB used, 16 TiB / 16 TiB avail
> pgs: 128 active+clean
>
>
>
> root@ceph1:~# ceph health
> HEALTH_ERR Module 'devicehealth' has failed: disk I/O error; 1 mgr modules
> have recently crashed
> root@ceph1:~# ceph crash ls
> ID                                                                 ENTITY             NEW
> 2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035   mgr.ceph1.ckfkeb   *
> root@ceph1:~# ceph crash info
> 2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035
> {
> "backtrace": [
> "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 373,
> in serve\nself.scrape_all()",
> "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 425,
> in scrape_all\nself.put_device_metrics(device, data)",
> "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 500,
> in put_device_metrics\nself._create_device(devid)",
> "  File \"/usr/share/ceph/mgr/devicehealth/module.py\", line 487,
> in _create_device\ncursor = self.db.execute(SQL, (devid,))",
> "sqlite3.OperationalError: disk I/O error"
> ],
> "ceph_version": "17.2.5",
> "crash_id":
> "2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035",
> "entity_name": "mgr.ceph1.ckfkeb",
> "mgr_module": "devicehealth",
> "mgr_module_caller": "PyModuleRunner::serve",
> "mgr_python_exception": "OperationalError",
> "os_id": "centos",
> "os_name": "CentOS Stream",
> "os_version": "8",
> "os_version_id": "8",
> "process_name": "ceph-mgr",
> "stack_sig":
> "7e506cc2729d5a18403f0373447bb825b42aafa2405fb0e5cfffc2896b093ed8",
> "timestamp": "2023-02-07T00:07:12.739187Z",
> "utsname_hostname": "ceph1",
> "utsname_machine": "x86_64",
> "utsname_release": "5.15.0-58-generic",
> "utsname_sysname": "Linux",
> "utsname_version": "#64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023"

It is probably: https://tracker.ceph.com/issues/55606

It is annoying but not serious. The mgr simply lost its lock on the
sqlite database for the devicehealth module. You can work around it by
restarting the mgr:

ceph mgr fail
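
If the "1 mgr modules have recently crashed" warning lingers after the
mgr restart, archiving the crash report should clear it; for example,
using the crash ID from your `ceph crash ls` output above:

ceph crash archive 2023-02-07T00:07:12.739187Z_fcb9cbc9-bb55-4e7c-bf00-945b96469035

or `ceph crash archive-all` to archive everything at once.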

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS stuck in "up:replay"

2023-02-22 Thread Patrick Donnelly
On Wed, Jan 25, 2023 at 3:36 PM Thomas Widhalm  wrote:
>
> Hi,
>
> Sorry for the delay. As I told Venky directly, there seems to be a
> problem with DMARC handling of the Ceph users list. So it was blocked by
> the company I work for.
>
> So I'm writing from my personal e-mail address, now.
>
> Did I miss something?
>
> Venky, you said, that, as soon as the underlying issue is solved, my
> filesystems should come up again. Is there anything I can do to help
> with solving? Or do I need to wait for the bug to be solved and then
> upgrade my Ceph while CephFS is still broken?
>
> I'm still seeing both MDS counting up seq numbers for days now. That
> really puzzles me because at least one of them hasn't seen changes for
> weeks before the crash.

It is likely that the MDS is not able to communicate with the OSDs if
it's stuck in up:replay. Use:

ceph config set mds debug_ms 5
ceph config set mds debug_mds 10

and

ceph fs fail X
ceph fs set X joinable true

to get fresh logs from the MDS and see what's going on with the messages
to the OSDs.
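
Once you've captured the logs, remember to drop the debug levels back
down; a minimal sketch, assuming you want to return to the defaults:

ceph config rm mds debug_ms
ceph config rm mds debug_mds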

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Problem with IO after renaming File System .data pool

2023-02-22 Thread Patrick Donnelly
On Mon, Jan 16, 2023 at 11:43 AM  wrote:
>
> Good morning everyone.
>
> On Thursday night we had an accident where someone accidentally renamed the 
> .data pool of a File System, making it instantly inaccessible. After renaming 
> it back to the correct name it was possible to mount and list the files, but 
> not to read or write. When trying to write, the FS returned 
> as Read Only; when trying to read it returned Operation not allowed.

This should only happen if the osd caps for the credential are like:

allow rw pool=

In general, we recommend for cephfs clients to have osd caps like:

allow rw tag cephfs data=

The pool name could therefore change without affecting clients.
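
For example, a credential minted with `ceph fs authorize` gets the
pool-name-independent caps automatically; a rough sketch, assuming a
file system named "cephfs" and a (hypothetical) client named client.foo:

ceph fs authorize cephfs client.foo / rw
ceph auth get client.foo   # osd caps should read something like: allow rw tag cephfs data=cephfs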

> After a period of breaking my head I tried to mount with the ADMIN user and 
> everything worked correctly.
>
> I tried to remove the authentication of the current user through `ceph auth 
> rm`, I created a new user through `ceph fs authorize  client. 
> / rw` and it continued the same way, I also tried to recreate it through 
> `ceph auth get-or-create` and nothing different happened, it stayed exactly 
> the same.
> After setting `allow *` in mon, mds and osd I was able to mount, read and 
> write again with the new user.
>
> I can understand why the File System stopped after renaming the pool, what I 
> don't understand is why users are unable to perform operations on FS even 
> with RW or any other user created.
>
> What could have happened behind the scenes to not be able to perform IO even 
> with the correct permissions? Or did I apply incorrect permissions that 
> caused this problem?

I suspect you didn't recreate the cap like you thought. Be sure to
verify with `ceph auth get` that the credential changed as expected.

FWIW, I tested that data pool renames do not break client I/O for a
cap generated with `ceph fs authorize...`. It works fine.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 17.2.5 ceph fs status: AssertionError

2023-02-22 Thread Patrick Donnelly
Hello Robert,

It's probably an instance of this bug: https://tracker.ceph.com/issues/24403

We think we know the cause and a reproducer/fix is planned.

On Wed, Jan 18, 2023 at 4:14 AM Robert Sander
 wrote:
>
> Hi,
>
> I have a healthy (test) cluster running 17.2.5:
>
> root@cephtest20:~# ceph status
>cluster:
>  id: ba37db20-2b13-11eb-b8a9-871ba11409f6
>  health: HEALTH_OK
>
>services:
>  mon: 3 daemons, quorum cephtest31,cephtest41,cephtest21 (age 2d)
>  mgr: cephtest22.lqzdnk(active, since 4d), standbys: 
> cephtest32.ybltym, cephtest42.hnnfaf
>  mds: 1/1 daemons up, 1 standby, 1 hot standby
>  osd: 48 osds: 48 up (since 4d), 48 in (since 4M)
>  rgw: 2 daemons active (2 hosts, 1 zones)
>  tcmu-runner: 6 portals active (3 hosts)
>
>data:
>  volumes: 1/1 healthy
>  pools:   17 pools, 513 pgs
>  objects: 28.25k objects, 4.7 GiB
>  usage:   26 GiB used, 4.7 TiB / 4.7 TiB avail
>  pgs: 513 active+clean
>
>io:
>  client:   4.3 KiB/s rd, 170 B/s wr, 5 op/s rd, 0 op/s wr
>
> CephFS is mounted and can be used without any issue.
>
> But I get an error when I when querying its status:
>
> root@cephtest20:~# ceph fs status
> Error EINVAL: Traceback (most recent call last):
>File "/usr/share/ceph/mgr/mgr_module.py", line 1757, in _handle_command
>  return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
>File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call
>  return self.func(mgr, **kwargs)
>File "/usr/share/ceph/mgr/status/module.py", line 159, in handle_fs_status
>  assert metadata
> AssertionError
>
>
> The dashboard's filesystem page shows no error and displays
> all information about cephfs.
>
> Where does this AssertionError come from?
>
> Regards
> --
> Robert Sander
> Heinlein Support GmbH
> Linux: Akademie - Support - Hosting
> http://www.heinlein-support.de
>
> Tel: 030-405051-43
> Fax: 030-405051-19
>
> Zwangsangaben lt. §35a GmbHG:
> HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
> Geschäftsführer: Peer Heinlein  -- Sitz: Berlin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Retrieve number of read/write operations for a particular file in Cephfs

2023-01-20 Thread Patrick Donnelly
Hello,

On Mon, Jan 16, 2023 at 11:04 AM thanh son le  wrote:
>
> Hi,
>
> I have been studying the document from Ceph and Rados but I could not find
> any metrics to measure the number of read/write operations for each file. I
> understand that Cephfs is the front-end, the file is going to be stored as
> an object in the OSD and I have found that Ceph provides a Cache Tiering
> feature which also requires the monitor for read/write operation for each
> object. Could someone please give me guidance on how this is achieved?
> Thanks.

You can try to get the "perf dump" from ceph-fuse to measure. The osd
write/reads are not fine-grained though.
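
For example, ceph-fuse exposes its counters over the admin socket; a
sketch, assuming the default socket directory (the exact socket name,
including the client name and pid, will differ on your host):

ls /var/run/ceph/   # locate the ceph-client.*.asok socket for your mount
ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok perf dump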

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS crashes to damaged metadata

2023-01-08 Thread Patrick Donnelly
On Thu, Dec 15, 2022 at 9:32 AM Stolte, Felix  wrote:
>
> Hi Patrick,
>
> we used your script to repair the damaged objects on the weekend and it went 
> smoothly. Thanks for your support.
>
> We adjusted your script to scan for damaged files on a daily basis, runtime 
> is about 6h. Until thursday last week, we had exactly the same 17 Files. On 
> thursday at 13:05 a snapshot was created and our active mds crashed once at 
> this time (snapshot was created):
>
> 2022-12-08T13:05:48.919+0100 7f440afec700 -1 
> /build/ceph-16.2.10/src/mds/ScatterLock.h: In function 'void 
> ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f440afec700 time 
> 2022-12-08T13:05:48.921223+0100
> /build/ceph-16.2.10/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state == 
> LOCK_XLOCK || state == LOCK_XLOCKDONE)
>
> 12 Minutes lates the unlink_local error crashes appeared again. This time 
> with a new file. During debugging we noticed a MTU mismatch between MDS 
> (1500) and client (9000) with cephfs kernel mount. The client is also 
> creating the snapshots via mkdir in the .snap directory.
>
> We disabled snapshot creation for now, but really need this feature. I 
> uploaded the mds logs of the first crash along with the information above to 
> https://tracker.ceph.com/issues/38452
>
> I would greatly appreciate it, if you could answer me the following question:
>
> Is the Bug related to our MTU Mismatch? We fixed the MTU Issue going back to 
> 1500 on all nodes in the ceph public network on the weekend also.

I doubt it.

> If you need a debug level 20 log of the ScatterLock for further analysis, i 
> could schedule snapshots at the end of our workdays and increase the debug 
> level 5 Minutes arround snap shot creation.

This would be very helpful!

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mds stuck in standby, not one active

2022-12-15 Thread Patrick Donnelly
ephadm-vm.zwagng crashed on host cephadm-vm at
> 2022-12-13T13:23:39.888401Z
>  mgr.cephadm-vm.zwagng crashed on host cephadm-vm at
> 2022-12-13T13:27:56.458529Z
>  mgr.cephadm-vm.zwagng crashed on host cephadm-vm at
> 2022-12-13T13:31:03.791532Z
>  mgr.cephadm-vm.zwagng crashed on host cephadm-vm at
> 2022-12-13T13:34:24.023106Z
>  osd.98 crashed on host store3 at 2022-12-13T16:11:38.064735Z
>  mgr.store1.uevcpd crashed on host store1 at 2022-12-13T18:39:33.091261Z
>  osd.322 crashed on host store6 at 2022-12-14T06:06:14.193437Z
>  osd.234 crashed on host store8 at 2022-12-15T02:32:13.009795Z
>  osd.311 crashed on host store8 at 2022-12-15T02:32:18.407978Z
>
> As suggested I was going to upgrade the ceph cluster to 16.2.7 to fix
> the mds issue, but it seems none of the running standby daemons is
> responding.

Suggest also looking at the cephadm logs which may explain how it's stuck:

https://docs.ceph.com/en/quincy/cephadm/operations/#watching-cephadm-log-messages

Except that your MDS daemons have not been upgraded, I don't see a
problem from the CephFS side. You can try removing the daemons, it
probably can't make things worse :)

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mds stuck in standby, not one active

2022-12-15 Thread Patrick Donnelly
On Thu, Dec 15, 2022 at 7:24 AM Mevludin Blazevic
 wrote:
>
> Hi,
>
> while upgrading to ceph pacific 16.2.7, the upgrade process got stuck exactly
> at the mds daemons. Before, I have tried to increase/shrink the
> placement size of them, but nothing happens. Currently I have 4/3
> running daemons. One daemon should be stopped and removed.
>
> Do you suggest to force remove these daemons or what could be the
> preferred workaround?

Hard to say without more information. Please share:

ceph fs dump
ceph status
ceph health detail

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mds stuck in standby, not one active

2022-12-13 Thread Patrick Donnelly
On Tue, Dec 13, 2022 at 2:21 PM Mevludin Blazevic
 wrote:
>
> Hi,
>
> thanks for the quick response!
>
> CEPH STATUS:
>
> cluster:
>  id: 8c774934-1535-11ec-973e-525400130e4f
>  health: HEALTH_ERR
>  7 failed cephadm daemon(s)
>  There are daemons running an older version of ceph
>  1 filesystem is degraded
>  1 filesystem has a failed mds daemon
>  1 filesystem is offline
>  1 filesystem is online with fewer MDS than max_mds
>  23 daemons have recently crashed
>
>services:
>  mon: 2 daemons, quorum cephadm-vm,store2 (age 12d)
>  mgr: store1.uevcpd(active, since 34m), standbys: cephadm-vm.zwagng
>  mds: 0/1 daemons up (1 failed), 4 standby
>  osd: 324 osds: 318 up (since 3h), 318 in (since 2h)
>
>data:
>  volumes: 0/1 healthy, 1 failed
>  pools:   6 pools, 257 pgs
>  objects: 2.61M objects, 9.8 TiB
>  usage:   29 TiB used, 2.0 PiB / 2.0 PiB avail
>  pgs: 257 active+clean
>
>io:
>  client:   0 B/s rd, 2.8 MiB/s wr, 435 op/s rd, 496 op/s wr
>
> FS DUMP:
>
> e60
> enable_multiple, ever_enabled_multiple: 1,1
> default compat: compat={},rocompat={},incompat={1=base v0.20,2=client
> writeable ranges,3=default file layouts on dirs,4=dir inode in separate
> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no
> anchor table,9=file layout v2,10=snaprealm v2}
> legacy client fscid: 1
>
> Filesystem 'ceph_fs' (1)
> fs_name ceph_fs
> epoch   58
> flags   32
> created 2022-11-28T12:05:17.203346+
> modified2022-12-13T19:03:46.707236+
> tableserver 0
> root0
> session_timeout 60
> session_autoclose   300
> max_file_size   1099511627776
> required_client_features{}
> last_failure0
> last_failure_osd_epoch  196035
> compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> ranges,3=default file layouts on dirs,4=dir inode in separate
> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no
> anchor table,9=file layout v2,10=snaprealm v2}
> max_mds 2
> in  0
> up  {}
> failed  0
> damaged
> stopped
> data_pools  [4]
> metadata_pool   5
> inline_data disabled
> balancer
> standby_count_wanted1
>
>
> Standby daemons:
>
> [mds.ceph_fs.store5.gnlqqm{-1:152180029} state up:standby seq 1
> join_fscid=1 addr
> [v2:192.168.50.135:6800/3548272808,v1:192.168.50.135:6801/3548272808]
> compat {c=[1],r=[1],i=[1]}]
> [mds.ceph_fs.store6.fxgvoj{:915af89} state up:standby seq 1
> join_fscid=1 addr
> [v2:192.168.50.136:1b70/4fde2aa0,v1:192.168.50.136:1b71/4fde2aa0] compat
> {c=[1],r=[1],i=[1]}]
> [mds.ceph_fs.store4.mhvpot{:916a09d} state up:standby seq 1
> join_fscid=1 addr
> [v2:192.168.50.134:1a90/b8b1f33c,v1:192.168.50.134:1a91/b8b1f33c] compat
> {c=[1],r=[1],i=[1]}]
> [mds.ceph_fs.store3.vcnwzh{ffff:916aff7} state up:standby seq 1
> join_fscid=1 addr
> [v2:192.168.50.133:1a90/49cb4e4,v1:192.168.50.133:1a91/49cb4e4] compat
> {c=[1],r=[1],i=[1]}]
> dumped fsmap epoch 60

You're encountering a bug fixed in v16.2.7. Please upgrade to the
latest version.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mds stuck in standby, not one active

2022-12-13 Thread Patrick Donnelly
On Tue, Dec 13, 2022 at 2:02 PM Mevludin Blazevic
 wrote:
>
> Hi all,
>
> in Ceph Pacific 16.2.5, the MDS failover function is not working. The
> one host with the active MDS had to be rebooted and after that, the
> standby daemons did not jump in. The fs was not accessible; instead all
> mds have remained in standby until now. Also the cluster remains in Ceph Error
> due to the inactive mds, so I did the following:
>
> ceph fs set cephfs false
> ceph fs set cephfs max_mds 2
>
> We also tried to restart the mds by the given yaml file, nothing works.
>
> The Ceph FS pool is green and clean.

Please share:

ceph status
ceph fs dump

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS crashes to damaged metadata

2022-11-30 Thread Patrick Donnelly
You can run this tool. Be sure to read the comments.

https://github.com/ceph/ceph/blob/main/src/tools/cephfs/first-damage.py

As of now what causes the damage is not yet known but we are trying to
reproduce it. If your workload reliably produces the damage, a
debug_mds=20 MDS log would be extremely helpful.
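
A minimal way to capture such a log (a sketch; adjust to your
deployment, and remember to reset the levels afterwards):

ceph config set mds debug_mds 20
ceph config set mds debug_ms 1
# ...reproduce the crash, save the active MDS log, then:
ceph config rm mds debug_mds
ceph config rm mds debug_ms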

On Wed, Nov 30, 2022 at 6:15 PM Stolte, Felix  wrote:
>
> Hi Patrick,
>
> it does seem like it. We are not using postgres on cephfs as far as i know. 
> We narrowed it down to three damaged inodes, but files in question had been 
> xlsx, pdf or pst.
>
> Do you have any suggestion how to fix this?
>
> Is there a way to scan the cephfs for damaged inodes?
>
>
> -
> -
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> Sitz der Gesellschaft: Juelich
> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
> Vorsitzender des Aufsichtsrats: MinDir Volker Rieke
> Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
> Dr. Astrid Lambrecht, Prof. Dr. Frauke Melchior
> -
> -----
>
> Am 30.11.2022 um 22:49 schrieb Patrick Donnelly :
>
> On Wed, Nov 30, 2022 at 3:10 PM Stolte, Felix  wrote:
>
>
> Hey guys,
>
> our mds daemons are crashing constantly when someone is trying to delete a 
> file:
>
> -26> 2022-11-29T12:32:58.807+0100 7f081b458700 -1 
> /build/ceph-16.2.10/src/mds/Server.cc<http://server.cc/>: In function 'void 
> Server::_unlink_local(MDRequestRef&, CDentry*, CDentry*)' thread 7f081b458700 
> time 2022-11-29T12:32:58.808844+0100
>
> 2022-11-29T12:32:58.807+0100 7f081b458700  4 mds.0.server 
> handle_client_request client_request(client.1189402075:14014394 unlink 
> #0x100197fa8e0/~$29.11. T.xlsx 2022-11-29T12:32:23.711889+0100 RETRY=1 
> caller_uid=133365,
>
> I observed that the corresponding object in the cephfs data pool does not 
> exist. Basically our MDS Daemons are crashing each time, when somone tries to 
> delete a file which does not exist in the data pool but metadata says 
> otherwise.
>
> Any suggestions how to fix this problem?
>
>
> Is this it?
>
> https://tracker.ceph.com/issues/38452
>
> Are you running postgres on CephFS by chance?
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Principal Software Engineer
> Red Hat, Inc.
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>
>


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS crashes to damaged metadata

2022-11-30 Thread Patrick Donnelly
On Wed, Nov 30, 2022 at 3:10 PM Stolte, Felix  wrote:
>
> Hey guys,
>
> our mds daemons are crashing constantly when someone is trying to delete a 
> file:
>
> -26> 2022-11-29T12:32:58.807+0100 7f081b458700 -1 
> /build/ceph-16.2.10/src/mds/Server.cc<http://server.cc/>: In function 'void 
> Server::_unlink_local(MDRequestRef&, CDentry*, CDentry*)' thread 7f081b458700 
> time 2022-11-29T12:32:58.808844+0100
>
> 2022-11-29T12:32:58.807+0100 7f081b458700  4 mds.0.server 
> handle_client_request client_request(client.1189402075:14014394 unlink 
> #0x100197fa8e0/~$29.11. T.xlsx 2022-11-29T12:32:23.711889+0100 RETRY=1 
> caller_uid=133365,
>
> I observed that the corresponding object in the cephfs data pool does not 
> exist. Basically our MDS Daemons are crashing each time, when somone tries to 
> delete a file which does not exist in the data pool but metadata says 
> otherwise.
>
> Any suggestions how to fix this problem?

Is this it?

https://tracker.ceph.com/issues/38452

Are you running postgres on CephFS by chance?

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS internal op exportdir despite ephemeral pinning

2022-11-29 Thread Patrick Donnelly
Hi Frank,

Sorry for the delay and thanks for sharing the data privately.

On Wed, Nov 23, 2022 at 4:00 AM Frank Schilder  wrote:
>
> Hi Patrick and everybody,
>
> I wrote a small script that pins the immediate children of 3 sub-dirs on our 
> file system in a round-robin way to our 8 active ranks. I think the 
> experience is worth reporting here. In any case, Patrick, if you can help me 
> get distributed ephemeral pinning to work, this would be great as the 
> automatic pin updates when changing the size of the MDS cluster would 
> simplify life as an admin a lot.
>
> Before starting the script, the load-balancer had created and distributed 
> about 30K sub-trees over the MDSes. Running the script and setting the pins 
> (with a sleep 1 in between) immediately triggered a re-distribution and 
> consolidation of sub-trees. They were consolidated on the MDSes they were 
> pinned to. During this process no health issues. The process took a few 
> minutes to complete.
>
> After that, we ended up with very few sub-trees. Today, the distribution 
> looks like this (ceph tell mds.$mds get subtrees | grep '"path":' | wc -l):
>
> ceph-14: 27
> ceph-16: 107
> ceph-23: 39
> ceph-13: 32
> ceph-17: 27
> ceph-11: 55
> ceph-12: 49
> ceph-10: 24
>
> Rank 1 (ceph-16) has a few more pinned to by hand, but these are not very 
> active.
>
> After the sub-tree consolidation completed, there was suddenly very low load 
> on the MDS cluster and the meta-data pools. Also, the MDS daemons went down 
> in CPU load to 5-10% compared with the usual 80-140%.

Great!

> At first I thought things went bad, but logging in to a client showed there 
> were no problems. I did a standard benchmark and noticed a 3 to 4 times 
> increased single thread IOP/s performance! What I also see is that the MDS 
> cache allocation is very stable now, they need much less RAM compared with 
> before and they don't trash much. No file-system related slow OPS/requests 
> warning in the logs any more! I used to have exportdir/rejoin/behind on 
> trimming a lot, its all gone.
>
> Conclusion: The build-in dynamic load balancer seems to have been responsible 
> for 90-95% of the FS load - completely artificial internal load that was 
> greatly limiting client performance. I think making the internal load 
> balancer much less aggressive would help a lot. Default could be round-robin 
> pin of low-depth sub-dirs and then changing a pin every few hours based on a 
> number of activity metrics over, say 7 days, 1 day and 4 hours to aim for a 
> long-term stable pin distribution.
>
> For example, on our cluster if the most busy 2-3 high-level sub-tree pins are 
> considered for moving every 24h it would be completely sufficient. Also, 
> considering sub-trees very deep in the hierarchy seems pointless. A balancer 
> sub-tree max-depth setting to limit the depth the load balancer looks at 
> would probably improve things. I had a high-level sub-dir distributed over 
> 10K sub-trees, which really didn't help performance at all.
>
> If anyone has the dynamic balancer in action, intentionally or not, it might 
> be worth trying to pin everything up to a depth of 2-3 in the FS tree.

Hmm, maybe you forgot to turn on the configs?

https://docs.ceph.com/en/octopus/cephfs/multimds/#setting-subtree-partitioning-policies

"Both random and distributed ephemeral pin policies are off by default
in Octopus. The features may be enabled via the
mds_export_ephemeral_random and mds_export_ephemeral_distributed
configuration options."
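
For reference, a minimal sketch of turning the distributed policy on,
assuming the root is mounted at /mnt/cephfs and the target directory is
/mnt/cephfs/home (adjust to your mount point):

ceph config set mds mds_export_ephemeral_distributed true
setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/home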

Otherwise, maybe you found a bug. I would suggest keeping your
round-robin script until you can upgrade to Pacific or Quincy.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS internal op exportdir despite ephemeral pinning

2022-11-18 Thread Patrick Donnelly
On Fri, Nov 18, 2022 at 2:32 PM Frank Schilder  wrote:
>
> Hi Patrick,
>
> we plan to upgrade next year. Can't do any faster. However, distributed 
> ephemeral pinning was introduced with octopus. It was one of the major new 
> features and is explained in the octopus documentation in detail.
>
> Are you saying that it is actually not implemented?
> If so, how much of the documentation can I trust?

Generally you can trust the documentation. There are configurations
gating these features, as you're aware. While the documentation didn't
say as much, that indicates they are "previews".

> If it is implemented, I would like to get it working - if this is possible at 
> all. Would you still take a look at the data?

I'm willing to look.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS internal op exportdir despite ephemeral pinning

2022-11-18 Thread Patrick Donnelly
On Fri, Nov 18, 2022 at 2:11 PM Frank Schilder  wrote:
>
> Hi Patrick,
>
> thanks for your super fast answer.
>
> > I assume you mean "distributed ephemeral pinning"?
>
> Yes. Just to remove any potential for a misunderstanding from my side, I 
> enabled it with (copy-paste from the command history, /mnt/admin/cephfs/ is 
> the mount point of "/" with all possible client permissions granted):
>
> setfattr -n ceph.dir.pin.distributed -v 1 /mnt/admin/cephfs/hpc/home
> setfattr -n ceph.dir.pin.distributed -v 1 /mnt/admin/cephfs/hpc/groups
> setfattr -n ceph.dir.pin.distributed -v 1 /mnt/admin/cephfs/shares
>
> # ceph versions
> {
> "mon": {
> "ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) 
> octopus (stable)": 5
> },
> "mgr": {
> "ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) 
> octopus (stable)": 5
> },
> "osd": {
> "ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) 
> octopus (stable)": 1048
> },
> "mds": {
> "ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) 
> octopus (stable)": 12
> },
> "overall": {
> "ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) 
> octopus (stable)": 1070
> }
> }

Octopus is really too old for this. I think it's missing some very
important performance related patches. I suggest you upgrade if you
really want to use this.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS internal op exportdir despite ephemeral pinning

2022-11-18 Thread Patrick Donnelly
On Fri, Nov 18, 2022 at 12:51 PM Frank Schilder  wrote:
>
> Hi Patrick,
>
> thanks! I did the following but don't know how to interpret the result. The 
> three directories we have ephemeral pinning set are:
>
> /shares
> /hpc/home
> /hpc/groups

I assume you mean "distributed ephemeral pinning"?

> If I understand the documentation correctly, everything under /hpc/home/user 
> should be on the same MDS. Trying it out I get (user-name obscured):
>
> # for mds in $(bin/active_mds); do
>   echo -n "${mds}: "
>   ceph tell mds.$mds get subtrees | grep '"/hpc/home/user' | wc -l
> done 2>/dev/null
> ceph-13: 14
> ceph-16: 2
> ceph-14: 2
> ceph-08: 14
> ceph-17: 0
> ceph-11: 6
> ceph-12: 14
> ceph-10: 14
>
> Its all over the place. Could you please help me with how I should interpret 
> this?

Please share the version you're using. "/hpc/home/user" should not
show up in the subtree output. If possible, can you privately share
with me the output of:

- `ceph versions`
- `ceph fs dump`
- `get subtrees` on each active MDS
- `dump tree /hpc/home 0` on each active MDS

Feel free to anonymize path strings as desired.


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS internal op exportdir despite ephemeral pinning

2022-11-18 Thread Patrick Donnelly
On Thu, Nov 17, 2022 at 4:45 AM Frank Schilder  wrote:
>
> Hi Patrick,
>
> thanks for your explanation. Is there a way to check which directory is 
> exported? For example, is the inode contained in the messages somewhere? A 
> readdir would usually happen on log-in and the number of slow exports seems 
> much higher than the number of people logging in (I would assume there are a 
> lot more that go without logging).

You can set debugging to 4 on the MDS and you should see messages for
each export. Or you can monitor subtrees on your MDS by periodically
running the `get subtrees` command on each one.
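
For example (a sketch; mds.<name> is a placeholder for each of your
active MDS daemons):

ceph config set mds debug_mds 4
ceph tell mds.<name> get subtrees | grep '"path":'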

> Also, does an export happen for every client connection? For example, we have 
> a 500+ node HPC cluster with kernel mounts. If a job starts on a dir that 
> needs to be loaded to cache, would such an export happen for every client 
> node (we do dropcaches on client nodes after job completion, so there is 
> potential for reloading data)?

The export only happens once the directory is loaded into cache.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS internal op exportdir despite ephemeral pinning

2022-11-16 Thread Patrick Donnelly
Hello Frank,

On Wed, Nov 16, 2022 at 5:38 AM Frank Schilder  wrote:
>
> Hi all,
>
> I have a question about ephemeral pinning on octopus latest. We have 
> ephemeral pinning set on all directories that are mounted (well on all their 
> parents), like /home etc. Every mount point of a ceph file system should, 
> therefore, be pinned to a specific and fixed MDS rank. However, in the log I 
> see a lot of slow ops warnings like this one:
>
> slow request 33.765074 seconds old, received at 
> 2022-11-16T11:30:28.340294+0100: internal op exportdir:mds.0:34770855 
> currently failed to wrlock, waiting
>
> I don't understand why MDSes still export directories between each other. Am 
> I misunderstanding the warning? What is happening here and why are these ops 
> there? Does this point to a config problem?

It may be that the /home/X directory in question was pruned from the cache,
someone did a readdir on that directory (thereby loading it back into
cache), and then the MDS authoritative for /home (probably rank 0?)
exported that directory to wherever it should go.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Temporary shutdown of subcluster and cephfs

2022-10-25 Thread Patrick Donnelly
On Tue, Oct 25, 2022 at 3:48 AM Frank Schilder  wrote:
>
> Hi Patrick,
>
> thanks for your answer. This is exactly the behaviour we need.
>
> For future reference some more background:
>
> We need to prepare a quite large installation for planned power outages. Even 
> though they are called planned, we will not be able to handle these manually 
> in good time for reasons irrelevant here. Our installation is protected by an 
> UPS, but the guaranteed uptime on outage is only 6 minutes. So, we talk more 
> about transient protection than uninterrupted power supply. Although we 
> survived more than 20 minute power outages without loss of power to the DC, 
> we need to plan with these 6 minutes.
>
> In these 6 minutes, we need to wait for at least 1-2 minutes to avoid 
> unintended shut-downs. In the remaining 4 minutes, we need to take down a 500 
> node HPC cluster and an 1000OSD+12MDS+2MON ceph sub-cluster. Part of this 
> ceph cluster will continue running on another site with higher power 
> redundancy. This gives maybe 1-2 minutes response time for the ceph cluster 
> and the best we can do is to try to achieve a "consistent at rest" state and 
> hope we can cleanly power down the system before the power is cut.
>
> Why am I so concerned about a "consistent at rest" state?
>
> Its because while not all instances of a power loss lead to data loss, all 
> instances of data loss I know of and were not caused by admin errors were 
> caused by a power loss (see https://tracker.ceph.com/issues/46847). We were 
> asked to prepare for a worst case of weekly power cuts, so no room for taking 
> too many chances here. Our approach is: unmount as much as possible, fail the 
> quickly FS to stop all remaining IO, give OSDs and MDSes a chance to flush 
> pending operations to disk or journal and then try a clean shut down.

To be clear in case there is any confusion: once you do `fs fail`, the
MDS are removed from the cluster and they will respawn. They are not
given any time to flush remaining I/O.

FYI as this may interest you: we have a ticket to set a flag on the
file system to prevent new client mounts:
https://tracker.ceph.com/issues/57090

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Temporary shutdown of subcluster and cephfs

2022-10-24 Thread Patrick Donnelly
On Wed, Oct 19, 2022 at 7:54 AM Frank Schilder  wrote:
>
> Hi Dan,
>
> I know that "fs fail ..." is not ideal, but we will not have time for a clean 
> "fs down true" and wait for journal flush procedure to complete (on our 
> cluster this takes at least 20 minutes, which is way too long). My question 
> is more along the lines 'Is an "fs fail" destructive?'

It is not, but lingering clients will not be evicted automatically by
the MDS. If you can, unmount before doing `fs fail`.

A journal flush is not really necessary. You only should wait ~10
seconds after the last client unmounts to give the MDS time to write
out to its journal any outstanding events.
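
To double-check that no clients remain before the `fs fail`, you can
list sessions on each active rank; a sketch, with <fs_name> as a
placeholder (on older releases, `ceph daemon mds.<name> session ls` on
the MDS host does the same):

ceph tell mds.<fs_name>:0 session ls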

> , that is, will an FS come up again after
>
> - fs fail
> ...
> - fs set  joinable true

Yes.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS constant high write I/O to the metadata pool

2022-10-14 Thread Patrick Donnelly
Hello Olli,

On Thu, Oct 13, 2022 at 5:01 AM Olli Rajala  wrote:
>
> Hi,
>
> I'm seeing constant 25-50MB/s writes to the metadata pool even when all
> clients and the cluster is idling and in clean state. This surely can't be
> normal?
>
> There's no apparent issues with the performance of the cluster but this
> write rate seems excessive and I don't know where to look for the culprit.
>
> The setup is Ceph 16.2.9 running in hyperconverged 3 node core cluster and
> 6 hdd osd nodes.
>
> Here's typical status when pretty much all clients are idling. Most of that
> write bandwidth and maybe fifth of the write iops is hitting the
> metadata pool.
>
> ---
> root@pve-core-1:~# ceph -s
>   cluster:
> id: 2088b4b1-8de1-44d4-956e-aa3d3afff77f
> health: HEALTH_OK
>
>   services:
> mon: 3 daemons, quorum pve-core-1,pve-core-2,pve-core-3 (age 2w)
> mgr: pve-core-1(active, since 4w), standbys: pve-core-2, pve-core-3
> mds: 1/1 daemons up, 2 standby
> osd: 48 osds: 48 up (since 5h), 48 in (since 4M)
>
>   data:
> volumes: 1/1 healthy
> pools:   10 pools, 625 pgs
> objects: 70.06M objects, 46 TiB
> usage:   95 TiB used, 182 TiB / 278 TiB avail
> pgs: 625 active+clean
>
>   io:
> client:   45 KiB/s rd, 38 MiB/s wr, 6 op/s rd, 287 op/s wr
> ---
>
> Here's some daemonperf dump:
>
> ---
> root@pve-core-1:~# ceph daemonperf mds.`hostname -s`
> mds-
> --mds_cache--- --mds_log-- -mds_mem- ---mds_server--- mds_
> -objecter-- purg
> req  rlat fwd  inos caps exi  imi  hifc crev cgra ctru cfsa cfa  hcc  hccd
> hccr prcr|stry recy recd|subm evts segs repl|ino  dn  |hcr  hcs  hsr  cre
>  cat |sess|actv rd   wr   rdwr|purg|
>  4000  767k  78k   0001610055
>  37 |1.1k   00 | 17  3.7k 1340 |767k 767k| 40500
>  0 |110 |  42   210 |  2
>  5720  767k  78k   0003   16300   11   11
>  0   17 |1.1k   00 | 45  3.7k 1370 |767k 767k| 57800
>  0 |110 |  02   280 |  4
>  5740  767k  78k   0004   34400   34   33
>  2   26 |1.0k   00 |134  3.9k 1390 |767k 767k| 57   1300
>  0 |110 |  02  1120 | 19
>  6730  767k  78k   0006   32600   22   22
>  0   32 |1.1k   00 | 78  3.9k 1410 |767k 768k| 67400
>  0 |110 |  02   560 |  2
> ---
> Any ideas where to look at?

Check the perf dump output of the mds:

ceph tell mds.<fs_name>:0 perf dump

over a period of time to identify what's going on. You can also look
at the objecter_ops (another tell command) for the MDS.
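
For example, to see which counters are moving while the clients are
idle, you can sample the dump twice and compare; a sketch, assuming
bash, jq installed, and rank 0 of a file system named <fs_name>:

ceph tell mds.<fs_name>:0 perf dump > perf.1.json
sleep 60
ceph tell mds.<fs_name>:0 perf dump > perf.2.json
diff <(jq .objecter perf.1.json) <(jq .objecter perf.2.json)
diff <(jq .mds_log  perf.1.json) <(jq .mds_log  perf.2.json)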

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Leadership Team Meeting Minutes - 2022 Oct 12

2022-10-12 Thread Patrick Donnelly
Hello all,

Here's today's minutes:

- Update on OVH use by Ceph Foundation
- https://tracker.ceph.com/issues/57778 -- release doc and feature/bug
changes out of sync?
  + Create a ceph branch for docs, e.g. pacific-docs (updates with
each release; can be updated out-of-sync with release)
  + Maybe modify doc changes to release branches to include version
the change will be released in; difficulty is keeping the version
numbers accurate due to hotfixes.
  + Need to update readthedocs to use the new branch and update release process.
  + How to backport release notes? Statically edit release branch to
link to /latest? change readthedocs to no longer checkout main release
table. TODO: patrick
- Ceph Virtual 2022 https://github.com/ceph/ceph.io/pull/450
  + https://virtual-event-2022.ceph.io/en/community/events/2022/ceph-virtual/
- Question about k8s host recommendations re: cephfs kernel client
  + Experienced an outage because some k8s cluster (with cephfs pvcs)
was using kernel 5.16.13, which has a known null deref bug, fixed in
5.18. (kernel was coming with Fedora Core OS)
  + What is the recommended k8s host os in the rhel world for ceph kclients?


--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS Performance and PG/PGP value

2022-10-10 Thread Patrick Donnelly
Hello Yoann,

On Fri, Oct 7, 2022 at 10:51 AM Yoann Moulin  wrote:
>
> Hello,
>
> >> Is 256 good value in our case ? We have 80TB of data with more than 300M 
> >> files.
> >
> > You want at least as many PGs that each of the OSDs host a portion of the 
> > OMAP data. You want to spread out OMAP to as many _fast_ OSDs as possible.
> >
> > I have tried to find an answer to your question: are more metadata PGs 
> > better? I haven't found a definitive answer. This would ideally be tested 
> > in a non-prod / pre-prod environment and tuned
> > to individual requirements (type of workload). For now, I would not blindly 
> > trust the PG autoscaler. I have seen it advise settings that would 
> > definitely not be OK. You can skew things in the
> > autoscaler with the "bias" parameter, to compensate for this. But as far as 
> > I know the current heuristics to determine a good value do not take into 
> > account the importance of OMAP (RocksDB)
> > spread across OSDs. See a blog post about autoscaler tuning [1].
> >
> > It would be great if tuning metadata PGs for CephFS / RGW could be 
> > performed during the "large scale tests" the devs are planning to perform 
> > in the future. With use cases that take into
> > consideration "a lot of small files / objects" versus "loads of large files 
> > / objects" to get a feeling how tuning this impacts performance for 
> > different work loads.
> >
> > Gr. Stefan
> >
> > [1]: https://ceph.io/en/news/blog/2022/autoscaler_tuning/
>
> Thanks for the information, I agree that autoscaler seem to not be able to 
> handle my use case.
> (thanks to icepic...@gmail.com too)
>
> By the way, since I have set PG=256, I have far fewer SLOW requests than 
> before; even though I still have some, the impact on my users has been reduced a lot.
>
> > # zgrep -c -E 'WRN.*(SLOW_OPS|SLOW_REQUEST|MDS_SLOW_METADATA_IO)' 
> > floki.log.4.gz floki.log.3.gz floki.log.2.gz floki.log.1.gz floki.log
> > floki.log.4.gz:6883
> > floki.log.3.gz:11794
> > floki.log.2.gz:3391
> > floki.log.1.gz:1180
> > floki.log:122
>
> If I have the opportunity, I will try to run some benchmark with multiple 
> value of the PG on cephfs_metadata pool.

256 sounds like a good number to me. Maybe even 128. If you do some
experiments, please do share the results.

Also, you mentioned you're using 7 active MDS. How's that working out
for you? Do you use pinning?


--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS MDS sizing

2022-09-12 Thread Patrick Donnelly
On Tue, Sep 6, 2022 at 11:29 AM Vladimir Brik
 wrote:
>
>  > What problem are you actually
>  > trying to solve with that information?
> I suspect that the mds_cache_memory_limit we set (~60GB) is
> sub-optimal and I am wondering if we would be better off if,
> say, we halved the cache limits and doubled the number of
> MDSes. I am looking for metrics to quantify this, and
> cache_hit_rate and others in "dump loads" seem relevant.

There are other indirect ways to measure cache effectiveness. Using
the mds `perf dump` command, you can look at the objecter.omap_rd
counter to see how often the MDS goes out to directory objects to read
dentries. You can also look at the mds_mem.ino+ and mds_mem.ino-
counters to see how often inodes go in and out of the cache.
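
For example, sampling those counters over time; a sketch, assuming jq
is installed and rank 0 of a file system named <fs_name>:

ceph tell mds.<fs_name>:0 perf dump | \
  jq '{omap_rd: .objecter.omap_rd, "ino+": .mds_mem."ino+", "ino-": .mds_mem."ino-"}'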


--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Potential bug in cephfs-data-scan?

2022-08-19 Thread Patrick Donnelly
On Fri, Aug 19, 2022 at 5:02 AM Jesper Lykkegaard Karlsen
 wrote:
>
> Hi,
>
> I have recently been scanning the files in a PG with "cephfs-data-scan 
> pg_files ...".

Why?

> Although, after a long time the scan was still running and the list of files 
> consumed 44 GB, I stopped it, as something obviously was very wrong.
>
> It turns out some users had symlinks that looped and even a user had a 
> symlink to "/".

Symlinks are not stored in the data pool. This should be irrelevant.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Quincy: Corrupted devicehealth sqlite3 database from MGR crashing bug

2022-08-16 Thread Patrick Donnelly
Thank you, that's helpful. I have created a ticket with my findings so far:

https://tracker.ceph.com/issues/57152

Please follow there for updates.

On Mon, Aug 15, 2022 at 4:12 PM Daniel Williams  wrote:
>
> ceph-post-file: a9802e30-0096-410e-b5c0-f2e6d83acfd6
>
> On Tue, Aug 16, 2022 at 3:13 AM Patrick Donnelly  wrote:
>>
>> On Mon, Aug 15, 2022 at 11:39 AM Daniel Williams  wrote:
>> >
>> > Using ubuntu with apt repository from ceph.
>> >
>> > Ok that helped me figure out that it's .mgr not mgr.
>> > # ceph -v
>> > ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy 
>> > (stable)
>> > # export CEPH_CONF='/etc/ceph/ceph.conf'
>> > # export CEPH_KEYRING='/etc/ceph/ceph.client.admin.keyring'
>> > # export CEPH_ARGS='--log_to_file true --log-file ceph-sqlite.log 
>> > --debug_cephsqlite 20 --debug_ms 1'
>> > # sqlite3
>> > SQLite version 3.31.1 2020-01-27 19:55:54
>> > Enter ".help" for usage hints.
>> > sqlite> .load libcephsqlite.so
>> > sqlite> .open file:///.mgr:devicehealth/main.db?vfs=ceph
>> > sqlite> .tables
>> > Segmentation fault (core dumped)
>> >
>> > # dpkg -l | grep ceph | grep sqlite
>> > ii  libsqlite3-mod-ceph  17.2.3-1focal 
>> >  amd64SQLite3 VFS for Ceph
>> >
>> > Attached ceph-sqlite.log
>>
>> No real good hint in the log unfortunately. I will need the core dump
>> to see where things went wrong. Can you upload it with
>>
>> https://docs.ceph.com/en/quincy/man/8/ceph-post-file/
>>
>> ?
>>
>> --
>> Patrick Donnelly, Ph.D.
>> He / Him / His
>> Principal Software Engineer
>> Red Hat, Inc.
>> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>>


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: The next quincy point release

2022-08-15 Thread Patrick Donnelly
This must go in the next quincy release:

https://github.com/ceph/ceph/pull/47288

but we're still waiting on reviews and final tests before merging into main.

On Mon, Aug 15, 2022 at 11:02 AM Yuri Weinstein  wrote:
>
> We plan to start QE validation for the next quincy point release this week.
>
> Dev leads please tag all PRs needed to be included ("needs-qa") ASAP
> so they can be tested and merged on time.
>
> Thx
> YuriW
>
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Quincy: Corrupted devicehealth sqlite3 database from MGR crashing bug

2022-08-15 Thread Patrick Donnelly
On Mon, Aug 15, 2022 at 11:39 AM Daniel Williams  wrote:
>
> Using ubuntu with apt repository from ceph.
>
> Ok that helped me figure out that it's .mgr not mgr.
> # ceph -v
> ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy (stable)
> # export CEPH_CONF='/etc/ceph/ceph.conf'
> # export CEPH_KEYRING='/etc/ceph/ceph.client.admin.keyring'
> # export CEPH_ARGS='--log_to_file true --log-file ceph-sqlite.log 
> --debug_cephsqlite 20 --debug_ms 1'
> # sqlite3
> SQLite version 3.31.1 2020-01-27 19:55:54
> Enter ".help" for usage hints.
> sqlite> .load libcephsqlite.so
> sqlite> .open file:///.mgr:devicehealth/main.db?vfs=ceph
> sqlite> .tables
> Segmentation fault (core dumped)
>
> # dpkg -l | grep ceph | grep sqlite
> ii  libsqlite3-mod-ceph  17.2.3-1focal
>   amd64SQLite3 VFS for Ceph
>
> Attached ceph-sqlite.log

No real good hint in the log unfortunately. I will need the core dump
to see where things went wrong. Can you upload it with

https://docs.ceph.com/en/quincy/man/8/ceph-post-file/

?

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Quincy: Corrupted devicehealth sqlite3 database from MGR crashing bug

2022-08-15 Thread Patrick Donnelly
Hello Daniel,

On Mon, Aug 15, 2022 at 10:38 AM Daniel Williams  wrote:
>
> My managers are crashing when reading the sqlite database for devicehealth:
> .mgr:devicehealth/main.db-journal
> debug -2> 2022-08-15T11:14:09.184+ 7fa5721b7700  5 cephsqlite:
> Read: (client.53284882) [.mgr:devicehealth/main.db-journal] 0x5601da0c0008
> 4129788~65536
> debug -1> 2022-08-15T11:14:09.184+ 7fa5721b7700  5 client.53284882:
> SimpleRADOSStriper: read: main.db-journal: 4129788~65536
> debug  0> 2022-08-15T11:14:09.200+ 7fa664aca700 -1 *** Caught
> signal (Segmentation fault) **
>
> I upgraded to 17.2.3 but it seems like I'll need to do a sqlite recovery on
> the database, since the devicehealth module is now non-optional.
>
> I tried:
> sqlite3 -cmd '.load libcephsqlite.so' '.open
> file:///mgr:devicehealth/main.db?vfs=ceph'
> but that didn't work
> Error: unable to open database ".open
> file:///mgr:devicehealth/main.db?vfs=ceph": unable to open database file
>
> Any suggestions?

Are you on Ubuntu or CentOS?

You can try to figure out where things are going wrong loading the database via:

env CEPH_ARGS='--log_to_file true --log-file foo.log
--debug_cephsqlite 20 --debug_ms 1'  sqlite3 ...

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS: permissions of the .snap directory do not inherit ACLs

2022-08-09 Thread Patrick Donnelly
Hello Robert,

On Wed, Aug 3, 2022 at 9:32 AM Robert Sander
 wrote:
>
> Hi,
>
> when using CephFS with POSIX ACLs I noticed that the .snap directory
> does not inherit the ACLs from its parent but only the standard UNIX
> permissions.
>
> This results in a permission denied error when users want to access
> snapshots in that directory because they are not the owner or in the
> group. They do have access to the directory via a group that is listed
> in the POSIX ACLs.
>
> Is this a known bug in 16.2.10?

It sounds like a bug. Please create a tracker ticket with details
about your environment and an example.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS standby-replay has more dns/inos/dirs than the active mds

2022-07-19 Thread Patrick Donnelly
You're probably seeing this bug: https://tracker.ceph.com/issues/48673

Sorry I've not had time to finish a fix for it yet. Hopefully soon...

On Tue, Jul 19, 2022 at 5:43 PM Bryan Stillwell  wrote:
>
> We have a cluster using multiple filesystems on Pacific (16.2.7) and even 
> though we have mds_cache_memory_limit set to 80 GiB one of the MDS daemons is 
> using 123.1 GiB.  This MDS is actually the standby-replay MDS and I'm 
> wondering if it's because it's using more dns/inos/dirs than the active MDS?:
>
> $ sudo ceph fs status cephfs19
> cephfs19 - 28 clients
> 
> RANK  STATE   MDS  ACTIVITY DNSINOS   DIRS   CAPS
>  0active  ceph006b  Reqs: 2879 /s  27.8M  27.8M  3490k  7767k
> 0-s   standby-replay  ceph008a  Evts: 1446 /s  40.1M  40.0M  6259k 0
>
> Shouldn't the standby-replay MDS daemons have similar stats to the active MDS 
> they're protecting?  What could be causing this to happen?
>
> Thanks,
> Bryan
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Single vs multiple cephfs file systems pros and cons

2022-07-19 Thread Patrick Donnelly
On Fri, Jul 15, 2022 at 1:46 PM Vladimir Brik
 wrote:
>
> Hello
>
> When would it be a good idea to use multiple smaller cephfs
> filesystems (in the same cluster) instead a big single one
> with active-active MDSs?
>
> I am migrating about 900M files from Lustre to Ceph and I am
> wondering if I should use a single file system or two
> filesystems. Right now the only significant benefit of using
> multiple cephfs filesystems I see is that a metadata scrub
> wouldn't take as long.
>
> Do people have other thoughts about single vs multiple
> filesystems?

Major consideration points: cost of having multiple MDS running (more
memory/cpu used), inability to move files between the two hierarchies
without full copies, and straightforward scaling w/ different file
systems.

Active-active file systems can often function in a similar way with
subtree pinning without the drawbacks.
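
For example, with a single file system and multiple active MDS, the
top-level trees can be pinned explicitly; a sketch, assuming a mount at
/mnt/cephfs with two (hypothetical) project directories:

setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/projectA
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projectB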

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS upgrade to Quincy

2022-05-18 Thread Patrick Donnelly
Hi Jimmy,

On Fri, Apr 22, 2022 at 11:02 AM Jimmy Spets  wrote:
>
> Does cephadm automatically reduce ranks to 1 or does that have to be done
> manually?

Automatically.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS upgrade to Quincy

2022-04-21 Thread Patrick Donnelly
On Wed, Apr 20, 2022 at 8:29 AM Chris Palmer  wrote:
>
> The Quincy release notes state that "MDS upgrades no longer require all
> standby MDS daemons to be stopped before upgrading a file system's sole
> active MDS." but the "Upgrading non-cephadm clusters" instructions still
> include reducing ranks to 1, upgrading, then raising it again.

The instructions are correct? For both cephadm and non-cephadm
clusters, it is necessary to reduce max_mds to 1 (ranks to 1) before
doing an upgrade. The change noted in the release notes is that standby MDS
(MDS not holding a rank) no longer need to be stopped / shut down.

> Does the new feature only apply once you have upgraded to Quincy, or do
> the MDS upgrade notes need adjusting now? (We're upgrading from Pacific).

It's only necessary to upgrade the monitors for the feature to be
available (which is already the first thing you do when upgrading a
cluster).

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v17.2.0 Quincy released

2022-04-20 Thread Patrick Donnelly
On Wed, Apr 20, 2022 at 7:22 AM Stefan Kooman  wrote:
>
> On 4/20/22 03:36, David Galloway wrote:
> > We're very happy to announce the first stable release of the Quincy series.
> >
> > We encourage you to read the full release notes at
> > https://ceph.io/en/news/blog/2022/v17-2-0-quincy-released/
>
> When upgrading a MDS to 16.2.7 in a non cephadm environment you should
> set "ceph config set mon mon_mds_skip_sanity 1". And after the upgrade
> remove it.

Yes!

> I'm wondering if the same step is needed when upgrading from Octopus
> (and or pacific < 16.2.7) to Quincy? It's not in the release notes, but
> just double checking here [1].

Yes it is necessary.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrade 16.2.6 -> 16.2.7 - MON assertion failure

2022-01-31 Thread Patrick Donnelly
Hi Chris,

On Thu, Dec 9, 2021 at 10:40 AM Chris Palmer  wrote:
>
> Hi
>
> I've just started an upgrade of a test cluster from 16.2.6 -> 16.2.7 and
> immediately hit a problem.
>
> The cluster started as octopus, and has upgraded through to 16.2.6
> without any trouble. It is a conventional deployment on Debian 10, NOT
> using cephadm. All was clean before the upgrade. It contains nodes as
> follows:
> - Node 1: MON, MGR, MDS, RGW
> - Node 2: MON, MGR, MDS, RGW
> - Node 3: MON
> - Node 4-6: OSDs
>
> In the absence of any specific upgrade instructions for 16.2.7, I
> upgraded Node 1 and rebooted. The MON on that host will now not start,
> throwing the following assertion:
>
> 2021-12-09T14:56:40.098+00:00 tstmon01 ceph-mon[960]: 
> /build/ceph-16.2.7/src/mds/FSMap.cc: In function 'void FSMap::sanity(bool) 
> const' thread 7f2d309085c0 time 2021-12-09T14:56:40.098395+
> 2021-12-09T14:56:40.098+00:00 tstmon01 ceph-mon[960]: 
> /build/ceph-16.2.7/src/mds/FSMap.cc: 868: FAILED 
> ceph_assert(info.compat.writeable(fs->mds_map.compat))
> 2021-12-09T14:56:40.103+00:00 tstmon01 ceph-mon[960]:  ceph version 
> 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
> 2021-12-09T14:56:40.103+00:00 tstmon01 ceph-mon[960]:  1: 
> (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14b) 
> [0x7f2d3222423c]
> 2021-12-09T14:56:40.103+00:00 tstmon01 ceph-mon[960]:  2: 
> /usr/lib/ceph/libceph-common.so.2(+0x277414) [0x7f2d32224414]
> 2021-12-09T14:56:40.103+00:00 tstmon01 ceph-mon[960]:  3: 
> (FSMap::sanity(bool) const+0x2a8) [0x7f2d327331c8]
> 2021-12-09T14:56:40.103+00:00 tstmon01 ceph-mon[960]:  4: 
> (MDSMonitor::update_from_paxos(bool*)+0x396) [0x55a32fe6b546]
> 2021-12-09T14:56:40.103+00:00 tstmon01 ceph-mon[960]:  5: 
> (PaxosService::refresh(bool*)+0x10a) [0x55a32fd960ca]
> 2021-12-09T14:56:40.103+00:00 tstmon01 ceph-mon[960]:  6: 
> (Monitor::refresh_from_paxos(bool*)+0x17c) [0x55a32fc54bec]
> 2021-12-09T14:56:40.103+00:00 tstmon01 ceph-mon[960]:  7: 
> (Monitor::init_paxos()+0xfc) [0x55a32fc54e9c]
> 2021-12-09T14:56:40.103+00:00 tstmon01 ceph-mon[960]:  8: 
> (Monitor::preinit()+0xbb9) [0x55a32fc7eb09]
> 2021-12-09T14:56:40.103+00:00 tstmon01 ceph-mon[960]:  9: main()
> 2021-12-09T14:56:40.103+00:00 tstmon01 ceph-mon[960]:  10: 
> __libc_start_main()
> 2021-12-09T14:56:40.103+00:00 tstmon01 ceph-mon[960]:  11: _start()

I just want to follow up that this is indeed a new bug (not an existing
bug as I originally thought!). The tracker ticket is here:
https://tracker.ceph.com/issues/54081

Sorry you ran across it!

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs: [ERR] loaded dup inode

2022-01-20 Thread Patrick Donnelly
Hi Frank,

On Tue, Jan 18, 2022 at 4:54 AM Frank Schilder  wrote:
>
> Hi Dan and Patrick,
>
> this problem seems to be developing into a nightmare. I executed a find on the
> file system and had some initial success. The number of stray files dropped by
> about 8%. Unfortunately, that is about it. I'm now also running a find on the
> snap dirs, but I don't have much hope. There must be a way to find out what
> is accumulating in the stray buckets. As I wrote in another reply to this 
> thread, I can't dump the trees:
>
> > I seem to have a problem. I cannot dump the mds tree:
> >
> > [root@ceph-08 ~]# ceph daemon mds.ceph-08 dump tree '~mdsdir/stray0'
> > root inode is not in cache
> > [root@ceph-08 ~]# ceph daemon mds.ceph-08 dump tree '~mds0/stray0'
> > root inode is not in cache
> > [root@ceph-08 ~]# ceph daemon mds.ceph-08 dump tree '~mds0' 0
> > root inode is not in cache
> > [root@ceph-08 ~]# ceph daemon mds.ceph-08 dump tree '~mdsdir' 0
> > root inode is not in cache
> >
> > [root@ceph-08 ~]# ceph daemon mds.ceph-08 get subtrees | grep path
> > "path": "",
> > "path": "~mds0",
> >
>
> However, this information is somewhere in rados objects and it should be 
> possible to figure something out similar to
>
> # rados getxattr --pool=con-fs2-meta1  parent | ceph-dencoder type 
> inode_backtrace_t import - decode dump_json
> # rados listomapkeys --pool=con-fs2-meta1 
>
> What OBJ_IDs am I looking for? How and where can I start to traverse the 
> structure? Version is mimic latest stable.

You mentioned you have snapshots? If you've deleted directories that
are captured in a snapshot, they stick around in the stray directory
until the snapshot is deleted; there's no way to force purging before
then. For this reason, the stray directory size can grow without bound.
You need to either upgrade to Pacific, where the stray directory will
be fragmented, or remove the snapshots.
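
If you want to keep an eye on the stray counts while cleaning up,
something like this works (the MDS name is taken from your earlier
output, the path is just an example, and this assumes the default
".snap" snapdir name on a client mount):

    ceph daemon mds.ceph-08 perf dump | grep -i stray
    ls /mnt/cephfs/some/dir/.snap                  # list snapshots on that dir
    rmdir /mnt/cephfs/some/dir/.snap/<snapname>    # delete a snapshot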

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs: [ERR] loaded dup inode

2022-01-16 Thread Patrick Donnelly
Hi Dan,

On Fri, Jan 14, 2022 at 6:32 AM Dan van der Ster  wrote:
> We had this long ago related to a user generating lots of hard links.
> Snapshots will have a similar effect.
> (in these cases, if a user deletes the original file, the file goes
> into stray until it is "reintegrated").
>
> If you can find the dir where they're working, `ls -lR` will force
> those to reintegrate (you will see because the num strays will drop
> back down).
> You might have to ls -lR in a snap directory, or in the current tree
> -- you have to browse around and experiment.
>
> pacific does this re-integration automatically.

This reintegration is still not automatic: the MDS does not yet have a
mechanism for hunting down the dentry in order to reintegrate it. The
next planned release of Pacific will have reintegration triggered by
recursive scrub:

https://github.com/ceph/ceph/pull/44514

which is significantly less disruptive than `ls -lR` or `find`.
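
Once you're on a release with that change, kicking it off would look
roughly like this (the file system name and path are placeholders):

    ceph tell mds.<fs_name>:0 scrub start / recursive
    ceph tell mds.<fs_name>:0 scrub status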

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

