[ceph-users] Re: mds slow request with "failed to authpin, subtree is being exported"
On 11/23/23 11:25, zxcs wrote:
> Thanks a ton, Xiubo!
>
> It does not disappear, even after we umount the ceph directory on these two old-OS nodes. After dumping the ops in flight we can still see some requests, and the earliest complains "failed to authpin, subtree is being exported". How can we avoid this? Would you please help shed some light here?

Okay, as Frank mentioned, you can try to disable the balancer by pinning the directories. As I remember, the balancer is buggy. You can also raise a ceph tracker issue and provide the debug logs if you have them.

Thanks
- Xiubo

> Thanks,
> xz
>
>> On 22 Nov 2023, at 19:44, Xiubo Li wrote:
>>
>> On 11/22/23 16:02, zxcs wrote:
>>> HI, Experts,
>>>
>>> we are using cephfs 16.2.* with multiple active MDS, and recently we have two nodes mounted with ceph-fuse due to the old OS.
>>>
>>> One node runs a python script with `glob.glob(path)`, and another client does a `cp` operation on the same path.
>>>
>>> Then we see some logs about `mds slow request`, complaining "failed to authpin, subtree is being exported", and we need to restart the MDS.
>>>
>>> Our question is: is there a deadlock? How can we avoid this, and how can we fix it without restarting the MDS (a restart will affect other users)?
>>
>> BTW, won't the slow requests disappear by themselves later?
>>
>> It looks like the exporting is slow, or there are too many exports going on.
>>
>> Thanks
>>
>> - Xiubo
>>
>>> Thanks a ton!
>>>
>>> xz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
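For reference, the pinning Xiubo mentions is done with an extended attribute set on a mounted client; a minimal sketch (mount point and directory name are examples, and the commands need a mounted CephFS to run against):

```shell
# Pin the example directory (and everything under it) to MDS rank 0, so the
# balancer never tries to export this subtree to another active MDS.
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/projectA

# Setting the pin to -1 removes it and returns the subtree to the balancer.
setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/projectA
```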
[ceph-users] Re: How to use hardware
On 20/11/2023 at 09:24:41+, Frank Schilder wrote:

Hi,

Thanks everyone for your answers.

> we are using something similar for ceph-fs. For a backup system your setup can work, depending on how you back up. While HDD pools have poor IOP/s performance, they are very good for streaming workloads. If you are using something like Borg backup that writes huge files sequentially, an HDD back-end should be OK.

Ok, good to know.

> Here are some things to consider and try out:
>
> 1. You really need to get a bunch of enterprise SSDs with power-loss protection for the FS meta-data pool (disable the write cache if enabled; this will disable the volatile write cache and switch to protected caching). We are using (formerly Intel) 1.8T SATA drives that we subdivide into 4 OSDs each to raise performance. Place the meta-data pool and the primary data pool on these disks. Create a secondary data pool on the HDDs and assign it to the root *before* creating anything on the FS (see the recommended 3-pool layout for ceph file systems in the docs). I would not even consider running this without SSDs. 1 such SSD per host is the minimum, 2 is better. If Borg or whatever can make use of a small fast storage directory, assign a sub-dir of the root to the primary data pool.

OK. I will see what I can do.

> 2. Calculate with sufficient extra disk space. As long as utilization stays below 60-70%, bluestore will try to make large object writes sequential, which is really important for HDDs. On our cluster we currently have 40% utilization and I get full HDD bandwidth out for large sequential reads/writes. Make sure your backup application makes large sequential IO requests.
>
> 3. As Anthony said, add RAM. You should go for 512G on 50-HDD nodes. You can run the MDS daemons on the OSD nodes. Set a reasonable cache limit and use ephemeral pinning. Depending on the CPUs you are using, 48 cores can be plenty. The latest generation of Intel Xeon Scalable processors is so efficient with ceph that 1 HT per HDD is more than enough.

Yes, I get 512G and 64 cores on each server.

> 4. 3 MON+MGR nodes are sufficient. You can do something else with the remaining 2 nodes. Of course, you can use them as additional MON+MGR nodes. We also use 5 and it improves maintainability a lot.

Ok, thanks.

> Something more exotic if you have time:
>
> 5. To improve sequential performance further, you can experiment with larger min_alloc_sizes for OSDs (at creation time; you will need to scrap and re-deploy the cluster to test different values). Every HDD has a preferred IO size for which random IO achieves nearly the same bandwidth as sequential writes. (But see 7.)
>
> 6. On your set-up you will probably go for a 4+2 EC data pool on HDD. With object size 4M, the max chunk size per OSD will be 1M. For many HDDs this is the preferred IO size (usually between 256K and 1M). (But see 7.)
>
> 7. Important: large min_alloc_sizes are only good if your workload *never* modifies files, but only replaces them. A bit like a pool without EC overwrite enabled. The implementation of EC overwrites has a "feature" that can lead to massive allocation amplification. If your backup workload does modifications to files instead of adding new + deleting old, do *not* experiment with options 5.-7. Instead, use the default and make sure you have sufficient unused capacity to increase the chances for large bluestore writes (keep utilization below 60-70% and just buy extra disks). A workload with large min_alloc_sizes has to be S3-like: only upload, download and delete are allowed.

Thanks a lot for those tips. I'm a newbie with ceph, so it's going to take some time before I understand everything you say.

Best regards
--
Albert SHIH
France
Local time: Thu 23 Nov 2023 08:32:20 CET
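The 3-pool layout Frank describes in point 1 can be sketched roughly as follows (pool and path names are made up for illustration; see the CephFS pool-creation docs for the authoritative procedure):

```shell
# Meta-data pool and a small primary data pool on the SSD device class,
# bulk data pool on HDD (this is where e.g. a 4+2 EC pool would go).
ceph osd pool create cephfs_metadata
ceph osd pool create cephfs_data_ssd
ceph osd pool create cephfs_data_hdd
ceph fs new cephfs cephfs_metadata cephfs_data_ssd
ceph fs add_data_pool cephfs cephfs_data_hdd

# On a mounted client, route file data under the root to the HDD pool
# *before* creating anything on the FS, as recommended above.
setfattr -n ceph.dir.layout.pool -v cephfs_data_hdd /mnt/cephfs
```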
[ceph-users] Re: ceph-exporter binds to IPv4 only
On 22-11-2023 15:54, Stefan Kooman wrote:
> Hi,
>
> In an IPv6-only deployment the ceph-exporter daemons are not listening on IPv6 addresses. This can be fixed by editing the unit.run file of the ceph-exporter, changing "--addrs=0.0.0.0" to "--addrs=::". Is this configurable, so that cephadm deploys ceph-exporter with the proper unit.run arguments?

Related issue: https://tracker.ceph.com/issues/62220

A different fix was chosen as opposed to https://github.com/ceph/ceph/pull/54285/. Maybe better to remove the IPv4/IPv6 distinction and make the code IP-family agnostic (i.e. go for the fix in 54285)?

Gr. Stefan
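Until this is configurable, the manual workaround from the original post can be scripted along these lines. The unit.run path follows the usual cephadm layout; the daemon and service names are assumptions to adapt to your deployment:

```shell
# Flip ceph-exporter from the IPv4 wildcard to the IPv6 one and restart it.
FSID="$(ceph fsid)"
sed -i 's/--addrs=0\.0\.0\.0/--addrs=::/' \
    "/var/lib/ceph/${FSID}/ceph-exporter.$(hostname -s)/unit.run"
systemctl restart "ceph-${FSID}@ceph-exporter.$(hostname -s).service"
```

Note that cephadm may regenerate unit.run on redeploy, so the edit would have to be reapplied until a proper fix lands.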
[ceph-users] Re: ceph fs (meta) data inconsistent
Hi Frank,

Locally I ran some tests using copy2 and copy, but they all worked well for me. Could you write a reproducing script?

Thanks
- Xiubo

On 11/10/23 22:53, Frank Schilder wrote:
>>> It looks like the cap update request was dropped to the ground in MDS. [...] If you can reproduce it, then please provide the mds logs by setting: [...]
>> I can do a test with MDS logs on high level. Before I do that, looking at the python findings above, is this something that should work on ceph or is it a python issue?
> Not sure yet. I need to understand what exactly shutil.copy does in kclient.
Thanks! Will wait for further instructions.
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

From: Xiubo Li
Sent: Friday, November 10, 2023 3:14 AM
To: Frank Schilder; Gregory Farnum
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ceph fs (meta) data inconsistent

On 11/10/23 00:18, Frank Schilder wrote:
> Hi Xiubo,
>
> I will try to answer questions from all your 3 e-mails here, together with some new information we have.
>
> New: the problem occurs in newer python versions when using the shutil.copy function. There is also a function shutil.copy2 for which the problem does not show up. copy2 behaves a bit like "cp -p" while copy is like "cp". The only code difference (on linux) between these 2 functions is that copy calls copyfile+copymode while copy2 calls copyfile+copystat. For now we asked our users to use copy2 to avoid the issue.
>
> The copyfile function calls _fastcopy_sendfile on linux, which in turn calls os.sendfile, which seems to be part of libc:
>
> #include <sys/sendfile.h>
> ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
>
> I'm wondering if using this function requires explicit meta-data updates or should be safe on ceph-fs. I'm also not sure if a user-space client even supports this function (seems to be meaningless). Should this function be safe to use on ceph kclient?

I didn't foresee any limit for this in kclient. shutil.copy will only copy the contents of the file, while shutil.copy2 will also copy the metadata. I need to know what exactly they do in kclient for shutil.copy and shutil.copy2.

Answers to questions:

>> BTW, have you tested ceph-fuse with the same test? Is it also the same?
> I don't have fuse clients available, so I can't test right now.
>> Have you tried other ceph versions?
> We are in the process of deploying a new test cluster; the old one is scrapped already. I can't test this at the moment.
>> It looks like the cap update request was dropped to the ground in MDS. [...] If you can reproduce it, then please provide the mds logs by setting: [...]
> I can do a test with MDS logs on high level. Before I do that, looking at the python findings above, is this something that should work on ceph or is it a python issue?

Not sure yet. I need to understand what exactly shutil.copy does in kclient.

Thanks
- Xiubo

> Thanks for your help!
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
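The copy/copy2 difference Frank describes can be demonstrated locally with a short script (temporary files on the local filesystem, not CephFS, so this only illustrates the Python-side behavior, not the kclient issue):

```python
import os
import shutil
import tempfile

# shutil.copy = copyfile + copymode: data plus permission bits only.
# shutil.copy2 = copyfile + copystat: data plus stat metadata (timestamps).
# On Linux both use the same os.sendfile() fast path for the file data.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "src.bin")
with open(src, "wb") as f:
    f.write(b"x" * 4096)
os.utime(src, (1_000_000, 1_000_000))  # backdate src so mtime differences show

dst1 = shutil.copy(src, os.path.join(tmp, "copy.bin"))
dst2 = shutil.copy2(src, os.path.join(tmp, "copy2.bin"))

print(os.stat(dst1).st_mtime == 1_000_000.0)  # False: copy drops the mtime
print(os.stat(dst2).st_mtime == 1_000_000.0)  # True: copy2 preserves it
```

On a CephFS mount the same two calls exercise different cap-update paths in the client, which is why only one of them triggered the inconsistency.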
[ceph-users] Re: ceph fs (meta) data inconsistent
I just raised a tracker to follow this: https://tracker.ceph.com/issues/63510

Thanks
- Xiubo
[ceph-users] Re: mds slow request with "failed to authpin, subtree is being exported"
Thanks a ton, Xiubo!

It does not disappear, even after we umount the ceph directory on these two old-OS nodes. After dumping the ops in flight we can still see some requests, and the earliest complains "failed to authpin, subtree is being exported". How can we avoid this? Would you please help shed some light here?

Thanks,
xz

> On 22 Nov 2023, at 19:44, Xiubo Li wrote:
>
> On 11/22/23 16:02, zxcs wrote:
>> HI, Experts,
>>
>> we are using cephfs 16.2.* with multiple active MDS, and recently we have two nodes mounted with ceph-fuse due to the old OS.
>>
>> One node runs a python script with `glob.glob(path)`, and another client does a `cp` operation on the same path.
>>
>> Then we see some logs about `mds slow request`, complaining "failed to authpin, subtree is being exported", and we need to restart the MDS.
>>
>> Our question is: is there a deadlock? How can we avoid this, and how can we fix it without restarting the MDS (a restart will affect other users)?
>
> BTW, won't the slow requests disappear by themselves later?
>
> It looks like the exporting is slow, or there are too many exports going on.
>
> Thanks
>
> - Xiubo
>
>> Thanks a ton!
>>
>> xz
[ceph-users] ceph-exporter binds to IPv4 only
Hi,

In an IPv6-only deployment the ceph-exporter daemons are not listening on IPv6 addresses. This can be fixed by editing the unit.run file of the ceph-exporter, changing "--addrs=0.0.0.0" to "--addrs=::". Is this configurable, so that cephadm deploys ceph-exporter with the proper unit.run arguments?

Gr. Stefan

... who really thinks the Ceph test lab should have an IPv6-only test environment to catch these things
[ceph-users] CephFS - MDS removed from map - filesystem keeps to be stopped
Hi,

We are running Ceph Pacific 16.2.13. We had a full CephFS filesystem, and after adding new hardware we tried to start it, but our MDS daemons are pushed to standby and removed from the MDS map. The filesystem was broken, so we repaired it with:

# ceph fs fail cephfs
# cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
# cephfs-journal-tool --rank=cephfs:0 journal reset

Then I started the ceph-mds service and marked the rank as repaired; after some time the MDS switched back to standby. The log is below. I would appreciate any help to resolve this situation. Thank you.

From the log:

2023-11-22T14:11:49.212+0100 7f5dc155e700  1 mds.0.9604 handle_mds_map i am now mds.0.9604
2023-11-22T14:11:49.212+0100 7f5dc155e700  1 mds.0.9604 handle_mds_map state change up:rejoin --> up:active
2023-11-22T14:11:49.212+0100 7f5dc155e700  1 mds.0.9604 recovery_done -- successful recovery!
2023-11-22T14:11:49.212+0100 7f5dc155e700  1 mds.0.9604 active_start
2023-11-22T14:11:49.216+0100 7f5dc155e700  1 mds.0.9604 cluster recovered.
2023-11-22T14:11:49.216+0100 7f5dc3d63700  0 --1- [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> v1:10.245.8.127:0/2123529386 conn(0x55a60627a800 0x55a606e5b000 :6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept peer reset, then tried to connect to us, replacing
2023-11-22T14:11:49.216+0100 7f5dc4564700  0 --1- [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> v1:10.245.6.88:0/1899426587 conn(0x55a60627ac00 0x55a6070d :6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept peer reset, then tried to connect to us, replacing
2023-11-22T14:11:49.216+0100 7f5dc4564700  0 --1- [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> v1:10.245.4.216:0/2058542052 conn(0x55a6070c9800 0x55a6070d1800 :6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept peer reset, then tried to connect to us, replacing
2023-11-22T14:11:49.216+0100 7f5dc3d63700  0 --1- [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> v1:10.245.4.220:0/1549374180 conn(0x55a60708d000 0x55a6070d0800 :6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept peer reset, then tried to connect to us, replacing
2023-11-22T14:11:49.216+0100 7f5dc4d65700  0 --1- [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> v1:10.245.8.180:0/270666178 conn(0x55a60703a000 0x55a6070cf800 :6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept peer reset, then tried to connect to us, replacing
2023-11-22T14:11:49.216+0100 7f5dc4d65700  0 --1- [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> v1:10.245.8.178:0/3673271488 conn(0x55a6070c9400 0x55a6070d1000 :6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept peer reset, then tried to connect to us, replacing
2023-11-22T14:11:49.216+0100 7f5dc4d65700  0 --1- [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> v1:10.245.4.167:0/2667964940 conn(0x55a6070c9c00 0x55a607112000 :6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept peer reset, then tried to connect to us, replacing
2023-11-22T14:11:49.216+0100 7f5dc3d63700  0 --1- [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> v1:10.245.6.70:0/3181830075 conn(0x55a607116000 0x55a607112800 :6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept peer reset, then tried to connect to us, replacing
2023-11-22T14:11:49.216+0100 7f5dc4564700  0 --1- [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> v1:10.245.6.72:0/3744737352 conn(0x55a60627a800 0x55a606e5b000 :6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept peer reset, then tried to connect to us, replacing
2023-11-22T14:11:49.216+0100 7f5dc3d63700  0 --1- [v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> v1:10.244.18.140:0/1607447464 conn(0x55a60627ac00 0x55a6070d :6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_message_2 accept peer reset, then tried to connect to us, replacing
2023-11-22T14:11:49.220+0100 7f5dc155e700  1 mds.mds1 Updating MDS map to version 9608 from mon.1
2023-11-22T14:11:49.220+0100 7f5dc155e700  1 mds.0.9604 handle_mds_map i am now mds.0.9604
2023-11-22T14:11:49.220+0100 7f5dc155e700  1 mds.0.9604 handle_mds_map state change up:active --> up:stopping
2023-11-22T14:11:52.412+0100 7f5dc3562700  1 mds.mds1 asok_command: client ls {prefix=client ls} (starting...)
2023-11-22T14:11:57.412+0100 7f5dc3562700  1 mds.mds1 asok_command: client ls {prefix=client ls} (starting...)
2023-11-22T14:12:02.416+0100 7f5dc3562700  1 mds.mds1 asok_command: client ls {prefix=client ls} (starting...)
2023-11-22T14:12:07.420+0100 7f5dc3562700  1 mds.mds1 asok_command:
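For reference, the repair sequence from the post in one place, plus two settings worth checking: after `ceph fs fail` the file system stays non-joinable until re-enabled, and a rank moves to up:stopping (as in the log above) when `max_mds` is lower than the current rank count. This is a hedged sketch for comparison, not a verified fix:

```shell
ceph fs fail cephfs
cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
cephfs-journal-tool --rank=cephfs:0 journal reset
ceph fs set cephfs joinable true    # fs fail leaves the FS non-joinable
ceph mds repaired cephfs:0          # clear the damaged flag on rank 0
# Verify both settings; a rank is deliberately stopped if max_mds shrank.
ceph fs get cephfs | grep -E 'max_mds|joinable'
```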
[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed
Thanks for this. This looks similar to what we're observing, although we don't use the API apart from its use by the Ceph deployment itself, which I guess still counts.

/Z

On Wed, 22 Nov 2023, 15:22 Adrien Georget wrote:
> Hi,
>
> This memory leak with ceph-mgr seems to be due to a change in Ceph 16.2.12. Check this issue: https://tracker.ceph.com/issues/59580
> We are also affected by this, with or without containerized services.
>
> Cheers,
> Adrien
>
> On 22/11/2023 at 14:14, Eugen Block wrote:
>> One other difference is you use docker, right? We use podman, could it be some docker restriction?
>>
>> Quoting Zakhar Kirpichenko:
>>> It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node has 384 GB of RAM, each OSD has a memory target of 16 GB, and about 100 GB of memory, give or take, is available (mostly used by page cache) on each node during normal operation. Nothing unusual there, tbh.
>>>
>>> No unusual mgr modules or settings either, except for disabled progress:
>>>
>>> {
>>>     "always_on_modules": [
>>>         "balancer",
>>>         "crash",
>>>         "devicehealth",
>>>         "orchestrator",
>>>         "pg_autoscaler",
>>>         "progress",
>>>         "rbd_support",
>>>         "status",
>>>         "telemetry",
>>>         "volumes"
>>>     ],
>>>     "enabled_modules": [
>>>         "cephadm",
>>>         "dashboard",
>>>         "iostat",
>>>         "prometheus",
>>>         "restful"
>>>     ],
>>>
>>> /Z
>>>
>>> On Wed, 22 Nov 2023, 14:52 Eugen Block wrote:
>>>> What does your hardware look like memory-wise? Just for comparison, one customer cluster has 4,5 GB in use (middle-sized cluster for openstack, 280 OSDs):
>>>>
>>>> PID   USER  PR  NI  VIRT     RES     SHR    S  %CPU   %MEM   TIME+     COMMAND
>>>> 6077  ceph  20   0  6357560  4,522g  22316  S  12,00  1,797  57022:54  ceph-mgr
>>>>
>>>> In our own cluster (smaller than that and not really heavily used) the mgr uses almost 2 GB. So those numbers you have seem relatively small.
>>>>
>>>> Quoting Zakhar Kirpichenko:
>>>>> I've disabled the progress module entirely and will see how it goes. Otherwise, mgr memory usage keeps increasing slowly; from past experience it will stabilize at around 1.5-1.6 GB. Other than this event warning, it's unclear what could have caused the random memory ballooning.
>>>>>
>>>>> /Z
>>>>>
>>>>> On Wed, 22 Nov 2023 at 13:07, Eugen Block wrote:
>>>>>> I see these progress messages all the time, I don't think they cause it, but I might be wrong. You can disable it just to rule that out.
>>>>>>
>>>>>> Quoting Zakhar Kirpichenko:
>>>>>>> Unfortunately, I don't have a full stack trace because there's no crash when the mgr gets oom-killed. There's just the mgr log, which looks completely normal until about 2-3 minutes before the oom-kill, when tcmalloc warnings show up.
>>>>>>>
>>>>>>> I'm not sure that it's the same issue that is described in the tracker. We seem to have some stale "events" in the progress module though:
>>>>>>>
>>>>>>> Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ 7f4bb19ef700 0 [progress WARNING root] complete: ev cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist
>>>>>>> Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ 7f4bb19ef700 0 [progress WARNING root] complete: ev 44824331-3f6b-45c4-b925-423d098c3c76 does not exist
>>>>>>> Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ 7f4bb19ef700 0 [progress WARNING root] complete: ev 0139bc54-ae42-4483-b278-851d77f23f9f does not exist
>>>>>>> Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ 7f4bb19ef700 0 [progress WARNING root] complete: ev f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist
>>>>>>> Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ 7f4bb19ef700 0 [progress WARNING root] complete: ev 1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist
>>>>>>> Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ 7f4bb19ef700 0 [progress WARNING root] complete: ev 7f14d01c-498c-413f-b2ef-05521050190a does not exist
>>>>>>> Nov 21 14:57:35 ceph01 bash[3941523]: debug 2023-11-21T14:57:35.950+ 7f4bb19ef700 0 [progress WARNING root] complete: ev 48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist
>>>>>>>
>>>>>>> I tried clearing them but they keep showing up. I am wondering if these missing events can cause memory leaks over time.
[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed
Yes, we use docker, though we haven't had any issues because of it. I don't think that docker itself can cause mgr memory leaks.

/Z

On Wed, 22 Nov 2023, 15:14 Eugen Block wrote:
> One other difference is you use docker, right? We use podman, could it be some docker restriction?
>
> Quoting Zakhar Kirpichenko:
>> [...]
>> On Wed, 22 Nov 2023 at 11:12, Eugen Block wrote:
>>> Do you have the full stack trace? The pastebin only contains the "tcmalloc: large alloc" messages (same as in the tracker issue). Maybe comment in the tracker issue directly since Radek asked for someone with a similar problem in a newer release.
>>>
>>> Quoting Zakhar Kirpichenko:
>>>> Thanks, Eugen.
[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed
Hi,

This memory leak with ceph-mgr seems to be due to a change in Ceph 16.2.12. Check this issue: https://tracker.ceph.com/issues/59580
We are also affected by this, with or without containerized services.

Cheers,
Adrien

On 22/11/2023 at 14:14, Eugen Block wrote:
> One other difference is you use docker, right? We use podman, could it be some docker restriction?
>
> Quoting Zakhar Kirpichenko:
>> [...]
>> On Wed, 22 Nov 2023 at 11:12, Eugen Block wrote:
>>> Do you have the full stack trace? The pastebin only contains the "tcmalloc: large alloc" messages (same as in the tracker issue). Maybe comment in the tracker issue directly since Radek asked for someone with a similar problem in a newer release.
>>>
>>> Quoting Zakhar Kirpichenko:
>>>> Thanks, Eugen. It is similar in the sense that the mgr is getting OOM-killed.
>>>>
>>>> It started happening in our cluster after the upgrade to 16.2.14. We haven't had this issue with earlier Pacific releases.
>>>>
>>>> /Z
>>>>
>>>> On Tue, 21 Nov 2023, 21:53 Eugen Block wrote:
>>>>> Just checking it on the phone, but isn't this quite similar?
>>>>>
>>>>> https://tracker.ceph.com/issues/45136
[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed
One other difference is you use docker, right? We use podman, could it be some docker restriction? Zitat von Zakhar Kirpichenko : It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node has 384 GB of RAM, each OSD has a memory target of 16 GB, about 100 GB of memory, give or take, is available (mostly used by page cache) on each node during normal operation. Nothing unusual there, tbh. No unusual mgr modules or settings either, except for disabled progress: { "always_on_modules": [ "balancer", "crash", "devicehealth", "orchestrator", "pg_autoscaler", "progress", "rbd_support", "status", "telemetry", "volumes" ], "enabled_modules": [ "cephadm", "dashboard", "iostat", "prometheus", "restful" ], /Z On Wed, 22 Nov 2023, 14:52 Eugen Block, wrote: What does your hardware look like memory-wise? Just for comparison, one customer cluster has 4,5 GB in use (middle-sized cluster for openstack, 280 OSDs): PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 6077 ceph 20 0 6357560 4,522g 22316 S 12,00 1,797 57022:54 ceph-mgr In our own cluster (smaller than that and not really heavily used) the mgr uses almost 2 GB. So those numbers you have seem relatively small. Zitat von Zakhar Kirpichenko : > I've disabled the progress module entirely and will see how it goes. > Otherwise, mgr memory usage keeps increasing slowly, from past experience > it will stabilize at around 1.5-1.6 GB. Other than this event warning, it's > unclear what could have caused random memory ballooning. > > /Z > > On Wed, 22 Nov 2023 at 13:07, Eugen Block wrote: > >> I see these progress messages all the time, I don't think they cause >> it, but I might be wrong. You can disable it just to rule that out. >> >> Zitat von Zakhar Kirpichenko : >> >> > Unfortunately, I don't have a full stack trace because there's no crash >> > when the mgr gets oom-killed. There's just the mgr log, which looks >> > completely normal until about 2-3 minutes before the oom-kill, when >> > tmalloc warnings show up. 
>> > >> > I'm not sure that it's the same issue that is described in the tracker. >> We >> > seem to have some stale "events" in the progress module though: >> > >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ >> > 7f4bb19ef700 0 [progress WARNING root] complete: ev >> > cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ >> > 7f4bb19ef700 0 [progress WARNING root] complete: ev >> > 44824331-3f6b-45c4-b925-423d098c3c76 does not exist >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ >> > 7f4bb19ef700 0 [progress WARNING root] complete: ev >> > 0139bc54-ae42-4483-b278-851d77f23f9f does not exist >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ >> > 7f4bb19ef700 0 [progress WARNING root] complete: ev >> > f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ >> > 7f4bb19ef700 0 [progress WARNING root] complete: ev >> > 1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ >> > 7f4bb19ef700 0 [progress WARNING root] complete: ev >> > 7f14d01c-498c-413f-b2ef-05521050190a does not exist >> > Nov 21 14:57:35 ceph01 bash[3941523]: debug 2023-11-21T14:57:35.950+ >> > 7f4bb19ef700 0 [progress WARNING root] complete: ev >> > 48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist >> > >> > I tried clearing them but they keep showing up. I am wondering if these >> > missing events can cause memory leaks over time. >> > >> > /Z >> > >> > On Wed, 22 Nov 2023 at 11:12, Eugen Block wrote: >> > >> >> Do you have the full stack trace? The pastebin only contains the >> >> "tcmalloc: large alloc" messages (same as in the tracker issue). Maybe >> >> comment in the tracker issue directly since Radek asked for someone >> >> with a similar problem in a newer release. 
>> >> >> >> Zitat von Zakhar Kirpichenko : >> >> >> >> > Thanks, Eugen. It is similar in the sense that the mgr is getting >> >> > OOM-killed. >> >> > >> >> > It started happening in our cluster after the upgrade to 16.2.14. We >> >> > haven't had this issue with earlier Pacific releases. >> >> > >> >> > /Z >> >> > >> >> > On Tue, 21 Nov 2023, 21:53 Eugen Block, wrote: >> >> > >> >> >> Just checking it on the phone, but isn’t this quite similar? >> >> >> >> >> >> https://tracker.ceph.com/issues/45136 >> >> >> >> >> >> Zitat von Zakhar Kirpichenko : >> >> >> >> >> >> > Hi, >> >> >> > >> >> >> > I'm facing a rather new issue with our Ceph cluster: from time to >> time >> >> >> > ceph-mgr on one of the two mgr nodes gets oom-killed after >> consuming >> >> over >> >> >> > 100 GB RAM: >> >> >> > >> >> >>
[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed
It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node has 384 GB of RAM, each OSD has a memory target of 16 GB, about 100 GB of memory, give or take, is available (mostly used by page cache) on each node during normal operation. Nothing unusual there, tbh. No unusual mgr modules or settings either, except for disabled progress: { "always_on_modules": [ "balancer", "crash", "devicehealth", "orchestrator", "pg_autoscaler", "progress", "rbd_support", "status", "telemetry", "volumes" ], "enabled_modules": [ "cephadm", "dashboard", "iostat", "prometheus", "restful" ], /Z On Wed, 22 Nov 2023, 14:52 Eugen Block, wrote: > What does your hardware look like memory-wise? Just for comparison, > one customer cluster has 4,5 GB in use (middle-sized cluster for > openstack, 280 OSDs): > > PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ > COMMAND > 6077 ceph 20 0 6357560 4,522g 22316 S 12,00 1,797 > 57022:54 ceph-mgr > > In our own cluster (smaller than that and not really heavily used) the > mgr uses almost 2 GB. So those numbers you have seem relatively small. > > Zitat von Zakhar Kirpichenko : > > > I've disabled the progress module entirely and will see how it goes. > > Otherwise, mgr memory usage keeps increasing slowly, from past experience > > it will stabilize at around 1.5-1.6 GB. Other than this event warning, > it's > > unclear what could have caused random memory ballooning. > > > > /Z > > > > On Wed, 22 Nov 2023 at 13:07, Eugen Block wrote: > > > >> I see these progress messages all the time, I don't think they cause > >> it, but I might be wrong. You can disable it just to rule that out. > >> > >> Zitat von Zakhar Kirpichenko : > >> > >> > Unfortunately, I don't have a full stack trace because there's no > crash > >> > when the mgr gets oom-killed. There's just the mgr log, which looks > >> > completely normal until about 2-3 minutes before the oom-kill, when > >> > tmalloc warnings show up. 
> >> > > >> > I'm not sure that it's the same issue that is described in the > tracker. > >> We > >> > seem to have some stale "events" in the progress module though: > >> > > >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug > 2023-11-21T14:56:30.718+ > >> > 7f4bb19ef700 0 [progress WARNING root] complete: ev > >> > cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist > >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug > 2023-11-21T14:56:30.718+ > >> > 7f4bb19ef700 0 [progress WARNING root] complete: ev > >> > 44824331-3f6b-45c4-b925-423d098c3c76 does not exist > >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug > 2023-11-21T14:56:30.718+ > >> > 7f4bb19ef700 0 [progress WARNING root] complete: ev > >> > 0139bc54-ae42-4483-b278-851d77f23f9f does not exist > >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug > 2023-11-21T14:56:30.718+ > >> > 7f4bb19ef700 0 [progress WARNING root] complete: ev > >> > f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist > >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug > 2023-11-21T14:56:30.718+ > >> > 7f4bb19ef700 0 [progress WARNING root] complete: ev > >> > 1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist > >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug > 2023-11-21T14:56:30.718+ > >> > 7f4bb19ef700 0 [progress WARNING root] complete: ev > >> > 7f14d01c-498c-413f-b2ef-05521050190a does not exist > >> > Nov 21 14:57:35 ceph01 bash[3941523]: debug > 2023-11-21T14:57:35.950+ > >> > 7f4bb19ef700 0 [progress WARNING root] complete: ev > >> > 48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist > >> > > >> > I tried clearing them but they keep showing up. I am wondering if > these > >> > missing events can cause memory leaks over time. > >> > > >> > /Z > >> > > >> > On Wed, 22 Nov 2023 at 11:12, Eugen Block wrote: > >> > > >> >> Do you have the full stack trace? The pastebin only contains the > >> >> "tcmalloc: large alloc" messages (same as in the tracker issue). 
> Maybe > >> >> comment in the tracker issue directly since Radek asked for someone > >> >> with a similar problem in a newer release. > >> >> > >> >> Zitat von Zakhar Kirpichenko : > >> >> > >> >> > Thanks, Eugen. It is similar in the sense that the mgr is getting > >> >> > OOM-killed. > >> >> > > >> >> > It started happening in our cluster after the upgrade to 16.2.14. > We > >> >> > haven't had this issue with earlier Pacific releases. > >> >> > > >> >> > /Z > >> >> > > >> >> > On Tue, 21 Nov 2023, 21:53 Eugen Block, wrote: > >> >> > > >> >> >> Just checking it on the phone, but isn’t this quite similar? > >> >> >> > >> >> >> https://tracker.ceph.com/issues/45136 > >> >> >> > >> >> >> Zitat von Zakhar Kirpichenko : > >> >> >> > >> >> >> > Hi, > >> >> >> > > >> >> >> > I'm facing a rather new issue with our Ceph cluster: from time > to > >> time > >> >> >> > ceph-mgr on one of the two mgr nodes gets
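The per-OSD memory target of 16 GB mentioned above is normally set through the `osd_memory_target` option. A minimal sketch, as a config fragment only; the values and the OSD id `osd.12` are examples, not recommendations:

```shell
# Set a cluster-wide memory target for all OSDs (example value):
ceph config set osd osd_memory_target 16G

# Or override it for a single OSD (osd.12 is a hypothetical id):
ceph config set osd.12 osd_memory_target 8G

# Check what value an OSD actually resolves the option to:
ceph config get osd.12 osd_memory_target
```

Note that `osd_memory_target` is a soft target for the OSD's cache trimming, not a hard limit, so actual RSS can temporarily exceed it.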
[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed
What does your hardware look like memory-wise? Just for comparison, one customer cluster has 4.5 GB in use (middle-sized cluster for OpenStack, 280 OSDs): PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 6077 ceph 20 0 6357560 4,522g 22316 S 12,00 1,797 57022:54 ceph-mgr In our own cluster (smaller than that and not really heavily used) the mgr uses almost 2 GB. So those numbers you have seem relatively small. Zitat von Zakhar Kirpichenko : I've disabled the progress module entirely and will see how it goes. Otherwise, mgr memory usage keeps increasing slowly; from past experience it will stabilize at around 1.5-1.6 GB. Other than this event warning, it's unclear what could have caused random memory ballooning. /Z On Wed, 22 Nov 2023 at 13:07, Eugen Block wrote: I see these progress messages all the time, I don't think they cause it, but I might be wrong. You can disable it just to rule that out. Zitat von Zakhar Kirpichenko : > Unfortunately, I don't have a full stack trace because there's no crash > when the mgr gets oom-killed. There's just the mgr log, which looks > completely normal until about 2-3 minutes before the oom-kill, when > tcmalloc warnings show up. > > I'm not sure that it's the same issue that is described in the tracker. 
We > seem to have some stale "events" in the progress module though: > > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ > 7f4bb19ef700 0 [progress WARNING root] complete: ev > cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ > 7f4bb19ef700 0 [progress WARNING root] complete: ev > 44824331-3f6b-45c4-b925-423d098c3c76 does not exist > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ > 7f4bb19ef700 0 [progress WARNING root] complete: ev > 0139bc54-ae42-4483-b278-851d77f23f9f does not exist > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ > 7f4bb19ef700 0 [progress WARNING root] complete: ev > f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ > 7f4bb19ef700 0 [progress WARNING root] complete: ev > 1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ > 7f4bb19ef700 0 [progress WARNING root] complete: ev > 7f14d01c-498c-413f-b2ef-05521050190a does not exist > Nov 21 14:57:35 ceph01 bash[3941523]: debug 2023-11-21T14:57:35.950+ > 7f4bb19ef700 0 [progress WARNING root] complete: ev > 48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist > > I tried clearing them but they keep showing up. I am wondering if these > missing events can cause memory leaks over time. > > /Z > > On Wed, 22 Nov 2023 at 11:12, Eugen Block wrote: > >> Do you have the full stack trace? The pastebin only contains the >> "tcmalloc: large alloc" messages (same as in the tracker issue). Maybe >> comment in the tracker issue directly since Radek asked for someone >> with a similar problem in a newer release. >> >> Zitat von Zakhar Kirpichenko : >> >> > Thanks, Eugen. It is similar in the sense that the mgr is getting >> > OOM-killed. >> > >> > It started happening in our cluster after the upgrade to 16.2.14. 
We >> > haven't had this issue with earlier Pacific releases. >> > >> > /Z >> > >> > On Tue, 21 Nov 2023, 21:53 Eugen Block, wrote: >> > >> >> Just checking it on the phone, but isn’t this quite similar? >> >> >> >> https://tracker.ceph.com/issues/45136 >> >> >> >> Zitat von Zakhar Kirpichenko : >> >> >> >> > Hi, >> >> > >> >> > I'm facing a rather new issue with our Ceph cluster: from time to time >> >> > ceph-mgr on one of the two mgr nodes gets oom-killed after consuming >> over >> >> > 100 GB RAM: >> >> > >> >> > [Nov21 15:02] tp_osd_tp invoked oom-killer: >> >> > gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0 >> >> > [ +0.10] oom_kill_process.cold+0xb/0x10 >> >> > [ +0.02] [ pid ] uid tgid total_vm rss pgtables_bytes >> >> > swapents oom_score_adj name >> >> > [ +0.08] >> >> > >> >> >> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167 >> >> > [ +0.000697] Out of memory: Killed process 3941610 (ceph-mgr) >> >> > total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB, >> shmem-rss:0kB, >> >> > UID:167 pgtables:260356kB oom_score_adj:0 >> >> > [ +6.509769] oom_reaper: reaped process 3941610 (ceph-mgr), now >> >> > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB >> >> > >> >> > The cluster is stable and operating normally, there's nothing unusual >> >> going >> >> > on before, during or after the kill, thus it's unclear what causes the >> >> mgr >> >> > to balloon, use all
[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed
I've disabled the progress module entirely and will see how it goes. Otherwise, mgr memory usage keeps increasing slowly, from past experience it will stabilize at around 1.5-1.6 GB. Other than this event warning, it's unclear what could have caused random memory ballooning. /Z On Wed, 22 Nov 2023 at 13:07, Eugen Block wrote: > I see these progress messages all the time, I don't think they cause > it, but I might be wrong. You can disable it just to rule that out. > > Zitat von Zakhar Kirpichenko : > > > Unfortunately, I don't have a full stack trace because there's no crash > > when the mgr gets oom-killed. There's just the mgr log, which looks > > completely normal until about 2-3 minutes before the oom-kill, when > > tmalloc warnings show up. > > > > I'm not sure that it's the same issue that is described in the tracker. > We > > seem to have some stale "events" in the progress module though: > > > > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ > > 7f4bb19ef700 0 [progress WARNING root] complete: ev > > cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist > > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ > > 7f4bb19ef700 0 [progress WARNING root] complete: ev > > 44824331-3f6b-45c4-b925-423d098c3c76 does not exist > > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ > > 7f4bb19ef700 0 [progress WARNING root] complete: ev > > 0139bc54-ae42-4483-b278-851d77f23f9f does not exist > > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ > > 7f4bb19ef700 0 [progress WARNING root] complete: ev > > f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist > > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ > > 7f4bb19ef700 0 [progress WARNING root] complete: ev > > 1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist > > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ > > 7f4bb19ef700 0 [progress WARNING root] complete: ev > > 
7f14d01c-498c-413f-b2ef-05521050190a does not exist > > Nov 21 14:57:35 ceph01 bash[3941523]: debug 2023-11-21T14:57:35.950+ > > 7f4bb19ef700 0 [progress WARNING root] complete: ev > > 48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist > > > > I tried clearing them but they keep showing up. I am wondering if these > > missing events can cause memory leaks over time. > > > > /Z > > > > On Wed, 22 Nov 2023 at 11:12, Eugen Block wrote: > > > >> Do you have the full stack trace? The pastebin only contains the > >> "tcmalloc: large alloc" messages (same as in the tracker issue). Maybe > >> comment in the tracker issue directly since Radek asked for someone > >> with a similar problem in a newer release. > >> > >> Zitat von Zakhar Kirpichenko : > >> > >> > Thanks, Eugen. It is similar in the sense that the mgr is getting > >> > OOM-killed. > >> > > >> > It started happening in our cluster after the upgrade to 16.2.14. We > >> > haven't had this issue with earlier Pacific releases. > >> > > >> > /Z > >> > > >> > On Tue, 21 Nov 2023, 21:53 Eugen Block, wrote: > >> > > >> >> Just checking it on the phone, but isn’t this quite similar? 
> >> >> > >> >> https://tracker.ceph.com/issues/45136 > >> >> > >> >> Zitat von Zakhar Kirpichenko : > >> >> > >> >> > Hi, > >> >> > > >> >> > I'm facing a rather new issue with our Ceph cluster: from time to > time > >> >> > ceph-mgr on one of the two mgr nodes gets oom-killed after > consuming > >> over > >> >> > 100 GB RAM: > >> >> > > >> >> > [Nov21 15:02] tp_osd_tp invoked oom-killer: > >> >> > gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0 > >> >> > [ +0.10] oom_kill_process.cold+0xb/0x10 > >> >> > [ +0.02] [ pid ] uid tgid total_vm rss > pgtables_bytes > >> >> > swapents oom_score_adj name > >> >> > [ +0.08] > >> >> > > >> >> > >> > oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167 > >> >> > [ +0.000697] Out of memory: Killed process 3941610 (ceph-mgr) > >> >> > total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB, > >> shmem-rss:0kB, > >> >> > UID:167 pgtables:260356kB oom_score_adj:0 > >> >> > [ +6.509769] oom_reaper: reaped process 3941610 (ceph-mgr), now > >> >> > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB > >> >> > > >> >> > The cluster is stable and operating normally, there's nothing > unusual > >> >> going > >> >> > on before, during or after the kill, thus it's unclear what causes > the > >> >> mgr > >> >> > to balloon, use all RAM and get killed. Systemd logs aren't very > >> helpful: > >> >> > they just show normal mgr operations until it fails to allocate > memory > >> >> and > >> >> > gets killed: https://pastebin.com/MLyw9iVi > >> >> > > >> >> > The mgr experienced this issue several times in the last 2 months, > and > >> >>
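Disabling the progress module, as discussed in this thread, is done through the module's own commands, since on Pacific it is in the `always_on_modules` list and cannot be disabled like an ordinary mgr module. A hedged sketch:

```shell
# The progress module is always-on, so this is refused on Pacific:
#   ceph mgr module disable progress
# Instead, turn off progress event tracking:
ceph progress off

# Clear accumulated (possibly stale) progress events:
ceph progress clear

# Re-enable event tracking later if desired:
ceph progress on
```

If stale "complete: ev ... does not exist" warnings keep reappearing after `ceph progress clear`, that is worth attaching to the tracker issue mentioned above.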
[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"
There are some unhandled race conditions in the MDS cluster in rare circumstances. We had this issue with mimic and octopus, and it went away after manually pinning sub-dirs to MDS ranks; see https://docs.ceph.com/en/nautilus/cephfs/multimds/?highlight=dir%20pin#manually-pinning-directory-trees-to-a-particular-rank. This has the added advantage that one can bypass the internal load-balancer, which was horrible for our workloads. I have a related post about ephemeral pinning on this list from one or two years ago. You should be able to find it. Short story: after manually pinning all user directories to ranks, all our problems disappeared and performance improved a lot. MDS load dropped from 130% average to 10-20%. So did memory consumption and cache recycling. Best regards, Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: Wednesday, November 22, 2023 12:30 PM To: ceph-users@ceph.io Subject: [ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported" Hi, we saw this a year ago in a Nautilus cluster with multi-active MDS as well. It turned up only once within several years, and we decided not to look too closely at the time. How often do you see it? Is it reproducible? In that case I'd recommend creating a tracker issue. Regards, Eugen Zitat von zxcs : > Hi, Experts, > > we are using CephFS 16.2.* with multiple active MDS daemons, and > recently we mounted two nodes with ceph-fuse due to their old OS. > > One node runs a Python script calling `glob.glob(path)`, and > another client performs a `cp` operation on the same path. > > Then we see logs about `mds slow request`, complaining > "failed to authpin, subtree is being exported", > > and we need to restart the MDS. > > Our question is: is there a deadlock? How can we avoid this, > and how can we fix it without restarting the MDS (a restart will influence other users)? > > > Thanks a ton! 
> > > xz > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io
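Frank's manual-pinning suggestion from the linked docs comes down to setting an extended attribute on a directory from a client mount. A minimal sketch; the mount path and rank values are examples:

```shell
# Pin a directory (and everything below it) to MDS rank 1
# (example path and rank):
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/users/alice

# Remove the pin again (-1 means "inherit from parent"):
setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/users/alice

# Ephemeral distributed pinning: spread the immediate children of a
# directory across ranks (available in recent releases):
setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/users
```

Pinning all top-level user directories this way effectively sidesteps the internal balancer, which is what made the "subtree is being exported" stalls disappear in Frank's case.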
[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"
On 11/22/23 16:02, zxcs wrote: Hi, Experts, we are using CephFS 16.2.* with multiple active MDS daemons, and recently we mounted two nodes with ceph-fuse due to their old OS. One node runs a Python script calling `glob.glob(path)`, and another client performs a `cp` operation on the same path. Then we see logs about `mds slow request`, complaining "failed to authpin, subtree is being exported", and we need to restart the MDS. Our question is: is there a deadlock? How can we avoid this, and how can we fix it without restarting the MDS (a restart will influence other users)? BTW, won't the slow requests disappear by themselves later? It looks like the exporting is slow, or there are too many exports going on. Thanks - Xiubo Thanks a ton! xz ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"
Hi, we saw this a year ago in a Nautilus cluster with multi-active MDS as well. It turned up only once within several years, and we decided not to look too closely at the time. How often do you see it? Is it reproducible? In that case I'd recommend creating a tracker issue. Regards, Eugen Zitat von zxcs : Hi, Experts, we are using CephFS 16.2.* with multiple active MDS daemons, and recently we mounted two nodes with ceph-fuse due to their old OS. One node runs a Python script calling `glob.glob(path)`, and another client performs a `cp` operation on the same path. Then we see logs about `mds slow request`, complaining "failed to authpin, subtree is being exported", and we need to restart the MDS. Our question is: is there a deadlock? How can we avoid this, and how can we fix it without restarting the MDS (a restart will influence other users)? Thanks a ton! xz ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
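To see which requests are actually stuck (and confirm the "failed to authpin" flag) before deciding on a restart, the MDS ops tracker can be queried via the admin socket. A sketch, assuming a daemon id of `mds.a` on the local host:

```shell
# On the host running the MDS, list in-flight operations:
ceph daemon mds.a dump_ops_in_flight

# Show only operations blocked for a long time:
ceph daemon mds.a dump_blocked_ops

# Show which subtrees this MDS holds (useful when an export hangs):
ceph daemon mds.a get subtrees
```

Attaching this output (plus MDS debug logs) to a tracker issue is usually what upstream asks for with these balancer/export stalls.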
[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed
I see these progress messages all the time, I don't think they cause it, but I might be wrong. You can disable it just to rule that out. Zitat von Zakhar Kirpichenko : Unfortunately, I don't have a full stack trace because there's no crash when the mgr gets oom-killed. There's just the mgr log, which looks completely normal until about 2-3 minutes before the oom-kill, when tmalloc warnings show up. I'm not sure that it's the same issue that is described in the tracker. We seem to have some stale "events" in the progress module though: Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ 7f4bb19ef700 0 [progress WARNING root] complete: ev cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ 7f4bb19ef700 0 [progress WARNING root] complete: ev 44824331-3f6b-45c4-b925-423d098c3c76 does not exist Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ 7f4bb19ef700 0 [progress WARNING root] complete: ev 0139bc54-ae42-4483-b278-851d77f23f9f does not exist Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ 7f4bb19ef700 0 [progress WARNING root] complete: ev f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ 7f4bb19ef700 0 [progress WARNING root] complete: ev 1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ 7f4bb19ef700 0 [progress WARNING root] complete: ev 7f14d01c-498c-413f-b2ef-05521050190a does not exist Nov 21 14:57:35 ceph01 bash[3941523]: debug 2023-11-21T14:57:35.950+ 7f4bb19ef700 0 [progress WARNING root] complete: ev 48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist I tried clearing them but they keep showing up. I am wondering if these missing events can cause memory leaks over time. /Z On Wed, 22 Nov 2023 at 11:12, Eugen Block wrote: Do you have the full stack trace? 
The pastebin only contains the "tcmalloc: large alloc" messages (same as in the tracker issue). Maybe comment in the tracker issue directly since Radek asked for someone with a similar problem in a newer release. Zitat von Zakhar Kirpichenko : > Thanks, Eugen. It is similar in the sense that the mgr is getting > OOM-killed. > > It started happening in our cluster after the upgrade to 16.2.14. We > haven't had this issue with earlier Pacific releases. > > /Z > > On Tue, 21 Nov 2023, 21:53 Eugen Block, wrote: > >> Just checking it on the phone, but isn’t this quite similar? >> >> https://tracker.ceph.com/issues/45136 >> >> Zitat von Zakhar Kirpichenko : >> >> > Hi, >> > >> > I'm facing a rather new issue with our Ceph cluster: from time to time >> > ceph-mgr on one of the two mgr nodes gets oom-killed after consuming over >> > 100 GB RAM: >> > >> > [Nov21 15:02] tp_osd_tp invoked oom-killer: >> > gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0 >> > [ +0.10] oom_kill_process.cold+0xb/0x10 >> > [ +0.02] [ pid ] uid tgid total_vm rss pgtables_bytes >> > swapents oom_score_adj name >> > [ +0.08] >> > >> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167 >> > [ +0.000697] Out of memory: Killed process 3941610 (ceph-mgr) >> > total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB, shmem-rss:0kB, >> > UID:167 pgtables:260356kB oom_score_adj:0 >> > [ +6.509769] oom_reaper: reaped process 3941610 (ceph-mgr), now >> > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB >> > >> > The cluster is stable and operating normally, there's nothing unusual >> going >> > on before, during or after the kill, thus it's unclear what causes the >> mgr >> > to balloon, use all RAM and get killed. 
Systemd logs aren't very helpful: >> > they just show normal mgr operations until it fails to allocate memory >> and >> > gets killed: https://pastebin.com/MLyw9iVi >> > >> > The mgr experienced this issue several times in the last 2 months, and >> the >> > events don't appear to correlate with any other events in the cluster >> > because basically nothing else happened at around those times. How can I >> > investigate this and figure out what's causing the mgr to consume all >> > memory and get killed? >> > >> > I would very much appreciate any advice! >> > >> > Best regards, >> > Zakhar >> > ___ >> > ceph-users mailing list -- ceph-users@ceph.io >> > To unsubscribe send an email to ceph-users-le...@ceph.io >> >> >> ___ >> ceph-users mailing list -- ceph-users@ceph.io >> To unsubscribe send an email to ceph-users-le...@ceph.io >> ___ ceph-users mailing list -- ceph-users@ceph.io
[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed
Unfortunately, I don't have a full stack trace because there's no crash when the mgr gets oom-killed. There's just the mgr log, which looks completely normal until about 2-3 minutes before the oom-kill, when tmalloc warnings show up. I'm not sure that it's the same issue that is described in the tracker. We seem to have some stale "events" in the progress module though: Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ 7f4bb19ef700 0 [progress WARNING root] complete: ev cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ 7f4bb19ef700 0 [progress WARNING root] complete: ev 44824331-3f6b-45c4-b925-423d098c3c76 does not exist Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ 7f4bb19ef700 0 [progress WARNING root] complete: ev 0139bc54-ae42-4483-b278-851d77f23f9f does not exist Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ 7f4bb19ef700 0 [progress WARNING root] complete: ev f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ 7f4bb19ef700 0 [progress WARNING root] complete: ev 1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+ 7f4bb19ef700 0 [progress WARNING root] complete: ev 7f14d01c-498c-413f-b2ef-05521050190a does not exist Nov 21 14:57:35 ceph01 bash[3941523]: debug 2023-11-21T14:57:35.950+ 7f4bb19ef700 0 [progress WARNING root] complete: ev 48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist I tried clearing them but they keep showing up. I am wondering if these missing events can cause memory leaks over time. /Z On Wed, 22 Nov 2023 at 11:12, Eugen Block wrote: > Do you have the full stack trace? The pastebin only contains the > "tcmalloc: large alloc" messages (same as in the tracker issue). 
> Maybe comment in the tracker issue directly since Radek asked for someone
> with a similar problem in a newer release.
>
> Quoting Zakhar Kirpichenko:
>
> > Thanks, Eugen. It is similar in the sense that the mgr is getting
> > OOM-killed.
> >
> > It started happening in our cluster after the upgrade to 16.2.14. We
> > haven't had this issue with earlier Pacific releases.
> >
> > /Z
> >
> > On Tue, 21 Nov 2023, 21:53 Eugen Block, wrote:
> >
> >> Just checking it on the phone, but isn’t this quite similar?
> >>
> >> https://tracker.ceph.com/issues/45136
> >>
> >> Quoting Zakhar Kirpichenko:
> >>
> >> > Hi,
> >> >
> >> > I'm facing a rather new issue with our Ceph cluster: from time to time
> >> > ceph-mgr on one of the two mgr nodes gets oom-killed after consuming over
> >> > 100 GB RAM:
> >> >
> >> > [Nov21 15:02] tp_osd_tp invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
> >> > [ +0.10] oom_kill_process.cold+0xb/0x10
> >> > [ +0.02] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
> >> > [ +0.08] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167
> >> > [ +0.000697] Out of memory: Killed process 3941610 (ceph-mgr) total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB, shmem-rss:0kB, UID:167 pgtables:260356kB oom_score_adj:0
> >> > [ +6.509769] oom_reaper: reaped process 3941610 (ceph-mgr), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> >> >
> >> > The cluster is stable and operating normally, there's nothing unusual going
> >> > on before, during or after the kill, thus it's unclear what causes the mgr
> >> > to balloon, use all RAM and get killed. Systemd logs aren't very helpful:
> >> > they just show normal mgr operations until it fails to allocate memory and
> >> > gets killed: https://pastebin.com/MLyw9iVi
> >> >
> >> > The mgr experienced this issue several times in the last 2 months, and the
> >> > events don't appear to correlate with any other events in the cluster
> >> > because basically nothing else happened at around those times. How can I
> >> > investigate this and figure out what's causing the mgr to consume all
> >> > memory and get killed?
> >> >
> >> > I would very much appreciate any advice!
> >> >
> >> > Best regards,
> >> > Zakhar

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed
Do you have the full stack trace? The pastebin only contains the
"tcmalloc: large alloc" messages (same as in the tracker issue). Maybe
comment in the tracker issue directly since Radek asked for someone
with a similar problem in a newer release.

Quoting Zakhar Kirpichenko:

Thanks, Eugen. It is similar in the sense that the mgr is getting
OOM-killed.

It started happening in our cluster after the upgrade to 16.2.14. We
haven't had this issue with earlier Pacific releases.

/Z

On Tue, 21 Nov 2023, 21:53 Eugen Block, wrote:

Just checking it on the phone, but isn’t this quite similar?

https://tracker.ceph.com/issues/45136

Quoting Zakhar Kirpichenko:

> Hi,
>
> I'm facing a rather new issue with our Ceph cluster: from time to time
> ceph-mgr on one of the two mgr nodes gets oom-killed after consuming over
> 100 GB RAM:
>
> [Nov21 15:02] tp_osd_tp invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
> [ +0.10] oom_kill_process.cold+0xb/0x10
> [ +0.02] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
> [ +0.08] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167
> [ +0.000697] Out of memory: Killed process 3941610 (ceph-mgr) total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB, shmem-rss:0kB, UID:167 pgtables:260356kB oom_score_adj:0
> [ +6.509769] oom_reaper: reaped process 3941610 (ceph-mgr), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>
> The cluster is stable and operating normally, there's nothing unusual going
> on before, during or after the kill, thus it's unclear what causes the mgr
> to balloon, use all RAM and get killed. Systemd logs aren't very helpful:
> they just show normal mgr operations until it fails to allocate memory and
> gets killed: https://pastebin.com/MLyw9iVi
>
> The mgr experienced this issue several times in the last 2 months, and the
> events don't appear to correlate with any other events in the cluster
> because basically nothing else happened at around those times. How can I
> investigate this and figure out what's causing the mgr to consume all
> memory and get killed?
>
> I would very much appreciate any advice!
>
> Best regards,
> Zakhar

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
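The kernel's OOM-kill summary quoted above already carries the numbers needed to see how big the mgr got before it was killed. A small sketch that pulls the process name and anonymous RSS out of such a line (the line format is copied from the log above; this is an illustration, not a Ceph tool):

```python
import re

# Parses kernel OOM-kill summary lines of the shape shown above, e.g.
# "Out of memory: Killed process 3941610 (ceph-mgr) total-vm:146986656kB, anon-rss:125340436kB, ..."
KILL_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<vm>\d+)kB, anon-rss:(?P<rss>\d+)kB"
)

def parse_oom_kill(line):
    """Return pid, process name, and memory sizes in GiB, or None if no match."""
    m = KILL_RE.search(line)
    if not m:
        return None
    d = m.groupdict()
    return {
        "pid": int(d["pid"]),
        "name": d["name"],
        "total_vm_gib": int(d["vm"]) / (1024 * 1024),
        "anon_rss_gib": int(d["rss"]) / (1024 * 1024),
    }

if __name__ == "__main__":
    line = ("Out of memory: Killed process 3941610 (ceph-mgr) "
            "total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB, shmem-rss:0kB")
    info = parse_oom_kill(line)
    print(info["name"], round(info["anon_rss_gib"], 1))
```

Run over `dmesg` or journal output, this makes it easy to chart how large the mgr grew at each kill; the line above works out to roughly 119.5 GiB of anonymous RSS.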
[ceph-users] Re: No SSL Dashboard working after installing mgr crt|key with RSA/4096 secp384r1
Hello Eugen,

thanks for the validation. Actually I use plain http for now because I do not have much time to look for a solution. But I will check a new cert ASAP.

Christoph

On Fri, 17 Nov 2023 at 12:57, Eugen Block wrote:

> I was able to reproduce the error with a self-signed elliptic-curve-based
> certificate. But I also got out of it by removing cert and key:
>
> quincy-1:~ # ceph config-key rm mgr/dashboard/key
> key deleted
> quincy-1:~ # ceph config-key rm mgr/dashboard/crt
> key deleted
>
> Then I failed the mgr just to be sure:
>
> quincy-1:~ # ceph mgr fail
> quincy-1:~ # ceph config-key get mgr/dashboard/crt
> Error ENOENT:
>
> And then I configured the previous key, did a mgr fail, and now the
> dashboard is working again.
>
> Quoting Eugen Block:
>
> > Hi,
> >
> > did you get your dashboard back in the meantime? I don't have an
> > answer regarding the certificate based on elliptic curves, but since
> > you wrote:
> >
> >> So we tried to go back to the original state by removing CRT and KEY but
> >> without success. The new key seems to be stuck in the config
> >
> > how did you try to remove it? I would just assume that this should work:
> >
> > $ ceph config-key rm mgr/dashboard/cert
> >
> > Do you get an error message when removing it, or does the mgr log
> > anything when you try to remove it which fails?
> > Also, which ceph version is this?
> >
> > Thanks,
> > Eugen
> >
> > Quoting "Ackermann, Christoph":
> >
> >> Hello all,
> >>
> >> today I got a new certificate for our internal domain based on RSA/4096
> >> secp384r1. After inserting CRT and key I got both "...updated" messages.
> >> After checking the dashboard I got an empty page and this error:
> >>
> >> health: HEALTH_ERR
> >> Module 'dashboard' has failed: key type unsupported
> >>
> >> So we tried to go back to the original state by removing CRT and KEY but
> >> without success. The new key seems to be stuck in the config:
> >>
> >> [root@ceph ~]# ceph config-key get mgr/dashboard/crt
> >> -----BEGIN CERTIFICATE-----
> >> MIIFqTCCBJGgAwIBAgIMB5tjLSz264Ic8zeHMA0GCSqGSIb3DQEBCwUAMEwxCzAJ
> >> [...]
> >> ItzkEzq4SZ6V1Jhuf4bFlOMBVAKgAwZ90gXlguoiFFQu5+NIqNljZ8Jz7d0jhH43
> >> e3zhm5sn21+eIqRbiQ==
> >> -----END CERTIFICATE-----
> >>
> >> [root@ceph ~]# ceph config-key get mgr/dashboard/key
> >> Error ENOENT:
> >>
> >> We tried to generate a self-signed cert but no luck. It looks like the
> >> manager stays in an intermediate state. The only way to get back the
> >> dashboard is to disable SSL and use plain http.
> >>
> >> Can somebody explain this behaviour? Maybe secp384r1 elliptic curves
> >> aren't supported? How can we clean up the SSL configuration?
> >>
> >> Thanks,
> >> Christoph Ackermann
> >>
> >> PS: we checked some information like
> >> https://tracker.ceph.com/issues/57924#change-227744 and others but no
> >> luck...

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
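Before feeding a certificate/key pair to the dashboard, it can save a round of debugging to check which key algorithm it actually uses. A quick sketch with plain openssl (the `demo.crt`/`demo.key` names are placeholders; the generated RSA pair just stands in for a real certificate):

```shell
# Generate a throwaway RSA cert/key as a stand-in for the real dashboard pair,
# then inspect the key algorithm the way you would for the real files.
openssl req -x509 -newkey rsa:2048 -keyout demo.key -out demo.crt \
    -days 1 -nodes -subj "/CN=demo" 2>/dev/null

# "Public Key Algorithm: rsaEncryption" means RSA; an EC (e.g. secp384r1) key
# would show "id-ecPublicKey" instead, which matches the "key type unsupported"
# failure described in this thread.
openssl x509 -in demo.crt -noout -text | grep 'Public Key Algorithm'
```

On the real files, substitute the paths of the cert and key you are about to load with `ceph dashboard set-ssl-certificate`/`set-ssl-certificate-key`.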
[ceph-users] mds slow request with “failed to authpin, subtree is being exported"
Hi, experts,

we are using cephfs 16.2.* with multiple active MDS daemons. Recently we mounted two nodes with ceph-fuse due to their old OS. One node runs a python script with `glob.glob(path)`, while another client performs a `cp` operation on the same path. Then we see some logs about `mds slow request`, and the logs complain "failed to authpin, subtree is being exported", after which we need to restart the mds.

Our question is: is there a deadlock somewhere? How can we avoid this, and how can we fix it without restarting the mds (a restart affects other users)?

Thanks a ton!

xz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
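As the replies in this thread suggest, one way to keep the MDS balancer from exporting a subtree mid-request is to pin the contended directories to a single MDS rank via the `ceph.dir.pin` extended attribute. A sketch of the operational commands (the mount point `/mnt/cephfs/shared/data` and rank 0 are placeholders for your environment; this is a config change against a live CephFS mount, so adapt before running):

```shell
# Pin a directory tree to MDS rank 0 so the balancer stops migrating it.
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/shared/data

# Verify the pin took effect.
getfattr -n ceph.dir.pin /mnt/cephfs/shared/data

# A value of -1 removes the pin and hands the subtree back to the balancer:
# setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/shared/data
```

With the glob and cp workloads both pinned to the same rank, the "subtree is being exported" authpin stalls should no longer be triggered by balancer migrations.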