[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-11-22 Thread Xiubo Li


On 11/23/23 11:25, zxcs wrote:

Thanks a ton, Xiubo!

It does not disappear,

even after we umount the ceph directory on these two old OS nodes.

After dumping ops in flight, we can see some requests, and the earliest complains “failed 
to authpin, subtree is being exported".

How can we avoid this? Would you please shed some light here?


Okay, as Frank mentioned, you can try to disable the balancer by pinning 
the directories. As I recall, the balancer is buggy.
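
For reference, a minimal sketch of manual pinning (mount point, directory 
names and ranks are placeholders; see the multi-MDS docs for details):

# setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/projects/teamA
# setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projects/teamB
# setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/projects/teamA   (removes the pin again)

Once the top-level directories are pinned to ranks, the balancer has nothing 
left to migrate under them.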


You can also raise a Ceph tracker issue and provide the debug logs if 
you have them.
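
If debug logs are needed, capturing them usually goes along these lines (an 
assumption about what would be asked for, not quoted from this thread; revert 
afterwards as these levels are very verbose):

# ceph config set mds debug_mds 20
# ceph config set mds debug_ms 1

Then reproduce the slow requests and attach the log of the affected MDS rank 
to the tracker issue.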


Thanks

- Xiubo



Thanks,
xz



On 22 Nov 2023, at 19:44, Xiubo Li wrote:


On 11/22/23 16:02, zxcs wrote:

Hi, Experts,

We are using CephFS 16.2.* with multiple active MDS daemons, and recently we have 
two nodes mounted with ceph-fuse due to their old OS.

One node runs a Python script with `glob.glob(path)`, and another client 
does a `cp` operation on the same path.

Then we see some logs about `mds slow request`, and the logs complain “failed to authpin, 
subtree is being exported".

Then we need to restart the MDS.


Our question is: is there a deadlock? How can we avoid this, and how can we 
fix it without restarting the MDS (it will affect other users)?

BTW, won't the slow requests disappear by themselves later?

It looks like the exporting is slow or there are too many exports going on.

Thanks

- Xiubo


Thanks a ton!


xz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to use hardware

2023-11-22 Thread Albert Shih
On 20/11/2023 at 09:24:41+, Frank Schilder wrote:
Hi, 

Thanks everyone for your answers. 

> 
> we are using something similar for ceph-fs. For a backup system your setup 
> can work, depending on how you back up. While HDD pools have poor IOP/s 
> performance, they are very good for streaming workloads. If you are using 
> something like Borg backup that writes huge files sequentially, a HDD 
> back-end should be OK.
> 

Ok. Good to know

> Here some things to consider and try out:
> 
> 1. You really need to get a bunch of enterprise SSDs with power loss 
> protection for the FS meta data pool (disable write cache if enabled, this 
> will disable volatile write cache and switch to protected caching). We are 
> using (formerly Intel) 1.8T SATA drives that we subdivide into 4 OSDs each to 
> raise performance. Place the meta-data pool and the primary data pool on 
> these disks. Create a secondary data pool on the HDDs and assign it to the 
> root *before* creating anything on the FS (see the recommended 3-pool layout 
> for ceph file systems in the docs). I would not even consider running this 
> without SSDs. 1 such SSD per host is the minimum, 2 is better. If Borg or 
> whatever can make use of a small fast storage directory, assign a sub-dir of 
> the root to the primary data pool.

OK. I will see what I can do. 
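
For reference, a rough sketch of the three-pool layout described above (pool 
names, crush rule and mount path are placeholders, not commands taken from this 
thread):

# ceph osd pool create cephfs_metadata
# ceph osd pool create cephfs_data_ssd
# ceph osd pool create cephfs_data_hdd
# ceph osd pool set cephfs_metadata crush_rule replicated-ssd
# ceph osd pool set cephfs_data_ssd crush_rule replicated-ssd
# ceph fs new backupfs cephfs_metadata cephfs_data_ssd
# ceph fs add_data_pool backupfs cephfs_data_hdd
# setfattr -n ceph.dir.layout.pool -v cephfs_data_hdd /mnt/backupfs

The last command assigns the HDD pool to the FS root before anything is written, 
so the SSD primary data pool mostly holds backtrace objects while the bulk data 
goes to the HDD pool.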

> 
> 2. Calculate with sufficient extra disk space. As long as utilization stays 
> below 60-70% bluestore will try to make large object writes sequential, which 
> is really important for HDDs. On our cluster we currently have 40% 
> utilization and I get full HDD bandwidth out for large sequential 
> reads/writes. Make sure your backup application makes large sequential IO 
> requests.
> 
> 3. As Anthony said, add RAM. You should go for 512G on 50 HDD-nodes. You can 
> run the MDS daemons on the OSD nodes. Set a reasonable cache limit and use 
> ephemeral pinning. Depending on the CPUs you are using, 48 cores can be 
> plenty. The latest generation Intel Xeon Scalable Processors is so efficient 
> with ceph that 1HT per HDD is more than enough.

Yes, I have 512G on each node and 64 cores on each server.

> 
> 4. 3 MON+MGR nodes are sufficient. You can do something else with the 
> remaining 2 nodes. Of course, you can use them as additional MON+MGR nodes. 
> We also use 5 and it improves maintainability a lot.
> 

Ok thanks. 

> Something more exotic if you have time:
> 
> 5. To improve sequential performance further, you can experiment with larger 
> min_alloc_sizes for OSDs (on creation time, you will need to scrap and 
> re-deploy the cluster to test different values). Every HDD has a preferred 
> IO-size for which random IO achieves nearly the same band-with as sequential 
> writes. (But see 7.)
> 
> 6. On your set-up you will probably go for a 4+2 EC data pool on HDD. With 
> object size 4M the max. chunk size per OSD will be 1M. For many HDDs this is 
> the preferred IO size (usually between 256K-1M). (But see 7.)
> 
> 7. Important: large min_alloc_sizes are only good if your workload *never* 
> modifies files, but only replaces them. A bit like a pool without EC 
> overwrite enabled. The implementation of EC overwrites has a "feature" that 
> can lead to massive allocation amplification. If your backup workload does 
> modifications to files instead of adding new+deleting old, do *not* 
> experiment with options 5.-7. Instead, use the default and make sure you have 
> sufficient unused capacity to increase the chances for large bluestore writes 
> (keep utilization below 60-70% and just buy extra disks). A workload with 
> large min_alloc_sizes has to be S3-like, only upload, download and delete are 
> allowed.
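
As a concrete illustration of point 5 above (a sketch only; the value must be 
chosen per drive and EC profile, and set before the OSDs are created):

# ceph config set osd bluestore_min_alloc_size_hdd 1048576

bluestore_min_alloc_size_hdd is only read at OSD creation time, so testing a 
different value means redeploying the OSDs, as Frank notes.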

Thanks a lot for those tips. 

I'm a newbie with Ceph, so it's going to take some time before I understand 
everything you say. 


Best regards

-- 
Albert SHIH 嶺 
France
Heure locale/Local time:
jeu. 23 nov. 2023 08:32:20 CET
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-exporter binds to IPv4 only

2023-11-22 Thread Stefan Kooman

On 22-11-2023 15:54, Stefan Kooman wrote:

Hi,

In an IPv6-only deployment the ceph-exporter daemons are not listening on 
IPv6 address(es). This can be fixed by editing the unit.run file of the 
ceph-exporter and changing "--addrs=0.0.0.0" to "--addrs=::".
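
For anyone needing the same workaround, a sketch of the manual edit (the path 
and the unit name follow the usual cephadm layout and are assumptions here):

# sed -i 's/--addrs=0.0.0.0/--addrs=::/' /var/lib/ceph/<fsid>/ceph-exporter.<host>/unit.run
# systemctl restart ceph-<fsid>@ceph-exporter.<host>.service

Note that cephadm may rewrite unit.run on redeploy, so this is only a stopgap 
until the daemon handles IPv6 properly.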


Is this configurable? So that cephadm deploys ceph-exporter with proper 
unit.run arguments?


Related issue: https://tracker.ceph.com/issues/62220

A different fix was chosen as opposed to 
https://github.com/ceph/ceph/pull/54285/. Maybe it would be better to remove the 
IPv4/IPv6 distinction and make the code IP-family agnostic (i.e. go for 
the fix in 54285)?


Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-22 Thread Xiubo Li

Hi Frank,

Locally I ran some tests using copy2 and copy, but they both worked 
well for me.


Could you write a reproducing script?

Thanks

- Xiubo

On 11/10/23 22:53, Frank Schilder wrote:

It looks like the cap update request was dropped to the ground in MDS.
[...]
If you can reproduce it, then please provide the mds logs by setting:
[...]

I can do a test with MDS logs on high level. Before I do that, looking at the 
python
findings above, is this something that should work on ceph or is it a python 
issue?

Not sure yet. I need to understand what exactly shutil.copy does in kclient.

Thanks! Will wait for further instructions.
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Xiubo Li 
Sent: Friday, November 10, 2023 3:14 AM
To: Frank Schilder; Gregory Farnum
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ceph fs (meta) data inconsistent


On 11/10/23 00:18, Frank Schilder wrote:

Hi Xiubo,

I will try to answer questions from all your 3 e-mails here together with some 
new information we have.

New: The problem occurs in newer python versions when using the shutil.copy function. There is also 
a function shutil.copy2 for which the problem does not show up. Copy2 behaves a bit like "cp 
-p" while copy is like "cp". The only code difference (linux) between these 2 
functions is that copy calls copyfile+copymode while copy2 calls copyfile+copystat. For now we 
asked our users to use copy2 to avoid the issue.

The copyfile function calls _fastcopy_sendfile on linux, which in turn calls 
os.sendfile, which seems to be part of libc:

#include <sys/sendfile.h>
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

I'm wondering if using this function requires explicit meta-data updates or 
should be safe on ceph-fs. I'm also not sure if a user-space client even 
supports this function (seems to be meaningless). Should this function be safe 
to use on ceph kclient?

I didn't foresee any limit for this in kclient.

The shutil.copy will only copy the contents of the file, while the
shutil.copy2 will also copy the metadata. I need to know what exactly
they do in kclient for shutil.copy and shutil.copy2.
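
A small sketch for reproducing and comparing the two variants on a kernel-client 
mount (paths are placeholders; strace just shows which syscalls each copy 
actually issues):

# cd /mnt/cephfs/testdir && dd if=/dev/urandom of=src bs=1M count=4
# strace -f -e trace=openat,read,write,sendfile,copy_file_range python3 -c "import shutil; shutil.copy('src', 'dst1')"
# strace -f -e trace=openat,read,write,sendfile,copy_file_range python3 -c "import shutil; shutil.copy2('src', 'dst2')"
# stat src dst1 dst2

Running the stat from a second client as well should show whether the reported 
size of the copy lags behind.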


Answers to questions:


BTW, have you tried ceph-fuse with the same test? Is it also the same?

I don't have fuse clients available, so can't test right now.


Have you tried other Ceph versions?

We are in the process of deploying a new test cluster, the old one is scrapped 
already. I can't test this at the moment.


It looks like the cap update request was dropped to the ground in MDS.
[...]
If you can reproduce it, then please provide the mds logs by setting:
[...]

I can do a test with MDS logs on high level. Before I do that, looking at the 
python findings above, is this something that should work on ceph or is it a 
python issue?

Not sure yet. I need to understand what exactly shutil.copy does in kclient.

Thanks

- Xiubo




Thanks for your help!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14






___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-22 Thread Xiubo Li
I have just raised a tracker issue to follow this: 
https://tracker.ceph.com/issues/63510


Thanks

- Xiubo


On 11/10/23 22:53, Frank Schilder wrote:

It looks like the cap update request was dropped to the ground in MDS.
[...]
If you can reproduce it, then please provide the mds logs by setting:
[...]

I can do a test with MDS logs on high level. Before I do that, looking at the 
python
findings above, is this something that should work on ceph or is it a python 
issue?

Not sure yet. I need to understand what exactly shutil.copy does in kclient.

Thanks! Will wait for further instructions.
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Xiubo Li 
Sent: Friday, November 10, 2023 3:14 AM
To: Frank Schilder; Gregory Farnum
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ceph fs (meta) data inconsistent


On 11/10/23 00:18, Frank Schilder wrote:

Hi Xiubo,

I will try to answer questions from all your 3 e-mails here together with some 
new information we have.

New: The problem occurs in newer python versions when using the shutil.copy function. There is also 
a function shutil.copy2 for which the problem does not show up. Copy2 behaves a bit like "cp 
-p" while copy is like "cp". The only code difference (linux) between these 2 
functions is that copy calls copyfile+copymode while copy2 calls copyfile+copystat. For now we 
asked our users to use copy2 to avoid the issue.

The copyfile function calls _fastcopy_sendfile on linux, which in turn calls 
os.sendfile, which seems to be part of libc:

#include <sys/sendfile.h>
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

I'm wondering if using this function requires explicit meta-data updates or 
should be safe on ceph-fs. I'm also not sure if a user-space client even 
supports this function (seems to be meaningless). Should this function be safe 
to use on ceph kclient?

I didn't foresee any limit for this in kclient.

The shutil.copy will only copy the contents of the file, while the
shutil.copy2 will also copy the metadata. I need to know what exactly
they do in kclient for shutil.copy and shutil.copy2.


Answers to questions:


BTW, have you tried ceph-fuse with the same test? Is it also the same?

I don't have fuse clients available, so can't test right now.


Have you tried other Ceph versions?

We are in the process of deploying a new test cluster, the old one is scrapped 
already. I can't test this at the moment.


It looks like the cap update request was dropped to the ground in MDS.
[...]
If you can reproduce it, then please provide the mds logs by setting:
[...]

I can do a test with MDS logs on high level. Before I do that, looking at the 
python findings above, is this something that should work on ceph or is it a 
python issue?

Not sure yet. I need to understand what exactly shutil.copy does in kclient.

Thanks

- Xiubo




Thanks for your help!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14






___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-11-22 Thread zxcs
Thanks a ton, Xiubo!

It does not disappear,

even after we umount the ceph directory on these two old OS nodes.

After dumping ops in flight, we can see some requests, and the earliest complains 
“failed to authpin, subtree is being exported".

How can we avoid this? Would you please shed some light here?

Thanks,
xz


> On 22 Nov 2023, at 19:44, Xiubo Li wrote:
> 
> 
> On 11/22/23 16:02, zxcs wrote:
>> HI, Experts,
>> 
>> we are using cephfs with  16.2.* with multi active mds, and recently, we 
>> have two nodes mount with ceph-fuse due to the old os system.
>> 
>> and  one nodes run a python script with `glob.glob(path)`, and another 
>> client doing `cp` operation on the same path.
>> 
>> then we see some log about `mds slow request`, and logs complain “failed to 
>> authpin, subtree is being exported"
>> 
>> then need to restart mds,
>> 
>> 
>> our question is, does there any dead lock?  how can we avoid this and how to 
>> fix it without restart mds(it will influence other users) ?
> 
> BTW, won't the slow requests disappear themself later ?
> 
> It looks like the exporting is slow or there too many exports are going on.
> 
> Thanks
> 
> - Xiubo
> 
>> 
>> Thanks a ton!
>> 
>> 
>> xz
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph-exporter binds to IPv4 only

2023-11-22 Thread Stefan Kooman

Hi,

In an IPv6-only deployment the ceph-exporter daemons are not listening on 
IPv6 address(es). This can be fixed by editing the unit.run file of the 
ceph-exporter and changing "--addrs=0.0.0.0" to "--addrs=::".


Is this configurable? So that cephadm deploys ceph-exporter with proper 
unit.run arguments?


Gr. Stefan

... who really thinks the Ceph test lab should have an IPv6 only test 
environment to catch these things

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] CephFS - MDS removed from map - filesystem keeps to be stopped

2023-11-22 Thread Denis Polom

Hi

running Ceph Pacific 16.2.13.

We had a full CephFS filesystem, and after adding new HW we tried to start 
it, but our MDS daemons are pushed to standby and removed from the MDS 
map.


Filesystem was broken, so we repaired it with:

# ceph fs fail cephfs

# cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary

# cephfs-journal-tool --rank=cephfs:0 journal reset

Then I started the ceph-mds service

and marked the rank as repaired.

After some time the MDS switched to standby. The log is below.

I would appreciate any help to resolve this situation. Thank you.
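
A few things that may be worth checking here (a sketch, not a confirmed 
diagnosis): a rank that goes from up:active to up:stopping is usually being 
stopped deliberately, e.g. because the filesystem is still flagged non-joinable 
after the earlier "ceph fs fail", or because max_mds was reduced:

# ceph fs dump | grep -E 'max_mds|joinable|flags'
# ceph fs status
# ceph fs set cephfs joinable true
# ceph fs set cephfs max_mds 1

If a rank is reported as damaged, "ceph mds repaired cephfs:0" clears that flag.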

from log:

2023-11-22T14:11:49.212+0100 7f5dc155e700  1 mds.0.9604 handle_mds_map i 
am now mds.0.9604
2023-11-22T14:11:49.212+0100 7f5dc155e700  1 mds.0.9604 handle_mds_map 
state change up:rejoin --> up:active
2023-11-22T14:11:49.212+0100 7f5dc155e700  1 mds.0.9604 recovery_done -- 
successful recovery!

2023-11-22T14:11:49.212+0100 7f5dc155e700  1 mds.0.9604 active_start
2023-11-22T14:11:49.216+0100 7f5dc155e700  1 mds.0.9604 cluster recovered.
2023-11-22T14:11:49.216+0100 7f5dc3d63700  0 --1- 
[v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> 
v1:10.245.8.127:0/2123529386 conn(0x55a60627a800 0x55a606e5b000 :6801 
s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).
handle_connect_message_2 accept peer reset, then tried to connect to us, 
replacing
2023-11-22T14:11:49.216+0100 7f5dc4564700  0 --1- 
[v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> 
v1:10.245.6.88:0/1899426587 conn(0x55a60627ac00 0x55a6070d :6801 
s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).h
andle_connect_message_2 accept peer reset, then tried to connect to us, 
replacing
2023-11-22T14:11:49.216+0100 7f5dc4564700  0 --1- 
[v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> 
v1:10.245.4.216:0/2058542052 conn(0x55a6070c9800 0x55a6070d1800 :6801 
s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).
handle_connect_message_2 accept peer reset, then tried to connect to us, 
replacing
2023-11-22T14:11:49.216+0100 7f5dc3d63700  0 --1- 
[v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> 
v1:10.245.4.220:0/1549374180 conn(0x55a60708d000 0x55a6070d0800 :6801 
s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).
handle_connect_message_2 accept peer reset, then tried to connect to us, 
replacing
2023-11-22T14:11:49.216+0100 7f5dc4d65700  0 --1- 
[v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> 
v1:10.245.8.180:0/270666178 conn(0x55a60703a000 0x55a6070cf800 :6801 
s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).h
andle_connect_message_2 accept peer reset, then tried to connect to us, 
replacing
2023-11-22T14:11:49.216+0100 7f5dc4d65700  0 --1- 
[v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> 
v1:10.245.8.178:0/3673271488 conn(0x55a6070c9400 0x55a6070d1000 :6801 
s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).
handle_connect_message_2 accept peer reset, then tried to connect to us, 
replacing
2023-11-22T14:11:49.216+0100 7f5dc4d65700  0 --1- 
[v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> 
v1:10.245.4.167:0/2667964940 conn(0x55a6070c9c00 0x55a607112000 :6801 
s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).
handle_connect_message_2 accept peer reset, then tried to connect to us, 
replacing
2023-11-22T14:11:49.216+0100 7f5dc3d63700  0 --1- 
[v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> 
v1:10.245.6.70:0/3181830075 conn(0x55a607116000 0x55a607112800 :6801 
s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).h
andle_connect_message_2 accept peer reset, then tried to connect to us, 
replacing
2023-11-22T14:11:49.216+0100 7f5dc4564700  0 --1- 
[v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> 
v1:10.245.6.72:0/3744737352 conn(0x55a60627a800 0x55a606e5b000 :6801 
s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).h
andle_connect_message_2 accept peer reset, then tried to connect to us, 
replacing
2023-11-22T14:11:49.216+0100 7f5dc3d63700  0 --1- 
[v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >> 
v1:10.244.18.140:0/1607447464 conn(0x55a60627ac00 0x55a6070d :6801 
s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0)
.handle_connect_message_2 accept peer reset, then tried to connect to 
us, replacing
2023-11-22T14:11:49.220+0100 7f5dc155e700  1 mds.mds1 Updating MDS map 
to version 9608 from mon.1
2023-11-22T14:11:49.220+0100 7f5dc155e700  1 mds.0.9604 handle_mds_map i 
am now mds.0.9604
2023-11-22T14:11:49.220+0100 7f5dc155e700  1 mds.0.9604 handle_mds_map 
state change up:active --> up:stopping
2023-11-22T14:11:52.412+0100 7f5dc3562700  1 mds.mds1 asok_command: 
client ls {prefix=client ls} (starting...)
2023-11-22T14:11:57.412+0100 7f5dc3562700  1 mds.mds1 asok_command: 
client ls {prefix=client ls} (starting...)
2023-11-22T14:12:02.416+0100 7f5dc3562700  1 mds.mds1 asok_command: 
client ls {prefix=client ls} (starting...)
2023-11-22T14:12:07.420+0100 7f5dc3562700  1 mds.mds1 asok_command: 

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Zakhar Kirpichenko
Thanks for this. This looks similar to what we're observing. Although we
don't use the API apart from the usage by Ceph deployment itself - which I
guess still counts.

/Z

On Wed, 22 Nov 2023, 15:22 Adrien Georget, 
wrote:

> Hi,
>
> This memory leak with ceph-mgr seems to be due to a change in Ceph 16.2.12.
> Check this issue : https://tracker.ceph.com/issues/59580
> We are also affected by this, with or without containerized services.
>
> Cheers,
> Adrien
>
> > On 22/11/2023 at 14:14, Eugen Block wrote:
> > One other difference is you use docker, right? We use podman, could it
> > be some docker restriction?
> >
> > Zitat von Zakhar Kirpichenko :
> >
> >> It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node has
> >> 384
> >> GB of RAM, each OSD has a memory target of 16 GB, about 100 GB of
> >> memory,
> >> give or take, is available (mostly used by page cache) on each node
> >> during
> >> normal operation. Nothing unusual there, tbh.
> >>
> >> No unusual mgr modules or settings either, except for disabled progress:
> >>
> >> {
> >> "always_on_modules": [
> >> "balancer",
> >> "crash",
> >> "devicehealth",
> >> "orchestrator",
> >> "pg_autoscaler",
> >> "progress",
> >> "rbd_support",
> >> "status",
> >> "telemetry",
> >> "volumes"
> >> ],
> >> "enabled_modules": [
> >> "cephadm",
> >> "dashboard",
> >> "iostat",
> >> "prometheus",
> >> "restful"
> >> ],
> >>
> >> /Z
> >>
> >> On Wed, 22 Nov 2023, 14:52 Eugen Block,  wrote:
> >>
> >>> What does your hardware look like memory-wise? Just for comparison,
> >>> one customer cluster has 4,5 GB in use (middle-sized cluster for
> >>> openstack, 280 OSDs):
> >>>
> >>>  PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
> >>> COMMAND
> >>> 6077 ceph  20   0 6357560 4,522g  22316 S 12,00 1,797
> >>> 57022:54 ceph-mgr
> >>>
> >>> In our own cluster (smaller than that and not really heavily used) the
> >>> mgr uses almost 2 GB. So those numbers you have seem relatively small.
> >>>
> >>> Zitat von Zakhar Kirpichenko :
> >>>
> >>> > I've disabled the progress module entirely and will see how it goes.
> >>> > Otherwise, mgr memory usage keeps increasing slowly, from past
> >>> experience
> >>> > it will stabilize at around 1.5-1.6 GB. Other than this event
> >>> warning,
> >>> it's
> >>> > unclear what could have caused random memory ballooning.
> >>> >
> >>> > /Z
> >>> >
> >>> > On Wed, 22 Nov 2023 at 13:07, Eugen Block  wrote:
> >>> >
> >>> >> I see these progress messages all the time, I don't think they cause
> >>> >> it, but I might be wrong. You can disable it just to rule that out.
> >>> >>
> >>> >> Zitat von Zakhar Kirpichenko :
> >>> >>
> >>> >> > Unfortunately, I don't have a full stack trace because there's no
> >>> crash
> >>> >> > when the mgr gets oom-killed. There's just the mgr log, which
> >>> looks
> >>> >> > completely normal until about 2-3 minutes before the oom-kill,
> >>> when
> >>> >> > tmalloc warnings show up.
> >>> >> >
> >>> >> > I'm not sure that it's the same issue that is described in the
> >>> tracker.
> >>> >> We
> >>> >> > seem to have some stale "events" in the progress module though:
> >>> >> >
> >>> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> >>> 2023-11-21T14:56:30.718+
> >>> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >>> >> > cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist
> >>> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> >>> 2023-11-21T14:56:30.718+
> >>> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >>> >> > 44824331-3f6b-45c4-b925-423d098c3c76 does not exist
> >>> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> >>> 2023-11-21T14:56:30.718+
> >>> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >>> >> > 0139bc54-ae42-4483-b278-851d77f23f9f does not exist
> >>> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> >>> 2023-11-21T14:56:30.718+
> >>> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >>> >> > f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist
> >>> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> >>> 2023-11-21T14:56:30.718+
> >>> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >>> >> > 1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist
> >>> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> >>> 2023-11-21T14:56:30.718+
> >>> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >>> >> > 7f14d01c-498c-413f-b2ef-05521050190a does not exist
> >>> >> > Nov 21 14:57:35 ceph01 bash[3941523]: debug
> >>> 2023-11-21T14:57:35.950+
> >>> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >>> >> > 48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist
> >>> >> >
> >>> >> > I tried clearing them but they keep showing up. I am wondering if
> >>> these
> >>> >> > missing events can cause memory leaks over time.
> >>> >> >
> >>> 

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Zakhar Kirpichenko
Yes, we use docker, though we haven't had any issues because of it. I don't
think that docker itself can cause mgr memory leaks.
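
One quick way to rule out a container-level memory cap on the mgr (assuming the 
stock Docker CLI; the container name is a placeholder):

# docker stats --no-stream | grep mgr
# docker inspect --format '{{.HostConfig.Memory}}' <mgr-container>    (0 means no limit)

If the value is 0, Docker itself is not imposing a memory limit on the container.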

/Z

On Wed, 22 Nov 2023, 15:14 Eugen Block,  wrote:

> One other difference is you use docker, right? We use podman, could it
> be some docker restriction?
>
> Zitat von Zakhar Kirpichenko :
>
> > It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node has 384
> > GB of RAM, each OSD has a memory target of 16 GB, about 100 GB of memory,
> > give or take, is available (mostly used by page cache) on each node
> during
> > normal operation. Nothing unusual there, tbh.
> >
> > No unusual mgr modules or settings either, except for disabled progress:
> >
> > {
> > "always_on_modules": [
> > "balancer",
> > "crash",
> > "devicehealth",
> > "orchestrator",
> > "pg_autoscaler",
> > "progress",
> > "rbd_support",
> > "status",
> > "telemetry",
> > "volumes"
> > ],
> > "enabled_modules": [
> > "cephadm",
> > "dashboard",
> > "iostat",
> > "prometheus",
> > "restful"
> > ],
> >
> > /Z
> >
> > On Wed, 22 Nov 2023, 14:52 Eugen Block,  wrote:
> >
> >> What does your hardware look like memory-wise? Just for comparison,
> >> one customer cluster has 4,5 GB in use (middle-sized cluster for
> >> openstack, 280 OSDs):
> >>
> >>  PID USER  PR  NIVIRTRESSHR S  %CPU  %MEM TIME+
> >> COMMAND
> >> 6077 ceph  20   0 6357560 4,522g  22316 S 12,00 1,797
> >> 57022:54 ceph-mgr
> >>
> >> In our own cluster (smaller than that and not really heavily used) the
> >> mgr uses almost 2 GB. So those numbers you have seem relatively small.
> >>
> >> Zitat von Zakhar Kirpichenko :
> >>
> >> > I've disabled the progress module entirely and will see how it goes.
> >> > Otherwise, mgr memory usage keeps increasing slowly, from past
> experience
> >> > it will stabilize at around 1.5-1.6 GB. Other than this event warning,
> >> it's
> >> > unclear what could have caused random memory ballooning.
> >> >
> >> > /Z
> >> >
> >> > On Wed, 22 Nov 2023 at 13:07, Eugen Block  wrote:
> >> >
> >> >> I see these progress messages all the time, I don't think they cause
> >> >> it, but I might be wrong. You can disable it just to rule that out.
> >> >>
> >> >> Zitat von Zakhar Kirpichenko :
> >> >>
> >> >> > Unfortunately, I don't have a full stack trace because there's no
> >> crash
> >> >> > when the mgr gets oom-killed. There's just the mgr log, which looks
> >> >> > completely normal until about 2-3 minutes before the oom-kill, when
> >> >> > tmalloc warnings show up.
> >> >> >
> >> >> > I'm not sure that it's the same issue that is described in the
> >> tracker.
> >> >> We
> >> >> > seem to have some stale "events" in the progress module though:
> >> >> >
> >> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> >> 2023-11-21T14:56:30.718+
> >> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> >> > cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist
> >> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> >> 2023-11-21T14:56:30.718+
> >> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> >> > 44824331-3f6b-45c4-b925-423d098c3c76 does not exist
> >> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> >> 2023-11-21T14:56:30.718+
> >> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> >> > 0139bc54-ae42-4483-b278-851d77f23f9f does not exist
> >> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> >> 2023-11-21T14:56:30.718+
> >> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> >> > f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist
> >> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> >> 2023-11-21T14:56:30.718+
> >> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> >> > 1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist
> >> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> >> 2023-11-21T14:56:30.718+
> >> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> >> > 7f14d01c-498c-413f-b2ef-05521050190a does not exist
> >> >> > Nov 21 14:57:35 ceph01 bash[3941523]: debug
> >> 2023-11-21T14:57:35.950+
> >> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> >> > 48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist
> >> >> >
> >> >> > I tried clearing them but they keep showing up. I am wondering if
> >> these
> >> >> > missing events can cause memory leaks over time.
> >> >> >
> >> >> > /Z
> >> >> >
> >> >> > On Wed, 22 Nov 2023 at 11:12, Eugen Block  wrote:
> >> >> >
> >> >> >> Do you have the full stack trace? The pastebin only contains the
> >> >> >> "tcmalloc: large alloc" messages (same as in the tracker issue).
> >> Maybe
> >> >> >> comment in the tracker issue directly since Radek asked for
> someone
> >> >> >> with a similar problem in a newer release.
> >> >> >>
> >> >> >> Zitat von Zakhar Kirpichenko :
> >> >> >>
> >> >> >> > Thanks, Eugen. 

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Adrien Georget

Hi,

This memory leak with ceph-mgr seems to be due to a change in Ceph 16.2.12.
Check this issue : https://tracker.ceph.com/issues/59580
We are also affected by this, with or without containerized services.

Cheers,
Adrien

On 22/11/2023 at 14:14, Eugen Block wrote:
One other difference is you use docker, right? We use podman, could it 
be some docker restriction?


Zitat von Zakhar Kirpichenko :

It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node has 
384
GB of RAM, each OSD has a memory target of 16 GB, about 100 GB of 
memory,
give or take, is available (mostly used by page cache) on each node 
during

normal operation. Nothing unusual there, tbh.

No unusual mgr modules or settings either, except for disabled progress:

{
    "always_on_modules": [
    "balancer",
    "crash",
    "devicehealth",
    "orchestrator",
    "pg_autoscaler",
    "progress",
    "rbd_support",
    "status",
    "telemetry",
    "volumes"
    ],
    "enabled_modules": [
    "cephadm",
    "dashboard",
    "iostat",
    "prometheus",
    "restful"
    ],

/Z

On Wed, 22 Nov 2023, 14:52 Eugen Block,  wrote:


What does your hardware look like memory-wise? Just for comparison,
one customer cluster has 4,5 GB in use (middle-sized cluster for
openstack, 280 OSDs):

 PID USER  PR  NI    VIRT    RES    SHR S  %CPU %MEM TIME+
COMMAND
    6077 ceph  20   0 6357560 4,522g  22316 S 12,00 1,797
57022:54 ceph-mgr

In our own cluster (smaller than that and not really heavily used) the
mgr uses almost 2 GB. So those numbers you have seem relatively small.

Zitat von Zakhar Kirpichenko :

> I've disabled the progress module entirely and will see how it goes.
> Otherwise, mgr memory usage keeps increasing slowly, from past 
experience
> it will stabilize at around 1.5-1.6 GB. Other than this event 
warning,

it's
> unclear what could have caused random memory ballooning.
>
> /Z
>
> On Wed, 22 Nov 2023 at 13:07, Eugen Block  wrote:
>
>> I see these progress messages all the time, I don't think they cause
>> it, but I might be wrong. You can disable it just to rule that out.
>>
>> Zitat von Zakhar Kirpichenko :
>>
>> > Unfortunately, I don't have a full stack trace because there's no
crash
>> > when the mgr gets oom-killed. There's just the mgr log, which 
looks
>> > completely normal until about 2-3 minutes before the oom-kill, 
when

>> > tmalloc warnings show up.
>> >
>> > I'm not sure that it's the same issue that is described in the
tracker.
>> We
>> > seem to have some stale "events" in the progress module though:
>> >
>> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
2023-11-21T14:56:30.718+
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist
>> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
2023-11-21T14:56:30.718+
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > 44824331-3f6b-45c4-b925-423d098c3c76 does not exist
>> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
2023-11-21T14:56:30.718+
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > 0139bc54-ae42-4483-b278-851d77f23f9f does not exist
>> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
2023-11-21T14:56:30.718+
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist
>> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
2023-11-21T14:56:30.718+
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > 1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist
>> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
2023-11-21T14:56:30.718+
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > 7f14d01c-498c-413f-b2ef-05521050190a does not exist
>> > Nov 21 14:57:35 ceph01 bash[3941523]: debug
2023-11-21T14:57:35.950+
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > 48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist
>> >
>> > I tried clearing them but they keep showing up. I am wondering if
these
>> > missing events can cause memory leaks over time.
>> >
>> > /Z
>> >
>> > On Wed, 22 Nov 2023 at 11:12, Eugen Block  wrote:
>> >
>> >> Do you have the full stack trace? The pastebin only contains the
>> >> "tcmalloc: large alloc" messages (same as in the tracker issue).
Maybe
>> >> comment in the tracker issue directly since Radek asked for 
someone

>> >> with a similar problem in a newer release.
>> >>
>> >> Zitat von Zakhar Kirpichenko :
>> >>
>> >> > Thanks, Eugen. It is similar in the sense that the mgr is 
getting

>> >> > OOM-killed.
>> >> >
>> >> > It started happening in our cluster after the upgrade to 
16.2.14.

We
>> >> > haven't had this issue with earlier Pacific releases.
>> >> >
>> >> > /Z
>> >> >
>> >> > On Tue, 21 Nov 2023, 21:53 Eugen Block,  wrote:
>> >> >
>> >> >> Just checking it on the phone, but isn’t this quite similar?
>> >> >>
>> >> >> https://tracker.ceph.com/issues/45136
>> >> >>
>> >> >> 

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Eugen Block
One other difference is you use docker, right? We use podman, could it  
be some docker restriction?


Zitat von Zakhar Kirpichenko :


It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node has 384
GB of RAM, each OSD has a memory target of 16 GB, about 100 GB of memory,
give or take, is available (mostly used by page cache) on each node during
normal operation. Nothing unusual there, tbh.

No unusual mgr modules or settings either, except for disabled progress:

{
"always_on_modules": [
"balancer",
"crash",
"devicehealth",
"orchestrator",
"pg_autoscaler",
"progress",
"rbd_support",
"status",
"telemetry",
"volumes"
],
"enabled_modules": [
"cephadm",
"dashboard",
"iostat",
"prometheus",
"restful"
],

/Z

On Wed, 22 Nov 2023, 14:52 Eugen Block,  wrote:


What does your hardware look like memory-wise? Just for comparison,
one customer cluster has 4,5 GB in use (middle-sized cluster for
openstack, 280 OSDs):

 PID USER  PR  NIVIRTRESSHR S  %CPU  %MEM TIME+
COMMAND
6077 ceph  20   0 6357560 4,522g  22316 S 12,00 1,797
57022:54 ceph-mgr

In our own cluster (smaller than that and not really heavily used) the
mgr uses almost 2 GB. So those numbers you have seem relatively small.

Zitat von Zakhar Kirpichenko :

> I've disabled the progress module entirely and will see how it goes.
> Otherwise, mgr memory usage keeps increasing slowly, from past experience
> it will stabilize at around 1.5-1.6 GB. Other than this event warning,
it's
> unclear what could have caused random memory ballooning.
>
> /Z
>
> On Wed, 22 Nov 2023 at 13:07, Eugen Block  wrote:
>
>> I see these progress messages all the time, I don't think they cause
>> it, but I might be wrong. You can disable it just to rule that out.
>>
>> Zitat von Zakhar Kirpichenko :
>>
>> > Unfortunately, I don't have a full stack trace because there's no
crash
>> > when the mgr gets oom-killed. There's just the mgr log, which looks
>> > completely normal until about 2-3 minutes before the oom-kill, when
>> > tmalloc warnings show up.
>> >
>> > I'm not sure that it's the same issue that is described in the
tracker.
>> We
>> > seem to have some stale "events" in the progress module though:
>> >
>> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
2023-11-21T14:56:30.718+
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist
>> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
2023-11-21T14:56:30.718+
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > 44824331-3f6b-45c4-b925-423d098c3c76 does not exist
>> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
2023-11-21T14:56:30.718+
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > 0139bc54-ae42-4483-b278-851d77f23f9f does not exist
>> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
2023-11-21T14:56:30.718+
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist
>> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
2023-11-21T14:56:30.718+
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > 1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist
>> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
2023-11-21T14:56:30.718+
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > 7f14d01c-498c-413f-b2ef-05521050190a does not exist
>> > Nov 21 14:57:35 ceph01 bash[3941523]: debug
2023-11-21T14:57:35.950+
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > 48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist
>> >
>> > I tried clearing them but they keep showing up. I am wondering if
these
>> > missing events can cause memory leaks over time.
>> >
>> > /Z
>> >
>> > On Wed, 22 Nov 2023 at 11:12, Eugen Block  wrote:
>> >
>> >> Do you have the full stack trace? The pastebin only contains the
>> >> "tcmalloc: large alloc" messages (same as in the tracker issue).
Maybe
>> >> comment in the tracker issue directly since Radek asked for someone
>> >> with a similar problem in a newer release.
>> >>
>> >> Zitat von Zakhar Kirpichenko :
>> >>
>> >> > Thanks, Eugen. It is similar in the sense that the mgr is getting
>> >> > OOM-killed.
>> >> >
>> >> > It started happening in our cluster after the upgrade to 16.2.14.
We
>> >> > haven't had this issue with earlier Pacific releases.
>> >> >
>> >> > /Z
>> >> >
>> >> > On Tue, 21 Nov 2023, 21:53 Eugen Block,  wrote:
>> >> >
>> >> >> Just checking it on the phone, but isn’t this quite similar?
>> >> >>
>> >> >> https://tracker.ceph.com/issues/45136
>> >> >>
>> >> >> Zitat von Zakhar Kirpichenko :
>> >> >>
>> >> >> > Hi,
>> >> >> >
>> >> >> > I'm facing a rather new issue with our Ceph cluster: from time
to
>> time
>> >> >> > ceph-mgr on one of the two mgr nodes gets oom-killed after
>> consuming
>> >> over
>> >> >> > 100 GB RAM:
>> >> >> >
>> >> >> 

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Zakhar Kirpichenko
It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node has 384
GB of RAM, each OSD has a memory target of 16 GB, about 100 GB of memory,
give or take, is available (mostly used by page cache) on each node during
normal operation. Nothing unusual there, tbh.

No unusual mgr modules or settings either, except for disabled progress:

{
"always_on_modules": [
"balancer",
"crash",
"devicehealth",
"orchestrator",
"pg_autoscaler",
"progress",
"rbd_support",
"status",
"telemetry",
"volumes"
],
"enabled_modules": [
"cephadm",
"dashboard",
"iostat",
"prometheus",
"restful"
],

/Z

On Wed, 22 Nov 2023, 14:52 Eugen Block,  wrote:

> What does your hardware look like memory-wise? Just for comparison,
> one customer cluster has 4,5 GB in use (middle-sized cluster for
> openstack, 280 OSDs):
>
>  PID USER  PR  NIVIRTRESSHR S  %CPU  %MEM TIME+
> COMMAND
> 6077 ceph  20   0 6357560 4,522g  22316 S 12,00 1,797
> 57022:54 ceph-mgr
>
> In our own cluster (smaller than that and not really heavily used) the
> mgr uses almost 2 GB. So those numbers you have seem relatively small.
>
> Zitat von Zakhar Kirpichenko :
>
> > I've disabled the progress module entirely and will see how it goes.
> > Otherwise, mgr memory usage keeps increasing slowly, from past experience
> > it will stabilize at around 1.5-1.6 GB. Other than this event warning,
> it's
> > unclear what could have caused random memory ballooning.
> >
> > /Z
> >
> > On Wed, 22 Nov 2023 at 13:07, Eugen Block  wrote:
> >
> >> I see these progress messages all the time, I don't think they cause
> >> it, but I might be wrong. You can disable it just to rule that out.
> >>
> >> Zitat von Zakhar Kirpichenko :
> >>
> >> > Unfortunately, I don't have a full stack trace because there's no
> crash
> >> > when the mgr gets oom-killed. There's just the mgr log, which looks
> >> > completely normal until about 2-3 minutes before the oom-kill, when
> >> > tmalloc warnings show up.
> >> >
> >> > I'm not sure that it's the same issue that is described in the
> tracker.
> >> We
> >> > seem to have some stale "events" in the progress module though:
> >> >
> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> 2023-11-21T14:56:30.718+
> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> > cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist
> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> 2023-11-21T14:56:30.718+
> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> > 44824331-3f6b-45c4-b925-423d098c3c76 does not exist
> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> 2023-11-21T14:56:30.718+
> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> > 0139bc54-ae42-4483-b278-851d77f23f9f does not exist
> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> 2023-11-21T14:56:30.718+
> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> > f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist
> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> 2023-11-21T14:56:30.718+
> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> > 1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist
> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
> 2023-11-21T14:56:30.718+
> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> > 7f14d01c-498c-413f-b2ef-05521050190a does not exist
> >> > Nov 21 14:57:35 ceph01 bash[3941523]: debug
> 2023-11-21T14:57:35.950+
> >> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> >> > 48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist
> >> >
> >> > I tried clearing them but they keep showing up. I am wondering if
> these
> >> > missing events can cause memory leaks over time.
> >> >
> >> > /Z
> >> >
> >> > On Wed, 22 Nov 2023 at 11:12, Eugen Block  wrote:
> >> >
> >> >> Do you have the full stack trace? The pastebin only contains the
> >> >> "tcmalloc: large alloc" messages (same as in the tracker issue).
> Maybe
> >> >> comment in the tracker issue directly since Radek asked for someone
> >> >> with a similar problem in a newer release.
> >> >>
> >> >> Zitat von Zakhar Kirpichenko :
> >> >>
> >> >> > Thanks, Eugen. It is similar in the sense that the mgr is getting
> >> >> > OOM-killed.
> >> >> >
> >> >> > It started happening in our cluster after the upgrade to 16.2.14.
> We
> >> >> > haven't had this issue with earlier Pacific releases.
> >> >> >
> >> >> > /Z
> >> >> >
> >> >> > On Tue, 21 Nov 2023, 21:53 Eugen Block,  wrote:
> >> >> >
> >> >> >> Just checking it on the phone, but isn’t this quite similar?
> >> >> >>
> >> >> >> https://tracker.ceph.com/issues/45136
> >> >> >>
> >> >> >> Zitat von Zakhar Kirpichenko :
> >> >> >>
> >> >> >> > Hi,
> >> >> >> >
> >> >> >> > I'm facing a rather new issue with our Ceph cluster: from time
> to
> >> time
> >> >> >> > ceph-mgr on one of the two mgr nodes gets 

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Eugen Block
What does your hardware look like memory-wise? Just for comparison,  
one customer cluster has 4,5 GB in use (middle-sized cluster for  
openstack, 280 OSDs):


  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 6077 ceph      20   0 6357560 4,522g  22316 S 12,00 1,797  57022:54 ceph-mgr


In our own cluster (smaller than that and not really heavily used) the  
mgr uses almost 2 GB. So those numbers you have seem relatively small.


Zitat von Zakhar Kirpichenko :


I've disabled the progress module entirely and will see how it goes.
Otherwise, mgr memory usage keeps increasing slowly, from past experience
it will stabilize at around 1.5-1.6 GB. Other than this event warning, it's
unclear what could have caused random memory ballooning.

/Z

On Wed, 22 Nov 2023 at 13:07, Eugen Block  wrote:


I see these progress messages all the time, I don't think they cause
it, but I might be wrong. You can disable it just to rule that out.

Zitat von Zakhar Kirpichenko :

> Unfortunately, I don't have a full stack trace because there's no crash
> when the mgr gets oom-killed. There's just the mgr log, which looks
> completely normal until about 2-3 minutes before the oom-kill, when
> tmalloc warnings show up.
>
> I'm not sure that it's the same issue that is described in the tracker.
We
> seem to have some stale "events" in the progress module though:
>
> Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
> 7f4bb19ef700  0 [progress WARNING root] complete: ev
> cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist
> Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
> 7f4bb19ef700  0 [progress WARNING root] complete: ev
> 44824331-3f6b-45c4-b925-423d098c3c76 does not exist
> Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
> 7f4bb19ef700  0 [progress WARNING root] complete: ev
> 0139bc54-ae42-4483-b278-851d77f23f9f does not exist
> Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
> 7f4bb19ef700  0 [progress WARNING root] complete: ev
> f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist
> Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
> 7f4bb19ef700  0 [progress WARNING root] complete: ev
> 1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist
> Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
> 7f4bb19ef700  0 [progress WARNING root] complete: ev
> 7f14d01c-498c-413f-b2ef-05521050190a does not exist
> Nov 21 14:57:35 ceph01 bash[3941523]: debug 2023-11-21T14:57:35.950+
> 7f4bb19ef700  0 [progress WARNING root] complete: ev
> 48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist
>
> I tried clearing them but they keep showing up. I am wondering if these
> missing events can cause memory leaks over time.
>
> /Z
>
> On Wed, 22 Nov 2023 at 11:12, Eugen Block  wrote:
>
>> Do you have the full stack trace? The pastebin only contains the
>> "tcmalloc: large alloc" messages (same as in the tracker issue). Maybe
>> comment in the tracker issue directly since Radek asked for someone
>> with a similar problem in a newer release.
>>
>> Zitat von Zakhar Kirpichenko :
>>
>> > Thanks, Eugen. It is similar in the sense that the mgr is getting
>> > OOM-killed.
>> >
>> > It started happening in our cluster after the upgrade to 16.2.14. We
>> > haven't had this issue with earlier Pacific releases.
>> >
>> > /Z
>> >
>> > On Tue, 21 Nov 2023, 21:53 Eugen Block,  wrote:
>> >
>> >> Just checking it on the phone, but isn’t this quite similar?
>> >>
>> >> https://tracker.ceph.com/issues/45136
>> >>
>> >> Zitat von Zakhar Kirpichenko :
>> >>
>> >> > Hi,
>> >> >
>> >> > I'm facing a rather new issue with our Ceph cluster: from time to
time
>> >> > ceph-mgr on one of the two mgr nodes gets oom-killed after
consuming
>> over
>> >> > 100 GB RAM:
>> >> >
>> >> > [Nov21 15:02] tp_osd_tp invoked oom-killer:
>> >> > gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
>> >> > [  +0.10]  oom_kill_process.cold+0xb/0x10
>> >> > [  +0.02] [  pid  ]   uid  tgid total_vm  rss
pgtables_bytes
>> >> > swapents oom_score_adj name
>> >> > [  +0.08]
>> >> >
>> >>
>>
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167
>> >> > [  +0.000697] Out of memory: Killed process 3941610 (ceph-mgr)
>> >> > total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB,
>> shmem-rss:0kB,
>> >> > UID:167 pgtables:260356kB oom_score_adj:0
>> >> > [  +6.509769] oom_reaper: reaped process 3941610 (ceph-mgr), now
>> >> > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>> >> >
>> >> > The cluster is stable and operating normally, there's nothing
unusual
>> >> going
>> >> > on before, during or after the kill, thus it's unclear what causes
the
>> >> mgr
>> >> > to balloon, use all 

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Zakhar Kirpichenko
I've disabled the progress module entirely and will see how it goes.
Otherwise, mgr memory usage keeps increasing slowly, from past experience
it will stabilize at around 1.5-1.6 GB. Other than this event warning, it's
unclear what could have caused random memory ballooning.
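
In case it helps others following along, the progress module can be toggled and 
its stale events dropped with the module's own commands (assuming they are 
available in this Pacific release):

# ceph progress off
# ceph progress clear
# ceph progress on     (to re-enable later)

"ceph mgr module disable progress" is not possible since progress is listed 
under always_on_modules, so "ceph progress off" is the practical way to take it 
out of the picture.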

/Z

On Wed, 22 Nov 2023 at 13:07, Eugen Block  wrote:

> I see these progress messages all the time, I don't think they cause
> it, but I might be wrong. You can disable it just to rule that out.
>
> Zitat von Zakhar Kirpichenko :
>
> > Unfortunately, I don't have a full stack trace because there's no crash
> > when the mgr gets oom-killed. There's just the mgr log, which looks
> > completely normal until about 2-3 minutes before the oom-kill, when
> > tmalloc warnings show up.
> >
> > I'm not sure that it's the same issue that is described in the tracker.
> We
> > seem to have some stale "events" in the progress module though:
> >
> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> > cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist
> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> > 44824331-3f6b-45c4-b925-423d098c3c76 does not exist
> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> > 0139bc54-ae42-4483-b278-851d77f23f9f does not exist
> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> > f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist
> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> > 1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist
> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> > 7f14d01c-498c-413f-b2ef-05521050190a does not exist
> > Nov 21 14:57:35 ceph01 bash[3941523]: debug 2023-11-21T14:57:35.950+
> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
> > 48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist
> >
> > I tried clearing them but they keep showing up. I am wondering if these
> > missing events can cause memory leaks over time.
> >
> > /Z
> >
> > On Wed, 22 Nov 2023 at 11:12, Eugen Block  wrote:
> >
> >> Do you have the full stack trace? The pastebin only contains the
> >> "tcmalloc: large alloc" messages (same as in the tracker issue). Maybe
> >> comment in the tracker issue directly since Radek asked for someone
> >> with a similar problem in a newer release.
> >>
> >> Zitat von Zakhar Kirpichenko :
> >>
> >> > Thanks, Eugen. It is similar in the sense that the mgr is getting
> >> > OOM-killed.
> >> >
> >> > It started happening in our cluster after the upgrade to 16.2.14. We
> >> > haven't had this issue with earlier Pacific releases.
> >> >
> >> > /Z
> >> >
> >> > On Tue, 21 Nov 2023, 21:53 Eugen Block,  wrote:
> >> >
> >> >> Just checking it on the phone, but isn’t this quite similar?
> >> >>
> >> >> https://tracker.ceph.com/issues/45136
> >> >>
> >> >> Zitat von Zakhar Kirpichenko :
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > I'm facing a rather new issue with our Ceph cluster: from time to
> time
> >> >> > ceph-mgr on one of the two mgr nodes gets oom-killed after
> consuming
> >> over
> >> >> > 100 GB RAM:
> >> >> >
> >> >> > [Nov21 15:02] tp_osd_tp invoked oom-killer:
> >> >> > gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
> >> >> > [  +0.10]  oom_kill_process.cold+0xb/0x10
> >> >> > [  +0.02] [  pid  ]   uid  tgid total_vm  rss
> pgtables_bytes
> >> >> > swapents oom_score_adj name
> >> >> > [  +0.08]
> >> >> >
> >> >>
> >>
> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167
> >> >> > [  +0.000697] Out of memory: Killed process 3941610 (ceph-mgr)
> >> >> > total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB,
> >> shmem-rss:0kB,
> >> >> > UID:167 pgtables:260356kB oom_score_adj:0
> >> >> > [  +6.509769] oom_reaper: reaped process 3941610 (ceph-mgr), now
> >> >> > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> >> >> >
> >> >> > The cluster is stable and operating normally, there's nothing
> unusual
> >> >> going
> >> >> > on before, during or after the kill, thus it's unclear what causes
> the
> >> >> mgr
> >> >> > to balloon, use all RAM and get killed. Systemd logs aren't very
> >> helpful:
> >> >> > they just show normal mgr operations until it fails to allocate
> memory
> >> >> and
> >> >> > gets killed: https://pastebin.com/MLyw9iVi
> >> >> >
> >> >> > The mgr experienced this issue several times in the last 2 months,
> and
> >> >> 

[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-11-22 Thread Frank Schilder
There are some unhandled race conditions in the MDS cluster in rare 
circumstances.

We had this issue with mimic and octopus and it went away after manually 
pinning sub-dirs to MDS ranks; see 
https://docs.ceph.com/en/nautilus/cephfs/multimds/?highlight=dir%20pin#manually-pinning-directory-trees-to-a-particular-rank.

This has the added advantage that one can bypass the internal load balancer, 
which was horrible for our workloads. I have a related post about ephemeral 
pinning on this list one or two years ago; you should be able to find it. Short 
story: after manually pinning all user directories to ranks, all our problems 
disappeared and performance improved a lot. MDS load dropped from 130% average 
to 10-20%, and so did memory consumption and cache recycling.
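
For completeness, a sketch of the ephemeral-pinning variant mentioned above 
(paths are placeholders; with distributed ephemeral pins the MDS hashes the 
immediate children of the marked directory across the active ranks):

# ceph config set mds mds_export_ephemeral_distributed true
# setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/home

This avoids having to maintain an explicit ceph.dir.pin per user directory.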

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Eugen Block 
Sent: Wednesday, November 22, 2023 12:30 PM
To: ceph-users@ceph.io
Subject: [ceph-users]  Re: mds slow request with “failed to authpin, subtree is 
being exported"

Hi,

we've seen this a year ago in a Nautilus cluster with multi-active MDS
as well. It turned up only once within several years and we decided
not to look too closely at that time. How often do you see it? Is it
reproducible? In that case I'd recommend creating a tracker issue.

Regards,
Eugen

Zitat von zxcs :

> HI, Experts,
>
> we are using cephfs with  16.2.* with multi active mds, and
> recently, we have two nodes mount with ceph-fuse due to the old os
> system.
>
> and  one nodes run a python script with `glob.glob(path)`, and
> another client doing `cp` operation on the same path.
>
> then we see some log about `mds slow request`, and logs complain
> “failed to authpin, subtree is being exported"
>
> then need to restart mds,
>
>
> our question is, does there any dead lock?  how can we avoid this
> and how to fix it without restart mds(it will influence other users) ?
>
>
> Thanks a ton!
>
>
> xz
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-11-22 Thread Xiubo Li


On 11/22/23 16:02, zxcs wrote:

HI, Experts,

we are using cephfs with  16.2.* with multi active mds, and recently, we have 
two nodes mount with ceph-fuse due to the old os system.

and  one nodes run a python script with `glob.glob(path)`, and another client 
doing `cp` operation on the same path.

then we see some log about `mds slow request`, and logs complain “failed to authpin, 
subtree is being exported"

then need to restart mds,


our question is, does there any dead lock?  how can we avoid this and how to 
fix it without restart mds(it will influence other users) ?


BTW, won't the slow requests disappear by themselves later?

It looks like the export is slow, or there are too many exports going on.

Thanks

- Xiubo



Thanks a ton!


xz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-11-22 Thread Eugen Block

Hi,

we saw this a year ago in a Nautilus cluster with multi-active MDS as well. It 
turned up only once within several years, and we decided not to look too 
closely at the time. How often do you see it? Is it reproducible? If so, I'd 
recommend creating a tracker issue.
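
If you do create one, something along these lines should capture useful MDS 
debug logs while reproducing (from memory, so double-check the option names 
and defaults for your release):

# raise MDS logging while the slow requests are being reproduced
ceph config set mds debug_mds 20
ceph config set mds debug_ms 1
# ... reproduce the "failed to authpin" slow requests ...
# then turn logging back down to the defaults
ceph config set mds debug_mds 1/5
ceph config set mds debug_ms 0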


Regards,
Eugen

Zitat von zxcs :


HI, Experts,

we are using cephfs with  16.2.* with multi active mds, and  
recently, we have two nodes mount with ceph-fuse due to the old os  
system.


and  one nodes run a python script with `glob.glob(path)`, and  
another client doing `cp` operation on the same path.


then we see some log about `mds slow request`, and logs complain  
“failed to authpin, subtree is being exported"


then need to restart mds,


our question is, does there any dead lock?  how can we avoid this  
and how to fix it without restart mds(it will influence other users) ?



Thanks a ton!


xz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Eugen Block
I see these progress messages all the time; I don't think they cause the 
issue, but I might be wrong. You could disable the progress module just to 
rule that out.
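
Roughly like this (from memory, please verify that these subcommands exist in 
your release):

ceph progress         # list the in-flight progress events
ceph progress clear   # drop everything the module currently tracks
ceph progress off     # stop tracking new events entirely
ceph progress on      # re-enable it later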


Zitat von Zakhar Kirpichenko :


Unfortunately, I don't have a full stack trace because there's no crash
when the mgr gets oom-killed. There's just the mgr log, which looks
completely normal until about 2-3 minutes before the oom-kill, when
tmalloc warnings show up.

I'm not sure that it's the same issue that is described in the tracker. We
seem to have some stale "events" in the progress module though:

Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
7f4bb19ef700  0 [progress WARNING root] complete: ev
cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist
Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
7f4bb19ef700  0 [progress WARNING root] complete: ev
44824331-3f6b-45c4-b925-423d098c3c76 does not exist
Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
7f4bb19ef700  0 [progress WARNING root] complete: ev
0139bc54-ae42-4483-b278-851d77f23f9f does not exist
Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
7f4bb19ef700  0 [progress WARNING root] complete: ev
f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist
Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
7f4bb19ef700  0 [progress WARNING root] complete: ev
1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist
Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
7f4bb19ef700  0 [progress WARNING root] complete: ev
7f14d01c-498c-413f-b2ef-05521050190a does not exist
Nov 21 14:57:35 ceph01 bash[3941523]: debug 2023-11-21T14:57:35.950+
7f4bb19ef700  0 [progress WARNING root] complete: ev
48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist

I tried clearing them but they keep showing up. I am wondering if these
missing events can cause memory leaks over time.

/Z

On Wed, 22 Nov 2023 at 11:12, Eugen Block  wrote:


Do you have the full stack trace? The pastebin only contains the
"tcmalloc: large alloc" messages (same as in the tracker issue). Maybe
comment in the tracker issue directly since Radek asked for someone
with a similar problem in a newer release.

Zitat von Zakhar Kirpichenko :

> Thanks, Eugen. It is similar in the sense that the mgr is getting
> OOM-killed.
>
> It started happening in our cluster after the upgrade to 16.2.14. We
> haven't had this issue with earlier Pacific releases.
>
> /Z
>
> On Tue, 21 Nov 2023, 21:53 Eugen Block,  wrote:
>
>> Just checking it on the phone, but isn’t this quite similar?
>>
>> https://tracker.ceph.com/issues/45136
>>
>> Zitat von Zakhar Kirpichenko :
>>
>> > Hi,
>> >
>> > I'm facing a rather new issue with our Ceph cluster: from time to time
>> > ceph-mgr on one of the two mgr nodes gets oom-killed after consuming
over
>> > 100 GB RAM:
>> >
>> > [Nov21 15:02] tp_osd_tp invoked oom-killer:
>> > gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
>> > [  +0.10]  oom_kill_process.cold+0xb/0x10
>> > [  +0.02] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes
>> > swapents oom_score_adj name
>> > [  +0.08]
>> >
>>
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167
>> > [  +0.000697] Out of memory: Killed process 3941610 (ceph-mgr)
>> > total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB,
shmem-rss:0kB,
>> > UID:167 pgtables:260356kB oom_score_adj:0
>> > [  +6.509769] oom_reaper: reaped process 3941610 (ceph-mgr), now
>> > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>> >
>> > The cluster is stable and operating normally, there's nothing unusual
>> going
>> > on before, during or after the kill, thus it's unclear what causes the
>> mgr
>> > to balloon, use all RAM and get killed. Systemd logs aren't very
helpful:
>> > they just show normal mgr operations until it fails to allocate memory
>> and
>> > gets killed: https://pastebin.com/MLyw9iVi
>> >
>> > The mgr experienced this issue several times in the last 2 months, and
>> the
>> > events don't appear to correlate with any other events in the cluster
>> > because basically nothing else happened at around those times. How
can I
>> > investigate this and figure out what's causing the mgr to consume all
>> > memory and get killed?
>> >
>> > I would very much appreciate any advice!
>> >
>> > Best regards,
>> > Zakhar
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>







___
ceph-users mailing list -- ceph-users@ceph.io

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Zakhar Kirpichenko
Unfortunately, I don't have a full stack trace because there's no crash
when the mgr gets oom-killed. There's just the mgr log, which looks
completely normal until about 2-3 minutes before the oom-kill, when
tcmalloc warnings show up.

I'm not sure that it's the same issue that is described in the tracker. We
seem to have some stale "events" in the progress module though:

Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
7f4bb19ef700  0 [progress WARNING root] complete: ev
cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist
Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
7f4bb19ef700  0 [progress WARNING root] complete: ev
44824331-3f6b-45c4-b925-423d098c3c76 does not exist
Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
7f4bb19ef700  0 [progress WARNING root] complete: ev
0139bc54-ae42-4483-b278-851d77f23f9f does not exist
Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
7f4bb19ef700  0 [progress WARNING root] complete: ev
f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist
Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
7f4bb19ef700  0 [progress WARNING root] complete: ev
1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist
Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+
7f4bb19ef700  0 [progress WARNING root] complete: ev
7f14d01c-498c-413f-b2ef-05521050190a does not exist
Nov 21 14:57:35 ceph01 bash[3941523]: debug 2023-11-21T14:57:35.950+
7f4bb19ef700  0 [progress WARNING root] complete: ev
48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist

I tried clearing them, but they keep showing up. I am wondering whether these
stale events can cause memory leaks over time.
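
To at least catch the ballooning while it happens, a simple loop recording the
mgr's RSS over time should help correlate the growth with the mgr log (a rough
sketch; the log path is arbitrary):

while true; do
    # timestamp plus PID, RSS (kB) and command line of every ceph-mgr process
    ps -C ceph-mgr -o pid=,rss=,cmd= | awk -v d="$(date -Is)" '{print d, $0}' >> /var/log/ceph-mgr-rss.log
    sleep 60
done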

/Z

On Wed, 22 Nov 2023 at 11:12, Eugen Block  wrote:

> Do you have the full stack trace? The pastebin only contains the
> "tcmalloc: large alloc" messages (same as in the tracker issue). Maybe
> comment in the tracker issue directly since Radek asked for someone
> with a similar problem in a newer release.
>
> Zitat von Zakhar Kirpichenko :
>
> > Thanks, Eugen. It is similar in the sense that the mgr is getting
> > OOM-killed.
> >
> > It started happening in our cluster after the upgrade to 16.2.14. We
> > haven't had this issue with earlier Pacific releases.
> >
> > /Z
> >
> > On Tue, 21 Nov 2023, 21:53 Eugen Block,  wrote:
> >
> >> Just checking it on the phone, but isn’t this quite similar?
> >>
> >> https://tracker.ceph.com/issues/45136
> >>
> >> Zitat von Zakhar Kirpichenko :
> >>
> >> > Hi,
> >> >
> >> > I'm facing a rather new issue with our Ceph cluster: from time to time
> >> > ceph-mgr on one of the two mgr nodes gets oom-killed after consuming
> over
> >> > 100 GB RAM:
> >> >
> >> > [Nov21 15:02] tp_osd_tp invoked oom-killer:
> >> > gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
> >> > [  +0.10]  oom_kill_process.cold+0xb/0x10
> >> > [  +0.02] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes
> >> > swapents oom_score_adj name
> >> > [  +0.08]
> >> >
> >>
> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167
> >> > [  +0.000697] Out of memory: Killed process 3941610 (ceph-mgr)
> >> > total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB,
> shmem-rss:0kB,
> >> > UID:167 pgtables:260356kB oom_score_adj:0
> >> > [  +6.509769] oom_reaper: reaped process 3941610 (ceph-mgr), now
> >> > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> >> >
> >> > The cluster is stable and operating normally, there's nothing unusual
> >> going
> >> > on before, during or after the kill, thus it's unclear what causes the
> >> mgr
> >> > to balloon, use all RAM and get killed. Systemd logs aren't very
> helpful:
> >> > they just show normal mgr operations until it fails to allocate memory
> >> and
> >> > gets killed: https://pastebin.com/MLyw9iVi
> >> >
> >> > The mgr experienced this issue several times in the last 2 months, and
> >> the
> >> > events don't appear to correlate with any other events in the cluster
> >> > because basically nothing else happened at around those times. How
> can I
> >> > investigate this and figure out what's causing the mgr to consume all
> >> > memory and get killed?
> >> >
> >> > I would very much appreciate any advice!
> >> >
> >> > Best regards,
> >> > Zakhar
> >> > ___
> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
>
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Eugen Block
Do you have the full stack trace? The pastebin only contains the  
"tcmalloc: large alloc" messages (same as in the tracker issue). Maybe comment  
on the tracker issue directly, since Radek asked for someone with a similar  
problem in a newer release.


Zitat von Zakhar Kirpichenko :


Thanks, Eugen. It is similar in the sense that the mgr is getting
OOM-killed.

It started happening in our cluster after the upgrade to 16.2.14. We
haven't had this issue with earlier Pacific releases.

/Z

On Tue, 21 Nov 2023, 21:53 Eugen Block,  wrote:


Just checking it on the phone, but isn’t this quite similar?

https://tracker.ceph.com/issues/45136

Zitat von Zakhar Kirpichenko :

> Hi,
>
> I'm facing a rather new issue with our Ceph cluster: from time to time
> ceph-mgr on one of the two mgr nodes gets oom-killed after consuming over
> 100 GB RAM:
>
> [Nov21 15:02] tp_osd_tp invoked oom-killer:
> gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
> [  +0.10]  oom_kill_process.cold+0xb/0x10
> [  +0.02] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes
> swapents oom_score_adj name
> [  +0.08]
>
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167
> [  +0.000697] Out of memory: Killed process 3941610 (ceph-mgr)
> total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB, shmem-rss:0kB,
> UID:167 pgtables:260356kB oom_score_adj:0
> [  +6.509769] oom_reaper: reaped process 3941610 (ceph-mgr), now
> anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>
> The cluster is stable and operating normally, there's nothing unusual
going
> on before, during or after the kill, thus it's unclear what causes the
mgr
> to balloon, use all RAM and get killed. Systemd logs aren't very helpful:
> they just show normal mgr operations until it fails to allocate memory
and
> gets killed: https://pastebin.com/MLyw9iVi
>
> The mgr experienced this issue several times in the last 2 months, and
the
> events don't appear to correlate with any other events in the cluster
> because basically nothing else happened at around those times. How can I
> investigate this and figure out what's causing the mgr to consume all
> memory and get killed?
>
> I would very much appreciate any advice!
>
> Best regards,
> Zakhar
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: No SSL Dashboard working after installing mgr crt|key with RSA/4096 secp384r1

2023-11-22 Thread Ackermann, Christoph
Hello Eugen,

thanks for the validation. For now I am using plain HTTP because I do not have
much time to look for a solution, but I will check a new certificate ASAP.
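
For reference, this is roughly what I intend to try with a plain RSA
certificate (just a sketch, file names are placeholders; the CA-signed RSA
cert and key would go in instead of the self-signed test pair):

# self-signed RSA test pair (or use the CA-signed RSA cert/key instead)
openssl req -new -nodes -x509 -days 365 -newkey rsa:4096 \
    -subj "/O=IT/CN=ceph-dashboard" -keyout dashboard.key -out dashboard.crt

# install it for the dashboard and restart the active mgr
ceph dashboard set-ssl-certificate -i dashboard.crt
ceph dashboard set-ssl-certificate-key -i dashboard.key
ceph config set mgr mgr/dashboard/ssl true
ceph mgr fail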

Christoph


Am Fr., 17. Nov. 2023 um 12:57 Uhr schrieb Eugen Block :

> I was able to reproduce the error with a self-signed elliptic curves
> based certificate. But I also got out of it by removing cert and key:
>
> quincy-1:~ # ceph config-key rm mgr/dashboard/key
> key deleted
> quincy-1:~ # ceph config-key rm mgr/dashboard/crt
> key deleted
>
> Then I failed the mgr just to be sure:
>
> quincy-1:~ # ceph mgr fail
> quincy-1:~ # ceph config-key get mgr/dashboard/crt
> Error ENOENT:
>
> And then I configured the previous key, did a mgr fail and now the
> dashboard is working again.
>
> Zitat von Eugen Block :
>
> > Hi,
> >
> > did you get your dashboard back in the meantime? I don't have an
> > answer regarding the certificate based on elliptic curves but since
> > you wrote:
> >
> >> So we tried to go back to the original state by removing CRT anf KEY but
> >> without success. The new key seems to stuck into the config
> >
> > how did you try to remove it? I would just assume that this should work:
> >
> > $ ceph config-key rm mgr/dashboard/cert
> >
> > Do you get an error message when removing it or does the mgr log
> > anything when you try to remove it which fails?
> > Also which ceph version is this?
> >
> > Thanks,
> > Eugen
> >
> > Zitat von "Ackermann, Christoph" :
> >
> >> Hello all,
> >>
> >> today i got a new certificate for our internal domain based on  RSA/4096
> >> secp384r1. After inserting  CRT and Key i got both "...updated"
> messages.
> >> After checking the dashboard i got an empty page and this error:
> >>
> >>   health: HEALTH_ERR
> >>   Module 'dashboard' has failed: key type unsupported
> >>
> >> So we tried to go back to the original state by removing CRT anf KEY but
> >> without success. The new key seems to stuck into the config
> >>
> >> [root@ceph ~]# ceph config-key get mgr/dashboard/crt
> >> -BEGIN CERTIFICATE-
> >> MIIFqTCCBJGgAwIBAgIMB5tjLSz264Ic8zeHMA0GCSqGSIb3DQEBCwUAMEwxCzAJ
> >> [...]
> >> ItzkEzq4SZ6V1Jhuf4bFlOMBVAKgAwZ90gXlguoiFFQu5+NIqNljZ8Jz7d0jhH43
> >> e3zhm5sn21+eIqRbiQ==
> >> -END CERTIFICATE-
> >>
> >> [root@ceph ~]# ceph config-key get mgr/dashboard/key
> >>
> >> *Error ENOENT: *
> >>
> >> We tried to generate a self signed Cert but no luck. It looks like
> manger
> >> stays in an intermediate state. The only way to get back the dashboard
> is
> >> to disable SSL  and use plain http.
> >>
> >> Can somebody explain this behaviour?  Maybe secp384r1 elliptic curves
> >> aren't supported? How can we clean up SSL configuration?
> >>
> >> Thanks,
> >> Christoph Ackermann
> >>
> >> Ps we checked some Information like
> >> https://tracker.ceph.com/issues/57924#change-227744 and others  but  no
> >> luck...
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] mds slow request with “failed to authpin, subtree is being exported"

2023-11-22 Thread zxcs
Hi, Experts,

we are using CephFS 16.2.* with multiple active MDS, and recently we mounted 
two nodes with ceph-fuse because of their old OS.

One node runs a python script with `glob.glob(path)`, and another client is 
doing a `cp` operation on the same path.

Then we see some logs about `mds slow request`, and the logs complain “failed to 
authpin, subtree is being exported".

Then we need to restart the MDS.


Our question is: is there a deadlock somewhere? How can we avoid this, and how 
can we fix it without restarting the MDS (which would affect other users)?
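
(For context, this is roughly how we look at the stuck requests and the subtree 
state; the mds name is a placeholder:)

ceph fs status
ceph daemon mds.<name> dump_ops_in_flight   # slow requests show the "failed to authpin, subtree is being exported" flag here
ceph daemon mds.<name> get subtrees         # lists subtrees with their auth rank and export state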


Thanks a ton!


xz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io