[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-04-11 Thread Mads Aasted
Hi Adam. Just tried extending the host's memory to 48 GB, and it stopped
throwing the error and set it to 3.something GB instead.

Thank you so much for your time and explanations.
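For completeness, the same arithmetic worked out later in this thread explains that number: with roughly 48 GiB reported by gather-facts and the same non-OSD daemon reservations, each OSD ends up a little above 3 GB. A rough check only, not cephadm code; the 48 * 1024 * 1024 kB figure is an assumption, not the actual gather-facts value.

memory_total_kb = 48 * 1024 * 1024            # assumed gather-facts value for a 48 GiB host
budget = int(memory_total_kb * 1024 * 0.7)    # autotune_memory_target_ratio default
non_osd_reservations = 21609054208            # crash + grafana + 3x mds + mgr + mon + node-exporter + prometheus
print((budget - non_osd_reservations) // 4)   # ~3617167769 bytes, i.e. "3.something" GB per OSD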

On Tue, Apr 9, 2024 at 9:30 PM Adam King  wrote:

> The same experiment with the mds daemons pulling 4GB instead of the 16GB,
> and me fixing the starting total memory (I accidentally used the
> memory_available_kb instead of memory_total_kb the first time) gives us
>
> DEBUG    cephadm.autotune:autotune.py:35 Autotuning OSD memory with given parameters:
> Total memory: 23530995712
> Daemons: [(crash.a), (grafana.a), (mds.a), (mds.b), (mds.c), (mgr.a), (mon.a), (node-exporter.a), (osd.1), (osd.2), (osd.3), (osd.4), (prometheus.a)]
> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 134217728 from total for crash daemon
> DEBUG    cephadm.autotune:autotune.py:52 new total: 23396777984
> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for grafana daemon
> DEBUG    cephadm.autotune:autotune.py:52 new total: 22323036160
> DEBUG    cephadm.autotune:autotune.py:40 Subtracting 4294967296 from total for mds daemon
> DEBUG    cephadm.autotune:autotune.py:42 new total: 18028068864
> DEBUG    cephadm.autotune:autotune.py:40 Subtracting 4294967296 from total for mds daemon
> DEBUG    cephadm.autotune:autotune.py:42 new total: 13733101568
> DEBUG    cephadm.autotune:autotune.py:40 Subtracting 4294967296 from total for mds daemon
> DEBUG    cephadm.autotune:autotune.py:42 new total: 9438134272
> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 4294967296 from total for mgr daemon
> DEBUG    cephadm.autotune:autotune.py:52 new total: 5143166976
> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for mon daemon
> DEBUG    cephadm.autotune:autotune.py:52 new total: 4069425152
> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for node-exporter daemon
> DEBUG    cephadm.autotune:autotune.py:52 new total: 2995683328
> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for prometheus daemon
> DEBUG    cephadm.autotune:autotune.py:52 new total: 1921941504
> DEBUG    cephadm.autotune:autotune.py:66 Final total is 1921941504 to be split among 4 OSDs
> DEBUG    cephadm.autotune:autotune.py:68 Result is 480485376 per OSD
>
> My understanding is, given a starting memory_total_kb of 32827840, we get
> 33615708160 total bytes. We multiply that by the 0.7 autotune ratio to get
> 23530995712 bytes to be split among the daemons (something like 23-24 GB).
> Then the mgr and mds daemons each get 4GB; the mon, grafana, node-exporter,
> and prometheus daemons each take 1GB; and the crash daemon gets 128MB. That
> leaves us with only about 2GB to split among the 4 OSDs. That's how we
> arrive at that "480485376" number per OSD from the original error message
> you posted.
>
>> Unable to set osd_memory_target on my-ceph01 to 480485376: error parsing
>> value: Value '480485376' is below minimum 939524096
>
>
> As that value is well below the minimum (it's only about half a GB), it
> reports that error when trying to set it.
>
> On Tue, Apr 9, 2024 at 12:58 PM Mads Aasted  wrote:
>
>> Hi Adam
>>
>> It seems the mds_cache_memory_limit, both set globally through cephadm and
>> on the host's mds daemons, is approx. 4 GB:
>> root@my-ceph01:/# ceph config get mds mds_cache_memory_limit
>> 4294967296
>> Same if I query the individual mds daemons running on my-ceph01, or any of
>> the other mds daemons on the other hosts.
>>
>> On Tue, Apr 9, 2024 at 6:14 PM Mads Aasted  wrote:
>>
>>> Hi Adam
>>>
>>> Let me just finish tucking in a devilish tyke here and I'll get to it
>>> first thing
>>>
>>> On Tue, Apr 9, 2024 at 6:09 PM Adam King  wrote:
>>>
 I did end up writing a unit test to see what we calculated here, as
 well as adding a bunch of debug logging (haven't created a PR yet, but
 probably will).  The total memory was set to (19858056 * 1024 * 0.7) (total
 memory in bytes * the autotune target ratio) = 14234254540. What ended up
 getting logged was (ignore the daemon id for the daemons, they don't affect
 anything. Only the types matter)

DEBUG    cephadm.autotune:autotune.py:35 Autotuning OSD memory with given parameters:
Total memory: 14234254540
Daemons: [(crash.a), (grafana.a), (mds.a), (mds.b), (mds.c), (mgr.a), (mon.a), (node-exporter.a), (osd.1), (osd.2), (osd.3), (osd.4), (prometheus.a)]
DEBUG    cephadm.autotune:autotune.py:50 Subtracting 134217728 from total for crash daemon
DEBUG    cephadm.autotune:autotune.py:52 new total: 14100036812
DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for grafana daemon
DEBUG    cephadm.autotune:autotune.py:52 new total: 13026294988
DEBUG    cephadm.autotune:autotune.py:40 Subtracting 17179869184 from 

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-04-09 Thread Adam King
The same experiment with the mds daemons pulling 4GB instead of the 16GB,
and me fixing the starting total memory (I accidentally used the
memory_available_kb instead of memory_total_kb the first time) gives us

DEBUG    cephadm.autotune:autotune.py:35 Autotuning OSD memory with given parameters:
Total memory: 23530995712
Daemons: [(crash.a), (grafana.a), (mds.a), (mds.b), (mds.c), (mgr.a), (mon.a), (node-exporter.a), (osd.1), (osd.2), (osd.3), (osd.4), (prometheus.a)]
DEBUG    cephadm.autotune:autotune.py:50 Subtracting 134217728 from total for crash daemon
DEBUG    cephadm.autotune:autotune.py:52 new total: 23396777984
DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for grafana daemon
DEBUG    cephadm.autotune:autotune.py:52 new total: 22323036160
DEBUG    cephadm.autotune:autotune.py:40 Subtracting 4294967296 from total for mds daemon
DEBUG    cephadm.autotune:autotune.py:42 new total: 18028068864
DEBUG    cephadm.autotune:autotune.py:40 Subtracting 4294967296 from total for mds daemon
DEBUG    cephadm.autotune:autotune.py:42 new total: 13733101568
DEBUG    cephadm.autotune:autotune.py:40 Subtracting 4294967296 from total for mds daemon
DEBUG    cephadm.autotune:autotune.py:42 new total: 9438134272
DEBUG    cephadm.autotune:autotune.py:50 Subtracting 4294967296 from total for mgr daemon
DEBUG    cephadm.autotune:autotune.py:52 new total: 5143166976
DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for mon daemon
DEBUG    cephadm.autotune:autotune.py:52 new total: 4069425152
DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for node-exporter daemon
DEBUG    cephadm.autotune:autotune.py:52 new total: 2995683328
DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for prometheus daemon
DEBUG    cephadm.autotune:autotune.py:52 new total: 1921941504
DEBUG    cephadm.autotune:autotune.py:66 Final total is 1921941504 to be split among 4 OSDs
DEBUG    cephadm.autotune:autotune.py:68 Result is 480485376 per OSD

My understanding is, given a starting memory_total_kb of 32827840, we get
33615708160 total bytes. We multiply that by the 0.7 autotune ratio to get
23530995712 bytes to be split among the daemons (something like 23-24 GB).
Then the mgr and mds daemons each get 4GB; the mon, grafana, node-exporter,
and prometheus daemons each take 1GB; and the crash daemon gets 128MB. That
leaves us with only about 2GB to split among the 4 OSDs. That's how we arrive
at that "480485376" number per OSD from the original error message you posted.

> Unable to set osd_memory_target on my-ceph01 to 480485376: error parsing
> value: Value '480485376' is below minimum 939524096


As that value is well below the minimum (it's only about half a GB), it
reports that error when trying to set it.
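For anyone who wants to retrace the arithmetic, here is a minimal sketch of the calculation described above (not the actual cephadm code; the per-daemon reservations are simply the ones visible in the debug log):

memory_total_kb = 32827840                     # from cephadm gather-facts
total = int(memory_total_kb * 1024 * 0.7)      # 23530995712, after the autotune ratio

reservations = {                               # per the debug log above
    'crash': 134217728,
    'grafana': 1073741824,
    'mds.a': 4294967296,                       # mds_cache_memory_limit
    'mds.b': 4294967296,
    'mds.c': 4294967296,
    'mgr': 4294967296,
    'mon': 1073741824,
    'node-exporter': 1073741824,
    'prometheus': 1073741824,
}
total -= sum(reservations.values())            # 1921941504 left for the OSDs
print(total // 4)                              # 480485376, below the 939524096 minimum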

On Tue, Apr 9, 2024 at 12:58 PM Mads Aasted  wrote:

> Hi Adam
>
> It seems the mds_cache_memory_limit, both set globally through cephadm and
> on the host's mds daemons, is approx. 4 GB:
> root@my-ceph01:/# ceph config get mds mds_cache_memory_limit
> 4294967296
> Same if I query the individual mds daemons running on my-ceph01, or any of
> the other mds daemons on the other hosts.
>
> On Tue, Apr 9, 2024 at 6:14 PM Mads Aasted  wrote:
>
>> Hi Adam
>>
>> Let me just finish tucking in a devilish tyke here and I'll get to it
>> first thing
>>
>> On Tue, Apr 9, 2024 at 6:09 PM Adam King  wrote:
>>
>>> I did end up writing a unit test to see what we calculated here, as well
>>> as adding a bunch of debug logging (haven't created a PR yet, but probably
>>> will).  The total memory was set to (19858056 * 1024 * 0.7) (total memory
>>> in bytes * the autotune target ratio) = 14234254540. What ended up getting
>>> logged was (ignore the daemon id for the daemons, they don't affect
>>> anything. Only the types matter)
>>>
>>> DEBUG    cephadm.autotune:autotune.py:35 Autotuning OSD memory with given parameters:
>>> Total memory: 14234254540
>>> Daemons: [(crash.a), (grafana.a), (mds.a), (mds.b), (mds.c), (mgr.a), (mon.a), (node-exporter.a), (osd.1), (osd.2), (osd.3), (osd.4), (prometheus.a)]
>>> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 134217728 from total for crash daemon
>>> DEBUG    cephadm.autotune:autotune.py:52 new total: 14100036812
>>> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for grafana daemon
>>> DEBUG    cephadm.autotune:autotune.py:52 new total: 13026294988
>>> DEBUG    cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
>>> DEBUG    cephadm.autotune:autotune.py:42 new total: -4153574196
>>> DEBUG    cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
>>> DEBUG    cephadm.autotune:autotune.py:42 new total: -21333443380
>>> DEBUG    cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
>>> DEBUG    cephadm.autotune:autotune.py:42 new total: 

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-04-09 Thread Mads Aasted
Hi Adam

It seems the mds_cache_memory_limit, both set globally through cephadm and on
the host's mds daemons, is approx. 4 GB:
root@my-ceph01:/# ceph config get mds mds_cache_memory_limit
4294967296
Same if I query the individual mds daemons running on my-ceph01, or any of
the other mds daemons on the other hosts.

On Tue, Apr 9, 2024 at 6:14 PM Mads Aasted  wrote:

> Hi Adam
>
> Let me just finish tucking in a devilish tyke here and I'll get to it first
> thing
>
> On Tue, Apr 9, 2024 at 6:09 PM Adam King  wrote:
>
>> I did end up writing a unit test to see what we calculated here, as well
>> as adding a bunch of debug logging (haven't created a PR yet, but probably
>> will).  The total memory was set to (19858056 * 1024 * 0.7) (total memory
>> in bytes * the autotune target ratio) = 14234254540. What ended up getting
>> logged was (ignore the daemon id for the daemons, they don't affect
>> anything. Only the types matter)
>>
>> DEBUG    cephadm.autotune:autotune.py:35 Autotuning OSD memory with given parameters:
>> Total memory: 14234254540
>> Daemons: [(crash.a), (grafana.a), (mds.a), (mds.b), (mds.c), (mgr.a), (mon.a), (node-exporter.a), (osd.1), (osd.2), (osd.3), (osd.4), (prometheus.a)]
>> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 134217728 from total for crash daemon
>> DEBUG    cephadm.autotune:autotune.py:52 new total: 14100036812
>> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for grafana daemon
>> DEBUG    cephadm.autotune:autotune.py:52 new total: 13026294988
>> DEBUG    cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
>> DEBUG    cephadm.autotune:autotune.py:42 new total: -4153574196
>> DEBUG    cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
>> DEBUG    cephadm.autotune:autotune.py:42 new total: -21333443380
>> DEBUG    cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
>> DEBUG    cephadm.autotune:autotune.py:42 new total: -38513312564
>> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 4294967296 from total for mgr daemon
>> DEBUG    cephadm.autotune:autotune.py:52 new total: -42808279860
>> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for mon daemon
>> DEBUG    cephadm.autotune:autotune.py:52 new total: -43882021684
>> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for node-exporter daemon
>> DEBUG    cephadm.autotune:autotune.py:52 new total: -44955763508
>> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for prometheus daemon
>> DEBUG    cephadm.autotune:autotune.py:52 new total: -46029505332
>>
>> It looks like it was taking pretty much all the memory away for the mds
>> daemons. The amount, however, is taken from the "mds_cache_memory_limit"
>> setting for each mds daemon. The number it was defaulting to for the test
>> is quite large. I guess I'd need to know what that comes out to for the mds
>> daemons in your cluster to get a full picture. Also, you can see the total
>> go well into the negatives here. When that happens cephadm just tries to
>> remove the osd_memory_target config settings for the OSDs on the host, but
>> given the error message from your initial post, it must be getting some
>> positive value when actually running on your system.
>>
>> On Fri, Apr 5, 2024 at 2:21 AM Mads Aasted  wrote:
>>
>>> Hi Adam
>>> No problem, i really appreciate your input :)
>>> The memory stats returned are as follows
>>>   "memory_available_kb": 19858056,
>>>   "memory_free_kb": 277480,
>>>   "memory_total_kb": 32827840,
>>>
>>> On Thu, Apr 4, 2024 at 10:14 PM Adam King  wrote:
>>>
 Sorry to keep asking for more info, but can I also get what `cephadm
 gather-facts` on that host returns for "memory_total_kb". Might end up
 creating a unit test out of this case if we have a calculation bug here.

 On Thu, Apr 4, 2024 at 4:05 PM Mads Aasted  wrote:

> sorry for the double send, forgot to hit reply all so it would appear
> on the page
>
> Hi Adam
>
> If we multiply by 0.7, and work through the previous example from that
> number, we would still arrive at roughly 2.5 gb for each osd. And the host
> in question is trying to set it to less than 500mb.
> I have attached a list of the processes running on the host. Currently
> you can even see that the OSD's are taking up the most memory by far, and
> at least 5x its proposed minimum.
> root@my-ceph01:/# ceph orch ps | grep my-ceph01
> crash.my-ceph01   my-ceph01   running (3w)
>  7m ago  13M9052k-  17.2.6
> grafana.my-ceph01 my-ceph01  *:3000   running (3w)
>  7m ago  13M95.6M-  8.3.5
> mds.testfs.my-ceph01.xjxfzd  my-ceph01   running (3w)
>  7m ago  10M 485M-  17.2.6
> 

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-04-09 Thread Mads Aasted
Hi Adam

Let me just finish tucking in a devilish tyke here and I'll get to it first
thing

On Tue, Apr 9, 2024 at 6:09 PM Adam King  wrote:

> I did end up writing a unit test to see what we calculated here, as well
> as adding a bunch of debug logging (haven't created a PR yet, but probably
> will).  The total memory was set to (19858056 * 1024 * 0.7) (total memory
> in bytes * the autotune target ratio) = 14234254540. What ended up getting
> logged was (ignore the daemon id for the daemons, they don't affect
> anything. Only the types matter)
>
> DEBUG    cephadm.autotune:autotune.py:35 Autotuning OSD memory with given parameters:
> Total memory: 14234254540
> Daemons: [(crash.a), (grafana.a), (mds.a), (mds.b), (mds.c), (mgr.a), (mon.a), (node-exporter.a), (osd.1), (osd.2), (osd.3), (osd.4), (prometheus.a)]
> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 134217728 from total for crash daemon
> DEBUG    cephadm.autotune:autotune.py:52 new total: 14100036812
> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for grafana daemon
> DEBUG    cephadm.autotune:autotune.py:52 new total: 13026294988
> DEBUG    cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
> DEBUG    cephadm.autotune:autotune.py:42 new total: -4153574196
> DEBUG    cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
> DEBUG    cephadm.autotune:autotune.py:42 new total: -21333443380
> DEBUG    cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
> DEBUG    cephadm.autotune:autotune.py:42 new total: -38513312564
> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 4294967296 from total for mgr daemon
> DEBUG    cephadm.autotune:autotune.py:52 new total: -42808279860
> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for mon daemon
> DEBUG    cephadm.autotune:autotune.py:52 new total: -43882021684
> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for node-exporter daemon
> DEBUG    cephadm.autotune:autotune.py:52 new total: -44955763508
> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for prometheus daemon
> DEBUG    cephadm.autotune:autotune.py:52 new total: -46029505332
>
> It looks like it was taking pretty much all the memory away for the mds
> daemons. The amount, however, is taken from the "mds_cache_memory_limit"
> setting for each mds daemon. The number it was defaulting to for the test
> is quite large. I guess I'd need to know what that comes out to for the mds
> daemons in your cluster to get a full picture. Also, you can see the total
> go well into the negatives here. When that happens cephadm just tries to
> remove the osd_memory_target config settings for the OSDs on the host, but
> given the error message from your initial post, it must be getting some
> positive value when actually running on your system.
>
> On Fri, Apr 5, 2024 at 2:21 AM Mads Aasted  wrote:
>
>> Hi Adam
>> No problem, i really appreciate your input :)
>> The memory stats returned are as follows
>>   "memory_available_kb": 19858056,
>>   "memory_free_kb": 277480,
>>   "memory_total_kb": 32827840,
>>
>> On Thu, Apr 4, 2024 at 10:14 PM Adam King  wrote:
>>
>>> Sorry to keep asking for more info, but can I also get what `cephadm
>>> gather-facts` on that host returns for "memory_total_kb". Might end up
>>> creating a unit test out of this case if we have a calculation bug here.
>>>
>>> On Thu, Apr 4, 2024 at 4:05 PM Mads Aasted  wrote:
>>>
 sorry for the double send, forgot to hit reply all so it would appear
 on the page

 Hi Adam

 If we multiply by 0.7, and work through the previous example from that
 number, we would still arrive at roughly 2.5 gb for each osd. And the host
 in question is trying to set it to less than 500mb.
 I have attached a list of the processes running on the host. Currently
 you can even see that the OSD's are taking up the most memory by far, and
 at least 5x its proposed minimum.
 root@my-ceph01:/# ceph orch ps | grep my-ceph01
 crash.my-ceph01   my-ceph01   running (3w)
  7m ago  13M9052k-  17.2.6
 grafana.my-ceph01 my-ceph01  *:3000   running (3w)
  7m ago  13M95.6M-  8.3.5
 mds.testfs.my-ceph01.xjxfzd  my-ceph01   running (3w)
  7m ago  10M 485M-  17.2.6
 mds.prodfs.my-ceph01.rplvac   my-ceph01   running (3w)
  7m ago  12M26.9M-  17.2.6
 mds.prodfs.my-ceph01.twikzdmy-ceph01   running (3w)
  7m ago  12M26.2M-  17.2.6
 mgr.my-ceph01.rxdefe  my-ceph01  *:8443,9283  running (3w)
  7m ago  13M 907M-  17.2.6
 mon.my-ceph01 my-ceph01   running (3w)
  7m ago  13M 503M2048M  17.2.6
 node-exporter.my-ceph01   

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-04-09 Thread Adam King
I did end up writing a unit test to see what we calculated here, as well as
adding a bunch of debug logging (haven't created a PR yet, but probably
will).  The total memory was set to (19858056 * 1024 * 0.7) (total memory
in bytes * the autotune target ratio) = 14234254540. What ended up getting
logged was (ignore the daemon id for the daemons, they don't affect
anything. Only the types matter)

DEBUG    cephadm.autotune:autotune.py:35 Autotuning OSD memory with given parameters:
Total memory: 14234254540
Daemons: [(crash.a), (grafana.a), (mds.a), (mds.b), (mds.c), (mgr.a), (mon.a), (node-exporter.a), (osd.1), (osd.2), (osd.3), (osd.4), (prometheus.a)]
DEBUG    cephadm.autotune:autotune.py:50 Subtracting 134217728 from total for crash daemon
DEBUG    cephadm.autotune:autotune.py:52 new total: 14100036812
DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for grafana daemon
DEBUG    cephadm.autotune:autotune.py:52 new total: 13026294988
DEBUG    cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
DEBUG    cephadm.autotune:autotune.py:42 new total: -4153574196
DEBUG    cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
DEBUG    cephadm.autotune:autotune.py:42 new total: -21333443380
DEBUG    cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
DEBUG    cephadm.autotune:autotune.py:42 new total: -38513312564
DEBUG    cephadm.autotune:autotune.py:50 Subtracting 4294967296 from total for mgr daemon
DEBUG    cephadm.autotune:autotune.py:52 new total: -42808279860
DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for mon daemon
DEBUG    cephadm.autotune:autotune.py:52 new total: -43882021684
DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for node-exporter daemon
DEBUG    cephadm.autotune:autotune.py:52 new total: -44955763508
DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for prometheus daemon
DEBUG    cephadm.autotune:autotune.py:52 new total: -46029505332

It looks like it was taking pretty much all the memory away for the mds
daemons. The amount, however, is taken from the "mds_cache_memory_limit"
setting for each mds daemon. The number it was defaulting to for the test
is quite large. I guess I'd need to know what that comes out to for the mds
daemons in your cluster to get a full picture. Also, you can see the total
go well into the negatives here. When that happens cephadm just tries to
remove the osd_memory_target config settings for the OSDs on the host, but
given the error message from your initial post, it must be getting some
positive value when actually running on your system.
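A small sketch of that behaviour (not the real autotune code; the 16 GiB figure is the default mds_cache_memory_limit used in the unit test above):

def autotune_sketch(total_bytes, reservations, num_osds):
    # Subtract every non-OSD daemon's reservation; if nothing sensible is left,
    # return None so the caller can drop the osd_memory_target override instead.
    for amount in reservations:
        total_bytes -= amount
    if num_osds == 0 or total_bytes <= 0:
        return None
    return total_bytes // num_osds

budget = int(19858056 * 1024 * 0.7)            # 14234254540, as in the test
reservations = [128 * 2**20, 1024 * 2**20] + [16 * 2**30] * 3 \
             + [4096 * 2**20] + [1024 * 2**20] * 3
print(autotune_sketch(budget, reservations, num_osds=4))   # None -> config removed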

On Fri, Apr 5, 2024 at 2:21 AM Mads Aasted  wrote:

> Hi Adam
> No problem, i really appreciate your input :)
> The memory stats returned are as follows
>   "memory_available_kb": 19858056,
>   "memory_free_kb": 277480,
>   "memory_total_kb": 32827840,
>
> On Thu, Apr 4, 2024 at 10:14 PM Adam King  wrote:
>
>> Sorry to keep asking for more info, but can I also get what `cephadm
>> gather-facts` on that host returns for "memory_total_kb". Might end up
>> creating a unit test out of this case if we have a calculation bug here.
>>
>> On Thu, Apr 4, 2024 at 4:05 PM Mads Aasted  wrote:
>>
>>> sorry for the double send, forgot to hit reply all so it would appear on
>>> the page
>>>
>>> Hi Adam
>>>
>>> If we multiply by 0.7, and work through the previous example from that
>>> number, we would still arrive at roughly 2.5 gb for each osd. And the host
>>> in question is trying to set it to less than 500mb.
>>> I have attached a list of the processes running on the host. Currently
>>> you can even see that the OSD's are taking up the most memory by far, and
>>> at least 5x its proposed minimum.
>>> root@my-ceph01:/# ceph orch ps | grep my-ceph01
>>> crash.my-ceph01   my-ceph01   running (3w)
>>>  7m ago  13M9052k-  17.2.6
>>> grafana.my-ceph01 my-ceph01  *:3000   running (3w)
>>>  7m ago  13M95.6M-  8.3.5
>>> mds.testfs.my-ceph01.xjxfzd  my-ceph01   running (3w)
>>>  7m ago  10M 485M-  17.2.6
>>> mds.prodfs.my-ceph01.rplvac   my-ceph01   running (3w)
>>>  7m ago  12M26.9M-  17.2.6
>>> mds.prodfs.my-ceph01.twikzdmy-ceph01   running (3w)
>>>  7m ago  12M26.2M-  17.2.6
>>> mgr.my-ceph01.rxdefe  my-ceph01  *:8443,9283  running (3w)
>>>  7m ago  13M 907M-  17.2.6
>>> mon.my-ceph01 my-ceph01   running (3w)
>>>  7m ago  13M 503M2048M  17.2.6
>>> node-exporter.my-ceph01   my-ceph01  *:9100   running (3w)
>>>  7m ago  13M20.4M-  1.5.0
>>> osd.3my-ceph01   running (3w)
>>>7m ago  11M2595M4096M  17.2.6
>>> osd.5my-ceph01   running (3w)
>>>7m ago  11M

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-04-05 Thread Mads Aasted
Hi Adam
No problem, I really appreciate your input :)
The memory stats returned are as follows
  "memory_available_kb": 19858056,
  "memory_free_kb": 277480,
  "memory_total_kb": 32827840,

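A quick side-by-side of those two fields against the 0.7 ratio shows how much the choice of starting value matters, and matches the mix-up with memory_available_kb vs memory_total_kb mentioned in Adam's follow-up (a rough check, not cephadm code):

memory_available_kb = 19858056
memory_total_kb = 32827840

print(int(memory_available_kb * 1024 * 0.7))   # 14234254540 (~14.2 GB budget)
print(int(memory_total_kb * 1024 * 0.7))       # 23530995712 (~23.5 GB budget)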
On Thu, Apr 4, 2024 at 10:14 PM Adam King  wrote:

> Sorry to keep asking for more info, but can I also get what `cephadm
> gather-facts` on that host returns for "memory_total_kb". Might end up
> creating a unit test out of this case if we have a calculation bug here.
>
> On Thu, Apr 4, 2024 at 4:05 PM Mads Aasted  wrote:
>
>> sorry for the double send, forgot to hit reply all so it would appear on
>> the page
>>
>> Hi Adam
>>
>> If we multiply by 0.7, and work through the previous example from that
>> number, we would still arrive at roughly 2.5 gb for each osd. And the host
>> in question is trying to set it to less than 500mb.
>> I have attached a list of the processes running on the host. Currently
>> you can even see that the OSD's are taking up the most memory by far, and
>> at least 5x its proposed minimum.
>> root@my-ceph01:/# ceph orch ps | grep my-ceph01
>> NAME                         HOST       PORTS        STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION
>> crash.my-ceph01              my-ceph01               running (3w)  7m ago     13M  9052k    -        17.2.6
>> grafana.my-ceph01            my-ceph01  *:3000       running (3w)  7m ago     13M  95.6M    -        8.3.5
>> mds.testfs.my-ceph01.xjxfzd  my-ceph01               running (3w)  7m ago     10M  485M     -        17.2.6
>> mds.prodfs.my-ceph01.rplvac  my-ceph01               running (3w)  7m ago     12M  26.9M    -        17.2.6
>> mds.prodfs.my-ceph01.twikzd  my-ceph01               running (3w)  7m ago     12M  26.2M    -        17.2.6
>> mgr.my-ceph01.rxdefe         my-ceph01  *:8443,9283  running (3w)  7m ago     13M  907M     -        17.2.6
>> mon.my-ceph01                my-ceph01               running (3w)  7m ago     13M  503M     2048M    17.2.6
>> node-exporter.my-ceph01      my-ceph01  *:9100       running (3w)  7m ago     13M  20.4M    -        1.5.0
>> osd.3                        my-ceph01               running (3w)  7m ago     11M  2595M    4096M    17.2.6
>> osd.5                        my-ceph01               running (3w)  7m ago     11M  2494M    4096M    17.2.6
>> osd.6                        my-ceph01               running (3w)  7m ago     11M  2698M    4096M    17.2.6
>> osd.9                        my-ceph01               running (3w)  7m ago     11M  3364M    4096M    17.2.6
>> prometheus.my-ceph01         my-ceph01  *:9095       running (3w)  7m ago     13M  164M     -        2.42.0
>>
>>
>>
>>
>> On Thu, Mar 28, 2024 at 2:13 AM Adam King  wrote:
>>
>>>  I missed a step in the calculation. The total_memory_kb I mentioned
>>> earlier is also multiplied by the value of the
>>> mgr/cephadm/autotune_memory_target_ratio before doing the subtractions for
>>> all the daemons. That value defaults to 0.7. That might explain it seeming
>>> like it's getting a value lower than expected. Beyond that, I'd think I'd
>>> need a list of the daemon types and count on that host to try and work
>>> through what it's doing.
>>>
>>> On Wed, Mar 27, 2024 at 10:47 AM Mads Aasted  wrote:
>>>
 Hi Adam.

 So doing the calculations with what you are stating here I arrive at a
 total sum for all the listed processes at 13.3 (roughly) gb, for everything
 except the osds, leaving well in excess of +4gb for each OSD.
 Besides the mon daemon which i can tell on my host has a limit of 2gb ,
 none of the other daemons seem to have a limit set according to ceph orch
 ps. Then again, they are nowhere near the values stated in min_size_by_type
 that you list.
 Obviously yes, I could disable the auto tuning, but that would leave me
 none the wiser as to why this exact host is trying to do this.



 On Tue, Mar 26, 2024 at 10:20 PM Adam King  wrote:

> For context, the value the autotune goes with takes the value from
> `cephadm gather-facts` on the host (the "memory_total_kb" field) and then
> subtracts from that per daemon on the host according to
>
> min_size_by_type = {
> 'mds': 4096 * 1048576,
> 'mgr': 4096 * 1048576,
> 'mon': 1024 * 1048576,
> 'crash': 128 * 1048576,
> 'keepalived': 128 * 1048576,
> 'haproxy': 128 * 1048576,
> 'nvmeof': 4096 * 1048576,
> }
> default_size = 1024 * 1048576
>
> what's left is then divided by the number of OSDs on the host to
> arrive at the value. I'll also add, since it seems to be an issue on this
> particular host,  if you add the "_no_autotune_memory" label to the host,
> it will stop trying to do this on that host.
>
> On Mon, Mar 25, 2024 at 6:32 PM  wrote:
>
>> I have a virtual ceph cluster running 17.2.6 with 4 ubuntu 22.04
>> hosts in it, each with 4 OSD's attached. The first 2 servers hosting 
>> mgr's
>> have 32GB of RAM each, and the remaining have 24gb

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-04-04 Thread Adam King
Sorry to keep asking for more info, but can I also get what `cephadm
gather-facts` on that host returns for "memory_total_kb". Might end up
creating a unit test out of this case if we have a calculation bug here.

On Thu, Apr 4, 2024 at 4:05 PM Mads Aasted  wrote:

> sorry for the double send, forgot to hit reply all so it would appear on
> the page
>
> Hi Adam
>
> If we multiply by 0.7, and work through the previous example from that
> number, we would still arrive at roughly 2.5 gb for each osd. And the host
> in question is trying to set it to less than 500mb.
> I have attached a list of the processes running on the host. Currently you
> can even see that the OSD's are taking up the most memory by far, and at
> least 5x its proposed minimum.
> root@my-ceph01:/# ceph orch ps | grep my-ceph01
> NAME                         HOST       PORTS        STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION
> crash.my-ceph01              my-ceph01               running (3w)  7m ago     13M  9052k    -        17.2.6
> grafana.my-ceph01            my-ceph01  *:3000       running (3w)  7m ago     13M  95.6M    -        8.3.5
> mds.testfs.my-ceph01.xjxfzd  my-ceph01               running (3w)  7m ago     10M  485M     -        17.2.6
> mds.prodfs.my-ceph01.rplvac  my-ceph01               running (3w)  7m ago     12M  26.9M    -        17.2.6
> mds.prodfs.my-ceph01.twikzd  my-ceph01               running (3w)  7m ago     12M  26.2M    -        17.2.6
> mgr.my-ceph01.rxdefe         my-ceph01  *:8443,9283  running (3w)  7m ago     13M  907M     -        17.2.6
> mon.my-ceph01                my-ceph01               running (3w)  7m ago     13M  503M     2048M    17.2.6
> node-exporter.my-ceph01      my-ceph01  *:9100       running (3w)  7m ago     13M  20.4M    -        1.5.0
> osd.3                        my-ceph01               running (3w)  7m ago     11M  2595M    4096M    17.2.6
> osd.5                        my-ceph01               running (3w)  7m ago     11M  2494M    4096M    17.2.6
> osd.6                        my-ceph01               running (3w)  7m ago     11M  2698M    4096M    17.2.6
> osd.9                        my-ceph01               running (3w)  7m ago     11M  3364M    4096M    17.2.6
> prometheus.my-ceph01         my-ceph01  *:9095       running (3w)  7m ago     13M  164M     -        2.42.0
>
>
>
>
> On Thu, Mar 28, 2024 at 2:13 AM Adam King  wrote:
>
>>  I missed a step in the calculation. The total_memory_kb I mentioned
>> earlier is also multiplied by the value of the
>> mgr/cephadm/autotune_memory_target_ratio before doing the subtractions for
>> all the daemons. That value defaults to 0.7. That might explain it seeming
>> like it's getting a value lower than expected. Beyond that, I'd think I'd
>> need a list of the daemon types and count on that host to try and work
>> through what it's doing.
>>
>> On Wed, Mar 27, 2024 at 10:47 AM Mads Aasted  wrote:
>>
>>> Hi Adam.
>>>
>>> So doing the calculations with what you are stating here I arrive at a
>>> total sum for all the listed processes at 13.3 (roughly) gb, for everything
>>> except the osds, leaving well in excess of +4gb for each OSD.
>>> Besides the mon daemon which i can tell on my host has a limit of 2gb ,
>>> none of the other daemons seem to have a limit set according to ceph orch
>>> ps. Then again, they are nowhere near the values stated in min_size_by_type
>>> that you list.
>>> Obviously yes, I could disable the auto tuning, but that would leave me
>>> none the wiser as to why this exact host is trying to do this.
>>>
>>>
>>>
>>> On Tue, Mar 26, 2024 at 10:20 PM Adam King  wrote:
>>>
 For context, the value the autotune goes with takes the value from
 `cephadm gather-facts` on the host (the "memory_total_kb" field) and then
 subtracts from that per daemon on the host according to

 min_size_by_type = {
 'mds': 4096 * 1048576,
 'mgr': 4096 * 1048576,
 'mon': 1024 * 1048576,
 'crash': 128 * 1048576,
 'keepalived': 128 * 1048576,
 'haproxy': 128 * 1048576,
 'nvmeof': 4096 * 1048576,
 }
 default_size = 1024 * 1048576

 what's left is then divided by the number of OSDs on the host to arrive
 at the value. I'll also add, since it seems to be an issue on this
 particular host,  if you add the "_no_autotune_memory" label to the host,
 it will stop trying to do this on that host.

 On Mon, Mar 25, 2024 at 6:32 PM  wrote:

> I have a virtual ceph cluster running 17.2.6 with 4 ubuntu 22.04 hosts
> in it, each with 4 OSD's attached. The first 2 servers hosting mgr's have
> 32GB of RAM each, and the remaining have 24gb
> For some reason i am unable to identify, the first host in the cluster
> appears to constantly be trying to set the osd_memory_target variable to
> roughly half of what the calculated minimum is for the cluster, i see the
> following spamming the logs constantly
> Unable to set osd_memory_target on 

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-04-04 Thread Mads Aasted
sorry for the double send, forgot to hit reply all so it would appear on
the page

Hi Adam

If we multiply by 0.7 and work through the previous example from that number,
we would still arrive at roughly 2.5 GB for each OSD, yet the host in
question is trying to set it to less than 500 MB.
I have attached a list of the processes running on the host. Currently you
can even see that the OSDs are taking up the most memory by far, each using
at least 5x the proposed target.
root@my-ceph01:/# ceph orch ps | grep my-ceph01
NAME                         HOST       PORTS        STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION
crash.my-ceph01              my-ceph01               running (3w)  7m ago     13M  9052k    -        17.2.6
grafana.my-ceph01            my-ceph01  *:3000       running (3w)  7m ago     13M  95.6M    -        8.3.5
mds.testfs.my-ceph01.xjxfzd  my-ceph01               running (3w)  7m ago     10M  485M     -        17.2.6
mds.prodfs.my-ceph01.rplvac  my-ceph01               running (3w)  7m ago     12M  26.9M    -        17.2.6
mds.prodfs.my-ceph01.twikzd  my-ceph01               running (3w)  7m ago     12M  26.2M    -        17.2.6
mgr.my-ceph01.rxdefe         my-ceph01  *:8443,9283  running (3w)  7m ago     13M  907M     -        17.2.6
mon.my-ceph01                my-ceph01               running (3w)  7m ago     13M  503M     2048M    17.2.6
node-exporter.my-ceph01      my-ceph01  *:9100       running (3w)  7m ago     13M  20.4M    -        1.5.0
osd.3                        my-ceph01               running (3w)  7m ago     11M  2595M    4096M    17.2.6
osd.5                        my-ceph01               running (3w)  7m ago     11M  2494M    4096M    17.2.6
osd.6                        my-ceph01               running (3w)  7m ago     11M  2698M    4096M    17.2.6
osd.9                        my-ceph01               running (3w)  7m ago     11M  3364M    4096M    17.2.6
prometheus.my-ceph01         my-ceph01  *:9095       running (3w)  7m ago     13M  164M     -        2.42.0




On Thu, Mar 28, 2024 at 2:13 AM Adam King  wrote:

>  I missed a step in the calculation. The total_memory_kb I mentioned
> earlier is also multiplied by the value of the
> mgr/cephadm/autotune_memory_target_ratio before doing the subtractions for
> all the daemons. That value defaults to 0.7. That might explain it seeming
> like it's getting a value lower than expected. Beyond that, I'd think I'd
> need a list of the daemon types and count on that host to try and work
> through what it's doing.
>
> On Wed, Mar 27, 2024 at 10:47 AM Mads Aasted  wrote:
>
>> Hi Adam.
>>
>> So doing the calculations with what you are stating here I arrive at a
>> total sum for all the listed processes at 13.3 (roughly) gb, for everything
>> except the osds, leaving well in excess of +4gb for each OSD.
>> Besides the mon daemon which i can tell on my host has a limit of 2gb ,
>> none of the other daemons seem to have a limit set according to ceph orch
>> ps. Then again, they are nowhere near the values stated in min_size_by_type
>> that you list.
>> Obviously yes, I could disable the auto tuning, but that would leave me
>> none the wiser as to why this exact host is trying to do this.
>>
>>
>>
>> On Tue, Mar 26, 2024 at 10:20 PM Adam King  wrote:
>>
>>> For context, the value the autotune goes with takes the value from
>>> `cephadm gather-facts` on the host (the "memory_total_kb" field) and then
>>> subtracts from that per daemon on the host according to
>>>
>>> min_size_by_type = {
>>> 'mds': 4096 * 1048576,
>>> 'mgr': 4096 * 1048576,
>>> 'mon': 1024 * 1048576,
>>> 'crash': 128 * 1048576,
>>> 'keepalived': 128 * 1048576,
>>> 'haproxy': 128 * 1048576,
>>> 'nvmeof': 4096 * 1048576,
>>> }
>>> default_size = 1024 * 1048576
>>>
>>> what's left is then divided by the number of OSDs on the host to arrive
>>> at the value. I'll also add, since it seems to be an issue on this
>>> particular host,  if you add the "_no_autotune_memory" label to the host,
>>> it will stop trying to do this on that host.
>>>
>>> On Mon, Mar 25, 2024 at 6:32 PM  wrote:
>>>
 I have a virtual ceph cluster running 17.2.6 with 4 ubuntu 22.04 hosts
 in it, each with 4 OSD's attached. The first 2 servers hosting mgr's have
 32GB of RAM each, and the remaining have 24gb
 For some reason i am unable to identify, the first host in the cluster
 appears to constantly be trying to set the osd_memory_target variable to
 roughly half of what the calculated minimum is for the cluster, i see the
 following spamming the logs constantly
 Unable to set osd_memory_target on my-ceph01 to 480485376: error
 parsing value: Value '480485376' is below minimum 939524096
 Default is set to 4294967296.
 I did double check and osd_memory_base (805306368) +
 osd_memory_cache_min (134217728) adds up to minimum exactly
 osd_memory_target_autotune is currently enabled. But i cannot for the
 life of me figure out how it is arriving at 480485376 as a value for 

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-03-27 Thread Adam King
 I missed a step in the calculation. The total_memory_kb I mentioned
earlier is also multiplied by the value of the
mgr/cephadm/autotune_memory_target_ratio before doing the subtractions for
all the daemons. That value defaults to 0.7. That might explain it seeming
like it's getting a value lower than expected. Beyond that, I'd think I'd
need a list of the daemon types and count on that host to try and work
through what it's doing.

On Wed, Mar 27, 2024 at 10:47 AM Mads Aasted  wrote:

> Hi Adam.
>
> So doing the calculations with what you are stating here I arrive at a
> total sum for all the listed processes at 13.3 (roughly) gb, for everything
> except the osds, leaving well in excess of +4gb for each OSD.
> Besides the mon daemon which i can tell on my host has a limit of 2gb ,
> none of the other daemons seem to have a limit set according to ceph orch
> ps. Then again, they are nowhere near the values stated in min_size_by_type
> that you list.
> Obviously yes, I could disable the auto tuning, but that would leave me
> none the wiser as to why this exact host is trying to do this.
>
>
>
> On Tue, Mar 26, 2024 at 10:20 PM Adam King  wrote:
>
>> For context, the value the autotune goes with takes the value from
>> `cephadm gather-facts` on the host (the "memory_total_kb" field) and then
>> subtracts from that per daemon on the host according to
>>
>> min_size_by_type = {
>> 'mds': 4096 * 1048576,
>> 'mgr': 4096 * 1048576,
>> 'mon': 1024 * 1048576,
>> 'crash': 128 * 1048576,
>> 'keepalived': 128 * 1048576,
>> 'haproxy': 128 * 1048576,
>> 'nvmeof': 4096 * 1048576,
>> }
>> default_size = 1024 * 1048576
>>
>> what's left is then divided by the number of OSDs on the host to arrive
>> at the value. I'll also add, since it seems to be an issue on this
>> particular host,  if you add the "_no_autotune_memory" label to the host,
>> it will stop trying to do this on that host.
>>
>> On Mon, Mar 25, 2024 at 6:32 PM  wrote:
>>
>>> I have a virtual ceph cluster running 17.2.6 with 4 ubuntu 22.04 hosts
>>> in it, each with 4 OSD's attached. The first 2 servers hosting mgr's have
>>> 32GB of RAM each, and the remaining have 24gb
>>> For some reason i am unable to identify, the first host in the cluster
>>> appears to constantly be trying to set the osd_memory_target variable to
>>> roughly half of what the calculated minimum is for the cluster, i see the
>>> following spamming the logs constantly
>>> Unable to set osd_memory_target on my-ceph01 to 480485376: error parsing
>>> value: Value '480485376' is below minimum 939524096
>>> Default is set to 4294967296.
>>> I did double check and osd_memory_base (805306368) +
>>> osd_memory_cache_min (134217728) adds up to minimum exactly
>>> osd_memory_target_autotune is currently enabled. But i cannot for the
>>> life of me figure out how it is arriving at 480485376 as a value for that
>>> particular host, which even has the most RAM. Neither the cluster nor the host
>>> is even approaching max utilization on memory, so it's not like there are
>>> processes competing for resources.


[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-03-27 Thread Mads Aasted
Hi Adam.

Doing the calculations with what you are stating here, I arrive at a total
sum of roughly 13.3 GB for all the listed processes except the OSDs, leaving
well in excess of 4 GB for each OSD.
Besides the mon daemon, which I can tell has a 2 GB limit on my host, none of
the other daemons seem to have a limit set according to ceph orch ps. Then
again, they are nowhere near the values stated in the min_size_by_type that
you list.
Obviously yes, I could disable the autotuning, but that would leave me none
the wiser as to why this exact host is trying to do this.
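As a rough cross-check (not cephadm code), summing the reservations for the daemon set the ceph orch ps output elsewhere in this thread shows on my-ceph01 (three mds plus one each of crash, grafana, mgr, mon, node-exporter and prometheus) against min_size_by_type/default_size comes to about 20.1 GiB rather than 13.3 GiB, which accounts for most of the gap:

MiB = 1048576
min_size_by_type = {'mds': 4096 * MiB, 'mgr': 4096 * MiB,
                    'mon': 1024 * MiB, 'crash': 128 * MiB}
default_size = 1024 * MiB                      # grafana, node-exporter, prometheus, ...

daemons = ['crash', 'grafana', 'mds', 'mds', 'mds', 'mgr', 'mon',
           'node-exporter', 'prometheus']
reserved = sum(min_size_by_type.get(d, default_size) for d in daemons)
print(reserved, round(reserved / 2**30, 1))    # 21609054208 bytes, ~20.1 GiB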



On Tue, Mar 26, 2024 at 10:20 PM Adam King  wrote:

> For context, the value the autotune goes with takes the value from
> `cephadm gather-facts` on the host (the "memory_total_kb" field) and then
> subtracts from that per daemon on the host according to
>
> min_size_by_type = {
> 'mds': 4096 * 1048576,
> 'mgr': 4096 * 1048576,
> 'mon': 1024 * 1048576,
> 'crash': 128 * 1048576,
> 'keepalived': 128 * 1048576,
> 'haproxy': 128 * 1048576,
> 'nvmeof': 4096 * 1048576,
> }
> default_size = 1024 * 1048576
>
> what's left is then divided by the number of OSDs on the host to arrive at
> the value. I'll also add, since it seems to be an issue on this particular
> host,  if you add the "_no_autotune_memory" label to the host, it will stop
> trying to do this on that host.
>
> On Mon, Mar 25, 2024 at 6:32 PM  wrote:
>
>> I have a virtual ceph cluster running 17.2.6 with 4 ubuntu 22.04 hosts in
>> it, each with 4 OSD's attached. The first 2 servers hosting mgr's have 32GB
>> of RAM each, and the remaining have 24gb
>> For some reason i am unable to identify, the first host in the cluster
>> appears to constantly be trying to set the osd_memory_target variable to
>> roughly half of what the calculated minimum is for the cluster, i see the
>> following spamming the logs constantly
>> Unable to set osd_memory_target on my-ceph01 to 480485376: error parsing
>> value: Value '480485376' is below minimum 939524096
>> Default is set to 4294967296.
>> I did double check and osd_memory_base (805306368) + osd_memory_cache_min
>> (134217728) adds up to minimum exactly
>> osd_memory_target_autotune is currently enabled. But i cannot for the
>> life of me figure out how it is arriving at 480485376 as a value for that
>> particular host, which even has the most RAM. Neither the cluster nor the host
>> is even approaching max utilization on memory, so it's not like there are
>> processes competing for resources.


[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-03-26 Thread Adam King
For context, the value the autotune goes with takes the value from `cephadm
gather-facts` on the host (the "memory_total_kb" field) and then subtracts
from that per daemon on the host according to

min_size_by_type = {
'mds': 4096 * 1048576,
'mgr': 4096 * 1048576,
'mon': 1024 * 1048576,
'crash': 128 * 1048576,
'keepalived': 128 * 1048576,
'haproxy': 128 * 1048576,
'nvmeof': 4096 * 1048576,
}
default_size = 1024 * 1048576

what's left is then divided by the number of OSDs on the host to arrive at
the value. I'll also add, since it seems to be an issue on this particular
host,  if you add the "_no_autotune_memory" label to the host, it will stop
trying to do this on that host.
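Putting that description together, the per-host calculation is roughly the following sketch (the real logic lives in cephadm; the 0.7 default comes from mgr/cephadm/autotune_memory_target_ratio, as discussed elsewhere in the thread):

MiB = 1048576
min_size_by_type = {
    'mds': 4096 * MiB, 'mgr': 4096 * MiB, 'mon': 1024 * MiB,
    'crash': 128 * MiB, 'keepalived': 128 * MiB, 'haproxy': 128 * MiB,
    'nvmeof': 4096 * MiB,
}
default_size = 1024 * MiB

def per_osd_target(memory_total_kb, non_osd_daemon_types, num_osds, ratio=0.7):
    # Sketch of the autotune described above: apply the ratio, subtract each
    # non-OSD daemon's reservation, split the remainder across the OSDs.
    total = memory_total_kb * 1024 * ratio
    for daemon_type in non_osd_daemon_types:
        total -= min_size_by_type.get(daemon_type, default_size)
    return int(total) // num_osds if num_osds else None

# e.g. per_osd_target(32827840, ['crash', 'grafana', 'mds', 'mds', 'mds', 'mgr',
#                                'mon', 'node-exporter', 'prometheus'], 4) == 480485376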

On Mon, Mar 25, 2024 at 6:32 PM  wrote:

> I have a virtual ceph cluster running 17.2.6 with 4 ubuntu 22.04 hosts in
> it, each with 4 OSD's attached. The first 2 servers hosting mgr's have 32GB
> of RAM each, and the remaining have 24gb
> For some reason i am unable to identify, the first host in the cluster
> appears to constantly be trying to set the osd_memory_target variable to
> roughly half of what the calculated minimum is for the cluster, i see the
> following spamming the logs constantly
> Unable to set osd_memory_target on my-ceph01 to 480485376: error parsing
> value: Value '480485376' is below minimum 939524096
> Default is set to 4294967296.
> I did double check and osd_memory_base (805306368) + osd_memory_cache_min
> (134217728) adds up to minimum exactly
> osd_memory_target_autotune is currently enabled. But i cannot for the life
> of me figure out how it is arriving at 480485376 as a value for that
> particular host, which even has the most RAM. Neither the cluster nor the host
> is even approaching max utilization on memory, so it's not like there are
> processes competing for resources.