[ceph-users] Re: OSD Memory usage
Now one of my OSDs gets a segfault. Here is the full trace: https://paste.ubuntu.com/p/4KHcCG9YQx/

On Mon, Nov 23, 2020 at 2:16 AM Seena Fallah wrote:

> Hi all,
>
> After upgrading from 14.2.9 to 14.2.14 my OSDs are using much less memory
> than before! I give each OSD a 6 GB memory target; before the upgrade the
> node had 20 GB of free memory, and 24 hours after the upgrade it has 104 GB
> free out of 128 GB. My OSD latency has also increased!
> This happens in both the SSD and HDD tiers.
>
> Are there any notes on the upgrade that I missed? Is it related to
> bluefs_buffered_io?
> If BlueFS does direct I/O, shouldn't BlueFS/BlueStore use the targeted
> memory for its cache? Does that mean that before the upgrade the memory was
> used by the kernel to buffer the I/O rather than by ceph-osd itself?
>
> Thanks.
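(For anyone following along: a rough sketch of how to check the current values of those options on a live cluster. osd.0 stands in for any of your OSD IDs, and the "ceph daemon" commands have to run on the host where that OSD lives.)

  # Effective values on a running OSD, via its admin socket:
  ceph daemon osd.0 config get bluefs_buffered_io
  ceph daemon osd.0 config get osd_memory_target

  # The same options as stored in the cluster configuration database:
  ceph config get osd bluefs_buffered_io
  ceph config get osd osd_memory_target

  # Memory accounting by mempool inside the OSD (what its caches actually hold):
  ceph daemon osd.0 dump_mempools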
[ceph-users] Re: OSD Memory usage
Hi Seena,

just to note - this ticket might be relevant:

https://tracker.ceph.com/issues/48276

Mind leaving a comment there?

Thanks,

Igor

On 11/23/2020 2:51 AM, Seena Fallah wrote:
> Now one of my OSDs gets a segfault.
> Here is the full trace: https://paste.ubuntu.com/p/4KHcCG9YQx/
> [...]
[ceph-users] Re: OSD Memory usage
I added one OSD node to the cluster and now get 500 MB/s of throughput on my
disks, 2-3 times more than before, but my latency went up 5 times!
When I enable bluefs_buffered_io the disk throughput drops to 200 MB/s and my
latency comes back down.
Is there any kernel config/tuning that should be used to get correct latency
without bluefs buffered I/O?

On Mon, Nov 23, 2020 at 3:52 PM Igor Fedotov wrote:

> Hi Seena,
>
> just to note - this ticket might be relevant.
>
> https://tracker.ceph.com/issues/48276
>
> Mind leaving a comment there?
>
> Thanks,
>
> Igor
> [...]
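(For reference, a minimal sketch of flipping that option cluster-wide; depending on the release, the OSDs may need a restart before the change fully takes effect.)

  # Persist the setting in the cluster configuration database:
  ceph config set osd bluefs_buffered_io true

  # Try to apply it to already-running OSDs without a restart:
  ceph tell osd.* injectargs '--bluefs_buffered_io=true'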
[ceph-users] Re: OSD Memory usage
This is what happens with my cluster (screenshots attached). At 10:11 I turned
on bluefs_buffered_io on all my OSDs: latency came back down but throughput
decreased.

I had these configs on all OSDs during recovery:

  osd_max_backfills = 1
  osd_recovery_max_active = 1
  osd_recovery_op_priority = 1

Do you have any idea why latency is affected so much even with these
parameters?

On Tue, Nov 24, 2020 at 12:42 PM Seena Fallah wrote:

> I added one OSD node to the cluster and now get 500 MB/s of throughput on my
> disks, 2-3 times more than before, but my latency went up 5 times!
> When I enable bluefs_buffered_io the disk throughput drops to 200 MB/s and
> my latency comes back down.
> Is there any kernel config/tuning that should be used to get correct latency
> without bluefs buffered I/O?
> [...]
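(A sketch of the equivalent commands with the current option names, in case someone wants to reproduce this setup; the values simply mirror the ones listed above.)

  ceph config set osd osd_max_backfills 1
  ceph config set osd osd_recovery_max_active 1
  ceph config set osd osd_recovery_op_priority 1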
[ceph-users] Re: OSD memory usage after cephadm adoption
Hi Luis,

Can you do a "ceph tell osd.<id> perf dump" and "ceph daemon osd.<id>
dump_mempools"? Those should help us understand how much memory is being used
by different parts of the OSD/bluestore and how much memory the priority
cache thinks it has to work with.

Mark

On 7/11/23 4:57 AM, Luis Domingues wrote:
> Hi everyone,
>
> We recently migrated a cluster from ceph-ansible to cephadm. Everything went
> as expected, but now we have some alerts on high memory usage. The cluster
> is running ceph 16.2.13.
>
> Of course, after adoption the OSDs ended up in the zone:
>
> NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
> osd           88      7m ago     -
>
> But the weirdest thing I observed is that the OSDs seem to use more memory
> than the memory limit:
>
> NAME     HOST  PORTS  STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
> osd.0                 running (5d)  2m ago     5d   19.7G    6400M    16.2.13  327f301eff51  ca07fe74a0fa
> osd.1                 running (5d)  2m ago     5d   7068M    6400M    16.2.13  327f301eff51  6223ed8e34e9
> osd.10                running (5d)  10m ago    5d   7235M    6400M    16.2.13  327f301eff51  073ddc0d7391
> osd.100               running (5d)  2m ago     5d   7118M    6400M    16.2.13  327f301eff51  b7f9238c0c24
>
> Does anybody know why OSDs would use more memory than the limit?
>
> Thanks
>
> Luis Domingues
> Proton AG

--
Best Regards,
Mark Nelson
Head of R&D (USA)

Clyso GmbH
p: +49 89 21552391 12
a: Loristraße 8 | 80335 München | Germany
w: https://clyso.com | e: mark.nel...@clyso.com

We are hiring: https://www.clyso.com/jobs/
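(A rough sketch for collecting that across one host; the OSD IDs below are placeholders, and dump_mempools has to be run on the host where the OSD's admin socket lives.)

  # Substitute your own IDs, e.g. from: ceph osd ls-tree $(hostname -s)
  for id in 0 1 10 100; do
      ceph tell osd.$id perf dump       > osd.$id.perf_dump.json
      ceph daemon osd.$id dump_mempools > osd.$id.mempools.json
  done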
[ceph-users] Re: OSD memory usage after cephadm adoption
Here you have. Perf dump:

{
    "AsyncMessenger::Worker-0": {
        "msgr_recv_messages": 12239872,
        "msgr_send_messages": 12284221,
        "msgr_recv_bytes": 43759275160,
        "msgr_send_bytes": 61268769426,
        "msgr_created_connections": 754,
        "msgr_active_connections": 100,
        "msgr_running_total_time": 939.476931816,
        "msgr_running_send_time": 337.873686715,
        "msgr_running_recv_time": 360.728238752,
        "msgr_running_fast_dispatch_time": 183.737116872,
        "msgr_send_messages_queue_lat": { "avgcount": 12284206, "sum": 1538.989479364, "avgtime": 0.000125281 },
        "msgr_handle_ack_lat": { "avgcount": 5258403, "sum": 1.005075918, "avgtime": 0.000000191 }
    },
    "AsyncMessenger::Worker-1": {
        "msgr_recv_messages": 12099771,
        "msgr_send_messages": 12138795,
        "msgr_recv_bytes": 56967534605,
        "msgr_send_bytes": 130548664272,
        "msgr_created_connections": 647,
        "msgr_active_connections": 91,
        "msgr_running_total_time": 977.277996439,
        "msgr_running_send_time": 362.155959231,
        "msgr_running_recv_time": 365.376281473,
        "msgr_running_fast_dispatch_time": 191.186643292,
        "msgr_send_messages_queue_lat": { "avgcount": 12138818, "sum": 1557.187685700, "avgtime": 0.000128281 },
        "msgr_handle_ack_lat": { "avgcount": 6155265, "sum": 1.096270527, "avgtime": 0.000000178 }
    },
    "AsyncMessenger::Worker-2": {
        "msgr_recv_messages": 11858354,
        "msgr_send_messages": 11960404,
        "msgr_recv_bytes": 60727084610,
        "msgr_send_bytes": 168534726650,
        "msgr_created_connections": 1043,
        "msgr_active_connections": 103,
        "msgr_running_total_time": 937.324084772,
        "msgr_running_send_time": 351.174710644,
        "msgr_running_recv_time": 2744.276782474,
        "msgr_running_fast_dispatch_time": 172.960322050,
        "msgr_send_messages_queue_lat": { "avgcount": 11960392, "sum": 1763.762581924, "avgtime": 0.000147466 },
        "msgr_handle_ack_lat": { "avgcount": 2651457, "sum": 0.538495450, "avgtime": 0.000000203 }
    },
    "bluefs": {
        "db_total_bytes": 128005955584,
        "db_used_bytes": 3271557120,
        "wal_total_bytes": 0,
        "wal_used_bytes": 0,
        "slow_total_bytes": 1810377216,
        "slow_used_bytes": 0,
        "num_files": 70,
        "log_bytes": 13045760,
        "log_compactions": 58,
        "logged_bytes": 922333184,
        "files_written_wal": 2,
        "files_written_sst": 13,
        "bytes_written_wal": 1988489216,
        "bytes_written_sst": 268890112,
        "bytes_written_slow": 0,
        "max_bytes_wal": 0,
        "max_bytes_db": 3271557120,
        "max_bytes_slow": 0,
        "read_random_count": 577484,
        "read_random_bytes": 2879541532,
        "read_random_disk_count": 284290,
        "read_random_disk_bytes": 1540394118,
        "read_random_buffer_count": 319088,
        "read_random_buffer_bytes": 1339147414,
        "read_count": 1086625,
        "read_bytes": 15054317429,
        "read_prefetch_count": 1069462,
        "read_prefetch_bytes": 14506469332,
        "read_zeros_candidate": 0,
        "read_zeros_errors": 0
    },
    "bluestore": {
        "kv_flush_lat": { "avgcount": 225099, "sum": 526.605165277, "avgtime": 0.002339438 },
        "kv_commit_lat": { "avgcount": 225099, "sum": 61.412175620, "avgtime": 0.000272822 },
        "kv_sync_lat": { "avgcount": 225099, "sum": 588.017340897, "avgtime": 0.002612261 },
        "kv_final_lat": { "avgcount": 225096, "sum": 6.516869320, "avgtime": 0.000028951 },
        "state_prepare_lat": { "avgcount": 241063, "sum": 173.705759592, "avgtime": 0.000720582 },
        "state_aio_wait_lat": { "avgcount": 241063, "sum": 1008.936150524, "avgtime": 0.004185362 },
        "state_io_done_lat": { "avgcount": 241063, "sum": 2.923457351, "avgtime": 0.000012127 },
        "state_kv_queued_lat": { "avgcount": 241063, "sum": 560.050193021, "avgtime": 0.002323252 },
        "state_kv_commiting_lat": { "avgcount": 241063, "sum": 68.355225981, "avgtime": 0.000283557 },
        "state_kv_done_lat": { "avgcount": 241063, "sum": 0.097836444, "avgtime": 0.000000405 },
        "state_deferred_queued_lat": { "avgcount": 47230,
[ceph-users] Re: OSD memory usage after cephadm adoption
On 7/11/23 09:44, Luis Domingues wrote:
> "bluestore-pricache": {
>     "target_bytes": 6713193267,
>     "mapped_bytes": 6718742528,
>     "unmapped_bytes": 467025920,
>     "heap_bytes": 7185768448,
>     "cache_bytes": 4161537138
> },

Hi Luis,

Looks like the mapped bytes for this OSD process were very close to (just a
little over) the target bytes at the moment you took the perf dump. There is
some unmapped memory that can be reclaimed by the kernel, but we can't force
the kernel to reclaim it. It could be that the kernel is being a little lazy
if there isn't memory pressure.

The way the memory autotuning works in Ceph is that the prioritycache system
periodically looks at the mapped memory usage of the process, then
grows/shrinks the aggregate size of the in-memory caches to try to stay near
the target. It's reactive in nature, meaning that it can't completely control
for spikes. It also can't shrink the caches below a small minimum size, so if
there is a memory leak it will help to an extent but can't completely fix it.
Once the aggregate memory size is decided on, it goes through a process of
looking at how hot the different caches are and assigns memory based on where
it thinks the memory would be most useful. Again, this is based on mapped
memory; it can't force the kernel to reclaim memory that has already been
released.

Thanks,

Mark

--
Best Regards,
Mark Nelson
Head of R&D (USA)

Clyso GmbH
p: +49 89 21552391 12
a: Loristraße 8 | 80335 München | Germany
w: https://clyso.com | e: mark.nel...@clyso.com

We are hiring: https://www.clyso.com/jobs/
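(If it helps, a small sketch for watching those numbers move over time; it assumes jq is installed, is run on the OSD's host, and uses osd.0 as a placeholder for the OSD being inspected.)

  # Print the priority-cache view of memory every 30 seconds:
  while true; do
      ceph daemon osd.0 perf dump | jq '."bluestore-pricache"'
      sleep 30
  done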
[ceph-users] Re: OSD memory usage after cephadm adoption
Hi,

Thanks for your hints. I tried to play a little bit with the configs, and now
I want to make the 0.7 ratio the default. So I configured ceph:

  mgr  advanced  mgr/cephadm/autotune_memory_target_ratio  0.70  *
  osd  advanced  osd_memory_target_autotune                true

And I ended up having these configs:

  osd  host:st10-cbosd-001  basic  osd_memory_target  7219293672
  osd  host:st10-cbosd-002  basic  osd_memory_target  7219293672
  osd  host:st10-cbosd-004  basic  osd_memory_target  7219293672
  osd  host:st10-cbosd-005  basic  osd_memory_target  7219293451
  osd  host:st10-cbosd-006  basic  osd_memory_target  7219293451
  osd  host:st11-cbosd-007  basic  osd_memory_target  7216821484
  osd  host:st11-cbosd-008  basic  osd_memory_target  7216825454

And running a ceph orch ps gives me:

  osd.0    st11-cbosd-007.plabs.ch  running (2d)   10m ago  10d  25.8G  6882M  16.2.13  327f301eff51  29a075f2f925
  osd.1    st10-cbosd-001.plabs.ch  running (19m)  8m ago   10d  2115M  6884M  16.2.13  327f301eff51  df5067bde5ce
  osd.10   st10-cbosd-005.plabs.ch  running (2d)   10m ago  10d  5524M  6884M  16.2.13  327f301eff51  f7bc0641ee46
  osd.100  st11-cbosd-008.plabs.ch  running (2d)   10m ago  10d  5234M  6882M  16.2.13  327f301eff51  74efa243b953
  osd.101  st11-cbosd-008.plabs.ch  running (2d)   10m ago  10d  4741M  6882M  16.2.13  327f301eff51  209671007c65
  osd.102  st11-cbosd-008.plabs.ch  running (2d)   10m ago  10d  5174M  6882M  16.2.13  327f301eff51  63691d557732

So far so good. But when I took a look at the memory usage of my OSDs, I was
below that value, by quite a bit. Looking at the OSDs themselves, I have:

  "bluestore-pricache": {
      "target_bytes": 4294967296,
      "mapped_bytes": 1343455232,
      "unmapped_bytes": 16973824,
      "heap_bytes": 1360429056,
      "cache_bytes": 2845415832
  },

And if I get the running config:

  "osd_memory_target": "4294967296",
  "osd_memory_target_autotune": "true",
  "osd_memory_target_cgroup_limit_ratio": "0.80",

which is not the value I observe from the config. I have 4294967296 instead
of something around 7219293672. Did I miss something?

Luis Domingues
Proton AG

--- Original Message ---
On Tuesday, July 11th, 2023 at 18:10, Mark Nelson wrote:

> The way the memory autotuning works in Ceph is that the prioritycache system
> periodically looks at the mapped memory usage of the process, then
> grows/shrinks the aggregate size of the in-memory caches to try to stay near
> the target.
> [...]
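(One way to narrow down where the 4294967296 comes from is to compare the value the config database resolves for a daemon with what that daemon is actually running. A sketch, with osd.0 as a placeholder; exact sub-command availability can vary slightly between releases.)

  # Value resolved from the monitors' configuration database (masks included):
  ceph config get osd.0 osd_memory_target

  # Value the running daemon itself reports:
  ceph config show osd.0 osd_memory_target
  ceph daemon osd.0 config get osd_memory_target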
[ceph-users] Re: OSD memory usage after cephadm adoption
Hello Luis,

Please see my response below:

> But when I took a look at the memory usage of my OSDs, I was below that
> value, by quite a bit. Looking at the OSDs themselves, I have:
>
> "bluestore-pricache": {
>     "target_bytes": 4294967296,
>     "mapped_bytes": 1343455232,
>     "unmapped_bytes": 16973824,
>     "heap_bytes": 1360429056,
>     "cache_bytes": 2845415832
> },
>
> And if I get the running config:
> "osd_memory_target": "4294967296",
> "osd_memory_target_autotune": "true",
> "osd_memory_target_cgroup_limit_ratio": "0.80",
>
> which is not the value I observe from the config. I have 4294967296
> instead of something around 7219293672. Did I miss something?

This is very likely due to https://tracker.ceph.com/issues/48750. The fix was
recently merged into the main branch and should be backported soon all the way
to pacific.

Until then, the workaround would be to set the osd_memory_target on each OSD
individually to the desired value.

-Sridhar
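(For reference, a sketch of that workaround; the target value and the OSD IDs below are placeholders taken from the numbers earlier in the thread.)

  # Pin the memory target per OSD until the autotune fix is backported:
  for id in 0 1 10 100 101 102; do
      ceph config set osd.$id osd_memory_target 7219293672
  done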
[ceph-users] Re: OSD memory usage after cephadm adoption
It looks indeed to be that bug that I hit.

Thanks.

Luis Domingues
Proton AG

--- Original Message ---
On Monday, July 17th, 2023 at 07:45, Sridhar Seshasayee wrote:

> This is very likely due to https://tracker.ceph.com/issues/48750. The fix
> was recently merged into the main branch and should be backported soon all
> the way to pacific.
>
> Until then, the workaround would be to set the osd_memory_target on each
> OSD individually to the desired value.
>
> -Sridhar
> [...]
[ceph-users] Re: OSD memory usage after cephadm adoption
Hi all,

now that host masks seem to work, could somebody please shed some light on the
relative priority of these settings:

  ceph config set osd          osd_memory_target X
  ceph config set osd/host:A   osd_memory_target Y
  ceph config set osd/class:B  osd_memory_target Z

Which one wins for an OSD on host A in class B? Similarly for an explicit ID:
the expectation is that a setting for osd.ID always wins, then the masked
values, then the generic osd setting, then the globals, and last the defaults.
The relative precedence of masked values is not defined anywhere, nor is the
precedence in general. This is missing even in the latest docs:
https://docs.ceph.com/en/quincy/rados/configuration/ceph-conf/#sections-and-masks

Would be great if someone could add this. Thanks!

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

From: Luis Domingues
Sent: Monday, July 17, 2023 9:36 AM
To: Sridhar Seshasayee
Cc: Mark Nelson; ceph-users@ceph.io
Subject: [ceph-users] Re: OSD memory usage after cephadm adoption

It looks indeed to be that bug that I hit.

Thanks.
[...]
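(Until the documentation spells this out, one way to answer the question empirically on a test cluster is to set all three masks and then ask a daemon which value it ends up with. A sketch, where osd.5 stands in for an OSD that lives on host A and belongs to class B, and the byte values are arbitrary.)

  ceph config set osd          osd_memory_target 4294967296
  ceph config set osd/host:A   osd_memory_target 6442450944
  ceph config set osd/class:B  osd_memory_target 8589934592

  # Which value did osd.5 actually resolve?
  ceph config get osd.5 osd_memory_target
  ceph config show osd.5 osd_memory_target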