Hey

>
> 1. We are not actively resetting the modem anymore. We found that this
> only made matters worse. Essentially the kernel was not releasing the USB
> interfaces of the modem completely, which results in ModemManager not
> acquiring the right ports.
>

Hummm... did you have custom rules as MM port type hints? MM doesn't rely
on interface names for its hints, so it shouldn't matter when grabbing
ports.


> 2. We are now using mmcli to turn on the GPS. This works without any
> issues.
>

Good!



> 3. As you said, it is only possible to have 1 instance of ModemManager, I
> thought there were 3 separate instances, but this is not the case.
>

Yes.


> 4. We are not using qmicli ourselves to communicate with the modem via the
> QMI interface, all interactions are done by ModemManager.
>

Ok.



> 5. We have updated ModemManager to 1.12.2 and libqmi to 1.24.2
>
> The problem we see with ModemManager is that it is slowly using more and
> more memory while using the QMI interface of our Quectel BG96 modem.
>

I have another report from a company that I work with that shows the same
symptoms. I don't have any clear picture yet of what the issue could be,
because none of my valgrind tests have shown any relevant memory leak. If
you could get a proper valgrind log we could give it a look.



> We have done some more research and found the following things:
>
> Our system has 512MB RAM. The OOM killer kills ModemManager after 12 hours
> because at that time, it is using half of the memory available:
>
>
Ouch!!! Really?



> *[43036.640048] kworker/u2:1 invoked oom-killer:
> gfp_mask=0x14082c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_NOWARN|__GFP_ZERO),
> nodemask=(null),  order=0, oom_score_adj=0*
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> *[43036.697846] kworker/u2:1 cpuset=/ mems_allowed=0[43036.701205] CPU: 0
> PID: 18834 Comm: kworker/u2:1 Not tainted 4.14.78 #1[43036.706521] Hardware
> name: Freescale i.MX6 UltraLite (Device Tree)[43036.711544] Workqueue:
> brcmf_wq/mmc0:0001:1 brcmf_sdio_dataworker [brcmfmac][43036.717326]
> [<8010f7b0>] (unwind_backtrace) from [<8010b2bc>]
> (show_stack+0x10/0x14)[43036.723780] [<8010b2bc>] (show_stack) from
> [<80a2cf5c>] (dump_stack+0x78/0x8c)[43036.729713] [<80a2cf5c>] (dump_stack)
> from [<801ceee8>] (dump_header+0xa0/0x230)[43036.735819] [<801ceee8>]
> (dump_header) from [<801ce1f8>] (oom_kill_process+0xb0/0x514)[43036.742444]
> [<801ce1f8>] (oom_kill_process) from [<801cec2c>]
> (out_of_memory+0x15c/0x31c)[43036.749329] [<801cec2c>] (out_of_memory) from
> [<801d3884>] (__alloc_pages_nodemask+0xd14/0xf34)[43036.756739]
> [<801d3884>] (__alloc_pages_nodemask) from [<802054a0>]
> (__vmalloc_node_range+0x114/0x230)[43036.764757] [<802054a0>]
> (__vmalloc_node_range) from [<80205858>]
> (__vmalloc_node.constprop.11+0x48/0x50)[43036.773030] [<80205858>]
> (__vmalloc_node.constprop.11) from [<802058c0>]
> (vzalloc+0x2c/0x34)[43036.780267] [<802058c0>] (vzalloc) from [<7f0369f0>]
> (brcmf_sdio_dataworker+0x1cbc/0x216c [brcmfmac])[43036.788286] [<7f0369f0>]
> (brcmf_sdio_dataworker [brcmfmac]) from [<80141298>]
> (process_one_work+0x1d8/0x410)[43036.796825] [<80141298>]
> (process_one_work) from [<80141f2c>]
> (worker_thread+0x50/0x598)[43036.803623] [<80141f2c>] (worker_thread) from
> [<80146d90>] (kthread+0x14c/0x154)[43036.809728] [<80146d90>] (kthread)
> from [<80107a48>] (ret_from_fork+0x14/0x2c)[43037.167843]
> Mem-Info:[43037.168859] active_anon:98149 inactive_anon:17916
> isolated_anon:0                active_file:12 inactive_file:25
> isolated_file:0                unevictable:0 dirty:0 writeback:0
> unstable:0                slab_reclaimable:1741 slab_unreclaimable:4386
>             mapped:14080 shmem:33723 pagetables:936 bounce:0
> free:650 free_pcp:45 free_cma:0[43037.277821] Node 0 active_anon:392596kB
> inactive_anon:71664kB active_file:48kB inactive_file:52kB unevictable:0kB
> isolated(anon):0kB isolated(file):0kB mapped:56320kB dirty:0kB
> writeback:0kB shmem:134892kB writeback_tmp:0kB unstable:0kB
> all_unreclaimable? no[43037.367828] Normal free:2680kB min:2704kB
> low:3380kB high:4056kB active_anon:392596kB inactive_anon:71664kB
> active_file:48kB inactive_file:52kB unevictable:0kB writepending:0kB
> present:524288kB managed:503600kB mlocked:0kB kernel_stack:1696kB
> pagetables:3744kB bounce:0kB free_pcp:120kB local_pcp:120kB
> free_cma:0kB[43037.477797] lowmem_reserve[]: 0 0 0[43037.480024] Normal:
> 82*4kB (UME) 72*8kB (UME) 31*16kB (UM) 16*32kB (UM) 6*64kB (U) 3*128kB (U)
> 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB 0*32768kB =
> 2680kB[43037.541295] 33751 total pagecache pages[43037.543847] 0 pages in
> swap cache[43037.545863] Swap cache stats: add 0, delete 0, find
> 0/0[43037.587944] Free swap  = 0kB[43037.589582] Total swap =
> 0kB[43037.591169] 131072 pages RAM[43037.592750] 0 pages
> HighMem/MovableOnly[43037.595286] 5172 pages reserved[43037.597126] 0 pages
> cma reserved[43037.617826] [ pid ]   uid  tgid total_vm      rss nr_ptes
> nr_pmds swapents oom_score_adj name[43037.657894] [  109]     0   109
> 7060     2073      50       0        0             0
> systemd-journal[43037.666094] [  115]     0   115     2058     1069       8
>       0        0         -1000 systemd-udevd[43037.713088] [  132]  1000
> 132     1695     1139       7       0        0          -900
> dbus-daemon[43037.743755] [  133]     0   133     1896     1092       8
>   0        0             0 haveged[43037.787950] [  135]     0   135
>  63780    53513     116       0        0             0
> ModemManager[43037.795893] [  138]     0   138     1849      856       8
>     0        0             0 systemd-logind[43037.837831] [  160]     0
> 160      635       74       5       0        0             0
> getty[43037.845169] [  186]     0   186      859      582       6       0
>      0             0 modem.sh[43037.917936] [  193]  1006   193     1971
>    845       8       0        0             0 systemd-network[43037.926136]
> [  196]     0   196    17518     3188      23       0        0
> 0 NetworkManager[43037.971235] [  244]  1007   244     2608     1599
>  10       0        0             0 systemd-resolve[43038.004486] [  245]
>   0   245     1574     1221       6       0        0             0
> openvpn[43038.042983] [  275]     0   275      826      600       6       0
>        0             0 led-control.sh[43038.077907] [  276]     0   276
> 3115      469       8       0        0             0 chronyd[43038.085472]
> [  304]     0   304     2073     1085       8       0        0
> 0 wpa_supplicant[43038.126229] [  320]     0   320     6909      899
>  12       0        0             0 qmi-proxy[43038.177919] [  366]     0
> 366   240369     9498      48       0        0          -500
> balena-engine-d[43038.186169] [  386]     0   386   238336     7960      45
>       0        0          -500 balena-engine-c[43038.224253] [  398]     0
>   398     1350      861       7       0        0         -1000
> sshd[43038.267853] [  578]     0   578   225786     6921      35       0
>      0          -999 balena-engine-c[43038.276058] [  585]     0   585
> 223993     7224      33       0        0          -999
> balena-engine-c[43038.333144] [  587]     0   587   226042     7248      38
>       0        0          -999 balena-engine-c[43038.357927] [  602]     0
>   602   223993     7040      34       0        0          -999
> balena-engine-c[43038.397439] [  688]     0   688     1290       75       7
>       0        0             0 enqueue[43038.427860] [  696]     0   696
>  42418     5093     104       0        0             0 node[43038.467867] [
>  702]   999   702     4385      206       9       0        0             0
> redis-server[43038.475915] [  717]     0   717    32997     6244     106
>     0        0             0 datahub-stats[43038.527834] brcmfmac:
> brcmf_sdio_bus_rxctl: resumed on timeout[43038.532447] brcmfmac:
> _brcmf_set_multicast_list: Setting allmulti failed, -110[43038.538498] [
>  934]     0   934      826      551       6       0        0             0
> app-layer-updat[43038.538511] [  935]     0   935      826      588       7
>       0        0             0 app-layer-healt[43038.538520] [11460]     0
> 11460     1380      185       7       0        0             0
> sleep[43038.538532] [ 6036]     0  6036   228091     6967      36       0
>      0          -999 balena-engine-c[43038.538542] [ 6063]     0  6063
>  29799     2932      74       0        0             0
> ivh2-msgs-deque[43038.538553] [18785]     0 18785     1380      184       7
>       0        0             0 sleep[43038.538563] [18901]     0 18901
> 213220     4860      24       0        0             0
> balena-engine[43038.538575] [18902]     0 18902      540       74       4
>     0        0             0 grep[43038.538584] [18907]     0 18907
>  826      321       5       0        0             0
> led-control.sh[43038.538639] Out of memory: Kill process 135 (ModemManager)
> score 413 or sacrifice child[43038.538685] Killed process 135
> (ModemManager) total-vm:255120kB, anon-rss:204780kB, file-rss:4kB,
> shmem-rss:9268kB[43039.110517] oom_reaper: reaped process 135
> (ModemManager), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB*
>
> After we found out that the OOM killer was killing ModemManager we ran
> ModemManager in debug mode using Memcheck from Valgrind to try to capture
> the memory leaks. See the log below
>
> *==2705== Memcheck, a memory error detector*
> *==2705== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.*
> *==2705== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright
> info*
> *==2705== Command: /usr/sbin/ModemManager --debug
> --filter-policy=WHITELIST-ONLY*
> *==2705== Parent PID: 2598*
> *==2705==*
> *==2705== Warning: noted but unhandled ioctl 0x540c with no size/direction
> hints.*
> *==2705==    This could cause spurious value errors to appear.*
> *==2705==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a
> proper wrapper.*
> *==2756==*
> *==2756== HEAP SUMMARY:*
> *==2756==     in use at exit: 237,156 bytes in 5,555 blocks*
> *==2756==   total heap usage: 30,658 allocs, 25,103 frees, 8,699,112 bytes
> allocated*
> *==2756==*
> *==2756== 12 bytes in 1 blocks are indirectly lost in loss record 1 of 15*
> *==2756==    at 0x4847D74: malloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2756==*
> *==2756== 32 bytes in 4 blocks are possibly lost in loss record 2 of 15*
> *==2756==    at 0x4847D74: malloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2756==*
> *==2756== 32 bytes in 4 blocks are possibly lost in loss record 3 of 15*
> *==2756==    at 0x4847C88: malloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2756==*
> *==2756== 52 (40 direct, 12 indirect) bytes in 4 blocks are definitely
> lost in loss record 4 of 15*
> *==2756==    at 0x4847D74: malloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2756==*
> *==2756== 376 bytes in 4 blocks are possibly lost in loss record 5 of 15*
> *==2756==    at 0x484AA44: realloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2756==*
> *==2756== 680 bytes in 16 blocks are possibly lost in loss record 6 of 15*
> *==2756==    at 0x484A7C4: calloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2756==*
> *==2756== 688 bytes in 43 blocks are still reachable in loss record 7 of
> 15*
> *==2756==    at 0x4DEF9E4: g_closure_new_simple (in
> /usr/lib/libgobject-2.0.so.0.5600.4)*
> *==2756==*
> *==2756== 928 bytes in 43 blocks are still reachable in loss record 8 of
> 15*
> *==2756==    at 0x4DEF99C: g_closure_new_simple (in
> /usr/lib/libgobject-2.0.so.0.5600.4)*
> *==2756==*
> *==2756== 4,080 bytes in 2 blocks are definitely lost in loss record 9 of
> 15*
> *==2756==    at 0x484A7C4: calloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2756==*
> *==2756== 8,684 bytes in 296 blocks are still reachable in loss record 10
> of 15*
> *==2756==    at 0x4847C88: malloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2756==*
> *==2756== 8,700 bytes in 217 blocks are still reachable in loss record 11
> of 15*
> *==2756==    at 0x4E0AF58: g_type_create_instance (in
> /usr/lib/libgobject-2.0.so.0.5600.4)*
> *==2756==*
> *==2756== 10,512 bytes in 217 blocks are still reachable in loss record 12
> of 15*
> *==2756==    at 0x4E0AF9C: g_type_create_instance (in
> /usr/lib/libgobject-2.0.so.0.5600.4)*
> *==2756==*
> *==2756== 12,067 bytes in 116 blocks are still reachable in loss record 13
> of 15*
> *==2756==    at 0x484AA44: realloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2756==*
> *==2756== 73,835 bytes in 3,023 blocks are still reachable in loss record
> 14 of 15*
> *==2756==    at 0x4847D74: malloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2756==*
> *==2756== 93,754 bytes in 1,305 blocks are still reachable in loss record
> 15 of 15*
> *==2756==    at 0x484A7C4: calloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2756==*
> *==2756== LEAK SUMMARY:*
> *==2756==    definitely lost: 4,120 bytes in 6 blocks*
> *==2756==    indirectly lost: 12 bytes in 1 blocks*
> *==2756==      possibly lost: 1,120 bytes in 28 blocks*
> *==2756==    still reachable: 209,168 bytes in 5,260 blocks*
> *==2756==                       of which reachable via heuristic:*
> *==2756==                         newarray           : 1,976 bytes in 75
> blocks*
> *==2756==         suppressed: 0 bytes in 0 blocks*
> *==2756==*
> *==2756== For counts of detected and suppressed errors, rerun with: -v*
> *==2756== ERROR SUMMARY: 6 errors from 6 contexts (suppressed: 0 from 0)*
> *==2705==*
> *==2705== HEAP SUMMARY:*
> *==2705==     in use at exit: 874,364 bytes in 24,674 blocks*
> *==2705==   total heap usage: 2,690,239 allocs, 2,665,565 frees,
> 812,938,818 bytes allocated*
> *==2705==*
> *==2705== 32 bytes in 4 blocks are possibly lost in loss record 1 of 14*
> *==2705==    at 0x4847C88: malloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2705==*
> *==2705== 480 bytes in 4 blocks are possibly lost in loss record 2 of 14*
> *==2705==    at 0x484AA44: realloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2705==*
> *==2705== 728 bytes in 10 blocks are possibly lost in loss record 3 of 14*
> *==2705==    at 0x4847D74: malloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2705==*
> *==2705== 1,032 bytes in 18 blocks are possibly lost in loss record 4 of
> 14*
> *==2705==    at 0x484A7C4: calloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2705==*
> *==2705== 1,744 bytes in 109 blocks are still reachable in loss record 5
> of 14*
> *==2705==    at 0x4DEF9E4: g_closure_new_simple (in
> /usr/lib/libgobject-2.0.so.0.5600.4)*
> *==2705==*
> *==2705== 2,256 bytes in 109 blocks are still reachable in loss record 6
> of 14*
> *==2705==    at 0x4DEF99C: g_closure_new_simple (in
> /usr/lib/libgobject-2.0.so.0.5600.4)*
> *==2705==*
> *==2705== 9,244 bytes in 364 blocks are still reachable in loss record 7
> of 14*
> *==2705==    at 0x4847C88: malloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2705==*
> *==2705== 15,776 bytes in 470 blocks are still reachable in loss record 8
> of 14*
> *==2705==    at 0x4E0AF9C: g_type_create_instance (in
> /usr/lib/libgobject-2.0.so.0.5600.4)*
> *==2705==*
> *==2705== 17,152 bytes in 229 blocks are still reachable in loss record 9
> of 14*
> *==2705==    at 0x484AA44: realloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2705==*
> *==2705== 23,512 bytes in 470 blocks are still reachable in loss record 10
> of 14*
> *==2705==    at 0x4E0AF58: g_type_create_instance (in
> /usr/lib/libgobject-2.0.so.0.5600.4)*
> *==2705==*
> *==2705== 78,223 bytes in 498 blocks are indirectly lost in loss record 11
> of 14*
> *==2705==    at 0x4847D74: malloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2705==*
> *==2705== 93,063 bytes in 3,062 blocks are still reachable in loss record
> 12 of 14*
> *==2705==    at 0x4847D74: malloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2705==*
> *==2705== 110,326 bytes in 1,844 blocks are still reachable in loss record
> 13 of 14*
> *==2705==    at 0x484A7C4: calloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2705==*
> *==2705== 551,535 (473,312 direct, 78,223 indirect) bytes in 16,904 blocks
> are definitely lost in loss record 14 of 14*
> *==2705==    at 0x4847D74: malloc (in
> /usr/lib/valgrind/vgpreload_memcheck-arm-linux.so)*
> *==2705==*
> *==2705== LEAK SUMMARY:*
> *==2705==    definitely lost: 473,312 bytes in 16,904 blocks*
>

This is a really bad memory leak, indeed.



> *==2705==    indirectly lost: 78,223 bytes in 498 blocks*
> *==2705==      possibly lost: 2,272 bytes in 36 blocks*
> *==2705==    still reachable: 273,073 bytes in 6,657 blocks*
> *==2705==                       of which reachable via heuristic:*
> *==2705==                         newarray           : 5,276 bytes in 150
> blocks*
> *==2705==         suppressed: 0 bytes in 0 blocks*
> *==2705==*
> *==2705== For counts of detected and suppressed errors, rerun with: -v*
> *==2705== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 0 from 0)*
>
> As you can see we are not able to get the stack trace to work with
> valgrind. The command we are currently using is:
>
> *systemctl stop ModemManager && G_SLICE=always-malloc valgrind
> --leak-check=full --show-leak-kinds=all --track-origins=yes
> --log-file=/data/valgrind.out /usr/bin/ModemManager --debug
> --filter-policy=WHITELIST-ONLY *
>
> Can you help us with getting the stack traces of the memory leaks that
> memcheck finds?
>

You need to build ModemManager (and if possible, glib) with debug symbols
enabled and install it in the system. Would you be able to do this?

-- 
Aleksander
https://aleksander.es
_______________________________________________
ModemManager-devel mailing list
ModemManager-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/modemmanager-devel

Reply via email to