On 8/2/19 6:23 PM, Nick Schaf wrote:
> 
> 
>> Nick Schaf <nick.sc...@jci.com> [2019-07-31 16:34:36]:
>>
>> Hi,
>>
>>> I've noticed the wpa_supplicant process on my mesh interfaces leaking
>>> memory to the point that the kernel kills the process.  It was
>>> discovered in 18.06.2, but I've reproduced it with 18.06.4 and with
>>> the master branch from the GitHub repo.  Since the leak occurs as mesh
>>> links are created and destroyed, I was able to reproduce it with a
>>> simple two-node setup where I monitor the wpa_supplicant process VSZ
>>> on one node and repeatedly bring wifi up and down on the other node.
>>>
>>> I've traced it back to the 18.06.2 release, specifically to lines
>>> 34-35 of
>>> package/network/services/hostapd/patches/015-mesh-do-not-use-offchan-mgmt-tx-on-DFS.patch
>>>
>>> +                 (modes = nl80211_get_hw_feature_data(bss, &num_modes, &flags,
>>> +                                                      &dfs_domain)) &&
>>>
>>> That code was added in
>>> a35f24309021c1c0e9cbed0faedf58b941cb4bd3.
>>>
>>> I removed the entire patch file to resolve the memory leak because the
>>> subsequent call to ieee80211_is_dfs() uses the return value from
>>> nl80211_get_hw_feature_data().  However, I know the problem is
>>> specifically related to the nl80211_get_hw_feature_data() call because
>>> I stepped-backward through commits of the hostapd source until I got
>>> back to 0f7fc6b98de9c69f511b9b22f2b65553126362eb, where
>>> ieee80211_is_dfs() had only one argument and didn't rely on the
>>> nl80211_get_hw_feature_data() return value.  At that point, the memory
>>> leak still occurred until I commented-out the call to
>>> nl80211_get_hw_feature_data().
>>>
>>> I attempted to dive into nl80211_get_hw_feature_data(), but was
>>> quickly lost, so I defer to those that are more experienced in that code.
>>
>> You did a nice job tracking this down, so thanks for reporting it. Can you
>> try this patch[1]?
>>
> 
> I had already tried an os_free(modes) and found no resolution.  However, to
> be sure, I tried your patch today and still observe the leak.  I also
> tested the original code to determine whether the patch reduced the leak
> rate.  From that test (data below) it seems possible that the modes leak
> might be a small portion of the overall leak I observed.
> I still suspect the main leak to be somewhere inside
> nl80211_get_hw_feature_data.
> 
> For your reference, data from today's quick test is below.  VSZ is "VmSize" 
> from /proc/[PID]/status where PID=wpa_supplicant's process ID.  Unpatched is 
> the clean 18.06.4 code.  Patched is the same with your patch applied.
> The other node cycles the connection ~ every 30 seconds (while [ 1 ]; do wifi 
> down; sleep 10; wifi; sleep 20; done).
> We don't see a rise in memory every 30 seconds, leading me to believe the 
> leaked memory was allocated from a memory pool and the pool size needs to be 
> periodically increased as the leak continues.
> 
> Time (s),VSZ unpatched (kB),VSZ patched (kB)
> 0,3408,3404
> 10,3408,3408
> 20,3408,3416
> 30,3408,3416
> 40,3408,3420
> 50,3408,3440
> 60,3408,3440
> 70,3412,3440
> 80,3432,3440
> 90,3432,3440
> 100,3432,3440
> 110,3432,3464
> 120,3432,3464
> 130,3432,3464
> 140,3432,3464
> 150,3432,3464
> 160,3436,3464
> 170,3456,3464
> 180,3456,3464
> 190,3456,3464
> 200,3456,3464
> 210,3456,3464
> 220,3460,3464
> 230,3480,3468
> ,,3468
> ,,3468
> ,,3472
> ,,3472
> ,,3472
> ,,3496
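
The VSZ figures quoted above come from the "VmSize" field of /proc/[PID]/status. For anyone who wants to log the same number from a small helper instead of by hand, here is a minimal sketch in C. The function names parse_vmsize_kb() and read_vmsize_kb() are made up for this illustration and are not part of hostapd or OpenWrt; the sketch assumes a Linux /proc layout and a status file that fits in 4 KiB.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the VmSize value (in kB) found in the text of
 * a /proc/<pid>/status file, or -1 if no VmSize line is present. */
long parse_vmsize_kb(const char *status_text)
{
    const char *p = status_text;

    while (p && *p) {
        long kb;

        /* Only a line that starts with "VmSize:" yields one conversion. */
        if (sscanf(p, "VmSize: %ld kB", &kb) == 1)
            return kb;

        /* Advance to the start of the next line, if any. */
        p = strchr(p, '\n');
        if (p)
            p++;
    }
    return -1;
}

/* Hypothetical helper: read /proc/<pid>/status and return VmSize in kB,
 * or -1 if the file cannot be read or has no VmSize line. */
long read_vmsize_kb(int pid)
{
    char path[64];
    char buf[4096];
    size_t n;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/status", pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_vmsize_kb(buf);
}
```

Calling read_vmsize_kb() with the wpa_supplicant PID every 10 seconds and printing the result would give one column of a table like the one above.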

Hi,

While I was looking at hostapd I saw this patch:
https://w1.fi/cgit/hostap/commit/?id=3e949655ccc5fba4686d04c70380463ebf059b30

We have the original patch in our tree on top of OpenWrt and I plan to
update hostapd to version 2.9 with this memory leak fix.
I did not look into this memory leak, but the comment could help.
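
As a general illustration of why a plain os_free(modes) might not be enough: when each entry of a returned array owns further allocations, freeing only the outer array leaves the inner ones behind. The sketch below uses made-up names (struct hw_mode, get_modes(), free_modes(), and the counting wrappers) and is not the hostapd code. Whether this pattern is what actually leaks here is an assumption and has not been verified.

```c
#include <stdlib.h>

/* Counts allocations that have not been released yet (illustration only). */
int live_allocs;

static void *counted_calloc(size_t n, size_t size)
{
    void *p = calloc(n, size);

    if (p)
        live_allocs++;
    return p;
}

static void counted_free(void *p)
{
    if (p) {
        live_allocs--;
        free(p);
    }
}

/* Hypothetical stand-in for a per-mode entry that owns two inner arrays. */
struct hw_mode {
    int num_channels;
    int *channels;
    int num_rates;
    int *rates;
};

/* Deep free: release the inner arrays of every entry, then the outer array. */
void free_modes(struct hw_mode *modes, size_t num_modes)
{
    size_t i;

    if (!modes)
        return;
    for (i = 0; i < num_modes; i++) {
        counted_free(modes[i].channels);
        counted_free(modes[i].rates);
    }
    counted_free(modes);
}

/* Allocate num_modes entries, each with its own channel and rate arrays. */
struct hw_mode *get_modes(size_t num_modes)
{
    struct hw_mode *modes = counted_calloc(num_modes, sizeof(*modes));
    size_t i;

    if (!modes)
        return NULL;
    for (i = 0; i < num_modes; i++) {
        modes[i].num_channels = 4;
        modes[i].channels = counted_calloc(4, sizeof(int));
        modes[i].num_rates = 8;
        modes[i].rates = counted_calloc(8, sizeof(int));
        if (!modes[i].channels || !modes[i].rates) {
            /* Partial failure: release everything allocated so far. */
            free_modes(modes, num_modes);
            return NULL;
        }
    }
    return modes;
}
```

With three entries, get_modes(3) makes seven allocations. Freeing only the outer pointer would release one of them, while free_modes(modes, 3) releases all seven.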

Hauke


_______________________________________________
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel
