On 8/2/19 6:23 PM, Nick Schaf wrote:
>> Nick Schaf <nick.sc...@jci.com> [2019-07-31 16:34:36]:
>>
>> Hi,
>>
>>> I've noticed the wpa_supplicant process on my mesh interfaces leaking
>>> memory to the point that the kernel kills the process. It was
>>> discovered in 18.06.2, but I've reproduced it with 18.06.4 and with
>>> the master branch from the GitHub repo. Since the leak occurs as mesh
>>> links are created and destroyed, I was able to reproduce it with a
>>> simple two-node setup where I monitor the wpa_supplicant process VSZ
>>> on one node and repeatedly bring wifi up and down on the other node.
>>>
>>> I've traced it back to the 18.06.2 release, specifically to lines
>>> 34-35 of
>>> package/network/services/hostapd/patches/015-mesh-do-not-use-offchan-mgmt-tx-on-DFS.patch
>>> +    (modes = nl80211_get_hw_feature_data(bss, &num_modes,
>>> +                                         &flags, &dfs_domain)) &&
>>> That code was added in a35f24309021c1c0e9cbed0faedf58b941cb4bd3.
>>>
>>> I removed the entire patch file to resolve the memory leak because the
>>> subsequent call to ieee80211_is_dfs() uses the return value from
>>> nl80211_get_hw_feature_data(). However, I know the problem is
>>> specifically related to the nl80211_get_hw_feature_data() call because
>>> I stepped backward through commits of the hostapd source until I got
>>> back to 0f7fc6b98de9c69f511b9b22f2b65553126362eb, where
>>> ieee80211_is_dfs() had only one argument and didn't rely on the
>>> nl80211_get_hw_feature_data() return value. At that point, the memory
>>> leak still occurred until I commented out the call to
>>> nl80211_get_hw_feature_data().
>>>
>>> I attempted to dive into nl80211_get_hw_feature_data(), but was
>>> quickly lost, so I defer to those who are more experienced in that
>>> code.
>>
>> You did a nice job tracking this down, so thanks for reporting it. Can
>> you try this patch[1]?
>>
>
> I had already tried an os_free(modes) and found no resolution.
> However, to be sure, I tried your patch today and still observe the
> leak. I also ran the original code to determine whether the leak rate
> is reduced with the patch. From that test (data below) it seems
> possible that the modes leak might be a small portion of the overall
> leak I observed. I still suspect the main leak to be somewhere inside
> nl80211_get_hw_feature_data().
>
> For your reference, data from today's quick test is below. VSZ is
> "VmSize" from /proc/[PID]/status, where PID is wpa_supplicant's process
> ID. Unpatched is the clean 18.06.4 code. Patched is the same with your
> patch applied.
> The other node cycles the connection roughly every 30 seconds
> (while [ 1 ]; do wifi down; sleep 10; wifi; sleep 20; done).
> We don't see a rise in memory every 30 seconds, leading me to believe
> the leaked memory was allocated from a memory pool and the pool size
> needs to be periodically increased as the leak continues.
>
> Time (s),VSZ unpatched,VSZ patched
> 0,3408,3404
> 10,3408,3408
> 20,3408,3416
> 30,3408,3416
> 40,3408,3420
> 50,3408,3440
> 60,3408,3440
> 70,3412,3440
> 80,3432,3440
> 90,3432,3440
> 100,3432,3440
> 110,3432,3464
> 120,3432,3464
> 130,3432,3464
> 140,3432,3464
> 150,3432,3464
> 160,3436,3464
> 170,3456,3464
> 180,3456,3464
> 190,3456,3464
> 200,3456,3464
> 210,3456,3464
> 220,3460,3464
> 230,3480,3468
> ,,3468
> ,,3468
> ,,3472
> ,,3472
> ,,3472
> ,,3496
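For anyone who wants to repeat the measurement quoted above, here is a rough sketch of how the VmSize sampling and the step pattern could be checked. It is only an illustration, not what Nick actually ran: the function names (parse_vmsize_kb, read_vmsize_kb, growth_steps) are made up for this example, it assumes Python 3 is available on the node being monitored (often not the case on a stock OpenWrt image), and the sample list is simply the "VSZ unpatched" column copied from the table.

```python
import re


def parse_vmsize_kb(status_text):
    """Extract the VmSize value (in kB) from the text of /proc/[PID]/status.

    Returns None if no VmSize line is present (e.g. for kernel threads).
    """
    match = re.search(r"^VmSize:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_vmsize_kb(pid):
    """Read VmSize for a live process; Linux only (hypothetical helper)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vmsize_kb(status_file.read())


def growth_steps(samples):
    """Return (sample_index, delta_kb) for each sample that differs from the previous one."""
    steps = []
    for index in range(1, len(samples)):
        delta = samples[index] - samples[index - 1]
        if delta != 0:
            steps.append((index, delta))
    return steps


# "VSZ unpatched" column from the quoted table, one sample every 10 s (0..230 s).
UNPATCHED_KB = [
    3408, 3408, 3408, 3408, 3408, 3408, 3408, 3412,
    3432, 3432, 3432, 3432, 3432, 3432, 3432, 3432,
    3436, 3456, 3456, 3456, 3456, 3456, 3460, 3480,
]

if __name__ == "__main__":
    for index, delta in growth_steps(UNPATCHED_KB):
        print(f"t={index * 10:>3} s  VmSize changed by {delta:+d} kB")
    print("total growth:", UNPATCHED_KB[-1] - UNPATCHED_KB[0], "kB")
```

Run against the unpatched column, this reports growth only at a handful of sample points rather than on every 30-second link cycle, which is the stepwise pattern the quoted message describes. To sample a live process instead, read_vmsize_kb(pid) could be called in a loop with a 10-second sleep.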
Hi,

While I was looking at hostapd I saw this patch:
https://w1.fi/cgit/hostap/commit/?id=3e949655ccc5fba4686d04c70380463ebf059b30

We have the original patch in our tree on top of OpenWrt and I plan to
update hostapd to version 2.9 with this memory leak. I did not look into
this memory leak, but the comment could help.

Hauke
_______________________________________________ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel