-----Original Message-----
From: Alexander Duyck [mailto:[email protected]]
Sent: Saturday, June 1, 2019 2:57 AM
To: Nagal, Amit UTC CCS <[email protected]>
Cc: [email protected]; [email protected]; CHAWLA, RITU UTC CCS <[email protected]>
Subject: [External] Re: linux kernel page allocation failure and tuning of page cache
On Fri, May 31, 2019 at 8:07 AM Nagal, Amit UTC CCS <[email protected]> wrote:
>
> Hi
>
> We are using a custom target board based on the Renesas RZ/A1 processor. The
> Linux kernel version is 4.9.123.
>
> 1) The platform is a low-memory platform with 64MB of RAM.
>
> 2) We transfer around 45MB of TCP data from a PC to the target using the
> netcat utility. On the target, a process receives the data over a socket and
> writes it to a flash disk.
>
> 3) At the start of the data transfer, we explicitly clear the kernel's cached
> memory by calling echo 3 > /proc/sys/vm/drop_caches.
>
> 4) During the TCP data transfer, free -m shows "free" dropping to almost 1MB,
> with most of the memory appearing as "cached":
>
> # free -m
>              total       used       free     shared    buffers     cached
> Mem:            57         56          1          0          2         42
> -/+ buffers/cache:         12         45
> Swap:            0          0          0
>
> 5) Sometimes we observe kernel memory getting exhausted: a page allocation
> failure happens in the kernel, with the backtrace printed below:
>
> # [  775.947949] nc.traditional: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
> [  775.956362] CPU: 0 PID: 1288 Comm: nc.traditional Tainted: G    O    4.9.123-pic6-g31a13de-dirty #19
> [  775.966085] Hardware name: Generic R7S72100 (Flattened Device Tree)
> [  775.972501] [<c0109829>] (unwind_backtrace) from [<c010796f>] (show_stack+0xb/0xc)
> [  775.980118] [<c010796f>] (show_stack) from [<c0151de3>] (warn_alloc+0x89/0xba)
> [  775.987361] [<c0151de3>] (warn_alloc) from [<c0152043>] (__alloc_pages_nodemask+0x1eb/0x634)
> [  775.995790] [<c0152043>] (__alloc_pages_nodemask) from [<c0152523>] (__alloc_page_frag+0x39/0xde)
> [  776.004685] [<c0152523>] (__alloc_page_frag) from [<c03190f1>] (__netdev_alloc_skb+0x51/0xb0)
> [  776.013217] [<c03190f1>] (__netdev_alloc_skb) from [<c02c1b6f>] (sh_eth_poll+0xbf/0x3c0)
> [  776.021342] [<c02c1b6f>] (sh_eth_poll) from [<c031fd8f>] (net_rx_action+0x77/0x170)
> [  776.029051] [<c031fd8f>] (net_rx_action) from [<c011238f>] (__do_softirq+0x107/0x160)
> [  776.036896] [<c011238f>] (__do_softirq) from [<c0112589>] (irq_exit+0x5d/0x80)
> [  776.044165] [<c0112589>] (irq_exit) from [<c012f4db>] (__handle_domain_irq+0x57/0x8c)
> [  776.052007] [<c012f4db>] (__handle_domain_irq) from [<c01012e1>] (gic_handle_irq+0x31/0x48)
> [  776.060362] [<c01012e1>] (gic_handle_irq) from [<c0108025>] (__irq_svc+0x65/0xac)
> [  776.067835] Exception stack(0xc1cafd70 to 0xc1cafdb8)
> [  776.072876] fd60:                                     0002751c c1dec6a0 0000000c 521c3be5
> [  776.081042] fd80: 56feb08e f64823a6 ffb35f7b feab513d f9cb0643 0000056c c1caff10 ffffe000
> [  776.089204] fda0: b1f49160 c1cafdc4 c180c677 c0234ace 200e0033 ffffffff
> [  776.095816] [<c0108025>] (__irq_svc) from [<c0234ace>] (__copy_to_user_std+0x7e/0x430)
> [  776.103796] [<c0234ace>] (__copy_to_user_std) from [<c0241715>] (copy_page_to_iter+0x105/0x250)
> [  776.112503] [<c0241715>] (copy_page_to_iter) from [<c0319aeb>] (skb_copy_datagram_iter+0xa3/0x108)
> [  776.121469] [<c0319aeb>] (skb_copy_datagram_iter) from [<c03443a7>] (tcp_recvmsg+0x3ab/0x5f4)
> [  776.130045] [<c03443a7>] (tcp_recvmsg) from [<c035e249>] (inet_recvmsg+0x21/0x2c)
> [  776.137576] [<c035e249>] (inet_recvmsg) from [<c031009f>] (sock_read_iter+0x51/0x6e)
> [  776.145384] [<c031009f>] (sock_read_iter) from [<c017795d>] (__vfs_read+0x97/0xb0)
> [  776.152967] [<c017795d>] (__vfs_read) from [<c01781d9>] (vfs_read+0x51/0xb0)
> [  776.159983] [<c01781d9>] (vfs_read) from [<c0178aab>] (SyS_read+0x27/0x52)
> [  776.166837] [<c0178aab>] (SyS_read) from [<c0105261>] (ret_fast_syscall+0x1/0x54)

>So it looks like you are interrupting the process that is draining the socket
>to service the interrupt that is filling it. I am curious what your tcp_rmem
>value is. If this is occurring often then you will likely build up a backlog
>of packets in the receive buffer for the socket and that may be where all your
>memory is going.

Thanks for the reply.

# cat /proc/sys/net/ipv4/tcp_rmem
4096    87380   454688

The maximum value is less than 1MB here, which means the socket buffer is not
consuming all the memory, right?
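For completeness, here is one way to confirm where the memory actually sits
while the transfer is running. This is only a sketch, and it assumes iproute2's
ss utility is available on the target (a busybox-only system can still read
/proc/net/sockstat):

  # ss -tm                    per-socket skmem counters; the r/rb fields are
                              the receive-queue usage and its limit
  # cat /proc/net/sockstat    the TCP "mem" field is counted in pages, so
                              multiply by 4kB

If the receive-queue numbers stay small while "cached" keeps growing, the
memory is sitting in the page cache as dirty file pages waiting for writeback
rather than in the socket buffer.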
> [  776.174308] Mem-Info:
> [  776.176650] active_anon:2037 inactive_anon:23 isolated_anon:0
> [  776.176650]  active_file:2636 inactive_file:7391 isolated_file:32
> [  776.176650]  unevictable:0 dirty:1366 writeback:1281 unstable:0
> [  776.176650]  slab_reclaimable:719 slab_unreclaimable:724
> [  776.176650]  mapped:1990 shmem:26 pagetables:159 bounce:0
> [  776.176650]  free:373 free_pcp:6 free_cma:0
> [  776.209062] Node 0 active_anon:8148kB inactive_anon:92kB active_file:10544kB inactive_file:29564kB unevictable:0kB isolated(anon):0kB isolated(file):128kB mapped:7960kB dirty:5464kB writeback:5124kB shmem:104kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
> [  776.233602] Normal free:1492kB min:964kB low:1204kB high:1444kB active_anon:8148kB inactive_anon:92kB active_file:10544kB inactive_file:29564kB unevictable:0kB writepending:10588kB present:65536kB managed:59304kB mlocked:0kB slab_reclaimable:2876kB slab_unreclaimable:2896kB kernel_stack:1152kB pagetables:636kB bounce:0kB free_pcp:24kB local_pcp:24kB free_cma:0kB
> [  776.265406] lowmem_reserve[]: 0 0
> [  776.268761] Normal: 7*4kB (H) 5*8kB (H) 7*16kB (H) 5*32kB (H) 6*64kB (H) 2*128kB (H) 2*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1492kB
> 10071 total pagecache pages
> [  776.284124] 0 pages in swap cache
> [  776.287446] Swap cache stats: add 0, delete 0, find 0/0
> [  776.292645] Free swap  = 0kB
> [  776.295532] Total swap = 0kB
> [  776.298421] 16384 pages RAM
> [  776.301224] 0 pages HighMem/MovableOnly
> [  776.305052] 1558 pages reserved
>
> 6) We have certain questions, as below:
>
> a) How did the kernel memory get exhausted? Under low-memory conditions in
> the kernel, are the page flusher threads, which should have written dirty
> pages from the page cache to the flash disk, not executing at the right time?
> Is the kernel page reclaim mechanism not executing at the right time?

>I suspect the pages are likely stuck in a state of buffering. In the case of
>sockets the packets will get queued up until either they can be serviced or
>the maximum size of the receive buffer has been exceeded and they are dropped.

My concern here is: why has the reclaim procedure not been triggered?

> b) Are there any parameters available within the Linux memory subsystem with
> which the reclaim procedure can be monitored and fine-tuned?

>I don't think freeing up more memory will solve the issue. I really think you
>probably should look at tuning the network settings. I suspect the socket
>itself is likely the thing holding all of the memory.
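For what it's worth, reclaim activity can at least be observed from
/proc/vmstat while the transfer runs. A rough sketch; the exact counter names
vary between kernel versions, so treat them as examples:

  # grep -E 'pgscan|pgsteal|pageoutrun|allocstall' /proc/vmstat

If the pgscan_kswapd*/pgsteal_kswapd* counters do not move during the
transfer, kswapd never ran; if allocstall stays at zero, no process entered
direct reclaim either. One thing worth keeping in mind: a GFP_ATOMIC
allocation such as the one in the trace above is not allowed to sleep, so it
can never wait for reclaim itself; it can only fall back on the watermark
reserves, which is why vm.min_free_kbytes is often raised on small-memory
boards.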
> c) Can some amount of free memory be reserved so that the Linux kernel does
> not use it for caching, and can instead use it for its other required page
> allocations (particularly GFP_ATOMIC, as needed above on behalf of the netcat
> nc process)? Can some tuning be done in the Linux memory subsystem, e.g. via
> /proc/sys/vm/min_free_kbytes, to achieve this objective?

>Within the kernel we already have some emergency reserves that get dipped into
>if the PF_MEMALLOC flag is set. However that is usually reserved for the cases
>where you are booting off of something like iscsi or NVMe over TCP.

> d) Can we be provided with further clues on how to debug this out-of-memory
> condition in the kernel?

>My advice would be to look at tuning your TCP socket values in sysctl. I
>suspect you are likely using a larger window than your system can currently
>handle given the memory constraints, and that what you are seeing is that all
>the memory is being consumed by buffering for the TCP socket.

Any suggestions on which TCP socket values I should look into, and what values
to tune them to?
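For reference, the knobs usually mentioned for this are below. The numbers are
only illustrative assumptions for a 64MB board, not values suggested anywhere
in this thread, so they would need testing on the target:

  # sysctl -w net.ipv4.tcp_rmem="4096 65536 131072"   per-socket receive buffer:
                                                      min/default/max in bytes
  # sysctl -w net.ipv4.tcp_wmem="4096 16384 65536"    send side, same layout
  # sysctl -w net.core.rmem_max=131072                upper cap for SO_RCVBUF
  # sysctl -w net.ipv4.tcp_mem="4096 8192 12288"      global TCP memory thresholds,
                                                      in pages (not bytes)

Lowering the tcp_rmem maximum shrinks the window the target advertises, so the
PC can never have more data in flight than the receiver can drain, and tcp_mem
puts a hard global ceiling on TCP memory use.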

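On the VM side, relating to questions a) and c) above: the standard knobs for
keeping a larger atomic reserve and starting writeback earlier are below.
Again, the values are illustrative assumptions for 64MB of RAM, not tested
recommendations:

  # sysctl -w vm.min_free_kbytes=4096            grow the watermark reserve that
                                                 GFP_ATOMIC falls back on (the
                                                 dump above shows min:964kB)
  # sysctl -w vm.dirty_background_bytes=2097152  wake the flusher threads once
                                                 2MB of page cache is dirty
  # sysctl -w vm.dirty_bytes=8388608             throttle writers at 8MB dirty

With ~5.4MB dirty and ~5.1MB under writeback in the dump above, starting
writeback earlier keeps more of the page cache clean and therefore immediately
reclaimable when the network needs pages.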
