[tipc-discussion] kernel page allocation problem

2016-03-29 Thread Leon Pollak
Hello, gurus.

Recently I encountered the kernel "page allocation" error in the TIPC code.
I attached the backtrace printout.
I use Linux version 2.6.37, which seems to use TIPC 2.0 (I am not sure, but 
the code is from the Arago distribution). 
All this runs on an embedded ARM system.
It seems irrelevant to me, but in any case, the largest packet I transfer 
is 15 KiB.

I will be very thankful for any comment about the issue.
-- 
Leon

root@dm814x-evm:~# cat /proc/version
Linux version 2.6.37 (le...@leonp.plris.com) (gcc version 4.3.3 (Sourcery 
G++Lite 2009q1-203) ) #219 Wed Mar 2 12:09:00 IST 2016
root@dm814x-evm:~#./init.sh
TIPC: Started in network mode
TIPC: Own node address <1.1.1>, network identity 1234
TIPC: Enabled bearer , discovery domain <1.1.0>, priority 10
TIPC: Established link <1.1.1:eth0-1.1.5:eth0> on network plane A

root@dm814x-evm:~# AVU: page allocation failure. order:2, mode:0x4020
Backtrace:
[] (dump_backtrace+0x0/0x110) from [] (dump_stack+0x18/0x1c)
 r7:0020 r6:4020 r5:003f r4:
[] (dump_stack+0x0/0x1c) from [] 
(__alloc_pages_nodemask+0x4c8/0x51c)
[] (__alloc_pages_nodemask+0x0/0x51c) from [] 
(__get_free_pages+0x18/0x30)
[] (__get_free_pages+0x0/0x30) from [] 
(__kmalloc_track_caller+0x40/0xc8)
[] (__kmalloc_track_caller+0x0/0xc8) from [] 
(__alloc_skb+0x58/0xe8)
 r8:c03ef410 r7:3b00 r6:0020 r5:c4beb900 r4:c4804700
[] (__alloc_skb+0x0/0xe8) from [] 
(tipc_buf_acquire+0x28/0x60)
[] (tipc_buf_acquire+0x0/0x60) from [] 
(tipc_link_recv_fragment+0x130/0x2c4)
 r5: r4:c1fba850
[] (tipc_link_recv_fragment+0x0/0x2c4) from [] 
(tipc_bclink_recv_pkt+0x490/0x638)
[] (tipc_bclink_recv_pkt+0x0/0x638) from [] 
(tipc_recv_msg+0x164/0x948)
 r8: r7:c4b34000 r6:ca88 r5:c05e9ea4 r4:c4b3f240
[] (tipc_recv_msg+0x0/0x948) from [] (recv_msg+0x40/0x54)
[] (recv_msg+0x0/0x54) from [] 
(__netif_receive_skb+0x338/0x38c)
[] (__netif_receive_skb+0x0/0x38c) from [] 
(netif_receive_skb+0x5c/0x6c)
[] (netif_receive_skb+0x0/0x6c) from [] 
(cpsw_rx_handler+0xa4/0x17c)
 r4:c4b34000
[] (cpsw_rx_handler+0x0/0x17c) from [] 
(__cpdma_chan_free+0x88/0x8c)
[] (__cpdma_chan_free+0x0/0x8c) from [] 
(__cpdma_chan_process+0x10c/0x124)
[] (__cpdma_chan_process+0x0/0x124) from [] 
(cpdma_chan_process+0x30/0x50)
[] (cpdma_chan_process+0x0/0x50) from [] 
(cpsw_poll+0x34/0xa0)
 r7:0001 r6:0040 r5:c4b34360 r4:
[] (cpsw_poll+0x0/0xa0) from [] (net_rx_action+0x58/0x154)
 r9:000a r8:012c r7:0001 r6:000c r5:0040
r4:c4b34370
[] (net_rx_action+0x0/0x154) from [] 
(__do_softirq+0x80/0x108)
[] (__do_softirq+0x0/0x108) from [] (irq_exit+0x48/0x94)
[] (irq_exit+0x0/0x94) from [] (asm_do_IRQ+0x80/0xa0)
[] (asm_do_IRQ+0x0/0xa0) from [] (__irq_svc+0x34/0xa0)
Exception stack(0xc3c9d9a8 to 0xc3c9d9f0)
d9a0:   0003 02e8 0012 0011 c06f609c c06201c0
d9c0: c05b294c 2013 0001 c3c9db3c 0001 c3c9da14 c06201d8 c3c9d9f0
d9e0: 0002 c00ae0d8 8013 
 r5:fa20 r4:
[] (free_hot_cold_page+0x0/0x19c) from [] 
(__pagevec_free+0x2c/0x3c)
 r9:c3c9db3c r8:c3c9dc2c r7:c0674060 r6:c3c9da38 r5:c3c9da40
r4:0001
[] (__pagevec_free+0x0/0x3c) from [] 
(free_page_list+0x78/0xbc)
 r7:c0674060 r6:c4615d88 r5:c3c9dadc r4:c0621de0
[] (free_page_list+0x0/0xbc) from [] 
(shrink_page_list+0x728/0x78c)
 r5:c0674078 r4:
[] (shrink_page_list+0x0/0x78c) from [] 
(shrink_inactive_list+0x1b0/0x2a4)
[] (shrink_inactive_list+0x0/0x2a4) from [] 
(shrink_zone+0x3cc/0x478)
[] (shrink_zone+0x0/0x478) from [] 
(try_to_free_pages+0xf0/0x2f4)
[] (try_to_free_pages+0x0/0x2f4) from [] 
(__alloc_pages_nodemask+0x31c/0x51c)
[] (__alloc_pages_nodemask+0x0/0x51c) from [] 
(__do_page_cache_readahead+0x9c/0x1e8)
[] (__do_page_cache_readahead+0x0/0x1e8) from [] 
(ra_submit+0x2c/0x34)
[] (ra_submit+0x0/0x34) from [] 
(ondemand_readahead+0x1ac/0x1bc)
[] (ondemand_readahead+0x0/0x1bc) from [] 
(page_cache_sync_readahead+0x70/0x78)
[] (page_cache_sync_readahead+0x0/0x78) from [] 
(generic_file_aio_read+0x2ec/0x738)
 r6: r5:09e6 r4:1000
[] (generic_file_aio_read+0x0/0x738) from [] 
(nfs_file_read+0xd0/0x104)
[] (nfs_file_read+0x0/0x104) from [] 
(do_sync_read+0xa0/0xec)
[] (do_sync_read+0x0/0xec) from [] (vfs_read+0xb8/0x144)
 r8:40e5c000 r7:c3c9df70 r6:40e5c000 r5:0008 r4:c3ed3600
[] (vfs_read+0x0/0x144) from [] (sys_read+0x44/0x70)
 r8:40e5c000 r7:0008 r6:c3ed3600 r5: r4:0098
[] (sys_read+0x0/0x70) from [] (ret_fast_syscall+0x0/0x30)
 r8:c004afa8 r7:0003 r6:00712a18 r5: r4:0098
Mem-info:
Normal per-cpu:
CPU0: hi:   18, btch:   3 usd:  17
active_anon:746 inactive_anon:32 isolated_anon:0
 active_file:4089 inactive_file:10694 isolated_file:32
 unevictable:0 dirty:450 writeback:0 unstable:0
 free:220 slab_reclaimable:514 slab_unreclaimable:714
 mapped:716 shmem:47 pagetables:46 bounce:0
Normal free:880kB min:1112kB low:1388kB high:1668kB active_anon:2984kB 
inactive_anon:

Re: [tipc-discussion] kernel page allocation problem

2016-03-29 Thread Jon Maloy
Hi Leon,
Yes, you are running TIPC 2.0, but that does not really mean much. The version 
number hasn't changed in the last 4-5 years, while huge parts of the code have 
been fundamentally rewritten during that period.
There were several issues like the one you have encountered, and almost all of 
them have been dealt with or have simply disappeared in the newer versions.
The code you are using is in reality very old, and I don't know if we could 
fix it even if we tried, since I am not sure there are any patch releases for 
kernel versions this old.

I would strongly recommend that you move to a more recent kernel if possible 
(preferably 4.2 or later), so you can benefit from the maintenance and upgrades 
we are working on now.

Regards
///jon

PS. We will step the code version number to 3.0 later this year, if everything 
goes according to plan, but the protocol version will not be changed.



--
Transform Data into Opportunity.
Accelerate data analysis in your applications with
Intel Data Analytics Acceleration Library.
Click to learn more.
http://pubads.g.doubleclick.net/gampad/clk?id=278785471&iu=/4140
___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion


Re: [tipc-discussion] kernel page allocation problem

2016-03-30 Thread Leon Pollak
Hi, Jon.
Thank you for the reply.
I can't upgrade the kernel to any other version, because it is an embedded 
system based on the TI DM8148 processor and there are a lot of drivers and 
firmware working with it.

As far as I understand, I can change/upgrade the TIPC files only. But I wasn't 
able to find the sources on sf.net to try. Does it mean that I need to take 
them from the kernel itself?

Could you kindly recommend the most suitable kernel version whose TIPC code 
could be compiled into the 2.6.37 kernel to replace the "native" code?

Thank you again for your help!


-- 
Leon



Re: [tipc-discussion] kernel page allocation problem

2016-03-30 Thread Jon Maloy
Hi Leon,
I took a quick look at your dump, and to me it looks like you have simply run 
out of memory. I think that is the first thing you should look at.
As for the most recent code version to use, I cannot give you any definite 
answer, since the kernel API and environment keep changing continuously, so 
new code will not easily compile on older kernels and vice versa.
If you want to go down that road, I would recommend a bisection approach: 
e.g., check whether the code from 3.15 applies and runs, and then go forwards 
or backwards from there.
But, as already said, look at the problem you have at hand first.

BR
///jon
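
For readers following the dump: "order:2" in the failure line is the page
allocator order, i.e. a request for 2^2 physically contiguous pages. The
sketch below is only illustrative arithmetic. It assumes a 4 KiB page size
(typical for 32-bit ARM, but not stated in the thread), and alloc_order and
below_min_watermark are hypothetical helper names, not kernel functions. The
880 kB / 1112 kB figures come from the "Normal free:... min:..." line of the
dump. The real buffer also carries socket-buffer overhead, so this is a lower
bound on the request size rather than an exact reconstruction.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, not stated in the thread


def alloc_order(nbytes, page_size=PAGE_SIZE):
    """Smallest order n such that 2**n contiguous pages can hold nbytes."""
    pages = -(-nbytes // page_size)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


def below_min_watermark(free_kb, min_kb):
    """True when a zone's free memory is under its 'min' watermark."""
    return free_kb < min_kb


if __name__ == "__main__":
    # 1500: roughly one Ethernet frame; 15 KiB: Leon's largest message;
    # 66000: a message size in the range Jon mentions later in the thread.
    for size in (1500, 15 * 1024, 66000):
        order = alloc_order(size)
        contiguous_kib = (1 << order) * PAGE_SIZE // 1024
        print(f"{size:6d} bytes -> order {order} ({contiguous_kib} KiB contiguous)")
    # Values copied from the Mem-info section of the dump.
    print("below min watermark:", below_min_watermark(free_kb=880, min_kb=1112))
```

Under these assumptions a 15 KiB message needs four contiguous pages, which
matches the order reported in the dump, and the zone was already below its
"min" watermark when the request arrived from the network receive path.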





Re: [tipc-discussion] kernel page allocation problem

2016-03-30 Thread Leon Pollak
On Wednesday 30 March 2016 14:40:58 Jon Maloy wrote:
> Hi Leon,
> I took a quick look at your dump, and to me it looks like you have simply
> run out of memory. I think that is the first thing you should look at.
> As for the most recent code version to use, I cannot give you any definite
> answer, since the kernel API and environment keep changing continuously, so
> new code will not easily compile on older kernels and vice versa.
> If you want to go down that road, I would recommend a bisection approach:
> e.g., check whether the code from 3.15 applies and runs, and then go
> forwards or backwards from there.
> But, as already said, look at the problem you have at hand first.


Jon, thank you for the answer.

I saw that the problem is a lack of memory.
I also ran a simple stress test (one side sends in a loop) and hit this 
problem immediately.

But the User's Manual states that if the receiving side has no room to 
accept a message, it prevents the sending side from sending, effectively 
blocking the sender.
This is what I expected to occur, but it doesn't happen in my case.
I logged the sendto() execution time and it was always less than 2 ms, except 
when this page error occurred on the receiving side; then sendto() 
took about 500-700 ms.

Excuse me, please, maybe this is a simple newbie question... :-)
What am I doing wrong?

-- 
Leon



Re: [tipc-discussion] kernel page allocation problem

2016-04-01 Thread Jon Maloy


> -Original Message-
> From: Leon Pollak [mailto:le...@plris.com]
> Sent: Wednesday, 30 March, 2016 11:05
> To: Jon Maloy
> Cc: tipc-discussion@lists.sourceforge.net
> Subject: Re: [tipc-discussion] kernel page allocation problem
> 
> Jon, thank you for the answer.
> 
> I saw that the problem is a lack of memory.
> I also ran a simple stress test (one side sends in a loop) and hit this
> problem immediately.
> 
> But the User's Manual states that if the receiving side has no room to
> accept a message, it prevents the sending side from sending, effectively
> blocking the sender.

That is true only for connection-oriented messaging (SOCK_STREAM, 
SOCK_SEQPACKET). Is that what you are running?
If you are just doing a tight loop with 66k messages using TIPC_PORT_NAME you 
will not have any flow control, and the messages might be dropped in the 
receiving socket.
But if you are really tight on memory they might also be dropped at the link 
layer, which is what you are seeing. 
I have never seen that happen before, but it is entirely possible in TIPC 
versions before commit 40ba3cdf542a469aaa9 ("tipc: message reassembly using 
fragment chain") from Nov 6th 2013. (I don't remember which Linux version this 
corresponds to, but that should be easy to find out.) Furthermore, this might 
even happen with connection-oriented messaging, but it is less likely.

So, if you are running connectionless, I recommend that you switch to a 
connection-oriented mode. If you already are connection-oriented, you can try 
to find out which version this fix went into, and re-adapt the code.

Regards
///jon
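
The commit title above ("message reassembly using fragment chain") suggests
why the fix matters on a memory-starved system: a reassembly buffer sized for
the whole message needs one large allocation, while a chain of fragments
never needs anything bigger than a single fragment. The following is a
conceptual user-space sketch of that contrast only, not the kernel code. The
class names, the split helper and the 1400-byte fragment size are all
hypothetical choices made for illustration.

```python
FRAG_SIZE = 1400  # hypothetical fragment payload size, for illustration only


def split(message, frag_size=FRAG_SIZE):
    """Cut a message into fragments of at most frag_size bytes."""
    return [message[i:i + frag_size] for i in range(0, len(message), frag_size)]


class LinearReassembler:
    """Copies every fragment into one buffer sized for the whole message."""

    def __init__(self, total_len):
        self.buf = bytearray(total_len)      # one allocation of the full size
        self.offset = 0
        self.largest_allocation = total_len

    def add(self, frag):
        self.buf[self.offset:self.offset + len(frag)] = frag
        self.offset += len(frag)

    def message(self):
        return bytes(self.buf[:self.offset])


class ChainReassembler:
    """Keeps the fragments as a chain; no message-sized buffer is needed."""

    def __init__(self):
        self.chain = []
        self.largest_allocation = 0

    def add(self, frag):
        self.chain.append(frag)
        self.largest_allocation = max(self.largest_allocation, len(frag))

    def message(self):
        # Joined here only so the two approaches can be compared.
        return b"".join(self.chain)


if __name__ == "__main__":
    msg = bytes(range(256)) * 60             # 15 KiB, like Leon's largest message
    linear, chain = LinearReassembler(len(msg)), ChainReassembler()
    for frag in split(msg):
        linear.add(frag)
        chain.add(frag)
    print("same message:", linear.message() == chain.message())
    print("largest allocation, linear:", linear.largest_allocation)
    print("largest allocation, chain: ", chain.largest_allocation)
```

Both reassemblers produce the same message; only the size of the largest
single allocation differs, which is the property that matters when
contiguous memory is scarce.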





Re: [tipc-discussion] kernel page allocation problem

2016-04-10 Thread Leon Pollak
On Friday 01 April 2016 15:02:05 Jon Maloy wrote:
> > I saw that the problem is in the lack of memory.
> > I also made a simple stress test (one side sends in a loop) and received 
> > this problem immediately.
> > But User's Manual states that if there is no room for the receiving side 
> > to accept message it prevents the sending side from sending, effectively
> > blocking the sender.
> 
> That is true only for connection-oriented messaging (SOCK_STREAM,
> SOCK_SEQPACKET). Is that what you are running?
> If you are just doing a tight loop with 66k messages using TIPC_PORT_NAME
> you will not have any flow control, and the messages might be dropped in the
> receiving socket.
> But if you are really tight on memory they might also be dropped at the link
> layer, which is what you are seeing.
> I have never seen that happen before, but it is entirely possible in TIPC
> versions before commit 40ba3cdf542a469aaa9 ("tipc: message reassembly using
> fragment chain") from Nov 6th 2013. (I don't remember which Linux version
> this corresponds to, but that should be easy to find out.) Furthermore, this
> might even happen with connection-oriented messaging, but it is less likely.
> 
> So, if you are running connectionless, I recommend that you switch to a
> connection-oriented mode. If you already are connection-oriented, you can
> try to find out which version this fix went into, and re-adapt the code.

Thanks a lot, Jon, again.

After your answer I looked through both manuals again and did not find 
anything saying that connectionless messaging allows drops. On the contrary, 
the programmer's manual explicitly states that "TIPC is designed to be a 
reliable messaging mechanism, in which an application can send a message and 
assume that the message will be delivered to the specified destination as 
long as that destination is reachable."

Now, is it a bug that was corrected on Nov 6th 2013? Or a feature?

I can't move to connection-oriented methods, because I need to support 
one-to-3 and 3-to-one.

Sorry.
-- 
Leon



Re: [tipc-discussion] kernel page allocation problem

2016-04-10 Thread Erik Hugne
On Apr 10, 2016 15:27, "Leon Pollak" wrote:
>
> After your answer I looked through both manuals again and did not find
> anything saying that connectionless messaging allows drops. On the contrary,
> the programmer's manual explicitly states that "TIPC is designed to be a
> reliable messaging mechanism, in which an application can send a message and
> assume that the message will be delivered to the specified destination as
> long as that destination is reachable."
>
> Now, is it a bug that was corrected on Nov 6th 2013? Or a feature?
>
> I can't move to connection-oriented methods, because I need to support
> one-to-3 and 3-to-one.
>

This sounds a lot like the problem where connectionless messages are
received, acked at the TIPC link layer and passed to the socket, but
dropped there because the socket receive buffer is full.
If the receiving application does not drain the queue faster than messages
build up, there will be losses, as there is no concept of flow control at
this level.
Try bumping the priority of the server; maybe run it as an RT thread?

//E
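
One way to try the suggestion above from inside the server itself is to
request a real-time scheduling policy at startup. This is a minimal sketch,
assuming a Linux system and a server written in Python (the thread does not
say what language the application uses). try_realtime is a hypothetical
helper name and the priority value 10 is an arbitrary example. The call
normally needs root or CAP_SYS_NICE, so the sketch reports failure instead of
raising.

```python
import os


def try_realtime(priority=10):
    """Try to switch the caller to SCHED_FIFO; return True on success."""
    if not hasattr(os, "sched_setscheduler"):
        return False  # platform without the POSIX scheduling calls
    lowest = os.sched_get_priority_min(os.SCHED_FIFO)
    highest = os.sched_get_priority_max(os.SCHED_FIFO)
    priority = max(lowest, min(highest, priority))  # clamp to the valid range
    try:
        # pid 0 means "the caller".
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:  # includes PermissionError when privileges are missing
        return False
    return True


if __name__ == "__main__":
    if try_realtime(10):
        print("running under SCHED_FIFO")
    else:
        print("could not switch to SCHED_FIFO (missing privileges?)")
```

A real-time receiver that spins without blocking can starve everything else
on a single-core board, so the receive loop should still block in its
receive call. As noted later in the thread, a higher priority reduces drops
but gives no guarantee.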



Re: [tipc-discussion] kernel page allocation problem

2016-04-11 Thread Jon Maloy


On 04/11/2016 01:28 AM, Erik Hugne wrote:
> On Apr 10, 2016 15:27, "Leon Pollak"  wrote:
>> After your answer I looked through both manuals again and did not find
>> anything saying that connectionless messaging allows drops. On the
>> contrary, the programmer's manual explicitly states that "TIPC is designed
>> to be a reliable messaging mechanism, in which an application can send a
>> message and assume that the message will be delivered to the specified
>> destination as long as that destination is reachable."
>>
>> Now, is it a bug that was corrected on Nov 6th 2013? Or a feature?

If it is a bug, the bug is in the manual, because we cannot guarantee
sequential, loss-free delivery using SOCK_DGRAM or SOCK_RDM.
First, this follows from the very definition of those two communication
modes; second, it is a practical impossibility to guarantee this 100% with
connectionless messaging while retaining any reasonable performance.

What was corrected in the aforementioned commit was the buffer
allocation problem, which Erik describes correctly below. That was a bug.
But we still cannot guarantee connectionless delivery socket-to-socket,
because an overwhelmed receiving socket will have to toss messages away
when its receive buffer is full.

>>
>> I can't move to connection-oriented methods, because I need to support
>> one-to-3 and 3-to-one.

As I see it you have three options here:
- Set up three connections via three different socket pairs.
- Make your own end-to-end flow control at user level (it is not hard).
- Possibly increase the server priority, as Erik is suggesting. But this gives
you no absolute guarantee.
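
The second option can be illustrated with a small in-memory model. This is a
sketch of one common scheme (a credit window), not something prescribed in
this thread: the sender may have at most "window" unacknowledged messages
outstanding, and every message the receiving application consumes returns one
credit. The Receiver class, the simulate function and all the numbers are
hypothetical; in a real application the credits would travel back to the
sender as small acknowledgement messages over TIPC.

```python
from collections import deque


class Receiver:
    """Model of a receiving socket: a bounded queue that drops when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.dropped = 0

    def deliver(self, msg):
        if len(self.queue) >= self.capacity:
            self.dropped += 1      # receive buffer full: message is tossed
            return
        self.queue.append(msg)

    def consume(self, max_msgs):
        """Application reads up to max_msgs; returns how many were read."""
        taken = min(max_msgs, len(self.queue))
        for _ in range(taken):
            self.queue.popleft()
        return taken


def simulate(total, capacity, burst, drain, window=None):
    """Send `total` messages and return how many the receiver dropped.

    window=None: the sender transmits blindly (no flow control).
    window=N:    at most N unacknowledged messages may be outstanding.
    """
    if burst < 1 or drain < 1:
        raise ValueError("burst and drain must be at least 1")
    if window is not None and not 1 <= window <= capacity:
        raise ValueError("window must be between 1 and the receiver capacity")
    rx = Receiver(capacity)
    sent = 0
    credits = window
    while sent < total or rx.queue:
        n = min(burst, total - sent)
        if credits is not None:
            n = min(n, credits)    # never send more than the credits allow
        for _ in range(n):
            rx.deliver(sent)
            sent += 1
        if credits is not None:
            credits -= n
        acked = rx.consume(drain)  # each consumed message is acknowledged
        if credits is not None:
            credits += acked       # acknowledgements return credits
    return rx.dropped


if __name__ == "__main__":
    print("dropped without flow control:", simulate(1000, 8, burst=4, drain=1))
    print("dropped with window of 8:    ", simulate(1000, 8, burst=4, drain=1, window=8))
```

Because the window never exceeds the receiver's capacity, the queue can never
overflow in this model. For the 3-to-one case, the receiver would have to
split its capacity into one window per sender.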

//jon

>>
> This sounds a lot like the problem when connectionless messages are
> received, acked on the tipc link layer and passed to the socket, but
> dropped there because the socket receive buffer is full.
> If the receiving application does not drain the queue faster than messages
> build up, there will be losses as there us no concept of flow control at
> this level.
> Try bumping the prio of the server, maybe run it as an rt thread?
>
> //E
>
>> Sorry.
>> --
>> Leon

