Re: [PATCH] net: sk == 0xffffffff fix - not for commit

2014-01-24 Thread Andrew Ruder
On Fri, Jan 24, 2014 at 07:38:31AM -0600, Andrew Ruder wrote:
> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg574770.html
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854

Just a little further confirmation.  This appears in
__inet_lookup_established as the last four instructions before
returning.

   <+440>:   bl  <__rcu_read_unlock>
   <+444>:   sub sp, r11, #40; 0x28
   <+448>:   ldr r0, [r11, #-48] ; 0x30
   <+452>:   ldm sp, {r4, r5, r6, r7, r8, r9, r10, r11, sp, pc}
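
A rough way to see why the order of the sub and the ldr could matter, assuming the linked GCC report describes the same problem: after "sub sp, r11, #40" the stack pointer sits at r11 - 40, so the slot at r11 - 48 that the following ldr reads is 8 bytes below sp. Anything that pushes onto the same stack between those two instructions (an interrupt, for example) may overwrite that slot. The toy model below is only a sketch of that idea in userspace C. The names toy_cpu, toy_interrupt, toy_setup, epilogue_load_first and epilogue_sub_first are hypothetical, and the 0xffffffff written by the fake interrupt is an arbitrary marker, not an explanation of the value seen in the oops.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 64

/* Toy model of a full-descending stack. Indices stand in for address / 4. */
struct toy_cpu {
    uint32_t mem[STACK_WORDS];
    unsigned fp;    /* frame pointer, as a word index */
    unsigned sp;    /* stack pointer, as a word index */
    uint32_t r0;    /* return-value register */
};

/* Hypothetical interrupt: pushes `words` values just below the current sp. */
void toy_interrupt(struct toy_cpu *cpu, unsigned words)
{
    assert(words <= cpu->sp);
    for (unsigned i = 1; i <= words; i++)
        cpu->mem[cpu->sp - i] = 0xffffffffu;   /* arbitrary marker value */
}

/* Build a frame whose return value is spilled at fp - 48 bytes (12 words). */
void toy_setup(struct toy_cpu *cpu, uint32_t sk)
{
    memset(cpu, 0, sizeof(*cpu));
    cpu->fp = 40;
    cpu->sp = cpu->fp - 16;         /* body of the function: slot is above sp */
    cpu->mem[cpu->fp - 12] = sk;    /* the slot read by ldr r0, [fp, #-48] */
}

/* Order seen with the 4.7 toolchain: ldr first, then sub. */
uint32_t epilogue_load_first(uint32_t sk)
{
    struct toy_cpu cpu;
    toy_setup(&cpu, sk);
    cpu.r0 = cpu.mem[cpu.fp - 12];  /* ldr r0, [fp, #-48] */
    toy_interrupt(&cpu, 4);         /* interrupt between the two instructions */
    cpu.sp = cpu.fp - 10;           /* sub sp, fp, #40 */
    return cpu.r0;
}

/* Order seen with the 4.8 toolchain: sub first, then ldr. */
uint32_t epilogue_sub_first(uint32_t sk)
{
    struct toy_cpu cpu;
    toy_setup(&cpu, sk);
    cpu.sp = cpu.fp - 10;           /* sub sp, fp, #40: slot is now below sp */
    toy_interrupt(&cpu, 4);         /* interrupt overwrites words below sp */
    cpu.r0 = cpu.mem[cpu.fp - 12];  /* ldr r0, [fp, #-48] reads clobbered slot */
    return cpu.r0;
}
```

In this model epilogue_load_first() returns the value it was given, while epilogue_sub_first() returns the marker written by the fake interrupt. Whether this is what happens on the real hardware is not established in this thread.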
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] net: sk == 0xffffffff fix - not for commit

2014-01-24 Thread Andrew Ruder
Actually found what looks to be a fix for this in another thread.

http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg574770.html
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854

Cheers,
Andy


Re: [PATCH] net: sk == 0xffffffff fix - not for commit

2014-01-17 Thread Andrzej Pietrasiewicz

On 17.01.2014 13:18, Andrzej Pietrasiewicz wrote:
> On 16.01.2014 17:29, Eric Dumazet wrote:
>> On Thu, 2014-01-16 at 16:21 +0100, Andrzej Pietrasiewicz wrote:
>>> On 10.12.2013 15:25, Eric Dumazet wrote:
>>>> On Tue, 2013-12-10 at 07:55 +0100, Andrzej Pietrasiewicz wrote:
>>>>> On 09.12.2013 16:31, Eric Dumazet wrote:
>>>>>> On Mon, 2013-12-09 at 12:47 +0100, Andrzej Pietrasiewicz wrote:
>>>>>>> NOT FOR COMMITTING TO MAINLINE.
>>>>>>>
>>>>>>> With g_ether loaded the sk occasionally becomes 0xffffffff.
>>>>>>> It happens usually after transferring a few hundred kilobytes to a few
>>>>>>> tens of megabytes. If sk is 0xffffffff then dereferencing it causes
>>>>>>> kernel panic.
>>>>>>>
>>>>>>> This is a *workaround*. I don't know enough net code to understand the core
>>>>>>> of the problem. However, with this patch applied the problems are gone,
>>>>>>> or at least pushed farther away.
>>>>>>
>>>>>> Is it happening on SMP or UP?
>>>>>
>>>>> UP build, S5PC110
>>>>
>>>> OK
>>>>
>>>> I believe you need additional debugging to track the exact moment
>>>> 0xffffffff is fed to 'sk'
>>>>
>>>> It looks like a very strange bug, involving a problem in some assembly
>>>> helper, register save/restore, compiler bug or stack corruption or
>>>> something.
>>>
>>> I started with adding WARN_ON(sk == 0xffffffff); just before return in
>>> __inet_lookup_established(), and the problem was gone. So this looks
>>> very strange, like a toolchain problem.
>>
>> Or a timing issue. Adding a WARN_ON() adds extra instructions and might
>> really change the assembly output.
>>
>>> I used gcc-linaro-arm-linux-gnueabihf-4.8-2013.05.
>>>
>>> If I change the toolchain to
>>>
>>> gcc-linaro-arm-linux-gnueabihf-4.7-2013.04-20130415
>>>
>>> the problem seems to have gone away.
>>
>> It's totally possible some barrier was not properly handled by the
>> compiler. You could disassemble the function on both toolchains and
>> try to spot the issue.
>
> So I gave it a try.
>
> Below is a part of assembly code (ARM) which corresponds to the last
> lines of the __inet_lookup_established():
>
> C source:
> =========
> found:
> 	rcu_read_unlock();
> 	return sk;
> }
>
> assembly for toolchain 4.7:
> ===========================
> c0333bb8:   ebf4bb6e    bl   c0062978 <__rcu_read_unlock>
> c0333bbc:   e51b0030    ldr  r0, [fp, #-48]  ; 0x30
> c0333bc0:   e24bd028    sub  sp, fp, #40     ; 0x28
> c0333bc4:   e89daff0    ldm  sp, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}
> c0333bc8:   e5132018    ldr  r2, [r3, #-24]
>
> assembly for toolchain 4.8:
> ===========================
> c033ff5c:   ebf4927e    bl   c006495c <__rcu_read_unlock>
> c033ff60:   e24bd028    sub  sp, fp, #40     ; 0x28
> c033ff64:   e51b0030    ldr  r0, [fp, #-48]  ; 0x30
> c033ff68:   e89daff0    ldm  sp, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}
> c033ff6c:   e5113018    ldr  r3, [r1, #-24]
>
> What can be seen is that the usage of registers is slightly different,
> and, what is more important, the _order_ of ldr/sub is different.
> Now, if I swap the instructions at offsets c033ff60 and c033ff64
> in the 4.8-generated vmlinux, the problem seems gone! Well, at least
> the binary behaves the same way as the 4.7-generated one.
>
> Here is a _hypothesis_ of what _might_ be happening:
>
> The function in question puts its return value in the register r0.
> In both cases the return value is fetched from a memory location
> relative #-48 to what the frame pointer points to. However,
> in the 4.7-generated binary the ldr executes in the branch delay slot,
> whereas in the 4.8-generated binary it is the sub which executes
> in the branch delay slot. That way, in the 4.7-generated binary the return
> value is fetched before __rcu_read_unlock begins, but in the
> 4.8-generated binary it is fetched some time later. Which might be
> enough for someone else to schedule in and break the data to be
> copied to r0 and returned from the function.
>
> As I said, this is just a hypothesis.



Please disregard what I have written.

There is no delay slot on ARM :O

A nice hypothesis, though ;)

AP




Re: [PATCH] net: sk == 0xffffffff fix - not for commit

2014-01-17 Thread Andrzej Pietrasiewicz

On 16.01.2014 17:29, Eric Dumazet wrote:
> On Thu, 2014-01-16 at 16:21 +0100, Andrzej Pietrasiewicz wrote:
>> On 10.12.2013 15:25, Eric Dumazet wrote:
>>> On Tue, 2013-12-10 at 07:55 +0100, Andrzej Pietrasiewicz wrote:
>>>> On 09.12.2013 16:31, Eric Dumazet wrote:
>>>>> On Mon, 2013-12-09 at 12:47 +0100, Andrzej Pietrasiewicz wrote:
>>>>>> NOT FOR COMMITTING TO MAINLINE.
>>>>>>
>>>>>> With g_ether loaded the sk occasionally becomes 0xffffffff.
>>>>>> It happens usually after transferring a few hundred kilobytes to a few
>>>>>> tens of megabytes. If sk is 0xffffffff then dereferencing it causes
>>>>>> kernel panic.
>>>>>>
>>>>>> This is a *workaround*. I don't know enough net code to understand the core
>>>>>> of the problem. However, with this patch applied the problems are gone,
>>>>>> or at least pushed farther away.
>>>>>
>>>>> Is it happening on SMP or UP?
>>>>
>>>> UP build, S5PC110
>>>
>>> OK
>>>
>>> I believe you need additional debugging to track the exact moment
>>> 0xffffffff is fed to 'sk'
>>>
>>> It looks like a very strange bug, involving a problem in some assembly
>>> helper, register save/restore, compiler bug or stack corruption or
>>> something.
>>
>> I started with adding WARN_ON(sk == 0xffffffff); just before return in
>> __inet_lookup_established(), and the problem was gone. So this looks
>> very strange, like a toolchain problem.
>
> Or a timing issue. Adding a WARN_ON() adds extra instructions and might
> really change the assembly output.
>
>> I used gcc-linaro-arm-linux-gnueabihf-4.8-2013.05.
>>
>> If I change the toolchain to
>>
>> gcc-linaro-arm-linux-gnueabihf-4.7-2013.04-20130415
>>
>> the problem seems to have gone away.
>
> It's totally possible some barrier was not properly handled by the
> compiler. You could disassemble the function on both toolchains and
> try to spot the issue.


So I gave it a try.

Below is a part of assembly code (ARM) which corresponds to the last
lines of the __inet_lookup_established():

C source:
=========
found:
	rcu_read_unlock();
	return sk;
}

assembly for toolchain 4.7:
===========================
c0333bb8:   ebf4bb6e    bl   c0062978 <__rcu_read_unlock>
c0333bbc:   e51b0030    ldr  r0, [fp, #-48]  ; 0x30
c0333bc0:   e24bd028    sub  sp, fp, #40     ; 0x28
c0333bc4:   e89daff0    ldm  sp, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}
c0333bc8:   e5132018    ldr  r2, [r3, #-24]

assembly for toolchain 4.8:
===========================
c033ff5c:   ebf4927e    bl   c006495c <__rcu_read_unlock>
c033ff60:   e24bd028    sub  sp, fp, #40     ; 0x28
c033ff64:   e51b0030    ldr  r0, [fp, #-48]  ; 0x30
c033ff68:   e89daff0    ldm  sp, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}
c033ff6c:   e5113018    ldr  r3, [r1, #-24]

What can be seen is that the usage of registers is slightly different,
and, what is more important, the _order_ of ldr/sub is different.
Now, if I swap the instructions at offsets c033ff60 and c033ff64
in the 4.8-generated vmlinux, the problem seems gone! Well, at least
the binary behaves the same way as the 4.7-generated one.
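
The swap experiment described above amounts to exchanging two adjacent 32-bit instruction words. A minimal sketch of that operation on an in-memory copy of the code follows. The helper names swap_insn_words and read_le32 are hypothetical, and translating the virtual addresses c033ff60 and c033ff64 into offsets inside the vmlinux file is left out, since the thread does not say how that was done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read one little-endian 32-bit word, e.g. an ARM instruction. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Swap the two 4-byte instruction words at byte offsets off_a and off_b
 * inside buf. Returns 0 on success, -1 if an offset is unaligned or
 * does not leave room for a full word.
 */
int swap_insn_words(uint8_t *buf, size_t len, size_t off_a, size_t off_b)
{
    uint8_t tmp[4];

    assert(buf != NULL);
    if (off_a % 4 != 0 || off_b % 4 != 0)
        return -1;
    if (len < 4 || off_a > len - 4 || off_b > len - 4)
        return -1;

    memcpy(tmp, buf + off_a, 4);
    memcpy(buf + off_a, buf + off_b, 4);
    memcpy(buf + off_b, tmp, 4);
    return 0;
}
```

Applied to a buffer holding e24bd028 (sub) followed by e51b0030 (ldr), a call with offsets 0 and 4 leaves the ldr first and the sub second, which is the order the 4.7 toolchain produced.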

Here is a _hypothesis_ of what _might_ be happening:

The function in question puts its return value in the register r0.
In both cases the return value is fetched from a memory location
relative #-48 to what the frame pointer points to. However,
in the 4.7-generated binary the ldr executes in the branch delay slot,
whereas in the 4.8-generated binary it is the sub which executes
in the branch delay slot. That way, in the 4.7-generated binary the return
value is fetched before __rcu_read_unlock begins, but in the
4.8-generated binary it is fetched some time later. Which might be
enough for someone else to schedule in and break the data to be
copied to r0 and returned from the function.

As I said, this is just a hypothesis.

AP


Re: [PATCH] net: sk == 0xffffffff fix - not for commit

2014-01-16 Thread Andrew Ruder
On Mon, Dec 09, 2013 at 12:47:52PM +0100, Andrzej Pietrasiewicz wrote:
> With g_ether loaded the sk occasionally becomes 0xffffffff.
> It happens usually after transferring a few hundred kilobytes to a few
> tens of megabytes. If sk is 0xffffffff then dereferencing it causes
> kernel panic.

Don't know if this is relevant but I had this very similar stack trace
come up a few days ago (below).  I am working on a PXA 270/xscale with
gcc version 4.8.2 (Buildroot 2013.11-rc1-00028-gf388663).  Going to try
to see if I can reproduce it a little more readily before I start trying
to narrow down what is causing it.

===
Unable to handle kernel NULL pointer dereference at virtual address 0011
pgd = d18e
[0011] *pgd=a6d03831, *pte=, *ppte=
Internal error: Oops: 17 [#1] PREEMPT ARM
Modules linked in: zeusvirt(O) zeus16550(O) 8390p ipv6
CPU: 0 PID: 2365 Comm: sshd Tainted: G   O 3.12.0+ #201
task: d7216f00 ti: d7144000 task.ti: d7144000
PC is at tcp_v4_early_demux+0xe8/0x154
LR is at __inet_lookup_established+0x1bc/0x2e0
pc : [<c0341cfc>]    lr : [<c0329bd8>]    psr: a013
sp : d7145b20  ip : d7145ae8  fp : d7145b44
r10: c0576c28  r9 : 0008  r8 : d7998800
r7 : d7063800  r6 : c6cf2480  r5 :   r4 : c6cf2480
r3 : c02ec018  r2 : d7145ad0  r1 : d7b66a28  r0 : 
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 397f  Table: b18e  DAC: 0015
Process sshd (pid: 2365, stack limit = 0xd71441c8)
Stack: (0xd7145b20 to 0xd7146000)
5b20: 17bf3f0a 0016 0003 c0026d90 d71f4634 d71f4600 d7145b6c d7145b48
5b40: c03211b4 c0341c20 05ea d7bb0538 d7063800 0034 d71f4600 c6cf2480
5b60: d7145b9c d7145b70 c03218dc c0321158 1001  c0576c1c 
5b80: c0577e84 c0576c14   d7145be4 d7145ba0 c02fae04 c03215d4
5ba0: c0590330 c057fc08 d7145bfc c6cf2480 c02571a0 c0576c28 07e1 c05a3dc0
5bc0:  0001 c05a3d60 c05a3d74 c05a3d60 c05a3d68 d7145bfc d7145be8
5be0: c02fb990 c02fa8f0 c05a3dc0  d7145c24 d7145c00 c02fc46c c02fb968
5c00: c02fc3dc c05a3dc0 c05a3d60 0001 012c 0040 d7145c64 d7145c28
5c20: c02fbcd0 c02fc3e8  d78af3c0 d7145c5c 8d99  0001
5c40: c05a81f0 0003 0100 3fa57e1c d7144028 c05a81ec d7145cb4 d7145c68
5c60: c0026a44 c02fbc10 d7145c8c d7145c78 c00538dc c0056ce4  8d98
5c80: 00400100 000a c0228594 6093 c0590330  d7145d54 0001
5ca0: d7bb0480 05b4 d7145ccc d7145cb8 c0026ca4 c00268f4  d7144010
5cc0: d7145ce4 d7145cd0 c0026f58 c0026c58 00ab 001a d7145d04 d7145ce8
5ce0: c000f7d0 c0026ed0 0014 d7145d20 a013  d7145d1c d7145d08
5d00: c00085bc c000f768 c02f0048 c00ca7d8 d7145d7c d7145d20 c03a7dc0 c0008590
5d20: 000118ed  c05a474c c05d41cc d7bb0180 d18ed800 d7801080 06a3
5d40: 0001 d7bb0480 05b4 d7145d7c d7145d80 d7145d68 c02f0048 c00ca7d8
5d60: a013  c05a4738 d7bb0180 d7145dac d7145d80 c02f0048 c00ca7b0
5d80: 0001 00c63fc0 d7b66a00 d7b66a00 4040 05b4  d7b66a00
5da0: d7145dcc d7145db0 c032e340 c02effd0 d7145e98 4040 0008c414 
5dc0: d7145e54 d7145dd0 c032f368 c032e310 d7145e24 c02ea81c c03a6040 c03a9c6c
5de0:   d7145ee8  05b4  d7b66adc 
5e00:  d7144000 1854 05b4 27ec 0040 d7116d80 05b4
5e20:   d7145e6c d7b66a00 d7145ee8 d7145e98 4040 4040
5e40: 4040 0002 d7145e74 d7145e58 c03526c8 c032eb0c d7145e78 d7116d80
5e60: d7145ee0 d7116d80 d7145ed4 d7145e78 c02e63a4 c0352688 c05a3dc0 d7142000
5e80: 0040 4040 d76701c0 d7145ee0  d7145e98  
5ea0: d7145ee0 0001   0040 d7145ee8 c6cf2900 
5ec0:  d7145f78 d7145f44 d7145ed8 c00d1c64 c02e62e4  
5ee0: 00089c28 4040 d7116d80   d7145e78 d7216f00 
5f00:     4040   
5f20: 00089c28 d7116d80 00089c28 d7145f78 4040 00089c28 d7145f74 d7145f48
5f40: c00d23a0 c00d1bf4     d7116d80 
5f60: 00089c28 4040 d7145fa4 d7145f78 c00d2948 c00d22c0  
5f80: beed167c 0003 000614dc 0004 c000ea28 d7144000  d7145fa8
5fa0: c000e7e0 c00d2908 beed167c 0003 0003 00089c28 4040 beed167c
5fc0: beed167c 0003 000614dc 0004 00089c28 00060a88 093e beed17a0
5fe0: beed167c beed1648 00029910 b6dc821c 6010 0003  
[<c0341cfc>] (tcp_v4_early_demux+0xe8/0x154) from [<c03211b4>] (ip_rcv_finish+0x68/0x2c0)
[<c03211b4>] (ip_rcv_finish+0x68/0x2c0) from [<c03218dc>] (ip_rcv+0x314/0x398)
[<c03218dc>] (ip_rcv+0x314/0x398) from [<c02fae04>] (__netif_receive_skb_core+0x520/0x5d8)
[<c02fae04>] (__netif_receive_skb_core+0x520/0x5d8) from [<c02fb990>] (__netif_receive_skb+0x34/0x88)
[<c02fb990>] (__netif_receive_skb+0x34/0x88) from [<c02fc46c>] (process_backlog+0x90/0x148)
[<c02fc46c>] (process_backlog+0x90/0x148) from [] (net_rx_action+0xcc/0x258)
[] 

Re: [PATCH] net: sk == 0xffffffff fix - not for commit

2014-01-16 Thread Eric Dumazet
On Thu, 2014-01-16 at 16:21 +0100, Andrzej Pietrasiewicz wrote:
> On 10.12.2013 15:25, Eric Dumazet wrote:
>> On Tue, 2013-12-10 at 07:55 +0100, Andrzej Pietrasiewicz wrote:
>>> On 09.12.2013 16:31, Eric Dumazet wrote:
>>>> On Mon, 2013-12-09 at 12:47 +0100, Andrzej Pietrasiewicz wrote:
>>>>> NOT FOR COMMITTING TO MAINLINE.
>>>>>
>>>>> With g_ether loaded the sk occasionally becomes 0xffffffff.
>>>>> It happens usually after transferring a few hundred kilobytes to a few
>>>>> tens of megabytes. If sk is 0xffffffff then dereferencing it causes
>>>>> kernel panic.
>>>>>
>>>>> This is a *workaround*. I don't know enough net code to understand the core
>>>>> of the problem. However, with this patch applied the problems are gone,
>>>>> or at least pushed farther away.
>>>>
>>>> Is it happening on SMP or UP?
>>>
>>> UP build, S5PC110
>>
>> OK
>>
>> I believe you need additional debugging to track the exact moment
>> 0xffffffff is fed to 'sk'
>>
>> It looks like a very strange bug, involving a problem in some assembly
>> helper, register save/restore, compiler bug or stack corruption or
>> something.
>>
>
> I started with adding WARN_ON(sk == 0xffffffff); just before return in
> __inet_lookup_established(), and the problem was gone. So this looks
> very strange, like a toolchain problem.

Or a timing issue. Adding a WARN_ON() adds extra instructions and might
really change the assembly output.
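
For readers outside the kernel tree, the check being discussed can be approximated in userspace as below. This is only a sketch: warn_on_bad_sk and BAD_SK are hypothetical names, and the real WARN_ON() is a kernel macro that also dumps a backtrace. The point made above still applies to the sketch, since calling a function like this just before the return gives the compiler a different function body to schedule.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* The bogus pointer value from the subject line, as seen on a 32-bit target. */
#define BAD_SK ((uintptr_t)0xffffffffu)

/*
 * Userspace stand-in for WARN_ON(sk == 0xffffffff).
 * Prints a warning and returns 1 when sk has the bogus value, else returns 0.
 */
int warn_on_bad_sk(const void *sk, const char *where)
{
    assert(where != NULL);
    if ((uintptr_t)sk == BAD_SK) {
        fprintf(stderr, "WARNING: sk == 0xffffffff at %s\n", where);
        return 1;
    }
    return 0;
}
```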

> 
> I used gcc-linaro-arm-linux-gnueabihf-4.8-2013.05.
> 
> If I change the toolchain to
> 
> gcc-linaro-arm-linux-gnueabihf-4.7-2013.04-20130415
> 
> the problem seems to have gone away.

It's totally possible some barrier was not properly handled by the
compiler. You could disassemble the function on both toolchains and
try to spot the issue.





Re: [PATCH] net: sk == 0xffffffff fix - not for commit

2013-12-10 Thread Eric Dumazet
On Tue, 2013-12-10 at 07:55 +0100, Andrzej Pietrasiewicz wrote:
> On 09.12.2013 16:31, Eric Dumazet wrote:
>> On Mon, 2013-12-09 at 12:47 +0100, Andrzej Pietrasiewicz wrote:
>>> NOT FOR COMMITTING TO MAINLINE.
>>>
>>> With g_ether loaded the sk occasionally becomes 0xffffffff.
>>> It happens usually after transferring a few hundred kilobytes to a few
>>> tens of megabytes. If sk is 0xffffffff then dereferencing it causes
>>> kernel panic.
>>>
>>> This is a *workaround*. I don't know enough net code to understand the core
>>> of the problem. However, with this patch applied the problems are gone,
>>> or at least pushed farther away.
>>
>> Is it happening on SMP or UP?
>
> UP build, S5PC110

OK

I believe you need additional debugging to track the exact moment
0xffffffff is fed to 'sk'

It looks like a very strange bug, involving a problem in some assembly
helper, register save/restore, compiler bug or stack corruption or
something.

You should not have more than 150 instructions to decode, including
__inet_lookup_established()

Since __inet_lookup_established() dereferences the socket pointer, I do
not see why it would crash ~20 instructions _later_
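
A side note on the arithmetic behind the oops in the original report: the
fault address there is 0x11 (printed as a 32-bit value), not 0xffffffff.
On 32-bit ARM, loading a struct member at byte offset 0x12 through a
pointer that holds 0xffffffff wraps around to exactly that address. The
offset 0x12 is an assumption, not something stated in this thread (it is
where sk_state would land if it follows 16 bytes of address, hash and port
fields plus a 2-byte family field); check it against your own build with
offsetof() or pahole. The helper and macro names are made up for
illustration. A minimal sketch:

```c
#include <stdint.h>

/* ASSUMPTION: byte offset of sk_state inside struct sock on this build.
 * Not taken from the thread; verify with offsetof() or pahole. */
#define ASSUMED_SK_STATE_OFFSET 0x12u

/* Address a 32-bit CPU touches when loading a field at byte offset
 * `off` from pointer value `base`. uint32_t arithmetic wraps modulo
 * 2^32, which mirrors a 32-bit address space. */
uint32_t field_address(uint32_t base, uint32_t off)
{
	return (uint32_t)(base + off);
}

/* Inverse: the pointer value implied by a fault address, given the
 * offset of the field that was being read. */
uint32_t base_from_fault(uint32_t fault, uint32_t off)
{
	return (uint32_t)(fault - off);
}
```

With these, field_address(0xffffffff, ASSUMED_SK_STATE_OFFSET) gives
0x00000011, and base_from_fault(0x11, ASSUMED_SK_STATE_OFFSET) gives back
0xffffffff, so the reported fault address fits a read of one small field
through sk == 0xffffffff.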





Re: [PATCH] net: sk == 0xffffffff fix - not for commit

2013-12-09 Thread Andrzej Pietrasiewicz

On 09.12.2013 16:31, Eric Dumazet wrote:
> On Mon, 2013-12-09 at 12:47 +0100, Andrzej Pietrasiewicz wrote:
>> NOT FOR COMMITTING TO MAINLINE.
>>
>> With g_ether loaded the sk occasionally becomes 0xffffffff.
>> It happens usually after transferring few hundreds of kilobytes to few
>> tens of megabytes. If sk is 0xffffffff then dereferencing it causes
>> kernel panic.
>>
>> This is a *workaround*. I don't know enough net code to understand the core
>> of the problem. However, with this patch applied the problems are gone,
>> or at least pushed farther away.
>
> Is it happening on SMP or UP ?

UP build, S5PC110

> Crash should happen earlier in __inet_lookup_established()

AP


Re: [PATCH] net: sk == 0xffffffff fix - not for commit

2013-12-09 Thread Eric Dumazet
On Mon, 2013-12-09 at 12:47 +0100, Andrzej Pietrasiewicz wrote:
> NOT FOR COMMITTING TO MAINLINE.
> 
> With g_ether loaded the sk occasionally becomes 0xffffffff.
> It happens usually after transferring few hundreds of kilobytes to few
> tens of megabytes. If sk is 0xffffffff then dereferencing it causes
> kernel panic.
> 
> This is a *workaround*. I don't know enough net code to understand the core
> of the problem. However, with this patch applied the problems are gone,
> or at least pushed farther away.

Is it happening on SMP or UP ?

Crash should happen earlier in __inet_lookup_established()




[PATCH] net: sk == 0xffffffff fix - not for commit

2013-12-09 Thread Andrzej Pietrasiewicz
NOT FOR COMMITTING TO MAINLINE.

With g_ether loaded the sk occasionally becomes 0xffffffff.
It happens usually after transferring few hundreds of kilobytes to few
tens of megabytes. If sk is 0xffffffff then dereferencing it causes
kernel panic.

This is a *workaround*. I don't know enough net code to understand the core
of the problem. However, with this patch applied the problems are gone,
or at least pushed farther away.

The relevant stack trace below:

[   53.583351] Unable to handle kernel NULL pointer dereference at virtual address 00000011
[   53.590077] pgd = c0004000
[   53.592761] [00000011] *pgd=00000000
[   53.596319] Internal error: Oops: 17 [#1] PREEMPT ARM
[   53.601223] Modules linked in: usb_f_ecm g_ether u_ether libcomposite
[   53.607641] CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.0-rc6+ #157
[   53.613962] task: c058e5d8 ti: c0584000 task.ti: c0584000
[   53.619345] PC is at tcp_v4_early_demux+0xbc/0x150
[   53.624105] LR is at __inet_lookup_established+0x25c/0x2e0
[   53.629562] pc : [<c03595a4>]    lr : [<c033fc0c>]    psr: a0000113
[   53.629562] sp : c0585d08  ip : c0585cd0  fp : c0585d2c
[   53.640997] r10: c058cf84  r9 : c05c1768  r8 : e7b22740
[   53.646197] r7 :   r6 : 2cb7  r5 :   r4 : e7b22740
[   53.652697] r3 : c0304504  r2 : c0585cb8  r1 : e6d3e070  r0 : 
[   53.659198] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[   53.666476] Control: 10c5387d  Table: 57b48019  DAC: 00000015
[   53.672194] Process swapper (pid: 0, stack limit = 0xc0584238)
[   53.678001] Stack: (0xc0585d08 to 0xc0586000)
[   53.682338] 5d00:   0381a8c0 eb89 0002 c0585d20 
e7b6e810 e7b22758
[   53.690484] 5d20: c0585d5c c0585d30 c03377f0 c03594f4 c00104e8 c0028ef4 
c058cf70 c058ddec
[   53.698629] 5d40: 0008 e7806000  e7b22740 c0585dac c0585d60 
c0311dd4 c0337480
[   53.706775] 5d60: c058cf78  6113  c0585dfc e7b22740 
c0014368 c058cf84
[   53.714921] 5d80:  e7b22740  c05c8b14 0002 c05c8b60 
 00100100
[   53.723067] 5da0: c0585dc4 c0585db0 c0312f8c c0311c20 e7b22740  
c0585dfc c0585dc8
[   53.731212] 5dc0: c031390c c0312f60 00200200 c05c8b44  c05c8b60 
000c 012a
[   53.739358] 5de0: c05c8b00 c059c548 0001 0040 c0585e3c c0585e00 
c031331c c0313874
[   53.747503] 5e00: c05c8ce3 c0584000 c05ca6c0 3f7c c0028504 0003 
000c c05ceb10
[   53.755650] 5e20: c05ceb0c 0001 c0584000 c0584000 c0585ea4 c0585e40 
c0028968 c0313238
[   53.763795] 5e40: c005cc94 c005eb24 0020 c059b3e0 3f7b c059c548 
c05ceac0 000a
[   53.771941] 5e60:  c059e6d8 c0584000 000c 0101 c05c8d70 
 6193
[   53.780086] 5e80: c0584000  c0619b00  412fc082 c0584038 
c0585ebc c0585ea8
[   53.788232] 5ea0: c0028c40 c0028814  c0584000 c0585ed4 c0585ec0 
c0028fb8 c0028bd0
[   53.796378] 5ec0: c05b5018 0058 c0585ef4 c0585ed8 c00104e8 c0028ef4 
0020 c0619b28
[   53.804524] 5ee0: c0585f20 0001 c0585f1c c0585ef8 c00085dc c00104a8 
c058c734 c00106d4
[   53.812670] 5f00: 6013  c0585f54 c058c0d0 c0585f74 c0585f20 
c0014344 c0008578
[   53.820815] 5f20:  2a9c  c058c734 c0584038 c05c92ac 
c0584000 c05c8c08
[   53.828961] 5f40: c058c0d0 412fc082 c0584038 c0585f74 c0585f68 c0585f68 
c00106d0 c00106d4
[   53.837107] 5f60: 6013  c0585f9c c0585f78 c005c2ac c00106ac 
c058c040 c0584000
[   53.845252] 5f80: c0584000 c0584000 c03aa868  c0585fb4 c0585fa0 
c03a2458 c005c198
[   53.853398] 5fa0:  c058ca08 c0585ff4 c0585fb8 c053aa74 c03a23d0 
 
[   53.861544] 5fc0: c053a540   c0566058  10c53c7d 
c058c05c c0566054
[   53.869689] 5fe0: c058f88c 30004059  c0585ff8 30008070 c053a7c8 
 
[   53.877848] [<c03595a4>] (tcp_v4_early_demux+0xbc/0x150) from [<c03377f0>] (ip_rcv+0x37c/0x590)
[   53.886510] [<c03377f0>] (ip_rcv+0x37c/0x590) from [<c0311dd4>] (__netif_receive_skb_core+0x1c0/0x624)
[   53.895779] [<c0311dd4>] (__netif_receive_skb_core+0x1c0/0x624) from [<c0312f8c>] (__netif_receive_skb+0x38/0x88)
[   53.906003] [<c0312f8c>] (__netif_receive_skb+0x38/0x88) from [<c031390c>] (process_backlog+0xa4/0x15c)
[   53.915361] [<c031390c>] (process_backlog+0xa4/0x15c) from [<c031331c>] (net_rx_action+0xf0/0x230)
[   53.924290] [<c031331c>] (net_rx_action+0xf0/0x230) from [<c0028968>] (__do_softirq+0x160/0x35c)
[   53.933040] [<c0028968>] (__do_softirq+0x160/0x35c) from [<c0028c40>] (do_softirq+0x7c/0x80)
[   53.941444] [<c0028c40>] (do_softirq+0x7c/0x80) from [<c0028fb8>] (irq_exit+0xd0/0x10c)
[   53.949423] [<c0028fb8>] (irq_exit+0xd0/0x10c) from [<c00104e8>] (handle_IRQ+0x4c/0x94)
[   53.957390] [<c00104e8>] (handle_IRQ+0x4c/0x94) from [<c00085dc>] (vic_handle_irq+0x70/0xac)
[   53.965795] [<c00085dc>] (vic_handle_irq+0x70/0xac) from [] (__irq_svc+0x44/0x78)
[   53.974106] Exception stack(0xc0585f20 to 0xc0585f68)
[   53.979137] 5f20:  2a9c  c058c734 c0584038 c05c92ac 
c0584000 c05c8c08
[   
