Re: [PATCH] net: sk == 0xffffffff fix - not for commit

2014-01-24 Thread Andrew Ruder

Actually found what looks to be a fix for this in another thread.

http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg574770.html
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854

Cheers,
Andy
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] net: sk == 0xffffffff fix - not for commit

2014-01-24 Thread Andrew Ruder

On Fri, Jan 24, 2014 at 07:38:31AM -0600, Andrew Ruder wrote:
 http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg574770.html
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854

Just a little further confirmation.  This appears in
__inet_lookup_established as the last four instructions before
returning.

   +440:   bl  __rcu_read_unlock
   +444:   sub sp, r11, #40    ; 0x28
   +448:   ldr r0, [r11, #-48] ; 0x30
   +452:   ldm sp, {r4, r5, r6, r7, r8, r9, r10, r11, sp, pc}

Re: [PATCH] net: sk == 0xffffffff fix - not for commit

2014-01-17 Thread Andrzej Pietrasiewicz


On 16.01.2014 17:29, Eric Dumazet wrote:

On Thu, 2014-01-16 at 16:21 +0100, Andrzej Pietrasiewicz wrote:

On 10.12.2013 15:25, Eric Dumazet wrote:

On Tue, 2013-12-10 at 07:55 +0100, Andrzej Pietrasiewicz wrote:

On 09.12.2013 16:31, Eric Dumazet wrote:

On Mon, 2013-12-09 at 12:47 +0100, Andrzej Pietrasiewicz wrote:

NOT FOR COMMITTING TO MAINLINE.

With g_ether loaded the sk occasionally becomes 0xffffffff.
It usually happens after transferring a few hundred kilobytes to a few
tens of megabytes. If sk is 0xffffffff then dereferencing it causes
a kernel panic.

This is a *workaround*. I don't know enough net code to understand the core
of the problem. However, with this patch applied the problems are gone,
or at least pushed farther away.


Is it happening on SMP or UP ?


UP build, S5PC110


OK

I believe you need additional debugging to track the exact moment
0xffffffff is fed to 'sk'

It looks like a very strange bug, involving a problem in some assembly
helper, register save/restore, compiler bug or stack corruption or
something.



I started by adding WARN_ON(sk == 0xffffffff); just before the return in
__inet_lookup_established(), and the problem was gone. So this looks
very strange, like a toolchain problem.


Or a timing issue. Adding a WARN_ON() adds extra instructions and might
really change the assembly output.



I used gcc-linaro-arm-linux-gnueabihf-4.8-2013.05.

If I change the toolchain to

gcc-linaro-arm-linux-gnueabihf-4.7-2013.04-20130415

the problem seems to have gone away.


It's totally possible some barrier was not properly handled by the
compiler. You could disassemble the function on both toolchains and
try to spot the issue.



So I gave it a try.

Below is a part of assembly code (ARM) which corresponds to the last
lines of the __inet_lookup_established():

C source:
=========
found:
	rcu_read_unlock();
	return sk;
}

assembly for toolchain 4.7:
===========================
c0333bb8:   ebf4bb6e    bl  c0062978 <__rcu_read_unlock>
c0333bbc:   e51b0030    ldr r0, [fp, #-48]  ; 0x30
c0333bc0:   e24bd028    sub sp, fp, #40 ; 0x28
c0333bc4:   e89daff0    ldm sp, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}
c0333bc8:   e5132018    ldr r2, [r3, #-24]


assembly for toolchain 4.8:
===========================
c033ff5c:   ebf4927e    bl  c006495c <__rcu_read_unlock>
c033ff60:   e24bd028    sub sp, fp, #40 ; 0x28
c033ff64:   e51b0030    ldr r0, [fp, #-48]  ; 0x30
c033ff68:   e89daff0    ldm sp, {r4, r5, r6, r7, r8, r9, sl, fp, sp, pc}
c033ff6c:   e5113018    ldr r3, [r1, #-24]

What can be seen is that the usage of registers is slightly different,
and, what is more important, the _order_ of ldr/sub is different.
Now, if I swap the instructions at offsets c033ff60 and c033ff64
in the 4.8-generated vmlinux, the problem seems gone! Well, at least
the binary behaves the same way as the 4.7-generated one.

Here is a _hypothesis_ of what _might_ be happening:

The function in question puts its return value in register r0.
In both cases the return value is fetched from the memory location
at offset #-48 from the frame pointer. However, in the 4.7-generated
binary the ldr executes in the branch delay slot, whereas in the
4.8-generated binary it is the sub which executes in the branch delay
slot. That way, in the 4.7-generated binary the return value is fetched
before __rcu_read_unlock begins, but in the 4.8-generated binary it is
fetched some time later, which might be enough for someone else to
schedule in and corrupt the data to be copied to r0 and returned from
the function.

As I said, this is just a hypothesis.

AP

Re: [PATCH] net: sk == 0xffffffff fix - not for commit

2014-01-17 Thread Andrzej Pietrasiewicz


On 17.01.2014 13:18, Andrzej Pietrasiewicz wrote:

As I said, this is just a hypothesis.



Please disregard what I have written.

There is no delay slot on ARM :O

A nice hypothesis, though ;)

AP



Re: [PATCH] net: sk == 0xffffffff fix - not for commit

2014-01-16 Thread Eric Dumazet

On Thu, 2014-01-16 at 16:21 +0100, Andrzej Pietrasiewicz wrote:
 On 10.12.2013 15:25, Eric Dumazet wrote:
  On Tue, 2013-12-10 at 07:55 +0100, Andrzej Pietrasiewicz wrote:
  On 09.12.2013 16:31, Eric Dumazet wrote:
  On Mon, 2013-12-09 at 12:47 +0100, Andrzej Pietrasiewicz wrote:
  NOT FOR COMMITTING TO MAINLINE.
 
  With g_ether loaded the sk occasionally becomes 0xffffffff.
  It usually happens after transferring a few hundred kilobytes to a few
  tens of megabytes. If sk is 0xffffffff then dereferencing it causes
  a kernel panic.
 
  This is a *workaround*. I don't know enough net code to understand the
  core of the problem. However, with this patch applied the problems are gone,
  or at least pushed farther away.
 
  Is it happening on SMP or UP ?
 
  UP build, S5PC110
 
  OK
 
  I believe you need additional debugging to track the exact moment
  0xffffffff is fed to 'sk'
 
  It looks like a very strange bug, involving a problem in some assembly
  helper, register save/restore, compiler bug or stack corruption or
  something.
 
 
 I started by adding WARN_ON(sk == 0xffffffff); just before the return in
 __inet_lookup_established(), and the problem was gone. So this looks
 very strange, like a toolchain problem.

Or a timing issue. Adding a WARN_ON() adds extra instructions and might
really change the assembly output.

 
 I used gcc-linaro-arm-linux-gnueabihf-4.8-2013.05.
 
 If I change the toolchain to
 
 gcc-linaro-arm-linux-gnueabihf-4.7-2013.04-20130415
 
 the problem seems to have gone away.

It's totally possible some barrier was not properly handled by the
compiler. You could disassemble the function on both toolchains and
try to spot the issue.




Re: [PATCH] net: sk == 0xffffffff fix - not for commit

2014-01-16 Thread Andrew Ruder

On Mon, Dec 09, 2013 at 12:47:52PM +0100, Andrzej Pietrasiewicz wrote:
 With g_ether loaded the sk occasionally becomes 0xffffffff.
 It usually happens after transferring a few hundred kilobytes to a few
 tens of megabytes. If sk is 0xffffffff then dereferencing it causes
 a kernel panic.

Don't know if this is relevant but I had this very similar stack trace
come up a few days ago (below).  I am working on a PXA 270/xscale with
gcc version 4.8.2 (Buildroot 2013.11-rc1-00028-gf388663).  Going to try
to see if I can reproduce it a little more readily before I start trying
to narrow down what is causing it.

===
Unable to handle kernel NULL pointer dereference at virtual address 0011
pgd = d18e
[0011] *pgd=a6d03831, *pte=, *ppte=
Internal error: Oops: 17 [#1] PREEMPT ARM
Modules linked in: zeusvirt(O) zeus16550(O) 8390p ipv6
CPU: 0 PID: 2365 Comm: sshd Tainted: G   O 3.12.0+ #201
task: d7216f00 ti: d7144000 task.ti: d7144000
PC is at tcp_v4_early_demux+0xe8/0x154
LR is at __inet_lookup_established+0x1bc/0x2e0
pc : [c0341cfc]lr : [c0329bd8]psr: a013
sp : d7145b20  ip : d7145ae8  fp : d7145b44
r10: c0576c28  r9 : 0008  r8 : d7998800
r7 : d7063800  r6 : c6cf2480  r5 :   r4 : c6cf2480
r3 : c02ec018  r2 : d7145ad0  r1 : d7b66a28  r0 : 
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 397f  Table: b18e  DAC: 0015
Process sshd (pid: 2365, stack limit = 0xd71441c8)
Stack: (0xd7145b20 to 0xd7146000)
5b20: 17bf3f0a 0016 0003 c0026d90 d71f4634 d71f4600 d7145b6c d7145b48
5b40: c03211b4 c0341c20 05ea d7bb0538 d7063800 0034 d71f4600 c6cf2480
5b60: d7145b9c d7145b70 c03218dc c0321158 1001  c0576c1c 
5b80: c0577e84 c0576c14   d7145be4 d7145ba0 c02fae04 c03215d4
5ba0: c0590330 c057fc08 d7145bfc c6cf2480 c02571a0 c0576c28 07e1 c05a3dc0
5bc0:  0001 c05a3d60 c05a3d74 c05a3d60 c05a3d68 d7145bfc d7145be8
5be0: c02fb990 c02fa8f0 c05a3dc0  d7145c24 d7145c00 c02fc46c c02fb968
5c00: c02fc3dc c05a3dc0 c05a3d60 0001 012c 0040 d7145c64 d7145c28
5c20: c02fbcd0 c02fc3e8  d78af3c0 d7145c5c 8d99  0001
5c40: c05a81f0 0003 0100 3fa57e1c d7144028 c05a81ec d7145cb4 d7145c68
5c60: c0026a44 c02fbc10 d7145c8c d7145c78 c00538dc c0056ce4  8d98
5c80: 00400100 000a c0228594 6093 c0590330  d7145d54 0001
5ca0: d7bb0480 05b4 d7145ccc d7145cb8 c0026ca4 c00268f4  d7144010
5cc0: d7145ce4 d7145cd0 c0026f58 c0026c58 00ab 001a d7145d04 d7145ce8
5ce0: c000f7d0 c0026ed0 0014 d7145d20 a013  d7145d1c d7145d08
5d00: c00085bc c000f768 c02f0048 c00ca7d8 d7145d7c d7145d20 c03a7dc0 c0008590
5d20: 000118ed  c05a474c c05d41cc d7bb0180 d18ed800 d7801080 06a3
5d40: 0001 d7bb0480 05b4 d7145d7c d7145d80 d7145d68 c02f0048 c00ca7d8
5d60: a013  c05a4738 d7bb0180 d7145dac d7145d80 c02f0048 c00ca7b0
5d80: 0001 00c63fc0 d7b66a00 d7b66a00 4040 05b4  d7b66a00
5da0: d7145dcc d7145db0 c032e340 c02effd0 d7145e98 4040 0008c414 
5dc0: d7145e54 d7145dd0 c032f368 c032e310 d7145e24 c02ea81c c03a6040 c03a9c6c
5de0:   d7145ee8  05b4  d7b66adc 
5e00:  d7144000 1854 05b4 27ec 0040 d7116d80 05b4
5e20:   d7145e6c d7b66a00 d7145ee8 d7145e98 4040 4040
5e40: 4040 0002 d7145e74 d7145e58 c03526c8 c032eb0c d7145e78 d7116d80
5e60: d7145ee0 d7116d80 d7145ed4 d7145e78 c02e63a4 c0352688 c05a3dc0 d7142000
5e80: 0040 4040 d76701c0 d7145ee0  d7145e98  
5ea0: d7145ee0 0001   0040 d7145ee8 c6cf2900 
5ec0:  d7145f78 d7145f44 d7145ed8 c00d1c64 c02e62e4  
5ee0: 00089c28 4040 d7116d80   d7145e78 d7216f00 
5f00:     4040   
5f20: 00089c28 d7116d80 00089c28 d7145f78 4040 00089c28 d7145f74 d7145f48
5f40: c00d23a0 c00d1bf4     d7116d80 
5f60: 00089c28 4040 d7145fa4 d7145f78 c00d2948 c00d22c0  
5f80: beed167c 0003 000614dc 0004 c000ea28 d7144000  d7145fa8
5fa0: c000e7e0 c00d2908 beed167c 0003 0003 00089c28 4040 beed167c
5fc0: beed167c 0003 000614dc 0004 00089c28 00060a88 093e beed17a0
5fe0: beed167c beed1648 00029910 b6dc821c 6010 0003  
[c0341cfc] (tcp_v4_early_demux+0xe8/0x154) from [c03211b4] (ip_rcv_finish+0x68/0x2c0)
[c03211b4] (ip_rcv_finish+0x68/0x2c0) from [c03218dc] (ip_rcv+0x314/0x398)
[c03218dc] (ip_rcv+0x314/0x398) from [c02fae04] (__netif_receive_skb_core+0x520/0x5d8)
[c02fae04] (__netif_receive_skb_core+0x520/0x5d8) from [c02fb990] (__netif_receive_skb+0x34/0x88)
[c02fb990] (__netif_receive_skb+0x34/0x88) from [c02fc46c] (process_backlog+0x90/0x148)

Re: [PATCH] net: sk == 0xffffffff fix - not for commit

2013-12-10 Thread Eric Dumazet

On Tue, 2013-12-10 at 07:55 +0100, Andrzej Pietrasiewicz wrote:
 On 09.12.2013 16:31, Eric Dumazet wrote:
  On Mon, 2013-12-09 at 12:47 +0100, Andrzej Pietrasiewicz wrote:
  NOT FOR COMMITTING TO MAINLINE.
 
  With g_ether loaded the sk occasionally becomes 0xffffffff.
  It usually happens after transferring a few hundred kilobytes to a few
  tens of megabytes. If sk is 0xffffffff then dereferencing it causes
  a kernel panic.
 
  This is a *workaround*. I don't know enough net code to understand the core
  of the problem. However, with this patch applied the problems are gone,
  or at least pushed farther away.
 
  Is it happening on SMP or UP ?
 
 UP build, S5PC110

OK

I believe you need additional debugging to track the exact moment
0xffffffff is fed to 'sk'

It looks like a very strange bug, involving a problem in some assembly
helper, register save/restore, compiler bug or stack corruption or
something.

You should not have more than 150 instructions to decode, including
__inet_lookup_established()

Since __inet_lookup_established() dereferences the socket pointer, I do
not see why it would crash ~20 instructions _later_




Re: [PATCH] net: sk == 0xffffffff fix - not for commit

2013-12-09 Thread Eric Dumazet

On Mon, 2013-12-09 at 12:47 +0100, Andrzej Pietrasiewicz wrote:
 NOT FOR COMMITTING TO MAINLINE.
 
 With g_ether loaded the sk occasionally becomes 0xffffffff.
 It usually happens after transferring a few hundred kilobytes to a few
 tens of megabytes. If sk is 0xffffffff then dereferencing it causes
 a kernel panic.
 
 This is a *workaround*. I don't know enough net code to understand the core
 of the problem. However, with this patch applied the problems are gone,
 or at least pushed farther away.

Is it happening on SMP or UP ?

Crash should happen earlier in __inet_lookup_established()



Re: [PATCH] net: sk == 0xffffffff fix - not for commit

2013-12-09 Thread Andrzej Pietrasiewicz


On 09.12.2013 16:31, Eric Dumazet wrote:

On Mon, 2013-12-09 at 12:47 +0100, Andrzej Pietrasiewicz wrote:

NOT FOR COMMITTING TO MAINLINE.

With g_ether loaded the sk occasionally becomes 0xffffffff.
It usually happens after transferring a few hundred kilobytes to a few
tens of megabytes. If sk is 0xffffffff then dereferencing it causes
a kernel panic.

This is a *workaround*. I don't know enough net code to understand the core
of the problem. However, with this patch applied the problems are gone,
or at least pushed farther away.


Is it happening on SMP or UP ?


UP build, S5PC110



Crash should happen earlier in __inet_lookup_established()





AP
