[beagleboard] Re: kernel panic - has anyone seen something similar?
If you are using the CAN device and the c_can driver, then applying the kernel patch, rebuilding, and reinstalling would seem to be your only option. If you need CAN and can use a USB-to-CAN adapter, or some other serial-to-CAN adapter, then maybe you could get around this problem.

On Friday, December 12, 2014 8:35:03 AM UTC-5, Jean-Pierre Aulas wrote:

Hello, thanks for your reply. Is there another way to get this fix, simpler than rebuilding the kernel? Below is a trace of another problem, this time with mysql:

(Linux BBB4 3.8.13-bone50 #1 SMP Tue May 13 13:24:52 UTC 2014 armv7l GNU/Linux)
debian@larnau:~$
[ 543.774398] BUG: scheduling while atomic: rs:main Q:Reg/653/0x4100
[ 551.739092] BUG: scheduling while atomic: mysqld/1766/0x4100
[ 582.732825] BUG: scheduling while atomic: mysqld/1766/0x4100
[ 582.759827] ------------[ cut here ]------------
[ 582.764775] kernel BUG at net/core/dev.c:3988!
[ 582.769500] Internal error: Oops - BUG: 0 [#1] SMP THUMB2
[ 582.775236] Modules linked in: can_raw can c_can_platform c_can can_dev mt7601Usta(O)
[ 582.783670] CPU: 0 Tainted: G W O (3.8.13-bone50 #1)
[ 582.790084] PC is at __napi_complete+0x36/0x3c
[ 582.794820] LR is at napi_complete+0x1d/0x28
[ 582.799368] pc : [c0423c3a] lr : [c0423c5d] psr: 40b3
[ 582.799368] sp : de51deb0 ip : fp : c0d36608
[ 582.811577] r10: c08260c0 r9 : c0d36600 r8 : 012c
[ 582.817137] r7 : 0010 r6 : 0001 r5 : 4013 r4 : de68d5dc
[ 582.824078] r3 : r2 : r1 : de68d5dc r0 : de68d5dc
[ 582.831019] Flags: nZcv IRQs off FIQs on Mode SVC_32 ISA Thumb Segment user
[ 582.838884] Control: 50c5387d Table: 9e664019 DAC: 0015
[ 582.844998] Process mysqld (pid: 1766, stack limit = 0xde51c240)
[ 582.851381] Stack: (0xde51deb0 to 0xde51e000)
[ 582.856029] dea0: de68d5dc 4013 0010 de68d000
[ 582.864737] dec0: de68d540 bf8c0609 de68d5dc 0010 fa200098 fa2000b8 0001 ebbdebbc
[ 582.873442] dee0: de51c000 de68d5dc 0001 de51c000 0010 012c c0d36600 c08260c0
[ 582.882146] df00: c0d36608 c0423cd1 0002 0002356e de51df10 de51df10 de51dfb0 0001
[ 582.890853] df20: c082608c de51c000 0043 0003 b709e068 b709e230 000c c0034f0f
[ 582.899566] df40: 0008 0043 0100 0009 00400040 df008650 de51c000
[ 582.908276] df60: 0043 0043 de51dfb0 b709e068 b709e230 a676a328 c0035205
[ 582.916989] df80: c0821728 c000d0e3 fa200098 fa2000b8 fa2000d8 c00085a9 b6cbefbe 8030
[ 582.925700] dfa0: 0020 00c8 c04ceca9 258fc214 b4d59e20 b4d59ebe a676a328
[ 582.934409] dfc0: 0031 a676a32c 36b96c00 0020 00c8 b709e068 b709e230 a676a328
[ 582.943115] dfe0: b6fcdfe4 a676a2e8 b6ba124b b6cbefbe 8030
[ 582.951839] [c0423c3a] (__napi_complete+0x36/0x3c) from [0010] (0x10)
[ 582.959435] Code: 3108 bc30 f62c bfe3 (de02) de02
[ 582.964540] ---[ end trace a72764883bbe4627 ]---
[ 582.969459] Kernel panic - not syncing: Fatal exception in interrupt

regards
Jean-Pierre Aulas
[beagleboard] Re: kernel panic - has anyone seen something similar?
Hello again...

Never mind any of the stuff I previously mentioned about changing the kernel config parameters. The problem is rooted in my original comment about the c_can driver. A patch exists that solves this problem. Unfortunately, it went into the mainline kernel later than the 3.8+ branch we are using on the BeagleBone Black, so the fix is not included in our kernel source. Take a look at this:

http://lists.openwall.net/netdev/2013/11/27/64

If you have acquainted yourself with building the kernel for the BBB, I would suggest manually editing the c_can.c file with the changes shown in the link above, rebuilding, and reinstalling. That should fix your problem. It did for me. Good luck.

On Monday, December 1, 2014 11:13:19 AM UTC-5, beagler001 wrote:

RCN's kernel is the kernel source that I am using as well. If you change into that directory, you can run a rebuild script by typing tools/rebuild.sh. Invoking that script automatically pops up a window showing all the kernel config parameters. There are so many parameters that finding the exact ones I listed above is rather daunting. What I recommend is to view the default kernel config file and check whether you are using the same config as me (probably not). The default config file should be named defconfig and should be stored in the patches directory.
-- For more options, visit http://beagleboard.org/discuss --- You received this message because you are subscribed to the Google Groups BeagleBoard group. To unsubscribe from this group and stop receiving emails from it, send an email to beagleboard+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[beagleboard] Re: kernel panic - has anyone seen something similar?
Those config parameters are used for the kernel build. They are part of a large collection of build-time options that control which features and drivers are compiled into the kernel. In your initial post, you mentioned that you were using a custom kernel, so I assumed that you understood how to modify kernel config parameters and build the kernel. Does that help?

On Wednesday, November 26, 2014 11:24:22 AM UTC-5, Me Nee wrote:

Complete newbie - where can I find these config parameters?
[beagleboard] Re: kernel panic - has anyone seen something similar?
The only reason I know we're using a custom kernel is because our former Linux guy told me so. I've never recompiled a kernel before, but I have a cursory grasp of what's involved. My former coworker's Linux laptop has a folder named Robert C Nelson that contains what seems to be the custom kernel mod to fix the UART speed issue. I'll start poking around in there to see if I can figure it all out. And yes, that does help. I do appreciate it, thanks for your patience.
[beagleboard] Re: kernel panic - has anyone seen something similar?
My kernel is no longer crashing. Unfortunately, I do not have the exact work-around - as I was messing around with a lot of stuff to try to get this to work. The one thing consistent in all this is the backtrace (/var/log/kern.log) in that the routine c_can_get_berr_counter is doing something that it should not do. But trying to get someone that knows about this code to take a look seems to be a huge challenge. If you would like to see the backtrace, let me know.

I do know that the resolution was one of two things:

1. Our CAN transceiver's enable line was being shared with MMC1_DAT0 - which was brought out to P8-25 on an expansion header. I tried (various methods) to reset the eMMC in hopes of driving its pins to an open-drain state. I could never get that to work, and therefore I could never drive the CAN transceiver's enable line low. I got around this by wiring P8-25 directly to ground. That could have been causing problems with either the CAN transceiver or the processor itself. Or it could be that leaving the CAN transceiver enabled through the boot process caused issues as well. Nonetheless, I clipped the P8-25 pin on our cape and wired from another GPIO line that was routed to the P8 expansion header (P8-17 I think). This allowed me to enable the CAN transceiver cleanly, post-boot.

2. I disabled some things in my kernel config.

Original default (defconfig)

    CONFIG_CAN_SLCAN=m
    CONFIG_CAN_TI_HECC=m
    CONFIG_CAN_MCP251X=m
    CONFIG_CAN_GRCAN=m
    # CONFIG_CAN_DEBUG_DEVICES is not set
    CONFIG_WIMAX=m
    CONFIG_WIMAX_DEBUG_LEVEL=8
    CONFIG_RFKILL=m
    CONFIG_RFKILL_LEDS=y
    CONFIG_RFKILL_INPUT=y
    CONFIG_RFKILL_GPIO=m
    CONFIG_USB_HSO=m
    # WiMAX Wireless Broadband devices
    #
    # CONFIG_WIMAX_I2400M_USB is not set
    CONFIG_RADIO_WL128X=m

Modified defconfig

    # CONFIG_CAN_SLCAN is not set
    # CONFIG_CAN_TI_HECC is not set
    # CONFIG_CAN_MCP251X is not set
    # CONFIG_CAN_GRCAN is not set
    CONFIG_CAN_DEBUG_DEVICES=y
    # CONFIG_WIMAX is not set
    # CONFIG_RFKILL is not set
    # Enable WiMAX (Networking options) to see the WiMAX drivers
    #
    # CONFIG_FB_BACKLIGHT is not set
    # CONFIG_USB_APPLEDISPLAY is not set

Hope that helps.
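For anyone who wants to reproduce the comparison in the message above, here is a minimal Python sketch that lists which symbols differ between two kernel config files. It assumes the usual `CONFIG_FOO=value` and `# CONFIG_FOO is not set` line forms seen in the fragments above; the function names `parse_config` and `diff_configs` are made up for this illustration and are not part of any kernel tooling. (The kernel source tree also ships a `scripts/diffconfig` helper that serves a similar purpose.)

```python
import re
import sys

# The two line forms that carry a symbol's state in a kernel config file.
SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {symbol: value}. Explicitly unset symbols map to 'n'."""
    symbols = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = SET_RE.match(line)
        if match:
            symbols[match.group(1)] = match.group(2)
            continue
        match = UNSET_RE.match(line)
        if match:
            symbols[match.group(1)] = "n"
    return symbols


def diff_configs(old_text, new_text):
    """Return sorted (symbol, before, after) tuples for every changed symbol."""
    old = parse_config(old_text)
    new = parse_config(new_text)
    changes = []
    for name in sorted(set(old) | set(new)):
        before = old.get(name, "absent")
        after = new.get(name, "absent")
        if before != after:
            changes.append((name, before, after))
    return changes


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python diff_config.py old_defconfig new_defconfig
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        for name, before, after in diff_configs(f_old.read(), f_new.read()):
            print(f"{name}: {before} -> {after}")
```

A symbol that appears in only one file is reported as "absent" on the other side, which matters here because the two fragments in the thread do not list exactly the same symbols.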
[beagleboard] Re: kernel panic - has anyone seen something similar?
Note that there are plenty of other things in my kernel config. I only showed the differences between the original (when I would get kernel panic) and the modified (no kernel panic).
[beagleboard] Re: kernel panic - has anyone seen something similar?
Complete newbie - where can I find these config parameters?
[beagleboard] Re: kernel panic - has anyone seen something similar?
No solution here yet, but I have found some very relevant discussions out there. Something must have changed with the kernel scheduler that requires drivers (CAN in our case) to be updated. I copied the BeagleBone kernel support guru to this post (Robert Nelson). Perhaps he is already aware of this problem and knows of a work-around. I will post something if/when I get this figured out. Until then, here are some relevant links.

http://stackoverflow.com/questions/3537252/how-to-solve-bug-scheduling-while-atomic-swapper-0x0103-0-cpu0-in-ts
http://e2e.ti.com/support/omap/f/849/t/250383.aspx
https://community.freescale.com/thread/330079
[beagleboard] Re: kernel panic - has anyone seen something similar?
Thanks, very informative links, particularly the last one.
[beagleboard] Re: kernel panic - has anyone seen something similar?
Hello Me Nee,

Yes. I am having the same problem. How are you doing on this? Have you figured anything out? I too am using the CAN interface on the BeagleBone Black device. Let me know, and I can update you with my findings if you still need help.

On Thursday, November 13, 2014 12:41:48 PM UTC-5, Me Nee wrote:

Background: my company is using BeagleBone Blacks in an industrial CAN bus monitoring application. Our Linux guy quit in June, leaving me to support this thing. Prior to June I had pretty much zero Linux experience.

Our kernel has custom modifications. One was provided by Tower Tech (an Italian manufacturer of BeagleBone CAN capes) to provide support for CAN, and the other was done by the guy who quit. It was a modification to the kernel to support high-speed serial clocks, as there was a bug in the original kernel code when setting serial rates greater than 230,400 baud (if I remember correctly). The base kernel we're working off of is 2013-09-04, which from what I can gather is the latest official Angstrom release.

We're using the BeagleBone in two different external hardware configurations. One is with a straight-to-AM3359 CAN interface, and the other is through a CAN-to-serial interface that is external to the BeagleBone. The CAN-to-serial incarnation works flawlessly. The straight-to-CAN version, we discovered, throws the same kernel panic over and over when CAN traffic gets high.

Our suite of programs that runs on the BeagleBone includes two networking-related daemons. One catches and reacts to network requests to configure our equipment, and the other is basically a UDP blaster to broadcast data. I've found that disabling both of these daemons will prevent the panic. If I only disable one of these programs (doesn't matter which one), the panic still happens.

Anyway, the panic references /net/core/dev.c line 3988 every time. What I'm wondering is if anyone has seen something similar? Or can someone maybe point me in the direction of fixing this? It seems that the issue is likely due to the CAN-specific features we got from the Tower Tech kernel, but I'm hesitant to ask them because I know that my ex-colleague had trouble communicating with them in the past. The panic does not occur when we use the indirect CAN-to-serial hardware, which is why I'm suspicious of the Tower Tech kernel modifications. Also note that the panic happens all the time. The dump below references our can_mon program, but it also happens when nothing is running other than our (custom) daemons.

The panic dump is below.

[ 101.133958] ------------[ cut here ]------------
[ 101.138802] kernel BUG at net/core/dev.c:3988!
[ 101.143440] Internal error: Oops - BUG: 0 [#1] SMP THUMB2
[ 101.149074] Modules linked in: can_raw can c_can_platform c_can can_dev iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables g_multi libcomposite rfcomm ircomm_tty ircomm irda hidp bluetooth rfkill
[ 101.171411] CPU: 0 Tainted: G W (3.8.13-bone53 #1)
[ 101.177689] PC is at __napi_complete+0x36/0x3c
[ 101.182334] LR is at napi_complete+0x1d/0x28
[ 101.186785] pc : [c03fa5ca] lr : [c03fa5ed] psr: 40b3
[ 101.186785] sp : d902dd50 ip : 54638479 fp : 012c
[ 101.198775] r10: c0d12600 r9 : d902c000 r8 : c0d12608
[ 101.204216] r7 : 0001 r6 : 0010 r5 : 4013 r4 : dcb7eddc
[ 101.211027] r3 : r2 : r1 : dcb7eddc r0 : dcb7eddc
[ 101.217833] Flags: nZcv IRQs off FIQs on Mode SVC_32 ISA Thumb Segment user
[ 101.225552] Control: 50c5387d Table: 9cacc019 DAC: 0015
[ 101.231544] Process can_mon (pid: 450, stack limit = 0xd902c240)
[ 101.237812] Stack: (0xd902dd50 to 0xd902e000)
[ 101.242361] dd40: dcb7eddc 4013 0010 dcb7e800
[ 101.250900] dd60: dcb7ed40 bf8f476d dcb7eddc 0010 df0c8010 df258010 c0d1176c c0288173
[ 101.259434] dd80: dcb7eddc 0001 0010 c08020c0 c0d12608 d902c000 c0d12600
[ 101.267967] dda0: 012c c03fa65d fffe7240 dc8c7440 0001 0003 000c
[ 101.276500] ddc0: c0802090 c080208c d902c000 c08858e4 0037 c003466b df00c740 df007bc0
[ 101.285045] dde0: 0080 0100 000c 000a 00404100 df007c10 d902c000
[ 101.293582] de00: 0037 d902de60 c08884bc fa2000d8 fa2000f8 0037 c003493d
[ 101.302105] de20: c07fd728 c000cfdf fa200098 fa200040 fa2000b8 c00085d9 c04a22c9 c04a22cc
[ 101.310630] de40: 6033 d902de94 dce76800 6013 dcec8409 c04a259b
[ 101.319164] de60: df258010 6013 8a108a10 6013 df2b9800 0002 df258010
[ 101.327695] de80: dce76800 6013 dcec8409 0039 d902dea8 c04a22c9 c04a22cc
[ 101.336235] dea0: 6033 c028015b c02800d9 0003 dcec8407 dce76400
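To check whether a board keeps hitting the same signature as the dumps in this thread, a kernel log can be scanned for the three recurring markers: "BUG: scheduling while atomic", "kernel BUG at <file>:<line>!", and "PC is at <function>". The following is only a rough sketch; the regular expressions are written against the lines quoted in this thread, the function name `summarize` is hypothetical, and the log path is whatever your system uses (the thread mentions /var/log/kern.log).

```python
import re
import sys
from collections import Counter

# Patterns modelled on the log lines quoted in this thread (an assumption,
# not an exhaustive description of kernel oops formats).
ATOMIC_RE = re.compile(r"BUG: scheduling while atomic: (.+?)/(\d+)/0x[0-9a-fA-F]+")
KERNEL_BUG_RE = re.compile(r"kernel BUG at (\S+?):(\d+)!")
PC_RE = re.compile(r"PC is at (\w+)\+0x")


def summarize(log_text):
    """Count offending processes, BUG source locations, and faulting functions."""
    return {
        # Process name (may contain spaces, e.g. "rs:main Q:Reg") per warning.
        "atomic": Counter(m.group(1) for m in ATOMIC_RE.finditer(log_text)),
        # "file:line" of each BUG() that fired.
        "bug_sites": Counter(
            f"{m.group(1)}:{m.group(2)}" for m in KERNEL_BUG_RE.finditer(log_text)
        ),
        # Function the program counter was in when the oops was reported.
        "pc": Counter(m.group(1) for m in PC_RE.finditer(log_text)),
    }


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python scan_kernlog.py /var/log/kern.log
    with open(sys.argv[1], errors="replace") as log_file:
        for section, counts in summarize(log_file.read()).items():
            for key, count in counts.most_common():
                print(f"{section}: {key} x{count}")
```

If every panic reports the same source location and the same function, as both dumps in this thread do, that points at one code path rather than at whichever user process happened to be running.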
[beagleboard] Re: kernel panic - has anyone seen something similar?
Haven't figured out a software way around this yet. For now we're avoiding the direct CAN interface to the BeagleBone and instead using our external custom hardware to relay serial CAN messages to the BeagleBone. We don't have issues with this setup. That said, if you have a fix I'd love to hear about it.