Bug#719277: linux-image-3.11-rc4-amd64: Kernel crashes when running Folding@Home as a system service
Just want to let you know this bug can be closed now. I haven't once had the kernel crash due to Folding@Home since my last message, and I've gone through several work units since then, all of which were resumed several times over the course of their progress. I think it's quite safe to say that whatever bug was causing the kernel to crash back in 3.11-rc4 was fixed in 3.11-rc7 and remains fixed in the latest version. -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#719277: linux-image-3.11-rc4-amd64: Kernel crashes when running Folding@Home as a system service
I would like to report that this issue seems to have been resolved in kernel 3.11-rc7: I am able to run Folding@Home without the kernel crashing. I do currently appear to be working on a different type of unit than I had been with the previous kernel, but it does appear to be using the same core (FahCore_a4) that had been causing the crashes. This unit will probably take about 10 days to complete, as it's quite a large one. Hopefully the next unit I receive is of a similar type to the ones I had been working on previously, when the kernel was crashing any time Folding@Home attempted to resume a work unit. I will check back in later with a status update; hopefully this issue has indeed been resolved and it's not just a fluke because of a different type of work unit.
Bug#719277: linux-image-3.11-rc4-amd64: Kernel crashes when running Folding@Home as a system service
On Fri, 2013-08-16 at 21:54 -0400, Alex Vanderpol wrote:
> Well, I've discovered why makedumpfile continues to run even after the dump files show up in the folder. It's failing to properly dump the kernel log, and is continually appending the line [ 0.00] to the dmesg file.

It sounds like the version of makedumpfile you're using doesn't understand the structured log format introduced in Linux 3.5. I guess you need at least version 1.5.1-1, which has this changelog line:

  * Add --dmesg-fix from upstream 1.5.2 for kernels 3.5 and above

Ben.

-- 
Ben Hutchings
Teamwork is essential - it allows you to blame someone else.
Bug#719277: linux-image-3.11-rc4-amd64: Kernel crashes when running Folding@Home as a system service
On Thu, 2013-08-15 at 20:01 -0400, Alex Vanderpol wrote:
> Apparently having the kernel image debug package installed is a good idea when trying to do anything with crash dumps... After installing the ~2GB (unpacked) package I was able to use the crash utility to analyze (to a degree) the crash dump file made by kdump-tools, however I am unable to extract the kernel log from the dump. When I run the 'log' command within crash I get this message:
>
> log: WARNING: log buf data structure(s) have changed
>
> I can, however, get a backtrace and the process status information from the dump. If you think it would be useful, I can output what I am able to get from crash to a file to send to you for you to look at.

Yes please.

> Also, I have a few questions:
[...]

Sorry, I don't know how to use kdump-tools myself.

Ben.

-- 
Ben Hutchings
Teamwork is essential - it allows you to blame someone else.
Bug#719277: linux-image-3.11-rc4-amd64: Kernel crashes when running Folding@Home as a system service
(In reply to your earlier email) The problem is that I'm already using version 1.5.4-1 from Unstable and it's having that issue, so either something changed in a recent kernel version that broke it again, or makedumpfile has regressed since version 1.5.1-1. Either way, I've already filed a bug about it, so hopefully the package maintainers will look into it.

(In reply to your later email) I've attached crash's output, including the backtrace and process status information. If there's anything else from crash that you need (other than the unfortunately unobtainable crash log, which is dumped separately anyway), let me know and I'll see if I can get it to you. As for the questions I had asked, don't worry about them; a little later I found some information that helped me address some of those issues.

crash 7.0.1
Copyright (C) 2002-2013 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Enter help copying to see the conditions.
This program has absolutely no warranty. Enter help warranty for details.

NOTE: stdin: not a tty

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type show copying and show warranty for details.
This GDB was configured as x86_64-unknown-linux-gnu...

please wait... (gathering kmem slab cache data)
please wait... (gathering module symbol data)
please wait... (gathering task table data)
please wait... (determining panic task)

KERNEL: /usr/lib/debug/vmlinux
DUMPFILE: /data/crashdumps/201308162051/dump.201308162051 [PARTIAL DUMP]
CPUS: 2
DATE: Fri Aug 16 20:50:41 2013
UPTIME: 00:01:18
LOAD AVERAGE: 1.47, 0.63, 0.23
TASKS: 185
NODENAME: Kara01
RELEASE: 3.11-rc4-amd64
VERSION: #1 SMP Debian 3.11~rc4-1~exp1 (2013-08-08)
MACHINE: x86_64 (1296 Mhz)
MEMORY: 3.9 GB
PANIC: WARNING: log buf data structure(s) have changed
PID: 1765
COMMAND: FahCore_a4
TASK: 880137280800 [THREAD_INFO: 88013a41]
CPU: 0
STATE: TASK_RUNNING (PANIC)

PID: 1765 TASK: 880137280800 CPU: 0 COMMAND: FahCore_a4
 #0 [88013a4119f0] machine_kexec at 8103366c
 #1 [88013a411a50] crash_kexec at 8108f00e
 #2 [88013a411b08] oops_end at 8138f793
 #3 [88013a411b28] no_context at 81387f81
 #4 [88013a411b68] __do_page_fault at 81391a58
 #5 [88013a411c60] page_fault at 8138ee18
    [exception RIP: jbd2_journal_file_inode+53]
    RIP: a02af28c RSP: 88013a411d10 RFLAGS: 00010246
    RAX: RBX: 880138fe0ec0 RCX: 0019
    RDX: 880138fe0ec0 RSI: RDI: 8801362553d8
    RBP: R8: 880136541218 R9: b923
    R10: 0020 R11: R12: 8801362553d8
    R13: 8801362553d8 R14: 08cc R15: 1000
    ORIG_RAX: CS: 0010 SS: 0018
 #6 [88013a411d30] ext4_block_zero_page_range at a02d9112 [ext4]
 #7 [88013a411d88] ext4_truncate at a02d9b5f [ext4]
 #8 [88013a411de8] ext4_setattr at a02da5b2 [ext4]
 #9 [88013a411e48] notify_change at 81128a87
#10 [88013a411eb8] do_truncate at 811134fa
#11 [88013a411f20] vfs_truncate at 81113656
#12 [88013a411f48] do_sys_truncate at 811137ce
#13 [88013a411f80] system_call_fastpath at 81393d29
    RIP: 008c3797 RSP: 7fd67a8fda68 RFLAGS: 0246
    RAX: 004c RBX: 81393d29 RCX:
    RDX: 015cb3c0 RSI: 0007f8cc RDI: 0170f590
    RBP: 1020 R8: 00c00140 R9: 06e5
    R10: R11: 0206 R12: 0001
    R13: 015c9900 R14: 015caa50 R15: 8801365fde40
    ORIG_RAX: 004c CS: 0033 SS: 002b

PID PPID CPU TASK ST %MEM VSZ RSS COMM
0 0 0 81613400 RU 0.0 0 0 [swapper/0]
0
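A note on reading the two traces in this thread together: crash prints the faulting offset in decimal (jbd2_journal_file_inode+53), while the hand-typed oops elsewhere in the thread prints it in hex (jbd2_journal_file_inode+0x35/0xdd). The following is only a minimal sketch for checking that both point at the same instruction; the helper name parse_symbol_offset is made up for illustration and is not part of crash or the kernel.

```python
import re

# Matches "symbol+offset" as printed by crash ("sym+53", decimal) or by a
# kernel oops ("sym+0x35/0xdd", hex offset followed by the function size).
SYMBOL_RE = re.compile(
    r"(?P<sym>\w+)\+(?P<off>0x[0-9a-fA-F]+|\d+)(?:/(?P<size>0x[0-9a-fA-F]+))?"
)


def parse_symbol_offset(text):
    """Return (symbol, offset_as_int), or None if no symbol+offset is found."""
    match = SYMBOL_RE.search(text)
    if match is None:
        return None
    # Base 0 lets int() accept both "53" and "0x35".
    return match.group("sym"), int(match.group("off"), 0)


# Strings copied from the two reports in this thread.
crash_rip = "[exception RIP: jbd2_journal_file_inode+53]"
oops_rip = "jbd2_journal_file_inode+0x35/0xdd [jbd2]"

if __name__ == "__main__":
    same_site = parse_symbol_offset(crash_rip) == parse_symbol_offset(oops_rip)
    print("same fault site:", same_site)
```

Since 0x35 is 53 in decimal, both reports name the same location inside jbd2_journal_file_inode, even though they come from different crashes (PID 1765 and PID 1963).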
Bug#719277: linux-image-3.11-rc4-amd64: Kernel crashes when running Folding@Home as a system service
I've discovered that the kernel only crashes when the Folding@Home core attempts to resume a work unit already in progress.

After configuring kdump-tools to not collect unused memory pages (something apparently recommended if your system has a larger amount of memory), I attempted to trigger a crash by starting the Folding@Home client service. However, after starting the service and waiting about 15 seconds (about how long it takes from starting the service to the kernel crashing), there was no crash. I waited a while longer, then checked on the work unit progress with FAHControl and noticed that Folding@Home had just downloaded and started a new work unit (which, thankfully, uses the same core as the previous one), as the previous unit had already been finished. I let it run all night without any issues. However, when I powered off my laptop, booted it up again later and ran Folding@Home again, the kernel crashed as soon as Folding@Home tried to resume the work unit.

I do seem to have a problem getting crash dump collection to work, though. The dmesg file collected with this latest crash dump (which is apparently where the kernel log gets dumped, separate from the dump file) contains only the line [0.00], repeated precisely 15447298 times (according to nano's line count upon opening the file). Clearly something is broken, but I do not know what exactly it is. (Previous attempts at crash dump collection did not even give me a plain-text dmesg file; for some reason they were being saved as binary files, and I was unable to read them.)

I may continue trying to get a proper crash dump with a kernel log that has actual information in it so you have something to look at, though at this rate I'm about ready to give up. If you'd like what I have managed to get so far, useless dmesg file and all, I've packaged it up as a 277.4 MB .tar.gz that I could send or upload to a file hosting site for you to download. If I do manage to get a kernel log with something useful in it (or, at least, something other than the same line repeated millions of times over), I will send that to you as soon as I can.
Bug#719277: linux-image-3.11-rc4-amd64: Kernel crashes when running Folding@Home as a system service
Well, I've discovered why makedumpfile continues to run even after the dump files show up in the folder. It's failing to properly dump the kernel log, and is continually appending the line [ 0.00] to the dmesg file. I just ended up with a 3 GB file; nano gave up trying to open it, so I have no idea how many times that line was printed into the file. I am going to file a bug on makedumpfile about this and hopefully it can be resolved. Until then I am ceasing my dump collection attempts.

(That said, the main dump file seems to be in alright shape, so some information may be gleaned from it. I've archived my latest crash dump without the unnecessarily large, useless dmesg file, and the total size comes to 107.4 MB. If your mail server can handle a file this size and you feel there may be something useful in the dump, I can send the file with my next message.)

(Also, I was wrong about the previous dmesg dump files being saved as binary files. That was a permissions-related issue, as they can only be viewed as root. Attempting to view them as a non-root user seems to mistakenly identify them as binary files rather than plain text.)
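For a dmesg file too large to open in an editor, the repeated lines can be counted by streaming the file instead of loading it. The following is a minimal sketch of that idea; the function name summarize_lines and the command-line handling are made up for illustration, and it assumes the file is readable by the user running it (the thread notes the dumps are root-only).

```python
import sys
from collections import Counter


def summarize_lines(path, top=5):
    """Stream a text file and return its most common lines with their counts."""
    counts = Counter()
    # errors="replace" keeps the loop going if the file holds stray bytes.
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            counts[line.rstrip("\n")] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    # Usage (path is whatever dmesg file kdump-tools produced):
    #   python3 summarize.py /path/to/dmesg.file
    if len(sys.argv) == 2:
        for text, count in summarize_lines(sys.argv[1]):
            print(f"{count:>12}  {text!r}")
```

Memory use grows with the number of distinct lines, not with the file size, so a file consisting of one line repeated millions of times needs only a single counter entry.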
Bug#719277: linux-image-3.11-rc4-amd64: Kernel crashes when running Folding@Home as a system service
Apparently having the kernel image debug package installed is a good idea when trying to do anything with crash dumps... After installing the ~2GB (unpacked) package I was able to use the crash utility to analyze (to a degree) the crash dump file made by kdump-tools. However, I am unable to extract the kernel log from the dump. When I run the 'log' command within crash I get this message:

log: WARNING: log buf data structure(s) have changed

I can, however, get a backtrace and the process status information from the dump. If you think it would be useful, I can write what I am able to get from crash to a file and send it to you to look at.

Also, I have a few questions:

1) Is there any way at all to give the crash kernel more memory to work with? kdump-tools does not work if I specify an amount greater than 128M in the bootloader config file (the system does not reboot into the crash kernel). That seems unnecessarily small for a system with 4GB of memory available, and the small amount of memory seems to slow things down considerably.

2) How long should the crash dump collection process normally take? I've noticed that it usually takes about 4 or 5 minutes after the system finishes rebooting for the dump and dmesg files to show up in the specified crash dump folder (before that there's only one file, dump_incomplete). However, the makedumpfile process seems to continue running even after 5 hours (watching it with top).

3) Is the system supposed to boot as normal when booting into the crash kernel to collect the crash dump? I ask because mine does, and the severely limited amount of memory available doesn't seem to allow for a full boot. (I suspect I may need to specifically tell kdump-tools to boot into a more suitable runlevel, as it doesn't appear to do so on its own.)
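Regarding question 1: the crash kernel's memory is reserved with the crashkernel= kernel command-line parameter, so one way to see which reservation was actually requested on the running system is to read it back from /proc/cmdline. The following is a minimal sketch that handles only the simple size[KMG][@offset] form; the function name crashkernel_bytes is made up for illustration, and range-based forms of the parameter are deliberately not parsed.

```python
import re

UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def crashkernel_bytes(cmdline):
    """Return the simple crashkernel= reservation in bytes, or None.

    Only handles "crashkernel=<size><K|M|G>" with an optional "@offset";
    other syntaxes are treated as not found (an assumption of this sketch).
    """
    match = re.search(
        r"(?:^|\s)crashkernel=(\d+)([KMG])(?:@\S+)?(?=\s|$)", cmdline
    )
    if match is None:
        return None
    return int(match.group(1)) * UNITS[match.group(2)]


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as handle:
            print(crashkernel_bytes(handle.read()))
    except OSError as exc:
        print("could not read /proc/cmdline:", exc)
```

This only reports what was requested on the command line; it does not explain why a value above 128M fails to boot the crash kernel on this machine.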
Bug#719277: linux-image-3.11-rc4-amd64: Kernel crashes when running Folding@Home as a system service
On Tue, 2013-08-13 at 19:55 -0400, Alex Vanderpol wrote:
> I'll send the kernel log, though there's no record of the crash anywhere in the log and I can't really see anything in the log that would be useful...
[...]

You've sent /var/log/kern.log which I didn't expect would include any useful information. I meant that you should extract the kernel log from the crash dump. Unless you already tried it and this is what you meant when you said 'the dmesg dump is 2.9 GB' (it shouldn't be nearly that large...)

Ben.

-- 
Ben Hutchings
Man invented language to satisfy his deep need to complain. - Lily Tomlin
Bug#719277: linux-image-3.11-rc4-amd64: Kernel crashes when running Folding@Home as a system service
Ah, I didn't know exactly what you meant. Unfortunately I don't know how to extract anything from the dump files I got with kdump-tools. There are two files in the crash dump directory I made and pointed kdump-tools to: dmesg.201308111839 (the 2.9 GB file) and dump.201308111839 (the 1.5 GB file). I cannot find anything useful with Google about what to do with these files. I'm pretty sure the Debian-supplied kernel is configured to work with kdump-tools (at least, the default configuration in the sources was set up correctly for it, and I did get a kernel dump), but I do not have a debug kernel image available, which I assume would make this easier. I'm going to try to set things up better so I can hopefully get a crash dump I can actually do something with. I found a site that has some useful information, though if you know how to work with the files I mentioned above, I'll gladly take that information as well.
Bug#719277: linux-image-3.11-rc4-amd64: Kernel crashes when running Folding@Home as a system service
On Sun, 2013-08-11 at 19:48 -0400, Alex Vanderpol wrote:
> I have to ask: Is it normal for a crash dump (and, apparently, a dmesg dump as well) to be several GB in size? I ask, because my dump file from the crash is 1.5 GB and the dmesg dump is 2.9 GB.

Yes, I'm afraid so.

> I would like to submit these somehow but I don't think via email would be the best way to do so, and I can't find any good, free file hosting sites that will accept files this large. Would anyone have any suggestions as to what to do with them?

I don't think Debian has any regular arrangement for this at the moment. And anyway, this will need to be forwarded upstream once we have a rough idea of where the bug lies.

You could start by sending just the kernel log; that might be enough information to make some progress.

Ben.

-- 
Ben Hutchings
Experience is what causes a person to make new mistakes instead of old ones.
Bug#719277: linux-image-3.11-rc4-amd64: Kernel crashes when running Folding@Home as a system service
I was unable to get a photo of the screen output; apparently neither of the cameras I have available can take shots at a high enough resolution to make the output readable. Instead I carefully wrote down (nearly) everything that was displayed and typed it out into a text file to send to you (formatting may not be *exactly* as it was on screen, but should be close enough). The only thing not written down was the large list of linked kernel modules, as I didn't think it was necessary, though if needed I can easily crash the kernel again to get that list for you.

While writing out the terminal output I came to understand, because it is specifically referenced in the output, that it's not the Folding@Home client service itself that's crashing the kernel but the Folding@Home core (specifically FahCore_a4, as you'll see below). Anyway, I hope this helps shed some light on the problem.

BUG: Unable to handle kernel NULL pointer dereference at (null)
IP: [a029728c] jbd2_journal_file_inode+0x35/0xdd [jbd2]
PGD 1399c1067 PUD 1399b6067 PMD 0
Oops: [#1] SMP
Modules linked in: [long list of modules]
CPU: 0 PID: 1963 Comm: FahCore_a4 Tainted: G I 3.11-rc4-amd64 #1 Debian 3.11~rc4-1~exp1
Hardware name: Acer Aspire 1810TZ/JM11-MS, BIOS v1.3314 08/31/2010
task: 880139a40801 ti: 880139a4 task.ti: 880139a4
RIP: 0010:[a029728c] [a029728c] jbd2_journal_file_inode+0x35/0xdd [jbd2]
RSP: 0018:880139a41d10 EFLAGS: 00010246
RAX: RBX: 880138f7e1c0 RCX: 0019
RDX: 880138f7e1c0 RSI: RDI: 8801382c4408
RBP: R08: 8801382cf218 R09: 0020
R10: R11: R12: 8801382c4408
R13: 8801328c4408 R14: 08cc R15: 1000
FS: 7f595ead8700() GS: 88013fc0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: CR3: 000139993000 CR4: 000407f0
Stack: ea000404b728 8801382cf0b0 0734 8801382c4408 a02b1112 1000 007f 08cc 8801382cb540 8801382cf0b0 880139a41de0 8801382c4408
Call Trace:
 [a02b1112] ? ext4_block_zero_page_range+0x28b/0x29c [ext4]
 [a02b1bff] ? ext4_truncate+0x152/0x27f [ext4]
 [8111d48e] ? walk_component+0x163/0x1a2
 [8112a22c] ? mntget+0x17/0x1c
 [811287b9] ? inode_change+0x2c/0x11a
 [a02b25b2] ? ext4_setattr+0x412/0x4b2 [ext4]
 [81046971] ? current_fs_time+0x2f/0x35
 [81128a87] ? notify_change+0x1e0/0x2cc
 [81103b75] ? kmem_cache_free+0x3f/0x7c
 [811134fa] ? do_truncate+0x63/0x87
 [81113656] ? vfs_truncate+0xe6/0x10d
 [811137ce] ? do_sys_truncate+0x3d/0x77
 [81393d29] ? system_call_fastpath+0x16/0x1b
Code: f5 53 48 8b 1f 48 85 db 75 11 be 49 09 00 00 48 c7 c7 14 03 2a a0 e8 70 c7 da e0 4c 89 e7 e8 3f df ff ff 85 c0 0f 85 98 00 00 48 39 5d 00 4c 8b 2b 0f 84 92 00 00 00 48 39 5d 08 0f 84 88 00
RIP [a029728c] jbd2_journal_file_inode+0x35/0xdd [jbd2]
 RSP 880139a41d10
CR2:
---[ end trace 0b57ed6584cd4409 ]---
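When a call trace like the one above has been pasted as a single run of text, it can be split back into frames mechanically. The following is a minimal sketch; the function name split_frames is made up for illustration, the sample string is an excerpt of the trace above, and labelling frames without a module tag as "vmlinux" is an assumption of this sketch, not something the kernel prints.

```python
import re

# One frame looks like: [a02b1bff] ? ext4_truncate+0x152/0x27f [ext4]
# The trailing [module] is optional; the negative lookahead stops the next
# frame's bracketed address from being mistaken for a module name.
FRAME_RE = re.compile(
    r"\[(?P<addr>[0-9a-f]+)\]\s+(?:\?\s+)?"
    r"(?P<func>\w+)\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\](?!\s+(?:\?\s+)?\w+\+0x))?"
)


def split_frames(flat_trace):
    """Split a flattened call trace into a list of frame dictionaries."""
    frames = []
    for match in FRAME_RE.finditer(flat_trace):
        frames.append({
            "func": match.group("func"),
            "offset": int(match.group("off"), 16),
            "size": int(match.group("size"), 16),
            # Assumption: frames without a [module] tag are core kernel code.
            "module": match.group("module") or "vmlinux",
        })
    return frames


sample = (
    "[a02b1112] ? ext4_block_zero_page_range+0x28b/0x29c [ext4] "
    "[a02b1bff] ? ext4_truncate+0x152/0x27f [ext4] "
    "[811134fa] ? do_truncate+0x63/0x87 "
    "[81113656] ? vfs_truncate+0xe6/0x10d"
)

if __name__ == "__main__":
    for frame in split_frames(sample):
        print(f'{frame["module"]:>8}  {frame["func"]}+{frame["offset"]:#x}')
```

Read bottom to top, the frames show the truncate system call entering ext4, which matches the fault in jbd2_journal_file_inode reported at the top of the oops.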
Bug#719277: linux-image-3.11-rc4-amd64: Kernel crashes when running Folding@Home as a system service
So I figured out what's crashing the kernel: apparently kernel 3.11-rc4 and Folding@Home (when run as a system service) don't get along. I suspect this may be an issue with Folding@Home rather than the kernel. I may need to get in touch with them and inform them of this issue so it can be resolved.
Bug#719277: linux-image-3.11-rc4-amd64: Kernel crashes when running Folding@Home as a system service
On Sun, 2013-08-11 at 07:40 -0400, Alex Vanderpol wrote:
> So I figured out what's crashing the kernel, apparently kernel 3.11-rc4 and Folding@Home (when run as a system service) don't get along. I suspect this may be an issue with Folding@Home rather than the kernel, I may need to get in touch with them and inform them of this issue so it can be resolved.

It is a kernel bug; no application should be able to crash the kernel (unless it's run with special privileges).

Ben.

-- 
Ben Hutchings
For every complex problem there is a solution that is simple, neat, and wrong.
Bug#719277: linux-image-3.11-rc4-amd64: Kernel crashes when running Folding@Home as a system service
Oh, well, in that case, I guess reporting it was a good idea. I can probably capture a crash dump some time later if you need it; right now, though, I need to get some rest.

On 11/08/13 07:50 AM, Ben Hutchings wrote:
> On Sun, 2013-08-11 at 07:40 -0400, Alex Vanderpol wrote:
>> So I figured out what's crashing the kernel, apparently kernel 3.11-rc4 and Folding@Home (when run as a system service) don't get along. I suspect this may be an issue with Folding@Home rather than the kernel, I may need to get in touch with them and inform them of this issue so it can be resolved.
>
> It is a kernel bug; no application should be able to crash the kernel (unless it's run with special privileges).
>
> Ben.
Bug#719277: linux-image-3.11-rc4-amd64: Kernel crashes when running Folding@Home as a system service
I have to ask: Is it normal for a crash dump (and, apparently, a dmesg dump as well) to be several GB in size? I ask because my dump file from the crash is 1.5 GB and the dmesg dump is 2.9 GB. I would like to submit these somehow, but I don't think email would be the best way to do so, and I can't find any good, free file hosting sites that will accept files this large. Would anyone have any suggestions as to what to do with them?