Re: 3.16-rcX crashes on resume from Suspend-To-RAM
I collected all the data that you asked for and attached it to the bug:
https://bugzilla.kernel.org/show_bug.cgi?id=80911

Yes, both the acpidump output and the list of PNP devices change when I
update the kernel. I was hoping to give you a brief "diff" output for
the changes, but there are too many changes for that to make much
sense. In any case, you can see it by running:

  diff -u last-good/dirtree_\!sys\!devices\!pnp0.txt first-bad/dirtree_\!sys\!devices\!pnp0.txt

I included a README.txt that describes the contents of all the files.

I hope this makes some sense and I hope it is sufficiently complete for
you to make progress in debugging why my machine is unhappy. Please
don't hesitate to ask if you think I can provide other data and/or run
other tests.

Markus

On Fri, Aug 15, 2014 at 5:46 PM, Rafael J. Wysocki wrote:
> On Friday, August 15, 2014 10:17:42 AM Markus Gutschke wrote:
>> Just wondering if any of you had any other ideas of what I could try
>> to help debug this problem?
>
> My theory is that there is a device in your system that we don't have
> a driver for, but it had been enumerated as a PNP device before the
> change that triggered the problem for you and we turned it off during
> suspend as part of the default ACPI PNP device handling.
>
> The reason why you're seeing a crash with the "platform" test level is
> most likely that the _WAK control method does something unusual on
> your system.
>
> The LNXSYBUS:00 thing from dmesg probably is a red herring.
>
> I need the output of acpidump from the affected system, but please
> attach it to the bug entry at
> https://bugzilla.kernel.org/show_bug.cgi?id=80911 that Rui has created
> for this issue.
>
> Also please check the list of PNP devices under
>
> /sys/bus/pnp/devices/
>
> before and after the commit you have found by bisection and let me
> know if there are any differences.
>
>
>> On Tue, Aug 12, 2014 at 9:11 AM, Markus Gutschke wrote:
>> > As I said earlier in this thread, echo'ing "devices" into "pm_test"
>> > does not result in a crash; but doing so for "platform" does.
>> >
>> > Markus
>> >
>> > On Aug 12, 2014 1:26 AM, "Zhang Rui" wrote:
>> >>
>> >> On Sat, 2014-08-09 at 03:14 -0700, Markus Gutschke wrote:
>> >> > I am back and have physical access to the machine now.
>> >> >
>> >> great!
>> >>
>> >> > I re-ran the test just to be sure, and I can confirm that "platform"
>> >> > does in fact result in a crash.
>> >> >
>> >> what about "devices"?
>> >> I mean
>> >>
>> >> # echo devices > /sys/power/pm_test
>> >>
>> >> and see if that triggers the crash.
>> >>
>> >> > Furthermore, I ran the test that Rui asked for. I suspended, resumed,
>> >> > and upon crashing power-cycled the machine ASAP. "dmesg" suggests that
>> >> > the problem is with LNXSYBUS:00 That doesn't tell me much, but
>> >> > hopefully it makes sense to you guys.
>> >> >
>> >> [0.930093] Magic number: 10:810:122
>> >> [0.930185] acpi LNXSYBUS:00: hash matches
>> >>
>> >> This looks weird, ACPI will do nothing for LNXSYBUS devices during
>> >> resume.
>> >> Rafael, any thought on this?
>> >>
>> >> thanks,
>> >> rui
>> >>
>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

This archive contains debugging information retrieved for the following
kernel versions:

  455c6fdbd219161bd09b1165f11699d6d73de11c: Linux 3.14
  1860e379875dfe7271c649058aeddffe5afd9d0d: Linux 3.15
  aca0a4eb4e325914ddb22a8ed06fcb0222da2a26: Last good commit
  eec15edbb0e14485998635ea7c62e30911b465f0: First bad commit
  b04c58b1ed26317bfb4b33d3a2d16377fc6acd0f: Still bad (merge branch "acpi-enumeration")
  b04c58b1ed26317bfb4b33d3a2d16377fc6acd0f.PATCHED: Patched with a debug patch provided by Rui

After eec15edbb0e14485998635ea7c62e30911b465f0, the kernel can no longer
resume from suspend. The crash looks like what is shown in "crash.png".

Even for defective kernels, "echo devices >/sys/power/pm_test" completes
successfully, whereas "echo platform >/sys/power/pm_test" triggers the
crash.

CONFIG_PM_TRACE_RTC suggests that the bug is caused by LNXSYBUS:00, but
that might be incorrect.

Please note that "dmesg" shows an early stack trace during boot. This
might or might not be related.

Please also note that the output from "acpidump" changes between kernel
versions.

I also included the output from

  grep . /sys/bus/pnp/devices/*/firmware_node/*
  grep . /sys/bus/pnp/devices/*/*
  grep . /sys/bus/platform/devices/*/firmware_node/*
  grep . /sys/bus/platform/devices/*/*

for each of the kernels, as well as the directory tree underneath
/sys/bus/pnp/devices/ -> /sys/devices/pnp0/.
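The before/after comparison of the PNP device list that Rafael asked for can also be done programmatically. The following is only a minimal sketch: `list_devices` and `compare_device_lists` are hypothetical helper names that are not part of the attached archive, and the sketch assumes that one listing was captured per kernel (for example by booting each kernel and saving the names).

```python
import os


def list_devices(devices_dir="/sys/bus/pnp/devices"):
    """Return the sorted entry names under devices_dir, or [] if it is missing."""
    try:
        return sorted(os.listdir(devices_dir))
    except FileNotFoundError:
        return []


def compare_device_lists(before, after):
    """Return (removed, added): names only in `before`, names only in `after`."""
    before_set, after_set = set(before), set(after)
    return sorted(before_set - after_set), sorted(after_set - before_set)
```

Calling `compare_device_lists` with the listing from the last good kernel and the listing from the first bad kernel gives the devices that disappeared from, or newly appeared on, the PNP bus. It only compares names; the attribute contents captured with the grep commands above would still need a regular diff.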
Re: 3.16-rcX crashes on resume from Suspend-To-RAM
Just wondering if any of you had any other ideas of what I could try
to help debug this problem?

Thanks,
Markus

On Tue, Aug 12, 2014 at 9:11 AM, Markus Gutschke wrote:
> As I said earlier in this thread, echo'ing "devices" into "pm_test"
> does not result in a crash; but doing so for "platform" does.
>
> Markus
>
> On Aug 12, 2014 1:26 AM, "Zhang Rui" wrote:
>>
>> On Sat, 2014-08-09 at 03:14 -0700, Markus Gutschke wrote:
>> > I am back and have physical access to the machine now.
>> >
>> great!
>>
>> > I re-ran the test just to be sure, and I can confirm that "platform"
>> > does in fact result in a crash.
>> >
>> what about "devices"?
>> I mean
>>
>> # echo devices > /sys/power/pm_test
>>
>> and see if that triggers the crash.
>>
>> > Furthermore, I ran the test that Rui asked for. I suspended, resumed,
>> > and upon crashing power-cycled the machine ASAP. "dmesg" suggests that
>> > the problem is with LNXSYBUS:00 That doesn't tell me much, but
>> > hopefully it makes sense to you guys.
>> >
>> [0.930093] Magic number: 10:810:122
>> [0.930185] acpi LNXSYBUS:00: hash matches
>>
>> This looks weird, ACPI will do nothing for LNXSYBUS devices during
>> resume.
>> Rafael, any thought on this?
>>
>> thanks,
>> rui
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 3.16-rcX crashes on resume from Suspend-To-RAM
As I said earlier in this thread, echo'ing "devices" into "pm_test"
does not result in a crash; but doing so for "platform" does.

Markus

On Aug 12, 2014 1:26 AM, "Zhang Rui" wrote:
>
> On Sat, 2014-08-09 at 03:14 -0700, Markus Gutschke wrote:
> > I am back and have physical access to the machine now.
> >
> great!
>
> > I re-ran the test just to be sure, and I can confirm that "platform"
> > does in fact result in a crash.
> >
> what about "devices"?
> I mean
>
> # echo devices > /sys/power/pm_test
>
> and see if that triggers the crash.
>
> > Furthermore, I ran the test that Rui asked for. I suspended, resumed,
> > and upon crashing power-cycled the machine ASAP. "dmesg" suggests that
> > the problem is with LNXSYBUS:00 That doesn't tell me much, but
> > hopefully it makes sense to you guys.
> >
> [0.930093] Magic number: 10:810:122
> [0.930185] acpi LNXSYBUS:00: hash matches
>
> This looks weird, ACPI will do nothing for LNXSYBUS devices during
> resume.
> Rafael, any thought on this?
>
> thanks,
> rui
>
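For readers who want to repeat the "devices" versus "platform" comparison, here is a small sketch of selecting a test level from a script. It assumes that reading /sys/power/pm_test returns the available levels on one line with the active one in square brackets (for example "[none] core processors platform devices freezer"); the helper names `parse_pm_test` and `set_pm_test` are made up for this illustration, and writing to the real file requires root.

```python
def parse_pm_test(text):
    """Split the contents of pm_test into (active_level, all_levels)."""
    active = None
    levels = []
    for token in text.split():
        # The currently selected level is shown in square brackets.
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        levels.append(token)
    return active, levels


def set_pm_test(level, path="/sys/power/pm_test"):
    """Select a test level, refusing names the file does not advertise."""
    with open(path) as f:
        _, levels = parse_pm_test(f.read())
    if level not in levels:
        raise ValueError(f"unsupported pm_test level: {level!r}")
    with open(path, "w") as f:
        f.write(level)
```

Checking the requested level against the advertised list first avoids silently testing nothing when a level name is misspelled.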
Re: 3.16-rcX crashes on resume from Suspend-To-RAM
I am back and have physical access to the machine now.

I re-ran the test just to be sure, and I can confirm that "platform"
does in fact result in a crash.

Furthermore, I ran the test that Rui asked for. I suspended, resumed,
and upon crashing power-cycled the machine ASAP. "dmesg" suggests that
the problem is with LNXSYBUS:00. That doesn't tell me much, but
hopefully it makes sense to you guys.

Let me know what else you want me to test.

Markus

[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Initializing cgroup subsys cpuacct
[0.00] Linux version 3.15.0-rc8-gutschke (root@medusa) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #16 SMP Fri Jul 25 17:39:40 PDT 2014
[0.00] Command line: BOOT_IMAGE=/vmlinuz-3.15.0-rc8-gutschke root=/dev/mapper/ubuntu--vg-root ro quiet splash crashkernel=384M-:128M
[0.00] KERNEL supported cpus:
[0.00] Intel GenuineIntel
[0.00] AMD AuthenticAMD
[0.00] Centaur CentaurHauls
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009efff] usable
[0.00] BIOS-e820: [mem 0x0009f000-0x0009] reserved
[0.00] BIOS-e820: [mem 0x0010-0xdf451bff] usable
[0.00] BIOS-e820: [mem 0xdf451c00-0xdfff] reserved
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec0] reserved
[0.00] BIOS-e820: [mem 0xfed18000-0xfed1bfff] reserved
[0.00] BIOS-e820: [mem 0xfed2-0xfed8] reserved
[0.00] BIOS-e820: [mem 0xfeda-0xfeda5fff] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee0] reserved
[0.00] BIOS-e820: [mem 0xffe6-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00021fff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.4 present.
[0.00] DMI: Dell Inc. Precision M4400 /0R906R, BIOS A18 10/30/2009
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] No AGP bridge found
[0.00] e820: last_pfn = 0x22 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00] 0-9 write-back
[0.00] A-B uncachable
[0.00] C-C write-protect
[0.00] D-E uncachable
[0.00] F-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00] 0 base 0 mask 8 write-back
[0.00] 1 base 0E000 mask FE000 uncachable
[0.00] 2 disabled
[0.00] 3 disabled
[0.00] 4 disabled
[0.00] 5 disabled
[0.00] 6 disabled
[0.00] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[0.00] original variable MTRRs
[0.00] reg 0, base: 0GB, range: 32GB, type WB
[0.00] reg 1, base: 3584MB, range: 512MB, type UC
[0.00] total RAM covered: 32256M
[0.00] Found optimal setting for mtrr clean up
[0.00] gran_size: 64K chunk_size: 1G num_reg: 5 lose cover RAM: 0G
[0.00] New variable MTRRs
[0.00] reg 0, base: 0GB, range: 4GB, type WB
[0.00] reg 1, base: 3584MB, range: 512MB, type UC
[0.00] reg 2, base: 4GB, range: 4GB, type WB
[0.00] reg 3, base: 8GB, range: 8GB, type WB
[0.00] reg 4, base: 16GB, range: 16GB, type WB
[0.00] e820: update [mem 0xe000-0x] usable ==> reserved
[0.00] e820: last_pfn = 0xdf451 max_arch_pfn = 0x4
[0.00] Scanning 1 areas for low memory corruption
[0.00] Base memory trampoline at [88099000] 99000 size 24576
[0.00] init_memory_mapping: [mem 0x-0x000f]
[0.00] [mem 0x-0x000f] page 4k
[0.00] BRK [0x01fe9000, 0x01fe9fff] PGTABLE
[0.00] BRK [0x01fea000, 0x01feafff] PGTABLE
[0.00] BRK [0x01feb000, 0x01febfff] PGTABLE
[0.00] init_memory_mapping: [mem 0x21fe0-0x21fff]
[0.00] [mem 0x21fe0-0x21fff] page 2M
[0.00] BRK [0x01fec000, 0x01fecfff] PGTABLE
[0.00] init_memory_mapping: [mem 0x21c00-0x21fdf]
[0.00] [mem 0x21c00-0x21fdf] page 2M
[0.00] init_memory_mapping: [mem 0x2-0x21bff]
[0.00] [mem 0x2-0x21bff] page 2M
[0.00] init_memory_mapping: [mem 0x0010-0xdf450fff]
[0.00] [mem 0x0010-0x001f] page 4k
[0.00] [mem 0x0020-0xdf3f] page 2M
[0.00] [mem 0xdf40-0xdf450fff] page 4k
[0.00] init_memory_mapping: [mem
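The LNXSYBUS:00 hint comes from the "hash matches" line that the kernel prints on the boot after a traced suspend, as quoted elsewhere in this thread. Below is a minimal sketch for pulling such lines out of a saved dmesg log. The function name `find_hash_matches` is hypothetical, and the pattern only assumes the line shape seen in this thread ("[timestamp] device: hash matches"). As Rafael points out, a match is only a hint and may be a red herring.

```python
import re

# Matches lines such as "[0.930185] acpi LNXSYBUS:00: hash matches".
_HASH_MATCH = re.compile(
    r"^\[[ \t]*([\d.]+)\][ \t]+(.+?): hash matches[ \t]*$", re.MULTILINE
)


def find_hash_matches(dmesg_text):
    """Return (timestamp, device) pairs for every "hash matches" line."""
    return _HASH_MATCH.findall(dmesg_text)
```

Feeding it the full text of a saved log returns one pair per matching device, which makes it easy to compare the result across several crash-and-reboot cycles.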
Re: 3.16-rcX crashes on resume from Suspend-To-RAM
"platform" does in fact appear to cause a crash (at least, I can't reach
the machine anymore after having suspended).

I am still on the road and have to do this remotely, and I cannot get
hold of my helper right now. I'll try again later tonight or maybe the
next day(s) and will get back to you with more data.

Markus
Re: 3.16-rcX crashes on resume from Suspend-To-RAM
I tried removing snd_hda_intel, but it didn't make any difference.

I then followed your instructions to turn on tracing, but I am more
puzzled than I was before.

The crash reliably happens every time I suspend/resume without first
having tracing turned on. But as soon as I enter
"echo devices >/sys/power/pm_test" to enable tracing, things change.
Upon suspending, the machine now happily resumes itself again a few
seconds later. No crash whatsoever.

Is this the expected behavior? Anything else you want me to do?

Markus
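As far as I understand the kernel's PM debugging documentation, the automatic resume described here is what a pm_test level is meant to do: the suspend sequence runs only up to the selected stage, the kernel waits a few seconds, and then it resumes on its own. One test cycle could be scripted roughly as follows. This is a sketch only: `run_pm_test_cycle` is a hypothetical helper, the default paths are the standard sysfs files mentioned in this thread, and it has to run as root on a real system.

```python
def run_pm_test_cycle(level, pm_test="/sys/power/pm_test", state="/sys/power/state"):
    """Run one suspend test at `level`, then restore normal suspend behaviour."""
    with open(pm_test, "w") as f:
        f.write(level)
    try:
        # With a test level selected, the kernel stops at that stage, waits
        # a few seconds and resumes by itself instead of staying suspended.
        with open(state, "w") as f:
            f.write("mem")
    finally:
        # Always switch back to "none" so the next suspend is a real one.
        with open(pm_test, "w") as f:
            f.write("none")
```

Resetting the level in a `finally` block matters here, because a forgotten test level makes every later suspend look like it works.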
Re: 3.16-rcX crashes on resume from Suspend-To-RAM
Thanks for checking in. And no, I have not heard from Zhang since my
last e-mail. I suspect he is still working on finding a solution.

But you are of course right, reverting the patch in the meantime might
be a good idea. I would love to be able to suspend my laptop again.
But I defer to Zhang for the final decision. As long as it gets fixed
eventually, I can personally live with a few weeks of delay while
things get worked out.

And of course, as offered before, I'll run whatever tests Zhang asks
me to do.

Markus

On Mon, Aug 4, 2014 at 7:16 AM, Pavel Machek wrote:
> On Sat 2014-07-26 08:52:34, Markus Gutschke wrote:
>> Sorry for the delay. Remotely debugging kernels over a shared and
>> flaky 1MBps terrestrial wireless connection is quite a new experience
>> to me.
>>
>> In any case, I was able to collect all the data that you asked for. I
>> then used "pm-suspend" to put the machine to sleep and asked a helper
>> to physically press the power button to wake the computer back up. My
>> helper told me that it crashed just as before, and they had to
>> power-cycle the machine to bring it back to life.
>>
>> Please let me know, what other data I can get for you. And thank you
>> very much for putting up with my slow turn-around. I should have much
>> better response time again in about two to three weeks when I return
>> to civilization.
>
> Was this solved, somehow?
>
> If not, can we revert the patch that causes it?
> Pavel
>
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: 3.16-rcX crashes on resume from Suspend-To-RAM
Sorry for the delay. Remotely debugging kernels over a shared and
flaky 1MBps terrestrial wireless connection is quite a new experience
for me.

In any case, I was able to collect all the data that you asked for. I
then used "pm-suspend" to put the machine to sleep and asked a helper
to physically press the power button to wake the computer back up. My
helper told me that it crashed just as before, and they had to
power-cycle the machine to bring it back to life.

Please let me know what other data I can get for you. And thank you
very much for putting up with my slow turn-around. I should have much
better response time again in about two to three weeks when I return
to civilization.

# Startup log file for stock 3.15 kernel
https://medusa.gutschke.com/markus/3.15-dmesg.txt

# Startup log file for b04c58b1ed26317bfb4b33d3a2d16377fc6acd0f
https://medusa.gutschke.com/markus/3.15-rc8-dmesg.txt

# /sys/* files for b04c58b1ed26317bfb4b33d3a2d16377fc6acd0f
https://medusa.gutschke.com/markus/3.15-rc8-platform-devices-firmware-node.txt
https://medusa.gutschke.com/markus/3.15-rc8-platform-devices.txt
https://medusa.gutschke.com/markus/3.15-rc8-pnp-devices-firmware-node.txt
https://medusa.gutschke.com/markus/3.15-rc8-pnp-devices.txt

# /sys/* files for b04c58b1ed26317bfb4b33d3a2d16377fc6acd0f with patch applied
https://medusa.gutschke.com/markus/3.15-rc8-patched-dmesg.txt
https://medusa.gutschke.com/markus/3.15-rc8-patched-platform-devices-firmware-node.txt
https://medusa.gutschke.com/markus/3.15-rc8-patched-platform-devices.txt
https://medusa.gutschke.com/markus/3.15-rc8-patched-pnp-devices-firmware-node.txt
https://medusa.gutschke.com/markus/3.15-rc8-patched-pnp-devices.txt
Re: 3.16-rcX crashes on resume from Suspend-To-RAM
Please note the crash in "dmesg" right after booting. This looks relevant:

https://medusa.gutschke.com/markus/acpi/after-dmesg.txt
https://medusa.gutschke.com/markus/acpi/acpidump.txt
https://medusa.gutschke.com/markus/acpi/before-platform-devices-firmware-node.txt
https://medusa.gutschke.com/markus/acpi/before-platform-devices.txt
https://medusa.gutschke.com/markus/acpi/before-pnp-devices-firmware-node.txt
https://medusa.gutschke.com/markus/acpi/before-pnp-devices.txt
https://medusa.gutschke.com/markus/acpi/after-platform-devices-firmware-node.txt
https://medusa.gutschke.com/markus/acpi/after-platform-devices.txt
https://medusa.gutschke.com/markus/acpi/after-pnp-devices-firmware-node.txt
https://medusa.gutschke.com/markus/acpi/after-pnp-devices.txt

Markus
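For anyone who wants to compare the "before" and "after" device listings without reading a full diff, a minimal Python sketch follows. It assumes each listing holds one device name per line, which may not match the real layout of the attached files; the function name and the sample data are made up for illustration.

```python
def compare_device_lists(before_text, after_text):
    """Return (removed, added) entries between two device listings.

    Assumption: one device name per line; blank lines are ignored.
    """
    before = {line.strip() for line in before_text.splitlines() if line.strip()}
    after = {line.strip() for line in after_text.splitlines() if line.strip()}
    return sorted(before - after), sorted(after - before)


# Hypothetical sample data, not taken from the attached files.
before_sample = "00:00\n00:01\n00:02\n"
after_sample = "00:00\n00:02\n"

removed, added = compare_device_lists(before_sample, after_sample)
print("no longer enumerated:", removed)
print("newly enumerated:", added)
```

In practice the two strings would be read from before-pnp-devices.txt and after-pnp-devices.txt; a device that shows up only in the "before" set is a candidate for the kind of unbound PNP device Rafael described.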
Re: 3.16-rcX crashes on resume from Suspend-To-RAM
Adding the reviewers of the faulty change list to the cc list for this
e-mail. I hope that is considered proper etiquette for the LKML.

On Tue, Jul 15, 2014 at 6:51 PM, Markus Gutschke wrote:
> My Dell M4400 has been pretty well-supported by Linux a couple of
> years now, but recent 3.16-rcX cause hard crashes when resuming from
> Suspend-to-RAM.
>
> This is tricky to debug, as device drivers are not yet restored by the
> time that the crash happens. So, I can't use Page-UP to scroll the
> screen and see the full crash information. I also cannot use the
> netconsole; the ethernet device is still suspended. For similar
> reasons, crash kernels don't seem to work either.
>
> After about a day of false starts and a lengthy bi-secting session, I
> finally narrowed things down to this change list:
>
> eec15edbb0e14485998635ea7c62e30911b465f0 is the first bad commit
> commit eec15edbb0e14485998635ea7c62e30911b465f0
> Author: Zhang Rui
> Date:   Fri May 30 04:23:01 2014 +0200
>
>     ACPI / PNP: use device ID list for PNPACPI device enumeration
>
>     ACPI can be used to enumerate PNP devices, but the code does not
>     handle this in the right way currently.  Namely, if an ACPI device
>     object
>      1. Has a _CRS method,
>      2. Has an identification of
>         "three capital characters followed by four hex digits",
>      3. Is not in the excluded IDs list,
>     it will be enumerated to PNP bus (that is, a PNP device object will
>     be create for it).  This means that, actually, the PNP bus type is
>     used as the default bus type for enumerating _HID devices in ACPI.
>
>     However, more and more _HID devices need to be enumerated to the
>     platform bus instead (that is, platform device objects need to be
>     created for them).  As a result, the device ID list in acpi_platform.c
>     is used to enforce creating platform device objects rather than PNP
>     device objects for matching devices.  That list has been continuously
>     growing recently, unfortunately, and it is pretty much guaranteed to
>     grow even more in the future.
>
>     To address that problem it is better to enumerate _HID devices
>     as platform devices by default.  To this end, change the way of
>     enumerating PNP devices by adding a PNP ACPI scan handler that
>     will use a device ID list to create PNP devices for the ACPI
>     device objects whose device IDs are present in that list.
>
>     The initial device ID list in the PNP ACPI scan handler contains
>     all of the pnp_device_id strings from all the existing PNP drivers,
>     so this change should be transparent to the PNP core and all of the
>     PNP drivers.  Still, in the future it should be possible to reduce
>     its size by converting PNP drivers that need not be PNP for any
>     technical reasons into platform drivers.
>
>     Signed-off-by: Zhang Rui
>     [rjw: Rewrote the changelog, modified the PNP ACPI scan handler code]
>     Signed-off-by: Rafael J. Wysocki
>     Reviewed-by: Mika Westerberg
>
> :040000 040000
> b7c07232aa46ae7b6faf9a907fb7274a02e4680f c2e05b31a61dccd087c554adecc89a43a1ed81f7
> M drivers
> :040000 040000
> 4eda970292fffbeebe167f9210502527df4e8ab4 21e9e6fd84c780a34bf3d48b5e7618b551da3b1a
> M include
>
> I took a photo of the crash. It feels silly to do, but I couldn't
> think of a better solution. You can find it at
> https://drive.google.com/file/d/0B8SxqKDe4hyheTlTLXY2YThkMXM
>
> As I mentioned earlier, a bunch of information has already scrolled
> off the screen, but hopefully what is visible is somewhat helpful.
>
> I will have only limited internet access the next couple of weeks. But
> I wanted to make sure I at least got the result of the bisection out
> to LKML. I will make every best effort to collect additional data, if
> asked to do so; but some of it might be delayed for a little bit,
> until I can get access to reasonably powerful hardware or reasonably
> fast internet.
>
>
> Markus
>
> P.S.: Please keep me cc'd on all responses, as I am not subscribed to
> the firehose that is LKML.
3.16-rcX crashes on resume from Suspend-To-RAM
My Dell M4400 has been pretty well-supported by Linux a couple of
years now, but recent 3.16-rcX cause hard crashes when resuming from
Suspend-to-RAM.

This is tricky to debug, as device drivers are not yet restored by the
time that the crash happens. So, I can't use Page-UP to scroll the
screen and see the full crash information. I also cannot use the
netconsole; the ethernet device is still suspended. For similar
reasons, crash kernels don't seem to work either.

After about a day of false starts and a lengthy bi-secting session, I
finally narrowed things down to this change list:

eec15edbb0e14485998635ea7c62e30911b465f0 is the first bad commit
commit eec15edbb0e14485998635ea7c62e30911b465f0
Author: Zhang Rui
Date:   Fri May 30 04:23:01 2014 +0200

    ACPI / PNP: use device ID list for PNPACPI device enumeration

    ACPI can be used to enumerate PNP devices, but the code does not
    handle this in the right way currently.  Namely, if an ACPI device
    object
     1. Has a _CRS method,
     2. Has an identification of
        "three capital characters followed by four hex digits",
     3. Is not in the excluded IDs list,
    it will be enumerated to PNP bus (that is, a PNP device object will
    be create for it).  This means that, actually, the PNP bus type is
    used as the default bus type for enumerating _HID devices in ACPI.

    However, more and more _HID devices need to be enumerated to the
    platform bus instead (that is, platform device objects need to be
    created for them).  As a result, the device ID list in acpi_platform.c
    is used to enforce creating platform device objects rather than PNP
    device objects for matching devices.  That list has been continuously
    growing recently, unfortunately, and it is pretty much guaranteed to
    grow even more in the future.

    To address that problem it is better to enumerate _HID devices
    as platform devices by default.  To this end, change the way of
    enumerating PNP devices by adding a PNP ACPI scan handler that
    will use a device ID list to create PNP devices for the ACPI
    device objects whose device IDs are present in that list.

    The initial device ID list in the PNP ACPI scan handler contains
    all of the pnp_device_id strings from all the existing PNP drivers,
    so this change should be transparent to the PNP core and all of the
    PNP drivers.  Still, in the future it should be possible to reduce
    its size by converting PNP drivers that need not be PNP for any
    technical reasons into platform drivers.

    Signed-off-by: Zhang Rui
    [rjw: Rewrote the changelog, modified the PNP ACPI scan handler code]
    Signed-off-by: Rafael J. Wysocki
    Reviewed-by: Mika Westerberg

:040000 040000 b7c07232aa46ae7b6faf9a907fb7274a02e4680f c2e05b31a61dccd087c554adecc89a43a1ed81f7 M drivers
:040000 040000 4eda970292fffbeebe167f9210502527df4e8ab4 21e9e6fd84c780a34bf3d48b5e7618b551da3b1a M include

I took a photo of the crash. It feels silly to do, but I couldn't
think of a better solution. You can find it at
https://drive.google.com/file/d/0B8SxqKDe4hyheTlTLXY2YThkMXM

As I mentioned earlier, a bunch of information has already scrolled
off the screen, but hopefully what is visible is somewhat helpful.

I will have only limited internet access the next couple of weeks. But
I wanted to make sure I at least got the result of the bisection out
to LKML. I will make every best effort to collect additional data, if
asked to do so; but some of it might be delayed for a little bit,
until I can get access to reasonably powerful hardware or reasonably
fast internet.


Markus

P.S.: Please keep me cc'd on all responses, as I am not subscribed to
the firehose that is LKML.
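As an illustration of the changelog quoted in this message, here is a minimal Python sketch of the two enumeration policies it describes. This is not the kernel code: the function names, the sample device ID "XYZ0001" and the sample ID list are hypothetical, and accepting lowercase hex digits is an assumption.

```python
import re

# "Three capital characters followed by four hex digits".
# Assumption: hex digits are accepted in either case.
PNP_STYLE_ID = re.compile(r"[A-Z]{3}[0-9A-Fa-f]{4}")


def old_rule_is_pnp(hid, has_crs, excluded_ids):
    """Old default: any PNP-style _HID with _CRS becomes a PNP device."""
    return (
        has_crs
        and PNP_STYLE_ID.fullmatch(hid) is not None
        and hid not in excluded_ids
    )


def new_rule_is_pnp(hid, pnp_id_list):
    """New behavior: only IDs present in an explicit list become PNP devices."""
    return hid in pnp_id_list


# Hypothetical device with no matching PNP driver.
hid = "XYZ0001"
print(old_rule_is_pnp(hid, has_crs=True, excluded_ids=set()))
print(new_rule_is_pnp(hid, pnp_id_list={"PNP0303"}))
```

The point of the sketch is only that a device can satisfy the old rule while being absent from the new list, so it would stop being created as a PNP device after the commit.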
Re: Marvell 7042 (sata_mv) fails to initialize drive
I have done some more testing, and it now looks as if this was actually
a hardware fault. Reseating the PCI-E card made the problem go away
(knock on wood).

I am a little puzzled that it is possible for the card to show up on the
PCI bus, and for the driver to be able to detect whether a disk is
connected, but then for it to fail to communicate with the disk. But oh
well, I guess if just some of the PCI-E signals aren't connected,
strange error modes are to be expected.

Sorry for the false alarm. And just for the record, if any of you need
any more testing done with 7042 controllers, feel free to ask me for
help -- I think I read somewhere that Jeff was looking for testers that
have access to this hardware.

Markus
Marvell 7042 (sata_mv) fails to initialize drive
I just tried hooking up a Hitachi 1TB SATA-II drive to a Marvell 7042
based controller, and the most recent Linux kernel (2.6.23-rc1) fails to
properly initialize the interface. Here are the relevant kernel
messages:

kernel: [43.312417] sata_mv :06:00.0: version 0.81
kernel: [43.312752] ACPI: PCI Interrupt Link [APC5] enabled at IRQ 16
kernel: [43.312757] ACPI: PCI Interrupt :06:00.0[A] -> Link [APC5] -> GSI 16 (level, low) -> IRQ 16
kernel: [43.312788] sata_mv :06:00.0: Applying 60X1C0 workarounds to unknown rev
kernel: [43.314443] sata_mv :06:00.0: Gen-IIE 32 slots 4 ports SCSI mode IRQ via INTx
kernel: [43.314535] scsi0 : sata_mv
kernel: [43.314581] scsi1 : sata_mv
kernel: [43.314614] scsi2 : sata_mv
kernel: [43.314640] scsi3 : sata_mv
kernel: [43.314660] ata1: SATA max UDMA/133 cmd 0x ctl 0xc20003522120 bmdma 0x irq 16
kernel: [43.314663] ata2: SATA max UDMA/133 cmd 0x ctl 0xc20003524120 bmdma 0x irq 16
kernel: [43.314666] ata3: SATA max UDMA/133 cmd 0x ctl 0xc20003526120 bmdma 0x irq 16
kernel: [43.314669] ata4: SATA max UDMA/133 cmd 0x ctl 0xc20003528120 bmdma 0x irq 16
kernel: [53.409086] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
kernel: [59.741602] ata1: EH pending after completion, repeating EH (cnt=4)
kernel: [59.777642] ata2: SATA link down (SStatus 0 SControl 300)
kernel: [59.809752] ata3: SATA link down (SStatus 0 SControl 300)
kernel: [59.841740] ata4: SATA link down (SStatus 0 SControl 300)

The kernel never even registers the drive as an available disk device,
whereas everything appears to work fine if I connect the disk to one of
the other controllers (JMicron AHCI, or NVidia sata_nv) on this
motherboard.

As I have two of those disks (in a RAID-1 array) and multiple
independent controllers, it is relatively easy for me to do some testing
here. The worst case scenario is that I need to wait a couple of hours
for the array to rebuild itself after I am done experimenting.

Let me know if there is anything I can do to help you diagnose the root
cause, or whether this is a known bug and you don't need any help
testing at this point.

Markus
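For readers who want to pull the per-port link state out of a log like the one in this message, a small Python sketch follows. The regular expression only covers the "SATA link up/down" wording seen in these particular messages, and the function name is made up for illustration.

```python
import re

# Matches lines such as "ata1: SATA link up 3.0 Gbps (...)".
LINK_LINE = re.compile(r"(ata\d+): SATA link (up|down)")


def link_states(log_text):
    """Map each ATA port name to its last reported link state."""
    return {m.group(1): m.group(2) for m in LINK_LINE.finditer(log_text)}


# Lines copied from the log in this message.
sample_log = """\
kernel: [53.409086] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
kernel: [59.741602] ata1: EH pending after completion, repeating EH (cnt=4)
kernel: [59.777642] ata2: SATA link down (SStatus 0 SControl 300)
kernel: [59.809752] ata3: SATA link down (SStatus 0 SControl 300)
kernel: [59.841740] ata4: SATA link down (SStatus 0 SControl 300)
"""

states = link_states(sample_log)
for port, state in sorted(states.items()):
    print(port, state)
```

This makes the symptom easier to see: ata1 reports a link, yet error handling keeps repeating and no disk is registered.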
Re: [PATCH 0/4] coredump: core dump masking support v3
Kawai, Hidehiro wrote:
> Requirements are:
> (1) a user can change the core dump settings _anytime_
>     - sometimes want to dump anonymous shared memory segments and
>       sometimes don't want to dump them

I might not have been sufficiently clear about this in my previous
e-mail. Currently, the Google coredumper does not have the feature that
you asked about, but adding it would be trivial -- it just hadn't been
needed, yet, as on-the-fly compression was good enough for most users.

Answering your question, I don't see any reason why the new API would
not be able to make changes at any time.

> (2) a user can change the core dump settings of _any processes_
>     (although permission checks are performed)
>     - in a huge application which forks many processes, a user
>       hopes that some processes dump anonymous shared memory
>       segments and some processes don't dump them

The Google coredumper is a library that needs to be linked into the
application and needs to be called from appropriate signal handlers. As
such, it is the application's responsibility to decide what management
API it wants to expose externally, and what tools it wants to provide
for managing a group of processes.

> And reliability of the core dump feature is also important.

We have generally had very good reliability with the Google coredumper.
In some cases, it even works a little more reliably than the default
in-kernel dumper (e.g. because we can control where to write the file,
and whether it should be compressed on-the-fly; or because we can get
multi-threaded coredumps even in situations where the particular
combination of libc and kernel doesn't support this), and in other cases
the in-kernel dumper works a little better (e.g. if an application got
too corrupted to even run any signal handlers).

Realistically, it just works. But we did have to make sure that we set
up alternate stacks for signal processing, and that these stacks have
been dirtied in order to avoid problems with memory overcommitting.

> And all the software vendors don't necessarily apply
> google-coredumper. If the vendor doesn't apply it, the user will
> be bothered by huge core dumps or the buggy application which
> remains unfixed. So I believe that in kernel solution is still
> needed.

I agree that the Google coredumper is only one possible solution to your
problem. Depending on what your production environment looks like, it
might help a lot, or it might be completely useless.

If it is cheap for you to modify your applications, but expensive to
upgrade your kernels, the Google coredumper is the way to go. Also, if
you need the extra features, such as the ability to compress core files
on-the-fly, or the ability to send corefiles to somewhere other than an
on-disk file, you definitely should look at a user-space solution.

On the other hand, if you can easily upgrade all your kernels, but you
don't even have access to the source of your applications, then an
in-kernel solution is much better.

Markus
Re: [PATCH 0/4] coredump: core dump masking support v3
David Howells wrote:
> How does it work when you can't actually get back to userspace to
> have userspace do the coredump? You still have to handle the
> userspace equivalents of double/triple faults.

My experience shows that there are only very rare occurrences of
situations where you cannot get back into userspace to do the coredump,
and the coredumper tries very hard never to cause additional faults.

While I am sure you could construct scenarios where this would happen,
realistically the only ones I have run into were stack overflows, and
they can be handled by carefully setting up an alternate stack for
signal handlers -- just make sure the entire stack is already dirtied
before you run out of memory (or, turn off overcommitting).

Markus
Re: [PATCH 0/4] coredump: core dump masking support v3
Kawai, Hidehiro wrote: This patch series is version 3 of the core dump masking feature, which provides a per-process flag not to dump anonymous shared memory segments. I just wanted to remind you that you need to be careful to dump the [vdso] segment regardless of which other segments you omit. I haven't actually tried running your patches, and if the kernel doesn't consider this segment anonymous and shared, things might already work fine as is. In any case, you can check with "readelf -a" whether the [vdso] segment is there. You will find that if you forget to dump it, "gdb" can no longer give you stack traces on call chains that involve signal handlers. As an alternative to your kernel patch, you could achieve the same goal in user space by linking my coredumper http://code.google.com/p/google-coredumper/ into your binaries and setting up appropriate signal handlers. An equivalent patch for selectively omitting memory regions would be trivial to add. While this does give you more flexibility, it of course has the drawback of requiring you to change your applications, so there still is some benefit in a kernelspace solution. Markus
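As a complement to inspecting the core file with "readelf -a", a dumper that filters memory regions could look up the [vdso] mapping of the live process and always keep that range. The following is a minimal sketch under that assumption: it parses text in the /proc/<pid>/maps format, the function names are hypothetical, and the sample addresses are invented for the example.

```python
def find_vdso(maps_text):
    """Return (start, end) of the [vdso] mapping in maps-format text, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        # The last field of a maps line is the pathname or pseudo-name.
        if fields and fields[-1] == "[vdso]":
            start, end = (int(part, 16) for part in fields[0].split("-"))
            return start, end
    return None


def vdso_of_current_process():
    """Look up [vdso] for this process; None if /proc is unavailable."""
    try:
        with open("/proc/self/maps") as maps:
            return find_vdso(maps.read())
    except OSError:
        return None


# Invented sample in the /proc/<pid>/maps layout, used only for illustration.
SAMPLE = (
    "00400000-0040b000 r-xp 00000000 08:01 131 /bin/cat\n"
    "7ffd1c5f2000-7ffd1c5f4000 r-xp 00000000 00:00 0 [vdso]\n"
)
sample_range = find_vdso(SAMPLE)
print(sample_range, vdso_of_current_process())
```

Any region filter would then treat the returned range as one that must never be masked out.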
Re: [PATCH] bsd-style cursor
Lennert Buytenhek wrote: > On Wed, 13 Dec 2000, James Simmons wrote: > > How about placing > > echo '\033[?17;120c' > > In one of your startup scripts. This will give you this nice BSD > > cursor you like. > > [buytenh@mara buytenh]$ tail -1 ~/.bash_profile > echo -e -n '\033[?17;127c' > [buytenh@mara buytenh]$ > > This has Issues though: try entering vi for example. My /etc/inittab has lines that look like this: 1:2345:respawn:/sbin/getty 38400 tty1 -I '^[c^[[?17;55;248c' This gives me a nice red non-blinking cursor. No problems with vi whatsoever. Of course, this only works on the console, but for my terminal windows, I can set these values in resource or configuration files. So, all of this is a user-space problem. No need to complicate the kernel code. Markus -- Markus Gutschke, 3637 Fillmore Street #106, San Francisco, CA 94123-1600, +1-415-567-8449, [EMAIL PROTECTED] / Resonate, Inc., 385 Moffett Park Drive, Sunnyvale, CA 94089, +1-408-548-5528, [EMAIL PROTECTED]
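All three escape sequences quoted in this message share the form ESC [ ? p1 ; p2 ; ... c, where "^[" in the inittab line is caret notation for the ESC byte. A minimal sketch of building such a sequence from user space follows; the helper name `cursor_sequence` is hypothetical, and the meaning of the individual numbers is left to the kernel's console documentation rather than decoded here.

```python
def cursor_sequence(*params):
    """Build ESC [ ? p1 ; p2 ; ... c from integer parameters."""
    return "\033[?" + ";".join(str(p) for p in params) + "c"


# The values used in the inittab line and in the quoted .bash_profile line.
inittab_seq = cursor_sequence(17, 55, 248)
profile_seq = cursor_sequence(17, 127)

# repr() shows the bytes without sending them to the terminal.
print(repr(inittab_seq), repr(profile_seq))
```

Writing the resulting string to a Linux virtual console (for instance from a login script) has the same effect as the echo commands quoted above; other terminal types may ignore or misinterpret it.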