Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"
Control: tag -1 moreinfo

Hi Julian,

On Thursday, 12 January 2023 19:31:37 CET Julian Groß wrote:
> My computer just froze on Kernel version 5.19.11-1. Though nothing got
> logged.

What's the current status of this bug?
Currently there is a version 6.1.15 in Testing/Unstable and it would be useful to know if the problem is still there. There's another version in the pipeline (6.1.19 or higher) which is a rather big update, so testing that when it becomes available would be useful too.

I also looked into the 'forwarded' URL and saw questions by (upstream) kernel maintainers, but I didn't see a response from you to those questions?
Any progress in capturing output, e.g. as Bernhard described?
Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"
On Thu, 12 Jan 2023 18:31:37 + Julian Groß wrote:
> My computer just froze on Kernel version 5.19.11-1. Though nothing got logged.
> Now I don't know if this is the same issue and newer kernels just behave
> better in terms of logging, or if this is a different issue..
>
> I guess I will add a second screen to my computer, open journalctl on there,
> and point a camera at it to capture the issue.

Hello Julian,

Capturing with a camera might be possible. But if you have a serial port on the target machine, another system available for capturing, and a cable connecting them, it might be more convenient to capture the output from the serial port, to have it in text form.

The link below shows the kernel parameters for the target machine:
https://wiki.archlinux.org/title/working_with_the_serial_console#Kernel

On the capturing machine you could start "script", which records everything that happens to a file "typescript" in the current directory, and inside it start "screen" connected to the other machine, like here:
https://wiki.archlinux.org/title/working_with_the_serial_console#Screen

If the machine is still somewhat responsive, it might also be possible to capture through a network connection via "netconsole"; more details are in the following link:
https://wiki.ubuntu.com/Kernel/Netconsole

Kind regards,
Bernhard
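As a rough illustration of the receiving side of the netconsole suggestion: netconsole sends kernel log lines as UDP datagrams, so the capturing machine only needs something that listens on a UDP port and appends what arrives to a file. The sketch below is one minimal way to do that in Python; it is not taken from the linked wiki page. The function names `open_listener` and `capture` and the file name `netconsole.log` are made up for this example, and port 6666 is assumed to match the target port configured on the freezing machine (it is the default target port in the kernel's netconsole documentation).

```python
import socket
from typing import Optional


def open_listener(host: str = "0.0.0.0", port: int = 6666) -> socket.socket:
    """Open a UDP socket for incoming netconsole datagrams.

    ASSUMPTION: the target machine's netconsole parameters point at this
    host and at this port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def capture(sock: socket.socket, log_path: str,
            max_packets: Optional[int] = None) -> int:
    """Append every received datagram to log_path and return the packet count.

    The file is flushed after each packet so that the last lines before a
    freeze reach the disk of the capturing machine. With max_packets=None
    the loop runs until interrupted.
    """
    count = 0
    with open(log_path, "ab") as log:
        while max_packets is None or count < max_packets:
            data, _sender = sock.recvfrom(65535)
            log.write(data)
            log.flush()
            count += 1
    return count


# Typical use on the capturing machine (blocks until interrupted with Ctrl-C):
#     capture(open_listener(), "netconsole.log")
```

Because the log is written on the second machine, it is not affected if the root filesystem of the freezing machine is remounted read-only.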
Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"
My computer just froze on Kernel version 5.19.11-1. Though nothing got logged.
Now I don't know if this is the same issue and newer kernels just behave better in terms of logging, or if this is a different issue.

I guess I will add a second screen to my computer, open journalctl on there, and point a camera at it to capture the issue.

I am pretty sure that I have had it not log on a 6.0 Kernel as well, but since I have been troubleshooting this for weeks now I am not completely sure.
Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"
On Thu, 12 Jan 2023 15:41:03 +0100 Diederik de Haas wrote:
> "Currently, I am using git bisect to narrow down the window of possible
> commits, but since the issue appears seemingly random, it will take many
> months to identify the offending commit this way."
>
> Why? It *could* be that the maintainers will wait for the result of the
> `git bisect` before responding/acting upon it.

My intention has been to continue with the git bisect. However, I have already encountered two revisions that do not build, so I doubt I can get very far there.

I will wait until next week and then report whatever additional information I was able to get through git bisect to the Kernel maintainers. Then I will most likely also tell them that I will not continue with git bisect unless they have a smaller window of revisions for me to try.
Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"
Hi Julian,

On Thursday, 12 January 2023 12:11:57 CET Julian wrote:
> The message to linux-nvme finally came through and the thread is here:
> http://lists.infradead.org/pipermail/linux-nvme/2023-January/037384.html

The following paragraph may not be ideally formulated:

"Currently, I am using git bisect to narrow down the window of possible commits, but since the issue appears seemingly random, it will take many months to identify the offending commit this way."

Why? It *could* be that the maintainers will wait for the result of the `git bisect` before responding/acting upon it.

Hopefully I'm wrong, but if they don't respond in a 'reasonable' time frame, you may want to clarify that you actually don't want to do the `git bisect` exactly because it could take many months.

Cheers,
  Diederik
Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"
Control: forwarded -1 http://lists.infradead.org/pipermail/linux-nvme/2023-January/037384.html

On Thursday, 12 January 2023 12:11:57 CET Julian wrote:
> The message to linux-nvme finally came through and the thread is here:
> http://lists.infradead.org/pipermail/linux-nvme/2023-January/037384.html

Thanks!
Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"
On Wed, 11 Jan 2023 21:49:45 +0100 Diederik de Haas wrote:
> So the next thing to do, is present the issue to the relevant upstream
> maintainers. Searching for "nvme controller is down" brought up another bug
> (but that happened pretty instantly) and in there the request was made to
> report the issue (via email) to the linux-...@vger.kernel.org and
> linux-n...@lists.infradead.org lists.
> And that seems like the next best step for you too.

The message to linux-nvme finally came through and the thread is here:
http://lists.infradead.org/pipermail/linux-nvme/2023-January/037384.html

For linux-pci, I am not sure if it worked. I got a "Delivery status OK" and a "BOUNCE linux-...@vger.kernel.org: Message too long (>10 chars)".
Obviously my message isn't >10, so I assume they have a problem with one of the attachments. But the bounce message doesn't say whether the mail was refused or not. And more importantly, the "Delivery status OK" message specifically says that the mail got delivered to the mailing list's address.
Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"
On Wednesday, 11 January 2023 19:28:25 CET Julian Groß wrote:
> On Mon, 09 Jan 2023 14:09:30 +0100 Diederik de Haas wrote:
> > https://wiki.debian.org/DebianKernel/GitBisect describes a procedure
> > to find the exact commit which introduced the issue you reported, but it's
> > often faster to first narrow down the range using snapshot.d.o.
>
> The way I understand the `git bisect`, and with the issue taking
> sometimes days to happen, I will be sitting on this for months by the way.

Yep, that would/could be the consequence, which I can fully understand is not desirable or very useful.
Having the exact offending commit is ideal, but not a '100%' requirement.
As you've determined that it already happened with 6.0~rc7 and not with 5.19.x, that's already a reasonably small range (likely introduced in the 6.0 merge window).

So the next thing to do is to present the issue to the relevant upstream maintainers. Searching for "nvme controller is down" brought up another bug (but that happened pretty instantly) and in there the request was made to report the issue (via email) to the linux-...@vger.kernel.org and linux-n...@lists.infradead.org lists.
And that seems like the next best step for you too.

Use the information you provided in your initial bug report and add the extra findings (i.e. 6.1-rc7) to that too.
When you've done that, please inform this bug report where you did that so that we can track its progress too.

Cheers,
  Diederik
Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"
On Mon, 09 Jan 2023 14:09:30 +0100 Diederik de Haas wrote:
> https://wiki.debian.org/DebianKernel/GitBisect describes a procedure to find
> the exact commit which introduced the issue you reported, but it's often
> faster to first narrow down the range using snapshot.d.o.

The way I understand the `git bisect`, and with the issue taking sometimes days to happen, I will be sitting on this for months by the way.

What happens if I give `git bisect` a false “good”?
Because it is perfectly possible that my computer will run fine for 48 hours and I will tell git bisect that the revision is good, when the issue just didn't trigger by chance rather than being absent from that revision.
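To put rough numbers on the false “good” concern: `git bisect` trusts every verdict, so marking a bad revision as good makes it discard the part of history that contains the culprit, and it ends up pointing at some later, innocent commit. The sketch below estimates how likely that is and how long a full bisect would take. Every input is an assumption made up for illustration, not a measurement from this bug: the commit count, the mean time until a freeze, and the model that freezes arrive at random with an exponentially distributed waiting time.

```python
import math


def bisect_steps(num_commits: int) -> int:
    """Approximate number of revisions `git bisect` has to test for a range
    of num_commits candidate commits (the range is halved on each step)."""
    if num_commits < 1:
        raise ValueError("num_commits must be at least 1")
    return math.ceil(math.log2(num_commits))


def p_false_good(mean_hours_to_freeze: float, test_hours: float) -> float:
    """Chance that a bad kernel survives test_hours without freezing.

    ASSUMPTION: the time until a freeze is exponentially distributed with
    the given mean.
    """
    return math.exp(-test_hours / mean_hours_to_freeze)


def p_no_false_good(steps: int, p_miss: float) -> float:
    """Pessimistic chance of finishing without a single false "good".

    It treats every tested revision as bad; good revisions cannot be
    misjudged this way, so the real chance is somewhat higher.
    """
    return (1.0 - p_miss) ** steps


# Hypothetical inputs, chosen only to show the shape of the problem.
commits = 15000      # assumed size of the range between good and bad
mean_hours = 24.0    # assumed mean time until the freeze shows up
test_hours = 48.0    # how long each revision runs before it is called "good"

steps = bisect_steps(commits)
p_miss = p_false_good(mean_hours, test_hours)
# Upper bound on wall-clock time: revisions that freeze finish earlier.
print(f"steps: {steps}, at most {steps * test_hours / 24:.0f} days of testing")
print(f"chance that one bad revision looks good: {p_miss:.2f}")
print(f"chance of no false 'good' at all (pessimistic): "
      f"{p_no_false_good(steps, p_miss):.2f}")
```

Under this model, running each revision longer lowers the chance of a false “good” but lengthens the whole bisect proportionally, which is the trade-off behind the "months" estimate.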
Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"
Control: found -1 6.0~rc7-1~exp1

On Wednesday, 11 January 2023 15:04:30 CET Julian wrote:
> 6.0~rc7-1~exp1 is also broken.
>
> I will go through https://wiki.debian.org/DebianKernel/GitBisect and see
> what I can find. Thankfully the documentation is quite comprehensive.

That would be great, thanks!
Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"
6.0~rc7-1~exp1 is also broken. I will go through https://wiki.debian.org/DebianKernel/GitBisect and see what I can find. Thankfully the documentation is quite comprehensive.
Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"
Thanks for your help.

> Unstable currently has version 6.1.4-1, could you try that to see whether the
> issue is already resolved?

Version 6.1.4-1 shows the same issue.

> If not, then we need to figure out when the issue first occurred.
> Via https://snapshot.debian.org/binary/linux-image-amd64/ you can find several
> other kernel versions from the 6.0.x series, could you try those?
> It's probably quickest to try 6.0~rc7-1~exp1 first.

I will try 6.0~rc7-1~exp1 next.
Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"
Control: found -1 6.0.10-1

On Monday, 9 January 2023 13:20:08 CET Julian Groß wrote:
> when running Linux Kernel version 6.0.12 or 6.0.10, my system seemingly
> randomly freezes due to the filesystem being set to read-only due to an
> issue with my nvme controller.
> The issue does *not* appear on Linux Kernel version 5.19.11 or lower.
>
> If there is any more information I might be able to provide, do not hesitate
> to ask.

Unstable currently has version 6.1.4-1, could you try that to see whether the issue is already resolved?

If not, then we need to figure out when the issue first occurred.
Via https://snapshot.debian.org/binary/linux-image-amd64/ you can find several other kernel versions from the 6.0.x series, could you try those?
It's probably quickest to try 6.0~rc7-1~exp1 first.

https://wiki.debian.org/DebianKernel/GitBisect describes a procedure to find the exact commit which introduced the issue you reported, but it's often faster to first narrow down the range using snapshot.d.o.
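Narrowing the range with packaged kernels is the same halving idea as `git bisect`, applied to a short, ordered list of package versions instead of individual commits. The sketch below shows that search; the helper name `first_bad` is made up for illustration, and the verdicts in the usage example are simply the ones reported in this bug (5.19.11-1 works, the others freeze).

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the oldest version that shows the problem.

    ASSUMPTIONS: versions is sorted oldest to newest, the first entry is
    known good, the last entry is known bad, and every version after the
    first bad one is bad as well.
    """
    good, bad = 0, len(versions) - 1  # indices of known-good / known-bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid    # the problem was introduced at mid or earlier
        else:
            good = mid   # the problem was introduced after mid
    return versions[bad]


# Versions mentioned in this report, oldest to newest, with reported verdicts.
versions = ["5.19.11-1", "6.0~rc7-1~exp1", "6.0.10-1", "6.0.12", "6.1.4-1"]
reported_bad = {"6.0~rc7-1~exp1", "6.0.10-1", "6.0.12", "6.1.4-1"}

print(first_bad(versions, lambda v: v in reported_bad))
```

In practice `is_bad` stands for installing that package from snapshot.debian.org and running it long enough to be reasonably sure, so each call can take days.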