Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"

2023-03-14 Thread Diederik de Haas
Control: tag -1 moreinfo

Hi Julian,

On Thursday, 12 January 2023 19:31:37 CET Julian Groß wrote:
> My computer just froze on Kernel version 5.19.11-1. Though nothing got
> logged.

What's the current status of this bug?
Version 6.1.15 is currently in Testing/Unstable, and it would be useful to 
know whether the problem is still there. Another version (6.1.19 or higher) 
is in the pipeline; it is a rather big update, so testing that once it 
becomes available would be useful too.

I also looked into the 'forwarded' URL and saw questions by (upstream) kernel 
maintainers, but I didn't see a response from you to those questions?

Any progress in capturing output, e.g. as Bernhard described?



Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"

2023-01-23 Thread Bernhard Übelacker



On Thu, 12 Jan 2023 18:31:37 + Julian Groß wrote:
> My computer just froze on Kernel version 5.19.11-1. Though nothing got
> logged.
> Now I don't know if this is the same issue and newer kernels just behave
> better in terms of logging, or if this is a different issue..
>
> I guess I will add a second screen to my computer, open journalctl on
> there, and point a camera at it to capture the issue.


Hello Julian,
Capturing with a camera might be possible.
But if you have a serial port at the target machine,
another capturing system available, and a cable connecting them,
it might be more convenient to capture the output from the serial port,
to have it in text form.


The link below shows the kernel parameters to set on the target machine:
https://wiki.archlinux.org/title/working_with_the_serial_console#Kernel

On the capturing machine you could start "script", which records
everything that happens to a file named "typescript" in the current
directory, and inside it start "screen" connected to the other machine,
as described here:
https://wiki.archlinux.org/title/working_with_the_serial_console#Screen


If the machine is still somewhat responsive, it might also be possible
to capture the output over a network connection via "netconsole"; more
details are in the following link:
https://wiki.ubuntu.com/Kernel/Netconsole

Kind regards,
Bernhard
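
For the netconsole route, the capturing machine only needs something that listens for UDP datagrams and appends them to a file. The following is a minimal sketch of such a listener, not part of the linked instructions. The names capture_netconsole and main, the file name netconsole.log, and port 6666 are assumptions for illustration; the port must match whatever target port is configured on the machine that freezes.

```python
import socket


def capture_netconsole(sock, log_file, max_packets=None):
    """Append every UDP datagram received on `sock` to `log_file`.

    Runs until interrupted when `max_packets` is None, otherwise stops
    after that many datagrams. Returns the number of datagrams written.
    """
    count = 0
    while max_packets is None or count < max_packets:
        data, _sender = sock.recvfrom(65535)
        # Kernel messages are text; replace undecodable bytes instead of failing.
        log_file.write(data.decode("utf-8", errors="replace"))
        # Flush after every datagram so nothing is lost if the capture is stopped.
        log_file.flush()
        count += 1
    return count


def main(port=6666, path="netconsole.log"):
    # Port and file name are assumptions; adjust them to the netconsole setup.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        with open(path, "a", encoding="utf-8") as log_file:
            capture_netconsole(sock, log_file)
```

Calling main() on the capturing machine before the next freeze would leave the received kernel messages in text form in netconsole.log, provided the network stack of the affected machine is still able to send them.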



Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"

2023-01-12 Thread Julian Groß
My computer just froze on Kernel version 5.19.11-1. Though nothing got 
logged.
Now I don't know if this is the same issue and newer kernels just behave 
better in terms of logging, or if this is a different issue..


I guess I will add a second screen to my computer, open journalctl on 
there, and point a camera at it to capture the issue.


I am pretty sure that I have had it not log on a 6.0 Kernel as well, but 
since I have been troubleshooting this for weeks now I am not completely 
sure.




Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"

2023-01-12 Thread Julian
On Thu, 12 Jan 2023 15:41:03 +0100 Diederik de Haas  
wrote:
> "Currently, I am using git bisect to narrow down the window of possible
> commits, but since the issue appears seemingly random, it will take many
> months to identify the offending commit this way."
>
> Why? It *could* be that the maintainers will wait for the result of the
> `git bisect` before responding/acting upon it.

My intention has been to continue with the git bisect.
However, I have already encountered two revisions that do not build, so I doubt 
I can get very far there.

I will wait till next week and then report whatever additional information I 
was able to get through git bisect to the Kernel maintainers. Then I will most 
likely also tell them that I will not continue with git bisect unless they have 
a smaller window of revisions for me to try.
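
Revisions that do not build need not end a bisect: `git bisect skip` tells git that the current revision cannot be tested, and git then picks a nearby one instead. The following is a small sketch of a helper that tries the build and skips the revision when it fails. The function name build_or_skip and the example build command are assumptions for illustration; the actual build steps are the ones from the Debian GitBisect wiki page.

```python
import subprocess


def build_or_skip(build_cmd, run=subprocess.run):
    """Build the currently checked-out revision.

    If the build fails, mark the revision as untestable with
    `git bisect skip` and return False; otherwise return True.
    `run` is injectable so the logic can be tried without a kernel tree.
    """
    result = run(build_cmd)
    if result.returncode != 0:
        # A failed build says nothing about the freeze, so do not
        # report the revision as good or bad.
        run(["git", "bisect", "skip"])
        return False
    return True
```

A call such as build_or_skip(["make", "bindeb-pkg"]) inside the kernel tree would either produce packages to install and test, or move the bisect on to another candidate. Skipped revisions can leave a small range of suspects rather than a single commit at the end.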


Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"

2023-01-12 Thread Diederik de Haas
Hi Julian,

On Thursday, 12 January 2023 12:11:57 CET Julian wrote:
> The message to linux-nvme finally came through and the thread is here:
> http://lists.infradead.org/pipermail/linux-nvme/2023-January/037384.html

The following paragraph may not be ideally formulated:

"Currently, I am using git bisect to narrow down the window of possible 
commits, but since the issue appears seemingly random, it will take many 
months to identify the offending commit this way."

Why? It *could* be that the maintainers will wait for the result of the
`git bisect` before responding/acting upon it.

Hopefully I'm wrong, but if they don't respond in a 'reasonable' time frame, 
you may want to clarify that you actually don't want to do the `git bisect` 
exactly because it could take many months.

Cheers,
  Diederik



Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"

2023-01-12 Thread Diederik de Haas
Control: forwarded -1 
http://lists.infradead.org/pipermail/linux-nvme/2023-January/037384.html

On Thursday, 12 January 2023 12:11:57 CET Julian wrote:
> The message to linux-nvme finally came through and the thread is here:
> http://lists.infradead.org/pipermail/linux-nvme/2023-January/037384.html

Thanks!



Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"

2023-01-12 Thread Julian
On Wed, 11 Jan 2023 21:49:45 +0100 Diederik de Haas  
wrote:
> So the next thing to do, is present the issue to the relevant upstream
> maintainers. Searching for "nvme controller is down" brought up another bug
> (but that happened pretty instantly) and in there the request was made to
> report the issue (via email) to the linux-...@vger.kernel.org and
> linux-n...@lists.infradead.org lists.
> And that seems like the next best step for you too.

The message to linux-nvme finally came through and the thread is here: 
http://lists.infradead.org/pipermail/linux-nvme/2023-January/037384.html

For linux-pci, I am not sure if it worked.
I got a "Delivery status OK" and a "BOUNCE linux-...@vger.kernel.org: 
Message too long (>10 chars)".
Obviously my message isn't >10, so I assume they have a problem with one of 
the attachments.
But the message doesn't contain any information about whether the mail got 
refused or not.
And more importantly, the "Delivery status OK" message specifically says that 
the mail got delivered to the mailing list's address.


Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"

2023-01-11 Thread Diederik de Haas
On Wednesday, 11 January 2023 19:28:25 CET Julian Groß wrote:
> On Mon, 09 Jan 2023 14:09:30 +0100 Diederik de Haas
>  wrote:
>  > https://wiki.debian.org/DebianKernel/GitBisect describes a procedure
>  > to find the exact commit which introduced the issue you reported, but it's
>  > often faster to first narrow down the range using snapshot.d.o.
> 
> The way I understand the `git bisect`, and with the issue taking
> sometimes days to happen, I will be sitting on this for months by the way.

Yep, that would/could be the consequence, which I can fully understand is not 
desirable or very useful.
Having the exact offending commit is ideal, but not a '100%' requirement.
As you've determined that it already happened with 6.0~rc7 and not with 
5.19.x, that's already a reasonably small range (likely introduced in the 6.0 
merge window).

So the next thing to do, is present the issue to the relevant upstream 
maintainers. Searching for "nvme controller is down" brought up another bug 
(but that happened pretty instantly) and in there the request was made to 
report the issue (via email) to the linux-...@vger.kernel.org and 
linux-n...@lists.infradead.org lists.
And that seems like the next best step for you too.

Use the information you provided in your initial bug report and add the extra 
findings (i.e. 6.0~rc7) to that too.
When you've done that, please inform this bug report where you did that so 
that we can track its progress too.

Cheers,
  Diederik



Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"

2023-01-11 Thread Julian Groß
On Mon, 09 Jan 2023 14:09:30 +0100 Diederik de Haas 
 wrote:
> https://wiki.debian.org/DebianKernel/GitBisect describes a procedure to find
> the exact commit which introduced the issue you reported, but it's often
> faster to first narrow down the range using snapshot.d.o.

The way I understand the `git bisect`, and with the issue taking 
sometimes days to happen, I will be sitting on this for months by the way.


What happens if I give `git bisect` a false “good”?
It is perfectly possible that my computer will run fine for 48 hours and I 
will tell git bisect that the revision is good, when in fact the issue just 
didn't trigger by chance rather than being absent from the revision.
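
The risk of a false "good" can be put into rough numbers. The following sketch assumes that freezes happen at random with a constant average rate (an exponential waiting time), which is a modelling assumption and not something measured in this report. The function names, the mean of 24 hours between freezes and the count of 15000 commits are made-up example values.

```python
import math


def false_good_probability(wait_hours, mean_hours_to_freeze):
    """Chance that a bad revision survives `wait_hours` without freezing."""
    return math.exp(-wait_hours / mean_hours_to_freeze)


def hours_for_confidence(max_false_good, mean_hours_to_freeze):
    """Waiting time per revision so a false 'good' is at most `max_false_good`."""
    return -mean_hours_to_freeze * math.log(max_false_good)


def bisect_steps(n_commits):
    """Upper bound on the number of revisions a bisect has to test."""
    return math.ceil(math.log2(n_commits))


# Example values only: freezes once per 24 hours on average,
# 48 hours of testing per revision, 15000 commits in the range.
p = false_good_probability(48, 24)
steps = bisect_steps(15000)
print(f"false 'good' chance per bad revision: {p:.1%}")
print(f"bisect steps: {steps}, total test time: {steps * 48 / 24:.0f} days")
```

Under these assumptions a bad revision passes a 48 hour test about 13.5% of the time, and 14 steps of 48 hours add up to 28 days. A single false "good" makes git bisect look only at later commits, so it ends on a commit after the real culprit without any error message.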




Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"

2023-01-11 Thread Diederik de Haas
Control: found -1 6.0~rc7-1~exp1

On Wednesday, 11 January 2023 15:04:30 CET Julian wrote:
> 6.0~rc7-1~exp1 is also broken.
> 
> I will go through https://wiki.debian.org/DebianKernel/GitBisect and see
> what I can find. Thankfully the documentation is quite comprehensive.

That would be great, thanks!



Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"

2023-01-11 Thread Julian
6.0~rc7-1~exp1 is also broken.

I will go through https://wiki.debian.org/DebianKernel/GitBisect and see what I 
can find.
Thankfully the documentation is quite comprehensive.


Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"

2023-01-10 Thread Julian Groß

Thanks for your help.

> Unstable currently has version 6.1.4-1, could you try that to see whether the
> issue is already resolved?

Version 6.1.4-1 shows the same issue.

> If not, then we need to figure out when the issue first occurred.
> Via https://snapshot.debian.org/binary/linux-image-amd64/ you can find several
> other kernel versions from the 6.0.x series, could you try those?
> It's probably quickest to try 6.0~rc7-1~exp1 first.

I will try 6.0~rc7-1~exp1 next.




Bug#1028309: linux-image-6.0.0-6-amd64: Regression in Kernel 6.0: System partially freezes with "nvme controller is down"

2023-01-09 Thread Diederik de Haas
Control: found -1 6.0.10-1

On Monday, 9 January 2023 13:20:08 CET Julian Groß wrote:
> when running Linux Kernel version 6.0.12 or 6.0.10, my system seemingly
> randomly freezes due to the filesystem being set to read-only due to an
> issue with my nvme controller. 
> The issue does *not* appear on Linux Kernel version 5.19.11 or lower. 
> 
> If there is any more information I might be able to provide, do not hesitate
> to ask.

Unstable currently has version 6.1.4-1, could you try that to see whether the 
issue is already resolved?

If not, then we need to figure out when the issue first occurred.
Via https://snapshot.debian.org/binary/linux-image-amd64/ you can find several 
other kernel versions from the 6.0.x series, could you try those?
It's probably quickest to try 6.0~rc7-1~exp1 first.

https://wiki.debian.org/DebianKernel/GitBisect describes a procedure to find 
the exact commit which introduced the issue you reported, but it's often 
faster to first narrow down the range using snapshot.d.o.
