Bug#1040416: linux-image-6.1.0-9-amd64: Under heavy load Debian V12 and V11 causes data corruption on XFS filesystems.

2023-11-08 Thread Jose M Calhariz
Hi

On Tue, Nov 07, 2023 at 08:33:58PM +0100, Diederik de Haas wrote:
> Control: found -1 6.1~rc3-1~exp1
> Control: found -1 6.1.55-1
> 
> On Saturday, 4 November 2023 20:35:43 CET Jose M Calhariz wrote:
> > > Ok. Please test (when you have time) 6.1.55-1.
> > 
> > Fail : Linux afs31 6.1.0-0-amd64 #1 SMP PREEMPT_DYNAMIC Debian
> > 6.1~rc3-1~exp1 (2022-11-02) x86_64 GNU/Linux
> > 
> > Fail : Linux afs31 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1
> > (2023-09-29) x86_64 GNU/Linux
> > 
> > Done. I even tested the first 6.1 kernel in Debian. Both of them failed.
> 
> Thanks, updated metadata accordingly.
> So now we know it's indeed present in the whole 6.1 series.
> 
> > > Unfortunately there isn't a 6.2 kernel uploaded to the Debian archive and
> > > thus not available on snapshot.d.o, but testing 6.3.1-1~exp1 should be
> > > useful.
> 
> Please test with 6.3.1-1~exp1 to make sure it was fixed then (too).
> 
> Unfortunately, the commit list between 6.1 and 6.3.1 is quite large:
> me@pc:~/dev/kernel.org/linux$ git log --oneline v6.1..v6.3.1 -- fs/xfs | wc -l
> 159
> 
> If that list were small, I could have suggested trying to 'backport' a couple
> of patches, but that avenue seems rather pointless in this case.
> 
> It's probably also useful to verify whether it is present in the whole 
> 5.10 series, which should give (even) more data points.
> 
> I think the next step should be to 'forward' this bug report to the upstream 
> mailing list at linux-...@vger.kernel.org

I do not closely follow the linux-xfs mailing list, but I think other
people have already reported problems with 6.1 and are making the
effort to pin down the relevant patch and test a backport to 6.1.

Kind regards
Jose M Calhariz

-- 
--
Egotist, n. A fellow more interested in himself than
in me.
-- Ambrose Bierce


signature.asc
Description: PGP signature


Processed: Re: Bug#1040416: linux-image-6.1.0-9-amd64: Under heavy load Debian V12 and V11 causes data corruption on XFS filesystems.

2023-11-07 Thread Debian Bug Tracking System
Processing control commands:

> found -1 6.1~rc3-1~exp1
Bug #1040416 [src:linux] linux-image-6.1.0-9-amd64: Under heavy load Debian V12 
and V11 causes data corruption on XFS filesystems.
Marked as found in versions linux/6.1~rc3-1~exp1.
> found -1 6.1.55-1
Bug #1040416 [src:linux] linux-image-6.1.0-9-amd64: Under heavy load Debian V12 
and V11 causes data corruption on XFS filesystems.
Marked as found in versions linux/6.1.55-1.

-- 
1040416: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1040416
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems



Bug#1040416: linux-image-6.1.0-9-amd64: Under heavy load Debian V12 and V11 causes data corruption on XFS filesystems.

2023-11-07 Thread Diederik de Haas
Control: found -1 6.1~rc3-1~exp1
Control: found -1 6.1.55-1

On Saturday, 4 November 2023 20:35:43 CET Jose M Calhariz wrote:
> > Ok. Please test (when you have time) 6.1.55-1.
> 
> Fail : Linux afs31 6.1.0-0-amd64 #1 SMP PREEMPT_DYNAMIC Debian
> 6.1~rc3-1~exp1 (2022-11-02) x86_64 GNU/Linux
> 
> Fail : Linux afs31 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1
> (2023-09-29) x86_64 GNU/Linux
> 
> Done. I even tested the first 6.1 kernel in Debian. Both of them failed.

Thanks, updated metadata accordingly.
So now we know it's indeed present in the whole 6.1 series.

> > Unfortunately there isn't a 6.2 kernel uploaded to the Debian archive and
> > thus not available on snapshot.d.o, but testing 6.3.1-1~exp1 should be
> > useful.

Please test with 6.3.1-1~exp1 to make sure it was fixed then (too).

Unfortunately, the commit list between 6.1 and 6.3.1 is quite large:
me@pc:~/dev/kernel.org/linux$ git log --oneline v6.1..v6.3.1 -- fs/xfs | wc -l
159
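As a rough illustration of the effort involved: bisecting halves the candidate range with each build-and-test round, so isolating one commit among n candidates takes about ceil(log2 n) rounds. The sketch below assumes the fix is a single commit touching `fs/xfs` (so the bisection could be limited to that path, e.g. `git bisect start -- fs/xfs`) and that every round gives a clear pass/fail; if the relevant change lives outside `fs/xfs`, the candidate range between v6.1 and v6.3.1 is much larger. The function name is hypothetical.

```python
import math


def bisect_rounds(n_candidates: int) -> int:
    """Worst-case number of build-and-test rounds needed to isolate a single
    commit among n_candidates, halving the remaining range each round.
    Assumes one clean good/bad transition and no untestable (skipped) commits."""
    if n_candidates <= 1:
        return 0
    return math.ceil(math.log2(n_candidates))


# 159 = commits touching fs/xfs between v6.1 and v6.3.1, per the git log above.
print(bisect_rounds(159))  # -> 8
```

Under those assumptions, the list that is too long to hand-pick backports from would still be only about eight kernel builds away from a single commit.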

If that list were small, I could have suggested trying to 'backport' a couple
of patches, but that avenue seems rather pointless in this case.

It's probably also useful to verify whether it is present in the whole 
5.10 series, which should give (even) more data points.

I think the next step should be to 'forward' this bug report to the upstream 
mailing list at linux-...@vger.kernel.org



Bug#1040416: linux-image-6.1.0-9-amd64: Under heavy load Debian V12 and V11 causes data corruption on XFS filesystems.

2023-11-04 Thread Jose M Calhariz
Hi

On Thu, Nov 02, 2023 at 07:40:38PM +0100, Diederik de Haas wrote:
> 
> On Thursday, 2 November 2023 18:03:25 CET Jose M Calhariz wrote:
> > On Thu, Nov 02, 2023 at 03:37:39PM +0100, Diederik de Haas wrote:
> > > On Wednesday, 5 July 2023 19:07:15 CET Jose M Calhariz wrote:
> > > > Package: src:linux
> > > > Version: 6.1.27-1
> > > 
> > > Can you try with the latest version in the 6.1.x series to see if the
> > > problem is still there?
> > 
> > As I need to set up the production servers ASAP, I do not know if I will
> > have time in the next few days. It works with the backports kernels.
> 
> No problem.
> 
> > The latest kernels I tested were:
> > Fail : Linux afs31 6.1.0-10-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.37-1
> > (2023-07-03) x86_64 GNU/Linux
> 
> Ok. Please test (when you have time) 6.1.55-1.

Fail : Linux afs31 6.1.0-0-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1~rc3-1~exp1 
(2022-11-02) x86_64 GNU/Linux

Fail : Linux afs31 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 
(2023-09-29) x86_64 GNU/Linux

Done. I even tested the first 6.1 kernel in Debian. Both of them failed.




> Also verify whether it's present in 6.1~rc3-1~exp1, to make sure it's present
> in the whole 6.1 series.
> Use https://snapshot.debian.org/binary/linux-image-amd64/ to get it/them.
> 
> If the bug is NOT present in either the latest or the first, then try other
> versions until you find the last one that works and the first one that fails.
> 
> > OK : Linux afs31 6.4.0-0.deb12.2-amd64 #1 SMP PREEMPT_DYNAMIC Debian
> > 6.4.4-3~bpo12+1 (2023-08-08) x86_64 GNU/Linux
> 
> It was fixed in 6.3.7-1, so later versions were expected to work as well.
> But let's ignore bpo as it likely won't provide useful data points.
> 
> Unfortunately there isn't a 6.2 kernel uploaded to the Debian archive and
> thus not available on snapshot.d.o, but testing 6.3.1-1~exp1 should be useful.
> 
> > The bug is present on Debian v11 too. So it is an old bug, with fixes in
> > some 6.2 rc kernel.
> 
> I'd recommend focusing on the 6.1 series first.
> If testing with 5.10 turns out to be useful later, we can do that then.


Kind regards
Jose M Calhariz


-- 
--
The happy life, my God, consists in rejoicing in You,
of You and for You




Processed: Re: Bug#1040416: linux-image-6.1.0-9-amd64: Under heavy load Debian V12 and V11 causes data corruption on XFS filesystems.

2023-11-02 Thread Debian Bug Tracking System
Processing control commands:

> found -1 6.1.37-1
Bug #1040416 [src:linux] linux-image-6.1.0-9-amd64: Under heavy load Debian V12 
and V11 causes data corruption on XFS filesystems.
Marked as found in versions linux/6.1.37-1.




Bug#1040416: linux-image-6.1.0-9-amd64: Under heavy load Debian V12 and V11 causes data corruption on XFS filesystems.

2023-11-02 Thread Diederik de Haas
Control: found -1 6.1.37-1

On Thursday, 2 November 2023 18:03:25 CET Jose M Calhariz wrote:
> On Thu, Nov 02, 2023 at 03:37:39PM +0100, Diederik de Haas wrote:
> > On Wednesday, 5 July 2023 19:07:15 CET Jose M Calhariz wrote:
> > > Package: src:linux
> > > Version: 6.1.27-1
> > 
> > Can you try with the latest version in the 6.1.x series to see if the
> > problem is still there?
> 
> As I need to set up the production servers ASAP, I do not know if I will
> have time in the next few days. It works with the backports kernels.

No problem.

> The latest kernels I tested were:
> Fail : Linux afs31 6.1.0-10-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.37-1
> (2023-07-03) x86_64 GNU/Linux

Ok. Please test (when you have time) 6.1.55-1.
Also verify whether it's present in 6.1~rc3-1~exp1, to make sure it's present 
in the whole 6.1 series.
Use https://snapshot.debian.org/binary/linux-image-amd64/ to get it/them.

If the bug is NOT present in either the latest or the first, then try other 
versions until you find the last one that works and the first one that fails.
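The search described above is a binary search over an ordered list of kernel versions. A minimal sketch, assuming a single fail-to-works transition and that the caller supplies the versions already sorted (it does not implement Debian version comparison). `next_version_to_test` is a hypothetical helper, and each "test" still means installing, booting and load-testing that kernel by hand. The example feeds it the results eventually reported in this bug.

```python
from typing import Dict, List, Optional


def next_version_to_test(versions: List[str], results: Dict[str, bool]) -> Optional[str]:
    """Pick the next kernel version to test when hunting for the first fixed one.

    versions: versions in ascending order (the caller is responsible for the order).
    results:  versions already tested; True = problem gone, False = still fails.
    Assumes a single fail -> works transition. Returns None once the last known
    failing version and the first known working version are adjacent.
    """
    failing = [i for i, v in enumerate(versions) if results.get(v) is False]
    working = [i for i, v in enumerate(versions) if results.get(v) is True]
    lo = max(failing) if failing else -1             # highest index known to fail
    hi = min(working) if working else len(versions)  # lowest index known to work
    if lo > hi:
        raise ValueError("results contradict a single fail -> works transition")
    if hi - lo <= 1:
        return None  # nothing untested lies between the two known results
    return versions[(lo + hi) // 2]  # midpoint of the untested gap


# Versions named in this bug, oldest first (order assigned by hand).
versions = ["6.1~rc3-1~exp1", "6.1.27-1", "6.1.37-1", "6.1.55-1",
            "6.3.1-1~exp1", "6.3.7-1"]
# Results reported in this bug: every 6.1 kernel tested failed, 6.3.7-1 works.
results = {"6.1~rc3-1~exp1": False, "6.1.27-1": False, "6.1.37-1": False,
           "6.1.55-1": False, "6.3.7-1": True}
print(next_version_to_test(versions, results))  # -> 6.3.1-1~exp1
```

With every 6.1 kernel failing and 6.3.7-1 working, the only untested version left in this list is 6.3.1-1~exp1, the same one suggested in this thread.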

> OK : Linux afs31 6.4.0-0.deb12.2-amd64 #1 SMP PREEMPT_DYNAMIC Debian
> 6.4.4-3~bpo12+1 (2023-08-08) x86_64 GNU/Linux

It was fixed in 6.3.7-1, so later versions were expected to work as well.
But let's ignore bpo as it likely won't provide useful data points.

Unfortunately there isn't a 6.2 kernel uploaded to the Debian archive and thus 
not available on snapshot.d.o, but testing 6.3.1-1~exp1 should be useful.

> The bug is present on Debian v11 too. So it is an old bug, with fixes in
> some 6.2 rc kernel.

I'd recommend focusing on the 6.1 series first.
If testing with 5.10 turns out to be useful later, we can do that then.



Bug#1040416: linux-image-6.1.0-9-amd64: Under heavy load Debian V12 and V11 causes data corruption on XFS filesystems.

2023-11-02 Thread Jose M Calhariz
On Thu, Nov 02, 2023 at 03:37:39PM +0100, Diederik de Haas wrote:
> Control: tag -1 moreinfo
> 
> On Wednesday, 5 July 2023 19:07:15 CET Jose M Calhariz wrote:
> > Package: src:linux
> > Version: 6.1.27-1
> 
> Can you try with the latest version in the 6.1.x series to see if the problem 
> is still there?

As I need to set up the production servers ASAP, I do not know if I will
have time in the next few days. It works with the backports kernels.

The latest kernels I tested were:

Fail : Linux afs31 6.1.0-10-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.37-1 
(2023-07-03) x86_64 GNU/Linux

OK : Linux afs31 6.4.0-0.deb12.2-amd64 #1 SMP PREEMPT_DYNAMIC Debian 
6.4.4-3~bpo12+1 (2023-08-08) x86_64 GNU/Linux



> 
> > On this hardware I have been chasing data corruption for several months on
> > Debian v11 and Debian v12. Now that it was pointed out to me that the Linux
> > kernel had some XFS problems that were solved in a later 6.3 kernel, I can
> > reproduce the problem.
> > 
> > It seems the problem went away with the current Debian testing kernel:
> > 
> > ii  linux-image-6.3.0-1-amd64  6.3.7-1  amd64  Linux 6.3 for 64-bit PCs (signed)
> > 
> > Is there anyone willing to backport the XFS fixes into
> > linux-image-6.1.0 and linux-image-5.10.0?
> 
> If the problem is still present in the latest 6.1 kernel, then you need to find
> out which patch(es) actually fix the problem.
> The easiest way to start with that is to find the last kernel which exhibits 
> the issue and then the first one where it is fixed.
> https://snapshot.debian.org/binary/linux-image-amd64/ should help
> with that.

The bug is present on Debian v11 too. So it is an old bug, with fixes in
some 6.2 rc kernel.

> 
> When the range has been narrowed, a `git bisect` should identify the specific 
> commit(s) that fix the issue.
> https://wiki.debian.org/DebianKernel/GitBisect should help with that
> 
> When that/those have been identified, it should be reported to the upstream 
> kernel so that they can incorporate those fixes in their LTS kernel(s) which 
> Debian then will pick up automatically.
> 
> HTH



-- 
--
The happy life, my God, consists in rejoicing in You,
of You and for You




Processed: Re: Bug#1040416: linux-image-6.1.0-9-amd64: Under heavy load Debian V12 and V11 causes data corruption on XFS filesystems.

2023-11-02 Thread Debian Bug Tracking System
Processing control commands:

> tag -1 moreinfo
Bug #1040416 [src:linux] linux-image-6.1.0-9-amd64: Under heavy load Debian V12 
and V11 causes data corruption on XFS filesystems.
Added tag(s) moreinfo.




Bug#1040416: linux-image-6.1.0-9-amd64: Under heavy load Debian V12 and V11 causes data corruption on XFS filesystems.

2023-11-02 Thread Diederik de Haas
Control: tag -1 moreinfo

On Wednesday, 5 July 2023 19:07:15 CET Jose M Calhariz wrote:
> Package: src:linux
> Version: 6.1.27-1

Can you try with the latest version in the 6.1.x series to see if the problem 
is still there?

> On this hardware I have been chasing data corruption for several months on
> Debian v11 and Debian v12. Now that it was pointed out to me that the Linux
> kernel had some XFS problems that were solved in a later 6.3 kernel, I can
> reproduce the problem.
> 
> It seems the problem went away with the current Debian testing kernel:
> 
> ii  linux-image-6.3.0-1-amd64  6.3.7-1  amd64  Linux 6.3 for 64-bit PCs (signed)
> 
> Is there anyone willing to backport the XFS fixes into
> linux-image-6.1.0 and linux-image-5.10.0?

If the problem is still present in the latest 6.1 kernel, then you need to find 
out which patch(es) actually fix the problem.
The easiest way to start with that is to find the last kernel which exhibits 
the issue and then the first one where it is fixed.
https://snapshot.debian.org/binary/linux-image-amd64/ should help with that.

When the range has been narrowed, a `git bisect` should identify the specific 
commit(s) that fix the issue.
https://wiki.debian.org/DebianKernel/GitBisect should help with that

When that/those have been identified, it should be reported to the upstream 
kernel so that they can incorporate those fixes in their LTS kernel(s) which 
Debian then will pick up automatically.

HTH
