Re: ext4 file system corruption with v4.19.3 / v4.19.4

2018-11-28 Thread Marek Habersack
On 28/11/2018 17:10, Theodore Y. Ts'o wrote:
> On Wed, Nov 28, 2018 at 04:56:51PM +0100, Rainer Fiebig wrote:
>>
>> If you still see the errors, at least the Ubuntu-kernel could be ruled out.
> 
> My impression is that some of the people reporting problems have been
> using stock upstream kernels, so I wasn't really worried about the
Also, the Ubuntu mainline kernel doesn't patch the kernel code; it merely uses
Ubuntu configs to build the stock kernel (you can find the patches in e.g.
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.19.5/ at the top of the
directory).

> Ubuntu kernel (although it could be something about the default
> configs that Ubuntu sets up).  What I was more wondering was whether
> there was something about userspace or default configs of Ubuntu.
> This isn't necessarily a *problem* per se; for example, not that long
> ago some users were getting surprised when a problem showed up with an
> older version of the LVM2 userspace with newer upstream kernels.
> After a while, you learn to get super paranoid about making sure to
> rule out all possibilities when trying to debug problems that are only
> hitting a set of users.
> 
>   - Ted
> 

marek


Re: ext4 file system corruption with v4.19.3 / v4.19.4

2018-11-28 Thread Marek Habersack
On 28/11/2018 05:15, Theodore Y. Ts'o wrote:
> On Wed, Nov 28, 2018 at 03:16:33AM +0300, Andrey Jr. Melnikov wrote:
>> Corrupted inodes - always directory, not touched at least year or
>> more for writing. Something wrong when updating atime?
I've just seen the errors come back despite having MQ off :( However, this time
it took 5 days for them to come back, so MQ must play a role here. Also, indeed,
they happened after fstrim ran, and this time *only* on the SSD disks reported
below. Another clue? This time the errors were "just" orphaned inodes + invalid
free inode counts, all repaired without issues by fsck.

> 
> We're not sure.  The frustrating thing is that it's not reproducing
> for me.  I run extensive regression tests, and I'm using 4.19 on my
> development laptop without noticing any problems.  If I could reproduce
> it, I could debug it, but since I can't, I need to rely on those who
> are seeing the problem to help pinpoint the problem.
> 
> I'm trying to figure out common factors from those people who are
> reporting problems.
> 
> (a) What distribution are you running (it appears that many people
> reporting problems are running Ubuntu, but this may be a sampling
> issue; lots of people run Ubuntu)?  (For the record, I'm using Debian
> Testing.)
Ubuntu 18.10 here

> 
> (b) What hardware are you using?  (SSD?  SATA-attached?
> NVMe-attached?)
The errors occurred on both SSDs:
  - Samsung SSD 850 EVO 1TB, firmware rev EMT03B6Q
  - OCZ-AGILITY3, firmware rev 2.25

and spinning rust:
  - Seagate ST2000DX001-1CM164, firmware revision CC43

> 
> (c) Are you using LVM?  LUKS (e.g., disk encrypted)?
LUKS. Both the Samsung and the Seagate use DM for encryption.

> (d) are you using discard?  One theory is a recent discard change may
> be in play.   How do you use discard?   (mount option, fstrim, etc.)
fstrim runs weekly and the Samsung SSD is mounted with

   rw,nosuid,nodev,noatime,discard,helper=crypt
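For anyone wanting to compare setups on the discard question, here is a minimal sketch that lists ext4 filesystems mounted with the `discard` option. It assumes input in the whitespace-separated layout of `/proc/mounts` (device, mount point, type, options); the function name and the sample device names are made up for illustration.

```python
from typing import List


def ext4_mounts_with_discard(mounts_text: str) -> List[str]:
    """Return mount points of ext4 filesystems whose options include 'discard'."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        # Options are comma-separated; match 'discard' exactly, not as a substring
        if fs_type == "ext4" and "discard" in options.split(","):
            found.append(mount_point)
    return found


# Hypothetical sample in /proc/mounts layout (device names are invented)
SAMPLE_MOUNTS = """\
/dev/mapper/ssd_crypt /data ext4 rw,nosuid,nodev,noatime,discard 0 0
/dev/sdb1 /home ext4 rw,relatime 0 0
/dev/sdc1 /scratch xfs rw,discard 0 0
"""

print(ext4_mounts_with_discard(SAMPLE_MOUNTS))
```

On a live system the same function could be fed `open("/proc/mounts").read()`. Note that it only sees the online discard mount option; a periodic fstrim run would not show up here.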

marek
> 
>   - Ted
> 



Re: ext4 file system corruption with v4.19.3 / v4.19.4

2018-11-27 Thread Marek Habersack
On 27/11/2018 15:32, Guenter Roeck wrote:
Hi,

You might try to see if you have CONFIG_SCSI_MQ_DEFAULT=y in your kernel
config. Starting with 4.19.1 it somehow interferes with ext4 and causes problems
similar to the ones you list below. Ever since I disabled MQ (either recompile
your kernel or add `scsi_mod.use_blk_mq=0` to the kernel command line) none of
those errors came back.
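A rough way to check which of the two settings is in effect, sketched in Python. It assumes that an explicit `scsi_mod.use_blk_mq=` on the kernel command line overrides the build-time default, and it only recognises the values `1`/`y`/`Y` as "on", which is a simplification. The function name is made up for illustration.

```python
import re


def scsi_mq_enabled(kernel_config: str, cmdline: str) -> bool:
    """Guess whether SCSI blk-mq is on, from config text and command-line text."""
    # An explicit boot parameter wins; if it is given several times, use the last one
    values = re.findall(r"(?:^|\s)scsi_mod\.use_blk_mq=(\S+)", cmdline)
    if values:
        return values[-1] in {"1", "y", "Y"}
    # Otherwise fall back to the build-time default from the kernel config
    for line in kernel_config.splitlines():
        if line.strip() == "CONFIG_SCSI_MQ_DEFAULT=y":
            return True
    return False


# MQ is the default in this config, but the command line turns it off
print(scsi_mq_enabled("CONFIG_SCSI_MQ_DEFAULT=y\n", "ro quiet scsi_mod.use_blk_mq=0"))
```

On a running machine the command line can be read from `/proc/cmdline`; the config is often available as `/boot/config-<release>` on Debian and Ubuntu style systems, but that location is a distribution convention and not guaranteed.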

hope it helps,

marek
> [trying again, this time with correct kernel.org address]
> 
> Hi,
> 
> I have seen the following and similar problems several times,
> with both v4.19.3 and v4.19.4:
> 
> Nov 23 04:32:25 mars kernel: [112668.673671] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602889: comm git: bad extra_isize 33661 (inode size 256)
> Nov 23 04:32:25 mars kernel: [112668.675217] Aborting journal on device sdb1-8.
> Nov 23 04:32:25 mars kernel: [112668.676681] EXT4-fs (sdb1): Remounting filesystem read-only
> Nov 23 04:32:25 mars kernel: [112668.808886] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602881: comm rm: bad extra_isize 33685 (inode size 256)
> ...
> 
> Nov 25 00:12:43 saturn kernel: [59377.725984] EXT4-fs error (device sda1): ext4_lookup:1578: inode #238034131: comm updatedb.mlocat: deleted inode referenced: 238160407
> Nov 25 00:12:43 saturn kernel: [59377.766638] Aborting journal on device sda1-8.
> Nov 25 00:12:43 saturn kernel: [59377.779372] EXT4-fs (sda1): Remounting filesystem read-only
> ...
> 
> Nov 24 01:52:31 saturn kernel: [189085.240016] EXT4-fs error (device sda1): ext4_lookup:1578: inode #52038457: comm nfsd: deleted inode referenced: 52043796
> Nov 24 01:52:31 saturn kernel: [189085.263427] Aborting journal on device sda1-8.
> Nov 24 01:52:31 saturn kernel: [189085.275313] EXT4-fs (sda1): Remounting filesystem read-only
> 
> 
> The same systems running v4.18.6 never experienced a problem.
> 
> Has anyone else seen similar problems ? Is there anything I can do
> to help tracking down the problem ?
> 
> Thanks,
> Guenter
> 
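When comparing reports from several machines it can help to pull the device, function, inode and process name out of the EXT4-fs error lines quoted above. The following is a minimal sketch, assuming each log message sits on a single line as syslog writes it; the regular expression is fitted only to the message shapes seen in this thread and may miss other ext4 error formats. All names in it are made up for illustration.

```python
import re
from collections import Counter

# Fitted to "EXT4-fs error (device X): func:line: inode #N: comm NAME: message"
EXT4_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\): "
    r"(?P<function>\w+):(?P<line>\d+): "
    r"inode #(?P<inode>\d+): comm (?P<comm>[^:]+): (?P<message>.*)"
)


def parse_ext4_errors(log_text):
    """Return one dict per EXT4-fs error line; other lines are ignored."""
    errors = []
    for line in log_text.splitlines():
        match = EXT4_ERROR.search(line)
        if match:
            errors.append(match.groupdict())
    return errors


# Lines taken from the report above, rejoined onto single lines
SAMPLE_LOG = """\
Nov 23 04:32:25 mars kernel: [112668.673671] EXT4-fs error (device sdb1): ext4_iget:4831: inode #12602889: comm git: bad extra_isize 33661 (inode size 256)
Nov 23 04:32:25 mars kernel: [112668.675217] Aborting journal on device sdb1-8.
Nov 25 00:12:43 saturn kernel: [59377.725984] EXT4-fs error (device sda1): ext4_lookup:1578: inode #238034131: comm updatedb.mlocat: deleted inode referenced: 238160407
"""

errors = parse_ext4_errors(SAMPLE_LOG)
per_device = Counter(error["device"] for error in errors)
for error in errors:
    print(error["device"], error["function"], error["inode"], error["comm"])
```

Grouping by `function` or `comm` instead of `device` works the same way and might show whether the errors cluster around one code path or one workload.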



BUG (vmscan.c:102) and possibly VIA IDE timing problems with test10-pre4

2000-10-21 Thread Marek Habersack

Hi,

  Attached is a tarball with the log of the event, a config used for the
kernel and dmesg output for overview of what the machine is. The BUG occurred
while XFree 4 was running, the swap wasn't allocated at all, half of the
machine's memory was free. BUG occurred two times, the second time it wasn't
logged entirely as seen from the attached excerpt. Also, the hda/hdb errors
seen in dmesg output started occurring with test10-pre3 AFAIR - when the disk
parameters are reset using hdparm to turn DMA off and turn UDMA33 on (mode2
for the hda, mode4 for hdb) everything works just fine. If any more
information is required, please let me know.

regards,
marek

 bug.tar.gz
 PGP signature


Re: Availability of kdb

2000-09-19 Thread Marek Habersack

** On Sep 19, Marty Fouts scribbled:
> Gene did the instruction set architecture along with some others. I think he
> was also involved in the i/o architecture.
Marty, could you _please_ stop posting to this thread on lkml and _PLEASE_
learn how to snip messages and _DON'T_ quote everything you reply to - this
list's volume is high enough without such noise. Thanks,

marek

 PGP signature


Re: [ANNOUNCE] Withdrawl of Open Source NDS Project/NTFS/M2FS for Linux

2000-09-05 Thread Marek Habersack

** On Sep 05, Jeff V. Merkey scribbled:
> > > Linux is more buggy than NT, but at least the source code comes with it
> > > so there's no excuse for not getting someone to fix it
> > Excuse me for adding my irrelevant 0.2$ - but what are you doing with Linux
> > then?? Why don't you just stick with NT and improve NT? If you want NT
> > source - you can buy it from M$ and the only point that speaks against NT
> > (as it seems from reading your words) will vanish - you will have NT
> > sources, kernel debugger, nifty GUI for all the stuff, trained developers,
> > nice tech support. Let me ask you once again - why are you sticking with
> > Linux?
> 
> I guess you don't know when people are joking.
Yes, it seems so. So you're telling us that this entire thread is a joke on
your part? If not, then please show me the joke above or, for the future,
mark your "jokes" somehow in the text so that dumbsticks like myself can
understand the jokes. Thank you.

marek

 PGP signature


Re: [ANNOUNCE] Withdrawl of Open Source NDS Project/NTFS/M2FS for Linux

2000-09-05 Thread Marek Habersack

** On Sep 05, Jeff V. Merkey scribbled:
> 
> Linux is more buggy than NT, but at least the source code comes with it
> so there's no excuse for not getting someone to fix it
Excuse me for adding my irrelevant 0.2$ - but what are you doing with Linux
then?? Why don't you just stick with NT and improve NT? If you want NT
source - you can buy it from M$ and the only point that speaks against NT
(as it seems from reading your words) will vanish - you will have NT
sources, kernel debugger, nifty GUI for all the stuff, trained developers,
nice tech support. Let me ask you once again - why are you sticking with
Linux?

marek

 PGP signature

