Re: ext4: total breakdown on USB hdd, 3.0 kernel

2014-06-30 Thread Theodore Ts'o
On Mon, Jun 30, 2014 at 08:46:44AM +0200, Pavel Machek wrote:
> :-). Aha, and I misremembered, it was block descriptor checksums, not
> inode checksums:
> 
> One or more block group descriptor checksums are invalid.  Fix? yes
> 
> Group descriptor 0 checksum is invalid.  FIXED.
> Group descriptor 1 checksum is invalid.  FIXED.
> Group descriptor 2 checksum is invalid.  FIXED.
> Group descriptor 3 checksum is invalid.  FIXED.

Yeah, what we should be doing here is to try the backup block group
descriptors, check to see if they are valid, and if so, use them
instead.
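
A minimal sketch of the fallback idea described above, not e2fsck's actual code: prefer the primary descriptor when it validates, otherwise take the first backup copy that does. The names `pick_descriptor` and `toy_is_valid` are hypothetical, and the toy validity check (last byte equals the sum of the preceding bytes mod 256) merely stands in for the real ext4 group descriptor checksum.

```python
from typing import Callable, Optional, Sequence


def pick_descriptor(
    primary: bytes,
    backups: Sequence[bytes],
    is_valid: Callable[[bytes], bool],
) -> Optional[bytes]:
    """Return the primary copy if it validates, else the first valid backup.

    Returns None when no copy validates, so the caller can fall back to
    asking the user instead of silently "fixing" the checksum.
    """
    if is_valid(primary):
        return primary
    for candidate in backups:
        if is_valid(candidate):
            return candidate
    return None


def toy_is_valid(desc: bytes) -> bool:
    # Toy stand-in, NOT the ext4 checksum: the last byte must equal the
    # sum of all preceding bytes modulo 256.
    return len(desc) > 0 and desc[-1] == sum(desc[:-1]) % 256


if __name__ == "__main__":
    good = bytes([1, 2, 3, 6])
    trashed = bytes([255, 255, 255, 255])
    print(pick_descriptor(trashed, [good], toy_is_valid) == good)
```

The point of returning None rather than recomputing the checksum is that a recomputed checksum over trashed contents would only hide the damage.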


> I'm still trying to figure out what went wrong in the OLPC-1.75 + USB
> disk case.
> 
> One possibility is that OLPC is unable to provide enough power from
> the two USB ports to power Seagate Momentus 5400.6, and that the hard
> drive fails to detect the brown-out and does something wrong. (Are
> SATA drives expected to work at 4.5V? Because that's what is
> guaranteed on USB, IIRC).

> Heavy corruption happened when I was charging the phone _and_ running
> the hard drive, from the OLPC. Now I have seen cases when OLPC crashed
> on device plug-in, in what looked like a brown-out...

The USB spec seems to require 5V +/- 0.25V, which also seems to be the
spec on laptop drives.  It wouldn't surprise me if the OLPC (or its
power adapter) is a bit dodgy under heavy load, though.  It might be
useful for you to measure the voltage and amps delivered at the USB
ports and from the power brick to see if either is out of spec.
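
A small sketch of the check suggested above, assuming the 5V +/- 0.25V window mentioned in this message. The readings are made-up placeholders, not measurements from the OLPC; substitute values taken with a multimeter at the port and at the power brick.

```python
NOMINAL_V = 5.0    # nominal bus voltage quoted in the message
TOLERANCE_V = 0.25  # allowed deviation quoted in the message


def in_spec(measured_v: float,
            nominal: float = NOMINAL_V,
            tol: float = TOLERANCE_V) -> bool:
    """True if the measured voltage lies within nominal +/- tol."""
    return nominal - tol <= measured_v <= nominal + tol


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    readings = {
        "USB port, idle": 5.02,
        "USB port, drive spinning up": 4.55,
        "power brick": 5.10,
    }
    for where, volts in readings.items():
        verdict = "ok" if in_spec(volts) else "OUT OF SPEC"
        print(f"{where}: {volts:.2f} V {verdict}")
```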



- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: ext4: total breakdown on USB hdd, 3.0 kernel

2014-06-30 Thread Pavel Machek
On Sun 2014-06-29 17:04:28, Theodore Ts'o wrote:
> On Sun, Jun 29, 2014 at 10:25:16PM +0200, Pavel Machek wrote:
> > 
> > One more thing that I noticed: fsck notices bad checksum on inode, and
> > then offers to fix the checksum with 'y' being the default. If there's
> > trash in the inode, that will just induce more errors. (Including
> > potentially doubly-linked blocks?) Would it make more sense to clear
> > the inodes with bad checksums?
> 
> Metadata checksums aren't in e2fsprogs 1.41 or 1.42.  It will be in
> the to-be-released e2fsprogs 1.43, and yes, we need to change things
> so that the default answer is to zero the inode.  We didn't do that
> initially because we were more suspicious of the new metadata checksum
> code in the kernel and e2fsprogs than we were of hardware faults.
> :-)

:-). Aha, and I misremembered, it was block descriptor checksums, not
inode checksums:

One or more block group descriptor checksums are invalid.  Fix? yes

Group descriptor 0 checksum is invalid.  FIXED.
Group descriptor 1 checksum is invalid.  FIXED.
Group descriptor 2 checksum is invalid.  FIXED.
Group descriptor 3 checksum is invalid.  FIXED.

I'm still trying to figure out what went wrong in the OLPC-1.75 + USB
disk case.

One possibility is that OLPC is unable to provide enough power from
the two USB ports to power Seagate Momentus 5400.6, and that the hard
drive fails to detect the brown-out and does something wrong. (Are
SATA drives expected to work at 4.5V? Because that's what is
guaranteed on USB, IIRC).

Heavy corruption happened when I was charging the phone _and_ running
the hard drive, from the OLPC. Now I have seen cases when OLPC crashed 
on device plug-in, in what looked like a brown-out...

Best regards,
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: ext4: total breakdown on USB hdd, 3.0 kernel

2014-06-29 Thread Theodore Ts'o
On Sun, Jun 29, 2014 at 10:25:16PM +0200, Pavel Machek wrote:
> 
> One more thing that I noticed: fsck notices bad checksum on inode, and
> then offers to fix the checksum with 'y' being the default. If there's
> trash in the inode, that will just induce more errors. (Including
> potentially doubly-linked blocks?) Would it make more sense to clear
> the inodes with bad checksums?

Metadata checksums aren't in e2fsprogs 1.41 or 1.42.  It will be in
the to-be-released e2fsprogs 1.43, and yes, we need to change things
so that the default answer is to zero the inode.  We didn't do that
initially because we were more suspicious of the new metadata checksum
code in the kernel and e2fsprogs than we were of hardware faults.  :-)

Cheers,

- Ted


Re: ext4: total breakdown on USB hdd, 3.0 kernel

2014-06-29 Thread Pavel Machek
Hi!

> > It looks like the filesystem contains _way_ too many 0x's:
> 
> That sounds like it's a hardware issue.  It may be that the controller
> did something insane while trying to do a write at the point when the
> disk drive was disconnected (and so the drive suffered a power
> drop).

Interesting. I tried to compare damaged image with the original, and
yes, way too many 0x. But they are not even block aligned? And
they start from byte 0... that area is not normally written, IIRC?

000        
*
030  07ff      
040        
*
3f0        
400       3e28 002d
410 fd57 000c      
420        
*
550        
560        
570     4ddb 0055  
580        
590   007e     
5a0        
*
5c0       682e 53ac
5d0 3a29 000a 0515  d144 002e  
5e0 7865 3474 6d5f 7061 625f 6f6c 6b63 0073
5f0        
600        
*
0001000 41c0 03e9 1000  6133 53ac 6133 53ac
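
The few non-blank words in the dump can be decoded by hand. The sketch below assumes the dump shows 16-bit words in little-endian order (as `hexdump` or `od -x` would on a little-endian machine). Under that assumption the line at offset 5e0 is ASCII text. Reading the pair `682e 53ac` from the line at 5c0 as a 32-bit Unix timestamp is a further guess, not something the dump itself confirms.

```python
import struct
from datetime import datetime, timezone


def words_to_bytes(words: str) -> bytes:
    """Convert space-separated 16-bit hex words (little-endian) to raw bytes."""
    return b"".join(struct.pack("<H", int(w, 16)) for w in words.split())


# Line at offset 5e0 of the dump.
raw = words_to_bytes("7865 3474 6d5f 7061 625f 6f6c 6b63 0073")
name = raw.rstrip(b"\0").decode("ascii")
print(name)

# Two words from the line at offset 5c0, read as one little-endian u32.
(stamp,) = struct.unpack("<I", words_to_bytes("682e 53ac"))
when = datetime.fromtimestamp(stamp, tz=timezone.utc)
print(hex(stamp), when.isoformat())
```

If the timestamp guess is right, the date falls on the day this thread was started.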

> > And for every bug in kernel, there's one in fsck: I did not expect it,
> > but fsck actually succeeded, and marked fs as clean. But second fsck
> > had issues with /lost+found...
>  
> I'd need the previous fsck transcript to have any idea what might have
> happened.  I'll note though you are using an ancient version of e2fsck
> (1.41.12, and there have been a huge number of bug fixes since
> May 2010)

Sorry for picking at fsck. No, it did quite a good job given
circumstances... and it probably does not make sense to debug old
version.

One more thing that I noticed: fsck notices bad checksum on inode, and
then offers to fix the checksum with 'y' being the default. If there's
trash in the inode, that will just induce more errors. (Including
potentially doubly-linked blocks?) Would it make more sense to clear
the inodes with bad checksums?

Thanks and best regards,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: ext4: total breakdown on USB hdd, 3.0 kernel

2014-06-27 Thread Oliver Neukum
On Thu, 2014-06-26 at 22:20 +0200, Pavel Machek wrote:
> Hi!
> 
> Ok, this ext4 filesystem does _not_ have easy life: it is in usb
> envelope, I wanted to use it as a root filesystem, and it is connected
> to OLPC-1.75, running some kind of linux-3.0 kernels.
> 
> So power disconnects are common, and even during regular reboot, I
> hear disk doing emergency parking.
> 
> I don't know how barriers work over USB...

Just like with other SCSI devices.

HTH
Oliver




Re: ext4: total breakdown on USB hdd, 3.0 kernel

2014-06-26 Thread Theodore Ts'o
On Thu, Jun 26, 2014 at 10:50:49PM +0200, Pavel Machek wrote:
> 
> And for every bug in kernel, there's one in fsck: I did not expect it,
> but fsck actually succeeded, and marked fs as clean. But second fsck
> had issues with /lost+found...

I'd need the previous fsck transcript to have any idea what might have
happened.  I'll note though you are using an ancient version of e2fsck
(1.41.12, and there have been a huge number of bug fixes since
May 2010)

- Ted


Re: ext4: total breakdown on USB hdd, 3.0 kernel

2014-06-26 Thread Theodore Ts'o
On Thu, Jun 26, 2014 at 10:30:52PM +0200, Pavel Machek wrote:
> 
> It looks like the filesystem contains _way_ too many 0x's:

That sounds like it's a hardware issue.  It may be that the controller
did something insane while trying to do a write at the point when the
disk drive was disconnected (and so the drive suffered a power drop).

> I saved beginning of the filesystem using
> cat /dev/sdc4 | gzip -9 - > /dev/sda3, but then ran out of patience.
> So there may be something for analysis, but...

The way to snapshot just the metadata blocks for analysis is:

e2image -r /dev/hdc4 - | bzip2 > ~/hdc4.e2i.bz2

But in this case, I doubt it will be very helpful, because
fundamentally, this appears to be a hardware issue.

   - Ted


Re: ext4: total breakdown on USB hdd, 3.0 kernel

2014-06-26 Thread Pavel Machek
On Thu 2014-06-26 22:30:52, Pavel Machek wrote:
> Hi!
> 
> > Ok, this ext4 filesystem does _not_ have easy life: it is in usb
> > envelope, I wanted to use it as a root filesystem, and it is connected
> > to OLPC-1.75, running some kind of linux-3.0 kernels.
> > 
> > So power disconnects are common, and even during regular reboot, I
> > hear disk doing emergency parking.
> > 
> > I don't know how barriers work over USB...
> > 
> > Plus the drive has physical bad blocks, but I attempted to mark them
> > with fsck -c.
> > 
> > OTOH, it is just a root filesystem... and nothing above should prevent
> > correct operation (right?)
> > 
> > On last mount, it remounted itself read-only, so there's time for
> > fsck, I guess...
> > 
> > But I believe this means I am going to lose all the data on the
> > filesystem, right?
> 
> It looks like the filesystem contains _way_ too many 0x's:
> 
> Inode 655221 has compression flag set on filesystem without compression 
> support.  Clear? yes
> 
> Inode 655221 has INDEX_FL flag set but is not a directory.
> Clear HTree index? yes
...

And for every bug in kernel, there's one in fsck: I did not expect it, but
fsck actually succeeded, and marked fs as clean. But second fsck had issues
with /lost+found...


-bash-4.1# fsck  /dev/sdc4 
fsck from util-linux-ng 2.18
e2fsck 1.41.12 (17-May-2010)
armroot: clean, 132690/985424 files, 1023715/3934116 blocks
-bash-4.1# fsck -f /dev/sdc4 
fsck from util-linux-ng 2.18
e2fsck 1.41.12 (17-May-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
'..' in /lost+found/#652090/auth-for-pavel-wzJd6X (17) is /lost+found (11), 
should be /lost+found/#652090 (652090).
Fix? yes

Pass 4: Checking reference counts
Pass 5: Checking group summary information

armroot: * FILE SYSTEM WAS MODIFIED *
armroot: 132690/985424 files (0.1% non-contiguous), 1023715/3934116 blocks
-bash-4.1# 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: ext4: total breakdown on USB hdd, 3.0 kernel

2014-06-26 Thread Pavel Machek
Hi!

> Ok, this ext4 filesystem does _not_ have easy life: it is in usb
> envelope, I wanted to use it as a root filesystem, and it is connected
> to OLPC-1.75, running some kind of linux-3.0 kernels.
> 
> So power disconnects are common, and even during regular reboot, I
> hear disk doing emergency parking.
> 
> I don't know how barriers work over USB...
> 
> Plus the drive has physical bad blocks, but I attempted to mark them
> with fsck -c.
> 
> OTOH, it is just a root filesystem... and nothing above should prevent
> correct operation (right?)
> 
> On last mount, it remounted itself read-only, so there's time for
> fsck, I guess...
> 
> But I believe this means I am going to lose all the data on the
> filesystem, right?

It looks like the filesystem contains _way_ too many 0x's:

Inode 655221 has compression flag set on filesystem without compression 
support.  Clear? yes

Inode 655221 has INDEX_FL flag set but is not a directory.
Clear HTree index? yes

Inode 655221 should not have EOFBLOCKS_FL set (size 18446744073709551615, lblk 
-1)
Clear? yes

Inode 655221, i_size is 18446744073709551615, should be 0.  Fix? yes

Inode 655221, i_blocks is 281474976710655, should be 0.  Fix? yes

Inode 655222 is in use, but has dtime set.  Fix? yes

Inode 655222 has imagic flag set.  Clear? yes

Inode 655222 has a extra size (65535) which is invalid
Fix? yes

Inode 655222 has compression flag set on filesystem without compression 
support.  Clear? yes

Inode 655222 has INDEX_FL flag set but is not a directory.
Clear HTree index? yes

Inode 655222 should not have EOFBLOCKS_FL set (size 18446744073709551615, lblk 
-1)
Clear? 
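
The sizes e2fsck complains about are not random: each one is the largest value its field can hold, which is what a field looks like when every bit is set. A quick check, using only the numbers printed in the transcript above (the helper name `all_ones_width` is just for this illustration):

```python
from typing import Optional


def all_ones_width(n: int) -> Optional[int]:
    """Return k if n == 2**k - 1 for some k > 0 (all k bits set), else None."""
    if n > 0 and (n & (n + 1)) == 0:
        return n.bit_length()
    return None


# Values copied from the e2fsck output above.
reported = {
    "i_size": 18446744073709551615,
    "i_blocks": 281474976710655,
    "extra size": 65535,
}

for field, value in reported.items():
    width = all_ones_width(value)
    print(f"{field}: {value:#x} -> all ones over {width} bits")
```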

I saved the beginning of the filesystem using
cat /dev/sdc4 | gzip -9 - > /dev/sda3, but then ran out of patience. So
there may be something for analysis, but...

Any ideas?

Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


ext4: total breakdown on USB hdd, 3.0 kernel

2014-06-26 Thread Pavel Machek
Hi!

Ok, this ext4 filesystem does _not_ have easy life: it is in usb envelope,
I wanted to use it as a root filesystem, and it is connected to OLPC-1.75,
running some kind of linux-3.0 kernels.

So power disconnects are common, and even during regular reboot, I hear
disk doing emergency parking.

I don't know how barriers work over USB...

Plus the drive has physical bad blocks, but I attempted to mark them with
fsck -c.

OTOH, it is just a root filesystem... and nothing above should prevent
correct operation (right?)

On last mount, it remounted itself read-only, so there's time for fsck,
I guess...

But I believe this means I am going to lose all the data on the
filesystem, right?

Any idea what could have happened? It looks like garbage written over the
filesystem, right?

I'm using devicemapper on another partition (for encrypted ext4). I feel
I lost that filesystem, too, but without root filesystem, I can't check it
easily.

Any idea what to do so that it does not repeat?

Should I switch to plain ext2?

Pavel

-bash-4.1# fsck /dev/sdc4 
fsck from util-linux-ng 2.18
e2fsck 1.41.12 (17-May-2010)
fsck.ext2: Superblock invalid, trying backup blocks...
Superblock has an invalid journal (inode 8).
Clear? yes

*** ext3 journal has been deleted - filesystem is now ext2 only ***

One or more block group descriptor checksums are invalid.  Fix? yes

Group descriptor 0 checksum is invalid.  FIXED.
Group descriptor 1 checksum is invalid.  FIXED.
Group descriptor 2 checksum is invalid.  FIXED.
Group descriptor 3 checksum is invalid.  FIXED.
Group descriptor 4 checksum is invalid.  FIXED.
Group descriptor 5 checksum is invalid.  FIXED.
Group descriptor 6 checksum is invalid.  FIXED.
Group descriptor 7 checksum is invalid.  FIXED.
Group descriptor 8 checksum is invalid.  FIXED.
Group descriptor 9 checksum is invalid.  FIXED.
Group descriptor 10 checksum is invalid.  FIXED.
Group descriptor 11 checksum is invalid.  FIXED.
Group descriptor 12 checksum is invalid.  FIXED.
Group descriptor 13 checksum is invalid.  FIXED.
Group descriptor 14 checksum is invalid.  FIXED.
Group descriptor 15 checksum is invalid.  FIXED.
Group descriptor 16 checksum is invalid.  FIXED.
Group descriptor 17 checksum is invalid.  FIXED.
Group descriptor 18 checksum is invalid.  FIXED.
Group descriptor 19 checksum is invalid.  FIXED.
Group descriptor 20 checksum is invalid.  FIXED.
Group descriptor 21 checksum is invalid.  FIXED.
Group descriptor 22 checksum is invalid.  FIXED.
Group descriptor 23 checksum is invalid.  FIXED.
Group descriptor 24 checksum is invalid.  FIXED.
Group descriptor 25 checksum is invalid.  FIXED.
Group descriptor 26 checksum is invalid.  FIXED.
Group descriptor 27 checksum is invalid.  FIXED.
Group descriptor 28 checksum is invalid.  FIXED.
Group descriptor 29 checksum is invalid.  FIXED.
Group descriptor 30 checksum is invalid.  FIXED.
Group descriptor 31 checksum is invalid.  FIXED.
Group descriptor 32 checksum is invalid.  FIXED.
Group descriptor 33 checksum is invalid.  FIXED.
Group descriptor 34 checksum is invalid.  FIXED.
Group descriptor 35 checksum is invalid.  FIXED.
Group descriptor 36 checksum is invalid.  FIXED.
Group descriptor 37 checksum is invalid.  FIXED.
Group descriptor 38 checksum is invalid.  FIXED.
Group descriptor 39 checksum is invalid.  FIXED.
Group descriptor 40 checksum is invalid.  FIXED.
Group descriptor 41 checksum is invalid.  FIXED.
Group descriptor 42 checksum is invalid.  FIXED.
Group descriptor 43 checksum is invalid.  FIXED.
Group descriptor 44 checksum is invalid.  FIXED.
Group descriptor 45 checksum is invalid.  FIXED.
Group descriptor 46 checksum is invalid.  FIXED.
Group descriptor 47 checksum is invalid.  FIXED.
Group descriptor 48 checksum is invalid.  FIXED.
Group descriptor 49 checksum is invalid.  FIXED.
Group descriptor 50 checksum is invalid.  FIXED.
Group descriptor 51 checksum is invalid.  FIXED.
Group descriptor 52 checksum is invalid.  FIXED.
Group descriptor 53 checksum is invalid.  FIXED.
Group descriptor 54 checksum is invalid.  FIXED.
Group descriptor 55 checksum is invalid.  FIXED.
Group descriptor 56 checksum is invalid.  FIXED.
Group descriptor 57 checksum is invalid.  FIXED.
Group descriptor 58 checksum is invalid.  FIXED.
Group descriptor 59 checksum is invalid.  FIXED.
Group descriptor 60 checksum is invalid.  FIXED.
Group descriptor 61 checksum is invalid.  FIXED.
Group descriptor 62 checksum is invalid.  FIXED.
Group descriptor 63 checksum is invalid.  FIXED.
Group descriptor 64 checksum is invalid.  FIXED.
Group descriptor 65 checksum is invalid.  FIXED.
Group descriptor 66 checksum is invalid.  FIXED.
Group descriptor 67 checksum is invalid.  FIXED.
Group descriptor 68 checksum is invalid.  FIXED.
Group descriptor 69 checksum is invalid.  FIXED.
Group descriptor 70 checksum is invalid.  FIXED.
Group descriptor 71 checksum is invalid.  FIXED.
Group descriptor 72 

ext4: total breakdown on USB hdd, 3.0 kernel

2014-06-26 Thread Pavel Machek
Hi!

Ok, this ext4 filesystem does _not_ have easy life: it is in usb envelope, I 
wanted
to use it as a root filesystem, and it is connected to OLPC-1.75, running some 
kind
of linux-3.0 kernels.

So power disconnects are common, and even during regular reboot, I hear disk 
doing
emergency parking.

I don't know how barriers work over USB...

Plus the drive has physical bad blocks, but I attempted to mark them with fsck 
-c.

OTOH, it is just a root filesystem... and nothing above should prevent correct 
operation
(right?)

On last mount, it remounted itself read-only, so there's time for fsck, I 
guess...

But I believe this means I am going to lose all the data on the filesystem, 
right?

Any idea what could have happened? It looks like garbage written over the 
filesystem, right?

I'm using devicemapper on another partition (for encrypted ext4). I feel I lost 
that filesystem,
too, but without root filesystem, I can't check it easily.

Any idea what to do so that it does not repeat?

Should I switch to plain ext2?

Pavel

-bash-4.1# fsck /dev/sdc4 
fsck from util-linux-ng 2.18
e2fsck 1.41.12 (17-May-2010)
fsck.ext2: Superblock invalid, trying backup blocks...
Superblock has an invalid journal (inode 8).
Cleary? yes

*** ext3 journal has been deleted - filesystem is now ext2 only ***

One or more block group descriptor checksums are invalid.  Fixy? yes

Group descriptor 0 checksum is invalid.  FIXED.
Group descriptor 1 checksum is invalid.  FIXED.
Group descriptor 2 checksum is invalid.  FIXED.
Group descriptor 3 checksum is invalid.  FIXED.
Group descriptor 4 checksum is invalid.  FIXED.
Group descriptor 5 checksum is invalid.  FIXED.
Group descriptor 6 checksum is invalid.  FIXED.
Group descriptor 7 checksum is invalid.  FIXED.
Group descriptor 8 checksum is invalid.  FIXED.
Group descriptor 9 checksum is invalid.  FIXED.
Group descriptor 10 checksum is invalid.  FIXED.
Group descriptor 11 checksum is invalid.  FIXED.
Group descriptor 12 checksum is invalid.  FIXED.
Group descriptor 13 checksum is invalid.  FIXED.
Group descriptor 14 checksum is invalid.  FIXED.
Group descriptor 15 checksum is invalid.  FIXED.
Group descriptor 16 checksum is invalid.  FIXED.
Group descriptor 17 checksum is invalid.  FIXED.
Group descriptor 18 checksum is invalid.  FIXED.
Group descriptor 19 checksum is invalid.  FIXED.
Group descriptor 20 checksum is invalid.  FIXED.
Group descriptor 21 checksum is invalid.  FIXED.
Group descriptor 22 checksum is invalid.  FIXED.
Group descriptor 23 checksum is invalid.  FIXED.
Group descriptor 24 checksum is invalid.  FIXED.
Group descriptor 25 checksum is invalid.  FIXED.
Group descriptor 26 checksum is invalid.  FIXED.
Group descriptor 27 checksum is invalid.  FIXED.
Group descriptor 28 checksum is invalid.  FIXED.
Group descriptor 29 checksum is invalid.  FIXED.
Group descriptor 30 checksum is invalid.  FIXED.
Group descriptor 31 checksum is invalid.  FIXED.
Group descriptor 32 checksum is invalid.  FIXED.
Group descriptor 33 checksum is invalid.  FIXED.
Group descriptor 34 checksum is invalid.  FIXED.
Group descriptor 35 checksum is invalid.  FIXED.
Group descriptor 36 checksum is invalid.  FIXED.
Group descriptor 37 checksum is invalid.  FIXED.
Group descriptor 38 checksum is invalid.  FIXED.
Group descriptor 39 checksum is invalid.  FIXED.
Group descriptor 40 checksum is invalid.  FIXED.
Group descriptor 41 checksum is invalid.  FIXED.
Group descriptor 42 checksum is invalid.  FIXED.
Group descriptor 43 checksum is invalid.  FIXED.
Group descriptor 44 checksum is invalid.  FIXED.
Group descriptor 45 checksum is invalid.  FIXED.
Group descriptor 46 checksum is invalid.  FIXED.
Group descriptor 47 checksum is invalid.  FIXED.
Group descriptor 48 checksum is invalid.  FIXED.
Group descriptor 49 checksum is invalid.  FIXED.
Group descriptor 50 checksum is invalid.  FIXED.
Group descriptor 51 checksum is invalid.  FIXED.
Group descriptor 52 checksum is invalid.  FIXED.
Group descriptor 53 checksum is invalid.  FIXED.
Group descriptor 54 checksum is invalid.  FIXED.
Group descriptor 55 checksum is invalid.  FIXED.
Group descriptor 56 checksum is invalid.  FIXED.
Group descriptor 57 checksum is invalid.  FIXED.
Group descriptor 58 checksum is invalid.  FIXED.
Group descriptor 59 checksum is invalid.  FIXED.
Group descriptor 60 checksum is invalid.  FIXED.
Group descriptor 61 checksum is invalid.  FIXED.
Group descriptor 62 checksum is invalid.  FIXED.
Group descriptor 63 checksum is invalid.  FIXED.
Group descriptor 64 checksum is invalid.  FIXED.
Group descriptor 65 checksum is invalid.  FIXED.
Group descriptor 66 checksum is invalid.  FIXED.
Group descriptor 67 checksum is invalid.  FIXED.
Group descriptor 68 checksum is invalid.  FIXED.
Group descriptor 69 checksum is invalid.  FIXED.
Group descriptor 70 checksum is invalid.  FIXED.
Group descriptor 71 checksum is invalid.  FIXED.
Group descriptor 72 

Re: ext4: total breakdown on USB hdd, 3.0 kernel

2014-06-26 Thread Pavel Machek
Hi!

> Ok, this ext4 filesystem does _not_ have easy life: it is in usb envelope, I wanted
> to use it as a root filesystem, and it is connected to OLPC-1.75, running some kind
> of linux-3.0 kernels.
>
> So power disconnects are common, and even during regular reboot, I hear disk doing
> emergency parking.
>
> I don't know how barriers work over USB...
>
> Plus the drive has physical bad blocks, but I attempted to mark them with fsck -c.
>
> OTOH, it is just a root filesystem... and nothing above should prevent correct operation
> (right?)
>
> On last mount, it remounted itself read-only, so there's time for fsck, I guess...
>
> But I believe this means I am going to lose all the data on the filesystem, right?

It looks like the filesystem contains _way_ too many 0x's:

Inode 655221 has compression flag set on filesystem without compression support.  Clear<y>? yes

Inode 655221 has INDEX_FL flag set but is not a directory.
Clear HTree index<y>? yes

Inode 655221 should not have EOFBLOCKS_FL set (size 18446744073709551615, lblk -1)
Clear<y>? yes

Inode 655221, i_size is 18446744073709551615, should be 0.  Fix<y>? yes

Inode 655221, i_blocks is 281474976710655, should be 0.  Fix<y>? yes

Inode 655222 is in use, but has dtime set.  Fix<y>? yes

Inode 655222 has imagic flag set.  Clear<y>? yes

Inode 655222 has a extra size (65535) which is invalid
Fix<y>? yes

Inode 655222 has compression flag set on filesystem without compression support.  Clear<y>? yes

Inode 655222 has INDEX_FL flag set but is not a directory.
Clear HTree index<y>? yes

Inode 655222 should not have EOFBLOCKS_FL set (size 18446744073709551615, lblk -1)
Clear<y>? 
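
The sizes in the e2fsck output above look like all-ones bit patterns, which would be consistent with inode fields filled with 0xff bytes. A minimal sketch to check that, assuming only the three numbers copied from the transcript (the dictionary keys and the helper name all_ones_width are labels made up for this illustration, not e2fsprogs names):

```python
# Values copied from the e2fsck transcript above; the key names are
# informal labels for this sketch, not on-disk field names.
reported = {
    "i_size": 18446744073709551615,
    "i_blocks": 281474976710655,
    "extra size": 65535,
}


def all_ones_width(value):
    """Return n if value == 2**n - 1 (every bit set), otherwise None."""
    if value > 0 and value & (value + 1) == 0:
        return value.bit_length()
    return None


for name, value in reported.items():
    width = all_ones_width(value)
    print(f"{name}: {value:#x}, all-ones width: {width}")
```

If each value comes back with a width (64, 48 and 16 bits), the fields were saturated rather than randomly scrambled. That fits the impression of a region overwritten with a constant pattern, though it does not by itself say what wrote it.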

I saved the beginning of the filesystem using cat /dev/sdc4 | gzip -9 - > /dev/sda3, but
then ran out of patience. So there may be something for analysis, but...

Any ideas?

Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: ext4: total breakdown on USB hdd, 3.0 kernel

2014-06-26 Thread Pavel Machek
On Thu 2014-06-26 22:30:52, Pavel Machek wrote:
> Hi!
>
> > Ok, this ext4 filesystem does _not_ have easy life: it is in usb envelope, I wanted
> > to use it as a root filesystem, and it is connected to OLPC-1.75, running some kind
> > of linux-3.0 kernels.
> >
> > So power disconnects are common, and even during regular reboot, I hear disk doing
> > emergency parking.
> >
> > I don't know how barriers work over USB...
> >
> > Plus the drive has physical bad blocks, but I attempted to mark them with fsck -c.
> >
> > OTOH, it is just a root filesystem... and nothing above should prevent correct operation
> > (right?)
> >
> > On last mount, it remounted itself read-only, so there's time for fsck, I guess...
> >
> > But I believe this means I am going to lose all the data on the filesystem, right?
>
> It looks like the filesystem contains _way_ too many 0x's:
>
> Inode 655221 has compression flag set on filesystem without compression support.  Clear<y>? yes
>
> Inode 655221 has INDEX_FL flag set but is not a directory.
> Clear HTree index<y>? yes
...

And for every bug in kernel, there's one in fsck: I did not expect it, but fsck actually
succeeded, and marked fs as clean. But second fsck had issues with /lost+found...


-bash-4.1# fsck  /dev/sdc4 
fsck from util-linux-ng 2.18
e2fsck 1.41.12 (17-May-2010)
armroot: clean, 132690/985424 files, 1023715/3934116 blocks
-bash-4.1# fsck -f /dev/sdc4 
fsck from util-linux-ng 2.18
e2fsck 1.41.12 (17-May-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
'..' in /lost+found/#652090/auth-for-pavel-wzJd6X (17) is /lost+found (11), should be /lost+found/#652090 (652090).
Fix<y>? yes

Pass 4: Checking reference counts
Pass 5: Checking group summary information

armroot: ***** FILE SYSTEM WAS MODIFIED *****
armroot: 132690/985424 files (0.1% non-contiguous), 1023715/3934116 blocks
-bash-4.1# 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: ext4: total breakdown on USB hdd, 3.0 kernel

2014-06-26 Thread Theodore Ts'o
On Thu, Jun 26, 2014 at 10:30:52PM +0200, Pavel Machek wrote:
 
> It looks like the filesystem contains _way_ too many 0x's:

That sounds like it's a hardware issue.  It may be that the controller
did something insane while trying to do a write at the point when the
disk drive was disconnected (and so the drive suffered a power drop).

> I saved the beginning of the filesystem using cat /dev/sdc4 | gzip -9 - > /dev/sda3, but
> then ran out of patience. So there may be something for analysis, but...

The way to snapshot just the metadata blocks for analysis is:

e2image -r /dev/hdc4 - | bzip2 > ~/hdc4.e2i.bz2

But in this case, I doubt it will be very helpful, because
fundamentally, this appears to be a hardware issue.

   - Ted


Re: ext4: total breakdown on USB hdd, 3.0 kernel

2014-06-26 Thread Theodore Ts'o
On Thu, Jun 26, 2014 at 10:50:49PM +0200, Pavel Machek wrote:
 
> And for every bug in kernel, there's one in fsck: I did not expect it, but fsck actually
> succeeded, and marked fs as clean. But second fsck had issues with /lost+found...

I'd need the previous fsck transcript to have any idea what might have
happened.  I'll note, though, that you are using an ancient version of
e2fsck (1.41.12; there have been a huge number of bug fixes since
May 2010).

- Ted