Re: ext4: total breakdown on USB hdd, 3.0 kernel
On Mon, Jun 30, 2014 at 08:46:44AM +0200, Pavel Machek wrote:
> :-). Aha, and I misremembered, it was block descriptor checksums, not
> inode checksums:
>
> One or more block group descriptor checksums are invalid. Fix? yes
>
> Group descriptor 0 checksum is invalid. FIXED.
> Group descriptor 1 checksum is invalid. FIXED.
> Group descriptor 2 checksum is invalid. FIXED.
> Group descriptor 3 checksum is invalid. FIXED.

Yeah, what we should be doing here is to try the backup block group
descriptors and check to see if they are valid, and if so, use them
instead.

> I'm still trying to figure out what went wrong in the OLPC-1.75 + USB
> disk case.
>
> One possibility is that OLPC is unable to provide enough power from
> the two USB ports to power Seagate Momentus 5400.6, and that the hard
> drive fails to detect the brown-out and does something wrong. (Are
> SATA drives expected to work at 4.5V? Because that's what is
> guaranteed on USB, IIRC).

The USB spec seems to require 5V +/- 0.25V, which also seems to be the
spec on laptop drives. It wouldn't surprise me if the OLPC (or its
power adapter) is a bit dodgy under heavy load, though. It might be
useful for you to measure the voltage and amps delivered at the USB
ports

> Heavy corruption happened when I was charging the phone _and_ running
> the hard drive, from the OLPC. Now I have seen cases when OLPC crashed
> on device plug-in, in what looked like a brown-out...

and from the power brick to see if either is out of spec.

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
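The fallback Ted describes (prefer a copy whose checksum verifies over blindly "fixing" the checksum of the primary) can be sketched roughly as below. This is only an illustration of the idea, not e2fsck code: `pick_valid_copy` and the `Record` type are made-up names, and `zlib.crc32` is a stand-in for whatever checksum the filesystem really uses.

```python
import zlib
from typing import Optional, Sequence, Tuple

# Hypothetical representation: (payload bytes, checksum stored alongside it).
Record = Tuple[bytes, int]


def checksum(payload: bytes) -> int:
    # Stand-in only: this is NOT the checksum ext4 uses for group descriptors.
    return zlib.crc32(payload) & 0xFFFFFFFF


def pick_valid_copy(primary: Record, backups: Sequence[Record]) -> Optional[bytes]:
    """Return the first payload whose stored checksum verifies.

    The primary copy is tried first, then each backup in order.
    None means no copy verified, so the caller has to fall back to
    some other repair strategy.
    """
    for payload, stored in (primary, *backups):
        if checksum(payload) == stored:
            return payload
    return None
```

The point of the design is that a checksum mismatch is treated as "this copy is suspect", and the checksum is only rewritten as a last resort once no trustworthy copy is left.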
Re: ext4: total breakdown on USB hdd, 3.0 kernel
On Sun 2014-06-29 17:04:28, Theodore Ts'o wrote:
> On Sun, Jun 29, 2014 at 10:25:16PM +0200, Pavel Machek wrote:
> >
> > One more thing that I noticed: fsck notices bad checksum on inode, and
> > then offers to fix the checksum with 'y' being the default. If there's
> > trash in the inode, that will just induce more errors. (Including
> > potentially doubly-linked blocks?) Would it make more sense to clear
> > the inodes with bad checksums?
>
> Metadata checksums aren't in e2fsprogs 1.41 or 1.42. It will be in
> the to-be-released e2fsprogs 1.43, and yes, we need to change things
> so that the default answer is to zero the inode. We didn't do that
> initially because we were more suspicious of the new metadata checksum
> code in the kernel and e2fsprogs than we were of hardware faults.
> :-)

:-). Aha, and I misremembered, it was block descriptor checksums, not
inode checksums:

One or more block group descriptor checksums are invalid. Fix? yes

Group descriptor 0 checksum is invalid. FIXED.
Group descriptor 1 checksum is invalid. FIXED.
Group descriptor 2 checksum is invalid. FIXED.
Group descriptor 3 checksum is invalid. FIXED.

I'm still trying to figure out what went wrong in the OLPC-1.75 + USB
disk case.

One possibility is that OLPC is unable to provide enough power from
the two USB ports to power Seagate Momentus 5400.6, and that the hard
drive fails to detect the brown-out and does something wrong. (Are
SATA drives expected to work at 4.5V? Because that's what is
guaranteed on USB, IIRC).

Heavy corruption happened when I was charging the phone _and_ running
the hard drive, from the OLPC. Now I have seen cases when OLPC crashed
on device plug-in, in what looked like a brown-out...

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: ext4: total breakdown on USB hdd, 3.0 kernel
On Sun, Jun 29, 2014 at 10:25:16PM +0200, Pavel Machek wrote:
>
> One more thing that I noticed: fsck notices bad checksum on inode, and
> then offers to fix the checksum with 'y' being the default. If there's
> trash in the inode, that will just induce more errors. (Including
> potentially doubly-linked blocks?) Would it make more sense to clear
> the inodes with bad checksums?

Metadata checksums aren't in e2fsprogs 1.41 or 1.42. It will be in
the to-be-released e2fsprogs 1.43, and yes, we need to change things
so that the default answer is to zero the inode. We didn't do that
initially because we were more suspicious of the new metadata checksum
code in the kernel and e2fsprogs than we were of hardware faults. :-)

Cheers,

- Ted
Re: ext4: total breakdown on USB hdd, 3.0 kernel
Hi!

> > It looks like the filesystem contains _way_ too many 0x's:
>
> That sounds like it's a hardware issue. It may be that the controller
> did something insane while trying to do a write at the point when the
> disk drive was disconnected (and so the drive suffered a power
> drop).

Interesting. I tried to compare damaged image with the original, and
yes, way too many 0x. But they are not even block aligned? And they
start from byte 0... that area is not normally written, IIRC?

000
*
030 07ff
040
*
3f0
400 3e28 002d
410 fd57 000c
420
*
550
560
570 4ddb 0055
580
590 007e
5a0
*
5c0 682e 53ac
5d0 3a29 000a 0515 d144 002e
5e0 7865 3474 6d5f 7061 625f 6f6c 6b63 0073
5f0
600
*
0001000 41c0 03e9 1000 6133 53ac 6133 53ac

> > And for every bug in kernel, there's one in fsck: I did not expect it, but
> > fsck actually succeeded, and marked fs as clean. But second fsck had
> > issues with /lost+found...
>
> I'd need the previous fsck transcript to have any idea what might have
> happened. I'll note though you are using an ancient version of e2fsck
> (1.41.12, and there have been a huge number of bug fixes since
> May 2010)

Sorry for picking at fsck. No, it did quite a good job given
circumstances... and it probably does not make sense to debug old
version.

One more thing that I noticed: fsck notices bad checksum on inode, and
then offers to fix the checksum with 'y' being the default. If there's
trash in the inode, that will just induce more errors. (Including
potentially doubly-linked blocks?) Would it make more sense to clear
the inodes with bad checksums?

Thanks and best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
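For anyone wanting to repeat the damaged-versus-original comparison, here is a minimal sketch of one way to do it. It is not what Pavel ran. The function names and the 4096-byte block size are assumptions for illustration, and the byte-by-byte loop is only practical on small excerpts of an image. It lists each run of differing bytes, says whether the run starts and ends on a block boundary, and notes the fill byte when the whole run is one repeated value.

```python
from typing import Iterator, Optional, Tuple

BLOCK_SIZE = 4096  # assumed filesystem block size; adjust to the real one


def differing_runs(original: bytes, damaged: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) offsets of each maximal run where the images differ."""
    length = min(len(original), len(damaged))
    start: Optional[int] = None
    for offset in range(length):
        if original[offset] != damaged[offset]:
            if start is None:
                start = offset
        elif start is not None:
            yield start, offset
            start = None
    if start is not None:
        yield start, length


def describe_run(damaged: bytes, start: int, end: int,
                 block_size: int = BLOCK_SIZE) -> dict:
    """Summarise one differing run: alignment and (if uniform) its fill byte."""
    chunk = damaged[start:end]
    fill = chunk[0] if len(set(chunk)) == 1 else None
    return {
        "start": start,
        "end": end,
        "block_aligned": start % block_size == 0 and end % block_size == 0,
        "fill_byte": fill,
    }
```

Typical use would be to read the same offset range from both images (for example the first few megabytes) and print `describe_run(...)` for every pair returned by `differing_runs(...)`.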
Re: ext4: total breakdown on USB hdd, 3.0 kernel
On Thu, 2014-06-26 at 22:20 +0200, Pavel Machek wrote:
> Hi!
>
> Ok, this ext4 filesystem does _not_ have easy life: it is in usb
> envelope, I wanted to use it as a root filesystem, and it is connected
> to OLPC-1.75, running some kind of linux-3.0 kernels.
>
> So power disconnects are common, and even during regular reboot, I
> hear disk doing emergency parking.
>
> I don't know how barriers work over USB...

Just like with other SCSI devices.

HTH
Oliver
Re: ext4: total breakdown on USB hdd, 3.0 kernel
On Thu, Jun 26, 2014 at 10:50:49PM +0200, Pavel Machek wrote:
>
> And for every bug in kernel, there's one in fsck: I did not expect it, but
> fsck actually succeeded, and marked fs as clean. But second fsck had
> issues with /lost+found...

I'd need the previous fsck transcript to have any idea what might have
happened. I'll note though you are using an ancient version of e2fsck
(1.41.12, and there have been a huge number of bug fixes since
May 2010)

- Ted
Re: ext4: total breakdown on USB hdd, 3.0 kernel
On Thu, Jun 26, 2014 at 10:30:52PM +0200, Pavel Machek wrote:
>
> It looks like the filesystem contains _way_ too many 0x's:

That sounds like it's a hardware issue. It may be that the controller
did something insane while trying to do a write at the point when the
disk drive was disconnected (and so the drive suffered a power
drop).

> I saved beginning of the filesystem using
> cat /dev/sdc4 | gzip -9 - > /dev/sda3, but
> then ran out of patience. So there may be something for analysis, but...

The way to snapshot just the metadata blocks for analysis is:

e2image -r /dev/hdc4 - | bzip2 > ~/hdc4.e2i.bz2

But in this case, I doubt it will be very helpful, because
fundamentally, this appears to be a hardware issue.

- Ted
Re: ext4: total breakdown on USB hdd, 3.0 kernel
On Thu 2014-06-26 22:30:52, Pavel Machek wrote:
> Hi!
>
> > Ok, this ext4 filesystem does _not_ have easy life: it is in usb envelope,
> > I wanted to use it as a root filesystem, and it is connected to OLPC-1.75,
> > running some kind of linux-3.0 kernels.
> >
> > So power disconnects are common, and even during regular reboot, I hear
> > disk doing emergency parking.
> >
> > I don't know how barriers work over USB...
> >
> > Plus the drive has physical bad blocks, but I attempted to mark them with
> > fsck -c.
> >
> > OTOH, it is just a root filesystem... and nothing above should prevent
> > correct operation (right?)
> >
> > On last mount, it remounted itself read-only, so there's time for fsck, I
> > guess...
> >
> > But I believe this means I am going to lose all the data on the filesystem,
> > right?
>
> It looks like the filesystem contains _way_ too many 0x's:
>
> Inode 655221 has compression flag set on filesystem without compression
> support. Clear? yes
>
> Inode 655221 has INDEX_FL flag set but is not a directory.
> Clear HTree index? yes

...

And for every bug in kernel, there's one in fsck: I did not expect it, but
fsck actually succeeded, and marked fs as clean. But second fsck had issues
with /lost+found...

-bash-4.1# fsck /dev/sdc4
fsck from util-linux-ng 2.18
e2fsck 1.41.12 (17-May-2010)
armroot: clean, 132690/985424 files, 1023715/3934116 blocks
-bash-4.1# fsck -f /dev/sdc4
fsck from util-linux-ng 2.18
e2fsck 1.41.12 (17-May-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
'..' in /lost+found/#652090/auth-for-pavel-wzJd6X (17) is /lost+found (11), should be /lost+found/#652090 (652090).
Fix? yes

Pass 4: Checking reference counts
Pass 5: Checking group summary information

armroot: * FILE SYSTEM WAS MODIFIED *
armroot: 132690/985424 files (0.1% non-contiguous), 1023715/3934116 blocks
-bash-4.1#

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: ext4: total breakdown on USB hdd, 3.0 kernel
Hi!

> Ok, this ext4 filesystem does _not_ have easy life: it is in usb envelope,
> I wanted to use it as a root filesystem, and it is connected to OLPC-1.75,
> running some kind of linux-3.0 kernels.
>
> So power disconnects are common, and even during regular reboot, I hear
> disk doing emergency parking.
>
> I don't know how barriers work over USB...
>
> Plus the drive has physical bad blocks, but I attempted to mark them with
> fsck -c.
>
> OTOH, it is just a root filesystem... and nothing above should prevent
> correct operation (right?)
>
> On last mount, it remounted itself read-only, so there's time for fsck, I
> guess...
>
> But I believe this means I am going to lose all the data on the filesystem,
> right?

It looks like the filesystem contains _way_ too many 0x's:

Inode 655221 has compression flag set on filesystem without compression support. Clear? yes

Inode 655221 has INDEX_FL flag set but is not a directory.
Clear HTree index? yes

Inode 655221 should not have EOFBLOCKS_FL set (size 18446744073709551615, lblk -1)
Clear? yes

Inode 655221, i_size is 18446744073709551615, should be 0. Fix? yes

Inode 655221, i_blocks is 281474976710655, should be 0. Fix? yes

Inode 655222 is in use, but has dtime set. Fix? yes

Inode 655222 has imagic flag set. Clear? yes

Inode 655222 has a extra size (65535) which is invalid
Fix? yes

Inode 655222 has compression flag set on filesystem without compression support. Clear? yes

Inode 655222 has INDEX_FL flag set but is not a directory.
Clear HTree index? yes

Inode 655222 should not have EOFBLOCKS_FL set (size 18446744073709551615, lblk -1)
Clear?

I saved beginning of the filesystem using cat /dev/sdc4 | gzip -9 - > /dev/sda3, but
then ran out of patience. So there may be something for analysis, but...

Any ideas?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
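A side note on the numbers in that fsck output: the bogus i_size, i_blocks and extra-size values are each of the form 2^n - 1, meaning a field with every bit set. A quick check, as a sketch (the helper name `all_ones_width` is made up for illustration):

```python
from typing import Optional


def all_ones_width(value: int) -> Optional[int]:
    """Return n if value == 2**n - 1 (n one-bits and nothing else), else None."""
    if value > 0 and value & (value + 1) == 0:
        return value.bit_length()
    return None


# Values copied from the fsck messages for inodes 655221 and 655222.
reported = {
    "i_size": 18446744073709551615,
    "i_blocks": 281474976710655,
    "extra size": 65535,
}

for name, value in reported.items():
    print(f"{name}: {hex(value)} -> {all_ones_width(value)} bits, all set")
```

This shows the fields are 64, 48 and 16 bits of ones respectively, which would be consistent with those inodes having been overwritten with 0xff bytes. It says nothing by itself about what did the overwriting.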
ext4: total breakdown on USB hdd, 3.0 kernel
Hi!

Ok, this ext4 filesystem does _not_ have easy life: it is in usb envelope, I wanted
to use it as a root filesystem, and it is connected to OLPC-1.75, running some kind
of linux-3.0 kernels.

So power disconnects are common, and even during regular reboot, I hear disk doing
emergency parking.

I don't know how barriers work over USB...

Plus the drive has physical bad blocks, but I attempted to mark them with fsck -c.

OTOH, it is just a root filesystem... and nothing above should prevent correct operation
(right?)

On last mount, it remounted itself read-only, so there's time for fsck, I guess...

But I believe this means I am going to lose all the data on the filesystem, right?

Any idea what could have happened? It looks like garbage written over the filesystem, right?

I'm using devicemapper on another partition (for encrypted ext4). I feel I lost that
filesystem, too, but without root filesystem, I can't check it easily.

Any idea what to do so that it does not repeat? Should I switch to plain ext2?

Pavel

-bash-4.1# fsck /dev/sdc4
fsck from util-linux-ng 2.18
e2fsck 1.41.12 (17-May-2010)
fsck.ext2: Superblock invalid, trying backup blocks...
Superblock has an invalid journal (inode 8).
Clear? yes

*** ext3 journal has been deleted - filesystem is now ext2 only ***

One or more block group descriptor checksums are invalid. Fix? yes

Group descriptor 0 checksum is invalid. FIXED.
Group descriptor 1 checksum is invalid. FIXED.
Group descriptor 2 checksum is invalid. FIXED.
Group descriptor 3 checksum is invalid. FIXED.
Group descriptor 4 checksum is invalid. FIXED.
Group descriptor 5 checksum is invalid. FIXED.
Group descriptor 6 checksum is invalid. FIXED.
Group descriptor 7 checksum is invalid. FIXED.
Group descriptor 8 checksum is invalid. FIXED.
Group descriptor 9 checksum is invalid. FIXED.
Group descriptor 10 checksum is invalid. FIXED.
Group descriptor 11 checksum is invalid. FIXED.
Group descriptor 12 checksum is invalid. FIXED.
Group descriptor 13 checksum is invalid. FIXED.
Group descriptor 14 checksum is invalid. FIXED.
Group descriptor 15 checksum is invalid. FIXED.
Group descriptor 16 checksum is invalid. FIXED.
Group descriptor 17 checksum is invalid. FIXED.
Group descriptor 18 checksum is invalid. FIXED.
Group descriptor 19 checksum is invalid. FIXED.
Group descriptor 20 checksum is invalid. FIXED.
Group descriptor 21 checksum is invalid. FIXED.
Group descriptor 22 checksum is invalid. FIXED.
Group descriptor 23 checksum is invalid. FIXED.
Group descriptor 24 checksum is invalid. FIXED.
Group descriptor 25 checksum is invalid. FIXED.
Group descriptor 26 checksum is invalid. FIXED.
Group descriptor 27 checksum is invalid. FIXED.
Group descriptor 28 checksum is invalid. FIXED.
Group descriptor 29 checksum is invalid. FIXED.
Group descriptor 30 checksum is invalid. FIXED.
Group descriptor 31 checksum is invalid. FIXED.
Group descriptor 32 checksum is invalid. FIXED.
Group descriptor 33 checksum is invalid. FIXED.
Group descriptor 34 checksum is invalid. FIXED.
Group descriptor 35 checksum is invalid. FIXED.
Group descriptor 36 checksum is invalid. FIXED.
Group descriptor 37 checksum is invalid. FIXED.
Group descriptor 38 checksum is invalid. FIXED.
Group descriptor 39 checksum is invalid. FIXED.
Group descriptor 40 checksum is invalid. FIXED.
Group descriptor 41 checksum is invalid. FIXED.
Group descriptor 42 checksum is invalid. FIXED.
Group descriptor 43 checksum is invalid. FIXED.
Group descriptor 44 checksum is invalid. FIXED.
Group descriptor 45 checksum is invalid. FIXED.
Group descriptor 46 checksum is invalid. FIXED.
Group descriptor 47 checksum is invalid. FIXED.
Group descriptor 48 checksum is invalid. FIXED.
Group descriptor 49 checksum is invalid. FIXED.
Group descriptor 50 checksum is invalid. FIXED.
Group descriptor 51 checksum is invalid. FIXED.
Group descriptor 52 checksum is invalid. FIXED.
Group descriptor 53 checksum is invalid. FIXED.
Group descriptor 54 checksum is invalid. FIXED.
Group descriptor 55 checksum is invalid. FIXED.
Group descriptor 56 checksum is invalid. FIXED.
Group descriptor 57 checksum is invalid. FIXED.
Group descriptor 58 checksum is invalid. FIXED.
Group descriptor 59 checksum is invalid. FIXED.
Group descriptor 60 checksum is invalid. FIXED.
Group descriptor 61 checksum is invalid. FIXED.
Group descriptor 62 checksum is invalid. FIXED.
Group descriptor 63 checksum is invalid. FIXED.
Group descriptor 64 checksum is invalid. FIXED.
Group descriptor 65 checksum is invalid. FIXED.
Group descriptor 66 checksum is invalid. FIXED.
Group descriptor 67 checksum is invalid. FIXED.
Group descriptor 68 checksum is invalid. FIXED.
Group descriptor 69 checksum is invalid. FIXED.
Group descriptor 70 checksum is invalid. FIXED.
Group descriptor 71 checksum is invalid. FIXED.
Group descriptor 72
ext4: total breakdown on USB hdd, 3.0 kernel
Hi! Ok, this ext4 filesystem does _not_ have easy life: it is in usb envelope, I wanted to use it as a root filesystem, and it is connected to OLPC-1.75, running some kind of linux-3.0 kernels. So power disconnects are common, and even during regular reboot, I hear disk doing emergency parking. I don't know how barriers work over USB... Plus the drive has physical bad blocks, but I attempted to mark them with fsck -c. OTOH, it is just a root filesystem... and nothing above should prevent correct operation (right?) On last mount, it remounted itself read-only, so there's time for fsck, I guess... But I believe this means I am going to lose all the data on the filesystem, right? Any idea what could have happened? It looks like garbage written over the filesystem, right? I'm using devicemapper on another partition (for encrypted ext4). I feel I lost that filesystem, too, but without root filesystem, I can't check it easily. Any idea what to do so that it does not repeat? Should I switch to plain ext2? Pavel -bash-4.1# fsck /dev/sdc4 fsck from util-linux-ng 2.18 e2fsck 1.41.12 (17-May-2010) fsck.ext2: Superblock invalid, trying backup blocks... Superblock has an invalid journal (inode 8). Cleary? yes *** ext3 journal has been deleted - filesystem is now ext2 only *** One or more block group descriptor checksums are invalid. Fixy? yes Group descriptor 0 checksum is invalid. FIXED. Group descriptor 1 checksum is invalid. FIXED. Group descriptor 2 checksum is invalid. FIXED. Group descriptor 3 checksum is invalid. FIXED. Group descriptor 4 checksum is invalid. FIXED. Group descriptor 5 checksum is invalid. FIXED. Group descriptor 6 checksum is invalid. FIXED. Group descriptor 7 checksum is invalid. FIXED. Group descriptor 8 checksum is invalid. FIXED. Group descriptor 9 checksum is invalid. FIXED. Group descriptor 10 checksum is invalid. FIXED. Group descriptor 11 checksum is invalid. FIXED. Group descriptor 12 checksum is invalid. FIXED. 
Group descriptor 13 checksum is invalid. FIXED. Group descriptor 14 checksum is invalid. FIXED. Group descriptor 15 checksum is invalid. FIXED. Group descriptor 16 checksum is invalid. FIXED. Group descriptor 17 checksum is invalid. FIXED. Group descriptor 18 checksum is invalid. FIXED. Group descriptor 19 checksum is invalid. FIXED. Group descriptor 20 checksum is invalid. FIXED. Group descriptor 21 checksum is invalid. FIXED. Group descriptor 22 checksum is invalid. FIXED. Group descriptor 23 checksum is invalid. FIXED. Group descriptor 24 checksum is invalid. FIXED. Group descriptor 25 checksum is invalid. FIXED. Group descriptor 26 checksum is invalid. FIXED. Group descriptor 27 checksum is invalid. FIXED. Group descriptor 28 checksum is invalid. FIXED. Group descriptor 29 checksum is invalid. FIXED. Group descriptor 30 checksum is invalid. FIXED. Group descriptor 31 checksum is invalid. FIXED. Group descriptor 32 checksum is invalid. FIXED. Group descriptor 33 checksum is invalid. FIXED. Group descriptor 34 checksum is invalid. FIXED. Group descriptor 35 checksum is invalid. FIXED. Group descriptor 36 checksum is invalid. FIXED. Group descriptor 37 checksum is invalid. FIXED. Group descriptor 38 checksum is invalid. FIXED. Group descriptor 39 checksum is invalid. FIXED. Group descriptor 40 checksum is invalid. FIXED. Group descriptor 41 checksum is invalid. FIXED. Group descriptor 42 checksum is invalid. FIXED. Group descriptor 43 checksum is invalid. FIXED. Group descriptor 44 checksum is invalid. FIXED. Group descriptor 45 checksum is invalid. FIXED. Group descriptor 46 checksum is invalid. FIXED. Group descriptor 47 checksum is invalid. FIXED. Group descriptor 48 checksum is invalid. FIXED. Group descriptor 49 checksum is invalid. FIXED. Group descriptor 50 checksum is invalid. FIXED. Group descriptor 51 checksum is invalid. FIXED. Group descriptor 52 checksum is invalid. FIXED. Group descriptor 53 checksum is invalid. FIXED. 
Group descriptor 54 checksum is invalid. FIXED. Group descriptor 55 checksum is invalid. FIXED. Group descriptor 56 checksum is invalid. FIXED. Group descriptor 57 checksum is invalid. FIXED. Group descriptor 58 checksum is invalid. FIXED. Group descriptor 59 checksum is invalid. FIXED. Group descriptor 60 checksum is invalid. FIXED. Group descriptor 61 checksum is invalid. FIXED. Group descriptor 62 checksum is invalid. FIXED. Group descriptor 63 checksum is invalid. FIXED. Group descriptor 64 checksum is invalid. FIXED. Group descriptor 65 checksum is invalid. FIXED. Group descriptor 66 checksum is invalid. FIXED. Group descriptor 67 checksum is invalid. FIXED. Group descriptor 68 checksum is invalid. FIXED. Group descriptor 69 checksum is invalid. FIXED. Group descriptor 70 checksum is invalid. FIXED. Group descriptor 71 checksum is invalid. FIXED. Group descriptor 72
Re: ext4: total breakdown on USB hdd, 3.0 kernel
Hi! Ok, this ext4 filesystem does _not_ have easy life: it is in usb envelope, I wanted to use it as a root filesystem, and it is connected to OLPC-1.75, running some kind of linux-3.0 kernels. So power disconnects are common, and even during regular reboot, I hear disk doing emergency parking. I don't know how barriers work over USB... Plus the drive has physical bad blocks, but I attempted to mark them with fsck -c. OTOH, it is just a root filesystem... and nothing above should prevent correct operation (right?) On last mount, it remounted itself read-only, so there's time for fsck, I guess... But I believe this means I am going to lose all the data on the filesystem, right? It looks like the filesystem contains _way_ too many 0x's: Inode 655221 has compression flag set on filesystem without compression support. Cleary? yes Inode 655221 has INDEX_FL flag set but is not a directory. Clear HTree indexy? yes Inode 655221 should not have EOFBLOCKS_FL set (size 18446744073709551615, lblk -1) Cleary? yes Inode 655221, i_size is 18446744073709551615, should be 0. Fixy? yes Inode 655221, i_blocks is 281474976710655, should be 0. Fixy? yes Inode 655222 is in use, but has dtime set. Fixy? yes Inode 655222 has imagic flag set. Cleary? yes Inode 655222 has a extra size (65535) which is invalid Fixy? yes Inode 655222 has compression flag set on filesystem without compression support. Cleary? yes Inode 655222 has INDEX_FL flag set but is not a directory. Clear HTree indexy? yes Inode 655222 should not have EOFBLOCKS_FL set (size 18446744073709551615, lblk -1) Cleary? I saved beggining of the filesystem using cat /dev/sdc4 | gzip -9 - /dev/sda3, but then ran out of patience. So there may be something for analysis, but... Any ideas? 
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
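The sizes e2fsck printed in the message above are not random garbage. A minimal sketch of the arithmetic, assuming only the three numbers quoted in the transcript (the dictionary keys and the helper name `all_ones_width` are labels made up for this illustration, not e2fsprogs identifiers): each value is of the form 2**n - 1, that is, a field with every bit set.

```python
# Values e2fsck reported for inodes 655221 and 655222 in the transcript.
# The key names are illustrative labels, not e2fsprogs identifiers.
reported = {
    "i_size": 18446744073709551615,
    "i_blocks": 281474976710655,
    "extra_size": 65535,
}


def all_ones_width(value):
    """Return n if value == 2**n - 1 (every bit set), otherwise None."""
    if value > 0 and value & (value + 1) == 0:
        return value.bit_length()
    return None


# Map each reported field to the width of its all-ones pattern.
widths = {name: all_ones_width(value) for name, value in reported.items()}

for name, value in reported.items():
    print(f"{name}: {value:#x} (all-ones width: {widths[name]})")
```

All three fields come out as all-ones patterns (64, 48 and 16 bits wide), which would be consistent with part of the inode table having been overwritten with 0xff bytes. That is only a plausible reading of the numbers; it says nothing about what caused the overwrite.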
Re: ext4: total breakdown on USB hdd, 3.0 kernel
On Thu 2014-06-26 22:30:52, Pavel Machek wrote:
> Hi!
>
> Ok, this ext4 filesystem does _not_ have an easy life: it is in a USB enclosure, I wanted to use it as a root filesystem, and it is connected to an OLPC-1.75 running some kind of linux-3.0 kernel. So power disconnects are common, and even during a regular reboot, I hear the disk doing emergency parking. I don't know how barriers work over USB...
>
> Plus the drive has physical bad blocks, but I attempted to mark them with fsck -c.
>
> OTOH, it is just a root filesystem... and nothing above should prevent correct operation (right?)
>
> On last mount, it remounted itself read-only, so there's time for fsck, I guess... But I believe this means I am going to lose all the data on the filesystem, right?
>
> It looks like the filesystem contains _way_ too many 0x's:
>
> Inode 655221 has compression flag set on filesystem without compression support.
> Clear<y>? yes
> Inode 655221 has INDEX_FL flag set but is not a directory.
> Clear HTree index<y>? yes

...

And for every bug in the kernel, there's one in fsck: I did not expect it, but fsck actually succeeded, and marked the fs as clean. But the second fsck had issues with /lost+found...

-bash-4.1# fsck /dev/sdc4
fsck from util-linux-ng 2.18
e2fsck 1.41.12 (17-May-2010)
armroot: clean, 132690/985424 files, 1023715/3934116 blocks
-bash-4.1# fsck -f /dev/sdc4
fsck from util-linux-ng 2.18
e2fsck 1.41.12 (17-May-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
'..' in /lost+found/#652090/auth-for-pavel-wzJd6X (17) is /lost+found (11), should be /lost+found/#652090 (652090).
Fix<y>? yes
Pass 4: Checking reference counts
Pass 5: Checking group summary information

armroot: ***** FILE SYSTEM WAS MODIFIED *****
armroot: 132690/985424 files (0.1% non-contiguous), 1023715/3934116 blocks
-bash-4.1#

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: ext4: total breakdown on USB hdd, 3.0 kernel
On Thu, Jun 26, 2014 at 10:30:52PM +0200, Pavel Machek wrote:
> It looks like the filesystem contains _way_ too many 0x's:

That sounds like it's a hardware issue. It may be that the controller did something insane while trying to do a write at the point when the disk drive was disconnected (and so the drive suffered a power drop).

> I saved the beginning of the filesystem using cat /dev/sdc4 | gzip -9 - > /dev/sda3, but then ran out of patience. So there may be something for analysis, but...

The way to snapshot just the metadata blocks for analysis is:

    e2image -r /dev/hdc4 - | bzip2 > ~/hdc4.e2i.bz2

But in this case, I doubt it will be very helpful, because fundamentally, this appears to be a hardware issue.

- Ted
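For anyone who wants to script the metadata snapshot rather than type the pipeline by hand, here is a rough sketch in Python. It assumes e2image and bzip2 are installed and follows the e2image(8) convention that an image file name of "-" means standard output. The function names `snapshot_commands` and `snapshot_metadata` are invented for this example, and the device and output path are whatever the caller passes in.

```python
import subprocess


def snapshot_commands(device):
    """Return the argv lists for the two pipeline stages; nothing is run."""
    # "-r" requests a raw image; "-" makes e2image write it to stdout.
    return ["e2image", "-r", device, "-"], ["bzip2"]


def snapshot_metadata(device, out_path):
    """Run the equivalent of: e2image -r DEVICE - | bzip2 > OUT_PATH.

    Needs read access to the block device, so usually root.
    """
    dump_cmd, compress_cmd = snapshot_commands(device)
    with open(out_path, "wb") as out:
        dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
        compress = subprocess.Popen(compress_cmd, stdin=dump.stdout, stdout=out)
        # Close our copy of the pipe so e2image gets SIGPIPE if bzip2 dies.
        dump.stdout.close()
        compress.wait()
        dump.wait()
    if dump.returncode != 0 or compress.returncode != 0:
        raise RuntimeError(
            f"snapshot failed: e2image={dump.returncode}, bzip2={compress.returncode}"
        )
```

A call would look like snapshot_metadata("/dev/sdc4", "sdc4.e2i.bz2"). Only filesystem metadata ends up in the image, so it is far smaller than the full-device copy attempted earlier in the thread.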
Re: ext4: total breakdown on USB hdd, 3.0 kernel
On Thu, Jun 26, 2014 at 10:50:49PM +0200, Pavel Machek wrote:
> And for every bug in the kernel, there's one in fsck: I did not expect it, but fsck actually succeeded, and marked the fs as clean. But the second fsck had issues with /lost+found...

I'd need the previous fsck transcript to have any idea what might have happened. I'll note, though, that you are using an ancient version of e2fsck (1.41.12, and there have been a huge number of bug fixes since May 2010).

- Ted