Re: [gentoo-amd64] Re: Disable fsck on boot, was: How to make watchdog start earlier during bootup?
Duncan wrote:
> I'd blame that on your choice of RAID (and ultimately on the defective hardware, but it wouldn't have been as bad on RAID-1 or RAID-6), more than on what was running on top of it.

Agreed - RAID-6 would have helped in this particular circumstance (assuming I didn't lose more than one drive). The non-server hardware still was a big issue. I'm not sure I'd ever go with RAID-6 for personal use - that is a lot of money in non-useful drives.

> What I'd guess happened is that the dirty/degraded crash happened while the set of stripes that also had the LVM2 record was being written, altho it wasn't necessarily the LVM data itself being written, but just something that happened to be in the same stripe set so the checksum covering it had to be rewritten as well. It's also possible the hardware error you mentioned was affecting the reliability of what the spindle returned even when it didn't cause resets. In that case, even if the data was on a different stripe, the resulting checksum written could end up invalid, thus playing havoc with a recovery.

Sounds likely. I think the lvm2 metadata got corrupted. I'm a big fan of zfs and btrfs (once they're production ready) precisely because they try to address the RAID stripe problem with copy-on-write right down to the physical level.

> data=ordered is the middle ground and I believe what ext3 has always defaulted to, and what reiserfs has defaulted to for years.

Yup - using ordered data. From a metadata integrity standpoint I believe this has been shown to be equivalent to data=journal. As you point out, once lvm was hosed that didn't help much.

> Lucky, or more appropriately wise you! There aren't so many folks that backup to normally offline external device that regularly. Honestly, I don't.

Yeah - I've learned that lesson over time the hard way. I can't back up everything (at least not without a big investment), but I do use dar and par2 to back up everything important.

I just create a dar backup weekly, and then run a script on a laptop to copy the data offline. I don't back up anything that requires snapshots (I use a cron job to do a mysql export separately and back that up), so that works fine for me. This is really just my high value data - when my system was hosed I had to reinstall from stage3, but I had all my /etc config files so getting up and running didn't take a huge amount of effort. However, I did learn the hard way that some programs store their actual config files in /var and symlink them into /etc - be sure to catch those in your backups! (I ended up having my samba domain controller SID change, which was a headache since now all my usernames don't have their old permissions on all my XP workstations.) Granted, this is a house with all of four users, which helped with the cleanup.

> So... I guess that's something else I can add to my list now, for the next time I setup a new disk set or whatever. To the everything-portage-touches-on-root that I explained in the other replies, and the RAID-6 that I had already chosen over RAID-5, I can now add to the list killing the LVM2 used in my current setup.

If you have RAID-6 I'm not sure it is worth worrying about getting rid of LVM2. At least, assuming you don't start having multiple-drive failures (a possibility with desktop hardware with all the drives sharing the same power cords, interfaces, etc).

If you want to think really long term take a look at btrfs. It looks like it aims to be everything that zfs is (minus the GPL-incompatible license). Definitely not ready for prime time, but the proposed feature set looks better than zfs. I don't like the inability to reshape zfs - you can add more arrays to your system, but you can't add one drive to an existing array (online or offline). Btrfs seems to aim to be able to do this. Again, it is completely experimental at this point - don't use it except to try it out.

It will be possible to migrate ext3/4 directly in-place to btrfs, and even reverse the migration (minus any changes - it essentially snapshots the existing data). The only limitation is that if you delete files you won't get the space back until you get rid of the ability to migrate back to ext3 (since it is a snapshot).
Re: [gentoo-amd64] Re: Disable fsck on boot, was: How to make watchdog start earlier during bootup?
Richard Freeman, mused, then expounded:
> Duncan wrote:
>> I'd blame that on your choice of RAID (and ultimately on the defective hardware, but it wouldn't have been as bad on RAID-1 or RAID-6), more than on what was running on top of it.
> Agree - RAID-6 would have helped in this particular circumstance (assuming I didn't lose more than one drive). The non-server hardware still was a big issue. I'm not sure I'd ever go with RAID-6 for personal use - that is a lot of money in non-useful drives.

FWIW - Raid 1 is more reliable than any other raid configuration as long as it can provide the needed storage and bandwidth. Raid 5, 6, 10 all require enterprise class drives. Otherwise the failure rates and the required backups mean a lot more administrative overhead - including stocking spares.

Bob
Re: [gentoo-amd64] Re: Disable fsck on boot, was: How to make watchdog start earlier during bootup?
Duncan wrote:
> 32 megs? That's small! How old is it? I looked at that and thought to myself typo, he must have meant gigs, but then I saw the below...

It's very tiny, I'll admit. It came with a camera I bought about four months ago. Needless to say, this is the first time it's ever been used for anything; I keep a 2GB SD card in said camera.

> Yes, it must have been 32 meg. reiserfs uses a 4k blocksize by default, and as it mentions, an 8193 block journal, again by default. That's about... 32 MB. So yes indeed, you'd have problems fitting that on a 32 MB device -- it'd be all journal! There are parameters you can add to mkfs.reiserfs/mkreiserfs to change both the block size (-b, 512 byte to 8k, 4k default) and the journal size (-s, 513-32749 blocks, default 8193), thus yielding a minimum journal size of 256.5kb which would have easily fit, but you'd not be expected to know that since you don't run it routinely, and even many who do probably don't know it.

I'll do some playing around with that this weekend and see what happens.

> be written, I don't believe reiserfs is particularly suitable for flash based media.

I'm inclined to agree with you; this card was just used for the purpose of improperly unmounting LUKS filesystems because it was laying around. I could just as easily have used a file system image or an external hard drive from the corner store.

--
The Doctor [412/724/301/703]
PGP: 0x807B17C1 / 7960 1CDC 85C9 0B63 8D9F DD89 3BD8 FF2B 807B 17C1
WWW: http://drwho.virtadpt.net/
...and that is how we know the earth is banana-shaped.
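The journal arithmetic quoted above is easy to check by hand. The sketch below uses only the numbers given in the thread (8193 or 513 journal blocks, 4k or 512 byte blocks, an offset of 18 and a device of 7456 blocks from the mkreiserfs error). The function names are made up for illustration, and the fit test is a simplifying assumption, not the actual logic inside mkreiserfs.

```shell
#!/bin/sh
# Journal size in bytes: number of journal blocks times block size.
journal_bytes() {
    echo $(( $1 * $2 ))
}

# Assumed fit rule, modelled only on the wording of the error message:
# journal blocks plus the journal offset must be less than the device size.
# Arguments: journal blocks, offset in blocks, device size in blocks.
journal_fits() {
    [ $(( $1 + $2 )) -lt "$3" ]
}

echo "default journal (8193 x 4096): $(journal_bytes 8193 4096) bytes"
echo "minimum journal (513 x 512):   $(journal_bytes 513 512) bytes"

if journal_fits 8193 18 7456; then
    echo "default journal fits on 7456 blocks"
else
    echo "default journal does not fit on 7456 blocks"
fi
```

The default works out to 33558528 bytes, a little over 32 MiB, while the card's mapped device is only 7456 blocks of 4k, so the default journal alone is larger than the whole device. The minimum of 513 blocks at 512 bytes is 262656 bytes, which matches the 256.5kb figure.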
Re: [gentoo-amd64] Re: Disable fsck on boot, was: How to make watchdog start earlier during bootup?
Beso wrote:
> 2009/1/21 The Doctor dr...@virtadpt.net:
>> Duncan wrote:
>>>> and, if you have experiences with it, do you know what could happen without fsck on an unsafely unmounted luks partition?
>>> Luks I know nothing of. Someday when I get the appropriate round tuit...
>> I'm using LUKS on a few of my systems, though not with ReiserFS (for reasons of secure data deletion). I could run a few tests later using image files or USB keys and let you know how it goes.
> that would be much appreciated.

I don't have too much experience with luks, but I have used crypto-loop. I would think that the risks to a non-fsck'ed unclean filesystem would be the same with or without the underlying encryption. However, if you managed to hose your filesystem the extra layer of encryption certainly wouldn't make it easier to rescue should you attempt to do so. If you had a big enough partition lying around you could in theory just dd if=/dev/loop# of=/dev/dest to create an unencrypted copy - which would then be hosed in the same way as it would have been if the encryption weren't there.

All of this assumes that luks contains no bugs. If the encryption layer botches your data all bets are off. That happened to me with lvm - I managed to hose half my system that way (an fsck on one logical volume managed to hose all the other logical volumes in the same volume group). It is a rare problem, but I'm now just running on bare md devices (and just running on md gives me some options for expanding storage later).
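The dd idea mentioned above can be sketched as a small helper. This is only an illustration: the function name is invented, the device names in the comment are the placeholders from the message, and nothing here has been run against a real encrypted device. It reads from the already-opened (decrypted) mapping, so the copy is plaintext and carries over whatever filesystem damage the source has.

```shell
#!/bin/sh
# clone_device SRC DST
# Raw, block-for-block copy of SRC onto DST using dd.
# SRC is expected to be the decrypted view of the volume (a loop or
# device-mapper node), so DST ends up unencrypted.
clone_device() {
    dd if="$1" of="$2" bs=64k 2>/dev/null
}

# Hypothetical use, as root, with the placeholder names from the message:
#   clone_device /dev/loop# /dev/dest
# Recovery tools can then be tried on the copy, leaving the original alone.
```

Working on a copy means a failed rescue attempt does not make the original any worse.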
Re: [gentoo-amd64] Re: Disable fsck on boot, was: How to make watchdog start earlier during bootup?
Duncan wrote:
>> and, if you have experiences with it, do you know what could happen without fsck on an unsafely unmounted luks partition?
> Luks I know nothing of. Someday when I get the appropriate round tuit...

I just gave it a try with a 32 megabyte SD card on my laptop. I set up the LUKS volume like so:

    r...@windbringer ~:# cryptsetup -v -y -c aes-cbc-essiv:sha256 luksFormat /dev/mmcblk0p1

...opened it:

    r...@windbringer ~:# cryptsetup luksOpen /dev/mmcblk0p1 test

...but I wasn't able to create a ReiserFS file system:

    r...@windbringer ~:# mkreiserfs -d /dev/mapper/test
    [credits from mkreiserfs snipped]
    Guessing about desired format.. Kernel 2.6.28 is running.
    reiserfs_create_journal: cannot create a journal of 8193 blocks with 18 offset on 7456 blocks

Oops. Too small a device, I guess. So, I tried EXT3:

    r...@windbringer ~:# mkfs.ext3 -c -L test -v /dev/mapper/test
    ...
    r...@windbringer ~:# mount /dev/mapper/test /mnt/disk_image

To see what would happen, I copied a load of .jpg files over to the card so there would be a test data set:

    r...@windbringer ~:# cp ~drwho/*.jpg /mnt/disk_image

Then I yanked the SD card out of the slot without unmounting it or running the luksClose command of cryptsetup. Much to my surprise, nothing complained about it. Upon reinserting it, however, the following appeared in the kernel message buffer:

    ...
    mmc0: card cb93 removed
    Buffer I/O error on device dm-3, logical block 539
    lost page write due to I/O error on dm-3
    Aborting journal on device dm-3.
    Buffer I/O error on device dm-3, logical block 369
    lost page write due to I/O error on dm-3
    journal commit I/O error
    mmc0: new SD card at address cb93
    mmcblk1: mmc0:cb93 S032B 29.6 MiB
     mmcblk1: p1

Nautilus prompted me to enter the LUKS passphrase to open the device; checking the kernel message buffer showed that the journal recovery was complete, and the volume could be properly read. I checked the files and none of them were corrupted.

Next test: delete a few files, don't run sync(1), yank the card, and try again. Kernel output:

    ...
    mmc0: card cb93 removed
    Buffer I/O error on device dm-4, logical block 419
    lost page write due to I/O error on dm-4
    Aborting journal on device dm-4.
    Buffer I/O error on device dm-4, logical block 369
    lost page write due to I/O error on dm-4
    journal commit I/O error
    ext3_abort called.
    EXT3-fs error (device dm-4): ext3_put_super: Couldn't clean up the journal
    Remounting filesystem read-only

After plugging the card back in:

    ...
    mmc0: new SD card at address cb93
    mmcblk1: mmc0:cb93 S032B 29.6 MiB
     mmcblk1: p1
    kjournald starting. Commit interval 5 seconds
    EXT3 FS on dm-4, internal journal
    EXT3-fs: recovery complete.
    EXT3-fs: mounted filesystem with ordered data mode.

The deletions that took place before I pulled the card seemed to stay deleted when the journal was replayed. I did this four more times to see what would happen, and I didn't encounter any file system corruption.

I haven't tried any other file systems yet, but if there is sufficient interest I'll start walking through the contents of `cat /proc/filesystems | grep -v ^nodev`

--
The Doctor [412/724/301/703]
PGP: 0x807B17C1 / 7960 1CDC 85C9 0B63 8D9F DD89 3BD8 FF2B 807B 17C1
WWW: http://drwho.virtadpt.net/
My faith protects me. My Kevlar helps. --Michael Carpenter, _Death Masks_
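For anyone wanting to repeat the walk through /proc/filesystems, here is a minimal sketch of the filtering step. The function name is invented; it assumes the usual two-column layout of /proc/filesystems, where filesystems that do not need a block device are marked with a leading "nodev". It reads from standard input so it can be tried on sample text as well as on the real file.

```shell
#!/bin/sh
# Print the names of filesystems that need a block device, i.e. the
# entries in /proc/filesystems-style input that are not marked "nodev".
list_block_filesystems() {
    awk '$1 != "nodev" { print $1 }'
}

# Typical use on a Linux system:
#   list_block_filesystems < /proc/filesystems
# Each name printed is a candidate for the same format, mount,
# yank-the-card and remount test.
```

Only filesystems the running kernel currently has available show up, so modules that are not loaded will be missing from the list.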
Re: [gentoo-amd64] Re: Disable fsck on boot, was: How to make watchdog start earlier during bootup?
2009/1/22 Duncan 1i5t5.dun...@cox.net:
> Richard Freeman ri...@gentoo.org posted 49789d9c.7040...@gentoo.org, excerpted below, on Thu, 22 Jan 2009 11:23:56 -0500:
>> All of this assumes that luks contains no bugs. If the encryption layer botches your data all bets are off. That happened to me with lvm - I managed to hose half my system that way (an fsck on one logical volume managed to hose all the other logical volumes in the same volume group). It is a rare problem, but I'm now just running on bare md devices (and just running on md gives me some options for expanding storage later).
> Hmm, interesting. I run my main system and a backup image of same direct on partitioned mdp/RAID (RAID doesn't cure the fat-finger or botched upgrade problem, that's what the backup image is for), so I have all my applications available without lvm, but I use lvm2 on top of RAID for most of my data partitions and their backups. I've never had a problem with that using reiserfs on lvm2 on RAID-6 here, nor have I heard of anyone else having that sort of problems with lvm, at least not since the lvm2 era. The problems I've had with LVM are simply its inconvenience and administration complexity when there are layers on layers, since there's no way to put / on it without using an initramfs/initrd, which I didn't want to use. The partitioned RAID is nice in that regard since the kernel can handle that directly. If I were to redo it today, I'd consider eliminating the LVM2 layer for the data for that reason alone.

Well, I think the lvm2 layer is still good even when used on a single disk, especially when you don't know in advance how the partitions will need to look. I've saved a lot of time by resizing an lvm2 array rather than copying, removing partitions, recreating them and then recopying files into the new ones.

As for /, I'm considering putting / and /boot on a USB disk (booting from USB devices is painless nowadays), which would keep me from exposing the encrypted luks data. It's true that losing the key would mean a total disaster, but it's simpler to have 2-3 2 GB USB keys (about 20-30€) as root and have an entire luks+raided partition. If you were to go even further, putting the authentication key for the encrypted partition on another external USB key would be even more secure.

--
dott. ing. beso
Re: [gentoo-amd64] Re: Disable fsck on boot, was: How to make watchdog start earlier during bootup?
2009/1/22 The Doctor dr...@virtadpt.net:
> I just gave it a try with a 32 megabyte SD card on my laptop.
> [...]
> The deletions that took place before I pulled the card seemed to stay deleted when the journal was replayed. I did this four more times to see what would happen, and I didn't encounter any file system corruption.
> I haven't tried any other file systems yet, but if there is sufficient interest I'll start walking through the contents of `cat /proc/filesystems | grep -v ^nodev`

These tests seem pretty promising. I'll try using my old 1 GB USB key as a test mule if I have enough time this weekend, and let you know what happens.

--
dott. ing. beso
Re: [gentoo-amd64] Re: Disable fsck on boot, was: How to make watchdog start earlier during bootup?
2009/1/22 Duncan 1i5t5.dun...@cox.net:
> Beso givemesug...@gmail.com posted d257c3560901221103l4f11f220mc9f2b7598f7c3...@mail.gmail.com, excerpted below, on Thu, 22 Jan 2009 20:03:55 +0100:
>> as for the / i'm considering using / + /boot on a usb disk (nowadays booting from usb devices is no pain) and would prevent me from exposing ciphered luks data. it's true that losing the key would mean a total disaster, but it's simpler to have 2-3 2gb usb keys (which mean about 20-30€) as root and have an entire luks+raided partition.
> Something I found out the hard way, and why I have everything that portage touches on the same partition, is the trouble one goes thru when the /var/db/pkg database doesn't match what's actually installed, due to say /, /usr, and /var being on different partitions/volumes, then losing one and having to revert to a backup, while still having the others at current.

I'm using /var mounted on an lvm2 partition and have never had any issues with it. That's because /var and /usr are mounted right after dm-crypt+lvm2 have started, and if the mount fails, services don't start, since the /var on the / partition is not available for them to write to.

> So here, that's all on the same partition. I break off /usr/src, /usr/local and /var/log, and have the Gentoo tree living somewhere other than on /usr as well, but anything that portage touches including its database is all on the same partition, so it all stays in sync if I have to revert to a backup.

/usr is also mounted on another lvm2 partition (this helped me a lot with oracle and kde installations), but I had to do some hacks, since I actually need some /usr/lib/ files on the / partition at boot before lvm2 is started. This is really a bug, since /usr shouldn't be read at startup.

> When I setup this system, since / and its backup are not in LVM, I wanted to give them even more room for growth than I thought I'd need, so I doubled what I was using for growth, and then nearly doubled that again, 10 gig partition size. I currently have both kde3 and kde4 installed so am running rather more than I would otherwise, but I'm running 4.3 gig on /. So a 4 gig USB stick would do it in most cases, an 8-gig stick would be plenty and to spare, but a 2 gig stick wouldn't cut it.

With everything stripped from /, I found that it needs less than a 2 GB disk. And I really think you could move /usr/kde out to an lvm partition, since it would be mounted (if fstab knows about it) before xorg is started. The problem with my configuration is that you sometimes have to reboot into an lvm2-capable environment to resize /usr or /var if something is in use (/var could be resized after shutting down all processes accessing it, but I think it's faster to boot into a simple stripped-down terminal distro with lvm2 capabilities to have it resized). What I'm now considering is moving from rsync to git for the backup (this would also help me understand git in more detail).

> Not that anyone else necessarily needs to use my everything portage touches on one partition strategy, but I certainly learned /my/ lesson, and don't intend on screwing /that/ one up here again. It's worth considering, anyway. YMMV.

I don't really understand how you could have had this issue if you mount the lvm partition at boot via fstab; most likely nothing would go wrong.

--
dott. ing. beso
Re: [gentoo-amd64] Re: Disable fsck on boot, was: How to make watchdog start earlier during bootup?
Beso wrote:
> well, i think that the lvm2 layer is still good even when used on a single disk. especially when you don't know how the partitions would look like. i've had big time saves by resizing lvm2 array than copying, removing partitions, recreating them and then recopying files into the newer ones.

I tend to agree, but once bitten twice shy. :(

Some details for the curious:

I was running lvm2 on top of several raid-5 devices (that is, the raid-5 devices were the lvm2 physical volumes). I created the logical volumes on particular pvs to try to optimize disk seeking, so generally speaking particular partitions resided on only one set of disks. However, some partitions did cross both arrays. (When creating lvs you can tell lvm2 to try to put them on a particular pv, or you can use pvmove to move particular lvs, I believe.) I was running ext3 on my lvs (and swap).

The problem was that I was having some kind of glitch that was causing my computer to reset (I traced it to one of my drives), and when it happened the array would sometimes come up with one of the drives missing. If the glitch happened again while the array was degraded it could cause data loss (no worse than not having RAID at all).

When I finally got the bad drive replaced (which generally fixed the resets), I rebuilt my arrays. At that point mdadm was happy with the state of affairs, but fsck was showing loads of errors on some of my filesystems. When I went ahead and let fsck do its job, I immediately started noticing corrupt files all over the place. The majority of the data volume was mpg files from mythtv and I'd find hour-long TV episodes where one minute of some other show would get spliced in. It seemed obvious that files were somehow getting cross-linked (I'm not intimately familiar with ext3, but I could see how this could happen in FAT). Oh - these errors were on a partition that WASN'T fsck'ed (in the command-line-utility sense of the word only, I suppose).

I also started getting lots of errors in dmesg about attempts to seek past the end of the md devices. I did some googling and found that this had been seen by others - but it was obviously very rare.

Fortunately all my most critical data is backed up weekly (the last backup was only a day or two before the final crash), and I didn't care about the TV too much (I saved what I could and re-recorded anything that got truncated or wasn't watchable). I did find that some of my DVD backups of digital photos were unreadable, which has taught me a valuable lesson. Fortunately only some of the photos actually had errors in them, and most were successfully backed up.

I'm no longer using lvm2. If I need to expand my RAID I can potentially reshape it (after backups where possible). I miss some of the flexibility, but when I need a few GB of scratch space to test out a filesystem upgrade or something I just use losetup - but I don't care about performance in these cases.

I would say that lvm2 is probably safe if you have more reliable hardware. My problem was that a failing drive not only made the drive inaccessible, but it took down the whole system (since hardware on a typical desktop isn't well-isolated). On a decent server a drive failure shouldn't cause errors that bring down the whole system. So, I didn't get the full benefit from RAID.
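The losetup scratch-space trick mentioned above could look roughly like this. It is a sketch under assumptions: the function name and file path are made up, the image is created as a sparse file with dd, and the losetup step is left as a comment because it needs root and the exact options may differ between util-linux versions.

```shell
#!/bin/sh
# make_scratch_image PATH SIZE_MB
# Create a sparse file of SIZE_MB mebibytes to use as scratch space.
# count=0 with seek=N writes no data and just extends the file to N blocks.
make_scratch_image() {
    dd if=/dev/zero of="$1" bs=1048576 count=0 seek="$2" 2>/dev/null
}

# Hypothetical follow-up, as root:
#   make_scratch_image /tmp/scratch.img 2048
#   losetup -f --show /tmp/scratch.img    # prints the loop device it attached
#   mkfs.ext3 /dev/loopN                  # then test the upgrade on it
#   losetup -d /dev/loopN                 # detach when finished
```

A sparse file only consumes disk space as blocks are written, which suits throwaway tests where performance does not matter.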
Re: [gentoo-amd64] Re: Disable fsck on boot, was: How to make watchdog start earlier during bootup?
Duncan wrote:
>> and, if you have experiences with it, do you know what could happen without fsck on an unsafely unmounted luks partition?
> Luks I know nothing of. Someday when I get the appropriate round tuit...

I'm using LUKS on a few of my systems, though not with ReiserFS (for reasons of secure data deletion). I could run a few tests later using image files or USB keys and let you know how it goes.

--
The Doctor [412/724/301/703]
PGP: 0x807B17C1 / 7960 1CDC 85C9 0B63 8D9F DD89 3BD8 FF2B 807B 17C1
WWW: http://drwho.virtadpt.net/
Today is the tomorrow you worried about yesterday.
Re: [gentoo-amd64] Re: Disable fsck on boot, was: How to make watchdog start earlier during bootup?
2009/1/21 The Doctor dr...@virtadpt.net:
> Duncan wrote:
>>> and, if you have experiences with it, do you know what could happen without fsck on an unsafely unmounted luks partition?
>> Luks I know nothing of. Someday when I get the appropriate round tuit...
> I'm using LUKS on a few of my systems, though not with ReiserFS (for reasons of secure data deletion). I could run a few tests later using image files or USB keys and let you know how it goes.

That would be much appreciated.

--
dott. ing. beso