Raffaele BELARDI <[EMAIL PROTECTED]> posted [EMAIL PROTECTED], excerpted below, on Tue, 20 Nov 2007 08:47:32 +0100:
> So my hypothesis is that the bad blocks or sectors at the beginning of
> the partition were not copied, or only partly copied, by dd, and due to
> this the superblocks are all shifted down. Although I don't like to
> access the hw again, maybe I should try:
>
> # dd conv=noerror,sync bs=4096 if=/dev/hdb of=/mnt/disk_500/sdb.img
>
> to get an aligned image. Problem is I don't know what bs= should be.
> Block size, so 4k?
>
> Any other option I might have?

This sounds reasonable. I run reiserfs here and don't know a whole lot about ext2/3/4, so I won't even attempt an opinion at that level of detail. (That's why I left the actual recovery procedure, after creating the copy to work with, so vague... I wasn't going to try to go there.)

However, I can say this. Based on my experience with recovery on reiserfs (and in fact on the reiserfs and dd-rescue recovery notes, so it's not just me), the block size doesn't necessarily have to match the filesystem's, as dd copies "raw": the data it gets, it gets, and the data it doesn't, well... It keeps everything in the same serial order as well, so that's not an issue.

What the block size DOES affect is how much data is operated on at once -- when dd reaches bad blocks, that's the unit that determines the amount of missing data.

Working on a good disk, a relatively large block size (as long as it can be buffered in memory) is often more efficient, that is, faster, because the big blocks mean lower processing overhead. On a partially bad disk, larger blocks will still allow it to cover the good areas faster (but that's trivial time anyway, compared to the time spent trying to access the bad blocks), AND because the block size is larger, it SHOULD mean fewer bad blocks to try and try and try before giving up in the bad areas too, so faster there as well.
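To illustrate the "aligned image" part: with conv=noerror,sync, dd carries on past read errors (noerror) and pads every short or failed read out to a full bs-sized input block with NULs (sync), so everything after a bad spot stays at the same offset as on the source. A small demonstration on a scratch file (NOT the failing disk; the paths are just throwaway examples):

```shell
# Make a 10000-byte scratch "disk" -- deliberately not a multiple of 4096.
printf '%10000s' x > /tmp/src.bin

# noerror: keep going on read errors; sync: pad each short read to bs.
dd if=/tmp/src.bin of=/tmp/img.bin bs=4096 conv=noerror,sync 2>/dev/null

# The image is padded up to the next full block: 3 * 4096 = 12288 bytes,
# so block boundaries in the image line up with block boundaries on disk.
wc -c < /tmp/img.bin
```

The same alignment holds across bad sectors: each unreadable bs-sized chunk becomes a NUL-filled block of exactly bs bytes in the image, rather than silently shifting everything after it.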
The flip side to the faster access over the bad areas is that, as I said, the block is the chunk that gets declared bad, so the larger the block size you choose, the more potentially recoverable data gets written off whenever an entire block is declared bad.

As for working off the bad disk vs. working off an image of it: as long as you can continue to recover data off the bad disk, you can keep trying to use it. The problem, of course, is that every access might be your last, and it's also possible that each time through may lose a few more blocks of data at the margins. So it's up to you. The aligned image will certainly be easier to work with, but you might not be able to get the same amount of valid data off it.

...

You never mentioned exactly what happened to the disk. Mine was overheating. I live in Phoenix, AZ, and my AC went out in the middle of the summer, with me gone and the computer left running. With outside temps often reaching close to 50 C (122 F), the temps inside with the AC off could easily have reached 60 C (140 F). Ambient case air temps could therefore have reached 70 C, and with the drive spinning in that... one can only guess what temps it reached!

Well, rather obviously, the platters expanded and the heads crashed, grooving out a circle in the platter at whatever location they were at at the time, plus wherever the still-operating system told the heads to seek to. However, once I came home and realized what had happened, I shut down and let everything cool down. After replacing the AC, with everything running at normal temps again, I was able to boot back up. I ended up with two separate heavily damaged areas in which I could recover little if anything, but fortunately, the partition table and superblocks were intact.
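One practical note on the aligned-image approach, going back to the shifted-superblock hypothesis above. Once the image is block-aligned, ext2/ext3 backup superblocks should sit at their standard locations, and `e2fsck -b <block>` can be pointed at one of them. This little helper is my own sketch, not something from e2fsprogs, and it assumes the default sparse_super layout (backups in group 1 and in groups that are powers of 3, 5, and 7); run `mke2fs -n` on a comparable filesystem if you want the authoritative list:

```shell
# Sketch (my own helper, not part of e2fsprogs): print the block numbers
# where ext2/ext3 backup superblocks normally live, given the fs block
# size in bytes. Assumes the default sparse_super layout.
ext2_backup_sbs() {
    bs=$1                       # filesystem block size in bytes
    bpg=$((8 * bs))             # blocks per group = 8 * block size
    first=0                     # first data block; 1 on 1k filesystems
    [ "$bs" -eq 1024 ] && first=1
    sep=""
    # sparse_super keeps backups only in group 1 and groups that are
    # powers of 3, 5, and 7 (listing the first few is enough here).
    for g in 1 3 5 7 9 25 27 49; do
        printf '%s%s' "$sep" $((g * bpg + first))
        sep=" "
    done
    printf '\n'
}

# Candidates for "e2fsck -b <block> sdb.img" on a 4k-block filesystem:
ext2_backup_sbs 4096
```

For a 1k-block filesystem the first backup comes out at the familiar block 8193; for 4k it's 32768. If none of those work on the image, that would support the theory that the start of the partition wasn't copied aligned.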
I had also been keeping backup partition copies of most of my valuable stuff, by partition, and was able to recover most of it from those (barring the new stuff since my last backup, which was longer ago than it should have been), since they had been unmounted at the time and therefore didn't have the heads seeking into them, only across them a few times.

Actually, perhaps surprisingly, I was able to run those disks for some time without any known additional damage. I did switch disks as soon as possible, because I was leery of continuing to depend on the partially bad ones, but in the meantime, I just wrote off the affected partitions as dead, and continued to use the others without issue.

In fact, I still have the disk, and might still be using it for extra storage, except that it was the second disk I had lost in two years (looking back, the one I'd lost the previous year was probably heat related as well, as it had the same failure pattern, and the AC wasn't doing so well even then), and I decided to switch to RAID and go with slower-speed but longer-warranty (5 yr) Seagate drives. Those are now going into their third year without issue (and with a new AC with cooling capacity to spare, so hopefully it'll be several years before I need to worry about /that/ issue again), but at least now I have the RAID backing me up, with most of the system on kernel/md RAID-6, so I can lose up to two of the four drives and maintain data integrity. I am, however, already thinking about how I'll do it better next time, now that I've a bit of RAID experience under my belt. =8^)

So anyway, if it was heat related, chances are pretty decent it'll remain relatively stable, with no additional data loss, as long as you keep a pretty strict watch on the temps and don't let it overheat again. That was my experience this last time, when I know it was heat related, and the time before, which had the same failure pattern, so I'm guessing it was heat related too.
Of course, you never can tell, but that has been my experience with heat-related disk failures, anyway.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

-- 
[EMAIL PROTECTED] mailing list
