Re: [RFH] Partition table recovery
On 07/22/2007 06:28 PM, Theodore Tso wrote:

[ Al -- don't drop CCs please ]

> Well, let's think about this a bit. What are the requirements?
>
> 1) The partition manager should be able to explicitly request that a
>    new backup of the partition tables be stashed in each filesystem
>    that has room for such a backup. That way, when the user
>    affirmatively makes a partition table change, it can get backed up
>    in all of the right places automatically.

D-Bus? ;-)

> 2) The fsck program should *only* stash a backup of the partition
>    table if there currently isn't one in the filesystem. It may be
>    that the partition table has been corrupted, and so merely doing an
>    fsck should not transfer a current copy of the partition table to
>    the filesystem-specific backup area. It could be that the partition
>    table was only partially recovered, and we don't want to overwrite
>    the previously existing backups except on an explicit request from
>    the system administrator.
>
> 3) The mkfs program should automatically create a backup of the
>    current partition table layout. That way we get a backup in the
>    newly created filesystem as soon as it is created.

On an integrated system like this, do you consider it acceptable to only
do the MS-DOS partitions and not the other types that may be present
_inside_ those partitions? (MINIX subpartitions, BSD slices, ...). I
believe those should really also be done, but this would require keeping
more information again.

> 4) The exact location of the backup may vary from filesystem to
>    filesystem. For ext2/3/4, bytes 512-1023 are always unused, and
>    don't interfere with the boot sector at bytes 0-511, so that's the
>    obvious location. Other filesystems may have that location in use,
>    and some other location might be a better place to store it.
>    Ideally it will be a well-known location, that isn't dependent on
>    finding an inode table, or some such, but that may not be possible
>    for all filesystems.
>
> OK, so how about this as a solution that meets the above requirements?
> /sbin/partbackup device [fspart]
>
>     Will scan device (e.g., /dev/hda, /dev/sdb, etc.) and create a
>     512 byte partition backup, using the format I've previously
>     described. If fspart is specified on the command line, it will
>     use the blkid library to determine the filesystem type of fspart,
>     and then attempt to execute /sbin/partbackupfs.fstype to write
>     the partition backup to fspart. If fspart is '-', then it will
>     write the 512 byte partition table to stdout. If fspart is not
>     specified on the command line, /sbin/partbackup will iterate over
>     all partitions in device, use the blkid library to attempt to
>     determine the correct filesystem type, and then execute
>     /sbin/partbackupfs.fstype if such a backup program exists.

I've cleaned up what I posted yesterday a bit and made it into the type
of standalone-by-design program you suggest here (the version from
yesterday required a -DTEST to be so). Not with the blkid bits though.
It just dumps the sector to stdout (and a textual version to stderr if
compiled with -DDEBUG).

I (very) briefly looked at blkid but unless I'm mistaken blkid needs
device names? The documentation seems to be missing. When scanning the
device for the partition table, we've built a list of partitions with
offsets into the device and it would be nice if we could hand the fd and
the offset off to something directly. If the program has to construct
device names itself there's another truckload of pitfalls right there.

It wouldn't be hard to log minors as such, but you'd also need to be
very sure you'd always do this in the same order as the kernel so that
what I consider to be /dev/sda2 is the same as what the kernel considers
it to be. This is again rather fragile.

It might in fact make sense to just ask the kernel for the partitions on
a device and not bother with scanning anything ourselves. I.e., just
walk sysfs. Would you agree? This significantly reduces the risk of
things getting out of sync, both in scanning order and implementation.
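To make the "just walk sysfs" idea concrete, here is a minimal sketch in C. It assumes the usual /sys/block/<disk>/<partition>/start and .../size attributes, which report values in 512-byte units; the function names sysfs_part_attr() and read_u64_attr() are made up for illustration and are not part of any posted program. As noted in the thread, this gives start and size only, not everything a backup might want.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/block/<disk>/<part>/<attr>", for example
 * /sys/block/sda/sda2/start. Returns 0 on success, -1 if a name is
 * empty or the buffer is too small. (Hypothetical helper.)
 */
int sysfs_part_attr(char *buf, size_t len, const char *disk,
		    const char *part, const char *attr)
{
	int n;

	assert(buf && disk && part && attr);
	if (!strlen(disk) || !strlen(part) || !strlen(attr))
		return -1;

	n = snprintf(buf, len, "/sys/block/%s/%s/%s", disk, part, attr);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read a single unsigned decimal value from a sysfs attribute file.
 * Returns 0 on success, -1 if the file is missing or malformed.
 */
int read_u64_attr(const char *path, uint64_t *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%" SCNu64, val) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

A caller would build the path for "start" and for "size" of each partition directory under the disk and read both values, leaving the ordering question to the kernel.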
The kernel doesn't currently store/export everything you'd want to store
in a backup (as far as I'm aware, that is) but that could conceivably
change. It would make things significantly less fragile.

Rene.

/*
 * Public Domain 2007, Rene Herman
 */

#define _LARGEFILE64_SOURCE

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <fcntl.h>

enum {
	DOS_EXTENDED   = 0x05,
	WIN98_EXTENDED = 0x0f,
	LINUX_EXTENDED = 0x85,
};

struct partition {
	uint8_t boot_ind;
	uint8_t head;
	uint8_t sector;
	uint8_t cyl;
	uint8_t sys_ind;
	uint8_t end_head;
	uint8_t end_sector;
	uint8_t end_cyl;
	uint32_t start;
	uint32_t size;
} __attribute__((packed));

struct entry {
	uint8_t flags;
	uint8_t type;
Re: [RFH] Partition table recovery
On Mon, 2007-07-23 10:15:21 +0200, Rene Herman [EMAIL PROTECTED] wrote:

> /*
>  * Public Domain 2007, Rene Herman
>  */
>
> #define _LARGEFILE64_SOURCE
>
> #include <stdlib.h>
> #include <stdio.h>
> #include <stdint.h>
> #include <string.h>
>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <sys/ioctl.h>
> #include <unistd.h>
> #include <fcntl.h>
>
> enum {
> 	DOS_EXTENDED   = 0x05,
> 	WIN98_EXTENDED = 0x0f,
> 	LINUX_EXTENDED = 0x85,
> };
>
> struct partition {
> 	uint8_t boot_ind;
> 	uint8_t head;

Different indentation.

> 	uint8_t sector;
> 	uint8_t cyl;
> 	uint8_t sys_ind;
> 	uint8_t end_head;
> 	uint8_t end_sector;
> 	uint8_t end_cyl;
> 	uint32_t start;
> 	uint32_t size;

As multibyte on-disk variables, these will need LE/BE conversion.

> } __attribute__((packed));
>
> struct entry {
> 	uint8_t flags;
> 	uint8_t type;
> 	uint16_t __1;
> 	uint64_t start;
> 	uint32_t size;

Ditto.

> } __attribute__((packed));
>
> enum {
> 	ENTRY_FLAG_PRIMARY  = 0x01,
> 	ENTRY_FLAG_BOOTABLE = 0x80,
> };
>
> struct backup {
> 	uint8_t signature[8];
> 	uint16_t type;
> 	uint8_t heads;
> 	uint8_t sectors;
> 	uint8_t count;
> 	uint8_t __1[3];
> 	struct entry table[31];
> } __attribute__((packed));
>
> #define BACKUP_SIGNATURE "PARTBAK1"
>
> enum {
> 	BACKUP_TYPE_MBR = 1,
> };
>
> struct backup backup = {
> 	.signature = BACKUP_SIGNATURE,
> 	.type      = BACKUP_TYPE_MBR,
> };
>
> #define ARRAY_SIZE(arr) (sizeof arr / sizeof arr[0])
>
> int is_extended(struct partition *partition)
> {
> 	int ret = 0;
>
> 	switch (partition->sys_ind) {
> 	case DOS_EXTENDED:
> 	case WIN98_EXTENDED:
> 	case LINUX_EXTENDED:
> 		ret = 1;
> 	}
> 	return ret;
> }
>
> unsigned char *get_sector(int fd, uint64_t n)
> {
> 	unsigned char *sector;
>
> 	if (lseek64(fd, n << 9, SEEK_SET) < 0) {
> 		perror("lseek64");
> 		return NULL;
> 	}
> 	sector = malloc(512);
> 	if (!sector) {
> 		fprintf(stderr, "malloc: out of memory\n");
> 		return NULL;
> 	}
> 	if (read(fd, sector, 512) != 512) {
> 		perror("read");
> 		free(sector);
> 		return NULL;
> 	}
> 	return sector;
> }
>
> void put_sector(unsigned char *sector)
> {
> 	free(sector);
> }
>
> #define TABLE_OFFSET (512 - 2 - 4 * sizeof(struct partition))
>
> inline struct partition *table(unsigned char *sector)
> {
> 	return (struct partition *)(sector + TABLE_OFFSET);
> }
>
> int do_sector(int fd, uint32_t offset, uint32_t start)
> {
> 	unsigned char *sector;
> 	struct partition *cur;
> 	struct partition *ext = NULL;
> 	int ret = 0;
>
> 	sector = get_sector(fd, offset + start);
> 	if (!sector)
> 		return -1;
>
> 	if (sector[510] != 0x55 || sector[511] != 0xaa) {
> 		ret = -1;
> 		goto out;
> 	}
>
> 	for (cur = table(sector); cur < table(sector) + 4; cur++) {
> 		struct entry *entry;
>
> 		if (!cur->size)
> 			continue;
>
> 		cur->end_head += 1;
> 		if (backup.heads < cur->end_head)
> 			backup.heads = cur->end_head;
>
> 		cur->end_sector &= 0x3f;
> 		if (backup.sectors < cur->end_sector)
> 			backup.sectors = cur->end_sector;
>
> 		if (is_extended(cur)) {
> 			if (!offset) {
> 				ret = do_sector(fd, cur->start, 0);
> 				if (ret < 0)
> 					goto out;
> 			} else if (!ext)
> 				ext = cur;
> 			continue;
> 		}
>
> 		if (backup.count == ARRAY_SIZE(backup.table)) {
> 			fprintf(stderr, "do_sector: out of space!\n");
> 			ret = -1;
> 			goto out;
> 		}
>
> 		entry = backup.table + backup.count++;
>
> 		entry->flags = cur->boot_ind;
> 		if (!offset)
> 			entry->flags |= ENTRY_FLAG_PRIMARY;
>
> 		entry->type = cur->sys_ind;
> 		entry->start = cur->start + start;
> 		entry->size = cur->size;

LE/BE issues here...

> 	}
> 	if (ext)
> 		ret = do_sector(fd, offset, ext->start);
> out:
> 	put_sector(sector);
> 	return ret;
> }
>
> void show_backup(void)
> {
> #ifdef DEBUG
> 	int i;
>
> 	fprintf(stderr, "signature: ");
> 	for (i = 0; i < 8; i++)
> 		fprintf(stderr, "%c", backup.signature[i]);
> 	fprintf(stderr, "\n");
>
> 	fprintf(stderr, "type: %d\n", backup.type);
> 	fprintf(stderr, "heads: %d\n", backup.heads);
> 	fprintf(stderr, "sectors: %d\n", backup.sectors);
> 	fprintf(stderr, "count: %d\n", backup.count);
>
> 	for (i = 0; i < backup.count; i++) {
> 		fprintf(stderr, "\n");
> 		fprintf(stderr, "%2d: flags:
Re: [RFH] Partition table recovery
On 07/23/2007 10:41 AM, Jan-Benedict Glaw wrote:

> As multibyte on-disk variables, these will need LE/BE conversion.

Indeed, thanks -- has been updated in the version that is attached. It
also fixes a bug that snuck in (failed to add offset to entry->start).

>> struct entry {
>> 	uint8_t flags;
>> 	uint8_t type;
>> 	uint16_t __1;
>> 	uint64_t start;
>> 	uint32_t size;
>
> Ditto.

This can stay for now. The partition table backup would indeed need some
defined byte order but it might be whatever order the filesystem it's
backed up onto uses. Since it's not directly written to any filesystem
for now, host order will do currently.

> Looks like a useful program, but you should definitely fix the LE/BE
> issues. If you do that, it'll even be able to run on BE machines, too.

I might disagree with the usefulness. I believe that preferably a backup
should be made with help from the kernel (if only by walking sysfs) to
avoid things getting out of sync between kernel and backup program.

(This program largely does the same as the kernel does, but even now
there's already a difference in so far that I didn't bother to
de-garbage the 3rd and 4th entries in the second-level extendeds.)

Rene.
/*
 * Public Domain 2007, Rene Herman
 */

#define _LARGEFILE64_SOURCE

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <fcntl.h>

#include <endian.h>
#include <byteswap.h>

static inline uint32_t le_32(uint32_t n)
{
#ifdef __LITTLE_ENDIAN
	return n;
#else
	return bswap_32(n);
#endif
}

enum {
	DOS_EXTENDED   = 0x05,
	WIN98_EXTENDED = 0x0f,
	LINUX_EXTENDED = 0x85,
};

struct partition {
	uint8_t boot_ind;
	uint8_t head;
	uint8_t sector;
	uint8_t cyl;
	uint8_t sys_ind;
	uint8_t end_head;
	uint8_t end_sector;
	uint8_t end_cyl;
	uint32_t start;
	uint32_t size;
} __attribute__((packed));

struct entry {
	uint8_t flags;
	uint8_t type;
	uint16_t __1;
	uint64_t start;
	uint32_t size;
} __attribute__((packed));

enum {
	ENTRY_FLAG_PRIMARY  = 0x01,
	ENTRY_FLAG_BOOTABLE = 0x80,
};

struct backup {
	uint8_t signature[8];
	uint16_t type;
	uint8_t heads;
	uint8_t sectors;
	uint8_t count;
	uint8_t __1[3];
	struct entry table[31];
} __attribute__((packed));

#define BACKUP_SIGNATURE "PARTBAK1"

enum {
	BACKUP_TYPE_MBR = 1,
};

struct backup backup = {
	.signature = BACKUP_SIGNATURE,
	.type      = BACKUP_TYPE_MBR,
};

#define ARRAY_SIZE(arr) (sizeof arr / sizeof arr[0])

int is_extended(struct partition *partition)
{
	int ret = 0;

	switch (partition->sys_ind) {
	case DOS_EXTENDED:
	case WIN98_EXTENDED:
	case LINUX_EXTENDED:
		ret = 1;
	}
	return ret;
}

unsigned char *get_sector(int fd, uint64_t n)
{
	unsigned char *sector;

	if (lseek64(fd, n << 9, SEEK_SET) < 0) {
		perror("lseek64");
		return NULL;
	}
	sector = malloc(512);
	if (!sector) {
		fprintf(stderr, "malloc: out of memory\n");
		return NULL;
	}
	if (read(fd, sector, 512) != 512) {
		perror("read");
		free(sector);
		return NULL;
	}
	return sector;
}

void put_sector(unsigned char *sector)
{
	free(sector);
}

#define TABLE_OFFSET (512 - 2 - 4 * sizeof(struct partition))

inline struct partition *table(unsigned char *sector)
{
	return (struct partition *)(sector + TABLE_OFFSET);
}

int do_sector(int fd, uint32_t offset, uint32_t start)
{
	unsigned char *sector;
	struct partition *cur;
	struct partition *ext = NULL;
	int ret = 0;

	sector = get_sector(fd, offset + start);
	if (!sector)
		return -1;

	if (sector[510] != 0x55 || sector[511] != 0xaa) {
		ret = -1;
		goto out;
	}

	for (cur = table(sector); cur < table(sector) + 4; cur++) {
		struct entry *entry;

		if (!cur->size)
			continue;

		cur->end_head += 1;
		if (backup.heads < cur->end_head)
			backup.heads = cur->end_head;

		cur->end_sector &= 0x3f;
		if (backup.sectors < cur->end_sector)
			backup.sectors = cur->end_sector;

		if (is_extended(cur)) {
			if (!offset) {
				ret = do_sector(fd, le_32(cur->start), 0);
				if (ret < 0)
					goto out;
			} else if (!ext)
				ext = cur;
			continue;
		}

		if
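On the question of a defined byte order for the backup record itself: one way to sidestep host order entirely is to serialize each entry byte by byte. The sketch below assumes little-endian storage, which the thread has not decided on, and uses the 16-byte entry layout from the program above (flags, type, 2 reserved bytes, 64-bit start, 32-bit size). encode_entry() and put_le() are illustrative names, not part of the posted code.

```c
#include <assert.h>
#include <stdint.h>

enum { ENTRY_BYTES = 16 };

/* Store the low 'nbytes' bytes of v at p, least significant byte first. */
static void put_le(unsigned char *p, uint64_t v, int nbytes)
{
	int i;

	for (i = 0; i < nbytes; i++)
		p[i] = (unsigned char)(v >> (8 * i));
}

/*
 * Serialize one backup entry in a fixed (assumed little-endian) layout:
 * byte 0 flags, byte 1 type, bytes 2-3 reserved, bytes 4-11 start,
 * bytes 12-15 size. The result is the same on any host.
 */
void encode_entry(unsigned char out[ENTRY_BYTES], uint8_t flags,
		  uint8_t type, uint64_t start, uint32_t size)
{
	assert(out);
	out[0] = flags;
	out[1] = type;
	put_le(out + 2, 0, 2);		/* reserved (__1) */
	put_le(out + 4, start, 8);
	put_le(out + 12, size, 4);
}
```

Because only shifts and byte stores are involved, no endian.h or byteswap.h macros are needed.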
Re: [RFH] Partition table recovery
On 07/23/2007 12:54 PM, Rene Herman wrote:

> static inline uint32_t le_32(uint32_t n)
> {
> #ifdef __LITTLE_ENDIAN
> 	return n;
> #else
> 	return bswap_32(n);
> #endif
> }

#if __BYTE_ORDER == __LITTLE_ENDIAN, that is. sigh.

Rene.
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFH] Partition table recovery
On Mon, 2007-07-23 14:39:11 +0200, Rene Herman [EMAIL PROTECTED] wrote:
> On 07/23/2007 12:54 PM, Rene Herman wrote:
>
>> static inline uint32_t le_32(uint32_t n)
>> {
>> #ifdef __LITTLE_ENDIAN
>> 	return n;
>> #else
>> 	return bswap_32(n);
>> #endif
>> }
>
> #if __BYTE_ORDER == __LITTLE_ENDIAN, that is. sigh.

Don't forget PDP11 byteorder :-)

Regards, JBG

--
Jan-Benedict Glaw [EMAIL PROTECTED] +49-172-7608481
Signature of: When I am awake, I dream.
the second :
Re: [RFH] Partition table recovery
On 07/23/2007 03:15 PM, Jan-Benedict Glaw wrote:

>>> static inline uint32_t le_32(uint32_t n)
>>> {
>>> #ifdef __LITTLE_ENDIAN
>>> 	return n;
>>> #else
>>> 	return bswap_32(n);
>>> #endif
>>> }
>>
>> #if __BYTE_ORDER == __LITTLE_ENDIAN, that is. sigh.
>
> Don't forget PDP11 byteorder :-)

How could I? It cracks me up every time...

Rene.
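For what it's worth, the macro question (and the PDP-11 joke) can be avoided by never loading the on-disk field as a native integer at all. The following is a small sketch, not what the posted program does: load_le32() is a hypothetical helper that assembles the value from individual bytes, so the result does not depend on the host's byte order.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode a 32-bit little-endian on-disk field (such as the start or
 * size field of an MBR partition entry) from its four raw bytes.
 * Works unchanged on little-endian, big-endian and mixed-endian hosts.
 */
uint32_t load_le32(const unsigned char *p)
{
	assert(p);
	return (uint32_t)p[0] |
	       (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 |
	       (uint32_t)p[3] << 24;
}
```

With this approach the start field of a table entry would be read as load_le32(sector + TABLE_OFFSET + 16 * i + 8) rather than through a packed struct member.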
Re: [RFH] Partition table recovery
On Mon, Jul 23, 2007 at 09:34:25AM +0200, Rene Herman wrote:
>> The most profound issue is _what_ to save. I for example don't
>> cylinder align my partitions (I hate wasting disk just to appease
>> broken software) meaning that not all my end_head/sector values are
>> consistent even at the best of times. Admittedly, I'm terminally
>> anal retentive.
>
> Exactly. This is why I earlier said that really the only functional
> thing to do is not try to be smart and just grab things verbatim. This
> of course does hinge on one's view of "good enough"...

Heh. Well, my definition of "good enough" is enough so that the C/H/S
fields are sane so that the BIOS can boot the system. So I don't care
about grabbing things verbatim since my higher priority is stuffing as
much data for as many partitions as possible into 512 bytes. Things that
require grabbing the C/H/S fields verbatim to avoid breakage I would
classify as "broken software". But, your mileage may vary. :-)

> That's not quite correct. Logicals have a start field relative to the
> encompassing extended (ie, for me always 1, for others often always
> 63) and the encompassing extended are relative not to the previous
> extended but to the level 0 extended (the one in the MBR).

This assumes that the extended partition is at the beginning of the
disk, yes? Why would anyone do that? I normally have /dev/hda1 at the
beginning of the disk, and I normally make /dev/hda4 my extended, and
place it *after* partitions at /dev/hda2, /dev/hda3, etc.

> So at most you can get 32-bit + 32-bit which could, yeah, in principle
> overflow into the 32nd bit -- it normally won't of course since the
> start field, being relative, will be small, and I'd expect quite a few
> bits of software to break on this condition.

It would be interesting to see how badly modern Windows systems break on
this. If Windows 2000 and above works, and Linux works, then if other
things break it might be quite sufficient to consider them "broken
software" that we don't need to worry about.

> With 32-bit values (and 512-byte sectors) you can service 2TB --
> anything above that requires something better than MS-DOS partition
> tables.

Well, in about 2-3 years or so we'll be seeing singleton disks bigger
than 2TB. I'm not terribly sanguine about BIOS vendors and OS providers
migrating to something better by then, alas. Life is sure going to be
interesting. :-)

- Ted
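The 2TB figure follows directly from the field widths: 2^32 sectors of 512 bytes is 2^41 bytes. A small sketch of that arithmetic, together with the "32-bit + 32-bit" overflow check discussed above; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Capacity addressable by a 32-bit sector number: 2^32 sectors times
 * the sector size. For 512-byte sectors this is 2^41 bytes (2 TiB).
 */
uint64_t mbr_max_bytes(uint32_t sector_size)
{
	assert(sector_size != 0);
	return ((uint64_t)1 << 32) * sector_size;
}

/*
 * Adding an extended partition's start to a relative start field can
 * need 33 bits. Returns 1 if the sum still fits in 32 bits, else 0.
 */
int lba_sum_fits(uint32_t base, uint32_t relative)
{
	return (uint64_t)base + relative <= UINT32_MAX;
}
```

Doing the addition in 64 bits, as here, is also why the backup format's 64-bit start field is harmless even if the source fields are only 32 bits wide.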
Re: [RFH] Partition table recovery
On Mon, Jul 23, 2007 at 10:15:21AM +0200, Rene Herman wrote:
> On an integrated system like this, do you consider it acceptable to
> only do the MS-DOS partitions and not the other types that may be
> present _inside_ those partitions? (MINIX subpartitions, BSD slices,
> ...). I believe those should really also be done, but this would
> require keeping more information again.

Well, I'm considering this to be an MBR backup scheme, so Minix and BSD
slices are legacy systems which are out of scope. If they are busted in
the same way as MBR in terms of not having redundant backups of critical
data, then they have a lot fewer excuses than MBR, and they can address
that issue in their own way. The number of Linux users that also have
Minix and BSD partitions is vanishingly small in any case.

> I (very) briefly looked at blkid but unless I'm mistaken blkid needs
> device names? The documentation seems to be missing. When scanning the
> device for the partition table, we've built a list of partitions with
> offsets into the device and it would be nice if we could hand the fd
> and the offset off to something directly. If the program has to
> construct device names itself there's another truckload of pitfalls
> right there.

Yeah, good point, I'd have to add that support into blkid. It's been on
my todo list, but I just haven't gotten around to it yet.

> It might in fact make sense to just ask the kernel for the partitions
> on a device and not bother with scanning anything ourselves. Ie, just
> walk sysfs. Would you agree? This significantly reduces the risk of
> things getting out of sync, both scanning order and implementation.

My concern with sysfs is that #1, it won't work on older kernels since
you would need to add new fields to back up what we want, and #2, I'm
still fundamentally distrustful of sysfs because there isn't a bright
line between what is an exported interface that will never change, and
something which is considered an internal implementation detail that can
change whenever some kernel hacker feels like it. (Or when some kernel
hacker is careless...) So as far as I'm concerned sysfs is a terrible,
TERRIBLE way to export a published interface where we promise stability
to userspace.

So I'd just as soon do this in userspace; after all, the entire
partition manager (and there are multiple ones; fdisk, sfdisk, gpart,
etc.) is all in userspace, and that needs to be in sync with the kernel
partition reading code anyway. So one more userspace implementation is
in my mind much cleaner than trying to push the needed functionality
into sysfs, and then hoping against hope that it doesn't accidentally
change in the future.

- Ted
Re: [RFH] Partition table recovery
Rene Herman wrote:
> On 07/23/2007 10:41 AM, Jan-Benedict Glaw wrote:
>
>> As multibyte on-disk variables, these will need LE/BE conversion.
>
> Indeed, thanks -- has been updated in the version that is attached.
> It also fixes a bug that snuck in (failed to add offset to
> entry->start).
>
>>> struct entry {
>>> 	uint8_t flags;
>>> 	uint8_t type;
>>> 	uint16_t __1;
>>> 	uint64_t start;
>>> 	uint32_t size;
>>
>> Ditto.
>
> This can stay for now. The partition table backup would indeed need
> some defined byte order but it might be whatever order the filesystem
> it's backed up onto uses. Since it's not directly written to any
> filesystem for now, host order will do currently.
>
>> Looks like a useful program, but you should definitely fix the LE/BE
>> issues. If you do that, it'll even be able to run on BE machines, too.
>
> I might disagree with the usefulness. I believe that preferably a
> backup should be made with help from the kernel (if only by walking
> sysfs) to avoid things getting out of sync between kernel and backup
> program.
>
> (This program largely does the same as the kernel does, but even now
> there's already a difference in so far that I didn't bother to
> de-garbage the 3rd and 4th entries in the second-level extendeds.)

How can I politely say this code really needs comments? To quote the
late R. W. Benway, "If it was hard to write it should be hard to
understand." (regarding code in FORTRAN II on punched cards, ca 1965)

> Rene.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
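As an illustration of what such comments might look like, here is the backup record from the posted program with each field annotated. The struct and field names are Rene's; the comments are one reading of the thread rather than anything the author wrote, and the size checks are an addition. It assumes a compiler that supports __attribute__((packed)) and _Static_assert (GCC or Clang).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One backed-up data partition: 16 bytes. */
struct entry {
	uint8_t  flags;		/* ENTRY_FLAG_PRIMARY (0x01), ENTRY_FLAG_BOOTABLE (0x80) */
	uint8_t  type;		/* partition type byte (sys_ind), e.g. 0x83 */
	uint16_t __1;		/* reserved */
	uint64_t start;		/* start sector; read here as absolute on the disk */
	uint32_t size;		/* length in sectors */
} __attribute__((packed));

/* The whole backup: a 16-byte header plus 31 entries. */
struct backup {
	uint8_t  signature[8];	/* "PARTBAK1", no terminating NUL */
	uint16_t type;		/* BACKUP_TYPE_MBR (1) */
	uint8_t  heads;		/* geometry: number of heads */
	uint8_t  sectors;	/* geometry: sectors per track */
	uint8_t  count;		/* number of valid entries in table[] */
	uint8_t  __1[3];	/* reserved */
	struct entry table[31];
} __attribute__((packed));

/* The record is meant to fill exactly one 512-byte sector. */
_Static_assert(sizeof(struct entry) == 16, "entry must be 16 bytes");
_Static_assert(sizeof(struct backup) == 512, "backup must be 512 bytes");

/* Offset of the first entry within the backup sector. */
size_t backup_table_offset(void)
{
	return offsetof(struct backup, table);
}
```

The arithmetic behind the table size is 16 + 31 * 16 = 512, which is why 31 entries is the limit in this format.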
Re: [RFH] Partition table recovery
On 07/23/2007 03:48 PM, Theodore Tso wrote:

> On Mon, Jul 23, 2007 at 09:34:25AM +0200, Rene Herman wrote:

>> That's not quite correct. Logicals have a start field relative to
>> the encompassing extended (ie, for me always 1, for others often
>> always 63) and the encompassing extended are relative not to the
>> previous extended but to the level 0 extended (the one in the MBR).
>
> This assumes that the extended partition is at the beginning of the
> disk, yes?

Err, well, no, that's not what I meant. The start field for the extended
partition that sits in the primary partition table (the one in the MBR)
is absolute, or relative to the start of the disk, but the start fields
for the "empty" extended partitions that together form the logical
partition list are relative not to the previous one in the list, but all
to this outermost extended partition.

> Why would anyone do that? I normally have /dev/hda1 at the beginning
> of the disk, and I normally make /dev/hda4 my extended, and place it
> *after* partitions at /dev/hda2, /dev/hda3, etc.

... but having said that, I do actually have an extended partition as my
/dev/hda1 at the beginning of the disk. This is the current layout on my
main system:

   Device Boot      Start        End   #sectors  Id  System
/dev/sda1               1  231733119  231733119  85  Linux extended
/dev/sda2   *   231733120  240121727    8388608   c  W95 FAT32 (LBA)
/dev/sda3               0          -          0   0  Empty
/dev/sda4               0          -          0   0  Empty
/dev/sda5               2    2097153    2097152  82  Linux swap
/dev/sda6         2097155   18874370   16777216  83  Linux
/dev/sda7        18874372   35651587   16777216  83  Linux
/dev/sda8        35651589  231733119  196081531  83  Linux

As you can see, everything neatly non-cylinder-aligned, with not a
single sector wasted ;-)

Table sectors at 0 (MBR), 1 (outer extended), 2097154, 18874371 and
35651588 (list-extendeds).

/dev/sda2 used to be a FreeBSD install (partition type 0xa5), /dev/sda3
a MINIX install (type 0x81) and /dev/sda4 the still present FAT32
Windows 98 partition at the very end of the disk. I removed FreeBSD and
MINIX due to space shortage...
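The relative-start rule described above can be checked against this layout with a few lines of C. This is only a sketch of the arithmetic; table_sector() and logical_start() are names invented here. The link offsets used in the usage note are derived from the table sectors listed above (for example 2097154 - 1 = 2097153).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sector holding the table for one link of the logical chain. The
 * link's start field is relative to the outermost extended partition
 * (the one in the MBR), not to the previous link.
 */
uint64_t table_sector(uint32_t outer_start, uint32_t link_start)
{
	return (uint64_t)outer_start + link_start;
}

/*
 * Absolute start of a logical partition: its start field is relative
 * to the table sector that describes it.
 */
uint64_t logical_start(uint32_t outer_start, uint32_t link_start,
		       uint32_t rel_start)
{
	/* A logical cannot begin on its own table sector. */
	assert(rel_start != 0);
	return table_sector(outer_start, link_start) + rel_start;
}
```

With the outer extended at sector 1 and a relative start of 1 throughout, logical_start(1, 0, 1) gives 2 (/dev/sda5) and logical_start(1, 2097153, 1) gives 2097155 (/dev/sda6), matching the listing.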
The reason that I use the first entry for an extended is that I view the
type "Linux Extended" simply as "Linux". That is, I see 0x85 simply as
the one and only Linux type with all my Linux data partitions on the
logicals inside -- very much like 0xa5 is the one FreeBSD type with all
its data partitions on the slices inside, and 0x81 the one MINIX
partition with its data partitions on the subpartitions inside.

That is, I've been using a "Linux native" partitioning scheme where the
Linux native layout just happens to coincide with a DOS/Windows native
layout.

My Linux partition is at the start of the disk since it's the system I
use. The others are/were there just to boot perhaps a few times a year
to check some things -- and the start of the disk is the fastest bit, so
I certainly want my main system to use that.

Anyone find my Native Linux Partitioning Scheme interesting? Designing
and using a better way than regular logicals to carve up the space
inside (such as something designed after BSD slices) would work for me
as well ;-)

> It would be interesting to see how badly modern Windows systems break
> on this. If Windows 2000 and above works, and Linux works, then if
> other things break it might be quite sufficient to consider them
> "broken software" that we don't need to worry about.

Googling for it, the 2TB limit of DOS partitioning is widely known and
there would be no point worrying even about the single-bit overflow
possibility of the list of extendeds...

>> With 32-bit values (and 512-byte sectors) you can service 2TB --
>> anything above that requires something better than MS-DOS partition
>> tables.
>
> Well, in about 2-3 years or so we'll be seeing singleton disks bigger
> than 2TB. I'm not terribly sanguine about BIOS vendors and OS
> providers migrating to something better by then, alas. Life is sure
> going to be interesting. :-)

And sectors probably larger than 512 bytes. I hope they'll not do _that_
until plain old partitions are truly abandoned, since before you know it
someone is going to view it as an excuse to keep using this fragile
mess ;-)

Rene.
Re: [RFH] Partition table recovery
On 07/23/2007 03:58 PM, Theodore Tso wrote:

> Well, I'm considering this to be an MBR backup scheme, so Minix and
> BSD slices are legacy systems which are out of scope. If they are
> busted in the same way as MBR in terms of not having redundant backups
> of critical data, then they have a lot fewer excuses than MBR, and
> they can address that issue in their own way. The number of Linux
> users that also have Minix and BSD partitions is vanishingly small in
> any case.

I'd in fact expect quite a few people to have a FreeBSD partition
around. And MINIX if they are in university and in an operating systems
course... But "they should take whatever precautions they want
themselves" is a valid argument.

[ blkid ]

> Yeah, good point, I'd have to add that support into blkid. It's been
> on my todo list, but I just haven't gotten around to it yet.

I'll for now stop updating the partbackup thingy as posted. Given that
Linux only follows the first extended in the list of extendeds (which
sort of destroys the nice recursion anyway) it might want to be
iterative instead of recursive if the thing has a future -- not very
important though.

> My concern with sysfs is that #1, it won't work on older kernels since
> you would need to add new fields to back up what we want,

I'd be okay with that.

> and #2, I'm still fundamentally distrustful of sysfs because there
> isn't a bright line between what is an exported interface that will
> never change, and something which is considered an internal
> implementation detail that can change whenever some kernel hacker
> feels like it. (Or when some kernel hacker is careless...) So as far
> as I'm concerned sysfs is a terrible, TERRIBLE way to export a
> published interface where we promise stability to userspace.

Oh come on, that's going overboard a bit, it's not all _that_ bad!
Finding say "sda" will be possible without breaking too many times.

Admittedly, the kernel's partition scanning order is also not likely to
change as it would certainly break userspace, but code duplication, with
the possibility of bugs slipping in at least userspace-ways (considering
the kernel the reference no matter what it does) is a concern. Somewhat.
A little.

> So I'd just as soon do this in userspace; after all, the entire
> partition manager (and there are multiple ones; fdisk, sfdisk, gpart,
> etc.) is all in userspace, and that needs to be in sync with the
> kernel partition reading code anyway. So one more userspace
> implementation is in my mind much cleaner than trying to push the
> needed functionality into sysfs, and then hoping against hope that it
> doesn't accidentally change in the future.

* rene envisions /lib/libpart.so...

Not to mention my Grand Visions of a totally new Linux native
partitioning scheme probably modelled after BSD slices (as also
mentioned in a previous message just now). Or perhaps LVM already fills
that role comfortably. It's certainly what I hear everyone talking about
these days.

Rene.
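As a rough idea of the iterative variant mentioned earlier in this message, here is a simplified sketch that walks the chain of extended tables in a loop, following only the first extended link in each table. To stay self-contained it works on an in-memory disk image instead of a file descriptor; walk_logicals(), rd32() and the hop limit are inventions for this example and it is not a drop-in replacement for do_sector().

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE  512
#define TABLE_OFFSET 446	/* 512 - 2 - 4 * 16 */

/* Read a little-endian 32-bit field byte by byte. */
static uint32_t rd32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static int is_extended_type(unsigned char type)
{
	return type == 0x05 || type == 0x0f || type == 0x85;
}

/*
 * Walk the logical chain of the extended partition starting at sector
 * 'outer' in an image of 'nsect' sectors. Stores the absolute start of
 * each logical in out[] and returns how many were found, or -1 on a
 * bad table, an out-of-range link, a loop, or too many entries.
 */
int walk_logicals(const unsigned char *img, uint32_t nsect, uint32_t outer,
		  uint64_t *out, int max)
{
	uint64_t cur = outer;
	uint32_t hops = 0;
	int n = 0;

	assert(img && out);
	for (;;) {
		const unsigned char *s, *e;
		uint64_t next = 0;
		int have_next = 0;
		int i;

		if (cur >= nsect || hops++ > nsect)
			return -1;
		s = img + cur * SECTOR_SIZE;
		if (s[510] != 0x55 || s[511] != 0xaa)
			return -1;

		for (i = 0; i < 4; i++) {
			e = s + TABLE_OFFSET + 16 * i;
			if (!rd32(e + 12))		/* size 0: unused slot */
				continue;
			if (is_extended_type(e[4])) {
				if (!have_next) {	/* first link only */
					next = (uint64_t)outer + rd32(e + 8);
					have_next = 1;
				}
				continue;
			}
			if (n == max)
				return -1;
			out[n++] = cur + rd32(e + 8);
		}
		if (!have_next)
			return n;
		cur = next;
	}
}
```

The hop counter is there because a corrupted chain can point back at itself, which a plain loop (or recursion) would follow forever.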
Re: [RFH] Partition table recovery
On 07/22/2007 03:11 AM, Theodore Tso wrote:

>> This is a problem. Today the CHS fields in the partition entries
>> don't mean much of anything anymore and Linux happily ignores them
>> but DOS and (hence) Windows 9x do not. From time to time I still have
>> the Windows 98 install that's sitting in a corner of my disk throw a
>> fit just by having set the BIOS from LBA to Large (meaning the
>> geometry the BIOS pretends the disk has changes) for example. Old DOS
>> installs that I keep around for the purpose of hardware testing with
>> the originally supplied drivers make for even more of a "don't touch,
>> don't touch!" thing -- various versions of DOS throw fits for various
>> reasons.
>
> This is true, but that's due to the fundamentally broken nature of
> CHS. You need them to boot, and that's about it. I will say up front
> that I don't particularly care about legacy operating systems such as
> DOS, Windows 98, or Minix 3. So the idea of simply having the number
> of heads and sectors in the partition header is that we can
> reconstruct CHS fields such that it is likely with modern hardware you
> will get it right.

Well, I still don't believe this all to be a great idea but it was sort
of fun so the attached does largely what you want -- build a list of all
data partitions.

The heads/sectors fields it for now just gets from the HDIO_GETGEO call.
A better source would be guessing the values from the partition table
itself but that _also_ doesn't make too much sense. If you're
reconstructing a sanitized version of the table anyway, it makes better
sense to reconstruct it with the values HDIO_GETGEO returns at
restoration time.

I kept your suggested format, but in fact, the 64-bit start value seems
not very useful if we're getting the value from a 32-bit field in the
old partition tables anyway. With that shrunk down to 32-bit again,
there would be enough room for the complete partition table entry...
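For reference, reconstructing a CHS triple from a sector number and a heads/sectors geometry is the standard conversion sketched below. The packing follows the MBR entry layout (low 6 bits of the sector byte hold the sector, its top 2 bits hold cylinder bits 8-9). Saturating at cylinder 1023 is one common convention for addresses beyond CHS range and is an assumption here, as are the names struct chs and lba_to_chs().

```c
#include <assert.h>
#include <stdint.h>

/* The three CHS bytes as they appear in an MBR partition entry. */
struct chs {
	uint8_t head;
	uint8_t sector;		/* bits 0-5: sector (1-based), bits 6-7: cyl bits 8-9 */
	uint8_t cyl;		/* cylinder bits 0-7 */
};

/* Convert an LBA sector number to packed CHS for a given geometry. */
struct chs lba_to_chs(uint32_t lba, unsigned heads, unsigned sectors)
{
	struct chs r;
	uint32_t c, h, s;

	assert(heads >= 1 && heads <= 255);
	assert(sectors >= 1 && sectors <= 63);

	c = lba / (heads * sectors);
	h = (lba / sectors) % heads;
	s = lba % sectors + 1;

	if (c > 1023) {		/* not representable: saturate (assumed convention) */
		c = 1023;
		h = heads - 1;
		s = sectors;
	}

	r.head = (uint8_t)h;
	r.sector = (uint8_t)(s | ((c >> 8) << 6));
	r.cyl = (uint8_t)(c & 0xff);
	return r;
}
```

With the common 255/63 geometry, sector 63 maps to cylinder 0, head 1, sector 1, and anything past roughly 8GB saturates, which is part of why these fields carry so little information on modern disks.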
For ancient systems that do all sorts of weird things such as ECHS, etc., yeah, you're pretty much doomed, and the bigger danger comes from futzing with BIOS settings, et. al. But it's 2007, gosh darn it! Here's a quarter, kid, buy yourself a real computer. :-) Thanks, but real computers won't host my ISA cards... Yes, I'm very aware of the extended partitioning scheme mess. What I was proposing to back up here is only the real partitions, not the fake extended partitions. The idea is to store *just* enough information so that a partition table manager can recover the partition tables in such a way that the original filesystem information can be recovered. This should do I guess. It enters all data partitions into the list, in the order in which they are encountered and sets a flag to signify that a partition was a logical rather than primary. Another option would be to just reserve the first 4 entries for the primaries and the rest for the logicals but this saves entries if there are fewer than 4 primaries and was in fact easier... The program enters partitions in what should be the same order as Linux itself does. Primaries from slot 0 to 3 as normal (but not backed up to entry 0 to 3 as said -- the LOGICAL flag indentifies them), extended partitions in the MBR in the order as encountered, with the logicals in the second-level table as encountered, and following only the first extented in the second-level table. Made it into a generic C program -- didn't look at e2fsprogs sources yet. Need to be off now and haven't yet stared at this as long as I'd like so don't slap me if I've left a few bugs in (although it seems to work nicely). The program dumps the backup sector to stdout -- it's ofcourse easy to change it to print the entries out so they're easy to compare against, say, fdisk -l -us. Oh, and once you've looked at it, please throw it away. As said, I still don't think it's a great idea ;-) Rene. 
/*
 * Public Domain 2007, Rene Herman
 *
 * gcc -W -Wall -DTEST -D_LARGEFILE64_SOURCE -o backup backup.c
 *
 */

#include <stdlib.h>

enum {
	DOS_EXTENDED = 0x05,
	WIN98_EXTENDED = 0x0f,
	LINUX_EXTENDED = 0x85,
};

struct partition {
	unsigned char boot_ind;
	unsigned char __1[3];
	unsigned char sys_ind;
	unsigned char __2[3];
	unsigned int start;
	unsigned int size;
} __attribute__((packed));

struct entry {
	unsigned char flags;
	unsigned char type;
	unsigned short __1;
	unsigned long long start;
	unsigned int size;
} __attribute__((packed));

enum {
	ENTRY_FLAG_LOGICAL = 0x01,
	ENTRY_FLAG_BOOTABLE = 0x80,
};

struct backup {
	unsigned char signature[8];
	unsigned short type;
	unsigned char heads;
	unsigned char sectors;
	unsigned char count;
	unsigned char __1[3];
	struct entry table[31];
} __attribute__((packed));

#define BACKUP_SIGNATURE "PARTBAK1"

enum {
	BACKUP_TYPE_MBR = 1,
};

struct backup backup = { .signature =
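The discussion above turns on rebuilding the CHS fields of an MBR entry from nothing more than a head count and a sectors-per-track count. A minimal sketch of that arithmetic follows. The names `struct chs` and `lba_to_chs` are made up for illustration and are not from the attached program, and clamping to cylinder 1023 for large disks is an assumption about common practice rather than something the thread specifies.

```c
#include <assert.h>
#include <stdint.h>

/* The three CHS bytes as they sit in an MBR partition entry.
 * (Hypothetical helper type, not part of the attached program.) */
struct chs {
	uint8_t head;
	uint8_t sector;   /* bits 0-5: sector (1-based); bits 6-7: cylinder bits 8-9 */
	uint8_t cylinder; /* low 8 bits of the cylinder number */
};

/* Convert an LBA to packed CHS for a given heads / sectors-per-track geometry.
 * Addresses beyond cylinder 1023 are clamped to the largest representable
 * value; that clamping is an assumption for this sketch. */
struct chs lba_to_chs(uint32_t lba, unsigned heads, unsigned sectors)
{
	struct chs out;
	uint32_t cyl, head, sec;

	assert(heads >= 1 && heads <= 255);
	assert(sectors >= 1 && sectors <= 63);

	cyl = lba / (heads * sectors);
	head = (lba / sectors) % heads;
	sec = lba % sectors + 1;

	if (cyl > 1023) {
		cyl = 1023;
		head = heads - 1;
		sec = sectors;
	}

	out.head = (uint8_t)head;
	out.sector = (uint8_t)((sec & 0x3f) | ((cyl >> 2) & 0xc0));
	out.cylinder = (uint8_t)(cyl & 0xff);
	return out;
}
```

With the usual 255/63 geometry, LBA 63 maps to cylinder 0, head 1, sector 1, which is where a first primary partition traditionally starts.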
Re: [RFH] Partition table recovery
On Sun, Jul 22, 2007 at 07:10:31AM +0300, Al Boldi wrote:

Sounds great, but it may be advisable to hook this into the partition modification routines instead of mkfs/fsck. Which would mean that the partition manager could ask the kernel to instruct its fs subsystem to update the backup partition table for each known fs-type that supports such a feature.

Well, let's think about this a bit. What are the requirements?

1) The partition manager should be able to explicitly request that a new backup of the partition tables be stashed in each filesystem that has room for such a backup. That way, when the user affirmatively makes a partition table change, it can get backed up in all of the right places automatically.

2) The fsck program should *only* stash a backup of the partition table if there currently isn't one in the filesystem. It may be that the partition table has been corrupted, and so merely doing an fsck should not transfer a current copy of the partition table to the filesystem-specific backup area. It could be that the partition table was only partially recovered, and we don't want to overwrite the previously existing backups except on an explicit request from the system administrator.

3) The mkfs program should automatically create a backup of the current partition table layout. That way we get a backup in the newly created filesystem as soon as it is created.

4) The exact location of the backup may vary from filesystem to filesystem. For ext2/3/4, bytes 512-1023 are always unused, and don't interfere with the boot sector at bytes 0-511, so that's the obvious location. Other filesystems may have that location in use, and some other location might be a better place to store it. Ideally it will be a well-known location, that isn't dependent on finding an inode table, or some such, but that may not be possible for all filesystems.

OK, so how about this as a solution that meets the above requirements?
/sbin/partbackup device [fspart]

Will scan device (e.g., /dev/hda, /dev/sdb, etc.) and create a 512 byte partition backup, using the format I've previously described. If fspart is specified on the command line, it will use the blkid library to determine the filesystem type of fspart, and then attempt to execute /sbin/partbackupfs.fstype to write the partition backup to fspart. If fspart is '-', then it will write the 512 byte partition table to stdout. If fspart is not specified on the command line, /sbin/partbackup will iterate over all partitions in device, use the blkid library to attempt to determine the correct filesystem type, and then execute /sbin/partbackupfs.fstype if such a backup program exists.

/sbin/partbackupfs.fstype fspart

... is a filesystem specific program for filesystem type fstype. It will assure that fspart (e.g., /dev/hda1, /dev/sdb3) is of an appropriate filesystem type, and then read 512 bytes from stdin and write it out to fspart to an appropriate place for that filesystem.

Partition managers will be encouraged to check to see if /sbin/partbackup exists, and if so, after the partition table is written, to call it with just one argument (e.g., /sbin/partbackup /dev/hdb). They SHOULD provide an option for the user to suppress the backup from happening, but the backup should be the default behavior.

An /sbin/mkfs.fstype program is encouraged to run /sbin/partbackup with two arguments (e.g., /sbin/partbackup /dev/hdb /dev/hdb3) when creating a filesystem.

An /sbin/fsck.fstype program is encouraged to check to see if a partition backup exists (assuming the filesystem supports it), and if not, call /sbin/partbackup with two arguments.

A filesystem utility package for a particular filesystem type is encouraged to make the above changes to its mkfs and fsck programs, as well as provide an /sbin/partbackupfs.fstype program.

I would do this all in userspace, though.
Is there any reason to get the kernel involved? I don't think so. - Ted - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
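The dispatch step in the proposal (given a filesystem type, find and run a per-filesystem helper if one is installed) can be sketched roughly as below. The function names `helper_path` and `helper_exists` are hypothetical. The filesystem type string is assumed to have been obtained already, for example through the blkid library, which this sketch does not call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build "<dir>/partbackupfs.<fstype>" into buf.
 * Returns 0 on success, -1 if fstype is empty, contains a '/', or the
 * result does not fit. (Hypothetical helper for illustration only.) */
int helper_path(char *buf, size_t len, const char *dir, const char *fstype)
{
	int n;

	assert(buf != NULL && dir != NULL && fstype != NULL);

	if (fstype[0] == '\0' || strchr(fstype, '/') != NULL)
		return -1;

	n = snprintf(buf, len, "%s/partbackupfs.%s", dir, fstype);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Return 1 if an executable helper for fstype exists in dir, else 0.
 * A caller would skip the partition silently when this returns 0,
 * matching the "if such a backup program exists" wording above. */
int helper_exists(const char *dir, const char *fstype)
{
	char path[256];

	if (helper_path(path, sizeof path, dir, fstype) != 0)
		return 0;
	return access(path, X_OK) == 0;
}
```

Rejecting a type string that contains '/' keeps an odd detection result from steering execution outside the helper directory. Running the helper with the 512-byte backup on its stdin would be a fork/exec with a pipe, left out here for brevity.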
Re: [RFH] Partition table recovery
Theodore Tso wrote: [...] A filesystem utility package for a particular filesystem type is encouraged to make the above changes to its mkfs and fsck programs, as well as provide an /sbin/partbackupfs.fstype program. Great! I would do this all in userspace, though. 
Is there any reason to get the kernel involved? I don't think so. Yes, doing things in userspace, when possible, is much better. But, a change in the partition table has to be relayed to the kernel, and when that change happens to be on a mounted disk, then the partition manager complains of not being able to update the kernel's view. So how can this be addressed? Thanks! -- Al
Re: [RFH] Partition table recovery
On Sun, July 22, 2007 18:28, Theodore Tso wrote: [...] 4) The exact location of the backup may vary from filesystem to filesystem. For ext2/3/4, bytes 512-1023 are always unused, and don't interfere with the boot sector at bytes 0-511, so that's the obvious location. Other filesystems may have that location in use, and some other location might be a better place to store it. Ideally it will be a well-known location, that isn't dependent on finding an inode table, or some such, but that may not be possible for all filesystems. 
To be on the safe side, maybe also add a checksum, timestamp and something identifying the disk the filesystem was created on. Regards, Indan
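As a rough illustration of the checksum suggestion, here is a sketch that computes a standard CRC-32 over a 512-byte backup block and checks a block against a previously recorded value. The thread does not say which checksum to use or where it would be stored, so both the choice of CRC-32 and the function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by zlib and
 * Ethernet). Slow but table-free, which is fine for a single sector. */
uint32_t crc32_ieee(const unsigned char *data, size_t len)
{
	uint32_t crc = 0xffffffffu;
	size_t i;
	int bit;

	assert(data != NULL || len == 0);

	for (i = 0; i < len; i++) {
		crc ^= data[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xedb88320u & (0u - (crc & 1u)));
	}
	return crc ^ 0xffffffffu;
}

/* Return 1 if the 512-byte block still matches the recorded checksum.
 * Where "expected" is kept (inside the block or elsewhere) is left open. */
int backup_block_matches(const unsigned char block[512], uint32_t expected)
{
	return crc32_ieee(block, 512) == expected;
}
```

If the checksum were stored inside the backup sector itself, its field would have to be zeroed, or left out of the range, while computing it.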
Re: [RFH] Partition table recovery
Theodore Tso wrote: On Sat, Jul 21, 2007 at 07:54:14PM +0200, Rene Herman wrote:

sfdisk -d already works most of the time. Not as a verbatim tool (I actually semi-frequently use a sfdisk -d /dev/hda | sfdisk invocation as a way to _rewrite_ the CHS fields to other values after changing machines around on a disk) but something you'd back up on the FS level should, in my opinion, need to be less fragile than would be possible with just 512 bytes available.

*IF* you remember to store the sfdisk -d somewhere useful. In my How To Recover From Hard Drive Catastrophes classes, I tell them to print out a copy of sfdisk -l /dev/hda ; sfdisk -d /dev/hda and tape it to the side of the computer. I also tell them to do regular backups. Want to make a guess how many of them actually follow this good advice? Far fewer than I would like, I suspect...

What I'm suggesting is the equivalent of sfdisk -d, except we'd be doing it automatically without requiring the user to take any kind of explicit action. Is it perfect? No, although the edge conditions are quite rare these days and generally involve users using legacy systems and/or doing Weird Shit such that They Really Should Know To Do Their Own Explicit Backups. But for the novice users, it should work Just Fine.

Sounds great, but it may be advisable to hook this into the partition modification routines instead of mkfs/fsck. Which would mean that the partition manager could ask the kernel to instruct its fs subsystem to update the backup partition table for each known fs-type that supports such a feature. Thanks! -- Al
Re: [RFH] Partition table recovery
Jeffrey V. Merkey wrote: Al Boldi wrote: As always, a good friend of mine managed to scratch my partion table by cat'ing /dev/full into /dev/sda. I was able to push him out of the way, but at least the first 100MB are gone. I can probably live without the first partion, but there are many partitions after that, which I hope should easily be recoverable. I tried parted, but it's not working out for me. Does anybody know of a simple partition recovery tool, that would just scan the disk for lost partions? One thing NetWare always did was to stamp a copy of the partition table at the time a partition was created as the second logical sector (offset 1) from the start of a newly created partition. This allowed the disk to be scanned for the original (or last) partition table copy. This is really a good idea, as this would save you the trouble of reconstructing the table due to older overlapping entries. Can linux do something like that? Thanks! -- Al
Re: [RFH] Partition table recovery
Dave Young wrote: On 7/20/07, Al Boldi wrote:

As always, a good friend of mine managed to scratch my partion table by cat'ing /dev/full into /dev/sda. I was able to push him out of the way, but

/dev/null ?

at least the first 100MB are gone. I can probably live without the first partion, but there are many partitions after that, which I hope should easily be recoverable. I tried parted, but it's not working out for me. Does anybody know of a simple partition recovery tool, that would just scan the disk for lost partions?

The best way is to back up your partition table before it is destroyed.

Very true! # sfdisk -d is a real saviour. But make sure you don't save it on the same disk you are trying to recover. Thanks! -- Al
Re: [RFH] Partition table recovery
James Lamanna wrote: On 7/19/07, Al Boldi wrote: As always, a good friend of mine managed to scratch my partion table by cat'ing /dev/full into /dev/sda. I was able to push him out of the way, but at least the first 100MB are gone. I can probably live without the first partion, but there are many partitions after that, which I hope should easily be recoverable. I tried parted, but it's not working out for me. Does anybody know of a simple partition recovery tool, that would just scan the disk for lost partions? Tried gpart? http://www.stud.uni-hannover.de/user/76201/gpart/ This definitely looks like the ticket. And also rescuept from util-linux. There is only one small problem; I have been regularly adding / deleting / resizing partitions, which kind of confuses the scanner. But still, it's better than nothing. Anton Altaparmakov wrote: parted and its derivatives are pile of crap... They cause corruption to totally healthy systems at the best of times. Don't go near them. Use TestDisk (http://www.cgsecurity.org/wiki/TestDisk) and be happy. (-: This one really worked best, without getting confused about older partitions. Thanks everybody! BTW, what's a partion table? -- Al
Re: [RFH] Partition table recovery
Jan-Benedict Glaw wrote: On Fri, 2007-07-20 14:29:34 +0300, Al Boldi wrote: But, I want something much more automated. And the partition table backup per partition entry isn't really a bad idea. That's called `gpart'. Oh, gpart is great, but if we had a backup copy of the partition table on every partition location on disk, then this backup copy could easily be reused to reconstruct the original partition table without further searching. Just like the NetWare approach, and in some respect like the ext2/3 superblock backups. Thanks! -- Al
Re: [RFH] Partition table recovery
On 07/20/2007 02:22 PM, Al Boldi wrote: Oh, gpart is great, but if we had a backup copy of the partition table on every partition location on disk, then this backup copy could easily be reused to reconstruct the original partition table without further searching. As long as you don't reboot you have a backup copy in kernel. Admittedly not in a very nice format, but /sys/block/<disk>/<part>/start and size have saved my butt a few times... Rene.
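For reference, pulling those two values out of sysfs takes only a few lines. The sketch below assumes the usual layout, where each partition directory under /sys/block holds a `start` and a `size` file containing a decimal count of 512-byte sectors. The function names are made up for illustration, and the sysfs root is a parameter only so the code can be pointed at a test directory.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single unsigned decimal number from a text file such as a sysfs
 * attribute. Returns 0 on success, -1 if the file is missing or malformed. */
int read_u64_file(const char *path, unsigned long long *out)
{
	FILE *f;
	int ok;

	assert(path != NULL && out != NULL);

	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	ok = (fscanf(f, "%llu", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Fetch start and size (both in 512-byte sectors) for one partition.
 * sysfs_block would normally be "/sys/block". */
int read_partition_extent(const char *sysfs_block, const char *disk,
			  const char *part, unsigned long long *start,
			  unsigned long long *size)
{
	char path[512];
	int n;

	n = snprintf(path, sizeof path, "%s/%s/%s/start", sysfs_block, disk, part);
	if (n < 0 || (size_t)n >= sizeof path || read_u64_file(path, start) != 0)
		return -1;

	n = snprintf(path, sizeof path, "%s/%s/%s/size", sysfs_block, disk, part);
	if (n < 0 || (size_t)n >= sizeof path || read_u64_file(path, size) != 0)
		return -1;

	return 0;
}
```

A call such as `read_partition_extent("/sys/block", "sda", "sda1", &start, &size)` would recover what the kernel still remembers about a partition whose on-disk table has been wiped, as long as the machine has not been rebooted.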
Re: [RFH] Partition table recovery
On Fri, Jul 20, 2007 at 03:22:17PM +0300, Al Boldi wrote:

Oh, gpart is great, but if we had a backup copy of the partition table on every partition location on disk, then this backup copy could easily be reused to reconstruct the original partition table without further searching. Just like the NetWare approach, and in some respect like the ext2/3 superblock backups.

It wouldn't be that hard to put a backup partition table at the beginning of an ext2/3 filesystem. No one is currently using the space between offset 512 and 1023 bytes, and it would be an easy place to stash a backup copy of the partition table. We wouldn't be able to use the MBR format, since information about extended partitions is stored scattered across the disk. So for the sake of argument, I'll propose the following partition backup, starting at offset 512 and going for 512 bytes to byte #1023:

    offset from 512    Description
    ---------------    -----------
    0..7               Signature ASCII: PARTBAK1
    8..9               Part-type: 1=MBR
    10                 # of heads
    11                 # sectors
    12                 # of partitions in the backup
    13..15             Reserved (must be zero)
    16..31             first part entry
    ...
    496..511           31st partition entry

Partition entry (16 bytes)

    offset             description
    ------             -----------
    0                  Flags
    1                  Type
    2..3               Reserved (must be zero)
    4..11              Start LBA (little endian)
    12..15             # of LBA in partition

Obviously this won't work for LVM or EFI volume partition tables, but they have enough redundancy in their on-disk formats that it shouldn't be an issue.

Someone want to write a stand-alone function which pulls the information from the MBR (and extended MBRs) and puts it into a 512 byte buffer? If so, I'll integrate it into mke2fs and e2fsck, so that when the user upgrades to a new enough e2fsprogs, it will automatically back up the partition information just before the superblock.

- Ted
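To make the proposed layout concrete, here is a sketch that serializes a list of partitions into the 512-byte format byte by byte, so the result does not depend on struct packing or host endianness. The proposal only marks the start LBA as little endian; treating the type and size fields as little endian too is an assumption. `struct part_entry` and `backup_pack` are illustrative names, not part of any existing tool.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BACKUP_SIZE        512
#define BACKUP_MAX_ENTRIES 31

/* In-memory description of one partition (hypothetical helper type). */
struct part_entry {
	uint8_t  flags;
	uint8_t  type;
	uint64_t start; /* start LBA */
	uint32_t size;  /* number of LBAs */
};

/* Store the low nbytes of v at p, least significant byte first. */
static void put_le(unsigned char *p, uint64_t v, int nbytes)
{
	int i;

	for (i = 0; i < nbytes; i++)
		p[i] = (unsigned char)(v >> (8 * i));
}

/* Fill buf with the proposed backup sector.
 * Returns 0 on success, -1 if there are more than 31 entries. */
int backup_pack(unsigned char buf[BACKUP_SIZE], uint8_t heads, uint8_t sectors,
		const struct part_entry *entries, unsigned count)
{
	unsigned i;

	assert(buf != NULL && (entries != NULL || count == 0));

	if (count > BACKUP_MAX_ENTRIES)
		return -1;

	memset(buf, 0, BACKUP_SIZE);          /* reserved bytes must be zero */
	memcpy(buf, "PARTBAK1", 8);           /* 0..7   signature            */
	put_le(buf + 8, 1, 2);                /* 8..9   part-type, 1 = MBR   */
	buf[10] = heads;                      /* 10     # of heads           */
	buf[11] = sectors;                    /* 11     # sectors            */
	buf[12] = (unsigned char)count;       /* 12     # of partitions      */

	for (i = 0; i < count; i++) {
		unsigned char *p = buf + 16 + 16 * i;

		p[0] = entries[i].flags;          /* 0      flags     */
		p[1] = entries[i].type;           /* 1      type      */
		put_le(p + 4, entries[i].start, 8); /* 4..11  start LBA */
		put_le(p + 12, entries[i].size, 4); /* 12..15 # of LBA  */
	}
	return 0;
}
```

The 16-byte header plus 31 entries of 16 bytes comes to exactly 512 bytes, which is why the table stops at the 31st entry.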