Re: [PATCH] PPC64: large INITRD causes kernel not to boot [UPDATE]
Paul Mackerras wrote: Mark Bellon writes: Simply put the existing code has a fixed reservation (claim) address and once the kernel plus initrd image are large enough to pass this address all sorts of bad things occur. The fix is the dynamically establish the first claim address above the loaded kernel plus initrd (plus some "padding" and rounding). If PROG_START is defined this will be used as the minimum safe address - currently known to be 0x0140 for the firmwares tested so far. The idea is fine, but I have some questions about the actual patch: -void *claim(unsigned int, unsigned int, unsigned int); +void *claim(unsigned long, unsigned long, unsigned long); What was the motivation for this change? Since the zImage wrapper is a 32-bit executable, int and long are both 32 bits. I would prefer to leave the parameters as unsigned int to force people to realize that the parameters are 32 bits (even if said people have been working on 64-bit programs recently). The function, claim, is found in prom.c uses longs. The long is the usual idiom for hiding a pointer, not an int, so I fixed accordingly. I'm open to further discussion of course. On a 64 bit machine long and int are different sizes. This would make things "proper" if things changed in the future. + claim_base = _ALIGN_UP((unsigned long)_end, ONE_MB); + +#if defined(PROG_START) + /* +* Maintain a "magic" minimum address. This keeps some older +* firmware platforms running. +*/ + + if (claim_base < PROG_START) + claim_base = PROG_START; +#endif This appears to be the meat of the patch, the rest is "cleanup", right? Correct. The preceding comment explains what is going on. Removing the magic numbers seemed like a good idea. mark Paul. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] PPC64: large INITRD causes kernel not to boot [UPDATE]
In PPC64 there are number of problems in arch/ppc64/boot/main.c that prevent a kernel from making use of a large (greater than ~16MB) INITRD. This is 64 bit architecture and really large INITRD images should be possible. Simply put the existing code has a fixed reservation (claim) address and once the kernel plus initrd image are large enough to pass this address all sorts of bad things occur. The fix is the dynamically establish the first claim address above the loaded kernel plus initrd (plus some "padding" and rounding). If PROG_START is defined this will be used as the minimum safe address - currently known to be 0x0140 for the firmwares tested so far. We've talked about this in [EMAIL PROTECTED] and this is what seems to have settled out. mark Signed-off-by: Mark Bellon <[EMAIL PROTECTED]> diff -Naur linux-2.6.13-orig/arch/ppc64/boot/main.c linux-2.6.13/arch/ppc64/boot/main.c --- linux-2.6.13-orig/arch/ppc64/boot/main.c2005-08-28 16:41:01.0 -0700 +++ linux-2.6.13/arch/ppc64/boot/main.c 2005-09-06 15:42:22.0 -0700 @@ -20,7 +20,7 @@ extern void printf(const char *fmt, ...); extern int sprintf(char *buf, const char *fmt, ...); void gunzip(void *, int, unsigned char *, int *); -void *claim(unsigned int, unsigned int, unsigned int); +void *claim(unsigned long, unsigned long, unsigned long); void flush_cache(void *, unsigned long); void pause(void); extern void exit(void); @@ -31,7 +31,8 @@ /* Value picked to match that used by yaboot */ #define PROG_START 0x0140 -#define RAM_END(256<<20) // Fixme: use OF */ +#define RAM_END(512<<20) // Fixme: use OF */ +#defineONE_MB 0x10 char *avail_ram; char *begin_avail, *end_avail; @@ -40,6 +41,7 @@ unsigned int heap_max; extern char _start[]; +extern char _end[]; extern char _vmlinux_start[]; extern char _vmlinux_end[]; extern char _initrd_start[]; @@ -73,13 +75,13 @@ #undef DEBUG -static unsigned long claim_base = PROG_START; +static unsigned long claim_base; static unsigned long try_claim(unsigned long size) { unsigned long addr = 0; - for(; claim_base < RAM_END; claim_base += 0x10) { + for(; claim_base < RAM_END; claim_base += ONE_MB) { #ifdef DEBUG printf("trying: 0x%08lx\n\r", claim_base); #endif @@ -110,7 +112,26 @@ if (getprop(chosen_handle, "stdin", , sizeof(stdin)) != 4) exit(); - printf("\n\rzImage starting: loaded at 0x%x\n\r", (unsigned)_start); + printf("\n\rzImage starting: loaded at 0x%lx\n\r", (unsigned long) _start); + + /* +* The first available claim_base must be above the end of the +* the loaded kernel wrapper file (_start to _end includes the +* initrd image if it is present) and rounded up to a nice +* 1 MB boundary for good measure. +*/ + + claim_base = _ALIGN_UP((unsigned long)_end, ONE_MB); + +#if defined(PROG_START) + /* +* Maintain a "magic" minimum address. This keeps some older +* firmware platforms running. +*/ + + if (claim_base < PROG_START) + claim_base = PROG_START; +#endif /* * Now we try to claim some memory for the kernel itself @@ -120,7 +141,7 @@ * size... In practice we add 1Mb, that is enough, but we should really * consider fixing the Makefile to put a _raw_ kernel in there ! */ - vmlinux_memsize += 0x10; + vmlinux_memsize += ONE_MB; printf("Allocating 0x%lx bytes for kernel ...\n\r", vmlinux_memsize); vmlinux.addr = try_claim(vmlinux_memsize); if (vmlinux.addr == 0) {
[PATCH] PPC64: large INITRD causes kernel not to boot [UPDATE]
In PPC64 there are number of problems in arch/ppc64/boot/main.c that prevent a kernel from making use of a large (greater than ~16MB) INITRD. This is 64 bit architecture and really large INITRD images should be possible. Simply put the existing code has a fixed reservation (claim) address and once the kernel plus initrd image are large enough to pass this address all sorts of bad things occur. The fix is the dynamically establish the first claim address above the loaded kernel plus initrd (plus some padding and rounding). If PROG_START is defined this will be used as the minimum safe address - currently known to be 0x0140 for the firmwares tested so far. We've talked about this in [EMAIL PROTECTED] and this is what seems to have settled out. mark Signed-off-by: Mark Bellon [EMAIL PROTECTED] diff -Naur linux-2.6.13-orig/arch/ppc64/boot/main.c linux-2.6.13/arch/ppc64/boot/main.c --- linux-2.6.13-orig/arch/ppc64/boot/main.c2005-08-28 16:41:01.0 -0700 +++ linux-2.6.13/arch/ppc64/boot/main.c 2005-09-06 15:42:22.0 -0700 @@ -20,7 +20,7 @@ extern void printf(const char *fmt, ...); extern int sprintf(char *buf, const char *fmt, ...); void gunzip(void *, int, unsigned char *, int *); -void *claim(unsigned int, unsigned int, unsigned int); +void *claim(unsigned long, unsigned long, unsigned long); void flush_cache(void *, unsigned long); void pause(void); extern void exit(void); @@ -31,7 +31,8 @@ /* Value picked to match that used by yaboot */ #define PROG_START 0x0140 -#define RAM_END(25620) // Fixme: use OF */ +#define RAM_END(51220) // Fixme: use OF */ +#defineONE_MB 0x10 char *avail_ram; char *begin_avail, *end_avail; @@ -40,6 +41,7 @@ unsigned int heap_max; extern char _start[]; +extern char _end[]; extern char _vmlinux_start[]; extern char _vmlinux_end[]; extern char _initrd_start[]; @@ -73,13 +75,13 @@ #undef DEBUG -static unsigned long claim_base = PROG_START; +static unsigned long claim_base; static unsigned long try_claim(unsigned long size) { unsigned long addr = 0; - for(; claim_base RAM_END; claim_base += 0x10) { + for(; claim_base RAM_END; claim_base += ONE_MB) { #ifdef DEBUG printf(trying: 0x%08lx\n\r, claim_base); #endif @@ -110,7 +112,26 @@ if (getprop(chosen_handle, stdin, stdin, sizeof(stdin)) != 4) exit(); - printf(\n\rzImage starting: loaded at 0x%x\n\r, (unsigned)_start); + printf(\n\rzImage starting: loaded at 0x%lx\n\r, (unsigned long) _start); + + /* +* The first available claim_base must be above the end of the +* the loaded kernel wrapper file (_start to _end includes the +* initrd image if it is present) and rounded up to a nice +* 1 MB boundary for good measure. +*/ + + claim_base = _ALIGN_UP((unsigned long)_end, ONE_MB); + +#if defined(PROG_START) + /* +* Maintain a magic minimum address. This keeps some older +* firmware platforms running. +*/ + + if (claim_base PROG_START) + claim_base = PROG_START; +#endif /* * Now we try to claim some memory for the kernel itself @@ -120,7 +141,7 @@ * size... In practice we add 1Mb, that is enough, but we should really * consider fixing the Makefile to put a _raw_ kernel in there ! */ - vmlinux_memsize += 0x10; + vmlinux_memsize += ONE_MB; printf(Allocating 0x%lx bytes for kernel ...\n\r, vmlinux_memsize); vmlinux.addr = try_claim(vmlinux_memsize); if (vmlinux.addr == 0) {
Re: [PATCH] PPC64: large INITRD causes kernel not to boot [UPDATE]
Paul Mackerras wrote: Mark Bellon writes: Simply put the existing code has a fixed reservation (claim) address and once the kernel plus initrd image are large enough to pass this address all sorts of bad things occur. The fix is the dynamically establish the first claim address above the loaded kernel plus initrd (plus some padding and rounding). If PROG_START is defined this will be used as the minimum safe address - currently known to be 0x0140 for the firmwares tested so far. The idea is fine, but I have some questions about the actual patch: -void *claim(unsigned int, unsigned int, unsigned int); +void *claim(unsigned long, unsigned long, unsigned long); What was the motivation for this change? Since the zImage wrapper is a 32-bit executable, int and long are both 32 bits. I would prefer to leave the parameters as unsigned int to force people to realize that the parameters are 32 bits (even if said people have been working on 64-bit programs recently). The function, claim, is found in prom.c uses longs. The long is the usual idiom for hiding a pointer, not an int, so I fixed accordingly. I'm open to further discussion of course. On a 64 bit machine long and int are different sizes. This would make things proper if things changed in the future. + claim_base = _ALIGN_UP((unsigned long)_end, ONE_MB); + +#if defined(PROG_START) + /* +* Maintain a magic minimum address. This keeps some older +* firmware platforms running. +*/ + + if (claim_base PROG_START) + claim_base = PROG_START; +#endif This appears to be the meat of the patch, the rest is cleanup, right? Correct. The preceding comment explains what is going on. Removing the magic numbers seemed like a good idea. mark Paul. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] PPC64: large INITRD causes kernel not to boot
In PPC64 there are number of problems in arch/ppc64/boot/main.c that prevent a kernel from making use of a large (greater than ~16MB) INITRD. This is 64 bit architecture and really large INITRD images should be possible. Simply put the existing code has a fixed reservation (claim) address and once the kernel plus initrd image are large enough to pass this address all sorts of bad things occur. The fix is the dynamically establish the first claim address above the loaded kernel plus initrd (plus some "padding" and rounding) mark Signed-off-by: Mark Bellon <[EMAIL PROTECTED]> Index: linux-2.6.12.3/arch/ppc64/boot/main.c === --- linux-2.6.12.3.orig/arch/ppc64/boot/main.c +++ linux-2.6.12.3/arch/ppc64/boot/main.c @@ -22,7 +22,7 @@ extern void printf(const char *fmt, ...); extern int sprintf(char *buf, const char *fmt, ...); void gunzip(void *, int, unsigned char *, int *); -void *claim(unsigned int, unsigned int, unsigned int); +void *claim(unsigned long, unsigned long, unsigned long); void flush_cache(void *, unsigned long); void pause(void); extern void exit(void); @@ -31,9 +31,8 @@ void *memmove(void *dest, const void *src, unsigned long n); void *memcpy(void *dest, const void *src, unsigned long n); -/* Value picked to match that used by yaboot */ -#define PROG_START 0x0140 -#define RAM_END (256<<20) // Fixme: use OF */ +#define ONE_MB 0x10 +#define RAM_END (512<<20) // Fixme: use OF */ char *avail_ram; char *begin_avail, *end_avail; @@ -75,13 +74,13 @@ #define DEBUG -static unsigned long claim_base = PROG_START; +static unsigned long claim_base; static unsigned long try_claim(unsigned long size) { unsigned long addr = 0; - for(; claim_base < RAM_END; claim_base += 0x10) { + for(; claim_base < RAM_END; claim_base += ONE_MB) { #ifdef DEBUG printf("trying: 0x%08lx\n\r", claim_base); #endif @@ -112,7 +111,23 @@ if (getprop(chosen_handle, "stdin", , sizeof(stdin)) != 4) exit(); - printf("zImage starting: loaded at 0x%x\n\r", (unsigned)_start); + printf("zImage starting: loaded at 0x%lx\n\r", (unsigned long)_start); + + /* + * The first available claim_base must be "out of the way" - + * well above _start + kernel_size + initrd + overhead. + */ + + claim_base = ((unsigned long) _start) + +((unsigned long) vmlinux_filesize) + +(unsigned long)(_initrd_end - _initrd_start) + +ONE_MB; + + /* + * Now round up the claim_base to a nice 1 MB boundary. + */ + + claim_base = ((claim_base + ONE_MB - 1) / ONE_MB) * ONE_MB; /* * Now we try to claim some memory for the kernel itself @@ -122,7 +137,7 @@ * size... In practice we add 1Mb, that is enough, but we should really * consider fixing the Makefile to put a _raw_ kernel in there ! */ - vmlinux_memsize += 0x10; + vmlinux_memsize += ONE_MB; printf("Allocating 0x%lx bytes for kernel ...\n\r", vmlinux_memsize); vmlinux.addr = try_claim(vmlinux_memsize); if (vmlinux.addr == 0) {
[PATCH] PPC64: large INITRD causes kernel not to boot
In PPC64 there are number of problems in arch/ppc64/boot/main.c that prevent a kernel from making use of a large (greater than ~16MB) INITRD. This is 64 bit architecture and really large INITRD images should be possible. Simply put the existing code has a fixed reservation (claim) address and once the kernel plus initrd image are large enough to pass this address all sorts of bad things occur. The fix is the dynamically establish the first claim address above the loaded kernel plus initrd (plus some padding and rounding) mark Signed-off-by: Mark Bellon [EMAIL PROTECTED] Index: linux-2.6.12.3/arch/ppc64/boot/main.c === --- linux-2.6.12.3.orig/arch/ppc64/boot/main.c +++ linux-2.6.12.3/arch/ppc64/boot/main.c @@ -22,7 +22,7 @@ extern void printf(const char *fmt, ...); extern int sprintf(char *buf, const char *fmt, ...); void gunzip(void *, int, unsigned char *, int *); -void *claim(unsigned int, unsigned int, unsigned int); +void *claim(unsigned long, unsigned long, unsigned long); void flush_cache(void *, unsigned long); void pause(void); extern void exit(void); @@ -31,9 +31,8 @@ void *memmove(void *dest, const void *src, unsigned long n); void *memcpy(void *dest, const void *src, unsigned long n); -/* Value picked to match that used by yaboot */ -#define PROG_START 0x0140 -#define RAM_END (25620) // Fixme: use OF */ +#define ONE_MB 0x10 +#define RAM_END (51220) // Fixme: use OF */ char *avail_ram; char *begin_avail, *end_avail; @@ -75,13 +74,13 @@ #define DEBUG -static unsigned long claim_base = PROG_START; +static unsigned long claim_base; static unsigned long try_claim(unsigned long size) { unsigned long addr = 0; - for(; claim_base RAM_END; claim_base += 0x10) { + for(; claim_base RAM_END; claim_base += ONE_MB) { #ifdef DEBUG printf(trying: 0x%08lx\n\r, claim_base); #endif @@ -112,7 +111,23 @@ if (getprop(chosen_handle, stdin, stdin, sizeof(stdin)) != 4) exit(); - printf(zImage starting: loaded at 0x%x\n\r, (unsigned)_start); + printf(zImage starting: loaded at 0x%lx\n\r, (unsigned long)_start); + + /* + * The first available claim_base must be out of the way - + * well above _start + kernel_size + initrd + overhead. + */ + + claim_base = ((unsigned long) _start) + +((unsigned long) vmlinux_filesize) + +(unsigned long)(_initrd_end - _initrd_start) + +ONE_MB; + + /* + * Now round up the claim_base to a nice 1 MB boundary. + */ + + claim_base = ((claim_base + ONE_MB - 1) / ONE_MB) * ONE_MB; /* * Now we try to claim some memory for the kernel itself @@ -122,7 +137,7 @@ * size... In practice we add 1Mb, that is enough, but we should really * consider fixing the Makefile to put a _raw_ kernel in there ! */ - vmlinux_memsize += 0x10; + vmlinux_memsize += ONE_MB; printf(Allocating 0x%lx bytes for kernel ...\n\r, vmlinux_memsize); vmlinux.addr = try_claim(vmlinux_memsize); if (vmlinux.addr == 0) {
Re: [PATCH] IDE disks show invalid geometries in /proc/ide/hd*/geometry
Jan Engelhardt wrote: Simple: do not use BIOS values. [ Yes, there should be some warning from kernel. ] On that matter, I get a warning from LILO wrt cyls and stuff: 07:47 spectre:~ # cat /proc/ide/hda/geometry physical 16383/16/63 logical 65535/16/63 07:58 spectre:~ # lilo Warning: Kernel & BIOS return differing head/sector geometries for device 0x80 Kernel: 65535 cylinders, 16 heads, 63 sectors BIOS: 1023 cylinders, 255 heads, 63 sectors Added linux * 07:59 spectre:~ # fdisk -l Disk /dev/hda: 40.9 GB, 40982151168 bytes 255 heads, 63 sectors/track, 4982 cylinders All of these numbers are virtual, since CHS is not really used anymore, as we know. But, which of these fake CHS values (16383/16/63 | 65535/16/63 | 1023/255/63) is the right one? 255/63/4982 is another matter, since it [almost] matches the actual size of the disk while the other three are just "for the bios". This is exactly the case that my patch was attempting to fix (and apparently didn't get quite right). Certain drive returns cause strange numbers to slip through often when LBA 28 is involved. mark - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] IDE disks show invalid geometries in /proc/ide/hd*/geometry
Jan Engelhardt wrote: Simple: do not use BIOS values. [ Yes, there should be some warning from kernel. ] On that matter, I get a warning from LILO wrt cyls and stuff: 07:47 spectre:~ # cat /proc/ide/hda/geometry physical 16383/16/63 logical 65535/16/63 07:58 spectre:~ # lilo Warning: Kernel BIOS return differing head/sector geometries for device 0x80 Kernel: 65535 cylinders, 16 heads, 63 sectors BIOS: 1023 cylinders, 255 heads, 63 sectors Added linux * 07:59 spectre:~ # fdisk -l Disk /dev/hda: 40.9 GB, 40982151168 bytes 255 heads, 63 sectors/track, 4982 cylinders All of these numbers are virtual, since CHS is not really used anymore, as we know. But, which of these fake CHS values (16383/16/63 | 65535/16/63 | 1023/255/63) is the right one? 255/63/4982 is another matter, since it [almost] matches the actual size of the disk while the other three are just for the bios. This is exactly the case that my patch was attempting to fix (and apparently didn't get quite right). Certain drive returns cause strange numbers to slip through often when LBA 28 is involved. mark - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] IDE disks show invalid geometries in /proc/ide/hd*/geometry
Bartlomiej Zolnierkiewicz wrote: On 8/3/05, Mark Bellon <[EMAIL PROTECTED]> wrote: Bartlomiej Zolnierkiewicz wrote: Hi, The topic was discussed to death on linux-kernel. Mark, you need to fix your applications and stop using /proc/ide/hd*/geometry or/and HDIO_GET_GEO ioctl (which BTW your patch also affects). Fixing the applications I can understand but the patch still seems necessary, to me, so the HDIO_GET_GEO returns "rational" values. I tested HDIO_GET_GEO and it returns the same broken values as go into /proc (no surprises) without my patch. Please explain. Let me explain more below. Thanks for the response as now I can understand better the issue (which was what I was fishing for). If a drive is in LBA mode (28 or 48 bit) the existing code doesn't always "fix up" the geometry properly for some value returns. It only tries with 48 bit mode and it fails there for some values. My patch forces a complete geometry and appears (to me) to preserve the side efefcts of the existing code. Am I missing something? diff -Naur linux-2.6.13-rc3-git9-orig/drivers/ide/ide-disk.c linux-2.6.13-rc3-git9/drivers/ide/ide-disk.c --- linux-2.6.13-rc3-git9-orig/drivers/ide/ide-disk.c 2005-08-01 13:48:21.0 -0700 +++ linux-2.6.13-rc3-git9/drivers/ide/ide-disk.c2005-08-02 12:14:43.0 -0700 @@ -884,10 +884,17 @@ ide_add_setting(drive, "max_failures", SETTING_RW, -1, -1, TYPE_INT, 0, 65535, 1, 1, >max_failures,NULL); } +static uint32_t do_div64_32 (__u64 n, uint32_t d) +{ + do_div(n, d); + + return n; +} + static void idedisk_setup (ide_drive_t *drive) { struct hd_driveid *id = drive->id; - unsigned long long capacity; + __u64 capacity; int barrier; idedisk_add_settings(drive); @@ -949,27 +956,32 @@ */ capacity = idedisk_capacity (drive); if (!drive->forced_geom) { + uint32_t cylsz, cyl; - if (idedisk_supports_lba48(drive->id)) { - /* compatibility */ - drive->bios_sect = 63; - drive->bios_head = 255; + /* +* In LBA mode the geometry isn't used to talk to the drive +* so always create a "rational" geometry from the capacity. +*/ + if (drive->select.b.lba) { why are you forcing drive->bios_sect and drive->bios_head values for LBA28 drives? Any LBA 28 drive can return fixed geometry (regardless of size) just as well as an LBA 48 drive can. I've got drives all over the place that do just that. In any event there are machines which do not have have an BIOS, like a PPC target, and the BIOS values can be "fixed up". The ioctl returns the BIOS values and these are totally wrong on some targets. Comments? + drive->bios_sect = drive->sect = 63; + drive->bios_head = drive->head = 255; changing drive->sect and drive->head is just wrong, these are values reported by the device - do not touch them If drive->cyl, drive-head amd drive->sect are to be left alone in all cases I have no problem. + + cylsz = drive->bios_sect * drive->bios_head; + cyl = do_div64_32(capacity, cylsz); division by zero now is possible (yes, 0/0/0 is possible) Then the original code is prefered. + + /* "fake out" works up to ~500 GB */ + cyl = (cyl > 65535) ? 65535 : cyl; + drive->bios_cyl = drive->cyl = cyl; drive->cyl shouldn't be changed Same comment as above. Only change BIOS values. } if (drive->bios_sect && drive->bios_head) { - unsigned int cap0 = capacity; /* truncate to 32 bits */ - unsigned int cylsz, cyl; + cylsz = drive->bios_sect * drive->bios_head; + cyl = do_div64_32(capacity, cylsz); - if (cap0 != capacity) - drive->bios_cyl = 65535; cap0 != capacity => set the max possible drive->bios_cyl, no need for adjusting it - else { cap0 == capacity => no need for do_div(), only lower 32-bits are used, adjust drive->bios_cyl - cylsz = drive->bios_sect * drive->bios_head; - cyl = cap0 / cylsz; - if (cyl > 65535) - cyl = 65535; - if (cyl > drive-&g
Re: [PATCH] IDE disks show invalid geometries in /proc/ide/hd*/geometry
Bartlomiej Zolnierkiewicz wrote: Hi, The topic was discussed to death on linux-kernel. Mark, you need to fix your applications and stop using /proc/ide/hd*/geometry or/and HDIO_GET_GEO ioctl (which BTW your patch also affects). Fixing the applications I can understand but the patch still seems necessary, to me, so the HDIO_GET_GEO returns "rational" values. I tested HDIO_GET_GEO and it returns the same broken values as go into /proc (no surprises) without my patch. If a drive is in LBA mode (28 or 48 bit) the existing code doesn't always "fix up" the geometry properly for some value returns. It only tries with 48 bit mode and it fails there for some values. My patch forces a complete geometry and appears (to me) to preserve the side efefcts of the existing code. Am I missing something? mark Bartlomiej On 8/3/05, Andre Hedrick <[EMAIL PROTECTED]> wrote: Did you read ATA-1 through ATA-7 to understand all the variations? On Tue, 2 Aug 2005, Mark Bellon wrote: The ATA specification tells large disk drives to return C/H/S data of 16383/16/63 regardless of their actual size (other variations on this return include 15 heads and/or 4092 cylinders). Unfortunately these CHS data confuse the existing IDE code and cause it to report invalid geometries in /proc when the disk runs in LBA mode. The invalid geometries can cause failures in the partitioning tools; partitioning may be impossible or illogical size limitations occur. This also leads to various forms of human confusion. I attach a patch that fixes this problem while strongly attempting to not break any existing side effects and await any comments. mark Signed-off-by: Mark Bellon <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] IDE disks show invalid geometries in /proc/ide/hd*/geometry
Andre Hedrick wrote: Did you read ATA-1 through ATA-7 to understand all the variations? Regardless of all of the geometry returns by the drives and their ATA compliance, the existing code will fail for some drives and return values. For instance, the existing code attempts to "fix up" LBA 48 fails to handle LBA 28. In both cases the "fix up" code appears errant - it doesn't create a complete, valid geometry. My patch attempts to preserve the flow and side effects of the existing code while handling all of the boundary cases. Given the way the original code appears to read one should be able to "fix up" things without regard for the ATA compliance of a drive. It might help to read the code before and after my patch is applied. The explaination and patch alone don't make it easy to see what I think is a simple fix. mark On Tue, 2 Aug 2005, Mark Bellon wrote: The ATA specification tells large disk drives to return C/H/S data of 16383/16/63 regardless of their actual size (other variations on this return include 15 heads and/or 4092 cylinders). Unfortunately these CHS data confuse the existing IDE code and cause it to report invalid geometries in /proc when the disk runs in LBA mode. The invalid geometries can cause failures in the partitioning tools; partitioning may be impossible or illogical size limitations occur. This also leads to various forms of human confusion. I attach a patch that fixes this problem while strongly attempting to not break any existing side effects and await any comments. mark Signed-off-by: Mark Bellon <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] IDE disks show invalid geometries in /proc/ide/hd*/geometry
Andre Hedrick wrote: Did you read ATA-1 through ATA-7 to understand all the variations? Regardless of all of the geometry returns by the drives and their ATA compliance, the existing code will fail for some drives and return values. For instance, the existing code attempts to fix up LBA 48 fails to handle LBA 28. In both cases the fix up code appears errant - it doesn't create a complete, valid geometry. My patch attempts to preserve the flow and side effects of the existing code while handling all of the boundary cases. Given the way the original code appears to read one should be able to fix up things without regard for the ATA compliance of a drive. It might help to read the code before and after my patch is applied. The explaination and patch alone don't make it easy to see what I think is a simple fix. mark On Tue, 2 Aug 2005, Mark Bellon wrote: The ATA specification tells large disk drives to return C/H/S data of 16383/16/63 regardless of their actual size (other variations on this return include 15 heads and/or 4092 cylinders). Unfortunately these CHS data confuse the existing IDE code and cause it to report invalid geometries in /proc when the disk runs in LBA mode. The invalid geometries can cause failures in the partitioning tools; partitioning may be impossible or illogical size limitations occur. This also leads to various forms of human confusion. I attach a patch that fixes this problem while strongly attempting to not break any existing side effects and await any comments. mark Signed-off-by: Mark Bellon [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] IDE disks show invalid geometries in /proc/ide/hd*/geometry
Bartlomiej Zolnierkiewicz wrote: Hi, The topic was discussed to death on linux-kernel. Mark, you need to fix your applications and stop using /proc/ide/hd*/geometry or/and HDIO_GET_GEO ioctl (which BTW your patch also affects). Fixing the applications I can understand but the patch still seems necessary, to me, so the HDIO_GET_GEO returns rational values. I tested HDIO_GET_GEO and it returns the same broken values as go into /proc (no surprises) without my patch. If a drive is in LBA mode (28 or 48 bit) the existing code doesn't always fix up the geometry properly for some value returns. It only tries with 48 bit mode and it fails there for some values. My patch forces a complete geometry and appears (to me) to preserve the side efefcts of the existing code. Am I missing something? mark Bartlomiej On 8/3/05, Andre Hedrick [EMAIL PROTECTED] wrote: Did you read ATA-1 through ATA-7 to understand all the variations? On Tue, 2 Aug 2005, Mark Bellon wrote: The ATA specification tells large disk drives to return C/H/S data of 16383/16/63 regardless of their actual size (other variations on this return include 15 heads and/or 4092 cylinders). Unfortunately these CHS data confuse the existing IDE code and cause it to report invalid geometries in /proc when the disk runs in LBA mode. The invalid geometries can cause failures in the partitioning tools; partitioning may be impossible or illogical size limitations occur. This also leads to various forms of human confusion. I attach a patch that fixes this problem while strongly attempting to not break any existing side effects and await any comments. mark Signed-off-by: Mark Bellon [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] IDE disks show invalid geometries in /proc/ide/hd*/geometry
Bartlomiej Zolnierkiewicz wrote: On 8/3/05, Mark Bellon [EMAIL PROTECTED] wrote: Bartlomiej Zolnierkiewicz wrote: Hi, The topic was discussed to death on linux-kernel. Mark, you need to fix your applications and stop using /proc/ide/hd*/geometry or/and HDIO_GET_GEO ioctl (which BTW your patch also affects). Fixing the applications I can understand but the patch still seems necessary, to me, so the HDIO_GET_GEO returns rational values. I tested HDIO_GET_GEO and it returns the same broken values as go into /proc (no surprises) without my patch. Please explain. Let me explain more below. Thanks for the response as now I can understand better the issue (which was what I was fishing for). If a drive is in LBA mode (28 or 48 bit) the existing code doesn't always fix up the geometry properly for some value returns. It only tries with 48 bit mode and it fails there for some values. My patch forces a complete geometry and appears (to me) to preserve the side efefcts of the existing code. Am I missing something? diff -Naur linux-2.6.13-rc3-git9-orig/drivers/ide/ide-disk.c linux-2.6.13-rc3-git9/drivers/ide/ide-disk.c --- linux-2.6.13-rc3-git9-orig/drivers/ide/ide-disk.c 2005-08-01 13:48:21.0 -0700 +++ linux-2.6.13-rc3-git9/drivers/ide/ide-disk.c2005-08-02 12:14:43.0 -0700 @@ -884,10 +884,17 @@ ide_add_setting(drive, max_failures, SETTING_RW, -1, -1, TYPE_INT, 0, 65535, 1, 1, drive-max_failures,NULL); } +static uint32_t do_div64_32 (__u64 n, uint32_t d) +{ + do_div(n, d); + + return n; +} + static void idedisk_setup (ide_drive_t *drive) { struct hd_driveid *id = drive-id; - unsigned long long capacity; + __u64 capacity; int barrier; idedisk_add_settings(drive); @@ -949,27 +956,32 @@ */ capacity = idedisk_capacity (drive); if (!drive-forced_geom) { + uint32_t cylsz, cyl; - if (idedisk_supports_lba48(drive-id)) { - /* compatibility */ - drive-bios_sect = 63; - drive-bios_head = 255; + /* +* In LBA mode the geometry isn't used to talk to the drive +* so always create a rational geometry from the capacity. +*/ + if (drive-select.b.lba) { why are you forcing drive-bios_sect and drive-bios_head values for LBA28 drives? Any LBA 28 drive can return fixed geometry (regardless of size) just as well as an LBA 48 drive can. I've got drives all over the place that do just that. In any event there are machines which do not have have an BIOS, like a PPC target, and the BIOS values can be fixed up. The ioctl returns the BIOS values and these are totally wrong on some targets. Comments? + drive-bios_sect = drive-sect = 63; + drive-bios_head = drive-head = 255; changing drive-sect and drive-head is just wrong, these are values reported by the device - do not touch them If drive-cyl, drive-head amd drive-sect are to be left alone in all cases I have no problem. + + cylsz = drive-bios_sect * drive-bios_head; + cyl = do_div64_32(capacity, cylsz); division by zero now is possible (yes, 0/0/0 is possible) Then the original code is prefered. + + /* fake out works up to ~500 GB */ + cyl = (cyl 65535) ? 65535 : cyl; + drive-bios_cyl = drive-cyl = cyl; drive-cyl shouldn't be changed Same comment as above. Only change BIOS values. } if (drive-bios_sect drive-bios_head) { - unsigned int cap0 = capacity; /* truncate to 32 bits */ - unsigned int cylsz, cyl; + cylsz = drive-bios_sect * drive-bios_head; + cyl = do_div64_32(capacity, cylsz); - if (cap0 != capacity) - drive-bios_cyl = 65535; cap0 != capacity = set the max possible drive-bios_cyl, no need for adjusting it - else { cap0 == capacity = no need for do_div(), only lower 32-bits are used, adjust drive-bios_cyl - cylsz = drive-bios_sect * drive-bios_head; - cyl = cap0 / cylsz; - if (cyl 65535) - cyl = 65535; - if (cyl drive-bios_cyl) - drive-bios_cyl = cyl; - } + if (cyl 65535) + cyl = 65535; + if (cyl drive-bios_cyl
[PATCH] IDE disks show invalid geometries in /proc/ide/hd*/geometry
The ATA specification tells large disk drives to return C/H/S data of 16383/16/63 regardless of their actual size (other variations on this return include 15 heads and/or 4092 cylinders). Unfortunately these CHS data confuse the existing IDE code and cause it to report invalid geometries in /proc when the disk runs in LBA mode. The invalid geometries can cause failures in the partitioning tools; partitioning may be impossible or illogical size limitations occur. This also leads to various forms of human confusion. I attach a patch that fixes this problem while strongly attempting to not break any existing side effects and await any comments. mark Signed-off-by: Mark Bellon <[EMAIL PROTECTED]> diff -Naur linux-2.6.13-rc3-git9-orig/drivers/ide/ide-disk.c linux-2.6.13-rc3-git9/drivers/ide/ide-disk.c --- linux-2.6.13-rc3-git9-orig/drivers/ide/ide-disk.c 2005-08-01 13:48:21.0 -0700 +++ linux-2.6.13-rc3-git9/drivers/ide/ide-disk.c2005-08-02 12:14:43.0 -0700 @@ -884,10 +884,17 @@ ide_add_setting(drive, "max_failures", SETTING_RW, -1, -1, TYPE_INT, 0, 65535, 1, 1, >max_failures, NULL); } +static uint32_t do_div64_32 (__u64 n, uint32_t d) +{ + do_div(n, d); + + return n; +} + static void idedisk_setup (ide_drive_t *drive) { struct hd_driveid *id = drive->id; - unsigned long long capacity; + __u64 capacity; int barrier; idedisk_add_settings(drive); @@ -949,27 +956,32 @@ */ capacity = idedisk_capacity (drive); if (!drive->forced_geom) { + uint32_t cylsz, cyl; - if (idedisk_supports_lba48(drive->id)) { - /* compatibility */ - drive->bios_sect = 63; - drive->bios_head = 255; + /* +* In LBA mode the geometry isn't used to talk to the drive +* so always create a "rational" geometry from the capacity. +*/ + if (drive->select.b.lba) { + drive->bios_sect = drive->sect = 63; + drive->bios_head = drive->head = 255; + + cylsz = drive->bios_sect * drive->bios_head; + cyl = do_div64_32(capacity, cylsz); + + /* "fake out" works up to ~500 GB */ + cyl = (cyl > 65535) ? 65535 : cyl; + drive->bios_cyl = drive->cyl = cyl; } if (drive->bios_sect && drive->bios_head) { - unsigned int cap0 = capacity; /* truncate to 32 bits */ - unsigned int cylsz, cyl; + cylsz = drive->bios_sect * drive->bios_head; + cyl = do_div64_32(capacity, cylsz); - if (cap0 != capacity) - drive->bios_cyl = 65535; - else { - cylsz = drive->bios_sect * drive->bios_head; - cyl = cap0 / cylsz; - if (cyl > 65535) - cyl = 65535; - if (cyl > drive->bios_cyl) - drive->bios_cyl = cyl; - } + if (cyl > 65535) + cyl = 65535; + if (cyl > drive->bios_cyl) + drive->bios_cyl = cyl; } } printk(KERN_INFO "%s: %llu sectors (%llu MB)",
[PATCH] IDE disks show invalid geometries in /proc/ide/hd*/geometry
The ATA specification tells large disk drives to return C/H/S data of 16383/16/63 regardless of their actual size (other variations on this return include 15 heads and/or 4092 cylinders). Unfortunately these CHS data confuse the existing IDE code and cause it to report invalid geometries in /proc when the disk runs in LBA mode. The invalid geometries can cause failures in the partitioning tools; partitioning may be impossible or illogical size limitations occur. This also leads to various forms of human confusion. I attach a patch that fixes this problem while strongly attempting to not break any existing side effects and await any comments. mark Signed-off-by: Mark Bellon [EMAIL PROTECTED] diff -Naur linux-2.6.13-rc3-git9-orig/drivers/ide/ide-disk.c linux-2.6.13-rc3-git9/drivers/ide/ide-disk.c --- linux-2.6.13-rc3-git9-orig/drivers/ide/ide-disk.c 2005-08-01 13:48:21.0 -0700 +++ linux-2.6.13-rc3-git9/drivers/ide/ide-disk.c2005-08-02 12:14:43.0 -0700 @@ -884,10 +884,17 @@ ide_add_setting(drive, max_failures, SETTING_RW, -1, -1, TYPE_INT, 0, 65535, 1, 1, drive-max_failures, NULL); } +static uint32_t do_div64_32 (__u64 n, uint32_t d) +{ + do_div(n, d); + + return n; +} + static void idedisk_setup (ide_drive_t *drive) { struct hd_driveid *id = drive-id; - unsigned long long capacity; + __u64 capacity; int barrier; idedisk_add_settings(drive); @@ -949,27 +956,32 @@ */ capacity = idedisk_capacity (drive); if (!drive-forced_geom) { + uint32_t cylsz, cyl; - if (idedisk_supports_lba48(drive-id)) { - /* compatibility */ - drive-bios_sect = 63; - drive-bios_head = 255; + /* +* In LBA mode the geometry isn't used to talk to the drive +* so always create a rational geometry from the capacity. +*/ + if (drive-select.b.lba) { + drive-bios_sect = drive-sect = 63; + drive-bios_head = drive-head = 255; + + cylsz = drive-bios_sect * drive-bios_head; + cyl = do_div64_32(capacity, cylsz); + + /* fake out works up to ~500 GB */ + cyl = (cyl 65535) ? 65535 : cyl; + drive-bios_cyl = drive-cyl = cyl; } if (drive-bios_sect drive-bios_head) { - unsigned int cap0 = capacity; /* truncate to 32 bits */ - unsigned int cylsz, cyl; + cylsz = drive-bios_sect * drive-bios_head; + cyl = do_div64_32(capacity, cylsz); - if (cap0 != capacity) - drive-bios_cyl = 65535; - else { - cylsz = drive-bios_sect * drive-bios_head; - cyl = cap0 / cylsz; - if (cyl 65535) - cyl = 65535; - if (cyl drive-bios_cyl) - drive-bios_cyl = cyl; - } + if (cyl 65535) + cyl = 65535; + if (cyl drive-bios_cyl) + drive-bios_cyl = cyl; } } printk(KERN_INFO %s: %llu sectors (%llu MB),
Re: [PATCH] (TAKE 3) disk quotas fail when /etc/mtab is symlinked to /proc/mounts
If /etc/mtab is a regular file all of the mount options (of a file system) are written to /etc/mtab by the mount command. The quota tools look there for the quota strings for their operation. If, however, /etc/mtab is a symlink to /proc/mounts (a "good thing" in some environments) the tools don't write anything - they assume the kernel will take care of things. While the quota options are sent down to the kernel via the mount system call and the file system codes handle them properly unfortunately there is no code to echo the quota strings into /proc/mounts and the quota tools fail in the symlink case. The attached patchs modify the EXT[2|3] and JFS codes to add the necessary hooks. The show_options function of each file system in these patches currently deal with only those things that seemed related to quotas; especially in the EXT3 case more can be done (later?). Jan Kara also noted the difficulty in moving these changes above the FS codes responding similarly to myself to Andrew's comment about possible VFS migration. Issue summary: - FS codes have to process the entire string of options anyway. - Only FS codes that use quotas must have a show_options function (for quotas to work properly) however quotas are only used in a small number of FS. - Since most of the quota using FS support other options these FS codes should have the a show_options function to show those options - and the quota echoing becomes virtually negligible. Based on feedback I have modified my patches from the original: JFS a missing patch has been restored to the posting EXT[2|3] and JFS always use the show_options function - Each FS has at least one FS specific option displayed - QUOTA output is under a CONFIG_QUOTA ifdef - a follow-on patch will add a multitude of options for each FS EXT[2|3] and JFS "quota" is treated as "usrquota" EXT3 journalled data check for journalled quota removed EXT[2|3] mount when quota specified but not compiled in - no changes from my original patch. I tested the patch and the codes warn but - still mount. With all due respection I believe the comments otherwise were a - misread of the patch. Please reread/test and comment. XFS patch removed - the XFS team already made the necessary changes EXT3 mixing old and new quotas are handled differently (not purely exclusive) - if old and new quotas for the same type are used together the old type is silently depricated for compatability (e.g. usrquota and usrjquota) - mixing of old and new quotas is an error (e.g. usrjquota and grpquota) mark Signed-off-by: Mark Bellon <[EMAIL PROTECTED]> diff -Naur linux-2.6.13-rc3-git9-orig/fs/ext2/super.c linux-2.6.13-rc3-git9/fs/ext2/super.c --- linux-2.6.13-rc3-git9-orig/fs/ext2/super.c 2005-07-28 15:12:51.0 -0700 +++ linux-2.6.13-rc3-git9/fs/ext2/super.c 2005-07-29 14:49:45.0 -0700 @@ -19,6 +19,7 @@ #include #include #include +#include #include #include #include @@ -27,6 +28,8 @@ #include #include #include +#include +#include #include #include "ext2.h" #include "xattr.h" @@ -201,6 +204,26 @@ #endif } +static int ext2_show_options(struct seq_file *seq, struct vfsmount *vfs) +{ + struct ext2_sb_info *sbi = EXT2_SB(vfs->mnt_sb); + + if (sbi->s_mount_opt & EXT2_MOUNT_GRPID) + seq_puts(seq, ",grpid"); + else + seq_puts(seq, ",nogrpid"); + +#if defined(CONFIG_QUOTA) + if (sbi->s_mount_opt & EXT2_MOUNT_USRQUOTA) + seq_puts(seq, ",usrquota"); + + if (sbi->s_mount_opt & EXT2_MOUNT_GRPQUOTA) + seq_puts(seq, ",grpquota"); +#endif + + return 0; +} + #ifdef CONFIG_QUOTA static ssize_t ext2_quota_read(struct super_block *sb, int type, char *data, size_t len, loff_t off); static ssize_t ext2_quota_write(struct super_block *sb, int type, const char *data, size_t len, loff_t off); @@ -218,6 +241,7 @@ .statfs = ext2_statfs, .remount_fs = ext2_remount, .clear_inode= ext2_clear_inode, + .show_options = ext2_show_options, #ifdef CONFIG_QUOTA .quota_read = ext2_quota_read, .quota_write= ext2_quota_write, @@ -256,10 +280,11 @@ enum { Opt_bsd_df, Opt_minix_df, Opt_grpid, Opt_nogrpid, - Opt_resgid, Opt_resuid, Opt_sb, Opt_err_cont, Opt_err_panic, Opt_err_ro, - Opt_nouid32, Opt_check, Opt_nocheck, Opt_debug, Opt_oldalloc, Opt_orlov, Opt_nobh, - Opt_user_xattr, Opt_nouser_xattr, Opt_acl, Opt_noacl, Opt_xip, - Opt_ignore, Opt_err, + Opt_resgid, Opt_resuid, Opt_sb, Opt_err_cont, Opt_err_panic, + Opt_err_ro, Opt_nouid32, Opt_check, Opt_nocheck, Opt_debug, + Opt_oldalloc, Opt_orlov, Opt_nobh, Opt_user_xattr, Opt_nouser_xattr, + Opt
[PATCH] (REPOST/REVISION) disk quotas fail when /etc/mtab is symlinked to /proc/mounts
If /etc/mtab is a regular file all of the mount options (of a file system) are written to /etc/mtab by the mount command. The quota tools look there for the quota strings for their operation. If, however, /etc/mtab is a symlink to /proc/mounts (a "good thing" in some environments) the tools don't write anything - they assume the kernel will take care of things. While the quota options are sent down to the kernel via the mount system call and the file system codes handle them properly unfortunately there is no code to echo the quota strings into /proc/mounts and the quota tools fail in the symlink case. The attached patchs modify the EXT[2|3] and JFS codes to add the necessary hooks. The show_options function of each file system in these patches currently deal with only those things that seemed related to quotas; especially in the EXT3 case more can be done (later?). Jan Kara also noted the difficulty in moving these changes above the FS codes responding similarly to myself to Andrew's comment about possible VFS migration. Issue summary: - FS codes have to process the entire string of options anyway. - Only FS codes that use quotas must have a show_options function (for quotas to work properly) however quotas are only used in a small number of FS. - Since most of the quota using FS support other options these FS codes should have the a show_options function to show those options - and the quota echoing becomes virtually negligible. Based on feedback I have modified my patches from the original: EXT3 always uses the show_options function even when QUOTA are not compiled in EXT[2|3] and JFS "quota" is treated as "usrquota" EXT3 journalled data check for journalled quota removed EXT[2|3] mount when quota specified but not compiled in - no changes from my original patch. I tested the patch and the codes warn but still - mount. With all due respection I believe the comments otherwise were a misread - of the patch. Please reread/test and comment. XFS patch removed - the XFS team already made the necessary changes EXT3 mixing old and new quotas are handled differently (not purely exclusive) - if old and new are present together old is silently depricated (compatability) - mixing of old and new quotas is an error (e.g. usrjquota and grpquota) mark Signed-off-by: Mark Bellon <[EMAIL PROTECTED]> diff -Naur linux-2.6.13-rc3-git9-orig/include/linux/ext3_fs.h linux-2.6.13-rc3-git9/include/linux/ext3_fs.h --- linux-2.6.13-rc3-git9-orig/include/linux/ext3_fs.h 2005-07-28 15:12:52.0 -0700 +++ linux-2.6.13-rc3-git9/include/linux/ext3_fs.h 2005-07-28 16:18:24.0 -0700 @@ -373,6 +373,8 @@ #define EXT3_MOUNT_BARRIER 0x2 /* Use block barriers */ #define EXT3_MOUNT_NOBH0x4 /* No bufferheads */ #define EXT3_MOUNT_QUOTA 0x8 /* Some quota option set */ +#define EXT3_MOUNT_USRQUOTA0x10 /* "old" user quota */ +#define EXT3_MOUNT_GRPQUOTA0x20 /* "old" group quota */ /* Compatibility, for having both ext2_fs.h and ext3_fs.h included at once */ #ifndef _LINUX_EXT2_FS_H diff -Naur linux-2.6.13-rc3-git9-orig/fs/ext3/super.c linux-2.6.13-rc3-git9/fs/ext3/super.c --- linux-2.6.13-rc3-git9-orig/fs/ext3/super.c 2005-07-28 15:12:51.0 -0700 +++ linux-2.6.13-rc3-git9/fs/ext3/super.c 2005-07-29 10:00:38.0 -0700 @@ -35,6 +35,7 @@ #include #include #include +#include #include #include "xattr.h" #include "acl.h" @@ -509,8 +510,41 @@ kfree(rsv); } -#ifdef CONFIG_QUOTA +static int ext3_show_options(struct seq_file *seq, struct vfsmount *vfs) +{ + struct ext3_sb_info *sbi = EXT3_SB(vfs->mnt_sb); + + if (sbi->s_mount_opt & EXT3_MOUNT_JOURNAL_DATA) + seq_puts(seq, ",data=journal"); + + if (sbi->s_mount_opt & EXT3_MOUNT_ORDERED_DATA) + seq_puts(seq, ",data=ordered"); + + if (sbi->s_mount_opt & EXT3_MOUNT_WRITEBACK_DATA) + seq_puts(seq, ",data=writeback"); + +#if defined(CONFIG_QUOTA) + if (sbi->s_jquota_fmt) + seq_printf(seq, ",jqfmt=%s", + (sbi->s_jquota_fmt == QFMT_VFS_OLD) ? "vfsold": "vfsv0"); + + if (sbi->s_qf_names[USRQUOTA]) + seq_printf(seq, ",usrjquota=%s", sbi->s_qf_names[USRQUOTA]); + + if (sbi->s_qf_names[GRPQUOTA]) + seq_printf(seq, ",grpjquota=%s", sbi->s_qf_names[GRPQUOTA]); + + if (sbi->s_mount_opt & EXT3_MOUNT_USRQUOTA) + seq_puts(seq, ",usrquota"); + + if (sbi->s_mount_opt & EXT3_MOUNT_GRPQUOTA) + seq_puts(seq, ",grpquota&
Re: [PATCH] disk quotas fail when /etc/mtab is symlinked to /proc/mounts
Jan Kara wrote: Hello, If /etc/mtab is a regular file all of the mount options (of a file system) are written to /etc/mtab by the mount command. The quota tools look there for the quota strings for their operation. If, however, /etc/mtab is a symlink to /proc/mounts (a "good thing" in some environments) the tools don't write anything - they assume the kernel will take care of things. While the quota options are sent down to the kernel via the mount system call and the file system codes handle them properly unfortunately there is no code to echo the quota strings into /proc/mounts and the quota tools fail in the symlink case. Yes, I agree that it's a good think to have quota options in /proc/mounts. Doing it per filesystem is not nice but I don't know a nice way of doing it a VFS level either... I'll have a new patch which includes your comments and a few I thought up soon. I'll repost the new patch. mark The attached patchs modify the EXT[2|3] and [X|J]FS codes to add the necessary hooks. The show_options function of each file system in these patches currently deal with only those things that seemed related to quotas; especially in the EXT3 case more can be done (later?). The EXT3 has added error checking and has two minor changes: The "quota" option is considered part of the older style quotas Journalled quotas and older style quotas are mutually exclusive. - both discussable topics Ack. mark Signed-off-by: Mark Bellon <[EMAIL PROTECTED]> Thanks for the patch - I have some comments below... #ifdef CONFIG_QUOTA +static int ext3_show_options(struct seq_file *seq, struct vfsmount *vfs) +{ + struct ext3_sb_info *sbi = EXT3_SB(vfs->mnt_sb); + + if (sbi->s_mount_opt & EXT3_MOUNT_JOURNAL_DATA) + seq_puts(seq, ",data=journal"); + + if (sbi->s_mount_opt & EXT3_MOUNT_ORDERED_DATA) + seq_puts(seq, ",data=ordered"); + + if (sbi->s_mount_opt & EXT3_MOUNT_WRITEBACK_DATA) + seq_puts(seq, ",data=writeback"); Showing 'data' option only when quota is compile is ugly... Please move CONFIG_QUOTA inside only around quota specific stuff. + + if (sbi->s_jquota_fmt) + seq_printf(seq, ",jqfmt=%s", + (sbi->s_jquota_fmt == QFMT_VFS_OLD) ? "vfsold": "vfsv0"); + + if (sbi->s_qf_names[USRQUOTA]) + seq_printf(seq, ",usrjquota=%s", sbi->s_qf_names[USRQUOTA]); + + if (sbi->s_qf_names[GRPQUOTA]) + seq_printf(seq, ",grpjquota=%s", sbi->s_qf_names[GRPQUOTA]); + + if (sbi->s_mount_opt & EXT3_MOUNT_USRQUOTA) + seq_puts(seq, ",usrquota"); + + if (sbi->s_mount_opt & EXT3_MOUNT_GRPQUOTA) + seq_puts(seq, ",grpquota"); + + return 0; +} #define QTYPE2NAME(t) ((t)==USRQUOTA?"user":"group") #define QTYPE2MOPT(on, t) ((t)==USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA)) @@ -572,6 +604,7 @@ #ifdef CONFIG_QUOTA .quota_read = ext3_quota_read, .quota_write= ext3_quota_write, + .show_options = ext3_show_options, Probably set this function everytime... + case Opt_quota: + case Opt_usrquota: + case Opt_grpquota: case Opt_usrjquota: case Opt_grpjquota: case Opt_offusrjquota: @@ -924,7 +973,6 @@ "EXT3-fs: journalled quota options not " "supported.\n"); break; - case Opt_quota: case Opt_noquota: break; I'm not sure with the above change.. Previously if you mounted a filesystem with 'usrquota' option without a kernel quota support the mount succeeded. With your patch it will fail. I agree that that makes more sense but I'm not sure it's correct to change a behaviour so suddently. Maybe just issue a warning but let the mount succeed. #endif @@ -962,11 +1010,36 @@ } } #ifdef CONFIG_QUOTA - if (!sbi->s_jquota_fmt && (sbi->s_qf_names[USRQUOTA] || - sbi->s_qf_names[GRPQUOTA])) { - printk(KERN_ERR + if (sbi->s_qf_names[USRQUOTA] || sbi->s_qf_names[GRPQUOTA]) { + if ((sbi->s_mount_opt & EXT3_MOUNT_USRQUOTA) || + (sbi->s_mount_opt & EXT3_MOUNT_GRPQUOTA)) { + printk(KERN_ERR + "EXT3-fs: only one type of quotas allowed.\n"); + + return 0; + } + + if (!sbi->s_jquota_fmt) { + printk(KERN_ERR "EXT3-fs: journalled quota format not specifie
Re: [PATCH] disk quotas fail when /etc/mtab is symlinked to /proc/mounts
Jan Kara wrote: Hello, If /etc/mtab is a regular file all of the mount options (of a file system) are written to /etc/mtab by the mount command. The quota tools look there for the quota strings for their operation. If, however, /etc/mtab is a symlink to /proc/mounts (a good thing in some environments) the tools don't write anything - they assume the kernel will take care of things. While the quota options are sent down to the kernel via the mount system call and the file system codes handle them properly unfortunately there is no code to echo the quota strings into /proc/mounts and the quota tools fail in the symlink case. Yes, I agree that it's a good think to have quota options in /proc/mounts. Doing it per filesystem is not nice but I don't know a nice way of doing it a VFS level either... I'll have a new patch which includes your comments and a few I thought up soon. I'll repost the new patch. mark The attached patchs modify the EXT[2|3] and [X|J]FS codes to add the necessary hooks. The show_options function of each file system in these patches currently deal with only those things that seemed related to quotas; especially in the EXT3 case more can be done (later?). The EXT3 has added error checking and has two minor changes: The quota option is considered part of the older style quotas Journalled quotas and older style quotas are mutually exclusive. - both discussable topics Ack. mark Signed-off-by: Mark Bellon [EMAIL PROTECTED] Thanks for the patch - I have some comments below... snip #ifdef CONFIG_QUOTA +static int ext3_show_options(struct seq_file *seq, struct vfsmount *vfs) +{ + struct ext3_sb_info *sbi = EXT3_SB(vfs-mnt_sb); + + if (sbi-s_mount_opt EXT3_MOUNT_JOURNAL_DATA) + seq_puts(seq, ,data=journal); + + if (sbi-s_mount_opt EXT3_MOUNT_ORDERED_DATA) + seq_puts(seq, ,data=ordered); + + if (sbi-s_mount_opt EXT3_MOUNT_WRITEBACK_DATA) + seq_puts(seq, ,data=writeback); Showing 'data' option only when quota is compile is ugly... Please move CONFIG_QUOTA inside only around quota specific stuff. + + if (sbi-s_jquota_fmt) + seq_printf(seq, ,jqfmt=%s, + (sbi-s_jquota_fmt == QFMT_VFS_OLD) ? vfsold: vfsv0); + + if (sbi-s_qf_names[USRQUOTA]) + seq_printf(seq, ,usrjquota=%s, sbi-s_qf_names[USRQUOTA]); + + if (sbi-s_qf_names[GRPQUOTA]) + seq_printf(seq, ,grpjquota=%s, sbi-s_qf_names[GRPQUOTA]); + + if (sbi-s_mount_opt EXT3_MOUNT_USRQUOTA) + seq_puts(seq, ,usrquota); + + if (sbi-s_mount_opt EXT3_MOUNT_GRPQUOTA) + seq_puts(seq, ,grpquota); + + return 0; +} #define QTYPE2NAME(t) ((t)==USRQUOTA?user:group) #define QTYPE2MOPT(on, t) ((t)==USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA)) @@ -572,6 +604,7 @@ #ifdef CONFIG_QUOTA .quota_read = ext3_quota_read, .quota_write= ext3_quota_write, + .show_options = ext3_show_options, Probably set this function everytime... snip + case Opt_quota: + case Opt_usrquota: + case Opt_grpquota: case Opt_usrjquota: case Opt_grpjquota: case Opt_offusrjquota: @@ -924,7 +973,6 @@ EXT3-fs: journalled quota options not supported.\n); break; - case Opt_quota: case Opt_noquota: break; I'm not sure with the above change.. Previously if you mounted a filesystem with 'usrquota' option without a kernel quota support the mount succeeded. With your patch it will fail. I agree that that makes more sense but I'm not sure it's correct to change a behaviour so suddently. Maybe just issue a warning but let the mount succeed. #endif @@ -962,11 +1010,36 @@ } } #ifdef CONFIG_QUOTA - if (!sbi-s_jquota_fmt (sbi-s_qf_names[USRQUOTA] || - sbi-s_qf_names[GRPQUOTA])) { - printk(KERN_ERR + if (sbi-s_qf_names[USRQUOTA] || sbi-s_qf_names[GRPQUOTA]) { + if ((sbi-s_mount_opt EXT3_MOUNT_USRQUOTA) || + (sbi-s_mount_opt EXT3_MOUNT_GRPQUOTA)) { + printk(KERN_ERR + EXT3-fs: only one type of quotas allowed.\n); + + return 0; + } + + if (!sbi-s_jquota_fmt) { + printk(KERN_ERR EXT3-fs: journalled quota format not specified.\n); - return 0; + + return 0; + } + + if ((sbi-s_mount_opt EXT3_MOUNT_JOURNAL_DATA) == 0) { This test does not make sense - journaled quota in recent kernels works with arbitrary journaling data mode. + printk(KERN_ERR + EXT3-fs
[PATCH] (REPOST/REVISION) disk quotas fail when /etc/mtab is symlinked to /proc/mounts
If /etc/mtab is a regular file all of the mount options (of a file system) are written to /etc/mtab by the mount command. The quota tools look there for the quota strings for their operation. If, however, /etc/mtab is a symlink to /proc/mounts (a good thing in some environments) the tools don't write anything - they assume the kernel will take care of things. While the quota options are sent down to the kernel via the mount system call and the file system codes handle them properly unfortunately there is no code to echo the quota strings into /proc/mounts and the quota tools fail in the symlink case. The attached patchs modify the EXT[2|3] and JFS codes to add the necessary hooks. The show_options function of each file system in these patches currently deal with only those things that seemed related to quotas; especially in the EXT3 case more can be done (later?). Jan Kara also noted the difficulty in moving these changes above the FS codes responding similarly to myself to Andrew's comment about possible VFS migration. Issue summary: - FS codes have to process the entire string of options anyway. - Only FS codes that use quotas must have a show_options function (for quotas to work properly) however quotas are only used in a small number of FS. - Since most of the quota using FS support other options these FS codes should have the a show_options function to show those options - and the quota echoing becomes virtually negligible. Based on feedback I have modified my patches from the original: EXT3 always uses the show_options function even when QUOTA are not compiled in EXT[2|3] and JFS quota is treated as usrquota EXT3 journalled data check for journalled quota removed EXT[2|3] mount when quota specified but not compiled in - no changes from my original patch. I tested the patch and the codes warn but still - mount. With all due respection I believe the comments otherwise were a misread - of the patch. Please reread/test and comment. XFS patch removed - the XFS team already made the necessary changes EXT3 mixing old and new quotas are handled differently (not purely exclusive) - if old and new are present together old is silently depricated (compatability) - mixing of old and new quotas is an error (e.g. usrjquota and grpquota) mark Signed-off-by: Mark Bellon [EMAIL PROTECTED] diff -Naur linux-2.6.13-rc3-git9-orig/include/linux/ext3_fs.h linux-2.6.13-rc3-git9/include/linux/ext3_fs.h --- linux-2.6.13-rc3-git9-orig/include/linux/ext3_fs.h 2005-07-28 15:12:52.0 -0700 +++ linux-2.6.13-rc3-git9/include/linux/ext3_fs.h 2005-07-28 16:18:24.0 -0700 @@ -373,6 +373,8 @@ #define EXT3_MOUNT_BARRIER 0x2 /* Use block barriers */ #define EXT3_MOUNT_NOBH0x4 /* No bufferheads */ #define EXT3_MOUNT_QUOTA 0x8 /* Some quota option set */ +#define EXT3_MOUNT_USRQUOTA0x10 /* old user quota */ +#define EXT3_MOUNT_GRPQUOTA0x20 /* old group quota */ /* Compatibility, for having both ext2_fs.h and ext3_fs.h included at once */ #ifndef _LINUX_EXT2_FS_H diff -Naur linux-2.6.13-rc3-git9-orig/fs/ext3/super.c linux-2.6.13-rc3-git9/fs/ext3/super.c --- linux-2.6.13-rc3-git9-orig/fs/ext3/super.c 2005-07-28 15:12:51.0 -0700 +++ linux-2.6.13-rc3-git9/fs/ext3/super.c 2005-07-29 10:00:38.0 -0700 @@ -35,6 +35,7 @@ #include linux/mount.h #include linux/namei.h #include linux/quotaops.h +#include linux/seq_file.h #include asm/uaccess.h #include xattr.h #include acl.h @@ -509,8 +510,41 @@ kfree(rsv); } -#ifdef CONFIG_QUOTA +static int ext3_show_options(struct seq_file *seq, struct vfsmount *vfs) +{ + struct ext3_sb_info *sbi = EXT3_SB(vfs-mnt_sb); + + if (sbi-s_mount_opt EXT3_MOUNT_JOURNAL_DATA) + seq_puts(seq, ,data=journal); + + if (sbi-s_mount_opt EXT3_MOUNT_ORDERED_DATA) + seq_puts(seq, ,data=ordered); + + if (sbi-s_mount_opt EXT3_MOUNT_WRITEBACK_DATA) + seq_puts(seq, ,data=writeback); + +#if defined(CONFIG_QUOTA) + if (sbi-s_jquota_fmt) + seq_printf(seq, ,jqfmt=%s, + (sbi-s_jquota_fmt == QFMT_VFS_OLD) ? vfsold: vfsv0); + + if (sbi-s_qf_names[USRQUOTA]) + seq_printf(seq, ,usrjquota=%s, sbi-s_qf_names[USRQUOTA]); + + if (sbi-s_qf_names[GRPQUOTA]) + seq_printf(seq, ,grpjquota=%s, sbi-s_qf_names[GRPQUOTA]); + + if (sbi-s_mount_opt EXT3_MOUNT_USRQUOTA) + seq_puts(seq, ,usrquota); + + if (sbi-s_mount_opt EXT3_MOUNT_GRPQUOTA) + seq_puts(seq, ,grpquota); +#endif + return 0; +} + +#ifdef CONFIG_QUOTA #define QTYPE2NAME(t) ((t)==USRQUOTA?user:group) #define QTYPE2MOPT(on, t) ((t)==USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA)) @@ -569,6 +603,7 @@ .statfs
Re: [PATCH] (TAKE 3) disk quotas fail when /etc/mtab is symlinked to /proc/mounts
If /etc/mtab is a regular file all of the mount options (of a file system) are written to /etc/mtab by the mount command. The quota tools look there for the quota strings for their operation. If, however, /etc/mtab is a symlink to /proc/mounts (a good thing in some environments) the tools don't write anything - they assume the kernel will take care of things. While the quota options are sent down to the kernel via the mount system call and the file system codes handle them properly unfortunately there is no code to echo the quota strings into /proc/mounts and the quota tools fail in the symlink case. The attached patchs modify the EXT[2|3] and JFS codes to add the necessary hooks. The show_options function of each file system in these patches currently deal with only those things that seemed related to quotas; especially in the EXT3 case more can be done (later?). Jan Kara also noted the difficulty in moving these changes above the FS codes responding similarly to myself to Andrew's comment about possible VFS migration. Issue summary: - FS codes have to process the entire string of options anyway. - Only FS codes that use quotas must have a show_options function (for quotas to work properly) however quotas are only used in a small number of FS. - Since most of the quota using FS support other options these FS codes should have the a show_options function to show those options - and the quota echoing becomes virtually negligible. Based on feedback I have modified my patches from the original: JFS a missing patch has been restored to the posting EXT[2|3] and JFS always use the show_options function - Each FS has at least one FS specific option displayed - QUOTA output is under a CONFIG_QUOTA ifdef - a follow-on patch will add a multitude of options for each FS EXT[2|3] and JFS quota is treated as usrquota EXT3 journalled data check for journalled quota removed EXT[2|3] mount when quota specified but not compiled in - no changes from my original patch. I tested the patch and the codes warn but - still mount. With all due respection I believe the comments otherwise were a - misread of the patch. Please reread/test and comment. XFS patch removed - the XFS team already made the necessary changes EXT3 mixing old and new quotas are handled differently (not purely exclusive) - if old and new quotas for the same type are used together the old type is silently depricated for compatability (e.g. usrquota and usrjquota) - mixing of old and new quotas is an error (e.g. usrjquota and grpquota) mark Signed-off-by: Mark Bellon [EMAIL PROTECTED] diff -Naur linux-2.6.13-rc3-git9-orig/fs/ext2/super.c linux-2.6.13-rc3-git9/fs/ext2/super.c --- linux-2.6.13-rc3-git9-orig/fs/ext2/super.c 2005-07-28 15:12:51.0 -0700 +++ linux-2.6.13-rc3-git9/fs/ext2/super.c 2005-07-29 14:49:45.0 -0700 @@ -19,6 +19,7 @@ #include linux/config.h #include linux/module.h #include linux/string.h +#include linux/fs.h #include linux/slab.h #include linux/init.h #include linux/blkdev.h @@ -27,6 +28,8 @@ #include linux/buffer_head.h #include linux/smp_lock.h #include linux/vfs.h +#include linux/seq_file.h +#include linux/mount.h #include asm/uaccess.h #include ext2.h #include xattr.h @@ -201,6 +204,26 @@ #endif } +static int ext2_show_options(struct seq_file *seq, struct vfsmount *vfs) +{ + struct ext2_sb_info *sbi = EXT2_SB(vfs-mnt_sb); + + if (sbi-s_mount_opt EXT2_MOUNT_GRPID) + seq_puts(seq, ,grpid); + else + seq_puts(seq, ,nogrpid); + +#if defined(CONFIG_QUOTA) + if (sbi-s_mount_opt EXT2_MOUNT_USRQUOTA) + seq_puts(seq, ,usrquota); + + if (sbi-s_mount_opt EXT2_MOUNT_GRPQUOTA) + seq_puts(seq, ,grpquota); +#endif + + return 0; +} + #ifdef CONFIG_QUOTA static ssize_t ext2_quota_read(struct super_block *sb, int type, char *data, size_t len, loff_t off); static ssize_t ext2_quota_write(struct super_block *sb, int type, const char *data, size_t len, loff_t off); @@ -218,6 +241,7 @@ .statfs = ext2_statfs, .remount_fs = ext2_remount, .clear_inode= ext2_clear_inode, + .show_options = ext2_show_options, #ifdef CONFIG_QUOTA .quota_read = ext2_quota_read, .quota_write= ext2_quota_write, @@ -256,10 +280,11 @@ enum { Opt_bsd_df, Opt_minix_df, Opt_grpid, Opt_nogrpid, - Opt_resgid, Opt_resuid, Opt_sb, Opt_err_cont, Opt_err_panic, Opt_err_ro, - Opt_nouid32, Opt_check, Opt_nocheck, Opt_debug, Opt_oldalloc, Opt_orlov, Opt_nobh, - Opt_user_xattr, Opt_nouser_xattr, Opt_acl, Opt_noacl, Opt_xip, - Opt_ignore, Opt_err, + Opt_resgid, Opt_resuid, Opt_sb, Opt_err_cont, Opt_err_panic, + Opt_err_ro, Opt_nouid32, Opt_check, Opt_nocheck, Opt_debug, + Opt_oldalloc, Opt_orlov, Opt_nobh, Opt_user_xattr
Re: [PATCH] disk quotas fail when /etc/mtab is symlinked to /proc/mounts
Mark Bellon wrote: Andrew Morton wrote: Mark Bellon <[EMAIL PROTECTED]> wrote: If /etc/mtab is a regular file all of the mount options (of a file system) are written to /etc/mtab by the mount command. The quota tools look there for the quota strings for their operation. If, however, /etc/mtab is a symlink to /proc/mounts (a "good thing" in some environments) the tools don't write anything - they assume the kernel will take care of things. While the quota options are sent down to the kernel via the mount system call and the file system codes handle them properly unfortunately there is no code to echo the quota strings into /proc/mounts and the quota tools fail in the symlink case. hm. Perhaps others with operational experience in that area can comment. OK. The attached patchs modify the EXT[2|3] and [X|J]FS codes to add the necessary hooks. The show_options function of each file system in these patches currently deal with only those things that seemed related to quotas; especially in the EXT3 case more can be done (later?). It seems sad to do it in each filesystem. Is there no way in which we can do this for all filesystems, in a single place in the VFS? Each file system must be able to echo it's own FS specific options, hence the show_options hook (XFS is a good example). EXT3 has it's own form of journalled quota file options, hence the need for the specific hook. The "older style" (e.g. "usrquota", "grpquota", "quota") could be done generically but a FS might want any number of quota options. The few lines of code in each file system didn't seem so bad especially if the show_function start echoing more. Followup comment/through... If we want /proc/mounts to really replace /etc/mtab in the general case, always using it as a symlink, the file system codes will all need the show_options hook - they will need to echo all of their private options duplicating, as closely as possible, what would have been written to the /etc/mtab regular file. mark mark - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] disk quotas fail when /etc/mtab is symlinked to /proc/mounts
Andrew Morton wrote: Mark Bellon <[EMAIL PROTECTED]> wrote: If /etc/mtab is a regular file all of the mount options (of a file system) are written to /etc/mtab by the mount command. The quota tools look there for the quota strings for their operation. If, however, /etc/mtab is a symlink to /proc/mounts (a "good thing" in some environments) the tools don't write anything - they assume the kernel will take care of things. While the quota options are sent down to the kernel via the mount system call and the file system codes handle them properly unfortunately there is no code to echo the quota strings into /proc/mounts and the quota tools fail in the symlink case. hm. Perhaps others with operational experience in that area can comment. OK. The attached patchs modify the EXT[2|3] and [X|J]FS codes to add the necessary hooks. The show_options function of each file system in these patches currently deal with only those things that seemed related to quotas; especially in the EXT3 case more can be done (later?). It seems sad to do it in each filesystem. Is there no way in which we can do this for all filesystems, in a single place in the VFS? Each file system must be able to echo it's own FS specific options, hence the show_options hook (XFS is a good example). EXT3 has it's own form of journalled quota file options, hence the need for the specific hook. The "older style" (e.g. "usrquota", "grpquota", "quota") could be done generically but a FS might want any number of quota options. The few lines of code in each file system didn't seem so bad especially if the show_function start echoing more. mark - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] disk quotas fail when /etc/mtab is symlinked to /proc/mounts
If /etc/mtab is a regular file all of the mount options (of a file system) are written to /etc/mtab by the mount command. The quota tools look there for the quota strings for their operation. If, however, /etc/mtab is a symlink to /proc/mounts (a "good thing" in some environments) the tools don't write anything - they assume the kernel will take care of things. While the quota options are sent down to the kernel via the mount system call and the file system codes handle them properly unfortunately there is no code to echo the quota strings into /proc/mounts and the quota tools fail in the symlink case. The attached patchs modify the EXT[2|3] and JFS codes to add the necessary hooks. The show_options function of each file system in these patches currently deal with only those things that seemed related to quotas; especially in the EXT3 case more can be done (later?). The original XFS change has been dropped as per Nathan Scott <[EMAIL PROTECTED]>. The functionality had already been added. The EXT3 has added error checking and has two minor changes: The "quota" option is considered part of the older style quotas Journalled quotas and older style quotas are mutually exclusive. - both discussable topics mark Signed-off-by: Mark Bellon <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] disk quotas fail when /etc/mtab is symlinked to /proc/mounts
Nathan Scott wrote: On Thu, Jul 28, 2005 at 05:03:02PM -0700, Mark Bellon wrote: The attached patchs modify the EXT[2|3] and [X|J]FS codes to add the The XFS component is incorrect, we're already doing this elsewhere (over in fs/xfs/quota/xfs_qm_bhv.c), so please drop this last part from your patch... No problem! New patch coming up. mark diff -Naur linux-2.6.13-rc3-git9-orig/fs/xfs/xfs_vfsops.c linux-2.6.13-rc3-git9/fs/xfs/xfs_vfsops.c ... + { XFS_UQUOTA_ACTIVE,"," "usrquota" }, + { XFS_GQUOTA_ACTIVE,"," "grpquota" }, thanks. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] disk quotas fail when /etc/mtab is symlinked to /proc/mounts
If /etc/mtab is a regular file all of the mount options (of a file system) are written to /etc/mtab by the mount command. The quota tools look there for the quota strings for their operation. If, however, /etc/mtab is a symlink to /proc/mounts (a "good thing" in some environments) the tools don't write anything - they assume the kernel will take care of things. While the quota options are sent down to the kernel via the mount system call and the file system codes handle them properly unfortunately there is no code to echo the quota strings into /proc/mounts and the quota tools fail in the symlink case. The attached patchs modify the EXT[2|3] and [X|J]FS codes to add the necessary hooks. The show_options function of each file system in these patches currently deal with only those things that seemed related to quotas; especially in the EXT3 case more can be done (later?). The EXT3 has added error checking and has two minor changes: The "quota" option is considered part of the older style quotas Journalled quotas and older style quotas are mutually exclusive. - both discussable topics mark Signed-off-by: Mark Bellon <[EMAIL PROTECTED]> diff -Naur linux-2.6.13-rc3-git9-orig/include/linux/ext3_fs.h linux-2.6.13-rc3-git9/include/linux/ext3_fs.h --- linux-2.6.13-rc3-git9-orig/include/linux/ext3_fs.h 2005-07-28 15:12:52.0 -0700 +++ linux-2.6.13-rc3-git9/include/linux/ext3_fs.h 2005-07-28 16:18:24.0 -0700 @@ -373,6 +373,8 @@ #define EXT3_MOUNT_BARRIER 0x2 /* Use block barriers */ #define EXT3_MOUNT_NOBH0x4 /* No bufferheads */ #define EXT3_MOUNT_QUOTA 0x8 /* Some quota option set */ +#define EXT3_MOUNT_USRQUOTA0x10 /* "old" user quota */ +#define EXT3_MOUNT_GRPQUOTA0x20 /* "old" group quota */ /* Compatibility, for having both ext2_fs.h and ext3_fs.h included at once */ #ifndef _LINUX_EXT2_FS_H diff -Naur linux-2.6.13-rc3-git9-orig/fs/ext3/super.c linux-2.6.13-rc3-git9/fs/ext3/super.c --- linux-2.6.13-rc3-git9-orig/fs/ext3/super.c 2005-07-28 15:12:51.0 -0700 +++ linux-2.6.13-rc3-git9/fs/ext3/super.c 2005-07-28 16:01:15.0 -0700 @@ -35,6 +35,7 @@ #include #include #include +#include #include #include "xattr.h" #include "acl.h" @@ -510,6 +511,37 @@ } #ifdef CONFIG_QUOTA +static int ext3_show_options(struct seq_file *seq, struct vfsmount *vfs) +{ + struct ext3_sb_info *sbi = EXT3_SB(vfs->mnt_sb); + + if (sbi->s_mount_opt & EXT3_MOUNT_JOURNAL_DATA) + seq_puts(seq, ",data=journal"); + + if (sbi->s_mount_opt & EXT3_MOUNT_ORDERED_DATA) + seq_puts(seq, ",data=ordered"); + + if (sbi->s_mount_opt & EXT3_MOUNT_WRITEBACK_DATA) + seq_puts(seq, ",data=writeback"); + + if (sbi->s_jquota_fmt) + seq_printf(seq, ",jqfmt=%s", + (sbi->s_jquota_fmt == QFMT_VFS_OLD) ? "vfsold": "vfsv0"); + + if (sbi->s_qf_names[USRQUOTA]) + seq_printf(seq, ",usrjquota=%s", sbi->s_qf_names[USRQUOTA]); + + if (sbi->s_qf_names[GRPQUOTA]) + seq_printf(seq, ",grpjquota=%s", sbi->s_qf_names[GRPQUOTA]); + + if (sbi->s_mount_opt & EXT3_MOUNT_USRQUOTA) + seq_puts(seq, ",usrquota"); + + if (sbi->s_mount_opt & EXT3_MOUNT_GRPQUOTA) + seq_puts(seq, ",grpquota"); + + return 0; +} #define QTYPE2NAME(t) ((t)==USRQUOTA?"user":"group") #define QTYPE2MOPT(on, t) ((t)==USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA)) @@ -572,6 +604,7 @@ #ifdef CONFIG_QUOTA .quota_read = ext3_quota_read, .quota_write= ext3_quota_write, + .show_options = ext3_show_options, #endif }; @@ -590,7 +623,8 @@ Opt_abort, Opt_data_journal, Opt_data_ordered, Opt_data_writeback, Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota, Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota, - Opt_ignore, Opt_barrier, Opt_err, Opt_resize, + Opt_ignore, Opt_barrier, Opt_err, Opt_resize, Opt_usrquota, + Opt_grpquota }; static match_table_t tokens = { @@ -634,10 +668,10 @@ {Opt_grpjquota, "grpjquota=%s"}, {Opt_jqfmt_vfsold, "jqfmt=vfsold"}, {Opt_jqfmt_vfsv0, "jqfmt=vfsv0"}, - {Opt_quota, "grpquota"}, + {Opt_grpquota, "grpquota"}, {Opt_noquota, "noquota"}, {Opt_quota, "quota"}, - {Opt_quota, "usrquota"}, + {Opt_usrquota, "usrquota"}, {Opt_barrier, "barrier=%u"}, {Opt_err, NULL}, {Opt_resize, "resize"},
[PATCH] disk quotas fail when /etc/mtab is symlinked to /proc/mounts
If /etc/mtab is a regular file all of the mount options (of a file system) are written to /etc/mtab by the mount command. The quota tools look there for the quota strings for their operation. If, however, /etc/mtab is a symlink to /proc/mounts (a good thing in some environments) the tools don't write anything - they assume the kernel will take care of things. While the quota options are sent down to the kernel via the mount system call and the file system codes handle them properly unfortunately there is no code to echo the quota strings into /proc/mounts and the quota tools fail in the symlink case. The attached patchs modify the EXT[2|3] and [X|J]FS codes to add the necessary hooks. The show_options function of each file system in these patches currently deal with only those things that seemed related to quotas; especially in the EXT3 case more can be done (later?). The EXT3 has added error checking and has two minor changes: The quota option is considered part of the older style quotas Journalled quotas and older style quotas are mutually exclusive. - both discussable topics mark Signed-off-by: Mark Bellon [EMAIL PROTECTED] diff -Naur linux-2.6.13-rc3-git9-orig/include/linux/ext3_fs.h linux-2.6.13-rc3-git9/include/linux/ext3_fs.h --- linux-2.6.13-rc3-git9-orig/include/linux/ext3_fs.h 2005-07-28 15:12:52.0 -0700 +++ linux-2.6.13-rc3-git9/include/linux/ext3_fs.h 2005-07-28 16:18:24.0 -0700 @@ -373,6 +373,8 @@ #define EXT3_MOUNT_BARRIER 0x2 /* Use block barriers */ #define EXT3_MOUNT_NOBH0x4 /* No bufferheads */ #define EXT3_MOUNT_QUOTA 0x8 /* Some quota option set */ +#define EXT3_MOUNT_USRQUOTA0x10 /* old user quota */ +#define EXT3_MOUNT_GRPQUOTA0x20 /* old group quota */ /* Compatibility, for having both ext2_fs.h and ext3_fs.h included at once */ #ifndef _LINUX_EXT2_FS_H diff -Naur linux-2.6.13-rc3-git9-orig/fs/ext3/super.c linux-2.6.13-rc3-git9/fs/ext3/super.c --- linux-2.6.13-rc3-git9-orig/fs/ext3/super.c 2005-07-28 15:12:51.0 -0700 +++ linux-2.6.13-rc3-git9/fs/ext3/super.c 2005-07-28 16:01:15.0 -0700 @@ -35,6 +35,7 @@ #include linux/mount.h #include linux/namei.h #include linux/quotaops.h +#include linux/seq_file.h #include asm/uaccess.h #include xattr.h #include acl.h @@ -510,6 +511,37 @@ } #ifdef CONFIG_QUOTA +static int ext3_show_options(struct seq_file *seq, struct vfsmount *vfs) +{ + struct ext3_sb_info *sbi = EXT3_SB(vfs-mnt_sb); + + if (sbi-s_mount_opt EXT3_MOUNT_JOURNAL_DATA) + seq_puts(seq, ,data=journal); + + if (sbi-s_mount_opt EXT3_MOUNT_ORDERED_DATA) + seq_puts(seq, ,data=ordered); + + if (sbi-s_mount_opt EXT3_MOUNT_WRITEBACK_DATA) + seq_puts(seq, ,data=writeback); + + if (sbi-s_jquota_fmt) + seq_printf(seq, ,jqfmt=%s, + (sbi-s_jquota_fmt == QFMT_VFS_OLD) ? vfsold: vfsv0); + + if (sbi-s_qf_names[USRQUOTA]) + seq_printf(seq, ,usrjquota=%s, sbi-s_qf_names[USRQUOTA]); + + if (sbi-s_qf_names[GRPQUOTA]) + seq_printf(seq, ,grpjquota=%s, sbi-s_qf_names[GRPQUOTA]); + + if (sbi-s_mount_opt EXT3_MOUNT_USRQUOTA) + seq_puts(seq, ,usrquota); + + if (sbi-s_mount_opt EXT3_MOUNT_GRPQUOTA) + seq_puts(seq, ,grpquota); + + return 0; +} #define QTYPE2NAME(t) ((t)==USRQUOTA?user:group) #define QTYPE2MOPT(on, t) ((t)==USRQUOTA?((on)##USRJQUOTA):((on)##GRPJQUOTA)) @@ -572,6 +604,7 @@ #ifdef CONFIG_QUOTA .quota_read = ext3_quota_read, .quota_write= ext3_quota_write, + .show_options = ext3_show_options, #endif }; @@ -590,7 +623,8 @@ Opt_abort, Opt_data_journal, Opt_data_ordered, Opt_data_writeback, Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota, Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota, - Opt_ignore, Opt_barrier, Opt_err, Opt_resize, + Opt_ignore, Opt_barrier, Opt_err, Opt_resize, Opt_usrquota, + Opt_grpquota }; static match_table_t tokens = { @@ -634,10 +668,10 @@ {Opt_grpjquota, grpjquota=%s}, {Opt_jqfmt_vfsold, jqfmt=vfsold}, {Opt_jqfmt_vfsv0, jqfmt=vfsv0}, - {Opt_quota, grpquota}, + {Opt_grpquota, grpquota}, {Opt_noquota, noquota}, {Opt_quota, quota}, - {Opt_quota, usrquota}, + {Opt_usrquota, usrquota}, {Opt_barrier, barrier=%u}, {Opt_err, NULL}, {Opt_resize, resize}, @@ -902,8 +936,18 @@ case Opt_jqfmt_vfsv0: sbi-s_jquota_fmt = QFMT_VFS_V0; break; + case Opt_usrquota: + set_opt(sbi-s_mount_opt, QUOTA); + set_opt(sbi-s_mount_opt, USRQUOTA); + break; + case
Re: [PATCH] disk quotas fail when /etc/mtab is symlinked to /proc/mounts
Nathan Scott wrote: On Thu, Jul 28, 2005 at 05:03:02PM -0700, Mark Bellon wrote: The attached patchs modify the EXT[2|3] and [X|J]FS codes to add the The XFS component is incorrect, we're already doing this elsewhere (over in fs/xfs/quota/xfs_qm_bhv.c), so please drop this last part from your patch... No problem! New patch coming up. mark diff -Naur linux-2.6.13-rc3-git9-orig/fs/xfs/xfs_vfsops.c linux-2.6.13-rc3-git9/fs/xfs/xfs_vfsops.c ... + { XFS_UQUOTA_ACTIVE,, usrquota }, + { XFS_GQUOTA_ACTIVE,, grpquota }, thanks. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] disk quotas fail when /etc/mtab is symlinked to /proc/mounts
If /etc/mtab is a regular file all of the mount options (of a file system) are written to /etc/mtab by the mount command. The quota tools look there for the quota strings for their operation. If, however, /etc/mtab is a symlink to /proc/mounts (a good thing in some environments) the tools don't write anything - they assume the kernel will take care of things. While the quota options are sent down to the kernel via the mount system call and the file system codes handle them properly unfortunately there is no code to echo the quota strings into /proc/mounts and the quota tools fail in the symlink case. The attached patchs modify the EXT[2|3] and JFS codes to add the necessary hooks. The show_options function of each file system in these patches currently deal with only those things that seemed related to quotas; especially in the EXT3 case more can be done (later?). The original XFS change has been dropped as per Nathan Scott [EMAIL PROTECTED]. The functionality had already been added. The EXT3 has added error checking and has two minor changes: The quota option is considered part of the older style quotas Journalled quotas and older style quotas are mutually exclusive. - both discussable topics mark Signed-off-by: Mark Bellon [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] disk quotas fail when /etc/mtab is symlinked to /proc/mounts
Andrew Morton wrote: Mark Bellon [EMAIL PROTECTED] wrote: If /etc/mtab is a regular file all of the mount options (of a file system) are written to /etc/mtab by the mount command. The quota tools look there for the quota strings for their operation. If, however, /etc/mtab is a symlink to /proc/mounts (a good thing in some environments) the tools don't write anything - they assume the kernel will take care of things. While the quota options are sent down to the kernel via the mount system call and the file system codes handle them properly unfortunately there is no code to echo the quota strings into /proc/mounts and the quota tools fail in the symlink case. hm. Perhaps others with operational experience in that area can comment. OK. The attached patchs modify the EXT[2|3] and [X|J]FS codes to add the necessary hooks. The show_options function of each file system in these patches currently deal with only those things that seemed related to quotas; especially in the EXT3 case more can be done (later?). It seems sad to do it in each filesystem. Is there no way in which we can do this for all filesystems, in a single place in the VFS? Each file system must be able to echo it's own FS specific options, hence the show_options hook (XFS is a good example). EXT3 has it's own form of journalled quota file options, hence the need for the specific hook. The older style (e.g. usrquota, grpquota, quota) could be done generically but a FS might want any number of quota options. The few lines of code in each file system didn't seem so bad especially if the show_function start echoing more. mark - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] disk quotas fail when /etc/mtab is symlinked to /proc/mounts
Mark Bellon wrote: Andrew Morton wrote: Mark Bellon [EMAIL PROTECTED] wrote: If /etc/mtab is a regular file all of the mount options (of a file system) are written to /etc/mtab by the mount command. The quota tools look there for the quota strings for their operation. If, however, /etc/mtab is a symlink to /proc/mounts (a good thing in some environments) the tools don't write anything - they assume the kernel will take care of things. While the quota options are sent down to the kernel via the mount system call and the file system codes handle them properly unfortunately there is no code to echo the quota strings into /proc/mounts and the quota tools fail in the symlink case. hm. Perhaps others with operational experience in that area can comment. OK. The attached patchs modify the EXT[2|3] and [X|J]FS codes to add the necessary hooks. The show_options function of each file system in these patches currently deal with only those things that seemed related to quotas; especially in the EXT3 case more can be done (later?). It seems sad to do it in each filesystem. Is there no way in which we can do this for all filesystems, in a single place in the VFS? Each file system must be able to echo it's own FS specific options, hence the show_options hook (XFS is a good example). EXT3 has it's own form of journalled quota file options, hence the need for the specific hook. The older style (e.g. usrquota, grpquota, quota) could be done generically but a FS might want any number of quota options. The few lines of code in each file system didn't seem so bad especially if the show_function start echoing more. Followup comment/through... If we want /proc/mounts to really replace /etc/mtab in the general case, always using it as a symlink, the file system codes will all need the show_options hook - they will need to echo all of their private options duplicating, as closely as possible, what would have been written to the /etc/mtab regular file. mark mark - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/