[PATCH] VFAT short names in 2.4.0-test5-pre[3,4]
Hi Gordon,

it looks like the recently added 16-bit charsets introduced a few bugs into the VFAT code. I was really surprised when I found that for VLAKC.TTR the short name is VLAKCTTR... I'm not a VFAT guru, so please can you review this patch and eventually fix it or send it to Linus?

Thanks,
	Petr Vandrovec
	[EMAIL PROTECTED]

diff -urdN linux/fs/vfat/namei.c linux/fs/vfat/namei.c
--- linux/fs/vfat/namei.c	Mon Jul 24 09:34:50 2000
+++ linux/fs/vfat/namei.c	Mon Jul 24 09:38:51 2000
@@ -453,10 +453,11 @@
 			if (charbuf[chi] != vfat_tolower(nls, c)) return -EINVAL;
 			if (strchr(replace_chars,c)) return -EINVAL;
 			if (c < ' ' || c == ':') return -EINVAL;
-			if (c == '.') break;
+			if (c == '.') goto dot;
 			space = c == ' ';
 		}
 	}
+dot:;
 	if (space) return -EINVAL;
 	if (len && c != '.') {
 		len--;
@@ -464,6 +465,7 @@
 			if (charbuf[0] != '.') return -EINVAL;
 		} else
 			return -EINVAL;
+		c = '.';
 	}
 	if (c == '.') {
 		if (len >= 4) return -EINVAL;
@@ -522,7 +524,7 @@
 		if (chl == 0) return -EINVAL;
 		for (chi = 0; chi < chl; chi++){
-			if (charbuf[chi] == '.') break;
+			if (charbuf[chi] == '.') goto dot;
 			if (!charbuf[chi]) return -EINVAL;
 			if (walk - res == 8) return -EINVAL;
 			if (strchr(replace_chars,charbuf[chi])) return -EINVAL;
@@ -532,6 +534,7 @@
 			walk++;
 		}
 	}
+dot:;
 	if (space) return -EINVAL;
 	if (len <= 0) {
 		while (walk - res < 8) *walk++ = ' ';
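Petr's symptom - a name like VLAKC.TTR, which is already a valid 8.3 name, being mangled into VLAKCTTR - comes down to how the short-name check handles the dot. As a rough userspace illustration of the rule the check is supposed to enforce (a simplified sketch, not the kernel's vfat code, which also rejects reserved characters, trailing spaces, etc.):

```c
#include <stdbool.h>
#include <string.h>

/* Sketch: does `name` already fit the DOS 8.3 form, i.e. a base of
 * 1..8 characters, optionally followed by '.' and an extension of
 * 1..3 characters, with at most one dot? */
static bool fits_8_3(const char *name)
{
	const char *dot = strchr(name, '.');
	size_t base = dot ? (size_t)(dot - name) : strlen(name);
	size_t ext;

	if (base == 0 || base > 8)
		return false;
	if (!dot)
		return true;
	if (strchr(dot + 1, '.'))	/* only one dot allowed */
		return false;
	ext = strlen(dot + 1);
	return ext >= 1 && ext <= 3;	/* "NAME." is not valid 8.3 */
}
```

Under this rule VLAKC.TTR (base 5, extension 3) passes unchanged, so no mangled VLAKCTTR alias should ever be generated for it.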
cont_prepare_write fix
Hi,

While trying to get affs working again, I noticed that the current cont_prepare_write is hardly working: if you write to a file in steps of 1024, the bytes pointer is never updated, resulting in an endless loop. Below is a fixed version, which also removes the bogus tricks on the bytes ptr. I also "fixed" the comment - any reason we have to be insulting like this? Anyway, does anyone see a problem with the fix before I send it to Linus?

bye, Roman

Index: fs/buffer.c
===================================================================
RCS file: /dump/cvs/linux-2.4/fs/buffer.c,v
retrieving revision 1.1.1.3
diff -u -r1.1.1.3 buffer.c
--- fs/buffer.c	2000/07/17 10:46:49	1.1.1.3
+++ fs/buffer.c	2000/07/24 11:45:54
@@ -1585,7 +1585,7 @@
 }
 
 /*
- * For moronic filesystems that do not allow holes in file.
+ * For filesystems that do not allow holes in file.
  * We may have to extend the file.
  */
 
@@ -1600,7 +1600,15 @@
 	unsigned blocksize = inode->i_sb->s_blocksize;
 	char *kaddr;
 
-	while(page->index > (pgpos = *bytes >> PAGE_CACHE_SHIFT)) {
+restart:
+	zerofrom = offset;
+	pgpos = *bytes;
+	if (page->index <= (pgpos >> PAGE_CACHE_SHIFT))
+		goto skip_extend;
+
+	zerofrom = pgpos & ~PAGE_CACHE_MASK;
+	pgpos >>= PAGE_CACHE_SHIFT;
+	while (page->index > pgpos) {
 		status = -ENOMEM;
 		new_page = grab_cache_page(mapping, pgpos);
 		if (!new_page)
@@ -1609,12 +1617,7 @@
 		if (*bytes >> PAGE_CACHE_SHIFT != pgpos) {
 			UnlockPage(new_page);
 			page_cache_release(new_page);
-			continue;
-		}
-		zerofrom = *bytes & ~PAGE_CACHE_MASK;
-		if (zerofrom & (blocksize-1)) {
-			*bytes |= (blocksize-1);
-			(*bytes)++;
+			goto restart;
 		}
 		status = __block_prepare_write(inode, new_page, zerofrom,
 						PAGE_CACHE_SIZE, get_block);
@@ -1626,35 +1629,26 @@
 		kunmap(new_page);
 		UnlockPage(new_page);
 		page_cache_release(new_page);
+		zerofrom = 0;
+		*bytes = ++pgpos << PAGE_CACHE_SHIFT;
 	}
 
-	if (page->index < pgpos) {
-		/* completely inside the area */
-		zerofrom = offset;
-	} else {
-		/* page covers the boundary, find the boundary offset */
-		zerofrom = *bytes & ~PAGE_CACHE_MASK;
-
-		/* if we will expand the thing last block will be filled */
-		if (to > zerofrom && (zerofrom & (blocksize-1))) {
-			*bytes |= (blocksize-1);
-			(*bytes)++;
-		}
+	pgpos <<= PAGE_CACHE_SHIFT;
+	if (zerofrom < to)
+		*bytes = pgpos + to;
 
-		/* starting below the boundary? Nothing to zero out */
-		if (offset <= zerofrom)
-			zerofrom = offset;
-	}
+skip_extend:
 	status = __block_prepare_write(inode, page, zerofrom, to, get_block);
-	if (status)
-		goto out1;
-	kaddr = (char*)page_address(page);
-	if (zerofrom < offset) {
-		memset(kaddr+zerofrom, 0, offset-zerofrom);
-		__block_commit_write(inode, page, zerofrom, offset);
+	if (!status) {
+		if (zerofrom < offset) {
+			kaddr = (char*)page_address(page);
+			memset(kaddr+zerofrom, 0, offset-zerofrom);
+			__block_commit_write(inode, page, zerofrom, offset);
+			*bytes = pgpos + to;
+		}
+		return 0;
 	}
-	return 0;
-out1:
+	ClearPageUptodate(page);
 	kunmap(page);
 	return status;
Re: cont_prepare_write fix
On Mon, 24 Jul 2000, Roman Zippel wrote:

> Hi,
>
> While trying to get affs working again, I noticed that the current
> cont_prepare_write is hardly working, if you write to a file in steps of
> 1024, the bytes pointer is never updated resulting in an endless loop.

Testcase, please? It certainly seems to be working on FAT and HPFS - both are using the same thing.

> Below is a fixed version, that also removes the bogus tricks to the
> bytes ptr. I also "fixed" the comment, any reason we have to be
> insulting like this?

Sheer frustration from the braindead design? _Nothing_ can insult persons responsible for AFFS and FAT "designs". Sorry.

> Anyway, does anyone see a problem with the fix before I send it to
> Linus?

Yes. In particular, adding the whole PAGE_CACHE_SIZE to *bytes is _definitely_ wrong. The whole "round up to the block boundary" thing is there for a purpose - in their native environment all of these filesystems do not care to clean the tail of the last allocated block. So we have to do it for them. And we don't want to repeat such cleaning on _every_ write() - the first one should normalize the situation.

If you have problems with affs_get_block() - feel free to ask, I probably can help. What _is_ the problem? Notice that you can assume that affs_get_block() will only have to create blocks immediately past the end-of-file. Please, check fat_get_block() or hpfs_get_block() - they seem to work. I don't claim that I didn't fsck up there, but I would certainly like to hear more before I decide that I did. If you need a description of the cont_... design - just ask.
Re: cont_prepare_write fix
Hi,

> > While trying to get affs working again, I noticed that the current
> > cont_prepare_write is hardly working, if you write to a file in steps
> > of 1024, the bytes pointer is never updated resulting in an endless
> > loop.
>
> Testcase, please? It certainly seems to be working on FAT and HPFS -
> both are using the same thing.

Ok, now I see it, they are updating it themselves. Why isn't that done by the vfs?

> > Below is a fixed version, that also removes the bogus tricks to the
> > bytes ptr. I also "fixed" the comment, any reason we have to be
> > insulting like this?
>
> Sheer frustration from the braindead design? _Nothing_ can insult
> persons responsible for AFFS and FAT "designs". Sorry.

$ grep cont_prepare_write fs/*/*.c
fs/adfs/inode.c:	prepare_write:	cont_prepare_write,
fs/affs/file.c:		prepare_write:	cont_prepare_write,
fs/hfs/inode.c:		prepare_write:	cont_prepare_write,
fs/hpfs/file.c:		prepare_write:	cont_prepare_write,
fs/ntfs/fs.c:		prepare_write:	cont_prepare_write,
fs/qnx4/inode.c:	prepare_write:	cont_prepare_write,

Just because a fs wasn't designed for UNIX doesn't mean it's shit. Sorry, for me it's a simple matter of respect. A fs isn't "moronic" just because it doesn't support holes; the real design issues are somewhere else.

Anyway, as I only see mmu_private updates in fat and hpfs, it looks like adfs, hfs, ntfs and qnx4 are broken as well. BTW, wouldn't it be better to introduce an i_realsize field for that? It could be useful for other fs later as well. E.g. it can be used to fix the read/truncate race without the BKL (i_realsize only grows while the file is open and truncate can be delayed until the file is closed).

> Yes. In particular, adding the whole PAGE_CACHE_SIZE to *bytes is
> _definitely_ wrong. The whole "round up to the block boundary" thing is
> there for a purpose - in their native environment all of these
> filesystems do not care to clean the tail of the last allocated block.
> So we have to do it for them. And we don't want to repeat such cleaning
> on _every_ write() - the first one should normalize the situation.

Rounding it up to the block size isn't a problem; the question is why we give the bytes ptr to cont_prepare_write, if get_block has to update it in the first place. That's probably the point that confused me here.

> If you have problems with affs_get_block() - feel free to ask, I
> probably can help. What _is_ the problem? Notice that you can assume
> that affs_get_block() will only have to create blocks immediately past
> the end-of-file.

Yes, the whole get_block/getblock stuff in affs needs a real cleanup because of this.

bye, Roman
Re: ext3 on root filesystem problem
Jeremy writes:
> Hmm...no, I'm not booting from a ramdisk. The only ramdisk involved is
> to initialize my scsi adapter and I've already successfully installed
> the journal on my scsi drive's partition.

The problem is that you probably have a journal option in your lilo.conf file, which is getting passed to the ramdisk and not the real root fs. If you have already mounted it as ext3, then you don't need any journal mount options, as the filesystem will mount as ext3 before it tries to mount as ext2.

> I thought about what you suggest, just putting things in fstab, but
> then I fear not being able to boot.

You need a rescue floppy for sure. It doesn't need ext3 support, but it does need debugfs to be able to turn off the recovery feature so you can mount it in an emergency.

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert
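The debugfs step Andreas alludes to - clearing the needs_recovery flag so an ext3 filesystem will mount as plain ext2 from a rescue environment - would look something like the following. The device name is a placeholder, and the exact command syntax should be double-checked against the debugfs(8) man page of the e2fsprogs version on the rescue disk before running anything against a real partition:

```sh
# Illustrative only; /dev/sda1 is a hypothetical device.
# Run from the rescue floppy with the filesystem unmounted.
debugfs -w -R "features -needs_recovery" /dev/sda1
e2fsck -f /dev/sda1          # force a full check since recovery was skipped
mount -t ext2 /dev/sda1 /mnt # now mountable without ext3 support
```

Skipping journal recovery this way can lose the last few transactions, which is why the forced e2fsck pass matters.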
Re: cont_prepare_write fix
On Mon, 24 Jul 2000, Roman Zippel wrote:

> > Testcase, please? It certainly seems to be working on FAT and HPFS -
> > both are using the same thing.
>
> Ok, now I see it, they are updating it themselves. Why isn't that done
> by the vfs?

Because the actual _allocation_ unit may have nothing to do with the blocksize and be invisible to the VFS. Check fatfs - it uses the thing to trigger allocation, but the actual zero-out goes on a per-sector basis.

> > Sheer frustration from the braindead design? _Nothing_ can insult
> > persons responsible for AFFS and FAT "designs". Sorry.
>
> $ grep cont_prepare_write fs/*/*.c
> fs/adfs/inode.c:	prepare_write:	cont_prepare_write,

Not on this box. What version are you using?

> fs/affs/file.c:		prepare_write:	cont_prepare_write,
> fs/hfs/inode.c:		prepare_write:	cont_prepare_write,

Ditto.

> fs/hpfs/file.c:		prepare_write:	cont_prepare_write,

Ditto.

> fs/ntfs/fs.c:		prepare_write:	cont_prepare_write,

Ditto.

> fs/qnx4/inode.c:	prepare_write:	cont_prepare_write,

Ditto.

They are _calling_ the thing, but they would simply refuse to compile if they had such lines - different prototypes. BTW, when that comment went in it was FAT, HPFS and AFFS.

> Just because a fs wasn't designed for UNIX doesn't mean it's shit.
> Sorry, for me it's a simple matter of respect. A fs isn't "moronic"
> just because it doesn't support holes; the real design issues are
> somewhere else.

And you _bet_ that they are. No, lack of holes per se is not horrible. But there is a _lot_ of things about the AFFS and FAT (and to some extent HPFS) designs that are past "horrible" - "Lovecraftian" would be more accurate. But that's another rant...

> Anyway, as I only see mmu_private updates in fat and hpfs, it looks
> like adfs, hfs, ntfs and qnx4 are broken as well.

From adfs_get_block():
	...
	/* don't support allocation of blocks yet */
	return -EIO;

NTFS is not just broken, it eats filesystems in rw mode. QNX4 does not allocate blocks ("// to be done" says the driver). HFS refuses to allocate anything. Probably broken...

> BTW, wouldn't it be better to introduce an i_realsize field for that?
> It could be useful for other fs later as well. E.g. it can be used to
> fix the read/truncate race without the BKL (i_realsize only grows while
> the file is open and truncate can be delayed until the file is closed).

Not needed. That race is taken care of - readpage() never accesses past i_size and has the page locked, while vmtruncate() starts with setting i_size (on a shrinking truncate(), that is), then goes through all pages in the affected area and grabs the lock on all of them (one by one, not simultaneously, indeed) and only then calls ->truncate(). So a readpage() in the affected area that began before we've set ->i_size will be completed prior to the call of ->truncate(), and any attempt after setting ->i_size will see the new ->i_size and will not bother ->truncate() at all. _That_ one is OK. There are very scary dragons around ->truncate(), but a) they are unrelated to that, b) knowledge of realsize won't help, c) let's take that to email.

> Rounding it up to the block size isn't a problem; the question is why
> we give the bytes ptr to cont_prepare_write, if get_block has to update
> it in the first place. That's probably the point that confused me here.

cont_prepare_write() has to zero the tail of the last block out ;-)

> > If you have problems with affs_get_block() - feel free to ask, I
> > probably can help. What _is_ the problem?
>
> Yes, the whole get_block/getblock stuff in affs needs a real cleanup
> because of this.

Well, it will get simpler - all the logic for growing the chain shrinks a lot, AFAICS...
Re: cont_prepare_write fix
On Mon, 24 Jul 2000, Alexander Viro wrote:

> Not needed. That race is taken care of - readpage() never accesses past
> i_size and has the page locked, while vmtruncate() starts with setting
> i_size (on a shrinking truncate(), that is), then goes through all
> pages in the affected area and grabs the lock on all of them (one by
> one, not simultaneously, indeed) and only then calls ->truncate().

That assumes that i_size is atomically updated, which currently is not the case on 32 bit platforms.

		-ben
Re: cont_prepare_write fix
Hi,

> Because the actual _allocation_ unit may have nothing to do with the
> blocksize and be invisible to the VFS. Check fatfs - it uses the thing
> to trigger allocation, but the actual zero-out goes on a per-sector
> basis.

I tried vfat on an MO drive with 1KByte sectors - it doesn't work. I also don't know how you want to do it. A fs with a block size smaller than the sector size can't possibly write directly into the page cache, and I don't see anywhere where that is taken care of.

> > fs/adfs/inode.c:	prepare_write:	cont_prepare_write,
>
> Not on this box. What version are you using?
>
> They are _calling_ the thing, but they would simply refuse to compile
> if they had such lines - different prototypes.

Oops, sorry, wrong source tree, it's a 2.4 with some other changes. :) What I tried was to change the prototypes so the small caller routines in _every_ fs can be removed - seems to work, except that cont_prepare_write needs that extra argument (another reason I'd like to move it to i_realsize). It seems to work, is there any reason not to do this?

> Not needed. That race is taken care of - readpage() never accesses past
> i_size and has the page locked, while vmtruncate() starts with setting
> i_size (on a shrinking truncate(), that is), then goes through all
> pages in the affected area and grabs the lock on all of them (one by
> one, not simultaneously, indeed) and only then calls ->truncate().

Hmm, then I don't understand the BKL in the get_block functions.

> > Rounding it up to the block size isn't a problem; the question is why
> > we give the bytes ptr to cont_prepare_write, if get_block has to
> > update it in the first place. That's probably the point that confused
> > me here.
>
> cont_prepare_write() has to zero the tail of the last block out ;-)

IMO we could simplify that if we did the preallocation business in the page cache. We only need to support a dummy page structure in the page cache to delay the allocation of the real page (we need that anyway, if we want to do direct io into/from e.g. pci space), but that's more something for 2.5 - or is there any reason why it shouldn't work?

bye, Roman
Re: Virtual files
> Hi!
>
> I've been looking at the possibility of cleanly implementing virtual
> files (podfuk/avfs style) with the multiple mount thing in 2.4 kernels.
> Here is how it would work:
>
> 1) There needs to be a global lookup hook in VFS. If a real lookup
>    fails, the global lookup is invoked. The cached lookup path is not
>    affected.
>
> 2) The global lookup checks whether the name refers to a virtual file
>    (contains the magic char && exists in /overlay). If it does, then
>    the file/dir from overlay is mounted on top of the negative dentry.
>    The dentry is not filled in, so the virtual file remains invisible.
>
> For 1) the VFS needs to be modified, but with infinitesimal performance
> effect. For 2) a kernel module could be used.

Great!

> I've prepared a kernel patch against 2.4.0-test4 (just for comments on
> this, NOT for inclusion in the main kernel), and a kernel module. I've
> tested it with avfs, but it should also work for podfuk. It looks much
> cleaner than my previous approaches.

There seems to be something strange with usage counts:

pavel@bug:/tmp/cz1_mbrola.tgz#utar$ ls -al
total 88
drwxr-xr-x      2 pavel  users   3586 Mar 26 18:43 ./
drwxrwxrwt     38 root   root   81920 Jul 24 09:57 ../
drwxr-xr-x  42949 pavel  users      0 Mar 25 11:16 czech/
pavel@bug:/tmp/cz1_mbrola.tgz#utar$

> Problems:
>
> - The filesystem will be littered with these loopback mounts. This
>   should be cleared upon unmount, and possibly when the dcache is
>   shrunk. There was a similar requirement for new autofs IIRC.
>
> - Creation/removal of virtual files are not handled by this code.
>
> Comments?

Yes. You seem to have damaged locking:

@@ -281,12 +285,18 @@
 		lock_kernel();
 		result = dir->i_op->lookup(dir, dentry);
 		unlock_kernel();
-		if (result)
+		if (result) {
 			dput(dentry);
-		else
+			up(&dir->i_sem);
+		}
+		else {
+			up(&dir->i_sem);
 			result = dentry;
+			/* for avfs */
+			if (!dentry->d_inode && lookup_global)
+				lookup_global(nd, dentry);
+		}
 	}
-	up(&dir->i_sem);

In case of !dentry, up(&dir->i_sem) is not executed at all.

I have some cleanups to the nredir.c module, they are attached (please apply). Ouch, and there's a slight problem with caching: you access vmlinux#gz, then load the nredir module, then access vmlinux#gz again. But the failure is already cached and you are out of luck :-(.

								Pavel
--
I'm [EMAIL PROTECTED] "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [EMAIL PROTECTED]

--- /tmp/nredir.c	Mon Jul 24 10:03:48 2000
+++ nredir.c	Mon Jul 24 09:53:05 2000
@@ -1,4 +1,5 @@
 #include <linux/module.h>
+#include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/fs.h>
 #include <linux/slab.h>
@@ -7,7 +8,7 @@
 
 #define NREDIR_VERSION "0.1"
 
-#define AVFS_MAGIC_CHAR '@'
+#define AVFS_MAGIC_CHAR '#'
 
 #define OVERLAY_DIR "/overlay"
 #define OVERLAY_DIR_LEN 8
@@ -99,7 +100,7 @@
 	return dentry;
 }
 
-int init_module(void)
+static int __init init_nredir(void)
 {
 	printk(KERN_INFO "nredir init (version %s)\n", NREDIR_VERSION);
 
@@ -111,11 +112,14 @@
 }
 
-void cleanup_module(void)
+static void __exit cleanup_nredir(void)
 {
 	printk(KERN_INFO "nredir cleanup\n");
-	lookup_global = 0;
+	lookup_global = NULL;
 	printk("lookup_global: %x\n", (int) lookup_global);
 }
+
+module_init(init_nredir);
+module_exit(cleanup_nredir);
Re: Virtual files
On Mon, 24 Jul 2000, Pavel Machek wrote:

> Hi!
>
> I've been looking at the possibility of cleanly implementing virtual
> files (podfuk/avfs style) with the multiple mount thing in 2.4 kernels.
> Here is how it would work:
>
> 1) There needs to be a global lookup hook in VFS. If a real lookup
>    fails, the global lookup is invoked. The cached lookup path is not
>    affected.
>
> 2) The global lookup checks whether the name refers to a virtual file
>    (contains the magic char && exists in /overlay). If it does, then
>    the file/dir from overlay is mounted on top of the negative dentry.
>    The dentry is not filled in, so the virtual file remains invisible.
>
> For 1) the VFS needs to be modified, but with infinitesimal performance
> effect. For 2) a kernel module could be used.

Erm? A negative lookup happens every time you are creating an object.

> Problems:
>
> - The filesystem will be littered with these loopback mounts. This
>   should be cleared upon unmount, and possibly when the dcache is
>   shrunk. There was a similar requirement for new autofs IIRC.

Not with the dcache shrinking.

> - Creation/removal of virtual files are not handled by this code.
>
> Comments?

How will it interact with modifying the mount-tree? The comment on autofs is pretty interesting, though. Sigh... Just let me find some time to do mount-traps - that may very well be the right tool here...
Re: cont_prepare_write fix
On Mon, Jul 24, 2000 at 12:04:39PM -0400, Benjamin C.R. LaHaise wrote:
> On Mon, 24 Jul 2000, Alexander Viro wrote:
> > Not needed. That race is taken care of - readpage() never accesses
> > past i_size and has the page locked, while vmtruncate() starts with
> > setting i_size (on a shrinking truncate(), that is), then goes
> > through all pages in the affected area and grabs the lock on all of
> > them (one by one, not simultaneously, indeed) and only then calls
> > ->truncate().
>
> That assumes that i_size is atomically updated, which currently is not
> the case on 32 bit platforms.
>
>		-ben

I found that i_size is in most cases reasonably protected by i_sem. Except for Coda, where inode->i_size != inode->i_mapping->host->i_size, because we `borrow' the mapping of the underlying container file.

At the moment I believe that the introduction of i_mapping->host might have been a mistake. The readpage/writepage operations use i_mapping->host->i_size, but sys_stat and generic_file_write (for a file with O_APPEND) are using inode->i_size. So for Coda we are copying the i_size back to the Coda inode in a wrapper around generic_file_write.

AFAIK Coda is the only FS that has an address space mapping associated with another inode, and we need to lock the i_sem of that inode to avoid problems. Maybe the Coda file write should become something like:

    coda_file_write()
    {
	down(&container_inode->i_sem);
	n = generic_file_write(file, ...);
	container_inode->i_size = file->f_dentry->d_inode->i_size;
	up(&container_inode->i_sem);
	return n;
    }

That way all writes are protected by the semaphore on the container file's inode. And the use of i_mapping->host, along with the nasty cast to (struct inode *), could be removed.

Jan
question about sard and disk profiling patches
I build some customized kernel rpms for our in-house distro and I've incorporated the profiling patches needed by the sard utility. I'm just curious whether there is any downside to using this patch for any reason, and whether there are any stability concerns with it.

Thanks,
-jeremy
--
http://www.xxedgexx.com | [EMAIL PROTECTED]
Re: cont_prepare_write fix
On Mon, Jul 24, 2000 at 02:24:42PM -0400, Jan Harkes wrote:
> I found that i_size is in most cases reasonably protected by i_sem.
> Except for Coda, where inode->i_size != inode->i_mapping->host->i_size,
> because we `borrow' the mapping of the underlying container file.
>
> At the moment I believe that the introduction of i_mapping->host might
> have been a mistake. The readpage/writepage operations use
> i_mapping->host->i_size, but sys_stat and generic_file_write (for a
> file with O_APPEND) are using inode->i_size. So for Coda we are copying
> the i_size back to the Coda inode in a wrapper around
> generic_file_write.
>
> AFAIK Coda is the only FS that has an address space mapping associated
> with another inode, and we need to lock the i_sem of that inode to
> avoid problems. Maybe the Coda file write should become something like:
>
>     coda_file_write()
>     {
>         down(&container_inode->i_sem);
>         n = generic_file_write(file, ...);
>         container_inode->i_size = file->f_dentry->d_inode->i_size;
>         up(&container_inode->i_sem);
>         return n;
>     }
>
> That way all writes are protected by the semaphore on the container
> file's inode. And the use of i_mapping->host, along with the nasty cast
> to (struct inode *), could be removed.

This might work for Coda, but I think in general there are a couple of problems with always using the struct file to get at the inode structure. IIRC either writepage or prepare_write is called with file == NULL in some circumstances (when creating a symlink, I think). I discovered this when I got oopsen in my patch which adds space/inode/dentry usage limits to ramfs.

There's another problem too: for ramfs limits, I obviously also need to keep track of when a page is removed from an inode. The only sane way I've seen to do this is to add a hook to address_space_operations called when the page is removed (currently I'm calling it from remove_page_from_inode_queue(), but that may not be the best place). At this point, only the struct page is available, so from that I need to find the inode (well, actually inode->i_sb, where I store the limits). Easy with mapping->host, otherwise impossible (AFAICT; it's entirely possible I'm missing something blindingly obvious).

--
David Gibson, Technical Support Engineer, Linuxcare, Inc.
+61 2 6262 8990
[EMAIL PROTECTED], http://www.linuxcare.com/
Linuxcare. Support for the revolution.