Re: [PATCH][RFC] fast file mapping for loop
> So I looked at the code - it seems you build a full extent of the blocks
> in the file, filling holes as you go along. I initially did that as well,
> but that is too slow to be usable in real life.
>
> You also don't support sparse files, falling back to normal fs
> read/write paths. Supporting sparse files properly is a must, people
> generally don't want to prealloc a huge disk backing.

How would you do sparse file support with a passthrough loopback that
doesn't use the pagecache? Holes are allocated in the get_block function
provided by each filesystem, and that function is handed a buffer that is
expected to be in the pagecache. If you want to allocate holes without the
pagecache, there's a problem --- a new interface to all filesystems would be
needed.

It would be possible to use the pagecache interface for filling holes and
the passthrough interface for all other requests --- but get_block is
allowed to move other blocks on the filesystem (and on UFS it really does),
so calling get_block to fill a hole could move other, unrelated blocks,
which would leave the block map desynchronized and corrupt both filesystems.

Mikulas
Re: Symbolic links vs hard links
> > Other people are of the opinion that the invention of the symbolic link
> > was a huge mistake.
>
> I guess I haven't heard that one. What is the argument that we were
> better off without symbolic links?

Numerous security bugs in tar (extracting a specially crafted archive with
symlinks could overwrite arbitrary files) and in coreutils. For example, to
walk a directory path without following symlinks, you must break the path
into its elements and repeatedly use

	h = open(element, O_RDONLY | O_NOFOLLOW);
	fchdir(h);

--- the latest coreutils do this, but it's obvious that a lot of
file-manipulation programs don't, making them unsafe for operating on users'
directories from the root account.

Mikulas
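A minimal sketch of the element-by-element O_NOFOLLOW walk described above,
for directory paths; the function name and error handling are illustrative,
not taken from coreutils.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Walk "path" one component at a time without ever following a symlink:
 * a symlinked component makes open() fail with ELOOP.  The working
 * directory is left at the last component reached. */
static int walk_nofollow(char *path)
{
	char *p, *save;

	if (path[0] == '/' && chdir("/") < 0)
		return -1;
	for (p = strtok_r(path, "/", &save); p; p = strtok_r(NULL, "/", &save)) {
		int h = open(p, O_RDONLY | O_NOFOLLOW);
		if (h < 0 || fchdir(h) < 0) {
			perror(p);
			if (h >= 0)
				close(h);
			return -1;
		}
		close(h);
	}
	return 0;
}

int main(int argc, char **argv)
{
	return argc > 1 && walk_nofollow(argv[1]) == 0 ? 0 : 1;
}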
Re: Finding hardlinks
> > > Currently, large file support is already necessary to handle dvd and
> > > video. It's also useful for images for virtualization. So the failing
> > > stat() calls should already be a thing of the past with modern
> > > distributions.
> >
> > As long as glibc compiles by default with 32-bit ino_t, the problem
> > exists and is severe --- programs handling large files, such as
> > coreutils, tar, mc and mplayer, are already compiled with 64-bit ino_t
> > and off_t, but the user (or a script) may type something like:
> >
> > cat >file.c <<EOF
> > #include <sys/types.h>
> > #include <sys/stat.h>
> > #include <fcntl.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> >
> > int main(void)
> > {
> > 	int h;
> > 	struct stat st;
> > 	if ((h = creat("foo", 0600)) < 0) perror("creat"), exit(1);
> > 	if (fstat(h, &st)) perror("stat"), exit(1);
> > 	close(h);
> > 	return 0;
> > }
> > EOF
> > gcc file.c; ./a.out
> >
> > --- and you certainly do not want this to fail (unless you are out of
> > disk space).
> >
> > The difference is that with a 32-bit program and 64-bit off_t you get a
> > deterministic failure on large files; with a 32-bit program and 64-bit
> > ino_t you get random failures.
>
> What's (technically) the problem with changing the gcc default?

Technically none (i.e. edit the gcc specs or the glibc includes). But
persuading all distribution builders to use such a version is impossible.
Plus there are many binary programs that cannot be changed.

> Alternatively we could make the error deterministic in various ways. Start
> st_ino numbering from 4G (except for a few special ones maybe such as
> root/mounts). Or make old and new programs look different at the ELF level
> or by sys_personality() and/or check against an "ino64" mount
> flag/filesystem feature. Lots of possibilities.

I think the best solution would be to drop -EOVERFLOW on st_ino and let
legacy 32-bit programs live with colliding inodes. They'll have to anyway.

Mikulas
Re: Finding hardlinks
> > And does it matter? If you rename a file, tar might skip it regardless
> > of hardlink detection (if readdir races with rename, you can read
> > neither of the names of the file, one of them, or both --- all of these
> > are possible). If you have "dir1/a" hardlinked to "dir1/b" and, while
> > tar runs, you delete both "a" and "b" and create totally new files
> > "dir2/c" linked to "dir2/d", tar might hardlink both "c" and "d" to "a"
> > and "b". No one guarantees you a sane result of tar or cp -a while the
> > tree is being changed. I don't see how is_samefile() could make it
> > worse.
>
> There are several cases where changing the tree doesn't affect the
> correctness of the tar or cp -a result. In some of these cases using
> samefile() instead of st_ino _will_ result in a corrupted result.

... and those are what?

If you create hardlinks while copying, you may get files duplicated instead
of hardlinked in the backup. If you unlink hardlinks, cp will miss the
hardlinks too and create two copies of the same file (it searches its hash
only for files with i_nlink > 1). If you rename files, the archive will be
completely fscked up (either missing or duplicate files).

> Generally samefile() is _weaker_ than the st_ino interface in comparing
> the identity of two files without using massive amounts of memory. You're
> searching for a better solution, not one that is broken in a different
> way, aren't you?

What is the relevant case where st_ino/st_dev works and
samefile(char *path1, char *path2) doesn't?

Mikulas
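For reference, a minimal sketch of the st_ino/st_dev bookkeeping being
discussed above (roughly what cp -a and tar do for files with i_nlink > 1);
the names are illustrative, and the toy linked list stands in for the real
hash table in coreutils.

#include <sys/stat.h>
#include <stdlib.h>
#include <string.h>

struct seen {
	dev_t dev;
	ino_t ino;
	char *name;
	struct seen *next;
};

static struct seen *seen_list;

/* If (st_dev, st_ino) was already recorded, return the name it was first
 * seen under (the copier then makes a hardlink instead of a copy);
 * otherwise remember it.  Only files with st_nlink > 1 are tracked, which
 * is why colliding inode numbers on such files corrupt the copy. */
static const char *remember_link(const struct stat *st, const char *name)
{
	struct seen *s;

	if (st->st_nlink <= 1)
		return NULL;
	for (s = seen_list; s; s = s->next)
		if (s->dev == st->st_dev && s->ino == st->st_ino)
			return s->name;
	s = malloc(sizeof *s);
	if (!s)
		return NULL;
	s->dev = st->st_dev;
	s->ino = st->st_ino;
	s->name = strdup(name);
	s->next = seen_list;
	seen_list = s;
	return NULL;
}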
Re: Finding hardlinks
> > > Well, sort of. Samefile without keeping fds open doesn't have any
> > > protection against the tree changing underneath between first
> > > registering a file and later opening it. The inode number is more [...]
> >
> > You only need to keep one file per hardlink group open during the final
> > verification, checking that inode hashing produced reasonable results.
>
> What final verification? I wasn't just talking about 'tar' but all cases
> where st_ino might be used to check the identity of two files at possibly
> different points in time.
>
> Time A: remember identity of file X
> Time B: check if identity of file Y matches that of file X
>
> With samefile(), if you open X at A and keep it open till B, you can
> accumulate large numbers of open files and the application can fail.
>
> If you don't keep an open file, just remember the path, then renaming X
> will foil the later identity check. Changing the file at this path between
> A and B can even give you a false positive.
>
> This applies to 'tar' as well as the other uses.

And does it matter? If you rename a file, tar might skip it regardless of
hardlink detection (if readdir races with rename, you can read neither of
the names of the file, one of them, or both --- all of these are possible).

If you have "dir1/a" hardlinked to "dir1/b" and, while tar runs, you delete
both "a" and "b" and create totally new files "dir2/c" linked to "dir2/d",
tar might hardlink both "c" and "d" to "a" and "b".

No one guarantees you a sane result of tar or cp -a while the tree is being
changed. I don't see how is_samefile() could make it worse.

Mikulas
Re: Finding hardlinks
On Wed, 3 Jan 2007, Frank van Maarseveen wrote:

> On Wed, Jan 03, 2007 at 01:09:41PM -0800, Bryan Henderson wrote:
> > > On any decent filesystem st_ino should uniquely identify an object
> > > and reliably provide hardlink information. The UNIX world has relied
> > > upon this for decades. A filesystem with st_ino collisions without
> > > being hardlinked (or the other way around) needs a fix.
> >
> > But for at least the last of those decades, filesystems that could not
> > do that were not uncommon. They had to present 32 bit inode numbers and
> > either allowed more than 4G files or just didn't have the means of
> > assigning inode numbers with the proper uniqueness to files. And the
> > sky did not fall.
>
> I don't have an explanation why; I think it's mostly high end use, and
> high end users tend to understand more. But we're going to see more
> really large filesystems in "normal" use so..
>
> Currently, large file support is already necessary to handle dvd and
> video. It's also useful for images for virtualization. So the failing
> stat() calls should already be a thing of the past with modern
> distributions.

As long as glibc compiles by default with 32-bit ino_t, the problem exists
and is severe --- programs handling large files, such as coreutils, tar, mc
and mplayer, are already compiled with 64-bit ino_t and off_t, but the user
(or a script) may type something like:

cat >file.c <<EOF
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	int h;
	struct stat st;
	if ((h = creat("foo", 0600)) < 0) perror("creat"), exit(1);
	if (fstat(h, &st)) perror("stat"), exit(1);
	close(h);
	return 0;
}
EOF
gcc file.c; ./a.out

--- and you certainly do not want this to fail (unless you are out of disk
space).

The difference is that with a 32-bit program and 64-bit off_t you get a
deterministic failure on large files; with a 32-bit program and 64-bit
ino_t you get random failures.

Mikulas
Re: Finding hardlinks
> > > > I didn't hardlink directories, I just patched stat, lstat and fstat
> > > > to always return st_ino == 0 --- and I've seen those failures. These
> > > > failures are going to happen on non-POSIX filesystems in the real
> > > > world too, very rarely.
> > >
> > > I don't want to spoil your day but testing with st_ino==0 is a bad
> > > choice because it is a special number. Anyway, one can only find
> > > breakage, not prove that all the other programs handle this correctly,
> > > so this is kind of pointless.
> > >
> > > On any decent filesystem st_ino should uniquely identify an object and
> > > reliably provide hardlink information. The UNIX world has relied upon
> > > this for decades. A filesystem with st_ino collisions without being
> > > hardlinked (or the other way around) needs a fix.
> >
> > ... and that's the problem --- the UNIX world specified something that
> > isn't implementable in the real world.
>
> Sure it is. Numerous popular POSIX filesystems do that. There is a lot of
> inode number space in 64 bits (of course it is a matter of time for it to
> jump to 128 bits and more).

If the filesystem was designed by someone not from the Unix world (FAT, SMB,
...), then no. And users still want to access these filesystems.

A 64-bit inode number space is not yet implemented on Linux --- the problem
is that if you return ino >= 2^32, programs compiled without
-D_FILE_OFFSET_BITS=64 will fail with stat() returning -EOVERFLOW --- this
failure is specified in POSIX, but it is not very useful.

Mikulas
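As a side note, a purely illustrative sketch of why the -EOVERFLOW failure
mentioned above happens; the structure names and the conversion helper below
are made up for illustration and only approximate what the kernel's compat
stat code and glibc actually do.

#include <errno.h>
#include <stdint.h>

/* Illustrative only: a legacy stat structure with a 32-bit inode field
 * and a modern one with a 64-bit field.  Other members are elided. */
struct stat_legacy { uint32_t st_ino; };
struct stat_wide   { uint64_t st_ino; };

/* If the real inode number does not survive truncation to 32 bits, the
 * conversion layer has to report EOVERFLOW --- which is why a program
 * built without -D_FILE_OFFSET_BITS=64 can fail on an otherwise healthy
 * file whose inode number happens to be >= 2^32. */
static int fill_legacy_stat(struct stat_legacy *dst, const struct stat_wide *src)
{
	dst->st_ino = (uint32_t)src->st_ino;
	if (dst->st_ino != src->st_ino)
		return -EOVERFLOW;
	return 0;
}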
Re: Finding hardlinks
On Wed, 3 Jan 2007, Frank van Maarseveen wrote:

> On Tue, Jan 02, 2007 at 01:04:06AM +0100, Mikulas Patocka wrote:
> > I didn't hardlink directories, I just patched stat, lstat and fstat to
> > always return st_ino == 0 --- and I've seen those failures. These
> > failures are going to happen on non-POSIX filesystems in the real world
> > too, very rarely.
>
> I don't want to spoil your day but testing with st_ino==0 is a bad choice
> because it is a special number. Anyway, one can only find breakage, not
> prove that all the other programs handle this correctly, so this is kind
> of pointless.
>
> On any decent filesystem st_ino should uniquely identify an object and
> reliably provide hardlink information. The UNIX world has relied upon
> this for decades. A filesystem with st_ino collisions without being
> hardlinked (or the other way around) needs a fix.

... and that's the problem --- the UNIX world specified something that isn't
implementable in the real world.

You can take a closed box and say "this is POSIX certified" --- but how
useful would such a box be if you can't access CDs, diskettes and USB sticks
with it?

Mikulas
Re: Finding hardlinks
On Wed, 3 Jan 2007, Miklos Szeredi wrote:

> > > High probability is all you have. Cosmic radiation hitting your
> > > computer will more likely cause problems than colliding 64bit inode
> > > numbers ;)
> >
> > Some of us have machines designed to cope with cosmic rays, and would
> > be unimpressed with a decrease in reliability.
>
> With the suggested samefile() interface you'd get a failure with just
> about 100% reliability for any application which needs to compare more
> than a few files. The fact is open files are _very_ expensive, no wonder
> they are limited in various ways.
>
> What should 'tar' do when it runs out of open files while searching for
> hardlinks? Should it just give up? Then the samefile() interface would be
> _less_ reliable than the st_ino one by a significant margin.

You could do samefile() for paths --- and as for races, they don't matter in
this scenario; it is no more racy than stat or lstat.

Mikulas
Re: Finding hardlinks
On Wed, 3 Jan 2007, Trond Myklebust wrote:

> On Sat, 2006-12-30 at 02:04 +0100, Mikulas Patocka wrote:
> > On Fri, 29 Dec 2006, Trond Myklebust wrote:
> > > On Thu, 2006-12-28 at 19:14 +0100, Mikulas Patocka wrote:
> > > > Why don't you rip out the support for colliding inode numbers from
> > > > the kernel entirely (i.e. remove iget5_locked)?
> > > >
> > > > It's reasonable to have either no support for colliding ino_t or
> > > > full support for it (including syscalls that userspace can use to
> > > > work with such a filesystem) --- but I don't see any point in having
> > > > the half-way support in the kernel as it is right now.
> > >
> > > What would ino_t have to do with inode numbers? It is only used as a
> > > hash table lookup. The inode number is set in the ->getattr()
> > > callback.
> >
> > The question is: why does the kernel contain the iget5 function that
> > looks up according to a callback, if the filesystem cannot have more
> > than a 64-bit inode identifier?
>
> Huh? The filesystem can have as large a damned identifier as it likes.
> NFSv4 uses 128-byte filehandles, for instance.

But then it needs some other syscall to let applications determine hardlinks
--- which was the initial topic in this thread.

> POSIX filesystems are another matter. They can only have 64-bit
> identifiers thanks to the requirement that inode numbers be 64-bit unique
> and permanently stored, however Linux caters for a whole truckload of
> filesystems which will never fit that label: look at all those users of
> iunique(), for one...

I see them. The bad thing is that many programmers read POSIX, write
programs as if the POSIX specification were true, and these programs break
randomly on non-POSIX filesystems. Each non-POSIX filesystem invents st_ino
on its own, trying to minimize hash collisions, which makes the failures
even less probable and harder to find.

The current situation is (for example) that cp does stat(), open(), fstat()
and compares st_ino/st_dev --- if they mismatch, it prints an error and
doesn't copy the file --- so if the kernel drops the inode from its cache
between stat() and open() and the filesystem uses iunique(), cp will fail.
What utilities should the user use on those non-POSIX filesystems, if not
cp?

Probably some file-handling guidelines should be specified and written down
in Documentation/ as a form of standard that application programmers can
use.

Mikulas
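A sketch of the stat()/open()/fstat() consistency check described above;
illustrative only, the actual coreutils code differs in detail.

#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open "name" and verify it is still the same object that stat() saw.
 * On a filesystem using iunique(), the inode may be re-instantiated with
 * a different number between the two calls, and this check then fails
 * even though nothing changed on disk. */
static int open_checked(const char *name)
{
	struct stat st1, st2;
	int fd;

	if (stat(name, &st1) < 0)
		return -1;
	fd = open(name, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st2) < 0 ||
	    st1.st_ino != st2.st_ino || st1.st_dev != st2.st_dev) {
		fprintf(stderr, "%s: file changed while being copied\n", name);
		close(fd);
		return -1;
	}
	return fd;
}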
Re: Finding hardlinks
> > > Certainly, but tar isn't going to remember all the inode numbers.
> > > Even if you solve the storage requirements (not impossible) it would
> > > have to do (4e9^2)/2=8e18 comparisons, which computers don't have
> > > enough CPU power for just yet.
> >
> > It is remembering all inode numbers with nlink > 1, and many other
> > tools are remembering all directory inode numbers (see my other post on
> > this topic).
>
> Don't you mean they are remembering all the inode numbers of the
> directories _above_ the one they are currently working on? I'm quite sure
> they aren't remembering all the directories they have processed.

cp -a remembers all directory inodes it has visited, not just the path from
the root. If you have two directories with the same inode number anywhere in
the tree, it will skip one of them.

Mikulas

> > It of course doesn't compare each number with all the others, it uses
> > hashing.
>
> Yes, I didn't think of that.
>
> > > It doesn't matter if there are collisions within the filesystem, as
> > > long as there are no collisions between the set of files an
> > > application is working on at the same time.
> >
> > --- and that is all files in the case of a backup.
>
> No, it's usually working with a _single_ file at a time. It will remember
> inode numbers of files with nlink > 1, but it won't remember all the
> other inode numbers.
>
> You could have a filesystem with 4 billion files, each one having two
> links. Not a likely scenario though.
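For illustration, a sketch of the directory bookkeeping described in the
message above (not the actual coreutils code): a tree walker that remembers
every directory it has entered will treat a colliding (st_dev, st_ino) pair
as a directory it has already copied and silently skip it.

#include <sys/stat.h>
#include <stdlib.h>

struct dirid { dev_t dev; ino_t ino; };

static struct dirid *visited;
static size_t nvisited, avisited;

/* Returns 1 if this directory was (apparently) seen before.  With
 * colliding inode numbers this wrongly reports a fresh directory as
 * already visited and the caller skips it. */
static int dir_seen_before(const struct stat *st)
{
	size_t i;

	for (i = 0; i < nvisited; i++)
		if (visited[i].dev == st->st_dev && visited[i].ino == st->st_ino)
			return 1;
	if (nvisited == avisited) {
		size_t na = avisited ? 2 * avisited : 64;
		struct dirid *nv = realloc(visited, na * sizeof *nv);
		if (!nv)
			return 0;	/* out of memory: just don't record it */
		visited = nv;
		avisited = na;
	}
	visited[nvisited].dev = st->st_dev;
	visited[nvisited].ino = st->st_ino;
	nvisited++;
	return 0;
}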
Re: Finding hardlinks
On Tue, 2 Jan 2007, Miklos Szeredi wrote:

> > > > > It seems like the posix idea of a unique <st_dev, st_ino> doesn't
> > > > > hold water for modern file systems
> > > >
> > > > are you really sure?
> > >
> > > Well Jan's example was of Coda that uses 128-bit internal file ids.
> > >
> > > > and if so, why don't we fix *THAT* instead
> > >
> > > Hmm, sometimes you can't fix the world, especially if the filesystem
> > > is exported over NFS and has a problem with fitting its file IDs
> > > uniquely into a 64-bit identifier.
> > >
> > > Note, it's pretty easy to fit _anything_ into a 64-bit identifier
> > > with the use of a good hash function. The chance of an accidental
> > > collision is infinitesimally small. For a set of 100 files: 0.03%,
> > > 1,000,000 files: 0.03%
> >
> > I do not think we want to play with probability like this. I mean...
> > imagine 4G files, 1KB each. That's 4TB of disk space, not _completely_
> > unreasonable, and the collision probability is going to be ~100% due to
> > the birthday paradox.
> >
> > You'll still want to back up your 4TB server...
>
> Certainly, but tar isn't going to remember all the inode numbers. Even if
> you solve the storage requirements (not impossible) it would have to do
> (4e9^2)/2=8e18 comparisons, which computers don't have enough CPU power
> for just yet.

It is remembering all inode numbers with nlink > 1, and many other tools are
remembering all directory inode numbers (see my other post on this topic).
Of course it doesn't compare each number with all the others, it uses
hashing.

> It doesn't matter if there are collisions within the filesystem, as long
> as there are no collisions between the set of files an application is
> working on at the same time.

--- and that is all files in the case of a backup.

Mikulas
Re: Finding hardlinks
On Mon, 1 Jan 2007, Jan Harkes wrote:

> On Mon, Jan 01, 2007 at 11:47:06PM +0100, Mikulas Patocka wrote:
> > Anyway, cp -a is not the only application that wants to do hardlink
> > detection. I tested programs for ino_t collisions (I intentionally
> > injected them) and found that cp from coreutils 6.7 fails to copy
> > directories but displays error messages (coreutils 5 works fine). mc
> > and arj skip directories with colliding ino_t and pretend that the
> > operation completed successfully. The FTS library fails to walk
> > directories, returning the FTS_DC error. Diffutils, find and grep fail
> > to search directories with colliding inode numbers. Tar seems tolerant
> > except for incremental backup (which I didn't try).
> >
> > All programs except diff were tolerant of colliding ino_t on files.
>
> Thanks for testing so many programs, but... did the files/symlinks with
> colliding inode numbers have i_nlink > 1? Or did you also have
> directories with colliding inode numbers?
>
> It looks like you've introduced hardlinked directories in your test,
> which are definitely not supported; in fact it will probably cause not
> only issues for userspace programs, but also locking and garbage
> collection issues in the kernel's dcache.

I tested it only on files without hardlinks (with i_nlink == 1) --- most
programs (except diff) are tolerant of collisions; they won't store st_ino
in memory unless i_nlink > 1.

I didn't hardlink directories, I just patched stat, lstat and fstat to
always return st_ino == 0 --- and I've seen those failures. These failures
are going to happen on non-POSIX filesystems in the real world too, very
rarely.

BTW, POSIX supports (optionally) hardlinked directories but doesn't support
colliding st_ino --- so the programs act according to POSIX --- but the
problem is that this POSIX requirement no longer represents the real-world
situation.

> I'm surprised you're seeing so many problems. The only find problem that
> I am aware of is the one where it assumes that there will be only
> i_nlink-2 subdirectories in a given directory; this optimization can be
> disabled with -noleaf.

This is not a bug but a feature. If a filesystem doesn't count
subdirectories, it should set the directory's n_link to 1 and find will be
OK.

> The only problems I've encountered with ino_t collisions are archivers
> and other programs that recursively try to copy a tree while preserving
> hardlinks. And in all cases these seem to have no problem with such
> collisions as long as i_nlink == 1.

Yes, but they have big problems with directory ino_t collisions. They think
that the directories are hardlinked and skip processing them.

Mikulas
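A sketch of the find "leaf" optimization mentioned above, under the classic
assumption that a directory's link count is 2 plus the number of its
subdirectories; the function name is illustrative, not find's internals.

#include <sys/stat.h>

/* On traditional Unix filesystems a directory has one link for its own
 * "." entry, one from its parent, and one from each subdirectory's "..".
 * So once st_nlink - 2 subdirectories have been found, the remaining
 * entries cannot be directories and the walker can skip stat() on them.
 * A filesystem that does not maintain this count should report
 * st_nlink < 2, which disables the optimization. */
static int may_skip_stat(const struct stat *dir_st, unsigned subdirs_found)
{
	if (dir_st->st_nlink < 2)	/* count not maintained: never skip */
		return 0;
	return subdirs_found >= dir_st->st_nlink - 2;
}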
Re: Finding hardlinks
> > BTW. How does ReiserFS find that a given inode number (or object ID in
> > ReiserFS terminology) is free before assigning it to a new
> > file/directory?
>
> reiserfs v3 has an extent map of free object identifiers in the
> super-block.

The free inode space can have at most 2^31 extents --- if inode numbers
alternate between "allocated" and "free". How do you pack that into the
superblock?

> reiser4 used 64 bit object identifiers without reuse.

So you are going to hit the same problem as I did with SpadFS --- you can't
export a 64-bit inode number to userspace (programs compiled without
-D_FILE_OFFSET_BITS=64 will then have stat() randomly failing with
EOVERFLOW), and if you export only a 32-bit number, it will eventually wrap
around, and colliding st_ino will cause data corruption with many userspace
programs.

Mikulas
Re: Finding hardlinks
> > The question is: why does the kernel contain the iget5 function that
> > looks up according to a callback, if the filesystem cannot have more
> > than a 64-bit inode identifier?
>
> Generally speaking, a file system might have two different identifiers
> for files:
>
>  - one that makes it easy to tell whether two files are the same one;
>
>  - one that makes it easy to locate the file on the storage.
>
> According to POSIX, the inode number should always work as an identifier
> of the first class, but not necessarily as one of the second. For
> example, in reiserfs something called "a key" is used to locate the
> on-disk inode, which, in turn, contains the inode number.
>
> Identifiers of the second class tend to

BTW. How does ReiserFS find that a given inode number (or object ID in
ReiserFS terminology) is free before assigning it to a new file/directory?

Mikulas

> live in directory entries, and during lookup we want to consult the inode
> cache _before_ reading the inode from the disk (otherwise the cache is
> mostly useless), right?
>
> This means that some file systems want to index inodes in a cache by
> something different than the inode number.
Re: Finding hardlinks
> Hi!
>
> > > > If the user (or script) doesn't specify that flag, it doesn't help.
> > > > I think the best solution for these filesystems would be either to
> > > > add a new syscall
> > > >
> > > > int is_hardlink(char *filename1, char *filename2)
> > > >
> > > > (but I know adding syscall bloat may be objectionable)
> > >
> > > it's also the wrong api; the filenames may have been changed under
> > > you just as you return from this call, so it really is a
> > > "was_hardlink_at_some_point()" as you specify it. If you make it work
> > > on fd's.. it has a chance at least.
> >
> > Yes, but it doesn't matter --- if the tree changes under the "cp -a"
> > command, no one guarantees you what you get.
> >
> > int fis_hardlink(int handle1, int handle2);
> >
> > is another possibility, but it can't detect hardlinked symlinks.
>
> Ugh. Is it even legal to hardlink symlinks?

Why shouldn't it be? It seems to work quite fine in Linux.

Anyway, cp -a is not the only application that wants to do hardlink
detection. I tested programs for ino_t collisions (I intentionally injected
them) and found that cp from coreutils 6.7 fails to copy directories but
displays error messages (coreutils 5 works fine). mc and arj skip
directories with colliding ino_t and pretend that the operation completed
successfully. The FTS library fails to walk directories, returning the
FTS_DC error. Diffutils, find and grep fail to search directories with
colliding inode numbers. Tar seems tolerant except for incremental backup
(which I didn't try).

All programs except diff were tolerant of colliding ino_t on files. ino_t is
no longer unique in many filesystems; it looks like quite a serious data
corruption possibility.

Mikulas
Re: Finding hardlinks
On Wed, 20 Dec 2006, Al Viro wrote:

> On Wed, Dec 20, 2006 at 05:50:11PM +0100, Miklos Szeredi wrote:
> > I don't see any problems with changing struct kstat. There would be
> > reservations against changing inode.i_ino though.
> >
> > So filesystems that have 64bit inodes will need a specialized getattr()
> > method instead of generic_fillattr().
>
> And they are already free to do so. And no, struct kstat doesn't need to
> be changed - it has u64 ino already.

If I return 64-bit values as ino_t, 32-bit programs will get EOVERFLOW on
any stat attempt (even if they are not going to use st_ino in any way) --- I
know that POSIX specifies this, but the question is whether it is useful.

What is the correct solution? A mount option that can differentiate between
32-bit colliding inode numbers and 64-bit non-colliding inode numbers? Or is
there a better idea?

Given the fact that glibc compiles everything by default with 32-bit ino_t,
I wonder if returning 64-bit inode numbers is possible at all.

Mikulas
Re: Finding hardlinks
On Fri, 29 Dec 2006, Trond Myklebust wrote:

> On Thu, 2006-12-28 at 19:14 +0100, Mikulas Patocka wrote:
> > Why don't you rip out the support for colliding inode numbers from the
> > kernel entirely (i.e. remove iget5_locked)?
> >
> > It's reasonable to have either no support for colliding ino_t or full
> > support for it (including syscalls that userspace can use to work with
> > such a filesystem) --- but I don't see any point in having the half-way
> > support in the kernel as it is right now.
>
> What would ino_t have to do with inode numbers? It is only used as a hash
> table lookup. The inode number is set in the ->getattr() callback.

The question is: why does the kernel contain the iget5 function that looks
up according to a callback, if the filesystem cannot have more than a 64-bit
inode identifier? This lookup callback just encourages writing bad
filesystems with colliding inode numbers. Either remove coda, smb (and
possibly other) filesystems from the kernel, or add proper userspace support
for them.

The situation is that the current coreutils 6.7 fails to recursively copy
directories if some two directories in the tree have a colliding inode
number, so you get random data corruption with these filesystems.

Mikulas
Re: Finding hardlinks
> > This sounds like a bug to me. It seems like we should have a one to one
> > correspondence of filehandle -> inode. In what situations would this
> > not be the case?
>
> Well, the NFS protocol allows that [see rfc1813, p. 21: "If two file
> handles from the same server are equal, they must refer to the same file,
> but if they are not equal, no conclusions can be drawn."]
>
> As an example, some file systems encode hint information into the
> filehandle and the hints may change over time; another example is
> encoding parent information into the filehandle, and then handles
> representing hard links to the same file from different directories will
> differ.

BTW, how does (or how should) the NFS client deal with cache coherency if
filehandles for the same file differ?

Mikulas
Re: Finding hardlinks
On Thu, 28 Dec 2006, Arjan van de Ven wrote:

> > It seems like the posix idea of a unique <st_dev, st_ino> doesn't hold
> > water for modern file systems
>
> are you really sure? and if so, why don't we fix *THAT* instead, rather
> than adding racy syscalls and such that just can't really be used
> right...

Why don't you rip out the support for colliding inode numbers from the
kernel entirely (i.e. remove iget5_locked)?

It's reasonable to have either no support for colliding ino_t or full
support for it (including syscalls that userspace can use to work with such
a filesystem) --- but I don't see any point in having the half-way support
in the kernel as it is right now.

As for syscall races --- if you pack something with tar and the directory
changes underneath, you can't expect sane output anyway.

Mikulas
Re: Finding hardlinks
> > If the user (or script) doesn't specify that flag, it doesn't help. I
> > think the best solution for these filesystems would be either to add a
> > new syscall
> >
> > int is_hardlink(char *filename1, char *filename2)
> >
> > (but I know adding syscall bloat may be objectionable)
>
> it's also the wrong api; the filenames may have been changed under you
> just as you return from this call, so it really is a
> "was_hardlink_at_some_point()" as you specify it. If you make it work on
> fd's.. it has a chance at least.

Yes, but it doesn't matter --- if the tree changes under the "cp -a"
command, no one guarantees you what you get.

	int fis_hardlink(int handle1, int handle2);

is another possibility, but it can't detect hardlinked symlinks.

Mikulas
Re: Finding hardlinks
On Thu, 21 Dec 2006, Jan Harkes wrote:

> On Wed, Dec 20, 2006 at 12:44:42PM +0100, Miklos Szeredi wrote:
> > The stat64.st_ino field is 64bit, so AFAICS you'd only need to extend
> > the kstat.ino field to 64bit and fix those filesystems to fill in kstat
> > correctly.
>
> Coda actually uses 128-bit file identifiers internally, so 64 bits really
> doesn't cut it. Since the 128-bit space is used pretty sparsely there is
> a hash which avoids most collisions in the 32-bit i_ino space, but not
> completely. I can also imagine that at some point someone wants to
> implement a git-based filesystem where it would be more natural to use
> 160-bit SHA1 hashes as unique object identifiers.
>
> But Coda only allows hardlinks within a single directory, and if someone
> renames a hardlinked file and one of the names ends up in a different
> directory, we implicitly create a copy of the object. This actually
> leverages off of the way we handle volume snapshots and the fact that we
> use whole-file caching and writes, so we only copy the metadata while the
> data is 'copy-on-write'.

The problem is that if an inode number collision happens occasionally, you
get data corruption with the cp -a command --- it will just copy one file
and hardlink the other.

> Any application that tries to be smart enough to keep track of which
> files are hardlinked should (in my opinion) also have a way to disable
> this behaviour.

If the user (or script) doesn't specify that flag, it doesn't help. I think
the best solution for these filesystems would be either to add a new syscall

	int is_hardlink(char *filename1, char *filename2)

(but I know adding syscall bloat may be objectionable) or to add a new field
in statvfs, ST_HAS_BROKEN_INO_T, that applications can test and then
disable hardlink processing.

Mikulas
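A sketch of how a copier might use the proposed call; is_hardlink() is the
hypothetical syscall suggested above, it does not exist in any kernel, and
the wrapper below is purely illustrative.

#include <sys/stat.h>

/* Hypothetical interface from the proposal above --- not a real syscall.
 * Assumed semantics: 1 if the two names refer to the same object, 0 if
 * not, negative on error (e.g. not supported by the filesystem). */
extern int is_hardlink(char *filename1, char *filename2);

/* A copier could consult it only when st_nlink > 1, i.e. when the file
 * can possibly be a hardlink of something it has already copied. */
static int should_link_instead_of_copy(char *already_copied, char *candidate)
{
	struct stat st;

	if (lstat(candidate, &st) < 0 || st.st_nlink <= 1)
		return 0;
	return is_hardlink(already_copied, candidate) == 1;
}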
Re: Finding hardlinks
On Wed, 20 Dec 2006, Al Viro wrote:

> On Wed, Dec 20, 2006 at 05:50:11PM +0100, Miklos Szeredi wrote:
> > I don't see any problems with changing struct kstat. There would be
> > reservations against changing inode.i_ino though.
> >
> > So filesystems that have 64bit inodes will need a specialized getattr()
> > method instead of generic_fillattr().
>
> And they are already free to do so. And no, struct kstat doesn't need to
> be changed - it has u64 ino already.

I see, I should have checked a recent kernel.

Mikulas
Re: Finding hardlinks
> > I've come across this problem: how can a userspace program (such as,
> > for example, "cp -a") tell that two files form a hardlink? Comparing
> > inode numbers will break on filesystems that can have more than 2^32
> > files (NFS3, OCFS, SpadFS; kernel developers already implemented
> > iget5_locked for the case of colliding inode numbers).
> >
> > Other possibilities:
> >
> > --- compare not only ino, but all stat entries and make sure that
> >     i_nlink > 1? --- not 100% reliable either, it only lowers the
> >     failure probability
> >
> > --- create a hardlink and watch if i_nlink is increased on both files?
> >     --- doesn't work on read-only filesystems
> >
> > --- compare file content? --- "cp -a" won't corrupt data then at least,
> >     but it will create hardlinks where they shouldn't be
> >
> > Is there some reliable way the "cp -a" command should determine that?
> > Finding out in the kernel whether two dentries point to the same inode
> > is trivial, but I am not sure how to let userspace know... am I missing
> > something?
>
> The stat64.st_ino field is 64bit, so AFAICS you'd only need to extend the
> kstat.ino field to 64bit and fix those filesystems to fill in kstat
> correctly.

There is a 32-bit __st_ino and a 64-bit st_ino --- what is their purpose?
Some old compatibility code?

> SUSv3 requires st_ino/st_dev to be unique within a system so the
> application shouldn't need to bend over backwards.

I see, but the kernel needs to be fixed for that. Would patches for
changing kstat be accepted?

Mikulas
Finding hardlinks
Hi

I've come across this problem: how can a userspace program (such as, for
example, "cp -a") tell that two files form a hardlink? Comparing inode
numbers will break on filesystems that can have more than 2^32 files (NFS3,
OCFS, SpadFS; kernel developers already implemented iget5_locked for the
case of colliding inode numbers).

Other possibilities:

--- compare not only ino, but all stat entries and make sure that
    i_nlink > 1? --- not 100% reliable either, it only lowers the failure
    probability (see the sketch below)

--- create a hardlink and watch if i_nlink is increased on both files? ---
    doesn't work on read-only filesystems

--- compare file content? --- "cp -a" won't corrupt data then at least, but
    it will create hardlinks where they shouldn't be

Is there some reliable way the "cp -a" command should determine that?
Finding out in the kernel whether two dentries point to the same inode is
trivial, but I am not sure how to let userspace know... am I missing
something?

Mikulas
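A sketch of the first fallback listed above (compare all stat fields and
require i_nlink > 1); as the post notes, this only lowers the probability of
a wrong answer --- two distinct files can still match on every field.

#include <sys/stat.h>

/* Heuristic only: a filesystem with colliding inode numbers can make two
 * different files agree in every field below, and they would then be
 * treated as hardlinks of each other. */
static int probably_same_file(const struct stat *a, const struct stat *b)
{
	return a->st_nlink > 1 && b->st_nlink > 1 &&
	       a->st_dev   == b->st_dev   &&
	       a->st_ino   == b->st_ino   &&
	       a->st_mode  == b->st_mode  &&
	       a->st_uid   == b->st_uid   &&
	       a->st_gid   == b->st_gid   &&
	       a->st_size  == b->st_size  &&
	       a->st_mtime == b->st_mtime &&
	       a->st_ctime == b->st_ctime;
}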
Re: [PATCH -mm] hpfs: fix printk format warnings
> From: Randy Dunlap <[EMAIL PROTECTED]>
>
> Fix hpfs printk warnings:
> (why do I only see these in -mm?)

Probably because -mm has an unsigned long inode number and Linus' kernel has
just an unsigned int?

Change it this way:

	hpfs_error(inode->i_sb, "not a directory, fnode %08lx",
		(unsigned long)inode->i_ino);

--- so that it can work on both.

Mikulas

> fs/hpfs/dir.c:87: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'long unsigned int'
> fs/hpfs/dir.c:147: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'long int'
> fs/hpfs/dir.c:148: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'long int'
> fs/hpfs/dnode.c:537: warning: format '%08x' expects type 'unsigned int', but argument 5 has type 'long unsigned int'
> fs/hpfs/dnode.c:854: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'loff_t'
> fs/hpfs/ea.c:247: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'long unsigned int'
> fs/hpfs/inode.c:254: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'long unsigned int'
> fs/hpfs/map.c:129: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'ino_t'
> fs/hpfs/map.c:135: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'ino_t'
> fs/hpfs/map.c:140: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'ino_t'
> fs/hpfs/map.c:147: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'ino_t'
> fs/hpfs/map.c:154: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'ino_t'
>
> Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
> ---
>  fs/hpfs/dir.c   | 10 +++---
>  fs/hpfs/dnode.c | 13 +
>  fs/hpfs/ea.c    |  2 +-
>  fs/hpfs/inode.c |  5 -
>  fs/hpfs/map.c   | 20 ++--
>  5 files changed, 35 insertions(+), 15 deletions(-)
>
> --- linux-2619-rc5mm2.orig/fs/hpfs/dir.c
> +++ linux-2619-rc5mm2/fs/hpfs/dir.c
> @@ -84,7 +84,8 @@ static int hpfs_readdir(struct file *fil
>  	}
>  	if (!fno->dirflag) {
>  		e = 1;
> -		hpfs_error(inode->i_sb, "not a directory, fnode %08x",inode->i_ino);
> +		hpfs_error(inode->i_sb, "not a directory, fnode %08lx",
> +			inode->i_ino);
>  	}
>  	if (hpfs_inode->i_dno != fno->u.external[0].disk_secno) {
>  		e = 1;
> @@ -144,8 +145,11 @@ static int hpfs_readdir(struct file *fil
>  	}
>  	if (de->first || de->last) {
>  		if (hpfs_sb(inode->i_sb)->sb_chk) {
> -			if (de->first && !de->last && (de->namelen != 2 || de ->name[0] != 1 || de->name[1] != 1)) hpfs_error(inode->i_sb, "hpfs_readdir: bad ^A^A entry; pos = %08x", old_pos);
> -			if (de->last && (de->namelen != 1 || de ->name[0] != 255)) hpfs_error(inode->i_sb, "hpfs_readdir: bad \\377 entry; pos = %08x", old_pos);
> +			if (de->first && !de->last && (de->namelen != 2
> +			    || de ->name[0] != 1 || de->name[1] != 1))
> +				hpfs_error(inode->i_sb, "hpfs_readdir: bad ^A^A entry; pos = %08lx", old_pos);
> +			if (de->last && (de->namelen != 1 || de ->name[0] != 255))
> +				hpfs_error(inode->i_sb, "hpfs_readdir: bad \\377 entry; pos = %08lx", old_pos);
>  		}
>  		hpfs_brelse4(&qbh);
>  		goto again;
> --- linux-2619-rc5mm2.orig/fs/hpfs/dnode.c
> +++ linux-2619-rc5mm2/fs/hpfs/dnode.c
> @@ -533,10 +533,13 @@ static void delete_empty_dnode(struct in
>  	struct buffer_head *bh;
>  	struct dnode *d1;
>  	struct quad_buffer_head qbh1;
> -	if (hpfs_sb(i->i_sb)->sb_chk) if (up != i->i_ino) {
> -		hpfs_error(i->i_sb, "bad pointer to fnode, dnode %08x, pointing to %08x, should be %08x", dno, up, i->i_ino);
> +	if (hpfs_sb(i->i_sb)->sb_chk)
> +		if (up != i->i_ino) {
> +			hpfs_error(i->i_sb,
> +				"bad pointer to fnode, dnode %08x, pointing to %08x, should be %08lx",
> +				dno, up, i->i_ino);
>  		return;
> -	}
> +		}
>  	if ((d1 = hpfs_map_dnode(i->i_sb, down, &qbh1))) {
>  		d1->up = up;
>  		d1->root_dnode = 1;
> @@ -851,7 +854,9 @@ struct hpfs_dirent *map_pos_dirent(struc
>  	/* Going to the next dirent */
>  	if ((d = de_next_de(de)) < dnode_end_de(dnode)) {
>  		if (!(++*posp & 077)) {
> -			hpfs_error(