Re: Finding hardlinks
On Fri, 29 Dec 2006, Trond Myklebust wrote: On Thu, 2006-12-28 at 19:14 +0100, Mikulas Patocka wrote: Why don't you rip off the support for colliding inode number from the kernel at all (i.e. remove iget5_locked)? It's reasonable to have either no support for colliding ino_t or full support for that (including syscalls that userspace can use to work with such filesystem) --- but I don't see any point in having half-way support in kernel as is right now. What would ino_t have to do with inode numbers? It is only used as a hash table lookup. The inode number is set in the ->getattr() callback. The question is: why does the kernel contain iget5 function that looks up according to callback, if the filesystem cannot have more than 64-bit inode identifier? This lookup callback just induces writing bad filesystems with coliding inode numbers. Either remove coda, smb (and possibly other) filesystems from the kernel or make a proper support for userspace for them. The situation is that current coreutils 6.7 fail to recursively copy directories if some two directories in the tree have coliding inode number, so you get random data corruption with these filesystems. Mikulas - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Finding hardlinks
> Whoops; wrong example. It doesn't matter, though, since clearly there > exist correct examples: where Statement 1 is true and Statement 2 is > false, and in that case when the inode numbers are different, the links > are "almost certainly" to different files. but.. there's no links in that case! -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Finding hardlinks
>On Fri, 2006-12-29 at 10:08 -0800, Bryan Henderson wrote: >> >On Thu, 2006-12-28 at 16:44 -0800, Bryan Henderson wrote: >> >> >Statement 1: >> >> >If two files have identical st_dev and st_ino, they MUST be hardlinks >> of >> >> >each other/the same file. >> >> > >> >> >Statement 2: >> >> >If two "files" are a hardlink of each other, they MUST be detectable >> >> >(for example by having the same st_dev/st_ino) >> >> > >> >> >I personally consider statement 1 a mandatory requirement in terms of >> >> >quality of implementation if not Posix compliance. >> >> > >> >> >Statement 2 for me is "nice but optional" >> >> >> >> Statement 1 without Statement 2 provides one of those facilities where >> the > >> There are various "these AREs" here, but the "almost certainly" I'm >> talking about is where Statement 1 is true and Statement 2 is false and >> the inode numbers you read through two links are different. (For example, >> consider a filesystem in which the reported inode number is the internal >> inode number truncated to 32 bits). The links are almost certainly to >> different files. >> > >but then statement 1 is false and violated. Whoops; wrong example. It doesn't matter, though, since clearly there exist correct examples: where Statement 1 is true and Statement 2 is false, and in that case when the inode numbers are different, the links are "almost certainly" to different files. -- Bryan Henderson IBM Almaden Research Center San Jose CA Filesystems - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] have pipefs ensure i_ino uniqueness by calling iunique and hashing the inode
This converts pipefs to use the new scheme. Here we're calling iunique to get a unique i_ino value for the new inode, and then hashing it afterward. We call iunique with a max_reserved value of 1 to avoid collision with the root inode. Since the inode is now hashed, we need to take care that we end up in generic_delete_inode rather than generic_forget_inode or we'll create a nasty leak, so we clear_nlink when we destroy the pipe info. I'm not certain that this is the right place to add the clear_nlink, though it does seem to work. I'm open to suggestions on a better place to put this, or of a better way to make sure that we end up with i_nlink == 0 at iput time. Signed-off-by: Jeff Layton <[EMAIL PROTECTED]> diff --git a/fs/pipe.c b/fs/pipe.c index 68090e8..1d44ff0 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -825,6 +825,7 @@ void free_pipe_info(struct inode *inode) { __free_pipe_info(inode->i_pipe); inode->i_pipe = NULL; + clear_nlink(inode); } static struct vfsmount *pipe_mnt __read_mostly; @@ -871,6 +872,8 @@ static struct inode * get_pipe_inode(void) inode->i_uid = current->fsuid; inode->i_gid = current->fsgid; inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME; + inode->i_ino = iunique(pipe_mnt->mnt_sb, 1); + insert_inode_hash(inode); return inode; - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] change libfs sb creation routines to avoid collisions with their root inodes
This changes the superblock creation routines that call new_inode to take steps to avoid later collisions with other inodes that get created. I took the approach here of not hashing things unless is was strictly necessary, though that does mean that filesystem authors need to be careful to avoid collisions by calling iunique properly. Signed-off-by: Jeff Layton <[EMAIL PROTECTED]> diff --git a/fs/libfs.c b/fs/libfs.c index 503898d..5bdaf00 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -217,6 +217,12 @@ int get_sb_pseudo(struct file_system_type *fs_type, char *name, root = new_inode(s); if (!root) goto Enomem; + /* +* since this is the first inode, make it number 1. New inodes created + * after this must take care not to collide with it (by passing +* max_reserved of 1 to iunique). +*/ + root->i_ino = 1; root->i_mode = S_IFDIR | S_IRUSR | S_IWUSR; root->i_uid = root->i_gid = 0; root->i_atime = root->i_mtime = root->i_ctime = CURRENT_TIME; @@ -373,6 +379,9 @@ int simple_fill_super(struct super_block *s, int magic, struct tree_descr *files inode = new_inode(s); if (!inode) return -ENOMEM; + /* set to high value to try and avoid collisions with loop below */ + inode->i_ino = 0x; + insert_inode_hash(inode); inode->i_mode = S_IFDIR | 0755; inode->i_uid = inode->i_gid = 0; inode->i_blocks = 0; @@ -399,6 +408,11 @@ int simple_fill_super(struct super_block *s, int magic, struct tree_descr *files inode->i_blocks = 0; inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME; inode->i_fop = files->ops; + /* +* no need to hash these, but you need to make sure that any +* calls to iunique on this mount call it with a max_reserved +* value high enough to avoid collisions with these inodes. +*/ inode->i_ino = i; d_add(dentry, inode); } - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] make static counters in new_inode and iunique be 32 bits
When a 32-bit program that was not compiled with large file offsets does a stat and gets a st_ino value back that won't fit in the 32 bit field, glibc (correctly) generates an EOVERFLOW error. We can't do anything about fs's with larger permanent inode numbers, but when we generate them on the fly, we ought to try and have them fit within a 32 bit field. This patch takes the first step toward this by making the static counters in these two functions be 32 bits. Signed-off-by: Jeff Layton <[EMAIL PROTECTED]> diff --git a/fs/inode.c b/fs/inode.c index bf21dc6..23fc1fd 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -524,7 +524,8 @@ repeat: */ struct inode *new_inode(struct super_block *sb) { - static unsigned long last_ino; + /* 32 bits for compatability mode stat calls */ + static unsigned int last_ino; struct inode * inode; spin_lock_prefetch(&inode_lock); @@ -683,7 +684,8 @@ static unsigned long hash(struct super_block *sb, unsigned long hashval) */ ino_t iunique(struct super_block *sb, ino_t max_reserved) { - static ino_t counter; + /* 32 bits for compatability mode stat calls */ + static unsigned int counter; struct inode *inode; struct hlist_head * head; ino_t res; - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] i_ino uniqueness: alternate approach -- hash the inodes
Since Joern mentioned that he thought that hashing the inodes might be simpler and not have a drastic performance impact, I took the liberty of whipping up some patches that use that approach. They follow in the next set of emails. To reiterate, the problems are: 1) on filesystems w/o permanent inode numbers, i_ino values can be larger than 32 bits, which can cause problems for some 32 bit userspace programs on a 64 bit kernel. We can't do anything for filesystems that have actual >32-bit inode numbers, but on filesystems that generate i_ino values on the fly, we should try to have them fit in 32 bits. We could trivially fix this by making the static counters in new_inode and iunique 32 bits, but... 2) many filesystems call new_inode and assume that the i_ino values they are given are unique. They are not guaranteed to be so, since the static counter can wrap. This problem is exacerbated by the fix for #1. 3) after allocating a new inode, some filesystems call iunique to try to get a unique i_ino value, but they don't actually add their inodes to the hashtable, and so they're still not guaranteed to be unique if that counter wraps. This patch set takes the simpler approach of simply using iunique and hashing the inodes afterward. Christoph H. previously mentioned that he thought that this approach may slow down lookups for filesystems that currently hash their inodes. The questions are: 1) how much would this slow down lookups for these filesystems? 2) is it enough to justify adding more infrastructure to avoid it? What might be best is to start with this approach and then only move to using IDR or some other scheme if these extra inodes in the hashtable prove to be problematic. I've done some cursory testing with this patch and the overhead of hashing and unhashing the inodes with pipefs is pretty low -- just a few seconds of system time added on to the creation and destruction of 10 million pipes (very similar to the overhead that the IDR approach would add). The hard thing to measure is what effect this has on other filesystems. I'm open to ways to try and gauge this. Again, I've only converted pipefs as an example. If this approach is acceptable then I'll start work on patches to convert other filesystems. Comments and suggestions welcome... -- Jeff - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Finding hardlinks
On Fri, 2006-12-29 at 10:08 -0800, Bryan Henderson wrote: > >On Thu, 2006-12-28 at 16:44 -0800, Bryan Henderson wrote: > >> >Statement 1: > >> >If two files have identical st_dev and st_ino, they MUST be hardlinks > of > >> >each other/the same file. > >> > > >> >Statement 2: > >> >If two "files" are a hardlink of each other, they MUST be detectable > >> >(for example by having the same st_dev/st_ino) > >> > > >> >I personally consider statement 1 a mandatory requirement in terms of > >> >quality of implementation if not Posix compliance. > >> > > >> >Statement 2 for me is "nice but optional" > >> > >> Statement 1 without Statement 2 provides one of those facilities where > the > There are various "these AREs" here, but the "almost certainly" I'm > talking about is where Statement 1 is true and Statement 2 is false and > the inode numbers you read through two links are different. (For example, > consider a filesystem in which the reported inode number is the internal > inode number truncated to 32 bits). The links are almost certainly to > different files. > but then statement 1 is false and violated. -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Finding hardlinks
>On Thu, 2006-12-28 at 16:44 -0800, Bryan Henderson wrote: >> >Statement 1: >> >If two files have identical st_dev and st_ino, they MUST be hardlinks of >> >each other/the same file. >> > >> >Statement 2: >> >If two "files" are a hardlink of each other, they MUST be detectable >> >(for example by having the same st_dev/st_ino) >> > >> >I personally consider statement 1 a mandatory requirement in terms of >> >quality of implementation if not Posix compliance. >> > >> >Statement 2 for me is "nice but optional" >> >> Statement 1 without Statement 2 provides one of those facilities where the >> computer tells you something is "maybe" or "almost certainly" true. > >No it's not a "almost certainly". It's a "these ARE". There are various "these AREs" here, but the "almost certainly" I'm talking about is where Statement 1 is true and Statement 2 is false and the inode numbers you read through two links are different. (For example, consider a filesystem in which the reported inode number is the internal inode number truncated to 32 bits). The links are almost certainly to different files. -- Bryan Henderson IBM Almaden Research Center San Jose CA Filesystems - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Finding hardlinks
> > Actually no. Statement 2 for me is important in terms of archive > correctness. With my "archiver" program Mksquashfs, if the two files > are the same, and filesystem says they're hardlinks, I make them > hardlinks in the Squashfs filesystem, otherwise they're stored as > duplicates (same data, different inode). Doesn't matter much in > terms of storage overhead, but it does matter if two files become > one, or vice versa. statement 2 was "all files that are hardlinks can be found with ino/dev pairs". How would files become one if accidentally the kernel shows a hardlinked file as 2 separate files in terms of inode nr or device? -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Finding hardlinks
On 29 Dec 2006, at 08:41, Arjan van de Ven wrote: I think "statement 2" is extremely important. Without this guarantee applications have to guess which files are hardlinks. Any guessing is going to be be got wrong sometimes with potentially disastrous results. actually no. Statement 1 will tell them when the kernel knows they are hardlinks. It's the kernels job to make a reasonably quality of implementation so that that works most of the time. Statement 2 requires that "all of the time" which suddenly creates a lot of evil corner cases (like "what if I mount a network filesystem twice and the server doesn't quite tell me enough to figure it out" cases) to make it impractical. Actually no. Statement 2 for me is important in terms of archive correctness. With my "archiver" program Mksquashfs, if the two files are the same, and filesystem says they're hardlinks, I make them hardlinks in the Squashfs filesystem, otherwise they're stored as duplicates (same data, different inode). Doesn't matter much in terms of storage overhead, but it does matter if two files become one, or vice versa. If a filesystem cannot guarantee statement 2 in the "normal" case, I wouldn't use hardlinks in that filesystem, period. Using "evil corner cases" and network filesystems as an objection is somewhat like saying because we can't do it in every case, we shouldn't bother doing it in the "normal" case too. Disk based filesystems should be able to handle statements 1 and 2. No-one expects things to always work correctly in "evil corner cases" or with network filesystems. Phillip Think of it as the difference between good and perfect. (and perfect is the enemy of good :) the kernel will tell you when it knows within reason, via statement 1 technology. It's not perfect, but reasonably will be enough for normal userspace to depend on it. Your case is NOT a case of "I require 100%".. it's a "we'd like to take hardlinks into account" case. -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http:// www.linuxfirmwarekit.org - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Finding hardlinks
On Thu, 2006-12-28 at 19:14 +0100, Mikulas Patocka wrote: > Why don't you rip off the support for colliding inode number from the > kernel at all (i.e. remove iget5_locked)? > > It's reasonable to have either no support for colliding ino_t or full > support for that (including syscalls that userspace can use to work with > such filesystem) --- but I don't see any point in having half-way support > in kernel as is right now. What would ino_t have to do with inode numbers? It is only used as a hash table lookup. The inode number is set in the ->getattr() callback. Cheers Trond - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [nfsv4] RE: Finding hardlinks
On Thu, 2006-12-28 at 15:07 -0500, Halevy, Benny wrote: > Mikulas Patocka wrote: > >BTW. how does (or how should?) NFS client deal with cache coherency if > >filehandles for the same file differ? > > > > Trond can probably answer this better than me... > As I read it, currently the nfs client matches both the fileid and the > filehandle (in nfs_find_actor). This means that different filehandles > for the same file would result in different inodes :(. > Strictly following the nfs protocol, comparing only the fileid should > be enough IF fileids are indeed unique within the filesystem. > Comparing the filehandle works as a workaround when the exported filesystem > (or the nfs server) violates that. From a user stand point I think that > this should be configurable, probably per mount point. Matching files by fileid instead of filehandle is a lot more trouble since fileids may be reused after a file has been deleted. Every time you look up a file, and get a new filehandle for the same fileid, you would at the very least have to do another GETATTR using one of the 'old' filehandles in order to ensure that the file is the same object as the one you have cached. Then there is the issue of what to do when you open(), read() or write() to the file: which filehandle do you use, are the access permissions the same for all filehandles, ... All in all, much pain for little or no gain. Most servers therefore take great pains to ensure that clients can use filehandles to identify inodes. The exceptions tend to be broken in other ways (Note: knfsd without the no_subtree_check option is one of these exceptions - it can break in the case of cross-directory renames). Cheers, Trond - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Finding hardlinks
On Thu, 2006-12-28 at 17:12 +0200, Benny Halevy wrote: > As an example, some file systems encode hint information into the filehandle > and the hints may change over time, another example is encoding parent > information into the filehandle and then handles representing hard links > to the same file from different directories will differ. Both these examples are bogus. Filehandle information should not change over time (except in the special case of NFSv4 "volatile filehandles") and they should definitely not encode parent directory information that can change over time (think rename()!). Cheers Trond - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Finding hardlinks
> I think "statement 2" is extremely important. Without this guarantee > applications have to guess which files are hardlinks. Any guessing > is going to be be got wrong sometimes with potentially disastrous > results. actually no. Statement 1 will tell them when the kernel knows they are hardlinks. It's the kernels job to make a reasonably quality of implementation so that that works most of the time. Statement 2 requires that "all of the time" which suddenly creates a lot of evil corner cases (like "what if I mount a network filesystem twice and the server doesn't quite tell me enough to figure it out" cases) to make it impractical. Think of it as the difference between good and perfect. (and perfect is the enemy of good :) the kernel will tell you when it knows within reason, via statement 1 technology. It's not perfect, but reasonably will be enough for normal userspace to depend on it. Your case is NOT a case of "I require 100%".. it's a "we'd like to take hardlinks into account" case. -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Finding hardlinks
On Thu, 2006-12-28 at 16:44 -0800, Bryan Henderson wrote: > >Statement 1: > >If two files have identical st_dev and st_ino, they MUST be hardlinks of > >each other/the same file. > > > >Statement 2: > >If two "files" are a hardlink of each other, they MUST be detectable > >(for example by having the same st_dev/st_ino) > > > >I personally consider statement 1 a mandatory requirement in terms of > >quality of implementation if not Posix compliance. > > > >Statement 2 for me is "nice but optional" > > Statement 1 without Statement 2 provides one of those facilities where the > computer tells you something is "maybe" or "almost certainly" true. No it's not a "almost certainly". It's a "these ARE". It's not a "these are NOT" Statement 2 is the "these are NOT" statement basically they are entirely separate concepts... (but then again I'm not a CS guy so maybe I just look at it from a different angle) -- if you want to mail me at work (you don't), use arjan (at) linux.intel.com Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html