Re: Finding hardlinks

2006-12-29 Thread Mikulas Patocka



On Fri, 29 Dec 2006, Trond Myklebust wrote:


On Thu, 2006-12-28 at 19:14 +0100, Mikulas Patocka wrote:

Why don't you rip off the support for colliding inode number from the
kernel at all (i.e. remove iget5_locked)?

It's reasonable to have either no support for colliding ino_t or full
support for that (including syscalls that userspace can use to work with
such filesystem) --- but I don't see any point in having half-way support
in kernel as is right now.


What would ino_t have to do with inode numbers? It is only used as a
hash table lookup. The inode number is set in the ->getattr() callback.


The question is: why does the kernel contain iget5 function that looks up 
according to callback, if the filesystem cannot have more than 64-bit 
inode identifier?


This lookup callback just induces writing bad filesystems with coliding 
inode numbers. Either remove coda, smb (and possibly other) filesystems 
from the kernel or make a proper support for userspace for them.


The situation is that current coreutils 6.7 fail to recursively copy 
directories if some two directories in the tree have coliding inode 
number, so you get random data corruption with these filesystems.


Mikulas
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Finding hardlinks

2006-12-29 Thread Arjan van de Ven

> Whoops; wrong example.  It doesn't matter, though, since clearly there 
> exist correct examples: where Statement 1 is true and Statement 2 is 
> false, and in that case when the inode numbers are different, the links 
> are "almost certainly" to different files.

but.. there's no links in that case!

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Finding hardlinks

2006-12-29 Thread Bryan Henderson
>On Fri, 2006-12-29 at 10:08 -0800, Bryan Henderson wrote:
>> >On Thu, 2006-12-28 at 16:44 -0800, Bryan Henderson wrote:
>> >> >Statement 1:
>> >> >If two files have identical st_dev and st_ino, they MUST be 
hardlinks 
>> of
>> >> >each other/the same file.
>> >> >
>> >> >Statement 2:
>> >> >If two "files" are a hardlink of each other, they MUST be 
detectable
>> >> >(for example by having the same st_dev/st_ino)
>> >> >
>> >> >I personally consider statement 1 a mandatory requirement in terms 
of
>> >> >quality of implementation if not Posix compliance.
>> >> >
>> >> >Statement 2 for me is "nice but optional"
>> >> 
>> >> Statement 1 without Statement 2 provides one of those facilities 
where 
>> the 
>
>> There are various "these AREs" here, but the "almost certainly" I'm 
>> talking about is where Statement 1 is true and Statement 2 is false and 

>> the inode numbers you read through two links are different.  (For 
example, 
>> consider a filesystem in which the reported inode number is the 
internal 
>> inode number truncated to 32 bits).  The links are almost certainly to 
>> different files.
>> 
>
>but then statement 1 is false and violated.

Whoops; wrong example.  It doesn't matter, though, since clearly there 
exist correct examples: where Statement 1 is true and Statement 2 is 
false, and in that case when the inode numbers are different, the links 
are "almost certainly" to different files.

--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] have pipefs ensure i_ino uniqueness by calling iunique and hashing the inode

2006-12-29 Thread Jeff Layton
This converts pipefs to use the new scheme. Here we're calling iunique to get
a unique i_ino value for the new inode, and then hashing it afterward. We
call iunique with a max_reserved value of 1 to avoid collision with the root
inode.  Since the inode is now hashed, we need to take care that we end up in
generic_delete_inode rather than generic_forget_inode or we'll create a nasty
leak, so we clear_nlink when we destroy the pipe info.

I'm not certain that this is the right place to add the clear_nlink, though
it does seem to work. I'm open to suggestions on a better place to put
this, or of a better way to make sure that we end up with i_nlink == 0 at
iput time.

Signed-off-by: Jeff Layton <[EMAIL PROTECTED]>

diff --git a/fs/pipe.c b/fs/pipe.c
index 68090e8..1d44ff0 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -825,6 +825,7 @@ void free_pipe_info(struct inode *inode)
 {
__free_pipe_info(inode->i_pipe);
inode->i_pipe = NULL;
+   clear_nlink(inode);
 }
 
 static struct vfsmount *pipe_mnt __read_mostly;
@@ -871,6 +872,8 @@ static struct inode * get_pipe_inode(void)
inode->i_uid = current->fsuid;
inode->i_gid = current->fsgid;
inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+   inode->i_ino = iunique(pipe_mnt->mnt_sb, 1);
+   insert_inode_hash(inode);
 
return inode;
 
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] change libfs sb creation routines to avoid collisions with their root inodes

2006-12-29 Thread Jeff Layton
This changes the superblock creation routines that call new_inode to take steps
to avoid later collisions with other inodes that get created. I took the
approach here of not hashing things unless is was strictly necessary, though
that does mean that filesystem authors need to be careful to avoid collisions
by calling iunique properly.

Signed-off-by: Jeff Layton <[EMAIL PROTECTED]>

diff --git a/fs/libfs.c b/fs/libfs.c
index 503898d..5bdaf00 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -217,6 +217,12 @@ int get_sb_pseudo(struct file_system_type *fs_type, char 
*name,
root = new_inode(s);
if (!root)
goto Enomem;
+   /*
+* since this is the first inode, make it number 1. New inodes created
+ * after this must take care not to collide with it (by passing
+* max_reserved of 1 to iunique).
+*/
+   root->i_ino = 1;
root->i_mode = S_IFDIR | S_IRUSR | S_IWUSR;
root->i_uid = root->i_gid = 0;
root->i_atime = root->i_mtime = root->i_ctime = CURRENT_TIME;
@@ -373,6 +379,9 @@ int simple_fill_super(struct super_block *s, int magic, 
struct tree_descr *files
inode = new_inode(s);
if (!inode)
return -ENOMEM;
+   /* set to high value to try and avoid collisions with loop below */
+   inode->i_ino = 0x;
+   insert_inode_hash(inode);
inode->i_mode = S_IFDIR | 0755;
inode->i_uid = inode->i_gid = 0;
inode->i_blocks = 0;
@@ -399,6 +408,11 @@ int simple_fill_super(struct super_block *s, int magic, 
struct tree_descr *files
inode->i_blocks = 0;
inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
inode->i_fop = files->ops;
+   /*
+* no need to hash these, but you need to make sure that any
+* calls to iunique on this mount call it with a max_reserved
+* value high enough to avoid collisions with these inodes.
+*/
inode->i_ino = i;
d_add(dentry, inode);
}
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] make static counters in new_inode and iunique be 32 bits

2006-12-29 Thread Jeff Layton
When a 32-bit program that was not compiled with large file offsets does a
stat and gets a st_ino value back that won't fit in the 32 bit field, glibc
(correctly) generates an EOVERFLOW error. We can't do anything about fs's
with larger permanent inode numbers, but when we generate them on the fly,
we ought to try and have them fit within a 32 bit field.

This patch takes the first step toward this by making the static counters in
these two functions be 32 bits.

Signed-off-by: Jeff Layton <[EMAIL PROTECTED]>

diff --git a/fs/inode.c b/fs/inode.c
index bf21dc6..23fc1fd 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -524,7 +524,8 @@ repeat:
  */
 struct inode *new_inode(struct super_block *sb)
 {
-   static unsigned long last_ino;
+   /* 32 bits for compatability mode stat calls */
+   static unsigned int last_ino;
struct inode * inode;
 
spin_lock_prefetch(&inode_lock);
@@ -683,7 +684,8 @@ static unsigned long hash(struct super_block *sb, unsigned 
long hashval)
  */
 ino_t iunique(struct super_block *sb, ino_t max_reserved)
 {
-   static ino_t counter;
+   /* 32 bits for compatability mode stat calls */
+   static unsigned int counter;
struct inode *inode;
struct hlist_head * head;
ino_t res;
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] i_ino uniqueness: alternate approach -- hash the inodes

2006-12-29 Thread Jeff Layton
Since Joern mentioned that he thought that hashing the inodes might be simpler
and not have a drastic performance impact, I took the liberty of whipping up
some patches that use that approach. They follow in the next set of emails.

To reiterate, the problems are:

1) on filesystems w/o permanent inode numbers, i_ino values can be
larger than 32 bits, which can cause problems for some 32 bit userspace
programs on a 64 bit kernel. We can't do anything for filesystems that have
actual >32-bit inode numbers, but on filesystems that generate i_ino
values on the fly, we should try to have them fit in 32 bits. We could
trivially fix this by making the static counters in new_inode and iunique
32 bits, but...

2) many filesystems call new_inode and assume that the i_ino values they
are given are unique. They are not guaranteed to be so, since the static
counter can wrap. This problem is exacerbated by the fix for #1.

3) after allocating a new inode, some filesystems call iunique to try to
get a unique i_ino value, but they don't actually add their inodes to
the hashtable, and so they're still not guaranteed to be unique if that
counter wraps.

This patch set takes the simpler approach of simply using iunique and
hashing the inodes afterward. Christoph H. previously mentioned that he
thought that this approach may slow down lookups for filesystems that
currently hash their inodes.

The questions are:

1) how much would this slow down lookups for these filesystems?
2) is it enough to justify adding more infrastructure to avoid it?

What might be best is to start with this approach and then only move to using
IDR or some other scheme if these extra inodes in the hashtable prove to be
problematic.

I've done some cursory testing with this patch and the overhead of hashing
and unhashing the inodes with pipefs is pretty low -- just a few seconds of
system time added on to the creation and destruction of 10 million pipes (very
similar to the overhead that the IDR approach would add).

The hard thing to measure is what effect this has on other filesystems. I'm
open to ways to try and gauge this.

Again, I've only converted pipefs as an example. If this approach is
acceptable then I'll start work on patches to convert other filesystems.

Comments and suggestions welcome...

-- Jeff

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Finding hardlinks

2006-12-29 Thread Arjan van de Ven
On Fri, 2006-12-29 at 10:08 -0800, Bryan Henderson wrote:
> >On Thu, 2006-12-28 at 16:44 -0800, Bryan Henderson wrote:
> >> >Statement 1:
> >> >If two files have identical st_dev and st_ino, they MUST be hardlinks 
> of
> >> >each other/the same file.
> >> >
> >> >Statement 2:
> >> >If two "files" are a hardlink of each other, they MUST be detectable
> >> >(for example by having the same st_dev/st_ino)
> >> >
> >> >I personally consider statement 1 a mandatory requirement in terms of
> >> >quality of implementation if not Posix compliance.
> >> >
> >> >Statement 2 for me is "nice but optional"
> >> 
> >> Statement 1 without Statement 2 provides one of those facilities where 
> the 


> There are various "these AREs" here, but the "almost certainly" I'm 
> talking about is where Statement 1 is true and Statement 2 is false and 
> the inode numbers you read through two links are different.  (For example, 
> consider a filesystem in which the reported inode number is the internal 
> inode number truncated to 32 bits).  The links are almost certainly to 
> different files.
> 

but then statement 1 is false and violated.


-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Finding hardlinks

2006-12-29 Thread Bryan Henderson
>On Thu, 2006-12-28 at 16:44 -0800, Bryan Henderson wrote:
>> >Statement 1:
>> >If two files have identical st_dev and st_ino, they MUST be hardlinks 
of
>> >each other/the same file.
>> >
>> >Statement 2:
>> >If two "files" are a hardlink of each other, they MUST be detectable
>> >(for example by having the same st_dev/st_ino)
>> >
>> >I personally consider statement 1 a mandatory requirement in terms of
>> >quality of implementation if not Posix compliance.
>> >
>> >Statement 2 for me is "nice but optional"
>> 
>> Statement 1 without Statement 2 provides one of those facilities where 
the 
>> computer tells you something is "maybe" or "almost certainly" true.
>
>No it's not a "almost certainly". It's a "these ARE".

There are various "these AREs" here, but the "almost certainly" I'm 
talking about is where Statement 1 is true and Statement 2 is false and 
the inode numbers you read through two links are different.  (For example, 
consider a filesystem in which the reported inode number is the internal 
inode number truncated to 32 bits).  The links are almost certainly to 
different files.

--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Finding hardlinks

2006-12-29 Thread Arjan van de Ven

> 
> Actually no.  Statement  2 for me is important in terms of archive  
> correctness.  With my "archiver" program Mksquashfs, if the two files  
> are the same, and filesystem says they're hardlinks, I make them  
> hardlinks in the Squashfs filesystem, otherwise they're stored as  
> duplicates (same data, different inode).  Doesn't matter much in  
> terms of storage overhead, but it does matter if two files become  
> one, or vice versa.

statement 2 was "all files that are hardlinks can be found with ino/dev
pairs". How would files become one if accidentally the kernel shows a
hardlinked file as 2 separate files in terms of inode nr or device?

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Finding hardlinks

2006-12-29 Thread Phillip Lougher


On 29 Dec 2006, at 08:41, Arjan van de Ven wrote:




I think "statement 2" is extremely important.  Without this guarantee
applications have to guess which files are hardlinks.  Any guessing
is going to be be got wrong sometimes with potentially disastrous
results.


actually no. Statement 1 will tell them when the kernel knows they are
hardlinks. It's the kernels job to make a reasonably quality of
implementation so that that works most of the time.

Statement 2 requires that "all of the time" which suddenly creates  
a lot

of evil corner cases (like "what if I mount a network filesystem twice
and the server doesn't quite tell me enough to figure it out"  
cases) to

make it impractical.



Actually no.  Statement  2 for me is important in terms of archive  
correctness.  With my "archiver" program Mksquashfs, if the two files  
are the same, and filesystem says they're hardlinks, I make them  
hardlinks in the Squashfs filesystem, otherwise they're stored as  
duplicates (same data, different inode).  Doesn't matter much in  
terms of storage overhead, but it does matter if two files become  
one, or vice versa.


If a filesystem cannot guarantee statement 2 in the "normal" case, I  
wouldn't use hardlinks in that filesystem, period.   Using "evil  
corner cases" and network filesystems as an objection is somewhat  
like saying because we can't do it in every case, we shouldn't bother  
doing it in the "normal" case too.  Disk based filesystems should be  
able to handle statements 1 and 2.  No-one expects things to always  
work correctly in "evil corner cases" or with network filesystems.


Phillip


Think of it as the difference between good and perfect.
(and perfect is the enemy of good :)

the kernel will tell you when it knows within reason, via statement 1
technology. It's not perfect, but reasonably will be enough for normal
userspace to depend on it. Your case is NOT a case of "I require  
100%"..

it's a "we'd like to take hardlinks into account" case.


--
if you want to mail me at work (you don't), use arjan (at)  
linux.intel.com
Test the interaction between Linux and your BIOS via http:// 
www.linuxfirmwarekit.org




-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Finding hardlinks

2006-12-29 Thread Trond Myklebust
On Thu, 2006-12-28 at 19:14 +0100, Mikulas Patocka wrote:
> Why don't you rip off the support for colliding inode number from the 
> kernel at all (i.e. remove iget5_locked)?
> 
> It's reasonable to have either no support for colliding ino_t or full 
> support for that (including syscalls that userspace can use to work with 
> such filesystem) --- but I don't see any point in having half-way support 
> in kernel as is right now.

What would ino_t have to do with inode numbers? It is only used as a
hash table lookup. The inode number is set in the ->getattr() callback.

Cheers
  Trond

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [nfsv4] RE: Finding hardlinks

2006-12-29 Thread Trond Myklebust
On Thu, 2006-12-28 at 15:07 -0500, Halevy, Benny wrote:
> Mikulas Patocka wrote:

> >BTW. how does (or how should?) NFS client deal with cache coherency if 
> >filehandles for the same file differ?
> >
> 
> Trond can probably answer this better than me...
> As I read it, currently the nfs client matches both the fileid and the
> filehandle (in nfs_find_actor). This means that different filehandles
> for the same file would result in different inodes :(.
> Strictly following the nfs protocol, comparing only the fileid should
> be enough IF fileids are indeed unique within the filesystem.
> Comparing the filehandle works as a workaround when the exported filesystem
> (or the nfs server) violates that.  From a user stand point I think that
> this should be configurable, probably per mount point.

Matching files by fileid instead of filehandle is a lot more trouble
since fileids may be reused after a file has been deleted. Every time
you look up a file, and get a new filehandle for the same fileid, you
would at the very least have to do another GETATTR using one of the
'old' filehandles in order to ensure that the file is the same object as
the one you have cached. Then there is the issue of what to do when you
open(), read() or write() to the file: which filehandle do you use, are
the access permissions the same for all filehandles, ...

All in all, much pain for little or no gain.

Most servers therefore take great pains to ensure that clients can use
filehandles to identify inodes. The exceptions tend to be broken in
other ways (Note: knfsd without the no_subtree_check option is one of
these exceptions - it can break in the case of cross-directory renames).

Cheers,
  Trond

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Finding hardlinks

2006-12-29 Thread Trond Myklebust
On Thu, 2006-12-28 at 17:12 +0200, Benny Halevy wrote:

> As an example, some file systems encode hint information into the filehandle
> and the hints may change over time, another example is encoding parent
> information into the filehandle and then handles representing hard links
> to the same file from different directories will differ.

Both these examples are bogus. Filehandle information should not change
over time (except in the special case of NFSv4 "volatile filehandles")
and they should definitely not encode parent directory information that
can change over time (think rename()!).

Cheers
  Trond

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Finding hardlinks

2006-12-29 Thread Arjan van de Ven

> I think "statement 2" is extremely important.  Without this guarantee  
> applications have to guess which files are hardlinks.  Any guessing  
> is going to be be got wrong sometimes with potentially disastrous  
> results.

actually no. Statement 1 will tell them when the kernel knows they are
hardlinks. It's the kernels job to make a reasonably quality of
implementation so that that works most of the time.

Statement 2 requires that "all of the time" which suddenly creates a lot
of evil corner cases (like "what if I mount a network filesystem twice
and the server doesn't quite tell me enough to figure it out" cases) to
make it impractical.

Think of it as the difference between good and perfect.
(and perfect is the enemy of good :)

the kernel will tell you when it knows within reason, via statement 1
technology. It's not perfect, but reasonably will be enough for normal
userspace to depend on it. Your case is NOT a case of "I require 100%"..
it's a "we'd like to take hardlinks into account" case.


-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Finding hardlinks

2006-12-29 Thread Arjan van de Ven
On Thu, 2006-12-28 at 16:44 -0800, Bryan Henderson wrote:
> >Statement 1:
> >If two files have identical st_dev and st_ino, they MUST be hardlinks of
> >each other/the same file.
> >
> >Statement 2:
> >If two "files" are a hardlink of each other, they MUST be detectable
> >(for example by having the same st_dev/st_ino)
> >
> >I personally consider statement 1 a mandatory requirement in terms of
> >quality of implementation if not Posix compliance.
> >
> >Statement 2 for me is "nice but optional"
> 
> Statement 1 without Statement 2 provides one of those facilities where the 
> computer tells you something is "maybe" or "almost certainly" true.

No it's not a "almost certainly". It's a "these ARE".
It's not a "these are NOT"

Statement 2 is the "these are NOT" statement basically

they are entirely separate concepts... 
(but then again I'm not a CS guy so maybe I just look at it from a
different angle)

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html