Re: [PATCH][RFC] fast file mapping for loop

2008-01-11 Thread Mikulas Patocka
> So I looked at the code - it seems you build a full extent of the blocks
> in the file, filling holes as you go along. I initially did that as well,
> but that is too slow to be usable in real life.
> 
> You also don't support sparse files, falling back to normal fs
> read/write paths. Supporting sparse files properly is a must, people
> generally don't want to prealloc a huge disk backing.

How would you do sparse file support with a passthrough loopback that 
doesn't use the pagecache?

Holes are allocated in the get_block function provided by each filesystem, 
and that function gets a buffer_head that is supposed to be in the pagecache. 
If you want to allocate holes without the pagecache, there's a problem --- a 
new interface to all filesystems would be needed.
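
For reference, this is roughly the interface in question --- the get_block_t 
callback (include/linux/fs.h) that each filesystem supplies for block mapping 
and hole allocation; the buffer_head it is handed normally belongs to a 
pagecache page:

typedef int (get_block_t)(struct inode *inode, sector_t iblock,
                          struct buffer_head *bh_result, int create);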

It might be possible to use the pagecache interface for filling holes and 
the passthrough interface for other requests --- but get_block is allowed to 
move other blocks on the filesystem (and on UFS it really does), so 
calling get_block to fill a hole could move other, unrelated blocks, which 
would result in a desynchronized block map and corruption of both 
filesystems.

Mikulas


Re: Symbolic links vs hard links

2007-01-10 Thread Mikulas Patocka

Other people are of the opinion that the invention of the symbolic link
was a huge mistake.


I guess I haven't heard that one.  What is the argument that we were
better off without symbolic links?


Numerous security bugs in tar (extracting a specially crafted archive with 
symlinks could overwrite arbitrary files) and in coreutils.


For example, to walk a directory path without following symlinks, you must 
break the path into elements and repeatedly use

h = open(element, O_RDONLY | O_NOFOLLOW);
fchdir(h);

--- the latest coreutils do this, but it's obvious that a lot of 
file-manipulation programs don't, making them unsafe to operate on 
users' directories from the root account.
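
A minimal sketch of that loop (my own illustration, not the coreutils 
implementation), opening each component relative to the previous one so that 
a symlink planted anywhere in the path makes the walk fail instead of being 
followed:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Walk "path" component by component, never following a symlink.
 * On success the working directory is the target directory. */
static int chdir_nofollow(const char *path)
{
	char *copy, *element, *saveptr;
	if (path[0] == '/' && chdir("/") != 0)
		return -1;
	if (!(copy = strdup(path)))
		return -1;
	for (element = strtok_r(copy, "/", &saveptr); element != NULL;
	     element = strtok_r(NULL, "/", &saveptr)) {
		int h = open(element, O_RDONLY | O_NOFOLLOW);
		if (h < 0 || fchdir(h) != 0) {	/* symlink, or other error */
			if (h >= 0)
				close(h);
			free(copy);
			return -1;
		}
		close(h);
	}
	free(copy);
	return 0;
}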


Mikulas


Re: Finding hardlinks

2007-01-07 Thread Mikulas Patocka

Currently, large file support is already necessary to handle dvd and
video. It's also useful for images for virtualization. So the failing
stat()
calls should already be a thing of the past with modern distributions.


As long as glibc compiles by default with 32-bit ino_t, the problem exists
and is severe --- programs handling large files, such as coreutils, tar,
mc, mplayer, already compile with 64-bit ino_t and off_t, but the user (or
script) may type something like:

cat >file.c <<EOF
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(void)
{
int h;
struct stat st;
if ((h = creat("foo", 0600)) < 0) perror("creat"), exit(1);
if (fstat(h, &st)) perror("stat"), exit(1);
close(h);
return 0;
}
EOF
gcc file.c; ./a.out

--- and you certainly do not want this to fail (unless you are out of disk
space).

The difference is that with a 32-bit program and 64-bit off_t you get a
deterministic failure on large files, while with a 32-bit program and 64-bit
ino_t you get random failures.


What's (technically) the problem with changing the gcc default?


Technically none (i.e. edit the gcc specs or the glibc includes). But persuading 
all distribution builders to use such a version is impossible. Plus there 
are many binary programs that are unchangeable.



Alternatively we could make the error deterministic in various ways. Start
st_ino numbering from 4G (except for a few special ones maybe such
as root/mounts). Or make old and new programs look differently at the
ELF level or by sys_personality() and/or check against a "ino64" mount
flag/filesystem feature. Lots of possibilities.


I think the best solution would be to drop -EOVERFLOW on st_ino and let 
legacy 32-bit programs live with colliding inodes --- they'll have them anyway.


Mikulas


Re: Finding hardlinks

2007-01-07 Thread Mikulas Patocka

And does it matter? If you rename a file, tar might skip it regardless of
hardlink detection (if readdir races with rename, you can read none of the
file's names, one, or both --- all of these are possible).

If you have "dir1/a" hardlinked to "dir1/b" and while tar runs you delete
both "a" and "b" and create totally new files "dir2/c" linked to "dir2/d",
tar might hardlink both "c" and "d" to "a" and "b".

No one guarantees you a sane result from tar or cp -a while the tree is changing.
I don't see how is_samefile() could make it worse.


There are several cases where changing the tree doesn't affect the
correctness of the tar or cp -a result.  In some of these cases using
samefile() instead of st_ino _will_ result in a corrupted result.


... and those are what? If you create hardlinks while copying, you may 
have files duplicated instead of hardlinked in the backup. If you unlink 
hardlinks, cp will miss hardlinks too and create two copies of the same 
file (it searches the hash only for files with i_nlink > 1). If you rename 
files, the archive will be completely fscked up (either missing or 
duplicate files).



Generally samefile() is _weaker_ than the st_ino interface in
comparing the identity of two files without using massive amounts of
memory.  You're searching for a better solution, not one that is
broken in a different way, aren't you?


What is the relevant case where st_ino/st_dev works and samefile(char 
*path1, char *path2) doesn't?



Miklos


Mikulas


Re: Finding hardlinks

2007-01-05 Thread Mikulas Patocka

Well, sort of.  Samefile without keeping fds open doesn't have any
protection against the tree changing underneath between first
registering a file and later opening it.  The inode number is more


You only need to keep one-file-per-hardlink-group open during final
verification, checking that inode hashing produced reasonable results.


What final verification?  I wasn't just talking about 'tar' but all
cases where st_ino might be used to check the identity of two files at
possibly different points in time.

Time A:remember identity of file X
Time B:check if identity of file Y matches that of file X

With samefile() if you open X at A, and keep it open till B, you can
accumulate large numbers of open files and the application can fail.

If you don't keep an open file, just remember the path, then renaming
X will foil the later identity check.  Changing the file at this path
between A and B can even give you a false positive.  This applies to
'tar' as well as the other uses.


And does it matter? If you rename a file, tar might skip it regardless of 
hardlink detection (if readdir races with rename, you can read none of the 
file's names, one, or both --- all of these are possible).


If you have "dir1/a" hardlinked to "dir1/b" and while tar runs you delete 
both "a" and "b" and create totally new files "dir2/c" linked to "dir2/d", 
tar might hardlink both "c" and "d" to "a" and "b".


No one guarantees you a sane result from tar or cp -a while the tree is changing. 
I don't see how is_samefile() could make it worse.


Mikulas


Miklos




Re: Finding hardlinks

2007-01-03 Thread Mikulas Patocka

On Wed, 3 Jan 2007, Frank van Maarseveen wrote:


On Wed, Jan 03, 2007 at 01:09:41PM -0800, Bryan Henderson wrote:

On any decent filesystem st_ino should uniquely identify an object and
reliably provide hardlink information. The UNIX world has relied upon

this

for decades. A filesystem with st_ino collisions without being hardlinked
(or the other way around) needs a fix.


But for at least the last of those decades, filesystems that could not do
that were not uncommon.  They had to present 32 bit inode numbers and
either allowed more than 4G files or just didn't have the means of
assigning inode numbers with the proper uniqueness to files.  And the sky
did not fall.  I don't have an explanation why,


I think it's mostly high end use and high end users tend to understand
more. But we're going to see more really large filesystems in "normal"
use so..

Currently, large file support is already necessary to handle dvd and
video. It's also useful for images for virtualization. So the failing stat()
calls should already be a thing of the past with modern distributions.


As long as glibc compiles by default with 32-bit ino_t, the problem exists 
and is severe --- programs handling large files, such as coreutils, tar, 
mc, mplayer, already compile with 64-bit ino_t and off_t, but the user (or 
script) may type something like:


cat >file.c <<EOF
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(void)
{
int h;
struct stat st;
if ((h = creat("foo", 0600)) < 0) perror("creat"), exit(1);
if (fstat(h, &st)) perror("stat"), exit(1);
close(h);
return 0;
}
EOF
gcc file.c; ./a.out

--- and you certainly do not want this to fail (unless you are out of disk 
space).


The difference is that with a 32-bit program and 64-bit off_t you get a 
deterministic failure on large files, while with a 32-bit program and 64-bit 
ino_t you get random failures.


Mikulas


Re: Finding hardlinks

2007-01-03 Thread Mikulas Patocka

I didn't hardlink directories, I just patched stat, lstat and fstat to
always return st_ino == 0 --- and I've seen those failures. These failures
are going to happen on non-POSIX filesystems in the real world too, albeit
very rarely.


I don't want to spoil your day but testing with st_ino==0 is a bad choice
because it is a special number. Anyway, one can only find breakage,
not prove that all the other programs handle this correctly so this is
kind of pointless.

On any decent filesystem st_ino should uniquely identify an object and
reliably provide hardlink information. The UNIX world has relied upon this
for decades. A filesystem with st_ino collisions without being hardlinked
(or the other way around) needs a fix.


... and that's the problem --- the UNIX world specified something that
isn't implementable in the real world.


Sure it is. Numerous popular POSIX filesystems do that. There is a lot of
inode number space in 64 bit (of course it is a matter of time for it to
jump to 128 bit and more)


If the filesystem was designed by someone not from the Unix world (FAT, SMB, 
...), then it doesn't. And users still want to access these filesystems.


A 64-bit inode number space is not yet implemented on Linux --- the problem 
is that if you return ino >= 2^32, programs compiled without 
-D_FILE_OFFSET_BITS=64 will fail with stat() returning -EOVERFLOW --- this 
failure is specified in POSIX, but it is not very useful.
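
To make the failure mode concrete, here is a hedged sketch of what a 32-bit 
program built without -D_FILE_OFFSET_BITS=64 sees when such an inode number 
comes back --- the call itself fails, even though the program may never look 
at st_ino:

#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/* Sketch only: how the POSIX-specified overflow failure appears to a
 * legacy 32-bit caller when the real inode number does not fit ino_t. */
static int print_size(const char *path)
{
	struct stat st;
	if (stat(path, &st) != 0) {
		if (errno == EOVERFLOW)
			fprintf(stderr, "%s: value does not fit the 32-bit stat interface\n", path);
		return -1;
	}
	printf("%s: %ld bytes\n", path, (long)st.st_size);
	return 0;
}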


Mikulas


Re: Finding hardlinks

2007-01-03 Thread Mikulas Patocka



On Wed, 3 Jan 2007, Frank van Maarseveen wrote:


On Tue, Jan 02, 2007 at 01:04:06AM +0100, Mikulas Patocka wrote:


I didn't hardlink directories, I just patched stat, lstat and fstat to
always return st_ino == 0 --- and I've seen those failures. These failures
are going to happen on non-POSIX filesystems in the real world too, albeit
very rarely.


I don't want to spoil your day but testing with st_ino==0 is a bad choice
because it is a special number. Anyway, one can only find breakage,
not prove that all the other programs handle this correctly so this is
kind of pointless.

On any decent filesystem st_ino should uniquely identify an object and
reliably provide hardlink information. The UNIX world has relied upon this
for decades. A filesystem with st_ino collisions without being hardlinked
(or the other way around) needs a fix.


... and that's the problem --- the UNIX world specified something that 
isn't implementable in the real world.


You can take a closed box and say "this is POSIX certified" --- but how 
useful would such a box be if you can't access CDs, diskettes and USB 
sticks with it?


Mikulas


Re: Finding hardlinks

2007-01-03 Thread Mikulas Patocka



On Wed, 3 Jan 2007, Miklos Szeredi wrote:


High probability is all you have.  Cosmic radiation hitting your
computer will more likely cause problems than colliding 64bit inode
numbers ;)


Some of us have machines designed to cope with cosmic rays, and would be
unimpressed with a decrease in reliability.


With the suggested samefile() interface you'd get a failure with just
about 100% reliability for any application which needs to compare
more than a few files.  The fact is open files are _very_ expensive,
no wonder they are limited in various ways.

What should 'tar' do when it runs out of open files, while searching
for hardlinks?  Should it just give up?  Then the samefile() interface
would be _less_ reliable than the st_ino one by a significant margin.


You could do samefile() on paths --- and as for races, it doesn't matter 
in this scenario; it is no more racy than stat or lstat.


Mikulas


Miklos




Re: Finding hardlinks

2007-01-02 Thread Mikulas Patocka

On Wed, 3 Jan 2007, Trond Myklebust wrote:


On Sat, 2006-12-30 at 02:04 +0100, Mikulas Patocka wrote:


On Fri, 29 Dec 2006, Trond Myklebust wrote:


On Thu, 2006-12-28 at 19:14 +0100, Mikulas Patocka wrote:

Why don't you rip out the support for colliding inode numbers from the
kernel entirely (i.e. remove iget5_locked)?

It's reasonable to have either no support for colliding ino_t or full
support for it (including syscalls that userspace can use to work with
such filesystems) --- but I don't see any point in the half-way support
that is in the kernel right now.


What would ino_t have to do with inode numbers? It is only used as a
hash table lookup. The inode number is set in the ->getattr() callback.


The question is: why does the kernel contain the iget5 function that looks up
inodes via a callback, if a filesystem cannot have more than a 64-bit
inode identifier?


Huh? The filesystem can have as large a damned identifier as it likes.
NFSv4 uses 128-byte filehandles, for instance.


But then it needs some other syscall to let applications determine 
hardlinks --- which was the initial topic in this thread.



POSIX filesystems are another matter. They can only have 64-bit
identifiers thanks to the requirement that inode numbers be 64-bit
unique and permanently stored, however Linux caters for a whole
truckload of filesystems which will never fit that label: look at all
those users of iunique(), for one...


I see them. The bad thing is that many programmers read POSIX, write 
programs as if the POSIX specification were true, and these programs break 
randomly on non-POSIX filesystems. Each non-POSIX filesystem invents st_ino 
on its own, trying to minimize hash collisions, which makes the failure even 
less probable and harder to find.


The current situation is (for example) that cp does stat(), open(), 
fstat() and compares st_ino/st_dev --- if they mismatch, it reports an error 
and doesn't copy the file --- so if the kernel evicts the inode from its cache 
between stat() and open() and the filesystem uses iunique(), cp will fail.
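
A sketch of the check being described (my reconstruction of the idea, not the 
coreutils source): the (st_dev, st_ino) pair from the initial stat() is 
compared with the pair from fstat() on the opened descriptor, and a mismatch 
is treated as the file having been replaced:

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open "path" and verify it is still the object that stat() saw.  On a
 * filesystem with unstable inode numbers (e.g. iunique() after a cache
 * eviction) this check can fail spuriously. */
static int open_same_file(const char *path)
{
	struct stat before, after;
	int fd;

	if (stat(path, &before) != 0)
		return -1;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &after) != 0 ||
	    before.st_dev != after.st_dev || before.st_ino != after.st_ino) {
		fprintf(stderr, "%s: file changed while being copied\n", path);
		close(fd);
		return -1;
	}
	return fd;
}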


What utilities should the user use on those non-POSIX filesystems, if not 
cp?


Probably some file-handling guidelines should be specified and written to 
Documentation/ as a form of standard that application programmers can use.


Mikulas


Trond




Re: Finding hardlinks

2007-01-02 Thread Mikulas Patocka

Certainly, but tar isn't going to remember all the inode numbers.
Even if you solve the storage requirements (not impossible) it would
have to do (4e9^2)/2=8e18 comparisons, which computers don't have
enough CPU power just yet.


It remembers all inode numbers with nlink > 1, and many other tools
remember all directory inode numbers (see my other post on this
topic).


Don't you mean they are remembering all the inode numbers of the
directories _above_ the one they are currently working on?  I'm quite
sure they aren't remembering all the directories they have processed.


cp -a remembers all directory inodes it has visited, not just the path 
from the root. If you have two directories with the same inode number 
anywhere in the tree, it will skip one of them.


Mikulas


Of course it doesn't compare each number with all the others; it
uses hashing.


Yes, I didn't think of that.


It doesn't matter if there are collisions within the filesystem, as
long as there are no collisions between the set of files an
application is working on at the same time.


--- which, in the case of a backup, is all files.


No, it's usually working with a _single_ file at a time.  It will
remember inode numbers of files with nlink > 1, but it won't remember
all the other inode numbers.

You could have a filesystem with 4billion files, each one having two
links.  Not a likely scenario though.

Miklos




Re: Finding hardlinks

2007-01-02 Thread Mikulas Patocka



On Tue, 2 Jan 2007, Miklos Szeredi wrote:


It seems like the posix idea of unique <st_dev, st_ino> doesn't
hold water for modern file systems


are you really sure?


Well Jan's example was of Coda that uses 128-bit internal file ids.


and if so, why don't we fix *THAT* instead


Hmm, sometimes you can't fix the world, especially if the filesystem
is exported over NFS and has a problem with fitting its file IDs uniquely
into a 64-bit identifier.


Note, it's pretty easy to fit _anything_ into a 64-bit identifier with
the use of a good hash function.  The chance of an accidental
collision is infinitesimally small.  For a set of

 100 files: 0.000000000000027%
   1,000,000 files: 0.0000027%


I do not think we want to play with probability like this. I mean...
imagine 4G files, 1KB each. That's 4TB disk space, not _completely_
unreasonable, and collision probability is going to be ~100% due to
birthday paradox.

You'll still want to back up your 4TB server...


Certainly, but tar isn't going to remember all the inode numbers.
Even if you solve the storage requirements (not impossible) it would
have to do (4e9^2)/2=8e18 comparisons, which computers don't have
enough CPU power just yet.


It remembers all inode numbers with nlink > 1, and many other tools 
remember all directory inode numbers (see my other post on this 
topic). Of course it doesn't compare each number with all the others; it 
uses hashing.



It doesn't matter if there are collisions within the filesystem, as
long as there are no collisions between the set of files an
application is working on at the same time.


--- which, in the case of a backup, is all files.
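
For reference, the back-of-the-envelope birthday estimate behind this 
disagreement, assuming inode numbers behave like independent, uniformly 
distributed 64-bit values (an idealization):

P(at least one collision among n files) ~= 1 - exp(-n^2 / (2 * 2^64))

which is about 3e-8 for a million files, but already about 0.39 for 2^32 
files --- far too likely for a backup tool to rely on.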


Miklos


Mikulas


Re: Finding hardlinks

2007-01-01 Thread Mikulas Patocka

On Mon, 1 Jan 2007, Jan Harkes wrote:


On Mon, Jan 01, 2007 at 11:47:06PM +0100, Mikulas Patocka wrote:

Anyway, cp -a is not the only application that wants to do hardlink
detection.


I tested programs for ino_t collisions (I intentionally injected them) and
found that CP from coreutils 6.7 fails to copy directories but displays
error messages (coreutils 5 works fine). MC and ARJ skip directories with
colliding ino_t and pretend that the operation completed successfully. The FTS
library fails to walk directories, returning the FTS_DC error. Diffutils, find and
grep fail to search directories with colliding inode numbers. Tar seems
tolerant except for incremental backup (which I didn't try). All programs
except diff were tolerant of colliding ino_t on files.


Thanks for testing so many programs, but... did the files/symlinks with
colliding inode number have i_nlink > 1? Or did you also have directories
with colliding inode numbers. It looks like you've introduced hardlinked
directories in your test which are definitely not supported, in fact it
will probably cause not only issues for userspace programs, but also
locking and garbage collection issues in the kernel's dcache.


I tested it only on files without hardlinks (i.e. with i_nlink == 1) --- most 
programs (except diff) are tolerant of collisions; they won't store st_ino 
in memory unless i_nlink > 1.


I didn't hardlink directories, I just patched stat, lstat and fstat to 
always return st_ino == 0 --- and I've seen those failures. These failures 
are going to happen on non-POSIX filesystems in the real world too, albeit 
very rarely.


BTW. POSIX supports (optionally) hardlinked directories but doesn't 
support colliding st_ino --- so programs act according to POSIX --- but 
the problem is that this POSIX requirement no longer represents the 
real-world situation.



I'm surprised you're seeing so many problems. The only find problem that
I am aware of is the one where it assumes that there will be only
i_nlink-2 subdirectories in a given directory, this optimization can be
disabled with -noleaf.


This is not a bug but a feature. If a filesystem doesn't count 
subdirectories, it should set the directory's link count to 1 and find will be OK.
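
A small illustration of the optimization being discussed (an assumption about 
how such a walker behaves, not actual find/fts source): with POSIX link 
counting a directory's link count is 2 plus its number of subdirectories, so 
once that many subdirectories have been seen the remaining entries cannot be 
directories; a link count below 2 means "not counted" and disables the 
shortcut:

#include <stdbool.h>
#include <sys/stat.h>

/* May the walker stop stat()ing the remaining entries of this directory? */
static bool remaining_entries_are_leaves(const struct stat *dir,
                                         unsigned long subdirs_seen)
{
	if (dir->st_nlink < 2)		/* filesystem doesn't count subdirs */
		return false;
	return subdirs_seen >= dir->st_nlink - 2;
}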


The only problems I've encountered with ino_t collisions are archivers 
and other programs that recursively try to copy a tree while preserving 
hardlinks. And in all cases these seem to have no problem with such 
collisions as long as i_nlink == 1.


Yes, but they have big problems with directory ino_t collisions. They 
think that directories are hardlinked and skip processing them.


Mikulas


Jan




Re: Finding hardlinks

2007-01-01 Thread Mikulas Patocka

> BTW. How does ReiserFS find that a given inode number (or object ID in
> ReiserFS terminology) is free before assigning it to new file/directory?

reiserfs v3 has an extent map of free object identifiers in
super-block.


The free inode number space can have at most 2^31 extents --- if inode numbers 
alternate between "allocated" and "free". How do you pack that into the superblock?



reiser4 used 64 bit object identifiers without reuse.


So you are going to hit the same problem as I did with SpadFS --- you 
can't export 64-bit inode numbers to userspace (programs compiled without 
-D_FILE_OFFSET_BITS=64 will then have stat() randomly failing with EOVERFLOW) 
and if you export only a 32-bit number, it will eventually wrap around 
and colliding st_ino values will cause data corruption with many userspace 
programs.


Mikulas


Re: Finding hardlinks

2007-01-01 Thread Mikulas Patocka

> The question is: why does the kernel contain the iget5 function that looks up
> inodes via a callback, if a filesystem cannot have more than a 64-bit
> inode identifier?

Generally speaking, file system might have two different identifiers for
files:

- one that makes it easy to tell whether two files are the same one;

- one that makes it easy to locate file on the storage.

According to POSIX, inode number should always work as identifier of the
first class, but not necessarily as one of the second. For example, in
reiserfs something called "a key" is used to locate on-disk inode, which
in turn, contains inode number. Identifiers of the second class tend to


BTW. How does ReiserFS find that a given inode number (or object ID in 
ReiserFS terminology) is free before assigning it to new file/directory?


Mikulas


live in directory entries, and during lookup we want to consult inode
cache _before_ reading inode from the disk (otherwise cache is mostly
useless), right? This means that some file systems want to index inodes
in a cache by something different than inode number.



Re: Finding hardlinks

2007-01-01 Thread Mikulas Patocka

Hi!


If the user (or a script) doesn't specify that flag, it doesn't help.
I think the best solution for these filesystems would be either to add
a new syscall
int is_hardlink(char *filename1, char *filename2)
(but I know adding syscall bloat may be objectionable)


it's also the wrong api; the filenames may have been changed under you
just as you return from this call, so it really is a
"was_hardlink_at_some_point()" as you specify it.
If you make it work on fd's.. it has a chance at least.


Yes, but it doesn't matter --- if the tree changes under the "cp -a"
command, no one guarantees you what you get.
int fis_hardlink(int handle1, int handle2);
is another possibility, but it can't detect hardlinked symlinks.


Ugh. Is it even legal to hardlink symlinks?


Why shouldn't it be? It seems to work quite fine in Linux.


Anyway, cp -a is not the only application that wants to do hardlink
detection.


I tested programs for ino_t collisions (I intentionally injected them) and 
found that CP from coreutils 6.7 fails to copy directories but displays 
error messages (coreutils 5 works fine). MC and ARJ skip directories with 
colliding ino_t and pretend that the operation completed successfully. The FTS 
library fails to walk directories, returning the FTS_DC error. Diffutils, find and 
grep fail to search directories with colliding inode numbers. Tar seems 
tolerant except for incremental backup (which I didn't try). All programs 
except diff were tolerant of colliding ino_t on files.


ino_t is no longer unique in many filesystems; it seems like quite a serious 
data corruption possibility.


Mikulas


Pavel



Re: Finding hardlinks

2006-12-31 Thread Mikulas Patocka

On Wed, 20 Dec 2006, Al Viro wrote:


On Wed, Dec 20, 2006 at 05:50:11PM +0100, Miklos Szeredi wrote:

I don't see any problems with changing struct kstat.  There would be
reservations against changing inode.i_ino though.

So filesystems that have 64bit inodes will need a specialized
getattr() method instead of generic_fillattr().


And they are already free to do so.  And no, struct kstat doesn't need
to be changed - it has u64 ino already.


If I return 64-bit values as ino_t, 32-bit programs will get EOVERFLOW on 
any stat attempt (even if they are not going to use st_ino in any way) --- I 
know that POSIX specifies this, but the question is whether it is useful.


What is the correct solution? A mount option that can differentiate between 
32-bit colliding inode numbers and 64-bit non-colliding inode numbers? Or 
is there a better idea?


Given the fact that glibc compiles everything by default with 32-bit ino_t, 
I wonder if returning 64-bit inode numbers is possible at all.
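
(For an individual program the workaround is just a compile flag --- for example

gcc -D_FILE_OFFSET_BITS=64 file.c

selects the 64-bit off_t/ino_t interfaces even with 32-bit glibc defaults --- 
but that does nothing for binaries that have already been built.)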


Mikulas


Re: Finding hardlinks

2006-12-29 Thread Mikulas Patocka



On Fri, 29 Dec 2006, Trond Myklebust wrote:


On Thu, 2006-12-28 at 19:14 +0100, Mikulas Patocka wrote:

Why don't you rip out the support for colliding inode numbers from the
kernel entirely (i.e. remove iget5_locked)?

It's reasonable to have either no support for colliding ino_t or full
support for it (including syscalls that userspace can use to work with
such filesystems) --- but I don't see any point in the half-way support
that is in the kernel right now.


What would ino_t have to do with inode numbers? It is only used as a
hash table lookup. The inode number is set in the ->getattr() callback.


The question is: why does the kernel contain the iget5 function that looks up 
inodes via a callback, if a filesystem cannot have more than a 64-bit 
inode identifier?


This lookup callback just encourages writing bad filesystems with colliding 
inode numbers. Either remove coda, smb (and possibly other) filesystems 
from the kernel or add proper userspace support for them.


The situation is that the current coreutils 6.7 fails to recursively copy 
directories if two directories in the tree have colliding inode 
numbers, so you get random data corruption with these filesystems.


Mikulas


Re: Finding hardlinks

2006-12-28 Thread Mikulas Patocka

This sounds like a bug to me. It seems like we should have a one to one
correspondence of filehandle -> inode. In what situations would this not be the
case?


Well, the NFS protocol allows that [see rfc1813, p. 21: "If two file handles 
from
the same server are equal, they must refer to the same file, but if they are not
equal, no conclusions can be drawn."]

As an example, some file systems encode hint information into the filehandle
and the hints may change over time, another example is encoding parent
information into the filehandle and then handles representing hard links
to the same file from different directories will differ.


BTW, how does (or how should) an NFS client deal with cache coherency if 
filehandles for the same file differ?


Mikulas


Re: Finding hardlinks

2006-12-28 Thread Mikulas Patocka

On Thu, 28 Dec 2006, Arjan van de Ven wrote:




It seems like the posix idea of unique <st_dev, st_ino> doesn't
hold water for modern file systems


are you really sure?
and if so, why don't we fix *THAT* instead, rather than adding racy
syscalls and such that just can't really be used right...


Why don't you rip out the support for colliding inode numbers from the 
kernel entirely (i.e. remove iget5_locked)?


It's reasonable to have either no support for colliding ino_t or full 
support for it (including syscalls that userspace can use to work with 
such filesystems) --- but I don't see any point in the half-way support 
that is in the kernel right now.


As for syscall races --- if you pack something with tar and the directory 
changes underneath, you can't expect sane output anyway.


Mikulas


Re: Finding hardlinks

2006-12-23 Thread Mikulas Patocka

If the user (or a script) doesn't specify that flag, it doesn't help. I think
the best solution for these filesystems would be either to add a new syscall
int is_hardlink(char *filename1, char *filename2)
(but I know adding syscall bloat may be objectionable)


it's also the wrong api; the filenames may have been changed under you
just as you return from this call, so it really is a
"was_hardlink_at_some_point()" as you specify it.
If you make it work on fd's.. it has a chance at least.


Yes, but it doesn't matter --- if the tree changes under the "cp -a" command, 
no one guarantees you what you get.

int fis_hardlink(int handle1, int handle2);
is another possibility, but it can't detect hardlinked symlinks.

Mikulas


Re: Finding hardlinks

2006-12-21 Thread Mikulas Patocka

On Thu, 21 Dec 2006, Jan Harkes wrote:


On Wed, Dec 20, 2006 at 12:44:42PM +0100, Miklos Szeredi wrote:

The stat64.st_ino field is 64bit, so AFAICS you'd only need to extend
the kstat.ino field to 64bit and fix those filesystems to fill in
kstat correctly.


Coda actually uses 128-bit file identifiers internally, so 64-bits
really doesn't cut it. Since the 128-bit space is used pretty sparsely
there is a hash which avoids most collisions in 32-bit i_ino space, but
not completely. I can also imagine that at some point someone wants to
implement a git-based filesystem where it would be more natural to use
160-bit SHA1 hashes as unique object identifiers.

But Coda only allow hardlinks within a single directory and if someone
renames a hardlinked file and one of the names ends up in a different
directory we implicitly create a copy of the object. This actually
leverages off of the way we handle volume snapshots and the fact that we
use whole file caching and writes, so we only copy the metadata while
the data is 'copy-on-write'.


The problem is that if an inode number collision happens occasionally, you 
get data corruption with the cp -a command --- it will just copy one file and 
hardlink the other to it.



Any application that tries to be smart enough to keep track of which
files are hardlinked should (in my opinion) also have a way to disable
this behaviour.


If the user (or a script) doesn't specify that flag, it doesn't help. I think 
the best solution for these filesystems would be either to add a new syscall

int is_hardlink(char *filename1, char *filename2)
(but I know adding syscall bloat may be objectionable)

or to add a new field in statvfs, ST_HAS_BROKEN_INO_T, that applications can 
test in order to disable hardlink processing.


Mikulas


Jan




Re: Finding hardlinks

2006-12-20 Thread Mikulas Patocka



On Wed, 20 Dec 2006, Al Viro wrote:


On Wed, Dec 20, 2006 at 05:50:11PM +0100, Miklos Szeredi wrote:

I don't see any problems with changing struct kstat.  There would be
reservations against changing inode.i_ino though.

So filesystems that have 64bit inodes will need a specialized
getattr() method instead of generic_fillattr().


And they are already free to do so.  And no, struct kstat doesn't need
to be changed - it has u64 ino already.


I see, I should have checked recent kernel.

Mikulas


Re: Finding hardlinks

2006-12-20 Thread Mikulas Patocka

I've come across this problem: how can a userspace program (such as, for
example, "cp -a") tell that two files form a hardlink? Comparing inode
numbers will break on filesystems that can have more than 2^32 files (NFS3,
OCFS, SpadFS; kernel developers have already implemented iget5_locked for the
case of colliding inode numbers). Other possibilities:

--- compare not only ino, but all stat entries, and make sure that
i_nlink > 1?
--- not 100% reliable either, it only lowers the failure probability
--- create a hardlink and watch whether i_nlink is increased on both files?
--- doesn't work on read-only filesystems
--- compare file content?
--- "cp -a" at least won't corrupt data then, but it will create
hardlinks where they shouldn't be.

Is there some reliable way for the "cp -a" command to determine that?
Finding out in the kernel whether two dentries point to the same inode is
trivial, but I am not sure how to let userspace know ... am I missing
something?


The stat64.st_ino field is 64bit, so AFAICS you'd only need to extend
the kstat.ino field to 64bit and fix those filesystems to fill in
kstat correctly.


There are a 32-bit __st_ino and a 64-bit st_ino --- what is their purpose? Some 
old compatibility code?



SUSv3 requires st_ino/st_dev to be unique within a system so the
application shouldn't need to bend over backwards.


I see, but the kernel needs to be fixed for that. Would patches changing 
kstat be accepted?


Mikulas


Miklos




Finding hardlinks

2006-12-20 Thread Mikulas Patocka

Hi

I've come across this problem: how can a userspace program (such as, for 
example, "cp -a") tell that two files form a hardlink? Comparing inode 
numbers will break on filesystems that can have more than 2^32 files (NFS3, 
OCFS, SpadFS; kernel developers have already implemented iget5_locked for the 
case of colliding inode numbers). Other possibilities:

--- compare not only ino, but all stat entries, and make sure that
i_nlink > 1?
--- not 100% reliable either, it only lowers the failure probability
--- create a hardlink and watch whether i_nlink is increased on both files?
--- doesn't work on read-only filesystems
--- compare file content?
--- "cp -a" at least won't corrupt data then, but it will create
hardlinks where they shouldn't be.

Is there some reliable way for the "cp -a" command to determine that? 
Finding out in the kernel whether two dentries point to the same inode is 
trivial, but I am not sure how to let userspace know ... am I missing 
something?
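
For context, the conventional userspace check that this question is about --- 
a sketch only, not cp's actual code: two names are treated as one file when 
the (st_dev, st_ino) pairs match, and archivers usually bother to remember 
the pair only when the link count is above 1. It is exactly the st_ino half 
of this test that stops being trustworthy once inode numbers can collide:

#include <stdbool.h>
#include <sys/stat.h>

/* Conventional hardlink test: same (st_dev, st_ino) pair.  Archivers
 * typically remember these pairs only for files with st_nlink > 1 and
 * hash them to find the other names of a link. */
static bool looks_like_hardlink(const struct stat *a, const struct stat *b)
{
	return a->st_nlink > 1 && b->st_nlink > 1 &&
	       a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}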


Mikulas


Re: [PATCH -mm] hpfs: fix printk format warnings

2006-11-19 Thread Mikulas Patocka

From: Randy Dunlap <[EMAIL PROTECTED]>

Fix hpfs printk warnings:
(why do I only see these in -mm?)


Probably because -mm has an unsigned long inode number and Linus' kernel has 
just an unsigned int?


Change it this way:
hpfs_error(inode->i_sb, "not a directory, fnode %08lx", 
(unsigned long)inode->i_ino);

--- so that it can work on both.

Mikulas


fs/hpfs/dir.c:87: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'long unsigned int'
fs/hpfs/dir.c:147: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'long int'
fs/hpfs/dir.c:148: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'long int'
fs/hpfs/dnode.c:537: warning: format '%08x' expects type 'unsigned int', but argument 5 has type 'long unsigned int'
fs/hpfs/dnode.c:854: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'loff_t'
fs/hpfs/ea.c:247: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'long unsigned int'
fs/hpfs/inode.c:254: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'long unsigned int'
fs/hpfs/map.c:129: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'ino_t'
fs/hpfs/map.c:135: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'ino_t'
fs/hpfs/map.c:140: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'ino_t'
fs/hpfs/map.c:147: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'ino_t'
fs/hpfs/map.c:154: warning: format '%08x' expects type 'unsigned int', but argument 3 has type 'ino_t'

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
fs/hpfs/dir.c   |   10 +++---
fs/hpfs/dnode.c |   13 +
fs/hpfs/ea.c|2 +-
fs/hpfs/inode.c |5 -
fs/hpfs/map.c   |   20 ++--
5 files changed, 35 insertions(+), 15 deletions(-)

--- linux-2619-rc5mm2.orig/fs/hpfs/dir.c
+++ linux-2619-rc5mm2/fs/hpfs/dir.c
@@ -84,7 +84,8 @@ static int hpfs_readdir(struct file *fil
}
if (!fno->dirflag) {
e = 1;
-   hpfs_error(inode->i_sb, "not a directory, fnode %08x",inode->i_ino);
+   hpfs_error(inode->i_sb, "not a directory, fnode %08lx",
+   inode->i_ino);
}
if (hpfs_inode->i_dno != fno->u.external[0].disk_secno) {
e = 1;
@@ -144,8 +145,11 @@ static int hpfs_readdir(struct file *fil
}
if (de->first || de->last) {
if (hpfs_sb(inode->i_sb)->sb_chk) {
-   if (de->first && !de->last && (de->namelen != 2 || de ->name[0] != 1 || de->name[1] != 1)) hpfs_error(inode->i_sb, "hpfs_readdir: bad ^A^A entry; pos = %08x", old_pos);
-   if (de->last && (de->namelen != 1 || de ->name[0] != 255)) hpfs_error(inode->i_sb, "hpfs_readdir: bad \\377 entry; pos = %08x", old_pos);
+   if (de->first && !de->last && (de->namelen != 2
+   || de ->name[0] != 1 || de->name[1] != 1))
+   hpfs_error(inode->i_sb, "hpfs_readdir: bad ^A^A entry; pos = %08lx", old_pos);
+   if (de->last && (de->namelen != 1 || de ->name[0] != 255))
+   hpfs_error(inode->i_sb, "hpfs_readdir: bad \\377 entry; pos = %08lx", old_pos);
}
hpfs_brelse4(&qbh);
goto again;
--- linux-2619-rc5mm2.orig/fs/hpfs/dnode.c
+++ linux-2619-rc5mm2/fs/hpfs/dnode.c
@@ -533,10 +533,13 @@ static void delete_empty_dnode(struct in
struct buffer_head *bh;
struct dnode *d1;
struct quad_buffer_head qbh1;
-   if (hpfs_sb(i->i_sb)->sb_chk) if (up != i->i_ino) {
-   hpfs_error(i->i_sb, "bad pointer to fnode, dnode %08x, pointing to %08x, should be %08x", dno, up, i->i_ino);
+   if (hpfs_sb(i->i_sb)->sb_chk)
+   if (up != i->i_ino) {
+   hpfs_error(i->i_sb,
+   "bad pointer to fnode, dnode %08x, pointing to %08x, should be %08lx",
+   dno, up, i->i_ino);
return;
-   }
+   }
if ((d1 = hpfs_map_dnode(i->i_sb, down, &qbh1))) {
d1->up = up;
d1->root_dnode = 1;
@@ -851,7 +854,9 @@ struct hpfs_dirent *map_pos_dirent(struc
/* Going to the next dirent */
if ((d = de_next_de(de)) < dnode_end_de(dnode)) {
if (!(++*posp & 077)) {
-   hpfs_error(