RFC: How du counts size of hardlinked files

2006-01-13 Thread Johannes Niess
Hi list,

du (with default options) seems to count a file with multiple hard links only 
once, in the first directory it traverses.

The -l option changes that.

But there are other valid viewpoints.

Somehow the byte count of a file with multiple hardlinks partially belongs to 
all of them, even when some are not part of the traversed directories. In this 
mode a file with 10 bytes and 3 hardlinks would be counted as 3 files with 3 
bytes (and only one hardlink) each. The rounding error of integer division is 
acceptable in this 'approximate' mode. Programmatically this should be very 
similar to the -l mode. Use case: different physical owners of the hardlinks, 
and doing fair accounting for them. (Of course the inode has only one common 
logical owner for all directory entries.)
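
A minimal sketch of this 'approximate' mode, written in Python rather than as a 
patch to du. The function name approximate_du is hypothetical, and the sketch 
assumes apparent size (st_size) instead of the disk blocks du reports by 
default, so that it matches the 10-byte example above. Directories themselves 
are not counted.

```python
import os

def approximate_du(root):
    """Charge each directory entry an equal integer share of its inode."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            # 10 bytes with 3 links -> 3 bytes per entry (integer division).
            total += st.st_size // max(st.st_nlink, 1)
    return total
```

With one 10-byte file that has one link under a/ and two links under b/, this 
charges 3 bytes to a/ and 6 bytes to b/; the missing byte is the accepted 
rounding error.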

Not counting files that have multiple AND out-of-tree hardlinks is also useful. 
It tells us how much space we really gain when deleting that tree. 'rm-size' 
could be a name for this mode. Programmatically this is similar to default 
mode: in Perl I'd use hash keys for the test in default mode. In 'rm-size' 
mode I'd increment the hash values of visited inodes, and finally compare the 
number of visited directory entries to the number of links.
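
The 'rm-size' idea can be sketched the same way, here in Python instead of 
Perl. The name rm_size is hypothetical, and the sketch again assumes apparent 
size (st_size) and ignores the space taken by the directories themselves. A 
Counter plays the role of the hash whose values are incremented per visited 
inode.

```python
import os
from collections import Counter

def rm_size(root):
    """Bytes that would be freed by deleting the tree at root."""
    seen = Counter()  # (st_dev, st_ino) -> directory entries found under root
    info = {}         # (st_dev, st_ino) -> (st_size, st_nlink)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            seen[key] += 1
            info[key] = (st.st_size, st.st_nlink)
    # An inode is freed only if every one of its links lives inside the tree.
    return sum(size for key, (size, nlink) in info.items()
               if seen[key] == nlink)
```

A file whose link count exceeds the number of entries found in the tree still 
has a name elsewhere, so removing the tree would not release its space.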

du seems to be the natural home for this functionality. Or is it feature 
bloat?

Background: Backups via 'cp -l' need (almost) no space for files unchanged in 
several cycles. But these shadow forests of hardlinks are difficult to 
account for, especially when combined with finding and linking identical 
files across several physical owners.

Johannes Niess

P.S.: I'm not volunteering to implement this. I did not even feel enough need 
to write the Perl scripts.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: RFC: How du counts size of hardlinked files

2006-01-13 Thread Phillip Susi
Maybe I misunderstood you, but you seem to think that each hard link to 
the same file can have different ownership.  This is not the case. 
Hard links are just additional names for the same inode, and permissions 
and ownership are associated with the inode, not the name(s).


Also, I just tested it, and du doesn't report the size used by duplicate 
hard links in the tree twice.  I did a cp -al foo bar, then a du -sh and a 
du -sh foo, and they both reported the same size.






Re: RFC: How du counts size of hardlinked files

2006-01-13 Thread Johannes Niess
Hi Phillip,

Hard links and file sizes are concepts that don't fit each other well. The 
best fit depends on what you are asking for.

bash-2.05b$ cp -al a/ b
bash-2.05b$ du -s a b .
34540   a
34540   b
34556   .
bash-2.05b$ du -sc a b .
34540   a
12  b
4   .
34556   total
bash-2.05b$ du -scl a b .
34540   a
34540   b
69084   .
138164  total
bash-2.05b$ 

On Friday, 13 January 2006 19:56, Phillip Susi wrote:
> Maybe I misunderstood you, but you seem to think that each hard link to
> the same file can have different ownership.  This is not the case.
> Hard links are just additional names for the same inode, and permissions
> and ownership are associated with the inode, not the name(s).

I know that. That is why I made the distinction between physical (customer) and 
logical (file system) owner. A file hardlinked between 2 customers belongs to 
both of them. It is quite unpredictable which directory entry (i.e. one of the 
links to the inode) du finds first. That directory has the inode's size added 
to its sum.
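
The first-seen effect can be illustrated with a small Python sketch. It is not 
GNU du's implementation: the function name du_first_seen is hypothetical, and 
it assumes apparent size (st_size) and ignores directory overhead. It keeps 
one set of visited inodes across all arguments, as 'du -sc a b' appears to do 
in the transcript above.

```python
import os

def du_first_seen(dirs):
    """Per-directory totals where each inode is charged to the first tree
    in which it is encountered."""
    seen = set()  # (st_dev, st_ino) of multiply-linked files already counted
    totals = {}
    for root in dirs:
        total = 0
        for dirpath, dirnames, filenames in os.walk(root):
            for name in filenames:
                st = os.lstat(os.path.join(dirpath, name))
                if st.st_nlink > 1:
                    key = (st.st_dev, st.st_ino)
                    if key in seen:
                        continue  # already charged to an earlier entry
                    seen.add(key)
                total += st.st_size
        totals[root] = total
    return totals
```

Calling it with the two customer directories in the opposite order moves the 
whole size of every shared file from one customer to the other.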


> Also I just tested it and du doesn't report the size used by duplicate
> hard links in the tree twice.  I did a cp -al foo bar, then a du -sh, du
> -sh foo, and they were both the same size.

That's correct without -l. The sizes do not add up: 'du ./foo' + 
'du ./bar' (my two customers' point of view) != 'du .' (disk space I need on 
the server).

'du -l' counts the links multiple times: 'du ./foo' = 'du ./bar' = 0.5 'du .'. 
The overall size is from a customer perspective.

My approximate mode would count two halves: 0.5 'du ./foo' + 0.5 'du ./bar' = 
'du .'. That's the admin's size perspective. In reality there is no fixed 
factor relative to du -l.

Johannes

