Re: dpkg and hardlinks

2009-03-24 Thread Raphael Hertzog
On Tue, 24 Mar 2009, Jerome Warnier wrote:
 For files from packages, though, deduplication might be a good idea, as
 dpkg is supposedly the only one to ever modify the files (under /usr for
 example).
 I don't know however how dpkg treats hardlinks. Does it break the
 hardlink before replacing a file or does it replace the file whatever
 its real nature is?

IIRC dpkg preserves hardlinks inside a binary package but I don't see how
it could do the same across multiple binary packages.

Cheers,
-- 
Raphaël Hertzog

Contribute to Debian and win a copy of the book "Cahier de l'Admin Debian Lenny":
http://www.ouaza.com/wp/2009/03/02/contribuer-a-debian-gagner-un-livre/


-- 
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Re: dpkg and hardlinks

2009-03-24 Thread Jerome Warnier
Raphael Hertzog wrote:
 On Tue, 24 Mar 2009, Jerome Warnier wrote:
   
 For files from packages, though, deduplication might be a good idea, as
 dpkg is supposedly the only one to ever modify the files (under /usr for
 example).
 I don't know however how dpkg treats hardlinks. Does it break the
 hardlink before replacing a file or does it replace the file whatever
 its real nature is?
 

 IIRC dpkg preserves hardlinks inside a binary package but I don't see how
 it could do the same across multiple binary packages.
   
Oh, I didn't expect it to. I just wanted to know its behaviour when it
upgrades a package.
Before the upgrade, the file is a hardlink (because I hardlinked it
manually), then it tries to upgrade the file/hardlink. Does it break
the hardlink* before upgrading the file or does it overwrite the
file/hardlink and all of its siblings?

 Cheers,
   

* because it knows it is supposed to be a plain file, and it no longer is.





Re: dpkg and hardlinks

2009-03-24 Thread Mike Hommey
On Tue, Mar 24, 2009 at 02:09:25PM +0100, Raphael Hertzog hert...@debian.org 
wrote:
 On Tue, 24 Mar 2009, Jerome Warnier wrote:
  For files from packages, though, deduplication might be a good idea, as
  dpkg is supposedly the only one to ever modify the files (under /usr for
  example).
  I don't know however how dpkg treats hardlinks. Does it break the
  hardlink before replacing a file or does it replace the file whatever
  its real nature is?
 
 IIRC dpkg preserves hardlinks inside a binary package but I don't see how
 it could do the same across multiple binary packages.

I think the question is more something like:
Package foo has file a
Package bar has file b
They actually have the same content, so the user hardlinks a and b.
What happens when bar is updated with a different b file?

The answer, AFAIK, is that dpkg will do the right thing, namely, to
replace the content of b, but not of a, because it actually doesn't put
the content in b but rather in another file that it renames, eventually,
to b.

On the other hand, if package bar is updated with an unmodified b, the
hardlink will be broken anyway, because dpkg does the above even when files
are not modified. But I could be wrong on this one.
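A minimal sketch of the mechanism Mike describes, assuming dpkg installs a file by writing the new version under a temporary name and then renaming it over the target (this is the thread's description, not something checked against dpkg's source here). It runs in a scratch directory, uses GNU `stat -c %h` for the link count, and the names a, b and b.tmp are hypothetical. Because rename(2) only repoints the name b at a new inode, a keeps the old content and its link count drops back to 1.

```shell
#!/bin/sh
# Sketch: replace a file by writing a temporary copy and renaming it into
# place, the approach the thread attributes to dpkg (assumption).
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

printf 'old content\n' > "$work/a"    # file shipped by package foo
ln "$work/a" "$work/b"                 # user hardlinks b to a (link count 2)

# "Upgrade" b: write the new version under a temporary name...
printf 'new content\n' > "$work/b.tmp"
# ...then rename it over b; rename(2) repoints the name b at the new inode.
mv -f "$work/b.tmp" "$work/b"

a_content=$(cat "$work/a")
b_content=$(cat "$work/b")
a_links=$(stat -c %h "$work/a")        # GNU stat: hard link count

echo "a: $a_content (links: $a_links)"
echo "b: $b_content"
```

The same thing happens when the new b is byte-for-byte identical to the old one, which is consistent with Mike's second point and with the reinstall tests later in the thread.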

Mike





Re: dpkg and hardlinks

2009-03-24 Thread Giacomo A. Catenazzi

Jerome Warnier wrote:
 Raphael Hertzog wrote:
 On Tue, 24 Mar 2009, Jerome Warnier wrote:
 For files from packages, though, deduplication might be a good idea, as
 dpkg is supposedly the only one to ever modify the files (under /usr for
 example).
 I don't know however how dpkg treats hardlinks. Does it break the
 hardlink before replacing a file or does it replace the file whatever
 its real nature is?

 IIRC dpkg preserves hardlinks inside a binary package but I don't see how
 it could do the same across multiple binary packages.

 Oh, I didn't expect it to. I just wanted to know its behaviour when it
 upgrades a package.
 Before the upgrade, the file is a hardlink (because I hardlinked it
 manually), then it tries to upgrade the file/hardlink. Does it break
 the hardlink* before upgrading the file or does it overwrite the
 file/hardlink and all of its siblings?


Do you really care? (Not theoretically, but in normal use.)
I would expect identical content to be shipped:
- by sibling packages (built from the same source), which are thus
  usually upgraded at the same time;
- in documentation (so maybe not so important for your use).

I think most of the problems are with files outside dpkg's control,
i.e. /var and /etc.

I'm just curious: do you have a list of files with identical content?
Maybe I'm completely wrong.

ciao
cate





Re: dpkg and hardlinks

2009-03-24 Thread Sune Vuorela
On 2009-03-24, Mike Hommey m...@glandium.org wrote:
 I think the question is more something like:
 Package foo has file a
 Package bar has file b
 They actually have the same content, so the user hardlinks a and b.
 What happens when bar is updated with a different b file?

 The answer, AFAIK, is that dpkg will do the right thing, namely, to
 replace the content of b, but not of a, because it actually doesn't put
 the content in b but rather in another file that it renames, eventually,
 to b.

IIRC, dpkg starts by chmod'ing the old file to 0600 and then replaces it.

But let us test:

s...@gladstone:/var/tmp/user$ ln /usr/bin/sudo
s...@gladstone:/var/tmp/user$ ls -l
-rwsr-xr-x 3 root root 113916 2009-01-27 19:57 sudo
s...@gladstone:/var/tmp/user$ sudo apt-get --reinstall install sudo
[many lines of apt]
s...@gladstone:/var/tmp/user$ ls -l
-rw------- 1 root root 113916 2009-01-27 19:57 sudo
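Sune's recollection that dpkg chmods the old file to 0600 would explain the `-rw-------` mode on the extra sudo link: permission bits belong to the inode, not to a name, so changing them through one name changes them for every hard link. Presumably the point is not to leave a setuid copy of an old binary lying around, but that motive is an assumption here. A minimal sketch in a scratch directory, with hypothetical names and GNU `stat -c %A` assumed:

```shell
#!/bin/sh
# Sketch: permission bits live in the inode, so chmod through one name
# is visible through every hard link to it. Names are hypothetical.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

printf '#!/bin/sh\n' > "$work/prog"
chmod 4755 "$work/prog"                 # setuid, like /usr/bin/sudo
ln "$work/prog" "$work/extra-link"      # the user's extra hard link

before=$(stat -c %A "$work/extra-link") # GNU stat: symbolic mode
chmod 0600 "$work/prog"                 # what Sune recalls dpkg doing
after=$(stat -c %A "$work/extra-link")

echo "extra-link before: $before"
echo "extra-link after:  $after"
```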


For non-suid files, it is a bit different:

s...@gladstone:/var/tmp/user$ ln /bin/ls
s...@gladstone:/var/tmp/user$ ls -l ls
-rwxr-xr-x 2 root root 100564 2009-02-22 23:40 ls
s...@gladstone:/var/tmp/user$ sudo apt-get --reinstall install coreutils
s...@gladstone:/var/tmp/user$ ls -l ls
-rwxr-xr-x 1 root root 100564 2009-02-22 23:40 ls

/Sune





Re: dpkg and hardlinks

2009-03-24 Thread Peter Samuelson

[Jerome Warnier]
 I don't know however how dpkg treats hardlinks. Does it break the
 hardlink before replacing a file or does it replace the file whatever
 its real nature is?

You know, given the time it takes to type a 20-line email, including
finding the appropriate Wikipedia article to link to, it would have
been a lot faster to just try it.

  # ln /bin/ls /bin/ls2
  # aptitude reinstall coreutils
  # ls -l /bin/ls /bin/ls2

-- 
Peter Samuelson | org-tld!p12n!peter | http://p12n.org/





Re: dpkg and hardlinks

2009-03-24 Thread Jerome Warnier
Peter Samuelson wrote:
 [Jerome Warnier]
   
 I don't know however how dpkg treats hardlinks. Does it break the
 hardlink before replacing a file or does it replace the file whatever
 its real nature is?
 

 You know, given the time it takes to type a 20-line email, including
 finding the appropriate Wikipedia article to link to, it would have
 been a lot faster to just try it.

   # ln /bin/ls /bin/ls2
   # aptitude reinstall coreutils
   # ls -l /bin/ls /bin/ls2
   
Maybe, but I also wanted to bring attention to it. ;-)

Interesting subject, isn't it?





Re: dpkg and hardlinks

2009-03-24 Thread Jerome Warnier
Giacomo A. Catenazzi wrote:
 Do you really care? (Not theoretically, but in normal use.)
 I would expect identical content to be shipped:
 - by sibling packages (built from the same source), which are thus
   usually upgraded at the same time;
 - in documentation (so maybe not so important for your use).

 I think most of the problems are with files outside dpkg's control,
 i.e. /var and /etc.

 I'm just curious: do you have a list of files with identical content?
 Maybe I'm completely wrong.
Here you are, for /usr on a typical Lenny AMD64 server (generated with
finddup -n from package perforate):
http://glouglou.beeznest.org/~jwarnier/usr-duplicates.list.gz

 ciao
 cate
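For readers without perforate installed, a rough duplicate finder can be assembled from GNU coreutils: hash every file, sort by hash, and print the groups that share a digest. This is a sketch under that assumption, not how finddup itself works, and it only reports duplicates; it does not hardlink anything. Shown on a scratch directory with hypothetical file names:

```shell
#!/bin/sh
# Sketch: report groups of files with identical content using GNU
# coreutils (md5sum, sort, uniq); this only lists them, it links nothing.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

printf 'same\n'      > "$work/one"
printf 'same\n'      > "$work/two"
printf 'different\n' > "$work/three"

# md5sum prints "<32-hex-digest>  <path>"; after sorting, uniq -w32
# compares only the digest and --all-repeated=separate keeps every member
# of each duplicate group, separating groups with a blank line.
dups=$(find "$work" -type f -exec md5sum {} + \
       | sort \
       | uniq -w32 --all-repeated=separate)

printf '%s\n' "$dups"
```

On a real system you would point `find` at `/usr` and add `-xdev` to stay on one filesystem, since hard links cannot cross filesystems anyway.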







Re: dpkg and hardlinks

2009-03-24 Thread Steve McIntyre
In article 49c8dcdb.90...@beeznest.net you write:
Before the upgrade, the file is a hardlink (because I hardlinked it
manually), then it tries to upgrade the file/hardlink. Does it break
the hardlink* before upgrading the file or does it overwrite the
file/hardlink and all of its siblings?

* because it knows it is supposed to be a plain file, and it no longer is.

Your language suggests that you don't understand how hard links
work. A hard link to a file *is* a plain file.

See http://en.wikipedia.org/wiki/Hard_link for some explanation.

-- 
Steve McIntyre, Cambridge, UK.st...@einval.com
  Armed with Valor: Centurion represents quality of Discipline,
  Honor, Integrity and Loyalty. Now you don't have to be a Caesar to
  concord the digital world while feeling safe and proud.





Re: dpkg and hardlinks

2009-03-24 Thread Mike Hommey
On Tue, Mar 24, 2009 at 02:34:09PM +0100, Jerome Warnier 
jwarn...@beeznest.net wrote:
 Here you are, for /usr on a typical Lenny AMD64 server (generated with
 finddup -n from package perforate):
 http://glouglou.beeznest.org/~jwarnier/usr-duplicates.list.gz

$ zcat usr-duplicates.list.gz | awk '{t+=$1*(NF-2)}END{print t}'
33142129

You would free 33MB. How big is your disk? Is it worth bothering?

You can get much more free space than that by reducing the number of inodes
supported by your filesystem:
For instance, on my / fs, that contains /usr, and is only 3GB:
Inode count:  384000
Free inodes:  314133

I will obviously never use that many inodes... Now, consider an inode
is 128 bytes (or even 256 in some cases), and do some maths...
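Doing the maths Mike suggests with his own figures (384000 inodes, 314133 free) and the two inode sizes he mentions, 128 and 256 bytes; the byte counts are simple products, so treat them as an estimate of on-disk inode-table space rather than a measured saving:

```shell
#!/bin/sh
# Back-of-the-envelope check using the figures quoted above; the 128- and
# 256-byte inode sizes are the two cases Mike mentions.
total_inodes=384000
free_inodes=314133

table_128=$((total_inodes * 128))    # whole inode table, 128-byte inodes
unused_128=$((free_inodes * 128))    # never-used part of it
table_256=$((total_inodes * 256))
unused_256=$((free_inodes * 256))
dedup_savings=33142129               # result of the awk one-liner above

echo "128-byte inodes: table $table_128 bytes, unused $unused_128 bytes"
echo "256-byte inodes: table $table_256 bytes, unused $unused_256 bytes"
echo "deduplication would save $dedup_savings bytes"
```

The unused part of the inode table alone comes to roughly 40 MB (128-byte inodes) or 80 MB (256-byte inodes), already more than the 33 MB deduplication would save. On ext2/ext3 the inode count is chosen when the filesystem is created (for example with `mke2fs -N`), so reclaiming that space means recreating the filesystem.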

Mike





Re: dpkg and hardlinks

2009-03-24 Thread Jerome Warnier
Steve McIntyre wrote:
 In article 49c8dcdb.90...@beeznest.net you write:
   
 Before the upgrade, the file is a hardlink (because I hardlinked it
 manually), then it tries to upgrade the file/hardlink. Does it break
 the hardlink* before upgrading the file or does it overwrite the
 file/hardlink and all of its siblings?

 * because it knows it is supposed to be a plain file, and it no longer is.
 

 Your language suggests that you don't understand how hard links
 work. A hard link to a file *is* a plain file.

 See http://en.wikipedia.org/wiki/Hard_link for some explanation.

   
Of course I know what a hardlink is. I'm not a native English speaker, and I
even tried to find the right words on the Net before writing, but I couldn't
find better ones.

The question here is: which one is the hardlink to the other? :-P





Re: dpkg and hardlinks

2009-03-24 Thread Jerome Warnier
Mike Hommey wrote:

 $ zcat usr-duplicates.list.gz | awk '{t+=$1*(NF-2)}END{print t}'
 33142129

 You would free 33MB. How big is your disk? Is it worth bothering?
   
I'm not an awk god, but isn't that just the total size the files would
take up once deduplicated?
In that case, it is not the size I would reclaim, as there are sometimes
up to 4 copies of the same content.
 You can get much more free space than that by reducing the number of inodes
 supported by your filesystem:
 For instance, on my / fs, that contains /usr, and is only 3GB:
 Inode count:  384000
 Free inodes:  314133

 I will obviously never use that many inodes... Now, consider an inode
 is 128 bytes (or even 256 in some cases), and do some maths...

 Mike
   





Re: dpkg and hardlinks

2009-03-24 Thread Heiko Schlittermann
Jerome Warnier jwarn...@beeznest.net (Tue 24 Mar 2009 14:58:35 CET):
 
 The question here is: which one is the hardlink to the other? :-P

You can't distinguish hardlinks from each other - in the sense of
original and link...

They are just different directory entries referring to the same file system
object.
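A small sketch of Heiko's point, in a scratch directory with hypothetical names and GNU `stat` assumed: both names report the same inode number, and removing the name that was created first leaves the other one fully intact, with the link count dropping to 1.

```shell
#!/bin/sh
# Sketch: two names for one inode are indistinguishable; neither is the
# "original". Uses GNU stat; names are hypothetical.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

printf 'payload\n' > "$work/first"
ln "$work/first" "$work/second"

inode_first=$(stat -c %i "$work/first")    # inode number
inode_second=$(stat -c %i "$work/second")

rm "$work/first"                           # drop the name created first
survivor=$(cat "$work/second")             # content is still there
links_after=$(stat -c %h "$work/second")   # one name left

echo "inodes: $inode_first $inode_second; after rm: '$survivor', links=$links_after"
```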

Best regards from Dresden/Germany
Heiko Schlittermann
-- 
 SCHLITTERMANN.de  internet  unix support -
 Heiko Schlittermann HS12-RIPE -
 gnupg encrypted messages are welcome - key ID: 48D0359B ---
 gnupg fingerprint: 3061 CFBF 2D88 F034 E8D2  7E92 EE4E AC98 48D0 359B -




Re: dpkg and hardlinks

2009-03-24 Thread Mike Hommey
On Tue, Mar 24, 2009 at 03:11:17PM +0100, Jerome Warnier 
jwarn...@beeznest.net wrote:
 Mike Hommey wrote:
  $ zcat usr-duplicates.list.gz | awk '{t+=$1*(NF-2)}END{print t}'
  33142129
 
  You would free 33MB. How big is your disk? Is it worth bothering?

 I'm not an awk god, but isn't that supposed to just be the total size of
 the files it could take if deduplicated?
 In this case, it is not the size I would reclaim, as there are sometimes
 up to 4 copies of the same content.

The *(NF-2) part takes care of those copies.
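To see why the `*(NF-2)` factor counts only the redundant copies, here is the same awk program run on a made-up two-line sample. It assumes each line of the list holds the file size followed by the paths of all identical copies; finddup's exact output format is not checked here, and the paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample in the assumed layout: size first, then every path
# that has that exact content.
sample='1000 /usr/share/x/a /usr/share/y/a
500 /usr/share/p/b /usr/share/q/b /usr/share/r/b /usr/share/s/b'

# NF-1 fields are paths; keeping one copy leaves NF-2 redundant copies,
# each of which would free $1 bytes.
saved=$(printf '%s\n' "$sample" | awk '{t+=$1*(NF-2)}END{print t}')
echo "$saved"
```

The first line has 2 copies, so 1 is redundant (1000 bytes); the second has 4 copies, so 3 are redundant (1500 bytes); the total is 2500 bytes, i.e. the space reclaimed, not the space the deduplicated files would occupy.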

Mike





Re: dpkg and hardlinks

2009-03-24 Thread John H. Robinson, IV
Jerome Warnier wrote:
 Peter Samuelson wrote:
  [Jerome Warnier]

  I don't know however how dpkg treats hardlinks. Does it break the
  hardlink before replacing a file or does it replace the file whatever
  its real nature is?
  
 
  You know, given the time it takes to type a 20-line email, including
  finding the appropriate Wikipedia article to link to, it would have
  been a lot faster to just try it.
 
# ln /bin/ls /bin/ls2
# aptitude reinstall coreutils
# ls -l /bin/ls /bin/ls2

 Maybe, but I also wanted to bring attention to it. ;-)

I'm curious as to why no one is looking at the index node numbers
themselves.

[jh...@chao:/bin]% sudo ln ls ls2
[jh...@chao:/bin]% ls -il ls{,2}
7342643 -rwxr-xr-x 2 root root 101992 Apr  4  2008 ls
7342643 -rwxr-xr-x 2 root root 101992 Apr  4  2008 ls2
[jh...@chao:/bin]% sudo aptitude reinstall coreutils
. . .
[jh...@chao:/bin]% ls -il ls{,2}
7350701 -rwxr-xr-x 1 root root 101992 Apr  4  2008 ls
7342643 -rwxr-xr-x 1 root root 101992 Apr  4  2008 ls2

ls2 kept the old index node, but ls got a brand new one, showing that,
indeed, dpkg breaks hardlinks upon upgrade.

-- 
John H. Robinson, IV  jaq...@debian.org
 http  
WARNING: I cannot be held responsible for the above, sbih.org ( )(:[
as apparently my cats have learned how to type.  spiders.html  





Re: dpkg and hardlinks

2009-03-24 Thread Raphael Hertzog
On Tue, 24 Mar 2009, John H. Robinson, IV wrote:
 I'm curious as to why no one is looking at the index node numbers
 themselves.

Because the second field of ls -l is the hardlink count, which alone is
enough to conclude:

 7342643 -rwxr-xr-x 2 root root 101992 Apr  4  2008 ls
                    ^
vs

 7350701 -rwxr-xr-x 1 root root 101992 Apr  4  2008 ls
                    ^
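Raphael's link-count check can be applied to a whole tree with GNU `find`: the `-links +1` test matches files with more than one name, and `-printf '%n'` prints the link count. Files whose manual hardlinks dpkg has broken would then show a lower count or drop out of the list. A sketch on a scratch directory with hypothetical names:

```shell
#!/bin/sh
# Sketch: list regular files that have more than one name, with their link
# count. -printf is a GNU find extension; %n is the hard link count.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

printf 'x\n' > "$work/single"
printf 'y\n' > "$work/a"
ln "$work/a" "$work/b"

linked=$(find "$work" -type f -links +1 -printf '%n %p\n' | sort)
printf '%s\n' "$linked"
```

On a live system, `find /usr -xdev -type f -links +1` gives the equivalent list for /usr.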

Cheers,
-- 
Raphaël Hertzog


