Re: dpkg and hardlinks
On Tue, 24 Mar 2009, Jerome Warnier wrote:
> For files from packages, though, deduplication might be a good idea, as
> dpkg is supposedly the only one to ever modify the files (under /usr for
> example). I don't know however how dpkg treats hardlinks. Does it break
> the hardlink before replacing a file or does it replace the file whatever
> its real nature is?

IIRC dpkg preserves hardlinks inside a binary package but I don't see how
it could do the same across multiple binary packages.

Cheers,
--
Raphaël Hertzog

Contribute to Debian and win a copy of the Cahier de l'admin Debian Lenny:
http://www.ouaza.com/wp/2009/03/02/contribuer-a-debian-gagner-un-livre/

--
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Re: dpkg and hardlinks
Raphael Hertzog wrote:
> On Tue, 24 Mar 2009, Jerome Warnier wrote:
> > For files from packages, though, deduplication might be a good idea, as
> > dpkg is supposedly the only one to ever modify the files (under /usr for
> > example). I don't know however how dpkg treats hardlinks. Does it break
> > the hardlink before replacing a file or does it replace the file whatever
> > its real nature is?
>
> IIRC dpkg preserves hardlinks inside a binary package but I don't see how
> it could do the same across multiple binary packages.

Oh, I didn't expect it to. I just wanted to know its behaviour when it
upgrades a package. Before the upgrade, the file is a hardlink (because I
hardlinked it manually), then it tries to upgrade the file/hardlink. Does
it break the hardlink* before upgrading the file or does it overwrite the
file/hardlink and all of its siblings?

Cheers,

* because it knows it is supposed to be a plain file, and it no longer is.
Re: dpkg and hardlinks
On Tue, Mar 24, 2009 at 02:09:25PM +0100, Raphael Hertzog hert...@debian.org wrote:
> On Tue, 24 Mar 2009, Jerome Warnier wrote:
> > For files from packages, though, deduplication might be a good idea, as
> > dpkg is supposedly the only one to ever modify the files (under /usr for
> > example). I don't know however how dpkg treats hardlinks. Does it break
> > the hardlink before replacing a file or does it replace the file whatever
> > its real nature is?
>
> IIRC dpkg preserves hardlinks inside a binary package but I don't see how
> it could do the same across multiple binary packages.

I think the question is more something like:
Package foo has file a
Package bar had file b
They are actually the same content, so the user hardlinks a and b.
What happens when bar is updated with a different b file ?

The answer, AFAIK, is that dpkg will do the right thing, namely, to
replace the content of b, but not of a, because it actually doesn't put
the content in b but rather in another file that it renames, eventually,
to b.

On the other hand, if package bar is updated with an unmodified b, the
hardlink will be broken anyway, because dpkg does the above even when
files are not modified. But I could be wrong on this one.

Mike
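The write-then-rename behaviour Mike describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not dpkg's actual code: the file names a, b and the ".new" suffix are made up here, and the point is simply that renaming a freshly written file over b never writes to the inode that a still refers to.

```python
import os
import tempfile


def replace_via_rename(path, new_content):
    """Write new_content to a sibling file, then rename it over path."""
    tmp_path = path + ".new"  # hypothetical temporary name, chosen for this sketch
    with open(tmp_path, "w") as f:
        f.write(new_content)
    # The rename swaps the directory entry; the old inode is not written to.
    os.replace(tmp_path, path)


def demo():
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")  # stands in for the file from package foo
        b = os.path.join(d, "b")  # stands in for the file from package bar
        with open(a, "w") as f:
            f.write("old")
        os.link(a, b)  # the user's manual hardlink
        links_before = os.stat(a).st_nlink
        replace_via_rename(b, "new")  # the "upgrade" of b
        with open(a) as f:
            content_a = f.read()
        with open(b) as f:
            content_b = f.read()
        return links_before, os.stat(a).st_nlink, content_a, content_b


if __name__ == "__main__":
    print(demo())
```

Calling replace_via_rename(b, "old") instead, with identical content, breaks the link just the same, which matches Mike's second point about unmodified files.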
Re: dpkg and hardlinks
Jerome Warnier wrote:
> Raphael Hertzog wrote:
> > On Tue, 24 Mar 2009, Jerome Warnier wrote:
> > > For files from packages, though, deduplication might be a good idea, as
> > > dpkg is supposedly the only one to ever modify the files (under /usr for
> > > example). I don't know however how dpkg treats hardlinks. Does it break
> > > the hardlink before replacing a file or does it replace the file whatever
> > > its real nature is?
> >
> > IIRC dpkg preserves hardlinks inside a binary package but I don't see how
> > it could do the same across multiple binary packages.
>
> Oh, I didn't expect it to. I just wanted to know its behaviour when it
> upgrades a package. Before the upgrade, the file is a hardlink (because I
> hardlinked it manually), then it tries to upgrade the file/hardlink. Does
> it break the hardlink* before upgrading the file or does it overwrite the
> file/hardlink and all of its siblings?

Do you really care? (not theoretically, but in normal use).

I would expect that same content will be delivered:
- by brother packages (same source), thus usually updated at the same time.
- in documentation (so maybe not so important for your use).

I think the most problem are in files outside dpkg control, i.e. /var and
/etc. I'm just curious: do you have a list of same content files?
maybe I'm completely wrong.

ciao
cate
Re: dpkg and hardlinks
On 2009-03-24, Mike Hommey m...@glandium.org wrote:
> On Tue, Mar 24, 2009 at 02:09:25PM +0100, Raphael Hertzog hert...@debian.org wrote:
> > On Tue, 24 Mar 2009, Jerome Warnier wrote:
> > > For files from packages, though, deduplication might be a good idea, as
> > > dpkg is supposedly the only one to ever modify the files (under /usr for
> > > example). I don't know however how dpkg treats hardlinks. Does it break
> > > the hardlink before replacing a file or does it replace the file whatever
> > > its real nature is?
> >
> > IIRC dpkg preserves hardlinks inside a binary package but I don't see how
> > it could do the same across multiple binary packages.
>
> I think the question is more something like:
> Package foo has file a
> Package bar had file b
> They are actually the same content, so the user hardlinks a and b.
> What happens when bar is updated with a different b file ?
>
> The answer, AFAIK, is that dpkg will do the right thing, namely, to
> replace the content of b, but not of a, because it actually doesn't put
> the content in b but rather in another file that it renames, eventually,
> to b.

iirc, dpkg starts with chmod'ing 0600 a file and then replacing it.

But let us test:

s...@gladstone:/var/tmp/user$ ln /usr/bin/sudo
s...@gladstone:/var/tmp/user$ ls -l
-rwsr-xr-x 3 root root 113916 2009-01-27 19:57 sudo
s...@gladstone:/var/tmp/user$ sudo apt-get --reinstall install sudo
[many lines of apt]
s...@gladstone:/var/tmp/user$ ls -l
-rw------- 1 root root 113916 2009-01-27 19:57 sudo

for non-suid files, it is a bit different:

s...@gladstone:/var/tmp/user$ ln /bin/ls
s...@gladstone:/var/tmp/user$ ls -l ls
-rwxr-xr-x 2 root root 100564 2009-02-22 23:40 ls
s...@gladstone:/var/tmp/user$ sudo apt-get --reinstall install coreutils
s...@gladstone:/var/tmp/user$ ls -l ls
-rwxr-xr-x 1 root root 100564 2009-02-22 23:40 ls

/Sune
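Sune's setuid result fits the fact that permissions are stored on the inode, not on the name: a chmod through one name is visible through every other hardlink, and it stays that way after the first name is replaced by a new file. The Python sketch below illustrates only that mechanism. The file names are invented, mode 0755 stands in for the setuid binary, and whether dpkg performs exactly this chmod is Sune's recollection rather than something verified here.

```python
import os
import stat
import tempfile


def mode_of(path):
    """Return the permission bits of path."""
    return stat.S_IMODE(os.stat(path).st_mode)


def demo():
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "tool")     # hypothetical packaged file
        extra = os.path.join(d, "tool-link")   # the user's stray hardlink
        with open(original, "w") as f:
            f.write("v1")
        os.chmod(original, 0o755)
        os.link(original, extra)

        # chmod through one name: the mode lives on the shared inode.
        os.chmod(original, 0o600)
        mode_seen_via_link = mode_of(extra)

        # Replace `original` with a fresh file that carries the usual mode.
        fresh = original + ".new"              # hypothetical temporary name
        with open(fresh, "w") as f:
            f.write("v1")
        os.chmod(fresh, 0o755)
        os.replace(fresh, original)

        return mode_seen_via_link, mode_of(original), mode_of(extra)


if __name__ == "__main__":
    print([oct(m) for m in demo()])
```

The stray link keeps the restricted mode because it still points at the old inode, while the packaged path now names a new inode with the normal permissions.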
Re: dpkg and hardlinks
[Jerome Warnier]
> I don't know however how dpkg treats hardlinks. Does it break the
> hardlink before replacing a file or does it replace the file whatever
> its real nature is?

You know, given the time it takes to type a 20-line email, including
finding the appropriate Wikipedia article to link to, it would have been a
lot faster to just try it.

    # ln /bin/ls /bin/ls2
    # aptitude reinstall coreutils
    # ls -l /bin/ls /bin/ls2

--
Peter Samuelson | org-tld!p12n!peter | http://p12n.org/
Re: dpkg and hardlinks
Peter Samuelson wrote:
> [Jerome Warnier]
> > I don't know however how dpkg treats hardlinks. Does it break the
> > hardlink before replacing a file or does it replace the file whatever
> > its real nature is?
>
> You know, given the time it takes to type a 20-line email, including
> finding the appropriate Wikipedia article to link to, it would have been a
> lot faster to just try it.
>
>     # ln /bin/ls /bin/ls2
>     # aptitude reinstall coreutils
>     # ls -l /bin/ls /bin/ls2

Maybe, but I also wanted to bring attention to it. ;-)
Interesting subject, isn't it?
Re: dpkg and hardlinks
Giacomo A. Catenazzi wrote:
> Jerome Warnier wrote:
> > Raphael Hertzog wrote:
> > > On Tue, 24 Mar 2009, Jerome Warnier wrote:
> > > > For files from packages, though, deduplication might be a good idea, as
> > > > dpkg is supposedly the only one to ever modify the files (under /usr for
> > > > example). I don't know however how dpkg treats hardlinks. Does it break
> > > > the hardlink before replacing a file or does it replace the file whatever
> > > > its real nature is?
> > >
> > > IIRC dpkg preserves hardlinks inside a binary package but I don't see how
> > > it could do the same across multiple binary packages.
> >
> > Oh, I didn't expect it to. I just wanted to know its behaviour when it
> > upgrades a package. Before the upgrade, the file is a hardlink (because I
> > hardlinked it manually), then it tries to upgrade the file/hardlink. Does
> > it break the hardlink* before upgrading the file or does it overwrite the
> > file/hardlink and all of its siblings?
>
> Do you really care? (not theoretically, but in normal use).
>
> I would expect that same content will be delivered:
> - by brother packages (same source), thus usually updated at the same time.
> - in documentation (so maybe not so important for your use).
>
> I think the most problem are in files outside dpkg control, i.e. /var and
> /etc. I'm just curious: do you have a list of same content files?
> maybe I'm completely wrong.

Here you are, for /usr on a typical Lenny AMD64 server (generated with
finddup -n from package perforate):
http://glouglou.beeznest.org/~jwarnier/usr-duplicates.list.gz

> ciao
> cate
Re: dpkg and hardlinks
In article 49c8dcdb.90...@beeznest.net you write:
> Before the upgrade, the file is a hardlink (because I hardlinked it
> manually), then it tries to upgrade the file/hardlink. Does it break the
> hardlink* before upgrading the file or does it overwrite the
> file/hardlink and all of its siblings?
>
> * because it knows it is supposed to be a plain file, and it no longer is.

Your language suggests that you don't understand how hard links work. A
hard link to a file *is* a plain file. See

  http://en.wikipedia.org/wiki/Hard_link

for some explanation.

--
Steve McIntyre, Cambridge, UK.                st...@einval.com
Armed with Valor: Centurion represents quality of Discipline, Honor,
Integrity and Loyalty. Now you don't have to be a Caesar to concord the
digital world while feeling safe and proud.
Re: dpkg and hardlinks
On Tue, Mar 24, 2009 at 02:34:09PM +0100, Jerome Warnier jwarn...@beeznest.net wrote:
> Giacomo A. Catenazzi wrote:
> > Jerome Warnier wrote:
> > > Raphael Hertzog wrote:
> > > > On Tue, 24 Mar 2009, Jerome Warnier wrote:
> > > > > For files from packages, though, deduplication might be a good idea, as
> > > > > dpkg is supposedly the only one to ever modify the files (under /usr for
> > > > > example). I don't know however how dpkg treats hardlinks. Does it break
> > > > > the hardlink before replacing a file or does it replace the file whatever
> > > > > its real nature is?
> > > >
> > > > IIRC dpkg preserves hardlinks inside a binary package but I don't see how
> > > > it could do the same across multiple binary packages.
> > >
> > > Oh, I didn't expect it to. I just wanted to know its behaviour when it
> > > upgrades a package. Before the upgrade, the file is a hardlink (because I
> > > hardlinked it manually), then it tries to upgrade the file/hardlink. Does
> > > it break the hardlink* before upgrading the file or does it overwrite the
> > > file/hardlink and all of its siblings?
> >
> > Do you really care? (not theoretically, but in normal use).
> >
> > I would expect that same content will be delivered:
> > - by brother packages (same source), thus usually updated at the same time.
> > - in documentation (so maybe not so important for your use).
> >
> > I think the most problem are in files outside dpkg control, i.e. /var and
> > /etc. I'm just curious: do you have a list of same content files?
> > maybe I'm completely wrong.
>
> Here you are, for /usr on a typical Lenny AMD64 server (generated with
> finddup -n from package perforate):
> http://glouglou.beeznest.org/~jwarnier/usr-duplicates.list.gz

$ zcat usr-duplicates.list.gz | awk '{t+=$1*(NF-2)}END{print t}'
33142129

You would free 33MB. How big is your disk ? Is it worth bothering ?

You can get much more free space than that by reducing the number of
inodes supported by your filesystem:
For instance, on my / fs, that contains /usr, and is only 3GB:
Inode count:              384000
Free inodes:              314133

I will obviously never use that many inodes... Now, consider an inode is
128 bytes (or even 256 in some cases), and do some maths...

Mike
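The maths Mike leaves to the reader can be written out as a short Python calculation. It uses only the figures quoted in his message (314133 free inodes, 128 or 256 bytes per inode, 33142129 bytes of duplicates); treating every free inode as reclaimable is an assumption of this sketch and gives an upper bound, since a real filesystem needs some spare inodes.

```python
FREE_INODES = 314133        # "Free inodes" figure from the message above
DEDUP_SAVINGS = 33142129    # bytes reported by the awk one-liner


def unused_inode_bytes(free_inodes, inode_size):
    """Space occupied by inode table entries that are not in use."""
    return free_inodes * inode_size


if __name__ == "__main__":
    for inode_size in (128, 256):
        unused = unused_inode_bytes(FREE_INODES, inode_size)
        ratio = unused / DEDUP_SAVINGS
        print(f"{inode_size}-byte inodes: {unused} bytes unused "
              f"({ratio:.2f}x the deduplication saving)")
```

With 128-byte inodes the unused entries already amount to about 40 MB, more than the 33 MB that hardlinking the duplicates would free.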
Re: dpkg and hardlinks
Steve McIntyre wrote:
> In article 49c8dcdb.90...@beeznest.net you write:
> > Before the upgrade, the file is a hardlink (because I hardlinked it
> > manually), then it tries to upgrade the file/hardlink. Does it break the
> > hardlink* before upgrading the file or does it overwrite the
> > file/hardlink and all of its siblings?
> >
> > * because it knows it is supposed to be a plain file, and it no longer is.
>
> Your language suggests that you don't understand how hard links work. A
> hard link to a file *is* a plain file. See
>
>   http://en.wikipedia.org/wiki/Hard_link
>
> for some explanation.

Of course I know what a hardlink is. I'm not a native English speaker, and
I even tried to find the right words on the Net before writing, but I
couldn't find better ones.

The question here is: which one is the hardlink to the other? :-P
Re: dpkg and hardlinks
Mike Hommey wrote:
> On Tue, Mar 24, 2009 at 02:34:09PM +0100, Jerome Warnier jwarn...@beeznest.net wrote:
> > Giacomo A. Catenazzi wrote:
> > > Jerome Warnier wrote:
> > > > Raphael Hertzog wrote:
> > > > > On Tue, 24 Mar 2009, Jerome Warnier wrote:
> > > > > > For files from packages, though, deduplication might be a good idea, as
> > > > > > dpkg is supposedly the only one to ever modify the files (under /usr for
> > > > > > example). I don't know however how dpkg treats hardlinks. Does it break
> > > > > > the hardlink before replacing a file or does it replace the file whatever
> > > > > > its real nature is?
> > > > >
> > > > > IIRC dpkg preserves hardlinks inside a binary package but I don't see how
> > > > > it could do the same across multiple binary packages.
> > > >
> > > > Oh, I didn't expect it to. I just wanted to know its behaviour when it
> > > > upgrades a package. Before the upgrade, the file is a hardlink (because I
> > > > hardlinked it manually), then it tries to upgrade the file/hardlink. Does
> > > > it break the hardlink* before upgrading the file or does it overwrite the
> > > > file/hardlink and all of its siblings?
> > >
> > > Do you really care? (not theoretically, but in normal use).
> > >
> > > I would expect that same content will be delivered:
> > > - by brother packages (same source), thus usually updated at the same time.
> > > - in documentation (so maybe not so important for your use).
> > >
> > > I think the most problem are in files outside dpkg control, i.e. /var and
> > > /etc. I'm just curious: do you have a list of same content files?
> > > maybe I'm completely wrong.
> >
> > Here you are, for /usr on a typical Lenny AMD64 server (generated with
> > finddup -n from package perforate):
> > http://glouglou.beeznest.org/~jwarnier/usr-duplicates.list.gz
>
> $ zcat usr-duplicates.list.gz | awk '{t+=$1*(NF-2)}END{print t}'
> 33142129
>
> You would free 33MB. How big is your disk ? Is it worth bothering ?

I'm not an awk god, but isn't that supposed to just be the total size of
the files it could take if deduplicated? In this case, it is not the size
I would reclaim, as there are sometimes up to 4 copies of the same content.

> You can get much more free space than that by reducing the number of
> inodes supported by your filesystem:
> For instance, on my / fs, that contains /usr, and is only 3GB:
> Inode count:              384000
> Free inodes:              314133
>
> I will obviously never use that many inodes... Now, consider an inode is
> 128 bytes (or even 256 in some cases), and do some maths...
>
> Mike
Re: dpkg and hardlinks
Jerome Warnier jwarn...@beeznest.net (Tue 24 Mar 2009 14:58:35 CET):
> The question here is: which one is the hardlink to the other? :-P

You can't distinguish hardlinks from each other - in the sense of
original and link... They are just different directory entries referring
to the same file system object.

Best regards from Dresden/Germany
Viele Grüße aus Dresden
Heiko Schlittermann
--
SCHLITTERMANN.de internet unix support
Heiko Schlittermann HS12-RIPE
gnupg encrypted messages are welcome - key ID: 48D0359B
gnupg fingerprint: 3061 CFBF 2D88 F034 E8D2 7E92 EE4E AC98 48D0 359B
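Heiko's point can be checked from a script: two hardlinked names report the same device and inode number, and removing the name that was created first leaves the data fully intact under the other. A minimal Python sketch, with file names invented for the illustration:

```python
import os
import tempfile


def same_object(p, q):
    """True if both paths are directory entries for the same inode."""
    sp, sq = os.stat(p), os.stat(q)
    return (sp.st_dev, sp.st_ino) == (sq.st_dev, sq.st_ino)


def demo():
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first")    # the name created first
        second = os.path.join(d, "second")  # the name added with link()
        with open(first, "w") as f:
            f.write("data")
        os.link(first, second)

        linked = same_object(first, second)
        link_count = os.stat(second).st_nlink

        # Removing the "original" name does not harm the other one.
        os.unlink(first)
        with open(second) as f:
            survivor = f.read()
        return linked, link_count, survivor


if __name__ == "__main__":
    print(demo())
```

Nothing in the stat results records which name came first, which is why "which one is the hardlink to the other" has no answer.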
Re: dpkg and hardlinks
On Tue, Mar 24, 2009 at 03:11:17PM +0100, Jerome Warnier jwarn...@beeznest.net wrote:
> Mike Hommey wrote:
> > On Tue, Mar 24, 2009 at 02:34:09PM +0100, Jerome Warnier jwarn...@beeznest.net wrote:
> > > Giacomo A. Catenazzi wrote:
> > > > Jerome Warnier wrote:
> > > > > Raphael Hertzog wrote:
> > > > > > On Tue, 24 Mar 2009, Jerome Warnier wrote:
> > > > > > > For files from packages, though, deduplication might be a good idea, as
> > > > > > > dpkg is supposedly the only one to ever modify the files (under /usr for
> > > > > > > example). I don't know however how dpkg treats hardlinks. Does it break
> > > > > > > the hardlink before replacing a file or does it replace the file whatever
> > > > > > > its real nature is?
> > > > > >
> > > > > > IIRC dpkg preserves hardlinks inside a binary package but I don't see how
> > > > > > it could do the same across multiple binary packages.
> > > > >
> > > > > Oh, I didn't expect it to. I just wanted to know its behaviour when it
> > > > > upgrades a package. Before the upgrade, the file is a hardlink (because I
> > > > > hardlinked it manually), then it tries to upgrade the file/hardlink. Does
> > > > > it break the hardlink* before upgrading the file or does it overwrite the
> > > > > file/hardlink and all of its siblings?
> > > >
> > > > Do you really care? (not theoretically, but in normal use).
> > > >
> > > > I would expect that same content will be delivered:
> > > > - by brother packages (same source), thus usually updated at the same time.
> > > > - in documentation (so maybe not so important for your use).
> > > >
> > > > I think the most problem are in files outside dpkg control, i.e. /var and
> > > > /etc. I'm just curious: do you have a list of same content files?
> > > > maybe I'm completely wrong.
> > >
> > > Here you are, for /usr on a typical Lenny AMD64 server (generated with
> > > finddup -n from package perforate):
> > > http://glouglou.beeznest.org/~jwarnier/usr-duplicates.list.gz
> >
> > $ zcat usr-duplicates.list.gz | awk '{t+=$1*(NF-2)}END{print t}'
> > 33142129
> >
> > You would free 33MB. How big is your disk ? Is it worth bothering ?
>
> I'm not an awk god, but isn't that supposed to just be the total size of
> the files it could take if deduplicated? In this case, it is not the size
> I would reclaim, as there are sometimes up to 4 copies of the same content.

the *(NF-2) part takes care of those copies.

Mike
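For readers who, like Jerome, do not read awk fluently, here is the same sum spelled out in Python. It assumes each line of the list is a size in bytes followed by the whitespace-separated paths of the identical files; that layout is inferred from the one-liner, not checked against the actual finddup output, and the sample lines and paths are made up. Under that assumption a line with NF fields names NF-1 files, of which one must be kept, so NF-2 copies are reclaimable.

```python
def reclaimable_bytes(lines):
    """Python equivalent of: awk '{t+=$1*(NF-2)}END{print t}'."""
    total = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line: awk would add 0 here as well
        size = int(fields[0])      # $1, the size of one copy
        redundant = len(fields) - 2  # NF-2: all copies except the one kept
        total += size * redundant
    return total


# Hypothetical sample in the assumed "size path path ..." layout.
sample = [
    "1000 /usr/share/doc/x/COPYING /usr/share/doc/y/COPYING",
    "500 /usr/lib/a /usr/lib/b /usr/lib/c /usr/lib/d",
]

if __name__ == "__main__":
    print(reclaimable_bytes(sample))
```

In the sample, the second line has four copies of a 500-byte file, so it contributes three reclaimable copies rather than one, which is the case Jerome was worried about.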
Re: dpkg and hardlinks
Jerome Warnier wrote:
> Peter Samuelson wrote:
> > [Jerome Warnier]
> > > I don't know however how dpkg treats hardlinks. Does it break the
> > > hardlink before replacing a file or does it replace the file whatever
> > > its real nature is?
> >
> > You know, given the time it takes to type a 20-line email, including
> > finding the appropriate Wikipedia article to link to, it would have been a
> > lot faster to just try it.
> >
> >     # ln /bin/ls /bin/ls2
> >     # aptitude reinstall coreutils
> >     # ls -l /bin/ls /bin/ls2
>
> Maybe, but I also wanted to bring attention to it. ;-)

I'm curious as to why no one is looking at the index node numbers
themselves.

[jh...@chao:/bin]% sudo ln ls ls2
[jh...@chao:/bin]% ls -il ls{,2}
7342643 -rwxr-xr-x 2 root root 101992 Apr 4 2008 ls
7342643 -rwxr-xr-x 2 root root 101992 Apr 4 2008 ls2
[jh...@chao:/bin]% sudo aptitude reinstall coreutils
. . .
[jh...@chao:/bin]% ls -il ls{,2}
7350701 -rwxr-xr-x 1 root root 101992 Apr 4 2008 ls
7342643 -rwxr-xr-x 1 root root 101992 Apr 4 2008 ls2

ls2 kept the old index node, but ls gets a brand new index node, thus
showing that, indeed, dpkg will break hardlinks upon upgrade.

--
John H. Robinson, IV          jaq...@debian.org
                                                                 http
WARNING: I cannot be held responsible for the above,        sbih.org ( )(:[
as apparently my cats have learned how to type.          spiders.html
Re: dpkg and hardlinks
On Tue, 24 Mar 2009, John H. Robinson, IV wrote:
> I'm curious as to why no one is looking at the index node numbers
> themselves.

Because the second field of ls -l is hardlink count and is enough alone to
conclude:

7342643 -rwxr-xr-x 2 root root 101992 Apr 4 2008 ls
                   ^
vs

7350701 -rwxr-xr-x 1 root root 101992 Apr 4 2008 ls
                   ^

Cheers,
--
Raphaël Hertzog

Contribute to Debian and win a copy of the Cahier de l'admin Debian Lenny:
http://www.ouaza.com/wp/2009/03/02/contribuer-a-debian-gagner-un-livre/