Re: Non-identical files with identical md5sums on Debian systems?
Dear Peter,

On Wednesday, 07.08.2013, at 00:03 +0100, peter green wrote:
> The bottom line is that under practical conditions the only way you are going to see two files with the same md5 is if someone went out of their way to create them and send them to you.

Thank you very much for this insightful analysis; now I feel more confident :)

Best regards,

 - Fabian

-- 
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/1375858850.24465.17.camel@kff50
Re: Non-identical files with identical md5sums on Debian systems?
> I do occasionally check for identical files on different systems by comparing their md5sums. So, just out of interest, could someone tell me (how to find out) how many non-identical files with identical md5sums there are on a typical (say, amd64) Debian system?

Assuming the output of MD5 is a random, uncorrelated 128-bit binary number, and making a couple of other approximations, we can estimate the expected number of collisions with the formula

((n*(n-1))/2)/(2^128)

where n is the number of unique files on your system: there are n*(n-1)/2 distinct pairs of files, and each pair collides with probability 1/2^128.

I used the command

cat /var/lib/dpkg/info/*.list | wc -l

to get an approximation of the number of Debian files on my main Debian system, which has lots of stuff installed. I will assume all these files are unique.

plugwash@debian:~$ cat /var/lib/dpkg/info/*.list | wc -l
304431

So the expected number of MD5 collisions would be approximately

((304431*304430)/2)/(2^128)

Plugging that into Octave gives us:

octave:1> ((304431*304430)/2)/(2^128)
ans = 1.3618e-28

The bottom line is that under practical conditions the only way you are going to see two files with the same md5 is if someone went out of their way to create them and send them to you.

-- 
Archive: http://lists.debian.org/520180ac.90...@p10link.net
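The Octave calculation above can be reproduced in a few lines of Python. This is a minimal sketch that makes the same assumption as the message (MD5 digests behave like independent, uniformly random 128-bit values); the function name is illustrative only. The expected number of collisions is the number of unordered pairs, n(n-1)/2, divided by 2^128, and Python's exact integers mean only the final division is rounded.

```python
def expected_md5_collisions(n: int, bits: int = 128) -> float:
    """Expected number of colliding pairs among n distinct files.

    Assumes each digest is an independent, uniformly random `bits`-bit value:
    there are n*(n-1)/2 pairs, each colliding with probability 1/2**bits.
    """
    pairs = n * (n - 1) // 2  # exact integer count of unordered pairs
    return pairs / 2**bits    # true division of two ints yields a float


if __name__ == "__main__":
    # n taken from the `cat /var/lib/dpkg/info/*.list | wc -l` output above
    print(f"{expected_md5_collisions(304431):.4e}")
```

With n = 304431 this prints 1.3618e-28, the same value Octave reports.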
Re: Non-identical files with identical md5sums on Debian systems?
Hello,

Russ Allbery <r...@debian.org> writes:
> Fabian Greffrath <fab...@greffrath.com> writes:
>> I do occasionally check for identical files on different systems by comparing their md5sums. So, just out of interest, could someone tell me (how to find out) how many non-identical files with identical md5sums there are on a typical (say, amd64) Debian system?
>
> Unless you have a collection of MD5 collision attacks, or have installed a package that includes a sample MD5 collision, the chances are quite good that the answer is zero.
>
> MD5 is no longer considered cryptographically strong, but that doesn't mean it's not a fairly random 128-bit hash. You need a *lot* of files before even the birthday paradox will give you much likelihood of an MD5 collision that wasn't intentionally constructed.

Exactly. And why don't you run an experiment, Fabian? I guess you have a typical Debian system at hand, and calculating the MD5 hashes of all distribution files costs only a few IOPS and CPU cycles ;)

Regards
hmw

PS: Let us see the results ;)

-- 
Archive: http://lists.debian.org/8761vkljwm@luisa.c0t0d0s0.de
Re: Non-identical files with identical md5sums on Debian systems?
On Mon, Aug 05, 2013 at 06:44:49AM +0200, Fabian Greffrath wrote:
> Hi all,
>
> I do occasionally check for identical files on different systems by comparing their md5sums. So, just out of interest, could someone tell me (how to find out) how many non-identical files with identical md5sums there are on a typical (say, amd64) Debian system?

How about this?

#!/bin/sh
# Collect "md5  path" lines from all packages, dropping exact duplicates.
cat /var/lib/dpkg/info/*.md5sums | sort -u > md5sums-files.txt
# Keep only MD5 values that are listed for more than one path.
awk '{print $1}' md5sums-files.txt | uniq -c | awk '$1 > 1 {print $2}' > dup.txt
while read -r md5; do
  grep "^$md5" md5sums-files.txt | sed -re 's/^[a-f0-9]+[[:space:]]+//' | (
    # Paths in *.md5sums are relative to /, hence "/$file".
    # Compare the SHA-256 of every file sharing this MD5 with the first one.
    read -r file
    shasum1=$(sha256sum "/$file" | awk '{print $1}')
    while read -r file; do
      if [ "$(sha256sum "/$file" | awk '{print $1}')" != "$shasum1" ]; then
        echo "$md5 /$file"
      fi
    done
  )
done < dup.txt

I tried running it and didn't find anything on my Ubuntu installation.

-- 
Kind regards,
Loong Jin
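For readers who prefer Python, here is a rough equivalent of the shell script above. It is a sketch, not a drop-in replacement: it assumes the usual "<md5>  <path>" layout of dpkg's *.md5sums files with paths relative to /, and the function names are hypothetical. Like the script, it groups paths by their recorded MD5 and reports a group only when the SHA-256 of the files on disk disagree.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def parse_md5sums(lines):
    """Yield (md5_hex, relative_path) from dpkg-style "*.md5sums" lines.

    Assumes the "<md5>  <path>" layout, with paths relative to /.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        digest, _, path = line.partition(" ")
        yield digest, path.lstrip(" ")


def sha256_file(path, chunk_size=1 << 16):
    """SHA-256 of a file, read in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_md5_collisions(entries, root="/"):
    """Return [(md5, paths)] for every MD5 shared by files with differing SHA-256."""
    by_md5 = defaultdict(set)
    for digest, rel in entries:
        by_md5[digest].add(rel)

    collisions = []
    for digest in sorted(by_md5):
        rels = by_md5[digest]
        if len(rels) < 2:
            continue  # an MD5 listed for only one path cannot collide
        strong = set()
        for rel in rels:
            try:
                strong.add(sha256_file(Path(root) / rel))
            except OSError:
                pass  # file removed or unreadable since installation
        if len(strong) > 1:  # same recorded MD5, different content
            collisions.append((digest, sorted(rels)))
    return collisions


def scan_dpkg_info(info_dir="/var/lib/dpkg/info", root="/"):
    """Read every *.md5sums file in info_dir and report MD5 collisions."""
    entries = []
    for name in sorted(Path(info_dir).glob("*.md5sums")):
        with open(name, encoding="utf-8", errors="surrogateescape") as f:
            entries.extend(parse_md5sums(f))
    return find_md5_collisions(entries, root)
```

On a Debian system, `scan_dpkg_info()` returns an empty list when nothing is found. Like the shell version, it trusts the MD5 values recorded by dpkg, so a locally modified file that shares a recorded MD5 with another path would show up as a false positive.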
Re: Non-identical files with identical md5sums on Debian systems?
On Sun, Aug 04, 2013 at 10:24:59PM -0700, Vincent Cheng wrote:
> On Sun, Aug 4, 2013 at 9:44 PM, Fabian Greffrath <fab...@greffrath.com> wrote:
>> I do occasionally check for identical files on different systems by comparing their md5sums. So, just out of interest, could someone tell me (how to find out) how many non-identical files with identical md5sums there are on a typical (say, amd64) Debian system?
>
> The closest thing to what you want may be dedup.debian.net, but I don't think it lets you filter out non-identical files.

Indeed, this task can be solved with the software backing dedup.debian.net. The general assumption is that sha512 is collision-free. I can give a rough idea of how to do that:

1) Obtain the software.
2) Modify schema.sql to add md5 to the functions table.
3) Modify importpkg.py to record md5 hashes.
4) Follow the steps in README to import a local Debian mirror. (This takes about 7 hours on a fast 8-core box and 3 days on a slower single-core one.)
5) Look for files that have the same md5 hash but a different sha512 hash.

Something like this SQL query will give you an answer (untested):

SELECT h1.cid, h2.cid
FROM hash AS h1
JOIN hash AS h2 ON h1.fid = h2.fid AND h1.hash = h2.hash
JOIN hash AS h3 ON h1.cid = h3.cid
JOIN hash AS h4 ON h2.cid = h4.cid AND h3.fid = h4.fid
JOIN function AS f1 ON h1.fid = f1.id
JOIN function AS f3 ON h3.fid = f3.id
WHERE h3.hash != h4.hash AND f1.name = 'md5' AND f3.name = 'sha512';

It returns keys into the content table, which you can use to look up the actual filenames and packages.

In case you have any questions, just ask (by mail or in #-qa on OFTC).

Helmut

-- 
Archive: http://lists.debian.org/20130805084636.ga10...@alf.mars
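To see what the (untested) query in step 5 does, it can be run against a tiny in-memory SQLite database. This is only a sketch under loud assumptions: the two tables below contain just the columns the query references, the real dedup.debian.net schema is assumed to differ, and SQLite is used here simply because it ships with Python. The sketch also adds `h1.cid < h2.cid`, so that each colliding pair is reported once rather than in both orders, plus an ORDER BY for stable output.

```python
import sqlite3

# Toy schema with only the tables and columns the query references; the real
# dedup.debian.net schema.sql is assumed to differ.
SCHEMA = """
CREATE TABLE function (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE hash (cid INTEGER, fid INTEGER, hash TEXT);
"""

# The query from the message, plus "h1.cid < h2.cid" (report each pair once)
# and ORDER BY (deterministic output).
QUERY = """
SELECT h1.cid, h2.cid
FROM hash AS h1
JOIN hash AS h2 ON h1.fid = h2.fid AND h1.hash = h2.hash
JOIN hash AS h3 ON h1.cid = h3.cid
JOIN hash AS h4 ON h2.cid = h4.cid AND h3.fid = h4.fid
JOIN function AS f1 ON h1.fid = f1.id
JOIN function AS f3 ON h3.fid = f3.id
WHERE h3.hash != h4.hash AND f1.name = 'md5' AND f3.name = 'sha512'
  AND h1.cid < h2.cid
ORDER BY h1.cid, h2.cid
"""


def md5_collision_pairs(rows):
    """rows: (content_id, function_name, hex_digest) tuples; returns cid pairs."""
    rows = list(rows)
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        names = sorted({name for _, name, _ in rows})
        fid = {name: i for i, name in enumerate(names, start=1)}
        con.executemany("INSERT INTO function (id, name) VALUES (?, ?)",
                        [(i, name) for name, i in fid.items()])
        con.executemany("INSERT INTO hash (cid, fid, hash) VALUES (?, ?, ?)",
                        [(cid, fid[name], digest) for cid, name, digest in rows])
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```

With made-up digests where contents 1, 2 and 4 share an MD5 but only 1 and 4 share a SHA-512, the sketch reports the pairs (1, 2) and (2, 4).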
Re: Non-identical files with identical md5sums on Debian systems?
On Sun, Aug 04, 2013 at 10:21:09PM -0700, Russ Allbery wrote:
> Fabian Greffrath <fab...@greffrath.com> writes:
>> I do occasionally check for identical files on different systems by comparing their md5sums. So, just out of interest, could someone tell me (how to find out) how many non-identical files with identical md5sums there are on a typical (say, amd64) Debian system?
>
> Unless you have a collection of MD5 collision attacks, or have installed a package that includes a sample MD5 collision, the chances are quite good that the answer is zero.
>
> MD5 is no longer considered cryptographically strong, but that doesn't mean it's not a fairly random 128-bit hash. You need a *lot* of files before even the birthday paradox will give you much likelihood of an MD5 collision that wasn't intentionally constructed.

Let's assume every hard drive produced so far in human history is combined into a single RAID0 array, formatted with a typical filesystem without an inode limit, and then filled with small files. If my estimate is correct, thanks to the birthday paradox there is around a 0.001% chance that there will be at least one non-constructed MD5 collision.

Also, there is no practical preimage attack against MD5; collision attacks are much less dangerous, because the attacker would first need to give you a legitimate version of the file she wants to replace.

-- 
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ

-- 
Archive: http://lists.debian.org/20130805100834.ga2...@angband.pl
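The storage estimate in the message is not checked here. The sketch below only shows how the usual birthday approximation connects a target collision probability p to a number of files n, assuming MD5 digests behave like uniform 128-bit values: p ≈ 1 - exp(-n(n-1)/2^129), and conversely n ≈ sqrt(2^129 · ln(1/(1-p))). The function names are illustrative.

```python
import math


def collision_probability(n, bits=128):
    """P(at least one collision) among n random `bits`-bit digests.

    Birthday approximation: 1 - exp(-n(n-1) / 2^(bits+1)); expm1 keeps
    precision when the probability is tiny.
    """
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))


def files_for_probability(p, bits=128):
    """Approximate number of files needed to reach collision probability p."""
    # -log1p(-p) == ln(1 / (1 - p)), computed accurately for small p
    return math.sqrt(2 ** (bits + 1) * -math.log1p(-p))
```

For p = 0.001% (1e-5), `files_for_probability` gives roughly 8.2e16 files, and for a 50% chance about 2.2e19, which illustrates why accidental collisions are not a practical concern.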
Re: Non-identical files with identical md5sums on Debian systems?
Russ Allbery writes ("Re: Non-identical files with identical md5sums on Debian systems?"):
> Unless you have a collection of MD5 collision attacks, or have installed a package that includes a sample MD5 collision, [...]

For the sake of sanity of our (still) MD5-based tools, I hope that no-one uploads into our archive a package with an example MD5 collision. (Unless the colliding files are wrapped up somehow, to protect our infrastructure from any untoward behaviour.)

Ian.

-- 
Archive: http://lists.debian.org/20991.42365.739458.834...@chiark.greenend.org.uk
Re: Non-identical files with identical md5sums on Debian systems?
On Mon, Aug 05, 2013 at 02:15:41PM +0100, Ian Jackson wrote:
> Russ Allbery writes ("Re: Non-identical files with identical md5sums on Debian systems?"):
>> Unless you have a collection of MD5 collision attacks, or have installed a package that includes a sample MD5 collision, [...]
>
> For the sake of sanity of our (still) MD5-based tools, I hope that no-one uploads into our archive a package with an example MD5 collision. (Unless the colliding files are wrapped up somehow, to protect our infrastructure from any untoward behaviour.)

What in our infrastructure would break on an MD5 collision anyway? The closest thing I could think of is dedup.debian.net, but that appears to use SHA512.

-- 
Kind regards,
Loong Jin
Re: Non-identical files with identical md5sums on Debian systems?
Fabian Greffrath <fab...@greffrath.com> writes:
> I do occasionally check for identical files on different systems by comparing their md5sums. So, just out of interest, could someone tell me (how to find out) how many non-identical files with identical md5sums there are on a typical (say, amd64) Debian system?

Unless you have a collection of MD5 collision attacks, or have installed a package that includes a sample MD5 collision, the chances are quite good that the answer is zero.

MD5 is no longer considered cryptographically strong, but that doesn't mean it's not a fairly random 128-bit hash. You need a *lot* of files before even the birthday paradox will give you much likelihood of an MD5 collision that wasn't intentionally constructed.

-- 
Russ Allbery (r...@debian.org) http://www.eyrie.org/~eagle/

-- 
Archive: http://lists.debian.org/87li4gogqi@windlord.stanford.edu
Re: Non-identical files with identical md5sums on Debian systems?
On Sun, Aug 4, 2013 at 9:44 PM, Fabian Greffrath <fab...@greffrath.com> wrote:
> Hi all,
>
> I do occasionally check for identical files on different systems by comparing their md5sums. So, just out of interest, could someone tell me (how to find out) how many non-identical files with identical md5sums there are on a typical (say, amd64) Debian system?

The closest thing to what you want may be dedup.debian.net, but I don't think it lets you filter out non-identical files.

Regards,
Vincent

-- 
Archive: http://lists.debian.org/caczd_tcqeftp3si47fzhgtfejf0zwz-ys6_kaaee2jvwnse...@mail.gmail.com