avilella wrote:
> I would like to compare ~20 files that are mostly the same, but some
> of them have 2-3 different lines in a couple of places. I can do a
> diff for every pair, but I would like to have one representation for
> all files that is a consensus file then with extra tagged lines for
> the differences. Is there any tool that does that? What would people
> recommend?

I don't know of any tool that does that directly, and I think diff'ing
every pair could generate a lot of messy output.

What I tend to do in those types of situations is to run md5sum (or
any of the *sum utilities) on the entire list of files and then sort
by the signature. Files that are identical will have identical
signatures and will be grouped together. Files that are different will
be listed apart from them.

  $ md5sum ./* | sort -k1,1
  118721e880107e6bac4d8b6f42c472d4  ./5
  118721e880107e6bac4d8b6f42c472d4  ./6
  29c450ee7a45cf7aa4e8ebe165925fd5  ./7
  3e234925eeb1b48960dcbf43050f4b23  ./1
  3e234925eeb1b48960dcbf43050f4b23  ./2
  3e234925eeb1b48960dcbf43050f4b23  ./3
  3e234925eeb1b48960dcbf43050f4b23  ./4

The 'uniq -c' utility can then count how many files share each
signature.

  $ md5sum ./* | sort -k1,1 | awk '{print$1}' | uniq -c
        2 118721e880107e6bac4d8b6f42c472d4
        1 29c450ee7a45cf7aa4e8ebe165925fd5
        4 3e234925eeb1b48960dcbf43050f4b23

Sorting that output numerically puts the signature with the most
identical copies first and the ones with fewer instances after it.

  $ md5sum ./* | sort -k1,1 | awk '{print$1}' | uniq -c | sort -nr
        4 3e234925eeb1b48960dcbf43050f4b23
        2 118721e880107e6bac4d8b6f42c472d4
        1 29c450ee7a45cf7aa4e8ebe165925fd5

Perhaps something like that might be useful for you as well?

Bob
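
One possible way to take the checksum grouping a step further, toward the "consensus plus differences" view asked about: treat the most common signature as the consensus and diff each remaining file against one copy of it. This is only a rough sketch, not an existing tool. The function name diff_against_majority is made up for illustration, and it assumes GNU md5sum, a directory containing only regular files, and file names without newlines or backslashes. If two signatures tie for most common, which one is picked is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper (not a standard utility): diff every file in a
# directory against one copy of the most common version.
# Assumes GNU md5sum and plain file names (no newlines or backslashes).
diff_against_majority() {
    dir=${1:-.}

    # Checksum every file and sort by signature, as in the pipeline above.
    sums=$(cd "$dir" && md5sum ./* 2>/dev/null | sort -k1,1)
    [ -n "$sums" ] || return 1

    # The signature with the highest count is treated as the consensus.
    major_sum=$(printf '%s\n' "$sums" | awk '{print $1}' | uniq -c |
        sort -nr | awk 'NR == 1 {print $2}')

    # Pick the first file carrying that signature as the reference copy.
    major_file=$(printf '%s\n' "$sums" | while read -r sum name; do
        if [ "$sum" = "$major_sum" ]; then
            printf '%s\n' "$name"
            break
        fi
    done)

    printf 'consensus: %s\n' "$major_file"

    # Diff each file whose signature differs from the consensus.
    printf '%s\n' "$sums" | while read -r sum name; do
        [ "$sum" = "$major_sum" ] && continue
        printf '=== %s differs from %s\n' "$name" "$major_file"
        # diff exits non-zero when files differ; that is expected here.
        diff "$dir/$major_file" "$dir/$name" || true
    done
}
```

Running diff_against_majority . in the directory holding the files prints the chosen reference file and then one diff per file that differs from it. Files that are identical to each other but differ from the consensus are each diffed separately, so the same diff may appear more than once.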