Re: How to locate all duplicate files?
I would recommend installing fdupes. It should be available in your package repository, and it does exactly what you are looking to do.

Brian Cluff

On 04/21/2010 12:53 PM, j...@actionline.com wrote:
> What command syntax can I use to locate all duplicate files (filenames) on
> my system? Or, more specifically, within any specified directory on the
> system?
>
> Also, how can I tell which duplicates have identical contents and which
> duplicates have different content (or at least different file sizes)?

---
PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
Re: How to locate all duplicate files?
Try: http://netdial.caribe.net/~adrian2/programs/fdupes.html

Or, to check the files in /path_2 against the checksums of the files in /path_1:

    : > /tmp/MD5SUMs
    cd /path_1
    find . -type f | sort | while IFS= read -r FI; do md5sum "$FI" >> /tmp/MD5SUMs; done
    cd /path_2
    md5sum -c /tmp/MD5SUMs | grep -v 'OK$'

Or, to compare only the file names in the two trees:

    cd /path_1
    find . -type f | sort > /tmp/path_1.files
    cd /path_2
    find . -type f | sort > /tmp/path_2.files
    diff /tmp/path_1.files /tmp/path_2.files

YMMV

ET

PS: If you have any question you will get any answer. :)
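The checksum approach above compares two directory trees. For duplicates inside a single tree, a rough sketch along the same lines might look like the following. The function name dup_contents is made up for illustration, and the -w and --all-repeated options assume GNU uniq.

```shell
#!/bin/bash
# Sketch only: list groups of files under a directory whose contents match.
# dup_contents is a hypothetical helper name, not part of any tool.
# Assumes GNU coreutils (md5sum, and uniq with -w / --all-repeated).
dup_contents() {
    local dir=${1:-.}
    # md5sum prints "<32 hex digits>  <path>" for each regular file.
    find "$dir" -type f -exec md5sum {} + |
        sort |
        # Compare only the first 32 characters (the digest) and print every
        # line whose digest occurs more than once, with a blank line
        # between groups.
        uniq -w32 --all-repeated=separate
}
```

Calling dup_contents /some/dir prints one block of lines per set of identical files. Files that have no twin are not listed.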
Re: How to locate all duplicate files?
OK, you have several questions...

- First, a simple script to find all duplicate filenames.

The problem is that you need to get a list of all files on your system, then compare the names, minus the path. So I would try something like this (not fully tested):

    #!/bin/bash
    find -P / -type f > /tmp/files.txt
    sed -i -e 's#.*/\(.*\)$#\1#' /tmp/files.txt
    sort -o /tmp/files1.txt /tmp/files.txt
    rm /tmp/files.txt
    uniq -D /tmp/files1.txt /tmp/files.txt
    rm /tmp/files1.txt

My logic:

First, get a list of all files, ignoring symlinks (which are duplicates by definition) and looking only at regular files.
Next, strip the path from the names in the temp file.
Now that you only have filenames, sort the list into a second temp file.
Delete the original file.
Now seek all duplicates, and place those names back into the original file.
Delete the second temp file.

Now you should have a list of all duplicate filenames in /tmp/files.txt.

- How can I tell if they are just duplicate filenames, or if they are actually duplicate files?

For each filename, find all copies of the file with the find command and run them through sha1sum, like so, with $name set to one of the names from the list:

    find /tmp -type f -name "$name" -exec sha1sum {} +

Files with the same sha1sum should have duplicate contents.

You may need to check my syntax on some of this, but it should get the job done.

Kevin Fries
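The two steps above (list the duplicated names, then checksum every copy) can also be sketched without temp files. This is only an illustration under a few assumptions: dup_names and hash_dup_names are made-up function names, -printf is a GNU find extension, and file names are assumed to contain no newlines or glob characters such as * or [.

```shell
#!/bin/bash
# Sketch only; dup_names and hash_dup_names are hypothetical helper names.

# Print each file name (without its path) that occurs more than once.
dup_names() {
    local dir=${1:-.}
    # -P: never follow symlinks; %f: the name with leading directories removed.
    find -P "$dir" -type f -printf '%f\n' | sort | uniq -d
}

# For each duplicated name, print the sha1sum of every copy.
hash_dup_names() {
    local dir=${1:-.} name
    dup_names "$dir" | while IFS= read -r name; do
        echo "== $name"
        # -name takes a glob pattern, so names containing * ? [ may
        # match more (or less) than intended.
        find -P "$dir" -type f -name "$name" -exec sha1sum {} + | sort
    done
}
```

In the output of hash_dup_names /some/dir, copies listed under the same name with the same digest have identical contents, and copies with different digests only share a name.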
Re: How to locate all duplicate files? Thanks
Thanks Mike. Very helpful.

> I don't have time to write the script right now,
> but something simple like:
>
> find . -type f | xargs ls -l | awk '{print $5, $9}'
>
> then man on sort and uniq. You could also just toss
> the output into a spreadsheet.
Re: How to locate all duplicate files?
I don't have time to write the script right now, but something simple like:

    find . -type f | xargs ls -l | awk '{print $5, $9}'

(with the usual GNU ls -l layout, field 5 is the size and field 9 is the name), then man on sort and uniq. You could also just toss the output into a spreadsheet.

-Mike
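The size-plus-name idea above, followed by sort and uniq, could be sketched like this. It is an illustration rather than a tested tool: the function names are made up, and -printf assumes GNU find, which avoids parsing ls output. A matching size only suggests matching contents; a checksum is still needed to confirm.

```shell
#!/bin/bash
# Sketch only; both function names are hypothetical.

# Print "size name" pairs that occur more than once, i.e. files that share
# both a name and a size (likely, but not certainly, identical).
size_name_dups() {
    local dir=${1:-.}
    # %s = size in bytes, %f = name without leading directories (GNU find).
    find "$dir" -type f -printf '%s %f\n' | sort | uniq -d
}

# Print names that occur with at least two different sizes, i.e. files
# that share a name but cannot have the same contents.
same_name_diff_size() {
    local dir=${1:-.}
    find "$dir" -type f -printf '%s %f\n' |
        sort -u |            # keep one line per distinct size/name pair
        cut -d' ' -f2- |     # drop the size, keep the name
        sort | uniq -d       # names still repeated had more than one size
}
```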