Re: How to locate all duplicate files?

2010-04-22 Thread Brian Cluff
I would recommend installing fdupes.  It should be available in your 
package repository and it does exactly what you are looking to do.
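
For reference, a minimal sketch of how fdupes is typically invoked. The wrapper name list_dupes is made up for illustration, and the directory is whatever tree you want to scan. It assumes fdupes is already installed; -r recurses into subdirectories and -S prints the size of each duplicate set.

```shell
# list_dupes DIR: hypothetical wrapper around fdupes (assumes fdupes is installed).
list_dupes() {
    # -r: recurse into subdirectories
    # -S: show the size of the files in each duplicate set
    fdupes -r -S "$1"
}
```

Something like `list_dupes ~/Documents` would then print groups of files with identical contents, one group per blank-line-separated block.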


Brian Cluff

On 04/21/2010 12:53 PM, j...@actionline.com wrote:


What command syntax can I use to locate all duplicate files (filenames) on
my system?  Or, more specifically, within any specified directory on the
system?

Also, how can I tell which duplicates have identical contents and which
duplicates have different content (or at least different file sizes)?



---
PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss





Re: How to locate all duplicate files?

2010-04-21 Thread kitepi...@kitepilot.com

Try:
http://netdial.caribe.net/~adrian2/programs/fdupes.html 


Or:

: >/tmp/MD5SUMs    # start with an empty checksum list

cd /path_1
find . -type f|sort|while IFS= read -r FI;do md5sum "$FI">>/tmp/MD5SUMs;done
cd /path_2
md5sum -c /tmp/MD5SUMs|grep -v 'OK$'


OR:
cd /path_1
find . -type f|sort>/tmp/path_1.files
cd /path_2
find . -type f|sort>/tmp/path_2.files
diff /tmp/path_1.files /tmp/path_2.files
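
Note that diff shows the paths that differ between the two trees. If you want the paths that exist in both, comm can do that on the same sorted lists. A rough sketch, assuming bash and mktemp; the function name common_paths is made up for illustration:

```shell
# common_paths DIR1 DIR2: print relative paths of regular files present in both trees.
common_paths() {
    local a b
    a=$(mktemp)
    b=$(mktemp)
    # List each tree relative to its own root so the paths are comparable.
    (cd "$1" && find . -type f | sort) > "$a"
    (cd "$2" && find . -type f | sort) > "$b"
    # -12 suppresses lines unique to either list, leaving only the common ones.
    comm -12 "$a" "$b"
    rm -f "$a" "$b"
}
```

This only compares names, not contents, so it pairs naturally with the md5sum approach above.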
YMMV
ET 

PS: If you have any question you will get any answer.  :) 





j...@actionline.com writes: 



What command syntax can I use to locate all duplicate files (filenames) on
my system?  Or, more specifically, within any specified directory on the
system? 


Also, how can I tell which duplicates have identical contents and which
duplicates have different content (or at least different file sizes)? 

 




Re: How to locate all duplicate files?

2010-04-21 Thread Kevin Fries
OK, you have several questions...

- First, a simple script to find all duplicate filenames.
The problem is that you need a list of all files on your system, then compare
the names, minus the path.  So I would try something like this (not fully
tested):

#!/bin/bash

find -P / -type f > /tmp/files.txt
sed -i -e 's#.*/\(.*\)$#\1#' /tmp/files.txt
sort /tmp/files.txt > /tmp/files1.txt
rm /tmp/files.txt
uniq -D /tmp/files1.txt > /tmp/files.txt
rm /tmp/files1.txt

My logic:
  First get a list of all files ignoring symlinks (which are duplicate by
definition) looking at only regular files.
  Next strip the path from the names in the temp file
  Now that you only have filenames, sort the list into a temp file
  Delete the original file
  Now, seek all duplicates, and place those names back into the original
file
  Delete the second temp file

Now you should have a list of all dup filenames
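
The same steps can also be sketched as a single pipeline with no temp files. This assumes GNU find (for -printf) and filenames without embedded newlines; the function name dup_names is made up for illustration.

```shell
# dup_names DIR: print each filename that occurs more than once under DIR.
dup_names() {
    # -P: do not follow symlinks; -type f: regular files only.
    # %f prints the name with the leading directories removed.
    # uniq -d prints one line per name that appears more than once.
    find -P "$1" -type f -printf '%f\n' | sort | uniq -d
}
```

For example, `dup_names /home` would list every name that is used by two or more files somewhere under /home.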

- How can I tell if they are just duplicate filenames, or if they are
actually duplicate files?
For each filename, find all copies of the file with the find command and
run them through sha1sum, like so (with the filename to look up in $name):

find /tmp -name "$name" -type f -exec sha1sum {} +

Files with the same sha1sum should have identical contents.
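
To avoid comparing checksums by eye, the output can be sorted and filtered so that only the matching ones remain. A rough sketch, assuming GNU uniq (for -w and -D); the function name same_content and its arguments are made up for illustration.

```shell
# same_content DIR NAME: checksum every regular file called NAME under DIR
# and print only the lines whose checksum occurs more than once.
same_content() {
    find "$1" -type f -name "$2" -exec sha1sum {} + |
        sort |
        uniq -w40 -D   # compare only the 40 hex digits of the SHA-1 at the start of each line
}
```

Any copy that does not show up in the output has contents that differ from all the others.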

You may need to check my syntax on some of this, but it should get the job
done.

Kevin Fries
On Wed, Apr 21, 2010 at 1:53 PM,  wrote:

>
> What command syntax can I use to locate all duplicate files (filenames) on
> my system?  Or, more specifically, within any specified directory on the
> system?
>
> Also, how can I tell which duplicates have identical contents and which
> duplicates have different content (or at least different file sizes)?
>
>
>
>

Re: How to locate all duplicate files? Thanks

2010-04-21 Thread joe

Thanks Mike.  Very helpful.


> I don't have time to write the script right now,
> but something simple like this, which prints each file's size and name:
>
>  find . -type f -print0 | xargs -0 ls -l | awk '{print $5, $9}'
>
> then man on sort and uniq. You could also just toss
> the output into a spreadsheet.





Re: How to locate all duplicate files?

2010-04-21 Thread Mike Ballon
I don't have time to write the script right now, but something simple
like this, which prints each file's size and name:

 find . -type f -print0 | xargs -0 ls -l | awk '{print $5, $9}'

then man on sort and uniq. You could also just toss the output into a
spreadsheet.
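
One possible way to fill in the sort step, as a sketch only. It assumes GNU find (for -printf) and skips ls entirely, since the column positions in ls -l output vary with the date format. The function name size_and_name is made up for illustration.

```shell
# size_and_name DIR: one line per regular file, "SIZE<TAB>NAME",
# sorted by name and then numerically by size.
size_and_name() {
    find "$1" -type f -printf '%s\t%f\n' |
        sort -t "$(printf '\t')" -k2,2 -k1,1n
}
```

Files that share a name end up on adjacent lines, so differing sizes are easy to spot, and piping the result through `uniq -d` leaves only the name and size pairs that occur more than once.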

-Mike

On Wed, Apr 21, 2010 at 3:53 PM,  wrote:

>
> What command syntax can I use to locate all duplicate files (filenames) on
> my system?  Or, more specifically, within any specified directory on the
> system?
>
> Also, how can I tell which duplicates have identical contents and which
> duplicates have different content (or at least different file sizes)?
>
>
>
>