Hi Peter,

> My thoughts were to do an 'Ls -something' piped into a file, then
> perhaps if I could do a sort on that from the end of each line, I
> would end up with duplicates adjacent, which I could then investigate
> and clean up as required. 

What do you mean by `duplicate'?  If you mean you want to group all
files with the same name, say `README', together, regardless of their
possibly differing size or content, then

    find foo bar -type f -printf '%h %f\n' |
    rev | sort | rev |    # sort each line from its end so same names group
    uniq -f 1 -D          # skip the directory field, print every repeat

will list the directory path and filename for all files under `foo' and
`bar' that occur more than once by name, e.g. README is a prime
contender.
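
For example, with a made-up pair of trees where foo/doc/README and
bar/README both exist, those two lines come out adjacent, something
like:

    foo/doc README
    bar README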

It won't work well with paths or filenames containing spaces or other
weird characters, but then that's why you shouldn't have them.  :-)
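
If you are stuck with such names, one workaround is to have find emit
a tab-separated line, filename first, and let awk do the grouping.
This is only a sketch along the same lines as above, same example
directories, GNU find assumed, and it still breaks on names containing
tabs or newlines:

    find foo bar -type f -printf '%f\t%h\n' |
    sort |
    awk -F'\t' '{
        n[$1]++                            # count sightings of each name
        if (n[$1] == 1) first[$1] = $0     # stash the first one
        else if (n[$1] == 2) print first[$1] "\n" $0
        else print
    }'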

On the other hand, if you want to find files that almost certainly
have the same content, regardless of their filename, then

    find foo bar -type f -print0 |    # NUL-terminated names, space-safe
    xargs -r0 sha1sum |
    sort | uniq -D -w 40              # compare just the 40-char digest

lists those.  Note that it won't realise when two files are hard or
symbolically linked together.
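
If the hard-link case matters, a rough way to spot it, again assuming
GNU find, is to list files whose link count is above one together with
their device and inode numbers; hard-linked copies share the leading
dev:inode pair:

    find foo bar -type f -links +1 -printf '%D:%i %p\n' |
    sort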

Cheers,


Ralph.


-- 
Next meeting: Bournemouth, Wednesday 2009-08-05 20:00
Dorset LUG: http://dorset.lug.org.uk/
Chat: http://www.mibbit.com/?server=irc.blitzed.org&channel=%23dorset
List info: https://mailman.lug.org.uk/mailman/listinfo/dorset
