On 9/28/24 21:06, Michael via PLUG-discuss wrote:
About a year ago I messed up by accidentally copying a folder with other
folders into another folder. I'm running out of room and need to find that
directory tree and get rid of it. All I know for certain is that it is
somewhere in my home directory. I THINK it is my pictures directory with
ARW files.
chatgpt told me to use fdupes but it told me to use an exclude option
(which I found out it doesn't have) to avoid config files (and I was
planning on adding to that as I discovered other stuff I didn't want). then
it told me to use find but I got an error which leads me to believe it
doesn't know what it's talking about!
Could someone help me out?

First, someone said you need to run updatedb before running find.  No, sorry, updatedb is for using locate, not find.  find actively walks the directory tree every time it runs.  locate searches the database that updatedb builds, so it only knows about files that existed the last time updatedb ran.
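
To see the difference for yourself, here is a small sketch.  The scratch directory and file name are made up for the demonstration; nothing here touches your real files.

```shell
# Hypothetical scratch directory, used only for this demonstration.
demo_dir=$(mktemp -d)
touch "$demo_dir/brand_new_file.txt"

# find walks the tree right now, so the new file shows up immediately,
# with no updatedb run in between.
found=$(find "$demo_dir" -type f -name 'brand_new_file.txt')
echo "$found"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

A locate search for the same file would normally come up empty until updatedb has been run again.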


Ok, now to answer the question.  I've got a similar situation, but in spades.  Every time I did a backup, I did an entire copy of everything, so I've got ... oh, 10, 20, 30 copies of many things. I'm working on scripts to help reduce that, but for now doing it somewhat manually, I suggest the following command:


cd (the directory of interest, possibly your home dir) ; find . -type f -print0 | xargs -0 md5sum | sort > list.of.files

This will create a list of files, sorted by their md5sum.  If you want to be lazy and not search that file for duplicate md5sums by eye, consider uniq.  Like this:

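uniq -c -d -w 32 list.of.files

(The -w option needs a character count.  An md5sum is 32 hex characters, so -w 32 makes uniq compare only the checksum and ignore the file name after it.)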


This will print one line for each group of duplicate files: the number of copies, the md5sum, and the name of the first file in the group.  For example, out of a list of 42,279 files in a certain directory on my computer, here's the result:

      2 73d249df037f6e63022e5cfa8d0c959b _files/from_ebay_pc/pics_and_such_from_work/phone_backup/try2_nonptp_or_whatever/Pictures/Screenshots/Screenshot_20160321-223138.png
      5 9b162ac35214691461cc0f0104fb91ce _files/melissa/Documents/EPHESUS/Office Stuff/SPD/SPD SUMMER 2016 (1).pdf
      3 b396af67f2cd75658397efd878a01fb8 _files/dads_zipdisks/2003-1/CLASS at VBC Sp-03/CLASS BKUP - Music Reading & Sight Singing Class/C  & D Major & Minor Scales & Chords.mct
      2 cd83094e0c4aeb9128806b5168444578 _files/from_ebay_pc/pics_and_such_from_work/phone_backup/try2_nonptp_or_whatever/Pictures/Screenshots/Screenshot_20160318-222051.png
      2 d1a5a1bec046cc85a3a3fd53a8d5be86 _files/from_ebay_pc/pics_and_such_from_work/phone_backup/try2_nonptp_or_whatever/Pictures/Screenshots/Screenshot_20160410-145331.png
      2 fa681c54a2bd7cfa590ddb8cf6ca1cea _files/from_ebay_pc/pics_and_such_from_work/phone_backup/try2_nonptp_or_whatever/Pictures/Screenshots/Screenshot_20160312-113340.png

Originally the _files directory had MANY duplicates; now I've managed to get that down to the above list...
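
Since uniq -d shows only one file name per group, it may help to see every copy before deleting anything.  Below is a rough sketch of one way to do that, assuming GNU uniq (for --all-repeated) and md5sum are available.  The scratch tree and file names are invented for the demonstration, and the list is written outside the scanned tree so it doesn't end up hashing itself.

```shell
# Hypothetical scratch tree: two identical files and one unique file.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/a" "$demo_dir/b"
printf 'same bytes\n' > "$demo_dir/a/photo1.arw"
printf 'same bytes\n' > "$demo_dir/b/photo1_copy.arw"
printf 'different\n'  > "$demo_dir/a/notes.txt"

# Keep the list outside the tree being scanned.
list_file=$(mktemp)
( cd "$demo_dir" && find . -type f -print0 | xargs -0 md5sum | sort > "$list_file" )

# -w 32 compares only the 32-character md5sum.
# --all-repeated=separate (GNU uniq) prints every member of each
# duplicate group, with a blank line between groups.
dupes=$(uniq -w 32 --all-repeated=separate "$list_file")
echo "$dupes"

# Clean up the scratch tree and the list.
rm -rf "$demo_dir" "$list_file"
```

Here the two .arw files are listed together and notes.txt is left out, because its checksum appears only once.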

Anyway, there you go.  Happy scripting.

---------------------------------------------------
PLUG-discuss mailing list: PLUG-discuss@lists.phxlinux.org
To subscribe, unsubscribe, or to change your mail settings:
https://lists.phxlinux.org/mailman/listinfo/plug-discuss
