Hi,

I use a similar algorithm for optimizing document storage.
It's pretty simple, actually:
just trawl through all directories recursively and store each file in a record.
You only need the path and the file hash, which you can create with

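  // Load the whole file into a BLOB
DOCUMENT TO BLOB($t_DocPath;$x_Content)
  // Hash the content; identical content gives an identical hash
$t_FileHash:=Generate digest($x_Content;MD5 digest)
  // Release the memory before moving on to the next file
SET BLOB SIZE($x_Content;0)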
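
For very large files, loading the whole document into memory at once may be a problem. Here is a minimal sketch of the same hashing step that reads the file in chunks instead. Note that this is Python, not 4D, and the function name file_md5 and the chunk_size parameter are my own assumptions for illustration:

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in pieces of `chunk_size` bytes (1 MiB by default),
    so memory use stays flat no matter how big the file is.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result is the same digest you would get from hashing the whole content in one go, so the two approaches can be mixed.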

Then just check for unique hashes and voilà!
Using the hash will also find identical files that have different filenames.
The chance of two different files accidentally producing the same MD5 hash is 
so close to 0 that it is 0 for all practical purposes.
Now write something that moves the unique data somewhere else or deletes the 
duplicates.
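
To make the idea concrete, here is a rough sketch of the whole scan applied to the backup-drive question. It is Python rather than 4D, and everything in it (the function names hash_file, index_tree and unique_to_backup, and the decision to silently skip unreadable files) is an assumption on my part, not an existing utility:

```python
import hashlib
import os


def hash_file(path):
    """Return the hex MD5 digest of one file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_tree(root):
    """Walk `root` recursively and map each content hash to its file paths."""
    index = {}
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            try:
                index.setdefault(hash_file(path), []).append(path)
            except OSError:
                continue  # assumption: unreadable files are simply skipped
    return index


def unique_to_backup(backup_root, other_roots):
    """List files under `backup_root` whose content is on none of `other_roots`."""
    known = set()
    for root in other_roots:
        known.update(index_tree(root))  # adds the hashes (the dict keys)
    backup = index_tree(backup_root)
    return sorted(
        path
        for file_hash, paths in backup.items()
        if file_hash not in known
        for path in paths
    )
```

Because the comparison is on content hashes only, a file that was renamed or moved on another drive still counts as present there. Calling unique_to_backup with the backup volume's root and a list of the other drives' roots returns the paths that exist only on the backup.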

Whole thing is quickly written, I guess some 100 lines of code.
120 with progress bars :-)

As for running it, well, that will take some time. Don’t do it on your main 
work machine; it might be tied up for a while.

Hope that helped.

Cheers
Alex

> On 14.03.2017 at 07:56, Robert ListMail via 4D_Tech 
> <4d_tech@lists.4d.com> wrote:
> 
> I need a utility that can scan a backup drive (or index) and identify what’s 
> unique to the backup volume without expecting identical pathnames on the 
> other drives... So, the routine would have to query (effectively a Finder 
> Search for each file) all specified drives looking for each file and 
> reporting those that are missing... Basically, I need to know which data on 
> this given backup drive is truly unique and therefore potentially valuable.
> 
> Might there be a 4D solution?  Have you dealt with large directories or many 
> directories from the file system? If there is a utility already built I’m 
> open to that as well.
> 
> Thanks,
> 
> Robert
> **********************************************************************
> 4D Internet Users Group (4D iNUG)
> FAQ:  http://lists.4d.com/faqnug.html
> Archive:  http://lists.4d.com/archives.html
> Options: http://lists.4d.com/mailman/options/4d_tech
> Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
> **********************************************************************
