Continuing to let you all know about developments, I do expect that many of you are facing a similar problem - trying to condense and preserve a lifetime of "collecting digital stuff".
The DFF utility has been very helpful. However, once I started organizing my files, I realized that although there is a lot of duplication, manually reorganizing the data is NOT always easy:
 - Much of it was downloaded at different times and/or from different sites, so many of the copies differ.
 - Many vendors don't go out of their way to make file content/purpose obvious in the names.
 - Many files are dependent on other files.

The best solution I have come up with so far is to invent a new archive format designed to eliminate duplicate data but capable of recreating the entire original directory trees (or parts thereof). To that end I created the two utilities described below (now included in the web archive). Yeah, I do seem to have a fair bit of spare time on my hands these days...

    BDA - Build Dave's Archive
    EDA - Extract Dave's Archive

Dave's Archives contain the smallest possible representation of a complete directory tree:
 - Only one copy of the data for duplicate files is stored.
 - Duplicate filenames are stored only once.
 - Path information is stored only once per directory, and only changes to a path are stored (what is added to/removed from the last path).

For example, starting with a large DIR of support files for one of my systems. This has duplicates and a lot of pre-compressed install files:
    314 dirs, 930 files using 3,762,691,033 bytes.
Just "ZIP"ing it I get:
    SysSupt.zip  3,352,081,951 bytes
7zip does a bit better:
    SysSupt.7z   3,245,871,362 bytes
Running BDA, I get:
    SysSupt.DA1          9,404 bytes
    SysSupt.DA2  1,912,855,711 bytes
Big improvement, but no compression yet. Using ZIP and 7zip on the .DA files I get:
    SysSupt.zip  1,636,965,417 bytes
    SysSupt.7z   1,609,663,862 bytes

And YES, using ZIP/7zip to extract the .DA's and then running EDA gives me a directory with exactly the same content that I started with.

Like my other tools, these can deal with BIG directory trees, and the output file format is well documented should you ever want to recover the files by other means.
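The core idea described above (store each distinct file body once, and keep a small index that maps every original path back to its data) can be sketched in a few lines of Python. This is an illustration only, not the actual BDA/EDA code or the .DA1/.DA2 format: the names build_archive, extract_archive and demo are hypothetical, SHA-256 is an assumed way of spotting identical content, and the sketch keeps everything in memory and ignores empty directories and timestamps.

```python
import hashlib
import os
import tempfile


def build_archive(root):
    """Return (index, blobs) for the tree under root.

    index maps each relative file path to a content digest;
    blobs maps each digest to the file bytes, stored only once.
    """
    index = {}
    blobs = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                data = f.read()
            digest = hashlib.sha256(data).hexdigest()
            blobs.setdefault(digest, data)  # duplicates add no new data
            index[os.path.relpath(full, root)] = digest
    return index, blobs


def extract_archive(index, blobs, dest):
    """Recreate every indexed file under dest from the shared blobs."""
    for rel, digest in index.items():
        target = os.path.join(dest, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as f:
            f.write(blobs[digest])


def demo():
    """Build a tiny tree with one duplicate, archive it, and restore it."""
    files = [
        (("a", "x.bin"), b"same"),
        (("a", "b", "y.bin"), b"same"),   # duplicate content of x.bin
        (("a", "b", "z.bin"), b"other"),
    ]
    with tempfile.TemporaryDirectory() as src, \
            tempfile.TemporaryDirectory() as out:
        for parts, data in files:
            path = os.path.join(src, *parts)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(data)
        index, blobs = build_archive(src)
        extract_archive(index, blobs, out)
        restored_index, _ = build_archive(out)
        # files indexed, unique blobs stored, and whether the restored
        # tree has the same paths and contents as the original
        return len(index), len(blobs), restored_index == index


if __name__ == "__main__":
    print(demo())
```

In the demo tree two of the three files share the same content, so only two blobs are kept, and re-indexing the extracted tree gives the same path-to-digest index as the original. A real tool for BIG trees would stream file data rather than read whole files into memory.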
Sorry if I've not responded to messages here. I tend not to follow the list directly much these days due to the high volume, but you can always reach me through the link on my site - it might take me a few days to respond, but I do get to it from time to time...

Dave

--
----------------------------------------------------------------------------------
Personal site: http://dunfield.maknonsolutions.com
----------------------------------------------------------------------------------