Tim Daneliuk <tun...@tundraware.com> writes:

> At this point, I'm inclined to believe that 'file' alone is
> insufficient to do this and, at best - even with more tools -
> it's going to be a probabilities game - i.e. "What percentage
> of false positives is acceptable?"

file(1) is only intended to be a set of heuristics. It has a remarkably good set of heuristics at this point, but you're right that this cannot be solved simply by analyzing the contents of the files.

For use in a system that you expect to scale, you will always be better off keeping meta-data in some other form, whenever you can (frequently you can't). If the whole data path is under your (customer's) control, it's not so hard: you can use file names, or put every file into a tar file along with a text file that indicates the data type, and on and on through as many approaches as you have the time to dream up. [If my examples are unclear, I can expand on them to make the point better.]

This is made considerably worse by the fact that you've said that your files are encrypted. Some forms of encryption store some meta-data at a known place in the file (such as the very start), but generally this won't be the case. Now consider that there is a nonzero chance of running into a combination of cleartext, encryption, and password such that you end up with an encrypted file that happens to have exactly the same contents as /bin/ls. (It's vanishingly unlikely that this exact scenario would happen, but it's a good illustration of the problem.)

All of which is just agreeing with your suggestion that it's a "probabilities game" of reducing the error rate to an acceptable level, UNLESS you can control some other source of information. For an example of the latter, I have a backup file from this morning, named "be-well.100702._usr.l2.dump.gz.idea". The name carries the meta-data, so nothing has to be inferred from the contents.

If the files are coming in from the outside (untrustworthy input), you can't do this. One thing you *could* do in that case is use a custom magic(5) file for this application, trimmed down to the types you actually care about. You may well not care about input that really is an MS-DOS executable, so you can remove the patterns for all of them. Or AmigaOS, or laser printer firmware, or...

Anyway, good luck.
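
P.S. To make the file-name approach a bit more concrete, here is a rough Python sketch. It assumes the trailing dot-separated suffixes of a name record how the file was processed, with the outermost layer last. The suffix table and the function name layers_from_name are made up for illustration, not part of any real tool, and the descriptions are just my reading of those suffixes.

```python
# Hypothetical suffix table: the entries are illustrative assumptions,
# not a convention defined by file(1) or any other tool.
KNOWN_LAYERS = {
    "idea": "IDEA-encrypted",
    "gz": "gzip-compressed",
    "dump": "dump archive",
}


def layers_from_name(filename):
    """Return the known trailing suffixes of filename, outermost first."""
    parts = filename.split(".")
    layers = []
    # Walk in from the right and stop at the first field that is not a
    # known suffix; everything before that is treated as the base name.
    while len(parts) > 1 and parts[-1] in KNOWN_LAYERS:
        layers.append(parts.pop())
    return layers


if __name__ == "__main__":
    for suffix in layers_from_name("be-well.100702._usr.l2.dump.gz.idea"):
        print(suffix, "->", KNOWN_LAYERS[suffix])
```

This only works when you control the naming. For files arriving from outside, a name is just another untrusted input, which is where the trimmed-down magic(5) file comes in.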
_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"