Re: 'file' Command Giving False Positives
One thing I noticed about the file command's output might be useful. For the file in question, it says "MS-DOS executable (built-in)". For real Windows programs, it gives more information. One that I tried said "PE32 executable for MS Windows (GUI) Intel 80386 32-bit". I remember that some others have said "COFF" instead of "PE32".

So maybe you could just assume that unless the file command is able to figure out what _kind_ of executable the file is, it's a false positive. How safe that is depends on how likely you are to run into a really ancient DOS program (which would probably just get the generic description).

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"
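The heuristic suggested in the message above could be sketched roughly as follows. This is only an illustration: the function name and the list of "specific" markers are assumptions drawn from the descriptions quoted in this thread ("PE32", "COFF"), not a complete list of what file(1) can print.

```python
# Hypothetical helper: treat a generic "executable" verdict from file(1)
# as a likely false positive unless the description names a specific kind.
# The marker list is an assumption based on the strings quoted in the thread.
SPECIFIC_MARKERS = ("PE32", "COFF")


def looks_like_real_executable(description: str) -> bool:
    """Return True only if the file(1) description names a specific executable kind."""
    if "executable" not in description:
        return False
    return any(marker in description for marker in SPECIFIC_MARKERS)


if __name__ == "__main__":
    for text in (
        "MS-DOS executable (built-in)",
        "PE32 executable for MS Windows (GUI) Intel 80386 32-bit",
    ):
        print(text, "->", looks_like_real_executable(text))
```

As the message notes, a genuinely old DOS program would probably get only the generic description, so this trades false positives for possible false negatives.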
Re: 'file' Command Giving False Positives
Tim Daneliuk writes:

> At this point, I'm inclined to believe that 'file' alone is
> insufficient to do this and, at best - even with more tools -
> it's going to be a probabilities game - i.e. "What percentage
> of false positives is acceptable?"

file(1) is only intended to be a set of heuristics. It has a remarkably good set of heuristics at this point, but you're right that this cannot be solved simply by analyzing the contents of the files.

For use in a system that you expect to scale, you will always be better off keeping meta-data in some other form (if you can, which is frequently not possible). If the whole data path is under your (customer's) control, it's not so hard; you can use file names, or put every file into a tar file along with a text file that indicates the data type, and on and on through as many approaches as you have the time to dream up. [If my examples are unclear, I can expand on them to make the point better.]

This is made considerably worse by the fact that you've said that your files are encrypted. Some forms of encryption store some meta-data at a known place (such as the very start) in the file, but generally this won't be the case. Now consider that there is a finite chance of running into a combination of cleartext, encryption, and password such that you end up with an encrypted file that happens to have exactly the same contents as /bin/ls (it's vanishingly unlikely that this exact scenario would happen, but it's a good illustration of the problem).

All of which is just agreeing with your suggestion that it's a "probabilities game" of reducing the error rate to acceptability; UNLESS you can control some other source of information. For an example of the latter, I have a backup file from this morning, named "be-well.100702._usr.l2.dump.gz.idea".

If the files are coming in from the outside (untrustworthy input), you can't do this. One thing you *could* do in that case is use a custom magic(5) file for this application. You may well not care about input that really is an MS-DOS executable, so you can remove the patterns for all of them. Or AmigaOS, or laser printer firmware, or...

Anyway, good luck.
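To make the "tar file along with a text file that indicates the data type" idea concrete, here is a minimal sketch. It assumes the whole data path is under your control; the member name `TYPE.txt` and both function names are made up for illustration.

```python
import io
import tarfile

# Hypothetical name for the text file that records the payload's type.
TYPE_MEMBER = "TYPE.txt"


def bundle_with_type(payload: bytes, payload_name: str, data_type: str) -> bytes:
    """Pack the payload plus a small text file naming its type into one tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        members = (
            (payload_name, payload),
            (TYPE_MEMBER, data_type.encode("utf-8") + b"\n"),
        )
        for name, data in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_declared_type(archive: bytes) -> str:
    """Read back the declared type without inspecting the payload at all."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        member = tar.extractfile(TYPE_MEMBER)
        return member.read().decode("utf-8").strip()
```

The receiving side then trusts the declared type instead of guessing from the payload bytes, which is why this only helps when the input is not coming from an untrustworthy source.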
Re: 'file' Command Giving False Positives
On 7/2/2010 1:42 PM, Polytropon wrote:
> On Fri, 02 Jul 2010 14:23:24 -0400, Lowell Gilbert wrote:
>> Apparently, your memory is better than mine, because that was indeed
>> what I was thinking of. Which leads to the question of why magic(5)
>> lists LZ as representing "MS-DOS executable (built-in)". I'd be
>> hesitant to change that unless we knew for sure it was wrong.
>
> As has been mentioned before, .EXE is *one* of the formats
> executable in DOS. .COM executables do not have specific headers
> (as they are loaded directly). Also, .BAT files are executable,
> although they are text files, and finally .BTM files are also
> text-file executables, specific to NDOS. As far as I remember,
> there's .EXE on OS/2, too. One could argue whether "Windows" .PIF
> files are also executables. Of course, VMS also has .COM... but I
> see I'm making a digression... :-)
>
>> Even if it _is_ wrong, the "problem" still remains for "MZ" at least:
>> Any file starting with those letters is going to be identified as an
>> MS-DOS executable, and there's no clear way to distinguish it from a
>> text file that happens to start with those letters.
>
> Well, there's a solution that is not *that* complicated: If the
> file contains characters that don't match isprint(), i.e. those
> outside the ASCII set used in real text files, it's likely to be
> an executable. A scriptable solution might be to diff the file
> against the output of `strings`. If they differ, it's not text,
> so it might be an executable.
>
> I'm not sure if the magic identification string starting with MZ
> could be extended with other specific characters immediately
> following MZ that are *only* present in executables... The problem
> is that "MZ" itself is completely sufficient:
>
>     % echo "MZ" > foo
>     % file foo
>     foo: MS-DOS executable
>
> Of course, that's not correct.

All noted (and appreciated). In this case, the client has a situation where none of the above will work: They can take in encrypted files that happen to have an MZ/LZ at the beginning and binary data thereafter, but are NOT executables. They want to properly flag executables but not get false positives.

At this point, I'm inclined to believe that 'file' alone is insufficient to do this and, at best - even with more tools - it's going to be a probabilities game - i.e. "What percentage of false positives is acceptable?"

--
Tim Daneliuk tun...@tundraware.com
Re: 'file' Command Giving False Positives
On Fri, 02 Jul 2010 14:23:24 -0400, Lowell Gilbert wrote:
> Apparently, your memory is better than mine, because that was indeed
> what I was thinking of. Which leads to the question of why magic(5)
> lists LZ as representing "MS-DOS executable (built-in)". I'd be
> hesitant to change that unless we knew for sure it was wrong.

As has been mentioned before, .EXE is *one* of the formats executable in DOS. .COM executables do not have specific headers (as they are loaded directly). Also, .BAT files are executable, although they are text files, and finally .BTM files are also text-file executables, specific to NDOS. As far as I remember, there's .EXE on OS/2, too. One could argue whether "Windows" .PIF files are also executables. Of course, VMS also has .COM... but I see I'm making a digression... :-)

> Even if it _is_ wrong, the "problem" still remains for "MZ" at least:
> Any file starting with those letters is going to be identified as an
> MS-DOS executable, and there's no clear way to distinguish it from a
> text file that happens to start with those letters.

Well, there's a solution that is not *that* complicated: If the file contains characters that don't match isprint(), i.e. those outside the ASCII set used in real text files, it's likely to be an executable. A scriptable solution might be to diff the file against the output of `strings`. If they differ, it's not text, so it might be an executable.

I'm not sure if the magic identification string starting with MZ could be extended with other specific characters immediately following MZ that are *only* present in executables... The problem is that "MZ" itself is completely sufficient:

    % echo "MZ" > foo
    % file foo
    foo: MS-DOS executable

Of course, that's not correct.

--
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
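The isprint() idea from the message above might look something like this in a script. It is a rough sketch only: `string.printable` is used here as an assumed stand-in for "characters found in real text files", and the function names are invented for the example.

```python
import string

# Printable ASCII plus common whitespace (tab, newline, ...). Using this set
# as the definition of "text" is an assumption, roughly mirroring isprint().
TEXT_BYTES = frozenset(string.printable.encode("ascii"))


def is_plain_text(data: bytes) -> bool:
    """True if every byte is printable ASCII or ordinary whitespace."""
    return all(byte in TEXT_BYTES for byte in data)


def classify_mz(data: bytes) -> str:
    """Separate text files that merely start with MZ from possible executables."""
    if not data.startswith(b"MZ"):
        return "not MZ"
    if is_plain_text(data):
        return "text starting with MZ"
    return "possibly executable"
```

This catches the `echo "MZ" > foo` case, but it does not help with the situation described in the reply to this message: an encrypted file that starts with MZ and continues with binary data would still come out as "possibly executable".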
Re: 'file' Command Giving False Positives
Polytropon writes:

> On Fri, 02 Jul 2010 11:25:20 -0400, Lowell Gilbert wrote:
>> Why is it incorrect? "LZ" as the first two bytes in a file is (unless
>> my memory is badly mistaken) exactly what the old command.com looked for
>> as the flag of an executable.
>
> If I ask *my* memory, it tells me that what you mean is "MZ". As
> far as I remember, those are the initials of a programmer involved
> with the creation of the DOS binary executable format. :-)

Apparently, your memory is better than mine, because that was indeed what I was thinking of. Which leads to the question of why magic(5) lists LZ as representing "MS-DOS executable (built-in)". I'd be hesitant to change that unless we knew for sure it was wrong.

Even if it _is_ wrong, the "problem" still remains for "MZ" at least: Any file starting with those letters is going to be identified as an MS-DOS executable, and there's no clear way to distinguish it from a text file that happens to start with those letters.
Re: 'file' Command Giving False Positives
On Fri, Jul 02, 2010 at 05:35:04PM +0200, Polytropon wrote:
> On Fri, 02 Jul 2010 11:25:20 -0400, Lowell Gilbert wrote:
> > Why is it incorrect? "LZ" as the first two bytes in a file is (unless
> > my memory is badly mistaken) exactly what the old command.com looked for
> > as the flag of an executable.
>
> If I ask *my* memory, it tells me that what you mean is "MZ". As
> far as I remember, those are the initials of a programmer involved
> with the creation of the DOS binary executable format. :-)

"MZ" is indeed what an MS-DOS style .EXE file should start with.

For an MS-DOS .COM file there is no header or other metadata in the file, so there is no good way of distinguishing it from any other binary file.

--
Erik Trulsson
ertr1...@student.uu.se
Re: 'file' Command Giving False Positives
On 7/2/2010 10:35 AM, Polytropon wrote:
> On Fri, 02 Jul 2010 11:25:20 -0400, Lowell Gilbert wrote:
>> Why is it incorrect? "LZ" as the first two bytes in a file is (unless
>> my memory is badly mistaken) exactly what the old command.com looked for
>> as the flag of an executable.
>
> If I ask *my* memory, it tells me that what you mean is "MZ". As
> far as I remember, those are the initials of a programmer involved
> with the creation of the DOS binary executable format. :-)

Some OSs report both LZ and MZ as being DOS .exe, some only report LZ. Either way, when processing data files, there needs to be a deeper check to avoid the false positive. It may be that 'file' just isn't powerful enough to do this.

--
Tim Daneliuk tun...@tundraware.com
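One possible shape for such a "deeper check" is to sanity-check the fields of the DOS .EXE header rather than trusting the two magic bytes alone. The sketch below is not what file(1) does; the function name and the specific consistency rules are assumptions, and it only looks at the "MZ" signature. Random or encrypted data can still pass by chance, and unusual but genuine executables may fail, so it lowers the false-positive rate rather than eliminating it.

```python
import struct

MIN_DOS_HEADER = 28  # size in bytes of the fixed part of the DOS .EXE header


def mz_header_is_plausible(data: bytes) -> bool:
    """Heuristic: do the DOS header size fields agree with the actual data?"""
    if len(data) < MIN_DOS_HEADER or data[:2] != b"MZ":
        return False
    # Offset 2: bytes used in the last 512-byte page (0 means the page is full).
    # Offset 4: number of 512-byte pages in the load image.
    e_cblp, e_cp = struct.unpack_from("<HH", data, 2)
    # Offset 8: header size in 16-byte paragraphs.
    (e_cparhdr,) = struct.unpack_from("<H", data, 8)
    if e_cblp > 512 or e_cp == 0:
        return False
    header_bytes = e_cparhdr * 16
    if header_bytes < MIN_DOS_HEADER or header_bytes > len(data):
        return False
    image_bytes = e_cp * 512
    if e_cblp:
        image_bytes -= 512 - e_cblp
    # The header must fit inside the image, and the image inside the file.
    return header_bytes <= image_bytes <= len(data)
```

For the sample content from the original post, or for `echo "MZ" > foo`, this returns False simply because the data is too short to hold a header.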
Re: 'file' Command Giving False Positives
On Fri, 02 Jul 2010 11:25:20 -0400, Lowell Gilbert wrote:
> Why is it incorrect? "LZ" as the first two bytes in a file is (unless
> my memory is badly mistaken) exactly what the old command.com looked for
> as the flag of an executable.

If I ask *my* memory, it tells me that what you mean is "MZ". As far as I remember, those are the initials of a programmer involved with the creation of the DOS binary executable format. :-)

--
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
Re: 'file' Command Giving False Positives
Tim Daneliuk writes:

> I have a data file with the content:
>
>    LZasdadqjwjqwjqwjeqwe
>
> 'file' (incorrectly) reports this as an MS-DOS executable.

Why is it incorrect? "LZ" as the first two bytes in a file is (unless my memory is badly mistaken) exactly what the old command.com looked for as the flag of an executable.

> Does anyone happen to know the proper changes to 'magic' that would
> fix this?

That would be tricky, given that MS-DOS *would*, in fact, think this file was a valid executable. I don't think the syntax of "magic" is powerful enough to distinguish this from a "real" executable. You might be able to do it by adding file(1) support for looking for invalid opcodes, but that would get hairy very quickly...
Re: 'file' Command Giving False Positives
In the last episode (Jul 02), Tim Daneliuk said:
> I have a data file with the content:
>
> LZasdadqjwjqwjqwjeqwe
>
> 'file' (incorrectly) reports this as an MS-DOS executable.

I dunno; if I create a file "a.exe" on my XP system with those contents, I can run it from a cmd prompt, and it doesn't print any errors, so technically it is an MS-DOS executable :)

> Does anyone happen to know the proper changes to 'magic' that would
> fix this?

Easiest fix would be to remove line 377 from /usr/src/contrib/file/Magdir/msdos and rebuild & reinstall /usr/src/lib/libmagic/ .

--
Dan Nelson
dnel...@allantgroup.com
'file' Command Giving False Positives
I have a data file with the content:

    LZasdadqjwjqwjqwjeqwe

'file' (incorrectly) reports this as an MS-DOS executable.

Does anyone happen to know the proper changes to 'magic' that would fix this?

Thanks,

--
Tim Daneliuk tun...@tundraware.com