Re: 'file' Command Giving False Positives

2010-07-03 Thread Andy Balholm
One thing I noticed about the file command's output might be useful:

For the file in question, it says "MS-DOS executable (built-in)"

For real Windows programs, it gives more information. One that I tried said 
"PE32 executable for MS Windows (GUI) Intel 80386 32-bit". I remember that some 
others have said "COFF" instead of "PE32". So maybe you could just assume that 
unless the file command is able to figure out what _kind_ of executable the 
file is, it's a false positive. It depends on how likely you are to run into a 
really ancient DOS program (which would probably just get the generic 
description).
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"
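
The rule of thumb in the message above (trust the result only when file(1) names a concrete kind of executable) could be sketched roughly as follows. The function names and the marker list are assumptions made for illustration; the two sample descriptions are the ones quoted in the message.

```python
# Hypothetical helper: decide whether a file(1) description names a
# specific executable format or only the generic MS-DOS fallback.
# The marker list is an assumption based on the outputs quoted above.
SPECIFIC_MARKERS = ("PE32", "COFF")


def is_specific_executable(description: str) -> bool:
    """Return True if the description names a concrete executable format."""
    return any(marker in description for marker in SPECIFIC_MARKERS)


def is_probable_false_positive(description: str) -> bool:
    """Treat a bare 'MS-DOS executable' with no further detail as suspect."""
    return ("MS-DOS executable" in description
            and not is_specific_executable(description))


if __name__ == "__main__":
    for text in (
        "MS-DOS executable (built-in)",
        "PE32 executable for MS Windows (GUI) Intel 80386 32-bit",
    ):
        verdict = "suspect" if is_probable_false_positive(text) else "accepted"
        print(f"{text} -> {verdict}")
```

As the message notes, a genuinely ancient DOS program would also land in the "suspect" bucket with this approach.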


Re: 'file' Command Giving False Positives

2010-07-02 Thread Lowell Gilbert
Tim Daneliuk writes:

> At this point, I'm inclined to believe that 'file' alone is
> insufficient to do this and, at best - even with more tools -
> it's going to be a probabilities game - i.e. "What percentage
> of false positives is acceptable?"

file(1) is only intended to be a set of heuristics.  It has a remarkably
good set of heuristics at this point, but you're right that this cannot
be solved simply by analyzing the contents of the files.  For use in a
system that you expect to scale, you will always be better off keeping
meta-data in some other form (if you can, which is frequently not
possible).  If the whole data path is under your (customer's) control,
it's not so hard; you can use file names, or put every file into a tar
file along with a text file that indicates the data type, and on and on
through as many approaches as you have the time to dream up.  [If my
examples are unclear, I can expand on them to make the point better.]
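
To make the tar-file idea concrete, here is a minimal sketch, assuming the sender and receiver agree on two member names. PAYLOAD_NAME, TYPE_NAME and both function names are hypothetical and not part of any existing tool.

```python
import io
import tarfile

# Hypothetical member names; any convention agreed with the sender would do.
PAYLOAD_NAME = "payload.bin"
TYPE_NAME = "TYPE.txt"


def pack_with_type(payload: bytes, data_type: str) -> bytes:
    """Bundle a payload and a text file declaring its type into one tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        members = ((TYPE_NAME, data_type.encode("utf-8")), (PAYLOAD_NAME, payload))
        for name, content in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def read_declared_type(archive: bytes) -> str:
    """Read the declared type back without inspecting the payload at all."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        member = tar.extractfile(TYPE_NAME)
        if member is None:
            raise ValueError("type declaration is not a regular file")
        return member.read().decode("utf-8").strip()
```

The receiver then never has to guess from the payload bytes, which is the whole point of keeping the meta-data out of band.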

This is made considerably worse by the fact that you've said that your
files are encrypted.  Some forms of encryption store some meta-data at a
known place (such as the start) of the file, but generally this won't be
the case.  Now consider that there is a nonzero chance of running into a
combination of cleartext, encryption, and password such that you end up
with an encrypted file that happens to have exactly the same contents as
/bin/ls (it's vanishingly unlikely that this exact scenario would
happen, but it's a good illustration of the problem).  
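
A far more likely collision than the /bin/ls one is a two-byte prefix match. As a back-of-the-envelope sketch, assuming the leading bytes of an encrypted file behave like uniformly random bytes (an assumption, not something established in this thread), the rate can be estimated like this. The function names are made up for the example.

```python
# Assumption: ciphertext prefixes are uniformly random, so every
# 2-byte prefix has probability 1/65536.
def prefix_collision_probability(signatures: int, prefix_len: int = 2) -> float:
    """Chance that random data starts with any one of `signatures` distinct prefixes."""
    return signatures / (256 ** prefix_len)


def expected_false_positives(n_files: int, signatures: int = 2) -> float:
    """Expected number of random files mislabelled when both MZ and LZ match."""
    return n_files * prefix_collision_probability(signatures)


if __name__ == "__main__":
    print(prefix_collision_probability(2))      # MZ or LZ
    print(expected_false_positives(1_000_000))  # per million files
```

Under that assumption, matching on "MZ" or "LZ" alone mislabels about 30 files per million.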

All of which is just agreeing with your suggestion that it's a
"probabilities game" of reducing the error rate to acceptability; UNLESS
you can control some other source of information.  For an example of the
latter, I have a backup file from this morning, named
"be-well.100702._usr.l2.dump.gz.idea".  If the files are coming in from
the outside (untrustworthy input), you can't do this.  One thing you
*could* do in that case is use a custom magic(5) file for this
application.  You may well not care about input that really is an MS-DOS
executable, so you can remove the patterns for all of them.  Or AmigaOS,
or laser printer firmware, or...
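
One way to derive such a trimmed magic file is to filter an existing one. The sketch below assumes the usual magic(5) layout, in which a top-level entry is followed by continuation lines starting with ">" and annotation lines starting with "!:". The sample entries are simplified illustrations, not copies of the real Magdir/msdos contents, and drop_magic_entries is a hypothetical name.

```python
def drop_magic_entries(magic_text: str, unwanted: str) -> str:
    """Remove top-level magic entries whose line contains `unwanted`,
    together with the continuation lines that belong to them."""
    kept = []
    skipping = False
    for line in magic_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            kept.append(line)          # blank lines and comments always stay
        elif line.startswith((">", "!:")):
            if not skipping:           # continuation of the current entry
                kept.append(line)
        else:
            skipping = unwanted in line  # a new top-level entry starts here
            if not skipping:
                kept.append(line)
    return "\n".join(kept) + "\n"


# Simplified, made-up sample in magic(5) style (assumption, for illustration).
SAMPLE = (
    "0\tstring\tMZ\tMS-DOS executable\n"
    ">0x18\tleshort\t>0x3f\t(illustrative continuation)\n"
    "0\tstring\tLZ\tMS-DOS executable (built-in)\n"
    "0\tstring\t%PDF-\tPDF document\n"
)

if __name__ == "__main__":
    print(drop_magic_entries(SAMPLE, "MS-DOS executable"), end="")
```

The filtered text could then be saved and passed to file(1) with its -m option, so the system-wide magic database stays untouched.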

Anyway, good luck.


Re: 'file' Command Giving False Positives

2010-07-02 Thread Tim Daneliuk

On 7/2/2010 1:42 PM, Polytropon wrote:
> On Fri, 02 Jul 2010 14:23:24 -0400, Lowell Gilbert wrote:
>> Apparently, your memory is better than mine, because that was indeed
>> what I was thinking of.  Which leads to the question of why magic(5)
>> lists LZ as representing "MS-DOS executable (built-in)".  I'd be
>> hesitant to change that unless we knew for sure it was wrong.
>
> As it has been mentioned before, .EXE is *one* of the formats
> executable in DOS. .COM executables do not have specific headers
> (as they are loaded directly). Also, .BAT are executable, although
> they are text files, and finally .BTM are also text file executables,
> specific to NDOS. As far as I also remember, there's .EXE on OS/2,
> too. One could argue if "Windows" .PIF are also executables. Of
> course, VMS also has .COM... but I see I'm making a digression... :-)
>
>> Even if it _is_ wrong, the "problem" still remains for "MZ" at least:
>> Any file starting with those letters is going to be identified as an
>> MS-DOS executable, and there's no clear way to distinguish it from a
>> text file that happens to start with those letters.
>
> Well, there's a solution that is not *that* complicated: If the
> file contains characters that don't match isprint(), i.e. those
> outside the ASCII set used in real text files, it's likely to be
> an executable.
>
> A scriptable solution might be to diff the file against the output
> of `strings` run on it. If they differ, it's not a text, so it might
> be an executable.
>
> I'm not sure if the magic identification string starting with MZ
> could be enlarged with other specific characters immediately
> following MZ that are *only* present in executables...
>
> The problem is that "MZ" itself is completely sufficient:
>
> % echo "MZ" > foo
> % file foo
> foo: MS-DOS executable
>
> Of course, that's not correct.

All noted (and appreciated).  In this case, the client has
a situation where none of the above will work:  They can
take in encrypted files that happen to start with MZ/LZ
and contain binary data thereafter, yet are NOT
executables.  They want to properly flag executables but
not get false positives.

At this point, I'm inclined to believe that 'file' alone is
insufficient to do this and, at best - even with more tools -
it's going to be a probabilities game - i.e. "What percentage
of false positives is acceptable?"


--

Tim Daneliuk
tun...@tundraware.com


Re: 'file' Command Giving False Positives

2010-07-02 Thread Polytropon
On Fri, 02 Jul 2010 14:23:24 -0400, Lowell Gilbert wrote:
> Apparently, your memory is better than mine, because that was indeed
> what I was thinking of.  Which leads to the question of why magic(5)
> lists LZ as representing "MS-DOS executable (built-in)".  I'd be
> hesitant to change that unless we knew for sure it was wrong.

As it has been mentioned before, .EXE is *one* of the formats
executable in DOS. .COM executables do not have specific headers
(as they are loaded directly). Also, .BAT are executable, although
they are text files, and finally .BTM are also text file executables,
specific to NDOS. As far as I also remember, there's .EXE on OS/2,
too. One could argue if "Windows" .PIF are also executables. Of
course, VMS also has .COM... but I see I'm making a digression... :-)



> Even if it _is_ wrong, the "problem" still remains for "MZ" at least:
> Any file starting with those letters is going to be identified as an
> MS-DOS executable, and there's no clear way to distinguish it from a
> text file that happens to start with those letters.

Well, there's a solution that is not *that* complicated: If the
file contains characters that don't match isprint(), i.e. those
outside the ASCII set used in real text files, it's likely to be
an executable.

A scriptable solution might be to diff the file against the output
of `strings` run on it. If they differ, it's not a text, so it might
be an executable.
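
A minimal sketch of the isprint()-style test, assuming "text" means printable ASCII plus the usual whitespace characters. TEXT_BYTES and looks_like_text are names invented for this example.

```python
import string

# Bytes regarded as "text": printable ASCII, including tab, newline,
# carriage return, vertical tab and form feed (string.printable).
TEXT_BYTES = frozenset(string.printable.encode("ascii"))


def looks_like_text(data: bytes) -> bool:
    """Return True if every byte is printable ASCII or common whitespace."""
    return all(byte in TEXT_BYTES for byte in data)


if __name__ == "__main__":
    print(looks_like_text(b"MZ\n"))                     # what `echo "MZ"` writes
    print(looks_like_text(b"LZasdadqjwjqwjqwjeqwe\n"))  # the sample from this thread
    print(looks_like_text(b"MZ\x90\x00\x03\x00"))       # bytes outside the text set
```

This only separates text from non-text. It says nothing about whether a binary file starting with MZ really is an executable.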

I'm not sure if the magic identification string starting with MZ
could be enlarged with other specific characters immediately
following MZ that are *only* present in executables...

The problem is that "MZ" itself is completely sufficient:

% echo "MZ" > foo
% file foo
foo: MS-DOS executable

Of course, that's not correct.


-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


Re: 'file' Command Giving False Positives

2010-07-02 Thread Lowell Gilbert
Polytropon writes:

> On Fri, 02 Jul 2010 11:25:20 -0400, Lowell Gilbert wrote:
>> Why is it incorrect?  "LZ" as the first two bytes in a file is (unless
>> my memory is badly mistaken) exactly what the old command.com looked for
>> as the flag of an executable.
>
> If I ask *my* memory, it tells me that what you mean is "MZ". As
> far as I remember, those are the initials of a programmer involved
> with the creation of the DOS binary executable format. :-)

Apparently, your memory is better than mine, because that was indeed
what I was thinking of.  Which leads to the question of why magic(5)
lists LZ as representing "MS-DOS executable (built-in)".  I'd be
hesitant to change that unless we knew for sure it was wrong.

Even if it _is_ wrong, the "problem" still remains for "MZ" at least:
Any file starting with those letters is going to be identified as an
MS-DOS executable, and there's no clear way to distinguish it from a
text file that happens to start with those letters.


Re: 'file' Command Giving False Positives

2010-07-02 Thread Erik Trulsson
On Fri, Jul 02, 2010 at 05:35:04PM +0200, Polytropon wrote:
> On Fri, 02 Jul 2010 11:25:20 -0400, Lowell Gilbert wrote:
> > Why is it incorrect?  "LZ" as the first two bytes in a file is (unless
> > my memory is badly mistaken) exactly what the old command.com looked for
> > as the flag of an executable.
> 
> If I ask *my* memory, it tells me that what you mean is "MZ". As
> far as I remember, those are the initials of a programmer involved
> with the creation of the DOS binary executable format. :-)

"MZ" is indeed what an MS-DOS style .EXE file should start with.
For an MS-DOS .COM file there is no header or other metadata in the
file so there is no good way of distinguishing it from any other binary
file.



-- 

Erik Trulsson
ertr1...@student.uu.se


Re: 'file' Command Giving False Positives

2010-07-02 Thread Tim Daneliuk

On 7/2/2010 10:35 AM, Polytropon wrote:
> On Fri, 02 Jul 2010 11:25:20 -0400, Lowell Gilbert wrote:
>> Why is it incorrect?  "LZ" as the first two bytes in a file is (unless
>> my memory is badly mistaken) exactly what the old command.com looked for
>> as the flag of an executable.
>
> If I ask *my* memory, it tells me that what you mean is "MZ". As
> far as I remember, those are the initials of a programmer involved
> with the creation of the DOS binary executable format. :-)

Some OSs report both LZ and MZ as being DOS .exe, some only
report LZ.  Either way, when processing data files, there
needs to be a deeper check to avoid the false positive.
It may be that 'file' just isn't powerful enough to do this.
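
One possible "deeper check", sketched here under stated assumptions rather than offered as a complete answer: for Windows PE executables, the 32-bit little-endian field at offset 0x3C of the MZ header gives the offset of a "PE\0\0" signature. The function name looks_like_pe is hypothetical.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """True if data starts with 'MZ' and the offset stored at 0x3C
    points at a 'PE\\0\\0' signature inside the data."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # An out-of-range offset yields an empty slice, which compares unequal.
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


if __name__ == "__main__":
    # Smallest synthetic input that satisfies the check (not a real program).
    fake = b"MZ" + b"\x00" * 0x3A + struct.pack("<I", 0x40) + b"PE\x00\x00"
    print(looks_like_pe(fake))
    print(looks_like_pe(b"MZ\n"))
```

A plain DOS .EXE has no PE signature, so this check would reject genuine old DOS programs. Random data can still pass by chance, only far less often than with a two-byte match.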

--

Tim Daneliuk
tun...@tundraware.com


Re: 'file' Command Giving False Positives

2010-07-02 Thread Polytropon
On Fri, 02 Jul 2010 11:25:20 -0400, Lowell Gilbert wrote:
> Why is it incorrect?  "LZ" as the first two bytes in a file is (unless
> my memory is badly mistaken) exactly what the old command.com looked for
> as the flag of an executable.

If I ask *my* memory, it tells me that what you mean is "MZ". As
far as I remember, those are the initials of a programmer involved
with the creation of the DOS binary executable format. :-)




-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


Re: 'file' Command Giving False Positives

2010-07-02 Thread Lowell Gilbert
Tim Daneliuk writes:

> I have a data file with the content:
>
>LZasdadqjwjqwjqwjeqwe
>
>
> 'file' (incorrectly) reports this as an MS-DOS executable.

Why is it incorrect?  "LZ" as the first two bytes in a file is (unless
my memory is badly mistaken) exactly what the old command.com looked for
as the flag of an executable.

> Does anyone happen to know the proper changes to 'magic' that would
> fix this?

That would be tricky, given that MS-DOS *would*, in fact, think this
file was a valid executable.  I don't think the syntax of "magic" is
powerful enough to distinguish this from a "real" executable.  You might
be able to do it by adding file(1) support for looking for invalid
opcodes, but that would get hairy very quickly...


Re: 'file' Command Giving False Positives

2010-07-02 Thread Dan Nelson
In the last episode (Jul 02), Tim Daneliuk said:
> I have a data file with the content:
> 
> LZasdadqjwjqwjqwjeqwe
> 
> 'file' (incorrectly) reports this as an MS-DOS executable.

I dunno; if I create a file "a.exe" on my XP system with those contents, I
can run it from a cmd prompt, and it doesn't print any errors, so
technically it is an MS-DOS executable :)
 
> Does anyone happen to know the proper changes to 'magic' that would
> fix this?

Easiest fix would be to remove line 377 from
/usr/src/contrib/file/Magdir/msdos and rebuild & reinstall
/usr/src/lib/libmagic/.

-- 
Dan Nelson
dnel...@allantgroup.com


'file' Command Giving False Positives

2010-07-02 Thread Tim Daneliuk

I have a data file with the content:

   LZasdadqjwjqwjqwjeqwe


'file' (incorrectly) reports this as an MS-DOS executable.

Does anyone happen to know the proper changes to 'magic' that would
fix this?

Thanks,
--

Tim Daneliuk
tun...@tundraware.com