On Monday 23 February 2004 11:55 am, Richard Davey wrote:
> Hello Axel,
>
> Monday, February 23, 2004, 7:38:25 PM, you wrote:
>
> AIM> Thanks, you just gave me the solution, I think. I don't have to strip
> AIM> out every character above standard ascii, I just have to look for them.
> AIM> If one is there, then just get rid of it. It's true that an OS can't
> AIM> tell the difference between a jpg and an exe file, but that's to be
> AIM> expected. But the file_get_contents() function DOES open the file. Since
> AIM> there is a definite difference between a text file and a binary file, it
> AIM> should be able to detect that.
>
> The difference isn't as obvious as you might think. Opening a binary
> file into a hex editor will show you this. Your brain can determine if
> the codes in front of you are "English" or not, but from a pure logic
> point of view that's a little harder.
>
> Also bear in mind that on Unix ALL files are binary files. It is up to
> you to determine the type of the file contents as you see fit. For
> example you can check for line-terminated data.
>
> It would be wise to check for characters from 0 to 31, if they appear
> then it's almost certainly (but not guaranteed) binary.

Assuming that's decimal, you're including 0x09, 0x0a, and 0x0d, which are,
respectively, tab, line feed, and carriage return. Those turn up in ordinary
text files, so the check would have to skip them. That's off the top of my
head, which means two things: (1) I may be forgetting something, and (2) I
need a life ;)
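
The check being discussed could be sketched in PHP roughly like this. It's
only a sketch: looks_binary() is a made-up name, not anything from this thread
or from PHP itself, it assumes PHP 7 or later for the type declarations, and
it counts form feed (0x0c) and escape (0x1b) as binary, which some text files
will trip over.

```php
<?php
// Hypothetical helper, not part of PHP: reports whether a string contains
// any control byte in the range 0x00-0x1f other than tab (0x09),
// line feed (0x0a), or carriage return (0x0d).
function looks_binary(string $data): bool
{
    // Character class covers 0x00-0x08, 0x0b, 0x0c, and 0x0e-0x1f.
    // The pattern is single-quoted, so PCRE (not PHP) expands the \x escapes.
    return preg_match('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', $data) === 1;
}

var_dump(looks_binary("hello\tworld\r\n")); // false: only tab, CR, LF
var_dump(looks_binary("GIF89a\x00\x01"));   // true: contains a NUL byte
```

With a real file you would pass in whatever file_get_contents() returned.
As noted earlier in the thread, a hit is a strong hint rather than a
guarantee.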

I'm not up to speed on this thread, but perhaps you could (ab)use some 
techniques from natural language processing? May be overkill, though ;)

>
> --
> Best regards,
>  Richard Davey
>  http://www.phpcommunity.org/wiki/296.html

-- 
Evan Nemerson
[EMAIL PROTECTED]
http://coeusgroup.com/en

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php