On Mon, 2004-02-23 at 14:19, Axel IS Main wrote:
> Yes, and in fact that is what I am doing now. This is a spider bot 
> though, so I'm having to think of every single type of binary file that 
> could be linked to on the web. So far I'm up to 28 with no end in sight. 
> What about a .com file? I can't omit links that end in .com can I? That 
> would be counterproductive to say the least. Also, the function that 
> does the checking just keeps getting longer and longer, which makes the 
> spider go slower and slower. Granted, the thing is pretty fast if it has 
> enough BW to work with, but still. This could eventually turn into a 
> script killer. Detecting whether the stream from file_get_contents(), or 
> fopen() for that matter, is binary or not and going with that result is 
> the elegant solution to this problem. There has to be a way to do it.

You could try writing a script that checks the first several bytes of
the file for control characters.  If 20% or more (a number pulled from
my head) of the first 1 KB is control characters, it's a safe bet that
it is a binary file.  This is not 100% accurate, but it's something to
play with that doesn't rely on MIME types or file extensions, both of
which can easily be inaccurate.
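
A rough sketch of that check, assuming PHP 7 or later.  The function
name looks_binary(), the 1024-byte sample size, and the exact set of
bytes counted as control characters are my own illustrative choices,
and the 20% threshold is still just a guess to tune:

```php
<?php
// Hypothetical helper: guess whether $data is binary by measuring the
// share of control bytes in its first $sampleSize bytes.
function looks_binary(string $data, int $sampleSize = 1024, float $threshold = 0.20): bool
{
    $sample = substr($data, 0, $sampleSize);
    $length = strlen($sample);
    if ($length === 0) {
        return false; // nothing to inspect; treat empty input as text
    }

    // Count bytes 0x00-0x08, 0x0B, 0x0C, 0x0E-0x1F and 0x7F.
    // Tab, LF and CR are left out because they are normal in text.
    $controlCount = preg_match_all('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', $sample);

    return ($controlCount / $length) >= $threshold;
}
```

Only the first chunk of the stream needs to be read for this, for
example with fopen() and a single fread($handle, 1024), so the spider
would not have to download the whole file before deciding to skip it.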

-- 
Adam Bregenzer
[EMAIL PROTECTED]
http://adam.bregenzer.net/

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
