There are many ways to decide whether a file is non-text, but I don't know of any "standard" way. You can detect that a file is not ASCII by simply searching for any byte greater than 0x7f. But that misclassifies a UTF-8 file, which is 8-bit text representing Unicode and routinely contains such bytes.
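A minimal sketch of those two checks in Python, assuming the whole file content is already in memory as bytes. The function names `is_ascii` and `is_utf8` are my own, not anything from the standard library:

```python
def is_ascii(data: bytes) -> bool:
    """Return True if no byte in data is greater than 0x7f."""
    return all(b <= 0x7F for b in data)


def is_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8.

    Assumption: data holds the complete content. A sample cut off in the
    middle of a multi-byte character would be rejected here.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Plain ASCII passes both checks. Accented text encoded as UTF-8 fails `is_ascii` but passes `is_utf8`, which is exactly the case the simple 0x7f test gets wrong.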

The approach I've seen many times is to look for regular occurrences of the end-of-line character and the absence of nulls. Most "binary" files will have more nulls than linefeeds, and any null could be considered a marker for a non-text file.
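Here is a rough sketch of that heuristic in Python. The names `looks_like_text` and `file_looks_like_text`, the 8192-byte sample size, and the 1000-byte line-length limit are all arbitrary choices of mine for illustration, so tune them to your data:

```python
def looks_like_text(data: bytes, max_line_len: int = 1000) -> bool:
    """Guess whether data is text: no nulls, and linefeeds occur regularly."""
    if not data:
        return True  # assumption: treat an empty file as text
    if b"\x00" in data:
        return False  # any null is taken as a marker for a non-text file
    # "Regular" end-of-line here means no stretch between linefeeds
    # is longer than max_line_len bytes (a hypothetical threshold).
    return all(len(line) <= max_line_len for line in data.split(b"\n"))


def file_looks_like_text(path: str, sample_size: int = 8192) -> bool:
    """Apply the heuristic to the first sample_size bytes of a file."""
    with open(path, "rb") as f:  # binary mode: no decoding or newline translation
        return looks_like_text(f.read(sample_size))
```

This is only a guess, not a guarantee. A text file with one very long line will be reported as binary, and a binary file that happens to contain no nulls in the sampled bytes may be reported as text.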

If you're happy with your particular Perl script, it could probably be translated to Python quite readily.

ritu wrote:
Hi,

I'm wondering if Python has a utility to detect binary content in
files? Or if anyone has any ideas on how that can be accomplished? I
haven't been able to find any useful information to accomplish this
(my other option is to fire off a perl script from within my python
script that will tell me whether the file is binary), so any pointers
will be appreciated.

Thanks,
Ritu

--
http://mail.python.org/mailman/listinfo/python-list
