There are lots of ways to decide whether a file is non-text, but I don't know
of any "standard" way. You can detect that a file is not ASCII by simply
searching for any byte greater than 0x7f. But that test wrongly rejects
a UTF-8 file, which is an 8-bit text file representing Unicode.
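
Here is a rough sketch of those two checks, in case it helps. The function names is_ascii and is_utf8 are my own, not anything from the standard library, and I'm assuming you've already read the file (or a sample of it) in binary mode.

```python
def is_ascii(data: bytes) -> bool:
    """Return True if no byte in data is greater than 0x7f."""
    return all(b <= 0x7F for b in data)


def is_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Plain ASCII passes both tests, since ASCII is a subset of UTF-8. One caveat: if you only read the first N bytes of a file, the sample may end in the middle of a multi-byte character and is_utf8 will report False for perfectly good text.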
The approach I've seen many times is to check for regular occurrences of
the end-of-line character and for the absence of nulls. Most "binary" files
will have more nulls than linefeeds, and any null at all could be considered a
marker for a non-text file.
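
Something along these lines would implement that heuristic. Treat it as a sketch rather than a definitive test: the names looks_binary and file_looks_binary, the 1024-byte line limit, and the 8192-byte sample size are all arbitrary choices of mine.

```python
def looks_binary(chunk: bytes, max_line_length: int = 1024) -> bool:
    """Guess whether chunk came from a non-text file."""
    # Any null byte at all is taken as a marker for non-text content.
    if b"\x00" in chunk:
        return True
    # Text should contain linefeeds at regular intervals, so a very
    # long run of bytes with no linefeed is treated as suspicious.
    longest = max(len(line) for line in chunk.split(b"\n"))
    return longest > max_line_length


def file_looks_binary(path: str, sample_size: int = 8192) -> bool:
    """Apply looks_binary to the first sample_size bytes of a file."""
    with open(path, "rb") as f:
        return looks_binary(f.read(sample_size))
```

Note that this will flag UTF-16 text as binary, because ASCII characters encoded as UTF-16 contain null bytes. An empty file is reported as text.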
If you're happy with your particular Perl script, it could probably be
readily translated to Python.
ritu wrote:
Hi,
I'm wondering if Python has a utility to detect binary content in
files? Or if anyone has any ideas on how that can be accomplished? I
haven't been able to find any useful information to accomplish this
(my other option is to fire off a Perl script from within my Python
script that will tell me whether the file is binary), so any pointers
will be appreciated.
Thanks,
Ritu
--
http://mail.python.org/mailman/listinfo/python-list