On Fri, 12 Feb 2010 15:14:07 +0100, Christian Heimes wrote:

> Lloyd Zusman wrote:
>> .... The -T and -B switches work as follows. The first block or so
>> .... of the file is examined for odd characters such as strange control
>> .... codes or characters with the high bit set. If too many strange
>> .... characters (>30%) are found, it's a -B file; otherwise it's a -T
>> .... file. Also, any file containing null in the first block is
>> .... considered a binary file. [ ... ]
>
> That's a butt ugly heuristic that will lead to lots of false positives
> if your text happens to be UTF-16 encoded or non-English text UTF-8
> encoded.
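A minimal Python sketch of the -T/-B heuristic as the quoted text describes it, which also reproduces the false positives Christian mentions. The 512-byte block size, the exact set of control bytes counted as "ordinary", and the function name `looks_binary_heuristic` are all assumptions for illustration; this is not Perl's actual implementation.

```python
# Assumed block size: the quoted description only says "the first block or so".
BLOCK_SIZE = 512
# Threshold taken from the quoted description (">30%").
ODD_THRESHOLD = 0.30
# Assumption: control bytes treated as ordinary text (BS, TAB, LF, FF, CR, ESC).
TEXT_CONTROLS = frozenset({0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B})


def looks_binary_heuristic(data: bytes) -> bool:
    """Classify data roughly the way the quoted -T/-B description says."""
    block = data[:BLOCK_SIZE]
    if not block:
        return False  # empty input: treated as text in this sketch (assumption)
    if b"\x00" in block:
        return True  # any null byte in the first block means binary
    # "Odd" bytes: high bit set, or a control code not in TEXT_CONTROLS.
    odd = sum(1 for b in block if b >= 0x80 or (b < 0x20 and b not in TEXT_CONTROLS))
    return odd / len(block) > ODD_THRESHOLD


print(looks_binary_heuristic(b"plain ASCII text\n"))                   # False
print(looks_binary_heuristic("plain ASCII text\n".encode("utf-16")))   # True: contains nulls
print(looks_binary_heuristic("Καλημέρα κόσμε\n".encode("utf-8")))      # True: mostly high-bit bytes
```

Both `True` results are ordinary text, which is exactly the UTF-16 and non-English UTF-8 problem described above.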

And a hell of a lot of false negatives if the file is binary.

The way I've always seen it, a file is binary if it contains a single
binary character *anywhere* in the file.

-- 
Steven
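A rough Python sketch of the "any binary character anywhere" rule, scanning the whole stream in chunks rather than only the first block. What counts as a "binary character" is not defined in the post; this sketch assumes NUL and the C0 control bytes other than backspace, tab, newline, form feed, carriage return and escape. Under that assumption, high-bit bytes are not flagged, so UTF-8 text passes while UTF-16 (which contains nulls) still fails. The names `contains_binary_byte` and `BINARY_BYTES` are hypothetical.

```python
import io
from typing import BinaryIO

# Assumption: control bytes that still count as ordinary text.
TEXT_CONTROLS = frozenset({0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B})
# Every other C0 control byte (including NUL) is treated as "binary".
BINARY_BYTES = bytes(b for b in range(0x20) if b not in TEXT_CONTROLS)


def contains_binary_byte(stream: BinaryIO, chunk_size: int = 64 * 1024) -> bool:
    """Return True as soon as any byte in BINARY_BYTES appears anywhere in the stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return False  # reached EOF without finding a binary byte
        # Deleting the binary bytes shortens the chunk only if one was present.
        if len(chunk.translate(None, BINARY_BYTES)) != len(chunk):
            return True


print(contains_binary_byte(io.BytesIO("Καλημέρα κόσμε\n".encode("utf-8"))))  # False
# The NUL sits far beyond any "first block", but the full scan still finds it.
print(contains_binary_byte(io.BytesIO(b"text" * 100000 + b"\x00")))  # True
```

With a real file, the stream would come from `open(path, "rb")`. The trade-off is reading the entire file instead of one block.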