I should also add that the core reason wc is slow and Python is fast is not that UTF-8 decoding in wc is slow. It is that the Python code only counts characters, while wc also tracks the width of each line so that it can report --max-line-length. It doesn't really need to do this, and probably shouldn't, unless --max-line-length is actually specified.
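To make the difference in per-character work concrete, here is a rough Python sketch. It is not wc's code: the function names are made up for illustration, and the width logic is a simplification that counts one column per character and ignores tabs and wide characters.

```python
def count_chars(data: bytes) -> int:
    """Count characters only: decode the bytes and take the length."""
    return len(data.decode("utf-8"))


def count_chars_and_max_line(data: bytes) -> tuple[int, int]:
    """Count characters while also tracking the longest line seen.

    Hypothetical simplification: every character is one column wide,
    which is not how a real display-width calculation works.
    """
    chars = 0
    current = 0   # width of the line being scanned
    longest = 0   # widest line seen so far
    for ch in data.decode("utf-8"):
        chars += 1
        if ch == "\n":
            longest = max(longest, current)
            current = 0
        else:
            current += 1
    # Account for a final line that has no trailing newline.
    longest = max(longest, current)
    return chars, longest


sample = "héllo\nwörld!\n".encode("utf-8")
print(count_chars(sample))
print(count_chars_and_max_line(sample))
```

The second function does extra bookkeeping on every character even when the caller only wants the count, which is the kind of overhead I mean.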
Eric
