On Fri, Aug 28, 2020 at 10:32 PM Richard Damon <rich...@damon-family.org> wrote:
>
> This might be one of the cases where Python 2's lax handling of string
> vs bytes was an advantage.
>
> If he was just scanning the message for specific ASCII strings, then not
> getting the full message decoded right is unlikely to have been causing
> problems.
>
> Python 2 handled that sort of case quite easily. Python 3, on the other
> hand, will have trouble converting the byte message to a string, since
> there isn't a single encoding that you could use for all of it all the
> time. This being 'fussier' does make sure that the program is handling
> all the text 'properly', and would be helpful if some of the patterns
> being checked for contained 'extended' (non-ASCII) characters.
>
> One possible solution in Python3 is to decode the byte string using an
> encoding that allows all 256 byte values, so it won't raise any decoding
> errors; it will just give you possibly nonsensical characters for non-ASCII text.

Why? If you want to work with bytes, work with bytes. There's no
reason to decode in a meaningless way. Python 3 can handle the job of
searching a bytestring for ASCII text just fine.
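A minimal sketch of searching raw bytes for an ASCII marker with no decoding at all. The sample message and the `contains_marker` helper are hypothetical, for illustration only. For contrast, it also shows that a strict UTF-8 decode fails on a stray non-UTF-8 byte, while a latin-1 decode (one encoding that maps all 256 byte values) never raises but adds nothing to an ASCII search.

```python
# Hypothetical raw message containing a non-UTF-8 byte (0xE9 is 'é' in latin-1).
raw = b"Subject: Caf\xe9 menu\r\nX-Spam-Flag: YES\r\n\r\nBody text\r\n"


def contains_marker(data: bytes, marker: bytes) -> bool:
    """Hypothetical helper: search the raw bytes directly, no decode step."""
    return marker in data


print(contains_marker(raw, b"X-Spam-Flag: YES"))  # True

# A strict UTF-8 decode fails on the 0xE9 byte ...
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 decode failed:", exc.reason)

# ... while latin-1 never raises, because it maps every byte value to a
# code point; but for a pure-ASCII search the extra decode buys nothing.
print("X-Spam-Flag: YES" in raw.decode("latin-1"))  # True
```

The bytes version works the same way with `bytes.find()`, `bytes.startswith()`, or `re` patterns compiled from bytes literals.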

Also, if you're parsing an email message, you can and should be doing
so with respect to the encoding(s) stipulated in the headers, after
which you will have valid Unicode text.
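A minimal sketch of header-aware decoding with the standard library `email` package, assuming a simple single-part message (the sample bytes below are hypothetical). With `policy.default`, the parser returns `EmailMessage` objects whose `get_content()` decodes a text part using the charset declared in its Content-Type header, so the result is ordinary `str`.

```python
import email
from email import policy

# Hypothetical message whose body is declared as ISO-8859-1 in its headers.
raw = (
    b"Subject: Test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"Caf\xe9 ouvert\r\n"
)

# policy.default makes the parser produce EmailMessage objects.
msg = email.message_from_bytes(raw, policy=policy.default)

# walk() yields the message itself and, for multipart mail, every subpart.
for part in msg.walk():
    if part.get_content_maintype() == "text":
        # get_content() decodes the payload with the part's declared charset.
        text = part.get_content()
        print(part.get_content_charset(), repr(text))
```

For multipart messages the same loop visits each text part, and each one is decoded according to its own declared charset.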

Please don't spread misinformation like this.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
