Marc-Andre Lemburg <m...@egenix.com> added the comment:

Ezio Melotti wrote:
>
> Ezio Melotti <ezio.melo...@gmail.com> added the comment:
>
> Even if they are not valid they still "eat" all the 4/5/6 bytes, so they
> should be fixed too. I haven't seen anything about these bytes in chapter 3
> so far, but there are at least two possibilities:
> 1) consider all the bytes in range F5-FD as invalid without looking for the
> other bytes;
> 2) try to read the next 4/5/6 bytes and fail if they are not continuation
> bytes.
> We can also look at what others do (e.g. browsers and other languages).
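
To make the trade-off between the two quoted possibilities concrete, here is a minimal Python sketch. It is not the decoder's actual code: `nominal_length` and `toy_decode` are hypothetical names, the lengths follow the obsolete 4/5/6-byte forms, and the toy treats every non-ASCII byte as an error instead of decoding valid multi-byte sequences. It also assumes the "replace" behaviour, so option 2 skips whatever continuation bytes are present rather than raising.

```python
REPLACEMENT = "\ufffd"


def nominal_length(lead):
    """Length implied by a lead byte in F5-FD (hypothetical table, obsolete forms)."""
    if 0xF5 <= lead <= 0xF7:
        return 4
    if 0xF8 <= lead <= 0xFB:
        return 5
    if 0xFC <= lead <= 0xFD:
        return 6
    return 0  # not one of the lead bytes under discussion


def toy_decode(data, consume_sequence):
    """Toy scanner: ASCII passes through, every other byte is an error.

    consume_sequence=False models option 1: the lead byte alone is invalid,
    so only one byte is consumed per replacement character.
    consume_sequence=True models option 2: the continuation bytes following
    an F5-FD lead byte are skipped as well, up to the nominal length.
    """
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            out.append(chr(byte))
            i += 1
            continue
        i += 1  # the offending byte itself is always consumed
        if consume_sequence:
            remaining = nominal_length(byte) - 1
            while remaining > 0 and i < len(data) and 0x80 <= data[i] <= 0xBF:
                i += 1
                remaining -= 1
        out.append(REPLACEMENT)
    return "".join(out)


data = b"a\xf8\x80\x80\x80\x80b"
print(ascii(toy_decode(data, consume_sequence=False)))
print(ascii(toy_decode(data, consume_sequence=True)))
```

For this input, option 1 produces five replacement characters between "a" and "b" (one for F8 and one for each orphaned continuation byte), while option 2 produces a single one.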

If those entries were marked as 0 in the length table, each of them would
consume only one byte. Compared to the current behaviour, however, that would
produce more replacement code points in the output, so perhaps applying the
same logic as for the other sequences is a better strategy.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8271>
_______________________________________