John Machin <sjmac...@users.sourceforge.net> added the comment:

@lemburg: "failing byte" seems rather obvious: it is the first byte you meet 
that is not valid in the current state. I don't understand your explanation, 
especially "does not have the high bit set". I think you mean "is a valid 
starter byte". See example 3 below.

Example 1: F1 80 41 42 43. F1 implies a 4-byte character. 80 is OK. 41 is not 
in 80-BF. It is the "failing byte"; high bit not set. Required action is to 
emit FFFD then resync on the 41, causing 0041 0042 0043 to be emitted. Total 
output: FFFD 0041 0042 0043. Current code emits FFFD 0043.

Example 2: F1 80 FF 42 43. F1 implies a 4-byte character. 80 is OK. FF is not 
in 80-BF. It is the "failing byte". Required action is to emit FFFD then resync 
on the FF. FF is not a valid starter byte, so emit FFFD, and resync on the 42, 
causing 0042 0043 to be emitted. Total output: FFFD FFFD 0042 0043. Current 
code emits FFFD 0043.

Example 3: F1 80 C2 81 43. F1 implies a 4-byte character. 80 is OK. C2 is not 
in 80-BF. It is the "failing byte". Required action is to emit FFFD then resync 
on the C2. C2 and 81 both have the high bit set, but C2 is a valid starter byte 
and 81 is a valid continuation byte, so C2 81 is emitted as 0081, followed by 
0043. Total output: FFFD 0081 0043. Current code emits FFFD 0043.
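
To make the rule concrete, here is a minimal sketch of "emit FFFD, then resync 
on the failing byte", written for Python 3. It is not the decoder's actual code. 
The names expected_length and decode_with_resync are made up for illustration, 
and the sketch is deliberately simplified: it checks only that continuation 
bytes fall in 80-BF and does not reject overlong forms or surrogates.

```python
REPLACEMENT = "\ufffd"

def expected_length(starter):
    """Sequence length implied by a starter byte, or 0 if it cannot start one."""
    if starter < 0x80:
        return 1
    if 0xC2 <= starter <= 0xDF:
        return 2
    if 0xE0 <= starter <= 0xEF:
        return 3
    if 0xF0 <= starter <= 0xF4:
        return 4
    return 0  # 80-C1 and F5-FF are never valid starter bytes

def decode_with_resync(data):
    """Decode bytes, emitting U+FFFD and resyncing on the failing byte."""
    out = []
    i = 0
    while i < len(data):
        need = expected_length(data[i])
        if need == 0:
            # Not a valid starter: emit FFFD and move on by one byte.
            out.append(REPLACEMENT)
            i += 1
            continue
        # Payload bits of the starter byte (the whole byte for ASCII).
        code_point = data[i] if need == 1 else data[i] & (0xFF >> (need + 1))
        j = i + 1
        # Consume continuation bytes until done or until the failing byte.
        while j < i + need and j < len(data) and 0x80 <= data[j] <= 0xBF:
            code_point = (code_point << 6) | (data[j] & 0x3F)
            j += 1
        if j == i + need and code_point <= 0x10FFFF:
            out.append(chr(code_point))
        else:
            out.append(REPLACEMENT)
        # Resync: j is the failing byte, which has NOT been consumed.
        i = j
    return "".join(out)

for hex_input in ("F180414243", "F180FF4243", "F180C28143"):
    decoded = decode_with_resync(bytes.fromhex(hex_input))
    print(hex_input, "->", " ".join("%04X" % ord(ch) for ch in decoded))
```

The point to notice is the final "i = j": the failing byte is left in the input 
and examined again as a possible starter, rather than being skipped along with 
the rest of the assumed 4-byte sequence. On the three inputs above, the sketch 
produces the "Total output" given in each example.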

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8271>
_______________________________________