Re: [Lynx-dev] rendering — (0x97)

Mouse Mon, 29 Jun 2020 13:28:58 -0700

>> Content-Encoding=Windows-1252
> I meant Charset, and I hadn't read the other replies.


> If it is the document character set I'm not sure how one should
> interpret that for variable length codes.

As a codepoint, rather than as an encoding octet, I would guess.

Content-Type:'s charset= is actually two things.  (It arguably
shouldn't be, but since when has that made any difference to
HTTP-family protocols?)  It is a charset in the strict sense, a mapping
from integer codepoints to abstract characters, and it is an encoding,
a way of turning a stream of integer codepoints into a stream of
octets.  The latter really should be split out into a separate header;
I speculate that that wasn't done because everyone used the trivial
encoding for single-octet character sets, then added UTF-8, and nobody
noticed that they were silently adding an encoding spec to the charset
spec until after it got entrenched.
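
To make the two roles concrete, here is a minimal Python sketch. The
helper names are hypothetical and purely illustrative; they are not part
of Lynx or of any HTTP library. It assumes Python's "cp1252" codec table
can stand in for the Windows-1252 codepoint table, which works only
because that set is single-octet.

```python
# Hypothetical helpers, for illustration only.

def codepoints_to_chars(codepoints, charset):
    """Charset in the strict sense: integer codepoint -> abstract character."""
    if charset == "unicode":
        return [chr(cp) for cp in codepoints]
    # Assumption: for a single-octet set, the codec's octet table is also
    # its codepoint table, so we can borrow it here.
    return [bytes([cp]).decode(charset) for cp in codepoints]

def trivial_encode(codepoints):
    """The trivial encoding: one codepoint (0..255) becomes one octet."""
    return bytes(codepoints)

def utf8_encode(codepoints):
    """UTF-8: one Unicode codepoint becomes one to four octets."""
    return "".join(map(chr, codepoints)).encode("utf-8")

# Same abstract character (em dash), different codepoints and octets.
win = codepoints_to_chars([0x97], "cp1252")
uni = codepoints_to_chars([0x2014], "unicode")
print(win == uni)                      # both are the em dash
print(trivial_encode([0x97]).hex())    # one octet
print(utf8_encode([0x2014]).hex())     # three octets
```

The point of the sketch is only that "which character is codepoint N"
and "which octets carry codepoint N" are separate questions, even though
charset= answers both at once.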

I could argue it either way whether something like &#151; should be
"octet 151 for the encoding specified by charset=" or "codepoint 151
for the character set specified by charset=".  I do strongly believe
it is broken for it to be "Unicode codepoint 151" even if the charset=
specifies something very non-Unicode like 8859-14 or KOI-8.  If nothing
else, it makes it completely impossible to represent non-single-octet
codepoints when using a character set that is not a subset of Unicode.
But what I believe doesn't matter....
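
As a rough illustration of how far apart the readings of &#151; land,
the following Python sketch compares the "octet 151 under charset="
reading with the "Unicode codepoint 151" reading. The function names are
made up for this example, and Windows-1252 is assumed as the declared
charset. For a single-octet set the "codepoint 151 in that set" reading
coincides with the octet reading, so it is not shown separately.

```python
# Hypothetical helpers, for illustration only.

def ncr_as_octet(n, charset):
    """Read &#n; as octet n in the encoding named by charset=."""
    return bytes([n]).decode(charset)

def ncr_as_unicode(n):
    """Read &#n; as Unicode codepoint n, ignoring charset=."""
    return chr(n)

def describe(ch):
    # Print the Unicode codepoint rather than the character itself,
    # so the output does not depend on the terminal's encoding.
    return f"U+{ord(ch):04X}"

print(describe(ncr_as_octet(151, "cp1252")))  # the em dash
print(describe(ncr_as_unicode(151)))          # a C1 control character
```

Under these assumptions the first reading yields the dash the page
author presumably wanted, while the second yields an unprintable
control character.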

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

