>> Content-Encoding=Windows-1252
> I meant Charset, and I hadn't read the other replies.
> If it is the document character set I'm not sure how one should
> interpret that for variable length codes.

As a codepoint, rather than as an encoding octet, I would guess.

Content-Type:'s charset= is actually two things.  (It arguably
shouldn't be, but since when has that made any difference to
HTTP-family protocols?)  It is a charset in the strict sense, a mapping
from integer codepoints to abstract characters, and it is an encoding,
a way of turning a stream of integer codepoints into a stream of
octets.  The latter really should be split out into a separate header;
I speculate that that wasn't done because everyone used the trivial
encoding for single-octet character sets, then added UTF-8, and nobody
noticed that they were silently adding an encoding spec to the charset
spec until after it got entrenched.

I could argue it either way whether something like &#151; should be
"octet 151 for the encoding specified by charset=" or "codepoint 151
for the character set specified by charset=".  I do strongly believe it
is broken for it to be "Unicode codepoint 151" even if the charset=
specifies something very non-Unicode like 8859-14 or KOI-8.  If nothing
else, it makes it completely impossible to represent non-single-octet
codepoints when using a character set that is not a subset of Unicode.

But what I believe doesn't matter....

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
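
P.S.  To make the competing readings of a numeric reference with value
151 concrete, here is a minimal Python sketch.  It assumes
charset=windows-1252 and uses Python's "cp1252" codec as a stand-in for
that charset; the names N and readings() are made up for illustration.
For a single-octet charset with the trivial encoding, the "octet 151"
and "codepoint 151 in the charset" readings are the same table lookup,
so the sketch shows them as one entry.  It is an illustration of the
distinction, not a description of what any browser actually does.

```python
import unicodedata

N = 151  # the number in the numeric character reference


def readings(n: int, charset: str) -> dict:
    """Return two candidate interpretations of reference number n.

    Hypothetical helper; only meaningful for n < 256 and a
    single-octet charset that defines that octet.
    """
    return {
        # "octet n for the encoding specified by charset=" (and, for a
        # trivially encoded single-octet charset, also "codepoint n
        # for the character set specified by charset=")
        "in_charset": bytes([n]).decode(charset),
        # "Unicode codepoint n", ignoring charset= entirely
        "in_unicode": chr(n),
    }


r = readings(N, "cp1252")
print(unicodedata.name(r["in_charset"]))      # name of the charset reading
print(unicodedata.category(r["in_unicode"]))  # general category of U+0097

# charset= also names an encoding: the same abstract character becomes
# different octet streams depending on which one is chosen.
dash = "\u2014"
print(dash.encode("cp1252"), dash.encode("utf-8"))
```

Under these assumptions the two readings disagree: the charset reading
is a printable dash, while the Unicode reading is a C1 control
character.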