Re: [Lynx-dev] rendering (0x97)

2020-06-29 Thread Mouse
>> Content-Encoding=Windows-1252
> I meant Charset, and I hadn't read the other replies.

> If it is the document character set I'm not sure how one should
> interpret that for variable length codes.

As a codepoint, rather than as an encoding octet, I would guess.

Content-Type:'s charset= is actually two things.  (It arguably
shouldn't be, but since when has that made any difference to
HTTP-family protocols?)  It is a charset in the strict sense, a mapping
from integer codepoints to abstract characters, and it is an encoding,
a way of turning a stream of integer codepoints into a stream of
octets.  The latter really should be split out into a separate header;
I speculate that that wasn't done because everyone used the trivial
encoding for single-octet character sets, then added UTF-8, and nobody
noticed that they were silently adding an encoding spec to the charset
spec until after it got entrenched.
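
A minimal sketch of the two roles described above, in Python. The function name is hypothetical and only for illustration; Python's str type happens to use Unicode as its character set, so ord() stands in for the "charset" step and encode() for the "encoding" step.

```python
# Sketch: the two jobs that charset= bundles together.
# charset_and_encoding is a hypothetical name, not part of any spec.

def charset_and_encoding(ch: str, encoding: str):
    """Return (codepoint, octets) for a single character."""
    codepoint = ord(ch)           # charset: abstract character <-> integer
    octets = ch.encode(encoding)  # encoding: integer codepoint -> octets
    return codepoint, octets

em_dash = "\u2014"

# Unicode + UTF-8: one codepoint becomes three octets.
unicode_view = charset_and_encoding(em_dash, "utf-8")

# windows-1252 uses the trivial encoding: its codepoint 151 is the octet 0x97.
cp1252_octets = em_dash.encode("cp1252")

print(unicode_view, cp1252_octets)
```

With a single-octet character set the two steps are indistinguishable, which may be why the distinction went unnoticed for so long.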

I could argue it either way whether something like &#151; should be
"octet 151 for the encoding specified by charset=" or "codepoint 151
for the character set specified by charset=".  I do strongly believe
it is broken for it to be "Unicode codepoint 151" even if the charset=
specifies something very non-Unicode like 8859-14 or KOI-8.  If nothing
else, it makes it completely impossible to represent non-single-octet
codepoints when using a character set that is not a subset of Unicode.
But what I believe doesn't matter
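
For concreteness, here is a rough Python sketch of the competing readings of &#151;, assuming the declared charset is windows-1252. For a single-octet charset the "octet" and "charset codepoint" readings coincide, so only one is shown. The third value uses html.unescape, which is documented to follow the HTML 5 rules for character references; it is shown as one existing behaviour, not as the answer to the argument.

```python
import html
import unicodedata

N = 151  # 0x97

# Reading A: octet/codepoint 151 in the declared charset (windows-1252 here).
as_declared_charset = bytes([N]).decode("cp1252")

# Reading B: Unicode codepoint 151, which is a C1 control character.
as_unicode = chr(N)

# What Python's HTML5-style unescaping does with the reference.
as_html5 = html.unescape("&#151;")

print(unicodedata.name(as_declared_charset))  # name of the cp1252 character
print(unicodedata.category(as_unicode))       # general category of U+0097
print(as_html5 == as_declared_charset)
```

Reading B yields a control character with no graphic, which matches the rendering complaint elsewhere in this thread.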

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

___
Lynx-dev mailing list
Lynx-dev@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lynx-dev


Re: [Lynx-dev] rendering (0x97)

2020-06-29 Thread Thorsten Glaser
David Woolley dixit:

> If it is the document character set I'm not sure how one should
> interpret that for variable length codes.

Right…

| 4.1 Character and Entity References
|
| [Definition: A character reference refers to a specific character in
| the ISO/IEC 10646 character set, for example one not directly
| accessible from available input devices.] Character Reference
|
| [66]CharRef::=

Re: [Lynx-dev] rendering (0x97)

2020-06-29 Thread David Woolley

On 29/06/2020 20:51, David Woolley wrote:
 Content-Encoding=Windows-1252

I meant Charset, and I hadn't read the other replies.

If it is the document character set I'm not sure how one should 
interpret that for variable length codes.




Re: [Lynx-dev] rendering (0x97)

2020-06-29 Thread David Woolley

On 29/06/2020 19:07, Halaasz Saandor via Lynx-dev wrote:
What do you mean? The actual Unicode number is U+2014, or 8212, and 
&#151; is simply cp1252 in disguise. I have seen that, and , in 
Microsoft HTML from Word.


I mean that &#151; sent with Content-Encoding=Windows-1252 is still 
interpreted as Unicode and therefore has no valid graphic.




Re: [Lynx-dev] rendering (0x97)

2020-06-29 Thread Thorsten Glaser
Mouse dixit:

>I think the double-quoted text above is saying that &#151; is defined
>to be not "codepoint 151 in the encoding specified by the
>Content-Type:" but rather "Unicode codepoint 151".
>
>Is that actually true?  I don't know; I'm not au courant enough with

No, but the document character set is Unicode in UTF-8 encoding.

In both XML and HTML, numeric (decimal or hexadecimal) entities
are in the document character set.

bye,
//mirabilos
-- 
Yay for having to rewrite other people's Bash scripts because bash
suddenly stopped supporting the bash extensions they make use of
-- Tonnerre Lombard in #nosec



Re: [Lynx-dev] rendering (0x97)

2020-06-29 Thread Mouse
>> but if they are sending &#151; over the wire, rather than a byte
>> containing the value 151, the contents encoding wouldn't matter, as
>> entities are interpreted in Unicode,

> What do you mean?  The actual Unicode number is U+2014, or 8212, and
> &#151; is simply cp1252 in disguise.

I think the double-quoted text above is saying that &#151; is defined
to be not "codepoint 151 in the encoding specified by the
Content-Type:" but rather "Unicode codepoint 151".

Is that actually true?  I don't know; I'm not au courant enough with
Web specs to know where to look - I have as little to do with the Web
as I can get away with.

> I have seen that, and , in Microsoft HTML from Word.

That means little.  Just because a Microsoft program generates
something does not mean it's compatible with non-Microsoft software,
and sometimes does not even mean it's compatible with other Microsoft
software, and certainly does not mean it's correct.

For example, I've seen mail generated by Microsoft tools with
codepoints in the 128-159 range, obviously intended to be printable
characters, but labeled as being 8859-1.
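
One common way of coping with such mislabeled text is to treat octets 128-159 as windows-1252 whenever the label says 8859-1. The sketch below illustrates that heuristic in Python; the function name is hypothetical, and this is not a description of what Lynx or any particular mail reader does.

```python
# Sketch of a lenient decoder for text labeled ISO-8859-1 that actually
# contains windows-1252 printable characters in the 128-159 range.
# decode_latin1_leniently is a hypothetical helper name.

def decode_latin1_leniently(raw: bytes) -> str:
    out = []
    for b in raw:
        octet = bytes([b])
        if 0x80 <= b <= 0x9F:
            try:
                out.append(octet.decode("cp1252"))
                continue
            except UnicodeDecodeError:
                # A few octets (e.g. 0x81) are unassigned in Python's
                # cp1252 codec; fall through to the strict 8859-1 reading.
                pass
        out.append(octet.decode("latin-1"))
    return "".join(out)

sample = b"\x93hi\x94 \x97 caf\xe9"
decoded = decode_latin1_leniently(sample)
print(ascii(decoded))
```

The cost of this leniency is that genuine C1 control characters in correctly labeled 8859-1 text can no longer be represented.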

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B



Re: [Lynx-dev] rendering (0x97)

2020-06-29 Thread Thorsten Glaser
Halaasz Saandor via Lynx-dev dixit:

> &#151; is simply cp1252 in disguise

It’s not, 

Re: [Lynx-dev] rendering (0x97)

2020-06-29 Thread Halaasz Saandor via Lynx-dev

2020/06/28 18:28 ... David Woolley:
but if they are sending &#151; over the wire, rather than a byte 
containing the value 151, the contents encoding wouldn't matter, as 
entities are interpreted in Unicode,


What do you mean? The actual Unicode number is U+2014, or 8212, and 
&#151; is simply cp1252 in disguise. I have seen that, and , in 
Microsoft HTML from Word.




Re: [Lynx-dev] rendering (0x97)

2020-06-29 Thread russellbell
Quoth David Woolley: 'Firefox on Debian also faults it:
'adventures '
Firefox from Slackware renders it as emdash.  2 of my
resources identify it as em dash.
Usually you-all ignore my character-rendering comments.  I
don't mind; I edit the source to my preferences.  I bring it up on
this list in case it helps someone else who wants to customize theirs.
nytimes.com encodes pages that existed before digitization
variously.  It suits me to accommodate their mistakes if it doesn't
conflict with another character.  I don't need 'C1 special code'.  I
suspect it's left over from the good old TTY days - ah polar relays! -
I can hear them now.  They used to be kept behind plexiglass screens
to dampen the noise.

russell bell



[Lynx-dev] automatic redirect of some URLs

2020-06-29 Thread Fadi Barbàra
Hi everybody,

Is there any way to have an automatic redirect of some links in lynx?

For example, I would like links such as

reddit.com/*

to be redirected to links of the form

old.reddit.com/*

Ideally those links are stored in an external file.

Thanks,
disnocen
