Re: [Lynx-dev] lynx un-renders

2018-06-04 Thread Thorsten Glaser
Halaasz Saandor dixit:

> If "latin1 is exactly the first 256 codepoints of Unicode" then cp1252 
> is not a superset of Latin1: instead of C1 characters, cp1252 [...]

I’ve never said otherwise.

However, I decided to voluntarily ignore that because
① C1 control characters have no business *at all*
  appearing in HTML, and
② it gives the best results with real-world existing
  broken software (I literally shuddered in disgust
  the first time I saw “” in a document).

WHATWG saw to it that HTML is not about correctness,
so that’s what we get.

bye,
//mirabilos
-- 
  "Using Lynx is like wearing a really good pair of shades: cuts out
   the glare and harmful UV (ultra-vanity), and you feel so-o-o COOL."
 -- Henry Nelson, March 1999

___
Lynx-dev mailing list
Lynx-dev@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lynx-dev


Re: [Lynx-dev] lynx un-renders

2018-06-04 Thread Halaasz Saandor

2018/05/28 04:57 ... Thorsten Glaser:

> Now the codepage 1252 is a superset of latin1. latin1
> leaves 0x80‥0x9F for C1 control characters (and latin1
> is exactly the first 256 codepoints of Unicode), while
> cp1252 assigns stuff like € and “” inside that block.


I suppose the following is legalistic:

If "latin1 is exactly the first 256 codepoints of Unicode" then cp1252 
is not a superset of Latin1: instead of C1 characters, cp1252 
has a jumble of graphics. If Latin1 is, say, ASCII with 96 more 
graphics, with a gap for C1, then indeed cp1252 is a superset of Latin1, 
but then it is not the first 256 codepoints of Unicode. I know not which 
outlook is actually taken.
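
For what it is worth, the two readings can be compared mechanically. The following is only a sketch, assuming that Python's built-in "latin-1" and "cp1252" codecs are an acceptable stand-in for the definitions under discussion (the variable names are mine):

```python
# Bytes 0x80..0x9F are C1 controls under the "first 256 codepoints of
# Unicode" reading of latin1, but mostly graphics in codepage 1252.
raw = bytes([0x80, 0x93, 0x94])

as_latin1 = raw.decode("latin-1")  # byte n always maps to U+00nn
as_cp1252 = raw.decode("cp1252")   # euro sign and curly double quotes

print([f"U+{ord(c):04X}" for c in as_latin1])
print([f"U+{ord(c):04X}" for c in as_cp1252])

# Outside 0x80..0x9F the two codecs agree byte for byte.
rest = bytes(range(0x00, 0x80)) + bytes(range(0xA0, 0x100))
same_elsewhere = rest.decode("latin-1") == rest.decode("cp1252")

# Bytes that Python's cp1252 codec leaves undefined.
undefined = []
for b in range(0x80, 0xA0):
    try:
        bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(b)
print([hex(b) for b in undefined])
```

Under this codec pair, cp1252 and latin1 differ only inside 0x80..0x9F, which is the block the disagreement is about.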


O, how I miss Latin1! In Mozilla software it no longer appears. I now 
use only ASCII in web-pages that I write, symbols outside that range 
written with &...;.




Re: [Lynx-dev] lynx un-renders

2018-05-27 Thread russellbell
Quoth Thorsten Glaser, 'can you please use hexadecimal
numbers' 
 Sorry!  lynx uses hex sometimes, dec others.  129 = 0x81

Quoth Thorsten Glaser, 'I've only ever seen those used by
windows codepage 1252 users'.
It shows up rarely.  I can't make sense of why.  There are
definitely mistakes in other pages on the same sites.  I don't know
what codepage the authors use.  Since 'high octet preset' is an
instruction to formatters that, as far as I can tell, lynx doesn't
handle, I silence it.  I mention it in this forum in case others like
the idea.
From a Unicode doc:

'# PADDING CHARACTER and HIGH OCTET PRESET represent
# architectural concepts initially proposed for early
# drafts of ISO/IEC 10646-1. They were never actually
# approved or standardized: hence their designation
# here as the "figment" type. Formal name aliases
# (and corresponding abbreviations) for these code
# points are included here because these names leaked
# out from the draft documents and were published in
# at least one RFC whose names for code points was
# implemented in Perl regex expressions.'

russell bell

PS
dec   hex   Description
 129  81    high octet preset
 699  2bb   'commaturnedmod' or 'Modifier Turned Comma'
7996  1f3c  'Greek Capital Letter Iota With Psili And Oxia'
1013  3f5   'lunate epsilon'
8634  21ba  'Counter-Clockwise Arrow'
8764  223c  'sim'
8943  22ef  'Midline Horizontal Ellipsis'
9398  24b6  'Circled Latin Capital Letter A'
9679  25cf  'circlefilled' or 'blackcircle'
9764  2624  'Caduceus' or 'Kerykeion'
8203  200b  zero-width space
7879  1ec7  'Latin Small Letter E With Circumflex And Dot Below'
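
A table like the one above can be regenerated rather than typed by hand. This is a rough sketch using Python's standard unicodedata module; the helper name describe and the placeholder text for unnamed characters are my own choices, and the names printed are the formal Unicode names, which will not always match the informal ones listed above:

```python
import unicodedata


def describe(codepoint: int) -> str:
    """Return 'dec hex name' for one code point."""
    # Control characters such as U+0081 have no formal name, so a
    # placeholder is supplied instead of letting name() raise ValueError.
    name = unicodedata.name(chr(codepoint), "<no formal name>")
    return f"{codepoint:>5} {codepoint:04x} {name}"


# Decimal values taken from the table above.
for cp in (129, 699, 7996, 1013, 8634, 8764, 8943, 9398, 9679, 9764, 8203, 7879):
    print(describe(cp))
```

Going the other way, int("81", 16) gives 129, which covers the "129 = 0x81" conversion.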




Re: [Lynx-dev] lynx un-renders

2018-05-27 Thread Thorsten Glaser
Hi russellbell,

can you please use hexadecimal numbers in threads like this one,
so it is easier for others to find the mentioned character in the
Unicode database? Thanks in advance!

>I think it appears by accident every time I see it.

I’ve only ever seen those used by windows codepage 1252 users…
I usually render the C1 control characters thus:

/* 0x80 */  0x20AC, 0x278A, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
/* 0x88 */  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x278B, 0x017D, 0x278C,
/* 0x90 */  0x278D, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
/* 0x98 */  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x278E, 0x017E, 0x0178,

I’m using 0x278A‥0x278E for those undefined even in cp1252 so
that there’s at least some indication of what went wrong. C1
control characters generally should not be interpreted, or even
sent out as part of text files, so this is the safe way to do
it, and rendering it as if it were miscalculated cp1252 makes
most of its uses legible.
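
To make the idea concrete, here is a minimal sketch of that substitution in Python. The 32 values are copied from the table above; the names C1_RENDER and render_c1 are made up for illustration and are not taken from lynx or any other program:

```python
# Replacement code points for U+0080..U+009F, in order. The five
# 0x278A..0x278E entries mark bytes that cp1252 itself leaves undefined.
C1_RENDER = [
    0x20AC, 0x278A, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,  # 0x80
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x278B, 0x017D, 0x278C,  # 0x88
    0x278D, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,  # 0x90
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x278E, 0x017E, 0x0178,  # 0x98
]


def render_c1(text: str) -> str:
    """Replace C1 control characters with their cp1252-style stand-ins."""
    return "".join(
        chr(C1_RENDER[ord(c) - 0x80]) if 0x80 <= ord(c) <= 0x9F else c
        for c in text
    )


# A string that was decoded as latin1 but was really written as cp1252:
print(render_c1("\x93quoted\x94 \x80 5 \x81"))
```

Everything outside U+0080..U+009F passes through untouched, so the function can be applied to already-decoded text without harming correct input.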

Note that this applies equally for undefined uses of all three
of: bare \x80 in the HTML, bare \xC2\x80 (U+0080) in the HTML,
and use of entities like  and  in the HTML. These
all should be rendered the same as  for… acceptance
that the word “web designer” is a curse word denoting idiots.
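
As an illustration of the three input forms ending up in the same place, here is a small Python sketch. The numeric reference &#128; is only a hypothetical example of such an entity, and html.unescape is used because it follows the HTML5 rules for character references, not because lynx works this way:

```python
import html

# Form 1: a bare 0x80 byte in a page decoded as latin1.
bare_byte = b"\x80".decode("latin-1")
# Form 2: U+0080 encoded as UTF-8 (bytes C2 80).
utf8_pair = b"\xc2\x80".decode("utf-8")
# Form 3: a numeric character reference (hypothetical example).
entity = "&#128;"

# Forms 1 and 2 both yield the C1 control character U+0080.
both_c1 = bare_byte == utf8_pair == "\x80"

# Reinterpreting that code point as a cp1252 byte gives the euro sign,
# and the HTML5 reference rules arrive at the same character.
via_cp1252 = bare_byte.encode("latin-1").decode("cp1252")
via_entity = html.unescape(entity)
print(both_c1, via_cp1252 == via_entity)
```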

bye,
//mirabilos
-- 
[...] if maybe ext3fs wasn't a better pick, or jfs, or maybe reiserfs, oh but
what about xfs, and if only i had waited until reiser4 was ready... in the be-
ginning, there was ffs, and in the middle, there was ffs, and at the end, there
was still ffs, and the sys admins knew it was good. :)  -- Ted Unangst über *fs



[Lynx-dev] lynx un-renders

2018-05-27 Thread russellbell

'High octet preset' is an instruction to the renderer for
formatting.  I notice that def7_uni.tbl comments out its rendering as
'HO':  good call.  I make lynx render it as nothing.  I think it appears by
accident every time I see it.

russell bell

