Re: [Lynx-dev] lynx un-renders
Halaasz Saandor dixit:

> If "latin1 is exactly the first 256 codepoints of Unicode" then cp1252
> is not a superset of Latin1: instead, instead of C1 characters cp1252

I’ve never said otherwise. However, I decided to voluntarily ignore
that because ① C1 control characters have no business *at all*
appearing in HTML, and ② it gives the best results with real-world
existing broken software (I literally shuddered in disgust the first
time I saw “” in a document).

WHATWG saw to it that HTML is not about correctness, so that’s what
we get.

bye,
//mirabilos
-- 
"Using Lynx is like wearing a really good pair of shades: cuts out
the glare and harmful UV (ultra-vanity), and you feel so-o-o COOL."
    -- Henry Nelson, March 1999

___
Lynx-dev mailing list
Lynx-dev@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lynx-dev
Re: [Lynx-dev] lynx un-renders
2018/05/28 04:57 ... Thorsten Glaser:

> Now the codepage 1252 is a superset of latin1. latin1 leaves
> 0x80‥0x9F for C1 control characters (and latin1 is exactly the
> first 256 codepoints of Unicode), while cp1252 assigns stuff
> like € and “” inside that block.

I suppose the following is legalistic:

If "latin1 is exactly the first 256 codepoints of Unicode" then cp1252
is not a superset of Latin1: instead of C1 characters, cp1252 has a
jumble of graphics. If Latin1 is, say, ASCII with 96 more graphics,
with a gap for C1, then indeed cp1252 is a superset of Latin1, but then
it is not the first 256 codepoints of Unicode. I know not which outlook
is actually taken.

O, how I miss Latin1! In Mozilla software it no longer appears. I now
use only ASCII in the web pages that I write, with symbols outside that
range written as &...;.
Re: [Lynx-dev] lynx un-renders
Quoth Thorsten Glaser, 'can you please use hexadecimal numbers'

Sorry! lynx uses hex sometimes, dec others. 129 = 0x81

Quoth Thorsten Glaser, 'I've only ever seen those used by windows
codepage 1252 users'.

It shows up rarely. I can't make sense why. There are definitely
mistakes in other pages on the same sites. I don't know what codepage
the authors use.

Since 'high octet preset' is an instruction to formatters that, as far
as I can tell, lynx doesn't handle, I silence it. I mention it in this
forum in case others like the idea.

From a Unicode doc:

'# PADDING CHARACTER and HIGH OCTET PRESET represent
# architectural concepts initially proposed for early
# drafts of ISO/IEC 10646-1. They were never actually
# approved or standardized: hence their designation
# here as the "figment" type. Formal name aliases
# (and corresponding abbreviations) for these code
# points are included here because these names leaked
# out from the draft documents and were published in
# at least one RFC whose names for code points was
# implemented in Perl regex expressions.'

russell bell

PS
 dec   hex   Description
 129   81    high octet preset
 699   2bb   'commaturnedmod' or 'Modifier Turned Comma'
7996   1f3c  'Greek Capital Letter Iota With Psili And Oxia'
1013   3f5   'lunate epsilon'
8634   21ba  'Counter-Clockwise Arrow'
8764   223c  'sim'
8943   22ef  'Midline Horizontal Ellipsis'
9398   24b6  'Circled Latin Capital Letter A'
9679   25cf  circlefilled or blackcircle
9764   2624  'Caduceus' or "Kerykeion"
8203   200b  zero-width space
7879   1ec7  'Latin Small Letter E With Circumflex And Dot Below'
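For anyone converting the decimal numbers in the table above by hand, a minimal sketch of the conversion in C follows. The function name `format_codepoint` is made up for this illustration and is not part of lynx; it only shows that printing the decimal value with a hexadecimal format gives the usual U+XXXX notation.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not lynx code): write a code point, given as a
 * plain number such as 129, into buf in the usual "U+XXXX" notation.
 * Returns the value snprintf() returns, i.e. the length of the
 * formatted string when buf is large enough.
 */
int format_codepoint(unsigned long cp, char *buf, size_t len)
{
    /* %04lX pads to at least four uppercase hex digits */
    return snprintf(buf, len, "U+%04lX", cp);
}
```

With a 16-byte buffer, `format_codepoint(129, buf, sizeof buf)` leaves "U+0081" in `buf`, matching the "129 = 0x81" line above.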
Re: [Lynx-dev] lynx un-renders
Hi russellbell,

can you please use hexadecimal numbers in threads like this one,
so it is easier for others to find the mentioned character in the
Unicode database? Thanks in advance!

>I think it appears by accident every time I see it.

I’ve only ever seen those used by windows codepage 1252 users…

I usually render the C1 control characters thus:

    /* 0x80 */ 0x20AC, 0x278A, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    /* 0x88 */ 0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x278B, 0x017D, 0x278C,
    /* 0x90 */ 0x278D, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    /* 0x98 */ 0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x278E, 0x017E, 0x0178,

I’m using 0x278A‥0x278E for those undefined even in cp1252 so that
there’s at least some indication of what went wrong.

C1 control characters generally should not be interpreted, or even
sent out as part of text files, so this is the safe way to do it, and
rendering them as if they were mislabelled cp1252 makes most of their
uses legible.

Note that this applies equally for undefined uses of all three of:
bare \x80 in the HTML, bare \xC2\x80 (U+0080) in the HTML, and use of
numeric entities for the same codepoint in the HTML. These all should
be rendered the same as for… acceptance that the word “web designer”
is a curse word denoting idiots.

bye,
//mirabilos
-- 
[...] if maybe ext3fs wasn't a better pick, or jfs, or maybe reiserfs, oh but
what about xfs, and if only i had waited until reiser4 was ready... in the be-
ginning, there was ffs, and in the middle, there was ffs, and at the end, there
was still ffs, and the sys admins knew it was good. :)  -- Ted Unangst über *fs
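The table in the message above can be wrapped in a lookup along these lines. This is only a sketch: the array contents are copied from the message, but the names `c1_as_cp1252` and `render_c1` are invented here, and nothing in the thread says this is how lynx or any other program is actually structured.

```c
#include <stdint.h>

/*
 * Replacement code points for 0x80..0x9F, copied from the message
 * above. 0x278A..0x278E mark the five positions that cp1252 itself
 * leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).
 */
static const uint16_t c1_as_cp1252[32] = {
    /* 0x80 */ 0x20AC, 0x278A, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    /* 0x88 */ 0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x278B, 0x017D, 0x278C,
    /* 0x90 */ 0x278D, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    /* 0x98 */ 0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x278E, 0x017E, 0x0178,
};

/*
 * Hypothetical helper: map a decoded code point to the one that should
 * be displayed. Only the C1 range is remapped; everything else passes
 * through unchanged.
 */
uint32_t render_c1(uint32_t cp)
{
    if (cp >= 0x80 && cp <= 0x9F)
        return c1_as_cp1252[cp - 0x80];
    return cp;
}
```

Because the function works on the decoded code point, it gives the same result whether the 0x80 arrived as a bare byte, as UTF-8, or as a numeric entity, which is the point made in the last paragraph of the message.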
[Lynx-dev] lynx un-renders
'High octet preset' (HOP) is an instruction to the renderer for
formatting. I notice that def7_uni.tbl comments out its rendering as
'HO': good call. I make lynx render it as nothing. I think it appears
by accident every time I see it.

russell bell
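As a rough illustration of "render it as nothing", the sketch below drops the character from a buffer of decoded code points. It is not the change made to lynx, which the message does not show; the function name `drop_hop` is made up, and the value 0x81 is taken from the "129 = 0x81" line elsewhere in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical filter (not lynx code): remove every U+0081 from an
 * array of code points in place and return the new length. All other
 * code points keep their relative order.
 */
size_t drop_hop(uint32_t *text, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (text[i] != 0x81)
            text[out++] = text[i];
    }
    return out;
}
```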