Re: [Lynx-dev] lynx un-renders
Halaasz Saandor dixit:

> If "latin1 is exactly the first 256 codepoints of Unicode" then cp1252
> is not a superset of Latin1: instead, instead of C1 characters cp1252

I’ve never said otherwise. However, I decided to deliberately ignore
that because ① C1 control characters have no business *at all*
appearing in HTML, and ② it gives the best results with real-world
existing broken software (I literally shuddered in disgust the first
time I saw “&#147;” in a document).

WHATWG saw to it that HTML is not about correctness, so that’s what
we get.

bye,
//mirabilos
-- 
"Using Lynx is like wearing a really good pair of shades: cuts out
the glare and harmful UV (ultra-vanity), and you feel so-o-o COOL."
	-- Henry Nelson, March 1999

___
Lynx-dev mailing list
Lynx-dev@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lynx-dev
Re: [Lynx-dev] lynx un-renders
2018/05/28 04:57 ... Thorsten Glaser:

> Now the codepage 1252 is a superset of latin1. latin1 leaves
> 0x80‥0x9F for C1 control characters (and latin1 is exactly the
> first 256 codepoints of Unicode), while cp1252 assigns stuff
> like € and “” inside that block.

I suppose the following is legalistic:

If "latin1 is exactly the first 256 codepoints of Unicode" then cp1252
is not a superset of Latin1: rather, instead of C1 characters cp1252
has a jumble of graphics. If Latin1 is, say, ASCII with 96 more
graphics, with a gap for C1, then indeed cp1252 is a superset of
Latin1, but then it is not the first 256 codepoints of Unicode. I know
not which outlook is actually taken.

O, how I miss Latin1! In Mozilla software it no longer appears. I now
use only ASCII in the web pages that I write, with symbols outside that
range written as &...;.
Re: [Lynx-dev] lynx un-renders
russellb...@gmail.com dixit:

> It shows up rarely. I can't make sense why. There are

I can: it’s often produced by Microsoft users who save in a legacy
codepage encoding, after which the text is converted to Unicode as if
it were latin1.

Now the codepage 1252 is a superset of latin1. latin1 leaves
0x80‥0x9F for C1 control characters (and latin1 is exactly the
first 256 codepoints of Unicode), while cp1252 assigns stuff
like € and “” inside that block.

So, basically, a mild case of Mojibake. But since C1 control
characters have no business existing inside an HTML document,
I’d parse this so as to undo that, i.e. as misconverted cp1252,
instead.

bye,
//mirabilos
-- 
> find both emacs and vi sickening (joe rules) and consider pine the only
> usable text-mode mail client (and I have tried them all). ;)
Hello, I'm Holger ("Hello Holger!"), and I am likewise ... a pine user,
and a habitual one at that ("Oooohhh"). [from dasr]
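To make the misconversion described above concrete, here is a minimal sketch in C. It assumes the converter treats every input byte as latin1, so the byte value is taken directly as the Unicode code point and then encoded as UTF-8. The function name latin1_byte_to_utf8 is made up for this illustration and is not part of lynx.

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: encode one byte as UTF-8
 * under the assumption that it is latin1, i.e. that the byte value is
 * the Unicode code point. Returns the number of bytes written to out.
 */
size_t latin1_byte_to_utf8(unsigned char b, unsigned char out[2])
{
    if (b < 0x80) {
        /* ASCII range: one byte, unchanged */
        out[0] = b;
        return 1;
    }
    /* U+0080..U+00FF: two-byte UTF-8 sequence 110xxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xC0 | (b >> 6));
    out[1] = (unsigned char)(0x80 | (b & 0x3F));
    return 2;
}
```

A cp1252 left double quotation mark is stored as byte 0x93. Run through this conversion it becomes the UTF-8 sequence C2 93, which is the C1 control character U+0093 and not U+201C.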
Re: [Lynx-dev] lynx un-renders
Quoth Thorsten Glaser, 'can you please use hexadecimal numbers'

	Sorry! lynx uses hex sometimes, dec others. 129 = 0x81

Quoth Thorsten Glaser, 'I've only ever seen those used by windows
codepage 1252 users'.

	It shows up rarely. I can't make sense of why. There are
definitely mistakes in other pages on the same sites. I don't know
what codepage the authors use.
	Since 'high octet preset' is an instruction to formatters
that, as far as I can tell, lynx doesn't handle, I silence it. I
mention it in this forum in case others like the idea.

From a Unicode doc:

'# PADDING CHARACTER and HIGH OCTET PRESET represent
# architectural concepts initially proposed for early
# drafts of ISO/IEC 10646-1. They were never actually
# approved or standardized: hence their designation
# here as the "figment" type. Formal name aliases
# (and corresponding abbreviations) for these code
# points are included here because these names leaked
# out from the draft documents and were published in
# at least one RFC whose names for code points was
# implemented in Perl regex expressions.'

russell bell

PS

dec   hex   Description
129   81    high octet preset
699   2bb   'commaturnedmod' or 'Modifier Turned Comma'
7996  1f3c  'Greek Capital Letter Iota With Psili And Oxia'
1013  3f5   'lunate epsilon'
8634  21ba  'Counter-Clockwise Arrow'
8764  223c  'sim'
8943  22ef  'Midline Horizontal Ellipsis'
9398  24b6  'Circled Latin Capital Letter A'
9679  25cf  'circlefilled' or 'blackcircle'
9764  2624  'Caduceus' or "Kerykeion"
8203  200b  zero-width space
7879  1ec7  'Latin Small Letter E With Circumflex And Dot Below'
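For anyone who wants to check the dec/hex pairs in the table above, or turn a decimal code point into the U+XXXX form used by the Unicode database, a minimal sketch in C follows. The helper name format_codepoint is an assumption made for this example; it only wraps the standard snprintf.

```c
#include <stdio.h>

/*
 * Hypothetical helper, for illustration only: write the conventional
 * "U+XXXX" notation (at least four uppercase hex digits) for a code
 * point given as a plain number. Returns what snprintf returns.
 */
int format_codepoint(unsigned long cp, char *buf, size_t len)
{
    return snprintf(buf, len, "U+%04lX", cp);
}
```

Called with 129 and a 16-byte buffer this yields U+0081, and with 8203 it yields U+200B, matching the first and the zero-width-space rows of the table.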
Re: [Lynx-dev] lynx un-renders
Hi russellbell,

can you please use hexadecimal numbers in threads like this one,
so it is easier for others to find the mentioned character in
the Unicode database? Thanks in advance!

>I think it appears by accident every time I see it.

I’ve only ever seen those used by windows codepage 1252 users…

I usually render the C1 control characters thus:

/* 0x80 */ 0x20AC, 0x278A, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
/* 0x88 */ 0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x278B, 0x017D, 0x278C,
/* 0x90 */ 0x278D, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
/* 0x98 */ 0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x278E, 0x017E, 0x0178,

I’m using 0x278A‥0x278E for the five positions that are undefined
even in cp1252, so that there’s at least some indication of what
went wrong.

C1 control characters generally should not be interpreted, or
even sent out as part of text files, so this is the safe way
to do it, and rendering them as if they were misconverted cp1252
makes most of their uses legible.

Note that this applies equally to all three of: a bare \x80 in
the HTML, a bare \xC2\x80 (U+0080) in the HTML, and the use of
entities like &#128; and &#x80; in the HTML. These all should be
rendered the same as &euro; for… acceptance that the word “web
designer” is a curse word denoting idiots.

bye,
//mirabilos
-- 
[...] if maybe ext3fs wasn't a better pick, or jfs, or maybe reiserfs, oh but
what about xfs, and if only i had waited until reiser4 was ready... in the be-
ginning, there was ffs, and in the middle, there was ffs, and at the end, there
was still ffs, and the sys admins knew it was good. :)  -- Ted Unangst on *fs
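The table above can be applied as a simple lookup. What follows is a minimal sketch in C, not code taken from lynx: the array contents are copied from the message, while the names c1_replacement and remap_c1 are assumptions made for this illustration.

```c
#include <stdint.h>

/*
 * Replacement code points for 0x80..0x9F, copied from the table in the
 * message above. 0x278A..0x278E mark the five positions that cp1252
 * itself leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).
 */
const uint16_t c1_replacement[32] = {
    /* 0x80 */ 0x20AC, 0x278A, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    /* 0x88 */ 0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x278B, 0x017D, 0x278C,
    /* 0x90 */ 0x278D, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    /* 0x98 */ 0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x278E, 0x017E, 0x0178,
};

/*
 * Hypothetical helper: map a decoded code point to the one that should
 * be rendered. Only the C1 range 0x80..0x9F is changed; every other
 * code point passes through untouched.
 */
uint32_t remap_c1(uint32_t cp)
{
    if (cp >= 0x80 && cp <= 0x9F)
        return c1_replacement[cp - 0x80];
    return cp;
}
```

Because the lookup works on the decoded code point, it would treat a raw byte 0x80, the UTF-8 sequence C2 80 and a numeric character reference for 128 alike, provided the caller has already decoded each of them to the value 0x80.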
[Lynx-dev] lynx un-renders
	'High octet preset' is an instruction to the renderer for
formatting. I notice that def7_uni.tbl comments out its rendering as
'HO': good call. I make lynx render it as nothing. I think it appears
by accident every time I see it.

russell bell