[Issue 5221] entity.c: Merge Walter's list with Thomas'
https://issues.dlang.org/show_bug.cgi?id=5221
Andrei Alexandrescu and...@erdani.com changed:
What    |Removed |Added
Version |D1 D2   |D2
--
[Issue 5221] entity.c: Merge Walter's list with Thomas'
http://d.puremagic.com/issues/show_bug.cgi?id=5221
Don clugd...@yahoo.com.au changed:
What       |Removed |Added
Status     |NEW     |RESOLVED
Resolution |        |FIXED
--- Comment #19 from Don clugd...@yahoo.com.au 2011-02-06 13:41:15 PST ---
Fixed: https://github.com/D-Programming-Language/dmd/commit/b46fe402cff4618f5d49f99d71b8fefb764e16e5
-- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email --- You are receiving this mail because: ---
[Issue 5221] entity.c: Merge Walter's list with Thomas'
http://d.puremagic.com/issues/show_bug.cgi?id=5221
--- Comment #18 from Aziz Köksal aziz.koek...@gmail.com 2011-02-04 05:03:04 PST ---
(In reply to comment #12)
> // Notice the leading space in the value.
> <!ENTITY DotDot " &#x020DC;" ><!--COMBINING FOUR DOTS ABOVE -->
> Means nothing.
I guess you're right, since in the following list the value of each of those combining characters consists of only one Unicode code point:
http://www.w3.org/2003/entities/2007doc/byalpha.html
For &DotDot; it's U+20DC.
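To make the "leading space means nothing" point concrete, here is a minimal C sketch, assuming an entity value written as a single hexadecimal character reference like the DotDot declaration quoted above. parse_hex_charref is a hypothetical helper, not part of entity.c or of any DTD library; it skips leading whitespace and returns the one code point, so " &#x020DC;" and "&#x020DC;" both yield U+20DC.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: parse a DTD entity value such as " &#x020DC;".
 * Leading whitespace is skipped, so it has no effect on the result.
 * Returns the code point, or -1 unless the value is exactly one
 * hexadecimal character reference. */
long parse_hex_charref(const char *value)
{
    const char *p = value;
    while (isspace((unsigned char)*p))
        p++;                                /* ignore the leading space */
    if (p[0] != '&' || p[1] != '#' || (p[2] != 'x' && p[2] != 'X'))
        return -1;
    p += 3;
    if (!isxdigit((unsigned char)*p))
        return -1;                          /* need at least one hex digit */
    char *end;
    long cp = strtol(p, &end, 16);
    if (*end != ';' || end[1] != '\0')
        return -1;                          /* trailing text after the reference */
    return cp;
}
```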
[Issue 5221] entity.c: Merge Walter's list with Thomas'
http://d.puremagic.com/issues/show_bug.cgi?id=5221
--- Comment #15 from Don clugd...@yahoo.com.au 2011-01-31 01:08:57 PST ---
The DMD test suite chokes on &lang; == 9001 (== U+2329); in the new list it is U+27E8.

This really scared me, because I found a few web references that listed &lang; as U+2329:
http://www.fileformat.info/info/unicode/char/2329/index.htm
It turns out that U+2329 and U+27E8 are visually almost identical. I found this helpful note in
http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#endnote_lang
  lang: 'mathematical left angle bracket' is NOT the same character as U+003C 'less than', or U+2039 'single left-pointing angle quotation mark', or U+2329 'left-pointing angle bracket', or U+3008 'left angle bracket'.

I finally figured out what happened: U+27E8 was added in Unicode 3.2.0. The book "Unicode Explained" (p. 423) says that U+27E8 is poorly supported (because it was a recent addition to Unicode) and that U+2329 is a more practical choice. But U+2329 is canonically equivalent to U+3008, a bracket intended for use with Chinese-Japanese-Korean (CJK) ideographs, so it can look wrong if the text goes through a normalization process.

That book was published in 2006. Can we assume that Unicode support is now widespread enough that we should change to the more correct value?
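The normalization concern can be illustrated with a small C sketch. The table is a hand-copied excerpt of two canonical singleton decompositions from the Unicode Character Database (U+2329 → U+3008 and U+232A → U+3009); normalize_singleton is a hypothetical helper, not a real normalization API, and a real implementation would consult the full database (for example through a library such as ICU). It only shows that normalizing U+2329 yields the CJK bracket, while U+27E8, which has no decomposition, passes through unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Hand-picked excerpt of canonical singleton decompositions. Both NFC
 * and NFD replace these code points, so U+2329 never survives
 * normalization. */
struct Singleton {
    uint32_t from;
    uint32_t to;
};

static const struct Singleton singletons[] = {
    { 0x2329, 0x3008 },   /* LEFT-POINTING ANGLE BRACKET  -> LEFT ANGLE BRACKET  */
    { 0x232A, 0x3009 },   /* RIGHT-POINTING ANGLE BRACKET -> RIGHT ANGLE BRACKET */
};

/* Hypothetical helper: apply the excerpt's mapping to one code point. */
uint32_t normalize_singleton(uint32_t cp)
{
    for (size_t i = 0; i < sizeof singletons / sizeof singletons[0]; i++)
        if (singletons[i].from == cp)
            return singletons[i].to;
    return cp;   /* not in the excerpt: unchanged (true of U+27E8) */
}
```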
[Issue 5221] entity.c: Merge Walter's list with Thomas'
http://d.puremagic.com/issues/show_bug.cgi?id=5221
--- Comment #16 from Iain Buclaw ibuc...@ubuntu.com 2011-01-31 03:36:26 PST ---
(In reply to comment #15)
> That book was published in 2006. Can we assume that unicode support is widespread enough now that we should change to the more correct value?
I would assume it is safe to change. Having looked up the reference myself, it appears to have been added as part of the HTML5 standard?
[Issue 5221] entity.c: Merge Walter's list with Thomas'
http://d.puremagic.com/issues/show_bug.cgi?id=5221
--- Comment #17 from Don clugd...@yahoo.com.au 2011-01-31 14:17:09 PST ---
(In reply to comment #16)
> (In reply to comment #15)
> > That book was published in 2006. Can we assume that unicode support is widespread enough now that we should change to the more correct value?
> I would assume it to be safe to change. Having looked up on the reference myself, it appears to be added as part of the HTML5 standard?
Yes. The old definition was in HTML 4.01, back in 1999. And here's the problem: HTML5 still hasn't been ratified, and the documents all say "draft". I wrote these next two lines:
    So, I'm not sure that we can use these entity values yet. Are we certain that they're not going to change them before HTML5 becomes official?
and then I read this: http://blog.whatwg.org/html-is-the-new-html5
indicating that HTML5 will never happen, and the draft documents are as standard as it's ever going to get. That's a spectacular failure. So I guess there's no reason to hold off on the patch.
[Issue 5221] entity.c: Merge Walter's list with Thomas'
http://d.puremagic.com/issues/show_bug.cgi?id=5221
Iain Buclaw ibuc...@ubuntu.com changed:
What                        |Removed |Added
Attachment #887 is obsolete |0       |1
--- Comment #13 from Iain Buclaw ibuc...@ubuntu.com 2011-01-30 06:28:20 PST ---
Created an attachment (id=889)
new entity.c source based off new w3 list
Dang diggity, just realised that some items are in the wrong order. Uploading corrected source.
[Issue 5221] entity.c: Merge Walter's list with Thomas'
http://d.puremagic.com/issues/show_bug.cgi?id=5221
Iain Buclaw ibuc...@ubuntu.com changed:
What                        |Removed |Added
Attachment #888 is obsolete |0       |1
--- Comment #14 from Iain Buclaw ibuc...@ubuntu.com 2011-01-30 06:29:42 PST ---
Created an attachment (id=890)
diff between file and donc/dmd/src/entity.c
And here is the corrected diff between the file and Don's copy.
[Issue 5221] entity.c: Merge Walter's list with Thomas'
http://d.puremagic.com/issues/show_bug.cgi?id=5221
Aziz Köksal aziz.koek...@gmail.com changed:
What |Removed |Added
CC   |        |aziz.koek...@gmail.com
--- Comment #4 from Aziz Köksal aziz.koek...@gmail.com 2011-01-28 14:25:56 PST ---
I researched this issue with named HTML entities and found several different lists out there. I think the following list is the most complete and most accurate one:
http://www.w3.org/2003/entities/2007/w3centities-f.ent
Please consider mentioning this in the language specification.
[Issue 5221] entity.c: Merge Walter's list with Thomas'
http://d.puremagic.com/issues/show_bug.cgi?id=5221
Don clugd...@yahoo.com.au changed:
What |Removed |Added
CC   |        |clugd...@yahoo.com.au
--- Comment #5 from Don clugd...@yahoo.com.au 2011-01-28 14:36:53 PST ---
(In reply to comment #4)
> I researched this issue with named HTML entities and found several, different lists out there. I think the following list is the most complete and most accurate one:
> http://www.w3.org/2003/entities/2007/w3centities-f.ent
> Please consider mentioning this in the language specification.
A few hours ago I merged this patch into my fork of dmd. The complete source is here:
https://github.com/donc/dmd/blob/master/src/entity.c
It would be great if you or someone else could compare that list to the one you've just posted.
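Comparing two entity lists amounts to a sorted merge. The following C sketch assumes both lists have already been loaded into arrays of name/value pairs; the struct Entity type, compare_lists, and any sample data used with it are hypothetical. It sorts both arrays with qsort and strcmp (so case is significant, and "Alpha" and "alpha" stay distinct), then walks them in lock-step and reports added, removed, and changed entries.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one entry of an entity list. */
struct Entity {
    const char *name;
    unsigned long value;
};

struct DiffCounts {
    int added, removed, changed;
};

static int by_name(const void *a, const void *b)
{
    const struct Entity *x = a, *y = b;
    return strcmp(x->name, y->name);   /* case-sensitive ordering */
}

/* Sort both lists in place, then print every entity that was added,
 * removed, or given a different value. */
struct DiffCounts compare_lists(struct Entity *old_list, size_t n_old,
                                struct Entity *new_list, size_t n_new)
{
    struct DiffCounts c = {0, 0, 0};
    size_t i = 0, j = 0;

    qsort(old_list, n_old, sizeof *old_list, by_name);
    qsort(new_list, n_new, sizeof *new_list, by_name);

    while (i < n_old || j < n_new) {
        int cmp;
        if (i == n_old)
            cmp = 1;                   /* only new entries remain */
        else if (j == n_new)
            cmp = -1;                  /* only old entries remain */
        else
            cmp = strcmp(old_list[i].name, new_list[j].name);

        if (cmp < 0) {
            printf("removed %s\n", old_list[i].name);
            c.removed++;
            i++;
        } else if (cmp > 0) {
            printf("added   %s U+%04lX\n", new_list[j].name, new_list[j].value);
            c.added++;
            j++;
        } else {
            if (old_list[i].value != new_list[j].value) {
                printf("changed %s U+%04lX -> U+%04lX\n", old_list[i].name,
                       old_list[i].value, new_list[j].value);
                c.changed++;
            }
            i++;
            j++;
        }
    }
    return c;
}
```

Sorting copies first keeps the walk simple, since entity.c's tables interleave upper- and lower-case names rather than following strcmp order.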
[Issue 5221] entity.c: Merge Walter's list with Thomas'
http://d.puremagic.com/issues/show_bug.cgi?id=5221
--- Comment #6 from Iain Buclaw ibuc...@ubuntu.com 2011-01-28 16:20:41 PST ---
(In reply to comment #5)
> (In reply to comment #4)
> > I researched this issue with named HTML entities and found several, different lists out there. I think the following list is the most complete and most accurate one:
> > http://www.w3.org/2003/entities/2007/w3centities-f.ent
> > Please consider mentioning this in the language specification.
> A few hours ago I merged this patch into my fork of dmd. Complete source is here:
> https://github.com/donc/dmd/blob/master/src/entity.c
> Would be great if you or someone else could compare that list, to the one you've just posted.
There are quite a lot of additions, and the odd difference in between. I can do an update, though I guess it depends on how much you want to put in. Some entities have values larger than an unsigned short can hold, e.g.:
b.nu,     0x1D6CE,
b.Omega,  0x1D6C0,
b.omega,  0x1D6DA,
Bopf,     0x1D539,
bopf,     0x1D553,
Which then leads to question #2: does the parser allow '\b.nu;'?
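Both questions can be sketched in C. The struct below is a hypothetical widened variant of entity.c's NameId whose value field is uint32_t instead of unsigned short, so code points such as U+1D6CE fit; scan_entity_name is likewise a hypothetical scanner that accepts '.' in names. Neither reflects what DMD actually does; they only show what supporting these entities would require.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical widened NameId: an unsigned short (0xFFFF at most on
 * typical platforms) cannot hold code points beyond the BMP. */
struct NameId32 {
    const char *name;
    uint32_t value;
};

static const struct NameId32 wide_names[] = {
    { "b.nu",    0x1D6CE },
    { "b.Omega", 0x1D6C0 },
    { "b.omega", 0x1D6DA },
    { "Bopf",    0x1D539 },
    { "bopf",    0x1D553 },
    { NULL,      0       },     /* sentinel, like entity.c's NULL, 0 */
};

/* Linear lookup; returns 0 when the name is unknown. */
uint32_t lookup_wide(const char *name)
{
    for (const struct NameId32 *e = wide_names; e->name != NULL; e++)
        if (strcmp(e->name, name) == 0)
            return e->value;
    return 0;
}

/* Hypothetical scanner for the name between '\' and ';' in an escape
 * such as '\b.nu;'. It accepts '.' besides letters and digits, which
 * names like b.nu would need. Returns the name length, or 0 if the
 * name is empty or not terminated by ';'. */
size_t scan_entity_name(const char *p)
{
    size_t n = 0;
    while (isalnum((unsigned char)p[n]) || p[n] == '.')
        n++;
    return (n > 0 && p[n] == ';') ? n : 0;
}
```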
[Issue 5221] entity.c: Merge Walter's list with Thomas'
http://d.puremagic.com/issues/show_bug.cgi?id=5221
--- Comment #3 from Iain Buclaw ibuc...@ubuntu.com 2010-11-26 11:42:28 PST ---
Created an attachment (id=834)
Updated merge.
Yikes! I didn't know my last update was going to do *that*. Sorry for any noise; here's an updated patch against SVN. It adds some bits and corrects some mistakes in Thomas' list. Checked and tested against the testsuite. =)
Regards
[Issue 5221] entity.c: Merge Walter's list with Thomas'
http://d.puremagic.com/issues/show_bug.cgi?id=5221
--- Comment #1 from Iain Buclaw ibuc...@ubuntu.com 2010-11-16 05:45:59 PST ---
Random examples of tests that fail on DMD:
static assert('\check;'==10003);
static assert('\lsim;'==8818);
static assert('\numero;'==8470);
static assert('\urcorn;'==8989);
static assert('\Zdot;'==379);
Regards
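The same checks can be expressed as a small C harness run against a name/value table. In this sketch the table, lookup, and count_mismatches are hypothetical stand-ins rather than entity.c's real interface; the expected values are the decimal numbers from the asserts above, and the table stores the same values in hex, as entity.c writes them.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct NameId {
    const char *name;
    unsigned short value;
};

/* Hypothetical excerpt standing in for the compiler's entity table. */
static const struct NameId table[] = {
    { "check",  0x2713 },
    { "lsim",   0x2272 },
    { "numero", 0x2116 },
    { "urcorn", 0x231D },
    { "Zdot",   0x017B },
    { NULL,     0      },
};

static long lookup(const char *name)
{
    for (const struct NameId *e = table; e->name != NULL; e++)
        if (strcmp(e->name, name) == 0)
            return e->value;
    return -1;   /* unknown entity */
}

/* Expected values copied from the D asserts (decimal). */
static const struct { const char *name; long value; } expected[] = {
    { "check", 10003 }, { "lsim", 8818 }, { "numero", 8470 },
    { "urcorn", 8989 }, { "Zdot", 379 },
};

/* Print every mismatch and return how many there were. */
int count_mismatches(void)
{
    int bad = 0;
    for (size_t i = 0; i < sizeof expected / sizeof expected[0]; i++) {
        long got = lookup(expected[i].name);
        if (got != expected[i].value) {
            printf("%s: expected %ld, got %ld\n",
                   expected[i].name, expected[i].value, got);
            bad++;
        }
    }
    return bad;
}
```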
[Issue 5221] entity.c: Merge Walter's list with Thomas'
http://d.puremagic.com/issues/show_bug.cgi?id=5221
--- Comment #2 from Iain Buclaw ibuc...@ubuntu.com 2010-11-16 06:07:53 PST ---
(From update of attachment 815)
diff -ur src.orig/entity.c src/entity.c
--- src.orig/entity.c	2010-03-31 01:26:18.0 +0100
+++ src/entity.c	2010-11-16 14:01:58.423055202 +
@@ -9,6 +9,7 @@
 #include <string.h>
+#include <ctype.h>
 /*
  * Convert from named entity to its encoding.
@@ -23,7 +24,6 @@
     unsigned short value;
 };
-#if IN_GCC
 static NameId namesA[]={
     "Aacgr",    0x0386,
     "aacgr",    0x03AC,
@@ -42,7 +42,9 @@
     "agr",      0x03B1,
     "Agrave",   0x00C0,
     "agrave",   0x00E0,
+    "alefsym",  0x2135,
     "aleph",    0x2135,
+    "Alpha",    0x0391,
     "alpha",    0x03B1,
     "Amacr",    0x0100,
     "amacr",    0x0101,
@@ -76,9 +78,11 @@
     "bcong",    0x224C,
     "Bcy",      0x0411,
     "bcy",      0x0431,
+    "bdquo",    0x201E,
     "becaus",   0x2235,
     "bepsi",    0x220D,
     "bernou",   0x212C,
+    "Beta",     0x0392,
     "beta",     0x03B2,
     "beth",     0x2136,
     "Bgr",      0x0392,
@@ -162,6 +166,7 @@
     "CHcy",     0x0427,
     "chcy",     0x0447,
     "check",    0x2713,
+    "Chi",      0x03A7,
     "chi",      0x03C7,
     "cir",      0x25CB,
     "circ",     0x005E,
@@ -178,6 +183,7 @@
     "coprod",   0x2210,
     "copy",     0x00A9,
     "copysr",   0x2117,
+    "crarr",    0x21B5,
     "cross",    0x2717,
     "cuepr",    0x22DE,
     "cuesc",    0x22DF,
@@ -281,17 +287,21 @@
     "Eogon",    0x0118,
     "eogon",    0x0119,
     "epsi",     0x220A,
+    "Epsilon",  0x0395,
+    "epsilon",  0x03B5,
     "epsis",    0x220A,
     "epsiv",    0x03B5,
     "equals",   0x003D,
     "equiv",    0x2261,
     "erDot",    0x2253,
     "esdot",    0x2250,
+    "Eta",      0x0397,
     "eta",      0x03B7,
     "ETH",      0x00D0,
     "eth",      0x00F0,
     "Euml",     0x00CB,
     "euml",     0x00EB,
+    "euro",     0x20AC,
     "excl",     0x0021,
     "exist",    0x2203,
     NULL,       0
@@ -325,6 +335,7 @@
     "frac56",   0x215A,
     "frac58",   0x215D,
     "frac78",   0x215E,
+    "frasl",    0x2044,
     "frown",    0x2322,
     NULL,       0
 };
@@ -425,6 +436,7 @@
     "iocy",     0x0451,
     "Iogon",    0x012E,
     "iogon",    0x012F,
+    "Iota",     0x0399,
     "iota",     0x03B9,
     "iquest",   0x00BF,
     "isin",     0x220A,
@@ -450,6 +462,7 @@
 };
 static NameId namesK[]={
+    "Kappa",    0x039A,
     "kappa",    0x03BA,
     "kappav",   0x03F0,
     "Kcedil",   0x0136,
@@ -523,7 +536,9 @@
     "lozf",     0x2726,
     "lpar",     0x0028,
     "lrarr2",   0x21C6,
+    "lrm",      0x200E,
     "lrhar2",   0x21CB,
+    "lsaquo",   0x2039,
     "lsh",      0x21B0,
     "lsim",     0x2272,
     "lsqb",     0x005B,
@@ -561,6 +576,7 @@
     "mldr",     0x2026,
     "mnplus",   0x2213,
     "models",   0x22A7,
+    "Mu",       0x039C,
     "mu",       0x03BC,
     "mumap",    0x22B8,
     NULL,       0
@@ -573,8 +589,7 @@
     "nap",      0x2249,
     "napos",    0x0149,
     "natur",    0x266E,
-//  "nbsp",     0x00A0,
-    "nbsp",     32,     // make non-breaking space appear as space
+    "nbsp",     0x00A0,
     "Ncaron",   0x0147,
     "ncaron",   0x0148,
     "Ncedil",   0x0145,
@@ -631,6 +646,7 @@
     "nsupE",    0x2289,
     "Ntilde",   0x00D1,
     "ntilde",   0x00F1,
+    "Nu",       0x039D,
     "nu",       0x03BD,
     "num",      0x0023,
     "numero",   0x2116,
@@ -671,10 +687,13 @@
     "ohgr",     0x03C9,
     "ohm",      0x2126,
     "olarr",    0x21BA,
+    "oline",    0x203E,
     "Omacr",    0x014C,
     "omacr",    0x014D,
     "Omega",    0x03A9,
     "omega",    0x03C9,
+    "Omicron",  0x039F,
+    "omicron",  0x03BF,
     "ominus",   0x2296,
     "oplus",    0x2295,
     "or",       0x2228,
@@ -709,6 +728,7 @@
     "PHgr",     0x03A6,
     "phgr",     0x03C6,
     "Phi",      0x03A6,
+    "phi",      0x03C6,
     "phis",     0x03C6,
     "phiv",     0x03D5,
     "phmmat",   0x2133,
@@ -780,13 +800,16 @@
     "rgr",      0x03C1,
     "rhard",    0x21C1,
     "rharu",    0x21C0,
+    "Rho",      0x03A1,
     "rho",      0x03C1,
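The hunks show the layout of entity.c: one NameId array per initial letter (namesA, namesK, and so on), with upper- and lower-case names mixed in the same array and each array ending in a NULL, 0 sentinel. The C sketch below mimics that layout to show how such tables can be searched. The dispatch array, lookup_entity, and the use of tolower (from the <ctype.h> header the patch adds) are assumptions for illustration, not necessarily what entity.c does.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct NameId {
    const char *name;
    unsigned short value;
};

/* Tiny excerpts of two per-letter tables from the patch, each ending
 * with a NULL name. */
static const struct NameId namesA[] = {
    { "alefsym", 0x2135 },
    { "aleph",   0x2135 },
    { "Alpha",   0x0391 },
    { "alpha",   0x03B1 },
    { NULL,      0      },
};

static const struct NameId namesK[] = {
    { "Kappa",  0x039A },
    { "kappa",  0x03BA },
    { "kappav", 0x03F0 },
    { NULL,     0      },
};

/* Hypothetical dispatch array: index 0 is 'a', 25 is 'z' (assumes the
 * ASCII letters are contiguous). Letters without an excerpt stay NULL. */
static const struct NameId *const tables[26] = {
    ['a' - 'a'] = namesA,
    ['k' - 'a'] = namesK,
};

/* Returns the entity's value, or -1 if unknown. The first letter only
 * selects the table (Alpha and alpha share namesA); the comparison
 * itself stays case-sensitive. */
long lookup_entity(const char *name)
{
    int first = tolower((unsigned char)name[0]);
    if (first < 'a' || first > 'z')
        return -1;
    const struct NameId *e = tables[first - 'a'];
    if (e == NULL)
        return -1;                      /* no table for this letter */
    for (; e->name != NULL; e++)
        if (strcmp(e->name, name) == 0)
            return e->value;
    return -1;
}
```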