#28067 [NoF-Opn]: partially incorrect utf8 to htmlentities mapping
ID: 28067
User updated by: ben at csgb dot de
Reported By: ben at csgb dot de
-Status: No Feedback
+Status: Open
Bug Type: Strings related
Operating System: possibly all
PHP Version: 4, 5, who knows
Assigned To: derick
New Comment:

The code is still incomplete; I will send a test file to sniper and derick.

Previous Comments:

[2005-03-01 01:00:30] php-bugs at lists dot php dot net

No feedback was provided for this bug for over a week, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to Open.

[2005-02-21 20:21:22] [EMAIL PROTECTED]

Please try using this CVS snapshot:

  http://snaps.php.net/php4-STABLE-latest.tar.gz

For Windows:

  http://snaps.php.net/win32/php4-win32-STABLE-latest.zip

[2004-04-20 17:50:06] [EMAIL PROTECTED]

Received the patch, but it doesn't look 100% correct, so I need to do some investigation.

[2004-04-20 09:10:06] [EMAIL PROTECTED]

Hello, can you please mail this patch to me, as the bug system garbled it a bit.

regards,
Derick

[2004-04-19 20:51:01] ben at csgb dot de

Sorry, please be careful when using the diff; I have to learn to copy and paste correctly )-; The diff ends after the first

+	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 	/* 9001 */
 	"lang", "rang",
 };

without the eck.

The remainder of the comments for this report are too long. To view the rest of the comments, please view the bug report online at http://bugs.php.net/28067
#28042 [Csd]: greek letters in html to entity mapping not correct
ID: 28042
User updated by: ben at csgb dot de
-Summary: greek letters in html to entitity mapping not correct
Reported By: ben at csgb dot de
Status: Closed
Bug Type: Strings related
PHP Version: all
New Comment:

Fixing the summary line for better search results (entitity -> entity).

Previous Comments:

[2004-04-18 01:10:02] [EMAIL PROTECTED]

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change will be in the next snapshot. You can grab the snapshot at http://snaps.php.net/.

Thank you for the report, and for helping us make PHP better.

[2004-04-18 01:09:52] [EMAIL PROTECTED]

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change will be in the next snapshot. You can grab the snapshot at http://snaps.php.net/.

Thank you for the report, and for helping us make PHP better.

[2004-04-17 21:52:26] ben at csgb dot de

Retrying to post the code:

<?php echo htmlentities("ΞΠΦΧΨ ξπφχψ", ENT_COMPAT, "UTF-8"); ?>

Here is the diff of php-4.3.4/ext/standard/html.c:

139,140c139,140
<	"Iota", "Kappa", "Lambda", "Mu", "Nu", "X1", "Omicron", "P1", "Rho",
<	NULL, "Sigma", "Tau", "Upsilon", "Ph1", "Ch1", "Ps1", "Omega",
---
>	"Iota", "Kappa", "Lambda", "Mu", "Nu", "Xi", "Omicron", "Pi", "Rho",
>	NULL, "Sigma", "Tau", "Upsilon", "Phi", "Chi", "Psi", "Omega",
144,145c144,145
<	"iota", "kappa", "lambda", "mu", "nu", "x1", "omicron", "p1", "rho",
<	"sigmaf", "sigma", "tau", "upsilon", "ph1", "ch1", "ps1", "omega",
---
>	"iota", "kappa", "lambda", "mu", "nu", "xi", "omicron", "pi", "rho",
>	"sigmaf", "sigma", "tau", "upsilon", "phi", "chi", "psi", "omega",

It's the same change in php-5 (with different line numbers).

[2004-04-17 21:38:13] ben at csgb dot de

Description:

The HTML entity mappings used by htmlentities() have wrong entries in ent_uni_greek[]. They say P1 and p1 instead of Pi and pi. The same goes for Xi, Phi, Chi, Psi and their lowercase counterparts.

Reproduce code:

<?php echo htmlentities("ΞΠΦΧΨ ξπφχψ", ENT_COMPAT, "UTF-8"); ?>

Expected result:

&Xi;&Pi;&Phi;&Chi;&Psi; &xi;&pi;&phi;&chi;&psi;

Actual result:

&X1;&P1;&Ph1;&Ch1;&Ps1; &x1;&p1;&ph1;&ch1;&ps1;
#28067 [NEW]: partially incorrect utf8 to htmlentities mapping
From: ben at csgb dot de
Operating system: possibly all
PHP version: Irrelevant
PHP Bug Type: Strings related
Bug description: partially incorrect utf8 to htmlentities mapping

Description:

While double-checking after Bug #28042 was closed, I discovered more mistakes in that file. I only checked the UTF-8 tables; I don't know whether the other charsets are wrong, too.

In Bug #28042 we missed two letters of the Greek table, 'upsih' and 'piv', which should be spelled with an 'i' (as in "ice") instead of a '1'. There are also NULLs missing at several points. Because the mappings are shifted, htmlentities(..., ..., "UTF-8") converts some UTF-8 encoded characters into the wrong HTML entity or into none at all. For example, U+202F is mapped to &permil;, which belongs to U+2030.

Here is my diff of php5-cvs/ext/standard/html.c. The same modifications should be made in php-4.3; please double-check.

--- html.c	2004-04-18 02:30:24.0 +0200
+++ html.c.fixed	2004-04-19 18:44:47.949012992 +0200
@@ -114,13 +114,13 @@
 	/* 354 - 375 */
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-	NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 	/* 376 */
 	"Yuml",
 	/* 377 - 401 */
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 	/* 402 */
 	"fnof"
 };
@@ -130,7 +130,7 @@
 	"circ",
 	/* 711 - 731 */
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 	/* 732 */
 	"tilde",
 };
@@ -147,9 +147,9 @@
 	"sigmaf", "sigma", "tau", "upsilon", "phi", "chi", "psi", "omega",
 	/* 970 - 976 are not mapped */
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-	"thetasym", "ups1h",
+	"thetasym", "upsih",
 	NULL, NULL, NULL,
-	"p1v"
+	"piv"
 };
 
 static entity_table_t ent_uni_punct[] = {
@@ -158,7 +158,7 @@
 	"thinsp", NULL, NULL, "zwnj", "zwj", "lrm", "rlm",
 	NULL, NULL, NULL, "ndash", "mdash", NULL, NULL, NULL,
 	"lsquo", "rsquo", "sbquo", NULL, "ldquo", "rdquo", "bdquo",
-	"dagger", "Dagger", "bull", NULL, NULL, NULL, "hellip",
+	NULL, "dagger", "Dagger", "bull", NULL, NULL, NULL, "hellip",
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, "permil", NULL,
 	"prime", "Prime", NULL, NULL, NULL, NULL, NULL, "lsaquo", "rsaquo", NULL,
 	NULL, NULL, "oline", NULL, NULL, NULL, NULL, NULL,
@@ -191,7 +191,7 @@
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 	/* 8624 (0x21b0) */
-	NULL, NULL, NULL, NULL, "crarr", NULL, NULL, NULL,
+	NULL, NULL, NULL, NULL, NULL, "crarr", NULL, NULL,
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 	/* 8640 (0x21c0) */
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
@@ -206,9 +206,9 @@
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 	/* 8704 (0x2200) */
 	"forall", "comp", "part", "exist", "nexist", "empty", NULL, "nabla",
-	"isin", "notin", "epsis", NULL, "ni", "bepsi", NULL, "prod",
+	"isin", "notin", "epsis", "ni", NULL, "bepsi", NULL, "prod",
 	/* 8720 (0x2210) */
-	"coprod", "sum", "minus", "mnplus", "plusdo", NULL, "setmn", NULL,
+	"coprod", "sum", "minus", "mnplus", "plusdo", NULL, "setmn", "lowast",
 	"compfn", NULL, "radic", NULL, NULL, "prop", "infin", "ang90",
 	/* 8736 (0x2220) */
 	"ang", "angmsd", "angsph", "mid", "nmid", "par", "npar", "and",
@@ -232,17 +232,19 @@
 	"npr", "nsc", "sub", "sup", "nsub", "nsup", "sube", "supe",
 	/* 8840 - 8852 */
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+NULL,
 	/* 8853 */
 	"oplus", NULL, "otimes",
 	/* 8856 - 8868 */
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+NULL,
 	/* 8869 */
 	"perp",
 	/* 8870 - 8901 */
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-	NULL,
+	NULL,
 	/* 8901 */
 	"sdot",
 	/* 8902 - 8967 */
@@ -252,14 +254,13 @@
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-	NULL, NULL, NULL, NULL, NULL,
+	NULL, NULL, NULL, NULL, NULL, NULL,
 	/* 8968 */
 	"lceil", "rceil", "lfloor", "rfloor",
 	/* 8969 - 9000 */
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-	NULL, NULL, NULL, NULL
#28067 [Opn]: partially incorrect utf8 to htmlentities mapping
ID: 28067
User updated by: ben at csgb dot de
Reported By: ben at csgb dot de
Status: Open
Bug Type: Strings related
Operating System: possibly all
PHP Version: Irrelevant
New Comment:

Sorry, please be careful when using the diff; I have to learn to copy and paste correctly )-; The diff ends after the first

+	NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 	/* 9001 */
 	"lang", "rang",
 };

without the eck.
#28042 [NEW]: greek letters in html to entitity mapping not correct
From: ben at csgb dot de
Operating system:
PHP version: 4.3.5
PHP Bug Type: Strings related
Bug description: greek letters in html to entitity mapping not correct

Description:

The HTML entity mappings used by htmlentities() have wrong entries in ent_uni_greek[]. They say P1 and p1 instead of Pi and pi. The same goes for Xi, Phi, Chi, Psi and their lowercase counterparts.

Reproduce code:

<?php echo htmlentities("ΞΠΦΧΨ ξπφχψ", ENT_COMPAT, "UTF-8"); ?>

Expected result:

&Xi;&Pi;&Phi;&Chi;&Psi; &xi;&pi;&phi;&chi;&psi;

Actual result:

&X1;&P1;&Ph1;&Ch1;&Ps1; &x1;&p1;&ph1;&ch1;&ps1;
#28042 [Opn]: greek letters in html to entitity mapping not correct
ID: 28042
User updated by: ben at csgb dot de
Reported By: ben at csgb dot de
Status: Open
Bug Type: Strings related
-PHP Version: 4.3.5
+PHP Version: all
New Comment:

Retrying to post the code:

<?php echo htmlentities("ΞΠΦΧΨ ξπφχψ", ENT_COMPAT, "UTF-8"); ?>

Here is the diff of php-4.3.4/ext/standard/html.c:

139,140c139,140
<	"Iota", "Kappa", "Lambda", "Mu", "Nu", "X1", "Omicron", "P1", "Rho",
<	NULL, "Sigma", "Tau", "Upsilon", "Ph1", "Ch1", "Ps1", "Omega",
---
>	"Iota", "Kappa", "Lambda", "Mu", "Nu", "Xi", "Omicron", "Pi", "Rho",
>	NULL, "Sigma", "Tau", "Upsilon", "Phi", "Chi", "Psi", "Omega",
144,145c144,145
<	"iota", "kappa", "lambda", "mu", "nu", "x1", "omicron", "p1", "rho",
<	"sigmaf", "sigma", "tau", "upsilon", "ph1", "ch1", "ps1", "omega",
---
>	"iota", "kappa", "lambda", "mu", "nu", "xi", "omicron", "pi", "rho",
>	"sigmaf", "sigma", "tau", "upsilon", "phi", "chi", "psi", "omega",

It's the same change in php-5 (with different line numbers).