#28067 [NoF->Opn]: partially incorrect utf8 to htmlentities mapping

2005-04-12 Thread ben at csgb dot de
 ID:   28067
 User updated by:  ben at csgb dot de
 Reported By:  ben at csgb dot de
-Status:   No Feedback
+Status:   Open
 Bug Type: Strings related
 Operating System: possibly all
 PHP Version:  4, 5, who knows
 Assigned To:  derick
 New Comment:

Code is still incomplete; will send a test file to sniper and derick.


Previous Comments:


[2005-03-01 01:00:30] php-bugs at lists dot php dot net

No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to Open.



[2005-02-21 20:21:22] [EMAIL PROTECTED]

Please try using this CVS snapshot:

  http://snaps.php.net/php4-STABLE-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php4-win32-STABLE-latest.zip





[2004-04-20 17:50:06] [EMAIL PROTECTED]

Received the patch, but it doesn't look 100% correct, so I need to do
some investigation.



[2004-04-20 09:10:06] [EMAIL PROTECTED]

Hello,

can you please mail this patch to me, as the bug system garbled it a
bit.

regards,
Derick



[2004-04-19 20:51:01] ben at csgb dot de

sorry, please be careful when using the diff, have to learn to copy and
paste correctly )-;

the diff ends after the first: 

+   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
/* 9001 */
lang, rang,
};

without the eck



The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
http://bugs.php.net/28067



#28042 [Csd]: greek letters in html to entity mapping not correct

2004-04-19 Thread ben at csgb dot de
 ID:  28042
 User updated by: ben at csgb dot de
-Summary: greek letters in html to entitity mapping not correct
+Summary: greek letters in html to entity mapping not correct
 Reported By: ben at csgb dot de
 Status:  Closed
 Bug Type:Strings related
 PHP Version: all
 New Comment:

fixing summary line for better search results (entitity -> entity)


Previous Comments:


[2004-04-18 01:10:02] [EMAIL PROTECTED]

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.





[2004-04-18 01:09:52] [EMAIL PROTECTED]

This bug has been fixed in CVS.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot. You can grab the snapshot at
http://snaps.php.net/.
 
Thank you for the report, and for helping us make PHP better.



[2004-04-17 21:52:26] ben at csgb dot de

retry to post code:

<?php
echo htmlentities("ΞΠΦΧΨ ξπφχψ", ENT_COMPAT, "UTF-8");
?>

here is the diff of php-4.3.4/ext/standard/html.c:

139,140c139,140
<    Iota, Kappa, Lambda, Mu, Nu, X1, Omicron, P1, Rho,
<    NULL, Sigma, Tau, Upsilon, Ph1, Ch1, Ps1, Omega,
---
>    Iota, Kappa, Lambda, Mu, Nu, Xi, Omicron, Pi, Rho,
>    NULL, Sigma, Tau, Upsilon, Phi, Chi, Psi, Omega,
144,145c144,145
<    iota, kappa, lambda, mu, nu, x1, omicron, p1, rho,
<    sigmaf, sigma, tau, upsilon, ph1, ch1, ps1, omega,
---
>    iota, kappa, lambda, mu, nu, xi, omicron, pi, rho,
>    sigmaf, sigma, tau, upsilon, phi, chi, psi, omega,

It's the same change in php-5 (with different line numbers)



[2004-04-17 21:38:13] ben at csgb dot de

Description:

the html entity mappings used by htmlentities() have wrong entries in
ent_uni_greek[].

They say P1 and p1 instead of Pi and pi. The same goes with
Xi, Phi, Chi, Psi and their lowercase characters.

Reproduce code:
---------------
<?php
echo htmlentities("ΞΠΦΧΨ ξπφχψ", ENT_COMPAT, "UTF-8");
?>

Expected result:
----------------
&Xi;&Pi;&Phi;&Chi;&Psi; &xi;&pi;&phi;&chi;&psi;

Actual result:
--------------
&X1;&P1;&Ph1;&Ch1;&Ps1; &x1;&p1;&ph1;&ch1;&ps1;







#28067 [NEW]: partially incorrect utf8 to htmlentities mapping

2004-04-19 Thread ben at csgb dot de
From: ben at csgb dot de
Operating system: possibly all
PHP version:  Irrelevant
PHP Bug Type: Strings related
Bug description:  partially incorrect utf8 to htmlentities mapping

Description:

While double-checking after Bug #28042 was closed, I discovered some
more mistakes in that file. I only checked the UTF-8 tables; I don't know
whether the other charsets are wrong, too.

In Bug #28042, we forgot two entries of the Greek table, 'upsih' and
'piv', which are spelled with an 'i' as in ice instead of a '1'.

Also, there are some NULLs missing at several points. This causes
htmlentities() with the UTF-8 charset to convert UTF-8 encoded chars into
the wrong HTML entities, or into none at all, since the mappings are
shifted. For example, U+202F is mapped to &permil;, which should be U+2030.
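
A minimal sketch of why a single missing NULL shifts a mapping, assuming a
simplified index-based table in which slot i holds the entity name for code
point base + i. The names lookup_entity, shifted_table, padded_table and
TABLE_BASE are hypothetical and are not taken from html.c; the tables are cut
down to the three code points around the example.

```c
#include <stddef.h>

/* Hypothetical base: slot 0 of each table stands for U+202E. */
#define TABLE_BASE 0x202Eu

/* One NULL short: "permil" sits in the slot that belongs to U+202F. */
static const char *const shifted_table[] = { NULL, "permil" };

/* Correctly padded: U+202E and U+202F are unmapped, U+2030 is permil. */
static const char *const padded_table[] = { NULL, NULL, "permil" };

/* Return the entity name for code point cp, or NULL if it is unmapped
 * or lies outside the table. */
static const char *lookup_entity(const char *const *table, size_t count,
                                 unsigned cp)
{
    if (cp < TABLE_BASE || (size_t)(cp - TABLE_BASE) >= count) {
        return NULL;
    }
    return table[cp - TABLE_BASE];
}
```

With these tables, lookup_entity(shifted_table, 2, 0x202F) yields "permil",
while the padded table returns NULL for U+202F and "permil" for U+2030.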

Here is my diff of php5-cvs/ext/standard/html.c; the same
modifications should be made in php-4.3. Please double-check.

--- html.c  2004-04-18 02:30:24.0 +0200
+++ html.c.fixed  2004-04-19 18:44:47.949012992 +0200
@@ -114,13 +114,13 @@
/* 354 - 375 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-   NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
/* 376 */
Yuml,
/* 377 - 401 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
/* 402 */
fnof
 };
@@ -130,7 +130,7 @@
circ,
/* 711 - 731 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
/* 732 */
tilde,
 };
@@ -147,9 +147,9 @@
sigmaf, sigma, tau, upsilon, phi, chi, psi,
omega,
/* 970 - 976 are not mapped */
NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-   thetasym, ups1h,
+   thetasym, upsih,
NULL, NULL, NULL,
-   p1v
+   piv
 };

 static entity_table_t ent_uni_punct[] = {
@@ -158,7 +158,7 @@
thinsp, NULL, NULL, zwnj, zwj, lrm, rlm,
NULL, NULL, NULL, ndash, mdash, NULL, NULL, NULL,
lsquo, rsquo, sbquo, NULL, ldquo, rdquo, bdquo,
-   dagger, Dagger, bull, NULL, NULL, NULL, hellip,
+   NULL, dagger, Dagger, bull, NULL, NULL, NULL, hellip,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, permil,
NULL,
prime, Prime, NULL, NULL, NULL, NULL, NULL, lsaquo,
rsaquo,
NULL, NULL, NULL, oline, NULL, NULL, NULL, NULL, NULL,
@@ -191,7 +191,7 @@
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
/* 8624 (0x21b0) */
-   NULL, NULL, NULL, NULL, crarr, NULL, NULL, NULL,
+   NULL, NULL, NULL, NULL, NULL, crarr, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
/* 8640 (0x21c0) */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
@@ -206,9 +206,9 @@
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
/* 8704 (0x2200) */
forall, comp, part, exist, nexist, empty, NULL,
nabla,
-   isin, notin, epsis, NULL, ni, bepsi, NULL, prod,
+   isin, notin, epsis, ni, NULL, bepsi, NULL, prod,
/* 8720 (0x2210) */
-   coprod, sum, minus, mnplus, plusdo, NULL, setmn, NULL,
+   coprod, sum, minus, mnplus, plusdo, NULL, setmn, lowast,
compfn, NULL, radic, NULL, NULL, prop, infin, ang90,
/* 8736 (0x2220) */
ang, angmsd, angsph, mid, nmid, par, npar, and,
@@ -232,17 +232,19 @@
npr, nsc, sub, sup, nsub, nsup, sube, supe,
/* 8840 - 8852 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL,
+NULL,
/* 8853 */
oplus, NULL, otimes,
/* 8856 - 8868 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL,
+NULL,
/* 8869 */
perp,
/* 8870 - 8901 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-   NULL,
+   NULL,
/* 8901 */
sdot,
/* 8902 - 8967 */
@@ -252,14 +254,13 @@
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-   NULL, NULL, NULL, NULL, NULL,
+   NULL, NULL, NULL, NULL, NULL, NULL,
/* 8968 */
lceil, rceil, lfloor, rfloor,
/* 8969 - 9000 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-   NULL, NULL, NULL, NULL
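
The padding fixes in the diff above can be sanity-checked by comparing each
table's slot count with the size of the code point range it is meant to
cover. The following is only a sketch: struct range_table, COUNT_OF and
range_table_mismatch are hypothetical helpers, and the two sample arrays are
modelled on the 710 - 732 hunk (circ, 711 - 731 unmapped, tilde), with the
range bounds assumed from the comments in that hunk.

```c
#include <stddef.h>

#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical descriptor: entries[] should cover first..last inclusive. */
struct range_table {
    unsigned first;
    unsigned last;
    const char *const *entries;
    size_t count;
};

/* Before the fix: only 20 NULLs between circ (710) and tilde (732). */
static const char *const spacing_before[] = {
    "circ",
    NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
    NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
    "tilde",
};

/* After the fix: 21 NULLs, one for each of 711 - 731. */
static const char *const spacing_after[] = {
    "circ",
    NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
    NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
    "tilde",
};

static const struct range_table table_before = {
    710, 732, spacing_before, COUNT_OF(spacing_before)
};
static const struct range_table table_after = {
    710, 732, spacing_after, COUNT_OF(spacing_after)
};

/* Return count minus the expected number of slots; 0 means the table
 * has exactly one slot per code point in first..last. */
static long range_table_mismatch(const struct range_table *t)
{
    size_t expected = (size_t)(t->last - t->first) + 1;
    return (long)t->count - (long)expected;
}
```

Here range_table_mismatch(&table_before) is -1 (one slot short) and
range_table_mismatch(&table_after) is 0. A check of this kind only catches a
wrong total; a NULL that is missing in one place and surplus in another
within the same table would still pass.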

#28067 [Opn]: partially incorrect utf8 to htmlentities mapping

2004-04-19 Thread ben at csgb dot de
 ID:   28067
 User updated by:  ben at csgb dot de
 Reported By:  ben at csgb dot de
 Status:   Open
 Bug Type: Strings related
 Operating System: possibly all
 PHP Version:  Irrelevant
 New Comment:

sorry, please be careful when using the diff, have to learn to copy and
paste correctly )-;

the diff ends after the first: 

+   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
/* 9001 */
lang, rang,
};

without the eck


Previous Comments:


[2004-04-19 20:46:26] ben at csgb dot de

Description:

While double-checking after Bug #28042 was closed, I discovered some
more mistakes in that file. I only checked the UTF-8 tables; I don't know
whether the other charsets are wrong, too.

In Bug #28042, we forgot two entries of the Greek table, 'upsih' and
'piv', which are spelled with an 'i' as in ice instead of a '1'.

Also, there are some NULLs missing at several points. This causes
htmlentities() with the UTF-8 charset to convert UTF-8 encoded chars into
the wrong HTML entities, or into none at all, since the mappings are
shifted. For example, U+202F is mapped to &permil;, which should be U+2030.

Here is my diff of php5-cvs/ext/standard/html.c; the same
modifications should be made in php-4.3. Please double-check.

--- html.c  2004-04-18 02:30:24.0 +0200
+++ html.c.fixed  2004-04-19 18:44:47.949012992 +0200
@@ -114,13 +114,13 @@
/* 354 - 375 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-   NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
/* 376 */
Yuml,
/* 377 - 401 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
/* 402 */
fnof
 };
@@ -130,7 +130,7 @@
circ,
/* 711 - 731 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
+   NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
/* 732 */
tilde,
 };
@@ -147,9 +147,9 @@
sigmaf, sigma, tau, upsilon, phi, chi, psi,
omega,
/* 970 - 976 are not mapped */
NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-   thetasym, ups1h,
+   thetasym, upsih,
NULL, NULL, NULL,
-   p1v
+   piv
 };

 static entity_table_t ent_uni_punct[] = {
@@ -158,7 +158,7 @@
thinsp, NULL, NULL, zwnj, zwj, lrm, rlm,
NULL, NULL, NULL, ndash, mdash, NULL, NULL, NULL,
lsquo, rsquo, sbquo, NULL, ldquo, rdquo, bdquo,
-   dagger, Dagger, bull, NULL, NULL, NULL, hellip,
+   NULL, dagger, Dagger, bull, NULL, NULL, NULL, hellip,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, permil,
NULL,
prime, Prime, NULL, NULL, NULL, NULL, NULL, lsaquo,
rsaquo,
NULL, NULL, NULL, oline, NULL, NULL, NULL, NULL, NULL,
@@ -191,7 +191,7 @@
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
/* 8624 (0x21b0) */
-   NULL, NULL, NULL, NULL, crarr, NULL, NULL, NULL,
+   NULL, NULL, NULL, NULL, NULL, crarr, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
/* 8640 (0x21c0) */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
@@ -206,9 +206,9 @@
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
/* 8704 (0x2200) */
forall, comp, part, exist, nexist, empty, NULL,
nabla,
-   isin, notin, epsis, NULL, ni, bepsi, NULL, prod,
+   isin, notin, epsis, ni, NULL, bepsi, NULL, prod,
/* 8720 (0x2210) */
-   coprod, sum, minus, mnplus, plusdo, NULL, setmn, NULL,
+   coprod, sum, minus, mnplus, plusdo, NULL, setmn, lowast,
compfn, NULL, radic, NULL, NULL, prop, infin, ang90,
/* 8736 (0x2220) */
ang, angmsd, angsph, mid, nmid, par, npar,
and,
@@ -232,17 +232,19 @@
npr, nsc, sub, sup, nsub, nsup, sube, supe,
/* 8840 - 8852 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL,
+NULL,
/* 8853 */
oplus, NULL, otimes,
/* 8856 - 8868 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL,
+NULL,
/* 8869 */
perp,
/* 8870 - 8901 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
-   NULL,
+   NULL,
/* 8901 */
sdot,
/* 8902 - 8967 */
@@ -252,14 +254,13 @@
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL

#28042 [NEW]: greek letters in html to entitity mapping not correct

2004-04-17 Thread ben at csgb dot de
From: ben at csgb dot de
Operating system: 
PHP version:  4.3.5
PHP Bug Type: Strings related
Bug description:  greek letters in html to entitity mapping not correct

Description:

the html entity mappings used by htmlentities() have wrong entries in
ent_uni_greek[].

They say P1 and p1 instead of Pi and pi. The same goes with Xi,
Phi, Chi, Psi and their lowercase characters.

Reproduce code:
---------------
<?php
echo htmlentities("ΞΠΦΧΨ ξπφχψ", ENT_COMPAT, "UTF-8");
?>

Expected result:
----------------
&Xi;&Pi;&Phi;&Chi;&Psi; &xi;&pi;&phi;&chi;&psi;

Actual result:
--------------
&X1;&P1;&Ph1;&Ch1;&Ps1; &x1;&p1;&ph1;&ch1;&ps1;
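
Misspellings of this kind ('1' typed for 'i') could be caught by a small
check, sketched here under the assumption that no entity name in the Greek
table is supposed to contain a digit. That assumption does not hold for
other tables (Latin-1 has sup1 and frac12, for example). The function name
first_name_with_digit and the two sample arrays are hypothetical; the arrays
contain only a few entries from the report.

```c
#include <ctype.h>
#include <stddef.h>

/* Return the index of the first non-NULL name that contains a decimal
 * digit, or -1 if no name does. NULL slots (unmapped code points) are
 * skipped. */
static int first_name_with_digit(const char *const *names, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        const char *p;

        if (names[i] == NULL) {
            continue;
        }
        for (p = names[i]; *p != '\0'; p++) {
            if (isdigit((unsigned char)*p)) {
                return (int)i;
            }
        }
    }
    return -1;
}

/* Sample rows as reported (wrong) and as expected (right). */
static const char *const greek_reported[] = {
    "Nu", "X1", "Omicron", "P1", "Rho", NULL, "Sigma"
};
static const char *const greek_expected[] = {
    "Nu", "Xi", "Omicron", "Pi", "Rho", NULL, "Sigma"
};
```

For the sample rows, first_name_with_digit(greek_reported, 7) returns 1 (the
slot holding X1) and first_name_with_digit(greek_expected, 7) returns -1.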



#28042 [Opn]: greek letters in html to entitity mapping not correct

2004-04-17 Thread ben at csgb dot de
 ID:  28042
 User updated by: ben at csgb dot de
 Reported By: ben at csgb dot de
 Status:  Open
 Bug Type:Strings related
-PHP Version: 4.3.5
+PHP Version: all
 New Comment:

retry to post code:

<?php
echo htmlentities("ΞΠΦΧΨ ξπφχψ", ENT_COMPAT, "UTF-8");
?>

here is the diff of php-4.3.4/ext/standard/html.c:

139,140c139,140
<    Iota, Kappa, Lambda, Mu, Nu, X1, Omicron, P1, Rho,
<    NULL, Sigma, Tau, Upsilon, Ph1, Ch1, Ps1, Omega,
---
>    Iota, Kappa, Lambda, Mu, Nu, Xi, Omicron, Pi, Rho,
>    NULL, Sigma, Tau, Upsilon, Phi, Chi, Psi, Omega,
144,145c144,145
<    iota, kappa, lambda, mu, nu, x1, omicron, p1, rho,
<    sigmaf, sigma, tau, upsilon, ph1, ch1, ps1, omega,
---
>    iota, kappa, lambda, mu, nu, xi, omicron, pi, rho,
>    sigmaf, sigma, tau, upsilon, phi, chi, psi, omega,

It's the same change in php-5 (with different line numbers)


Previous Comments:


[2004-04-17 21:38:13] ben at csgb dot de

Description:

the html entity mappings used by htmlentities() have wrong entries in
ent_uni_greek[].

They say P1 and p1 instead of Pi and pi. The same goes with
Xi, Phi, Chi, Psi and their lowercase characters.

Reproduce code:
---------------
<?php
echo htmlentities("ΞΠΦΧΨ ξπφχψ", ENT_COMPAT, "UTF-8");
?>

Expected result:
----------------
&Xi;&Pi;&Phi;&Chi;&Psi; &xi;&pi;&phi;&chi;&psi;

Actual result:
--------------
&X1;&P1;&Ph1;&Ch1;&Ps1; &x1;&p1;&ph1;&ch1;&ps1;




