#35647 [NoF->Opn]: tidy does not produce valid utf8 when the encoding is specified in the config
ID: 35647
User updated by: bugs at nikmakepeace dot com
Reported By: bugs at nikmakepeace dot com
-Status: No Feedback
+Status: Open
Bug Type: XML related
Operating System: FC3
PHP Version: 5.1.1

New Comment:

The source is available at http://www.nikmakepeace.com/testcases/tidy-utf8.phps
Be sure to force your browser's character encoding to utf-8 before copying it.

Note also that changing the last line to

echo tidy_repair_string($dirty, $config, 'utf8');

produces the desired results, but should not be necessary.

Previous Comments:

[2005-12-20 01:00:05] php-bugs at lists dot php dot net

No feedback was provided for this bug for over a week, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".

[2005-12-12 22:05:50] [EMAIL PROTECTED]

Put the data somewhere in the Net and paste the link here, please.

----

[2005-12-12 18:44:35] bugs at nikmakepeace dot com

Description:
If you specify utf8 encoding using the config options 'char-encoding', 'input-encoding' and 'output-encoding' with tidy, it converts HTML entities into their latin1, single-byte equivalents rather than the correct, multi-byte utf-8 encodings (or just leaving them as entities).

The result is that &nbsp; is converted into 0xA0, &eacute; is converted into 0xE9, and so on. This is not valid UTF-8, and so well-behaving XML parsers, including PHP's DOM, fail.

Specifying 'utf8' as the third parameter works correctly.
Reproduce code:
---
http://fr.yahoo.com/r/n/fd/7/*http://fr.news.yahoo.com/12122005/202/beatrice-dalle-temoigne-au-proces-de-son-mari-accuse-de.html";>Béatrice Dalle témoigne au procès de son mari accusé de viol
http://rd.yahoo.co.jp/toppage/topinfo/rikunabi/051213/?http://katsuyou.rikunabi-shinsotsu.yahoo.co.jp/2007/";>人ã¨å·®ãã¤ãå°±è·æ´»åãããã
- http://rd.yahoo.co.jp/toppage/topinfo/event_xmas/051125/?http://xmas.yahoo.co.jp/";>ãã¤ã³ã5åã®ã¯ãªã¹ãã¹ã®ããã¯12æã¾ã§ï¼';
$config['char-encoding']='utf8';
$config['input-encoding']='utf8';
$config['output-encoding']='utf8';
$config['output-xhtml']=true;
echo tidy_repair_string($dirty, $config);
?>

Expected result:
Note well the correct unicode e-acute and e-grave in the French text.

http://www.w3.org/1999/xhtml";>
http://fr.yahoo.com/r/n/fd/7/*http://fr.news.yahoo.com/12122005/202/beatrice-dalle-temoigne-au-proces-de-son-mari-accuse-de.html";>
Béatrice Dalle témoigne au procès de son mari accusé de viol
http://rd.yahoo.co.jp/toppage/topinfo/rikunabi/051213/?http://katsuyou.rikunabi-shinsotsu.yahoo.co.jp/2007/";>
人ã¨å·®ãã¤ãå°±è·æ´»åãããã
- http://rd.yahoo.co.jp/toppage/topinfo/event_xmas/051125/?http://xmas.yahoo.co.jp/";>
ãã¤ã³ã5åã®ã¯ãªã¹ãã¹ã®ããã¯12æã¾ã§ï¼

Actual result:
--
Note how the e-acute and e-grave have been replaced with non-unicode characters.

http://www.w3.org/1999/xhtml";>
http://fr.yahoo.com/r/n/fd/7/*http://fr.news.yahoo.com/12122005/202/beatrice-dalle-temoigne-au-proces-de-son-mari-accuse-de.html";>
B�atrice Dalle t�moigne au proc�s de son mari accus� de viol
http://rd.yahoo.co.jp/toppage/topinfo/rikunabi/051213/?http://katsuyou.rikunabi-shinsotsu.yahoo.co.jp/2007/";>
人ã¨å·®ãã¤ãå°±è·æ´»åãããã
- http://rd.yahoo.co.jp/toppage/topinfo/event_xmas/051125/?http://xmas.yahoo.co.jp/";>
ãã¤ã³ã5åã®ã¯ãªã¹ãã¹ã®ããã¯12æã¾ã§ï¼

--
Edit this bug report at http://bugs.php.net/?id=35647&edit=1
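
The workaround mentioned in the new comment can be sketched roughly as follows. This is a minimal sketch, assuming the tidy extension is loaded; the helper names `repair_to_utf8` and `is_valid_utf8` and the sample markup are illustrative and do not come from the report. It passes 'utf8' as the third argument instead of relying on the config keys, then checks the result with a PCRE match using the `/u` modifier, which fails on invalid UTF-8.

```php
<?php
// Hypothetical helper (not from the report): repair markup and select
// UTF-8 through the third argument of tidy_repair_string(), which the
// reporter found to work where the config keys alone did not.
function repair_to_utf8($dirty)
{
    $config = array('output-xhtml' => true);
    $clean = tidy_repair_string($dirty, $config, 'utf8');
    if ($clean === false) {
        throw new RuntimeException('tidy could not repair the input');
    }
    return $clean;
}

// preg_match() with the /u modifier returns false when the subject
// is not valid UTF-8, so it doubles as a cheap validity check.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

if (function_exists('tidy_repair_string')) {
    $clean = repair_to_utf8('<p>B&eacute;atrice&nbsp;Dalle</p>');
    var_dump(is_valid_utf8($clean));
} else {
    echo "tidy extension not loaded; skipping\n";
}
```

Running the same check on output produced with only the config keys set would be one way to confirm whether a given build is affected.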
#36171 [NEW]: $_FILES array keys have periods replaced with underscores
From: bugs at nikmakepeace dot com
Operating system: Fedora Core 3
PHP version: 5.1.2
PHP Bug Type: Variables related
Bug description: $_FILES array keys have periods replaced with underscores

Description:
After a file upload, the $_FILES array does not faithfully reflect the structure of the form from which the upload was made. An input with the valid HTML name of 'a.b' becomes the key 'a_b' in the $_FILES array. The same is true of the other request superglobals.

The PHP manual says that an array index can be any string, and the HTML 4 rec says that the name attribute can be CDATA.

Now that register_globals is on its last legs in the community, could you make it so that the dot transformation happens only when you explicitly call extract() or import_request_variables()?

Reproduce code:
---

Expected result:
Something like this:

$_FILES is array (
  'a.b' => array (
    'name' => '5364-16.jpg',
    'type' => 'image/jpeg',
    'tmp_name' => '/tmp/php7vvvc0',
    'error' => 0,
    'size' => 66554,
  ),
)

Actual result:
--
$_FILES is array (
  'a_b' => array (
    'name' => '5364-16.jpg',
    'type' => 'image/jpeg',
    'tmp_name' => '/tmp/php7vvvc0',
    'error' => 0,
    'size' => 66554,
  ),
)

--
Edit bug report at http://bugs.php.net/?id=36171&edit=1
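
The name mangling described in this report can be observed without an upload form, because parse_str() applies the same rule that PHP uses for the request superglobals. The following is a minimal sketch; the `parse_query_keep_dots` helper is hypothetical, handles only flat `key=value` pairs (no `a[]` array syntax), and is shown only to illustrate what an unmangled result would look like. It does not cover $_FILES, which is populated by PHP itself.

```php
<?php
// parse_str() mangles names the same way PHP does for $_GET/$_POST:
// a dot in the name becomes an underscore.
parse_str('a.b=1', $mangled);
var_dump(array_keys($mangled)); // the only key is 'a_b'

// Hypothetical helper: split a query string by hand so that dots in
// names survive. Flat key=value pairs only.
function parse_query_keep_dots($query)
{
    $result = array();
    foreach (explode('&', $query) as $pair) {
        if ($pair === '') {
            continue; // skip empty segments such as a trailing '&'
        }
        $parts = explode('=', $pair, 2);
        $name  = urldecode($parts[0]);
        $value = isset($parts[1]) ? urldecode($parts[1]) : '';
        $result[$name] = $value;
    }
    return $result;
}

var_dump(parse_query_keep_dots('a.b=1&c=2'));
```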
#35647 [NEW]: tidy does not produce valid utf8 when the encoding is specified in the config
From: bugs at nikmakepeace dot com
Operating system: FC3
PHP version: 5.1.1
PHP Bug Type: Unknown/Other Function
Bug description: tidy does not produce valid utf8 when the encoding is specified in the config

Description:
If you specify utf8 encoding using the config options 'char-encoding', 'input-encoding' and 'output-encoding' with tidy, it converts HTML entities into their latin1, single-byte equivalents rather than the correct, multi-byte utf-8 encodings (or just leaving them as entities).

The result is that &nbsp; is converted into 0xA0, &eacute; is converted into 0xE9, and so on. This is not valid UTF-8, and so well-behaving XML parsers, including PHP's DOM, fail.

Specifying 'utf8' as the third parameter works correctly.

Reproduce code:
---
http://fr.yahoo.com/r/n/fd/7/*http://fr.news.yahoo.com/12122005/202/beatrice-dalle-temoigne-au-proces-de-son-mari-accuse-de.html";>Béatrice Dalle témoigne au procès de son mari accusé de viol
http://rd.yahoo.co.jp/toppage/topinfo/rikunabi/051213/?http://katsuyou.rikunabi-shinsotsu.yahoo.co.jp/2007/";>人ã¨å·®ãã¤ãå°±è·æ´»åãããã
- http://rd.yahoo.co.jp/toppage/topinfo/event_xmas/051125/?http://xmas.yahoo.co.jp/";>ãã¤ã³ã5åã®ã¯ãªã¹ãã¹ã®ããã¯12æã¾ã§ï¼';
$config['char-encoding']='utf8';
$config['input-encoding']='utf8';
$config['output-encoding']='utf8';
$config['output-xhtml']=true;
echo tidy_repair_string($dirty, $config);
?>

Expected result:
Note well the correct unicode e-acute and e-grave in the French text.

http://www.w3.org/1999/xhtml";>
http://fr.yahoo.com/r/n/fd/7/*http://fr.news.yahoo.com/12122005/202/beatrice-dalle-temoigne-au-proces-de-son-mari-accuse-de.html";>
Béatrice Dalle témoigne au procès de son mari accusé de viol
http://rd.yahoo.co.jp/toppage/topinfo/rikunabi/051213/?http://katsuyou.rikunabi-shinsotsu.yahoo.co.jp/2007/";>
人ã¨å·®ãã¤ãå°±è·æ´»åãããã
- http://rd.yahoo.co.jp/toppage/topinfo/event_xmas/051125/?http://xmas.yahoo.co.jp/";>
ãã¤ã³ã5åã®ã¯ãªã¹ãã¹ã®ããã¯12æã¾ã§ï¼

Actual result:
--
Note how the e-acute and e-grave have been replaced with non-unicode characters.

http://www.w3.org/1999/xhtml";>
http://fr.yahoo.com/r/n/fd/7/*http://fr.news.yahoo.com/12122005/202/beatrice-dalle-temoigne-au-proces-de-son-mari-accuse-de.html";>
B�atrice Dalle t�moigne au proc�s de son mari accus� de viol
http://rd.yahoo.co.jp/toppage/topinfo/rikunabi/051213/?http://katsuyou.rikunabi-shinsotsu.yahoo.co.jp/2007/";>
人ã¨å·®ãã¤ãå°±è·æ´»åãããã
- http://rd.yahoo.co.jp/toppage/topinfo/event_xmas/051125/?http://xmas.yahoo.co.jp/";>
ãã¤ã³ã5åã®ã¯ãªã¹ãã¹ã®ããã¯12æã¾ã§ï¼

--
Edit bug report at http://bugs.php.net/?id=35647&edit=1
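
The byte-level difference described in this report can be reproduced without tidy. The following is a minimal sketch that uses html_entity_decode() purely as a stand-in for tidy's entity handling (an assumption made for illustration, not a claim about how tidy works internally). Decoding to ISO-8859-1 yields the single bytes 0xE9 and 0xA0, while decoding to UTF-8 yields two-byte sequences.

```php
<?php
// The two entities named in the report: e-acute and non-breaking space.
$entities = '&eacute;&nbsp;';

// Same entities, two target encodings.
$latin1 = html_entity_decode($entities, ENT_QUOTES, 'ISO-8859-1');
$utf8   = html_entity_decode($entities, ENT_QUOTES, 'UTF-8');

echo bin2hex($latin1), "\n"; // e9a0
echo bin2hex($utf8), "\n";   // c3a9c2a0

// preg_match() with the /u modifier returns false for invalid UTF-8,
// which is why a strict XML parser rejects the latin1 bytes.
var_dump(preg_match('//u', $latin1)); // bool(false)
var_dump(preg_match('//u', $utf8));   // int(1)
```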