ID: 43549
Updated by: [EMAIL PROTECTED]
Reported By: mariusads at helpedia dot com
-Status: Open
+Status: Feedback
Bug Type: Strings related
Operating System: Redhat?, Linux
PHP Version: 5.2.5
New Comment:

You never specified the charset for the page. This works fine:

<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
</head>
<body>
<pre>
<?php
$text = isset($_REQUEST['text']) ? $_REQUEST['text'] : '';
var_dump($text);
var_dump(htmlentities($text,ENT_QUOTES,'UTF-8'));
?>
</pre>
<form name="A" method="post">
<textarea name="text"></textarea>
<input name="sub" type="submit" value="submit"/>
</form>
</body></html>


Previous Comments:
------------------------------------------------------------------------

[2007-12-10 11:45:38] mariusads at helpedia dot com

I just downloaded PHP 5.2.5 from the website onto my computer (Windows 2003) and the same problem occurs.

For example, this one works:

hxtp://devtgdb.definethis.org:90/pc/faq/5842/Diablo-page1.html

but this one doesn't:

hxtp://devtgdb.definethis.org:90/pc/faq/5845/Diablo-page1.html

The source code is identical; the only difference is that ads are disabled in the site config.

Also, if the links don't work, sorry, you may be reading this while I'm sleeping and my computer is turned off. Otherwise, it's cable 4mbps/512kbps so they should work.

(again, please replace hxtp with http)

------------------------------------------------------------------------

[2007-12-10 11:24:16] mariusads at helpedia dot com

Here are several pages that show this problem with htmlentities:

hxtp://www.tgdb.net/pc/cheats/19556/18_Wheels_of_Steel_Convoy-page1.html
hxtp://www.tgdb.net/pc/faq/5845/Diablo-page1.html

The content on the second link worked fine up until the PHP version was upgraded.

This page and lots of others work:

hxtp://www.tgdb.net/pc/faq/5841/Diablo-page1.html

So it's not a badly coded script, in the sense that it worked as I planned. You can see the text right before it's sent to htmlentities on all pages in an HTML comment; you just have to view the source (with the only difference that I've replaced '--' with '==', as -- is not allowed in comments).

When I reported the problem to the hosting company, I uploaded the test script from the first post to two of their servers and to a server at Dreamhost.

PHP 5.2.5 hxtp://www.helpedia.com/test2.php
PHP 5.2.5 hxtp://www.tgdb.net/test2.php
PHP 5.2.2 hxtp://www.definethis.org/test2.php

I opened the file a.txt in Firefox, pressed Ctrl+A to select all the text, copied it to the clipboard and pasted it into the form.

The result is an empty string on PHP 5.2.5 and the correct string on PHP 5.2.2. I also get the correct result on my work computer with PHP 5.2.4.

I didn't manage to download 5.2.5 on my work computer and test it, so I guess it could be a bad build on the hosting company's servers. Will try in the following hour.

(replace hxtp with http, this page thinks I'm spamming)

------------------------------------------------------------------------

[2007-12-10 09:32:57] [EMAIL PROTECTED]

Works fine for me. Are you sure you have everything as UTF-8, i.e. the page you're sending the form from has its content-type set to UTF-8?

------------------------------------------------------------------------

[2007-12-09 23:59:03] mariusads at helpedia dot com

Description:
------------
I run a website that accepts game cheat submissions from users and displays them in categories and so on. Users submit .txt files, which are saved on the drive; a certain page on the website reads the text file or a fragment of it, runs htmlentities on it and displays it on the screen.

Recently, the hosting company upgraded PHP to 5.2.5, and since then htmlentities returns an empty string when trying to escape the text.

I understand this is probably because of that fix regarding multi-byte characters in strings, which makes htmlentities ignore the input. That seems a bit dumb; shouldn't it return at least the part of the string before that multibyte character?

Anyway, the submitted file is plain text and I honestly don't know which characters are wrong such that htmlentities would ignore the text.

The file is uploaded here: http://www.tgdb.net/a.txt

In the scripts I have the following code:

function htmlesc($text) {
 $s = html_entity_decode($text,ENT_QUOTES,'UTF-8');
 return htmlentities($s,ENT_QUOTES,'UTF-8');
}

The text passes html_entity_decode with no problems, but htmlentities returns an empty string.

If possible, could you please tell me how I could check in the future whether a string contains multibyte characters, so that I don't have this problem?

Right now, the only solution the hosting company gave me was to add a rule in .htaccess which makes the server process the PHP files with PHP4.

Thank you for your help.
Marius Hudea

PS. The captcha doesn't seem to work right; I'm sure I didn't get the captcha wrong 8 times in a row.

Reproduce code:
---------------
I've uploaded the code below to several web servers to test:

<html><body>
<?
$text = $_REQUEST['text'];
echo htmlentities($text,ENT_QUOTES,'UTF-8');
?>
<form name="A" method="post">
<textarea name="text"></textarea>
<input name="sub" type="submit" value="submit"/>
</form>
</body></html>

Test file: http://www.tgdb.net/a.txt

Expected result:
----------------
Expected to have the text displayed on the screen, to have the function return a non-empty string. Expected at least a partial string, up to that error, not having to check scripts for 5 minutes to see what went wrong.

Actual result:
--------------
Copying and pasting the text from a.txt results in an empty string. Any other text is processed correctly.

------------------------------------------------------------------------
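
Regarding the question above about checking a string before passing it to htmlentities: the following is only a minimal sketch, not something posted in this report. It assumes the empty result comes from input that is not well-formed UTF-8, and that such input is really ISO-8859-1; that second assumption may not hold for a.txt. The helper names is_valid_utf8 and safe_htmlentities are hypothetical.

```php
<?php
// Hypothetical helper, not part of the original report.
// PCRE's /u modifier validates the whole subject as UTF-8 before matching,
// so preg_match() returns 1 for well-formed input and false for malformed input.
function is_valid_utf8($text)
{
    return preg_match('//u', $text) === 1;
}

// Hypothetical helper: escape $text for HTML output without losing it.
// ASSUMPTION: anything that is not valid UTF-8 is treated as ISO-8859-1.
function safe_htmlentities($text)
{
    $charset = is_valid_utf8($text) ? 'UTF-8' : 'ISO-8859-1';
    return htmlentities($text, ENT_QUOTES, $charset);
}

var_dump(is_valid_utf8("caf\xC3\xA9")); // UTF-8 encoding of "café"
var_dump(is_valid_utf8("caf\xE9"));     // lone 0xE9 byte, as ISO-8859-1 would send it
echo safe_htmlentities("caf\xE9 <b>"), "\n";
```

The validity check relies only on PCRE, so it does not need the mbstring or iconv extensions. On PHP 5.4 and later, which is newer than the versions discussed in this thread, the ENT_SUBSTITUTE flag makes htmlentities replace invalid byte sequences with U+FFFD instead of returning an empty string.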