 ID:               43294
 Updated by:       [EMAIL PROTECTED]
 Reported By:      tallyce at gmail dot com
-Status:           Open
+Status:           Bogus
 Bug Type:         Strings related
 Operating System: Windows or Linux
 PHP Version:      5.2.5
 New Comment:
Marking this as bogus for now. If you can show that a properly UTF-8-encoded dagger, or some other properly encoded UTF-8 character, isn't working, re-open the report with that information. Make sure you show the actual raw byte sequence that is being passed into the function.

Previous Comments:
------------------------------------------------------------------------

[2008-01-29 14:57:26] [EMAIL PROTECTED]

Just check whether the dagger is properly represented as a UTF-8 character. It should be the bytes e2 80 a0. The same symbol can obviously be represented in other encodings, but if you tell htmlentities() that you are using UTF-8 and then pass it a dagger that is not encoded in UTF-8, it has no idea what to do with it.

To test it correctly, do this:

echo htmlentities(chr(0xe2).chr(0x80).chr(0xa0),null,'utf-8');

If it spits out †, then everything is fine, and the cases where it isn't working for you are failing because you aren't actually passing it the correct UTF-8 sequence for that character. I don't do Windows, but the above test works fine on Linux, FreeBSD and OS X for me.

------------------------------------------------------------------------

[2008-01-22 14:55:12] tallyce at gmail dot com

I've spent further time trying to work out what's happening, and am convinced something is definitely not right. I've also found another character whose presence makes the whole string disappear, and there may be others.

Using this reproduce code:

<?php echo htmlentities ('Test ', ENT_COMPAT, 'UTF-8') . '<br />' . preg_replace('/[^\x00-\x7F]/e', '"&#".ord("$0").";"', 'Test ') . '<br />' . htmlentities ('Test', ENT_COMPAT, 'UTF-8') . '<br />'; ?>

I get different results on machines running SUSE Linux/PHP 5.2.4, Ubuntu Linux/PHP 5.2.3 and WinXP/PHP 5.2.5. Only the second gives the result I would expect.

1. From a terminal on the Linux machine, "less t.php" gives:

<?php echo htmlentities ('Test 233 206', ENT_COMPAT, 'UTF-8') . '<br />' . preg_replace('/[^\x00-\x7F]/e', '"&#".ord("$0").";"', 'Test 233 206') . '<br />' . htmlentities ('Test', ENT_COMPAT, 'UTF-8') . '<br />'; ?>

with the 233 and 206 background-highlighted.

php -v
PHP 5.2.4 (cli) (built: Sep 12 2007 15:23:24)
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies

Test <br />Test › †<br />Test<br />

2. From the same machine, viewed in a web browser (FF 2.0.0.11 on WinXP), i.e. example.com/t.php (which serves UTF-8 pages, as confirmed by web-sniffer.net):

Test ? ?<br />Test › †<br />Test<br />

[the two symbols appear as ? in a diamond]

3. On another machine, with the PuTTY terminal set to UTF-8, "less t.php" gives:

<?php echo htmlentities ('Test ', ENT_COMPAT, 'UTF-8') . '<br />' . preg_replace('/[^\x00-\x7F]/e', '"&#".ord("$0").";"', 'Test ') . '<br />' . htmlentities ('Test', ENT_COMPAT, 'UTF-8') . '<br />'; ?>

exactly as first entered.

php -v
PHP 5.2.3-1ubuntu6.2 (cli) (built: Dec 3 2007 19:59:42)
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies

php t.php
Test › †<br />Test › †<br />Test<br />

4. Same machine as (3), but via a web browser:

Test › †<br />Test › †<br />Test<br />

5. On a Windows machine:

C:\Documents and Settings\username>php -v
PHP 5.2.5 (cli) (built: Nov 8 2007 23:18:51)
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies

H:\>php t.php
PHP Warning: htmlentities(): Invalid multibyte sequence in argument in H:\t.php on line 1
<br />Test › †<br />Test<br />

6. Same machine as (5), but via a web browser:

<br />Test › †<br />Test<br />

------------------------------------------------------------------------

[2007-12-18 01:00:01] php-bugs at lists dot php dot net

No feedback was provided for this bug for over a week, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
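The raw-byte check the developer asks for in the 2008-01-29 comment can be run as a short script. This is a minimal sketch under one loud assumption: that the reporter's t.php was saved as Windows-1252, so the values `less` highlighted as 233 and 206 are octal byte values, 0x9B ('›') and 0x86 ('†') in that encoding. The helpers `hex_bytes()` and `is_valid_utf8()` and the two-entry byte map are hypothetical illustrations, not PHP built-ins, and the script needs PHP 5.4 or later (closure syntax, and the empty-string behaviour of htmlentities() on malformed input).

```php
<?php
// Sketch under an assumption: t.php was saved as Windows-1252, so the bytes
// that `less` highlighted as 233 and 206 (octal) are 0x9B ('›') and 0x86 ('†').

// Hypothetical helper (not a PHP built-in): space-separated hex dump of the
// raw bytes in $s, e.g. "e2 80 a0", the form the developer asks for.
function hex_bytes($s)
{
    return implode(' ', str_split(bin2hex($s), 2));
}

// Hypothetical helper (not a PHP built-in): true only for well-formed UTF-8.
// With the /u modifier, preg_match() returns false for malformed UTF-8.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

// U+2020 DAGGER encoded as UTF-8: the three bytes e2 80 a0.
$dagger = "\xe2\x80\xa0";

// The reporter's literal as Windows-1252; "\233" and "\206" are PHP octal escapes.
$cp1252 = "Test \233 \206";

// Hypothetical two-entry map covering only these two bytes; a full conversion
// would use iconv('Windows-1252', 'UTF-8', $s) instead.
$fixed = strtr($cp1252, array("\x9b" => "\xe2\x80\xba", "\x86" => "\xe2\x80\xa0"));

// The reporter's /e expression rewritten with preg_replace_callback(): it works
// byte by byte and emits &#155; and &#134;, which browsers display as › and †
// because they map numeric references 128-159 through Windows-1252.
$byByte = preg_replace_callback('/[^\x00-\x7F]/', function ($m) {
    return '&#' . ord($m[0]) . ';';
}, $cp1252);

echo hex_bytes($dagger), "\n";                             // e2 80 a0
echo hex_bytes($cp1252), "\n";                             // 54 65 73 74 20 9b 20 86
var_dump(is_valid_utf8($dagger), is_valid_utf8($cp1252)); // bool(true) bool(false)
echo htmlentities($dagger, ENT_COMPAT, 'UTF-8'), "\n";     // &dagger;
echo htmlentities($fixed, ENT_COMPAT, 'UTF-8'), "\n";      // Test &rsaquo; &dagger;
echo $byByte, "\n";                                        // Test &#155; &#134;

// PHP 5.4+: without ENT_IGNORE or ENT_SUBSTITUTE, malformed input makes
// htmlentities() return an empty string.
var_dump(htmlentities($cp1252, ENT_COMPAT, 'UTF-8'));      // string(0) ""
```

If the hex dump of the literal in t.php shows single bytes such as 9b and 86 rather than e2 80 ba and e2 80 a0, the file is not UTF-8, and htmlentities() is being handed a malformed sequence, which is the situation the developer describes. Under the same assumption, the byte-wise second expression would look "correct" in a browser only because of the numeric-reference mapping noted in the comments.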
------------------------------------------------------------------------

[2007-12-10 10:02:15] [EMAIL PROTECTED]

Correct output:

$ php t.php
Test †<br />Test

------------------------------------------------------------------------

[2007-12-10 10:01:49] [EMAIL PROTECTED]

Seems to work fine for me:

[EMAIL PROTECTED] ~]$ php t.php
Test †<br />Test

Please try it on the command line.

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view the rest of the comments, please view the bug report online at http://bugs.php.net/43294