ID:               43294
 Updated by:       [EMAIL PROTECTED]
 Reported By:      tallyce at gmail dot com
-Status:           Open
+Status:           Bogus
 Bug Type:         Strings related
 Operating System: Windows or Linux
 PHP Version:      5.2.5
 New Comment:

Marking this as bogus for now.  If you can show that a properly
UTF-8-encoded dagger, or any other properly encoded UTF-8 character,
isn't working, re-open it with that information.  Make sure you show the
actual raw byte sequence that is being passed into the function.
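
For anyone following up, one way to capture the raw byte sequence is to
dump the string as hex just before it reaches htmlentities().  This is
only a sketch: dump_bytes() is a hypothetical helper name, not part of
PHP, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not a PHP built-in): render a string as
// space-separated hex bytes so the exact input can be pasted into a report.
function dump_bytes(string $s): string
{
    // bin2hex() yields two hex digits per byte; split them into pairs.
    return implode(' ', str_split(bin2hex($s), 2));
}

// A UTF-8 encoded dagger (U+2020) is the three bytes e2 80 a0.
echo dump_bytes("\xe2\x80\xa0"), "\n";   // e2 80 a0
```

Calling dump_bytes() on the exact value passed to htmlentities(), rather
than on a retyped copy, avoids the editor or terminal re-encoding it.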


Previous Comments:
------------------------------------------------------------------------

[2008-01-29 14:57:26] [EMAIL PROTECTED]

Just check to see if the dagger is properly represented as a UTF-8
character.  It should be the three bytes e2 80 a0.
That same symbol can be represented in other encodings, obviously, but
if you are telling htmlentities that you are using UTF-8 and you then
pass it a dagger not encoded in UTF-8, it has no idea what to do with
it.

To test it correctly, do this:

echo htmlentities(chr(0xe2).chr(0x80).chr(0xa0), ENT_NOQUOTES, 'UTF-8');

If that spits out &dagger; then everything is fine, and the cases where
it isn't working for you are because you aren't actually passing it the
correct UTF-8 sequence for that character.  I don't do Windows, but the
above test works fine on Linux, FreeBSD and OSX for me.
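
A guard along these lines can tell the two cases apart before
htmlentities() is called.  It is a sketch, not the function's own logic:
is_valid_utf8() is a hypothetical name, and the check relies on the PCRE
/u modifier rejecting malformed UTF-8, which assumes PCRE was built with
UTF-8 support.

```php
<?php
// Hypothetical helper: true only if $s is a well-formed UTF-8 byte sequence.
// With the /u modifier, preg_match() returns false on malformed input.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$utf8_dagger   = "\xe2\x80\xa0"; // U+2020 encoded as UTF-8
$cp1252_dagger = "\x86";         // the same symbol as a single Windows-1252 byte

foreach (array($utf8_dagger, $cp1252_dagger) as $input) {
    if (is_valid_utf8($input)) {
        echo htmlentities($input, ENT_COMPAT, 'UTF-8'), "\n";
    } else {
        echo 'not valid UTF-8: ', bin2hex($input), "\n";
    }
}
```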

------------------------------------------------------------------------

[2008-01-22 14:55:12] tallyce at gmail dot com

I've been spending further time trying to work out what's happening,
and am convinced something is definitely not right.

I've also found another character where the presence of the character
results in the whole string disappearing, and there may be others.

Using this reproduction code:

<?php echo htmlentities ('Test › †', ENT_COMPAT, 'UTF-8') . '<br />'
  . preg_replace('/[^\x00-\x7F]/e', '"&#".ord("$0").";"', 'Test › †') . '<br />'
  . htmlentities ('Test', ENT_COMPAT, 'UTF-8') . '<br />'; ?>

I get different results for machines running SUSE Linux/PHP 5.2.4,
Ubuntu Linux/PHP 5.2.3 and WinXP/PHP 5.2.5. Only the second gives the
result I would expect.
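
Note that the preg_replace() call in the reproduction code works byte by
byte: without the /u modifier the pattern matches a single byte and ord()
returns that byte's value, so a three-byte UTF-8 character produces three
numeric references rather than one.  A per-character variant is sketched
below for comparison only.  utf8_to_numeric_refs() is a hypothetical
name, and the sketch assumes the iconv extension and PCRE UTF-8 support
are available.

```php
<?php
// Hypothetical helper: replace each non-ASCII character (not each byte)
// with a decimal numeric character reference.
// Returns null if $s is not valid UTF-8, because the /u match fails.
function utf8_to_numeric_refs(string $s): ?string
{
    return preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m): string {
            // Re-encode the matched character as one big-endian 32-bit
            // integer, which is its Unicode code point.
            $codepoint = unpack('N', iconv('UTF-8', 'UCS-4BE', $m[0]));
            return '&#' . $codepoint[1] . ';';
        },
        $s
    );
}

// U+203A and U+2020, written as explicit UTF-8 bytes.
echo utf8_to_numeric_refs("Test \xe2\x80\xba \xe2\x80\xa0"), "\n";
// Test &#8250; &#8224;
```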





1. From a Linux machine terminal:

First, running
less t.php
gives
<?php echo htmlentities ('Test 233 206', ENT_COMPAT, 'UTF-8') . '<br />'
  . preg_replace('/[^\x00-\x7F]/e', '"&#".ord("$0").";"', 'Test 233 206') . '<br />'
  . htmlentities ('Test', ENT_COMPAT, 'UTF-8') . '<br />'; ?>
with the 233 and 206 background-highlighted.


php -v
PHP 5.2.4 (cli) (built: Sep 12 2007 15:23:24)
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies

Test <br />Test &#155; &#134;<br />Test<br />




2. From the same machine but viewing with a web browser
(FF2.0.0.11/WinXP), i.e. example.com/t.php (which is serving up UTF-8
pages as confirmed by web-sniffer.net):

Test ? ?<br />Test &#155; &#134;<br />Test<br />

[two symbols appear as ? in diamond]



3. On another machine, with the PuTTY terminal set to UTF-8:

less t.php
gives:
<?php echo htmlentities ('Test › †', ENT_COMPAT, 'UTF-8') . '<br />'
  . preg_replace('/[^\x00-\x7F]/e', '"&#".ord("$0").";"', 'Test › †') . '<br />'
  . htmlentities ('Test', ENT_COMPAT, 'UTF-8') . '<br />'; ?>
exactly as first entered.

php -v
PHP 5.2.3-1ubuntu6.2 (cli) (built: Dec  3 2007 19:59:42)
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies

php t.php
Test &rsaquo; &dagger;<br />Test &#226;&#128;&#186; &#226;&#128;&#160;<br />Test<br />



4. Same machine as (3) but via web browser:

Test &rsaquo; &dagger;<br />Test &#226;&#128;&#186; &#226;&#128;&#160;<br />Test<br />



5. On a Windows machine

C:\Documents and Settings\username>php -v
PHP 5.2.5 (cli) (built: Nov  8 2007 23:18:51)
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies

H:\>php t.php
PHP Warning:  htmlentities(): Invalid multibyte sequence in argument in H:\t.php on line 1
<br />Test &#155; &#134;<br />Test<br />



6. Same machine as (5) but via web browser

<br />Test &#155; &#134;<br />Test<br />
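
The byte values reported in cases 1, 2, 5 and 6 are worth decoding.
Octal 233 and 206 (as shown by less) are decimal 155 and 134 (as shown
by ord()), i.e. 0x9b and 0x86.  Those are the single-byte Windows-1252
codes for › and †, whereas case 3 shows the three-byte UTF-8 forms
(226 128 186 and 226 128 160).  That is consistent with t.php having
been saved in different encodings on the different machines, though
nothing in this report confirms how the files were saved.  A sketch of
converting such input before calling htmlentities(), assuming the iconv
extension is available and that the source really is Windows-1252:

```php
<?php
// Assumption: the input bytes are Windows-1252, as the 0x9b / 0x86
// values suggest.
$cp1252 = "Test \x9b \x86";

// Octal 233/206 from less and decimal 155/134 from ord() are the same bytes.
var_dump(0233 === 155 && 0233 === 0x9b); // bool(true)
var_dump(0206 === 134 && 0206 === 0x86); // bool(true)

// Re-encode to UTF-8 so the bytes match the charset given to htmlentities().
$utf8 = iconv('CP1252', 'UTF-8', $cp1252);

echo bin2hex($utf8), "\n";
echo htmlentities($utf8, ENT_COMPAT, 'UTF-8'), "\n"; // Test &rsaquo; &dagger;
```

Re-saving the script itself as UTF-8 would achieve the same thing for
string literals; the conversion above is for data arriving at run time.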

------------------------------------------------------------------------

[2007-12-18 01:00:01] php-bugs at lists dot php dot net

No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

------------------------------------------------------------------------

[2007-12-10 10:02:15] [EMAIL PROTECTED]

Correct output:

$ php t.php
Test &dagger;<br />Test


------------------------------------------------------------------------

[2007-12-10 10:01:49] [EMAIL PROTECTED]

Seems to work fine for me:

[EMAIL PROTECTED] ~]$ php t.php
Test &dagger;<br />Test[

Please try on command line.

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/43294

