Edit report at https://bugs.php.net/bug.php?id=62861&edit=1

 ID:                 62861
 Updated by:         ras...@php.net
 Reported by:        soapergem at gmail dot com
 Summary:            htmlentities returns empty string when it shouldn't
 Status:             Not a bug
 Type:               Bug
 Package:            *General Issues
 Operating System:   Windows
 PHP Version:        5.4.6
 Block user comment: N
 Private report:     N

 New Comment:

Every real editor can do that. Windows Notepad is not a real editor. Notepad++ 
(which is free and much, much better than Notepad), Notepad2, TextMate, Vim, 
jEdit, UltraEdit, Emacs, SourceEdit can all do this.


Previous Comments:
------------------------------------------------------------------------
[2012-08-19 14:27:07] ni...@php.net

Windows Notepad does not support this because Notepad is not a suitable editor 
for development. All development-oriented text editors and IDEs support saving 
files without a BOM.

One commonly used text editor for Windows is Notepad++ (in case you don't want 
to use a full-blown IDE).

------------------------------------------------------------------------
[2012-08-19 14:11:43] soapergem at gmail dot com

There is no option to save without the BOM in Windows Notepad. Nor is there an 
option to save with/without the BOM in many other Windows editors. It is 
automatically added to the file and there is nothing I can do about that -- 
short of writing a script to programmatically go through all my other scripts 
with fopen(), remove the first three characters, and then re-save.

That is NOT a practical option. PHP should be handling this.

As it stands, PHP 5.4 is completely unusable. Until you guys fix this, I need 
to stick with 5.3, because 5.4 will break all of my scripts -- and all the 
scripts of ANYONE who uses htmlentities() on a Windows server. Please take my 
suggestion about using the default_charset to heart. That would finally resolve 
this issue.

------------------------------------------------------------------------
[2012-08-19 13:59:09] ni...@php.net

Save your document as UTF-8 *without* BOM. The ï»¿ is just what the UTF-8 
Byte Order Mark (BOM) looks like when it is output (which is probably something 
you don't want, so save the file without it).
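
For cases where the editor cannot be changed, here is a minimal sketch of 
checking for and removing a leading BOM from PHP itself. The helper names 
strip_utf8_bom() and strip_bom_from_file() are hypothetical, not part of PHP; 
the only fact relied on is that the UTF-8 BOM is the byte sequence EF BB BF.

```php
<?php
// The UTF-8 BOM is the three bytes EF BB BF at the very start of a file.
define('UTF8_BOM', "\xEF\xBB\xBF");

// Hypothetical helper: return $text without a leading UTF-8 BOM.
function strip_utf8_bom($text)
{
    if (strncmp($text, UTF8_BOM, 3) === 0) {
        // Cast because substr() can return false on older PHP versions
        // when nothing is left after the BOM.
        return (string) substr($text, 3);
    }
    return $text;
}

// Hypothetical helper: rewrite $path in place if it starts with a BOM.
// Returns true only if the file was actually changed.
function strip_bom_from_file($path)
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        return false; // unreadable file, nothing done
    }
    $stripped = strip_utf8_bom($contents);
    if ($stripped === $contents) {
        return false; // no BOM present
    }
    return file_put_contents($path, $stripped) !== false;
}

echo bin2hex(UTF8_BOM), "\n";                          // efbbbf
echo strip_utf8_bom(UTF8_BOM . "<?php echo 1;"), "\n"; // <?php echo 1;
```

Saving without a BOM in the editor is still the simpler fix; a script like 
this is only a fallback for files that already have one.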

------------------------------------------------------------------------
[2012-08-19 13:49:39] ras...@php.net

From my command line:

php > echo htmlentities('©', ENT_COMPAT | ENT_HTML401, 'UTF-8');
&copy;

it works fine. If you are actually providing the correct UTF-8 char it will 
work fine. You can verify that by doing this:

php > $a = chr(0xC2).chr(0xA9);
php > echo htmlentities($a, ENT_COMPAT | ENT_HTML401, 'UTF-8');
&copy;

Here I am explicitly passing C2A9 in and I get &copy; back out.

So I have no idea what your Windows Notepad is doing. Look at the output with a 
hex editor and see what it is converting that copyright character to.
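
If no hex editor is at hand, the same check can be sketched in PHP with 
bin2hex(). The example below assumes the two likely encodings of the copyright 
sign: C2 A9 in UTF-8 and the single byte A9 in CP-1252. The helper 
is_valid_utf8() is hypothetical; it relies on preg_match() with the /u 
modifier failing on invalid UTF-8.

```php
<?php
$utf8   = "\xC2\xA9"; // copyright sign encoded as UTF-8
$cp1252 = "\xA9";     // copyright sign encoded as CP-1252 ("ANSI")

// Dump the raw bytes, as a hex editor would show them.
echo bin2hex($utf8), "\n";   // c2a9
echo bin2hex($cp1252), "\n"; // a9

// Hypothetical helper: true if $s is a valid UTF-8 byte sequence.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

var_dump(is_valid_utf8($utf8));   // bool(true)
var_dump(is_valid_utf8($cp1252)); // bool(false)

// With an explicit 'UTF-8' charset, the valid sequence is converted and
// the invalid one yields an empty string (PHP 5.4 and later).
var_dump(htmlentities($utf8, ENT_COMPAT | ENT_HTML401, 'UTF-8'));   // "&copy;"
var_dump(htmlentities($cp1252, ENT_COMPAT | ENT_HTML401, 'UTF-8')); // ""
```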

------------------------------------------------------------------------
[2012-08-19 13:30:07] soapergem at gmail dot com

Yes, your assumptions about what I was meaning to say were correct. I really 
meant "ANSI," which you know as CP-1252.

But there is definitely still a bug with this. I just followed your 
instructions by saving my test script specifically in the "UTF-8" encoding 
hoping that, as you said, "all my problems will go away."

They didn't.

My test script is exactly the same one that I have listed on this bug report. I 
saved it in Windows Notepad, using the "UTF-8" encoding. I am no longer getting 
an empty string -- which is progress. But now I am getting the following output:

ï»¿&copy;

This is definitely NOT the expected result here. It did finally convert the 
copyright symbol, but it prepended not one, not two, but THREE junk characters 
in front of it. This is even worse than before.

If I'm not mistaken, wasn't the whole reason PHP6 was abandoned that the idea 
of converting everything to Unicode was deemed too ambitious? I've already 
spent far more time dealing with this than is practical, and I'm sure you have 
much better things to do as well. It just seems to me that you guys had a 
wonderful hammer -- a wonderful tool for the job -- and you went and broke off 
the hammer head for no apparent reason.

If I might make a humble suggestion, why not let htmlentities() default to 
whatever the default_charset option is in php.ini? Right now you can only do 
that by explicitly passing an empty string as the third parameter to 
htmlentities, which is very messy and counterintuitive. Shouldn't the 
default_charset actually be, you know, the _default character set_?
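
Until something like that exists, one possible workaround is a small wrapper 
that passes the configured charset explicitly. This is only a sketch: the 
function name h() and the 'UTF-8' fallback are assumptions made for 
illustration, not anything PHP provides.

```php
<?php
// Hypothetical wrapper: encode $text using the default_charset ini setting.
function h($text)
{
    $charset = ini_get('default_charset');
    if ($charset === false || $charset === '') {
        $charset = 'UTF-8'; // assumed fallback when nothing is configured
    }
    return htmlentities($text, ENT_COMPAT | ENT_HTML401, $charset);
}

// The same character, encoded to match each configured charset.
ini_set('default_charset', 'UTF-8');
echo h("\xC2\xA9"), "\n"; // &copy;

ini_set('default_charset', 'ISO-8859-1');
echo h("\xA9"), "\n";     // &copy;
```

The wrapper only helps if the bytes in the script really are in the configured 
charset, so the file encoding and default_charset still have to agree.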

------------------------------------------------------------------------


The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

    https://bugs.php.net/bug.php?id=62861

