 ID:               22108
 Updated by:       [EMAIL PROTECTED]
 Reported By:      [EMAIL PROTECTED]
 Status:           Open
 Bug Type:         Feature/Change Request
-Operating System: windows 2000
+Operating System: Any
-PHP Version:      4.2.3
+PHP Version:      All (as of the current implementation)
-Assigned To:
+Assigned To:      moriyoshi
 New Comment:

And assigning this task to me.


Previous Comments:
------------------------------------------------------------------------

[2003-02-08 01:48:15] [EMAIL PROTECTED]

Yes, I suppose this might be a bug, but most of the developers involved in PHP are not as aware of this issue as you expected (and as I had expected). So I thought that changing the category was a better choice than marking the report bogus.

------------------------------------------------------------------------

[2003-02-07 23:13:07] [EMAIL PROTECTED]

The BOM (byte order mark) is a few bytes at the very front of a file that act as a signature denoting what type of encoding has been used, and in UTF-16/32 it also marks the byte order (LE or BE). Although UTF-8 is byte order independent, it has become popular on Windows (perhaps not so on Unix) to use the BOM encoded in UTF-8 to flag the file as being in UTF-8 format. This allows editors to determine the type of the file from the first few bytes instead of trying to guess what type the file is.

Ref: Textpad 4.6 (http://textpad.com)

See the Unicode FAQ for details of the UTF-8 BOM:
http://www.unicode.org/unicode/faq/utf_bom.html#25

The use of this should be obvious: you have to leave the my-language-only mindset that afflicts too many programmers (myself included before this job) and think about the growing multiplicity of languages on the web. I am writing web applications in Japan, with European-language and CJK (Chinese/Japanese/Korean) language processing and interfaces. Thus I have PHP files where variable values are strings in all sorts of languages, hence the UTF-8 encoding.

I feel that this is definitely a bug in PHP.
Considering that:
* PHP is slowly growing into a language-neutral (i18n/l10n possible) language
* PHP is designed such that PHP commands can be liberally sprinkled through HTML, and HTML is increasingly encoded in UTF-8 these days
* the UTF-8 BOM is becoming increasingly popular as a way of identifying the file's character encoding
* if the UTF-8 BOM exists, PHP actually outputs it incorrectly and in doing so prevents header output

I request that you don't see this as a feature request, but as a bug in the handling of UTF-8 files. Whether the output generator is the correct characterization of this bug or not I leave up to you.

Regards,
Brodie.

------------------------------------------------------------------------

[2003-02-07 21:41:23] [EMAIL PROTECTED]

Because the BOM issue has been reported repeatedly as something that prevents header output, and we should be more aware of it, I don't see any reason to mark this report as bogus.

Changing category from "output control" to a kind of "feature request".

------------------------------------------------------------------------

[2003-02-07 13:57:22] [EMAIL PROTECTED]

Thank you for taking the time to write to us, but this is not a bug. Please double-check the documentation available at http://www.php.net/manual/ and the instructions on how to report a bug at http://bugs.php.net/how-to-report.php

BOM = Byte Order Mark for UCS-2 encoding

This value should not be used in UTF-8, since its only use besides detecting the byte order of UCS-2 was as a special non-breaking space, and newer Unicode versions have another representation for the same thing.

Anyhow, BOM = FE FF

Depending on the byte order, that makes:
UCS-2BE <-> "\xFE\xFF"
UCS-2LE <-> "\xFF\xFE"

Therefore a sequence of "EF BB" is another character and must not be ignored.

------------------------------------------------------------------------

[2003-02-07 10:42:16] [EMAIL PROTECTED]

sniper, imagine someone wanted to echo some text in, e.g., French.
In that case, if you saved it as ASCII, you would get corrupted output. So instead you'd have to save it as UTF-8, which seems to cause problems (or so [EMAIL PROTECTED] tells us).

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view the rest of the comments, please view the bug report online at
    http://bugs.php.net/22108

-- 
Edit this bug report at http://bugs.php.net/?id=22108&edit=1
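
The byte sequences discussed in the comments above (FE FF and FF FE for UCS-2/UTF-16, and the three-byte UTF-8 signature EF BB BF) can be checked with a short script. What follows is only a rough sketch of how a script might detect and strip the signature itself, not the change being considered for PHP. The helper names detect_bom() and strip_utf8_bom() are made up for illustration, and UTF-32 is deliberately left out.

```php
<?php
// Hypothetical helpers for illustration only; they are not part of PHP.

const UTF8_BOM    = "\xEF\xBB\xBF"; // U+FEFF encoded as UTF-8
const UTF16BE_BOM = "\xFE\xFF";     // U+FEFF, big-endian
const UTF16LE_BOM = "\xFF\xFE";     // U+FEFF, little-endian

// Return the encoding name implied by a leading BOM, or null if none is found.
// Assumption: UTF-32 is ignored, so a UTF-32LE file would be reported as UTF-16LE.
function detect_bom(string $bytes): ?string
{
    if (strncmp($bytes, UTF8_BOM, 3) === 0) {
        return 'UTF-8';
    }
    if (strncmp($bytes, UTF16BE_BOM, 2) === 0) {
        return 'UTF-16BE';
    }
    if (strncmp($bytes, UTF16LE_BOM, 2) === 0) {
        return 'UTF-16LE';
    }
    return null;
}

// Remove a leading UTF-8 BOM; any other input is returned unchanged.
function strip_utf8_bom(string $bytes): string
{
    if (strncmp($bytes, UTF8_BOM, 3) === 0) {
        return (string) substr($bytes, 3);
    }
    return $bytes;
}

// Usage: a template saved by an editor that prepends the UTF-8 signature.
$saved = UTF8_BOM . "<html>bonjour</html>";
$encoding = detect_bom($saved);     // 'UTF-8'
$clean    = strip_utf8_bom($saved); // starts directly with "<html>"
```

Stripping the three bytes before anything is sent is one way to keep them from reaching the client ahead of the headers; whether the engine should do this itself is the question this report raises.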