#22108 [Com]: php doesn't ignore the utf-8 BOM

yyasarr at hotmail dot com Mon, 28 Jul 2003 23:13:30 -0700

 ID:               22108
 Comment by:       yyasarr at hotmail dot com
 Reported By:      bugzilla at jellycan dot com
 Status:           Assigned
 Bug Type:         Feature/Change Request
 Operating System: Any
 PHP Version:      All (as of the current implementation)
 Assigned To:      moriyoshi
 New Comment:


PHP really doesn't ignore the UTF-8 BOM.
This is a BUG!

But it can be worked around easily.



WORKAROUND:
1. Open the file in any simple editor that can edit raw bytes, such as
   the one in NC (Norton Commander).
2. Press F4 to edit.
3. Delete the first 3 (three) or 2 (two) bytes, depending on the file
   type:

   file types:
   -----------
   UTF-8:                          EF BB BF   // 3 (three) bytes
   Unicode big endian (UTF-16BE):  FE FF      // 2 (two) bytes
   Unicode (UTF-16LE):             FF FE      // 2 (two) bytes
4. Check the result.

I am using this method. If you know a better one, please post it.
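
The same byte removal could also be scripted instead of done by hand.
What follows is only a minimal sketch, assuming a modern PHP (7 or
later); strip_bom() and strip_bom_from_file() are hypothetical helper
names, not PHP built-ins. Note that for the two UTF-16 types, removing
the BOM alone does not make the file readable by the parser.

```php
<?php
// Hypothetical helper (not part of PHP): returns $data without a leading
// UTF-8 or UTF-16 byte order mark.
function strip_bom(string $data): string
{
    // The three signatures listed in the comment above.
    $boms = ["\xEF\xBB\xBF", "\xFE\xFF", "\xFF\xFE"];
    foreach ($boms as $bom) {
        if (strncmp($data, $bom, strlen($bom)) === 0) {
            return substr($data, strlen($bom));
        }
    }
    return $data; // no BOM found, leave the data untouched
}

// Hypothetical helper: rewrites the file in place only if it starts with
// a BOM. Returns true when the file was changed.
function strip_bom_from_file(string $path): bool
{
    $data = file_get_contents($path);
    if ($data === false) {
        return false; // unreadable file
    }
    $clean = strip_bom($data);
    if ($clean === $data) {
        return false; // nothing to remove
    }
    return file_put_contents($path, $clean) !== false;
}
```

Called as strip_bom_from_file('lang.php') (the file name is made up),
this does the same thing as step 3, so keep a backup of the file first.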

__________________
Yashar Alekberzade


Previous Comments:
------------------------------------------------------------------------

[2003-07-19 09:01:50] ipa at assis dot lt

Simple example:
I have a multilanguage system where my text strings are written as a
huge array in one file. It was saved as UTF-8 with plain Notepad, which
can save in this format.

When I include() this file to access the text strings, PHP outputs
those three bytes, so IE displays a visible line break at that point in
the HTML.

Result: I cannot write multilanguage PHP scripts even with Notepad. I
have to use iconv(), which is difficult because not every host has
compiled PHP with it.
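
One possible workaround that needs no iconv() is to filter the page
output. This is a sketch only: remove_utf8_boms() is a hypothetical
name, and it assumes the page never needs U+FEFF as real content,
because every occurrence of the three bytes is removed.

```php
<?php
// Hypothetical output filter: removes every UTF-8 BOM (EF BB BF) from
// the buffered page. Assumption: U+FEFF is never wanted in the output.
function remove_utf8_boms(string $buffer): string
{
    return str_replace("\xEF\xBB\xBF", '', $buffer);
}

// Simulated output of a page that pulled in two BOM-prefixed files.
$page = "\xEF\xBB\xBF<html>\xEF\xBB\xBF<body>text</body></html>";
echo remove_utf8_boms($page); // <html><body>text</body></html>
```

To apply it to a real page, the function can be registered with
ob_start('remove_utf8_boms') at the top of the entry script, before the
first include(). The entry script itself must then be saved without a
BOM, since bytes in front of its opening tag are sent before the buffer
starts.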

------------------------------------------------------------------------

[2003-07-03 15:05:35] jaanus at heeringson dot com

One thing I forgot to mention in my previous post is that multiple
BOMs also prevent Internet Explorer 6 from rendering pages in
standards-compliance mode, which is a big issue when developing to W3C
standards.

------------------------------------------------------------------------

[2003-07-03 14:59:44] jaanus at heeringson dot com

There are some consequences that are not mentioned in the above
comments. More than one BOM renders a perfectly valid XHTML file
invalid. Apparently the XHTML specification allows a BOM character,
but only ONE, not several, which is what you get if you include other
UTF-8 files (include, require). This in turn can mean that the XML
declaration is not read, because it is not found where it is expected.
This is the case with the W3C validator.
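
To find out which included files contribute the extra BOMs, something
like the following could be used. It is a sketch under assumptions:
has_utf8_bom() and files_with_bom() are hypothetical helper names, and
only the UTF-8 signature EF BB BF is checked.

```php
<?php
// Hypothetical helper: true if the file at $path begins with EF BB BF.
function has_utf8_bom(string $path): bool
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false; // unreadable file, treated as "no BOM"
    }
    $head = fread($handle, 3);
    fclose($handle);
    return $head === "\xEF\xBB\xBF";
}

// Hypothetical helper: returns the subset of $paths that start with a
// UTF-8 BOM, re-indexed from 0.
function files_with_bom(array $paths): array
{
    return array_values(array_filter($paths, 'has_utf8_bom'));
}
```

Calling files_with_bom(get_included_files()) at the end of a request
lists every included file that starts with a BOM, so each one can be
re-saved without it.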

------------------------------------------------------------------------

[2003-06-04 03:11:51] [EMAIL PROTECTED]

That script appears to be written in UTF-16. As for UTF-16, it could
actually be a parser problem as well, but this report addresses the
issue related to UTF-8.

------------------------------------------------------------------------

[2003-06-04 02:59:33] [EMAIL PROTECTED]

Actually, not totally. A friend mailed me a PHP script, which had the
annoying BOM AND the whole file was in double byte (saved by
notepad)... which definitely makes it a parser problem too (\0 < \0 ?
\0 p   doesn't match "<?p" for example).
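
The mismatch can be shown with plain byte strings. This sketch assumes
the file was saved in Notepad's "Unicode" format (UTF-16LE, BOM FF FE);
with the big endian variant the zero bytes come first instead, but the
result is the same.

```php
<?php
// Bytes of the open tag as saved in UTF-16LE: a FF FE BOM, then each
// ASCII character followed by a zero byte.
$utf16 = "\xFF\xFE" . "<\0?\0p\0h\0p\0";

// The ASCII open tag never appears as a contiguous byte sequence.
$found_tag = strpos($utf16, '<?php') !== false;

// Removing only the BOM does not help, because the zero bytes remain.
$without_bom = substr($utf16, 2);
$still_missing = strpos($without_bom, '<?php') === false;

var_dump($found_tag);     // bool(false)
var_dump($still_missing); // bool(true)
```

So for such a file the whole content has to be converted to UTF-8 or a
single-byte encoding, not just have its first two bytes deleted.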

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/22108

-- 
Edit this bug report at http://bugs.php.net/?id=22108&edit=1
