ID:               22108
 Updated by:       [EMAIL PROTECTED]
 Reported By:      [EMAIL PROTECTED]
-Status:           Open
+Status:           Bogus
 Bug Type:         Feature/Change Request
 Operating System: Any
 PHP Version:      All (as of the current implementation)
 Assigned To:      moriyoshi
 New Comment:

PHP doesn't want Unicode scripts, only ASCII ones. Not a bug ->
bogus.


Previous Comments:
------------------------------------------------------------------------

[2003-02-08 02:01:11] [EMAIL PROTECTED]

And assigning this task to me.


------------------------------------------------------------------------

[2003-02-08 01:48:15] [EMAIL PROTECTED]

Yes, I suppose this might be a bug, but most of the developers involved
in PHP are not as aware of this issue as you expected (and as I had
expected). So I thought that changing the category was a better choice
than marking it bogus.


------------------------------------------------------------------------

[2003-02-07 23:13:07] [EMAIL PROTECTED]

The BOM (byte order mark) is a few bytes at the very front of a file
that act as a signature denoting which encoding has been used, and in
UTF-16/32 it also marks the byte order (LE or BE). Although UTF-8 is
byte-order independent, it has become popular on Windows (perhaps less
so on Unix) to use the BOM encoded in UTF-8 to flag the file as being
in UTF-8 format. This allows editors to determine the type of the file
from the first few characters instead of trying to guess it. Ref:
Textpad 4.6 (http://textpad.com)

See the Unicode FAQ for details of the UTF-8 BOM:
http://www.unicode.org/unicode/faq/utf_bom.html#25
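As a minimal sketch of the signature check described above, the function
below inspects the first bytes of a buffer and reports which BOM, if any,
it starts with. The name `detect_bom` is hypothetical; the byte values come
from Python's standard `codecs` constants, and only the UTF-8, UTF-16 and
UTF-32 signatures are covered.

```python
import codecs
from typing import Optional

# Longer signatures are tested first: the UTF-32LE BOM (FF FE 00 00)
# begins with the UTF-16LE BOM (FF FE), so order matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


print(detect_bom(b"\xef\xbb\xbf<?php echo 'hi'; ?>"))  # UTF-8
print(detect_bom(b"<?php echo 'hi'; ?>"))              # None
```

Because of the ordering, a UTF-16LE file whose first character is U+0000
would be reported as UTF-32LE; that ambiguity is inherent in the
signatures themselves.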

The use of this should be obvious: you have to leave behind the
my-language-only mindset that afflicts too many programmers (myself
included, before this job) and think about the growing multiplicity of
languages on the web. I am writing web applications in Japan with
European-language and CJK (Chinese/Japanese/Korean) processing and
interfaces. Thus I have PHP files where variable values are strings in
all sorts of languages, hence UTF-8 encoding.

I feel that this is definitely a bug in PHP, considering that:
* PHP is slowly growing into a language-neutral (i18n/l10n-capable)
language
* PHP is designed so that PHP commands can be liberally sprinkled
through HTML, and HTML is increasingly encoded in UTF-8 these days
* the UTF-8 BOM is becoming increasingly popular as a way of
identifying the file's character encoding
* if the UTF-8 BOM exists, PHP actually outputs it incorrectly, and in
doing so prevents header output, since headers cannot be sent after
output has begun
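The last point is what breaks `header()` calls: the three BOM bytes sit
outside the `<?php` tag, so they are treated as literal page output. A
minimal sketch of the workaround a preprocessing tool could apply,
assuming the fix is simply to drop one leading UTF-8 BOM from the
script's bytes (the name `strip_utf8_bom` is hypothetical and not part
of PHP):

```python
import codecs


def strip_utf8_bom(source: bytes) -> bytes:
    """Remove one leading UTF-8 BOM (EF BB BF), leaving other bytes untouched."""
    if source.startswith(codecs.BOM_UTF8):
        return source[len(codecs.BOM_UTF8):]
    return source


script = b"\xef\xbb\xbf<?php header('Content-Type: text/html; charset=utf-8'); ?>"
cleaned = strip_utf8_bom(script)
print(cleaned[:5])  # b'<?php'
```

Only the leading position is checked, because a U+FEFF later in the file
is ordinary content. When decoding to text, Python's built-in
`utf-8-sig` codec performs the same removal.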

I ask that you treat this not as a feature request but as a bug in the
handling of UTF-8 files. Whether or not the output generator is the
correct characterization of this bug, I leave up to you.

Regards,
Brodie.

------------------------------------------------------------------------

[2003-02-07 21:41:23] [EMAIL PROTECTED]

Because the BOM issue has repeatedly been cited as something that
prevents header output, and we should be more aware of it, I don't see
any reason to mark this report as bogus.

Changing the category from "output control" to a kind of "feature
request".


------------------------------------------------------------------------

[2003-02-07 13:57:22] [EMAIL PROTECTED]

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

BOM = Byte Order Mark for UCS-2 encoding
This value should not be used in UTF-8, since the only
reason for it besides detecting the byte order of UCS-2 was
as a special non-breaking space, and newer Unicode versions
have another representation for the same thing.

Anyhow, BOM = FE FF
Depending on the byte order, that gives:
UCS-2BE <-> "\xFE\xFF"
UCS-2LE <-> "\xFF\xFE"

Therefore a sequence of "EF BB" is another character and
must not be ignored.
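For reference, the bytes produced by the code point U+FEFF under each
encoding can be checked directly. This sketch encodes that single
character with Python's built-in codecs; the two UTF-16 results match
the byte orders listed above, and the UTF-8 result is the three-byte
sequence that the original report calls the UTF-8 BOM.

```python
# U+FEFF is the code point used as the byte order mark.
BOM_CHAR = "\ufeff"

encodings = ("utf-16-be", "utf-16-le", "utf-8")
# The explicit -be/-le codecs encode the character itself and do not
# prepend an extra BOM of their own.
bom_bytes = {enc: BOM_CHAR.encode(enc) for enc in encodings}

for enc, raw in bom_bytes.items():
    print(f"{enc:10} {raw.hex(' ').upper()}")
```

The UTF-8 form is EF BB BF, so the "EF BB" mentioned above is the first
two bytes of U+FEFF encoded in UTF-8.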


------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/22108
