On Mon, May 30, 2016 at 5:40 PM, Stanislav Malyshev <smalys...@gmail.com> wrote:
>> BOM's should not be treated as characters and should not be sent to
>> the output. Is there any reason this should be considered the expected
>> behavior?
>
> The reason would be PHP does not know where surrounding output ends and
> the code starts, beyond <?php. That means if there is some stuff in the
> file before <?php, it would be output - and it's an intended behavior,
> and so will happen with BOM too. Particular sequence of bytes being BOM
> and whether it is desired or not depends on context, but PHP engine does
> not have this context. Remember that pure HTML page is also a valid PHP
> file.
>
I'm with Sammy on the principle that being able to have a BOM in a
given file is important to any non-ASCII code development.  Though we
can argue whether that's good or even necessary, I honestly don't know
how prevalent non-English coding is among PHP developers.

In fact, the idea of stripping content from a script file isn't
without precedent.  Shebang lines are routinely stripped by
cli/cgi/fpm, and if you actually want one in the output, you need to
emit it with an explicit echo statement. (The stripping only applies
to a literal, non-scripting line in the file, not to dynamic output.)
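
For illustration only, the effect of that stripping is roughly the
following, written as userland PHP.  This is a sketch and not the
actual SAPI code; strip_shebang() is a made-up name.

```php
<?php
// Hypothetical helper: drop a leading "#!" line from script source.
// Only the very first line is considered; anything later is untouched.
function strip_shebang(string $source): string
{
    if (strncmp($source, "#!", 2) !== 0) {
        return $source; // no shebang, nothing to strip
    }
    $eol = strpos($source, "\n");
    if ($eol === false) {
        return ''; // the whole file is just a shebang line
    }
    // The cast guards against older versions returning false
    // when nothing follows the newline.
    return (string) substr($source, $eol + 1);
}
```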

So can we apply the same to the BOM?  There's the obvious BC danger of
files which might depend on this behavior (declaring their encoding
via BOM, which happens to be the same as the script encoding).

So how about a declare statement?

{U+FEFF}<?php
  declare(strip_bom=true);

code(); code(); code();

It's got the advantage of being per-file (a view template might
actually want the BOM included, while some business logic piece
doesn't, for example).  It's a compile-time strip, so it has no runtime
cost.  It's non-surprising, since it's stated in every file for which
the BOM strip is intentional.
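
To make the intended effect concrete, here's a rough userland sketch
of the strip, assuming a UTF-8 BOM (the bytes EF BB BF).  The real
thing would live in the compiler rather than in PHP code, and
strip_leading_bom() is a made-up name, just like the strip_bom
directive itself.

```php
<?php
// UTF-8 encoding of U+FEFF.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: remove a BOM only when it is the very first
// thing in the source, i.e. directly in front of the opening tag.
function strip_leading_bom(string $source): string
{
    if (strncmp($source, UTF8_BOM, strlen(UTF8_BOM)) === 0) {
        // The cast guards against older versions returning false
        // when the file contains nothing but the BOM.
        return (string) substr($source, strlen(UTF8_BOM));
    }
    return $source; // no leading BOM, leave the file alone
}
```

A BOM anywhere other than the start of the file is left alone, so a
template that emits one deliberately later on is unaffected.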

-Sara

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
