On Mon, May 30, 2016 at 5:40 PM, Stanislav Malyshev <smalys...@gmail.com> wrote:
>> BOM's should not be treated as characters and should not be sent to
>> the output. Is there any reason this should be considered the expected
>> behavior?
>
> The reason would be PHP does not know where surrounding output ends and
> the code starts, beyond <?php. That means if there is some stuff in the
> file before <?php, it would be output - and it's an intended behavior,
> and so will happen with BOM too. Particular sequence of bytes being BOM
> and whether it is desired or not depends on context, but PHP engine does
> not have this context. Remember that pure HTML page is also a valid PHP
> file.
>

I'm with Sammy on the principle that being able to have a BOM in a given file is important to any non-ASCII code development. We can argue whether that's good or even necessary, though; I honestly don't know how prevalent non-English coding is among PHP developers.
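
To make the byte-level behavior concrete, here is a minimal sketch, assuming the tokenizer extension is loaded and default ini settings are in effect. The variable names are illustrative only. It shows that a BOM placed before the opening tag is lexed like any other leading content, which is why it ends up in the output:

```php
<?php
// The UTF-8 encoding of U+FEFF is the three bytes EF BB BF.
$bom = "\xEF\xBB\xBF";

// A script whose file starts with a BOM, as an editor might save it.
$source = $bom . "<?php echo 'hi';";

// Everything before the opening tag is lexed as inline content,
// so the BOM becomes the first token rather than being skipped.
$tokens = token_get_all($source);
$first = $tokens[0];

$firstName = token_name($first[0]);   // symbolic name of the first token
$firstIsBom = ($first[1] === $bom);   // whether that token is exactly the BOM
```

Under those assumptions the first token is inline content consisting of just the three BOM bytes, exactly as if the file had started with a fragment of HTML.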

In fact, the idea of stripping content from a script file isn't without precedent. Shebang lines are routinely removed from cli/cgi/fpm, and if you want to properly output one, you need to do so in a coded echo statement. (The stripping only applies to a literal, non-scripting line in the file, not dynamic output.)

So can we apply the same to the BOM? There's the obvious BC danger of files which might depend on this behavior (declaring their encoding via BOM, which happens to be the same as the script encoding).

So how about a declare statement?

{U+FEFF}<?php
declare(strip_bom=true);

code(); code(); code();

It's got the advantage of being per-file (a view template might actually want the BOM included, while some business logic piece doesn't, for example). It's a compile-time strip, so it has no runtime cost. It's non-surprising, since it's stated in every file for which the BOM strip is intentional.

-Sara

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
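
As a rough userland approximation of the strip proposed above, the sketch below removes a leading UTF-8 BOM from source text before it would be lexed. Note that `declare(strip_bom=true)` is only a proposal in this thread, and `strip_leading_bom()` is a hypothetical helper name, not an engine or library function:

```php
<?php
// Hypothetical helper: mimics in userland what a compile-time BOM strip
// would do to a script's source. Not an existing PHP API.
function strip_leading_bom(string $source): string
{
    $bom = "\xEF\xBB\xBF"; // UTF-8 encoding of U+FEFF

    // Only a BOM at offset 0 is removed; later occurrences are left alone.
    if (strncmp($source, $bom, 3) === 0) {
        return substr($source, 3);
    }
    return $source;
}

$stripped  = strip_leading_bom("\xEF\xBB\xBF<?php code();");
$untouched = strip_leading_bom("<html><?php code();");
```

Because the check happens once on the raw source, it matches the "no runtime cost" argument: nothing about the executing script changes, only the bytes handed to the compiler.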