On 1 Nov 2012, at 18:34, Akim Demaille wrote:

> Hi Hans,
>
> On 31 Oct 2012, at 15:47, Hans Aberg wrote:
>
>> It is pointless in UTF-8, and accepting it encourages a number of other
>> problems.
>> https://en.wikipedia.org/wiki/Byte_order_mark
>
> You are right that Bison wants at least to be able to read
> the ASCII part of the 8 bits, so that sort-of means UTF-8,
> if we consider that Latin 1 and the like are dead.
>
> If we were to ignore the BOM, then at least we should check
> that they match UTF-8, and reject the file otherwise?
>
> FWIW, the D compilers for instance obey these BOM, including for
> other codings than UTF-8.
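
For illustration only, the check suggested above (skip a leading UTF-8 BOM, reject a file that starts with any other BOM) could be sketched roughly as follows. This is Python rather than anything taken from Bison, and the names `BOMS` and `strip_utf8_bom` are made up for the example; only the `codecs.BOM_*` constants come from the standard library.

```python
import codecs

# Known BOMs, longest first, so that the UTF-32 LE mark (FF FE 00 00)
# is not mistaken for the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM.

    Raise ValueError if the data starts with a BOM of another encoding.
    Data without any BOM is returned unchanged.
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            if name == "UTF-8":
                return data[len(bom):]
            raise ValueError(f"input starts with a {name} BOM, expected UTF-8")
    return data
```

With this sketch, a grammar file saved with a UTF-8 BOM would be read as if the three bytes were absent, while a file carrying a UTF-16 BOM would be rejected up front.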

Note that the OP had a UTF-16 BOM, which is simply an error in a UTF-8 file. The UTF-8 BOM is a 3-byte sequence (EF BB BF); there is a FAQ here:
  http://www.unicode.org/faq/utf_bom.html#BOM

I think it says that a BOM found in the middle of a file should just be ignored. But point three of the last issue says not to use it in streams that are expected to start with ASCII, such as UNIX scripts beginning with '#!', and point four says one must not use it for the non-UTF-8 encodings if the endianness is already declared.

So it seems inconsistent to admit it in UTF-8. Perhaps it is a legacy of those stray editors.

Hans