Re: Compiling Reverse Polish Calculator Example

Hans Aberg Thu, 01 Nov 2012 16:29:16 -0700

On 1 Nov 2012, at 18:34, Akim Demaille wrote:

> Hi Hans,
> 
> On 31 Oct 2012, at 15:47, Hans Aberg wrote:
> 
>> It is pointless in UTF-8, and accepting it encourages a number of other 
>> problems.
>> https://en.wikipedia.org/wiki/Byte_order_mark
> 
> You are right that Bison wants at least to be able to read
> the ASCII part of the 8 bits, so that sort-of means UTF-8,
> if we consider that Latin 1 and the like are dead.
> 
> If we were to ignore the BOM, then at least we should check
> that they match UTF-8, and reject the file otherwise?
> 
> FWIW, the D compilers for instance obey these BOM, including for
> other codings than UTF-8.


Note that the OP had a UTF-16 BOM, which is simply an error in a UTF-8 file. The 
UTF-8 BOM is a 3-byte sequence (EF BB BF); there is a FAQ here:
  http://www.unicode.org/faq/utf_bom.html#BOM
I think it says that if one encounters it in the middle of a file, one should 
just ignore it.

But point three of the last question there says not to use it in streams that 
are expected to start with ASCII, such as the '#!' of a UNIX script, and point 
four says one must not use it for the non-UTF-8 encodings if the endianness is 
already declared. So it seems inconsistent to admit it in UTF-8. Perhaps it is 
some legacy from those stray editors.
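
To make the byte sequences concrete, here is a minimal sketch in C of the kind 
of check discussed above: skip a UTF-8 BOM at the start of the input, and 
reject a UTF-16 one. This is not Bison code; the names bom_kind, detect_bom 
and bom_skip are hypothetical, and whether to skip or reject is exactly the 
policy question under discussion.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; not part of Bison.  */
enum bom_kind { BOM_NONE, BOM_UTF8, BOM_UTF16_BE, BOM_UTF16_LE };

/* Classify the first bytes of BUF (LEN bytes long).
   Note: a UTF-32LE BOM (FF FE 00 00) also starts with FF FE, so it
   is reported as BOM_UTF16_LE here; either way it is not UTF-8.  */
enum bom_kind
detect_bom (const unsigned char *buf, size_t len)
{
  if (len >= 3 && memcmp (buf, "\xEF\xBB\xBF", 3) == 0)
    return BOM_UTF8;
  if (len >= 2 && memcmp (buf, "\xFE\xFF", 2) == 0)
    return BOM_UTF16_BE;
  if (len >= 2 && memcmp (buf, "\xFF\xFE", 2) == 0)
    return BOM_UTF16_LE;
  return BOM_NONE;
}

/* Return the number of leading bytes to skip before parsing,
   or -1 if the BOM announces an encoding other than UTF-8
   (an assumed policy: reject rather than guess).  */
int
bom_skip (const unsigned char *buf, size_t len)
{
  switch (detect_bom (buf, len))
    {
    case BOM_UTF8:
      return 3;
    case BOM_NONE:
      return 0;
    default:
      return -1;
    }
}
```

A reader of grammar files could call bom_skip on the first few bytes and 
either advance past the BOM or report an error before scanning begins.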

Hans



_______________________________________________
[email protected] https://lists.gnu.org/mailman/listinfo/help-bison
