On 13 Nov 2009, at 12:25, Bertalan Fodor (LilyPondTool) wrote:
Hehe, we've got this:
<INITIAL,chords,lyrics,figures,notes>{BOM_UTF8}/.* {
  if (this->lexloc->line_number () != 1 || this->lexloc->column_number () != 0)
    {
      LexerError (_ ("stray UTF-8 BOM encountered").c_str ());
      exit (1);
That means we parse the BOM correctly, but exit if it is not the
first character of the file.
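To make the behaviour described above concrete, here is a minimal
standalone sketch in C++. It is not LilyPond code: the helper name
stray_bom_offsets is hypothetical, and it only assumes that "line 1,
column 0" corresponds to byte offset 0 of the input. It reports every
UTF-8 BOM that the rule above would reject as stray.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// UTF-8 encoding of U+FEFF (the byte order mark).
const std::string kUtf8Bom = "\xEF\xBB\xBF";

// Hypothetical helper, not part of LilyPond: returns the byte offsets of
// every UTF-8 BOM in `input` that is not at offset 0. A BOM at offset 0 is
// accepted; any other one is what the lexer rule calls "stray".
std::vector<std::size_t> stray_bom_offsets(const std::string &input) {
    std::vector<std::size_t> offsets;
    std::size_t pos = input.find(kUtf8Bom);
    while (pos != std::string::npos) {
        if (pos != 0) {
            offsets.push_back(pos);
        }
        // Continue searching after the BOM that was just found.
        pos = input.find(kUtf8Bom, pos + kUtf8Bom.size());
    }
    return offsets;
}
```

When writing test inputs as C++ string literals, keep the BOM in its own
literal ("\xEF\xBB\xBF" "c4"), since a following hex digit such as "c"
would otherwise be absorbed into the escape sequence.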
This link says that, although there is no Unicode protocol for BOMs
that are not at the start of a file, they suggest treating them as a
zero-width space (the non-breaking part is not relevant here):
http://unicode.org/faq/utf_bom.html#bom6
So to follow that suggestion, that error-handling code should be
removed, if you now want to admit BOMs.
Hans
Hans Aberg wrote:
On 13 Nov 2009, at 10:08, Bertalan Fodor (LilyPondTool) wrote:
I think changing the LilyPond parser to support a BOM in the middle
(i.e. not at the beginning) of the file is very hard. Actually, if it
is not at the beginning, then it should be treated as a regular
character, which may not appear just anywhere in the file.
Why would that be? Is your lexer not generated by Flex from a .l
file? If the input .l file is in UTF-8 and Flex is in 8-bit mode, add
a rule
"<BOM>" {}
where <BOM> is the UTF-8 representation of the BOM. It will then act
as a space, breaking tokens, but is otherwise ignored. So it acts as
a zero-width space.
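The effect of such a rule can be sketched outside Flex with a small
hand-written tokenizer in C++. This is only an illustration under
stated assumptions: the function name tokenize_bom_as_space is
hypothetical, tokens are simply runs of non-separator bytes, and the
real lexer has many more rules. The point is that the three BOM bytes
end the current token and then produce nothing themselves.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch, not LilyPond code: split `input` into tokens,
// treating ASCII whitespace and the UTF-8 BOM (bytes EF BB BF) as
// separators that are otherwise ignored.
std::vector<std::string> tokenize_bom_as_space(const std::string &input) {
    const std::string bom = "\xEF\xBB\xBF";
    std::vector<std::string> tokens;
    std::string current;
    std::size_t i = 0;
    while (i < input.size()) {
        // compare() clamps the length, so a truncated tail never matches.
        const bool at_bom = input.compare(i, bom.size(), bom) == 0;
        const char c = input[i];
        const bool at_space = c == ' ' || c == '\t' || c == '\n' || c == '\r';
        if (at_bom || at_space) {
            // A separator ends the current token but yields no token itself.
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
            i += at_bom ? bom.size() : 1;
        } else {
            current += c;
            ++i;
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return tokens;
}
```

With this sketch, a BOM between "c4" and "d4" splits them into two
tokens exactly as a space would, and a BOM at the very start of the
input is skipped without producing an empty token.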
Hans
_______________________________________________
lilypond-devel mailing list
lilypond-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/lilypond-devel