http://codereview.appspot.com/5505090/diff/1/lily/lexer.ll File lily/lexer.ll (right):
http://codereview.appspot.com/5505090/diff/1/lily/lexer.ll#newcode134 lily/lexer.ll:134: A [a-zA-Z\200-\377] On 2012/01/01 02:01:11, Keith wrote:
non-ASCII characters are used internally as-read, tested below only to
warn the
user if the input is invalid as UTF-8
You mean, you want a comment here? I actually don't know how they look in this section of the Flex source. http://codereview.appspot.com/5505090/diff/1/lily/lexer.ll#newcode220 lily/lexer.ll:220: (void) YYText_utf8 (); On 2012/01/01 02:01:11, Keith wrote:
Complaining about comments doesn't help anybody.
There are editors and tools that decide about which encoding to use based on an analysis of the whole file. Exceptions inside of comments can make them pick a "safe choice" like displaying every byte in binary. So I don't agree with "doesn't help anybody". Also if you start with a file that contains non-ASCII non-UTF-8 characters in comments, the editor will not assume that you want to save in UTF-8. All additionally entered characters will be encoded compatibly with the previously detected file encoding. So I don't agree with your statement that basically says that files that contain non-UTF-8 characters in comments can't cause trouble. I won't insist on throwing a warning here, but I won't remove it without being sure that this is based on an informed decision. http://codereview.appspot.com/5505090/diff/1/lily/lexer.ll#newcode1009 lily/lexer.ll:1009: case 0xc2: On 2011/12/31 18:28:00, lemzwerg wrote:
Wouldn't it be more effective to create an array of 128 bytes (for
0x80-0xFF)
which maps `p[i]' to the value of `more' (or 0 if there is more to
do)? How and when to break switch statements down into conditionals and lookup tables depending on the individual processor's performance characteristics is a pet peeve of compiler writers. I see little point meddling with that. Have you looked at the generated code? I'd probably have exploded the case 0x00..0x7f into the switch as well, but did not like the resulting number of lines, and I don't know if clang++ or whatever other compilers we are using can deal with case ranges (like gcc can). http://codereview.appspot.com/5505090/diff/1/lily/lexer.ll#newcode1083 lily/lexer.ll:1083: LexerWarning (_ ("non-UTF-8 characters").c_str ()); On 2012/01/01 02:01:11, Keith wrote:
I suggest "unsupported file encoding, expected UTF-8"
I think that error message would be appropriate once per run, either because of making it a fatal error, or as a summary at the end. But when flagging the actual place where there is a problematic character?
You could warn once per token, rather than once per character, by
following this
with break;
I think once per character is ok (after all, users don't appreciate fixing all errors of one run, just to get fresh errors on the next). It does not help, however, if the error message points merely to the token (more accurately, the lexing unit which does not need to result in a token). So the final version to commit should either point straight to the bad character, or give just one message per string. I'll change this to the latter before the final commit if nobody gives a good suggestion of how to make it do the former. http://codereview.appspot.com/5505090/ _______________________________________________ lilypond-devel mailing list lilypond-devel@gnu.org https://lists.gnu.org/mailman/listinfo/lilypond-devel