Re: Lilypond's error column printer confuses bytes and characters
On 2009-10-22, David Kastrup wrote:
> Patrick McCarty writes:
>
> > On 2009-10-18, David Kastrup wrote:
> >>
> >> GNU LilyPond 2.13.4
> >> Processing `bad.ly'
> >> Parsing...
> >> bad.ly:4:16: error: syntax error, unexpected MUSIC_IDENTIFIER
> >>  Määä A\342\231
> >>                \257 B♭ \break
> >> error: failed files: "bad.ly"
> >>
> >> Apparently, the error column is being tracked by counting characters,
> >> but is displayed by counting bytes. The indicator appears too early
> >> because of that (which caused me to look for the wrong bug in an input
> >> file of mine).
> >
> > This patch seems to correct the issue, but I don't know if it's the
> > correct fix (or if there are any side effects I'm unaware of).
>
> The code before states:
>
>       while (left > 0)
>         {
>           /*
>             FIXME, this is apparently locale dependent.
>           */
> #if HAVE_MBRTOWC
>           wchar_t multibyte[2];
>           size_t thislen = mbrtowc (multibyte, line_chars, left, &state);
> #else
>           size_t thislen = 1;
> #endif /* !HAVE_MBRTOWC */
>
> The question is what we do about locales. I think that in this case
> behavior is arguably correct since we are talking about column numbers
> on the terminal/locale, and even when Lilypond is using utf-8, those
> will correspond with the interpretation of the locale.

Sorry about the delay.

The output looks okay to me when invoking xterm with various locales.

Also, the point-and-click functionality still seems to work correctly,
so this *might* fix the problem Harmath reported a few weeks ago:

http://lists.gnu.org/archive/html/bug-lilypond/2009-10/msg1.html

> By the way: when I switch into POSIX locale, the error message will
> occur before the first Umlaut which is then no longer considered text
> apparently. So we already have some built-in locale dependencies
> elsewhere.

Yes, I'm pretty sure this is coming from glibc. After stepping through
Source_file::get_counts() when LC_ALL=POSIX, I noticed that mbrtowc()
returned -1 (type size_t) when it processed the ä.

As a result, this condition prevents the consideration of more
characters:

      /* Stop converting at invalid character;
         this can mean we have read just the first
         part of a valid character. */
      if (thislen == (size_t) -1)
        break;

It seems that non-ASCII characters are not valid characters when the
locale is POSIX. The glibc docs aren't very clear on this point, and
only mention the fact that mbrtowc() is locale-dependent.

BTW, as the comment states, it would be nice to use a function that is
not locale-dependent, since the only information we need is the size (in
bytes) of the current UTF-8 character.

> My vote is on getting it merged, but it probably would do no harm if
> somebody checked this on Windows where the old version purportedly
> worked.

I'll apply it and make a note to check the next devel release on
Windows.

Thanks,
Patrick

___
bug-lilypond mailing list
bug-lilypond@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-lilypond
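The locale-independent size lookup mentioned above could look roughly like the following. This is a minimal sketch, not LilyPond code: the name `utf8_sequence_length` is hypothetical, and the function only inspects the bit pattern of the lead byte. It does not check the continuation bytes or reject overlong encodings.

```cpp
#include <cstddef>

// Hypothetical helper (not part of LilyPond): length in bytes of the UTF-8
// sequence introduced by `lead`, judged from its bit pattern alone.
// Returns 0 when `lead` cannot start a sequence (continuation byte
// 10xxxxxx, or 11111xxx). No locale is consulted.
constexpr std::size_t utf8_sequence_length(unsigned char lead) {
  return lead < 0x80             ? 1   // 0xxxxxxx: ASCII
         : (lead & 0xE0) == 0xC0 ? 2   // 110xxxxx
         : (lead & 0xF0) == 0xE0 ? 3   // 1110xxxx
         : (lead & 0xF8) == 0xF0 ? 4   // 11110xxx
                                 : 0;  // not a lead byte
}
```

For the bytes in bad.ly this gives 2 for the lead byte of ä (0xC3) and 3 for the lead byte of ♯ and ♭ (0xE2), whatever LC_ALL is set to. A caller would still have to decide what to do with a 0 result, much like the existing loop does when mbrtowc() fails.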
Re: Lilypond's error column printer confuses bytes and characters
Patrick McCarty writes:

> On 2009-10-18, David Kastrup wrote:
>>
>> GNU LilyPond 2.13.4
>> Processing `bad.ly'
>> Parsing...
>> bad.ly:4:16: error: syntax error, unexpected MUSIC_IDENTIFIER
>>  Määä A\342\231
>>                \257 B♭ \break
>> error: failed files: "bad.ly"
>>
>> Apparently, the error column is being tracked by counting characters,
>> but is displayed by counting bytes. The indicator appears too early
>> because of that (which caused me to look for the wrong bug in an input
>> file of mine).
>
> This patch seems to correct the issue, but I don't know if it's the
> correct fix (or if there are any side effects I'm unaware of).

The code before states:

      while (left > 0)
        {
          /*
            FIXME, this is apparently locale dependent.
          */
#if HAVE_MBRTOWC
          wchar_t multibyte[2];
          size_t thislen = mbrtowc (multibyte, line_chars, left, &state);
#else
          size_t thislen = 1;
#endif /* !HAVE_MBRTOWC */

The question is what we do about locales. I think that in this case
behavior is arguably correct since we are talking about column numbers
on the terminal/locale, and even when Lilypond is using utf-8, those
will correspond with the interpretation of the locale. Or something.

Anyway, it seems like this change would cause the surrounding function
to behave more consistently.

As to consistency: when I switch into POSIX locale, the error message
will occur before the first Umlaut which is then no longer considered
text apparently. So we already have some built-in locale dependencies
elsewhere.

-- 
David Kastrup
Re: Lilypond's error column printer confuses bytes and characters
Patrick McCarty writes:

> On 2009-10-18, David Kastrup wrote:
>>
>> GNU LilyPond 2.13.4
>> Processing `bad.ly'
>> Parsing...
>> bad.ly:4:16: error: syntax error, unexpected MUSIC_IDENTIFIER
>>  Määä A\342\231
>>                \257 B♭ \break
>> error: failed files: "bad.ly"
>>
>> Apparently, the error column is being tracked by counting characters,
>> but is displayed by counting bytes. The indicator appears too early
>> because of that (which caused me to look for the wrong bug in an input
>> file of mine).
>
> This patch seems to correct the issue, but I don't know if it's the
> correct fix (or if there are any side effects I'm unaware of).

The code before states:

      while (left > 0)
        {
          /*
            FIXME, this is apparently locale dependent.
          */
#if HAVE_MBRTOWC
          wchar_t multibyte[2];
          size_t thislen = mbrtowc (multibyte, line_chars, left, &state);
#else
          size_t thislen = 1;
#endif /* !HAVE_MBRTOWC */

The question is what we do about locales. I think that in this case
behavior is arguably correct since we are talking about column numbers
on the terminal/locale, and even when Lilypond is using utf-8, those
will correspond with the interpretation of the locale. Or something.

Anyway, it seems like this change would cause the surrounding function
to behave more consistently. It works in my case.

By the way: when I switch into POSIX locale, the error message will
occur before the first Umlaut which is then no longer considered text
apparently. So we already have some built-in locale dependencies
elsewhere.

My vote is on getting it merged, but it probably would do no harm if
somebody checked this on Windows where the old version purportedly
worked.

-- 
David Kastrup
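To make the locale dependence of the quoted loop concrete, here is a minimal standalone sketch of the same idea. It is an illustration only: `count_locale_chars` is a hypothetical name, and the surrounding LilyPond bookkeeping (columns, tabs, line offsets) is left out. The return-value conventions of mbrtowc() used here are the standard ones: (size_t) -1 for an invalid sequence, (size_t) -2 for an incomplete one, 0 when a NUL was decoded.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helper (not LilyPond code): count the characters in the
// first `len` bytes of `s` as the current LC_CTYPE locale decodes them.
// Like the quoted loop, it stops at the first sequence mbrtowc() rejects.
std::size_t count_locale_chars(const char *s, std::size_t len) {
  assert(s != nullptr);
  std::mbstate_t state{};  // zero-initialized: initial conversion state
  std::size_t chars = 0;
  while (len > 0) {
    wchar_t wc;
    std::size_t n = std::mbrtowc(&wc, s, len, &state);
    if (n == static_cast<std::size_t>(-1) ||
        n == static_cast<std::size_t>(-2))
      break;  // invalid (-1) or truncated (-2) multibyte sequence
    if (n == 0)
      n = 1;  // a NUL was decoded; step over its single byte
    ++chars;
    s += n;
    len -= n;
  }
  return chars;
}
```

The result for non-ASCII input depends entirely on what setlocale() selected beforehand, which is the behavior the FIXME comment points at. Pure ASCII input is counted the same way in every locale.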
Re: Lilypond's error column printer confuses bytes and characters
On 2009-10-18, David Kastrup wrote:
>
> GNU LilyPond 2.13.4
> Processing `bad.ly'
> Parsing...
> bad.ly:4:16: error: syntax error, unexpected MUSIC_IDENTIFIER
>  Määä A\342\231
>                \257 B♭ \break
> error: failed files: "bad.ly"
>
> Apparently, the error column is being tracked by counting characters,
> but is displayed by counting bytes. The indicator appears too early
> because of that (which caused me to look for the wrong bug in an input
> file of mine).

This patch seems to correct the issue, but I don't know if it's the
correct fix (or if there are any side effects I'm unaware of).

I get this output:

GNU LilyPond 2.13.6
Processing `bad.ly'
Parsing...
bad.ly:4:16: error: syntax error, unexpected MUSIC_IDENTIFIER
 Määä A♯ B♭
            \break
error: failed files: "bad.ly"

If the patch looks okay, I'll add a commit summary for completeness.

Thanks,
Patrick

From 3a0a66f7d6bc2f4791da6c3f6efeb499eed49465 Mon Sep 17 00:00:00 2001
From: Patrick McCarty
Date: Thu, 22 Oct 2009 03:01:09 -0700
Subject: [PATCH] Fix error message output alignment for wide chars

---
 lily/source-file.cc |    7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/lily/source-file.cc b/lily/source-file.cc
index fc5b034..96264fb 100644
--- a/lily/source-file.cc
+++ b/lily/source-file.cc
@@ -308,7 +308,12 @@ Source_file::get_counts (char const *pos_str0,
       else
         (*column)++;
 
-      (*line_char)++;
+      /*
+        For accurate error output, consider multibyte
+        characters as a series of characters.
+      */
+      (*line_char) += thislen;
+
       /* Advance past this character. */
       line_chars += thislen;
       left -= thislen;
-- 
1.6.5.1
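The effect of the patch can be reproduced outside LilyPond. The following is a minimal sketch, assuming the message printer splits the source line (a UTF-8 byte string) at the stored offset. `Counts`, `count_prefix` and `kBadLine` are hypothetical names for illustration, not LilyPond code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the two ways of advancing the offset:
// `chars` grows by one per character (before the patch),
// `bytes` grows by the encoded length (after the patch).
struct Counts {
  std::size_t chars;
  std::size_t bytes;
};

// Count characters and bytes in the UTF-8 prefix of `line` that ends at
// byte offset `end`. A byte starts a new character unless it is a
// continuation byte (bit pattern 10xxxxxx).
Counts count_prefix(const std::string &line, std::size_t end) {
  assert(end <= line.size());
  Counts c{0, 0};
  for (std::size_t i = 0; i < end; ++i) {
    const unsigned char b = static_cast<unsigned char>(line[i]);
    if ((b & 0xC0) != 0x80)
      ++c.chars;
    ++c.bytes;
  }
  return c;
}

// The offending text from bad.ly, spelled out as UTF-8 bytes.
const std::string kBadLine =
    "M\xC3\xA4\xC3\xA4\xC3\xA4 A\xE2\x99\xAF B\xE2\x99\xAD \\break";
```

Calling `count_prefix (kBadLine, kBadLine.find ("\\break"))` yields 11 characters and 18 bytes. `kBadLine.substr (0, 11)` ends after the first two bytes of ♯, which matches the `A\342\231` / `\257` split in the report, while `kBadLine.substr (0, 18)` ends cleanly before `\break`. The difference of 7 is also the gap between column 16 and the column 23 reported from Windows in this thread.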
Re: Lilypond's error column printer confuses bytes and characters
2009/10/19 David Kastrup:
> -Eluze writes:
>> however, with version 2.13.3 (under windows vista) i get the following error
>> message:
>>
>> Analysieren...
>> bad.ly:4:23: Fehler: syntax error, unexpected MUSIC_IDENTIFIER
>>  Määä A♯ B♭
>>             \break
>>
>> which to me looks correct!
>
> Yes. Since "Analysieren" looked German to me, I checked with the German
> locale de_DE.UTF-8 (had to install language-pack-de for it to work
> properly, though, since otherwise ä is not accepted as text).
>
> No better luck: same bombout. My normal locale is en_US.UTF-8. It is
> conceivable that people will see this bug (on POSIXy systems) only when
> a valid UTF-8 locale is selected.
>
> If you don't see it with 2.13.3 under Windows, either the Windows
> behavior is different, or something went wrong between 2.13.3 and now.

I can reproduce it on 2.13.6 Linux with LANG=es_ES.UTF-8.

If I prefix the \break token with at least seven blank spaces, the
error message no longer prints the trash characters that appear
otherwise:

 Määä A♯ B♭
                   \break

-- 
Francisco Vila. Badajoz (Spain)
www.paconet.org
www.csmbadajoz.com
Re: Lilypond's error column printer confuses bytes and characters
-Eluze writes:

> David Kastrup wrote:
>>
>> -Eluze writes:
>>
>>> now, if i comment the \break (or omit it), the file compiles quite well -
>>> did i miss something?
>>
>> The subject line and the problem description?
>>
>>
>
> ahhh! i see now - i thought you were looking for a solution…
>
> however, with version 2.13.3 (under windows vista) i get the following error
> message:
>
> Analysieren...
> bad.ly:4:23: Fehler: syntax error, unexpected MUSIC_IDENTIFIER
>  Määä A♯ B♭
>             \break
>
> which to me looks correct!

Yes. Since "Analysieren" looked German to me, I checked with the German
locale de_DE.UTF-8 (had to install language-pack-de for it to work
properly, though, since otherwise ä is not accepted as text).

No better luck: same bombout. My normal locale is en_US.UTF-8. It is
conceivable that people will see this bug (on POSIXy systems) only when
a valid UTF-8 locale is selected.

If you don't see it with 2.13.3 under Windows, either the Windows
behavior is different, or something went wrong between 2.13.3 and now.

-- 
David Kastrup
Re: Lilypond's error column printer confuses bytes and characters
David Kastrup wrote:
>
> -Eluze writes:
>
>> now, if i comment the \break (or omit it), the file compiles quite well -
>> did i miss something?
>
> The subject line and the problem description?
>
>

ahhh! i see now - i thought you were looking for a solution…

however, with version 2.13.3 (under windows vista) i get the following error
message:

Analysieren...
bad.ly:4:23: Fehler: syntax error, unexpected MUSIC_IDENTIFIER
 Määä A♯ B♭
            \break

which to me looks correct!

-- 
View this message in context: http://www.nabble.com/Lilypond%27s-error-column-printer-confuses-bytes-and-characters-tp25946915p25950920.html
Sent from the Gnu - Lilypond - Bugs mailing list archive at Nabble.com.
Re: Lilypond's error column printer confuses bytes and characters
-Eluze writes:

> David Kastrup wrote:
>>
>>
>> The following input file:
>>
> … which is a .bin file and should be a .ly file - but downloaded its content
> looks like
>
> \markup{
>   Määä A♯ B♭ \break
> }
>
> now, if i comment the \break (or omit it), the file compiles quite well -
> did i miss something?

The subject line and the problem description?

-- 
David Kastrup
Re: Lilypond's error column printer confuses bytes and characters
David Kastrup wrote:
>
>
> The following input file:
>

… which is a .bin file and should be a .ly file - but downloaded its content
looks like

\markup{
  Määä A♯ B♭ \break
}

now, if i comment the \break (or omit it), the file compiles quite well -
did i miss something?
Lilypond's error column printer confuses bytes and characters
> I am not topposting

I reported this bug once already, but its distribution was haphazard
(never got to gmane) and nobody entered it into the bug tracker.

The following input file:

bin2jX8eDNUTu.bin
Description: Binary data

leads to the following error output (which hacks a UTF-8 character into
pieces, replaced by printable octal sequences to make this transfer
better via mail):

GNU LilyPond 2.13.4
Processing `bad.ly'
Parsing...
bad.ly:4:16: error: syntax error, unexpected MUSIC_IDENTIFIER
 Määä A\342\231
               \257 B♭ \break
error: failed files: "bad.ly"

Apparently, the error column is being tracked by counting characters,
but is displayed by counting bytes. The indicator appears too early
because of that (which caused me to look for the wrong bug in an input
file of mine).

-- 
David Kastrup
Lilypond's error column printer confuses bytes and characters
> I'm not top-posting

The following input file:

bad.ly
Description: Binary data

leads to the following error output (which hacks a UTF-8 character into
pieces, replaced by printable octal sequences to make this transfer
better via mail):

GNU LilyPond 2.13.4
Processing `bad.ly'
Parsing...
bad.ly:4:16: error: syntax error, unexpected MUSIC_IDENTIFIER
 Määä A\342\231
               \257 B♭ \break
error: failed files: "bad.ly"

Apparently, the error column is being tracked by counting characters,
but is displayed by counting bytes. The indicator appears too early
because of that (which caused me to look for the wrong bug in an input
file of mine).

-- 
David Kastrup