Re: Lilypond's error column printer confuses bytes and characters

2009-10-26 Thread Patrick McCarty
On 2009-10-22, David Kastrup wrote:
> Patrick McCarty  writes:
> 
> > On 2009-10-18, David Kastrup wrote:
> >> 
> >> GNU LilyPond 2.13.4
> >> Processing `bad.ly'
> >> Parsing...
> >> bad.ly:4:16: error: syntax error, unexpected MUSIC_IDENTIFIER
> >>  Määä A\342\231
> >> \257 B♭ \break
> >> error: failed files: "bad.ly"
> >> 
> >> Apparently, the error column is being tracked by counting characters,
> >> but is displayed by counting bytes.  The indicator appears too early
> >> because of that (which caused me to look for the wrong bug in an input
> >> file of mine).
> >
> > This patch seems to correct the issue, but I don't know if it's the
> > correct fix (or if there are any side effects I'm unaware of).
> 
> The code before states:
> 
>   while (left > 0)
>     {
>       /*
>         FIXME, this is apparently locale dependent.
>       */
> #if HAVE_MBRTOWC
>       wchar_t multibyte[2];
>       size_t thislen = mbrtowc (multibyte, line_chars, left, &state);
> #else
>       size_t thislen = 1;
> #endif /* !HAVE_MBRTOWC */
> 
> The question is what we do about locales.  I think that in this case
> the behavior is arguably correct since we are talking about column
> numbers on the terminal/locale, and even when Lilypond is using UTF-8,
> those will correspond with the interpretation of the locale.

Sorry about the delay.  The output looks okay to me when invoking
xterm with various locales.

Also, the point-and-click functionality still seems to work correctly,
so this *might* fix the problem Harmath reported a few weeks ago:

http://lists.gnu.org/archive/html/bug-lilypond/2009-10/msg1.html

> By the way: when I switch to the POSIX locale, the error message will
> occur before the first umlaut, which is then apparently no longer
> considered text.  So we already have some built-in locale dependencies
> elsewhere.

Yes, I'm pretty sure this is coming from glibc.

After stepping through Source_file::get_counts() with LC_ALL=POSIX, I
noticed that mbrtowc() returned (size_t) -1 when it processed the
ä.  As a result, this condition stops the loop from considering any
further characters:

  /* Stop converting at invalid character;
     this can mean we have read just the first part
     of a valid character.  */
  if (thislen == (size_t) -1)
    break;


It seems that non-ASCII characters are not valid characters when the
locale is POSIX.  The glibc docs aren't very clear on this point, and
only mention the fact that mbrtowc() is locale-dependent.

BTW, as the comment states, it would be nice to use a function that is
not locale-dependent, since the only information we need is the size
(in bytes) of the current UTF-8 character.
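
For illustration only, here is a sketch of such a function in C++. It reads the byte length of a UTF-8 character from its lead byte and checks the continuation bytes, without consulting the locale. The name `utf8_sequence_length` is made up and this is not LilyPond code; it assumes the input is meant to be UTF-8, and it returns 0 for malformed or truncated input so that a caller can stop, much as the loop above stops on `(size_t) -1`.

```cpp
#include <cstddef>

// Hypothetical helper, not LilyPond code: return the byte length of the
// UTF-8 character starting at s, examining at most `left` bytes.  Returns 0
// when the bytes are not a well-formed UTF-8 sequence or the sequence is cut
// off by `left`.  The result does not depend on the current locale.
std::size_t
utf8_sequence_length (char const *s, std::size_t left)
{
  if (left == 0)
    return 0;

  unsigned char const lead = static_cast<unsigned char> (s[0]);
  std::size_t len;
  if (lead < 0x80)
    return 1;                   // plain ASCII
  else if (lead >= 0xC2 && lead <= 0xDF)
    len = 2;
  else if (lead >= 0xE0 && lead <= 0xEF)
    len = 3;
  else if (lead >= 0xF0 && lead <= 0xF4)
    len = 4;
  else
    return 0;                   // stray continuation byte or invalid lead byte

  if (len > left)
    return 0;                   // truncated sequence

  // Every byte after the lead byte must look like 10xxxxxx.
  for (std::size_t i = 1; i < len; i++)
    if ((static_cast<unsigned char> (s[i]) & 0xC0) != 0x80)
      return 0;

  // Reject overlong forms, UTF-16 surrogates and code points above U+10FFFF.
  unsigned char const second = static_cast<unsigned char> (s[1]);
  if ((lead == 0xE0 && second < 0xA0) || (lead == 0xED && second > 0x9F)
      || (lead == 0xF0 && second < 0x90) || (lead == 0xF4 && second > 0x8F))
    return 0;

  return len;
}
```

In a loop like the one in Source_file::get_counts(), a call such as `utf8_sequence_length (line_chars, left)` could take the place of mbrtowc(); whether LilyPond should assume UTF-8 regardless of the locale is exactly the question raised in this thread.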

> My vote is on getting it merged, but it probably would do no harm if
> somebody checked this on Windows where the old version purportedly
> worked.

I'll apply it and make a note to check the next devel release on
Windows.


Thanks,
Patrick


___
bug-lilypond mailing list
bug-lilypond@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-lilypond


Re: Lilypond's error column printer confuses bytes and characters

2009-10-22 Thread David Kastrup
Patrick McCarty  writes:

> On 2009-10-18, David Kastrup wrote:
>> 
>> GNU LilyPond 2.13.4
>> Processing `bad.ly'
>> Parsing...
>> bad.ly:4:16: error: syntax error, unexpected MUSIC_IDENTIFIER
>>  Määä A\342\231
>> \257 B♭ \break
>> error: failed files: "bad.ly"
>> 
>> Apparently, the error column is being tracked by counting characters,
>> but is displayed by counting bytes.  The indicator appears too early
>> because of that (which caused me to look for the wrong bug in an input
>> file of mine).
>
> This patch seems to correct the issue, but I don't know if it's the
> correct fix (or if there are any side effects I'm unaware of).

The code before states:

  while (left > 0)
    {
      /*
        FIXME, this is apparently locale dependent.
      */
#if HAVE_MBRTOWC
      wchar_t multibyte[2];
      size_t thislen = mbrtowc (multibyte, line_chars, left, &state);
#else
      size_t thislen = 1;
#endif /* !HAVE_MBRTOWC */

The question is what we do about locales.  I think that in this case
the behavior is arguably correct since we are talking about column
numbers on the terminal/locale, and even when Lilypond is using UTF-8,
those will correspond with the interpretation of the locale.

Or something.

Anyway, it seems like this change would cause the surrounding function
to behave more consistently.

As to consistency: when I switch to the POSIX locale, the error message
will occur before the first umlaut, which is then apparently no longer
considered text.  So we already have some built-in locale dependencies
elsewhere.

-- 
David Kastrup




Re: Lilypond's error column printer confuses bytes and characters

2009-10-22 Thread David Kastrup
Patrick McCarty  writes:

> On 2009-10-18, David Kastrup wrote:
>> 
>> GNU LilyPond 2.13.4
>> Processing `bad.ly'
>> Parsing...
>> bad.ly:4:16: error: syntax error, unexpected MUSIC_IDENTIFIER
>>  Määä A\342\231
>> \257 B♭ \break
>> error: failed files: "bad.ly"
>> 
>> Apparently, the error column is being tracked by counting characters,
>> but is displayed by counting bytes.  The indicator appears too early
>> because of that (which caused me to look for the wrong bug in an input
>> file of mine).
>
> This patch seems to correct the issue, but I don't know if it's the
> correct fix (or if there are any side effects I'm unaware of).

The code before states:

  while (left > 0)
    {
      /*
        FIXME, this is apparently locale dependent.
      */
#if HAVE_MBRTOWC
      wchar_t multibyte[2];
      size_t thislen = mbrtowc (multibyte, line_chars, left, &state);
#else
      size_t thislen = 1;
#endif /* !HAVE_MBRTOWC */

The question is what we do about locales.  I think that in this case
the behavior is arguably correct since we are talking about column
numbers on the terminal/locale, and even when Lilypond is using UTF-8,
those will correspond with the interpretation of the locale.

Or something.

Anyway, it seems like this change would cause the surrounding function
to behave more consistently.  It works in my case.

By the way: when I switch to the POSIX locale, the error message will
occur before the first umlaut, which is then apparently no longer
considered text.  So we already have some built-in locale dependencies
elsewhere.

My vote is on getting it merged, but it probably would do no harm if
somebody checked this on Windows where the old version purportedly
worked.

-- 
David Kastrup





Re: Lilypond's error column printer confuses bytes and characters

2009-10-22 Thread Patrick McCarty
On 2009-10-18, David Kastrup wrote:
> 
> GNU LilyPond 2.13.4
> Processing `bad.ly'
> Parsing...
> bad.ly:4:16: error: syntax error, unexpected MUSIC_IDENTIFIER
>  Määä A\342\231
> \257 B♭ \break
> error: failed files: "bad.ly"
> 
> Apparently, the error column is being tracked by counting characters,
> but is displayed by counting bytes.  The indicator appears too early
> because of that (which caused me to look for the wrong bug in an input
> file of mine).

This patch seems to correct the issue, but I don't know if it's the
correct fix (or if there are any side effects I'm unaware of).

I get this output:

  GNU LilyPond 2.13.6
  Processing `bad.ly'
  Parsing...
  bad.ly:4:16: error: syntax error, unexpected MUSIC_IDENTIFIER
   Määä A♯ B♭ 
  \break
  error: failed files: "bad.ly"


If the patch looks okay, I'll add a commit summary for completeness.

Thanks,
Patrick
From 3a0a66f7d6bc2f4791da6c3f6efeb499eed49465 Mon Sep 17 00:00:00 2001
From: Patrick McCarty 
Date: Thu, 22 Oct 2009 03:01:09 -0700
Subject: [PATCH] Fix error message output alignment for wide chars

---
 lily/source-file.cc |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/lily/source-file.cc b/lily/source-file.cc
index fc5b034..96264fb 100644
--- a/lily/source-file.cc
+++ b/lily/source-file.cc
@@ -308,7 +308,12 @@ Source_file::get_counts (char const *pos_str0,
       else
         (*column)++;
 
-      (*line_char)++;
+      /*
+        For accurate error output, consider multibyte
+        characters as a series of characters.
+      */
+      (*line_char) += thislen;
+
       /* Advance past this character. */
       line_chars += thislen;
       left -= thislen;
-- 
1.6.5.1
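
A locale-independent model of the counting loop may help show what this patch changes. It is a sketch, not the code in lily/source-file.cc: it assumes UTF-8 input, replaces mbrtowc() with a lead-byte lookup, ignores the branch that precedes `(*column)++` in the real loop, and the names `Counts`, `get_counts_sketch` and `quote_input_sketch` are hypothetical. It also assumes, as the bug report suggests, that the quoted line is cut at `line_char` bytes and indented by `column` spaces. `column` still advances once per character, while `line_char` now advances by the byte length of each character, as in `(*line_char) += thislen;`.

```cpp
#include <cstddef>
#include <string>

// Byte length of a UTF-8 character from its lead byte.  Malformed bytes are
// treated as length 1 so the loop below always makes progress.
static std::size_t
lead_byte_length (unsigned char c)
{
  if (c >= 0xC2 && c <= 0xDF) return 2;
  if (c >= 0xE0 && c <= 0xEF) return 3;
  if (c >= 0xF0 && c <= 0xF4) return 4;
  return 1;
}

struct Counts
{
  std::size_t column;     // characters before pos: used for indentation
  std::size_t line_char;  // bytes before pos: used to cut the line
};

// Hypothetical model: count the characters and bytes in line[0, pos).
Counts
get_counts_sketch (std::string const &line, std::size_t pos)
{
  Counts c = {0, 0};
  std::size_t left = pos < line.size () ? pos : line.size ();
  char const *p = line.data ();
  while (left > 0)
    {
      std::size_t thislen = lead_byte_length (static_cast<unsigned char> (*p));
      if (thislen > left)
        break;                  // character straddles pos: stop here
      c.column++;               // one column per character
      c.line_char += thislen;   // after the patch: bytes, not characters
      p += thislen;
      left -= thislen;
    }
  return c;
}

// Text up to the error, a newline, `column` spaces, then the rest.
std::string
quote_input_sketch (std::string const &line, std::size_t pos)
{
  Counts const c = get_counts_sketch (line, pos);
  return line.substr (0, c.line_char) + "\n" + std::string (c.column, ' ')
         + line.substr (c.line_char);
}
```

For ` Määä A♯ B♭ \break` with `pos` at \break, this gives `column` 12 and `line_char` 19, so the line is cut right before \break. Incrementing `line_char` by one per character, as the removed line did, would give 12 instead, which is the byte offset that cuts ♯ in half.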



Re: Lilypond's error column printer confuses bytes and characters

2009-10-18 Thread Francisco Vila
2009/10/19 David Kastrup :
> -Eluze  writes:
>> However, with version 2.13.3 (under Windows Vista) I get the following error
>> message:
>>
>> Analysieren...
>> bad.ly:4:23: Fehler: syntax error, unexpected MUSIC_IDENTIFIER
>>      Määä A♯ B♭
>>                        \break
>>
>> which to me looks correct!
>
> Yes.  Since "Analysieren" looked German to me, I checked with the German
> locale de_DE.UTF-8 (had to install language-pack-de for it to work
> properly, though, since otherwise ä is not accepted as text).
>
> No better luck: same bombout.  My normal locale is en_US.UTF-8.  It is
> conceivable that people will see this bug (on POSIXy systems) only when
> a valid UTF-8 locale is selected.
>
> If you don't see it with 2.13.3 under Windows, either the Windows
> behavior is different, or something went wrong between 2.13.3 and now.

I can reproduce it with 2.13.6 on Linux with LANG=es_ES.UTF-8.

If I prefix the \break token with at least seven spaces, the error
message stops printing the garbage characters that it prints
otherwise:

 Määä A♯ B♭
 \break
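
Seven is also the size of the mismatch. In UTF-8 each ä takes 2 bytes and ♯ and ♭ take 3 bytes each, so the text in front of \break is 7 bytes longer than its length in characters (one extra byte for each of the three ä, two each for ♯ and ♭). The C++ sketch below checks this, assuming the line is the one from the test file with a single leading space, and modelling the bug as the report describes it: the line is cut at a byte offset equal to the character column. All function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Count characters in well-formed UTF-8 by counting bytes that are not
// continuation bytes (10xxxxxx).
std::size_t
utf8_char_count (std::string const &s)
{
  std::size_t n = 0;
  for (unsigned char c : s)
    if ((c & 0xC0) != 0x80)
      n++;
  return n;
}

// How many more bytes than characters `s` contains.
std::size_t
extra_bytes (std::string const &s)
{
  return s.size () - utf8_char_count (s);
}

// True if byte offset `off` does not fall inside a multi-byte character.
bool
on_char_boundary (std::string const &s, std::size_t off)
{
  return off >= s.size ()
         || (static_cast<unsigned char> (s[off]) & 0xC0) != 0x80;
}

// Model of the bug: `prefix` is the text before \break, `spaces` extra blanks
// are inserted in front of \break, and the line is cut at a byte offset equal
// to the character column of \break.  Returns true if that cut is clean.
bool
cut_is_clean (std::string const &prefix, std::size_t spaces)
{
  std::string const line = prefix + std::string (spaces, ' ') + "\\break";
  std::size_t const char_column = utf8_char_count (prefix) + spaces;
  return on_char_boundary (line, char_column);
}
```

With that line, `extra_bytes` returns 7, the cut with no added spaces falls inside ♯, and with seven or more added spaces the character column lies past every multi-byte character, so the cut always lands among plain spaces.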

-- 
Francisco Vila. Badajoz (Spain)
www.paconet.org
www.csmbadajoz.com




Re: Lilypond's error column printer confuses bytes and characters

2009-10-18 Thread David Kastrup
-Eluze  writes:

> David Kastrup wrote:
>> 
>> -Eluze  writes:
>> 
>> Now, if I comment out the \break (or omit it), the file compiles fine.
>> Did I miss something?
>> 
>> The subject line and the problem description?
>> 
>> 
>
> Ahhh! I see now - I thought you were looking for a solution…
>
> However, with version 2.13.3 (under Windows Vista) I get the following error
> message:
>
> Analysieren...
> bad.ly:4:23: Fehler: syntax error, unexpected MUSIC_IDENTIFIER
>  Määä A♯ B♭ 
>\break
>
> which to me looks correct!

Yes.  Since "Analysieren" looked German to me, I checked with the German
locale de_DE.UTF-8 (had to install language-pack-de for it to work
properly, though, since otherwise ä is not accepted as text).

No better luck: same bombout.  My normal locale is en_US.UTF-8.  It is
conceivable that people will see this bug (on POSIXy systems) only when
a valid UTF-8 locale is selected.

If you don't see it with 2.13.3 under Windows, either the Windows
behavior is different, or something went wrong between 2.13.3 and now.

-- 
David Kastrup





Re: Lilypond's error column printer confuses bytes and characters

2009-10-18 Thread -Eluze


David Kastrup wrote:
> 
> -Eluze  writes:
> 
>> Now, if I comment out the \break (or omit it), the file compiles fine.
>> Did I miss something?
> 
> The subject line and the problem description?
> 
> 

Ahhh! I see now - I thought you were looking for a solution…

However, with version 2.13.3 (under Windows Vista) I get the following error
message:

Analysieren...
bad.ly:4:23: Fehler: syntax error, unexpected MUSIC_IDENTIFIER
 Määä A♯ B♭ 
   \break

which to me looks correct!
-- 
View this message in context: 
http://www.nabble.com/Lilypond%27s-error-column-printer-confuses-bytes-and-characters-tp25946915p25950920.html
Sent from the Gnu - Lilypond - Bugs mailing list archive at Nabble.com.





Re: Lilypond's error column printer confuses bytes and characters

2009-10-18 Thread David Kastrup
-Eluze  writes:

> David Kastrup wrote:
>> 
>> 
>> The following input file:
>> 
> … which is a .bin file and should be a .ly file, but once downloaded, its
> content looks like
>
> \markup{
>  Määä A♯ B♭ \break
> }
>
> Now, if I comment out the \break (or omit it), the file compiles fine.
> Did I miss something?

The subject line and the problem description?

-- 
David Kastrup





Re: Lilypond's error column printer confuses bytes and characters

2009-10-18 Thread -Eluze


David Kastrup wrote:
> 
> 
> The following input file:
> 
… which is a .bin file and should be a .ly file, but once downloaded, its
content looks like

\markup{
 Määä A♯ B♭ \break
}

Now, if I comment out the \break (or omit it), the file compiles fine.
Did I miss something?
 


-- 
View this message in context: 
http://www.nabble.com/Lilypond%27s-error-column-printer-confuses-bytes-and-characters-tp25946915p25950417.html
Sent from the Gnu - Lilypond - Bugs mailing list archive at Nabble.com.





Lilypond's error column printer confuses bytes and characters

2009-10-18 Thread David Kastrup

> I am not top-posting

I reported this bug once already, but its distribution was haphazard
(it never got to Gmane) and nobody entered it into the bug tracker.

The following input file:



bin2jX8eDNUTu.bin
Description: Binary data

leads to the following error output, which hacks a UTF-8 character into
pieces (the broken bytes are replaced here by printable octal sequences
so that they survive transfer by mail):

GNU LilyPond 2.13.4
Processing `bad.ly'
Parsing...
bad.ly:4:16: error: syntax error, unexpected MUSIC_IDENTIFIER
 Määä A\342\231
\257 B♭ \break
error: failed files: "bad.ly"

Apparently, the error column is being tracked by counting characters,
but is displayed by counting bytes.  The indicator appears too early
because of that (which caused me to look for the wrong bug in an input
file of mine).

-- 
David Kastrup


Lilypond's error column printer confuses bytes and characters

2009-10-04 Thread David Kastrup

> I'm not top-posting

The following input file:



bad.ly
Description: Binary data

leads to the following error output, which hacks a UTF-8 character into
pieces (the broken bytes are replaced here by printable octal sequences
so that they survive transfer by mail):

GNU LilyPond 2.13.4
Processing `bad.ly'
Parsing...
bad.ly:4:16: error: syntax error, unexpected MUSIC_IDENTIFIER
 Määä A\342\231
\257 B♭ \break
error: failed files: "bad.ly"

Apparently, the error column is being tracked by counting characters,
but is displayed by counting bytes.  The indicator appears too early
because of that (which caused me to look for the wrong bug in an input
file of mine).
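
The octal bytes in this output are what you get by cutting the line at a byte offset equal to the character column. A minimal C++ sketch, assuming the offending line is ` Määä A♯ B♭ \break` with a single leading space (the indentation in the actual file may differ), so that \break is at character column 12. The helper `escape_broken_utf8` is hypothetical: it copies complete UTF-8 characters unchanged and writes any orphaned byte as a backslash and three octal digits, mirroring the escapes used in the message above.

```cpp
#include <cstddef>
#include <string>

// Expected length of a UTF-8 sequence from its lead byte; 0 if `c` cannot
// start a sequence (continuation byte or invalid lead byte).
static std::size_t
expected_length (unsigned char c)
{
  if (c < 0x80) return 1;
  if (c >= 0xC2 && c <= 0xDF) return 2;
  if (c >= 0xE0 && c <= 0xEF) return 3;
  if (c >= 0xF0 && c <= 0xF4) return 4;
  return 0;
}

// Copy complete UTF-8 characters unchanged; write every byte that is not
// part of a complete character as \ooo (three octal digits).
std::string
escape_broken_utf8 (std::string const &s)
{
  std::string out;
  std::size_t i = 0;
  while (i < s.size ())
    {
      std::size_t len = expected_length (static_cast<unsigned char> (s[i]));
      bool complete = len > 0 && i + len <= s.size ();
      for (std::size_t j = 1; complete && j < len; j++)
        complete = (static_cast<unsigned char> (s[i + j]) & 0xC0) == 0x80;

      if (complete)
        {
          out.append (s, i, len);
          i += len;
        }
      else
        {
          unsigned char b = static_cast<unsigned char> (s[i]);
          out += '\\';
          out += static_cast<char> ('0' + (b >> 6));
          out += static_cast<char> ('0' + ((b >> 3) & 7));
          out += static_cast<char> ('0' + (b & 7));
          i += 1;
        }
    }
  return out;
}
```

Cutting the line at byte 12 (`line.substr (0, 12)` and `line.substr (12)`) and passing both halves through `escape_broken_utf8` yields ` Määä A\342\231` and `\257 B♭ \break`, the same split as in the output above.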

-- 
David Kastrup