Re: UTF-8 in MIDI Lyrics

2017-02-25 Thread karl
Joseph Austin:
> > From: Karl Hammar  > >
...
> >> But if we are going to use a "private standard", we might as well
> >> imitate the "official" standard and insert something like
> >> FF 05 07 { @ U T F 8 }
> >> And lobby AMEI/MMA to adopt an official UTF8 position.
> > 
> > Could be good, but why not just capitalize on the BOM and just use
> > utf8?
...
> OK, the UTF-8 BOM is 0x EF BB BF
> But given that the MIDI file is not a "text file" but a binary
> file with text fields scattered throughout,
> normally embedded in various MIDI Meta-events, where should the BOM
> be placed?
> 
> Interpreting your suggestion, we could add a Lyric Meta-Event with
> the BOM as the text field to Track 0 Time 0.  
> That should work for lyrics, but RP-26 indicates that lyrics
> "language encoding" should not extend to other types of text events.
> For other text events, it seems we would need to prefix every UTF-8
> text field with the BOM.

Unfortunately, the midi-file standard and rp26 don't help you with that.
The only fallback we have is the note that you can use an extended
charset, but you have to check that programs at the other end of the
pipe do something reasonable with it.

And all this boils down to how programs out there interpret any
extended char: do they support the utf8 BOM, do they support rp26?
Well, I don't know.

From reading rp26, it only talks about latin (latin-1, I assume),
ms-kanji and utf16* (which has the BOMs "FF FE" and "FE FF"), so I don't
think we can expect anything more. Utf16 points to MS-Windows; what
programs make midi files on that OS, and how do they do it?
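To make the rp26 rule concrete, here is a minimal Python sketch of BOM detection on a raw text field. It is only an assumption about how a reader might apply section 5: the function name sniff_bom is made up for illustration, and treating the UTF-8 BOM like the two UTF-16 ones is the extension proposed in this thread, not something rp26 says.

```python
import codecs

# BOMs to look for, longest first so the 3-byte UTF-8 BOM is tested
# before the 2-byte UTF-16 ones. rp26 (as quoted in this thread) only
# names FF FE and FE FF; the UTF-8 entry is the proposed extension.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def sniff_bom(data: bytes):
    """Return (encoding, payload_without_bom), or (None, data) if no BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data


if __name__ == "__main__":
    print(sniff_bom(b"\xef\xbb\xbfSta"))  # ('utf-8', b'Sta')
    print(sniff_bom(b"Sta"))              # (None, b'Sta')
```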

So in the end, I think we should just use utf8 and wait and see if
any bug reports come in about it.
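A minimal sketch of the "just use utf8" approach on the reading side, assuming a reader that prefers UTF-8 but must not choke on older files. decode_lyric is a hypothetical helper, and the Latin-1 fallback is a design choice, not anything the MIDI spec prescribes.

```python
def decode_lyric(raw: bytes) -> str:
    """Decode a MIDI text field, assuming UTF-8 but tolerating legacy bytes.

    Strips a UTF-8 BOM if present, tries strict UTF-8, and falls back to
    Latin-1, which maps every byte to a character and so never fails.
    """
    if raw.startswith(b"\xef\xbb\xbf"):
        raw = raw[3:]
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")
```

Plain ASCII lyrics decode identically either way, so files that stay within 0-127 are unaffected by the choice.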

But as you suggest, we could talk to the MIDI Association about amending
the standard. The easiest would be if they replaced the word "ascii"
with "utf8".

Regards,
/Karl Hammar

---
Aspö Data
Lilla Aspö 148
S-742 94 Östhammar
Sweden
+46 173 140 57



___
lilypond-user mailing list
lilypond-user@gnu.org
https://lists.gnu.org/mailman/listinfo/lilypond-user


Re: UTF-8 in MIDI Lyrics

2017-02-25 Thread Joseph Austin

> On Feb 25, 2017, at 11:41 AM, lilypond-user-requ...@gnu.org wrote:
> 
> Date: Sat, 25 Feb 2017 17:34:54 +0100 (CET)
> From: Karl Hammar  >
> To: Joseph Austin mailto:drtechda...@gmail.com>>
> 
> 
> And rp26 clearly states in section 5:
> 
> In addition, if a byte order mark which specifies UNICODE such as
> 'FF FE' or 'FE FF' exists, the character code SET should be treated
> as UNICODE.
> 
> There is such a "byte order mark" for utf8, see [2]. And then, by
> extension, you just have to insert that BOM somewhere in the midi
> file ("exists" means it is not restricted to the lyrics meta event;
> preferably in track 0 at time 0) and it would be legal (according to the
> recommendation) to use utf8 straight out of the box.
> 
> [2] http://www.unicode.org/faq/utf_bom.html#BOM 
> 
> 
> 
> 
>> only ASCII chars between 0 and 127 are allowed.
> 
> Your wording is too strong. complete_midi_96-1-3.pdf, p. 137 (or [1],
> p. 10) clearly says "should", but
> 
> "other character codes
> using the high-order bit may be used for interchange of files between
> different programs on the same computer which supports an extended
> character set. Programs on a computer which does not support
> non-ASCII characters should ignore those characters."

I stand corrected.

>> But if we are going to use a "private standard", we might as well
>> imitate the "official" standard and insert something like
>> FF 05 07 { @ U T F 8 }
>> And lobby AMEI/MMA to adopt an official UTF8 position.
> 
> Could be good, but why not just capitalize on the BOM and just use
> utf8?
> 
> Regards,
> /Karl Hammar

OK, the UTF-8 BOM is 0x EF BB BF
But given that the MIDI file is not a "text file" but a binary file with text 
fields scattered throughout,
normally embedded in various MIDI Meta-events, where should the BOM be placed?

Interpreting your suggestion, we could add a Lyric Meta-Event with the BOM as 
the text field to Track 0 Time 0.  
That should work for lyrics, but RP-26 indicates that lyrics "language 
encoding" should not extend to other types of text events.
For other text events, it seems we would need to prefix every UTF-8 text field 
with the BOM.
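Here is a Python sketch of the "prefix every text field" variant, assuming the event layout FF <type> <length> <data> from the SMF spec (0x01 is the generic Text event, 0x05 the Lyric event), with the length written as a variable-length quantity. The helper names are hypothetical, and a BOM in every field is only the idea discussed here, not an established convention.

```python
BOM_UTF8 = b"\xef\xbb\xbf"


def vlq(n: int) -> bytes:
    """Encode n as a MIDI variable-length quantity.

    7 bits per byte, most significant group first; every byte except
    the last has its high bit set.
    """
    if n < 0:
        raise ValueError("negative length")
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(out))


def text_meta_event(meta_type: int, text: str, bom: bool = True) -> bytes:
    """Build FF <type> <len> <data> with UTF-8 text, optionally BOM-prefixed.

    The delta-time that precedes each event in a track is not included.
    """
    data = (BOM_UTF8 if bom else b"") + text.encode("utf-8")
    return bytes([0xFF, meta_type]) + vlq(len(data)) + data
```

For example, a lyric "Ö" becomes FF 05 05 EF BB BF C3 96: three BOM bytes plus the two UTF-8 bytes of Ö.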
---
Joe Austin




Re: UTF-8 in MIDI Lyrics

2017-02-25 Thread karl
Sorry, the last mail had the wrong From header.

Joe Austin:
> > On 24.02.2017 at 02:15, Joseph Austin wrote:
> >> This raises another question.  I'm working with MIDI files,
> >> and it's not clear how to encode UTF-8 text in MIDI.
> >> There must be some convention, but I haven't found an official RP for it.
...
> I don't have a program that displays MIDI  files with lyrics, so I can't test 
> it.

TiMidity will show the lyrics.
I have a simple program that dumps a midi file as text:

 http://aspodata.se/git/musik/bin/midi.pl

$ midi.pl test.midi  | grep lyric | head
['lyric', 0, 'Sta'],
['lyric', 768, 'bat '],
['lyric', 768, 'Ma'],
['lyric', 768, 'ter '],
['lyric', 384, 'do'],
['lyric', 768, 'lo'],
['lyric', 384, 'ro'],
['lyric', 768, 'sa '],
['lyric', 384, 'sa '],
['lyric', 384, 'jux'],
$
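midi.pl is Karl's Perl script and its internals aren't shown here. As an independent illustration, here is a minimal Python sketch that walks the MTrk chunks of a Standard MIDI File and yields the FF 05 lyric events. Assumptions: it reports absolute tick positions (not necessarily what midi.pl prints), handles running status, skips sysex and other meta events, and does no further validation. The function names are hypothetical.

```python
import struct
import sys


def read_vlq(data: bytes, pos: int):
    """Read a MIDI variable-length quantity; return (value, new_pos)."""
    value = 0
    while True:
        b = data[pos]
        pos += 1
        value = (value << 7) | (b & 0x7F)
        if not b & 0x80:
            return value, pos


def lyric_events(smf: bytes):
    """Yield (track, absolute_tick, raw_bytes) for every FF 05 lyric event."""
    if smf[:4] != b"MThd":
        raise ValueError("not a Standard MIDI File")
    pos = 8 + struct.unpack(">I", smf[4:8])[0]
    track = 0
    while pos + 8 <= len(smf):
        chunk_id = smf[pos:pos + 4]
        length = struct.unpack(">I", smf[pos + 4:pos + 8])[0]
        body = smf[pos + 8:pos + 8 + length]
        pos += 8 + length
        if chunk_id != b"MTrk":
            continue  # unknown chunk types are skipped
        p, tick, running = 0, 0, None
        while p < len(body):
            delta, p = read_vlq(body, p)
            tick += delta
            status = body[p]
            if status == 0xFF:  # meta event: FF type len data
                mtype = body[p + 1]
                n, p = read_vlq(body, p + 2)
                if mtype == 0x05:
                    yield track, tick, body[p:p + n]
                p += n
                running = None
            elif status in (0xF0, 0xF7):  # sysex: F0/F7 len data
                n, p = read_vlq(body, p + 1)
                p += n
                running = None
            else:  # channel message, possibly with running status
                if status & 0x80:
                    running = status
                    p += 1
                if running is None:
                    raise ValueError("data byte without running status")
                # program change (Cx) and channel pressure (Dx) have 1 data byte
                p += 1 if (running & 0xF0) in (0xC0, 0xD0) else 2
        track += 1


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for trk, tick, raw in lyric_events(f.read()):
            print(trk, tick, raw.decode("utf-8", errors="replace"))
```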

> It appears that, when generating a MIDI file, LilyPond currently
> just puts UTF8 chars in the text fields as if they were ASCII.
> According to the base MIDI spec, this is illegal; only ASCII chars
> between 0 and 127 are allowed.

Your wording is too strong. complete_midi_96-1-3.pdf, p. 137 (or [1],
p. 10) clearly says "should", but

 "other character codes
 using the high-order bit may be used for interchange of files between
 different programs on the same computer which supports an extended
 character set. Programs on a computer which does not support
 non-ASCII characters should ignore those characters."

[1] http://www.cdik.se/pdf/midiformat.pdf
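As a sketch of the "should ignore those characters" fallback, assuming an ASCII-only program simply drops every byte with the high-order bit set (the spec does not say how to ignore them, and ascii_view is a hypothetical name):

```python
def ascii_view(raw: bytes) -> str:
    """Approximate what an ASCII-only program might display.

    Interprets "ignore non-ASCII characters" as dropping every byte
    >= 0x80; that interpretation is an assumption, not the spec's wording.
    """
    return bytes(b for b in raw if b < 0x80).decode("ascii")
```

Under that assumption, "Östhammar" written as UTF-8 shows up as "sthammar": the accented letter is lost, but the rest of the text survives.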

Also, the last paragraph of rp17.pdf gives you the set of characters that
are "accepted for use", and says that "it is best to avoid the use of
these characters: \ [ ] { }".
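If LilyPond (or a user) wanted to warn about the characters rp17 advises against, a check could be as small as this Python sketch; rp17_warnings is hypothetical and only covers the five characters quoted above.

```python
# The characters rp17 says "it is best to avoid" in lyric text.
RP17_AVOID = set("\\[]{}")


def rp17_warnings(text: str) -> list:
    """Return the avoided characters found in text, in order of appearance."""
    return [ch for ch in text if ch in RP17_AVOID]
```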

And rp26 clearly states in section 5:

 In addition, if a byte order mark which specifies UNICODE such as
 'FF FE' or 'FE FF' exists, the character code SET should be treated
 as UNICODE.

There is such a "byte order mark" for utf8, see [2]. And then, by
extension, you just have to insert that BOM somewhere in the midi
file ("exists" means it is not restricted to the lyrics meta event;
preferably in track 0 at time 0) and it would be legal (according to the
recommendation) to use utf8 straight out of the box.

[2] http://www.unicode.org/faq/utf_bom.html#BOM
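A sketch of the track-0 placement suggested above, in Python: an MTrk chunk whose first event, at delta-time 0, is a lyric event containing only the UTF-8 BOM, followed by the UTF-8 lyrics and the End of Track event (FF 2F 00). The function names are hypothetical, and whether any player honours such a marker is exactly the open question in this thread.

```python
import struct

BOM_UTF8 = b"\xef\xbb\xbf"


def vlq(n: int) -> bytes:
    """Encode n as a MIDI variable-length quantity (7 bits per byte)."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(out))


def meta_event(delta: int, meta_type: int, data: bytes) -> bytes:
    """Delta-time followed by FF <type> <len> <data>."""
    return vlq(delta) + bytes([0xFF, meta_type]) + vlq(len(data)) + data


def track0_with_bom(lyrics) -> bytes:
    """Build an MTrk chunk that starts with a BOM-only lyric at time 0.

    lyrics is a list of (delta_ticks, text) pairs; each text is written
    as UTF-8 in its own FF 05 lyric event, and the chunk ends with the
    End of Track meta event.
    """
    events = meta_event(0, 0x05, BOM_UTF8)
    for delta, text in lyrics:
        events += meta_event(delta, 0x05, text.encode("utf-8"))
    events += meta_event(0, 0x2F, b"")
    return b"MTrk" + struct.pack(">I", len(events)) + events
```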

> However, MIDI RP-17 and RP-26 introduce additional encodings for
> the  portion of the lyric meta-event FF 05  .

You do extrapolate a little; rp17 tells you the "recommended" way to
specify end of word/line/paragraph, and gives you a list of characters
that should give no compatibility problems.

> In particular, RP-26 specifies the "language" code  {@LATIN} to
> include 8-bit chars > 127.  It seems no code for "UTF8" has been
> officially defined, but a reasonable proposal might be language code:
> {@UTF8}.

You don't need that; see above about the BOM. Also, it would be
interesting to see which programs actually support rp26. Since midi
"standards" are just recommendations, you have to know what works in
the wild.

..
> So for LilyPond purposes, it would suffice to use a reversible
> encoding, that is, LilyPond would accept any MIDI file text format
> that LilyPond generates.  The apparently existing UTF-8 default
> should work for that.

LilyPond doesn't read midi files; you can convert midi files to ly files,
which LilyPond can then read.

> But if we are going to use a "private standard", we might as well
> imitate the "official" standard and insert something like
> FF 05 07 { @ U T F 8 }
> And lobby AMEI/MMA to adopt an official UTF8 position.

Could be good, but why not just capitalize on the BOM and just use
utf8?
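For comparison, the tag-based alternative quoted above is just a short lyric event. In this Python sketch (rp26_style_tag is hypothetical), {@LATIN} is the RP-26 code mentioned in the thread and {@UTF8} is only the proposal, not a registered code. The delta-time is left out, and tags are limited to 127 bytes so the length fits in a single byte.

```python
def rp26_style_tag(code: str) -> bytes:
    """Build a lyric meta-event FF 05 <len> {@<code>} (no delta-time)."""
    data = ("{@" + code + "}").encode("ascii")
    if len(data) > 127:
        raise ValueError("tag too long for a one-byte length")
    return bytes([0xFF, 0x05, len(data)]) + data
```

rp26_style_tag("UTF8") returns b"\xff\x05\x07{@UTF8}", which matches the FF 05 07 { @ U T F 8 } bytes in the proposal.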

Regards,
/Karl Hammar

---
Aspö Data
Lilla Aspö 148
S-742 94 Östhammar
Sweden
+46 173 140 57



Re: UTF-8 in MIDI Lyrics

2017-02-25 Thread Joseph Austin

> On Feb 24, 2017, at 7:47 AM, lilypond-user-requ...@gnu.org wrote:
> 
> On 24.02.2017 at 02:15, Joseph Austin wrote:
>> This raises another question.  I'm working with MIDI files,
>> and it's not clear how to encode UTF-8 text in MIDI.
>> There must be some convention, but I haven't found an official RP for it.
> 
> Personally, I have no idea. Does Lily not do it right?
> Best, Simon

I don't have a program that displays MIDI files with lyrics, so I can't test
it.
Both Finale and MuseScore accept LilyPond-generated MIDI files but ignore the
lyrics.

It appears that, when generating a MIDI file, LilyPond currently just puts UTF8
chars in the text fields as if they were ASCII.
According to the base MIDI spec, this is illegal; only ASCII chars between 0 and
127 are allowed.
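To see why UTF-8 lyrics fall outside 0-127: in UTF-8, every byte of a multi-byte character has the high bit set, while ASCII characters keep their single-byte values. Here is a small Python check (non_ascii_bytes is a hypothetical name):

```python
def non_ascii_bytes(text: str) -> list:
    """Return the UTF-8 bytes of text that fall outside 7-bit ASCII (0-127)."""
    return [b for b in text.encode("utf-8") if b > 127]
```

For example, "Östhammar" yields [0xC3, 0x96], the two-byte encoding of Ö, and every other byte stays 7-bit.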

However, MIDI RP-17 and RP-26 introduce additional encodings for the  
portion of the lyric meta-event FF 05  . In particular, RP-26 
specifies the "language" code  {@LATIN} to include 8-bit chars > 127.  It seems 
no code for "UTF8" has been officially defined, but a reasonable proposal might 
be language code: {@UTF8}.
It's my impression that the largest body of MIDI files with lyrics consists of
"Karaoke" files (extension .kar),
which may not be of interest to LilyPond.

I suppose that interest in using MIDI as a "score" language has waned in favor 
of MusicXML.
So for LilyPond purposes, it would suffice to use a reversible encoding, that 
is, LilyPond would accept any MIDI file text format that LilyPond generates.  
The apparently existing UTF-8 default should work for that.
But if we are going to use a "private standard", we might as well imitate the 
"official" standard and insert something like
FF 05 07 { @ U T F 8 }
And lobby AMEI/MMA to adopt an official UTF8 position.

Joe Austin





Re: UTF-8 in MIDI lyrics

2017-02-24 Thread karl
Simon Albrecht:
> On 24.02.2017 at 02:15, Joseph Austin wrote:
> > This raises another question.  I'm working with MIDI files,
> > and it's not clear how to encode UTF-8 text in MIDI.
> > There must be some convention, but I haven't found an official RP for it.


http://www.cdik.se/pdf/midiformat.pdf

 [...] Text events may also
 occur at other times in a track, to be used as lyrics, or descriptions
 of cue points. The text in this event should be printable ASCII
 characters for maximum interchange. However, other character codes
 using the high-order bit may be used for interchange of files between
 different programs on the same computer which supports an extended
 character set. Programs on a computer which does not support
 non-ASCII characters should ignore those characters.

Regards,
/Karl Hammar

---
Aspö Data
Lilla Aspö 148
S-742 94 Östhammar
Sweden
+46 173 140 57





Re: UTF-8 in MIDI lyrics

2017-02-24 Thread Simon Albrecht

On 24.02.2017 at 02:15, Joseph Austin wrote:

This raises another question.  I'm working with MIDI files,
and it's not clear how to encode UTF-8 text in MIDI.
There must be some convention, but I haven't found an official RP for it.


Personally, I have no idea. Does Lily not do it right?
Best, Simon

P.S. If you can, please don’t top-post, but reply below the quotation
(as I have done now). That makes it far easier to follow what you
were replying to.



On Feb 23, 2017, at 4:36 PM, Simon Albrecht  wrote:

On 23.02.2017 at 01:57, David Wright wrote:

But why not do the Right Thing and dispose of this problem with
\addlyrics { “I am so lone- ly” said she }
and have yourself a proper set of 66 and 99 quotation marks
without needing to enclose them in quotes.

+1
Simon


