On Tue, 6 Jan 2026, Matthew Pierce wrote:

> I asked Grok to self-design a prompt for accurately converting sheet music
> images into Lilypond code. Early results are promising.
>

Matthew, thank you for starting this discussion. It got me curious about
what AI can do with music transcription in LilyPond.

I did my own experiment. Conclusion: Grok failed, and ChatGPT probably
would not do much better. Below my signature are some details about what
I did, step by step.

If anyone works on software/AI/OMR that might actually *work* someday for
transcribing music PDFs into LilyPond code, I am happy to share the files
that I used in my experiment. I would gladly pay a monthly/annual
subscription to software/AI/OMR *if* it could complete a task like this
with ~90%+ accuracy.

For now, it’s back to the good old-fashioned mechanical pencil* to mark up
my hard-copy score with the annotations that I would have added in LilyPond
if AI had worked!

Cheers,

Gabriel

*By the way, I’d welcome recommendations of the best mechanical pencils for
annotating music (off-list, please, since this is not about LilyPond). I
was excited about my first Rotring 600, and there is a lot that I like
about it, but it also frustrates me often, so I’d like to try another
model.

My experiment:


   - I gave Grok the prompt in Matthew’s message of 6 January
   <https://lists.gnu.org/archive/html/lilypond-user/2026-01/msg00043.html>,
   with a few additional instructions and some context.
   - I gave it a good scan of a movement of Bach’s St John Passion
   (Bärenreiter edition) to “read.”
   - Grok’s first “draft” was nonsense/hallucinations. The code would not
   compile.
   - Then I transcribed the beginning of the movement myself (the first
   eight or so measures in each instrument/staff). I gave Grok my .ly file
   and the PDF output, and asked it to engrave the rest of the movement.
   - The results were bad again. For example, in the first violin part, I
   had carefully, exactly transcribed every detail of the first eight measures
   of music, and then written …

> % Grok, please continue from here onward to the end of the piece (measure 91)

… and Grok picked up there with:

> % continue with the running pattern, transposing as needed to match harmony
> \repeat unfold 79 { b8. ais16 ais8. cis16 cis d cis b | }

… which is very obviously not Bach’s music: it just repeats a single
measure 79 times.


   - I wrote back, saying, in essence, “You did not complete the
   assignment.” I pushed Grok to actually transcribe every note in the
   movement.
   - That did not go well, either. The code compiled this time, but it was
   full of errors. Grok even removed a number of *correct* things from my
   manual transcription of the first eight measures!
   - I went to ChatGPT and explained that Grok had failed. ChatGPT sent a
   “lovely” response proposing that we work “together” in small chunks.
   Outline/headers within ChatGPT’s response:

1. What you’ve actually asked for (and why Grok failed)

2. How I will proceed (correctly)

3. Concrete continuation: measures 24–27 (instrumental line)

4. What I will not do (unless you ask)

5. Proposed next steps (editorially sane)

The only workflow that stays accurate is:

   - I continue *3–5 measures at a time, per voice*
   - You compile and visually confirm
   - We move on to:
      - remaining instrumental lines
      - then vocal line + lyrics alignment
      - finally layout refinements (system breaks, spacing, slur shaping)


   - It sounds nice in theory, but I am not going to bother with this
   method because ChatGPT’s draft music for measures 25–27 is obviously wrong.
