After I saw Matthew’s original message, I was curious to see if other tools
could do much along these lines. So I prepared a fairly simple piano score
(only one note sounding at a time, some accidentals, and not much more than
that) and had DeepSeek and Perplexity try to generate LilyPond code to
match. They each briefly discussed some high-level structure they claimed
to see, but didn’t get much correct. The code they generated did compile,
which was nice, but both failed pretty miserably at matching the score.

DeepSeek acted the most confident about the task when I asked it about it
ahead of time, but the notes it generated were 100% incorrect, completely
guessed. I told it several times that it was wrong and to try again, but
that brought only a little improvement, if you could even call it that. So,
big strikeout for DeepSeek.

Perplexity at least recognized the trend of the notes, but still failed
significantly. Neither tool could determine the time signature (I used
12/8 for fun). Perplexity saw the accidentals and guessed a key signature
when there was none specified. So, definitely a no-go for Perplexity as
well, despite it also being confident it could do it.
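
For anyone who wants to repeat the experiment, here is a minimal sketch of
the kind of test input described above: a single line of notes in 12/8 with
explicit accidentals and no key signature. The pitches and rhythms are
invented purely for illustration and are not the score used in the test;
the \version number is also an assumption.

```lilypond
\version "2.24.0"

% Hypothetical example: the notes below are made up, not the tested score.
\score {
  \new Staff {
    \clef treble
    \time 12/8
    % No \key command is given, so no key signature is printed and
    % every accidental appears explicitly next to its note.
    \relative c' {
      c8 d e fis g a bes a g fis e d |
      c4. ees4. g4. c4. |
    }
  }
}
```

Compiling this and showing the resulting image to an LLM gives a known
reference to compare its generated code against.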

Anyway, it was a fun exercise. They both admitted they likely wouldn’t do
as well as dedicated OMR tools and that is very true lol. Still, if any
general purpose AI can begin to do this, I can see that being a useful
function.

Best,
Abraham



On Thu, Jan 8, 2026 at 1:37 PM Matthew Pierce <[email protected]> wrote:

> On later testing, it failed badly on hard stuff, such as Mahler orchestral
> parts.
>
> Upon pressing, it blamed unclear parts (which was fair). I have yet to try
> a photo of a clean, simple, professionally produced part.
>
> That will be the next test.
>
>
> -------- Original message --------
> From: Richard Shann <[email protected]>
> Date: 1/7/26 6:19 AM (GMT-05:00)
> To: Matthew Pierce <[email protected]>, lilypond-user mailinglist <
> [email protected]>
> Subject: Re: LLM prompt: turn sheet music into code
>
> On Tue, 2026-01-06 at 17:52 +0000, Matthew Pierce wrote:
> > I asked Grok to self-design a prompt for accurately converting sheet
> > music images into Lilypond code. Early results are promising.
> >
> > My test: I showed Grok a screenshotted page from a cello arrangement
> > I recently constructed in Lilypond, gave it the prompt below,
> > compiled the code it suggested, and compared the results
> >
> > Results: The visual match was astonishingly good, even though Grok's
> > generated code is different from mine in various ways.
> >
> > The only major difference was the final system being kicked to the
> > next page by Grok's code. I suspect this may have been caused by a
> > slight cropping at the bottom of the page in my screenshotted image.
> >
> > This prompt might be a great shortcut for importing existing sheet
> > music into Lilypond code.
> >
> > Try it on the LLM of your choice. Happy testing!
> >
> > Prompt follows:
> >
> > "You are an expert in LilyPond notation and optical music
> > recognition. Given the attached sheet music image, meticulously
> > reverse-engineer the LilyPond code that would reproduce it with the
> > highest possible visual fidelity. Prioritize absolute precision and
> > accuracy in every aspect of the engraving—including exact note
> > placements, stem directions, beam groupings, slur shapes and
> > positions, dynamic markings, articulations, text annotations, staff
> > layout, spacing, and any special elements like scordatura diagrams or
> > irregular meters—over any considerations of speed or efficiency.
> > Proceed slowly and methodically: first, perform a exhaustive layer-
> > by-layer visual analysis of the image, documenting every observable
> > detail (e.g., clef type, key signature sharps/flats, time signature
> > symbol, pitch positions relative to the staff, durations, ties,
> > hairpins, bowings, and markup coordinates). Cross-reference with
> > music theory and LilyPond syntax to resolve ambiguities. Only after
> > this thorough dissection should you generate the complete LilyPond
> > code, including header, global settings, voices, and layout overrides
> > as needed to match the image pixel-for-pixel where possible. If
> > uncertainties arise, note them and propose the most accurate
> > interpretation based on standard engraving practices."
>
> I gave Grok a screenshot of a single movement/single page of a sonata
> of my own composition. The written response sounded extremely
> convincing - it had detected that the piece was written in Baroque
> style for a start - but in detail it was wildly out: it failed to
> detect the time signature, declared that the piece was in a different
> key from the correct one, failed to detect the systems beyond the first
> one and generated LilyPond syntax that doesn't compile, and which,
> while it showed insight into quite esoteric aspects of LilyPond, didn't
> have any obvious relation to the notes in the piece of music it was
> attempting to interpret.
> That it sounded extremely convincing could be the worst aspect of AI:
> "sounding convincing" is quite different from "being true", but it
> would be easy for a person to innocently re-post the results of an
> enquiry which the AI learning algorithms would find on the internet and
> incorporate into future responses.
>
> Richard Shann
>
>
>
