This might be a reasonable task for a GPT; however, a human will have to
proofread the entire text.

I attempted various forms of transcription a couple of decades ago, when I
was a typesetter, book designer, and editor of several hundred academic
books. None were very useful.

More recently, I have tried Word XML (you do not want to touch Word’s
native format) and PDF conversions using the freely available pdftotext and
pdftohtml. Neither was very successful, but I have yet to try a GPT on the
task.
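
For anyone who wants to repeat the pdftotext experiment, here is a rough sketch of the kind of pass I mean, not a tested workflow. It assumes the PDF already has a text layer (pdftotext does no OCR itself, so a pure scan would first need an OCR step) and that pdftotext is on the PATH. The function names are made up for illustration. The output is only a LaTeX body with special characters escaped and paragraphs separated by blank lines; it still needs the full proofread mentioned above.

```python
import re
import subprocess

# Characters that LaTeX treats specially, mapped to their escaped forms.
_LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}
_ESCAPE_RE = re.compile("|".join(re.escape(c) for c in _LATEX_ESCAPES))


def escape_latex(text: str) -> str:
    """Escape LaTeX special characters in a single pass (no double escaping)."""
    return _ESCAPE_RE.sub(lambda m: _LATEX_ESCAPES[m.group(0)], text)


def text_to_latex_body(raw: str) -> str:
    """Turn extracted plain text into blank-line-separated LaTeX paragraphs."""
    # Rejoin words hyphenated across line breaks. This is a heuristic: it
    # also removes genuine hyphens that happen to fall at a line end.
    joined = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    paragraphs = re.split(r"\n\s*\n", joined)
    cleaned = []
    for paragraph in paragraphs:
        flat = " ".join(paragraph.split())  # collapse line breaks and runs of spaces
        if flat:
            cleaned.append(escape_latex(flat))
    return "\n\n".join(cleaned)


def pdf_to_text(pdf_path: str) -> str:
    """Run pdftotext and return the text layer ("-" sends output to stdout)."""
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8")
```

Typical use would be something like text_to_latex_body(pdf_to_text("book.pdf")), with the result pasted into one .tex file per chapter. Headings, footnotes, italics, and multi-column pages are not handled at all.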

Note that the only way I know of to get a Word XML file is to use Word’s
save-as-xml feature.

Fred



On Sat, Feb 21, 2026 at 6:52 AM <[email protected]> wrote:

> Send lilypond-user mailing list submissions to
>         [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.gnu.org/mailman/listinfo/lilypond-user
> or, via email, send a message with subject or body 'help' to
>         [email protected]
>
> You can reach the person managing the list at
>         [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of lilypond-user digest..."
>
>
> Today's Topics:
>
>    1. Custom markup function integrations  (Kyle Baldwin)
>    2. OCR to Transcribe Text PDF in LaTeX (Gabriel Ellsworth)
>    3. Re: OCR to Transcribe Text PDF in LaTeX (Lucas Pinke)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 20 Feb 2026 11:37:44 -0800
> From: Kyle Baldwin <[email protected]>
> To: [email protected]
> Subject: Custom markup function integrations
> Message-ID: <[email protected]>
> Content-Type: text/plain;       charset=us-ascii
>
> Hi all -
>
> I'm trying to work on integrating a custom markup function to generate a
> procedurally generated graphic. The below is mainly nonsensical but
> demonstrates the interactions I'm trying to understand.
>
> %%% BEGIN MWE
> #(define (build-node-string layout props node-count stencil)
>    (if
>     (eq? node-count 0)
>     stencil
>     (let (
>            (new-markup (markup
>                             (#:combine
>                              (#:path 0.1 (list
>                                           (list 'lineto 1 0)))
>                              #:translate (cons 1 0)
>                              (#:draw-circle 0.25 0.1 #t)))))
>            (build-node-string
>             layout
>             props
>             (1- node-count)
>             (ly:stencil-combine-at-edge stencil X RIGHT
>              (interpret-markup layout props new-markup) -0.25)))))
>
> #(define-markup-command (generate-graph layout props node-count)
>    (number?)
>    (build-node-string
>      layout
>      props
>      node-count
>      (interpret-markup layout props (markup #:draw-circle 0.25 0.1 #t))))
>
> \markup \generate-graph #4
>
> \version "2.24.4"
> %%% END MWE
>
> 1. I would like to build a list of markups rather than passing a stencil
> in and appending the new stencil part on every call, ideally rendering a
> structure like the one below. The main benefit would be not having to pass
> layout and props into the second function in order to call
> interpret-markup. I noticed interpret-markup-list, but that doesn't seem
> to be what I'm looking for.
> (interpret-markup
>   layout
>   props
>   ((markup first circle)
>     (markup second circle)
>     (markup third circle)))
>
> The other thing that I tried was a structure like below but the #:symbols
> need to be added in a markup call, which seems to prevent me from adding
> them outside of the markup constructor.
> (markup (
>         (#:draw-circle 0.25 0.1 #t)
>         (#:combine (#:path 0.1 (list (list 'lineto 1 0)))
>                    #:translate (cons 1 0) (#:draw-circle 0.25 0.1 #t))
>         (#:combine (#:path 0.1 (list (list 'lineto 1 0)))
>                    #:translate (cons 1 0) (#:draw-circle 0.25 0.1 #t))))
>
> 2. It would be nice to define `build-node-string` as a markup function so
> that I can read properties from the props because, as far as I can see,
> without the define-markup-command I am unable to use the #:properties
> construct. How do I call a custom markup command inside another custom
> markup command?
>
> Thanks!
>
> -kwb
>
>
> ------------------------------
>
> Message: 2
> Date: Sat, 21 Feb 2026 06:35:00 -0500
> From: Gabriel Ellsworth <[email protected]>
> To: Lilypond-User Mailing List <[email protected]>
> Subject: OCR to Transcribe Text PDF in LaTeX
> Message-ID:
>         <
> cahavgtx-tn26x+whjmhf2rmjux_+uxo0btqvw3kpuw6jbqk...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Here is my situation.
>
>
>    1. I am trying to typeset a new edition of a public-domain book.
>    2. I have a PDF that contains a scanned copy of a 20th-century printing
>    of this book (about 700 pages).
>    3. My output will contain a bit of LilyPond output, but music notation
>    will not be “the main actor” (to borrow Lucas’s very apt phrase below).
>    I estimate that the book will be 97% text and 3% LilyPond.
>    4. Based on past helpful input from this list, I suspect that LaTeX will
>    be the best way to create this book.
>    5. I have never used LaTeX before.
>    6. I know almost nothing about how OCR software or AI works on the back
>    end.
>
> My question:
>
> Is there a good program or site out there that can take my existing PDF,
> “read” it, and help me transcribe it in (convert it to) LaTeX code?
>
> The “3% music” portion of my output will be easy for me to code myself in
> LilyPond. But I’m hoping to save several hours of work coding the “97%
> text” component of this 700-page book.
>
> Gabriel
>
>
> ---------- Forwarded message ---------
> From: Lucas Pinke
> Date: Tue, 20 Jan 2026
> Subject: Re: Wrapping Lengthy Text around a Score-as-Markup
> Cc: Lilypond-User Mailing List <[email protected]>
>
> I strongly recommend trying out lylua! As you can import Lilypond files
> straight to the processor, you only headbutt the TeX processor and the TeX
> language.
>
> Like I said, TeX and lylua work best when notation ain't the main actor:
> Cherubini's treatise could be written without any notation and would still
> be somewhat understandable. It isn't in this, but that's the beauty of
> things: diverse alternatives.
>
> As per Kieren's question, I'm using Urs Liska's lyluatex package (available
> in CTAN). I don't know of another Lilypond code integration in TeX; other
> options would be importing the output images or using specific environments
> of the TeX language. Side note: unfortunately, I didn't get to know Urs
> before his passing... However, I always cite him with regards to TeX and
> serialist music!
>
> My programming knowledge is limited, but I've managed to like and
> understand TeX and the lylua integration. I'd say lylua works better than
> LibreOffice's OOoLilyPond plugin (which is still a great plugin!).
>
> On Tue, Jan 20, 2026, 16:36, Gabriel Ellsworth wrote:
>
> > I think Gabriel wants to stay in native Lilypond for the moment —
> >> @Gabriel: Correct me if I’m wrong! — but @Gabriel: Ultimately, something
> >> like LuaLaTeX is, I believe, something you’ll want to eventually have as
> >> your platform.
> >>
> >
> > Thank you, Raphael and Lucas!
> >
> > I have never used a TeX processor but am intrigued to learn more. For my
> > current project, native LilyPond is working just fine. And at present I am
> > intimidated by TeX — as I was by LilyPond for years until I discovered
> > Frescobaldi!
> >
> > But at some point I imagine that it will be worth it for me to learn
> > LuaTeX.
> >
>
> ------------------------------
>
> Message: 3
> Date: Sat, 21 Feb 2026 08:51:45 -0300
> From: Lucas Pinke <[email protected]>
> To: Gabriel Ellsworth <[email protected]>
> Cc: Lilypond-User Mailing List <[email protected]>
> Subject: Re: OCR to Transcribe Text PDF in LaTeX
> Message-ID:
>         <CAEgBy=
> [email protected]>
> Content-Type: text/plain; charset="utf-8"
>
> Hello!
>
> Is there a good program or site out there that can take my existing PDF,
> “read” it, and help me transcribe it in (convert it to) LaTeX code?
>
> I'm not familiar with any OCR software that works flawlessly. Correcting
> the OCR output was always a hassle, especially in books that have columns.
> For this kind of job, simply typing the text out by hand seems to work best
> (or reading it out loud into a speech-to-text converter).
>
> Keep in mind that TeX can include multiple files (like Lilypond), so you
> can write files based on chapters (or even smaller divisions).
>
> These copyist feats are daunting and challenging... Yet we keep on writing.
>
> ------------------------------
>
> End of lilypond-user Digest, Vol 279, Issue 34
> **********************************************
>
