This might be a reasonable task for a GPT; however, a human will have to proofread the entire text.
I attempted various transcription methods a couple of decades ago, when I was a typesetter, book designer, and editor of several hundred academic books. None of them was very useful.

More recently, I have tried converting via Word XML (you do not want to touch Word’s native format) and converting the PDF directly with the freely available pdftotext and pdftohtml. Neither approach was very successful, but I have yet to try a GPT on the task.

Note that the only way I know of to get a Word XML file is Word’s "Save As XML" feature.

Fred

On Sat, Feb 21, 2026 at 6:52 AM <[email protected]> wrote:
> Today's Topics:
>
>    1. Custom markup function integrations (Kyle Baldwin)
>    2. OCR to Transcribe Text PDF in LaTeX (Gabriel Ellsworth)
>    3. Re: OCR to Transcribe Text PDF in LaTeX (Lucas Pinke)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 20 Feb 2026 11:37:44 -0800
> From: Kyle Baldwin
> Subject: Custom markup function integrations
>
> Hi all -
>
> I'm trying to work on integrating a custom markup function to generate a
> procedurally generated graphic. The below is mainly nonsensical but
> demonstrates the interactions I'm trying to understand.
>
> %%% BEGIN MWE
> #(define (build-node-string layout props node-count stencil)
>    (if (eq? node-count 0)
>        stencil
>        (let ((new-markup
>               (markup
>                (#:combine
>                 (#:path 0.1 (list (list 'lineto 1 0)))
>                 #:translate (cons 1 0)
>                 (#:draw-circle 0.25 0.1 #t)))))
>          (build-node-string
>           layout
>           props
>           (1- node-count)
>           (ly:stencil-combine-at-edge
>            stencil X RIGHT
>            (interpret-markup layout props new-markup)
>            -0.25)))))
>
> #(define-markup-command (generate-graph layout props node-count)
>    (number?)
>    (build-node-string
>     layout
>     props
>     node-count
>     (interpret-markup layout props (markup #:draw-circle 0.25 0.1 #t))))
>
> \markup \generate-graph #4
>
> \version "2.24.4"
> %%% END MWE
>
> 1. I would like to create a list of markups as opposed to passing a
> stencil in and then concatenating the new stencil part every time. Ideally,
> I would render a structure like the one below. The main benefit would be not
> having to pass layout and props into the second function to call
> interpret-markup. I noticed interpret-markup-list, but that doesn't seem to
> be what I'm looking for.
> (interpret-markup
>  layout
>  props
>  ((markup first circle)
>   (markup second circle)
>   (markup third circle)))
>
> The other thing that I tried was a structure like the one below, but the
> #:symbols need to be added in a markup call, which seems to prevent me from
> adding them outside of the markup constructor.
> (markup (
>   (#:draw-circle 0.25 0.1 #t)
>   (#:combine (#:path 0.1 (list (list 'lineto 1 0))) #:translate
>    (cons 1 0) (#:draw-circle 0.25 0.1 #t))
>   (#:combine (#:path 0.1 (list (list 'lineto 1 0))) #:translate
>    (cons 1 0) (#:draw-circle 0.25 0.1 #t))))
>
> 2. It would be nice to define `build-node-string` as a markup function so
> that I can read properties from the props because, as far as I can see,
> without define-markup-command I am unable to use the #:properties
> construct. How do I call a custom markup command inside another custom
> markup command?
>
> Thanks!
>
> -kwb
>
> ------------------------------
>
> Message: 2
> Date: Sat, 21 Feb 2026 06:35:00 -0500
> From: Gabriel Ellsworth
> To: Lilypond-User Mailing List
> Subject: OCR to Transcribe Text PDF in LaTeX
>
> Here is my situation.
>
>    1. I am trying to typeset a new edition of a public-domain book.
>    2. I have a PDF that contains a scanned copy of a 20th-century printing
>    of this book (about 700 pages).
>    3. My output will contain a bit of LilyPond output, but music notation
>    will not be “the main actor” (to borrow Lucas’s very apt phrase below).
>    I estimate that the book will be 97% text and 3% LilyPond.
>    4. Based on past helpful input from this list, I suspect that LaTeX will
>    be the best way to create this book.
>    5. I have never used LaTeX before.
>    6. I know almost nothing about how OCR software or AI works on the back
>    end.
>
> My question:
>
> Is there a good program or site out there that can take my existing PDF,
> “read” it, and help me transcribe it into (convert it to) LaTeX code?
>
> The “3% music” portion of my output will be easy for me to code myself in
> LilyPond. But I’m hoping to save several hours of work coding the “97%
> text” component of this 700-page book.
>
> Gabriel
>
>
> ---------- Forwarded message ---------
> From: Lucas Pinke
> Date: Tue, 20 Jan 2026
> Subject: Re: Wrapping Lengthy Text around a Score-as-Markup
> Cc: Lilypond-User Mailing List
>
> I strongly recommend trying out lylua! As you can import Lilypond files
> straight into the processor, you only headbutt the TeX processor and the TeX
> language.
>
> Like I said, TeX and lylua work best when notation ain't the main actor:
> Cherubini's treatise could be written without any notation and still would
> be somewhat understandable.
> It isn't in this, but that's the beauty of things:
> diverse alternatives.
>
> As per Kieren's question, I'm using Urs Liska's lyluatex package (available
> on CTAN). I don't know of another Lilypond code integration in TeX; other
> options would be importing the output images or using specific environments
> of the TeX language. Side note: unfortunately, I didn't get to know Urs
> before his passing... However, I always cite him with regards to TeX and
> serialist music!
>
> My programming knowledge is limited, but I've managed to like and
> understand TeX and the lylua integration. I'd say lylua works better than
> LibreOffice's OOoLilyPond plugin (which is still a great plugin!).
>
> On Tue, 20 Jan 2026, 16:36, Gabriel Ellsworth wrote:
>
> >> I think Gabriel wants to stay in native Lilypond for the moment —
> >> @Gabriel: Correct me if I’m wrong! — but @Gabriel: Ultimately, something
> >> like LuaLaTeX is, I believe, something you’ll want to eventually have as
> >> your platform.
> >
> > Thank you, Raphael and Lucas!
> >
> > I have never used a TeX processor but am intrigued to learn more. For my
> > current project, native LilyPond is working just fine. And at present I am
> > intimidated by TeX — as I was by LilyPond for years until I discovered
> > Frescobaldi!
> >
> > But at some point I imagine that it will be worth it for me to learn
> > LuaTeX.
>
> ------------------------------
>
> Message: 3
> Date: Sat, 21 Feb 2026 08:51:45 -0300
> From: Lucas Pinke
> To: Gabriel Ellsworth
> Cc: Lilypond-User Mailing List
> Subject: Re: OCR to Transcribe Text PDF in LaTeX
>
> Hello!
>
> > Is there a good program or site out there that can take my existing PDF,
> > “read” it, and help me transcribe it into (convert it to) LaTeX code?
>
> I'm not familiar with any OCR software that works flawlessly. Correcting the
> OCR's output was always a hassle, especially in books that have columns.
> With regards to this matter, straight up copying it down seems to work best
> (or reading it out loud with a speech-to-text converter).
>
> Keep in mind that TeX can include multiple files (like Lilypond), so you
> can write files based on chapters (or even smaller divisions).
>
> These copyist feats are daunting and challenging... Yet we keep on writing.
>
> ------------------------------
>
> lilypond-user mailing list
> https://lists.gnu.org/mailman/listinfo/lilypond-user
>
> ------------------------------
>
> End of lilypond-user Digest, Vol 279, Issue 34
> **********************************************
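
Regarding Kyle's two questions in Message 1, here is a minimal, untested sketch, assuming LilyPond 2.24. Instead of threading a stencil through a recursive helper, it builds an ordinary list of markups and hands it to the built-in \concat markup command, so no helper needs layout and props passed in. The node drawing becomes its own markup command so that it can use #:properties, and generate-graph calls it through the make-graph-node-markup constructor that define-markup-command creates for it. The names graph-node, node-radius, and edge-length are hypothetical, chosen only for this illustration.

%%% BEGIN SKETCH
\version "2.24.4"

% Hypothetical command: one edge followed by one filled node.
% node-radius and edge-length are invented markup properties;
% #:properties reads them from props, falling back to the defaults.
#(define-markup-command (graph-node layout props)
   ()
   #:properties ((node-radius 0.25)
                 (edge-length 1))
   (interpret-markup layout props
    (make-combine-markup
     (make-path-markup 0.1 (list (list 'moveto 0 0)
                                 (list 'lineto edge-length 0)))
     (make-translate-markup (cons edge-length 0)
                            (make-draw-circle-markup node-radius 0.1 #t)))))

% Builds a plain list of markups (a first node, then node-count
% edge-plus-node pieces) and lets \concat place them side by side.
#(define-markup-command (generate-graph layout props node-count)
   (index?)
   #:properties ((node-radius 0.25))
   (interpret-markup layout props
    (make-concat-markup
     (cons (make-draw-circle-markup node-radius 0.1 #t)
           ;; make-graph-node-markup was generated by the
           ;; define-markup-command for graph-node above
           (make-list node-count (make-graph-node-markup))))))

\markup \generate-graph #4
\markup \override #'(node-radius . 0.4) \generate-graph #4
%%% END SKETCH

Because \concat places each element flush against the previous one's extent, the spacing will differ slightly from the MWE, which overlapped the pieces by 0.25 via ly:stencil-combine-at-edge.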
