Re: OCR to Transcribe Text PDF in LaTeX

Carl Sorensen Sat, 21 Feb 2026 07:03:09 -0800

Hi, Gabriel!

On Sat, Feb 21, 2026 at 4:37 AM Gabriel Ellsworth
<[email protected]> wrote:
>
> Here is my situation.
>
> I am trying to typeset a new edition of a public-domain book.
> I have a PDF that contains a scanned copy of a 20th century printing of this 
> book (about 700 pages).
> My output will contain a bit of LilyPond output, but music notation will not 
> be “the main actor” (to borrow Lucas’s very apt phrase below). I estimate 
> that the book will be 97% text and 3% LilyPond.
> Based on past helpful input from this list, I suspect that LaTeX will be the 
> best way to create this book.
> I have never used LaTeX before.
> I know almost nothing about how OCR software or AI works on the back end.


Have you looked at Project Gutenberg to see if they have a copy of this book?

https://www.gutenberg.org/

Typically they will have plain text files of the books in their
catalog, which serve as a great start for getting the document into
LaTeX.

You might also check out the Internet Archive:

https://archive.org/

If you can find your book in one of these resources, the OCR will
already have been done (and most likely proofread).

If you get a plain text file, and want some help in converting it to
LaTeX, I can probably spend some time on a messaging platform helping
you get started with the process.
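
To give a rough idea of what that conversion involves, here is a
minimal Python sketch. It assumes the plain text separates paragraphs
with blank lines, and it only escapes the characters LaTeX treats
specially and wraps the result in a bare book document. The names
escape_latex and text_to_latex are made up for illustration; a real
700-page book would also need chapter headings, quotation marks,
footnotes and so on handled separately.

```python
import re

# Characters that LaTeX treats specially, mapped to safe replacements.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape LaTeX special characters in a single pass.

    Working character by character means braces introduced by a
    replacement (e.g. in \\textbackslash{}) are not escaped again.
    """
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


def text_to_latex(plain):
    """Wrap plain text in a minimal LaTeX book document.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", plain) if p.strip()]
    body = "\n\n".join(escape_latex(p) for p in paragraphs)
    return (
        "\\documentclass{book}\n"
        "\\begin{document}\n\n"
        f"{body}\n\n"
        "\\end{document}\n"
    )
```

You would read the downloaded text file into a string, pass it to
text_to_latex, and write the result to a .tex file to compile.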

HTH,

Carl
