On January 23, 2014, Levi Pearson wrote:


> There's a nice program called Scan Tailor, which is a GUI wrapper
> around some command line tools, that helps to turn raw pages into
> a clean DjVu or PDF archive of the book/magazine.



I'll have to look at that one, thanks! BTW, what's DjVu? PDF is obvious, but
I don't think I've heard of DjVu before. :)


--- Dan


On Thu, Jan 23, 2014 at 9:26 PM, Levi Pearson <[email protected]> wrote:

> On Thu, Jan 23, 2014 at 2:16 AM, Dan Egli <[email protected]> wrote:
> > I was letting my mind wander last night, and I got to thinking about all
> > these older magazines that I have stashed in various places. I kept them
> > because they had interesting articles and the like. I was wondering if
> > there was an easy way to convert them into a digital format, so I can
> > recycle the dead trees. It's not just the idea of capturing the text and
> > OCRing it, though. Many have pictures that are a vital part of the
> > article.
> > My first thought was to write a PDF, but I don't see an easy way to do
> > that. All I can think of on that is to scan each page into an image file
> > (jpeg or similar), then import them into a LibreOffice document, then
> > save that document in PDF format. I imagine that would work, but it would
> > also kill text searching, I'd think. I suppose I could scan the images in,
> > then scan the text in, OCR it, and then re-format it for the PDF, but that
> > seems like a LOT of work. Especially as I think I have over 50 old issues
> > of various magazines lying around in storage that I'd like to convert.
> >
> > Does anyone know of any easy methods for converting articles on paper,
> > with images, into something digitally readable? I don't care if it's PDF
> > or ePub or something else, as long as it looks decent on the computer
> > screen, and I can search text within the article.
>
> There's a nice program called Scan Tailor, which is a GUI wrapper
> around some command line tools, that helps to turn raw page scans into
> a clean DjVu or PDF archive of the book/magazine.  I've not scanned
> any books myself, but I did use it to clean up and shrink some messy
> scans I've downloaded from others.  It lets you automate the splitting
> of two-up scans, de-skew the pages, crop out margins, and re-center
> with consistent margins. You can also run de-speckle algorithms,
> convert to mono *for text-only regions* and blank out flaws/marks on
> pages.  You end up with a directory of cleaned-up uncompressed page
> images, which you can then use some other command-line tools to compile
> into your preferred container format (PDF or DjVu), possibly with an OCR
> phase to embed a textual representation as well, which enables
> searchability. There are some relatively automated open source OCR
> programs that can fit in this workflow and embed text for
> searchability into your PDF, but I haven't got to the point of doing
> that yet.
>
> Regarding DjVu vs. PDF: It used to be that only DjVu supported
> compression mechanisms that allowed you to layer and compose mono and
> grayscale/color page regions, which meant that you could get a much
> more efficient archive with DjVu at the cost of significantly reducing
> the set of programs that would read your archive.  But sometime in the
> last couple of years PDF gained some additional compression mechanisms
> for bitmaps that allowed it to reach near-parity with DjVu in file
> sizes (see archive.org for a whole lot of book scans in various
> formats [https://archive.org/details/dasleidenunsersh00bras for
> example]). The big advantage of PDF is that just about everyone has
> PDF viewing software installed already, including phones and tablets.
> Today's hi-res 10" tablet displays make wonderful PDF-reading
> machines, and hopefully someday we'll have nice 10" or so e-ink
> displays in affordable tablets as well.
>
> I've included some links below to some resources I found useful.
>
>     --Levi
>
> [Scan Tailor]: http://scantailor.sourceforge.net/
> [Scan Tailor Guide]:
> http://sourceforge.net/apps/mediawiki/scantailor/index.php?title=User_Guide
> [DIY Book Scanning Info]: http://www.diybookscanner.org/
>
> /*
> PLUG: http://plug.org, #utah on irc.freenode.net
> Unsubscribe: http://plug.org/mailman/options/plug
> Don't fear the penguin.
> */
>
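
For the "compile into PDF with an OCR phase" step described above, here is a
rough sketch of how it might be scripted. Nothing in it comes from the thread
itself: it assumes the cleaned pages are TIFF files in a directory such as
out/, that Tesseract is the OCR program (its "pdf" output mode writes a
one-page PDF with a hidden text layer), and that pdfunite from poppler-utils
does the merge. The function names and paths are made up for illustration.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_commands(pages: list[Path], work_dir: Path, book: Path,
                   lang: str = "eng") -> list[list[str]]:
    """Return the shell commands (as argument lists) for OCR plus merge.

    Assumption: page file names sort into reading order (page_001.tif, ...).
    """
    ordered = sorted(pages)
    commands = []
    for page in ordered:
        # 'tesseract IMAGE OUTPUTBASE -l LANG pdf' writes OUTPUTBASE.pdf,
        # a single page with the recognized text embedded for searching.
        out_base = work_dir / page.stem
        commands.append(["tesseract", str(page), str(out_base), "-l", lang, "pdf"])
    if ordered:
        # 'pdfunite IN1.pdf IN2.pdf ... OUT.pdf' concatenates the pages.
        page_pdfs = [str(work_dir / f"{page.stem}.pdf") for page in ordered]
        commands.append(["pdfunite", *page_pdfs, str(book)])
    return commands


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Something like run_all(build_commands(list(Path("out").glob("*.tif")),
Path("ocr"), Path("magazine.pdf"))) would then do the whole issue in one go,
provided the ocr/ directory already exists. Building the command list
separately from running it makes it easy to print and inspect the plan before
touching 50 issues' worth of scans.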

