I have a few manuals to scan and I'm looking for suggestions on how to add bookmarks and how to handle colour.

Bookmarks should be easier, so let's start with that. I want to add bookmarks (or whatever they are called) so that it is easy to navigate to page "2-48" or "C-17" in a document. Many of the PDFs on bitsavers have them and I've found them very useful, so I'd like to do the same for my future scans. I tried pdftk (the Java port, as the original is no longer available on my distro) but that failed. I then tried Ghostscript, which also failed and rewrote the PDF to be considerably larger. Is there a simple way to achieve this (ideally from the CLI)?
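To make the bookmark request concrete: pdftk documents a plain-text bookmark format (BookmarkBegin, BookmarkTitle, BookmarkLevel, BookmarkPageNumber) that its dump_data operation emits and its update_info operation reads. Below is a minimal Python sketch that generates such a file from a list of printed page labels. The label-to-page mapping is invented for illustration, and I have not confirmed that the Java port accepts this on every distro.

```python
def bookmark_info(entries):
    """Build pdftk update_info text from (title, physical_page) pairs.

    Each entry becomes one top-level bookmark. 'physical_page' is the
    1-based page index inside the PDF, not the printed page label.
    """
    lines = []
    for title, page in entries:
        lines += [
            "BookmarkBegin",
            f"BookmarkTitle: {title}",
            "BookmarkLevel: 1",
            f"BookmarkPageNumber: {page}",
        ]
    return "\n".join(lines) + "\n"


# Hypothetical mapping: printed label -> physical page in the scan.
entries = [("2-48", 75), ("C-17", 312)]
text = bookmark_info(entries)
print(text, end="")
```

Saved as, say, bookmarks.txt, the output would be applied with something like "pdftk in.pdf update_info bookmarks.txt output out.pdf". Whether that avoids the failure described above is untested.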


Now for the scanning itself.

For manuals that are simple monochrome, I plan to scan at 600 dpi bilevel G4 encoded, wrapped in PDF. For photographs or shaded areas that don't necessarily come out well under those settings, I plan to use 8-bit greyscale. I'd prefer to use 600 dpi but I may have to fall back to 300 dpi if the per-page file size shoots up too much.
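As a rough guide to the 600 dpi versus 300 dpi trade-off, the uncompressed size of a page can be worked out directly. This sketch assumes a US Letter page (8.5 x 11 inches), which is my assumption and not necessarily the size of these manuals. Compressed sizes (G4, JPEG and so on) will be much smaller and depend on content.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Assumed page size: US Letter, 8.5 x 11 inches.
for dpi in (300, 600):
    for name, bpp in (("bilevel", 1), ("8-bit grey", 8)):
        size = raw_page_bytes(8.5, 11, dpi, bpp)
        print(f"{dpi} dpi {name}: {size / 1e6:.1f} MB uncompressed")
```

Doubling the resolution quadruples the pixel count, so a 600 dpi greyscale page starts from four times the raw data of a 300 dpi one, and from eight times that of a 600 dpi bilevel page.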

The real issue is colour. I know that various people have looked at how to efficiently scan pages that are mostly black and white but have some coloured text (RSX-11 manuals and early VMS manuals did this to highlight terminal input, for example). I don't think this is a solved problem and I'm not expecting a solution; what I'm really looking for is to check that what I'm about to produce will have all the information that a future efficient algorithm is likely to need.

I'm going to start by scanning the whole manual as though it had no colour (so 600 dpi bilevel G4 encoded, except for pages with photos, shading and so on). Then I'm going to go back and rescan the pages that have colour at 600 dpi, saving each as a JPG. Then I'll produce a final PDF with the colour pages inserted. I'll also produce a PDF with the B&W pages that were replaced by colour pages (I assume OCR will be better served by non-jaggy scans).

So the final outputs will be:
manual.pdf     - the whole manual, including whole pages scanned as colour if any colour is present on them
manual_BW.pdf  - the G4-encoded bilevel pages that were replaced by colour pages
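
The page bookkeeping for those two outputs can be sketched as follows. This is only an illustration of the plan above. The directory layout and file names (bw/pageNNNN.tif, colour/pageNNNN.jpg) are hypothetical, and the actual PDF assembly is left to whatever tool ends up being used.

```python
def plan_outputs(total_pages, colour_pages):
    """Return (manual, manual_bw) lists of per-page source files.

    manual    : every page in order, colour rescan where one exists
    manual_bw : only the bilevel scans that a colour rescan replaced
    """
    colour = set(colour_pages)
    manual, manual_bw = [], []
    for n in range(1, total_pages + 1):
        bw_file = f"bw/page{n:04d}.tif"          # hypothetical name
        if n in colour:
            manual.append(f"colour/page{n:04d}.jpg")  # hypothetical name
            manual_bw.append(bw_file)
        else:
            manual.append(bw_file)
    return manual, manual_bw


# Example: a six-page manual where pages 2 and 5 contain colour.
manual, manual_bw = plan_outputs(6, [2, 5])
print(manual)
print(manual_bw)
```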

Thanks


Antonio


--

Antonio Carlini
anto...@acarlini.com
