i've achieved satisfactory results palettizing scans of low-color-depth material using a tool called 'noteshrink':
https://mzucker.github.io/2016/09/20/noteshrink.html

--
  æstrid smith (she/her) =<[ c y b e r ]>=
  antique telephone collectors association member #4870

On Fri, Aug 27, 2021, at 13:50, Antonio Carlini via cctalk wrote:
> I have a few manuals to scan and I'm looking for suggestions about how
> to add bookmarks and how to handle colour.
>
> Bookmarks should be easier, so let's start with that. I want to add
> bookmarks (or whatever they are called) so that it is easy to navigate
> to page "2-48" or "C-17" in a document. Many of the PDFs on bitsavers
> have that and I've found it very useful so I'd like to do that for my
> future scans. I've tried with pdftk (the Java port as the original is no
> longer available on my distro) but that failed. So I tried GhostScript
> and that also failed, while also rewriting the PDF to be considerably
> larger. Is there a simple way to achieve this (ideally from the CLI)?
>
> Now for the scanning itself.
>
> For manuals that are simple monochrome, I plan to scan at 600dpi bilevel
> G4 encoded, wrapped in PDF.
> For photographs or shaded areas that don't necessarily come out well
> under those settings, I plan to use 8-bit greyscale. I'd prefer to use
> 600dpi but I may have to fall back to 300dpi if the per-page file size
> shoots up too much.
>
> The real issue is colour. I know that various people have looked at the
> issue of how to efficiently scan pages that are mostly black and white
> but have some coloured text (RSX-11 manuals and early VMS manuals did
> this to highlight terminal input, for example). I don't think this is a
> solved problem and I'm not expecting a solution, what I'm really looking
> for is to check that what I'm about to produce will have all the
> information that a future efficient algorithm is likely to need.
>
> I'm going to start by scanning the whole manual as though it had no
> colour (so 600 dpi bilevel G4 encoded, except for pages with photos and
> shading and so on). Then I'm going to go back and rescan the pages that
> have colour and scan those at 600 dpi and save as a JPG. Then I'll
> produce a final PDF with the colour pages inserted. I'll also produce a
> PDF with the B&W pages that were replaced by colour pages (I assume OCR
> will be better served by non-jaggy scans).
>
> So the final outputs will be:
> manual.pdf - the whole manual, including whole pages scanned as colour
> if any colour is present on them
> manual_BW.pdf - the G4-encoded bilevel pages that were replaced by
> colour pages
>
> Thanks
>
> Antonio
>
> --
>
> Antonio Carlini
> anto...@acarlini.com
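
for anyone wondering what "palettizing" a mostly black-and-white page with some coloured text amounts to, here is a minimal sketch in plain Python. it is not noteshrink's code: the function names (guess_background, palettize) and the tiny sample "page" are made up for illustration, and the foreground colours are picked by hand here, whereas noteshrink (as i understand its write-up) chooses them automatically by clustering.

```python
from collections import Counter


def guess_background(pixels, bits=4):
    """Guess the paper colour: the most common colour after coarse quantisation.

    Each 8-bit channel is reduced to `bits` bits so that scanner noise
    around the paper colour falls into a single bucket.
    """
    shift = 8 - bits
    counts = Counter(tuple(c >> shift for c in p) for p in pixels)
    bucket = counts.most_common(1)[0][0]
    half = (1 << shift) >> 1  # centre of the bucket, back in 8-bit range
    return tuple((c << shift) + half for c in bucket)


def palettize(pixels, palette):
    """Replace every (r, g, b) pixel with the index of its nearest palette colour."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(palette)), key=lambda i: dist2(p, palette[i]))
        for p in pixels
    ]


# Hypothetical sample: six off-white paper pixels, two dark ink pixels,
# and one red "terminal input" pixel.
page = [(250, 248, 240)] * 6 + [(30, 28, 35)] * 2 + [(200, 40, 50)]

background = guess_background(page)
# Hand-picked foreground colours (an assumption made for this sketch).
palette = [background, (0, 0, 0), (208, 32, 48)]
indices = palettize(page, palette)

print(background)
print(indices)
```

the page then only needs the short palette plus one small index per pixel, which is why this kind of output compresses so much better than a full-colour JPG while still keeping the coloured text distinct from the black text.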