On Sat, 5 Nov 2011 14:03:24 -0600, Marcelo de Moraes Serpa <celose...@gmail.com> wrote:
> Hi list,
>
> I just bought a scanner and started to scan important documents as a
> backup, and archiving them with meaningful metadata in orgmode files. Then
> a question came to mind - what dpi to use? I'm not really savvy when it
> comes to scanning or printing, and I want like a dpi that allows me to
> reprint the document at an acceptable quality later if necessary, but that
> also doesn't take that much space (600dpi pdfs take around 5MB).
>
> Any insights welcome,
>
> Thanks,
>
> Marcelo.

Using PDF for scanned documents results in *huge* files with seriously
disappointing image quality. Consider storing your scans in DjVu format [1],
which was developed specifically for this purpose.

I scan all docs at 600 dpi, predominantly in gray-scale (in colour only when
it's *really* necessary), and store them in DjVu format, all using
gscan2pdf [2]. Even at that seemingly overkill resolution, single-page
documents are generally (if they aren't too "grainy") only a few hundred KiB
in size.

gscan2pdf also supports a number of OCR utils, but the UI for this is clumsy
(aren't they all...), so you're better off using the CLI tools directly.
Tesseract is recommended.

I've used this approach to "convert" piles upon piles of old bank statements
to Ledger format, with very little effort.

NOTE: When attempting something like this, a fast scanner with a *reliable*
automatic document feeder will help prevent premature hair loss ;)

Peace

--
Pieter

[1] http://djvu.org/resources/whatisdjvu.php
[2] http://gscan2pdf.sourceforge.net/
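
For a rough feel of what the dpi choice means before any compression, here is
a minimal Python sketch of the arithmetic. It assumes an A4 page (210 x 297 mm)
and uncompressed pixel data; the function names are made up for illustration
and are not part of gscan2pdf or any other tool mentioned here.

```python
MM_PER_INCH = 25.4


def pixel_dimensions(width_mm, height_mm, dpi):
    """Return (width_px, height_px) for a page scanned at the given dpi."""
    width_px = round(width_mm * dpi / MM_PER_INCH)
    height_px = round(height_mm * dpi / MM_PER_INCH)
    return width_px, height_px


def raw_size_bytes(width_mm, height_mm, dpi, bits_per_pixel):
    """Uncompressed size of the scan: pixels times bits per pixel, in bytes."""
    width_px, height_px = pixel_dimensions(width_mm, height_mm, dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    # Assumption: A4 paper; 8 bits per pixel is gray-scale, 24 is colour.
    for dpi in (150, 300, 600):
        for label, bpp in (("gray", 8), ("colour", 24)):
            size_mib = raw_size_bytes(210, 297, dpi, bpp) / 2**20
            print(f"{dpi:>3} dpi {label:<6} {size_mib:8.1f} MiB uncompressed")
```

Doubling the dpi quadruples the raw pixel count, and colour triples it again
over gray-scale, so what ends up on disk depends mostly on how well the
storage format compresses the page.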