Amanda P wrote:
Cameras around $100 are very low quality. You could get nowhere near the dpi recommended for materials that need to be OCRed. Not only would the image quality from such cameras be low, but the OCR (even with the best software) would probably have many errors. For someone scanning items at home this might be OK, but for archival quality, I would not recommend cameras. If you are grant funded and the grant provider requires a certain level of quality, you need to make sure the scanning mechanism you use can scan at that quality.
To capture an image 8.5 x 11" at 300 dpi, you need roughly 8.4 megapixels (2550 x 3300 pixels), which is well within the capabilities of an inexpensive pocket camera. (If you need 600 dpi, then you're in the 33.7 megapixel range.) Whether the quality will be sufficient depends on the goals and requirements of the project, but 300 dpi should be enough to get good OCR results for normal-sized text. Our very old version of PrimeOCR recommends 300 dpi, and suggests that 400 dpi may provide substantially better quality for text smaller than 8 point, while 200 dpi will be sufficient for text of 12 points and up. At 300 and 400 dpi on 19th-century, small-print texts of variable quality, we are generally getting good to very good recognition: the quality of the original document itself is the limiting factor. More modern documents (and OCR software) should produce even better results. The cameras used by the Internet Archive are only 12 megapixels, though they are of substantially higher quality than a Canon PowerShot.
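For anyone who wants to check the numbers for other page sizes or resolutions, here is a rough sketch of the arithmetic in Python. It assumes the page exactly fills the camera frame with no margin and square pixels; in practice you will lose some resolution to framing. The function names are made up for illustration, and the 4000 x 3000 sensor is just a hypothetical example of a 12-megapixel camera, not any particular model.

```python
# Hypothetical helpers for estimating camera resolution needs.
# Assumes the page fills the frame exactly (no margin) and pixels are square.

def megapixels_needed(width_in: float, height_in: float, dpi: int) -> float:
    """Millions of pixels needed to capture a page at the given dpi."""
    return (width_in * dpi) * (height_in * dpi) / 1_000_000


def effective_dpi(sensor_w_px: int, sensor_h_px: int,
                  width_in: float, height_in: float) -> float:
    """Best dpi achievable when the page is fitted into the frame.

    The camera is assumed to be rotated so its long side lines up with the
    page's long side; whichever axis runs out of pixels first sets the dpi.
    """
    short_px, long_px = sorted((sensor_w_px, sensor_h_px))
    short_in, long_in = sorted((width_in, height_in))
    return min(short_px / short_in, long_px / long_in)


if __name__ == "__main__":
    # Letter-size page (8.5 x 11 inches) at common scanning resolutions.
    for dpi in (200, 300, 400, 600):
        print(f"{dpi} dpi: {megapixels_needed(8.5, 11, dpi):.1f} MP")
    # 200 dpi: 3.7 MP, 300 dpi: 8.4 MP, 400 dpi: 15.0 MP, 600 dpi: 33.7 MP

    # Hypothetical 12 MP sensor (4000 x 3000) photographing the same page.
    print(f"4000x3000 sensor: {effective_dpi(4000, 3000, 8.5, 11):.0f} dpi")
    # 4000x3000 sensor: 353 dpi
```

By this estimate, a 12-megapixel camera tops out at about 353 dpi on a letter-size page, with the short edge of the page as the limiting axis, even before allowing any margin.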
Some applications require very high quality images, and cheap cameras might not be able to deliver the goods, but if you just want to make sure the text of your documents is digitally preserved and/or available to read online, you don't really need all that much in the way of hardware. Using a pocket camera and a stand to digitize more than a few pages is going to be slow, clumsy and painful, but for many applications, the end result may be entirely acceptable.
-William