With an optional job queue, and an expensive OCR package to deal with scanned documents.
On 22/04/2013, at 8:57 PM, Roger Bell_West wrote:

> On Mon, Apr 22, 2013 at 11:45:43AM +0100, Mike Whitaker wrote:
>> On a similar subject, what PDF (or even text, assuming I can find something
>> to extract the text on a page by page basis) indexing solutions are there
>> out there in Perl?
>
> pdftotext and then throw the text at a generic indexing package. I
> keep meaning to do something with Plucene.
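Here is a minimal sketch of the "pdftotext, then index" approach. It is written in Python rather than Perl purely for illustration, and it makes some assumptions. First, poppler's `pdftotext` must be on the PATH. Second, the page-by-page split Mike asked about comes from the form feed (`\f`) that `pdftotext` writes at each page break by default. Third, a toy in-memory inverted index stands in for a real indexing package such as Plucene. The function names `extract_pages`, `build_index` and `search` are hypothetical.

```python
import re
import subprocess
from collections import defaultdict


def extract_pages(pdf_path):
    """Run pdftotext and split its output into one string per page.

    Assumes pdftotext writes a form feed at each page break (its default
    unless -nopgbrk is given); "-" sends the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        check=True, capture_output=True, text=True,
    )
    pages = result.stdout.split("\f")
    # The form feed after the last page leaves an empty final chunk; drop it.
    if pages and pages[-1] == "":
        pages.pop()
    return pages


def build_index(pages):
    """Map each lower-cased word to the sorted 1-based page numbers containing it."""
    index = defaultdict(set)
    for page_no, text in enumerate(pages, start=1):
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(page_no)
    return {word: sorted(nums) for word, nums in index.items()}


def search(index, query):
    """Return the pages that contain every word in the query (simple AND search)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)
```

For example, `search(build_index(extract_pages("manual.pdf")), "job queue")` would return the pages that mention both words. The file name here is a placeholder. A scanned PDF with no text layer yields empty pages, which is where the OCR package comes in. A real deployment would replace the dictionary with a persistent index, such as a Plucene/Lucene-style store.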