1) The PDF is an image (a scan): it needs to be OCR'd before it is uploaded. The media filter (filter-media) will then extract the text from the PDF and save it as a text file alongside the PDF. --> Search runs on the extracted text.
OR
2) The PDF already contains text: upload it as is. The media filter (filter-media) will extract the text from the PDF and save it as a text file alongside the PDF. --> Search runs on the extracted text.
3) Otherwise (an image-only PDF that was not OCR'd), indexing is on metadata only.

On Thu, Sep 10, 2009 at 2:15 AM, Mark H. Wood <mw...@iupui.edu> wrote:
> On Tue, Sep 01, 2009 at 03:55:11PM +1000, Gary Browne wrote:
>> When a user searches via the dspace web interface, is the search run
>> across the content of text pdfs or just the metadata? If so, does the
>> pdf submitted to the repository need to have been previously OCR'd, or
>> does the repository attempt to extract & index text from all pdfs?
>
> DSpace doesn't include OCR code.
>
> The full-text extractor (which feeds the indexing) requires actual
> coded-character text in the PDF to work with. If all you have is a
> bag of bitmaps (such as you often get from scanning paper documents
> into PDF) then they contain nothing useful to extract; you'll need to
> OCR or otherwise recover the character data before ingesting the file
> into DSpace.
>
> --
> Mark H. Wood, Lead System Programmer mw...@iupui.edu
> Friends don't let friends publish revisable-form documents.
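
A rough sketch of the pre-ingest check this implies: run whatever PDF text extractor you have over the file, and if almost nothing comes back, treat the PDF as image-only and OCR it before uploading. The function name, the per-page input, and the threshold are assumptions made for illustration; none of this is part of DSpace.

```python
def needs_ocr(page_texts, min_chars_per_page=20):
    """Guess whether a PDF is image-only from its per-page extracted text.

    page_texts: list of strings, one per page, produced by any PDF text
    extractor (hypothetical input, not a DSpace API).
    min_chars_per_page: assumed threshold; tune it for your collection.
    """
    if not page_texts:
        return True  # nothing was extracted at all, so nothing to index
    # Count non-whitespace characters only, so blank layout does not count.
    total = sum(len("".join(text.split())) for text in page_texts)
    return total / len(page_texts) < min_chars_per_page


scanned = ["", "  \n", ""]
born_digital = ["DSpace indexes the extracted text of this page. " * 3]
print(needs_ocr(scanned))       # True
print(needs_ocr(born_digital))  # False
```

A file flagged by a check like this falls under case 1 above (OCR first); anything else can be uploaded directly as in case 2.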
_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech