1) The PDF is an image: it needs to be OCR'd before it is uploaded
with its metadata. filtermedia will then try to extract the text from
the PDF and save it as a text file alongside the PDF file. --> search
happens on the extracted text
OR
2) The PDF already contains text: it can be uploaded with its metadata
as it is. filtermedia will try to extract the text from the PDF and
save it as a text file alongside the PDF file. --> search happens on
the extracted text

3) If no text is extracted, indexing is on the metadata only.
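
To make the three cases concrete, here is a rough sketch of the decision
in Python. It is not DSpace code: plan_ingest, IngestPlan and the
min_chars threshold are names and values I made up for illustration, and
it assumes you have already run some text extractor over the PDF and
have the result as a string.

```python
from dataclasses import dataclass


@dataclass
class IngestPlan:
    ocr_before_upload: bool  # True if the PDF should be OCR'd first
    search_scope: str        # what a search will be able to match


def plan_ingest(extracted_text: str, will_ocr: bool, min_chars: int = 20) -> IngestPlan:
    """Decide how a PDF should be handled before upload.

    extracted_text: whatever a text extractor returned for the PDF
                    (empty or whitespace for a scanned, image-only PDF).
    will_ocr:       whether you are prepared to OCR image-only PDFs.
    min_chars:      hypothetical threshold for "contains real text".
    """
    # Count non-whitespace characters so blank pages do not count as text.
    visible = sum(1 for ch in extracted_text if not ch.isspace())

    if visible >= min_chars:
        # Case 2: text PDF, upload as it is.
        return IngestPlan(False, "extracted text and metadata")
    if will_ocr:
        # Case 1: image PDF, OCR it first so there is text to extract.
        return IngestPlan(True, "extracted text and metadata")
    # Case 3: nothing to extract, only the metadata gets indexed.
    return IngestPlan(False, "metadata only")
```

For example, plan_ingest("", will_ocr=False) falls through to case 3,
while the same empty string with will_ocr=True lands in case 1.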


On Thu, Sep 10, 2009 at 2:15 AM, Mark H. Wood <mw...@iupui.edu> wrote:
> On Tue, Sep 01, 2009 at 03:55:11PM +1000, Gary Browne wrote:
>> When a user searches via the dspace web interface, is the search run
>> across the content of text pdfs or just the metadata? If so, does the
>> pdf submitted to the repository need to have been previously OCR'd, or
>> does the repository attempt to extract & index text from all pdfs?
>
> DSpace doesn't include OCR code.
>
> The full-text extractor (which feeds the indexing) requires actual
> coded-character text in the PDF to work with.  If all you have is a
> bag of bitmaps (such as you often get from scanning paper documents
> into PDF) then they contain nothing useful to extract; you'll need to
> OCR or otherwise recover the character data before ingesting the file
> into DSpace.
>
> --
> Mark H. Wood, Lead System Programmer   mw...@iupui.edu
> Friends don't let friends publish revisable-form documents.
>
> ------------------------------------------------------------------------------
> Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
> trial. Simplify your report design, integration and deployment - and focus on
> what you do best, core application coding. Discover what's new with
> Crystal Reports now.  http://p.sf.net/sfu/bobj-july
> _______________________________________________
> DSpace-tech mailing list
> DSpace-tech@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>
>

