[HippoCMS-dev] how to extract PDF text into property?

Enrico Cervato Thu, 01 Apr 2010 07:05:26 -0700

Hi everybody,

When performing a DASL query I also retrieve some PDFs from my
binaries folder. I would like to display an excerpt of the text of
those PDFs on screen as well. Is that possible?


In the extractors.xml of my repository I have already configured the
PDFExtractor. From reading [1], my understanding is that it is not a
real extractor but rather an indexer: it will index the text contained
in the PDF, but it will not extract that text into a property.

From [2] it seems that it is not possible to extract the text from PDFs.

I think it should be possible to do it somehow ... can you give me
some suggestions?
Thank you very much for your attention!

[1] http://old.nabble.com/Help-with-PDFExtractor-td26808675.html
[2] http://old.nabble.com/Show-content-of-a-pdf-document-td18647758.html

-- 
Enrico Cervato - 0031 (0)615293346
Open Source Software Engineer
Sourcesense - making sense of Open Source: http://www.sourcesense.com
********************************************
Hippocms-dev: Hippo CMS 6 development public mailinglist

Searchable archives can be found at:
MarkMail: http://hippocms-dev.markmail.org
Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
