Re: [HippoCMS-dev] how to extract PDF text into property?

Ard Schrijvers Thu, 01 Apr 2010 12:41:56 -0700

Hello Enrico,

I think you should try to hook in the PDF extractor the same way the
property extractors are hooked in. The problem with the normal PDF
extractor (and also the XML extractor) is that it runs *after* the save,
whereas extractors that set properties run before the save. It is
unfortunate that both are called extractors.


Anyway, I think you should dive into the property extractors and see
whether you can do something similar for PDFs. Unfortunately, it has
been too long for me to remember the details.
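
To make the ordering difference concrete, here is a minimal, self-contained
sketch. It is not the Hippo CMS 6 API: PropertyExtractor, ContentExtractor,
Resource, Store and excerpt are all hypothetical names invented for
illustration, and the PDF is assumed to be already decoded to plain text
(the actual PDF parsing is left out). It only shows why a hook that runs
before the save can end up as a stored property, while one that runs after
the save can only feed the index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExtractorOrderSketch {

    /** Hypothetical pre-save hook: its result is stored as properties. */
    interface PropertyExtractor {
        Map<String, String> extract(String text);
    }

    /** Hypothetical post-save hook: its result only feeds the search index. */
    interface ContentExtractor {
        String extractForIndex(String text);
    }

    /** Hypothetical resource: already-decoded text plus its properties. */
    static final class Resource {
        final String text;
        final Map<String, String> properties = new LinkedHashMap<>();

        Resource(String text) {
            this.text = text;
        }
    }

    /** Toy in-memory store that mimics the ordering around a save. */
    static final class Store {
        final List<PropertyExtractor> propertyExtractors = new ArrayList<>();
        final List<ContentExtractor> contentExtractors = new ArrayList<>();
        final Map<String, Map<String, String>> saved = new LinkedHashMap<>();
        final Map<String, String> index = new LinkedHashMap<>();

        void save(String path, Resource resource) {
            // 1. Before the save: property extractors may still add properties.
            for (PropertyExtractor extractor : propertyExtractors) {
                resource.properties.putAll(extractor.extract(resource.text));
            }
            // 2. The save itself: properties are persisted as they are now.
            saved.put(path, new LinkedHashMap<>(resource.properties));
            // 3. After the save: content extractors feed the index only.
            for (ContentExtractor extractor : contentExtractors) {
                index.put(path, extractor.extractForIndex(resource.text));
            }
        }
    }

    /** Returns at most maxLength leading characters of the text. */
    static String excerpt(String text, int maxLength) {
        return text.length() <= maxLength ? text : text.substring(0, maxLength);
    }

    public static void main(String[] args) {
        Store store = new Store();
        // Assumed property extractor: stores a short excerpt as a property.
        store.propertyExtractors.add(text -> Map.of("excerpt", excerpt(text, 20)));
        // Assumed content extractor: normalises the text for the index.
        store.contentExtractors.add(text -> text.toLowerCase());

        store.save("/binaries/report.pdf", new Resource("Annual report 2009: revenue grew."));

        System.out.println("properties: " + store.saved.get("/binaries/report.pdf"));
        System.out.println("index text: " + store.index.get("/binaries/report.pdf"));
    }
}
```

In this toy model, an excerpt that should come back as a property in a query
result has to be produced in step 1; anything produced in step 3 is
searchable but never stored on the resource.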

Regards Ard

On Thu, Apr 1, 2010 at 4:05 PM, Enrico Cervato
<[email protected]> wrote:
> Hi everybody,
>
> When performing a DASL query I am also retrieving some PDFs from my
> binaries folder. I would also like to display an excerpt of the text
> in those PDFs. Is that possible?
>
> In the extractors.xml of my repository I have already configured the
> PDFExtractor. Reading [1], my understanding is that it is not a real
> extractor but rather an indexer. It will therefore index the text
> contained in the PDF, but it will not extract it as a property.
>
> From [2] it seems that it is not possible to extract the text from PDFs.
>
> I think it should be possible to do it somehow ... can you give me
> some suggestions?
> Thank you very much for your attention!
>
> [1] http://old.nabble.com/Help-with-PDFExtractor-td26808675.html
> [2] http://old.nabble.com/Show-content-of-a-pdf-document-td18647758.html
>
> --
> Enrico Cervato - 0031 (0)615293346
> Open Source Software Engineer
> Sourcesense - making sense of Open Source: http://www.sourcesense.com
> ********************************************
> Hippocms-dev: Hippo CMS 6 development public mailinglist
>
> Searchable archives can be found at:
> MarkMail: http://hippocms-dev.markmail.org
> Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
>
>
