Hi Ard,

Thank you very much for your answer.

Just to be 100% clear about the situation: my understanding is that there is no real PDF extractor for Hippo CMS 6 at the moment. So if I want to implement what I need, I will have to write the extractor myself. Is that correct?

Thank you,

--
Enrico Cervato - 0031 (0)615293346
Open Source Software Engineer
Sourcesense - making sense of Open Source: http://www.sourcesense.com

On Thu, Apr 1, 2010 at 9:41 PM, Ard Schrijvers <[email protected]> wrote:
> Hello Enrico,
>
> I think you should try to hook in the PDF extractor just like the
> property extractors. The problem with the normal PDF extractor (and
> also the XML extractor) is that they run *after* the save, whereas
> extractors that set properties run before the save. It is unfortunate
> that both are called extractors.
>
> Anyway, I think you should dive into the property extractors and see
> if you can do something similar for PDFs. Unfortunately it has been
> too long for me to remember the details.
>
> Regards Ard
>
> On Thu, Apr 1, 2010 at 4:05 PM, Enrico Cervato
> <[email protected]> wrote:
>> Hi everybody,
>>
>> When performing a DASL query I also retrieve some PDFs from my
>> binaries folder. I would like to display an excerpt of the text
>> in those PDFs as well. Is that possible?
>>
>> In the extractors.xml of my repository I have already set the PDFExtractor.
>> Reading [1], my understanding is that it is not a real extractor; it is
>> more of an indexer. Therefore it will index the text contained in the PDF,
>> but it will not extract it as a property.
>>
>> From [2] it seems that it is not possible to extract the text from PDFs.
>>
>> I think it should be possible to do it somehow ... can you give me
>> some suggestions?
>> Thank you very much for your attention!
>>
>> [1] http://old.nabble.com/Help-with-PDFExtractor-td26808675.html
>> [2] http://old.nabble.com/Show-content-of-a-pdf-document-td18647758.html
>>
>> --
>> Enrico Cervato - 0031 (0)615293346
>> Open Source Software Engineer
>> Sourcesense - making sense of Open Source: http://www.sourcesense.com
>> ********************************************
>> Hippocms-dev: Hippo CMS 6 development public mailinglist
>>
>> Searchable archives can be found at:
>> MarkMail: http://hippocms-dev.markmail.org
>> Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
