Hello Enrico,

I think you should try to hook in the PDF extractor the same way the property extractors are hooked in. The problem with the normal PDF extractor (and also the XML extractor) is that they run *after* the save, whereas extractors that set properties run before the save. It is unfortunate that both are called extractors.

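To make the ordering concrete, here is a minimal Java sketch of the two kinds of "extractor". Every name in it (ExtractorOrderSketch, PropertyStep, IndexStep, excerpt, save) is made up for illustration and is not the repository's real API. It also assumes the PDF text is already available as a String; getting text out of a real PDF needs a PDF library, which is left out here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout: this is NOT the repository's extractor API,
// only an illustration of "before the save" versus "after the save".
final class ExtractorOrderSketch {

    /** Runs before the save; whatever it returns is stored as properties. */
    interface PropertyStep {
        Map<String, String> extract(String text);
    }

    /** Runs after the save; it only feeds the search index. */
    interface IndexStep {
        void index(String text, List<String> searchIndex);
    }

    /** Illustrative property step: keeps the first maxChars characters as an "excerpt" property. */
    static PropertyStep excerpt(int maxChars) {
        return text -> {
            Map<String, String> props = new LinkedHashMap<>();
            String trimmed = text.trim();
            props.put("excerpt",
                    trimmed.length() <= maxChars ? trimmed : trimmed.substring(0, maxChars));
            return props;
        };
    }

    /** Simulated save: property steps first, then the save, then index steps. */
    static Map<String, String> save(String text,
                                    List<PropertyStep> before,
                                    List<IndexStep> after,
                                    List<String> searchIndex) {
        Map<String, String> collected = new LinkedHashMap<>();
        for (PropertyStep step : before) {
            collected.putAll(step.extract(text));
        }
        // The "save" happens here: the property set is fixed from this point on.
        Map<String, String> saved = new LinkedHashMap<>(collected);
        for (IndexStep step : after) {
            // Too late to add properties; the text only becomes searchable.
            step.index(text, searchIndex);
        }
        return saved;
    }
}
```

In this sketch only the step that runs before the save can make an excerpt retrievable as a property; the step that runs after the save can only make the text searchable.
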
Anyway, I think you should dive into the property extractors and see if you can do something similar for PDFs. Unfortunately, it has been too long since I worked on this for me to remember the details.

Regards,
Ard

On Thu, Apr 1, 2010 at 4:05 PM, Enrico Cervato <[email protected]> wrote:
> Hi everybody,
>
> When performing a DASL query I am also retrieving some PDFs from my
> binaries folder. I would also like to display an extract of
> the text in the PDFs. Is that possible?
>
> In the extractors.xml of my repository I already set the PDFExtractor.
> Reading [1], my understanding is that it is not a real extractor, it is
> more an indexer. Therefore it will index the text contained in the PDF
> but it will not extract it as a property.
>
> From [2] it seems that it is not possible to extract the text from PDFs.
>
> I think it should be possible to do it somehow ... can you give me
> some suggestions?
> Thank you very much for your attention!
>
> [1] http://old.nabble.com/Help-with-PDFExtractor-td26808675.html
> [2] http://old.nabble.com/Show-content-of-a-pdf-document-td18647758.html
>
> --
> Enrico Cervato - 0031 (0)615293346
> Open Source Software Engineer
> Sourcesense - making sense of Open Source: http://www.sourcesense.com
> ********************************************
> Hippocms-dev: Hippo CMS 6 development public mailinglist
>
> Searchable archives can be found at:
> MarkMail: http://hippocms-dev.markmail.org
> Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
>
