There is a PDF extractor, but it only runs during indexing; it does not store the extracted text as a property, which is what you seem to want.
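To make the difference concrete, here is a rough sketch of what a property-style extractor would have to do before the save: turn the PDF's text into a short excerpt and hand it back as a property. This is not Hippo CMS 6 API. The names make_excerpt and extract_properties and the "excerpt" property are hypothetical, and the sketch assumes the plain text has already been pulled out of the PDF by some PDF library. It is written in Python only to keep it short; a real extractor would be Java.

```python
import re


def make_excerpt(text, max_chars=200):
    """Collapse whitespace and cut the text at a word boundary.

    The excerpt body is at most max_chars characters; "..." is appended
    only when the text was actually shortened.
    """
    clean = re.sub(r"\s+", " ", text).strip()
    if len(clean) <= max_chars:
        return clean
    # Look one character past the limit so a word that ends exactly
    # at the limit is kept whole.
    cut = clean[:max_chars + 1]
    if " " in cut:
        cut = cut[:cut.rindex(" ")]
    else:
        # No word boundary at all: fall back to a hard cut.
        cut = clean[:max_chars]
    return cut + "..."


def extract_properties(pdf_text):
    """Hypothetical pre-save hook: properties to store with the document."""
    return {"excerpt": make_excerpt(pdf_text)}
```

The indexing-time PDF extractor never produces such a dictionary; it only feeds the text to the index, which is why the excerpt cannot be read back from a query result.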
Regards Ard

On Tue, Apr 6, 2010 at 3:47 PM, Enrico Cervato <[email protected]> wrote:
> Hi Ard,
>
> Thank you very much for your answer.
>
> Just to be 100% clear about the situation ... my understanding is
> that there is no (real) PDF extractor for Hippo CMS 6 at the moment.
> So if I wanted to implement what I need, I would have to write the
> extractor myself. Is that correct?
>
> Thank you,
> --
> Enrico Cervato - 0031 (0)615293346
> Open Source Software Engineer
> Sourcesense - making sense of Open Source: http://www.sourcesense.com
>
> On Thu, Apr 1, 2010 at 9:41 PM, Ard Schrijvers
> <[email protected]> wrote:
>> Hello Enrico,
>>
>> I think you should try to hook in the PDF extractor just like the
>> property extractors. The problem with the normal PDF extractor (and
>> also the XML extractor) is that they run *after* the save, whereas
>> extractors that set properties run before the save. It is unfortunate
>> that both are called extractors.
>>
>> Anyway, I think you should dive into the property extractors and see
>> if you can do something similar for PDFs. Unfortunately, it has been
>> too long for me to remember the details.
>>
>> Regards Ard
>>
>> On Thu, Apr 1, 2010 at 4:05 PM, Enrico Cervato
>> <[email protected]> wrote:
>>> Hi everybody,
>>>
>>> When performing a DASL query I also retrieve some PDFs from my
>>> binaries folder. I would like to also show on screen an extract of
>>> the text in the PDFs. Is that possible?
>>>
>>> In the extractors.xml of my repository I have already set the
>>> PDFExtractor. Reading [1], my understanding is that it is not a real
>>> extractor; it is more of an indexer. Therefore it will index the text
>>> contained in the PDF but it will not extract it as a property.
>>>
>>> From [2] it seems that it is not possible to extract the text from PDFs.
>>>
>>> I think it should be possible to do it somehow ... can you give me
>>> some suggestions?
>>> Thank you very much for your attention!
>>>
>>> [1] http://old.nabble.com/Help-with-PDFExtractor-td26808675.html
>>> [2] http://old.nabble.com/Show-content-of-a-pdf-document-td18647758.html
>>>
>>> --
>>> Enrico Cervato - 0031 (0)615293346
>>> Open Source Software Engineer
>>> Sourcesense - making sense of Open Source: http://www.sourcesense.com
>>> ********************************************
>>> Hippocms-dev: Hippo CMS 6 development public mailinglist
>>>
>>> Searchable archives can be found at:
>>> MarkMail: http://hippocms-dev.markmail.org
>>> Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
