Hi everyone,

I'm trying to use the PDFExtractor (Hippo Repository 1.2.15). I've added the following to my (default) extractors.xml:
...
<extractor classname="org.apache.slide.extractor.PDFExtractor" uri="/files/default.preview/binaries" content-type="application/pdf"/>
...

Then I dropped a PDF file generated by Google Docs (attached) into /files/default.preview/binaries via WebDAV. The repository logs some interesting bits (attached) suggesting that the extraction went fine, but I can't see the extracted data anywhere. I expected a WebDAV property to be attached to the file, but nothing shows up. These are the properties of the PDF file as listed by DAVExplorer (name, namespace, value):

  getlastmodified     DAV:  Wed, 16 Dec 2009 09:38:35 GMT
  displayname         DAV:  this_is_my_title.pdf
  modificationdate    DAV:  2009-12-16T09:38:35Z
  UID                 DAV:  96da71317f000001004b0bbb796bcb32
  supportedlock       DAV:
  getcontenttype      DAV:  application/pdf
  getcontentlength    DAV:  5078
  resourcetype        DAV:
  getcontentlanguage  DAV:  en
  getetag             DAV:  ada3fdca64b1fd70a3d7b2ed66b3e68b
  lockdiscovery       DAV:
  source              DAV:
  creationdate        DAV:  2009-12-16T09:38:35Z

I feel I'm missing something about how the PDFExtractor works. I've looked for documentation or specific configuration options, but couldn't find anything useful.

Any hints?

TIA,
mau

Kind regards,
--
Maurizio Pillitu - 0031 (0)615655668
Opensource Software Engineer
Scrum Certified Master - http://www.scrumalliance.org
Sourcesense - making sense of Open Source: http://www.sourcesense.com
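If it helps to repeat the DAVExplorer check from a script, here is a minimal Java 11+ sketch. It is not part of Hippo or Slide; PropfindCheck and its methods are hypothetical names. It sends an RFC 4918 allprop PROPFIND with Depth: 0 and prints each returned property as {namespace}name. It assumes the repository answers PROPFIND at the same URL used for the WebDAV upload, and that HTTP Basic authentication is enough if credentials are required.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: lists the WebDAV properties of one resource.
class PropfindCheck {
    static final String DAV_NS = "DAV:";

    // RFC 4918 "allprop" request body.
    static final String ALLPROP_BODY =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
        + "<D:propfind xmlns:D=\"DAV:\"><D:allprop/></D:propfind>";

    /** Sends PROPFIND with Depth: 0 and returns the raw multistatus body. */
    static String propfind(String resourceUrl, String user, String password)
            throws IOException, InterruptedException {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(resourceUrl))
            .header("Depth", "0")
            .header("Content-Type", "application/xml; charset=utf-8")
            .method("PROPFIND", HttpRequest.BodyPublishers.ofString(ALLPROP_BODY));
        if (user != null) {
            // Assumption: the server accepts HTTP Basic authentication.
            String token = Base64.getEncoder().encodeToString(
                (user + ":" + password).getBytes(StandardCharsets.UTF_8));
            builder.header("Authorization", "Basic " + token);
        }
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(builder.build(), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    /** Returns "{namespace}localName" for every element directly inside a DAV:prop. */
    static List<String> propertyNames(String multistatusXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations so a response cannot trigger XXE.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(multistatusXml)));
            List<String> names = new ArrayList<>();
            NodeList props = doc.getElementsByTagNameNS(DAV_NS, "prop");
            for (int i = 0; i < props.getLength(); i++) {
                NodeList children = props.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() == Node.ELEMENT_NODE) {
                        String ns = child.getNamespaceURI();
                        names.add("{" + (ns == null ? "" : ns) + "}" + child.getLocalName());
                    }
                }
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not a parseable multistatus body", e);
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 1) {
            System.err.println("usage: PropfindCheck <resource-url> [user password]");
            return;
        }
        String user = args.length >= 3 ? args[1] : null;
        String password = args.length >= 3 ? args[2] : null;
        for (String name : propertyNames(propfind(args[0], user, password))) {
            System.out.println(name);
        }
    }
}
```

Usage, with placeholders in angle brackets: java PropfindCheck.java <repository-url>/files/default.preview/binaries/this_is_my_title.pdf <user> <password>. If an extractor writes properties to the resource, they would presumably appear as extra lines next to the {DAV:} ones; whether the PDFExtractor writes properties at all is exactly the open question here.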
Attachments:
  this_is_my_title.pdf (Adobe PDF document)
  indexes.log (binary data)
******************************************** Hippocms-dev: Hippo CMS development public mailinglist Searchable archives can be found at: MarkMail: http://hippocms-dev.markmail.org Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
