[HippoCMS-dev] Help with PDFExtractor

Maurizio Pillitu Wed, 16 Dec 2009 01:49:18 -0800

Hi everyone,
I'm trying to use the PDFExtractor (with Hippo Repository 1.2.15). I've
added the following to my (default) extractors.xml:


....
<extractor classname="org.apache.slide.extractor.PDFExtractor"
uri="/files/default.preview/binaries" content-type="application/pdf"/>
.....

Then I dropped a PDF file generated by Google Docs (attached) into
/files/default.preview/binaries via WebDAV. The repository logs some
interesting bits (attached), as if the extraction process went fine, but I
can't see the extracted data. I expected a WebDAV property to be attached
to the file, but nothing shows up. This is the list of properties for the
PDF file (as shown by DAVExplorer):

Property            Namespace  Value
getlastmodified     DAV:       Wed, 16 Dec 2009 09:38:35 GMT
displayname         DAV:       this_is_my_title.pdf
modificationdate    DAV:       2009-12-16T09:38:35Z
UID                 DAV:       96da71317f000001004b0bbb796bcb32
supportedlock       DAV:
getcontenttype      DAV:       application/pdf
getcontentlength    DAV:       5078
resourcetype        DAV:
getcontentlanguage  DAV:       en
getetag             DAV:       ada3fdca64b1fd70a3d7b2ed66b3e68b
lockdiscovery       DAV:
source              DAV:
creationdate        DAV:       2009-12-16T09:38:35Z
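
For reference, a listing like the one above can be reproduced without
DAVExplorer by sending a WebDAV PROPFIND request with an allprop body and
grouping the returned properties by namespace. The following is only a
minimal sketch under stated assumptions: the host, port and URL prefix in
the usage note are hypothetical, no authentication is sent, and nothing
here is specific to Hippo Repository or to how the PDFExtractor stores its
output. Note also that a server is not obliged to return every property in
answer to allprop.

```python
import http.client
import xml.etree.ElementTree as ET

# Request body asking the server for the properties stored on the resource.
PROPFIND_ALLPROP = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<D:propfind xmlns:D="DAV:"><D:allprop/></D:propfind>'
)


def parse_multistatus(xml_data):
    """Return {(namespace, name): text} for each property in a 207 body."""
    props = {}
    root = ET.fromstring(xml_data)
    for prop in root.iter("{DAV:}prop"):
        for child in prop:
            # ElementTree writes qualified names as "{namespace}local".
            if child.tag.startswith("{"):
                namespace, name = child.tag[1:].split("}", 1)
            else:
                namespace, name = "", child.tag
            props[(namespace, name)] = (child.text or "").strip()
    return props


def fetch_properties(host, port, path):
    """Send PROPFIND with Depth: 0 and return the parsed properties."""
    conn = http.client.HTTPConnection(host, port)
    try:
        conn.request(
            "PROPFIND",
            path,
            body=PROPFIND_ALLPROP,
            headers={"Depth": "0", "Content-Type": "text/xml; charset=utf-8"},
        )
        response = conn.getresponse()
        body = response.read()
        if response.status != 207:  # 207 Multi-Status is the normal reply
            raise RuntimeError(f"unexpected status {response.status}")
        return parse_multistatus(body)
    finally:
        conn.close()
```

A call such as fetch_properties("localhost", 8080,
"/files/default.preview/binaries/this_is_my_title.pdf") (host, port and
path prefix are placeholders) returns a dictionary keyed by namespace and
property name, so any property outside the DAV: namespace is easy to spot.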


I feel like I'm missing something about how the PDFExtractor works. I've
looked for documentation or specific configuration options, but I couldn't
find anything useful.

Any hints?
TIA
  mau

Kind regards,
-- 
Maurizio Pillitu - 0031 (0)615655668
Opensource Software Engineer
Scrum Certified Master - http://www.scrumalliance.org
Sourcesense - making sense of Open Source: http://www.sourcesense.com

this_is_my_title.pdf
Description: Adobe PDF document

indexes.log
Description: Binary data

********************************************
Hippocms-dev: Hippo CMS development public mailinglist

Searchable archives can be found at:
MarkMail: http://hippocms-dev.markmail.org
Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
