https://issues.apache.org/bugzilla/show_bug.cgi?id=53816
Priority: P2
Bug ID: 53816
Assignee: [email protected]
Summary: Extracted word count is incorrect
Severity: normal
Classification: Unclassified
OS: Linux
Reporter: [email protected]
Hardware: PC
Status: NEW
Version: 3.9-dev
Component: HPSF
Product: POI
Created attachment 29316
--> https://issues.apache.org/bugzilla/attachment.cgi?id=29316&action=edit
Word document showing incorrect PID_WORDCOUNT=11
I have a Word document (attached) that contains 6 words plus an embedded PDF
document (I am not sure whether the PDF is relevant). Word itself reports the
word count correctly as 6, but running
org.apache.poi.hpsf.extractor.HPSFPropertiesExtractor on the same file reports
PID_WORDCOUNT = 11:
1 = 1252
PID_TITLE =
PID_SUBJECT =
PID_AUTHOR = IBMer
PID_KEYWORDS =
PID_TEMPLATE = Normal.dot
PID_LASTAUTHOR = IBMer
PID_REVNUMBER = 3
PID_APPNAME = Microsoft Office Word
PID_EDITTIME = Sun Dec 31 19:03:00 EST 1600
PID_CREATE_DTM = Tue Jul 17 07:16:00 EDT 2012
PID_LASTSAVE_DTM = Mon Jul 23 07:21:00 EDT 2012
PID_PAGECOUNT = 1
PID_WORDCOUNT = 11
PID_CHARCOUNT = 55
PID_SECURITY = 0
PID_CODEPAGE = 1252
PID_COMPANY = IBM
PID_LINECOUNT = 1
PID_PARCOUNT = 1
17 = 65
23 = 730895
PID_SCALE = false
PID_LINKSDIRTY = false
19 = false
22 = false
PID_DOCPARTS =
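
To make the mismatch easy to check mechanically, here is a minimal sketch that
parses a "NAME = value" dump like the one above and compares PID_WORDCOUNT with
the count shown by Word. It uses only the JDK and does not call POI; the class
name PropertyDumpCheck and the method parse are hypothetical, and the dump is
assumed to have been captured as plain text from the extractor output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of POI: reads "NAME = value" lines from a
// captured property dump and checks the stored word count.
class PropertyDumpCheck {

    // Parse each "NAME = value" line into an ordered map.
    // Lines without '=' are skipped; empty values are kept as "".
    static Map<String, String> parse(String dump) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String line : dump.split("\\R")) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            if (!key.isEmpty()) {
                props.put(key, value);
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // A few lines copied from the dump in this report.
        String dump = String.join("\n",
                "PID_PAGECOUNT = 1",
                "PID_WORDCOUNT = 11",
                "PID_CHARCOUNT = 55");

        int expected = 6; // word count shown by Word, per this report
        int stored = Integer.parseInt(parse(dump).get("PID_WORDCOUNT"));

        System.out.println("stored=" + stored + ", expected=" + expected
                + ", mismatch=" + (stored != expected));
    }
}
```

With the values from this report, the stored count (11) differs from the count
Word displays (6), so the check flags a mismatch.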