https://issues.apache.org/bugzilla/show_bug.cgi?id=53951
Priority: P2
Bug ID: 53951
Assignee: [email protected]
Summary: java.io.UnsupportedEncodingException: Codepage number may not be 0
Severity: normal
Classification: Unclassified
OS: other
Reporter: [email protected]
Hardware: Macintosh
Status: NEW
Version: unspecified
Component: HPSF
Product: POI
Hi,
I'm using Nutch to crawl websites, with Tika parsing the documents. I
encountered the following error while parsing a .doc file and thought this
would be the place to log it.
2012-09-22 22:30:03,321 ERROR tika.TikaParser - Error parsing http://www.montpelier-vt.org/upload/groups/384/files/meac_11.17.10.doc
java.io.UnsupportedEncodingException: Codepage number may not be 0
    at org.apache.poi.hpsf.VariantSupport.codepageToEncoding(VariantSupport.java:338)
    at org.apache.poi.hpsf.VariantSupport.read(VariantSupport.java:240)
    at org.apache.poi.hpsf.Property.<init>(Property.java:164)
    at org.apache.poi.hpsf.Section.<init>(Section.java:277)
    at org.apache.poi.hpsf.PropertySet.init(PropertySet.java:452)
    at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:247)
    at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:67)
    at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:57)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:182)
    at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:124)
    at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:36)
    at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:23)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)
2012-09-22 22:30:03,322 WARN parse.ParseUtil - Unable to successfully parse content http://www.montpelier-vt.org/upload/groups/384/files/meac_11.17.10.doc of type application/x-tika-msoffice
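
To illustrate the failure mode in the trace above, here is a minimal standalone sketch. It is not POI's code: the class CodepageSketch, both of its methods, and the small codepage table are hypothetical, and the fallback behaviour is only one possible way a caller might tolerate a property set that declares codepage 0. The strict lookup rejects 0 with the same message as in the log; the lenient wrapper returns a caller-supplied default instead of failing the whole parse.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CodepageSketch {

    // Hypothetical strict lookup: maps a Windows codepage number to a Java
    // charset name and rejects non-positive values, as the log message suggests.
    static String codepageToEncoding(int codepage) throws UnsupportedEncodingException {
        if (codepage <= 0) {
            throw new UnsupportedEncodingException("Codepage number may not be " + codepage);
        }
        switch (codepage) {
            case 1200:  return "UTF-16LE";
            case 1252:  return "windows-1252";
            case 65001: return "UTF-8";
            default:    return "cp" + codepage; // assumption: generic naming scheme
        }
    }

    // Hypothetical lenient wrapper: returns the fallback when the codepage is
    // invalid or the JVM does not know the resulting charset name.
    static Charset lenientCharset(int codepage, Charset fallback) {
        try {
            return Charset.forName(codepageToEncoding(codepage));
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Codepage 0 is the value reported in the stack trace.
        System.out.println(lenientCharset(0, StandardCharsets.ISO_8859_1));
        System.out.println(lenientCharset(65001, StandardCharsets.ISO_8859_1));
    }
}
```

Whether a fallback like this is appropriate inside HPSF itself, or only in callers such as Tika, is an open question for the fix.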