Jorge Spinsanti created TIKA-1836:
-------------------------------------

             Summary: Conversion DOC->TXT failed due to POI issue
                 Key: TIKA-1836
                 URL: https://issues.apache.org/jira/browse/TIKA-1836
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.11
         Environment: Distributor ID:   Ubuntu
Description:    Ubuntu 12.04.5 LTS
Release:        12.04
Codename:       precise

java version "1.7.0_91"
OpenJDK Runtime Environment (IcedTea 2.6.3) (7u91-2.6.3-0ubuntu0.12.04.1)
OpenJDK 64-Bit Server VM (build 24.91-b01, mixed mode)


            Reporter: Jorge Spinsanti


When we try to convert DOC -> TXT, we get the following stack trace:
{code}
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@1ddeedb6
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        ... 15 more
Caused by: java.lang.UnsupportedOperationException: Non-extended character Pascal strings are not supported right now. Please, contact POI developers for update.
        at org.apache.poi.hwpf.model.Sttb.fillFields(Sttb.java:82)
        at org.apache.poi.hwpf.model.Sttb.<init>(Sttb.java:61)
        at org.apache.poi.hwpf.model.SttbUtils.readSttbSavedBy(SttbUtils.java:52)
        at org.apache.poi.hwpf.model.SavedByTable.<init>(SavedByTable.java:53)
        at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:361)
        at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:144)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        ... 22 more
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
