[ https://issues.apache.org/jira/browse/TIKA-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison resolved TIKA-1836.
-------------------------------
    Resolution: Fixed

> Convertion DOC->TXT failed due to POI issue
> -------------------------------------------
>
>                 Key: TIKA-1836
>                 URL: https://issues.apache.org/jira/browse/TIKA-1836
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.11
>         Environment: Distributor ID: Ubuntu
> Description: Ubuntu 12.04.5 LTS
> Release: 12.04
> Codename: precise
> java version "1.7.0_91"
> OpenJDK Runtime Environment (IcedTea 2.6.3) (7u91-2.6.3-0ubuntu0.12.04.1)
> OpenJDK 64-Bit Server VM (build 24.91-b01, mixed mode)
>            Reporter: Jorge Spinsanti
>         Attachments: test.doc
>
>
> When we try to convert DOC -> TXT, we get the following stack trace:
> {code}
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@1ddeedb6
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> 	... 15 more
> Caused by: java.lang.UnsupportedOperationException: Non-extended character Pascal strings are not supported right now. Please, contact POI developers for update.
> 	at org.apache.poi.hwpf.model.Sttb.fillFields(Sttb.java:82)
> 	at org.apache.poi.hwpf.model.Sttb.<init>(Sttb.java:61)
> 	at org.apache.poi.hwpf.model.SttbUtils.readSttbSavedBy(SttbUtils.java:52)
> 	at org.apache.poi.hwpf.model.SavedByTable.<init>(SavedByTable.java:53)
> 	at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:361)
> 	at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:144)
> 	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:146)
> 	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> 	... 22 more
> {code}
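The root cause in the trace is POI's `Sttb` (string table) reader, which at this version only handled "extended" (UTF-16) length-prefixed strings and threw on the "non-extended" (8-bit) form, as the exception message says. The sketch below only illustrates the difference between the two forms; it is not POI's code or API. The class `PascalStringTable` and its methods are hypothetical, and the layout is a simplified assumption: an optional `0xFFFF` marker selects the extended form, followed by a 2-byte little-endian string count, then each string with either a 2-byte UTF-16LE char count (extended) or a 1-byte count (non-extended). The real STTB structure in the [MS-DOC] specification has further fields (such as extra per-string data) that are omitted here, and ISO-8859-1 stands in for the document's actual 8-bit code page.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified reader for a table of length-prefixed ("Pascal") strings.
// Assumed layout (NOT a full MS-DOC STTB parser):
//   [0xFFFF marker, optional] [u16 LE count] then per string:
//     extended:     [u16 LE char count][UTF-16LE chars]
//     non-extended: [u8 char count][8-bit chars]
final class PascalStringTable {

    // Read an unsigned 16-bit little-endian value.
    static int readU16LE(byte[] buf, int off) {
        return (buf[off] & 0xFF) | ((buf[off + 1] & 0xFF) << 8);
    }

    static List<String> read(byte[] buf) {
        int pos = 0;
        // Assumption: a leading 0xFFFF marks the extended (UTF-16) form.
        boolean extended = readU16LE(buf, 0) == 0xFFFF;
        if (extended) {
            pos += 2; // skip the marker
        }
        int count = readU16LE(buf, pos);
        pos += 2;

        List<String> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            if (extended) {
                // 2-byte char count, then 2 bytes per UTF-16LE char.
                int cch = readU16LE(buf, pos);
                pos += 2;
                out.add(new String(buf, pos, cch * 2, StandardCharsets.UTF_16LE));
                pos += cch * 2;
            } else {
                // The form the POI version in the trace rejected:
                // 1-byte char count, then 1 byte per char.
                int cch = buf[pos] & 0xFF;
                pos += 1;
                out.add(new String(buf, pos, cch, StandardCharsets.ISO_8859_1));
                pos += cch;
            }
        }
        return out;
    }
}
```

For example, `PascalStringTable.read(new byte[]{0x02, 0x00, 0x02, 'h', 'i', 0x01, 'x'})` decodes the non-extended form into the list `hi`, `x`; a reader that only accepts the extended branch would have to reject that input, which is what the trace shows.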