[ https://issues.apache.org/jira/browse/TIKA-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216337#comment-14216337 ]
Milan Zivkovic commented on TIKA-1473:
--------------------------------------

I have a similar problem with a .docx file. What we are getting is:
{code}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at org.apache.xmlbeans.impl.store.CharUtil.allocate(CharUtil.java:397)
	at org.apache.xmlbeans.impl.store.CharUtil.saveChars(CharUtil.java:506)
	at org.apache.xmlbeans.impl.store.CharUtil.saveChars(CharUtil.java:419)
	at org.apache.xmlbeans.impl.store.CharUtil.saveChars(CharUtil.java:489)
	at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.text(Cur.java:2927)
	at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.stripText(Cur.java:3130)
	at org.apache.xmlbeans.impl.store.Cur$CurLoadContext.text(Cur.java:3143)
	at org.apache.xmlbeans.impl.store.Locale$SaxHandler.characters(Locale.java:3291)
	at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.reportCdata(Piccolo.java:992)
	at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseXMLNS(PiccoloLexer.java:1290)
	at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.parseXML(PiccoloLexer.java:1261)
	at org.apache.xmlbeans.impl.piccolo.xml.PiccoloLexer.yylex(PiccoloLexer.java:4812)
	at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yylex(Piccolo.java:1290)
	at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.yyparse(Piccolo.java:1400)
	at org.apache.xmlbeans.impl.piccolo.xml.Piccolo.parse(Piccolo.java:714)
	at org.apache.xmlbeans.impl.store.Locale$SaxLoader.load(Locale.java:3479)
	at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1277)
	at org.apache.xmlbeans.impl.store.Locale.parseToXmlObject(Locale.java:1264)
	at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.parse(SchemaTypeLoaderBase.java:345)
	at org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse(Unknown Source)
	at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:136)
	at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:166)
	at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:118)
	at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:59)
	at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:181)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
	at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:121)
	at org.apache.tika.Tika.parseToString(Tika.java:506)
{code}
I cannot post the exact document here because I would need to ask for a lot of permissions, but if you need me to debug something or send you some output, just say so.

Thank you

> Apache Tika is not working for .docx documents
> -----------------------------------------------
>
>                 Key: TIKA-1473
>                 URL: https://issues.apache.org/jira/browse/TIKA-1473
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.5, 1.6
>            Reporter: Franco Catto
>            Priority: Blocker
>
> I am using Apache Tika 1.6 to read different document files.
> It is reading pdf and old format doc files but when I try to read docx file,
> it gives me following exception:
> org.apache.tika.exception.TikaException: Failed to close temporary resources
> at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:152)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:127)
> ...
> The resource can not be closed because it is still being used by the Java
> Process, certainly the OOXML parser.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
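
Since the document itself cannot be shared, one piece of output that might still be useful is the uncompressed size of each part inside the .docx package. A .docx file is a ZIP archive, and the trace above runs out of heap while XMLBeans loads the main document part, so an unusually large word/document.xml could be one possible explanation. The sketch below is only a diagnostic aid under that assumption; the class name DocxPartSizes and the method partSizes are hypothetical, and it uses only java.util.zip from the JDK, not any Tika or POI API.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of Tika or POI: lists the parts of a .docx
// (which is a ZIP archive) together with their uncompressed sizes.
public class DocxPartSizes {

    /** Returns entry name -> uncompressed size in bytes (-1 if the size is unknown). */
    static Map<String, Long> partSizes(String path) throws IOException {
        Map<String, Long> sizes = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                sizes.put(entry.getName(), entry.getSize());
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DocxPartSizes <file.docx>");
            return;
        }
        for (Map.Entry<String, Long> e : partSizes(args[0]).entrySet()) {
            System.out.printf("%12d  %s%n", e.getValue(), e.getKey());
        }
        // The heap limit of this JVM, for comparison with the part sizes above.
        System.out.println("max heap (bytes): " + Runtime.getRuntime().maxMemory());
    }
}
```

Running it as `java DocxPartSizes file.docx` prints only part names and sizes, so the listing can be posted without exposing the document's contents.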