[ https://issues.apache.org/jira/browse/TIKA-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Davies updated TIKA-2524:
-------------------------------
    Description:
When we parse XPS files using the AutoDetectParser, we always get an empty string.
If we use DefaultDetector.detect(), it correctly detects the MediaType as "application/vnd.ms-xpsdocument".

However, this page
https://tika.apache.org/1.16/formats.html
suggests that XPS (application/vnd.ms-xpsdocument) is supported.

Our code:
        InputStream bis = new BufferedInputStream(
                this.getClass().getResourceAsStream("/" + EXPECTED_LOCATION + "doc_xps.xps"));
        Metadata metadata = new Metadata();
        BodyContentHandler handler = new BodyContentHandler();
        AutoDetectParser parser = new AutoDetectParser();
        TikaInputStream tikaStream = TikaInputStream.get(bis);
        parser.parse(tikaStream, handler, metadata);
        String parsedText = handler.toString();

I will attach doc_xps.xps if I can.

  was:
When we parse XPS files using the AutoDetectParser, we always get an empty string.
If we use DefaultDetector.detect(), it correctly detects the MediaType as "application/vnd.ms-xpsdocument".

However, this page
https://tika.apache.org/1.16/formats.html
suggests that XPS (application/vnd.ms-xpsdocument) is supported.

> Apache Tika returns empty string when parsing text from XPS files
> -----------------------------------------------------------------
>
>                 Key: TIKA-2524
>                 URL: https://issues.apache.org/jira/browse/TIKA-2524
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.16
>            Reporter: Peter Davies
>              Labels: features
>
> When we parse XPS files using the AutoDetectParser, we always get an empty string.
> If we use DefaultDetector.detect(), it correctly detects the MediaType as
> "application/vnd.ms-xpsdocument".
> However, this page
> https://tika.apache.org/1.16/formats.html
> suggests that XPS (application/vnd.ms-xpsdocument) is supported.
> Our code:
>         InputStream bis = new BufferedInputStream(
>                 this.getClass().getResourceAsStream("/" +
> EXPECTED_LOCATION + "doc_xps.xps"));
>         Metadata metadata = new Metadata();
>         BodyContentHandler handler = new BodyContentHandler();
>         AutoDetectParser parser = new AutoDetectParser();
>         TikaInputStream tikaStream = TikaInputStream.get(bis);
>         parser.parse(tikaStream, handler, metadata);
>         String parsedText = handler.toString();
> I will attach doc_xps.xps if I can.
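
Since detection succeeds but extraction returns nothing, it may help to rule out the file itself, for example an XPS whose pages contain only images. The sketch below is not part of the original report and does not use Tika. It assumes the usual XPS layout: the file is a ZIP package, pages are stored as parts whose names end in .fpage, and text runs appear in the UnicodeString attribute of Glyphs elements. The class name XpsTextProbe and the method glyphStrings are hypothetical names chosen for illustration, and the code needs Java 9 or later for InputStream.readAllBytes().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Tika: lists the text runs stored in an XPS package.
final class XpsTextProbe {

    /** Collects the UnicodeString attribute of every Glyphs element found in *.fpage parts. */
    public static List<String> glyphStrings(InputStream xps)
            throws IOException, XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Page markup needs neither DTDs nor external entities, so switch both off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        List<String> found = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(xps)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().toLowerCase(Locale.ROOT).endsWith(".fpage")) {
                    continue; // assumption: page parts use the .fpage extension
                }
                // Buffer the entry first so the XML parser cannot close the ZIP stream.
                byte[] page = zip.readAllBytes();
                XMLStreamReader reader =
                        factory.createXMLStreamReader(new ByteArrayInputStream(page));
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "Glyphs".equals(reader.getLocalName())) {
                        String text = reader.getAttributeValue(null, "UnicodeString");
                        if (text != null && !text.isEmpty()) {
                            found.add(text);
                        }
                    }
                }
                reader.close();
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to the XPS file to inspect.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            List<String> strings = glyphStrings(in);
            System.out.println(strings.size() + " text run(s) found");
            strings.forEach(System.out::println);
        }
    }
}
```

If this prints text runs for doc_xps.xps, the file does contain extractable text, which would point at the parsing step rather than at the document. Packages that store a page as several interleaved pieces are not handled by this sketch.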