[ https://issues.apache.org/jira/browse/TIKA-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liyu updated TIKA-3574:
-----------------------
    Description:
Code example:
{code:java}
Parser parser = new AutoDetectParser(tikaConfig);
parser = new RecursiveParserWrapper(parser);
ForkParser forkParser = new ForkParser(parser.getClass().getClassLoader(), parser);
forkParser.setServerParseTimeoutMillis(600000);
forkParser.setServerWaitTimeoutMillis(600000);

// then parse the input stream
BasicContentHandlerFactory factory = new BasicContentHandlerFactory(HANDLER_TYPE.HTML, 104857600);
RecursiveParserWrapperHandler handler = new RecursiveParserWrapperHandler(factory, -1);
Metadata metadata = new Metadata();
ParseContext context = new ParseContext();
try {
    forkParser.parse(inputStream, handler, metadata, context);
} catch (Exception e) {
    // ignored here; the handler is inspected after the call returns
}
{code}
After the fork parser times out, I get the metadata list from handler.getMetadataList().
But handler.getMetadataList().get(0) is not the root metadata of the input stream; it is the metadata of an embedded document of the input stream.
So I can't get the correct Content-Type for the input stream.

Tika version: Apache Tika 1.25

  was: the same description, without the Tika version line.

> After the fork parser timeout, can't get the correct content-type
> ------------------------------------------------------------------
>
>                 Key: TIKA-3574
>                 URL: https://issues.apache.org/jira/browse/TIKA-3574
>             Project: Tika
>          Issue Type: Bug
>            Reporter: liyu
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
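
The symptom described in the report can be guarded against on the caller side. The sketch below is not Tika code: it models each metadata entry as a plain Map so it runs without the library. It assumes that entries for embedded documents carry an "X-TIKA:embedded_resource_path" key and that the root entry does not. That key, the class name RootMetadataCheck, and the idea of detecting the content type up front as a fallback are illustrative assumptions, not something confirmed in this issue.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class RootMetadataCheck {
    // Assumption: embedded-document entries carry this key, the root entry does not.
    static final String EMBEDDED_PATH_KEY = "X-TIKA:embedded_resource_path";
    static final String CONTENT_TYPE_KEY = "Content-Type";

    // Returns the Content-Type of the first entry only if that entry looks like the root.
    static Optional<String> rootContentType(List<Map<String, String>> metadataList) {
        if (metadataList.isEmpty()) {
            return Optional.empty();
        }
        Map<String, String> first = metadataList.get(0);
        if (first.containsKey(EMBEDDED_PATH_KEY)) {
            // The first entry is an embedded document, so the root was never recorded.
            return Optional.empty();
        }
        return Optional.ofNullable(first.get(CONTENT_TYPE_KEY));
    }

    // Falls back to a content type obtained before parsing (hypothetical caller-side value).
    static String resolveContentType(List<Map<String, String>> metadataList, String detectedUpFront) {
        return rootContentType(metadataList).orElse(detectedUpFront);
    }

    public static void main(String[] args) {
        // Models the list after a timeout: only an embedded entry is present.
        List<Map<String, String>> afterTimeout = List.of(
                Map.of(EMBEDDED_PATH_KEY, "/image1.png", CONTENT_TYPE_KEY, "image/png"));
        System.out.println(resolveContentType(afterTimeout, "application/zip"));
    }
}
```

With a check like this, a caller would not mistake an embedded document's Content-Type for the root's after a timeout; whether the marker key is reliable in Tika 1.25 would need to be verified against the library.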