sebastian-nagel opened a new pull request, #850: URL: https://github.com/apache/nutch/pull/850
- Upgrade to the shaded Tika packages 3.1.0.0 provided by Tim Allison. The shaded packages are required to avoid version conflicts when running in distributed mode: the commons-io jar shipped with Hadoop is incompatible with the version required by Tika, cf. NUTCH-2959.
- Add "text/javascript" as a MIME type supported by "parse-js". Note: this fixes the parse-js unit tests. Tika 3.1.0 identifies the JavaScript test document as "text/javascript" instead of "application/javascript".

Todo:
- [ ] fix unit test o.a.n.parse.tika.TestDOMContentUtils: duplicated outlinks
- [ ] fix unit test o.a.n.parse.tika.TestHtmlParser: parsing a UTF-16 encoded HTML document fails partially (no title, no keywords). Note: the test document might contain two BOMs.
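
One way to check the suspicion about two BOMs in the UTF-16 test document is to count the byte order marks at the start of the file. The following is only a minimal sketch for that check, not code from this pull request: the class name `BomCheck` and the method `countLeadingUtf16Boms` are hypothetical, and the path to the test document has to be passed on the command line.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Nutch or Tika.
final class BomCheck {

    /** Counts consecutive UTF-16 BOMs (FE FF or FF FE) at the start of the data. */
    static int countLeadingUtf16Boms(byte[] data) {
        int count = 0;
        // Walk the data in 2-byte steps until a pair is not a BOM.
        for (int i = 0; i + 1 < data.length; i += 2) {
            int b0 = data[i] & 0xFF;
            int b1 = data[i + 1] & 0xFF;
            boolean bigEndian = b0 == 0xFE && b1 == 0xFF;
            boolean littleEndian = b0 == 0xFF && b1 == 0xFE;
            if (!bigEndian && !littleEndian) {
                break;
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java BomCheck <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("leading UTF-16 BOMs: " + countLeadingUtf16Boms(data));
    }
}
```

A count of 2 or more would support the double-BOM explanation. A count of 1 would point to a different cause for the missing title and keywords.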