sebastian-nagel opened a new pull request, #850:
URL: https://github.com/apache/nutch/pull/850

   - Upgrade to the shaded Tika packages 3.1.0.0 provided by Tim Allison.
     The shaded packages are required to avoid version conflicts when running
     in distributed mode, caused by incompatible versions of the commons-io jar
     shipped with Hadoop and required by Tika, cf. NUTCH-2959.
   - Add "text/javascript" as a MIME type supported by "parse-js". Note: This
     fixes the parse-js unit tests. Tika 3.1.0 identifies the JavaScript test
     document as "text/javascript" instead of "application/javascript".
   
   Todo:
   - [ ] fix unit test o.a.n.parse.tika.TestDOMContentUtils: duplicated
     outlinks
   - [ ] fix unit test o.a.n.parse.tika.TestHtmlParser: parsing a UTF-16
     encoded HTML document fails partially (no title, no keywords). Note: there
     might be two BOMs in the test document.
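
The suspicion about a doubled BOM in the UTF-16 test document could be checked with a few lines of script. The following is only a minimal sketch in Python and not part of this pull request; the function name `count_leading_utf16_boms` is hypothetical and introduced here for illustration.

```python
import codecs

# Both UTF-16 byte order marks (little- and big-endian), two bytes each.
UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def count_leading_utf16_boms(data: bytes) -> int:
    """Count consecutive UTF-16 BOMs at the very start of a byte string.

    Hypothetical helper: it only inspects the raw leading bytes and does
    not validate the rest of the content.
    """
    count = 0
    offset = 0
    # Step through the input two bytes at a time while a BOM is found.
    while data[offset:offset + 2] in UTF16_BOMS:
        count += 1
        offset += 2
    return count
```

To use it, read the test document in binary mode and pass the bytes to the function. A result greater than 1 would support the assumption that the file starts with more than one BOM.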


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.