[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771170#comment-17771170 ]
Tim Allison edited comment on NUTCH-2959 at 10/2/23 3:51 PM:
-------------------------------------------------------------

I've continued to stub my toes on this this morning. The best option, which I acknowledge might not be acceptable, seems to be to create a separate (temporary!) shim project that shades commons-io for Tika and POI and removes xerces/xml-apis.

The shaded fat tika-app jar didn't work because of xerces/xml-apis. I could have done some ugly jar rewriting in ant to delete org/apache/xerces etc., but that felt really awful.

The current shim project is here: https://github.com/tballison/hadoop-safe-tika

If this is something we want to pursue, I can run through the full tests etc and then publish to maven central. I also have to add the language detector. The repo is purely proof of concept and shouldn't even be built/tested locally yet.

The goal would be to use this until Apache Tika, Apache POI and Apache Hadoop can all get to a compatible version of commons-io.

This solution would allow us to avoid the messy shading of commons-io in tika-app on the actual Apache Tika project.

WDYT?


was (Author: talli...@mitre.org):
    I've continued to stub my toes on this this morning. The best option, which I realize might not be acceptable, seems to be to create a separate (temporary!) shim project that shades commons-io for Tika and POI and removes xerces/xml-apis.

The shaded fat tika-app jar didn't work because of xerces/xml-apis.

The current shim project is here: https://github.com/tballison/hadoop-safe-tika

If this is something we want to pursue, I can run through the full tests etc and then publish to maven central. I also have to add the language detector. The repo is purely proof of concept and shouldn't even be built/tested locally yet.

The goal would be to use this until Apache Tika, Apache POI and Apache Hadoop can all get to a compatible version of commons-io.

This solution would allow us to avoid the messy shading of commons-io in tika-app on the actual Apache Tika project.

WDYT?

> Upgrade to Apache Tika 2.9.0
> ----------------------------
>
>                 Key: NUTCH-2959
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2959
>             Project: Nutch
>          Issue Type: Task
>   Affects Versions: 1.19
>            Reporter: Markus Jelsma
>            Priority: Major
>             Fix For: 1.20
>
>         Attachments: NUTCH-2959.patch
>
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)