Hello,

I am getting the error below *[0]* while parsing an image. It seems Tika is detecting the content at the URL (http://www.sturmgewehr.com/forums/uploads/monthly_2016_01/412098676.jpg.41e2d3562701152834b1c10b068388e3.thumb.jpg.fe9b6fad3ae9d371830b52db8c271189.jpg) as application/gzip instead of image/jpeg.
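To see whether the fetched content really starts with a gzip header rather than a JPEG one, a quick look at the leading bytes of a saved copy can help. The sketch below is only an illustration and is not how Tika performs detection; the class name `MagicSniffer` and the method `sniff` are hypothetical, and it checks nothing beyond the standard JPEG (FF D8 FF) and gzip (1F 8B) signatures.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Nutch or Tika.
class MagicSniffer {

    // Best-guess MIME type based only on the leading bytes of the content.
    static String sniff(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xFF
                && (head[1] & 0xFF) == 0xD8
                && (head[2] & 0xFF) == 0xFF) {
            return "image/jpeg";            // JPEG start-of-image marker
        }
        if (head.length >= 2
                && (head[0] & 0xFF) == 0x1F
                && (head[1] & 0xFF) == 0x8B) {
            return "application/gzip";      // gzip magic number
        }
        return "application/octet-stream";  // unknown to this sketch
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java MagicSniffer <file>");
            return;
        }
        byte[] head = new byte[4];
        int n = 0;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // Loop because a single read() may return fewer bytes than requested.
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
        }
        System.out.println(sniff(Arrays.copyOf(head, n)));
    }
}
```

Running it against a locally saved copy of the image (for example, `java MagicSniffer saved.jpg`, where `saved.jpg` is a placeholder file name) would show whether the stored bytes are gzip-framed or a plain JPEG.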
Can anyone shed some light on this? Or please confirm if it is a bug. Meanwhile, I will be looking into the code to see what is going wrong. I am working on the latest build.

*[0]*:
2016-03-31 02:20:29,980 WARN parse.ParseUtil - Error parsing http://www.sturmgewehr.com/forums/uploads/monthly_2016_01/412098676.jpg.41e2d3562701152834b1c10b068388e3.thumb.jpg.fe9b6fad3ae9d371830b52db8c271189.jpg with org.apache.nutch.parse.tika.TikaParser@48c56835
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.commons.compress.compressors.CompressorStreamFactory.<init>(Z)V
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:202)
    at org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:171)
    at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:95)
    at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:104)
    at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:45)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodError: org.apache.commons.compress.compressors.CompressorStreamFactory.<init>(Z)V
    at org.apache.tika.parser.pkg.CompressorParser.parse(CompressorParser.java:120)
    at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:132)
    at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:35)
    at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:24)
    ... 4 more
2016-03-31 02:20:29,980 WARN parse.ParseUtil - Unable to successfully parse content http://www.sturmgewehr.com/forums/uploads/monthly_2016_01/412098676.jpg.41e2d3562701152834b1c10b068388e3.thumb.jpg.fe9b6fad3ae9d371830b52db8c271189.jpg of type application/gzip
2016-03-31 02:20:29,980 WARN parse.ParseSegment - Error parsing: http://www.sturmgewehr.com/forums/uploads/monthly_2016_01/412098676.jpg.41e2d3562701152834b1c10b068388e3.thumb.jpg.fe9b6fad3ae9d371830b52db8c271189.jpg: failed(2,200): org.apache.nutch.parse.ParseException: Unable to successfully parse content
2016-03-31 02:20:29,981 INFO cosine.CosineSimilarity - Setting score of http://www.sturmgewehr.com/forums/uploads/monthly_2016_01/412098676.jpg.41e2d3562701152834b1c10b068388e3.thumb.jpg.fe9b6fad3ae9d371830b52db8c271189.jpg to 0.0
2016-03-31 02:20:29,981 INFO parse.ParseSegment - Parsed (19ms): http://www.sturmgewehr.com/forums/uploads/monthly_2016_01/412098676.jpg.41e2d3562701152834b1c10b068388e3.thumb.jpg.fe9b6fad3ae9d371830b52db8c271189.jpg

Thanks & Regards,
Karanjeet Singh
CS Graduate Student
University of Southern California
karan...@usc.edu | +1-213-675-9583