Hello,

I am getting the error below *[0]* while parsing an image. It seems Tika is
detecting the content at this URL (
http://www.sturmgewehr.com/forums/uploads/monthly_2016_01/412098676.jpg.41e2d3562701152834b1c10b068388e3.thumb.jpg.fe9b6fad3ae9d371830b52db8c271189.jpg)
as application/gzip instead of image/jpeg.
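
To rule out the server actually returning gzip-compressed bytes (for example, a body sent with a gzip Content-Encoding and stored without being decoded, which is only a guess on my part), the leading bytes of the fetched content can be compared against the well-known gzip (1F 8B) and JPEG (FF D8 FF) signatures. Below is a minimal sketch of that check. `MagicSniffer`, `sniff`, and `readHead` are hypothetical names, not part of Nutch or Tika, and the sketch assumes the raw response body has been saved to a local file first.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Nutch or Tika.
class MagicSniffer {

    /** Guesses a MIME type from the leading bytes; returns null if neither signature matches. */
    static String sniff(byte[] head) {
        // gzip streams start with the two bytes 1F 8B
        if (head.length >= 2
                && (head[0] & 0xFF) == 0x1F
                && (head[1] & 0xFF) == 0x8B) {
            return "application/gzip";
        }
        // JPEG files start with the three bytes FF D8 FF
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xFF
                && (head[1] & 0xFF) == 0xD8
                && (head[2] & 0xFF) == 0xFF) {
            return "image/jpeg";
        }
        return null;
    }

    /** Reads up to max bytes from the start of the file at path. */
    static byte[] readHead(String path, int max) throws IOException {
        byte[] buf = new byte[max];
        int total = 0;
        try (InputStream in = new FileInputStream(path)) {
            while (total < max) {
                int n = in.read(buf, total, max - total);
                if (n < 0) {
                    break; // end of file before max bytes were read
                }
                total += n;
            }
        }
        return Arrays.copyOf(buf, total);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java MagicSniffer <file>");
            return;
        }
        String type = sniff(readHead(args[0], 3));
        System.out.println(type == null ? "unknown signature" : type);
    }
}
```

If this reports application/gzip for the saved body, the detection itself would be correct and the question becomes why the bytes are compressed; if it reports image/jpeg, the detection step deserves a closer look.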

Can anyone shed some light on this, or confirm whether it is a bug?
Meanwhile, I will look into the code to see what is going wrong. I am
working with the latest build.

*[0]*:

2016-03-31 02:20:29,980 WARN  parse.ParseUtil - Error parsing http://www.sturmgewehr.com/forums/uploads/monthly_2016_01/412098676.jpg.41e2d3562701152834b1c10b068388e3.thumb.jpg.fe9b6fad3ae9d371830b52db8c271189.jpg with org.apache.nutch.parse.tika.TikaParser@48c56835
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.commons.compress.compressors.CompressorStreamFactory.<init>(Z)V
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:202)
    at org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:171)
    at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:95)
    at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:104)
    at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:45)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodError: org.apache.commons.compress.compressors.CompressorStreamFactory.<init>(Z)V
    at org.apache.tika.parser.pkg.CompressorParser.parse(CompressorParser.java:120)
    at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:132)
    at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:35)
    at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:24)
    ... 4 more
2016-03-31 02:20:29,980 WARN  parse.ParseUtil - Unable to successfully parse content http://www.sturmgewehr.com/forums/uploads/monthly_2016_01/412098676.jpg.41e2d3562701152834b1c10b068388e3.thumb.jpg.fe9b6fad3ae9d371830b52db8c271189.jpg of type application/gzip
2016-03-31 02:20:29,980 WARN  parse.ParseSegment - Error parsing: http://www.sturmgewehr.com/forums/uploads/monthly_2016_01/412098676.jpg.41e2d3562701152834b1c10b068388e3.thumb.jpg.fe9b6fad3ae9d371830b52db8c271189.jpg: failed(2,200): org.apache.nutch.parse.ParseException: Unable to successfully parse content
2016-03-31 02:20:29,981 INFO  cosine.CosineSimilarity - Setting score of http://www.sturmgewehr.com/forums/uploads/monthly_2016_01/412098676.jpg.41e2d3562701152834b1c10b068388e3.thumb.jpg.fe9b6fad3ae9d371830b52db8c271189.jpg to 0.0
2016-03-31 02:20:29,981 INFO  parse.ParseSegment - Parsed (19ms): http://www.sturmgewehr.com/forums/uploads/monthly_2016_01/412098676.jpg.41e2d3562701152834b1c10b068388e3.thumb.jpg.fe9b6fad3ae9d371830b52db8c271189.jpg
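
For what it is worth, the descriptor `<init>(Z)V` in the NoSuchMethodError refers to a constructor taking a single boolean, so the commons-compress jar being loaded may differ from the one Tika was compiled against. That is my assumption and not yet confirmed. Here is a minimal sketch for checking it via reflection. `ConstructorCheck`, `hasConstructor`, and `loadedFrom` are hypothetical names, and the result only means something when the class is run with the same classpath as the parse-tika plugin.

```java
import java.security.CodeSource;

// Hypothetical diagnostic for illustration; not part of Nutch, Tika, or Commons Compress.
class ConstructorCheck {

    /** True if className can be loaded and has a public constructor with these parameter types. */
    static boolean hasConstructor(String className, Class<?>... paramTypes) {
        try {
            Class.forName(className).getConstructor(paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    /** Jar or directory the class was loaded from, or null if unknown or not loadable. */
    static String loadedFrom(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            return src == null ? null : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace above.
        String name = "org.apache.commons.compress.compressors.CompressorStreamFactory";
        System.out.println("loaded from: " + loadedFrom(name));
        System.out.println("has constructor (boolean): " + hasConstructor(name, boolean.class));
    }
}
```

If the second line prints false while the first names a jar, that jar would be the one to compare against the commons-compress version Tika expects.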

Thanks & Regards,
Karanjeet Singh
CS Graduate Student
University of Southern California
karan...@usc.edu | +1-213-675-9583