[jira] Created: (NUTCH-631) MoreIndexingFilter fails with NoSuchElementException
MoreIndexingFilter fails with NoSuchElementException

    Key: NUTCH-631
    URL: https://issues.apache.org/jira/browse/NUTCH-631
    Project: Nutch
    Issue Type: Bug
    Components: indexer
    Affects Versions: 1.0.0
    Environment: Verified on CentOS and OSX
    Reporter: Stefan Will
    Fix For: 1.0.0

I did a simple crawl and started the indexer with the index-more plugin activated. The index job fails with the following stack trace in the task log:

    java.util.NoSuchElementException
        at java.util.TreeMap.key(TreeMap.java:433)
        at java.util.TreeMap.firstKey(TreeMap.java:287)
        at java.util.TreeSet.first(TreeSet.java:407)
        at java.util.Collections$UnmodifiableSortedSet.first(Collections.java:1114)
        at org.apache.nutch.indexer.more.MoreIndexingFilter.addType(MoreIndexingFilter.java:207)
        at org.apache.nutch.indexer.more.MoreIndexingFilter.filter(MoreIndexingFilter.java:90)
        at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:111)
        at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:249)
        at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:52)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:333)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:164)

I traced this down to the part in MoreIndexingFilter where the mime type is split into primary type and subtype for indexing:

    contentType = mimeType.getName();
    String primaryType = mimeType.getSuperType().getName();
    String subType = mimeType.getSubTypes().first().getName();

Apparently Tika does not have a subtype for text/html. Furthermore, the supertype for text/html is set as application/octet-stream, which I doubt is what we want indexed. Don't we want primaryType to be "text" and subType to be "html"?

So I changed the code to:

    contentType = mimeType.getName();
    String[] split = contentType.split("/");
    String primaryType = split[0];
    String subType = (split.length > 1) ? split[1] : null;

This does what I think it should do, but perhaps I'm missing something?
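For anyone who wants to try the string-based split outside of Nutch, here is a minimal standalone sketch of the same idea. The class name MimeParts and its parse method are made up for illustration and are not part of Nutch or Tika; the sketch only assumes that the input is a plain type name such as the one returned by mimeType.getName() above.

```java
/**
 * Hypothetical helper (not part of Nutch or Tika): splits a MIME type
 * name such as "text/html" into a primary type and a subtype.
 */
final class MimeParts {
    final String primaryType;
    final String subType; // null when the name has no subtype part

    private MimeParts(String primaryType, String subType) {
        this.primaryType = primaryType;
        this.subType = subType;
    }

    static MimeParts parse(String contentType) {
        if (contentType == null) {
            return new MimeParts(null, null);
        }
        int slash = contentType.indexOf('/');
        if (slash < 0) {
            // No separator: treat the whole name as the primary type.
            return new MimeParts(contentType, null);
        }
        String sub = contentType.substring(slash + 1);
        // An empty subtype ("text/") is reported as null, matching the
        // behaviour of the split("/") version in the report.
        return new MimeParts(contentType.substring(0, slash),
                             sub.isEmpty() ? null : sub);
    }

    public static void main(String[] args) {
        MimeParts p = MimeParts.parse("text/html");
        System.out.println(p.primaryType + " / " + p.subType);
    }
}
```

Unlike first() on an empty sorted set, this never throws when a type has no registered subtypes; a caller would still need to decide what to index when subType is null.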
Re: Problem compiling plugins
Hi,

You are missing some ant jars. I'm not sure which ones, but it looks like the class that cannot be found is TraXLiaison, so once you google you'll find which optional ant jar this is in. Get that jar, put it in your ant home's lib dir and try again.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Pau <[EMAIL PROTECTED]>
> To: nutch-dev@lucene.apache.org
> Sent: Friday, May 9, 2008 4:32:08 AM
> Subject: Problem compiling plugins
>
> Hello,
> I have to implement a plugin for Nutch 0.9, so I have followed the
> WritingPluginExample-0.9.
> When I try to compile the plugins I get warnings about
> nutch-extensionpoints.jar:
> [jar] Warning: skipping jar archive
> /home/pau/Pau/Master/Tesis/nutch-0.9/build/nutch-extensionpoints/nutch-extensionpoints.jar
> because no files were in
> Why do I get this warning?
>
> Furthermore, when I try to compile the .war file with the command 'ant war',
> I get the following error:
> generate-locale:
>      [echo] Generating docs for locale=ca
>      [xslt] java.lang.ClassNotFoundException: org.apache.tools.ant.taskdefs.optional.TraXLiaison
>      [xslt] at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>      [xslt] at java.security.AccessController.doPrivileged(Native Method)
>      [xslt] at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>      [xslt] at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>      [xslt] at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>      [xslt] at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
>      [xslt] at java.lang.Class.forName0(Native Method)
>      [xslt] at java.lang.Class.forName(Class.java:169)
>      [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.loadClass(XSLTProcess.java:548)
>      [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.resolveProcessor(XSLTProcess.java:533)
>      [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.getLiaison(XSLTProcess.java:785)
>      [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.execute(XSLTProcess.java:300)
>      [xslt] at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
>      [xslt] at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
>      [xslt] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>      [xslt] at java.lang.reflect.Method.invoke(Method.java:597)
>      [xslt] at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:105)
>      [xslt] at org.apache.tools.ant.Task.perform(Task.java:348)
>      [xslt] at org.apache.tools.ant.Target.execute(Target.java:357)
>      [xslt] at org.apache.tools.ant.Target.performTasks(Target.java:385)
>      [xslt] at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1329)
>      [xslt] at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38)
>      [xslt] at org.apache.tools.ant.Project.executeTargets(Project.java:1181)
>      [xslt] at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:416)
>      [xslt] at org.apache.tools.ant.taskdefs.CallTarget.execute(CallTarget.java:105)
>      [xslt] at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
>      [xslt] at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
>      [xslt] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>      [xslt] at java.lang.reflect.Method.invoke(Method.java:597)
>      [xslt] at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:105)
>      [xslt] at org.apache.tools.ant.Task.perform(Task.java:348)
>      [xslt] at org.apache.tools.ant.Target.execute(Target.java:357)
>      [xslt] at org.apache.tools.ant.Target.performTasks(Target.java:385)
>      [xslt] at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1329)
>      [xslt] at org.apache.tools.ant.Project.executeTarget(Project.java:1298)
>      [xslt] at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
>      [xslt] at org.apache.tools.ant.Project.executeTargets(Project.java:1181)
>      [xslt] at org.apache.tools.ant.Main.runBuild(Main.java:698)
>      [xslt] at org.apache.tools.ant.Main.startAnt(Main.java:199)
>      [xslt] at org.apache.tools.ant.launch.Launcher.run(Launcher.java:257)
>      [xslt] at org.apache.tools.ant.launch.Launcher.main(Launcher.java:104)
>
> BUILD FAILED
> /home/pau/Pau/Master/Tesis/nutch-0.9/build.xml:442: The following error
> occurred while executing this line:
> /home/pau/Pau/Master/Tesis/nutch-0.9/build.xml:408:
> java.lang.ClassNotFoundException:
> org.apache.tools.ant.taskdefs.optional.TraXLiaison
>
> Could you please help me with it?
> Thank you very much.
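One way to check whether the jars already in your Ant lib directory contain the missing class is to scan them for the class file. The following is only a rough sketch: the class JarClassFinder and its method are hypothetical names invented here, and the directory to scan is whatever you pass on the command line (for example your Ant home's lib dir). It uses only the standard java.util.jar API.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

/** Hypothetical utility: lists the jars in a directory that contain a class. */
final class JarClassFinder {

    /** Returns the file names of the jars in dir that contain className. */
    static List<String> jarsContaining(File dir, String className) throws IOException {
        // Class files are stored in jars under their slash-separated path.
        String entry = className.replace('.', '/') + ".class";
        List<String> hits = new ArrayList<String>();
        File[] files = dir.listFiles();
        if (files == null) {
            return hits; // not a directory, or not readable
        }
        for (File f : files) {
            if (!f.isFile() || !f.getName().endsWith(".jar")) {
                continue;
            }
            try (JarFile jar = new JarFile(f)) {
                if (jar.getEntry(entry) != null) {
                    hits.add(f.getName());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarClassFinder <lib-dir>");
            return;
        }
        String cls = "org.apache.tools.ant.taskdefs.optional.TraXLiaison";
        System.out.println(jarsContaining(new File(args[0]), cls));
    }
}
```

An empty list means none of the jars in that directory provide the class, which would be consistent with the ClassNotFoundException above.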
Problem compiling plugins
Hello,

I have to implement a plugin for Nutch 0.9, so I have followed the WritingPluginExample-0.9.

When I try to compile the plugins I get warnings about nutch-extensionpoints.jar:

    [jar] Warning: skipping jar archive /home/pau/Pau/Master/Tesis/nutch-0.9/build/nutch-extensionpoints/nutch-extensionpoints.jar because no files were in

Why do I get this warning?

Furthermore, when I try to compile the .war file with the command 'ant war', I get the following error:

    generate-locale:
         [echo] Generating docs for locale=ca
         [xslt] java.lang.ClassNotFoundException: org.apache.tools.ant.taskdefs.optional.TraXLiaison
         [xslt] at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
         [xslt] at java.security.AccessController.doPrivileged(Native Method)
         [xslt] at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
         [xslt] at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
         [xslt] at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
         [xslt] at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
         [xslt] at java.lang.Class.forName0(Native Method)
         [xslt] at java.lang.Class.forName(Class.java:169)
         [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.loadClass(XSLTProcess.java:548)
         [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.resolveProcessor(XSLTProcess.java:533)
         [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.getLiaison(XSLTProcess.java:785)
         [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.execute(XSLTProcess.java:300)
         [xslt] at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
         [xslt] at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
         [xslt] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         [xslt] at java.lang.reflect.Method.invoke(Method.java:597)
         [xslt] at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:105)
         [xslt] at org.apache.tools.ant.Task.perform(Task.java:348)
         [xslt] at org.apache.tools.ant.Target.execute(Target.java:357)
         [xslt] at org.apache.tools.ant.Target.performTasks(Target.java:385)
         [xslt] at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1329)
         [xslt] at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38)
         [xslt] at org.apache.tools.ant.Project.executeTargets(Project.java:1181)
         [xslt] at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:416)
         [xslt] at org.apache.tools.ant.taskdefs.CallTarget.execute(CallTarget.java:105)
         [xslt] at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
         [xslt] at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
         [xslt] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         [xslt] at java.lang.reflect.Method.invoke(Method.java:597)
         [xslt] at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:105)
         [xslt] at org.apache.tools.ant.Task.perform(Task.java:348)
         [xslt] at org.apache.tools.ant.Target.execute(Target.java:357)
         [xslt] at org.apache.tools.ant.Target.performTasks(Target.java:385)
         [xslt] at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1329)
         [xslt] at org.apache.tools.ant.Project.executeTarget(Project.java:1298)
         [xslt] at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
         [xslt] at org.apache.tools.ant.Project.executeTargets(Project.java:1181)
         [xslt] at org.apache.tools.ant.Main.runBuild(Main.java:698)
         [xslt] at org.apache.tools.ant.Main.startAnt(Main.java:199)
         [xslt] at org.apache.tools.ant.launch.Launcher.run(Launcher.java:257)
         [xslt] at org.apache.tools.ant.launch.Launcher.main(Launcher.java:104)

    BUILD FAILED
    /home/pau/Pau/Master/Tesis/nutch-0.9/build.xml:442: The following error occurred while executing this line:
    /home/pau/Pau/Master/Tesis/nutch-0.9/build.xml:408: java.lang.ClassNotFoundException: org.apache.tools.ant.taskdefs.optional.TraXLiaison

Could you please help me with it?
Thank you very much.