[jira] Created: (NUTCH-631) MoreIndexingFilter fails with NoSuchElementException

2008-05-09 Thread Stefan Will (JIRA)
MoreIndexingFilter fails with NoSuchElementException


 Key: NUTCH-631
 URL: https://issues.apache.org/jira/browse/NUTCH-631
 Project: Nutch
  Issue Type: Bug
  Components: indexer
Affects Versions: 1.0.0
 Environment: Verified on CentOS and OSX
Reporter: Stefan Will
 Fix For: 1.0.0


I did a simple crawl and started the indexer with the index-more plugin 
activated. The index job fails with the following stack trace in the task log:

java.util.NoSuchElementException
    at java.util.TreeMap.key(TreeMap.java:433)
    at java.util.TreeMap.firstKey(TreeMap.java:287)
    at java.util.TreeSet.first(TreeSet.java:407)
    at java.util.Collections$UnmodifiableSortedSet.first(Collections.java:1114)
    at org.apache.nutch.indexer.more.MoreIndexingFilter.addType(MoreIndexingFilter.java:207)
    at org.apache.nutch.indexer.more.MoreIndexingFilter.filter(MoreIndexingFilter.java:90)
    at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:111)
    at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:249)
    at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:52)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:333)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:164)

I traced this down to the part in MoreIndexingFilter where the mime type is 
split into primary type and subtype for indexing:

contentType = mimeType.getName();
String primaryType = mimeType.getSuperType().getName();
String subType = mimeType.getSubTypes().first().getName();

Apparently Tika does not define any subtypes for text/html, so getSubTypes() 
returns an empty set and first() throws. Furthermore, the supertype for 
text/html is set to application/octet-stream, which I doubt is what we want 
indexed. Don't we want primaryType to be "text" and subType to be "html"?

So I changed the code to:

contentType = mimeType.getName();
String[] split = contentType.split("/");
String primaryType = split[0];
String subType = (split.length > 1) ? split[1] : null;

This does what I think it should do, but perhaps I'm missing something?
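
For anyone who wants to try the string-based split outside of Nutch, here is a minimal standalone sketch. The class name MimeTypeSplit and the helper splitMimeType are made up for illustration and are not part of Nutch or Tika. Stripping parameters such as "; charset=UTF-8" is an extra assumption on my part; the change proposed above does not do that.

```java
public class MimeTypeSplit {

    /**
     * Splits a content type into {primaryType, subType}.
     * subType is null when the input contains no '/'.
     * Hypothetical helper for illustration; not Nutch or Tika API.
     */
    public static String[] splitMimeType(String contentType) {
        // Assumption: drop optional parameters such as "; charset=UTF-8".
        int semi = contentType.indexOf(';');
        String base = (semi >= 0 ? contentType.substring(0, semi) : contentType).trim();

        int slash = base.indexOf('/');
        if (slash < 0) {
            // No subtype present: report only the primary type.
            return new String[] { base, null };
        }
        return new String[] { base.substring(0, slash), base.substring(slash + 1) };
    }

    public static void main(String[] args) {
        String[] parts = splitMimeType("text/html; charset=UTF-8");
        System.out.println("primaryType=" + parts[0] + ", subType=" + parts[1]);
    }
}
```

Unlike the getSubTypes().first() call, this never touches a possibly empty set, so a type with no registered subtypes cannot trigger the NoSuchElementException.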

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Problem compiling plugins

2008-05-09 Thread ogjunk-nutch
Hi,

You are missing some ant jars.  I'm not sure which ones, but it looks like the 
class that cannot be found is TraXLiaison, so a quick search should tell you 
which optional ant jar contains it.  Get that jar, put it in the lib dir under 
your ant home, and try again.
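
If searching the web does not settle it, the jars already on disk can be scanned for the class. The following is a rough sketch only; the class name FindClassInJars and its methods are invented for this example. It looks at the jars directly inside one directory (for instance the lib dir of an Ant install) and lists those that contain a given class. In Ant 1.7-era distributions the class is usually found in ant-trax.jar, but treat that as an assumption to verify against your own install.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

public class FindClassInJars {

    /** Converts a fully qualified class name to its entry path inside a jar. */
    public static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns the jars directly inside dir that contain the given class. */
    public static List<File> findJars(File dir, String className) throws IOException {
        List<File> hits = new ArrayList<>();
        File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (jars == null) {
            return hits; // dir does not exist or is not a readable directory
        }
        String entry = toEntryName(className);
        for (File f : jars) {
            try (JarFile jar = new JarFile(f)) {
                if (jar.getEntry(entry) != null) {
                    hits.add(f);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java FindClassInJars <dir> <fully.qualified.ClassName>");
            return;
        }
        for (File f : findJars(new File(args[0]), args[1])) {
            System.out.println(f.getPath());
        }
    }
}
```

Run it with the directory to search and org.apache.tools.ant.taskdefs.optional.TraXLiaison as arguments. If nothing is printed, none of the jars in that directory contain the class.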

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message -----
> From: Pau <[EMAIL PROTECTED]>
> To: nutch-dev@lucene.apache.org
> Sent: Friday, May 9, 2008 4:32:08 AM
> Subject: Problem compiling plugins
> 
> Hello,
> I have to implement a plugin for Nutch 0.9, so I have followed the
> WritingPluginExample-0.9.
> When I try to compile the plugins I get warnings about
> nutch-extensionpoints.jar:
>   [jar] Warning: skipping jar archive /home/pau/Pau/Master/Tesis/nutch-0.9/build/nutch-extensionpoints/nutch-extensionpoints.jar because no files were included.
> Why do I get this warning?
> 
> Furthermore, when I try to compile the .war file with the command 'ant war',
> I get the following error:
> generate-locale:
>  [echo] Generating docs for locale=ca
>  [xslt] java.lang.ClassNotFoundException: org.apache.tools.ant.taskdefs.optional.TraXLiaison
>  [xslt] at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>  [xslt] at java.security.AccessController.doPrivileged(Native Method)
>  [xslt] at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>  [xslt] at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>  [xslt] at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>  [xslt] at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
>  [xslt] at java.lang.Class.forName0(Native Method)
>  [xslt] at java.lang.Class.forName(Class.java:169)
>  [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.loadClass(XSLTProcess.java:548)
>  [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.resolveProcessor(XSLTProcess.java:533)
>  [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.getLiaison(XSLTProcess.java:785)
>  [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.execute(XSLTProcess.java:300)
>  [xslt] at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
>  [xslt] at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
>  [xslt] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>  [xslt] at java.lang.reflect.Method.invoke(Method.java:597)
>  [xslt] at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:105)
>  [xslt] at org.apache.tools.ant.Task.perform(Task.java:348)
>  [xslt] at org.apache.tools.ant.Target.execute(Target.java:357)
>  [xslt] at org.apache.tools.ant.Target.performTasks(Target.java:385)
>  [xslt] at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1329)
>  [xslt] at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38)
>  [xslt] at org.apache.tools.ant.Project.executeTargets(Project.java:1181)
>  [xslt] at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:416)
>  [xslt] at org.apache.tools.ant.taskdefs.CallTarget.execute(CallTarget.java:105)
>  [xslt] at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
>  [xslt] at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
>  [xslt] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>  [xslt] at java.lang.reflect.Method.invoke(Method.java:597)
>  [xslt] at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:105)
>  [xslt] at org.apache.tools.ant.Task.perform(Task.java:348)
>  [xslt] at org.apache.tools.ant.Target.execute(Target.java:357)
>  [xslt] at org.apache.tools.ant.Target.performTasks(Target.java:385)
>  [xslt] at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1329)
>  [xslt] at org.apache.tools.ant.Project.executeTarget(Project.java:1298)
>  [xslt] at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
>  [xslt] at org.apache.tools.ant.Project.executeTargets(Project.java:1181)
>  [xslt] at org.apache.tools.ant.Main.runBuild(Main.java:698)
>  [xslt] at org.apache.tools.ant.Main.startAnt(Main.java:199)
>  [xslt] at org.apache.tools.ant.launch.Launcher.run(Launcher.java:257)
>  [xslt] at org.apache.tools.ant.launch.Launcher.main(Launcher.java:104)
> 
> BUILD FAILED
> /home/pau/Pau/Master/Tesis/nutch-0.9/build.xml:442: The following error occurred while executing this line:
> /home/pau/Pau/Master/Tesis/nutch-0.9/build.xml:408: java.lang.ClassNotFoundException: org.apache.tools.ant.taskdefs.optional.TraXLiaison
> 
> Could you please help me with it?
> Thank you very much.



Problem compiling plugins

2008-05-09 Thread Pau
Hello,
I have to implement a plugin for Nutch 0.9, so I have followed the
WritingPluginExample-0.9.
When I try to compile the plugins I get warnings about
nutch-extensionpoints.jar:
  [jar] Warning: skipping jar archive /home/pau/Pau/Master/Tesis/nutch-0.9/build/nutch-extensionpoints/nutch-extensionpoints.jar because no files were included.
Why do I get this warning?

Furthermore, when I try to compile the .war file with the command 'ant war',
I get the following error:
generate-locale:
 [echo] Generating docs for locale=ca
 [xslt] java.lang.ClassNotFoundException: org.apache.tools.ant.taskdefs.optional.TraXLiaison
 [xslt] at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
 [xslt] at java.security.AccessController.doPrivileged(Native Method)
 [xslt] at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
 [xslt] at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 [xslt] at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
 [xslt] at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
 [xslt] at java.lang.Class.forName0(Native Method)
 [xslt] at java.lang.Class.forName(Class.java:169)
 [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.loadClass(XSLTProcess.java:548)
 [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.resolveProcessor(XSLTProcess.java:533)
 [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.getLiaison(XSLTProcess.java:785)
 [xslt] at org.apache.tools.ant.taskdefs.XSLTProcess.execute(XSLTProcess.java:300)
 [xslt] at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
 [xslt] at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
 [xslt] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 [xslt] at java.lang.reflect.Method.invoke(Method.java:597)
 [xslt] at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:105)
 [xslt] at org.apache.tools.ant.Task.perform(Task.java:348)
 [xslt] at org.apache.tools.ant.Target.execute(Target.java:357)
 [xslt] at org.apache.tools.ant.Target.performTasks(Target.java:385)
 [xslt] at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1329)
 [xslt] at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38)
 [xslt] at org.apache.tools.ant.Project.executeTargets(Project.java:1181)
 [xslt] at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:416)
 [xslt] at org.apache.tools.ant.taskdefs.CallTarget.execute(CallTarget.java:105)
 [xslt] at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:288)
 [xslt] at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
 [xslt] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 [xslt] at java.lang.reflect.Method.invoke(Method.java:597)
 [xslt] at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:105)
 [xslt] at org.apache.tools.ant.Task.perform(Task.java:348)
 [xslt] at org.apache.tools.ant.Target.execute(Target.java:357)
 [xslt] at org.apache.tools.ant.Target.performTasks(Target.java:385)
 [xslt] at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1329)
 [xslt] at org.apache.tools.ant.Project.executeTarget(Project.java:1298)
 [xslt] at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
 [xslt] at org.apache.tools.ant.Project.executeTargets(Project.java:1181)
 [xslt] at org.apache.tools.ant.Main.runBuild(Main.java:698)
 [xslt] at org.apache.tools.ant.Main.startAnt(Main.java:199)
 [xslt] at org.apache.tools.ant.launch.Launcher.run(Launcher.java:257)
 [xslt] at org.apache.tools.ant.launch.Launcher.main(Launcher.java:104)

BUILD FAILED
/home/pau/Pau/Master/Tesis/nutch-0.9/build.xml:442: The following error occurred while executing this line:
/home/pau/Pau/Master/Tesis/nutch-0.9/build.xml:408: java.lang.ClassNotFoundException: org.apache.tools.ant.taskdefs.optional.TraXLiaison

Could you please help me with it?
Thank you very much.