[ https://issues.apache.org/jira/browse/NUTCH-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12478683 ]
Nathan ter Bogt commented on NUTCH-422:
---------------------------------------
Has anyone got the binary version of this module to work? I get to the indexing
part and get the following error:
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
    at org.apache.nutch.indexer.Indexer.index(Indexer.java:296)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:121)
And this is what I get in my hadoop log:
2007-03-07 15:26:33,272 INFO indexer.Indexer - Optimizing index.
2007-03-07 15:26:33,275 WARN mapred.LocalJobRunner - job_qq3l2z
java.lang.NoClassDefFoundError: org/jdom/JDOMException
    at org.apache.nutch.indexer.extra.ExtraIndexingFilter.filter(ExtraIndexingFilter.java:68)
    at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:72)
    at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:235)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:247)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:112)
Any help would be greatly appreciated. Lastly, I'm all for the query-extra
plugin also.
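
For anyone hitting the same trace: a NoClassDefFoundError for org/jdom/JDOMException means the JVM could not load that class when ExtraIndexingFilter ran. A rough way to confirm whether a given classpath contains it is sketched below. The class name JdomClasspathCheck is made up for illustration, and the check only covers the classpath you launch it with. Whether the JDOM jar also needs to sit in the plugin's own folder and be listed in its plugin.xml is an assumption to verify against the plugin's files, not something this thread confirms.

```java
public class JdomClasspathCheck {

    /**
     * Returns true if the named class can be located by the class loader
     * that loaded this class, false otherwise. The class is not initialized.
     */
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, JdomClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace above.
        String name = args.length > 0 ? args[0] : "org.jdom.JDOMException";
        System.out.println(name + (isClassAvailable(name) ? " is" : " is NOT")
                + " loadable from this classpath");
    }
}
```

Run it with the same jars you expect the plugin to see (for example via java -cp). If the class is reported as not loadable, the JDOM jar is missing from that set.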
> index-extra plugin creates additional fields in the index, based on
> configurable logic
> --------------------------------------------------------------------------------------
>
> Key: NUTCH-422
> URL: https://issues.apache.org/jira/browse/NUTCH-422
> Project: Nutch
> Issue Type: New Feature
> Components: indexer
> Affects Versions: 0.8.1
> Environment: All environments
> Reporter: Alan Tanaman
> Assigned To: Sami Siren
> Attachments: index-extra-v1.0-bin-java1.5.zip,
> index-extra-v1.0-source.zip
>
>
> Extract from the Readme file:
> A. Introduction
> The index-extra plugin allows you to configure additional fields that you
> wish to be added to the index, based on one of the following sources:
> - The parsed text
> - Meta data fields
> - Previously created document-to-be-indexed fields
> - Plain constant string
> - Java expression combining one or more of the above, and resolving to
> a string
> A regex can also be applied to any of the above, allowing fields to be
> created based on patterns extracted from the source.
> B. Installation
> 1) Binaries only:
>    - Copy the 'index-extra' folder within index-extra-v1.0-bin-java1.5.zip
>      to NUTCHDIR/build
>    - Copy the 'index-extra-conf.xml' file to NUTCHDIR/conf, and configure
>    - Enable the plugin by updating the nutch-site.xml file
> 2) Source code: Always refer to the Nutch wiki for detailed instructions on
>    building Nutch. In short:
>    - Copy the 'index-extra' folder within index-extra-v1.0-source.zip to
>      NUTCHDIR/src/plugin
>    - Update the build.xml in NUTCHDIR/src/plugin to include the plugin
>    - Update the NUTCHDIR/default.properties file to include the plugin
>    - Run ant to build
>    - Copy the 'index-extra-conf.xml' file to NUTCHDIR/conf, and configure
>    - Enable the plugin by updating the nutch-site.xml file
> C. Known Issues
> 1) For this plugin to work correctly on any document field, the other index
>    filters must run before it, so that all basic document fields have
>    already been generated. To do this, configure the indexingfilter.order
>    property. (Please see patch NUTCH-421 to enable the indexingfilter.order
>    property. If this patch is not applied, the plugin will still work, but
>    will not be able to use document fields created by other index filter
>    plugins.)
> 2) At this stage, field boost cannot be used, because Nutch scoring
>    overrides the field boost with its own document-level boost calculation.
>    This occurs at the end of org.apache.nutch.indexer.Indexer's reduce
>    method.
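
The Readme's introduction says a regex can be applied to a source (parsed text, a metadata field, and so on) to derive a new field value. As a minimal sketch of that idea using only java.util.regex: the class RegexFieldSketch, the method extractField, and the sample text are all hypothetical and are not taken from the plugin's source or its index-extra-conf.xml format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFieldSketch {

    /**
     * Applies the regex to the source text and returns the first capture
     * group of the first match (or the whole match if the pattern has no
     * groups). Returns null when the pattern does not match.
     */
    static String extractField(String source, String regex) {
        Matcher m = Pattern.compile(regex).matcher(source);
        if (!m.find()) {
            return null;
        }
        return m.groupCount() >= 1 ? m.group(1) : m.group();
    }

    public static void main(String[] args) {
        // Hypothetical parsed text standing in for a crawled document.
        String parsedText = "Order ref: ABC-1234 shipped on 2007-03-07";
        // The extracted value would become the content of the extra field.
        System.out.println(extractField(parsedText, "ref: ([A-Z]+-\\d+)")); // prints ABC-1234
    }
}
```

In an actual indexing filter the returned string would be added to the document as a new field; how index-extra does that internally is not shown in this thread.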