Re: Issues pending before 0.9 release
Sami Siren wrote:
It would be more beneficial to everybody if the discussions (related to the release or Nutch) were done in public (hey, this is open source!). The off-list stuff IMO smells.

+1

Folks sometimes wish to discuss project matters off-list to spare others the boring details, but this is usually a bad idea. All project decisions should be made in public on this list. Discussions relevant to these decisions are thus also best held on this list, since they explain the decision. Private discussions are permissible to develop a proposal, but that is usually better done on-list when possible, so that others can get involved earlier. (The one notable exception is that personnel issues are discussed on the private PMC list.)

Doug
Re: FW: Nutch release process help
Chris Mattmann wrote:
It's too bad that this has turned out to be an issue that I've handled incorrectly, and for that, I apologize.

Sorry if I blew this out of proportion. We all help each other run this project. I don't think any grave error was made. I just saw an opportunity to remind folks to try to keep project discussions public, and did not mean to rebuke you.

I am thrilled that you want to take on the responsibility of making a release. I very much do not want to damp your enthusiasm for that.

As you probably know, the release documentation is at:

http://wiki.apache.org/nutch/Release_HOWTO

This may need to be updated. You might also look at the release documentation for other projects, to get ideas:

http://wiki.apache.org/lucene-hadoop/HowToRelease
http://wiki.apache.org/solr/HowToRelease
http://wiki.apache.org/jakarta-lucene/ReleaseTodo

Cheers,

Doug
Re: FW: Nutch release process help
Chris,

I have documented the process in the wiki, and Doug has already sent the links. If you have any questions, I would be willing to help. I can even do it myself if you find it difficult; I simply do not want to be the bottleneck, as I am behind schedule at work and in my private life. I still hope I will be able to become more active in the Nutch community in the future.

Regards
Pitor

On 3/6/07, Doug Cutting [EMAIL PROTECTED] wrote:
[...]
[jira] Commented: (NUTCH-422) index-extra plugin creates additional fields in the index, based on configurable logic
[ https://issues.apache.org/jira/browse/NUTCH-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12478683 ]

Nathan ter Bogt commented on NUTCH-422:
---

Has anyone got the binary version of this module to work? I get to the indexing part and get the following error:

Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
    at org.apache.nutch.indexer.Indexer.index(Indexer.java:296)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:121)

And this is what I get in my Hadoop log:

2007-03-07 15:26:33,272 INFO indexer.Indexer - Optimizing index.
2007-03-07 15:26:33,275 WARN mapred.LocalJobRunner - job_qq3l2z
java.lang.NoClassDefFoundError: org/jdom/JDOMException
    at org.apache.nutch.indexer.extra.ExtraIndexingFilter.filter(ExtraIndexingFilter.java:68)
    at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:72)
    at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:235)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:247)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:112)

Any help would be greatly appreciated. Lastly, I'm all for the query-extra plugin too.

index-extra plugin creates additional fields in the index, based on configurable logic
--

Key: NUTCH-422
URL: https://issues.apache.org/jira/browse/NUTCH-422
Project: Nutch
Issue Type: New Feature
Components: indexer
Affects Versions: 0.8.1
Environment: All environments
Reporter: Alan Tanaman
Assigned To: Sami Siren
Attachments: index-extra-v1.0-bin-java1.5.zip, index-extra-v1.0-source.zip

Extract from the Readme file:

A. Introduction

The index-extra plugin allows you to configure additional fields that you wish to be added to the index, based on one of the following sources:
- The parsed text
- Metadata fields
- Previously created document-to-be-indexed fields
- Plain constant string
- Java expression combining one or more of the above, and resolving to a string

A regex can also be applied to any of the above, allowing fields to be created based on patterns extracted from the source.

B. Installation

1) Binaries only:
- Copy the 'index-extra' folder within index-extra-v1.0-bin-java1.5.zip to NUTCHDIR/build
- Copy the 'index-extra-conf.xml' file to NUTCHDIR/conf, and configure it
- Enable the plugin by updating the nutch-site.xml file

2) Source code: Always refer to the Nutch wiki for detailed instructions on building Nutch. In short:
- Copy the 'index-extra' folder within index-extra-v1.0-source.zip to NUTCHDIR/src/plugin
- Update the build.xml in NUTCHDIR/src/plugin to include the plugin
- Update the NUTCHDIR/default.properties file to include the plugin
- Run ant to build
- Copy the 'index-extra-conf.xml' file to NUTCHDIR/conf, and configure it
- Enable the plugin by updating the nutch-site.xml file

C. Known Issues

1) For this plugin to work correctly on any document field, the other index filters must run first, so that all basic document fields are generated first. To do this, configure the indexingfilter.order property. (See patch NUTCH-421 to enable the indexingfilter.order property. If this patch is not applied, the plugin will still work, but will not be able to use document fields created by other index filter plugins.)

2) At this stage, field boost cannot be used, as Nutch scoring overrides the field boost with its own document-level boost calculation. This occurs at the end of the reduce method of org.apache.nutch.indexer.Indexer.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
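The Introduction above says a regex can be applied to any source so that a field is built from a pattern extracted from it. As a rough illustration of that idea only (this is not the plugin's own code, which ships in index-extra-v1.0-source.zip), here is a minimal Java sketch. It assumes that the regex's first capture group, or the whole match when the regex has no groups, becomes the field value. The class ExtraFieldSketch, the helper extractField, the sample text and the "refnumber" field name are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtraFieldSketch {

    // Hypothetical helper, not part of the index-extra plugin: applies `regex` to
    // `source` and returns the first capture group of the first match, the whole
    // match if the regex has no groups, or null if nothing matches.
    static String extractField(String source, String regex) {
        if (source == null) {
            return null;
        }
        Matcher m = Pattern.compile(regex).matcher(source);
        if (!m.find()) {
            return null;
        }
        return m.groupCount() > 0 ? m.group(1) : m.group();
    }

    public static void main(String[] args) {
        // Stand-in for a document's parsed text (made-up sample data).
        String parsedText = "Quarterly report. Ref: ACME-2007-042. Internal use only.";
        String value = extractField(parsedText, "Ref:\\s*([A-Z]+-\\d{4}-\\d+)");
        // A real indexing filter would add this to the document as a new field;
        // this sketch only prints it.
        System.out.println("refnumber=" + value); // prints "refnumber=ACME-2007-042"
    }
}
```

Returning null when nothing matches lets a caller skip adding the field for documents that do not contain the pattern, rather than indexing an empty value.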
[jira] Commented: (NUTCH-422) index-extra plugin creates additional fields in the index, based on configurable logic
[ https://issues.apache.org/jira/browse/NUTCH-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12478688 ]

Nathan ter Bogt commented on NUTCH-422:
---

Sorry all, I managed to get this working. Just had some issues with the jdom library (or lack thereof). I must have just misread the error earlier. Fantastic plugin idea too, thanks!
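Both the NoClassDefFoundError for org/jdom/JDOMException in the first report and Nathan's follow-up point to the jdom library not being visible at run time. One quick way to check whether a class can be loaded is a small Java probe like the sketch below. The class ClassPresenceCheck and its method isLoadable are hypothetical names. One caveat: as far as I know, Nutch loads each plugin through its own class loader, built from the libraries declared in that plugin's plugin.xml. A class that the probe finds on the plain JVM classpath is therefore not proof that the plugin can see it. If the probe finds the class but indexing still fails, the jar is probably not present in, or not declared for, the plugin's own folder.

```java
public class ClassPresenceCheck {

    // Returns true if the named class can be loaded (without running its static
    // initializers) by the class loader that loaded this probe.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClassPresenceCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false; // not on this loader's classpath at all
        } catch (LinkageError e) {
            return false; // found, but could not be loaded (e.g. a class it needs is missing)
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the NoClassDefFoundError from the report.
        String[] names = args.length > 0 ? args : new String[] {"org.jdom.JDOMException"};
        for (String name : names) {
            System.out.println(name + (isLoadable(name) ? ": found" : ": NOT found"));
        }
    }
}
```

Run it with the same classpath you use for Nutch, for example `java -cp <your classpath> ClassPresenceCheck org.jdom.JDOMException`.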