Re: Issues pending before 0.9 release

2007-03-06 Thread Doug Cutting

Sami Siren wrote:

It would be more beneficial to everybody if the discussions (related to the
release or Nutch) were held in public (hey, this is open source!). The
off-list stuff IMO smells.


+1  Folks sometimes wish to discuss project matters off-list to spare 
others the boring details, but this is usually a bad idea.  All project 
decisions should be made in public on this list.  Discussions relevant 
to these decisions are also thus best made on this list, since they 
explain the decision.  Private discussions are permissible to develop a 
proposal, but that is usually better done on-list when possible, so that 
others can get involved earlier.


(The one notable exception is that personnel issues are discussed on the 
private PMC list.)


Doug


Re: FW: Nutch release process help

2007-03-06 Thread Doug Cutting

Chris Mattmann wrote:

It's too bad that
this has turned out to be an issue that I've handled incorrectly, and for
that, I apologize.


Sorry if I blew this out of proportion.  We all help each other run this 
project.  I don't think any grave error was made.  I just saw an 
opportunity to remind folks to try to keep project discussions public, 
and did not mean to rebuke you.


I am thrilled that you want to take on the responsibility of making a 
release.  I very much do not want to damp your enthusiasm for that.


As you probably know, the release documentation is at:

http://wiki.apache.org/nutch/Release_HOWTO

This may need to be updated.  You might also look at the release 
documentation for other projects, to get ideas.


http://wiki.apache.org/lucene-hadoop/HowToRelease
http://wiki.apache.org/solr/HowToRelease
http://wiki.apache.org/jakarta-lucene/ReleaseTodo

Cheers,

Doug


Re: FW: Nutch release process help

2007-03-06 Thread Piotr Kosiorowski

Chris,
I have documented the process in the wiki. Doug has already sent the
links. If you have any questions, I would be willing to help. I can
even do it myself if you find it difficult - I simply do not want to be
the bottleneck, as I am behind schedule at work and in private life.
I still hope to be more active in the Nutch community in the future.
Regards,
Piotr




[jira] Commented: (NUTCH-422) index-extra plugin creates additional fields in the index, based on configurable logic

2007-03-06 Thread Nathan ter Bogt (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12478683 ]

Nathan ter Bogt commented on NUTCH-422:
---

Has anyone got the binary version of this module to work? I get to the indexing 
part and get the following error:

Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
    at org.apache.nutch.indexer.Indexer.index(Indexer.java:296)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:121)

And this is what I get in my hadoop log:

2007-03-07 15:26:33,272 INFO  indexer.Indexer - Optimizing index.
2007-03-07 15:26:33,275 WARN  mapred.LocalJobRunner - job_qq3l2z
java.lang.NoClassDefFoundError: org/jdom/JDOMException
    at org.apache.nutch.indexer.extra.ExtraIndexingFilter.filter(ExtraIndexingFilter.java:68)
    at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:72)
    at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:235)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:247)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:112)

Any help would be greatly appreciated. Lastly, I'm all for the query-extra 
plugin also.
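
A NoClassDefFoundError like the one above usually means a jar that was present at compile time is missing at run time. The following is only a minimal sketch for checking whether a class is visible; the class `ClasspathCheck` and its method are hypothetical helpers, not part of Nutch or of the index-extra plugin, and the result only reflects the class loader the check itself runs in.

```java
// Hypothetical diagnostic helper (an assumption for illustration,
// not part of Nutch or the index-extra plugin).
final class ClasspathCheck {

    /** Returns true if the named class is visible to this class's loader. */
    static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but could not be linked
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace above
        String name = args.length > 0 ? args[0] : "org.jdom.JDOMException";
        System.out.println(name + (isClassAvailable(name)
                ? " is available"
                : " is NOT available to this class loader"));
    }
}
```

Running it with the same classpath as the failing job gives a quick hint whether the jdom jar is missing there.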

 index-extra plugin creates additional fields in the index, based on configurable logic
 --

              Key: NUTCH-422
              URL: https://issues.apache.org/jira/browse/NUTCH-422
          Project: Nutch
       Issue Type: New Feature
       Components: indexer
 Affects Versions: 0.8.1
      Environment: All environments
         Reporter: Alan Tanaman
      Assigned To: Sami Siren
      Attachments: index-extra-v1.0-bin-java1.5.zip, index-extra-v1.0-source.zip


 Extract from the Readme file:

 A.  Introduction
 The index-extra plugin allows you to configure additional fields that you
 wish to be added to the index, based on one of the following sources:
   - The parsed text
   - Meta data fields
   - Previously created document-to-be-indexed fields
   - Plain constant string
   - Java expression combining one or more of the above, and resolving to a string
 A regex can also be applied to any of the above, allowing fields to be
 created based on patterns extracted from the source.
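
The regex option described in the introduction can be pictured with a small sketch. This is not the plugin's actual code: the class `RegexFieldSketch`, the field name `invoiceId`, and the sample text and pattern are all assumptions made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; not how index-extra itself is implemented.
final class RegexFieldSketch {

    /**
     * Returns the first capture group of the first match in source,
     * or null if the pattern does not match.
     * The pattern must contain at least one capture group.
     */
    static String extract(String source, Pattern pattern) {
        Matcher m = pattern.matcher(source);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Stand-in for the parsed text of a document (made-up sample)
        String parsedText = "Order ref: INV-20070306 shipped to Auckland";
        Pattern invoice = Pattern.compile("(INV-\\d+)");

        // Stand-in for the extra fields that would be added to the index
        Map<String, String> extraFields = new LinkedHashMap<>();
        String value = extract(parsedText, invoice);
        if (value != null) {
            extraFields.put("invoiceId", value); // hypothetical field name
        }
        System.out.println(extraFields);
    }
}
```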
 B.  Installation
 1)  Binaries only:
       - Copy the 'index-extra' folder within index-extra-v1.0-bin-java1.5.zip to NUTCHDIR/build
       - Copy the 'index-extra-conf.xml' file to NUTCHDIR/conf, and configure
       - Enable the plugin by updating the nutch-site.xml file
 2)  Source code: Always refer to the Nutch wiki for detailed instructions on building Nutch.  In short:
       - Copy the 'index-extra' folder within index-extra-v1.0-source.zip to NUTCHDIR/src/plugin
       - Update the build.xml in NUTCHDIR/src/plugin to include the plugin
       - Update the NUTCHDIR/default.properties file to include the plugin
       - Run ant to build
       - Copy the 'index-extra-conf.xml' file to NUTCHDIR/conf, and configure
       - Enable the plugin by updating the nutch-site.xml file
 C.  Known Issues
 1)  For this plugin to work correctly on any document field, it is necessary
     to run the other index filters first, so that all basic document fields
     have already been generated.  To do this, configure the
     indexingfilter.order property.  (Please see patch NUTCH-421 to enable the
     indexingfilter.order property.  If this patch is not applied, the plugin
     will still work, but will not be able to use document fields created by
     other index filter plugins.)
 2)  At this stage, field boost cannot be used, as Nutch scoring overrides
     the field boost with its own document-level boost calculation.  This
     occurs at the end of org.apache.nutch.indexer.Indexer's reduce method.
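
Known issue 1 comes down to running filters in a configured order. The sketch below shows one way such an ordering could be applied; it is not the NUTCH-421 patch. The class `FilterOrderSketch`, the whitespace-separated property format, and the short filter names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not the code from NUTCH-421.
final class FilterOrderSketch {

    /**
     * Orders filter names so that those listed in orderProperty come first,
     * in the listed order. Filters not mentioned keep their original relative
     * order and follow afterwards. Assumes a whitespace-separated list.
     */
    static List<String> applyOrder(List<String> available, String orderProperty) {
        List<String> ordered = new ArrayList<>();
        if (orderProperty != null && !orderProperty.trim().isEmpty()) {
            for (String name : orderProperty.trim().split("\\s+")) {
                // Skip unknown names and duplicates
                if (available.contains(name) && !ordered.contains(name)) {
                    ordered.add(name);
                }
            }
        }
        for (String name : available) {
            if (!ordered.contains(name)) {
                ordered.add(name);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Short illustrative names; a real setup would likely use full class names
        List<String> available = Arrays.asList("ExtraIndexingFilter", "BasicIndexingFilter");
        String order = "BasicIndexingFilter ExtraIndexingFilter";
        // The extra filter ends up last, so it can see fields created by the others
        System.out.println(applyOrder(available, order));
    }
}
```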

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (NUTCH-422) index-extra plugin creates additional fields in the index, based on configurable logic

2007-03-06 Thread Nathan ter Bogt (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12478688 ]

Nathan ter Bogt commented on NUTCH-422:
---

Sorry all,

I managed to get this working. Just had some issues with the jdom library
(or lack thereof). I must have just misread the error earlier.

Fantastic plugin idea too, thanks!
