Re: [VOTE] Release Apache Nutch 1.0

2009-03-09 Thread Marko Bauhardt

my non-binding +1

marko


On Mar 8, 2009, at 10:07 PM, Dennis Kubes wrote:


Non-binding +1 too :)

Sami Siren wrote:

Hello,
I have packaged the first release candidate for Apache Nutch 1.0  
release at

http://people.apache.org/~siren/nutch-1.0/rc0/
See the included CHANGES.txt file for details on release contents
and latest changes. The release was made from tag:
http://svn.apache.org/viewvc/lucene/nutch/tags/release-1.0-rc0/?pathrev=751480

Please vote on releasing this package as Apache Nutch 1.0. The
vote is open for the next 72 hours. Only votes from Lucene PMC
members are binding, but everyone is welcome to check the release
candidate and voice their approval or disapproval. The vote passes
if at least three binding +1 votes are cast.

[ ] +1 Release the packages as Apache Nutch 1.0
[ ] -1 Do not release the packages because...

Thanks!
--
Sami Siren
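
For anyone tallying along, the passing rule quoted above can be sketched in a few lines of Java. This is only an illustration under assumptions: the VoteTally and Vote names are hypothetical, and the sketch covers only the rule as stated (at least three binding +1 votes). It does not model -1 votes, which the message does not describe.

```java
import java.util.Arrays;
import java.util.List;

/** Minimal sketch of the quoted vote rule; all names here are hypothetical. */
final class VoteTally {

    /** One cast vote: +1 or -1, and whether the voter is a PMC member (binding). */
    static final class Vote {
        final String voter;
        final int value;
        final boolean binding;

        Vote(String voter, int value, boolean binding) {
            this.voter = voter;
            this.value = value;
            this.binding = binding;
        }
    }

    /** Counts binding +1 votes; non-binding votes are advisory only. */
    static int bindingPlusOnes(List<Vote> votes) {
        int count = 0;
        for (Vote v : votes) {
            if (v.binding && v.value > 0) {
                count++;
            }
        }
        return count;
    }

    /** The stated rule: the vote passes with at least three binding +1 votes. */
    static boolean passes(List<Vote> votes) {
        return bindingPlusOnes(votes) >= 3;
    }

    public static void main(String[] args) {
        // The two non-binding +1 votes visible in this message.
        List<Vote> votes = Arrays.asList(
            new Vote("Marko Bauhardt", 1, false),
            new Vote("Dennis Kubes", 1, false));
        System.out.println("binding +1 votes: " + bindingPlusOnes(votes)
            + ", passes: " + passes(votes));
    }
}
```

With only non-binding votes in the list, the sketch reports that the vote has not yet passed, which matches the rule as written.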






NUTCH-684 [was: Re: [VOTE] Release Apache Nutch 1.0]

2009-03-09 Thread Sami Siren

Doğacan Güney wrote:

On Sun, Mar 8, 2009 at 20:25, Sami Siren ssi...@gmail.com wrote:
  

Hello,

I have packaged the first release candidate for Apache Nutch 1.0 release at

http://people.apache.org/~siren/nutch-1.0/rc0/

See the included CHANGES.txt file for details on release contents and latest
changes. The release was made from tag:
http://svn.apache.org/viewvc/lucene/nutch/tags/release-1.0-rc0/?pathrev=751480

Please vote on releasing this package as Apache Nutch 1.0. The vote is open
for the next 72 hours. Only votes from Lucene PMC members are binding, but
everyone is welcome to check the release candidate and voice their approval
or disapproval. The vote  passes if at least three binding +1 votes are
cast.

[ ] +1 Release the packages as Apache Nutch 1.0
[ ] -1 Do not release the packages because...

Thanks!




That's great!

I would like to see NUTCH-684 in but I guess I was too late :)

Anyway, my non-binding +1.
  


uh, I missed that one, sorry. Do you think it's ready to be included? 
(IMO that's an important feature) It's not a big deal for me to rebuild 
the package with that feature included.


--
Sami Siren



Re: planning for nutch-1.0-rc1

2009-03-09 Thread Bartosz Gadzimski

Hello,

It's on two Linux boxes, one with CentOS and one with Ubuntu. Both run the
old bin/nutch crawl properly.
The problem is that it doesn't throw an exception on the command line or in
Eclipse; it only writes to the logs, so it's hard to debug.

One is running Nutch trunk from 7 March, and the other today's rc1.

Any hints? Maybe some logging properties or something?

In hadoop.log it looks exactly the same:

2009-03-09 12:12:09,452 INFO  plugin.PluginRepository - Nutch 
Scoring (org.apache.nutch.scoring.ScoringFilter)
2009-03-09 12:12:09,452 INFO  plugin.PluginRepository - Ontology 
Model Loader (org.apache.nutch.ontology.Ontology)
2009-03-09 12:12:09,560 INFO  field.FieldIndexer - IFD [Thread-11]: 
setInfoStream 
deletionpolicy=org.apache.lucene.index.keeponlylastcommitdeletionpol...@6210fb
2009-03-09 12:12:09,560 INFO  field.FieldIndexer - IW 0 [Thread-11]: 
setInfoStream: 
dir=org.apache.lucene.store.FSDirectory@/tmp/hadoop-agniesia441/mapred/local/index/_-174719952 
autoCommit=true
mergepolicy=org.apache.lucene.index.logbytesizemergepol...@48edb5 
mergescheduler=org.apache.lucene.index.concurrentmergeschedu...@1ee2c2c 
ramBufferSizeMB=16.0 maxBufferedDocs=50 maxBuffereDeleteTerms=-1 
maxFieldLength=1 index=

2009-03-09 12:12:09,585 WARN  mapred.LocalJobRunner - job_local_0001
java.lang.NullPointerException
   at 
org.apache.nutch.indexer.field.FieldIndexer$OutputFormat$1.write(FieldIndexer.java:139)
   at 
org.apache.nutch.indexer.field.FieldIndexer$OutputFormat$1.write(FieldIndexer.java:1)
   at 
org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:410)
   at 
org.apache.nutch.indexer.field.FieldIndexer.reduce(FieldIndexer.java:239)
   at 
org.apache.nutch.indexer.field.FieldIndexer.reduce(FieldIndexer.java:1)

   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
   at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:170)
2009-03-09 12:12:10,021 FATAL field.FieldIndexer - FieldIndexer: 
java.io.IOException: Job failed!

   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
   at 
org.apache.nutch.indexer.field.FieldIndexer.index(FieldIndexer.java:267)
   at 
org.apache.nutch.indexer.field.FieldIndexer.run(FieldIndexer.java:312)

   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
   at 
org.apache.nutch.indexer.field.FieldIndexer.main(FieldIndexer.java:275)



Thanks,
Bartosz
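
A note on why nothing useful reaches the command line: in the log above, the NullPointerException is logged at WARN by mapred.LocalJobRunner, while the caller only gets "java.io.IOException: Job failed!" from JobClient.runJob. The sketch below reproduces that general pattern in plain Java. It is not Hadoop's code: HiddenCauseDemo, runLocalJob and runJob are hypothetical names, and it says nothing about which reference is null at FieldIndexer.java:139.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Sketch of a root cause that reaches the log but not the caller; names are hypothetical. */
final class HiddenCauseDemo {
    private static final Logger LOG = Logger.getLogger("mapred.LocalJobRunner");

    /** Runs the task on a worker thread; a failure is logged there and only a flag survives. */
    static boolean runLocalJob(Runnable task) {
        final boolean[] ok = {true};
        Thread worker = new Thread(() -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                // The stack trace is written to the log only.
                LOG.log(Level.WARNING, "job_local_0001", e);
                ok[0] = false;
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return ok[0];
    }

    /** The caller learns only that the job failed, not why. */
    static void runJob(Runnable task) throws IOException {
        if (!runLocalJob(task)) {
            throw new IOException("Job failed!");
        }
    }

    public static void main(String[] args) {
        try {
            runJob(() -> { throw new NullPointerException(); });
        } catch (IOException e) {
            System.out.println("caller sees: " + e.getMessage()
                + ", cause: " + e.getCause());
        }
    }
}
```

In this sketch the IOException carries no cause, so the only place to find the original stack trace is the log file, which is consistent with what the trace above shows.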


Dennis Kubes wrote:
Sorry about the docs being sparse on this.  I will write more about 
the process as time permits.  Don't know about the problem below.  
What platform are you running on, windows, linux?


Dennis

Bartosz Gadzimski wrote:

Hello,

Thanks Dennis for updating the wiki, it helped a lot.

You gave an example with indexing but didn't say much about it.
Can you write some more? :)


Anyways I have problems at the last step (nutch from 07 march):

bin/nutch org.apache.nutch.indexer.field.FieldIndexer

It simply stops somewhere

2009-03-07 16:09:04,432 INFO  field.FieldIndexer - FieldIndexer: 
starting
2009-03-07 16:09:04,436 INFO  field.FieldIndexer - FieldIndexer: 
adding fields db: crawl/fields/basicfields
2009-03-07 16:09:04,498 INFO  field.FieldIndexer - FieldIndexer: 
adding fields db: crawl/fields/anchorfields
2009-03-07 16:09:05,636 INFO  plugin.PluginRepository - Plugins: 
looking in: /usr/local/nutch/plugins
2009-03-07 16:09:06,437 INFO  plugin.PluginRepository - Plugin 
Auto-activation mode: [true]
2009-03-07 16:09:06,437 INFO  plugin.PluginRepository - Registered 
Plugins:
2009-03-07 16:09:06,437 INFO  plugin.PluginRepository - the 
nutch core extension points (nutch-extensionpoints)
2009-03-07 16:09:06,437 INFO  plugin.PluginRepository - Basic 
Query Filter (query-basic)

 plugins

2009-03-07 16:09:07,769 INFO  field.FieldIndexer - IFD [Thread-11]: 
setInfoStream 
deletionpolicy=org.apache.lucene.index.keeponlylastcommitdeletionpol...@1b4a74b 

2009-03-07 16:09:07,769 INFO  field.FieldIndexer - IW 0 [Thread-11]: 
setInfoStream: 
dir=org.apache.lucene.store.FSDirectory@/tmp/hadoop-root/mapred/local/index/_-884655313 
autoCommit=true 
mergepolicy=org.apache.lucene.index.logbytesizemergepol...@15356d5 
mergescheduler=org.apache.lucene.index.concurrentmergeschedu...@69d02b 
ramBufferSizeMB=16.0 maxBufferedDocs=50 maxBuffereDeleteTerms=-1 
maxFieldLength=1 index=

2009-03-07 16:09:07,781 WARN  mapred.LocalJobRunner - job_local_0001
java.lang.NullPointerException
   at 
org.apache.nutch.indexer.field.FieldIndexer$OutputFormat$1.write(FieldIndexer.java:139) 

   at 
org.apache.nutch.indexer.field.FieldIndexer$OutputFormat$1.write(FieldIndexer.java:131) 

   at 
org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:410)
   at 
org.apache.nutch.indexer.field.FieldIndexer.reduce(FieldIndexer.java:239) 

   at 
org.apache.nutch.indexer.field.FieldIndexer.reduce(FieldIndexer.java:69)

   at 

Re: NUTCH-684 [was: Re: [VOTE] Release Apache Nutch 1.0]

2009-03-09 Thread Doğacan Güney



On 09.Mar.2009, at 11:05, Sami Siren ssi...@gmail.com wrote:


Doğacan Güney wrote:


On Sun, Mar 8, 2009 at 20:25, Sami Siren ssi...@gmail.com wrote:


Hello,

I have packaged the first release candidate for Apache Nutch 1.0  
release at


http://people.apache.org/~siren/nutch-1.0/rc0/

See the included CHANGES.txt file for details on release contents  
and latest

changes. The release was made from tag:
http://svn.apache.org/viewvc/lucene/nutch/tags/release-1.0-rc0/?pathrev=751480

Please vote on releasing this package as Apache Nutch 1.0. The  
vote is open
for the next 72 hours. Only votes from Lucene PMC members are  
binding, but
everyone is welcome to check the release candidate and voice their  
approval
or disapproval. The vote  passes if at least three binding +1  
votes are

cast.

[ ] +1 Release the packages as Apache Nutch 1.0
[ ] -1 Do not release the packages because...

Thanks!



That's great!

I would like to see NUTCH-684 in but I guess I was too late :)

Anyway, my non-binding +1.



uh, I missed that one, sorry. Do you think it's ready to be  
included? (IMO that's an important feature) It's not a big deal for  
me to rebuild the package with that feature included.




I only tested it on a small crawl. Still, I believe it is important  
too so I would like to include it. Worst case we release a 1.0.1 soon  
after:)



--
 Sami Siren



[jira] Commented: (NUTCH-684) Dedup support for Solr

2009-03-09 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680173#action_12680173
 ] 

Shalin Shekhar Mangar commented on NUTCH-684:
-

Just found this issue from Sami's post on Lucid blog. Are you guys aware of the 
Deduplication feature in Solr trunk?

http://wiki.apache.org/solr/Deduplication and SOLR-799

 Dedup support for Solr
 --

 Key: NUTCH-684
 URL: https://issues.apache.org/jira/browse/NUTCH-684
 Project: Nutch
  Issue Type: New Feature
  Components: indexer
Reporter: Doğacan Güney
Assignee: Doğacan Güney
 Attachments: NUTCH-684_bin_nutch.patch, NUTCH-684_solrdedup_v2.patch, 
 solrdedup.patch, solrdedup_v2.patch


 After NUTCH-442, nutch now can index to both solr and lucene. However, 
 duplicate deletion feature (based on digests) is only available in lucene. It 
 should also be available for solr.
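
As a rough illustration of what "duplicate deletion based on digests" means, the Java sketch below marks every document whose content digest was already seen. It is an assumption-laden sketch, not the patch attached to this issue: DigestDedup, Doc and the field names are hypothetical, and the choice to keep the first-seen copy is arbitrary, since the description does not say which copy should survive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of digest-based duplicate detection; all names are hypothetical. */
final class DigestDedup {

    /** A document reduced to its id and its content digest. */
    static final class Doc {
        final String id;
        final String digest;

        Doc(String id, String digest) {
            this.id = id;
            this.digest = digest;
        }
    }

    /** Returns the ids of documents whose digest already appeared earlier in the list. */
    static List<String> duplicateIds(List<Doc> docs) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (Doc d : docs) {
            // Set.add returns false when the digest was already present.
            if (!seen.add(d.digest)) {
                duplicates.add(d.id);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc("http://example.com/a", "d1"),
            new Doc("http://example.com/b", "d2"),
            new Doc("http://example.com/a?copy", "d1"));
        System.out.println("to delete: " + duplicateIds(docs));
    }
}
```

Against a real index the returned ids would be sent as delete requests; that step is left out here because it depends on the backend.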

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (NUTCH-713) Config options for webgraph Scoring not documented

2009-03-09 Thread Eric J. Christeson (JIRA)
Config options for webgraph Scoring not documented
--

 Key: NUTCH-713
 URL: https://issues.apache.org/jira/browse/NUTCH-713
 Project: Nutch
  Issue Type: Improvement
  Components: indexer
Affects Versions: 1.0.0
 Environment: All
Reporter: Eric J. Christeson
Priority: Minor


There are a number of properties for webgraph scoring that are only documented 
in code.  I have found these:

link.ignore.internal.host
link.ignore.internal.domain
link.ignore.limit.domain
link.ignore.limit.host
link.ignore.limit.page
link.loops.depth
link.analyze.initial.score
link.analyze.damping.factor
link.analyze.rank.one
link.analyze.iteration
link.analyze.num.iterations

I have a patch to add these to conf/nutch-default.xml with the best description 
I could find.
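
To show why documenting these keys matters, here is a small Java sketch of code that reads a few of them with fallback defaults. It is only an illustration: java.util.Properties stands in for the real configuration object, the WebgraphProps name is hypothetical, and the default values and value types (0.85, 10, true) are placeholders chosen for the example, not values confirmed from the Nutch source.

```java
import java.util.Properties;

/** Sketch of reading webgraph scoring options with defaults; defaults are placeholders. */
final class WebgraphProps {

    /** Damping factor; the 0.85 fallback is an assumed placeholder. */
    static float dampingFactor(Properties conf) {
        return Float.parseFloat(conf.getProperty("link.analyze.damping.factor", "0.85"));
    }

    /** Number of analysis iterations; the fallback of 10 is an assumed placeholder. */
    static int numIterations(Properties conf) {
        return Integer.parseInt(conf.getProperty("link.analyze.num.iterations", "10"));
    }

    /** Whether links within the same host are ignored; the fallback is an assumed placeholder. */
    static boolean ignoreInternalHost(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty("link.ignore.internal.host", "true"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("link.analyze.num.iterations", "5");
        System.out.println("damping=" + dampingFactor(conf)
            + " iterations=" + numIterations(conf)
            + " ignoreInternalHost=" + ignoreInternalHost(conf));
    }
}
```

When a key is absent the code silently falls back to its built-in default, so an option that is documented only in code is effectively invisible to anyone editing the configuration file.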

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: NUTCH-684 [was: Re: [VOTE] Release Apache Nutch 1.0]

2009-03-09 Thread Sami Siren

Doğacan Güney wrote:



On 09.Mar.2009, at 11:05, Sami Siren ssi...@gmail.com wrote:



Doğacan Güney wrote:

On Sun, Mar 8, 2009 at 20:25, Sami Siren ssi...@gmail.com wrote:
  

Hello,

I have packaged the first release candidate for Apache Nutch 1.0 release at

http://people.apache.org/~siren/nutch-1.0/rc0/

See the included CHANGES.txt file for details on release contents and latest
changes. The release was made from tag:
http://svn.apache.org/viewvc/lucene/nutch/tags/release-1.0-rc0/?pathrev=751480

Please vote on releasing this package as Apache Nutch 1.0. The vote is open
for the next 72 hours. Only votes from Lucene PMC members are binding, but
everyone is welcome to check the release candidate and voice their approval
or disapproval. The vote  passes if at least three binding +1 votes are
cast.

[ ] +1 Release the packages as Apache Nutch 1.0
[ ] -1 Do not release the packages because...

Thanks!



That's great!

I would like to see NUTCH-684 in but I guess I was too late :)

Anyway, my non-binding +1.
  


uh, I missed that one, sorry. Do you think it's ready to be included? 
(IMO that's an important feature) It's not a big deal for me to 
rebuild the package with that feature included.




I only tested it on a small crawl. Still, I believe it is important 
too so I would like to include it. Worst case we release a 1.0.1 soon 
after:)
I am fine either way. So if you think it's good enough to go in just 
commit it and I'll build another rc. If not then we can release it later 
too when it's ready.


--
Sami Siren





--
 Sami Siren





[jira] Updated: (NUTCH-713) Config options for webgraph Scoring not documented

2009-03-09 Thread Eric J. Christeson (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric J. Christeson updated NUTCH-713:
-

Attachment: webgraph-scoring.diff

Patch to add config options to conf/nutch-default.xml

 Config options for webgraph Scoring not documented
 --

 Key: NUTCH-713
 URL: https://issues.apache.org/jira/browse/NUTCH-713
 Project: Nutch
  Issue Type: Improvement
  Components: indexer
Affects Versions: 1.0.0
 Environment: All
Reporter: Eric J. Christeson
Priority: Minor
 Attachments: webgraph-scoring.diff

   Original Estimate: 1h
  Remaining Estimate: 1h

 There are a number of properties for webgraph scoring that are only 
 documented in code.  I have found these:
 link.ignore.internal.host
 link.ignore.internal.domain
 link.ignore.limit.domain
 link.ignore.limit.host
 link.ignore.limit.page
 link.loops.depth
 link.analyze.initial.score
 link.analyze.damping.factor
 link.analyze.rank.one
 link.analyze.iteration
 link.analyze.num.iterations
 I have a patch to add these to conf/nutch-default.xml with the best 
 description I could find.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [VOTE] Release Apache Nutch 1.0

2009-03-09 Thread Eric J. Christeson


non-binding +1

--
Eric J. Christeson  
eric.christe...@ndsu.edu

Enterprise Computing and Infrastructure    (701) 231-8693 (Voice)
North Dakota State University





[jira] Commented: (NUTCH-684) Dedup support for Solr

2009-03-09 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680194#action_12680194
 ] 

Andrzej Bialecki  commented on NUTCH-684:
-

Yes, I'm aware of this functionality. At this point however I thought that it 
would only complicate things, because users would have to install Nutch classes 
on Solr in order to use Signature implementations that we use. This is of 
course an open issue that we should investigate after 1.0 release.

 Dedup support for Solr
 --

 Key: NUTCH-684
 URL: https://issues.apache.org/jira/browse/NUTCH-684
 Project: Nutch
  Issue Type: New Feature
  Components: indexer
Reporter: Doğacan Güney
Assignee: Doğacan Güney
 Attachments: NUTCH-684_bin_nutch.patch, NUTCH-684_solrdedup_v2.patch, 
 solrdedup.patch, solrdedup_v2.patch


 After NUTCH-442, nutch now can index to both solr and lucene. However, 
 duplicate deletion feature (based on digests) is only available in lucene. It 
 should also be available for solr.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: NUTCH-684 [was: Re: [VOTE] Release Apache Nutch 1.0]

2009-03-09 Thread Doğacan Güney
On Mon, Mar 9, 2009 at 17:46, Sami Siren ssi...@gmail.com wrote:
 Doğacan Güney wrote:


 On 09.Mar.2009, at 11:05, Sami Siren ssi...@gmail.com wrote:

 Doğacan Güney wrote:

 On Sun, Mar 8, 2009 at 20:25, Sami Siren ssi...@gmail.com wrote:


 Hello,

 I have packaged the first release candidate for Apache Nutch 1.0
 release at

 http://people.apache.org/~siren/nutch-1.0/rc0/

 See the included CHANGES.txt file for details on release contents and
 latest
 changes. The release was made from tag:

 http://svn.apache.org/viewvc/lucene/nutch/tags/release-1.0-rc0/?pathrev=751480

 Please vote on releasing this package as Apache Nutch 1.0. The vote is
 open
 for the next 72 hours. Only votes from Lucene PMC members are binding,
 but
 everyone is welcome to check the release candidate and voice their
 approval
 or disapproval. The vote  passes if at least three binding +1 votes are
 cast.

 [ ] +1 Release the packages as Apache Nutch 1.0
 [ ] -1 Do not release the packages because...

 Thanks!



 That's great!

 I would like to see NUTCH-684 in but I guess I was too late :)

 Anyway, my non-binding +1.


 uh, I missed that one, sorry. Do you think it's ready to be included?
 (IMO that's an important feature) It's not a big deal for me to rebuild the
 package with that feature included.


 I only tested it on a small crawl. Still, I believe it is important too so
 I would like to include it. Worst case we release a 1.0.1 soon after:)

 I am fine either way. So if you think it's good enough to go in just commit
 it and I'll build another rc. If not then we can release it later too when
 it's ready.


Committed, thanks for waiting :)

 --
 Sami Siren



 --
  Sami Siren






-- 
Doğacan Güney


[jira] Closed: (NUTCH-684) Dedup support for Solr

2009-03-09 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/NUTCH-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doğacan Güney closed NUTCH-684.
---

   Resolution: Fixed
Fix Version/s: 1.0.0

Fixed as of rev. 751774.

 Dedup support for Solr
 --

 Key: NUTCH-684
 URL: https://issues.apache.org/jira/browse/NUTCH-684
 Project: Nutch
  Issue Type: New Feature
  Components: indexer
Reporter: Doğacan Güney
Assignee: Doğacan Güney
 Fix For: 1.0.0

 Attachments: NUTCH-684_bin_nutch.patch, NUTCH-684_solrdedup_v2.patch, 
 solrdedup.patch, solrdedup_v2.patch


 After NUTCH-442, nutch now can index to both solr and lucene. However, 
 duplicate deletion feature (based on digests) is only available in lucene. It 
 should also be available for solr.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Nutch ML cleanup

2009-03-09 Thread Otis Gospodnetic

Hi,

This has been bugging me for a while now.  For some reason Nutch MLs get the 
most junk emails - both rude/rudeish emails, as well as clear spam (with 
SPAM in the subject - something must be detecting it).  

I just looked at the headers of the clearly labeled spam messages and found 
that they all seem to come from SF:

 To: nutch-...@lists.sourceforge.net
 To: nutch-gene...@lists.sourceforge.net

I assume there is some kind of a mail forward from the old Nutch MLs on SF to 
the new Nutch MLs at ASF.
Do you think we could remove this forwarding and get rid of this spam?

Sami and Andrzej seem to be members who might be able to make this change:

http://sourceforge.net/project/memberlist.php?group_id=59548

Otis


[jira] Created: (NUTCH-714) Need a SFTP and SCP Protocol Handler

2009-03-09 Thread Sanjoy Ghosh (JIRA)
Need a SFTP and SCP Protocol Handler


 Key: NUTCH-714
 URL: https://issues.apache.org/jira/browse/NUTCH-714
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher
Affects Versions: 0.9.0
Reporter: Sanjoy Ghosh
 Fix For: 0.8.2


An SFTP and SCP Protocol handler is needed to fetch intranet content on an SFTP 
or SCP server.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (NUTCH-714) Need a SFTP and SCP Protocol Handler

2009-03-09 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680348#action_12680348
 ] 

Chris A. Mattmann commented on NUTCH-714:
-

Hi Sanjoy,

When you get a patch, let me know and I will work to integrate it. For 
reference, you were intending this as an upgrade for 0.8.2? I think we should 
probably do this as a post 1.0 upgrade (maybe 1.1)?

Cheers,
Chris


 Need a SFTP and SCP Protocol Handler
 

 Key: NUTCH-714
 URL: https://issues.apache.org/jira/browse/NUTCH-714
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher
Affects Versions: 0.9.0
Reporter: Sanjoy Ghosh
Assignee: Chris A. Mattmann
 Fix For: 0.8.2


 An SFTP and SCP Protocol handler is needed to fetch intranet content on an 
 SFTP or SCP server.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (NUTCH-714) Need a SFTP and SCP Protocol Handler

2009-03-09 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann reassigned NUTCH-714:
---

Assignee: Chris A. Mattmann

 Need a SFTP and SCP Protocol Handler
 

 Key: NUTCH-714
 URL: https://issues.apache.org/jira/browse/NUTCH-714
 Project: Nutch
  Issue Type: New Feature
  Components: fetcher
Affects Versions: 0.9.0
Reporter: Sanjoy Ghosh
Assignee: Chris A. Mattmann
 Fix For: 0.8.2


 An SFTP and SCP Protocol handler is needed to fetch intranet content on an 
 SFTP or SCP server.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (NUTCH-684) Dedup support for Solr

2009-03-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680374#action_12680374
 ] 

Hudson commented on NUTCH-684:
--

Integrated in Nutch-trunk #748 (See 
[http://hudson.zones.apache.org/hudson/job/Nutch-trunk/748/])
 - Dedup support for Solr


 Dedup support for Solr
 --

 Key: NUTCH-684
 URL: https://issues.apache.org/jira/browse/NUTCH-684
 Project: Nutch
  Issue Type: New Feature
  Components: indexer
Reporter: Doğacan Güney
Assignee: Doğacan Güney
 Fix For: 1.0.0

 Attachments: NUTCH-684_bin_nutch.patch, NUTCH-684_solrdedup_v2.patch, 
 solrdedup.patch, solrdedup_v2.patch


 After NUTCH-442, nutch now can index to both solr and lucene. However, 
 duplicate deletion feature (based on digests) is only available in lucene. It 
 should also be available for solr.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (NUTCH-715) Subcollection plugin doesn't work with default subcollections.xml file

2009-03-09 Thread Dmitry Lihachev (JIRA)
Subcollection plugin doesn't work with default subcollections.xml file
--

 Key: NUTCH-715
 URL: https://issues.apache.org/jira/browse/NUTCH-715
 Project: Nutch
  Issue Type: Bug
  Components: indexer
Affects Versions: 1.0.0
Reporter: Dmitry Lihachev
 Fix For: 1.0.0


The subcollection plugin can't parse its configuration file because the file
contains a top-level comment (the ASF notice) and DomUtil doesn't handle
top-level comments.
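
One plausible way a top-level comment can trip up DOM handling is sketched below in Java, using only the JDK's built-in parser. This is an assumption about the failure mode, not a reading of the DomUtil source: the TopLevelCommentDemo class is hypothetical, and the sample XML is a stand-in rather than the real subcollections.xml. The sketch shows that the document's first child can be the comment node, while getDocumentElement() always returns the root element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Sketch of how a leading comment changes what the first DOM child is; names are hypothetical. */
final class TopLevelCommentDemo {

    /** Stand-in config with a comment before the root element, like a license notice. */
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!-- license notice -->\n"
        + "<subcollections><subcollection/></subcollections>";

    /** Parses the XML with the JDK default parser, which keeps comments by default. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Fragile: the first child of the document may be a comment, not the root element. */
    static Node firstChild(String xml) {
        return parse(xml).getFirstChild();
    }

    /** Robust: getDocumentElement() returns the root element regardless of leading comments. */
    static Element root(String xml) {
        return parse(xml).getDocumentElement();
    }

    public static void main(String[] args) {
        Node first = firstChild(XML);
        System.out.println("first child is a comment: "
            + (first.getNodeType() == Node.COMMENT_NODE));
        System.out.println("root element: " + root(XML).getTagName());
    }
}
```

Code that assumes the first child is the root element would receive the comment node here, so resolving the root through getDocumentElement() is one way to make such code tolerant of a license header.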

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.