Re: Need pointers regarding accessing crawled data/customizing policy for crawl.

2008-01-17 Thread Andrzej Bialecki
Manoj Bist wrote: Hi, I posted this on nutch-user earlier but it did not elicit any response. I would really appreciate any pointers regarding this. 1.) Is it possible to control the 'policy' that decides how soon a URL is fetched? E.g. if a document does not change frequently, I

Build failed in Hudson: Nutch-Nightly #331

2008-01-17 Thread hudson
See http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/331/changes -- started ERROR: svn: PROPFIND request failed on '/repos/asf/lucene/nutch/trunk' svn: Connection timed out org.tmatesoft.svn.core.SVNException: svn: PROPFIND request failed on

End-Of-Life status for 0.7.x?

2008-01-17 Thread Andrzej Bialecki
Hi all, I'd like to initiate the discussion about the EOL status of Nutch 0.7.x branch. The question is whether we want to actively support it, whether we have enough resources to make any new releases or apply patches that sit in JIRA? My opinion is that we should mark it EOL, and close

[jira] Updated: (NUTCH-570) Improvement of URL Ordering in Generator.java

2008-01-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated NUTCH-570: Patch Info: [Patch Available] Improvement of URL Ordering in Generator.java

[jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml

2008-01-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560073#action_12560073 ] Andrzej Bialecki commented on NUTCH-186: Not applicable after the code has been

[jira] Commented: (NUTCH-152) TaskRunner io pipes are not setDaemon(true), cleanup and exception errors are incomplete, max heap too small

2008-01-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560074#action_12560074 ] Andrzej Bialecki commented on NUTCH-152: Not applicable after this code was moved

[jira] Resolved: (NUTCH-152) TaskRunner io pipes are not setDaemon(true), cleanup and exception errors are incomplete, max heap too small

2008-01-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-152. Resolution: Invalid; Fix Version/s: 0.8; Assignee: Andrzej Bialecki

[jira] Resolved: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml

2008-01-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki resolved NUTCH-186. Resolution: Invalid; Fix Version/s: 0.8; Assignee: Andrzej Bialecki

[jira] Closed: (NUTCH-95) DeleteDuplicates depends on the order of input segments

2008-01-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-95. Resolution: Duplicate. DeleteDuplicates depends on the order of input segments

[jira] Closed: (NUTCH-159) Specify temp/working directory for crawl

2008-01-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-159. Resolution: Won't Fix; Fix Version/s: 0.8. Specify temp/working directory for crawl

Re: End-Of-Life status for 0.7.x?

2008-01-17 Thread Dennis Kubes
+1. Andrzej Bialecki wrote: Hi all, I'd like to initiate the discussion about the EOL status of Nutch 0.7.x branch. The question is whether we want to actively support it, whether we have enough resources to make any new releases or apply patches that sit in JIRA? My opinion is that we

[jira] Commented: (NUTCH-159) Specify temp/working directory for crawl

2008-01-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560075#action_12560075 ] Andrzej Bialecki commented on NUTCH-159: No longer applicable (moved to Hadoop)

[jira] Commented: (NUTCH-95) DeleteDuplicates depends on the order of input segments

2008-01-17 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560076#action_12560076 ] Andrzej Bialecki commented on NUTCH-95: See NUTCH-371. DeleteDuplicates depends on

Re: End-Of-Life status for 0.7.x?

2008-01-17 Thread Yousef Ourabi
+1 Nothing is sacred. On 1/17/08, Dennis Kubes [EMAIL PROTECTED] wrote: +1. Andrzej Bialecki wrote: Hi all, I'd like to initiate the discussion about the EOL status of Nutch 0.7.x branch. The question is whether we want to actively support it, whether we have enough resources to

Re: End-Of-Life status for 0.7.x?

2008-01-17 Thread Chris Mattmann
+1 On 1/17/08 12:49 PM, Dennis Kubes [EMAIL PROTECTED] wrote: +1. Andrzej Bialecki wrote: Hi all, I'd like to initiate the discussion about the EOL status of Nutch 0.7.x branch. The question is whether we want to actively support it, whether we have enough resources to make any new

New Developer

2008-01-17 Thread Ahmad Dahlan
Hi, I'm new to Nutch, but given its features I will use it as the basis of my application. I want to implement Citation Analysis and PageRank in Nutch as part of my dissertation. At the moment I have just started as a user. I use Windows XP with Apache Tomcat 6, Cygwin, jdk1.6.0_03, and Nutch-0.9.

Re: End-Of-Life status for 0.7.x?

2008-01-17 Thread Sami Siren
Andrzej Bialecki wrote: Hi all, My opinion is that we should mark it EOL, and close all JIRA issues that are relevant only to 0.7.x, with the status Won't Fix. +1 -- Sami Siren

Hudson build is back to normal: Nutch-Nightly #332

2008-01-17 Thread hudson
See http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/332/changes