Build failed in Hudson: Nutch-trunk #369

2008-02-24 Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/369/changes

--
[...truncated 4599 lines...]

copy-generated-lib:
 [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex

init:
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/classes
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/test

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: urlfilter-suffix
[javac] Compiling 1 source file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/classes
[javac] Note: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/plugin/urlfilter-suffix/src/java/org/apache/nutch/urlfilter/suffix/SuffixURLFilter.java uses unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

jar:
  [jar] Building jar: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/urlfilter-suffix.jar

deps-test:

deploy:
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix
 [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix

copy-generated-lib:
 [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix

init:
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/classes
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/test

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: urlfilter-validator
[javac] Compiling 1 source file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/classes

jar:
  [jar] Building jar: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/urlfilter-validator.jar

deps-test:

deploy:
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator
 [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator

copy-generated-lib:
 [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator

init:
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/classes
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/test

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: urlnormalizer-basic
[javac] Compiling 1 source file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/classes

jar:
  [jar] Building jar: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/urlnormalizer-basic.jar

deps-test:

deploy:
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic
 [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic

copy-generated-lib:
 [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic

init:
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/classes
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/test

init-plugin:

deps-jar:

compile:
 [echo] Compiling plugin: urlnormalizer-pass
[javac] Compiling 1 source file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/classes

jar:
  [jar] Building jar: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/urlnormalizer-pass.jar

deps-test:

deploy:
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-pass
 [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job

[jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again

2008-02-24 Emmanuel Joke (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Joke updated NUTCH-578:


Attachment: NUTCH-578_v2.patch

Actually, I just realised that setPageRetrySchedule in AbstractSchedule was
not correctly defined.

This patch fixes that issue too.
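
For context, here is a minimal sketch of what a retry-scheduling method is generally expected to do, assuming a simplified CrawlDb entry that holds only a fetch time and a retry counter. It is not the actual AbstractSchedule code or the contents of NUTCH-578_v2.patch: CrawlEntry, RetrySchedule, RETRY_INTERVAL_MS and the method signature are hypothetical. On a recoverable failure it moves the next fetch time into the future and counts the attempt.

```java
// Hypothetical sketch, not Nutch's AbstractSchedule: CrawlEntry, RetrySchedule
// and RETRY_INTERVAL_MS are illustrative names; the method signature is assumed.
final class CrawlEntry {
    long fetchTime; // earliest time (ms) at which the URL may be generated again
    int retries;    // recoverable fetch failures recorded so far

    CrawlEntry(long fetchTime) {
        this.fetchTime = fetchTime;
    }
}

final class RetrySchedule {
    // Assumed delay before a URL that failed with a recoverable error is retried.
    static final long RETRY_INTERVAL_MS = 24L * 60 * 60 * 1000; // one day

    // Called after a recoverable fetch failure: push the next fetch time into
    // the future and count the attempt, so the generator does not select the
    // URL again in the very next cycle.
    static void setPageRetrySchedule(CrawlEntry entry, long now) {
        entry.fetchTime = now + RETRY_INTERVAL_MS;
        entry.retries++;
    }

    public static void main(String[] args) {
        CrawlEntry entry = new CrawlEntry(0L);
        setPageRetrySchedule(entry, 0L);
        System.out.println("retries=" + entry.retries + ", next fetch at " + entry.fetchTime + " ms");
    }
}
```

If a method like this leaves fetchTime unchanged, the entry stays due and is picked up by every subsequent generate run, which matches the symptom in the report.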

> URL fetched with 403 is generated over and over again
> -
>
> Key: NUTCH-578
> URL: https://issues.apache.org/jira/browse/NUTCH-578
> Project: Nutch
>  Issue Type: Bug
>  Components: generator
>Affects Versions: 1.0.0
> Environment: Ubuntu Gutsy Gibbon (7.10) running on VMware server. I 
> have checked out the most recent version of the trunk as of Nov 20, 2007
>Reporter: Nathaniel Powell
> Fix For: 1.0.0
>
> Attachments: crawl-urlfilter.txt, NUTCH-578.patch, 
> NUTCH-578_v2.patch, nutch-site.xml, regex-normalize.xml, urls.txt
>
>
> I have not changed the following parameter in the nutch-default.xml:
> <property>
>   <name>db.fetch.retry.max</name>
>   <value>3</value>
>   <description>The maximum number of times a url that has encountered
>   recoverable errors is generated for fetch.</description>
> </property>
> However, there is a URL which is on the site that I'm crawling, 
> www.teachertube.com, which keeps being generated over and over again for 
> almost every segment (many more times than 3):
> fetch of http://www.teachertube.com/images/ failed with: Http code=403, 
> url=http://www.teachertube.com/images/
> This is a bug, right?
> Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again

2008-02-24 Emmanuel Joke (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Joke updated NUTCH-578:


Attachment: NUTCH-578.patch

I get the same error for pages with HTTP status code 503.

I found the cause in the CrawlDbReduce class: the fetch time was not refreshed
correctly according to the DB status.
My patch fixes this issue.
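
The symptom can be illustrated with a simplified model. This is hypothetical and not Nutch's CrawlDbReduce or Generator code. It assumes the generator selects a URL whenever its fetch time is not in the future, one generate/fetch/update cycle per day, a one-day retry delay, and that the URL is dropped once it has failed db.fetch.retry.max times. Without the fetch-time refresh, the URL becomes due again in every cycle.

```java
// Hypothetical simulation, not Nutch code: models one CrawlDb entry for a URL
// whose fetch keeps failing, as in the report above.
final class GenerateLoop {
    static final int RETRY_MAX = 3;               // mirrors db.fetch.retry.max = 3
    static final long DAY_MS = 24L * 60 * 60 * 1000;
    static final long RETRY_INTERVAL_MS = DAY_MS; // assumed retry delay

    /**
     * Runs one generate/fetch/update cycle per day and returns how many
     * times the URL was generated.
     *
     * @param refreshFetchTime true models the fix (fetch time and retry count
     *                         updated after each failure); false models the bug
     */
    static int countGenerations(boolean refreshFetchTime, int days) {
        long fetchTime = 0L;  // due immediately
        int retries = 0;
        boolean gone = false; // set once the retry limit is reached
        int generated = 0;

        for (int day = 0; day < days; day++) {
            long now = day * DAY_MS;
            // Generator: select the URL only if it is due and not given up on.
            if (gone || fetchTime > now) {
                continue;
            }
            generated++;
            // The fetch fails; update the entry (only in the fixed model).
            if (refreshFetchTime) {
                retries++;
                if (retries >= RETRY_MAX) {
                    gone = true;
                } else {
                    fetchTime = now + RETRY_INTERVAL_MS;
                }
            }
            // Buggy model: fetchTime stays in the past, so the URL is due again.
        }
        return generated;
    }

    public static void main(String[] args) {
        System.out.println("without refresh: " + countGenerations(false, 10));
        System.out.println("with refresh:    " + countGenerations(true, 10));
    }
}
```

Over 10 daily cycles the buggy model generates the URL 10 times, while the fixed model stops after 3 generations, which is the behaviour the db.fetch.retry.max description promises.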

> URL fetched with 403 is generated over and over again
> -
>
> Key: NUTCH-578
> URL: https://issues.apache.org/jira/browse/NUTCH-578
> Project: Nutch
>  Issue Type: Bug
>  Components: generator
>Affects Versions: 1.0.0
> Environment: Ubuntu Gutsy Gibbon (7.10) running on VMware server. I 
> have checked out the most recent version of the trunk as of Nov 20, 2007
>Reporter: Nathaniel Powell
> Fix For: 1.0.0
>
> Attachments: crawl-urlfilter.txt, NUTCH-578.patch, nutch-site.xml, 
> regex-normalize.xml, urls.txt
>
>
> I have not changed the following parameter in the nutch-default.xml:
> <property>
>   <name>db.fetch.retry.max</name>
>   <value>3</value>
>   <description>The maximum number of times a url that has encountered
>   recoverable errors is generated for fetch.</description>
> </property>
> However, there is a URL which is on the site that I'm crawling, 
> www.teachertube.com, which keeps being generated over and over again for 
> almost every segment (many more times than 3):
> fetch of http://www.teachertube.com/images/ failed with: Http code=403, 
> url=http://www.teachertube.com/images/
> This is a bug, right?
> Thanks.
