Build failed in Hudson: Nutch-trunk #369
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/369/changes

--
[...truncated 4599 lines...]
copy-generated-lib:
     [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex

init:
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/classes
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/test

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: urlfilter-suffix
    [javac] Compiling 1 source file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/classes
    [javac] Note: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/plugin/urlfilter-suffix/src/java/org/apache/nutch/urlfilter/suffix/SuffixURLFilter.java uses unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

jar:
      [jar] Building jar: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/urlfilter-suffix.jar

deps-test:

deploy:
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix
     [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix

copy-generated-lib:
     [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix

init:
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/classes
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/test

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: urlfilter-validator
    [javac] Compiling 1 source file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/classes

jar:
      [jar] Building jar: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/urlfilter-validator.jar

deps-test:

deploy:
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator
     [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator

copy-generated-lib:
     [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator

init:
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/classes
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/test

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: urlnormalizer-basic
    [javac] Compiling 1 source file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/classes

jar:
      [jar] Building jar: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/urlnormalizer-basic.jar

deps-test:

deploy:
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic
     [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic

copy-generated-lib:
     [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic

init:
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/classes
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/test

init-plugin:

deps-jar:

compile:
     [echo] Compiling plugin: urlnormalizer-pass
    [javac] Compiling 1 source file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/classes

jar:
      [jar] Building jar: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/urlnormalizer-pass.jar

deps-test:

deploy:
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-pass
     [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job
[jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again
[ https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Joke updated NUTCH-578:

    Attachment: NUTCH-578_v2.patch

Actually, I just realised that setPageRetrySchedule in AbstractSchedule was not correctly defined either. This patch fixes that issue too.

> URL fetched with 403 is generated over and over again
> -----------------------------------------------------
>
>                 Key: NUTCH-578
>                 URL: https://issues.apache.org/jira/browse/NUTCH-578
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 1.0.0
>         Environment: Ubuntu Gutsy Gibbon (7.10) running on VMware server. I
> have checked out the most recent version of the trunk as of Nov 20, 2007
>            Reporter: Nathaniel Powell
>             Fix For: 1.0.0
>
>         Attachments: crawl-urlfilter.txt, NUTCH-578.patch,
> NUTCH-578_v2.patch, nutch-site.xml, regex-normalize.xml, urls.txt
>
>
> I have not changed the following parameter in the nutch-default.xml:
> <property>
>   <name>db.fetch.retry.max</name>
>   <value>3</value>
>   <description>The maximum number of times a url that has encountered
>   recoverable errors is generated for fetch.</description>
> </property>
> However, there is a URL which is on the site that I'm crawling,
> www.teachertube.com, which keeps being generated over and over again for
> almost every segment (many more times than 3):
> fetch of http://www.teachertube.com/images/ failed with: Http code=403,
> url=http://www.teachertube.com/images/
> This is a bug, right?
> Thanks.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
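The behaviour the reporter expected from db.fetch.retry.max can be illustrated with a small, self-contained Java model. This is a hypothetical sketch, not Nutch's AbstractSchedule or CrawlDbReduce code and not the attached patch: the Page class, the one-day RETRY_INTERVAL_MS, and the simplified setPageRetrySchedule signature are all assumptions made for illustration. In the sketch, each recoverable failure pushes the fetch time forward and increments a retry counter, and once the counter reaches the limit the page is no longer eligible for generation.

```java
/**
 * Hypothetical model of a retry schedule bounded by a retry limit.
 * NOT Nutch's AbstractSchedule; all names and fields are illustrative assumptions.
 */
public class RetryScheduleSketch {

    /** Simplified page state; field names are assumptions. */
    static final class Page {
        long fetchTime;   // earliest time (ms) at which the page may be generated again
        int retries;      // number of recoverable fetch failures so far
        boolean gone;     // true once the retry budget is exhausted
    }

    // Assumed retry interval for this sketch: one day.
    static final long RETRY_INTERVAL_MS = 24L * 60 * 60 * 1000;

    /**
     * Records one recoverable failure: schedules the next attempt in the
     * future and gives up once maxRetries failures have been recorded.
     * Hypothetical signature, not the real Nutch method.
     */
    static void setPageRetrySchedule(Page p, long now, int maxRetries) {
        p.retries++;
        p.fetchTime = now + RETRY_INTERVAL_MS;
        if (p.retries >= maxRetries) {
            p.gone = true;
        }
    }

    /** A page is eligible for generation if it is due and not marked gone. */
    static boolean eligible(Page p, long now) {
        return !p.gone && p.fetchTime <= now;
    }

    /** Simulates daily generate/fetch cycles in which every fetch fails. */
    static int countGenerations(int cycles, int maxRetries) {
        Page p = new Page();
        int generated = 0;
        long now = 0;
        for (int i = 0; i < cycles; i++) {
            if (eligible(p, now)) {
                generated++;
                setPageRetrySchedule(p, now, maxRetries);
            }
            now += RETRY_INTERVAL_MS;
        }
        return generated;
    }

    public static void main(String[] args) {
        // With a limit of 3, the failing page is generated 3 times, then dropped.
        System.out.println(countGenerations(10, 3)); // prints 3
    }
}
```

Over ten simulated daily cycles with a limit of 3, the page is generated three times and then excluded, rather than reappearing in almost every segment as described in the report.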
[jira] Updated: (NUTCH-578) URL fetched with 403 is generated over and over again
[ https://issues.apache.org/jira/browse/NUTCH-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Emmanuel Joke updated NUTCH-578:

    Attachment: NUTCH-578.patch

I've got the same error for a page with HTTP status code 503. I found the issue in the CrawlDbReduce class: the fetch time was not refreshed correctly according to the DB status. My patch fixes this issue.

> URL fetched with 403 is generated over and over again
> -----------------------------------------------------
>
>                 Key: NUTCH-578
>                 URL: https://issues.apache.org/jira/browse/NUTCH-578
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 1.0.0
>         Environment: Ubuntu Gutsy Gibbon (7.10) running on VMware server. I
> have checked out the most recent version of the trunk as of Nov 20, 2007
>            Reporter: Nathaniel Powell
>             Fix For: 1.0.0
>
>         Attachments: crawl-urlfilter.txt, NUTCH-578.patch, nutch-site.xml,
> regex-normalize.xml, urls.txt
>
>
> I have not changed the following parameter in the nutch-default.xml:
> <property>
>   <name>db.fetch.retry.max</name>
>   <value>3</value>
>   <description>The maximum number of times a url that has encountered
>   recoverable errors is generated for fetch.</description>
> </property>
> However, there is a URL which is on the site that I'm crawling,
> www.teachertube.com, which keeps being generated over and over again for
> almost every segment (many more times than 3):
> fetch of http://www.teachertube.com/images/ failed with: Http code=403,
> url=http://www.teachertube.com/images/
> This is a bug, right?
> Thanks.
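Why a fetch time that is not refreshed after a failure makes the URL reappear in almost every segment can be shown with a toy simulation. This is a hypothetical sketch, not the CrawlDbReduce logic or the attached patch: it assumes a daily generate cycle, a generator that selects any page whose fetch time is not in the future, every fetch failing, and a seven-day retry interval chosen only for illustration.

```java
/**
 * Hypothetical illustration: a page whose fetch time is never advanced after
 * a failed fetch stays "due" and is selected by every generate run.
 * NOT Nutch's CrawlDbReduce; the interval and selection rule are assumptions.
 */
public class FetchTimeRefreshSketch {

    static final long DAY_MS = 24L * 60 * 60 * 1000;

    /** Returns how many of `cycles` daily generate runs select the page. */
    static int timesGenerated(int cycles, boolean refreshFetchTime) {
        long fetchTime = 0;   // the page is due immediately
        long now = 0;
        int generated = 0;
        for (int i = 0; i < cycles; i++) {
            if (fetchTime <= now) {          // generator picks pages that are due
                generated++;
                // The fetch fails. The update step should move fetchTime into
                // the future; the buggy path leaves it unchanged.
                if (refreshFetchTime) {
                    fetchTime = now + 7 * DAY_MS;  // assumed retry interval
                }
            }
            now += DAY_MS;
        }
        return generated;
    }

    public static void main(String[] args) {
        System.out.println("without refresh: " + timesGenerated(14, false)); // 14
        System.out.println("with refresh:    " + timesGenerated(14, true));  // 2
    }
}
```

Over 14 daily cycles the page is selected 14 times when its fetch time is never advanced, and only twice when it is pushed forward after each failure. Combined with a retry limit, refreshing the fetch time is what keeps a persistently failing URL from being generated over and over again.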