Build failed in Jenkins: Nutch-trunk #2758

2014-08-29 Thread Apache Jenkins Server
See 

--
[...truncated 5048 lines...]
init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: urlfilter-validator

deps-test-compile:

compile-test:
[javac] Compiling 1 source file to 

[javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6
[javac] 1 warning

jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlfilter-validator
[junit] Running org.apache.nutch.tika.TestRobotsMetaProcessor
[junit] WARNING: multiple versions of ant detected in path for junit
[junit]  jar:file:/home/jenkins/tools/ant/latest/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and jar:
[junit] Running org.apache.nutch.urlfilter.validator.TestUrlValidator
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.548 sec

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 

[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.252 sec

compile:
 [echo] Compiling plugin: urlnormalizer-basic

deps-test-compile:

compile-test:
[javac] Compiling 1 source file to 


init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: urlnormalizer-host

deps-test-compile:

compile-test:
[javac] Compiling 1 source file to 

[javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6
[javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6
[javac] 1 warning

jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-basic
[javac] 1 warning

jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-host
[junit] WARNING: multiple versions of ant detected in path for junit
[junit]  jar:file:/home/jenkins/tools/ant/latest/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and jar:
[junit] WARNING: multiple versions of ant detected in path for junit
[junit]  jar:file:/home/jenkins/tools/ant/latest/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and jar:
[junit] Running org.apache.nutch.net.urlnormalizer.basic.TestBasicURLNormalizer
[junit] Running org.apache.nutch.net.urlnormalizer.host.TestHostURLNormalizer
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.146 sec

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: urlnormalizer-pass

deps-test-compile:

compile-test:
[javac] Compiling 1 source file to 

[javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6
[javac] 1 warning

jar:

deps-test:

deploy:

copy-generated-lib:

test:
 [echo] Testing plugin: urlnormalizer-pass
[junit] WARNING: multiple versions of ant detected in path for junit
[junit]  jar:file:/home/jenkins/tools/ant/latest/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and jar:
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.196 sec

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: urlnormalizer-querystring

deps-test-compile:

compile-test:
[javac] Compiling 1 source file to 

[javac] warning: [options] bootstra

Re: Jump to 3.X WAS [RELEASE] Apache Nutch 1.9

2014-08-29 Thread Lewis John Mcgibbney
Hi Chris,

N.B. move to dev@

On Fri, Aug 29, 2014 at 7:40 AM,  wrote:

> +1, great.
>
> I'd like to have a conversation about versioning.
>
> Since we're at 1.9, my suggestion would be to have the
> next in the trunk series (1.x) move to version 3.x post
> 1.9 for the release.
>

Based on the discussion from which this new thread stems, I would totally be
behind this. It breathes new life into trunk, which is a bonnie feather in
the Nutch bonnet. Here is my +1 on that one.


>
> Nutch2 remains Nutch and can be worked on there. That
> would give us a nice split in the diversionary branch
> paths for Nutch.
>
>
+1


[jira] [Commented] (NUTCH-1828) bin/crawl : incorrect handling of nutch errors

2014-08-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115171#comment-14115171
 ] 

Hudson commented on NUTCH-1828:
---

SUCCESS: Integrated in Nutch-trunk #2757 (See 
[https://builds.apache.org/job/Nutch-trunk/2757/])
NUTCH-1828 bin/crawl : incorrect handling of nutch errors (jnioche: 
http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1621284)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/bin/crawl


> bin/crawl : incorrect handling of nutch errors
> --
>
> Key: NUTCH-1828
> URL: https://issues.apache.org/jira/browse/NUTCH-1828
> Project: Nutch
> Issue Type: Bug
> Components: nutchNewbie
> Affects Versions: 1.9, 2.2.1
> Environment: Ubuntu Server 14.04, OpenJDK 7
> Reporter: Mathieu Bouchard
> Fix For: 2.3, 1.10
>
> Attachments: apache-nutch-1.9-crawl-fix-retcode.patch
>
>
> We are using Solr with Nutch to provide a complete search engine for our 
> website.
> I created a cron job that would use Nutch to crawl and update the Solr index 
> each night. This cron job is trying to automatically correct some errors that 
> could result in a corrupt crawldb. However, it seems that the bin/crawl 
> command doesn't correctly propagate errors coming from bin/nutch.
> Here is an example from the bin/crawl script:
> $bin/nutch inject $CRAWL_PATH/crawldb $SEEDDIR
> if [ $? -ne 0 ]
>   then exit $?
> fi
> Even if there is an error in the nutch inject command, the crawl script 
> always returns 0. The way I understand it, the exit code returned is the 
> result of the shell test and not the result of the nutch inject command.
> To correct this, we would need to modify the script with something like:
> $bin/nutch inject $CRAWL_PATH/crawldb $SEEDDIR
> RETCODE=$?
> if [ $RETCODE -ne 0 ]
>   then exit $RETCODE
> fi
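
The behaviour described in the report can be reproduced without Nutch. The sketch below is only an illustration: `failing_step` is a hypothetical stand-in for a failing `bin/nutch inject` call, and `return` is used in place of `exit` so that both patterns can run in one script. Inside the `then` branch, `$?` already holds the status of the `[ ... ]` test rather than that of the command before it.

```shell
#!/bin/sh
# Hypothetical stand-in for a failing `bin/nutch inject` call (not part of Nutch).
failing_step() { return 3; }

# Pattern from the report: $? is re-read after the test has already run,
# so it holds the status of `[` (0, because the test succeeded).
buggy_pattern() {
  failing_step
  if [ $? -ne 0 ]
    then return $?
  fi
}

# Proposed pattern: save the status before any other command overwrites it.
fixed_pattern() {
  failing_step
  RETCODE=$?
  if [ $RETCODE -ne 0 ]
    then return $RETCODE
  fi
}

buggy=0; buggy_pattern || buggy=$?
fixed=0; fixed_pattern || fixed=$?
echo "buggy pattern returns $buggy, fixed pattern returns $fixed"
```

With a step that fails with status 3, the first pattern reports 0 and the second reports 3, which matches the reporter's reading of the script.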



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (NUTCH-1828) bin/crawl : incorrect handling of nutch errors

2014-08-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115167#comment-14115167
 ] 

Hudson commented on NUTCH-1828:
---

SUCCESS: Integrated in Nutch-nutchgora #1135 (See 
[https://builds.apache.org/job/Nutch-nutchgora/1135/])
NUTCH-1828 bin/crawl : incorrect handling of nutch errors (jnioche: 
http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1621285)
* /nutch/branches/2.x/CHANGES.txt
* /nutch/branches/2.x/src/bin/crawl




[jira] [Commented] (NUTCH-1829) Generator : unable to distinguish real errors

2014-08-29 Thread Julien Nioche (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115149#comment-14115149
 ] 

Julien Nioche commented on NUTCH-1829:
--

This would definitely be the right thing to do. We could then use this in 
scripts to retry generating later instead of exiting. What value should we use 
for indicating that there aren't any segments? 

> Generator : unable to distinguish real errors
> -
>
> Key: NUTCH-1829
> URL: https://issues.apache.org/jira/browse/NUTCH-1829
> Project: Nutch
> Issue Type: Bug
> Components: nutchNewbie
> Affects Versions: 1.9, 2.2.1
> Environment: Ubuntu Server 14.04, OpenJDK 7
> Reporter: Mathieu Bouchard
> Assignee: Julien Nioche
> Fix For: 2.3, 1.10
>
> The bin/nutch generate command is returning the same error code (-1) if there 
> is an error or no new segment to process, so there is no way to tell if the 
> error is real or not from a shell script. This problem is related to 
> NUTCH-1828.
> The problem can be fixed by modifying the following Java source file:
> http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java?revision=1619934&view=markup
> At line 711, if there are no new segments, the generator returns -1, which is 
> the same return code returned at line 714 if there was an error.
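
If the generator returned a dedicated code for "no new segments", a calling script could branch on it roughly as sketched below. This is an assumption-laden illustration, not the Nutch implementation: the value `1` for `NO_SEGMENTS` is hypothetical (the thread has not settled on a value), and `fake_generate` is a stand-in for `bin/nutch generate` that simply returns the status it is given.

```shell
#!/bin/sh
# Hypothetical value for "no new segments"; the thread has not chosen one.
NO_SEGMENTS=1

# Stand-in for `bin/nutch generate`: returns the status passed as $1.
fake_generate() { return "$1"; }

# Print the action a crawl script might take for a given generator status.
handle_generate() {
  status=0
  fake_generate "$1" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "continue"        # segment generated, carry on with fetching
  elif [ "$status" -eq "$NO_SEGMENTS" ]; then
    echo "retry-later"     # nothing to fetch right now, not a failure
  else
    echo "abort"           # any other status is treated as a real error
  fi
}

# A Java exit status of -1 is seen by the shell as 255.
for code in 0 1 255; do
  echo "status $code: $(handle_generate "$code")"
done
```

Keeping the "nothing to do" code separate from the error code is what would let a script retry generation later instead of exiting.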





[jira] [Resolved] (NUTCH-1828) bin/crawl : incorrect handling of nutch errors

2014-08-29 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche resolved NUTCH-1828.
--

Resolution: Fixed

Committed in trunk revision 1621284.
Committed in 2.x revision 1621285.

Thanks Mathieu. In your future contributions, could you please format your 
patches as explained in 
[https://wiki.apache.org/nutch/HowToContribute#Creating_a_patch]? This would 
make it easier for others to review your work.



[jira] [Updated] (NUTCH-1828) bin/crawl : incorrect handling of nutch errors

2014-08-29 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche updated NUTCH-1828:
-

Fix Version/s: 1.10
   2.3



[jira] [Assigned] (NUTCH-1829) Generator : unable to distinguish real errors

2014-08-29 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche reassigned NUTCH-1829:


Assignee: Julien Nioche



[jira] [Updated] (NUTCH-1829) Generator : unable to distinguish real errors

2014-08-29 Thread Julien Nioche (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Nioche updated NUTCH-1829:
-

Fix Version/s: 1.10
   2.3



Re: Title of the page Version Control

2014-08-29 Thread Julien Nioche
Done! Thanks again


On 28 August 2014 16:41, Julien Nioche 
wrote:

> Thanks for reporting this Alfonso, I'll fix this tomorrow (unless a fellow
> committer beats me to it)
>
> Julien
>
>
>
> On 28 August 2014 10:13, Alfonso Nishikawa 
> wrote:
>
>> Greetings,
>>
>> I found that the page https://nutch.apache.org/version_control.html
>> states in its title: "Apache Nutch™ - Gora Version Control System"
>>
>> Regards,
>>
>> Alfonso Nishikawa
>>
>
>
>
> --
>
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>



-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble


Re: Incorrect download links for Nutch-1.9

2014-08-29 Thread Julien Nioche
Thanks Lewis


On 28 August 2014 22:41, Lewis John Mcgibbney 
wrote:

> Hi Jake,
> Thank you so much for reporting.
> Fixed.
> Thank you, have a great day.
> Lewis
>
>
> On Wed, Aug 27, 2014 at 9:37 AM,  wrote:
>
>>
>> Hi all,
>>
>> I noticed that following the download links for Nutch 1.9 (from
>> http://nutch.apache.org/downloads.html) takes users to a series of pages
>> all with the pattern
>> http://www.apache.org/dyn/closer.cgi/nutch/1.9/apache-nutch-1.8-*.  The
>> end of the URI has apache-nutch-1.8, rather than apache-nutch-1.9. I
>> haven’t tested any others, but at least the primary mirror for the 1.9
>> source .zip is broken.
>>
>> Has anybody caught this yet, and are there plans to fix it?
>>
>> Cheers
>>
>> Jake
>>
>>
>
>
> --
> *Lewis*
>


