Build failed in Jenkins: Nutch-nutchgora #582

2013-04-25 Thread Apache Jenkins Server
See 

--
[...truncated 2928 lines...]
resolve-default:
[ivy:resolve] :: loading settings :: file = 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/ivy/ivysettings.xml

compile:
 [echo] Compiling plugin: urlfilter-validator
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/src/plugin/build-plugin.xml:117:
 warning: 'includeantruntime' was not set, defaulting to 
build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 1 source file to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/build/urlfilter-validator/classes
[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 1.6
[javac] 1 warning

jar:
  [jar] Building jar: 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/build/urlfilter-validator/urlfilter-validator.jar

deps-test:

deploy:
 [copy] Copying 1 file to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/build/plugins/urlfilter-validator

copy-generated-lib:
 [copy] Copying 1 file to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/build/plugins/urlfilter-validator

init:
[mkdir] Created dir: 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/build/urlnormalizer-basic
[mkdir] Created dir: 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/build/urlnormalizer-basic/classes
[mkdir] Created dir: 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/build/urlnormalizer-basic/test
[mkdir] Created dir: 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/build/plugins/urlnormalizer-basic

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/ivy/ivysettings.xml

compile:
 [echo] Compiling plugin: urlnormalizer-basic
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/src/plugin/build-plugin.xml:117:
 warning: 'includeantruntime' was not set, defaulting to 
build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 1 source file to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/build/urlnormalizer-basic/classes
[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 1.6
[javac] 1 warning

jar:
  [jar] Building jar: 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/build/urlnormalizer-basic/urlnormalizer-basic.jar

deps-test:

deploy:
 [copy] Copying 1 file to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/build/plugins/urlnormalizer-basic

copy-generated-lib:
 [copy] Copying 1 file to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/build/plugins/urlnormalizer-basic

init:
[mkdir] Created dir: 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/build/urlnormalizer-pass
[mkdir] Created dir: 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/build/urlnormalizer-pass/classes
[mkdir] Created dir: 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/build/urlnormalizer-pass/test
[mkdir] Created dir: 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/build/plugins/urlnormalizer-pass

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/ivy/ivysettings.xml

compile:
 [echo] Compiling plugin: urlnormalizer-pass
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/src/plugin/build-plugin.xml:117:
 warning: 'includeantruntime' was not set, defaulting to 
build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 1 source file to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/build/urlnormalizer-pass/classes
[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 1.6
[javac] 1 warning

jar:
  [jar] Building jar: 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/build/urlnormalizer-pass/urlnormalizer-pass.jar

deps-test:

deploy:
 [copy] Copying 1 file to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-nutchgora/2.x/build/plugins/urlnormalizer-pass

copy-generated-lib:
 [copy] Copying 1 file to 
/zonestorage/hudson_solaris/home/hudson

Build failed in Jenkins: Nutch-trunk #2181

2013-04-25 Thread Apache Jenkins Server
See 

--
[...truncated 4043 lines...]
[javac]   where K,V are type-variables:
[javac] K extends Object declared in class HashMap
[javac] V extends Object declared in class HashMap
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestCrawlDbMerger.java:78:
 warning: [unchecked] unchecked call to put(K,V) as a member of the raw type 
HashMap
[javac] expected.put(url21, cd2);
[javac] ^
[javac]   where K,V are type-variables:
[javac] K extends Object declared in class HashMap
[javac] V extends Object declared in class HashMap
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestCrawlDbMerger.java:89:
 warning: [deprecation] delete(Path) in FileSystem has been deprecated
[javac] fs.delete(testDir);
[javac]   ^
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestCrawlDbMerger.java:108:
 warning: [rawtypes] found raw type: Iterator
[javac] Iterator it = expected.keySet().iterator();
[javac] ^
[javac]   missing type arguments for generic class Iterator
[javac]   where E is a type-variable:
[javac] E extends Object declared in interface Iterator
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestCrawlDbMerger.java:123:
 warning: [deprecation] delete(Path) in FileSystem has been deprecated
[javac] fs.delete(testDir);
[javac]   ^
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestCrawlDbMerger.java:126:
 warning: [rawtypes] found raw type: TreeSet
[javac]   private void createCrawlDb(Configuration config, FileSystem fs, 
Path crawldb, TreeSet init, CrawlDatum cd) throws Exception {
[javac] 
^
[javac]   missing type arguments for generic class TreeSet
[javac]   where E is a type-variable:
[javac] E extends Object declared in class TreeSet
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestCrawlDbMerger.java:130:
 warning: [rawtypes] found raw type: Iterator
[javac] Iterator it = init.iterator();
[javac] ^
[javac]   missing type arguments for generic class Iterator
[javac]   where E is a type-variable:
[javac] E extends Object declared in interface Iterator
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestLinkDbMerger.java:71:
 warning: [rawtypes] found raw type: TreeMap
[javac]   TreeMap init1 = new TreeMap();
[javac]   ^
[javac]   missing type arguments for generic class TreeMap
[javac]   where K,V are type-variables:
[javac] K extends Object declared in class TreeMap
[javac] V extends Object declared in class TreeMap
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestLinkDbMerger.java:71:
 warning: [rawtypes] found raw type: TreeMap
[javac]   TreeMap init1 = new TreeMap();
[javac]   ^
[javac]   missing type arguments for generic class TreeMap
[javac]   where K,V are type-variables:
[javac] K extends Object declared in class TreeMap
[javac] V extends Object declared in class TreeMap
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestLinkDbMerger.java:72:
 warning: [rawtypes] found raw type: TreeMap
[javac]   TreeMap init2 = new TreeMap();
[javac]   ^
[javac]   missing type arguments for generic class TreeMap
[javac]   where K,V are type-variables:
[javac] K extends Object declared in class TreeMap
[javac] V extends Object declared in class TreeMap
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestLinkDbMerger.java:72:
 warning: [rawtypes] found raw type: TreeMap
[javac]   TreeMap init2 = new TreeMap();
[javac]   ^
[javac]   missing type arguments for generic class TreeMap
[javac]   where K,V are type-variables:
[javac] K extends Object declared in class TreeMap
[javac] V extends Object declared in class TreeMap
[javac] 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Nutch-trunk/trunk/src/test/org/apache/nutch/crawl/TestLinkDbMerger.java:73:
 warning: [rawtypes] found raw type: HashMap
[javac]   Has

Partial Updates in Solr 4.1

2013-04-25 Thread Jay Springbernate
Hey Nutchers! Hope you all are doing fine.

My friend and I are the creators of Punkspider, and we use Nutch heavily
with the Solr indexing feature. Besides the data fetched by the crawl, we
also need to update the documents with the summary data of our scans, so we
decided to move to the latest version of Solr, which allows partial updates.
We'd rather avoid having to read each document, combine it with the fresh
data, and then save the merged document.
So I made a change in the indexer that updates just some fields of a
document, leaving the others intact and identifying the document by its id.
I'm attaching the file; I hope you find it useful. The original lines are
commented out.

Thanks for everything, and keep up the good work.

Regards
Tomas Fornara


SolrWriter.java
Description: Binary data
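
A minimal sketch of the partial-update idea described in the message above,
using only the Java standard library. It assumes the Solr 4 atomic-update
convention in which each changed field is sent as a map from a modifier such
as "set" to the new value, while the id is sent as a plain value. The class
name, method name, and the field name scan_summary are hypothetical; this is
not the code from the attached SolrWriter.java.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class PartialUpdateSketch {

    /**
     * Builds a document map in which the id is a plain value and every other
     * field is wrapped as {"set": value}, so only those fields are replaced.
     */
    static Map<String, Object> toAtomicUpdate(String idField, String id,
                                              Map<String, Object> changedFields) {
        Map<String, Object> doc = new LinkedHashMap<>();
        // The id is not wrapped: it identifies the document to update.
        doc.put(idField, id);
        for (Map.Entry<String, Object> entry : changedFields.entrySet()) {
            // Assumption: "set" is the modifier that overwrites a field value.
            doc.put(entry.getKey(), Collections.singletonMap("set", entry.getValue()));
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> changed = new LinkedHashMap<>();
        changed.put("scan_summary", "2 issues found"); // hypothetical field
        System.out.println(toAtomicUpdate("id", "http://example.com/", changed));
    }
}
```

With SolrJ, a map of this shape would typically be passed as the field value
of a SolrInputDocument; that wiring is left out here to keep the sketch free
of external dependencies.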


[jira] [Updated] (NUTCH-1314) Impose a limit on the length of outlink target urls

2013-04-25 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1314:


Attachment: NUTCH-1314-trunk.patch
NUTCH-1314-v2.patch

Fresh patches for 2.x and trunk branches respectively. [~markus17] I tried to 
accommodate your suggestions but please let me know if there is something we 
can work on.
If someone could test them, that would be great.
Thanks
Lewis  

> Impose a limit on the length of outlink target urls
> ---
>
> Key: NUTCH-1314
> URL: https://issues.apache.org/jira/browse/NUTCH-1314
> Project: Nutch
>  Issue Type: Improvement
>Reporter: Ferdy Galema
> Fix For: 1.7, 2.2
>
> Attachments: NUTCH-1314.patch, NUTCH-1314-trunk.patch, 
> NUTCH-1314-v2.patch
>
>
> In the past we have encountered situations where crawling specific broken 
> sites resulted in ridiculously long urls that caused tasks to stall. 
> The regex plugins (normalizing/filtering) processed single urls for hours, if 
> not hanging indefinitely.
> My suggestion is to limit the outlink target url length as soon as possible. 
> It is a configurable limit; the default is 3000. This should be long 
> enough for most uses, but strict enough to make sure regex 
> plugins do not choke on urls that are too long. Please see the attached patch 
> for the Nutchgora implementation.
> I'd like to hear what you think about this.
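
A minimal sketch of the length check proposed in the issue description above,
assuming outlinks are available as plain URL strings. The class and method
names are hypothetical and this is not the code from the attached patches;
only the default of 3000 comes from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OutlinkLengthLimitSketch {

    // Default from the issue description; in Nutch it would be configurable.
    static final int DEFAULT_MAX_URL_LENGTH = 3000;

    /** Returns only the outlink URLs whose length does not exceed maxLength. */
    static List<String> dropOverlongUrls(List<String> outlinkUrls, int maxLength) {
        List<String> kept = new ArrayList<>();
        for (String url : outlinkUrls) {
            // Overlong URLs are dropped before any regex plugin sees them.
            if (url != null && url.length() <= maxLength) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        char[] padding = new char[5000];
        Arrays.fill(padding, 'a');
        String overlong = "http://example.com/" + new String(padding);
        List<String> kept = dropOverlongUrls(
                Arrays.asList("http://example.com/ok", overlong),
                DEFAULT_MAX_URL_LENGTH);
        System.out.println(kept.size() + " of 2 outlinks kept");
    }
}
```

Filtering on length is a constant-time check per URL, which is why applying
it before the normalizing and filtering plugins keeps pathological inputs
away from the regular expressions.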

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (NUTCH-1371) Replace Ivy with Maven Ant tasks

2013-04-25 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1371:


Fix Version/s: (was: 2.2)
   2.3

> Replace Ivy with Maven Ant tasks
> 
>
> Key: NUTCH-1371
> URL: https://issues.apache.org/jira/browse/NUTCH-1371
> Project: Nutch
>  Issue Type: Improvement
>  Components: build
>Reporter: Julien Nioche
>Assignee: Lewis John McGibbney
> Fix For: 1.7, 2.3
>
> Attachments: NUTCH-1371.patch, NUTCH-1371-pom.patch, 
> NUTCH-1371-r1461140.patch
>
>
> We might move to Maven altogether but a good intermediate step could be to 
> rely on the maven ant tasks for managing the dependencies. Ivy does a good 
> job but we need to have a pom file anyway for publishing the artefacts which 
> means keeping the pom.xml and ivy.xml contents in sync. Most devs are also 
> more familiar with Maven, and it is well integrated in IDEs. Going the 
> ANT+MVN way also means that we don't have to rewrite the whole building 
> process and can rely on our existing script.



Re: GSoC Student Application Window Now Open

2013-04-25 Thread Ahmet Emre Aladağ

Hi,

I have been working on my proposal for the LinkRank project for a while 
and am wondering whether proposals should only be written on the 
wiki, or whether they can be delivered in another format such as PDF.


Best,
Emre.


On 04/25/2013 07:46 AM, Lewis John Mcgibbney wrote:

Hi All,
Apologies for the cross-post...

I thought I would relay this message to students who are now 
monitoring the above lists.


Google are now accepting applications from students to participate in 
Google Summer of Code 2013. The deadline to apply is May 3rd at 19:00 
UTC. Late proposals will not be accepted for any reason.


I would encourage applicants to get on to the project wikis and begin 
getting ideas down, which will act as a base for individual projects. 
Additionally, the community lists are the place to voice any queries 
and to obtain advice about project direction, scope, deliverables, etc.


Good luck, I am really looking forward to working with some of you 
this year as there are some great proposals so far.


Lewis

--
/Lewis/





[jira] [Comment Edited] (NUTCH-1555) Move to commons-cli for command line parsing

2013-04-25 Thread lufeng (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641869#comment-13641869
 ] 

lufeng edited comment on NUTCH-1555 at 4/25/13 2:48 PM:


Lewis:
1. Fixed the fetch NPE bug.
2. Fixed the bug where update did not work.

Should we move every tool to commons-cli? I found 47 files that would 
need to be migrated.

Sebastian:
1. Used eclipse-codeformat.xml to format the source code.

Thanks Lewis and Sebastian.

  was (Author: amuseme.lu):
Lewis:
1. Fixed the fetch NPE bug.
2. Fixed the bug where update did not work.

Should we move every tool to commons-cli? I found 47 files that would 
need to be migrated.

[~wastl-nagel]
1. Used eclipse-codeformat.xml to format the source code.

Thanks Lewis and Sebastian.
  
> Move to commons-cli for command line parsing 
> -
>
> Key: NUTCH-1555
> URL: https://issues.apache.org/jira/browse/NUTCH-1555
> Project: Nutch
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 2.1
>Reporter: Lewis John McGibbney
>Assignee: lufeng
> Fix For: 2.2
>
> Attachments: NUTCH-1555.patch, NUTCH-1555-v1.patch
>
>
> I just accidentally passed the following argument to the parser job:
> {code}
> law@CEE279Law3-Linux:~/Downloads/asf/2.x/runtime/local$ ./bin/nutch parse updatedb
> ParserJob: starting
> ParserJob: resuming:  false
> ParserJob: forced reparse:false
> ParserJob: batchId:   updatedb
> ParserJob: success
> {code}
> This is a bug for sure
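
A minimal sketch of the kind of strict argument checking that would reject
the stray "updatedb" word shown in the description above instead of silently
treating it as a batch id. It uses a hand-rolled parser from the Java
standard library rather than commons-cli, and the option names -batchId,
-resume, and -force as well as the class name are assumptions made for
illustration; it is not the code from the attached patches.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StrictArgsSketch {

    /**
     * Accepts "-batchId <id>", "-resume" and "-force" and rejects anything
     * else, so an unexpected word is reported rather than used as a value.
     */
    static Map<String, String> parse(String[] args) {
        Map<String, String> parsed = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if ("-batchId".equals(arg)) {
                // The value must exist and must not itself look like an option.
                if (i + 1 >= args.length || args[i + 1].startsWith("-")) {
                    throw new IllegalArgumentException("-batchId requires a value");
                }
                parsed.put("batchId", args[++i]);
            } else if ("-resume".equals(arg) || "-force".equals(arg)) {
                parsed.put(arg.substring(1), "true");
            } else {
                throw new IllegalArgumentException("Unrecognized argument: " + arg);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        System.out.println(parse(new String[] {"-batchId", "batch-1", "-force"}));
        try {
            parse(new String[] {"updatedb"});
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A command-line library would provide the same rejection of unknown input
along with generated usage text, which is the motivation for the move
discussed in this issue.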



[jira] [Updated] (NUTCH-1555) Move to commons-cli for command line parsing

2013-04-25 Thread lufeng (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lufeng updated NUTCH-1555:
--

Attachment: NUTCH-1555-v1.patch

Lewis:
1. Fixed the fetch NPE bug.
2. Fixed the bug where update did not work.

Should we move every tool to commons-cli? I found 47 files that would 
need to be migrated.

[~wastl-nagel]
1. Used eclipse-codeformat.xml to format the source code.

Thanks Lewis and Sebastian.

