Re: Providing a list of FAQ's with every new subscribe request

2011-10-04 Thread lewis john mcgibbney
Hi Sami,

Thanks for this.

I think if we could get something similar to the following it would be
great. This means that instead of having to maintain yet another series of
documentation we are consistently referring back to the wiki for the source
of all documentation.
Just a quick introduction of the project and then some housekeeping. Anyone
is welcome to add material they think appropriate for new users.

Thank you

--
Welcome to user@/dev@nutch.apache.org

Apache Nutch is an open source web-search software project. Stemming from
Apache Lucene, it now builds on Apache Solr, adding web-specifics such as a
crawler, a link-graph database and parsing support handled by Apache Tika
for HTML and an array of other document formats.

Apache Nutch can run on a single machine, but gains a lot of its strength
from running in a Hadoop cluster.

The system can be enhanced (e.g. other document formats can be parsed) using a
highly flexible, easily extensible and thoroughly maintained plugin
infrastructure.

For more information about Apache Nutch, please see the Nutch wiki.

Housekeeping on the lists

Questions that are not answered in the FAQ [1] or in the wiki documentation
[2] should be posted to the appropriate mailing list.

Please stick to technical issues on the discussion forum and mailing lists.
Keep in mind that these are public, so do not include any confidential
information in your questions!

You should also read the *Mailing Lists Developer Resource* [3] before
participating in the discussion forum and mailing lists.

NOTE: Please do NOT submit bugs, patches, or feature requests to the mailing
lists. Refer instead to the Committer's_Rules [4] and HowToContribute [5] areas
of the Nutch wiki.

[1] http://wiki.apache.org/nutch/FAQ
[2] http://wiki.apache.org/nutch/FrontPage
[3] http://www.apache.org/dev/#mail
[4] http://wiki.apache.org/nutch/Committer%27s_Rules
[5] http://wiki.apache.org/nutch/HowToContribute

--

On Mon, Oct 3, 2011 at 5:40 PM, Sami Siren  wrote:

>
>
> On Mon, Oct 3, 2011 at 3:48 PM, lewis john mcgibbney <
> lewis.mcgibb...@gmail.com> wrote:
>
>>
>> Would it be possible to send out a list of our official FAQs when a new
>> user confirms their subscription to both user@ and dev@ lists?
>>
>>
> It seems this is possible. Can you craft a piece of text you would like to
> be sent out on successful subscribe and I'll try to set it up.
>
> This is the full list of files ezmlm lists as editable, in case someone
> comes up with something else to customize:
>
> File                Use
>
> bottom              bottom of all responses. General command info.
> digest              'administrivia' section of digests.
> faq                 frequently asked questions specific to this list.
> get_bad             in place of messages not found in the archive.
> help                general help (between 'top' and 'bottom').
> info                list info. First line should be meaningful on its own.
> mod_help            specific help for list moderators.
> mod_reject          to sender of rejected post.
> mod_request         to message moderators together with post.
> mod_sub             to subscriber after moderator confirmed subscribe.
> mod_sub_confirm     to subscription mod to request subscribe confirm.
> mod_timeout         to sender of timed-out post.
> mod_unsub_confirm   to remote admin to request unsubscribe confirm.
> sub_bad             to subscriber if confirm was bad.
> sub_confirm         to subscriber to request subscribe confirm.
> sub_nop             to subscriber after re-subscription.
> sub_ok              to subscriber after successful subscription.
> top                 top of all responses.
> unsub_bad           to subscriber if unsubscribe confirm was bad.
> unsub_confirm       to subscriber to request unsubscribe confirm.
> unsub_nop           to non-subscriber after unsubscribe.
> unsub_ok            to ex-subscriber after successful unsubscribe.
>
> --
>  Sami Siren
>
>


-- 
*Lewis*


[jira] [Created] (NUTCH-1146) Get rid of _success files in webgraph code

2011-10-04 Thread Markus Jelsma (Created) (JIRA)
Get rid of _success files in webgraph code
--

 Key: NUTCH-1146
 URL: https://issues.apache.org/jira/browse/NUTCH-1146
 Project: Nutch
  Issue Type: Task
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Trivial
 Fix For: 1.5


WebGraph tools here and there also suffer from reading a _SUCCESS file. All 
jobs there should disable this setting.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (NUTCH-1136) Ant pmd target is broken

2011-10-04 Thread Markus Jelsma (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120023#comment-13120023
 ] 

Markus Jelsma commented on NUTCH-1136:
--

Seems fine to me.

> Ant pmd target is broken
> 
>
> Key: NUTCH-1136
> URL: https://issues.apache.org/jira/browse/NUTCH-1136
> Project: Nutch
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.4, nutchgora
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 1.4, nutchgora
>
> Attachments: NUTCH-1136-nutchgora-20110930.patch, 
> NUTCH-1136-trunk-1.4-20110930.patch
>
>
> issuing an 'ant pmd' command results in a failure as follows
> {code}
> BUILD FAILED
> /home/lewis/ASF/trunk/build.xml:327: taskdef class 
> net.sourceforge.pmd.ant.PMDTask cannot be found
>  using the classloader AntClassLoader[]
> {code}
> The resulting fix should address this.





Tests for Nutchgora

2011-10-04 Thread lewis john mcgibbney
Hi everyone,

I'm roughly aware who is using Nutchgora on a regular basis, however I'm not
aware who has a working knowledge of the tests, in particular the Gora-based
tests, which Julien opened a ticket for [2] and I opened a series of tickets
for [1].

I'm concerned that the errors we constantly see seem specific to the
database setup assumed by the tests, and are probably best diagnosed by
people familiar with the area of application and test suite(s). I'm very
keen to get them working, then to continue attempting to get our missing
test cases written. However, I could really do with an idea of where to
start.

Thanks for any pointers.

[1] https://issues.apache.org/jira/browse/NUTCH-1081
[2] https://issues.apache.org/jira/browse/NUTCH-896
-- 
*Lewis*


[Nutch Wiki] Trivial Update of "CommandLineOptions" by MarkusJelsma

2011-10-04 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "CommandLineOptions" page has been changed by MarkusJelsma:
http://wiki.apache.org/nutch/CommandLineOptions?action=diff&rev1=35&rev2=36

  == Useful Plugin Classes ==
  
   * bin/nutch plugin urlnormalizer-regex 
org.apache.nutch.net.urlnormalizer.regex.[[RegexURLNormalizer]]
+  * bin/nutch plugin lib-http 
org.apache.nutch.protocol.http.api.RobotRulesParser
  
  == Other Classes ==
  


[Nutch Wiki] Trivial Update of "CommandLineOptions" by MarkusJelsma

2011-10-04 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "CommandLineOptions" page has been changed by MarkusJelsma:
http://wiki.apache.org/nutch/CommandLineOptions?action=diff&rev1=36&rev2=37

   * bin/nutch 
org.apache.nutch.scoring.webgraph.[[NewScoringIndexingExample|LinkRank]]
   * bin/nutch 
org.apache.nutch.scoring.webgraph.[[NewScoringIndexingExample|ScoreUpdater]]
   * bin/nutch 
org.apache.nutch.scoring.webgraph.[[NewScoringIndexingExample|NodeDumper]]
+  * bin/nutch 
org.apache.nutch.scoring.webgraph.[[NewScoringIndexingExample|NodeReader]]
+  * bin/nutch 
org.apache.nutch.scoring.webgraph.[[NewScoringIndexingExample|LoopReader]]
+  * bin/nutch 
org.apache.nutch.scoring.webgraph.[[NewScoringIndexingExample|LinkDumper]]
  
  == Useful Plugin Classes ==
  


[jira] [Created] (NUTCH-1147) WebGraph nodeDumper uses only 1 reducer

2011-10-04 Thread Markus Jelsma (Created) (JIRA)
WebGraph nodeDumper uses only 1 reducer
---

 Key: NUTCH-1147
 URL: https://issues.apache.org/jira/browse/NUTCH-1147
 Project: Nutch
  Issue Type: Improvement
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Trivial
 Fix For: 1.5


The nodeDumper is restricted to only one reducer, making it slow and producing 
files that are too large.





[jira] [Created] (NUTCH-1148) PluginManifestParser cannot load plugins from classpath that is dynamically set using a contextClassLoader.

2011-10-04 Thread Ferdy (Created) (JIRA)
PluginManifestParser cannot load plugins from classpath that is dynamically set 
using a contextClassLoader.
---

 Key: NUTCH-1148
 URL: https://issues.apache.org/jira/browse/NUTCH-1148
 Project: Nutch
  Issue Type: Bug
Reporter: Ferdy


This affects running nutchgora using Hadoop's RunJar mechanism (hadoop jar 
...). The MR tasks are perfectly able to load the plugins (please note 
NUTCH-937). But when the plugins are loaded from the *job submitter* process 
itself, loading plugins might fail due to classloading issues. This is caused 
by the fact that PluginManifestParser does not use the contextClassLoader that 
is set by RunJar. This classloader contains the plugins folder. At least the 
FetcherJob is affected by this, because the job submitter uses getFields of 
Protocol implementations, therefore loading the plugins.

The current 1.x is not affected because it does not load plugins at any point 
during the job submission. This might of course change, so I propose to 'fix' 
the issue in the 1.x branch as well.

The solution is fairly simple: PluginManifestParser should use the 
contextClassLoader of the current thread instead of using the system 
classloader. I will attach a patch right away. It currently works but it still 
needs some further testing.
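
To illustrate the kind of lookup proposed here, a minimal sketch follows. It is
not the attached patch: the class name PluginFolderLocator and the method
findPluginFolder are hypothetical, and it assumes the plugin folder can be
resolved as a classpath resource by name.

```java
import java.net.URL;

/** Hypothetical helper for illustration; not the actual PluginManifestParser code. */
public class PluginFolderLocator {

    /**
     * Resolves a folder name against the current thread's context class loader,
     * falling back to the system class loader when no context loader is set.
     * Returns null when the name cannot be found.
     */
    public static URL findPluginFolder(String name) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(name);
    }

    public static void main(String[] args) {
        // "plugins" is only a default for this demo.
        String name = args.length > 0 ? args[0] : "plugins";
        System.out.println(name + " -> " + findPluginFolder(name));
    }
}
```

A launcher that installs its own class loader on the thread before invoking
main would then have its classpath entries consulted, which a lookup through
the system class loader alone would miss.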





[jira] [Updated] (NUTCH-1148) PluginManifestParser cannot load plugins from classpath that is dynamically set using a contextClassLoader.

2011-10-04 Thread Ferdy (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdy updated NUTCH-1148:
-

Attachment: NUTCH-1148-v1.patch

> PluginManifestParser cannot load plugins from classpath that is dynamically 
> set using a contextClassLoader.
> ---
>
> Key: NUTCH-1148
> URL: https://issues.apache.org/jira/browse/NUTCH-1148
> Project: Nutch
>  Issue Type: Bug
>Reporter: Ferdy
> Attachments: NUTCH-1148-v1.patch





[jira] [Created] (NUTCH-1149) DomainStats should process numeric CrawlDB metadata

2011-10-04 Thread Markus Jelsma (Created) (JIRA)
DomainStats should process numeric CrawlDB metadata
---

 Key: NUTCH-1149
 URL: https://issues.apache.org/jira/browse/NUTCH-1149
 Project: Nutch
  Issue Type: Improvement
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Trivial
 Fix For: 1.5


Right now the DomainStats program only outputs the sum of fetched records per 
domain or host. It should also be able to output processed numerics of 
metadata in order to get the average size (content length) for a given domain or 
host. This is also useful for generating a metric for adult material (by domain 
or host) when using a plugin that stores a probability factor of adult material 
per URL in the Crawl DB.
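
As a rough illustration of the aggregation described here, the sketch below
averages one numeric metadata value per host. It is independent of Hadoop and of
the actual DomainStats code: the class HostAverages, the method averageByHost
and the input shape (URL mapped to a numeric value such as content length) are
all assumptions made for the example.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical example; not part of Nutch. */
public class HostAverages {

    /**
     * Groups the entries by the host part of their URL and averages the values.
     * Every key must be an absolute URL with a host component.
     */
    public static Map<String, Double> averageByHost(Map<String, Long> valueByUrl) {
        return valueByUrl.entrySet().stream()
            .collect(Collectors.groupingBy(
                e -> URI.create(e.getKey()).getHost(),
                TreeMap::new,
                Collectors.averagingLong(e -> e.getValue())));
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for a content-length metadata field.
        Map<String, Long> contentLength = new LinkedHashMap<>();
        contentLength.put("http://example.org/a", 100L);
        contentLength.put("http://example.org/b", 300L);
        contentLength.put("http://example.com/", 50L);
        System.out.println(averageByHost(contentLength));
    }
}
```

The same grouping works for any per-URL number, for instance a probability
score, as long as it is available as a numeric value per record.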





[jira] [Commented] (NUTCH-1124) JUnit test for scoring-opic

2011-10-04 Thread Markus Jelsma (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120628#comment-13120628
 ] 

Markus Jelsma commented on NUTCH-1124:
--

I'm not sure if this is necessary, as OPIC is not really considered a proper 
method for scoring incremental crawls, which is what most do.

> JUnit test for scoring-opic
> ---
>
> Key: NUTCH-1124
> URL: https://issues.apache.org/jira/browse/NUTCH-1124
> Project: Nutch
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 1.4
>Reporter: Lewis John McGibbney
>Priority: Minor
> Fix For: 1.5
>
>
> This issue is part of the larger attempt to provide a JUnit test case for 
> every Nutch plugin.





Build failed in Jenkins: Nutch-nutchgora #26

2011-10-04 Thread Apache Jenkins Server
See 

--
Started by timer
Building remotely on solaris1
hudson.util.IOException2: remote file operation failed: 
 at 
hudson.remoting.Channel@2a6dafee:solaris1
    at hudson.FilePath.act(FilePath.java:754)
    at hudson.FilePath.act(FilePath.java:740)
    at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:731)
    at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:676)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1193)
    at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:566)
    at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:454)
    at hudson.model.Run.run(Run.java:1376)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:230)
Caused by: java.io.IOException: Remote call on solaris1 failed
    at hudson.remoting.Channel.call(Channel.java:690)
    at hudson.FilePath.act(FilePath.java:747)
    ... 10 more
Caused by: java.lang.LinkageError: duplicate class definition: hudson/model/Descriptor
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:466)
    at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:151)
    at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:131)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
    at java.lang.Class.getDeclaredFields0(Native Method)
    at java.lang.Class.privateGetDeclaredFields(Class.java:2259)
    at java.lang.Class.getDeclaredField(Class.java:1852)
    at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1582)
    at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:52)
    at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:408)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:400)
    at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:297)
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:531)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1552)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1466)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1552)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1466)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1699)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at hudson.remoting.UserRequest.deserialize(UserRequest.java:182)
    at hudson.remoting.UserRequest.perform(UserRequest.java:98)
    at hudson.remoting.UserRequest.perform(UserRequest.java:48)
    at hudson.remoting.Request$2.run(Request.java:287)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
    at java.util.concurrent.FutureTask.run(FutureTask.java:123)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
    at java.lang.Thread.run(Thread.java:595)
[TASKS] Skipping publisher since build result is FAILURE



Build failed in Jenkins: Nutch-trunk #1624

2011-10-04 Thread Apache Jenkins Server
See 

--
Started by timer
Building remotely on solaris1
hudson.util.IOException2: remote file operation failed: 
 at 
hudson.remoting.Channel@2a6dafee:solaris1
    at hudson.FilePath.act(FilePath.java:754)
    at hudson.FilePath.act(FilePath.java:740)
    at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:731)
    at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:676)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:1193)
    at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:566)
    at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:454)
    at hudson.model.Run.run(Run.java:1376)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
    at hudson.model.ResourceController.execute(ResourceController.java:88)
    at hudson.model.Executor.run(Executor.java:230)
Caused by: java.io.IOException: Remote call on solaris1 failed
    at hudson.remoting.Channel.call(Channel.java:690)
    at hudson.FilePath.act(FilePath.java:747)
    ... 10 more
Caused by: java.lang.LinkageError: duplicate class definition: hudson/model/Descriptor
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:466)
    at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:151)
    at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:131)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
    at java.lang.Class.getDeclaredFields0(Native Method)
    at java.lang.Class.privateGetDeclaredFields(Class.java:2259)
    at java.lang.Class.getDeclaredField(Class.java:1852)
    at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1582)
    at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:52)
    at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:408)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:400)
    at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:297)
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:531)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1552)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1466)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1552)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1466)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1699)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at hudson.remoting.UserRequest.deserialize(UserRequest.java:182)
    at hudson.remoting.UserRequest.perform(UserRequest.java:98)
    at hudson.remoting.UserRequest.perform(UserRequest.java:48)
    at hudson.remoting.Request$2.run(Request.java:287)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
    at java.util.concurrent.FutureTask.run(FutureTask.java:123)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
    at java.lang.Thread.run(Thread.java:595)
Recording test results
Publishing Javadoc



Re: Build failed in Jenkins: Nutch-trunk #1624

2011-10-04 Thread Sami Siren
The Nutch nightly builds seem to fail awfully often with some strange
errors that are not really related to the build itself. Is someone
trying to figure out what's going on? Is it perhaps a config issue or
something else that could be easily remedied?

--
 Sami Siren

On Wed, Oct 5, 2011 at 7:09 AM, Apache Jenkins Server
 wrote:
> See 
>
> --
> Started by timer
> Building remotely on solaris1

[jira] [Commented] (NUTCH-1148) PluginManifestParser cannot load plugins from classpath that is dynamically set using a contextClassLoader.

2011-10-04 Thread Ferdy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120714#comment-13120714
 ] 

Ferdy commented on NUTCH-1148:
--

For completeness: a fetcher job fails when run as 'hadoop jar ...' with the 
following exception. This is using the current nutchgora branch. (Btw, is anyone 
able to confirm this? I could not find much about this on the mailing list and 
issue tracker. I want to make sure that it is not caused by something else.)

11/10/05 08:27:05 WARN plugin.PluginRepository: Plugins: directory not found: plugins
11/10/05 08:27:05 INFO plugin.PluginRepository: Plugin Auto-activation mode: [true]
11/10/05 08:27:05 INFO plugin.PluginRepository: Registered Plugins:
11/10/05 08:27:05 INFO plugin.PluginRepository: NONE
11/10/05 08:27:05 INFO plugin.PluginRepository: Registered Extension-Points:
11/10/05 08:27:05 INFO plugin.PluginRepository: NONE
Exception in thread "main" java.lang.RuntimeException: x-point org.apache.nutch.protocol.Protocol not found.
    at org.apache.nutch.protocol.ProtocolFactory.<init>(ProtocolFactory.java:55)
    at org.apache.nutch.fetcher.FetcherJob.getFields(FetcherJob.java:144)
    at org.apache.nutch.fetcher.FetcherJob.run(FetcherJob.java:183)
    at org.apache.nutch.fetcher.FetcherJob.fetch(FetcherJob.java:224)
    at org.apache.nutch.fetcher.FetcherJob.run(FetcherJob.java:309)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.fetcher.FetcherJob.main(FetcherJob.java:315)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)


> PluginManifestParser cannot load plugins from classpath that is dynamically 
> set using a contextClassLoader.
> ---
>
> Key: NUTCH-1148
> URL: https://issues.apache.org/jira/browse/NUTCH-1148
> Project: Nutch
>  Issue Type: Bug
>Reporter: Ferdy
> Attachments: NUTCH-1148-v1.patch
