Re: [VOTE] Apache Nutch 1.5 release rc #1

2012-04-16 Thread Sami Siren
On Mon, Apr 16, 2012 at 8:43 AM, Mattmann, Chris A (388J)
chris.a.mattm...@jpl.nasa.gov wrote:
 Hi Folks,

 A candidate for the Nutch 1.5 release is available at:

  http://people.apache.org/~mattmann/apache-nutch-1.5/rc1/

 The release candidate is a zip and tar.gz archive of the sources in:

  http://svn.apache.org/repos/asf/nutch/tags/release-1.5/

 And a binary build suitable for deployment.

 A staged Maven repository is available here:

 https://repository.apache.org/content/repositories/orgapachenutch-054/

 Please vote on releasing this package as Apache Nutch 1.5.
 The vote is open for the next 72 hours and passes if a majority of at
 least three +1 Nutch PMC votes are cast.

  [ ] +1 Release this package as Apache Nutch 1.5
  [ ] -1 Do not release this package because...


The basics are good:
md5 and sha1 checksums for apache-nutch-1.5-bin.tar.gz and
apache-nutch-1.5-src.tar.gz  match
ant clean test completes successfully for the source package
completed a simple crawl in local mode and on a small Hadoop 1.0.2
cluster using the artifacts in the binary package
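
The checksum comparison mentioned above can be sketched as follows. This is only an illustration, not the tooling actually used for the vote: the class name ChecksumCheck is hypothetical, and it assumes the published digest is the first whitespace-separated token of the checksum text, which may not hold for every checksum file layout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not part of Nutch): recomputes a digest for a release
// artifact and compares it with the published value.
class ChecksumCheck {

    // Returns the lowercase hex digest of the file; algorithm is e.g. "MD5" or "SHA-1".
    static String hexDigest(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Assumption: the digest is the first whitespace-separated token of the
    // published text. The comparison ignores case.
    static boolean matches(Path file, String algorithm, String published)
            throws IOException, NoSuchAlgorithmException {
        String expected = published.trim().split("\\s+")[0];
        return hexDigest(file, algorithm).equalsIgnoreCase(expected);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: ChecksumCheck <file> <MD5|SHA-1> <published digest>");
            return;
        }
        boolean ok = matches(Paths.get(args[0]), args[1], args[2]);
        System.out.println(ok ? "match" : "MISMATCH");
    }
}
```

Streaming the file through the digest keeps memory use flat even for large archives.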

but it seems there are some license headers missing from source files:
[rat:report]  ==/home/sam/nutch/apache-nutch-1.5/src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java
[rat:report]  ==/home/sam/nutch/apache-nutch-1.5/src/plugin/creativecommons/src/web/web.xml
[rat:report]  ==/home/sam/nutch/apache-nutch-1.5/src/plugin/protocol-httpclient/src/test/conf/httpclient-auth-test.xml
[rat:report]  ==/home/sam/nutch/apache-nutch-1.5/src/plugin/protocol-httpclient/src/test/conf/nutch-site-test.xml

-1 because of missing license headers
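
For a quick local check before re-running the RAT report, something like the sketch below could list candidate files. It is a rough approximation and not how RAT itself works: the class name LicenseHeaderScan is hypothetical, only .java and .xml files are considered, and a file counts as licensed if the standard ASF header phrase appears in its first 2048 bytes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Nutch or RAT): lists .java and .xml files
// whose leading bytes do not contain the ASF license header phrase.
class LicenseHeaderScan {

    static final String MARKER = "Licensed to the Apache Software Foundation";

    // Assumption: the header, if present, sits within the first 2048 bytes.
    static boolean hasHeader(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        int len = Math.min(bytes.length, 2048);
        String head = new String(bytes, 0, len, StandardCharsets.ISO_8859_1);
        return head.contains(MARKER);
    }

    static List<Path> missingHeaders(Path root) throws IOException {
        List<Path> candidates;
        try (Stream<Path> walk = Files.walk(root)) {
            candidates = walk
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.endsWith(".java") || name.endsWith(".xml");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<Path> missing = new ArrayList<>();
        for (Path p : candidates) {
            if (!hasHeader(p)) {
                missing.add(p);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no source root is given.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : missingHeaders(root)) {
            System.out.println("missing header: " + p);
        }
    }
}
```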

--
 Sami Siren


Re: Providing a list of FAQ's with every new subscribe request

2011-10-05 Thread Sami Siren
The sub_ok files for user and dev have now been updated with the text
you provided.

--
 Sami Siren


Re: Build failed in Jenkins: Nutch-trunk #1624

2011-10-04 Thread Sami Siren
The Nutch nightly builds seem to fail awfully often with some strange
errors that are not really related to the build itself. Is someone
trying to figure out what's going on? Is it perhaps a config issue or
something else that could be easily remedied?

--
 Sami Siren

On Wed, Oct 5, 2011 at 7:09 AM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 See https://builds.apache.org/job/Nutch-trunk/1624/

 --
 Started by timer
 Building remotely on solaris1
 hudson.util.IOException2: remote file operation failed: 
 https://builds.apache.org/job/Nutch-trunk/ws/ at 
 hudson.remoting.Channel@2a6dafee:solaris1
        at hudson.FilePath.act(FilePath.java:754)
        at hudson.FilePath.act(FilePath.java:740)
        at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:731)
        at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:676)
        at hudson.model.AbstractProject.checkout(AbstractProject.java:1193)
        at 
 hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:566)
        at 
 hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:454)
        at hudson.model.Run.run(Run.java:1376)
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
        at hudson.model.ResourceController.execute(ResourceController.java:88)
        at hudson.model.Executor.run(Executor.java:230)
 Caused by: java.io.IOException: Remote call on solaris1 failed
        at hudson.remoting.Channel.call(Channel.java:690)
        at hudson.FilePath.act(FilePath.java:747)
        ... 10 more
 Caused by: java.lang.LinkageError: duplicate class definition: 
 hudson/model/Descriptor
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:621)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:466)
        at 
 hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:151)
        at 
 hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:131)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
        at java.lang.Class.getDeclaredFields0(Native Method)
        at java.lang.Class.privateGetDeclaredFields(Class.java:2259)
        at java.lang.Class.getDeclaredField(Class.java:1852)
        at 
 java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1582)
        at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:52)
        at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:408)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.io.ObjectStreamClass.init(ObjectStreamClass.java:400)
        at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:297)
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:531)
        at 
 java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1552)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1466)
        at 
 java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1552)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1466)
        at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1699)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
        at 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910)
        at 
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834)
        at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
        at 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910)
        at 
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834)
        at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
        at 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910)
        at 
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834)
        at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
        at hudson.remoting.UserRequest.deserialize(UserRequest.java:182)
        at hudson.remoting.UserRequest.perform(UserRequest.java:98)
        at hudson.remoting.UserRequest.perform(UserRequest.java:48)
        at hudson.remoting.Request$2.run(Request.java:287)
        at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269

Re: Providing a list of FAQ's with every new subscribe request

2011-10-03 Thread Sami Siren
On Mon, Oct 3, 2011 at 3:48 PM, lewis john mcgibbney 
lewis.mcgibb...@gmail.com wrote:


 Would it be possible to send out a list of our official FAQs when a new
 user confirms their subscription to the user@ and dev@ lists?


It seems this is possible. Can you craft a piece of text you would like to
be sent out on successful subscribe and I'll try to set it up.

This is the full list of files ezmlm lists as editable, in case
someone comes up with something else to customize:

File               Use

bottom             bottom of all responses. General command info.
digest             'administrivia' section of digests.
faq                frequently asked questions specific to this list.
get_bad            in place of messages not found in the archive.
help               general help (between 'top' and 'bottom').
info               list info. First line should be meaningful on its own.
mod_help           specific help for list moderators.
mod_reject         to sender of rejected post.
mod_request        to message moderators together with post.
mod_sub            to subscriber after moderator confirmed subscribe.
mod_sub_confirm    to subscription mod to request subscribe confirm.
mod_timeout        to sender of timed-out post.
mod_unsub_confirm  to remote admin to request unsubscribe confirm.
sub_bad            to subscriber if confirm was bad.
sub_confirm        to subscriber to request subscribe confirm.
sub_nop            to subscriber after re-subscription.
sub_ok             to subscriber after successful subscription.
top                top of all responses.
unsub_bad          to subscriber if unsubscribe confirm was bad.
unsub_confirm      to subscriber to request unsubscribe confirm.
unsub_nop          to non-subscriber after unsubscribe.
unsub_ok           to ex-subscriber after successful unsubscribe.

--
 Sami Siren


[jira] [Commented] (NUTCH-1091) Remove commons logging dependency from Nutch branch and trunk

2011-09-29 Thread Sami Siren (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117862#comment-13117862
 ] 

Sami Siren commented on NUTCH-1091:
---

+1, go ahead Lewis

 Remove commons logging dependency from Nutch branch and trunk
 -

 Key: NUTCH-1091
 URL: https://issues.apache.org/jira/browse/NUTCH-1091
 Project: Nutch
  Issue Type: Improvement
  Components: build
Affects Versions: 1.4, nutchgora
 Environment: Ubuntu 11.04 (natty)
 Kernel Linux 2.6.38-11-generic
 GNOME 2.32.1
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
Priority: Minor
 Fix For: 1.4, nutchgora

 Attachments: NUTCH-1091-branch-1.4-20110923.patch


 Once all logging has been shifted to slf4j with a log4j backend as per 
 NUTCH-1078, we should deprecate the ivy dependency on commons logging. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Providing a list of FAQ's with every new subscribe request

2011-09-27 Thread Sami Siren
I am getting moderation emails and I think that there's somebody else doing
moderation too since the messages get sent to the list without me accepting
them.

I would like to step down from the moderator status and have someone else do
moderation instead, because frankly I have not been doing a great job with
it. Any volunteers?

--
 Sami Siren

On Tue, Sep 27, 2011 at 12:09 AM, Julien Nioche 
lists.digitalpeb...@gmail.com wrote:

 We don't have moderators for the user and dev lists


 On 26 September 2011 20:09, lewis john mcgibbney 
 lewis.mcgibb...@gmail.com wrote:

 Thanks Markus,

 Who is mailing list moderator? If I can get this info before trying to
 contact infra it would be great.


 On Mon, Sep 26, 2011 at 7:37 PM, Markus Jelsma 
 markus.jel...@openindex.io wrote:

 Sounds like a good idea. I think you need to be ML moderator to make
 changes

 http://www.apache.org/dev/committers.html#mail-moderate

  Hi,
 
  I just signed up to the JUnit users lists and received a really well
  documented FAQ accompaniment when I subscribed. I think this would be a
  great resource for new Nutch users. Does anyone agree/disagree? How do we
  go about configuring this? Is this a request for the infra team?
 
  Thank you




 --
 *Lewis*




 --
 *
 *Open Source Solutions for Text Engineering

 http://digitalpebble.blogspot.com/
 http://www.digitalpebble.com



Re: Providing a list of FAQ's with every new subscribe request

2011-09-27 Thread Sami Siren
I think moderators can be changed by filing a jira issue (by one of the PMC
members) to the infra project, for example see
https://issues.apache.org/jira/browse/INFRA-3511

Moderation is a simple task: you just let good messages (usually coming only
from non-subscribed senders) through and forget about the rest.

Julien: I am pretty sure I am still a moderator for dev and user - I just tried
some of the moderator commands and they were successful.
--
 Sami Siren

On Tue, Sep 27, 2011 at 9:32 PM, lewis john mcgibbney 
lewis.mcgibb...@gmail.com wrote:

 Hi Sami,

 Who are we supposed to speak to regarding moderation? I tried to
 contact the infra@ team but am still awaiting a reply.

 What is included in moderation? I'm completely foreign to all of this, and
 as Julien stated I was not aware that there was anyone directly linked to
 Nutch list moderation. The info in the Apache developers area is pretty
 vague and I haven't been able to get much further with this.

 Thanks


 On Tue, Sep 27, 2011 at 6:33 PM, Sami Siren ssi...@gmail.com wrote:

 I am getting moderation emails and I think that there's somebody else
 doing moderation too since the messages get sent to the list without me
 accepting them.

 I would like to step down from the moderator status and have someone else
 do moderation instead, because frankly I have not been doing a great job
 with it. Any volunteers?

 --
  Sami Siren


 On Tue, Sep 27, 2011 at 12:09 AM, Julien Nioche 
 lists.digitalpeb...@gmail.com wrote:

 We don't have moderators for the user and dev lists


 On 26 September 2011 20:09, lewis john mcgibbney 
 lewis.mcgibb...@gmail.com wrote:

 Thanks Markus,

 Who is mailing list moderator? If I can get this info before trying to
 contact infra it would be great.


 On Mon, Sep 26, 2011 at 7:37 PM, Markus Jelsma 
 markus.jel...@openindex.io wrote:

 Sounds like a good idea. I think you need to be ML moderator to make
 changes

 http://www.apache.org/dev/committers.html#mail-moderate

  Hi,
 
  I just signed up to the JUnit users lists and received a really well
  documented FAQ accompaniment when I subscribed. I think this would be a
  great resource for new Nutch users. Does anyone agree/disagree? How do we
  go about configuring this? Is this a request for the infra team?
 
  Thank you




 --
 *Lewis*




 --
 *
 *Open Source Solutions for Text Engineering

 http://digitalpebble.blogspot.com/
 http://www.digitalpebble.com





 --
 *Lewis*





[jira] [Commented] (NUTCH-657) Estonian N-gram profile has wrong name

2011-09-24 Thread Sami Siren (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114000#comment-13114000
 ] 

Sami Siren commented on NUTCH-657:
--

bq.  I'm here to ask again if we can mark this as won't fix for the 1.4 current 
trunk?

Sounds like this is the right thing to do if/when Tika lang-id is used. I think 
the lang-id component in Tika lost some of its accuracy when it got moved from 
Nutch to Tika, but I think it makes most sense to build on top of that and 
improve the one in Tika instead of having something special in Nutch.

 Estonian N-gram profile has wrong name
 --

 Key: NUTCH-657
 URL: https://issues.apache.org/jira/browse/NUTCH-657
 Project: Nutch
  Issue Type: Bug
Affects Versions: 0.8.1, 0.9.0
Reporter: Jonathan Young
Priority: Trivial

 The Nutch language identifier plugin contains an ngram profile, ee.ngp, in 
 src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang. ee 
 is the ISO 3166-1 alpha-2 code for Estonia (see 
 http://www.iso.org/iso/country_codes/iso_3166_code_lists/english_country_names_and_code_elements.htm),
 but it is the ISO 639-1 code for Ewe (see 
 http://www.loc.gov/standards/iso639-2/php/English_list.php). et is the 
 ISO 639-1 code for Estonian, and the language profile in ee.ngp is clearly 
 Estonian.
 Proposed solution: rename ee.ngp to et.ngp.
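
The mix-up between country and language codes described in the issue can be illustrated with the JDK's own ISO tables. This is only a sketch: the class name IsoCodeCheck is hypothetical and is not part of the language identifier plugin; the Locale methods used are standard JDK API.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper (not part of Nutch): checks two-letter codes against
// the ISO language and country tables that ship with the JDK.
class IsoCodeCheck {

    // True if the code is a two-letter ISO language code known to the JDK.
    static boolean isLanguageCode(String code) {
        return Arrays.asList(Locale.getISOLanguages())
                .contains(code.toLowerCase(Locale.ROOT));
    }

    // True if the code is a two-letter ISO country code known to the JDK.
    static boolean isCountryCode(String code) {
        return Arrays.asList(Locale.getISOCountries())
                .contains(code.toUpperCase(Locale.ROOT));
    }

    // Three-letter language code for a two-letter language code.
    static String threeLetterLanguage(String twoLetterCode) {
        return Locale.forLanguageTag(twoLetterCode).getISO3Language();
    }

    public static void main(String[] args) {
        for (String code : new String[] {"ee", "et"}) {
            System.out.println(code
                    + " language=" + isLanguageCode(code)
                    + " country=" + isCountryCode(code)
                    + " iso3=" + threeLetterLanguage(code)
                    + " name=" + Locale.forLanguageTag(code).getDisplayLanguage(Locale.ENGLISH));
        }
    }
}
```

Because the same two letters can be valid in both tables, a profile file name only makes sense when it is read as a language code.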





[jira] [Commented] (NUTCH-1078) Upgrade all instances of commons logging to slf4j (with log4j backend)

2011-09-20 Thread Sami Siren (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108975#comment-13108975
 ] 

Sami Siren commented on NUTCH-1078:
---

Oh, ok. I didn't realize there was another issue open about removing those.

 Upgrade all instances of commons logging to slf4j (with log4j backend)
 --

 Key: NUTCH-1078
 URL: https://issues.apache.org/jira/browse/NUTCH-1078
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
Priority: Minor
 Fix For: 1.4

 Attachments: NUTCH-1078-branch-1.4-20110816.patch, 
 NUTCH-1078-branch-1.4-20110824-v2.patch, 
 NUTCH-1078-branch-1.4-20110911-v3.patch, 
 NUTCH-1078-branch-1.4-20110916-v4.patch


 Whilst working on another issue, I noticed that some classes still import and 
 use commons logging, for example HttpBase.java:
 {code}
 import java.util.*;
 // Commons Logging imports
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
 // Nutch imports
 import org.apache.nutch.crawl.CrawlDatum;
 {code}
 At this stage I am unsure how many others (if any) still import and rely 
 upon commons logging; however, they should be upgraded to slf4j for branch-1.4.
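
One way to answer the "how many others" question is to scan the source tree for the import shown above. The sketch below is an assumption about how that could be done, not part of the attached patches: the class name CommonsLoggingScan is hypothetical, and it only catches explicit import statements, not fully qualified uses.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Nutch): lists .java files that still
// import classes from org.apache.commons.logging.
class CommonsLoggingScan {

    static final String IMPORT_PREFIX = "import org.apache.commons.logging.";

    static List<Path> filesUsingCommonsLogging(Path root) throws IOException {
        List<Path> javaFiles;
        try (Stream<Path> walk = Files.walk(root)) {
            javaFiles = walk
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".java"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<Path> hits = new ArrayList<>();
        for (Path p : javaFiles) {
            // ISO-8859-1 decodes any byte sequence, so odd encodings do not abort the scan.
            for (String line : Files.readAllLines(p, StandardCharsets.ISO_8859_1)) {
                if (line.trim().startsWith(IMPORT_PREFIX)) {
                    hits.add(p);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current directory when no source root is given.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> hits = filesUsingCommonsLogging(root);
        for (Path p : hits) {
            System.out.println(p);
        }
        System.out.println(hits.size() + " file(s) still import commons logging");
    }
}
```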





Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]

2011-09-15 Thread Sami Siren
On Thu, Sep 15, 2011 at 9:55 PM, Markus Jelsma
markus.jel...@openindex.io wrote:
 There are many things I can write about this topic right now but I don't feel
 it's necessary. The choice is difficult and perhaps painful but when the
 voting round is opened by our project lead, I will vote for promoting 1.x back
 to trunk.

+1, Same here

--
 Sami Siren


Re: Nutch 2.0

2010-06-28 Thread Sami Siren

On 06/28/2010 10:10 AM, Andrzej Bialecki wrote:

On 2010-06-28 07:49, Sami Siren wrote:

One aspect that has not been discussed yet is the legal aspect.
According to http://incubator.apache.org/ip-clearance/index.html
there is a formal process for integrating development
efforts that have happened outside of Apache. Should we be
following the IP clearance process in this case too?


The concept of a substantial contribution that should be subject to
a software grant is somewhat tenuous, though. Keep in mind that you
do something equivalent in JIRA already - when you check the "Grant
license to ASF" box you perform a micro-grant. So the question is
whether we should go through a full grant or through the JIRA
micro-grant.

In my opinion it's ok to do the latter, since much of the code is
simply a modified version of Nutch classes - not counting GORA, of
course, but that part will be added as a third-party lib. So IMHO
it's enough to zip all source (without libs), attach it to a JIRA
issue and mark the checkbox. Then we follow the process outlined by
Chris, which imports the same codebase into our svn. What do you
think?


I do not know what the right approach is, that's why I asked the
question. Also, I have not looked at the donation, but the following
comment made me think it might fall into the substantial category:


There has been an enormous amount of changes between the nutchbase
branch and the version on GitHub - pretty much EVERY class has been
modified + a lot of classes have been removed etc...



If folks agree that this is sufficient, then Dogacan and Enis - can
you please create a separate JIRA issue, prepare a patch like this,
mark the checkbox, and list all dependencies and their licenses for
those that are not already in Nutch svn?


This would be a good thing to do in any case. It would help to
understand what the donation is about and also help to decide which process
(if any) needs to be followed.

--
 Sami Siren