Re: [VOTE] Apache Nutch 1.5 release rc #1
On Mon, Apr 16, 2012 at 8:43 AM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote:

Hi Folks,

A candidate for the Nutch 1.5 release is available at:
http://people.apache.org/~mattmann/apache-nutch-1.5/rc1/

The release candidate is a zip and tar.gz archive of the sources in:
http://svn.apache.org/repos/asf/nutch/tags/release-1.5/
And a binary build suitable for deployment.

A staged Maven repository is available here:
https://repository.apache.org/content/repositories/orgapachenutch-054/

Please vote on releasing this package as Apache Nutch 1.5. The vote is open for the next 72 hours and passes if a majority of at least three +1 Nutch PMC votes are cast.

[ ] +1 Release this package as Apache Nutch 1.5
[ ] -1 Do not release this package because...

The basics are good:

  md5 and sha1 checksums for apache-nutch-1.5-bin.tar.gz and apache-nutch-1.5-src.tar.gz match
  ant clean test completes successfully for the source package
  completed a simple crawl with local mode and a small hadoop 1.0.2 cluster by using the artifacts in the binary package

but it seems there are some license headers missing from source files:

[rat:report] ==/home/sam/nutch/apache-nutch-1.5/src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java
[rat:report] ==/home/sam/nutch/apache-nutch-1.5/src/plugin/creativecommons/src/web/web.xml
[rat:report] ==/home/sam/nutch/apache-nutch-1.5/src/plugin/protocol-httpclient/src/test/conf/httpclient-auth-test.xml
[rat:report] ==/home/sam/nutch/apache-nutch-1.5/src/plugin/protocol-httpclient/src/test/conf/nutch-site-test.xml

-1 because of missing license headers

-- Sami Siren
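For anyone repeating the checksum part of this check, here is a minimal sketch in Java. It assumes the expected digest has already been copied out of the published .md5 or .sha1 file. The class name ChecksumCheck and its methods are made up for illustration and are not part of Nutch or of the release tooling.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Nutch: compares a computed digest with a published one.
final class ChecksumCheck {

    // Returns the lowercase hex digest of data for a JDK algorithm name such as "MD5" or "SHA-1".
    static String hexDigest(String algorithm, byte[] data) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(data)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // True when the computed digest equals the expected hex string (case and surrounding whitespace ignored).
    static boolean matches(String algorithm, byte[] data, String expectedHex) {
        return hexDigest(algorithm, data).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: ChecksumCheck <MD5|SHA-1> <file> <expected-hex>");
            return;
        }
        // Reads the whole file into memory, which is fine for a sketch but not for very large artifacts.
        byte[] data = Files.readAllBytes(Paths.get(args[1]));
        System.out.println(matches(args[0], data, args[2]) ? "checksum OK" : "checksum MISMATCH");
    }
}
```

It would be run once per artifact and algorithm, for example against apache-nutch-1.5-src.tar.gz with the digest from its .md5 file. The license header findings above come from the rat report and are not covered by this sketch.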
Re: Providing a list of FAQ's with every new subscribe request
The sub_ok files for user and dev have now been updated with the text you provided. -- Sami Siren
Re: Build failed in Jenkins: Nutch-trunk #1624
The Nutch nightly builds seem to fail awfully often with some strange errors that are not really related to the build itself. Is someone trying to figure out what's going on? Is it perhaps a config issue or something else that could be easily remedied? -- Sami Siren On Wed, Oct 5, 2011 at 7:09 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: See https://builds.apache.org/job/Nutch-trunk/1624/ -- Started by timer Building remotely on solaris1 hudson.util.IOException2: remote file operation failed: https://builds.apache.org/job/Nutch-trunk/ws/ at hudson.remoting.Channel@2a6dafee:solaris1 at hudson.FilePath.act(FilePath.java:754) at hudson.FilePath.act(FilePath.java:740) at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:731) at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:676) at hudson.model.AbstractProject.checkout(AbstractProject.java:1193) at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:566) at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:454) at hudson.model.Run.run(Run.java:1376) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:230) Caused by: java.io.IOException: Remote call on solaris1 failed at hudson.remoting.Channel.call(Channel.java:690) at hudson.FilePath.act(FilePath.java:747) ... 
10 more Caused by: java.lang.LinkageError: duplicate class definition: hudson/model/Descriptor at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:621) at java.lang.ClassLoader.defineClass(ClassLoader.java:466) at hudson.remoting.RemoteClassLoader.loadClassFile(RemoteClassLoader.java:151) at hudson.remoting.RemoteClassLoader.findClass(RemoteClassLoader.java:131) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at java.lang.ClassLoader.loadClass(ClassLoader.java:252) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) at java.lang.Class.getDeclaredFields0(Native Method) at java.lang.Class.privateGetDeclaredFields(Class.java:2259) at java.lang.Class.getDeclaredField(Class.java:1852) at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1582) at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:52) at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:408) at java.security.AccessController.doPrivileged(Native Method) at java.io.ObjectStreamClass.init(ObjectStreamClass.java:400) at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:297) at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:531) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1552) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1466) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1552) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1466) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1699) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1910) at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1834) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348) at hudson.remoting.UserRequest.deserialize(UserRequest.java:182) at hudson.remoting.UserRequest.perform(UserRequest.java:98) at hudson.remoting.UserRequest.perform(UserRequest.java:48) at hudson.remoting.Request$2.run(Request.java:287) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269
Re: Providing a list of FAQ's with every new subscribe request
On Mon, Oct 3, 2011 at 3:48 PM, lewis john mcgibbney lewis.mcgibb...@gmail.com wrote:

Would it be possible to send out a list of our official FAQ's when a new user confirms their subscription to both user@ and dev@ lists.

It seems this is possible. Can you craft a piece of text you would like to be sent out on successful subscribe and I'll try to set it up.

This is the full list of files ezmlm lists as editable, just in case someone comes up with something else to customize:

File               Use
bottom             bottom of all responses. General command info.
digest             'administrivia' section of digests.
faq                frequently asked questions specific to this list.
get_bad            in place of messages not found in the archive.
help               general help (between 'top' and 'bottom').
info               list info. First line should be meaningful on its own.
mod_help           specific help for list moderators.
mod_reject         to sender of rejected post.
mod_request        to message moderators together with post.
mod_sub            to subscriber after moderator confirmed subscribe.
mod_sub_confirm    to subscription mod to request subscribe confirm.
mod_timeout        to sender of timed-out post.
mod_unsub_confirm  to remote admin to request unsubscribe confirm.
sub_bad            to subscriber if confirm was bad.
sub_confirm        to subscriber to request subscribe confirm.
sub_nop            to subscriber after re-subscription.
sub_ok             to subscriber after successful subscription.
top                top of all responses.
unsub_bad          to subscriber if unsubscribe confirm was bad.
unsub_confirm      to subscriber to request unsubscribe confirm.
unsub_nop          to non-subscriber after unsubscribe.
unsub_ok           to ex-subscriber after successful unsubscribe.

-- Sami Siren
[jira] [Commented] (NUTCH-1091) Remove commons logging dependency from Nutch branch and trunk
[ https://issues.apache.org/jira/browse/NUTCH-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117862#comment-13117862 ]

Sami Siren commented on NUTCH-1091:
---
+1, go ahead Lewis

Remove commons logging dependency from Nutch branch and trunk
-
Key: NUTCH-1091
URL: https://issues.apache.org/jira/browse/NUTCH-1091
Project: Nutch
Issue Type: Improvement
Components: build
Affects Versions: 1.4, nutchgora
Environment: Ubuntu 11.04 (natty) Kernel Linux 2.6.38-11-generic GNOME 2.32.1
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
Priority: Minor
Fix For: 1.4, nutchgora
Attachments: NUTCH-1091-branch-1.4-20110923.patch

Once all logging has been shifted to slf4j with log4j backend as per NUTCH-1078, we should deprecate the ivy dependency on commons logging.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Providing a list of FAQ's with every new subscribe request
I am getting moderation emails and I think that there's somebody else doing moderation too since the messages get sent to the list without me accepting them. I would like to step down from the moderator status and have someone else do moderation instead, because frankly I have not been doing a great job with it. Any volunteers?

-- Sami Siren

On Tue, Sep 27, 2011 at 12:09 AM, Julien Nioche lists.digitalpeb...@gmail.com wrote:

We don't have moderators for the user and dev lists

On 26 September 2011 20:09, lewis john mcgibbney lewis.mcgibb...@gmail.com wrote:

Thanks Markus, Who is mailing list moderator? If I can get this info before trying to contact infra it would be great.

On Mon, Sep 26, 2011 at 7:37 PM, Markus Jelsma markus.jel...@openindex.io wrote:

Sounds like a good idea. I think you need to be ML moderator to make changes http://www.apache.org/dev/committers.html#mail-moderate

Hi, I just signed up to the JUnit users lists and received a really well documented FAQ accompaniment when I subscribed. I think this would be a great resource for new Nutch users. Does anyone agree/disagree? How do we go about configuring this? Is this a request for the infra team? Thank you

-- *Lewis*

-- * *Open Source Solutions for Text Engineering http://digitalpebble.blogspot.com/ http://www.digitalpebble.com
Re: Providing a list of FAQ's with every new subscribe request
I think moderators can be changed by filing a jira issue (by one of the PMC members) to the infra project, for example see https://issues.apache.org/jira/browse/INFRA-3511

Moderation is a simple task: you just let good messages (usually|only coming from non subscribed senders) through and forget about the rest.

Julien: I am pretty sure I am still a moderator at dev and user - I just tried some of the moderator commands and they were successful.

-- Sami Siren

On Tue, Sep 27, 2011 at 9:32 PM, lewis john mcgibbney lewis.mcgibb...@gmail.com wrote:

Hi Sami, Who is it that we are supposed to speak to regarding moderation. I tried to contact the infra@ team but still awaiting reply. What is included in moderation? I'm completely foreign to all of this, and as Julien stated I was not aware that there was anyone directly linked to Nutch list moderation. The info on the apache developers area is pretty vague and I haven't been able to get much further with this. Thanks

On Tue, Sep 27, 2011 at 6:33 PM, Sami Siren ssi...@gmail.com wrote:

I am getting moderation emails and I think that there's somebody else doing moderation too since the messages get sent to the list without me accepting them. I would like to step down from the moderator status and have someone else do moderation instead, because frankly I have not been doing a great job with it. Any volunteers?

-- Sami Siren

On Tue, Sep 27, 2011 at 12:09 AM, Julien Nioche lists.digitalpeb...@gmail.com wrote:

We don't have moderators for the user and dev lists

On 26 September 2011 20:09, lewis john mcgibbney lewis.mcgibb...@gmail.com wrote:

Thanks Markus, Who is mailing list moderator? If I can get this info before trying to contact infra it would be great.

On Mon, Sep 26, 2011 at 7:37 PM, Markus Jelsma markus.jel...@openindex.io wrote:

Sounds like a good idea. I think you need to be ML moderator to make changes http://www.apache.org/dev/committers.html#mail-moderate

Hi, I just signed up to the JUnit users lists and received a really well documented FAQ accompaniment when I subscribed. I think this would be a great resource for new Nutch users. Does anyone agree/disagree? How do we go about configuring this? Is this a request for the infra team? Thank you

-- *Lewis*

-- * *Open Source Solutions for Text Engineering http://digitalpebble.blogspot.com/ http://www.digitalpebble.com

-- *Lewis*
[jira] [Commented] (NUTCH-657) Estonian N-gram profile has wrong name
[ https://issues.apache.org/jira/browse/NUTCH-657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114000#comment-13114000 ]

Sami Siren commented on NUTCH-657:
--
bq. I'm here to ask again if we can mark this as won't fix for the 1.4 current trunk?

Sounds like this is the right thing to do if/when tika lang-id is used. I think the lang-id component in Tika lost some of its accuracy when it got moved from Nutch to Tika but I think it makes most sense to build on top of that and improve the one in Tika instead of having something special in Nutch.

Estonian N-gram profile has wrong name
--
Key: NUTCH-657
URL: https://issues.apache.org/jira/browse/NUTCH-657
Project: Nutch
Issue Type: Bug
Affects Versions: 0.8.1, 0.9.0
Reporter: Jonathan Young
Priority: Trivial

The Nutch language identifier plugin contains an ngram profile, ee.ngp, in src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang . ee is the ISO-3166-1-alpha-2 code for Estonia (see http://www.iso.org/iso/country_codes/iso_3166_code_lists/english_country_names_and_code_elements.htm), but it is the ISO-639-2 code for Ewe (see http://www.loc.gov/standards/iso639-2/php/English_list.php). et is the ISO-639-2 code for Estonian, and the language profile in ee.ngp is clearly Estonian.

Proposed solution: rename ee.ngp to et.ngp .

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
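The ee/et mix-up described in this issue can be illustrated with the JDK's own code tables. The sketch below is only an illustration under stated assumptions: the class IsoCodeCheck and its methods are hypothetical, they are not part of the Nutch language identifier plugin, and the JDK's two-letter language list is used as a stand-in for the ISO tables linked in the report.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: looks up two-letter codes in the JDK's ISO tables.
final class IsoCodeCheck {

    // True if the code is in the JDK's list of two-letter ISO 639 language codes.
    static boolean isLanguageCode(String code) {
        return Arrays.asList(Locale.getISOLanguages()).contains(code.toLowerCase(Locale.ROOT));
    }

    // True if the code is in the JDK's list of two-letter ISO 3166 country codes.
    static boolean isCountryCode(String code) {
        return Arrays.asList(Locale.getISOCountries()).contains(code.toUpperCase(Locale.ROOT));
    }

    // English display name of the language that a two-letter code stands for.
    static String languageName(String code) {
        return Locale.forLanguageTag(code).getDisplayLanguage(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        for (String code : new String[] {"ee", "et"}) {
            System.out.println(code
                + ": language code=" + isLanguageCode(code)
                + ", language name=" + languageName(code)
                + ", country code=" + isCountryCode(code));
        }
    }
}
```

Both ee and et appear in the language list and in the country list (EE is Estonia, ET is Ethiopia), so a membership test alone would not have caught the wrong profile name. The language display name is what shows which profile name belongs to Estonian.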
[jira] [Commented] (NUTCH-1078) Upgrade all instances of commons logging to slf4j (with log4j backend)
[ https://issues.apache.org/jira/browse/NUTCH-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108975#comment-13108975 ]

Sami Siren commented on NUTCH-1078:
---
Oh, ok. I didn't realize there was another issue open about removing those.

Upgrade all instances of commons logging to slf4j (with log4j backend)
--
Key: NUTCH-1078
URL: https://issues.apache.org/jira/browse/NUTCH-1078
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
Priority: Minor
Fix For: 1.4
Attachments: NUTCH-1078-branch-1.4-20110816.patch, NUTCH-1078-branch-1.4-20110824-v2.patch, NUTCH-1078-branch-1.4-20110911-v3.patch, NUTCH-1078-branch-1.4-20110916-v4.patch

Whilst working on another issue, I noticed that some classes still import and use commons logging, for example HttpBase.java

{code}
import java.util.*;

// Commons Logging imports
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Nutch imports
import org.apache.nutch.crawl.CrawlDatum;
{code}

At this stage I am unsure how many (if any others) still import and rely upon commons logging, however they should be upgraded to slf4j for branch-1.4.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
On Thu, Sep 15, 2011 at 9:55 PM, Markus Jelsma markus.jel...@openindex.io wrote:

There are many things I can write about this topic right now but don't feel it's necessary. The choice is difficult and perhaps painful but when the voting round is opened by our project lead, I will vote for promoting 1.x back to trunk.

+1, Same here

-- Sami Siren
Re: Nutch 2.0
On 06/28/2010 10:10 AM, Andrzej Bialecki wrote:

On 2010-06-28 07:49, Sami Siren wrote:

One aspect that has not been discussed yet is the legal aspect. According to http://incubator.apache.org/ip-clearance/index.html there is a formal process for integrating external development efforts that have happened outside of Apache. Should we be following the ip clearance process in this case too?

The concept of a substantial contribution that should be subject to a software grant is somewhat tenuous, though. Keep in mind that you do something equivalent in JIRA already - when you check the Grant license to ASF box you perform a micro-grant. So the question is whether we should go through a full grant or through the JIRA micro-grant. In my opinion it's ok to do the latter, since much of the code is simply a modified version of Nutch classes - not counting GORA, of course, but that part will be added as a third-party lib. So IMHO it's enough to zip all source (without libs), attach it to a JIRA issue and mark the checkbox. Then we follow the process outlined by Chris, which imports the same codebase into our svn. What do you think?

I do not know what the right approach is, that's why I asked the question. Also I have not looked at the donation but the following comment made me think it might fall into the substantial category:

There has been an enormous amount of changes between the nutchbase branch and the version on GitHub - pretty much EVERY class has been modified + a lot of classes have been removed etc...

If folks agree that this is sufficient, then Dogacan Enis - can you please create a separate JIRA issue, prepare a patch like this, mark the checkbox, and list all dependencies and their licenses for those that are not already in Nutch svn?

This would be a good thing to do in any case. It would help to understand what the donation is about and also help to decide which process (if any) needs to be followed.

-- Sami Siren