Re: Issues pending before 0.9 release
> P.S. I am going to contact Piotr and coordinate with him: I'd like to be
> the release manager for this Nutch release.

It would be more beneficial to everybody if the discussions (related to the release or to Nutch) were held in public (hey, this is open source!). The off-list stuff IMO smells.

--
Sami Siren
Re: Issues pending before 0.9 release
Chris Mattmann wrote:
> P.S. I am going to contact Piotr and coordinate with him: I'd like to be
> the release manager for this Nutch release.

Everyone heard that? :) That's cool, thanks!

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
Re: Issues pending before 0.9 release
Chris Mattmann wrote:
> Hi Guys,
>
> > Blocker
> >
> > * NUTCH-400 (Update & add missing license headers) - I believe this is
> > fixed and should be closed
>
> +1, thanks to Sami for closing it.
>
> > * NUTCH-353 (pages that serverside forwards will be refetched every
> > time) - this was partially fixed in NUTCH-273, but a more complete
> > solution would require significant changes to LinkDb. As there are no
> > patches implementing this, I left it open, but it's no longer as
> > critical as it was before. I propose to move it to "Major" and address
> > it in the next release.
>
> +1
>
> > * NUTCH-233 (wrong regular expression hang reduce process for ever) - I
> > propose to apply the fix provided by Sean Dean and close this issue for now.
>
> +1
>
> > Critical
> >
> > * NUTCH-436 (Incorrect handling of relative paths when the embedded URL
> > path is empty). There is no patch available yet. If someone could
> > contribute a patch I'd like to see this fixed before the release.
>
> Looks like Dennis is on this one.
>
> > * NUTCH-427 (protocol-smb). This relies on a LGPL library, and it's
> > certainly not critical (as this is an optional new feature). I propose
> > to change it to Major, and make a decision - do we want another plugin
> > like parse-mp3 or parse-rtf, or not.
>
> Let's hold off on this: it's not necessary for 0.9, and I don't think
> there's been a bunch of traffic on the list identifying this as critical
> to get into the sources for the release.
>
> > * NUTCH-381 (Ignore external link not work as expected) - I'll try to
> > reproduce it, and if I find an easy fix I'd like to apply it before the
> > release.
>
> +1
>
> > * NUTCH-277 (Fetcher dies because of "max. redirects") - I wasn't able
> > to reproduce it. If there is no updated information on this I propose to
> > close it with "Can't reproduce".
>
> +1, I had to do something similar with NUTCH-258.
>
> > * NUTCH-167 (Observation of ) -
> > there's a patch which I tested in a limited production env. If there are
> > no objections I'd like to apply it before the release.
>
> +1
>
> > Major
> > =====
> > There are 84 major issues, but some of them are either invalid, or
> > should be "minor", or no longer apply and should be closed. Please
> > review them if you can and provide some comments or recommendations if
> > you think you have some new information.
>
> I will spend some time going through JIRA today and see if there are any
> issues that I can find that:
>
> 1. Have a patch already
> 2. Sound like something quick, easy, and not so far-reaching across the
>    entire Nutch API
>
> > One decision also that we need to make is which version of Hadoop should
> > be included in the release. Current trunk uses 0.10.1, I have a set of
> > production-tested patches that use 0.11.2, and today the Hadoop team
> > released 0.12.0 (to be followed shortly by a 0.12.1, most likely in time
> > before our release). The most conservative option is to stay with
> > 0.10.1, but by the time people start using Nutch this will be a fairly
> > old version already. I propose to upgrade to 0.11.2. We could use 0.12.1
> > - but in this case with the expectation that we release a less than stable
> > version of Nutch to be soon followed by a minor stable release ...
>
> I'd agree with the upgrade to 0.11.2, +1
>
> Cheers,
> Chris
>
> P.S. I am going to contact Piotr and coordinate with him: I'd like to be
> the release manager for this Nutch release.

I would like to help with this as well, even if it is just watching how the process works this time.

Dennis
Re: Issues pending before 0.9 release
Hi Guys,

> Blocker
>
> * NUTCH-400 (Update & add missing license headers) - I believe this is
> fixed and should be closed

+1, thanks to Sami for closing it.

> * NUTCH-353 (pages that serverside forwards will be refetched every
> time) - this was partially fixed in NUTCH-273, but a more complete
> solution would require significant changes to LinkDb. As there are no
> patches implementing this, I left it open, but it's no longer as
> critical as it was before. I propose to move it to "Major" and address
> it in the next release.

+1

> * NUTCH-233 (wrong regular expression hang reduce process for ever) - I
> propose to apply the fix provided by Sean Dean and close this issue for now.

+1

> Critical
>
> * NUTCH-436 (Incorrect handling of relative paths when the embedded URL
> path is empty). There is no patch available yet. If someone could
> contribute a patch I'd like to see this fixed before the release.

Looks like Dennis is on this one.

> * NUTCH-427 (protocol-smb). This relies on a LGPL library, and it's
> certainly not critical (as this is an optional new feature). I propose
> to change it to Major, and make a decision - do we want another plugin
> like parse-mp3 or parse-rtf, or not.

Let's hold off on this: it's not necessary for 0.9, and I don't think there's been a bunch of traffic on the list identifying this as critical to get into the sources for the release.

> * NUTCH-381 (Ignore external link not work as expected) - I'll try to
> reproduce it, and if I find an easy fix I'd like to apply it before the
> release.

+1

> * NUTCH-277 (Fetcher dies because of "max. redirects") - I wasn't able
> to reproduce it. If there is no updated information on this I propose to
> close it with "Can't reproduce".

+1, I had to do something similar with NUTCH-258.

> * NUTCH-167 (Observation of ) -
> there's a patch which I tested in a limited production env. If there are
> no objections I'd like to apply it before the release.

+1

> Major
> =====
> There are 84 major issues, but some of them are either invalid, or
> should be "minor", or no longer apply and should be closed. Please
> review them if you can and provide some comments or recommendations if
> you think you have some new information.

I will spend some time going through JIRA today and see if there are any issues that I can find that:

1. Have a patch already
2. Sound like something quick, easy, and not so far-reaching across the entire Nutch API

> One decision also that we need to make is which version of Hadoop should
> be included in the release. Current trunk uses 0.10.1, I have a set of
> production-tested patches that use 0.11.2, and today the Hadoop team
> released 0.12.0 (to be followed shortly by a 0.12.1, most likely in time
> before our release). The most conservative option is to stay with
> 0.10.1, but by the time people start using Nutch this will be a fairly
> old version already. I propose to upgrade to 0.11.2. We could use 0.12.1
> - but in this case with the expectation that we release a less than stable
> version of Nutch to be soon followed by a minor stable release ...

I'd agree with the upgrade to 0.11.2, +1

Cheers,
Chris

P.S. I am going to contact Piotr and coordinate with him: I'd like to be the release manager for this Nutch release.
Re: java.io.FileNotFoundException: / (Is a directory)
That is a problem with the hadoop.log.dir value not being set. Log4j is trying to point the DRFA appender at a file and can't find the log directory.

Dennis

Gal Nitzan wrote:
> Just installed the latest from trunk.
>
> I run mergesegs and I get the following error in all task log files (I use the default log4j.properties):
>
> log4j:ERROR setFile(null,true) call failed.
> java.io.FileNotFoundException: / (Is a directory)
>     at java.io.FileOutputStream.openAppend(Native Method)
>     at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
>     at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
>     at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
>     at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
>     at org.apache.log4j.DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:215)
>     at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
>     at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
>     at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
>     at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
>     at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
>     at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:509)
>     at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
>     at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
>     at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:468)
>     at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
>     at org.apache.log4j.Logger.getLogger(Logger.java:104)
>     at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
>     at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529)
>     at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235)
>     at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
>     at org.apache.hadoop.mapred.TaskTracker.<clinit>(TaskTracker.java:59)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1346)
> log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
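The "/ (Is a directory)" message fits the diagnosis of unset log properties. The sketch below is plain Java with no log4j dependency and only imitates variable substitution. It rests on two assumptions that are not stated in this thread: that the DRFA appender's File option is written as ${hadoop.log.dir}/${hadoop.log.file}, and that log4j 1.2 replaces an undefined ${...} variable with an empty string. The class name LogPathDemo, the method expand, and the example directory are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogPathDemo {
    // Matches ${some.key} placeholders in a property value.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces every ${key} with its value from props.
    // ASSUMPTION: an undefined key expands to "", imitating log4j 1.2.
    public static String expand(String template, Map<String, String> props) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.get(m.group(1));
            m.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // ASSUMPTION: this is how the appender's File option is written.
        String template = "${hadoop.log.dir}/${hadoop.log.file}";

        // Neither property set: only the separator survives.
        Map<String, String> unset = new HashMap<String, String>();
        System.out.println(expand(template, unset)); // prints "/"

        // Both properties set (hypothetical values): a usable file path.
        Map<String, String> set = new HashMap<String, String>();
        set.put("hadoop.log.dir", "/var/log/nutch");
        set.put("hadoop.log.file", "hadoop.log");
        System.out.println(expand(template, set)); // prints "/var/log/nutch/hadoop.log"
    }
}
```

If those assumptions hold, defining both properties for the JVM that runs the task, for example as -Dhadoop.log.dir=... and -Dhadoop.log.file=... system properties, should give the appender a real file to open instead of "/".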
SSL & Nutch (SecureProtocolSocketFactory)
Why does the DummySSLProtocolSocketFactory class in the httpclient plugin implement ProtocolSocketFactory and not SecureProtocolSocketFactory? Please help me.
SSL & Nutch (SecureProtocolSocketFactory)
- Forwarded message from [EMAIL PROTECTED] -
Date: Mon, 05 Mar 2007 12:02:54 +0100
From: [EMAIL PROTECTED]
Reply-To: [EMAIL PROTECTED]
Subject: Fwd: SSL & Nutch (SecureProtocolSocketFactory)
To: nutch-dev@lucene.apache.org

Why does the DummySSLProtocolSocketFactory class in the httpclient plugin implement ProtocolSocketFactory and not SecureProtocolSocketFactory? Please help me.

- End of forwarded message -