Hi Guys,

> Blocker
> ========
> * NUTCH-400 (Update & add missing license headers) - I believe this is
> fixed and should be closed
+1, thanks to Sami for closing it.

>
> * NUTCH-353 (pages that serverside forwards will be refetched every
> time) - this was partially fixed in NUTCH-273, but a more complete
> solution would require significant changes to LinkDb. As there are no
> patches implementing this, I left it open, but it's no longer as
> critical as it was before. I propose to move it to "Major" and address
> it in the next release.

+1

>
> * NUTCH-233 (wrong regular expression hang reduce process for ever) - I
> propose to apply the fix provided by Sean Dean and close this issue for now.

+1

>
> Critical
> ========
> * NUTCH-436 (Incorrect handling of relative paths when the embedded URL
> path is empty). There is no patch available yet. If someone could
> contribute a patch I'd like to see this fixed before the release.

Looks like Dennis is on this one

>
> * NUTCH-427 (protocol-smb). This relies on a LGPL library, and it's
> certainly not critical (as this is an optional new feature). I propose
> to change it to Major, and make a decision - do we want another plugin
> like parse-mp3 or parse-rtf, or not.

Let's hold off on this: it's not necessary for 0.9, and I don't think there's been a bunch of traffic on the list identifying this as critical to get into the sources for the release

>
> * NUTCH-381 (Ignore external link not work as expected) - I'll try to
> reproduce it, and if I find an easy fix I'd like to apply it before the
> release.

+1

>
> * NUTCH-277 (Fetcher dies because of "max. redirects") - I wasn't able
> to reproduce it. If there is no updated information on this I propose to
> close it with "Can't reproduce".

+1, I had to do something similar with NUTCH-258

>
> * NUTCH-167 (Observation of <META NAME="ROBOTS" CONTENT="NOARCHIVE">) -
> there's a patch which I tested in a limited production env. If there are
> no objections I'd like to apply it before the release.
+1

>
> Major
> =====
> There are 84 major issues, but some of them are either invalid, or
> should be "minor", or no longer apply and should be closed. Please
> review them if you can and provide some comments or recommendations if
> you think you have some new information.

I will spend some time going through JIRA today and see if I can find any issues that:

1. Have a patch already
2. Sound like something quick, easy, and not so far-reaching across the entire Nutch API

>
>
> One decision also that we need to make is which version of Hadoop should
> be included in the release. Current trunk uses 0.10.1, I have a set of
> production-tested patches that use 0.11.2, and today the Hadoop team
> released 0.12.0 (to be followed shortly by a 0.12.1, most likely in time
> before our release). The most conservative option is to stay with
> 0.10.1, but by the time people start using Nutch this will be a fairly
> old version already. I propose to upgrade to 0.11.2. We could use 0.12.1
> - but in this case with the expectation that we release less than stable
> version of Nutch to be soon followed by a minor stable release ...

I'd agree with the upgrade to 0.11.2, +1

Cheers,
Chris

P.S. I am going to contact Pitor and coordinate with him: I'd like to be the release manager for this Nutch release.