Hi all,

The following issues need to be discussed and appropriate action taken before the 0.9 release:

Blocker
========
* NUTCH-400 (Update & add missing license headers) - I believe this is fixed and should be closed.

* NUTCH-353 (pages that serverside forwards will be refetched every time) - this was partially fixed in NUTCH-273, but a complete solution would require significant changes to LinkDb. Since no patches implement this, I left it open, but it is no longer as critical as it was. I propose to downgrade it to "Major" and address it in the next release.

* NUTCH-233 (wrong regular expression hang reduce process for ever) - I propose to apply the fix provided by Sean Dean and close this issue for now.

Critical
========
* NUTCH-436 (Incorrect handling of relative paths when the embedded URL path is empty) - there is no patch available yet. If someone could contribute one, I'd like to see this fixed before the release.

* NUTCH-427 (protocol-smb) - this relies on an LGPL library, and it's certainly not critical (it is an optional new feature). I propose to change it to "Major" and decide whether or not we want another plugin like parse-mp3 or parse-rtf.

* NUTCH-381 (Ignore external link not work as expected) - I'll try to reproduce it, and if I find an easy fix, I'd like to apply it before the release.

* NUTCH-277 (Fetcher dies because of "max. redirects") - I wasn't able to reproduce it. If there is no updated information on this I propose to close it with "Can't reproduce".

* NUTCH-167 (Observation of <META NAME="ROBOTS" CONTENT="NOARCHIVE">) - there's a patch, which I tested in a limited production environment. If there are no objections, I'd like to apply it before the release.

Major
=====
There are 84 major issues, but some of them are invalid, should be "Minor", or no longer apply and should be closed. If you can, please review them and add comments or recommendations where you have new information.


We also need to decide which version of Hadoop to include in the release. Current trunk uses 0.10.1, I have a set of production-tested patches that use 0.11.2, and today the Hadoop team released 0.12.0 (to be followed shortly by 0.12.1, most likely before our release). The most conservative option is to stay with 0.10.1, but by the time people start using Nutch this will already be a fairly old version. I propose to upgrade to 0.11.2. We could use 0.12.1, but in that case we should expect to release a less-than-stable version of Nutch, soon to be followed by a minor stable release.

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
