Hi all,

The following issues need to be discussed and appropriate action taken 
before the 0.9 release:

Blocker
========
* NUTCH-400 (Update & add missing license headers) - I believe this is 
fixed and should be closed.

* NUTCH-353 (pages that serverside forwards will be refetched every 
time) - this was partially fixed in NUTCH-273, but a more complete 
solution would require significant changes to LinkDb. As there are no 
patches implementing this, I left it open, but it's no longer as 
critical as it was before. I propose to move it to "Major" and address 
it in the next release.

* NUTCH-233 (wrong regular expression hang reduce process for ever) - I 
propose to apply the fix provided by Sean Dean and close this issue for now.

Critical
========
* NUTCH-436 (Incorrect handling of relative paths when the embedded URL 
path is empty) - there is no patch available yet. If someone can 
contribute a patch, I'd like to see this fixed before the release.

* NUTCH-427 (protocol-smb) - this relies on an LGPL library, and it's 
certainly not critical (it is an optional new feature). I propose 
to change it to Major and decide whether or not we want another plugin 
like parse-mp3 or parse-rtf.

* NUTCH-381 (Ignore external link not work as expected) - I'll try to 
reproduce it, and if I find an easy fix I'd like to apply it before the 
release.

* NUTCH-277 (Fetcher dies because of "max. redirects") - I wasn't able 
to reproduce it. If there is no updated information on this, I propose to 
close it with "Can't reproduce".

* NUTCH-167 (Observation of <META NAME="ROBOTS" CONTENT="NOARCHIVE">) - 
there's a patch, which I tested in a limited production environment. If 
there are no objections, I'd like to apply it before the release.

Major
=====
There are 84 major issues, but some of them are invalid, should be 
downgraded to "Minor", or no longer apply and should be closed. Please 
review them if you can, and add comments or recommendations if you 
have new information.


One more decision we need to make is which version of Hadoop should 
be included in the release. Current trunk uses 0.10.1, I have a set of 
production-tested patches that use 0.11.2, and today the Hadoop team 
released 0.12.0 (to be followed shortly by 0.12.1, most likely before 
our release). The most conservative option is to stay with 0.10.1, but 
by the time people start using this Nutch release it will already be a 
fairly old version. I propose to upgrade to 0.11.2. We could use 0.12.1, 
but in that case we should expect to release a less-than-stable version 
of Nutch, soon to be followed by a minor stable release.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com