Sami Siren wrote:
> Hello,
>
> It has been a while since the previous release (0.8.1) and looking at the
> great fixes done in trunk I'd start thinking about baking a new release
> soon.
>
> Looking at the jira roadmaps there is 1 blocking issue (fixing the
> license headers) for 0.8.2 and two other blocking issues for 0.9.0, of
> which I think NUTCH-233 is safe to put in.
>   

Agreed. The replacement regex mentioned in the original comment seems 
safe enough, and simpler.

> The top 10 voted issues are currently:
>
> NUTCH-61       Adaptive re-fetch interval. Detecting unmodified content
>   

Well ... I'm of a split mind on this. I can bring this patch up to date 
and apply it before 0.9.0, if we understand that this is a "0" release 
... ;) Otherwise I'd prefer to wait and apply it right after the release.

I would also like to proceed with NUTCH-339 (Fetcher2 patches plus 
some changes I made in the meantime), since I'd like to expose the new 
fetcher to a broader audience, and it doesn't affect the existing 
implementation.


> NUTCH-48      "Did you mean" query enhancement/refinement feature
> NUTCH-251     Administration GUI
> NUTCH-289     CrawlDatum should store IP address
>   

I'm still not entirely convinced about this - and there is already a 
mechanism in place to support it if someone really wishes to keep this 
particular info (CrawlDatum.metaData).

> NUTCH-36      Chinese in Nutch
> NUTCH-185     XMLParser is configurable xml parser plugin.
> NUTCH-59      meta data support in webdb
> NUTCH-92      DistributedSearch incorrectly scores results

This is too intrusive to fix just before the release - and needs 
additional discussion.


> NUTCH-68      A tool to generate arbitrary fetchlists

Easy to port this to 0.9.0 - I can do this.


> NUTCH-87      Efficient site-specific crawling for a large number of sites
>   



-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
