Dennis Kubes wrote:
What does everybody think of trying to do a Nutch 1.0 release in the next couple of weeks? I have 8 different patches that are ready to be committed, including:

1) NUTCH-647: Resolve URLs tool
2) NUTCH-635: LinkAnalysis Tool for Nutch
3) NUTCH-646: New Indexing framework for Nutch
4) NUTCH-594: Serve Nutch search results in XML and JSON
5) Custom fields on index and plugins
6) Upgrade Nutch to the most recent Hadoop version (0.18.2).
7) Upgrade Nutch to the most recent Lucene version (2.4).
8) Analysis plugins and improvements to the analyzer factory to allow multiple languages per analysis plugin. Language identifier.

I am going to try to get those posted in the next couple of days and committed in the next week. Are there other major improvements we want to put in before trying to do a 1.0 release for Nutch? Thoughts and suggestions?

A few recently opened ones that should be easy to fix:

NUTCH-661  errors when the uri contains space characters
NUTCH-657  Estonian N-gram profile has wrong name
NUTCH-652  AdaptiveFetchSchedule#setFetchSchedule doesn't calculate fetch interval correctly
NUTCH-644  RTF parser doesn't compile anymore
NUTCH-643  ClassCastException in PdfParser on encrypted PDF with empty password
NUTCH-636  Http client plug-in https doesn't work on IBM JRE
NUTCH-631  MoreIndexingFilter fails with NoSuchElementException
NUTCH-626  fetcher2 breaks out the domain with db.ignore.external.links set at cross domain redirects
NUTCH-566  Sun's URL class has bug in creation of relative query URLs
NUTCH-542  Null Pointer Exception on getSummary when segment no longer exists
NUTCH-531  Pages with no ContentType cause a Null Pointer exception

And of course this one:

NUTCH-442        Integrate Solr/Nutch


We should also review all other open issues marked as Blocker / Major, especially those with patches, and take some action on each: either fix it, mark it as Won't Fix, or postpone it to the next release (the single Blocker issue should be fixed).


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
