Dennis Kubes wrote:
What does everybody think of trying to do a Nutch 1.0 release in the
next couple of weeks? I have 8 different patches that are ready to be
committed including:
1) NUTCH-647: Resolve URLs tool
2) NUTCH-635: LinkAnalysis Tool for Nutch
3) NUTCH-646: New Indexing framework for Nutch
4) NUTCH-594: Serve Nutch search results in XML and JSON
5) Custom fields on index and plugins
6) Upgrade Nutch to the most recent Hadoop version (0.18.2).
7) Upgrade Nutch to the most recent Lucene version (2.4).
8) Analysis plugins and improvements to analyzer factory for multiple
languages per analysis plugin. Language identifier.
I am going to try to get those posted in the next couple of days and
committed in the next week. Are there other major improvements we want
to put in before trying to do a 1.0 release for Nutch? Thoughts and
suggestions?
A few recently opened ones that should be easy to fix:
NUTCH-661 errors when the uri contains space characters
NUTCH-657 Estonian N-gram profile has wrong name
NUTCH-652 AdaptiveFetchSchedule#setFetchSchedule doesn't calculate
fetch interval correctly
NUTCH-644 RTF parser doesn't compile anymore
NUTCH-643 ClassCastException in PdfParser on encrypted PDF with
empty password
NUTCH-636 Http client plug-in https doesn't work on IBM JRE
NUTCH-631 MoreIndexingFilter fails with NoSuchElementException
NUTCH-626 fetcher2 breaks out the domain with
db.ignore.external.links set at cross domain redirects
NUTCH-566 Sun's URL class has bug in creation of relative query URLs
NUTCH-542 Null Pointer Exception on getSummary when segment no
longer exists
NUTCH-531 Pages with no ContentType cause a Null Pointer exception
And of course this one:
NUTCH-442 Integrate Solr/Nutch
We should also review all other open issues marked as Blocker / Major,
especially those with patches, and take some action - either fix them,
or won't fix 'em, or postpone to the next release (the single Blocker
issue should be fixed).
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com