Stefan Groschupf wrote:
Hi,
I counted the votes manually, I hope I didn't overlook anything. I
didn't filter out issues that are 0.8 related, since it is good to
know community wishes anyway. :-)
Shouldn't the period for voting be a bit longer? I didn't have time to
vote yet... Anyway, my take on this:
NUTCH-140 Add alias capability in parse-plugins.xml file that
allows mimeType->extensionId mapping
1
NUTCH-139 Standard metadata property names in the ParseData metadata
2
+1
NUTCH-138 non-Latin-1 characters cannot be submitted for search
1
NUTCH-3 multi values of header discarded
1
+1
NUTCH-134 Summarizer doesn't select the best snippets
1
+1
I have some patches that use the Lucene Highlighter package instead.
NUTCH-98 RobotRulesParser interprets robots.txt incorrectly
1
NUTCH-120 one "bad" link on a page kills parsing
3
NUTCH-127 uncorrect values using -du, or ls does not return items
2
+1
NUTCH-126 Fetching via https does not work with a proxy (patch)
1
NUTCH-125 OpenOffice Parser plugin
2
+1. Ready to commit, I'll do it tomorrow.
NUTCH-110 OpenSearchServlet outputs illegal xml characters
1
NUTCH-36 Chinese in Nutch
1
NUTCH-123 Cache.jsp some times generate NullPointerException
1
NUTCH-121 SegmentReader for mapred
2
Nearly ready to commit, I can probably do it by the end of the week.
However, this is valid only for the mapred branch, so it doesn't affect
the release.
NUTCH-119 Regexp to extract outlinks incorrect
1
NUTCH-115 jobtracker.jsp shows too much information
1
NUTCH-108 tasktracker crashs when reconnecting to a new jobtracker.
1
NUTCH-113 Disable permanent DNS-to-IP caching for JVM 1.4
1
NUTCH-111 ndfs.replication is not documented within the
nutch-default.xml configuration file.
1
NUTCH-100 New plugin urlfilter-db
1
NUTCH-106 Datanode corruption
1
NUTCH-95 DeleteDuplicates depends on the order of input segments
1
+1
NUTCH-92 DistributedSearch incorrectly scores results
2
+1. However, solving this correctly is _hard_ ... it's a very similar
problem to the MultiSearcher in Lucene, and it took that group quite
some time to reach an acceptable solution...
NUTCH-91 empty encoding causes exception
1
NUTCH-52 Parser plugin for MS Excel files
1
NUTCH-74 French Analyzer Plugin
1
NUTCH-64 no results after a restart of a search-server (without
tomcat restart)
1
NUTCH-68 A tool to generate arbitrary fetchlists
1
NUTCH-62 Add html META tag information into metaData in index-more
plugin
1
NUTCH-61 Adaptive re-fetch interval. Detecting unmodified content
1
+1. I think this is an important feature. I have some patches that
need to be updated. However, I wouldn't be so bold as to commit them
just before a release. There are quite a few subtle issues with the
segment handling if you use this.
NUTCH-13 If dns points to 127.0.0.1, the url is also crawled
1
NUTCH-48 "Did you mean" query enhancement/refinement feature request
1
NUTCH-45 Log corrupt segments in SegmentMergeTool
1
NUTCH-24 Cannot handle incorrectly cased Content-Type
1
Isn't this solved already?
NUTCH-16 boost documents matching a url pattern
1
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com