Stefan Groschupf wrote:
Hi,
I counted the votes manually; I hope I didn't overlook anything. I
didn't filter out issues that are 0.8 related, since it is good to
know the community's wishes anyway. :-)
Shouldn't the period for voting be a bit longer? I didn't have time to
vote yet... Anyway, my take on this:
NUTCH-140 Add alias capability in parse-plugins.xml file that
allows mimeType->extensionId mapping
1
NUTCH-139 Standard metadata property names in the ParseData metadata
2
+1
NUTCH-138 non-Latin-1 characters cannot be submitted for search
1
NUTCH-3 multi values of header discarded
1
+1
NUTCH-134 Summarizer doesn't select the best snippets
1
+1
I have some patches that use the Lucene Highlighter package instead.
NUTCH-98 RobotRulesParser interprets robots.txt incorrectly
1
NUTCH-120 one "bad" link on a page kills parsing
3
NUTCH-127 uncorrect values using -du, or ls does not return items
2
+1
NUTCH-126 Fetching via https does not work with a proxy (patch)
1
NUTCH-125 OpenOffice Parser plugin
2
+1. Ready to commit; I'll do it tomorrow.
NUTCH-110 OpenSearchServlet outputs illegal xml characters
1
NUTCH-36 Chinese in Nutch
1
NUTCH-123 Cache.jsp some times generate NullPointerException
1
NUTCH-121 SegmentReader for mapred
2
Nearly ready to commit; I can probably do it by the end of the week.
However, this is valid only for the mapred branch, so it doesn't affect
the release.
NUTCH-119 Regexp to extract outlinks incorrect
1
NUTCH-115 jobtracker.jsp shows too much information
1
NUTCH-108 tasktracker crashs when reconnecting to a new jobtracker.
1
NUTCH-113 Disable permanent DNS-to-IP caching for JVM 1.4
1
NUTCH-111 ndfs.replication is not documented within the
nutch-default.xml configuration file.
1
NUTCH-100 New plugin urlfilter-db
1
NUTCH-106 Datanode corruption
1
NUTCH-95 DeleteDuplicates depends on the order of input segments
1
+1
NUTCH-92 DistributedSearch incorrectly scores results
2
+1. However, solving this correctly is _hard_ ... it's a very similar
problem to the MultiSearcher in Lucene, and it took that group quite
some time to reach an acceptable solution...
NUTCH-91 empty encoding causes exception
1
NUTCH-52 Parser plugin for MS Excel files
1
NUTCH-74 French Analyzer Plugin
1
NUTCH-64 no results after a restart of a search--server (without
tomcat restart)
1
NUTCH-68 A tool to generate arbitrary fetchlists
1
NUTCH-62 Add html META tag information into metaData in index-more
plugin
1
NUTCH-61 Adaptive re-fetch interval. Detecting umodified content
1
+1. I think this is an important feature. I have some patches that
need to be updated. However, I wouldn't be so bold as to commit them
just before a release. There are quite a few subtle issues with the
segment handling if you use this.
NUTCH-13 If dns points to 127.0.0.1, the url is also crawled
1
NUTCH-48 "Did you mean" query enhancement/refignment feature request
1
NUTCH-45 Log corrupt segments in SegmentMergeTool
1
NUTCH-24 Cannot handle incorrectly cased Content-Type
1
Isn't this solved already?
NUTCH-16 boost documents matching a url pattern
1
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com