Topic-maps of related searchwords
-
Key: NUTCH-294
URL: http://issues.apache.org/jira/browse/NUTCH-294
Project: Nutch
Type: New Feature
Components: searcher
Reporter: Stefan Neufeind
Would it be possible to offer a user
[
http://issues.apache.org/jira/browse/NUTCH-282?page=comments#action_12414435 ]
Stefan Groschupf commented on NUTCH-282:
Is that related to host grouping we discussed? Can we in this case close this
bug?
Showing too few results on a page (Paging
[
http://issues.apache.org/jira/browse/NUTCH-286?page=comments#action_12414439 ]
Stefan Groschupf commented on NUTCH-286:
This is difficult to realize since the HTTP error code is read from the response
in the fetcher and set into the protocol
[
http://issues.apache.org/jira/browse/NUTCH-292?page=comments#action_12414443 ]
Stefan Groschupf commented on NUTCH-292:
+1, Can someone create a clean patch file?
OpenSearchServlet: OutOfMemoryError: Java heap space
[
http://issues.apache.org/jira/browse/NUTCH-291?page=comments#action_12414445 ]
Stefan Groschupf commented on NUTCH-291:
lastModified will only be indexed if you switch on the index-more plugin.
If you think you should change the way lastmodified
[
http://issues.apache.org/jira/browse/NUTCH-290?page=comments#action_12414448 ]
Stefan Groschupf commented on NUTCH-290:
If a parser throws an exception:
Fetcher, 261:
try {
parse = this.parseUtil.parse(content);
parseStatus =
[ http://issues.apache.org/jira/browse/NUTCH-292?page=all ]
Stefan Neufeind updated NUTCH-292:
--
Attachment: NUTCH-292-summarizer08.diff
As per demand, here is the patch.
Please note that it has not been thoroughly tested by myself. But the patch
[ http://issues.apache.org/jira/browse/NUTCH-287?page=all ]
Stefan Groschupf closed NUTCH-287:
--
Resolution: Won't Fix
http://www.mail-archive.com/nutch-user%40lucene.apache.org/msg04696.html
Exception when searching with sort
[ http://issues.apache.org/jira/browse/NUTCH-284?page=all ]
Stefan Groschupf closed NUTCH-284:
--
Resolution: Won't Fix
Yes, I was missing index-basic.
NullPointerException during index
-
Key: NUTCH-284
[
http://issues.apache.org/jira/browse/NUTCH-284?page=comments#action_12414453 ]
Stefan Groschupf commented on NUTCH-284:
Please try to discuss such things on the user mailing list first rather than opening an
issue.
Maintaining the issue tracking is very time
[
http://issues.apache.org/jira/browse/NUTCH-281?page=comments#action_12414454 ]
Stefan Groschupf commented on NUTCH-281:
Can you submit a patch file?
cached.jsp: base-href needs to be outside comments
[
http://issues.apache.org/jira/browse/NUTCH-274?page=comments#action_12414457 ]
Stefan Groschupf commented on NUTCH-274:
Should we fix this in TextInputFormat of Hadoop to ignore empty lines or in the
Injector?
Empty row in/at end of URL-list
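To illustrate the kind of fix discussed for NUTCH-274, here is a minimal sketch of dropping empty rows from a URL list before it is processed. The class and method names are hypothetical and exist in neither Nutch nor Hadoop; the sketch does not say whether the filtering belongs in TextInputFormat or in the Injector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SeedListCleaner {

    // Hypothetical helper (not a Nutch or Hadoop API): returns the input lines
    // with empty and whitespace-only rows removed, trimming the rest.
    public static List<String> nonEmptyLines(List<String> lines) {
        List<String> urls = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                urls.add(trimmed);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // A URL list with an empty row in the middle and a blank row at the end.
        List<String> raw = Arrays.asList(
                "http://example.org/", "", "http://example.com/", "   ");
        System.out.println(nonEmptyLines(raw));
    }
}
```

Filtering at read time means a trailing blank row is simply never turned into a URL entry.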
[
http://issues.apache.org/jira/browse/NUTCH-290?page=comments#action_12414458 ]
Stefan Neufeind commented on NUTCH-290:
---
But if one plugin fails in 0.8-dev, isn't the next used? I understand that in
the default-config the text-parser would be used
[
http://issues.apache.org/jira/browse/NUTCH-291?page=comments#action_12414466 ]
Stefan Neufeind commented on NUTCH-291:
---
Which way is most favorable? To always set lastModified although it was not
returned from the webserver (maybe unclean) or
[
http://issues.apache.org/jira/browse/NUTCH-290?page=comments#action_12414469 ]
Stefan Groschupf commented on NUTCH-290:
As far as I understand the code, the next parser is only used if the previous
parser returns with an unsuccessful parsing status.
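The behaviour described in this comment can be modelled roughly as follows. This is a simplified sketch, not the actual Nutch ParseUtil code: the `Parser` interface, the `Result` class and `parseWithFallback` are hypothetical names introduced only for illustration. It assumes, following the comment, that only an unsuccessful status moves on to the next parser, so an exception thrown by a parser is not caught by the chain.

```java
import java.util.Arrays;
import java.util.List;

public class ParserChainSketch {

    /** Hypothetical parse outcome; a stand-in, not the Nutch ParseStatus class. */
    public static final class Result {
        public final boolean success;
        public final String parserName;

        public Result(boolean success, String parserName) {
            this.success = success;
            this.parserName = parserName;
        }
    }

    /** Hypothetical parser interface used only in this sketch. */
    public interface Parser {
        Result parse(String content);
    }

    // Tries each parser in order and returns the first successful result.
    // An exception thrown by a parser is NOT caught here, so in this model it
    // propagates to the caller and the remaining parsers are never tried.
    public static Result parseWithFallback(List<Parser> parsers, String content) {
        Result last = new Result(false, "none");
        for (Parser parser : parsers) {
            last = parser.parse(content);
            if (last.success) {
                return last;
            }
        }
        return last; // every parser reported an unsuccessful status
    }

    public static void main(String[] args) {
        Parser failing = c -> new Result(false, "pdf");
        Parser working = c -> new Result(true, "text");
        Result r = parseWithFallback(Arrays.asList(failing, working), "some content");
        System.out.println(r.parserName + " success=" + r.success);
    }
}
```

Under this model, a parser that reports failure through its status hands over to the next one, while a parser that throws stops the chain unless the caller handles the exception.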
[ http://issues.apache.org/jira/browse/NUTCH-286?page=all ]
Stefan Groschupf closed NUTCH-286:
--
Resolution: Won't Fix
I hope everybody agrees with the statement: We cannot detect HTTP response
codes based on the returned HTML content.
Prune the
[
http://issues.apache.org/jira/browse/NUTCH-290?page=comments#action_12414477 ]
Stefan Neufeind commented on NUTCH-290:
---
But to my understanding of the plugin it still extracts as much as possible
(meta-data) from the PDF. So if indexing is not
More description for fetcher.threads.fetch property
---
Key: NUTCH-295
URL: http://issues.apache.org/jira/browse/NUTCH-295
Project: Nutch
Type: Improvement
Components: fetcher
Versions: 0.8-dev
[ http://issues.apache.org/jira/browse/NUTCH-295?page=all ]
Dennis Kubes updated NUTCH-295:
---
Attachment: fetcher_threads_desc.patch
More description for fetcher.threads.fetch property as relating to running in
distributed mode.
More description for
19 matches