[ http://issues.apache.org/jira/browse/NUTCH-134?page=comments#action_12378387 ]
Dawid Weiss commented on NUTCH-134:
---
(Back from holidays, so a bit delayed, but) I confirm Andrzej's suggestion: a plain-text-only summary is ideal for clustering for…

[ http://issues.apache.org/jira/browse/NUTCH-134?page=comments#action_12378458 ]
Doug Cutting commented on NUTCH-134:
---
+1 for Summary as Writable, and for changing HitSummarizer.getSummary() to return a Summary directly rather than a String. I don't think…
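A minimal sketch of what "Summary as Writable" could look like, assuming a summary is an ordered list of fragments, each tagged as plain text, a highlighted term, or an ellipsis. The `Writable` interface below is a local stand-in with the same two methods as Hadoop's `org.apache.hadoop.io.Writable` (`write(DataOutput)` and `readFields(DataInput)`), declared here only so the example compiles on its own; `Summary`, `Kind`, `Fragment` and `roundTrip` are hypothetical names, not Nutch's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Local stand-in mirroring the two methods of Hadoop's Writable (assumption: not the real import).
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical structured summary: an ordered list of typed fragments.
class Summary implements Writable {
  enum Kind { TEXT, HIGHLIGHT, ELLIPSIS }

  static final class Fragment {
    final Kind kind;
    final String text;
    Fragment(Kind kind, String text) { this.kind = kind; this.text = text; }
  }

  private final List<Fragment> fragments = new ArrayList<>();

  void add(Kind kind, String text) { fragments.add(new Fragment(kind, text)); }

  List<Fragment> getFragments() { return fragments; }

  @Override
  public void write(DataOutput out) throws IOException {
    // Fragment count first, then (kind, text) pairs in order.
    out.writeInt(fragments.size());
    for (Fragment f : fragments) {
      out.writeByte(f.kind.ordinal());
      out.writeUTF(f.text);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // Reads back exactly what write() produced, replacing any current content.
    fragments.clear();
    int n = in.readInt();
    for (int i = 0; i < n; i++) {
      Kind kind = Kind.values()[in.readByte()];
      fragments.add(new Fragment(kind, in.readUTF()));
    }
  }

  // Serializes and deserializes a summary in memory, as a network or disk layer might.
  static Summary roundTrip(Summary s) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      s.write(new DataOutputStream(bytes));
      Summary copy = new Summary();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Shipping the structured object instead of a pre-rendered HTML String leaves each caller free to decide how to render it.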

[ http://issues.apache.org/jira/browse/NUTCH-134?page=comments#action_12378063 ]
Steven Yelton commented on NUTCH-134:
---
Andrzej, my solution to this problem was to fix the comparator to actually compare the fragments if numFragments() was the same for…
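A sketch of the comparator fix described in this comment, under two assumptions: the excerpts are kept in a sorted set such as `java.util.TreeSet`, which treats `compare(a, b) == 0` as a duplicate and silently drops the second element, and the primary ordering is by `numFragments()` with more fragments first. Only `numFragments()` comes from the comment; `Excerpt`, `getFragment()`, `ExcerptComparator` and `newExcerptSet()` are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical excerpt: an ordered list of text fragments.
class Excerpt {
  private final List<String> fragments;
  Excerpt(String... fragments) { this.fragments = Arrays.asList(fragments); }
  int numFragments() { return fragments.size(); }
  String getFragment(int i) { return fragments.get(i); }
}

class ExcerptComparator implements Comparator<Excerpt> {
  @Override
  public int compare(Excerpt a, Excerpt b) {
    // Primary key (assumed): excerpts with more fragments sort first.
    int byCount = Integer.compare(b.numFragments(), a.numFragments());
    if (byCount != 0) {
      return byCount;
    }
    // Tie-break on the fragments themselves, so that distinct excerpts
    // with equal counts are not collapsed into one by a TreeSet.
    for (int i = 0; i < a.numFragments(); i++) {
      int c = a.getFragment(i).compareTo(b.getFragment(i));
      if (c != 0) {
        return c;
      }
    }
    return 0;
  }

  static TreeSet<Excerpt> newExcerptSet() {
    return new TreeSet<>(new ExcerptComparator());
  }
}
```

With the tie-break, two excerpts compare as equal only when they hold the same fragments in the same order, which is the only case a set should deduplicate.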

[ http://issues.apache.org/jira/browse/NUTCH-134?page=comments#action_12378170 ]
Andrzej Bialecki commented on NUTCH-134:
---
I still prefer Summary as Writable. The reason is that there are users of Summary who don't want a single String with HTML…
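To illustrate why a structured summary serves both audiences mentioned in this thread (HTML for result pages, plain text for clustering), here is a hypothetical renderer that produces either form from the same fragments. The class and method names, and the use of `<b>` to mark highlighted terms, are assumptions for illustration only.

```java
import java.util.List;

// Hypothetical renderer over summary fragments.
final class SummaryRenderer {
  // A fragment of text, flagged if it matched a query term.
  static final class Fragment {
    final String text;
    final boolean highlight;
    Fragment(String text, boolean highlight) { this.text = text; this.highlight = highlight; }
  }

  // Plain text with no markup, e.g. as input to a clustering engine.
  static String toPlainText(List<Fragment> fragments) {
    StringBuilder sb = new StringBuilder();
    for (Fragment f : fragments) {
      sb.append(f.text);
    }
    return sb.toString();
  }

  // HTML for a results page: escape every fragment, bold the matches.
  static String toHtml(List<Fragment> fragments) {
    StringBuilder sb = new StringBuilder();
    for (Fragment f : fragments) {
      String escaped = escape(f.text);
      sb.append(f.highlight ? "<b>" + escaped + "</b>" : escaped);
    }
    return sb.toString();
  }

  // Minimal HTML escaping; '&' must be replaced first.
  private static String escape(String s) {
    return s.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace("\"", "&quot;");
  }

  private SummaryRenderer() {}
}
```

Given fragments `"Tom & "` (plain) and `"Jerry"` (highlighted), `toPlainText` returns `Tom & Jerry` while `toHtml` returns `Tom &amp; <b>Jerry</b>`.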

[ http://issues.apache.org/jira/browse/NUTCH-134?page=comments#action_12377866 ]
Chris Fellows commented on NUTCH-134:
---
Jerome,
Let me know if you could use a hand with the implementation. I'd like to get to know the Nutch and Lucene code bases better for my…

[ http://issues.apache.org/jira/browse/NUTCH-134?page=comments#action_12363400 ]
byron miller commented on NUTCH-134:
---
Thanks, Erik. I was able to pull down the highlighter, and I'll be loading it up on mozdex.com to test over the weekend (1/21/2006).

[ http://issues.apache.org/jira/browse/NUTCH-134?page=comments#action_12361350 ]
byron miller commented on NUTCH-134:
---
Where is the Lucene summarizer in contrib? I'm not seeing anything obvious (unless it's under a different name).

[ http://issues.apache.org/jira/browse/NUTCH-134?page=comments#action_12359626 ]
Doug Cutting commented on NUTCH-134:
---
Can we replace Nutch's summarizer with the summarizer in Lucene's contrib directory yet? Are there features that Nutch requires that…

[ http://issues.apache.org/jira/browse/NUTCH-134?page=comments#action_12359629 ]
Andrzej Bialecki commented on NUTCH-134:
---
I _think_ the Lucene summarizer requires more CPU than this one... but this has
to be checked. I'll work on that.

[ http://issues.apache.org/jira/browse/NUTCH-134?page=comments#action_12359649 ]
byron miller commented on NUTCH-134:
---
I would take more CPU for better summaries any day :) CPU power is cheaper than manual intervention!
If any testing is needed, don't…