[ https://issues.apache.org/jira/browse/LUCENE-590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir resolved LUCENE-590.
--------------------------------
Resolution: Fixed
Fix Version/s: 4.0, 3.1
Committed revision 1031467, 1031468 (3x)
Thanks Curtis!
> Demo HTML parser gives incorrect summaries when title is repeated as a heading
> ------------------------------------------------------------------------------
>
> Key: LUCENE-590
> URL: https://issues.apache.org/jira/browse/LUCENE-590
> Project: Lucene - Java
> Issue Type: Bug
> Components: Examples
> Affects Versions: 2.0.0
> Reporter: Curtis d'Entremont
> Assignee: Robert Muir
> Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-590.patch
>
>
> If an HTML document repeats its title as a heading at the top of the
> document, the HTMLParser returns the title as the summary and ignores
> everything else that was added to the summary. It should do essentially the
> opposite: chop the title off the beginning and keep the rest of the summary.
> I don't see any benefit to repeating the title in the summary in any case.
> In HTMLParser.jj's getSummary():
>     String sum = summary.toString().trim();
>     String tit = getTitle();
>     if (sum.startsWith(tit) || sum.equals(""))
>       return tit;
>     else
>       return sum;
> change it to: (* denotes a line that has changed)
>     String sum = summary.toString().trim();
>     String tit = getTitle();
> *   if (sum.startsWith(tit))     // don't repeat title in summary
> *     return sum.substring(tit.length()).trim();
>     else
>       return sum;
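
The difference between the two versions quoted above can be tried outside the parser with a small standalone sketch. The class name SummaryDemo and the methods currentSummary and proposedSummary are hypothetical; they take the summary and title as plain arguments instead of reading HTMLParser's internal state, so this only approximates the real getSummary().

```java
public class SummaryDemo {

    // Mirrors the original logic quoted in the issue: if the summary starts
    // with the title (or is empty), only the title is returned.
    static String currentSummary(String rawSummary, String title) {
        String sum = rawSummary.trim();
        if (sum.startsWith(title) || sum.equals("")) {
            return title;
        }
        return sum;
    }

    // Mirrors the change proposed in the issue: strip a leading copy of the
    // title and keep the rest of the summary.
    static String proposedSummary(String rawSummary, String title) {
        String sum = rawSummary.trim();
        if (sum.startsWith(title)) {
            return sum.substring(title.length()).trim();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical document whose title is repeated as the first heading.
        String title = "Release Notes";
        String summary = "Release Notes This page lists changes.";

        System.out.println(currentSummary(summary, title));
        System.out.println(proposedSummary(summary, title));
    }
}
```

With this input the first call returns only the title, while the second returns the text that follows it. Note that, as written in the issue, the proposed version also drops the empty-summary check, so an empty summary would come back as an empty string instead of the title.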
--
This message is automatically generated by JIRA.