[ 
https://issues.apache.org/jira/browse/LUCENE-590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-590:
-------------------------------

    Attachment: LUCENE-590.patch

Here's a patch with a test. We don't even need to substring the summary:
the title is never added to the summary.


> Demo HTML parser gives incorrect summaries when title is repeated as a heading
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-590
>                 URL: https://issues.apache.org/jira/browse/LUCENE-590
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Examples
>    Affects Versions: 2.0.0
>            Reporter: Curtis d'Entremont
>            Priority: Minor
>         Attachments: LUCENE-590.patch
>
>
> If you have an HTML document where the title is repeated as a heading at the
> top of the document, the HTMLParser will return the title as the summary,
> ignoring everything else that was added to the summary. Instead, it should
> keep the rest of the summary and chop off the title part at the beginning
> (essentially the opposite of the current behavior). I don't see any benefit
> to repeating the title in the summary in any case.
> In HTMLParser.jj's getSummary():
>     String sum = summary.toString().trim();
>     String tit = getTitle();
>     if (sum.startsWith(tit) || sum.equals(""))
>       return tit;
>     else
>       return sum;
> Change it to (* denotes a line that has changed):
>     String sum = summary.toString().trim();
>     String tit = getTitle();
> *    if (sum.startsWith(tit))             // don't repeat title in summary
> *      return sum.substring(tit.length()).trim();
>     else
>       return sum;
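
To make the proposed behavior concrete, here is a minimal standalone sketch of
the logic quoted above. It is not the attached patch and not the actual
HTMLParser.jj code: the class name SummaryDemo, the method name summarize, and
its two parameters are hypothetical stand-ins for the parser's own summary
buffer and getTitle() call.

```java
public class SummaryDemo {
    // Hypothetical stand-in for the proposed getSummary() logic.
    // rawSummary plays the role of summary.toString(), title that of getTitle().
    static String summarize(String rawSummary, String title) {
        String sum = rawSummary.trim();
        if (sum.startsWith(title)) {
            // Don't repeat the title in the summary: drop it and keep the rest.
            return sum.substring(title.length()).trim();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Title repeated as a heading at the top of the document text.
        System.out.println(summarize("My Title Some body text.", "My Title"));
        // prints "Some body text."
    }
}
```

With the original code, the same input would have returned just "My Title",
which is the behavior this issue reports.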

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

