[ https://issues.apache.org/jira/browse/LUCENE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047046#comment-14047046 ]
Michael Dodsworth edited comment on LUCENE-4730 at 6/29/14 3:39 PM:
--------------------------------------------------------------------

This appears to be a symptom of LUCENE-4984 (fixed in 4.8). The following test fails:

{code:java}
// note Version.LUCENE_4_7; the input ends with two spaces
// expected terms {"my", "china"}, start offsets {0, 3}, end offsets {2, 8}
assertAnalyzesTo(new SmartChineseAnalyzer(Version.LUCENE_4_7, true), "My China  ",
    new String[] { "my", "china" }, new int[] { 0, 3 }, new int[] { 2, 8 });
{code}

whereas this passes:

{code:java}
// note Version.LUCENE_4_8; same input and same expectations
assertAnalyzesTo(new SmartChineseAnalyzer(Version.LUCENE_4_8, true), "My China  ",
    new String[] { "my", "china" }, new int[] { 0, 3 }, new int[] { 2, 8 });
{code}

I'll add a test to verify this double-whitespace case, but otherwise this can be closed out.
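To make the expected values concrete: the last two arrays are the start offsets and end offsets (end exclusive) of each term in the original input, so "my" covers [0, 2) and "china" covers [3, 8), and the trailing spaces should not move either span. The following is a minimal, hypothetical plain-Java sketch (no Lucene dependency; the class `OffsetSketch` and its methods are invented for illustration, and it is a simple whitespace split, not SmartChineseAnalyzer's segmentation) that reproduces those offsets and checks that each offset pair points back at its own term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: a plain whitespace tokenizer that records [start, end)
// offsets into the ORIGINAL string. It is not SmartChineseAnalyzer's algorithm.
final class OffsetSketch {

    // A token's lower-cased term plus its start (inclusive) and end (exclusive) offsets.
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return term + " [" + start + ", " + end + ")";
        }
    }

    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip a whitespace run of any length; offsets keep counting from index 0.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(new Token(text.substring(start, i).toLowerCase(Locale.ROOT), start, i));
            }
        }
        return tokens;
    }

    // True if every token's offsets slice out that token's own text (case-insensitively),
    // which is the property an offset-based highlighter relies on.
    static boolean offsetsConsistent(String text, List<Token> tokens) {
        for (Token t : tokens) {
            if (t.start < 0 || t.end > text.length() || t.start > t.end
                    || !text.substring(t.start, t.end).equalsIgnoreCase(t.term)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String text = "My China  "; // two trailing spaces, as in the double-whitespace case
        List<Token> tokens = tokenize(text);
        System.out.println(tokens);                          // [my [0, 2), china [3, 8)]
        System.out.println(offsetsConsistent(text, tokens)); // true
    }
}
```

Tokenizing "My China " (one trailing space) gives the same two tokens with the same offsets, which is the behaviour the 4.8 test expects regardless of how much whitespace trails the input.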
> SmartChineseAnalyzer got wrong matched offset
> ---------------------------------------------
>
>                 Key: LUCENE-4730
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4730
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 4.0, 4.1
>         Environment: JDK1.7 Linux/Windows
>            Reporter: Jinsong Hu
>            Priority: Critical
>         Attachments: LUCENE-4730.patch
>
>
> We found that SmartChineseAnalyzer produces wrong match offsets with the
> following test code:
>
> public void testHighlight() throws Exception {
>   String text = "My China  "; // ends with two spaces
>   String queryText = "China";
>   StringBuilder builder = new StringBuilder("<html>");
>   Analyzer analyzer = new SmartChineseAnalyzer(Version.LUCENE_40);
>   //Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
>   QueryParser parser = new QueryParser(Version.LUCENE_40, "text", analyzer);
>   Query query = parser.parse(queryText);
>   SimpleHTMLFormatter formatter = new SimpleHTMLFormatter(
>       "<span style=\"background: yellow\">", "</span>");
>   // re-analyze the same text so the highlighter can read each token's offsets
>   TokenStream tokens = analyzer.tokenStream("text", new StringReader(text));
>   QueryScorer scorer = new QueryScorer(query, "text");
>   Highlighter highlighter = new Highlighter(formatter, scorer);
>   highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer));
>   String result = highlighter.getBestFragments(tokens, text, 10, "...");
>   // fall back to the plain text if the returned fragment is shorter than the input
>   if (result.length() < text.length()) {
>     result = text;
>   }
>   builder.append("<body>");
>   builder.append(result);
>   builder.append("</body>");
>   builder.append("</html>");
>   System.out.println(builder.toString());
> }
>
> This method generates highlighted text; however, the highlight position is
> obviously wrong. If we remove one space from the text, that is, change text
> from "My China  " (ends with two spaces) to "My China " (ends with one space),
> it generates text with the correct highlight. If we change the analyzer from
> SmartChineseAnalyzer to StandardAnalyzer, the highlight issue disappears.
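The symptom surfaces in the highlighter because highlighting is driven by token offsets: the text is sliced at each matching token's start and end offsets and the formatter's tags are placed around that slice. Below is a deliberately simplified, hypothetical model (`HighlightSketch` is invented for illustration and is not Lucene's Highlighter or SimpleHTMLFormatter code) showing that the tags wrap exactly "China" only when the offsets are [3, 8).

```java
// Hypothetical, simplified model of offset-driven highlighting. It is not
// Lucene's Highlighter/SimpleHTMLFormatter code; it only shows that the tags
// land wherever the token's [start, end) offsets say they should.
final class HighlightSketch {

    static String highlight(String text, int start, int end, String pre, String post) {
        if (start < 0 || end > text.length() || start > end) {
            throw new IllegalArgumentException("offsets out of range: [" + start + ", " + end + ")");
        }
        // Copy the text before the token, wrap the token slice, then copy the rest.
        return text.substring(0, start) + pre + text.substring(start, end) + post + text.substring(end);
    }

    public static void main(String[] args) {
        String text = "My China  "; // two trailing spaces
        String pre = "<span style=\"background: yellow\">";
        String post = "</span>";
        // Offsets expected for the term "china": the tags wrap exactly "China".
        System.out.println(highlight(text, 3, 8, pre, post));
        // Hypothetical shifted offsets [4, 9), not taken from the bug report:
        // the same tags now wrap "hina " instead.
        System.out.println(highlight(text, 4, 9, pre, post));
    }
}
```

The range check is a design choice for the sketch: rejecting out-of-range offsets makes a broken offset computation fail loudly instead of silently mis-highlighting.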
-- This message was sent by Atlassian JIRA (v6.2#6252)