[ https://issues.apache.org/jira/browse/LUCENE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir resolved LUCENE-3717.
---------------------------------

    Resolution: Fixed

> Add fake charfilter to BaseTokenStreamTestCase to find offsets bugs
> -------------------------------------------------------------------
>
>                 Key: LUCENE-3717
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3717
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Robert Muir
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3717.patch, LUCENE-3717_more.patch,
> LUCENE-3717_ngram.patch
>
>
> Recently many issues with broken offsets have been fixed, but it would be
> nice to improve the test coverage and verify that offsets work across the
> board (especially with charfilters).
>
> In BaseTokenStreamTestCase.checkRandomData, we can sometimes pass the
> analyzer a reader wrapped in a "MockCharFilter" (the one in the patch
> sometimes doubles characters). If the analyzer does not call
> correctOffset() or does incorrect "offset math" (LUCENE-3642, etc.), then
> eventually this will create invalid offsets and the test will fail.
>
> Other than test bugs, this found 2 real bugs: ICUTokenizer did not call
> correctOffset() in its end(), and ThaiWordFilter did incorrect offset math.
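
To illustrate why a character-doubling filter exposes offset bugs, here is a minimal, dependency-free sketch. It is not the MockCharFilter from the patch and does not use Lucene classes; the class name DoublingCharFilterSketch and its methods are hypothetical and only mimic the idea of a filter that changes text length and maps offsets back to the original input, as CharFilter.correctOffset() is meant to do.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a character-doubling char filter; not Lucene code. */
final class DoublingCharFilterSketch {
    private final StringBuilder filtered = new StringBuilder();
    // origOffset.get(i) is the offset in the original text of filtered char i.
    // One extra trailing entry maps the end of the filtered text.
    private final List<Integer> origOffset = new ArrayList<>();

    DoublingCharFilterSketch(String original, char toDouble) {
        for (int i = 0; i < original.length(); i++) {
            char c = original.charAt(i);
            filtered.append(c);
            origOffset.add(i);
            if (c == toDouble) {
                // The inserted copy maps back to the same original character.
                filtered.append(c);
                origOffset.add(i);
            }
        }
        origOffset.add(original.length());
    }

    /** Text as a downstream tokenizer would see it. */
    String filteredText() {
        return filtered.toString();
    }

    /** Maps an offset in the filtered text back to the original text. */
    int correctOffset(int filteredOffset) {
        return origOffset.get(filteredOffset);
    }
}
```

With the input "hello wall" and 'l' doubled, a tokenizer sees "hellllo wallll", and the second token spans filtered offsets 8 to 14. Reported without correction, the end offset 14 exceeds the original length of 10, which is the kind of inconsistency an offset check can catch. After correctOffset() the span becomes 6 to 10, which is "wall" in the original text.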