[jira] [Commented] (LUCENE-3113) fix analyzer bugs found by MockTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035007#comment-13035007 ]

Robert Muir commented on LUCENE-3113:
-------------------------------------

thanks for reviewing Steven, I agree! I've made this change and will commit shortly.

> fix analyzer bugs found by MockTokenizer
> ----------------------------------------
>
>                 Key: LUCENE-3113
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3113
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/analysis
>            Reporter: Robert Muir
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3113.patch, LUCENE-3113.patch
>
>
> In LUCENE-3064, we beefed up MockTokenizer with assertions, and I've switched
> over the analysis tests to use MockTokenizer for better coverage.
> However, this found a few bugs (one of which is LUCENE-3106):
> * incrementToken() after it returns false in CommonGramsQueryFilter,
> HyphenatedWordsFilter, ShingleFilter, SynonymFilter
> * missing end() implementation for PrefixAwareTokenFilter
> * double reset() in QueryAutoStopWordAnalyzer and ReusableAnalyzerBase
> * missing correctOffset()s in MockTokenizer itself.
> I think it would be nice to just fix all the bugs on one issue... I've fixed
> everything except Shingle and Synonym

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
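The call-order problems listed in the issue description (incrementToken() after it has returned false, a missing end(), a double reset()) can be illustrated with a small state machine. The following is only a sketch: `CheckedTokenSource`, its `State` enum, and `countTokens` are hypothetical names invented for illustration, not Lucene classes, and the checks are an assumption about the kind of assertions MockTokenizer performs rather than a copy of them.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in (NOT Lucene's MockTokenizer): it only checks call order. */
final class CheckedTokenSource {
    private enum State { NEW, RESET, INCREMENT, EXHAUSTED, ENDED }

    private final List<String> tokens;
    private Iterator<String> it;
    private State state = State.NEW;
    private String current;

    CheckedTokenSource(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Starts (or restarts) consumption; two resets in a row are flagged. */
    void reset() {
        if (state == State.RESET) {
            throw new IllegalStateException("double reset()");
        }
        it = tokens.iterator();
        current = null;
        state = State.RESET;
    }

    /** Advances to the next token; once it has returned false, further calls are flagged. */
    boolean incrementToken() {
        if (state != State.RESET && state != State.INCREMENT) {
            throw new IllegalStateException("incrementToken() called in state " + state);
        }
        if (it.hasNext()) {
            current = it.next();
            state = State.INCREMENT;
            return true;
        }
        current = null;
        state = State.EXHAUSTED;
        return false;
    }

    /** Must follow the incrementToken() call that returned false. */
    void end() {
        if (state != State.EXHAUSTED) {
            throw new IllegalStateException("end() called in state " + state);
        }
        state = State.ENDED;
    }

    /** The current token text, or null when no token is positioned. */
    String term() {
        return current;
    }

    /** A well-behaved consumer: reset once, loop until false, then end. */
    static int countTokens(CheckedTokenSource source) {
        source.reset();
        int count = 0;
        while (source.incrementToken()) {
            count++;
        }
        source.end();
        return count;
    }
}
```

A filter that keeps calling incrementToken() on its input after the input has reported false, or an analyzer that resets a stream the consumer will reset again, would trip one of these checks in the sketch.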
[jira] [Commented] (LUCENE-3113) fix analyzer bugs found by MockTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034956#comment-13034956 ]

Steven Rowe commented on LUCENE-3113:
-------------------------------------

+1

bq. the ShingleAnalyzerWrapper was double-resetting

Your patch just removes the reset call:

{noformat}
@@ -201,7 +201,6 @@
       TokenStream result = defaultAnalyzer.reusableTokenStream(fieldName, reader);
       if (result == streams.wrapped) {
         /* the wrapped analyzer reused the stream */
-        streams.shingle.reset();
       } else {
         /* the wrapped analyzer did not, create a new shingle around the new one */
         streams.wrapped = result;
{noformat}

but inverting the condition would read better:

{noformat}
       TokenStream result = defaultAnalyzer.reusableTokenStream(fieldName, reader);
-      if (result == streams.wrapped) {
-        /* the wrapped analyzer reused the stream */
-        streams.shingle.reset();
-      } else {
-        /* the wrapped analyzer did not, create a new shingle around the new one */
+      if (result != streams.wrapped) {
+        // The wrapped analyzer did not reuse the stream.
+        // Wrap the new stream with a new ShingleFilter.
         streams.wrapped = result;
         streams.shingle = new ShingleFilter(streams.wrapped);
       }
{noformat}
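The reuse logic discussed in this comment can be sketched without Lucene on the classpath. Everything below is hypothetical: `ReuseSketch`, `Source`, `Wrapper`, and `Saved` are invented stand-ins for the token stream, the ShingleFilter, and the saved-streams holder, and the sketch assumes that the consumer of the returned stream is the one that calls reset().

```java
/** Hypothetical sketch of "rebuild the wrapper only when the wrapped stream changed". */
final class ReuseSketch {
    /** Stand-in for the stream returned by the wrapped analyzer; counts resets. */
    static final class Source {
        int resets;

        void reset() {
            resets++;
        }
    }

    /** Stand-in for the filter wrapped around the source; reset() is forwarded. */
    static final class Wrapper {
        final Source in;

        Wrapper(Source in) {
            this.in = in;
        }

        void reset() {
            in.reset();
        }
    }

    /** Stand-in for the per-thread saved streams. */
    static final class Saved {
        Source wrapped;
        Wrapper shingle;
    }

    /**
     * Returns the cached wrapper when the same source comes back, otherwise
     * builds a new one. It never calls reset() itself; that is left to the
     * consumer (an assumption of this sketch), so no double reset can occur here.
     */
    static Wrapper reusable(Saved streams, Source result) {
        if (result != streams.wrapped) {
            // The source was not reused: wrap the new source with a new wrapper.
            streams.wrapped = result;
            streams.shingle = new Wrapper(streams.wrapped);
        }
        return streams.shingle;
    }
}
```

With the inverted condition there is a single branch, and the reuse path does nothing at all, which is the readability point made above.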
[jira] [Commented] (LUCENE-3113) fix analyzer bugs found by MockTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034846#comment-13034846 ]

Robert Muir commented on LUCENE-3113:
-------------------------------------

Uwe, I think I'll open a follow-up issue to clean up the PrefixAndSuffixAwareTF code. I don't like how tricky it is.
[jira] [Commented] (LUCENE-3113) fix analyzer bugs found by MockTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034816#comment-13034816 ]

Uwe Schindler commented on LUCENE-3113:
---------------------------------------

A quick check of the fixes in the implementations: all fine. I was just confused about PrefixAndSuffixAwareTF, but that's fine (Robert explained it to me; these filters are very complicated in their code/class hierarchy design *g*).

I did not verify the tests; I assume they are just mechanical search-and-replace changes.
[jira] [Commented] (LUCENE-3113) fix analyzer bugs found by MockTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034806#comment-13034806 ]

Robert Muir commented on LUCENE-3113:
-------------------------------------

I think this patch is ready to commit. I'll wait and see if anyone feels like reviewing it :)