[ https://issues.apache.org/jira/browse/LUCENE-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502731 ]
Toru Matsuzawa edited comment on LUCENE-902 at 6/8/07 2:14 AM:
---------------------------------------------------------------

Hi Hoss,

Thank you for your comments.

> 1) in future patches, could you please use 2 spaces instead of tabs?

Agreed. I will do so.

> 2) am i understanding correctly that the primary use case you are trying to
> address is
> stop word removal when the stop word has synonyms with a position increment
> of 0
> (the expectation being that the synonyms also be removed) ?

Your understanding is correct. However, a synonym itself might be a stop word.

> ... if so, wouldn't the most efficient thing be to do stop word removal
> before doing
> synonym expansion? (it means having a bigger stop word list - with all the
> synonyms -
> but that still seems better to me) ... are there other use cases i'm not
> understanding? ...
> i freely admit i don't understand the "Japanese morphological analysis"
> comment.

It is not realistic to have a stop word list with all the synonyms because, to make that list, the morphological engine would have to understand all the dictionaries. (The engine analyzes text with such dictionaries.)

> 3) i only glanced over the specifics of removeStopwordCollocatesNext() ..
> but would promoting BufferedTokenStream from Solr simplify the code
> (it seems to all be about buffering tokens) ...
> http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/analysis/BufferedTokenStream.java?view=markup

I think the code would become more concise if BufferedTokenStream could be used.

> 4) it would be useful if the test case could clarify not only the expected
> tokens text
> concatenated together, but also what the expected positions of position
> increments are
> for the tokens... i was certainly confused by the title of this issue.

I agree with you. It would be better to compare them against the expected tokens.
I'm sorry for confusing you with my poor English.

> Check on PositionIncrement with StopFilter.
> --------------------------------------------
>
>                 Key: LUCENE-902
>                 URL: https://issues.apache.org/jira/browse/LUCENE-902
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.2
>            Reporter: Toru Matsuzawa
>         Attachments: stopfilter.patch, stopfilter20070604.patch,
>                      stopfilter20070605.patch, stopfilter20070608.patch
>
>
> StopFilter does not take into account the PositionIncrement set by the Tokenizer.
> When a stop word Token has a PositionIncrement of 1, StopFilter deletes it. However,
> when the Token that follows it has a PositionIncrement of 0, that Token is not deleted.
> I think it should be deleted as well, because a Token whose PositionIncrement is 0
> is considered the same Token as the preceding one.
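
To make the requested behavior concrete, here is a minimal sketch in plain Java. It does not use the Lucene API and is not taken from the attached patches: the `Tok` class, the `removeStopPositions` method, and the sample words are hypothetical stand-ins. It assumes that tokens sharing a position (a token followed by tokens with a position increment of 0) are treated as one group, and that the whole group is dropped if any member is a stop word, which is one reading of "a synonym itself might be a stop word". The increments of the surviving tokens are left untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class StopFilterSketch {

    /** Hypothetical stand-in for a token: its text and its position increment. */
    static final class Tok {
        final String text;
        final int posInc;

        Tok(String text, int posInc) {
            this.text = text;
            this.posInc = posInc;
        }
    }

    /**
     * Drops every position group that contains a stop word. A group is one
     * token plus all directly following tokens whose increment is 0.
     */
    static List<Tok> removeStopPositions(List<Tok> in, Set<String> stopWords) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < in.size()) {
            // Find the end of the group that starts at index i.
            int j = i + 1;
            while (j < in.size() && in.get(j).posInc == 0) {
                j++;
            }
            // Check whether any token in the group is a stop word.
            boolean hasStop = false;
            for (int k = i; k < j; k++) {
                if (stopWords.contains(in.get(k).text)) {
                    hasStop = true;
                }
            }
            // Keep the group only if it contains no stop word.
            if (!hasStop) {
                out.addAll(in.subList(i, j));
            }
            i = j;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> in = Arrays.asList(
                new Tok("the", 1), new Tok("that", 0),    // stop word plus a synonym
                new Tok("fox", 1), new Tok("vulpes", 0)); // ordinary word plus a synonym
        Set<String> stop = new HashSet<>(Arrays.asList("the"));
        for (Tok t : removeStopPositions(in, stop)) {
            System.out.println(t.text + " +" + t.posInc);
        }
    }
}
```

With this input the sketch prints `fox +1` and `vulpes +0`. A filter that only tests each token's own text would also let `that +0` through, which is the behavior the issue describes.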