[ https://issues.apache.org/jira/browse/LUCENE-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502731 ]
Toru Matsuzawa commented on LUCENE-902:
---------------------------------------
Hi Hoss,
Thank you for your comments.
> 1) in future patches, could you please use 2 spaces instead of tabs?
Understood. I will do so.
> 2) am i understanding correctly that the primary use case you are trying to
> address is stop word removal when the stop word has synonyms with a position
> increment of 0 (the expectation being that the synonyms also be removed) ?
Your understanding is correct.
However, a synonym itself might be a stop word.
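To make the use case concrete, here is a minimal sketch in plain Java. It does not use the Lucene API and it is not the code in the attached patches: `Tok` is a hypothetical stand-in for a Lucene Token, and `removeStopGroups` is an illustrative name. The sketch assumes that a stop word at the head of a position group takes its increment-0 collocates with it, and that a stop word appearing only as a synonym is dropped on its own. It does not adjust the position increments of the tokens that remain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class StopCollocateSketch {

    /** Hypothetical stand-in for a Lucene Token: term text plus position increment. */
    public static final class Tok {
        public final String text;
        public final int posInc;

        public Tok(String text, int posInc) {
            this.text = text;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return text + "(+" + posInc + ")";
        }
    }

    /**
     * Assumed behavior: a position group is one token followed by all
     * tokens with position increment 0. If the first token of the group
     * is a stop word, the whole group is dropped. Otherwise only the
     * individual stop words inside the group are dropped.
     */
    public static List<Tok> removeStopGroups(List<Tok> in, Set<String> stopWords) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < in.size()) {
            // Find the end of the group that starts at index i.
            int j = i + 1;
            while (j < in.size() && in.get(j).posInc == 0) {
                j++;
            }
            boolean headIsStop = stopWords.contains(in.get(i).text);
            if (!headIsStop) {
                for (int k = i; k < j; k++) {
                    if (!stopWords.contains(in.get(k).text)) {
                        out.add(in.get(k));
                    }
                }
            }
            i = j;
        }
        return out;
    }
}
```

For the input the(+1) a(+0) quick(+1) fast(+0) fox(+1) with "the" as the only stop word, this sketch drops both "the" and its collocate "a" and keeps the remaining three tokens.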
> ... if so, wouldn't the most efficient thing be to do stop word removal
> before doing synonym expansion? (it means having a bigger stop word list -
> with all the synonyms - but that still seems better to me) ... are there
> other use cases i'm not understanding? ... i freely admit i don't understand
> the "Japanese morphological analysis" comment.
It is not realistic to maintain a stop word list that includes all the
synonyms, because building that list would require understanding all the
dictionaries used by the morphological engine.
(The engine relies on those dictionaries to analyze text.)
> 3) i only glanced over the specifics of removeStopwordCollocatesNext() ..
> but would promoting BufferedTokenStream from Solr simplify the code (it seems
> to all be about buffering tokens) ...
http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/analysis/BufferedTokenStream.java?view=markup
I think the code would become more concise if BufferedTokenStream could be used.
> 4) it would be useful if the test case could clarify not only the expected
> tokens text concatenated together, but also what the expected positions of
> position increments are for the tokens... i was certainly confused by the
> title of this issue.
I agree. It would be better to compare the output with the expected tokens.
I'm sorry for confusing you with my poor English.
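One possible shape for such a comparison is sketched below in plain Java, without the Lucene API. `PositionDump` and `describe` are hypothetical names, not part of any patch. The sketch renders each token as text@position so that a test can compare a single string that carries both the token text and its position. It assumes positions are the running sum of the increments, starting from -1 so that a first token with increment 1 lands at position 0.

```java
public class PositionDump {

    /**
     * Renders tokens as "text@position", separated by single spaces.
     * texts[i] and posIncs[i] describe the i-th token.
     */
    public static String describe(String[] texts, int[] posIncs) {
        if (texts.length != posIncs.length) {
            throw new IllegalArgumentException("texts and posIncs differ in length");
        }
        StringBuilder sb = new StringBuilder();
        int pos = -1; // assumed start, so increment 1 gives position 0
        for (int i = 0; i < texts.length; i++) {
            pos += posIncs[i];
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(texts[i]).append('@').append(pos);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "quick@0 fast@0 fox@1"
        System.out.println(describe(
            new String[] {"quick", "fast", "fox"},
            new int[] {1, 0, 1}));
    }
}
```

A test written this way fails when either the text or the position of a token differs from the expectation, which concatenated token text alone would not catch.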
> Check on PositionIncrement with StopFilter.
> --------------------------------------------
>
> Key: LUCENE-902
> URL: https://issues.apache.org/jira/browse/LUCENE-902
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis
> Affects Versions: 2.2
> Reporter: Toru Matsuzawa
> Attachments: stopfilter.patch, stopfilter20070604.patch,
> stopfilter20070605.patch
>
>
> StopFilter does not take into account the PositionIncrement set by the Tokenizer.
> When a stop word Token has a PositionIncrement of 1, StopFilter deletes it. However,
> a Token that follows it with a PositionIncrement of 0 is not deleted.
> I think it should be deleted as well, because a Token with a PositionIncrement
> of 0 is regarded as the same Token.