[ https://issues.apache.org/jira/browse/LUCENE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629846#action_12629846 ]
Michael Semb Wever commented on LUCENE-1380:
--------------------------------------------

I suspected as much regarding the option name, but "coterminal" is a word I haven't used since high school.

> I'm -1 on the patch in its current form. If rewritten to modify the position
> increment only for those shingles that begin at the same word, I'd be +1
> (assuming it works and is tested appropriately).

As I said in the thread, your suggestion does not work. Setting each shingle to have a positionIncrement=1, so as to avoid the MultiPhraseQuery in favour of the plain PhraseQuery, makes sense but does not work. And not phrasing the query doesn't invoke the ShingleFilter properly.

> The ShingleFilter appears to only work, at least for me, on phrases.
> I would think this correct as each shingle is in fact a sub-phrase to the
> larger original phrase.

If this is the case, i.e. the ShingleFilter works on phrases as a whole entity, and the shingles from each term in the phrase are related because they all come from the one phrase, then does it not make sense to have the possibility to position them all together?

For example, in the current implementation, in the phrase "abcd efgh ijkl" it is the first term "abcd" that is responsible for generating the shingles "abcd efgh ijkl" and "abcd efgh". What says that these shingles couldn't be generated from the "efgh" term (or "ijkl" for the former shingle) in an alternative implementation? Why the presumption that it's in the user's interest to force this separation according to where this implementation chooses to put its shingles?

If this isn't lost-in-the-bush logic, have you a suggestion for a more appropriate option name for the current solution?

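To make the position increments under discussion concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: the class `ShinglePositionSketch`, the `Tok` type and the `shingles` method are hypothetical names invented for illustration, and the `coterminalPositionIncrement` parameter only models one reading of the option described in this issue (the unigram that starts each group of shingles gets that increment, every other shingle gets 0). It is not the attached patch and not the real ShingleFilter code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of shingle position increments; not Lucene code.
class ShinglePositionSketch {

    /** A term plus its position increment relative to the previous token. */
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    /**
     * Emits, for every starting word, the unigram followed by the shingles
     * that begin at that word. The unigram gets coterminalPositionIncrement
     * (assumed meaning of the option); the shingles that follow get 0, so
     * they share the unigram's position.
     */
    static List<Tok> shingles(List<String> words, int maxShingleSize,
                              int coterminalPositionIncrement) {
        List<Tok> out = new ArrayList<>();
        for (int start = 0; start < words.size(); start++) {
            StringBuilder sb = new StringBuilder();
            for (int size = 1;
                 size <= maxShingleSize && start + size <= words.size();
                 size++) {
                if (size > 1) {
                    sb.append(' ');
                }
                sb.append(words.get(start + size - 1));
                int inc = (size == 1) ? coterminalPositionIncrement : 0;
                out.add(new Tok(sb.toString(), inc));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("abcd", "efgh", "ijkl");
        // Default behaviour described in the issue: three distinct positions.
        System.out.println(shingles(words, 3, 1));
        // Modelled alternative: every word and shingle at the same position.
        System.out.println(shingles(words, 3, 0));
    }
}
```

With an increment of 1 the six tokens fall on three positions, which is what leads a query parser to build a MultiPhraseQuery with one alternative set per position. With an increment of 0 all six tokens share a single position in this model.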
> Patch for ShingleFilter.coterminalPositionIncrement
> ---------------------------------------------------
>
>                 Key: LUCENE-1380
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1380
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Michael Semb Wever
>             Fix For: 2.4
>
>         Attachments: LUCENE-1380.patch
>
>
> Make it possible for *all* words and shingles to be placed at the same
> position.
> Default is to place each shingle at the same position as the unigram (or
> first shingle if outputUnigrams=false). That is, each coterminal token has
> positionIncrement=1 and every other token a positionIncrement=0.
> This leads to a MultiPhraseQuery where at least one word/shingle must be
> matched from each word/token. This is not always desired.
> See http://comments.gmane.org/gmane.comp.jakarta.lucene.user/34746 for
> mailing list thread.