[ 
https://issues.apache.org/jira/browse/LUCENE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-1380:
--------------------------------

    Attachment: LUCENE-1380-PositionFilter.patch

Mck, I was wrong about testing a Filter over multiple docs - each instance of a 
Filter is defined only over a single doc, so such a test doesn't make sense.

However, you are completely on the right track with the reset() operation, 
since PositionFilter is sensitive to whether it's at the beginning of a stream, 
and it should respond as you have written it.

Since I was wrong about PositionFilter needing to handle multiple documents, 
the else clause I asked you to add (for when the input stream returns null) 
should come back out.  A filter in the analysis chain should stop processing 
when it encounters null, since null means end-of-stream, so in this revised 
patch I've removed your tests that embed null in the token stream.
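
To make the reset() and end-of-stream behaviour above concrete, here is a 
minimal, self-contained sketch.  It is not the code from the patch: Tok, 
Stream, ListStream and the PositionFilter below are hypothetical stand-ins 
written for illustration, and they assume the filter leaves the first token's 
increment untouched and applies the configured increment (default zero) to 
every later token.  The input mimics the shingle output from the issue 
description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, self-contained stand-ins for illustration only.
// None of these types are the real Lucene classes.
class PositionFilterSketch {

  /** Minimal immutable token: term text plus position increment. */
  static final class Tok {
    final String term;
    final int posInc;
    Tok(String term, int posInc) {
      this.term = term;
      this.posInc = posInc;
    }
  }

  /** Minimal stream contract: next() returns null at end-of-stream. */
  interface Stream {
    Tok next();
    void reset();
  }

  /** Replays a fixed token list; reset() rewinds to the first token. */
  static final class ListStream implements Stream {
    private final List<Tok> toks;
    private int i = 0;
    ListStream(List<Tok> toks) { this.toks = toks; }
    public Tok next() { return i < toks.size() ? toks.get(i++) : null; }
    public void reset() { i = 0; }
  }

  /** Assumed behaviour: first token untouched, later tokens re-positioned. */
  static final class PositionFilter implements Stream {
    private final Stream input;
    private final int positionIncrement;
    private boolean firstTokenPositioned = false;

    PositionFilter(Stream input) { this(input, 0); }

    PositionFilter(Stream input, int positionIncrement) {
      this.input = input;
      this.positionIncrement = positionIncrement;
    }

    public Tok next() {
      Tok t = input.next();
      if (t == null) {
        return null; // end-of-stream: stop, no special handling
      }
      if (!firstTokenPositioned) {
        firstTokenPositioned = true;
        return t; // first token keeps its original increment
      }
      return new Tok(t.term, positionIncrement);
    }

    public void reset() {
      input.reset();
      firstTokenPositioned = false; // back to "beginning of stream"
    }
  }

  /** Drains a stream and collects the position increments it emitted. */
  static List<Integer> increments(Stream s) {
    List<Integer> out = new ArrayList<>();
    for (Tok t = s.next(); t != null; t = s.next()) {
      out.add(t.posInc);
    }
    return out;
  }

  /** Shingles for "abcd efgh ijkl", positioned as in the issue description. */
  static Stream shingles() {
    return new ListStream(Arrays.asList(
        new Tok("abcd", 1), new Tok("abcd efgh", 0),
        new Tok("abcd efgh ijkl", 0), new Tok("efgh", 1),
        new Tok("efgh ijkl", 0), new Tok("ijkl", 1)));
  }

  public static void main(String[] args) {
    Stream filter = new PositionFilter(shingles());
    System.out.println(increments(filter)); // [1, 0, 0, 0, 0, 0]
    System.out.println(increments(filter)); // [] (stream exhausted)
    filter.reset();
    System.out.println(increments(filter)); // [1, 0, 0, 0, 0, 0]
  }
}
```

Without the reset() override, the second pass would treat its first token as 
a later token and emit a zero increment for it, which is why the filter has 
to clear its "first token seen" flag.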

bq. Steve, can you look at the reset versus null token in stream difference. 
Are both approaches valid to test? (I'd not overridden TokenStream.reset() in 
the previous patch).

I removed the void-return filterTest(), since it wasn't called from anywhere 
and it exercised only ShingleFilter, not PositionFilter.  In its place I've 
added another test named testReset().

I added a test that checks for non-default positionIncrement: 
testNonZeroPositionIncrement().

I removed PositionFilter.setPositionIncrement(), because using it one could 
potentially change the position increment in mid-stream, which makes little 
sense.  The alternate constructor provides a way to set it.
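
As a rough illustration of that design choice (a hypothetical class, not the 
one in the patch), keeping the increment in a final field that only the 
constructors assign means it cannot change once the stream is in use:

```java
// Hypothetical sketch; not the class from the patch.
final class FixedIncrementSketch {
  // final: fixed for the lifetime of the instance, so no mid-stream changes
  private final int positionIncrement;

  /** Default constructor: assumes a default increment of zero. */
  FixedIncrementSketch() {
    this(0);
  }

  /** Alternate constructor: the only way to choose another increment. */
  FixedIncrementSketch(int positionIncrement) {
    this.positionIncrement = positionIncrement;
  }

  int getPositionIncrement() {
    return positionIncrement;
  }

  public static void main(String[] args) {
    System.out.println(new FixedIncrementSketch().getPositionIncrement());  // 0
    System.out.println(new FixedIncrementSketch(3).getPositionIncrement()); // 3
  }
}
```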

In the patch, I have modified the formatting a little to conform to Lucene 
convention, which is outlined on the [HowToContribute wiki 
page|http://wiki.apache.org/lucene-java/HowToContribute#head-59ae13df098fbdcc46abdf980aa8ee76d3ee2e3b]:

{quote}
* Code should be formatted according to [Sun's 
conventions|http://java.sun.com/docs/codeconv/] with one exception:
** indent two spaces per level, not four.
{quote}

I ran "svn diff" under the trunk/ directory, instead of in 
trunk/contrib/analyzers/ (where you based your patches) - it's simpler for 
people who look at a lot of these things to have them always be based from 
trunk/.

Take a look and make sure things are as they should be - the tests pass for me, 
and I think it's doing what it should do.

If you agree, then hopefully we can get Karl (or another committer, which I'm 
not) to take a look and see if they think it can be committed.


> Patch for ShingleFilter.enablePositions (or PositionFilter)
> -----------------------------------------------------------
>
>                 Key: LUCENE-1380
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1380
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Mck SembWever
>            Priority: Trivial
>         Attachments: LUCENE-1380-PositionFilter.patch, 
> LUCENE-1380-PositionFilter.patch, LUCENE-1380-PositionFilter.patch, 
> LUCENE-1380.patch, LUCENE-1380.patch
>
>
> Make it possible for *all* words and shingles to be placed at the same 
> position, that is for _all_ shingles (and unigrams if included) to be treated 
> as synonyms of each other.
> Today the shingles generated are synonyms only to the first term in the 
> shingle.
> For example the query "abcd efgh ijkl" results in:
>    ("abcd" "abcd efgh" "abcd efgh ijkl") ("efgh" "efgh ijkl") ("ijkl")
> where "abcd efgh" and "abcd efgh ijkl" are synonyms of "abcd", and "efgh 
> ijkl" is a synonym of "efgh".
> There exists no way today to alter which token a particular shingle is a 
> synonym for.
> This patch takes the first step in making it possible to make all shingles 
> (and unigrams if included) synonyms of each other.
> See http://comments.gmane.org/gmane.comp.jakarta.lucene.user/34746 for 
> mailing list thread.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


