Re: span / position increment issue

Marc Hadfield Thu, 05 Jan 2006 10:33:34 -0800


Thanks Erik, Hoss -

I will try MultiPhraseQuery and report back.

Back in an email thread with Doug, he mentioned SpanQuery would work, and in a fashion it does, but I can't differentiate between terms at the same position and terms at contiguous positions. The problem gets worse if I want to test for more than two terms at the same position (moon / noun / end_of_sentence / ...), which opens it up to wider contiguous spaces.

Would it make sense to extend SpanNearQuery for this case, perhaps with a slop of a special value (-1) to indicate terms should be found at the same position? I'm not sure how difficult this would be to represent in the underlying queues keeping track of matches.
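
As a rough illustration of what "same position" matching would have to compute, here is a minimal sketch in plain Java. It does not use any Lucene classes, and the class and method names are hypothetical. It assumes the positions of each term within one document are already known and simply intersects them:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

class SamePosition {
    /**
     * Returns the positions at which every term occurs.
     * Each argument holds the positions of one term in a single document.
     */
    static SortedSet<Integer> commonPositions(int[]... perTerm) {
        SortedSet<Integer> common = new TreeSet<Integer>();
        if (perTerm.length == 0) {
            return common;
        }
        for (int p : perTerm[0]) {
            common.add(p);
        }
        // Keep only the positions that the remaining terms also occupy.
        for (int i = 1; i < perTerm.length; i++) {
            Set<Integer> next = new HashSet<Integer>();
            for (int p : perTerm[i]) {
                next.add(p);
            }
            common.retainAll(next);
        }
        return common;
    }

    public static void main(String[] args) {
        int[] moon = {5};      // "moon" in the example sentence
        int[] noun = {1, 5};   // "__pos_noun" marks "cow" and "moon"
        int[] jumped = {2};    // "jumped"
        System.out.println(commonPositions(moon, noun));   // [5]
        System.out.println(commonPositions(jumped, noun)); // []
    }
}
```

The intersection works for any number of terms, which is the case that becomes awkward when it is expressed through slop.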


---Marc

Erik Hatcher wrote:

Marc,
SpanNearQuery can't restrict proximity to a single position in the manner you've described. A slop of 0 means the terms must be contiguous with no gaps, which also allows for matches in the same position, as in your first example.
I think MultiPhraseQuery (from svn trunk) will do what you want though. Please try that and report back on how it works for you.
    Erik

: i have a problem with a SpanNearQuery returning incorrect (false
: positive) results.

I'm not familiar with how Span queries are implemented, but there don't
appear to be any test cases dealing with an index where term position
increments are ever 0, so i can neither confirm nor deny the bug you're
seeing.

Can you post code demonstrating the problem?  ideally in the form of
a simple, self-contained JUnit test?



-Hoss

On Jan 4, 2006, at 9:39 PM, Marc Hadfield wrote:
hello all -
i have a problem with a SpanNearQuery returning incorrect (false positive) results.
I am creating the content of a field using tokens which have position increment set to either 1 or 0. The position increment is set to 0 for special tokens, in this case part-of-speech markers.
Thus, with brackets grouping the tokens that share a position:
[The __pos_dt] [cow __pos_noun] [jumped __pos_verb] [over __pos_prep] [the __pos_dt] [moon __pos_noun] [. __pos_.]
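
To make the positions concrete, here is a minimal sketch in plain Java (no Lucene classes; the class and method names are illustrative only) that turns per-token position increments into absolute positions. It assumes the first increment of 1 lands on position 0:

```java
class PositionDemo {
    /** Absolute position of each token, given its position increment. */
    static int[] positions(int[] increments) {
        int[] out = new int[increments.length];
        int pos = -1; // assumption: a first increment of 1 yields position 0
        for (int i = 0; i < increments.length; i++) {
            pos += increments[i]; // an increment of 0 reuses the previous position
            out[i] = pos;
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "__pos_dt", "cow", "__pos_noun", "jumped", "__pos_verb"};
        int[] increments = {1, 0, 1, 0, 1, 0};
        int[] pos = positions(increments);
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + " @ " + pos[i]);
        }
        // Prints: The @ 0, __pos_dt @ 0, cow @ 1, __pos_noun @ 1,
        //         jumped @ 2, __pos_verb @ 2 (one per line)
    }
}
```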
My Span Query looks like:
// SpanNearQuery(clauses, slop = 0, inOrder = false)
SpanQuery sq = new SpanNearQuery(new SpanQuery[] {
    new SpanTermQuery(new Term("content", "jumped")),
    new SpanTermQuery(new Term("content", "__pos_verb"))
}, 0, false);

This correctly finds the span:  [jumped __pos_verb]

However, if I query:
// Same query, but pairing "jumped" with the noun marker
SpanQuery sq = new SpanNearQuery(new SpanQuery[] {
    new SpanTermQuery(new Term("content", "jumped")),
    new SpanTermQuery(new Term("content", "__pos_noun"))
}, 0, false);

This incorrectly finds the span:  [cow __pos_noun] [jumped __pos_verb]
This is wrong because there is a distance of 1 between the tokens, not 0.
I am using a recent version of Lucene from SVN.
I am thinking that the problem is related to the position increment being set to 0 for the first token of the incorrect "match" -- thus perhaps this is a bug in the SpanNearQuery?
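
One possible way to see why both queries match is a simplified model of an unordered near check. This is an assumption about the general approach, not code copied from Lucene: each term is taken to occupy exactly one position, the window runs from the smallest start to the largest end, and the number of terms is subtracted before comparing with the slop. The class and method names are hypothetical:

```java
class SlopModel {
    /**
     * Simplified unordered "near" test over single-term spans.
     * termPositions[i] is the position of the i-th term; its span is
     * [position, position + 1).
     */
    static boolean unorderedNearMatch(int[] termPositions, int slop) {
        int minStart = Integer.MAX_VALUE;
        int maxEnd = Integer.MIN_VALUE;
        for (int p : termPositions) {
            minStart = Math.min(minStart, p);
            maxEnd = Math.max(maxEnd, p + 1);
        }
        int totalLength = termPositions.length; // one position per term
        // Gap = window width minus the positions the terms themselves cover.
        return (maxEnd - minStart) - totalLength <= slop;
    }

    public static void main(String[] args) {
        // jumped @ 2, __pos_verb @ 2: gap is (3 - 2) - 2 = -1
        System.out.println(unorderedNearMatch(new int[]{2, 2}, 0)); // true
        // jumped @ 2, __pos_noun @ 1: gap is (3 - 1) - 2 = 0
        System.out.println(unorderedNearMatch(new int[]{2, 1}, 0)); // true
        // jumped @ 2, __pos_dt @ 0: gap is (3 - 0) - 2 = 1
        System.out.println(unorderedNearMatch(new int[]{2, 0}, 0)); // false
    }
}
```

Under this model a slop of 0 accepts both the same-position pair and the adjacent pair, which is consistent with the result reported above.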
Best,
Marc






---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]