Re: Positions in SpanFirst

Antony Bowesman Wed, 21 Feb 2007 16:10:00 -0800

Hi Erick,

What this does is allow you to put gaps between successive sets of terms
indexed in the same field. For instance...
doc.add("field", "some stuff");
doc.add("field", "bunch hooey");
doc.add("field", "what is this");
writer.add(doc);


In this case, there would be the following positions, assuming that the
IncrementGap was 1000....
some 0
stuff 1
bunch 1002
hooey 1003
what 2004
is 2005
this 2006
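
The numbers above can be reproduced with a small model of the position arithmetic. This is only a sketch in plain Java with no Lucene dependency: the class and method names are made up for illustration, tokens are split on whitespace, and the gap is assumed to be added once before each value after the first, which is what the listing above implies.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: models how token positions
// grow when several values are added to the same field.
final class PositionGapDemo {

    // Returns token -> position. Assumes every token is distinct,
    // since a repeated token would overwrite its earlier entry.
    static Map<String, Integer> positions(List<String> values, int gap) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int next = 0;          // position the next token will receive
        boolean first = true;
        for (String value : values) {
            if (!first) {
                next += gap;   // gap is applied between successive values
            }
            for (String token : value.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;  // skip the empty token produced by a blank value
                }
                result.put(token, next);
                next++;        // each token advances the position by 1
            }
            first = false;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> values = List.of("some stuff", "bunch hooey", "what is this");
        // prints {some=0, stuff=1, bunch=1002, hooey=1003, what=2004, is=2005, this=2006}
        System.out.println(positions(values, 1000));
    }
}
```

Under the same model a gap of 0 does not restart the numbering: the positions simply continue, so "bunch" lands at 2 and "what" at 4.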

So, if you can add 1000, shouldn't setting 0 each time cause it to start at 0 each time? The default Analyzer.getPositionIncrementGap always returns 0.

That's a good point.  The field is used to index mail recipients and currently
has a "starts with" search (non-Lucene implementation).  Unless I can set the
position increment gap, it is only ever possible to search for the first
indexed recipient with proximity queries.



This is confusing me. You can easily use proximity queries with the above
scenario. For instance, searching for "bunch hooey"~4 would generate a hit.
As would "bunch hooey"~10000. But "some this"~10 would not generate a hit.
Whether that does what you need is another question <G>... So it's time to
ask "what are you really trying to do?" In other words, what behavior are
you trying to mimic from the old code? It's not clear to me what the
behavior you need is. It'd help if you gave a concrete example of the raw
data, and what you want returned...
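
A rough way to see why those three phrase queries behave as described is to count how many positions separate the two terms. The sketch below is a deliberate simplification in plain Java, not Lucene's actual sloppy-phrase matching: it only handles two terms that appear in query order, and the class and method names are hypothetical.

```java
// Hypothetical helper, not part of Lucene: a simplified in-order slop check
// for a two-term phrase whose terms are adjacent in the query.
final class SlopDemo {

    // firstPos / secondPos are the indexed positions of the two query terms.
    // Adjacent terms need 0 moves; every extra position between them costs 1.
    static boolean withinSlop(int firstPos, int secondPos, int slop) {
        int moves = secondPos - firstPos - 1;
        return moves >= 0 && moves <= slop;
    }

    public static void main(String[] args) {
        // "bunch hooey"~4 with bunch at 1002 and hooey at 1003
        System.out.println(withinSlop(1002, 1003, 4));
        // "some this"~10 with some at 0 and this at 2006
        System.out.println(withinSlop(0, 2006, 10));
    }
}
```

With the positions from the example, "bunch hooey" needs 0 moves and so fits any slop, while "some this" would need 2005 moves, far more than a slop of 10 allows.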

Your example is good enough, just assume they are people's names :) I know I had a mail from Mrs Bunch Ogilvy, so I want to do a "starts with", i.e. SpanFirst for bunch, so I find all the first name bunches.

In your first example, using the above scheme, you'd get hits (using
SpanNear rather than SpanFirst) if you searched on
"first bit" in a SpanNear query with a slop of 2. You'd also get a hit if
you searched on
"second part" in a SpanNear with a slop of 2. Does this mimic the behavior
you need?

No, SpanNear is fine, but SpanFirst will not work as there always has to be a starting offset. I can't search "bunch hooey" as SpanFirst unless I know that it was indexed as the second 'group' and therefore set the starting span position as 1002.
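
To make the problem concrete, here is a small plain-Java model of the SpanFirst condition. It is a sketch, not the Lucene implementation: it assumes a single-term span ends one past the term's position and that SpanFirst accepts a span only when that end is no greater than the limit passed to the query. The class and method names are invented for illustration.

```java
// Hypothetical helper, not part of Lucene: models the SpanFirst test
// for a single-term span.
final class SpanFirstDemo {

    // A term at position p is assumed to span [p, p + 1); SpanFirst with
    // limit `end` accepts it only if the span finishes at or before `end`.
    static boolean spanFirstMatches(int termPosition, int end) {
        return termPosition + 1 <= end;
    }

    public static void main(String[] args) {
        System.out.println(spanFirstMatches(0, 1));       // "some" at 0, limit 1
        System.out.println(spanFirstMatches(1002, 1));    // "bunch" at 1002, limit 1
        System.out.println(spanFirstMatches(1002, 1003)); // "bunch" at 1002, limit 1003
    }
}
```

In this model "bunch" at position 1002 is only accepted once the limit reaches 1003, and a limit that large also accepts "some" and "stuff" from the first group, so it no longer behaves like a "starts with" on the second recipient.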

Using Lucene has added a whole world of new search possibilities to the product, but when people have been using something a certain way for 15 years, it can be difficult to shift their expectations :) There's always someone who will shout...


Antony

