Re: Positions in SpanFirst

Erick Erickson Wed, 21 Feb 2007 16:41:19 -0800

I really think you need to stop obsessing on SpanFirst <G>. I suspect that
this is leading you down an unrewarding path.


So I don't see why using a SpanNear that respects order and a large
IncrementGap won't solve your problem... although it would return "odd"
matches. Let's say you indexed "first second third" as one name and then
searched on a SpanNear of second and third with a slop of 100. You'd get a
match on a middle and last name rather than a first and last name... But I
wonder if this can be tolerated given all the new capabilities you'll
doubtless be adding <G>.
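
To make the "odd" match concrete, here is a rough sketch in plain Java. It is
not Lucene code. It assumes an ordered near match on two single terms succeeds
when the second term comes after the first with at most `slop` positions
between them. The class name, the method name and the second name
"fourth fifth" are made up for illustration.

```java
class OrderedNearModel {
    /**
     * Simplified model of an in-order proximity check on two single terms:
     * the second term must follow the first, with at most `slop`
     * unmatched positions between them.
     */
    public static boolean orderedNear(int firstPos, int secondPos, int slop) {
        return secondPos > firstPos && (secondPos - firstPos - 1) <= slop;
    }

    public static void main(String[] args) {
        // "first second third" indexed as one name: positions 0, 1, 2.
        int second = 1, third = 2;
        // Hypothetical next name "fourth fifth" after a gap of 1000:
        // "fourth" lands at 2 + 1 + 1000 = 1003.
        int fourth = 1003;

        // Middle and last name of the same person: matches (the "odd" hit).
        System.out.println(orderedNear(second, third, 100));
        // Last name of one person, first name of the next: the gap blocks it.
        System.out.println(orderedNear(third, fourth, 100));
    }
}
```

The gap only needs to be larger than any slop you intend to use, so that a
match can never straddle two names.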

Best
Erick

On 2/21/07, Antony Bowesman <[EMAIL PROTECTED]> wrote:


Hi Erick,

> What this does is allow you to put gaps between successive sets of terms
> indexed in the same field. For instance...
> doc.add("field", "some stuff");
> doc.add("field", "bunch hooey");
> doc.add("field", "what is this");
> writer.add(doc);
>
> In this case, there would be the following positions, assuming that the
> IncrementGap was 1000....
> some 0
> stuff 1
> bunch 1002
> hooey 1003
> what 2004
> is 2005
> this 2006

So, if you can add 1000, shouldn't setting 0 each time cause it to start at 0
each time?  The default Analyzer.getPositionIncrementGap always returns 0.
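
A small model of the position arithmetic may help here. This is plain Java,
not Lucene code, and it assumes what the quoted example implies: each token
advances the position by one, and the gap is simply added to the running
position between successive values of the same field. The class and method
names are invented for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

class PositionGapDemo {
    /**
     * Simplified model: returns the position of every token across all
     * values added under one field name. `gap` is added once between
     * consecutive values; it never resets the counter.
     */
    public static List<Integer> positions(String[] values, int gap) {
        List<Integer> result = new ArrayList<>();
        int next = 0;
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                next += gap; // the gap is an offset, not a restart
            }
            for (String token : values[i].trim().split("\\s+")) {
                result.add(next++);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] values = {"some stuff", "bunch hooey", "what is this"};
        System.out.println(positions(values, 1000)); // [0, 1, 1002, 1003, 2004, 2005, 2006]
        System.out.println(positions(values, 0));    // [0, 1, 2, 3, 4, 5, 6]
    }
}
```

Under this model a gap of 0 does not restart each value at position 0; the
positions just run on as if the values had been one long string.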

>> That's a good point.  The field is used to index mail recipients and
>> currently has a "starts with" search (non Lucene implementation).  Unless
>> I can set the position increment gap, it is only ever possible to search
>> for the first indexed recipient with proximity queries.
>
>
> This is confusing me. You can easily use proximity queries with the above
> scenario. For instance, searching for "bunch hooey"~4 would generate a hit.
> As would "bunch hooey"~10000. But "some this"~10 would not generate a hit.
> Whether that does what you need is another question <G>... So it's time to
> ask "what are you really trying to do?" In other words, what behavior are
> you trying to mimic from the old code? It's not clear to me what the
> behavior you need is. It'd help if you gave a concrete example of the raw
> data, and what you want returned...

Your example is good enough, just assume they are people's names :)  I know I
had a mail from Mrs Bunch Ogilvy, so I want to do a "starts with", i.e.
SpanFirst for bunch, so I find all the first name bunches.

> In your first example, using the above scheme, you'd get hits (using
> SpanNear rather than SpanFirst) if you searched on
> "first bit" in a SpanNear query with a slop of 2. You'd also get a hit if
> you searched on
> "second part" in a SpanNear with a slop of 2. Does this mimic the behavior
> you need?

No, SpanNear is fine, but SpanFirst will not work as there always has to be a
starting offset.  I can't search "bunch hooey" as SpanFirst unless I know that
it was indexed as the second 'group' and therefore set the starting span
position as 1002.
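
The problem can be sketched with a toy model in plain Java (not Lucene code).
It assumes, as I understand SpanFirstQuery, that the query takes only an upper
bound `end` and accepts a span when the span ends at or before that position;
a single term at position p is taken to cover [p, p + 1). The class and method
names are hypothetical.

```java
class SpanFirstModel {
    /**
     * Simplified "starts with" check: a single-term span at position p
     * ends at p + 1, and it matches when that end is <= the given bound.
     */
    public static boolean matchesFirst(int termPosition, int end) {
        return termPosition + 1 <= end;
    }

    public static void main(String[] args) {
        // Positions from the quoted example with a gap of 1000.
        int some = 0, stuff = 1, bunch = 1002;

        System.out.println(matchesFirst(some, 1));     // first token of first value: matches
        System.out.println(matchesFirst(bunch, 1));    // first token of second value: no match
        System.out.println(matchesFirst(bunch, 1003)); // matches only with a raised bound...
        System.out.println(matchesFirst(stuff, 1003)); // ...which also lets "stuff" through
    }
}
```

So with only an end bound to play with, raising it far enough to reach the
second group stops the query from meaning "starts with" at all.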

Using Lucene has added a whole world of new search possibilities to the
product, but when people have been using something a certain way for 15 years,
it can be difficult to shift their expectations :)  There's always someone who
will shout...

Antony

