I really think you need to stop obsessing over SpanFirst <G>. I suspect that this is leading you down an unrewarding path.
So I don't see why using a SpanNear that respects order and a large IncrementGap won't solve your problem... although it would return "odd" matches. Let's say you indexed "first second third" as one name and then searched on a SpanNear of second and third with a slop of 100. You'd get a match on a middle and last name rather than a first and last name... But I wonder if this can be tolerated given all the new capabilities you'll doubtless be adding <G>.

Best
Erick

On 2/21/07, Antony Bowesman <[EMAIL PROTECTED]> wrote:
Hi Erick,

> What this does is allow you to put gaps between successive sets of terms
> indexed in the same field. For instance...
> doc.add("field", "some stuff");
> doc.add("field", "bunch hooey");
> doc.add("field", "what is this");
> writer.add(doc);
>
> In this case, there would be the following positions, assuming that the
> IncrementGap was 1000....
> some 0
> stuff 1
> bunch 1002
> hooey 1003
> what 2004
> is 2005
> this 2006

So, if you can add 1000, shouldn't setting 0 each time cause it to start at 0 each time? The default Analyzer.getPositionIncrementGap always returns 0.

>> That's a good point. The field is used to index mail recipients and
>> currently has a "starts with" search (non Lucene implementation). Unless
>> I can set the position increment gap, it is only ever possible to search
>> for the first indexed recipient with proximity queries.
>
> This is confusing me. You can easily use proximity queries with the above
> scenario. For instance, searching for "bunch hooey"~4 would generate a hit.
> As would "bunch hooey"~10000. But "some this"~10 would not generate a hit.
> Whether that does what you need is another question <G>... So it's time to
> ask "what are you really trying to do?" In other words, what behavior are
> you trying to mimic from the old code? It's not clear to me what the
> behavior you need is. It'd help if you gave a concrete example of the raw
> data, and what you want returned...

Your example is good enough, just assume they are people's names :) I know I had a mail from Mrs Bunch Ogilvy, so I want to do a "starts with", i.e. SpanFirst for bunch, so I find all the first name bunches.

> In your first example, using the above scheme, you'd get hits (using
> SpanNear rather than SpanFirst) if you searched on
> "first bit" in a SpanNear query with a slop of 2. You'd also get a hit if
> you searched on
> "second part" in a SpanNear with a slop of 2. Does this mimic the behavior
> you need?
No, SpanNear is fine, but SpanFirst will not work as there always has to be a starting offset. I can't search "bunch hooey" as SpanFirst unless I know that it was indexed as the second 'group' and therefore set the starting span position as 1002.

Using Lucene has added a whole world of new search possibilities to the product, but when people have been using something a certain way for 15 years, it can be difficult to shift their expectations :) There's always someone who will shout...

Antony
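For anyone following the position arithmetic in this thread, here is a rough sketch in plain Java. It does not call Lucene; it only imitates how positions appear to be assigned when several values are added under one field name. The class and method names (PositionGapSketch, positions, orderedNear, spanFirst) are made up for illustration, and the sketch assumes whitespace tokenization, a position increment of 1 per token, and terms that occur only once. The orderedNear and spanFirst checks are simplified models of an in-order SpanNear and of SpanFirst, not the real query classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for Lucene's position bookkeeping; not Lucene code.
class PositionGapSketch {

    // Assign a position to each term of each value added under one field name.
    // The gap is added before every value except the first.
    static Map<String, Integer> positions(int gap, String... values) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int next = 0;          // position the next token will receive
        boolean first = true;
        for (String value : values) {
            if (!first) {
                next += gap;
            }
            first = false;
            for (String term : value.trim().split("\\s+")) {
                if (term.isEmpty()) {
                    continue;  // skip the empty token produced by a blank value
                }
                result.put(term, next++);
            }
        }
        return result;
    }

    // Simplified in-order SpanNear of two terms: b must follow a with at most
    // `slop` positions between them.
    static boolean orderedNear(Map<String, Integer> pos, String a, String b, int slop) {
        Integer pa = pos.get(a);
        Integer pb = pos.get(b);
        return pa != null && pb != null && pb > pa && pb - pa - 1 <= slop;
    }

    // Simplified SpanFirst: a single-term span ends at position + 1, and that
    // end must not exceed `end`.
    static boolean spanFirst(Map<String, Integer> pos, String term, int end) {
        Integer p = pos.get(term);
        return p != null && p + 1 <= end;
    }

    public static void main(String[] args) {
        Map<String, Integer> gapped =
                positions(1000, "some stuff", "bunch hooey", "what is this");
        System.out.println(gapped);
        System.out.println(orderedNear(gapped, "bunch", "hooey", 4));  // same value
        System.out.println(orderedNear(gapped, "some", "this", 10));   // across gaps
        System.out.println(spanFirst(gapped, "some", 1));
        System.out.println(spanFirst(gapped, "bunch", 1));

        // With a gap of 0 the positions simply continue; they do not restart.
        System.out.println(positions(0, "some stuff", "bunch hooey"));
    }
}
```

Under these assumptions a gap of 0 puts "bunch" at position 2 rather than back at 0, which is why SpanFirst with a small end value only ever sees the first recipient, while an ordered SpanNear with a slop smaller than the gap stays within one recipient.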