Hi,

I was looking at SpanNotQuery to see if I could make do without the position increment gaps. A search requirement that's causing me some trouble to implement is when two terms are supposed to be on the same L_2, yet on different L_3's (L_3's are hierarchically below L_2).

With the position increments in place, I can do this:

<SpanNot fieldName="MyField">
<Include>
<SpanNear slop="100000" inOrder="false">
<SpanTerm>t293</SpanTerm>
<SpanTerm>t4979</SpanTerm>
</SpanNear>
</Include>
<Exclude>
<SpanNear slop="1000" inOrder="false">
<SpanTerm>t293</SpanTerm>
<SpanTerm>t4979</SpanTerm>
</SpanNear>
</Exclude>
</SpanNot>

This query returns the expected documents.

I didn't manage to come up with a working solution for the approach without posIncGaps. The following, I thought, should work, but for some reason it doesn't:

<SpanNot fieldName="MyField">
<Include>
<!-- Gets all the matching spans within L_2 boundaries and includes them -->
<SpanNot>
<Include>
<SpanNear slop="2147483647" inOrder="false" >
<SpanTerm>t293</SpanTerm>
<SpanTerm>t4979</SpanTerm>
</SpanNear>
</Include>
<Exclude>
<SpanTerm>L_2</SpanTerm>
</Exclude>
</SpanNot>
</Include>
<Exclude>
<!-- Gets all the matching spans from L_3 boundaries and excludes them -->
<SpanNot>
<Include>
<SpanNear slop="2147483647" inOrder="false" >
<SpanTerm>t293</SpanTerm>
<SpanTerm>t4979</SpanTerm>
</SpanNear>
</Include>
<Exclude>
<SpanTerm>L_3</SpanTerm>
</Exclude>
</SpanNot>
</Exclude>
</SpanNot>

Shouldn't this query only leave documents, where t293 and t4979 are in the same L_2, but not within the same L_3? I fiddled about with different queries to no avail and I feel the above is the most straightforward try. But the query doesn't match any document at all.

Any ideas on how to improve the second query would be greatly appreciated.

Thanks
Rene

Hi Rene,

Have you seen SpanNotQuery?:

<http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/search/spans/SpanNotQuery.html>

For a document that looks like:

<Level_1 id="1">
   <Level_2 id="1">
     <Level_3 id="1">T1 T2 T3</Level_3>
     <Level_3 id="2">T4 T5 T6</Level_3>
     <Level_3 id="3">T7 T8 T9</Level_3>
   </Level_2>
   <Level_2 id="2">
     <Level_3 id="1">T10 T11 T12</Level_3>
     <Level_3 id="2">T13 T14 T15</Level_3>
     <Level_3 id="3">T16 T17 T18</Level_3>
   </Level_2>
   ...
</Level1>
...

You could generate the following token stream (L_X being a concrete level 
boundary token):

L_1 L_2 L_3 T1  T2  T3  L_3 T4  T5  T6  L_3 T7  T8  T9
     L_2 L_3 T10 T11 T12 L_3 T13 T14 T15 L_3 T16 T17 T18
     L_2 ...
...

A query to find T2 and T8 on the same Level_2 would require you to find a span 
containing T2 and T8, but not containing L_2.

This scheme will generalize to as many levels as you need, and you can use 
nested span queries to simultaneously provide constraints at multiple levels.  
No position increment gap required.

Caveat: this scheme is not tested - I could be way off base :).

Steve


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


Reply via email to