Great, I hope it will help others besides me :)
Here's another one for you. As far as I understand, the skipTo method
contract is to skip to the next document whose id is *greater than or equal* to
the target. However, it seems that SpansScorer behaves differently:
from the following code, I assume that if the current doc *is* actually the
target, then you should avoid a call to setFreqCurrentDoc(), because it moves
ahead. Am I right?
  public boolean skipTo(int target) throws IOException {
    if (firstTime) {
      more = spans.skipTo(target);
      firstTime = false;
    }
    if (! more) {
      return false;
    }
    if (spans.doc() < target) { // setFreqCurrentDoc() leaves spans.doc() ahead
      more = spans.skipTo(target);
    }
    return setFreqCurrentDoc();
  }

  protected boolean setFreqCurrentDoc() throws IOException {
    if (! more) {
      return false;
    }
    doc = spans.doc();
    freq = 0.0f;
    while (more && doc == spans.doc()) {
      int matchLength = spans.end() - spans.start();
      freq += getSimilarity().sloppyFreq(matchLength);
      more = spans.next();
    }
    return more || (freq != 0);
  }
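
To make the question concrete, here is a minimal, self-contained sketch that
mimics the control flow above. FakeSpans, SketchScorer and
docAfterNextThenSkipTo are hypothetical names made up for this illustration,
not Lucene classes, and the frequency is simply a span count instead of
sloppyFreq(). It only shows where the scorer ends up when skipTo() is called
with the doc it is already on.

```java
public class SkipToSketch {

  /** Hypothetical stand-in for Spans: one array entry per span, holding its doc id. */
  static final class FakeSpans {
    private final int[] docs; // ascending doc ids, repeated when a doc has several spans
    private int pos = -1;

    FakeSpans(int... docs) { this.docs = docs; }

    boolean next() { pos++; return pos < docs.length; }

    /** Always advances at least once, then stops at the first span with doc >= target. */
    boolean skipTo(int target) {
      do {
        if (!next()) return false;
      } while (docs[pos] < target);
      return true;
    }

    int doc() { return docs[pos]; }
  }

  /** Same control flow as the quoted skipTo()/setFreqCurrentDoc(), with freq as a plain count. */
  static final class SketchScorer {
    private final FakeSpans spans;
    private boolean firstTime = true;
    private boolean more = true;
    private int doc = -1;
    private int freq = 0;

    SketchScorer(FakeSpans spans) { this.spans = spans; }

    boolean next() {
      if (firstTime) {
        more = spans.next();
        firstTime = false;
      }
      return setFreqCurrentDoc();
    }

    boolean skipTo(int target) {
      if (firstTime) {
        more = spans.skipTo(target);
        firstTime = false;
      }
      if (!more) return false;
      if (spans.doc() < target) { // spans is already ahead of the scorer's doc
        more = spans.skipTo(target);
      }
      return setFreqCurrentDoc();
    }

    private boolean setFreqCurrentDoc() {
      if (!more) return false;
      doc = spans.doc();
      freq = 0;
      while (more && doc == spans.doc()) { // consumes every span of this doc
        freq++;
        more = spans.next();
      }
      return more || freq != 0;
    }

    int doc() { return doc; }
  }

  /** Calls next() once, then skipTo(target); returns the resulting doc, or -1 if exhausted. */
  static int docAfterNextThenSkipTo(int target, int... spanDocs) {
    SketchScorer scorer = new SketchScorer(new FakeSpans(spanDocs));
    if (!scorer.next()) return -1;
    return scorer.skipTo(target) ? scorer.doc() : -1;
  }

  public static void main(String[] args) {
    // Spans in docs 1, 1, 3, 5: next() lands on doc 1, then skipTo(1) is called.
    System.out.println(docAfterNextThenSkipTo(1, 1, 1, 3, 5)); // prints 3
  }
}
```

With spans in docs 1, 1, 3 and 5, next() puts the scorer on doc 1 and
skipTo(1) then moves it to doc 3. If the contract is read as "first match
beyond the current one whose id is >= target", that is the expected result;
if it is read as plain ">= target", it is not.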
Paul Elschot wrote:
>
> On Monday 24 September 2007 11:23, melix wrote:
>>
>> Oops, forget it. It *does* seem I hadn't understood exactly what the
>> while (true) loop did. Therefore I wasn't adding the very first submatch.
>> Makes more sense now!
>
> I read through the NearSpansOrdered code once more, and I'm glad
> I added some comments to allow me to understand the code again.
> I'd prefer a version in which fewer comments are necessary,
> but I don't know a simpler way.
>
> Regards,
> Paul Elschot
>
>
>>
>>
>> melix wrote:
>> >
>> > I think I'll focus on that later, since it requires me to copy a whole
>> > bunch of sources from the core. But I have another tough question: I am
>> > working with the NearSpansOrdered class in order to add my match support.
>> > But I have a serious problem I don't understand; maybe you could help me.
>> >
>> > Say I have a query "a NEAR b". In this case, it does happen that
>> > shrinkToAfterShortestMatch() returns true although "a" is in one document
>> > and "b" in another one. Therefore, the next() method returns true too, and
>> > it breaks my algorithms. Is there anything I'm missing? Is this a bug?
>> >
>> > Thanks,
>> >
>> >
>> > Paul Elschot wrote:
>> >>
>> >> Cedric,
>> >>
>> >> The algorithms of the four scorers used by BooleanScorer2 are
>> >> fairly straightforward by themselves, a short look at the code
>> >> should suffice to get the idea.
>> >> The exception to that is BooleanScorer, but since this is
>> >> only used as an option, it's not really necessary to explain it.
>> >> The one advantage of BooleanScorer is that it is very fast for
>> >> disjunctions.
>> >>
>> >> Regards,
>> >> Paul Elschot
>> >>
>> >>
>> >>
>> >> On Sunday 23 September 2007 13:11, melix wrote:
>> >>>
>> >>> Hi Paul,
>> >>>
>> >>> Is there any document which explains how those scorers interact? My main
>> >>> problem is finding out how to create a match instance for each call to
>> >>> next(), and in boolean queries, it is rather difficult to figure out how
>> >>> to do that. An explanation of the algorithms would surely help.
>> >>>
>> >>> Thx.
>> >>>
>> >>>
>> >>> Paul Elschot wrote:
>> >>> >
>> >>> > Cedric,
>> >>> >
>> >>> > On Saturday 22 September 2007 11:45, melix wrote:
>> >>> >> The problem was even harder when I had to add the match() method to
>> >>> >> the BooleanQuery: this class is so complex, and uses so many protected
>> >>> >> or inner classes (for optimization purposes, I gather) that I would
>> >>> >> have to copy a lot of the original source code just to add my method.
>> >>> >> As documentation on how it works is really hard to find, I decided it
>> >>> >> would be simpler if I wrote my own boolean queries (which is what I've
>> >>> >> done now). I know it must perform much worse, but it makes the task
>> >>> >> much easier.
>> >>> >
>> >>> > As long as your scorers are (a combination of) the normal target
>> >>> > classes of BooleanScorer2, you should get the same efficiency.
>> >>> > These target classes are ConjunctionScorer, DisjunctionSumScorer,
>> >>> > ReqOptSumScorer and ReqExclScorer. These scorers can be used for the
>> >>> > "boolean" operators AND, OR, ANDoptional, and ANDNOT.
>> >>> > For some cases of top level OR, BooleanScorer can also be a target
>> >>> > scorer when scoring out of document order is allowed.
>> >>> > Most of the complexity of BooleanScorer2 comes from mapping
>> >>> > the + and - query operators for required and prohibited subqueries
>> >>> > to these target scorers.
>> >>> >
>> >>> > Regards,
>> >>> > Paul Elschot
>> >>> >
>> >>> >
>> >>> > ---------------------------------------------------------------------
>> >>> > To unsubscribe, e-mail: [EMAIL PROTECTED]
>> >>> > For additional commands, e-mail: [EMAIL PROTECTED]
>> >>> >
>> >>> >
>> >>> >
>> >>>
>> >>> --
>> >>> View this message in context:
>> >>> http://www.nabble.com/Span-queries%2C-API-and-difficulties-tf4500460.html#a12845374
>> >>> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Span-queries%2C-API-and-difficulties-tf4500460.html#a12856546
>> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>
--
View this message in context:
http://www.nabble.com/Span-queries%2C-API-and-difficulties-tf4500460.html#a12878823
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.