I haven't looked at your latest patch yet, so this is just guesswork,
but I was thinking that in TermScorer, around line 75 or so, we could add:
score *= similarity.scorePayload(payloadBuffer);
The default Similarity would just return 1. This would allow people
to incorporate a score based on what is in the payload, per their
application needs and would be completely backward-compatible. We
may even want to defer decoding of the payload until it is inside the
Similarity for performance reasons, but that should be tested, since
it could confuse people who override Similarity.
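
A rough sketch of the hook described above, written without Lucene on the classpath. SketchSimilarity, BoostingSimilarity and SketchTermScorer are hypothetical stand-ins, not the real classes, and the "boost in tenths" payload layout is only an assumed application convention:

```java
// Hypothetical stand-in for Similarity; the default leaves the score unchanged.
class SketchSimilarity {
    float scorePayload(byte[] payloadBuffer) {
        return 1.0f;
    }
}

// An application-specific override. Assumption: the first payload byte
// holds a boost expressed in tenths (20 means 2.0).
class BoostingSimilarity extends SketchSimilarity {
    @Override
    float scorePayload(byte[] payloadBuffer) {
        if (payloadBuffer == null || payloadBuffer.length == 0) {
            return 1.0f; // no payload: fall back to the neutral factor
        }
        return (payloadBuffer[0] & 0xFF) / 10.0f;
    }
}

// Hypothetical stand-in for the scoring step in TermScorer.
class SketchTermScorer {
    static float score(float rawScore, SketchSimilarity similarity, byte[] payloadBuffer) {
        float score = rawScore;
        score *= similarity.scorePayload(payloadBuffer); // the proposed extra line
        return score;
    }
}
```

With the default class the payload is ignored, so existing scores stay the same; only an override changes them.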
I will have to look at some of the other Scorers to see if there is a
way to incorporate it into some of them.
None of this would prevent using payloads for other things as well,
such as the XPath query example.
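
For the non-scoring case, here is a minimal sketch of using payloads to select hits, along the lines of the XML example quoted below. ElementFilterSketch, the element ids and the one-byte layout are all hypothetical; in a real scorer the positions and payloads would come from TermPositions:

```java
import java.util.ArrayList;
import java.util.List;

class ElementFilterSketch {
    // Hypothetical element ids that an indexer might write into the payload.
    static final byte TITLE = 1;
    static final byte BODY = 2;

    /**
     * Returns the positions whose payload marks them as occurring in the
     * wanted element. payloads[i] is the payload stored for positions[i].
     */
    static List<Integer> positionsInElement(int[] positions, byte[][] payloads, byte wanted) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < positions.length; i++) {
            byte[] payload = payloads[i];
            // Positions without a payload never match.
            if (payload != null && payload.length > 0 && payload[0] == wanted) {
                hits.add(positions[i]);
            }
        }
        return hits;
    }
}
```

The payload only decides whether a position counts as a hit here; nothing is multiplied into the score.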
Doing this would involve switching over to using TermPositions like
we talked about. Like I said, I will take a look at it and see if
anything resonates.
-Grant
On Mar 11, 2007, at 11:26 PM, Michael Busch wrote:
Grant Ingersoll wrote:
Cool. I will try and take a look at it tomorrow. Since we have
the lazy SegTermPos thing in now, we should be able to integrate
this into scoring via the Similarity and merge TermDocs and
TermPositions like you suggested.
If I can get the Scoring piece in and people are fine w/ the
flushBuffer change then hopefully we can get this in this week. I
will try to post a patch that includes your patch and the scoring
integration by tomorrow or Tuesday if that is fine with you.
I'm not completely sure how you want to integrate this in the
Similarity class. Payloads can be used for more than just scoring.
Consider for example XML search: the payloads can be used here to
store in which element a term occurs. During search (e.g. an XPath
query) the payloads would then be used to find hits, not for scoring.
On the other hand, if you want to store e.g. per-position boosts in
the payloads, you could use the norm en/decoding methods that are
already in Similarity. You could use the following code in a
TokenStream:
byte[] payload = new byte[1];
payload[0] = Similarity.encodeNorm(boost);
token.setPayload(payload);
and in a scorer you could then get the boost with:
termPositions.getPayload(payloadBuffer);
float boost = Similarity.decodeNorm(payloadBuffer[0]);
But maybe you have something different in mind? Could you
elaborate, please?
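
To show the round trip end to end without Lucene on the classpath, here is a small self-contained sketch. BoostPayloadSketch and its encodeBoost/decodeBoost methods are hypothetical stand-ins for Similarity.encodeNorm and Similarity.decodeNorm; the bit layout is an assumption about how such a one-byte float could work, so check the real methods before relying on exact values:

```java
class BoostPayloadSketch {
    // Stand-in for Similarity.encodeNorm: squeezes a float into one byte by
    // keeping only a few exponent and mantissa bits (assumed layout).
    static byte encodeBoost(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;           // drop the low mantissa bits
        int zero = (63 - 15) << 3;        // offset of the smallest encodable exponent
        if (small <= zero) {
            // zero and negatives map to 0, tiny positives to the smallest value
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= zero + 0x100) {
            return -1;                    // overflow: clamp to the largest value
        }
        return (byte) (small - zero);
    }

    // Stand-in for Similarity.decodeNorm: inverse of encodeBoost.
    static float decodeBoost(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xFF) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // TokenStream side: one byte of payload per position.
    static byte[] toPayload(float boost) {
        return new byte[] { encodeBoost(boost) };
    }

    // Scorer side: read the boost back from the payload buffer.
    static float fromPayload(byte[] payloadBuffer) {
        return decodeBoost(payloadBuffer[0]);
    }
}
```

The encoding is lossy, so a boost generally comes back rounded down to the nearest representable value rather than exactly as stored.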
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org
Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ