Re: [jira] Updated: (LUCENE-755) Payloads

Grant Ingersoll Mon, 12 Mar 2007 04:38:48 -0800

I haven't looked at your latest patch yet, so this is just guesswork, but I was thinking that in TermScorer, around line 75 or so, we could add:


score *= similarity.scorePayload(payloadBuffer);

The default Similarity would just return 1. This would allow people to incorporate a score based on what is in the payload, per their application needs, and would be completely backward-compatible. We may even want to postpone the decoding of the payload to inside the Similarity for performance reasons, but that should be tested, since that could be cause for confusion for people overriding Similarity. I will have to look at some of the other Scorers to see if there is a way to incorporate it into some of them.
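
To make the idea concrete, here is a minimal self-contained sketch. It does not use Lucene's real classes: SketchSimilarity, BoostingSimilarity, and the score() helper are hypothetical stand-ins, and the "first byte is a boost in tenths" payload layout is only an assumption for illustration. It shows a default scorePayload that returns 1 (so existing scores are unchanged) and an application-specific override.

```java
// Sketch only: stand-in classes, not Lucene's actual Similarity or TermScorer.
class PayloadScoringSketch {

    /** Stand-in for Similarity with the proposed scorePayload hook. */
    static class SketchSimilarity {
        /** Default: the payload has no effect on the score. */
        float scorePayload(byte[] payload) {
            return 1.0f;
        }
    }

    /** Hypothetical override: treats the first payload byte as a boost in tenths. */
    static class BoostingSimilarity extends SketchSimilarity {
        @Override
        float scorePayload(byte[] payload) {
            if (payload == null || payload.length == 0) {
                return 1.0f; // no payload: behave like the default
            }
            return (payload[0] & 0xFF) / 10.0f;
        }
    }

    /** Stand-in for the scoring step: multiply the raw score by the payload factor. */
    static float score(float rawScore, SketchSimilarity similarity, byte[] payloadBuffer) {
        float score = rawScore;
        score *= similarity.scorePayload(payloadBuffer);
        return score;
    }

    public static void main(String[] args) {
        byte[] payloadBuffer = {20}; // assumed layout: 20 tenths = boost of 2.0
        System.out.println(score(0.5f, new SketchSimilarity(), payloadBuffer));   // prints 0.5
        System.out.println(score(0.5f, new BoostingSimilarity(), payloadBuffer)); // prints 1.0
    }
}
```

Because the default returns 1, anyone who does not override the method sees exactly the same scores as before.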

None of this would prevent using payloads for other things as well, such as the XPath query example.

Doing this would involve switching over to using TermPositions like we talked about. Like I said, I will take a look at it and see if anything resonates.


-Grant

On Mar 11, 2007, at 11:26 PM, Michael Busch wrote:

Grant Ingersoll wrote:
Cool. I will try and take a look at it tomorrow. Since we have the lazy SegTermPos thing in now, we should be able to integrate this into scoring via the Similarity and merge TermDocs and TermPositions like you suggested.
If I can get the Scoring piece in and people are fine w/ the flushBuffer change then hopefully we can get this in this week. I will try to post a patch that includes your patch and the scoring integration by tomorrow or Tuesday if that is fine with you.
I'm not completely sure how you want to integrate this in the Similarity class. Payloads can be used for more than scoring. Consider for example XML search: the payloads can be used here to store in which element a term occurs. During search (e.g. an XPath query) the payloads would then be used to find hits, not for scoring.
On the other hand, if you want to store e.g. per-position boosts in the payloads, you could use the norm en/decoding methods that are already in Similarity. You could use the following code in a TokenStream:
 byte[] payload = new byte[1];
 payload[0] = Similarity.encodeNorm(boost);
 token.setPayload(payload);

and in a scorer you could get the boost then with:
 termPositions.getPayload(payloadBuffer);
 float boost = Similarity.decodeNorm(payloadBuffer[0]);
But maybe you have something different in mind? Could you elaborate, please?
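
The per-position boost round trip described above can be sketched without Lucene as follows. encodeBoost and decodeBoost are hypothetical stand-ins for Similarity.encodeNorm and Similarity.decodeNorm; they use a simple fixed-point coding (steps of 1/16), which is an assumption for illustration and not Lucene's actual norm encoding. The point is only that the boost is squeezed into one payload byte at index time and read back at search time, with some loss of precision.

```java
// Sketch only: encodeBoost/decodeBoost are illustrative stand-ins, not Lucene's norm codec.
class PositionBoostSketch {
    // Assumed resolution: one byte step represents 1/16 of boost.
    static final float STEP = 1.0f / 16.0f;

    /** Packs a boost into one byte, clamping to the representable range [0, 255 steps]. */
    static byte encodeBoost(float boost) {
        int steps = Math.round(boost / STEP);
        if (steps < 0) steps = 0;
        if (steps > 255) steps = 255;
        return (byte) steps;
    }

    /** Recovers the (quantized) boost from the payload byte. */
    static float decodeBoost(byte b) {
        return (b & 0xFF) * STEP;
    }

    public static void main(String[] args) {
        // Index side: what a TokenStream would attach to the token.
        byte[] payload = new byte[1];
        payload[0] = encodeBoost(2.5f);

        // Search side: what a scorer would read back from the positions.
        byte[] payloadBuffer = payload.clone();
        float boost = decodeBoost(payloadBuffer[0]);
        System.out.println(boost); // prints 2.5
    }
}
```

Values that are not a multiple of the step size, or that fall outside the one-byte range, come back rounded or clamped, which is the usual trade-off of a one-byte encoding.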
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ
