Re: Query time document group boosting

Toke Eskildsen Wed, 03 Dec 2008 02:23:15 -0800

On Tue, 2008-12-02 at 23:42 +0100, Chris Hostetter wrote:
> : A cosmetic remark, I would personally choose a single field for the boosts 
> and
> : then one token per source. (groupboost:A^10 groupboost:B^1 
> groupboost:C^0.1).
> 
> that's a key improvement, as it helps keep the number of unique fields 
> down, even if the number of sources grows without bounds.  make sure you 
> omitNorms on your groupboost field, and when buiding your various boolean 
> queries, consider disabling the coord (check the docs to understand why 
> that might make sense)


Ah yes, with norms, the score would be heavily influenced by the size of the 
group.
As for the number of groups, the downside to this method is clearly that all 
documents must have group status defined for each distinct set of groups.

...

Maybe this could be solved by a shared "dummy"-group that is defined for all 
documents? If we have two collections of groups: 
 * Source with the groups A-Z
 * Availability with the groups Available and Unavailable

Document 0
 * groupfield:dummy
 * groupfield:A
 * groupfield:Available
Document 1
 * groupfield:dummy
 * groupfield:Unavailable
Document 2
 * groupfield:dummy
 * groupfield:B
Document 3
 * groupfield:dummy

If we want to boost documents from Source A, we make the query
foo AND (groupfield:dummy OR groupfield:A^10)

Similary, if we want to demote documents with Availability Unavailable,
the query would be
foo AND (groupfield:dummy OR groupfield:Unavailable^0.1)

...

It would probably be cleaner to use a MatchAllDocsQuery instead of the dummy,
but I'm a bit unsure about the combined scoring. I'll have to experiment
further with this.



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Query time document group boosting

Reply via email to