Umesh, this is a good summary.

So, the question is what is the cost (performance and memory) of having the
CollapsingQParserPlugin choose the group head by using the Solr sort
criteria?

Keep in mind that the CollapsingQParserPlugin's main design goal is to
provide fast performance when collapsing on a high cardinality field. How
you choose the group head can have a big impact here, both on memory
consumption and on performance.

The function query collapse criteria were added to allow you to come up with
custom formulas for selecting the group head, with little or no impact on
performance and memory. Using Solr's recip() function query, it seems like
you could come up with some nice scenarios where two variables could be
used to select the group head. For example:

fq={!collapse field=a max='sub(prod(cscore(),1000), recip(field(x),1,1000,1000))'}

This seems like it would basically give you two sort criteria: cscore(),
which returns the score, would be the primary criterion. The recip of field
"x" would be the secondary criterion.

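To make the arithmetic concrete, here is a rough Python sketch of what that
formula computes. It relies on recip(x,m,a,b) being a/(m*x+b); the documents,
scores and x values are made up for illustration, and it assumes x is
non-negative. It is not how the plugin evaluates function queries.

```python
def recip(x, m, a, b):
    """Solr's recip(x, m, a, b) function query: a / (m*x + b)."""
    return a / (m * x + b)


def collapse_value(score, x):
    # Mirrors sub(prod(cscore(),1000), recip(field(x),1,1000,1000)).
    return score * 1000 - recip(x, 1, 1000, 1000)


# Hypothetical documents in one group: (id, score, x).
docs = [("a", 2.0, 5), ("b", 2.0, 500), ("c", 1.5, 100000)]

# max=... keeps the document with the largest function value as group head.
head = max(docs, key=lambda d: collapse_value(d[1], d[2]))
print(head[0])

# The recip term stays within (0, 1] for x >= 0, so it only acts as a pure
# tie-breaker when scores differ by more than 0.001. With a smaller gap the
# secondary term can outweigh the score:
print(collapse_value(2.0, 1_000_000) > collapse_value(2.0005, 0))
```

With equal scores the document with the larger x wins, because its recip
term is smaller. The last line shows the limit of the trick: if two scores
are closer together than the range of the secondary term, the secondary
term decides, so the multiplier has to be chosen with the score range in mind.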

Joel Bernstein
Search Engineer at Heliosearch


On Thu, Jun 19, 2014 at 2:18 AM, Umesh Prasad <umesh.i...@gmail.com> wrote:

> Continuing the discussion on mailing list from Jira.
>
> An Example
>
>
> *id      group      f1      f2*
>  1       g1         5       10
>  2       g1         5       1000
>  3       g1         5       1000
>  4       g1         10      100
>  5       g2         5       10
>  6       g2         5       1000
>  7       g2         5       1000
>  8       g2         10      100
>
> sort= f1 asc, f2 desc , id desc
>
>
> *Without collapse will give : *
> (7,g2), (6,g2),  (3,g1), (2,g1), (5,g2), (1,g1), (8,g2), (4,g1)
>
>
> *On collapsing by group_s  expected output is : *  (7,g2), (3,g1)
>
> solr standard collapsing does give this output  with
> group=on,group.field=group_s,group.main=true
>
> * Collapsing with CollapsingQParserPlugin* fq={!collapse field=group_s} :
>   (5,g2), (1,g1)
>
>
>
> * Summarizing Jira Discussion :*
> 1. CollapsingQParserPlugin picks the group heads from the matching results
> and passes only those on. In essence, it filters out some of the matching
> documents, so that subsequent collectors never see them. It can also pass
> the score on to subsequent collectors using a dummy scorer.
>
> 2. TopDocCollector comes later in hierarchy and it will sort on the
> collapsed set. That works fine.
>
> The issue is with step 1. Collapsing is done by a single comparator which
> can take its value from a field or function. It defaults to score.
> Function queries do allow us to combine multiple fields / value sources,
> however it would be difficult to construct a function for given sort
> fields. Primarily because
>     a) The range of values for a given sort field is not known in advance.
> It is possible for one sort field to be unbounded while another is bounded
> within a small range.
>     b) The sort field can itself hold custom logic.
>
> Because of (a) the group head selected by CollapsingQParserPlugin will be
> incorrect and subsequent sorting will break.
>
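The example above can be replayed outside Solr with a small Python sketch.
It uses the rows from the table and the sort f1 asc, f2 desc, id desc. The
"first seen" collapse is only a stand-in, assumed here, for what happens when
the group head is picked on score and all scores tie; it is consistent with
the reported output but it is not the plugin's code.

```python
# Rows from the example table: (id, group, f1, f2).
docs = [
    (1, "g1", 5, 10), (2, "g1", 5, 1000), (3, "g1", 5, 1000), (4, "g1", 10, 100),
    (5, "g2", 5, 10), (6, "g2", 5, 1000), (7, "g2", 5, 1000), (8, "g2", 10, 100),
]


def sort_key(doc):
    doc_id, _, f1, f2 = doc
    # sort = f1 asc, f2 desc, id desc
    return (f1, -f2, -doc_id)


def collapse_by_sort(docs):
    """Pick each group's head using the full sort criteria."""
    heads = {}
    for doc in docs:
        group = doc[1]
        if group not in heads or sort_key(doc) < sort_key(heads[group]):
            heads[group] = doc
    return sorted(heads.values(), key=sort_key)


def collapse_first_seen(docs):
    """Assumed stand-in: keep the first document seen in each group."""
    heads = {}
    for doc in docs:
        heads.setdefault(doc[1], doc)
    return sorted(heads.values(), key=sort_key)


def ids(result):
    return [(d[0], d[1]) for d in result]


full = ids(sorted(docs, key=sort_key))
by_sort = ids(collapse_by_sort(docs))
first_seen = ids(collapse_first_seen(docs))
print(full)
print(by_sort)
print(first_seen)
```

Picking the head with the full sort key gives (7,g2), (3,g1), the expected
output. Picking the head first and sorting afterwards gives (5,g2), (1,g1):
the later sort is correct, but it only sees the heads that survived step 1.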
>
>
> On 14 June 2014 12:38, Umesh Prasad <umesh.i...@gmail.com> wrote:
>
>> Thanks Joel for the quick response. I have opened a new jira ticket.
>>
>> https://issues.apache.org/jira/browse/SOLR-6168
>>
>>
>>
>>
>> On 13 June 2014 17:45, Joel Bernstein <joels...@gmail.com> wrote:
>>
>>> Let's open a new ticket.
>>>
>>> Joel Bernstein
>>> Search Engineer at Heliosearch
>>>
>>>
>>> On Fri, Jun 13, 2014 at 8:08 AM, Umesh Prasad <umesh.i...@gmail.com>
>>> wrote:
>>>
>>> > The patch in SOLR-5408 fixes the issue with sorting only for two sort
>>> > fields. Sorting still breaks when 3 or more sort fields are used.
>>> >
>>> > I have attached a test case, which demonstrates the broken behavior
>>> when 3
>>> > sort fields are used.
>>> >
>>> > The failing test case patch is against Lucene/Solr 4.7 revision  number
>>> > 1602388
>>> >
>>> > Can someone apply and verify the bug ?
>>> >
>>> > Also, should I re-open SOLR-5408  or open a new ticket ?
>>> >
>>> >
>>> > ---
>>> > Thanks & Regards
>>> > Umesh Prasad
>>> >
>>>
>>
>>
>>
>> --
>> ---
>> Thanks & Regards
>> Umesh Prasad
>>
>
>
>
> --
> ---
> Thanks & Regards
> Umesh Prasad
>
