[jira] [Commented] (LUCENE-3097) Post grouping faceting

Michael McCandless (JIRA) Mon, 16 May 2011 03:11:30 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033947#comment-13033947 ]


Michael McCandless commented on LUCENE-3097:
--------------------------------------------

Right, gender in this example was single-valued per group.

Another way to visualize / define how post-group faceting should behave is:
imagine that for every facet value (i.e. field + value) you could define an
aggregator. Today, that aggregator is just the count of how many docs had that
value in the full result set. But you could instead define it to be
"count(distinct(doctor_id))", and then you'd get the group counts you want.
(Other aggregators are conceivable: max(relevance), min+max(prices), etc.)
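
A minimal Java sketch of the per-facet-value aggregator idea, run over an in-memory list of hits rather than a Lucene index. The `Doc` record, the doctor IDs, the scores, and the method names are all hypothetical (not Lucene APIs); the sketch only contrasts today's plain count with count(distinct(doctor_id)) and max(relevance).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: one aggregator per facet value, computed over the
// full (ungrouped) result set. Not a Lucene API.
public class FacetAggregatorSketch {

    // A matching document: its facet value, its group key (doctor_id) and its score.
    record Doc(String facetValue, String doctorId, double score) {}

    // Today's behaviour: count matching documents per facet value.
    static Map<String, Long> count(List<Doc> hits) {
        return hits.stream().collect(
            Collectors.groupingBy(Doc::facetValue, TreeMap::new, Collectors.counting()));
    }

    // count(distinct(doctor_id)): count groups per facet value. Keeping a set
    // of ids per value is the naive, memory-hungry way to compute it.
    static Map<String, Long> countDistinctDoctors(List<Doc> hits) {
        Map<String, Set<String>> ids = new TreeMap<>();
        for (Doc d : hits) {
            ids.computeIfAbsent(d.facetValue(), k -> new HashSet<>()).add(d.doctorId());
        }
        Map<String, Long> result = new TreeMap<>();
        ids.forEach((value, set) -> result.put(value, (long) set.size()));
        return result;
    }

    // max(relevance): best score among the documents carrying each facet value.
    static Map<String, Double> maxRelevance(List<Doc> hits) {
        Map<String, Double> result = new TreeMap<>();
        for (Doc d : hits) {
            result.merge(d.facetValue(), d.score(), Math::max);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical hits: gender is single-valued per doctor (group).
        List<Doc> hits = List.of(
            new Doc("female", "dr1", 0.9),
            new Doc("female", "dr1", 0.4),
            new Doc("male", "dr2", 0.7),
            new Doc("female", "dr3", 0.5),
            new Doc("male", "dr2", 0.2));
        System.out.println(count(hits));                // {female=3, male=2}
        System.out.println(countDistinctDoctors(hits)); // {female=2, male=1}
        System.out.println(maxRelevance(hits));         // {female=0.9, male=0.7}
    }
}
```

Only the aggregator changes between the three calls; the set of matching documents is the same full result set each time.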

Conceptually, I think this also defines the post-group faceting functionality,
even if we would never implement it this way (i.e. count(distinct(doctor_id))
would be way too costly to do naively).

> Post grouping faceting
> ----------------------
>
>                 Key: LUCENE-3097
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3097
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Martijn van Groningen
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>
> This issue focuses on implementing post grouping faceting.
> * How to handle multivalued fields: which field value to show with the facet.
> * What the facet counts should be based on:
> ** Facet counts can be based on the normal documents. Ungrouped counts.
> ** Facet counts can be based on the groups. Grouped counts.
> ** Facet counts can be based on the combination of group value and facet value. Matrix counts.
> And probably more implementation options.
> The first two methods are implemented in the SOLR-236 patch. For the first
> option, it calculates a DocSet based on the individual documents from the
> query result. For the second option, it calculates a DocSet containing the
> most relevant document of each group. Once the DocSet is computed, the
> FacetComponent and StatsComponent use that DocSet to create facets and statistics.
> The last option (matrix counts) is a bit more complex. I think it is best
> explained with an example. Let's say we search on travel offers:
> ||hotel||departure_airport||duration||
> |Hotel a|AMS|5|
> |Hotel a|DUS|10|
> |Hotel b|AMS|5|
> |Hotel b|AMS|10|
> If we group by hotel and have a facet for airport, most end users expect
> (in my experience, of course) the following airport facet:
> AMS: 2
> DUS: 1
> The above result can't be achieved by the first two methods. You either get
> AMS: 3 and DUS: 1 (ungrouped counts) or, with grouped counts, 1 for both
> airports (when the DUS offer is the most relevant document for Hotel a).
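
A minimal Java sketch contrasting the three counting options on the travel-offer table. Hypothetical assumptions: the `Offer` record and method names are made up, the hits are assumed to be already sorted by relevance (with Hotel a's DUS offer ranked first), and "most relevant document of a group" is taken to be the first hit for that hotel. This is plain Java over a list, not the SOLR-236 patch or a Lucene API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of ungrouped, grouped and matrix facet counts.
public class PostGroupFacetSketch {

    record Offer(String hotel, String airport, int duration) {}

    // Option 1, ungrouped counts: every matching document counts once.
    static Map<String, Integer> ungrouped(List<Offer> hits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Offer o : hits) counts.merge(o.airport(), 1, Integer::sum);
        return counts;
    }

    // Option 2, grouped counts: only the most relevant document of each group
    // counts. Assumption: hits are sorted by relevance, so the first hit per
    // hotel is its most relevant document.
    static Map<String, Integer> grouped(List<Offer> hits) {
        Map<String, Offer> topPerGroup = new LinkedHashMap<>();
        for (Offer o : hits) topPerGroup.putIfAbsent(o.hotel(), o);
        return ungrouped(new ArrayList<>(topPerGroup.values()));
    }

    // Option 3, matrix counts: each distinct (hotel, airport) pair counts once,
    // so an airport is counted once per hotel that offers it.
    static Map<String, Integer> matrix(List<Offer> hits) {
        Set<List<String>> seenPairs = new HashSet<>();
        Map<String, Integer> counts = new TreeMap<>();
        for (Offer o : hits) {
            if (seenPairs.add(List.of(o.hotel(), o.airport()))) {
                counts.merge(o.airport(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The four offers from the table, in a hypothetical relevance order.
        List<Offer> hits = List.of(
            new Offer("Hotel a", "DUS", 10),
            new Offer("Hotel a", "AMS", 5),
            new Offer("Hotel b", "AMS", 5),
            new Offer("Hotel b", "AMS", 10));
        System.out.println(ungrouped(hits)); // {AMS=3, DUS=1}
        System.out.println(grouped(hits));   // {AMS=1, DUS=1}
        System.out.println(matrix(hits));    // {AMS=2, DUS=1}
    }
}
```

Matrix counts here amount to count(distinct(hotel)) per airport value, i.e. the per-facet-value aggregator view described in the comment; the set of seen pairs is again the naive way to compute it.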
