[jira] [Updated] (LUCENE-5333) Support sparse faceting for heterogeneous indices

Shai Erera (JIRA) Mon, 11 Nov 2013 02:26:58 -0800

     [ https://issues.apache.org/jira/browse/LUCENE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Shai Erera updated LUCENE-5333:
-------------------------------

    Attachment: LUCENE-5333.patch

The patch adds AllDimensionsFacetResultsHandler as a quick prototype of how 
this can be done. I also modified testTaxonomy to use it instead of 
AllFacetsAccumulator, and it passes.

If we want to proceed with this approach, we can do the following:

* Add a new AllDimensionsFacetRequest which either:
** Extends CountFacetRequest, but then we limit it to counting only, or
** Wraps another FacetRequest so that you can do any aggregation that you want.
* Either way, it calls setDepth(2) internally.
* Move the creation of the FacetResultsHandler into FacetRequest, instead of 
TaxonomyFacetsAccumulator.createFacetResultsHandler. I'll admit that originally 
that's where it was (in FacetRequest), but I moved it to the accumulator in 
order to simplify FacetRequest implementations. But perhaps it does belong 
with FacetRequest...

The only non-trivial part of this is that you get back a single FacetResult, 
whose children are the actual results, so you cannot simply iterate on 
res.subResults, but need to iterate on the subResults of each of those 
subResults. I don't know if this is considered complicated or not 
(I didn't find it very complicated, but maybe I'm biased :)).
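
To make the two-level iteration concrete, here is a minimal sketch. Note that 
Node and AllDimensionsWalk are hypothetical stand-ins written only for this 
illustration; they are not the facet module's classes, and the only thing 
borrowed from the discussion above is the subResults naming and the depth-2 
shape (root, then dimensions, then the actual results).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a facet result node (NOT the Lucene class).
final class Node {
    final String label;
    final double value;
    final List<Node> subResults = new ArrayList<>();

    Node(String label, double value) {
        this.label = label;
        this.value = value;
    }

    // Appends a child and returns this node, so trees can be built inline.
    Node add(Node child) {
        subResults.add(child);
        return this;
    }
}

// Hypothetical helper showing the nested iteration over a depth-2 result.
final class AllDimensionsWalk {
    // The root's children are the dimensions; each dimension's children
    // are the actual results, so two nested loops are needed.
    static List<String> flatten(Node root) {
        List<String> out = new ArrayList<>();
        for (Node dim : root.subResults) {
            for (Node child : dim.subResults) {
                out.add(dim.label + "/" + child.label + "=" + (long) child.value);
            }
        }
        return out;
    }
}
```

Under these assumptions, a caller would pass the single root node it got back 
to flatten() and receive one "dimension/label=count" entry per result.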

All in all, I think this is somewhat better than the accumulator approach, as 
it's more intuitive to define a FacetRequest. In the faceted search 
module, FacetRequest == Query (in content search jargon), and is therefore 
more user-level than the underlying accumulator.

The downside is that it's not automatically supported by 
SortedSetDVAccumulator, since the latter supports only CountFacetRequest 
rather than arbitrary FacetRequests, and also does not let you specify your 
own FacetResultsHandler, but I think that's solvable.

> Support sparse faceting for heterogeneous indices
> -------------------------------------------------
>
>                 Key: LUCENE-5333
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5333
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Michael McCandless
>         Attachments: LUCENE-5333.patch, LUCENE-5333.patch, LUCENE-5333.patch
>
>
> In some search apps, e.g. a large e-commerce site, the index can have
> a mix of wildly different product categories and facet dimensions, and
> the number of dimensions could be huge.
> E.g. maybe the index has shirts, computer memory, hard drives, etc.,
> and each of these many categories has different attributes.
> In such an index, when someone searches for "so dimm", which should
> match a bunch of laptop memory modules, you can't (easily) know up
> front which facet dimensions will be important.
> But, I think this is very easy for the facet module, since ords are
> stored "row stride" (each doc lists all facet labels it has), we could
> simply count all facets that the hits actually saw, and then in the
> end see which ones "got traction" and return facet results for these
> top dims.
> I'm not sure what the API would look like, but conceptually this
> should work very well, because of how the facet module works.
> You shouldn't have to state up front exactly which facet dimensions
> to count...
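
The counting idea in the issue description above can be sketched roughly as 
follows. This is only an illustration under stated assumptions, not the 
patch's code: SparseFacetSketch, the hitOrds layout (one int[] of label 
ordinals per matching doc) and the dimOfOrd lookup table are all hypothetical, 
and "got traction" is interpreted here simply as the total number of label 
hits per dimension.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: count every ordinal the hits carry, then rank dimensions.
final class SparseFacetSketch {
    /**
     * hitOrds:  for each matching doc, the label ordinals it lists (assumed layout).
     * dimOfOrd: dimOfOrd[ord] is the index of the dimension owning that ordinal (assumed).
     * Returns up to topDims dimension indices, highest hit count first;
     * dimensions that no hit touched are omitted.
     */
    static List<Integer> topDimensions(int[][] hitOrds, int[] dimOfOrd,
                                       int numDims, int topDims) {
        // Pass 1: count every ordinal seen on the hits, with no up-front
        // list of dimensions to count.
        int[] ordCounts = new int[dimOfOrd.length];
        for (int[] doc : hitOrds) {
            for (int ord : doc) {
                ordCounts[ord]++;
            }
        }

        // Pass 2: roll the ordinal counts up to their dimensions. A doc with
        // two labels in one dimension contributes twice to that dimension.
        int[] dimCounts = new int[numDims];
        for (int ord = 0; ord < ordCounts.length; ord++) {
            dimCounts[dimOfOrd[ord]] += ordCounts[ord];
        }

        // Keep only dimensions that were hit, sorted by count descending
        // (ties broken by lower dimension index).
        List<Integer> dims = new ArrayList<>();
        for (int d = 0; d < numDims; d++) {
            if (dimCounts[d] > 0) {
                dims.add(d);
            }
        }
        dims.sort((a, b) -> dimCounts[a] != dimCounts[b]
                ? Integer.compare(dimCounts[b], dimCounts[a])
                : Integer.compare(a, b));
        return new ArrayList<>(dims.subList(0, Math.min(topDims, dims.size())));
    }
}
```

The cost of the first pass depends only on the labels the hits actually carry, 
which is why the number of dimensions in the index does not need to be known 
or enumerated in advance.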



--
This message was sent by Atlassian JIRA
(v6.1#6144)

