[ https://issues.apache.org/jira/browse/LUCENE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shai Erera updated LUCENE-5333:
-------------------------------
Attachment: LUCENE-5333.patch
Patch adds AllDimensionsFacetResultsHandler as a quick prototype of how this can
be done. I also modified testTaxonomy to use it instead of
AllFacetsAccumulator, and it passes.
If we want to proceed with this approach, we can do the following:
* Add a new AllDimensionsFacetRequest which either:
** Extends CountFacetRequest, but then we limit it to counting only; or
** Wraps another FacetRequest, so that you can do any aggregation you want.
** In either case, it calls setDepth(2) internally.
* Move creation of the FacetResultsHandler into FacetRequest, instead of
TaxonomyFacetsAccumulator.createFacetResultsHandler. I'll admit that this is
where it originally was (in FR), but I moved it to FA in order to simplify FR
implementations. But perhaps it does belong with FR...
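A toy model of the wrapping option, with handler creation moved into the request, might look like the following. All classes here (SketchFacetRequest, SketchCountRequest, SketchAllDimensionsRequest, SketchResultsHandler) are hypothetical and only show who owns what; they are not the facet module's FacetRequest or FacetResultsHandler API.

```java
// Hypothetical toy model of the proposed design; NOT Lucene's API.
abstract class SketchFacetRequest {
  private int depth = 1;

  void setDepth(int depth) {
    this.depth = depth;
  }

  int getDepth() {
    return depth;
  }

  /** Name of the aggregation this request performs (e.g. "count"). */
  abstract String aggregationName();

  /** Proposal: the request, not the accumulator, creates its results handler. */
  SketchResultsHandler createResultsHandler() {
    return new SketchResultsHandler(this, "top-children");
  }
}

/** Hypothetical stand-in for a plain counting request. */
class SketchCountRequest extends SketchFacetRequest {
  @Override
  String aggregationName() {
    return "count";
  }
}

/** Wraps any other request, so any aggregation can be applied across all dimensions. */
class SketchAllDimensionsRequest extends SketchFacetRequest {
  final SketchFacetRequest inner;

  SketchAllDimensionsRequest(SketchFacetRequest inner) {
    this.inner = inner;
    // Depth 2: the dimensions themselves, plus the top values under each one.
    setDepth(2);
  }

  @Override
  String aggregationName() {
    return inner.aggregationName(); // delegate the aggregation to the wrapped request
  }

  @Override
  SketchResultsHandler createResultsHandler() {
    return new SketchResultsHandler(this, "all-dimensions");
  }
}

/** Minimal placeholder handler that just records which request created it. */
class SketchResultsHandler {
  final SketchFacetRequest request;
  final String kind;

  SketchResultsHandler(SketchFacetRequest request, String kind) {
    this.request = request;
    this.kind = kind;
  }
}
```

The point of the sketch is that a caller who wants all dimensions only builds a different request; the accumulator no longer needs to know which handler to create.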
The only non-trivial part of this is that you get back a single FacetResult whose
children are the actual (per-dimension) results, so you cannot simply iterate over
res.subResults; you need to realize that you should iterate over each
subResults.subResults. I don't know whether this is considered complicated or not
(I didn't find it very complicated, but maybe I'm biased :)).
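To make the extra level concrete, here is a minimal sketch of reading such a result. ResultNode and AllDimsReader are hypothetical stand-ins (a label, a value, and a list of subResults), not the actual Lucene result classes; the point is only that the root's children are the dimensions, and the labels to display sit one level further down.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a facet result node; NOT the actual Lucene class.
class ResultNode {
  final String label;
  final double value;
  final List<ResultNode> subResults = new ArrayList<>();

  ResultNode(String label, double value) {
    this.label = label;
    this.value = value;
  }

  /** Appends a child and returns this node, for convenient tree building. */
  ResultNode add(ResultNode child) {
    subResults.add(child);
    return this;
  }
}

class AllDimsReader {
  /** Two levels of iteration: root.subResults, then each dimension's subResults. */
  static List<String> describe(ResultNode root) {
    List<String> lines = new ArrayList<>();
    for (ResultNode dim : root.subResults) {        // level 1: one node per dimension
      for (ResultNode child : dim.subResults) {     // level 2: that dimension's top values
        lines.add(dim.label + "/" + child.label + "=" + (long) child.value);
      }
    }
    return lines;
  }
}
```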
All in all, I think this is somewhat better than the accumulator approach, as
it's more intuitive to define a FacetRequest. In the faceted search module,
FacetRequest == Query (in content search jargon), and is therefore more
user-level than the underlying accumulator.
The downside is that it's not automatically supported by
SortedSetDVAccumulator, since the latter doesn't support arbitrary FacetRequests,
only CountFacetRequest, and also doesn't let you specify your own
FacetResultsHandler. I think that's solvable, though.
> Support sparse faceting for heterogeneous indices
> -------------------------------------------------
>
> Key: LUCENE-5333
> URL: https://issues.apache.org/jira/browse/LUCENE-5333
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/facet
> Reporter: Michael McCandless
> Attachments: LUCENE-5333.patch, LUCENE-5333.patch, LUCENE-5333.patch
>
>
> In some search apps, e.g. a large e-commerce site, the index can have
> a mix of wildly different product categories and facet dimensions, and
> the number of dimensions could be huge.
> E.g. maybe the index has shirts, computer memory, hard drives, etc.,
> and each of these many categories has different attributes.
> In such an index, when someone searches for "so dimm", which should
> match a bunch of laptop memory modules, you can't (easily) know up
> front which facet dimensions will be important.
> But I think this is very easy for the facet module: since ords are
> stored "row stride" (each doc lists all facet labels it has), we could
> simply count all facets that the hits actually saw, and then at the
> end see which ones "got traction" and return facet results for these
> top dims.
> I'm not sure what the API would look like, but conceptually this
> should work very well, because of how the facet module works.
> You shouldn't have to state up front exactly which facet dimensions
> to count...
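The counting idea in the description above can be modeled without Lucene at all. The sketch below is a hypothetical, in-memory illustration: each hit is given as the list of its "dim/value" labels (mirroring the row-stride ords), every label the hits saw is counted, dimensions are ranked by their total count, and only the top ones are returned. The class and method names, the "dim/value" label encoding, and using total count as the measure of "traction" are all assumptions for illustration, not the facet module's API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of sparse faceting; NOT the facet module's API.
class SparseFacets {

  /** Counts every label seen by the hits, grouped by dimension. Assumes "dim/value" labels. */
  static Map<String, Map<String, Integer>> countAll(List<List<String>> hitLabels) {
    Map<String, Map<String, Integer>> byDim = new HashMap<>();
    for (List<String> docLabels : hitLabels) {
      for (String label : docLabels) {
        int slash = label.indexOf('/');
        String dim = label.substring(0, slash);
        String value = label.substring(slash + 1);
        byDim.computeIfAbsent(dim, d -> new HashMap<>()).merge(value, 1, Integer::sum);
      }
    }
    return byDim;
  }

  /** Returns the topDims dimensions with the highest total count, each with its topN values. */
  static Map<String, List<Map.Entry<String, Integer>>> topDims(
      Map<String, Map<String, Integer>> byDim, int topDims, int topN) {
    List<String> dims = new ArrayList<>(byDim.keySet());
    // Rank dimensions by total count ("traction"); ties broken alphabetically.
    dims.sort(Comparator.comparingInt((String d) -> total(byDim.get(d))).reversed()
        .thenComparing(Comparator.naturalOrder()));
    Map<String, List<Map.Entry<String, Integer>>> result = new LinkedHashMap<>();
    for (String dim : dims.subList(0, Math.min(topDims, dims.size()))) {
      List<Map.Entry<String, Integer>> values = new ArrayList<>(byDim.get(dim).entrySet());
      // Highest counts first; ties broken alphabetically by value label.
      values.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
          .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
      result.put(dim, values.subList(0, Math.min(topN, values.size())));
    }
    return result;
  }

  private static int total(Map<String, Integer> counts) {
    int sum = 0;
    for (int c : counts.values()) {
      sum += c;
    }
    return sum;
  }
}
```

For example, three hits labeled {type/memory, form/SO-DIMM, capacity/8GB}, {type/memory, form/SO-DIMM, capacity/16GB} and {type/shirt, size/M}, asked for the top 2 dims with 1 value each, yield type (memory=2) and then capacity (16GB=1); capacity and form both total 2, and capacity wins only by the alphabetical tie-break.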