[jira] [Commented] (LUCENE-5333) Support sparse faceting for heterogeneous indices

Shai Erera (JIRA) Mon, 11 Nov 2013 01:48:45 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818801#comment-13818801
 ]


Shai Erera commented on LUCENE-5333:
------------------------------------

I talked with Gilad about it and he suggested a nice solution, with some 
limitations: create any FacetRequest, e.g. a CountFacetRequest, over the 
ROOT category and set its depth to 2. With numResults=10, that request 
basically says "give me the top-10 dimensions (children of ROOT) and, for 
each, its top-10 children".

This isn't perfect: if you want all available dimensions, you have to guess 
what numResults should be set to. And since numResults applies at both levels, 
asking for a high number, e.g. 100, means asking for the top-100 children of 
ROOT and, for each, its top-100 children. You still might not get all 
dimensions, but it's a very easy way to do this, with no need for any custom 
code. Another limitation is that this is currently supported by 
TaxonomyFacetsAccumulator, whereas SortedSetDVAccumulator limits the depth to 1 
for all given requests.
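
To make the limitation concrete, here is a toy model of what a depth-2 top-K 
request over ROOT returns. This is only a sketch: Node, topK and sampleRoot 
are made-up names for illustration, not Lucene classes, and the counts are 
invented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only: none of these types belong to the Lucene facet module.
final class DepthTwoTopK {

    /** A category label with its count and its child categories. */
    static final class Node {
        final String label;
        final int count;
        final List<Node> children = new ArrayList<>();

        Node(String label, int count) {
            this.label = label;
            this.count = count;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Keeps the k highest-count children at every level, down to the given depth. */
    static Node topK(Node node, int k, int depth) {
        Node result = new Node(node.label, node.count);
        if (depth == 0) {
            return result;
        }
        List<Node> sorted = new ArrayList<>(node.children);
        sorted.sort(Comparator.comparingInt((Node n) -> n.count).reversed());
        for (Node child : sorted.subList(0, Math.min(k, sorted.size()))) {
            result.add(topK(child, k, depth - 1));
        }
        return result;
    }

    /** Hypothetical counts: three dimensions under ROOT, each with a few children. */
    static Node sampleRoot() {
        return new Node("ROOT", 0)
            .add(new Node("Brand", 9)
                .add(new Node("Acme", 5)).add(new Node("Zed", 3)).add(new Node("Foo", 1)))
            .add(new Node("Capacity", 6)
                .add(new Node("8GB", 4)).add(new Node("4GB", 2)))
            .add(new Node("Color", 1)
                .add(new Node("Red", 1)));
    }
}
```

With k=2 and depth=2 on this sample, the Color dimension is dropped entirely 
because the same k is applied at the dimension level. Raising k brings Color 
back, but it also widens every dimension's list of children.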

In that spirit, I can propose another solution: write a FacetResultsHandler 
which skips the first level of children and returns a FacetResult with a tree 
structure, such that the first level holds the dimensions and the second 
level holds the actual children. That way, doing new CountFacetRequest(ROOT, 
10).setDepth(2) will return all available dimensions in the first level, but 
only the top-10 for each in the second level. To implement such a 
FacetResultsHandler we'd need to iterate over ROOT's children and compute the 
top-K for each, using e.g. DepthOneFacetResultsHandler...
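
Roughly, the handler logic could look like the sketch below. It is not 
written against the real FacetResultsHandler API: it assumes the counts are 
already available as a plain map from dimension to child label to count, and 
AllDimsTopChildren and topChildrenPerDimension are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: plain maps stand in for the taxonomy and the counting arrays.
final class AllDimsTopChildren {

    /**
     * Returns every dimension (no top-K at the first level), each mapped to the
     * labels of its k highest-count children.
     */
    static Map<String, List<String>> topChildrenPerDimension(
            Map<String, Map<String, Integer>> countsByDim, int k) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> dim : countsByDim.entrySet()) {
            List<Map.Entry<String, Integer>> children =
                new ArrayList<>(dim.getValue().entrySet());
            // Highest count first; ties broken by label so the order is deterministic.
            children.sort((a, b) -> {
                int byCount = Integer.compare(b.getValue(), a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            });
            List<String> top = new ArrayList<>();
            for (Map.Entry<String, Integer> child
                    : children.subList(0, Math.min(k, children.size()))) {
                top.add(child.getKey());
            }
            result.put(dim.getKey(), top);
        }
        return result;
    }
}
```

The point of the design is that k only bounds the second level, so a 
dimension with few hits still shows up in the result.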

> Support sparse faceting for heterogeneous indices
> -------------------------------------------------
>
>                 Key: LUCENE-5333
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5333
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Michael McCandless
>         Attachments: LUCENE-5333.patch, LUCENE-5333.patch
>
>
> In some search apps, e.g. a large e-commerce site, the index can have
> a mix of wildly different product categories and facet dimensions, and
> the number of dimensions could be huge.
> E.g. maybe the index has shirts, computer memory, hard drives, etc.,
> and each of these many categories has different attributes.
> In such an index, when someone searches for "so dimm", which should
> match a bunch of laptop memory modules, you can't (easily) know up
> front which facet dimensions will be important.
> But, I think this is very easy for the facet module, since ords are
> stored "row stride" (each doc lists all facet labels it has), we could
> simply count all facets that the hits actually saw, and then in the
> end see which ones "got traction" and return facet results for these
> top dims.
> I'm not sure what the API would look like, but conceptually this
> should work very well, because of how the facet module works.
> You shouldn't have to state up front exactly which facet dimensions
> to count...



--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
