[ https://issues.apache.org/jira/browse/LUCENE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818801#comment-13818801 ]
Shai Erera commented on LUCENE-5333:
------------------------------------

I talked with Gilad about it and he suggested a nice solution, with some limitations: you can create whatever FacetRequest you like, e.g. a CountFacetRequest over the ROOT category, and set its depth to 2. That way, if you ask for numResults=10, you basically say "give me the top-10 dimensions (children of ROOT) and for each its top-10 children".

This isn't perfect: if you want to get all available dimensions, you have to guess what numResults should be set to. And if you ask for a high number, e.g. 100, you ask for the top-100 children of ROOT, and for each its top-100 children. Still, you might not get all dimensions, but it's a very easy way to do this. No need for any custom code.

Another limitation is that this is currently supported by TaxonomyFacetsAccumulator, but SortedSetDVAccumulator limits the depth to 1 for all given requests.

In that spirit, I can propose another solution: write a FacetResultsHandler which skips the first level of children and returns a FacetResult which has a tree structure, such that the first level holds the dimensions and the second level holds the actual children. That way, doing new CountFacetRequest(ROOT, 10).setDepth(2) will result in all available dimensions in the first level, but only the top-10 for each in the second level. To implement such a FacetResultsHandler we'd need to iterate over ROOT's children and compute the top-K for each, using e.g. DepthOneFacetResultsHandler...

> Support sparse faceting for heterogeneous indices
> -------------------------------------------------
>
>                 Key: LUCENE-5333
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5333
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Michael McCandless
>         Attachments: LUCENE-5333.patch, LUCENE-5333.patch
>
>
> In some search apps, e.g. a large e-commerce site, the index can have
> a mix of wildly different product categories and facet dimensions, and
> the number of dimensions could be huge.
> E.g. maybe the index has shirts, computer memory, hard drives, etc.,
> and each of these many categories has different attributes.
> In such an index, when someone searches for "so dimm", which should
> match a bunch of laptop memory modules, you can't (easily) know up
> front which facet dimensions will be important.
> But, I think this is very easy for the facet module, since ords are
> stored "row stride" (each doc lists all facet labels it has), we could
> simply count all facets that the hits actually saw, and then in the
> end see which ones "got traction" and return facet results for these
> top dims.
> I'm not sure what the API would look like, but conceptually this
> should work very well, because of how the facet module works.
> You shouldn't have to state up front exactly which facet dimensions
> to count...

--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
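
To make the second proposal in the comment above a bit more concrete, here is a rough, self-contained sketch of the selection logic only. It deliberately does not use Lucene's facet API: the class AllDimsTopK, the type LabelCount, and the method topKPerDimension are hypothetical names, and a plain nested map stands in for the per-category counts that would have been accumulated over the hits. The assumption is that every dimension which received at least one count is kept in the first level, while only the k highest-count children are kept per dimension in the second level.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AllDimsTopK {

    /** Hypothetical stand-in for one child node of a facet result: a label and its count. */
    public static final class LabelCount {
        public final String label;
        public final int count;

        public LabelCount(String label, int count) {
            this.label = label;
            this.count = count;
        }

        @Override
        public String toString() {
            return label + "=" + count;
        }
    }

    /**
     * Keeps every dimension that saw at least one hit (first level) and, for
     * each of them, only its k highest-count children (second level).
     * Ties on count are broken by label so the output is deterministic.
     */
    public static Map<String, List<LabelCount>> topKPerDimension(
            Map<String, Map<String, Integer>> counts, int k) {
        Map<String, List<LabelCount>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> dim : counts.entrySet()) {
            List<LabelCount> children = new ArrayList<>();
            for (Map.Entry<String, Integer> child : dim.getValue().entrySet()) {
                if (child.getValue() > 0) {
                    children.add(new LabelCount(child.getKey(), child.getValue()));
                }
            }
            if (children.isEmpty()) {
                continue; // this dimension got no traction for the current hits
            }
            children.sort(Comparator.comparingInt((LabelCount c) -> c.count)
                    .reversed()
                    .thenComparing(c -> c.label));
            int limit = Math.min(k, children.size());
            result.put(dim.getKey(), new ArrayList<>(children.subList(0, limit)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up counts, purely for illustration.
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        Map<String, Integer> brand = new LinkedHashMap<>();
        brand.put("Acme", 5);
        brand.put("Bolt", 9);
        brand.put("Core", 1);
        counts.put("Brand", brand);
        Map<String, Integer> capacity = new LinkedHashMap<>();
        capacity.put("4GB", 3);
        counts.put("Capacity", capacity);

        System.out.println(topKPerDimension(counts, 2));
    }
}
```

Unlike a depth-2 request with a single numResults, the k here applies only to the second level, so there is no need to guess how many dimensions exist. A real FacetResultsHandler would of course work on ordinals and the taxonomy rather than on string maps.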