[ https://issues.apache.org/jira/browse/LUCENE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818466#comment-13818466 ]
Shai Erera commented on LUCENE-5333:
------------------------------------

bq. Well, I think the facet module already has too many classes

That's unrelated. It's like saying Lucene has many APIs: IndexWriter, IndexWriterConfig, Document, Field, MergePolicy, Query, QueryParser, Collector, IndexReader, IndexSearcher... just to name a few :). What's important here is FacetsAccumulator and FacetRequest .. that's it. The rest are *totally* unrelated.

This scenario fits into another accumulator. Otherwise, we'll end up with facet code diverging left and right. Even now, for really no good reason, if you choose to index facets using SortedSetDV, you can only count them. Why? What prevents these ords from being weighted by SumScore or a ValueSource? Nothing, I think? So I'm worried that if you add this to SortedSetDV only, it will increase the difference between the two.

Rather, I prefer to pick the right API. We say that FacetsAccumulator is your entry point to accumulating facets. So far we've made FacetsAccumulator.create adhere to all existing FacetRequests and accumulators and return the proper one. I think that's a good API? And if all an AllFA needs to do is create dummy requests and filter out the uninteresting ones, why complicate the code of all other accumulators (existing and future ones)? Won't it be simpler to add EnumFacetsAccumulator support to AllFA?

Look, this feature is not rocket science. Besides, I don't think it's such an important or common feature, and the app doesn't really need to go out of its way to support it: it can easily create all possible FRs using a very simple API, and filter out FacetResults whose FRN.subResults is empty. Can we make a simple utility for these apps? I'm all for it! But I prefer that we don't complicate the code of existing FAs.
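As a rough illustration of the app-side approach described in the comment (request every dimension, then drop the results that have no sub-results), here is a minimal sketch. It deliberately does not use the facet module's classes: DimResult and dropEmpty are hypothetical stand-ins for a FacetResult and for the proposed utility, so treat the names and shapes as assumptions rather than the actual API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sketch only: hypothetical stand-in types, not Lucene's facet classes. */
class SparseFacetFilter {

    /** Stand-in for one facet result: a dimension and the child labels the hits produced. */
    static final class DimResult {
        final String dim;
        final List<String> subResults;

        DimResult(String dim, List<String> subResults) {
            this.dim = dim;
            this.subResults = subResults;
        }
    }

    /** Keeps only the dimensions for which the hits produced at least one sub-result. */
    static List<DimResult> dropEmpty(List<DimResult> all) {
        List<DimResult> kept = new ArrayList<>();
        for (DimResult r : all) {
            if (!r.subResults.isEmpty()) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // One request per known dimension; only "Capacity" matched any hit.
        List<DimResult> all = Arrays.asList(
            new DimResult("Capacity", Arrays.asList("4GB", "8GB")),
            new DimResult("Sleeve length", Collections.<String>emptyList()));
        for (DimResult r : dropEmpty(all)) {
            System.out.println(r.dim + " " + r.subResults);
        }
    }
}
```

The point of the sketch is only that the filtering step lives entirely outside the accumulators, which is the design preference argued for above.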
> Support sparse faceting for heterogeneous indices
> -------------------------------------------------
>
>                 Key: LUCENE-5333
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5333
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Michael McCandless
>         Attachments: LUCENE-5333.patch
>
>
> In some search apps, e.g. a large e-commerce site, the index can have
> a mix of wildly different product categories and facet dimensions, and
> the number of dimensions could be huge.
> E.g. maybe the index has shirts, computer memory, hard drives, etc.,
> and each of these many categories has different attributes.
> In such an index, when someone searches for "so dimm", which should
> match a bunch of laptop memory modules, you can't (easily) know up
> front which facet dimensions will be important.
> But, I think this is very easy for the facet module, since ords are
> stored "row stride" (each doc lists all facet labels it has), we could
> simply count all facets that the hits actually saw, and then in the
> end see which ones "got traction" and return facet results for these
> top dims.
> I'm not sure what the API would look like, but conceptually this
> should work very well, because of how the facet module works.
> You shouldn't have to state up front exactly which facet dimensions
> to count...
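The counting idea in the quoted issue description (walk the ordinals each hit lists, count them, then see which dimensions "got traction") can be sketched roughly as follows. This is plain Java under stated assumptions, not the facet module's implementation: hitOrds stands in for the per-document ordinals of the matching docs, ordToDim is a hypothetical lookup from ordinal to dimension name, and topDims is an invented helper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: counts facet ordinals seen by the hits and ranks their dimensions. */
class SparseDimCounter {

    /**
     * @param hitOrds  for each matching doc, the facet ordinals it lists (assumed input)
     * @param ordToDim hypothetical mapping from ordinal to its dimension name
     * @param topN     maximum number of dimensions to return
     * @return dimension names ordered by descending hit count, ties broken by name
     */
    static List<String> topDims(int[][] hitOrds, String[] ordToDim, int topN) {
        final Map<String, Integer> counts = new HashMap<>();
        for (int[] docOrds : hitOrds) {
            for (int ord : docOrds) {
                counts.merge(ordToDim[ord], 1, Integer::sum);
            }
        }
        List<String> dims = new ArrayList<>(counts.keySet());
        dims.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(dims.subList(0, Math.min(topN, dims.size())));
    }

    public static void main(String[] args) {
        // Ordinals 0 and 1 belong to "brand", 2 to "capacity", 3 to "color" (never hit).
        String[] ordToDim = {"brand", "brand", "capacity", "color"};
        int[][] hitOrds = {{0, 2}, {1, 2}, {2}};
        System.out.println(topDims(hitOrds, ordToDim, 2));
    }
}
```

Dimensions that no hit touched never enter the map, so no dimension has to be named up front.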