[ 
https://issues.apache.org/jira/browse/LUCENE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818466#comment-13818466
 ] 

Shai Erera commented on LUCENE-5333:
------------------------------------

bq. Well, I think the facet module already has too many classes

That's unrelated. It's like saying Lucene has many APIs: IndexWriter, 
IndexWriterConfig, Document, Field, MergePolicy, Query, QueryParser, Collector, 
IndexReader, IndexSearcher... just to name a few :). What's important here is 
FacetsAccumulator and FacetRequest... that's it. The rest are *totally* 
unrelated.

This scenario fits into another accumulator. Otherwise, we'll end up with facet 
code diverging left and right. Even now, for really no good reason, if you 
choose to index facets using SortedSetDV, you can only count them. Why? What 
prevents these ords from being weighted by SumScore or a ValueSource? Nothing, 
I think. So I'm worried that if you add this only to SortedSetDV, it will 
widen the gap between the two indexing approaches.

Rather, I prefer to pick the right API. We say that FacetsAccumulator is your 
entry point to accumulating facets. So far we've made FacetsAccumulator.create 
handle all existing FacetRequests and accumulators and return the proper 
one. I think that's a good API? And if all an AllFA needs to do is create dummy 
requests and filter out the uninteresting ones, why complicate the code of 
all other accumulators (existing and future ones)? Won't it be simpler to add 
EnumFacetsAccumulator support to AllFA?

Look, this is not a rocket-science feature. Besides the fact that I don't think 
it's such an important or common feature, I think the app doesn't really need 
to go out of its way to support it -- it can easily create all possible FRs 
using a very simple API, and filter out FacetResults whose FRN.subResults is 
empty. If we can make a simple utility for these apps, I'm all for it! But I 
prefer that we don't complicate the code of existing FAs.
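
To illustrate the app-side filtering, here is a minimal sketch. It does not use the real facet classes: DimResult and dropEmpty are hypothetical stand-ins, and an actual app would instead inspect FRN.subResults on each FacetResult it gets back.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NonEmptyFacetFilter {

    /** Hypothetical stand-in for one facet result: a dimension and its child counts. */
    static final class DimResult {
        final String dim;
        final Map<String, Integer> subResults;

        DimResult(String dim, Map<String, Integer> subResults) {
            this.dim = dim;
            this.subResults = subResults;
        }
    }

    /** Keeps only the results that have at least one sub-result, preserving order. */
    static List<DimResult> dropEmpty(List<DimResult> results) {
        List<DimResult> kept = new ArrayList<>();
        for (DimResult r : results) {
            if (!r.subResults.isEmpty()) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Pretend we requested every known dimension for a "so dimm" query.
        Map<String, Integer> memory = new LinkedHashMap<>();
        memory.put("4GB", 12);
        memory.put("8GB", 7);

        List<DimResult> all = new ArrayList<>();
        all.add(new DimResult("Memory Size", memory));
        // No hit carried this dimension, so its sub-results are empty.
        all.add(new DimResult("Sleeve Length", new LinkedHashMap<String, Integer>()));

        for (DimResult r : dropEmpty(all)) {
            System.out.println(r.dim + " " + r.subResults);
        }
    }
}
```

The point is only that the filtering step is a few lines on the app side, so it could live in a small utility rather than inside every accumulator.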


> Support sparse faceting for heterogeneous indices
> -------------------------------------------------
>
>                 Key: LUCENE-5333
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5333
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Michael McCandless
>         Attachments: LUCENE-5333.patch
>
>
> In some search apps, e.g. a large e-commerce site, the index can have
> a mix of wildly different product categories and facet dimensions, and
> the number of dimensions could be huge.
> E.g. maybe the index has shirts, computer memory, hard drives, etc.,
> and each of these many categories has different attributes.
> In such an index, when someone searches for "so dimm", which should
> match a bunch of laptop memory modules, you can't (easily) know up
> front which facet dimensions will be important.
> But, I think this is very easy for the facet module, since ords are
> stored "row stride" (each doc lists all facet labels it has), we could
> simply count all facets that the hits actually saw, and then in the
> end see which ones "got traction" and return facet results for these
> top dims.
> I'm not sure what the API would look like, but conceptually this
> should work very well, because of how the facet module works.
> You shouldn't have to state up front exactly which facet dimensions
> to count...



--
This message was sent by Atlassian JIRA
(v6.1#6144)
