[jira] [Commented] (LUCENE-4715) Add OrdinalPolicy.NO_DIMENSION
[ https://issues.apache.org/jira/browse/LUCENE-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565415#comment-13565415 ]

Michael McCandless commented on LUCENE-4715:

Here's the per-dim-rollup results. The index has 2 CLPs: one with only Date, the other with username+categories (= many ords, flat), and I facet only on Date CLP.

base = trunk, comp = per-dim rollup

{noformat}
Task       QPS base  StdDev   QPS comp  StdDev   Pct diff
HighTerm      21.08  (6.9%)      21.26  (5.2%)     0.9% ( -10% -  13%)
MedTerm       50.06  (6.1%)      52.39  (4.4%)     4.6% (  -5% -  16%)
LowTerm       97.70  (4.7%)     110.47  (4.6%)    13.1% (   3% -  23%)
{noformat}

So it helps most for queries matching fewer docs since the rollup is a fixed cost in the end ...

> Add OrdinalPolicy.NO_DIMENSION
> ------------------------------
>
>                 Key: LUCENE-4715
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4715
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-4715.patch
>
>
> With the move of OrdinalPolicy to CategoryListParams,
> NonTopLevelOrdinalPolicy was nuked. It might be good to restore it, as
> another enum value of OrdinalPolicy.
> It's the same like ALL_PARENTS, only doesn't add the dimension ordinal, which
> could save space as well as computation time. It's good for when you don't
> care about the count of Date/, but only about its children counts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4715) Add OrdinalPolicy.NO_DIMENSION
[ https://issues.apache.org/jira/browse/LUCENE-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565411#comment-13565411 ]

Shai Erera commented on LUCENE-4715:

Mike and I have been testing some aspects of this issue. We should test the others too and paste all the results here. Here are the scenarios:

*ALL_BUT_DIMENSION*

This should be better than ALL, since it encodes half as many ordinals for flat dimensions. The test would be to index all flat dimensions with ALL (trunk) vs ALL_BUT (patch) and compare times.

*Per-Dimension Rollup*

This should be better when you need to roll up counts for a small dimension (it saves iterating over a large counts array). The test would be to:

* Index all dimensions (flat + hierarchical), so that counts[] is big (2.5M entries)
** Index Date in its own CLP in both cases; the idea is to generate a big taxonomy
* Query with a FacetRequest for Date/
* Trunk would do the full traversal, while the patch would do the per-dim rollup and should hopefully be faster

*Per-Dimension OrdinalPolicy*

The only advantage here is that it lets you index dimensions with different OrdinalPolicy settings under the same CLP. To compare, we'd need to index the dimensions with trunk as ALL or NO, vs the patch, which can mix ALL and NO (we can discard ALL_BUT for this test).
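To make the Per-Dimension Rollup scenario concrete, here is a minimal Java sketch. It is not the code from the patch: it assumes a hypothetical taxonomy given as a parents array in which every child has a larger ordinal than its parent, and a counts[] array that holds only leaf counts before the rollup. The names RollupSketch, rollupAll, rollupDimension and childrenOf are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only; a hypothetical stand-in, not the facet module's code. */
class RollupSketch {

  /**
   * Full traversal: visits every ordinal in the taxonomy, whatever was requested.
   * Assumes children always have larger ordinals than their parents, so walking
   * from the highest ordinal down sees each node after all of its descendants.
   */
  static void rollupAll(int[] parents, int[] counts) {
    for (int ord = counts.length - 1; ord > 0; ord--) {
      int parent = parents[ord];
      if (parent > 0) {            // do not roll up into the root (ordinal 0)
        counts[parent] += counts[ord];
      }
    }
  }

  /** Per-dimension rollup: visits only the subtree under dimOrd. */
  static int rollupDimension(List<List<Integer>> children, int[] counts, int dimOrd) {
    int sum = counts[dimOrd];
    for (int child : children.get(dimOrd)) {
      sum += rollupDimension(children, counts, child);
    }
    counts[dimOrd] = sum;
    return sum;
  }

  /** Child lists derived from the parents array (assumed to be built once and reused). */
  static List<List<Integer>> childrenOf(int[] parents) {
    List<List<Integer>> children = new ArrayList<>();
    for (int i = 0; i < parents.length; i++) {
      children.add(new ArrayList<>());
    }
    for (int ord = 0; ord < parents.length; ord++) {
      if (parents[ord] >= 0) {
        children.get(parents[ord]).add(ord);
      }
    }
    return children;
  }

  public static void main(String[] args) {
    // 0=root, 1=Date, 2=Date/2012, 3=Date/2012/01, 4=Date/2013, 5=username, 6=username/mike
    int[] parents = {-1, 0, 1, 2, 1, 0, 5};
    int[] counts = {0, 0, 0, 2, 3, 0, 5};   // leaf counts only
    int dateTotal = rollupDimension(childrenOf(parents), counts, 1);
    System.out.println("Date total = " + dateTotal);
  }
}
```

The full traversal costs the same for every query because it depends only on the size of counts[], while the per-dimension version touches just the requested subtree. That is the difference this test scenario is meant to measure.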
[jira] [Commented] (LUCENE-4715) Add OrdinalPolicy.NO_DIMENSION
[ https://issues.apache.org/jira/browse/LUCENE-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565385#comment-13565385 ]

Shai Erera commented on LUCENE-4715:

The thing is that there are two dimensions here: CategoryListParams, and the OrdinalPolicy for a category dimension:

* Different CLPs are good for when an application has good reasons to group different categories into different category lists, and then request different groups of facets at search time. E.g. an eCommerce application will probably have different facets for Cameras and Shoes, so it would make sense to separate the facets into different lists.
* However, Mike and I saw that when you index hierarchical facets, indexing them as NO_PARENTS yields better performance than ALL_PARENTS (b/c e.g. fewer ordinals are read), even when the parents' counts are rolled up in the end.
** Having said that, we also experimented with separating dimensions into separate fields (one field per dimension), but that yielded worse results than grouping them together.
** So on one hand we want a different OrdinalPolicy for different dimensions, but on the other hand we still want to group categories under the same CLP.

When I started to work on this issue, I did just as you suggest: use PerDimensionIndexingParams and pass different CLP instances for different dimensions, yet still keep the CLP.field the same for dimensions that "should go together". But that complicated matters for FacetFields, b/c it first groups all CPs under their respective CLPs and creates a {{Map}} from each CLP to its CPs. Then all the CPs of the same CLP are passed to CountingListBuilder. If I wanted to work w/ PerDimensionIndexingParams, I'd need to change FacetFields to map from a String -> (map of CLP -> CP) and then change CountingListBuilder accordingly. Also, CountingFacetsCollector would need to change as well, since currently it assumes a single CLP instance.

In short, while this is doable, I think it's confusing, and it could lead apps to think that if you need a different OrdinalPolicy for different dimensions, you also need different CLPs, and consequently different fields, which is bad!

So I think the solution in the patch is good: whoever intends to control OrdinalPolicy would need to create some Map anyway, and with this approach he'll create a Map<String,OrdinalPolicy>. If he needs both worlds (multiple CLPs AND multiple OrdinalPolicy-ies), then he needs to create two Maps ... doesn't sound like a big deal to me.
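As a rough illustration of the per-dimension Map idea, the Java sketch below looks up a policy by dimension name and picks which ordinals of a category path get encoded. It is not the patch and not the facet module's API: the Policy enum, OrdinalPolicySketch, ordinalsToEncode and policyFor are stand-ins named after the policies discussed in this thread, and the fallback policy is passed in explicitly rather than assumed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustration only; these names are stand-ins, not the facet module's API. */
class OrdinalPolicySketch {

  /** Stand-in for the policies named in this thread. */
  enum Policy { NO_PARENTS, ALL_PARENTS, ALL_BUT_DIMENSION }

  /**
   * pathOrds holds the ordinals of one category path, from the dimension
   * (first element) down to the leaf (last element); it must not be empty.
   */
  static List<Integer> ordinalsToEncode(List<Integer> pathOrds, Policy policy) {
    int n = pathOrds.size();
    switch (policy) {
      case NO_PARENTS:
        return pathOrds.subList(n - 1, n);   // leaf only
      case ALL_BUT_DIMENSION:
        return pathOrds.subList(1, n);       // everything except the dimension
      default:
        return pathOrds;                     // ALL_PARENTS: the full path
    }
  }

  /** Looks up the policy of a dimension, using the caller's fallback if unmapped. */
  static Policy policyFor(Map<String, Policy> perDimension, String dim, Policy fallback) {
    return perDimension.getOrDefault(dim, fallback);
  }

  public static void main(String[] args) {
    // One map of dimension -> policy; all dimensions can still share one CLP/field.
    Map<String, Policy> perDimension = new HashMap<>();
    perDimension.put("Date", Policy.NO_PARENTS);
    perDimension.put("username", Policy.ALL_BUT_DIMENSION);

    List<Integer> datePath = Arrays.asList(1, 2, 3);  // Date, Date/2012, Date/2012/01
    List<Integer> userPath = Arrays.asList(5, 6);     // username, username/mike

    System.out.println(ordinalsToEncode(
        datePath, policyFor(perDimension, "Date", Policy.ALL_PARENTS)));
    System.out.println(ordinalsToEncode(
        userPath, policyFor(perDimension, "username", Policy.ALL_PARENTS)));
  }
}
```

For a flat dimension such as username, the path has two ordinals, so leaving out the dimension ordinal encodes one ordinal instead of two. The policy lookup is independent of which CLP (and hence which field) the category is written to, which is the separation argued for above.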
[jira] [Commented] (LUCENE-4715) Add OrdinalPolicy.NO_DIMENSION
[ https://issues.apache.org/jira/browse/LUCENE-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565359#comment-13565359 ]

Gilad Barkai commented on LUCENE-4715:

Looking at the patch, I think I might have misunderstood something. In the build method, the right policy is checked for every category, but the build itself is per CategoryListParams, so why can't the policy be the same for each CLP? If one wishes to use different policies, I think it would be logical to separate them into different CLPs, and then this check would not need to be performed for every category?