[jira] [Commented] (LUCENE-4715) Add OrdinalPolicy.NO_DIMENSION

2013-01-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565415#comment-13565415
 ] 

Michael McCandless commented on LUCENE-4715:


Here are the per-dim-rollup results.  The index has 2 CLPs: one with only Date, 
the other with username+categories (= many ords, flat), and I facet only on 
the Date CLP.

base = trunk, comp = per-dim rollup

{noformat}
Task       QPS base  StdDev   QPS comp  StdDev   Pct diff
HighTerm      21.08  (6.9%)      21.26  (5.2%)     0.9% ( -10% -   13%)
 MedTerm      50.06  (6.1%)      52.39  (4.4%)     4.6% (  -5% -   16%)
 LowTerm      97.70  (4.7%)     110.47  (4.6%)    13.1% (   3% -   23%)
{noformat}

So it helps most for queries matching fewer docs since the rollup is a fixed 
cost in the end ...

> Add OrdinalPolicy.NO_DIMENSION
> --
>
> Key: LUCENE-4715
> URL: https://issues.apache.org/jira/browse/LUCENE-4715
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-4715.patch
>
>
> With the move of OrdinalPolicy to CategoryListParams, 
> NonTopLevelOrdinalPolicy was nuked. It might be good to restore it, as 
> another enum value of OrdinalPolicy.
> It's the same as ALL_PARENTS, except that it doesn't add the dimension 
> ordinal, which could save space as well as computation time. It's good for 
> when you don't care about the count of Date/, but only about its children counts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4715) Add OrdinalPolicy.NO_DIMENSION

2013-01-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565411#comment-13565411
 ] 

Shai Erera commented on LUCENE-4715:


Mike and I have been testing some aspects of this issue - we should test some 
others too and paste all the results here. Here are the scenarios:

*ALL_BUT_DIMENSION*

This should be better than ALL, since it encodes half the ordinals for flat 
dimensions.
The test would be to index all flat dimensions with ALL (trunk) vs ALL_BUT 
(patch) and compare times.
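
To make the "half the ordinals" claim concrete, here is a minimal sketch of which path prefixes each policy would encode for a single category. It is not the code from the patch: the class name, the Policy enum and the encodedPaths method are made up for illustration, and the handling of a category that consists of only the dimension is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class OrdinalPolicySketch {
    // Hypothetical stand-in for the policies discussed in this thread.
    enum Policy { NO_PARENTS, ALL_PARENTS, ALL_BUT_DIMENSION }

    /**
     * Returns the path prefixes whose ordinals would be written for one category.
     * components = {"Date", "2013", "01"} stands for Date/2013/01.
     */
    static List<String> encodedPaths(String[] components, Policy policy) {
        List<String> out = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (int depth = 1; depth <= components.length; depth++) {
            if (depth > 1) {
                prefix.append('/');
            }
            prefix.append(components[depth - 1]);
            boolean isLeaf = depth == components.length;
            boolean isDimension = depth == 1;
            boolean keep;
            switch (policy) {
                case NO_PARENTS:
                    keep = isLeaf;                    // only the category itself
                    break;
                case ALL_BUT_DIMENSION:
                    keep = isLeaf || !isDimension;    // skip the top-level prefix
                    break;
                default:
                    keep = true;                      // ALL_PARENTS: every prefix
            }
            if (keep) {
                out.add(prefix.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] flat = {"Author", "Bob"};
        String[] deep = {"Date", "2013", "01"};
        for (Policy p : Policy.values()) {
            System.out.println(p + " flat=" + encodedPaths(flat, p)
                + " deep=" + encodedPaths(deep, p));
        }
    }
}
```

For a flat category such as Author/Bob, ALL_PARENTS yields two entries and ALL_BUT_DIMENSION yields one, which is where the saving for flat dimensions comes from.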

*Per-Dimension Rollup*

This should be better when you need to roll up counts for a small dimension 
(it saves iterating over a large counts array).
The test would be to:

* Index all dimensions (flat + hierarchical), so the counts[] is big (2.5M 
entries)
** Index Date in its own CLP in both cases; the idea is to generate a big 
taxonomy
* Query with a FacetRequest Date/
* Trunk would do the full traversal, patch would do the per-dim rollup and 
hopefully should be better
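
The difference between the two approaches can be sketched on a toy taxonomy. This is an illustration only, not the trunk or patch code: the array layout (a parents array plus youngest-child and older-sibling links, with parents always having smaller ordinals than their children) and all names are assumptions made for the example.

```java
import java.util.Arrays;

class RollupSketch {
    static final int NONE = -1;

    // Full traversal: walks every entry of counts[] from the highest ordinal
    // down, adding each count to its parent. Returns the number of entries visited.
    static long fullRollup(int[] parents, int[] counts) {
        long visited = 0;
        for (int ord = counts.length - 1; ord > 0; ord--) {
            counts[parents[ord]] += counts[ord];
            visited++;
        }
        return visited;
    }

    // Builds {youngestChild, olderSibling} links from the parents array.
    static int[][] childLinks(int[] parents) {
        int[] youngestChild = new int[parents.length];
        int[] olderSibling = new int[parents.length];
        Arrays.fill(youngestChild, NONE);
        Arrays.fill(olderSibling, NONE);
        for (int ord = 1; ord < parents.length; ord++) {
            olderSibling[ord] = youngestChild[parents[ord]];
            youngestChild[parents[ord]] = ord;
        }
        return new int[][] {youngestChild, olderSibling};
    }

    // Per-dimension rollup: visits only the subtree under the given ordinal.
    // visited[0] is incremented once per node touched.
    static int rollupSubtree(int ord, int[] youngestChild, int[] olderSibling,
                             int[] counts, long[] visited) {
        visited[0]++;
        for (int child = youngestChild[ord]; child != NONE; child = olderSibling[child]) {
            counts[ord] += rollupSubtree(child, youngestChild, olderSibling, counts, visited);
        }
        return counts[ord];
    }

    public static void main(String[] args) {
        // 0=root, 1=Date, 2=Date/2012, 3=Date/2013, 4=User, 5..8=User/a..d
        int[] parents = {NONE, 0, 1, 1, 0, 4, 4, 4, 4};
        int[] full = {0, 0, 3, 5, 0, 0, 0, 0, 0};
        System.out.println("full traversal visited " + fullRollup(parents, full));

        int[] perDim = {0, 0, 3, 5, 0, 0, 0, 0, 0};
        int[][] links = childLinks(parents);
        long[] visited = {0};
        int dateCount = rollupSubtree(1, links[0], links[1], perDim, visited);
        System.out.println("per-dim rollup visited " + visited[0] + ", Date=" + dateCount);
    }
}
```

The full traversal cost grows with the size of the whole taxonomy, while the subtree walk grows only with the size of the requested dimension, which is the effect the test above is meant to measure.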

*Per-Dimension OrdinalPolicy*

The only advantage here is that it lets you index under the same CLP dimensions 
with different OrdinalPolicy settings.
To compare, we'd need to index with trunk the dimensions as ALL or NO, vs patch 
which can mix between ALL and NO (we can discard ALL_BUT for this test).




[jira] [Commented] (LUCENE-4715) Add OrdinalPolicy.NO_DIMENSION

2013-01-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565385#comment-13565385
 ] 

Shai Erera commented on LUCENE-4715:


The thing is that there are two dimensions here: CategoryListParams and 
OrdinalPolicy for a category dimension:

* Different CLPs are good for when an application has good reasons to group 
different categories into different category lists, and then at search time 
request different groups of facets. E.g. an eCommerce application will probably 
have different facets for Cameras and Shoes, and therefore it would make sense 
to separate the facets into different lists.

* However, Mike and I saw that when you index hierarchical facets, 
indexing them as NO_PARENTS yields better performance than ALL_PARENTS (b/c 
e.g. fewer ordinals are read), even when the parents' counts are rolled up in 
the end.
** Having said that, we also experimented with separating dimensions into 
separate fields (one field per dimension), but that yielded worse results than 
grouping them together.
** So on one hand we want to have different OrdinalPolicy for different 
dimensions, but on the other hand, still group categories under the same CLP.

When I started to work on that issue, I did just as you suggest -- use 
PerDimensionIndexingParams, and pass different CLP instances for different 
dimensions, yet still keep the CLP.field the same for dimensions that "should 
go together".

But that complicated matters for FacetFields, b/c it first groups all CPs under 
their respective CLPs, and creates a 
{{Map<CategoryListParams,Iterable<CategoryPath>>}}. Then all the CPs of the 
same CLP are passed to CountingListBuilder.
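
The grouping step described above can be pictured roughly as follows. This is a simplified sketch, not the FacetFields code: a CLP is reduced to its field name, category paths are plain strings, and the method and parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupByListSketch {
    /**
     * Groups category paths by the field of the category list they belong to.
     * fieldByDimension maps a dimension (first path component) to a field name;
     * dimensions without an entry go to defaultField.
     */
    static Map<String, List<String>> groupByField(List<String> categoryPaths,
                                                  Map<String, String> fieldByDimension,
                                                  String defaultField) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String path : categoryPaths) {
            int slash = path.indexOf('/');
            String dimension = slash < 0 ? path : path.substring(0, slash);
            String field = fieldByDimension.getOrDefault(dimension, defaultField);
            grouped.computeIfAbsent(field, f -> new ArrayList<>()).add(path);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("Date/2013/01", "Author/Bob", "Date/2012/12");
        Map<String, String> fields = Collections.singletonMap("Date", "$date");
        // Each group would then be handed to one list builder.
        System.out.println(groupByField(paths, fields, "$facets"));
    }
}
```

With one builder per group, anything that varies per dimension inside a group (such as the ordinal policy) has to be looked up per category rather than per list.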

If I wanted to work w/ PerDimensionIndexingParams, I'd need to change 
FacetFields to map from a String -> (map of CLP -> CP) and then change 
CountingListBuilder accordingly. Also, CountingFacetsCollector would need to 
change as well, since currently it assumes a single CLP instance.

In short, while this is doable, I think it's confusing, and could lead apps to 
think that if you need different OrdinalPolicy for dimensions, you also need 
different CLPs, and consequently different fields, which is bad!

So I think that solution is good -- whoever intends to control OrdinalPolicy, 
would need to create some Map, so with this approach, he'll create a 
Map(String,OrdinalPolicy). If he needs both worlds (multiple CLPs AND 
OrdinalPolicy-ies), then he needs to create two Maps ... doesn't sound a big 
deal to me.
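
As an illustration of the Map(String,OrdinalPolicy) idea, a per-dimension lookup with a default could look like the sketch below. The class, its methods and the Policy enum are hypothetical stand-ins, not the API from the patch.

```java
import java.util.HashMap;
import java.util.Map;

class PerDimensionPolicySketch {
    // Hypothetical stand-in for OrdinalPolicy.
    enum Policy { NO_PARENTS, ALL_PARENTS, ALL_BUT_DIMENSION }

    private final Map<String, Policy> byDimension = new HashMap<>();
    private final Policy defaultPolicy;

    PerDimensionPolicySketch(Policy defaultPolicy) {
        this.defaultPolicy = defaultPolicy;
    }

    // Registers a policy for one dimension; returns this for chaining.
    PerDimensionPolicySketch set(String dimension, Policy policy) {
        byDimension.put(dimension, policy);
        return this;
    }

    // The dimension is the first component of the path, e.g. "Date" in Date/2013/01.
    Policy policyFor(String categoryPath) {
        int slash = categoryPath.indexOf('/');
        String dimension = slash < 0 ? categoryPath : categoryPath.substring(0, slash);
        return byDimension.getOrDefault(dimension, defaultPolicy);
    }

    public static void main(String[] args) {
        PerDimensionPolicySketch policies =
            new PerDimensionPolicySketch(Policy.ALL_PARENTS)
                .set("Date", Policy.NO_PARENTS)
                .set("Author", Policy.ALL_BUT_DIMENSION);
        System.out.println(policies.policyFor("Date/2013/01"));
        System.out.println(policies.policyFor("Author/Bob"));
        System.out.println(policies.policyFor("Category/Shoes"));
    }
}
```

All three dimensions here can still live in the same category list; only the policy lookup differs per dimension.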




[jira] [Commented] (LUCENE-4715) Add OrdinalPolicy.NO_DIMENSION

2013-01-29 Thread Gilad Barkai (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565359#comment-13565359
 ] 

Gilad Barkai commented on LUCENE-4715:
--

Looking at the patch, I think I might have misunderstood something - in the 
build method, for every category the right policy is checked, but the build 
itself is per CategoryListParams - so why can't the policy be the same for each 
CLP? If one wishes to get different policies etc - I think it would be logical 
to separate them into different CLPs, and this check should not be performed 
over each category?


