[ 
https://issues.apache.org/jira/browse/SOLR-10123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068518#comment-16068518
 ] 

Houston Putman edited comment on SOLR-10123 at 6/29/17 6:18 PM:
----------------------------------------------------------------

Okay, so I have updated the cloud and non-cloud schemas to add the randomized 
numeric fields. However, the randomized doc-values setting cannot be used, 
since docValues are required for almost all Analytics Component functionality.

Almost all tests pass now. However, there is a difference between 
SortedSetDocValues (TrieField) and SortedNumericDocValues (PointField) that 
might make this impossible. SortedSetDocValues stores only the unique set of 
values for a multi-valued field, whereas SortedNumericDocValues can store the 
same value multiple times for a field on the same document. Therefore analytics 
results can vary between the two.
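
A minimal sketch of that storage difference, written in plain Python rather than against the Lucene API. The helper names {{sorted_set_values}} and {{sorted_numeric_values}} are hypothetical; they only model the per-document behavior described above (unique values versus every occurrence) and are not how Lucene implements either format.

```python
def sorted_set_values(values):
    """Model of SortedSetDocValues for one document: unique values, sorted."""
    return sorted(set(values))


def sorted_numeric_values(values):
    """Model of SortedNumericDocValues for one document: every value, sorted."""
    return sorted(values)


field_values = [1, 1, 2, 2, 3]
print(sorted_set_values(field_values))      # [1, 2, 3]
print(sorted_numeric_values(field_values))  # [1, 1, 2, 2, 3]
```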

Imagine you have the following document:
{code}
{
  id="1", 
  multi_valued_int_field=[1,1,2,2,3], 
  float_field=3
}
{code}

and are executing a facet over multi_valued_int_field while calculating the sum 
of float_field. I.e., for every unique value in multi_valued_int_field, 
calculate the sum of float_field.

If multi_valued_int_field is of type IntPointField, then the following results 
appear:

||Facet Value||Calculation||Result||Reason||
|1|3 + 3|6|value 1 appears 2 times in the multivalued field so 2 instances of 3 are summed|
|2|3 + 3|6|value 2 appears 2 times in the multivalued field so 2 instances of 3 are summed|
|3|3|3|value 3 appears 1 time in the multivalued field so 3 is the result|

If multi_valued_int_field is of type TrieIntField, then the following results 
appear:

||Facet Value||Calculation||Result||Reason||
|1|3|3|value 1 appears 1 time in the multivalued field so 3 is the result|
|2|3|3|value 2 appears 1 time in the multivalued field so 3 is the result|
|3|3|3|value 3 appears 1 time in the multivalued field so 3 is the result|

The difference comes from how the doc values of IntPointField and TrieIntField 
are stored: IntPointField does not deduplicate the values in the array, while 
TrieIntField does.
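
The two result tables can be reproduced with a small simulation. This is only a sketch under stated assumptions: {{facet_sum}} and its {{dedupe}} flag are hypothetical names, and the function models the observed behavior (one contribution per stored facet value) rather than the Analytics Component's actual code.

```python
from collections import defaultdict


def facet_sum(docs, facet_field, sum_field, dedupe):
    """Add sum_field once per stored facet value of each document.

    dedupe=True models TrieIntField (SortedSetDocValues);
    dedupe=False models IntPointField (SortedNumericDocValues).
    """
    totals = defaultdict(float)
    for doc in docs:
        values = doc[facet_field]
        stored = sorted(set(values)) if dedupe else sorted(values)
        for value in stored:
            totals[value] += doc[sum_field]
    return dict(totals)


docs = [{"id": "1", "multi_valued_int_field": [1, 1, 2, 2, 3], "float_field": 3.0}]

point_result = facet_sum(docs, "multi_valued_int_field", "float_field", dedupe=False)
trie_result = facet_sum(docs, "multi_valued_int_field", "float_field", dedupe=True)
print(point_result)  # {1: 6.0, 2: 6.0, 3: 3.0}
print(trie_result)   # {1: 3.0, 2: 3.0, 3: 3.0}
```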

The same thing would occur when a multi-valued numeric field is used in an 
expression, but that is not included in the unit tests.
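
As an assumed illustration of the expression case (the comment gives no concrete example), consider summing the values of the multi-valued field itself. The function {{sum_expression}} below is hypothetical and only models deduplicated versus non-deduplicated storage; it is not the Analytics Component's expression syntax.

```python
def sum_expression(values, dedupe):
    """Sum a document's multi-valued field as the storage format would expose it.

    dedupe=True models SortedSetDocValues (unique values only);
    dedupe=False models SortedNumericDocValues (duplicates kept).
    """
    stored = set(values) if dedupe else values
    return sum(stored)


values = [1, 1, 2, 2, 3]
print(sum_expression(values, dedupe=False))  # 9
print(sum_expression(values, dedupe=True))   # 6
```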


was (Author: houstonputman):
Okay, so I have updated the cloud and non-cloud schemas to add the randomized 
numeric fields. However the randomized doc-values cannot be used since 
docValues are required for almost all Analytics Component functionality.

Almost all tests pass now, however there is a difference between 
SortedSetDocValues (TrieField) and SortedNumericDocValues (PointField) that 
might make this impossible. SortedSetDocValues only store the unique set of 
values for a multi-valued field, however SortedNumericDocValues can store the 
same value multiple times for a field on the same document. Therefore analytics 
results can vary between the two. 

For example, if you were faceting on {{multi_valued_int_field}} and calculated 
{{sum(float_field)}} on just the following document:
{{Document = ( id="1", multi_valued_int_field=\[1,1,2,2,3\], float_field=3 )}}

If {{multi_valued_int_field}} was an {{IntPointField}}, then the results of the 
facet would be ( {{facet_value : facet_results, ...}} ):
{{1 : ( sum(float_field) = 6 ) , 2 : ( sum(float_field) = 6 ) , 3 : ( 
sum(float_field) = 3 )}}

If {{multi_valued_int_field}} was a {{TrieIntField}}, then the results of the 
facet would be ( {{facet_value : facet_results, ...}} ):
{{1 : ( sum(float_field) = 3 ) , 2 : ( sum(float_field) = 3 ) , 3 : ( 
sum(float_field) = 3 )}}

This isn't included in the unit tests, but the same thing would occur when a 
multi-valued numeric field was used in an expression. The results could be 
different.

> Analytics Component 2.0
> -----------------------
>
>                 Key: SOLR-10123
>                 URL: https://issues.apache.org/jira/browse/SOLR-10123
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Houston Putman
>              Labels: features
>         Attachments: SOLR-10123.patch, SOLR-10123.patch, SOLR-10123.patch
>
>
> A completely redesigned Analytics Component, introducing the following 
> features:
> * Support for distributed collections
> * New JSON request language, and response format that fits JSON better.
> * Faceting over mapping functions in addition to fields (Value Faceting)
> * PivotFaceting with ValueFacets
> * More advanced facet sorting
> * Support for PointField types
> * Expressions over multi-valued fields
> * New types of mapping functions
> ** Logical
> ** Conditional
> ** Comparison
> * Concurrent request execution
> * Custom user functions, defined within the request
> Fully backwards compatible with the original Analytics Component with the 
> following exceptions:
> * All fields used must have doc-values enabled
> * Expression results can no longer be used when defining Range and Query 
> facets
> * The reverse(string) mapping function is no longer a native function



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

