[ 
https://issues.apache.org/jira/browse/SOLR-13047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821119#comment-16821119
 ] 

Joel Bernstein edited comment on SOLR-13047 at 4/18/19 1:48 PM:
----------------------------------------------------------------

Good questions.

It can be confusing to tell the functions sum, count, min, max and avg apart 
from the math expression functions like *dotProduct*.

The math expression functions all extend sub-classes of StreamEvaluator. They 
are designed to work over data that has been read into memory structures like 
arrays, matrices and numbers.

The functions sum, count, min, max and avg all extend the Metric class. 
Metrics are most often translated into JSON facet API aggregations. You'll 
notice that in the FacetStream they are used to build up the JSON facet API 
expression.

Currently there are only five Metrics available: sum, count, min, max and avg. 
But the output of Facet2D can be read into memory and then operated on by math 
expressions, as shown in the description.
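
To make the in-memory step a bit more concrete, here is a rough plain-Java 
sketch of what pivoting (x, y, count) tuples into a matrix amounts to. It does 
not use any Solr classes: PivotSketch, Cell and pivot are made-up names for 
illustration only, and this is not how the actual pivot math expression is 
implemented.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the tuples a facet2D expression would emit.
// None of these names come from Solr.
class PivotSketch {

    // One (x, y, value) tuple, e.g. (disease, symptom, count(*)).
    static final class Cell {
        final String x;
        final String y;
        final double value;

        Cell(String x, String y, double value) {
            this.x = x;
            this.y = y;
            this.value = value;
        }
    }

    // Pivots the tuples into a dense matrix: one row per distinct x and
    // one column per distinct y, both in order of first appearance.
    // (x, y) pairs that never occur stay at 0.
    static double[][] pivot(List<Cell> cells) {
        Map<String, Integer> rows = new LinkedHashMap<>();
        Map<String, Integer> cols = new LinkedHashMap<>();
        for (Cell c : cells) {
            rows.putIfAbsent(c.x, rows.size());
            cols.putIfAbsent(c.y, cols.size());
        }
        double[][] matrix = new double[rows.size()][cols.size()];
        for (Cell c : cells) {
            matrix[rows.get(c.x)][cols.get(c.y)] = c.value;
        }
        return matrix;
    }
}
```

With the example from the description, the rows would be diseases, the columns 
symptoms, and each cell the count for that pair. Once the data is in that 
shape, it is the kind of in-memory structure the math expression functions are 
designed to work over.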

About the constructor...

There are two constructors in the FacetStream. The one that matters is 
{code:java}
public FacetStream(StreamExpression expression, StreamFactory factory) throws IOException {
{code}
You can adapt this constructor for Facet2D.

This constructor is the one that will be called by the Streaming Expression 
parser. The FacetStream implementation shows how to extract the parameters 
using the *expression* and the *factory*. 
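
One parameter that Facet2D adds over FacetStream is the dimensions value from 
the proposed syntax (dimensions="300, 10"). As a small illustration, and 
leaving the Solr classes out entirely, splitting that string into the two 
limits could look roughly like the sketch below. DimensionsSketch and 
parseDimensions are hypothetical names, and the validation rules are my 
assumption rather than anything the ticket specifies.

```java
// Hypothetical helper, not part of Solr: splits a dimensions value such as
// "300, 10" into the x and y bucket limits.
class DimensionsSketch {

    static int[] parseDimensions(String dimensions) {
        String[] parts = dimensions.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException(
                "dimensions must have the form \"x, y\" but was: " + dimensions);
        }
        // Integer.parseInt throws NumberFormatException on non-numeric input.
        int x = Integer.parseInt(parts[0].trim());
        int y = Integer.parseInt(parts[1].trim());
        // Assumption: both limits have to be positive.
        if (x <= 0 || y <= 0) {
            throw new IllegalArgumentException(
                "dimensions must be positive but was: " + dimensions);
        }
        return new int[] {x, y};
    }
}
```

In the real constructor the raw string would come from the named parameters of 
the *expression*, in the same way the FacetStream constructor reads its own 
parameters.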

 



> Add facet2D Streaming Expression
> --------------------------------
>
>                 Key: SOLR-13047
>                 URL: https://issues.apache.org/jira/browse/SOLR-13047
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Major
>
> The current facet expression is a generic tool for creating multi-dimensional 
> aggregations. The *facet2D* Streaming Expression has semantics specific to 
> 2-dimensional facets, which are designed to be *pivoted* into a matrix and 
> operated on by *Math Expressions*. 
> facet2D will use the JSON facet API under the covers. 
> Proposed syntax:
> {code:java}
> facet2D(medrecords, q=*:*, x=diseases, y=symptoms, dimensions="300, 10", count(*)){code}
> The example above will return tuples containing the top 300 diseases and the 
> top 10 symptoms for each disease. 
> Using math expressions, the tuples can be *pivoted* into a matrix where the 
> rows of the matrix are the diseases, the columns of the matrix are the 
> symptoms and the cells in the matrix contain the counts. This matrix can then 
> be *clustered* to find clusters of *diseases* that are correlated by 
> *symptoms*. 
> {code:java}
> let(a=facet2D(medrecords, q=*:*, x=diseases, y=symptoms, dimensions="300, 10", count(*)),
>     b=pivot(a, diseases, symptoms, count(*)),
>     c=kmeans(b, 10)){code}
>  
> *Implementation Note:*
> The implementation plan for this ticket is to create a new stream called 
> Facet2DStream. The FacetStream code is a good starting point for the new 
> implementation and can be adapted for the Facet2D parameters. Tests similar 
> to those for the FacetStream can be added to StreamExpressionTest.
>  


