[ 
https://issues.apache.org/jira/browse/SOLR-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343607#comment-16343607
 ] 

ASF subversion and git services commented on SOLR-10651:
--------------------------------------------------------

Commit 603bb7fb14e795b3317385fe97c3bfcd4bc39725 in lucene-solr's branch 
refs/heads/branch_7x from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=603bb7f ]

SOLR-10651, SOLR-10784: Add new statistical and machine learning functions to 
CHANGES.txt for 7.3 release


> Streaming Expressions statistical functions library
> ---------------------------------------------------
>
>                 Key: SOLR-10651
>                 URL: https://issues.apache.org/jira/browse/SOLR-10651
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: streaming expressions
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Major
>             Fix For: master (8.0), 7.3
>
>         Attachments: SOLR_7_1_DOCS.patch
>
>
> This is a ticket for organizing the new statistical programming features of 
> Streaming Expressions. It's also a place for the community to discuss what 
> functions are needed to support statistical programming. 
> Basic Syntax:
> {code}
> let(a = timeseries(...),
>     b = timeseries(...),
>     c = col(a, count(*)),
>     d = col(b, count(*)),
>     r = regress(c, d),
>     tuple(p = predict(r, 50)))
> {code}
> The expression above is doing the following:
> 1) The let expression is setting variables (a, b, c, d, r).
> 2) Variables *a* and *b* are the output of timeseries() Streaming 
> Expressions. These will be stored in memory as lists of Tuples containing the 
> time series results.
> 3) Variables *c* and *d* are set using the *col* evaluator. The col evaluator 
> extracts a column of numbers from a list of tuples. In the example *col* is 
> extracting the count\(*\) field from the two time series result sets.
> 4) Variable *r* is the output from the *regress* evaluator. The regress 
> evaluator performs a simple regression analysis on two columns of numbers.
> 5) Once the variables are set, a single Streaming Expression is run by the 
> *let* expression. In the example the *tuple* expression is run. The tuple 
> expression outputs a single Tuple with name/value pairs. Any Streaming 
> Expression can be run by the *let* expression so this can be a complex 
> program. The streaming expression run by *let* has access to all the 
> variables defined earlier.
> 6) The tuple expression in the example has one name / value pair. The name 
> *p* is set to the output of the *predict* evaluator. The predict evaluator is 
> predicting the value of a dependent variable based on the independent 
> variable 50. The regression result stored in variable *r* is used to make the 
> prediction.
> 7) The output of this expression will be a single tuple with the value of the 
> predict function in the *p* field.
> The growing list of issues linked to this ticket are the array manipulation 
> and statistical functions that will form the basis of the stats library. The 
> vast majority of these functions are backed by algorithms in Apache Commons 
> Math. Other machine learning and math libraries will follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to