[ https://issues.apache.org/jira/browse/SOLR-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343607#comment-16343607 ]
ASF subversion and git services commented on SOLR-10651:
--------------------------------------------------------

Commit 603bb7fb14e795b3317385fe97c3bfcd4bc39725 in lucene-solr's branch refs/heads/branch_7x from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=603bb7f ]

SOLR-10651, SOLR-10784: Add new statistical and machine learning functions to CHANGES.txt for 7.3 release

> Streaming Expressions statistical functions library
> ---------------------------------------------------
>
>                 Key: SOLR-10651
>                 URL: https://issues.apache.org/jira/browse/SOLR-10651
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: streaming expressions
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Major
>             Fix For: master (8.0), 7.3
>
>         Attachments: SOLR_7_1_DOCS.patch
>
>
> This is a ticket for organizing the new statistical programming features of
> Streaming Expressions. It's also a place for the community to discuss what
> functions are needed to support statistical programming.
> Basic Syntax:
> {code}
> let(a = timeseries(...),
>     b = timeseries(...),
>     c = col(a, count(*)),
>     d = col(b, count(*)),
>     r = regress(c, d),
>     tuple(p = predict(r, 50)))
> {code}
> The expression above is doing the following:
> 1) The let expression is setting variables (a, b, c, d, r).
> 2) Variables *a* and *b* are the output of timeseries() Streaming
> Expressions. These will be stored in memory as lists of Tuples containing the
> time series results.
> 3) Variables *c* and *d* are set using the *col* evaluator. The col evaluator
> extracts a column of numbers from a list of tuples. In the example *col* is
> extracting the count\(*\) field from the two time series result sets.
> 4) Variable *r* is the output from the *regress* evaluator. The regress
> evaluator performs a simple regression analysis on two columns of numbers.
> 5) Once the variables are set, a single Streaming Expression is run by the
> *let* expression. In the example the *tuple* expression is run. The tuple
> expression outputs a single Tuple with name/value pairs. Any Streaming
> Expression can be run by the *let* expression so this can be a complex
> program. The streaming expression run by *let* has access to all the
> variables defined earlier.
> 6) The tuple expression in the example has one name / value pair. The name
> *p* is set to the output of the *predict* evaluator. The predict evaluator is
> predicting the value of a dependent variable based on the independent
> variable 50. The regression result stored in variable *r* is used to make the
> prediction.
> 7) The output of this expression will be a single tuple with the value of the
> predict function in the *p* field.
> The growing list of issues linked to this ticket are the array manipulation
> and statistical functions that will form the basis of the stats library. The
> vast majority of these functions are backed by algorithms in Apache Commons
> Math. Other machine learning and math libraries will follow.
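
For readers who want to see the arithmetic behind steps 1 to 7, here is a minimal Python sketch of the same pipeline. It is not Solr code and does not call Apache Commons Math. The helpers col, regress and predict are hypothetical stand-ins written for illustration, the two input lists are made-up data in place of the timeseries() results, and it assumes that regress treats its first argument as the independent variable and fits an ordinary least-squares line.

```python
def col(tuples, field):
    """Extract one numeric column from a list of dict-like tuples."""
    return [float(t[field]) for t in tuples]


def regress(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length columns with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical, slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, x_value):
    """Evaluate the fitted line at a single independent value."""
    return model["intercept"] + model["slope"] * x_value


# Made-up stand-ins for the two timeseries() result sets (steps 1 and 2).
a = [{"count(*)": 10}, {"count(*)": 20}, {"count(*)": 30}]
b = [{"count(*)": 25}, {"count(*)": 45}, {"count(*)": 65}]

c = col(a, "count(*)")           # step 3
d = col(b, "count(*)")           # step 3
r = regress(c, d)                # step 4: assumes c is x and d is y
result = {"p": predict(r, 50)}   # steps 5 to 7: a single tuple with field p
print(result)
```

With this sample data d is exactly 2 * c + 5, so the fitted slope is 2, the intercept is 5, and the p field comes out as 105.0.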