[jira] [Updated] (SOLR-9193) Add scoreNodes Streaming Expression

2016-07-06 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9193:
-
Component/s: SolrJ

> Add scoreNodes Streaming Expression
> ---
>
> Key: SOLR-9193
> URL: https://issues.apache.org/jira/browse/SOLR-9193
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrJ
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Fix For: 6.2
>
> Attachments: SOLR-9193.patch
>
>
> The scoreNodes Streaming Expression is another *GraphExpression*. It will 
> decorate a gatherNodes expression and use a tf-idf scoring algorithm to score 
> the nodes.
> The gatherNodes expression only gathers nodes and aggregations. This is 
> similar in nature to tf in search ranking, where the number of times a node 
> appears in the traversal represents the tf. But this skews recommendations 
> towards nodes that appear frequently in the index.
> Using the idf for each node we can score each node as a function of tf-idf. 
> This will provide a boost to nodes that appear less frequently in the index. 
> The scoreNodes expression will gather the idfs from the shards for each node
> emitted by the underlying gatherNodes expression. It will then assign the 
> score to each node. 
> The computed score will be added to each node in the *nodeScore* field. The 
> docFreq of the node across the entire collection will be added to each node 
> in the *docFreq* field. Other streaming expressions can then perform a 
> ranking based on the nodeScore or compute their own score using the docFreq.
> proposed syntax:
> {code}
> top(n="10",
>   sort="nodeScore desc",
>   scoreNodes(gatherNodes(...))) 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9193) Add scoreNodes Streaming Expression

2016-07-05 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9193:
-
Description: 
The scoreNodes Streaming Expression is another *GraphExpression*. It will 
decorate a gatherNodes expression and use a tf-idf scoring algorithm to score 
the nodes.

The gatherNodes expression only gathers nodes and aggregations. This is similar 
in nature to tf in search ranking, where the number of times a node appears in 
the traversal represents the tf. But this skews recommendations towards nodes 
that appear frequently in the index.

Using the idf for each node we can score each node as a function of tf-idf. 
This will provide a boost to nodes that appear less frequently in the index. 
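A minimal Python sketch of the tf-idf idea, not the patch's code. It assumes tf is the count(*) value gatherNodes reports for a node, and it uses the classic Lucene-style idf `1 + ln(numDocs / (docFreq + 1))`, which may differ from the formula scoreNodes actually applies. The node names, counts, and document frequencies are made up.

```python
import math


def idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene-style idf; an assumed form, not necessarily what scoreNodes uses."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def node_score(count: int, doc_freq: int, num_docs: int) -> float:
    """tf-idf: the traversal count for a node, scaled up when the node is rare in the index."""
    return count * idf(doc_freq, num_docs)


# Hypothetical gatherNodes counts: how often each node was reached in the traversal (tf).
tf = {"popular": 10, "niche": 6}
# Hypothetical collection-wide document frequencies for the same nodes.
doc_freq = {"popular": 9000, "niche": 50}
num_docs = 10000

scores = {node: node_score(count, doc_freq[node], num_docs) for node, count in tf.items()}
print(scores)
```

With these illustrative numbers the "niche" node outranks "popular" even though it was reached fewer times in the traversal, which is the boost described above.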

The scoreNodes expression will gather the idfs from the shards for each node
emitted by the underlying gatherNodes expression. It will then assign the score 
to each node. 
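Each shard only sees its own documents, so per-shard counts have to be summed before a collection-wide idf can be computed. This sketch assumes a hypothetical per-shard response shape (`numDocs` plus a `docFreq` map keyed by node value); it illustrates the merge only, not the actual request scoreNodes sends to the shards.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def merge_shard_stats(shards: Iterable[dict]) -> Tuple[int, Dict[str, int]]:
    """Sum numDocs and each node's docFreq across shards into collection-wide totals."""
    num_docs = 0
    doc_freq: Counter = Counter()
    for shard in shards:
        num_docs += shard["numDocs"]
        doc_freq.update(shard["docFreq"])  # Counter.update adds counts, it does not overwrite
    return num_docs, dict(doc_freq)


# Hypothetical responses from two shards for the nodes emitted by gatherNodes.
shard_responses = [
    {"numDocs": 6000, "docFreq": {"popular": 5400, "niche": 30}},
    {"numDocs": 4000, "docFreq": {"popular": 3600, "niche": 20}},
]
num_docs, doc_freq = merge_shard_stats(shard_responses)
print(num_docs, doc_freq)
```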

The computed score will be added to each node in the *nodeScore* field. The 
docFreq of the node across the entire collection will be added to each node in 
the *docFreq* field. Other streaming expressions can then perform a ranking 
based on the nodeScore or compute their own score using the docFreq.
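The decoration step can be pictured as copying each node tuple and adding the two fields. The `node` and `count(*)` field names for gatherNodes output, the hypothetical `score_nodes` helper, and the idf form are all assumptions for illustration.

```python
import math
from typing import Dict, List


def score_nodes(tuples: List[dict], doc_freq: Dict[str, int], num_docs: int,
                tf_field: str = "count(*)") -> List[dict]:
    """Return copies of the node tuples with docFreq and nodeScore fields added."""
    scored = []
    for t in tuples:
        df = doc_freq.get(t["node"], 0)
        idf = 1.0 + math.log(num_docs / (df + 1))  # assumed classic idf form
        out = dict(t)  # copy so the input tuples are left unchanged
        out["docFreq"] = df
        out["nodeScore"] = t[tf_field] * idf
        scored.append(out)
    return scored


# Hypothetical gatherNodes output and collection statistics.
nodes = [{"node": "popular", "count(*)": 10}, {"node": "niche", "count(*)": 6}]
scored = score_nodes(nodes, {"popular": 9000, "niche": 50}, num_docs=10000)
for t in scored:
    print(t["node"], t["docFreq"], round(t["nodeScore"], 2))
```

A downstream step that prefers its own weighting can read `docFreq` from each tuple instead of relying on `nodeScore`.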

proposed syntax:
{code}
top(n="10",
  sort="nodeScore desc",
  scoreNodes(gatherNodes(...))) 
{code}
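Conceptually, `top(n="10", sort="nodeScore desc", ...)` keeps the n highest-scoring tuples. The `top` function below is a hypothetical Python stand-in for that ordering only; Solr's top decorator operates on a stream server-side. The scores are made up.

```python
import heapq
from typing import List


def top(n: int, tuples: List[dict], field: str = "nodeScore") -> List[dict]:
    """Keep the n tuples with the largest `field` value, ordered highest first."""
    return heapq.nlargest(n, tuples, key=lambda t: t[field])


# Made-up scored tuples standing in for scoreNodes output.
scored = [
    {"node": "popular", "nodeScore": 11.05},
    {"node": "niche", "nodeScore": 37.67},
    {"node": "other", "nodeScore": 20.0},
]
for t in top(2, scored):
    print(t["node"], t["nodeScore"])
```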

  was:
The scoreNodes Streaming Expression is another *GraphExpression*. It will 
decorate a gatherNodes expression and use a tf-idf scoring algorithm to score
the nodes.

The gatherNodes expression only gathers nodes and aggregations. This is similar 
in nature to tf in search ranking, where the number of times a node appears in 
the traversal represents the tf. But this skews recommendations towards nodes 
that appear frequently in the index.

Using the idf for each node we can score each node as a function of tf and idf. 
This will provide a boost to nodes that appear less frequently in the index. 

The scoreNodes expression will gather the idfs from the shards for each node
emitted by the underlying gatherNodes expression. It will then assign the score 
to each node. 

The computed score will be added to each node in the *nodeScore* field. The 
docFreq of the node across the entire collection will be added to each node in 
the *docFreq* field. Other streaming expressions can then perform a ranking 
based on the nodeScore or compute their own score using the docFreq.

proposed syntax:
{code}
top(n="10",
  sort="nodeScore desc",
  scoreNodes(gatherNodes(...))) 
{code}

> Add scoreNodes Streaming Expression
> ---
>
> Key: SOLR-9193
> URL: https://issues.apache.org/jira/browse/SOLR-9193
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Fix For: 6.2
>
> Attachments: SOLR-9193.patch
>
>
> The scoreNodes Streaming Expression is another *GraphExpression*. It will 
> decorate a gatherNodes expression and use a tf-idf scoring algorithm to score 
> the nodes.
> The gatherNodes expression only gathers nodes and aggregations. This is 
> similar in nature to tf in search ranking, where the number of times a node 
> appears in the traversal represents the tf. But this skews recommendations 
> towards nodes that appear frequently in the index.
> Using the idf for each node we can score each node as a function of tf-idf. 
> This will provide a boost to nodes that appear less frequently in the index. 
> The scoreNodes expression will gather the idfs from the shards for each node
> emitted by the underlying gatherNodes expression. It will then assign the 
> score to each node. 
> The computed score will be added to each node in the *nodeScore* field. The 
> docFreq of the node across the entire collection will be added to each node 
> in the *docFreq* field. Other streaming expressions can then perform a 
> ranking based on the nodeScore or compute their own score using the docFreq.
> proposed syntax:
> {code}
> top(n="10",
>   sort="nodeScore desc",
>   scoreNodes(gatherNodes(...))) 
> {code}






[jira] [Updated] (SOLR-9193) Add scoreNodes Streaming Expression

2016-07-05 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9193:
-
Description: 
The scoreNodes Streaming Expression is another *GraphExpression*. It will 
decorate a gatherNodes expression and use a tf-idf scoring algorithm to score
the nodes.

The gatherNodes expression only gathers nodes and aggregations. This is similar 
in nature to tf in search ranking, where the number of times a node appears in 
the traversal represents the tf. But this skews recommendations towards nodes 
that appear frequently in the index.

Using the idf for each node we can score each node as a function of tf and idf. 
This will provide a boost to nodes that appear less frequently in the index. 

The scoreNodes expression will gather the idfs from the shards for each node
emitted by the underlying gatherNodes expression. It will then assign the score 
to each node. 

The computed score will be added to each node in the *nodeScore* field. The 
docFreq of the node across the entire collection will be added to each node in 
the *docFreq* field. Other streaming expressions can then perform a ranking 
based on the nodeScore or compute their own score using the docFreq.

proposed syntax:
{code}
top(n="10",
  sort="nodeScore desc",
  scoreNodes(gatherNodes(...))) 
{code}

  was:
The scoreNodes Streaming Expression is another *GraphExpression*. It will 
decorate a gatherNodes expression and use a tf-idf scoring algorithm to score
the nodes.

The gatherNodes expression only gathers nodes and aggregations. This is similar 
in nature to tf in search ranking, where the number of times a node appears in 
the traversal represents the tf. But this skews recommendations towards nodes 
that appear frequently in the index.

Using the idf for each node we can score each node as a function of tf and idf. 
This will provide a boost to nodes that appear less frequently in the index. 

The scoreNodes expression will gather the idfs from the shards for each node
emitted by the underlying gatherNodes expression. It will then assign the score 
to each node. 

The computed score will be added to each node in the *nodeScore* field. The 
docFreq of the node across the entire collection will be added to each node in 
the *nodeFreq* field. Other streaming expressions can then perform a ranking 
based on the nodeScore or compute their own score using the nodeFreq.

proposed syntax:
{code}
top(n="10",
  sort="nodeScore desc",
  scoreNodes(gatherNodes(...))) 
{code}

> Add scoreNodes Streaming Expression
> ---
>
> Key: SOLR-9193
> URL: https://issues.apache.org/jira/browse/SOLR-9193
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Fix For: 6.2
>
> Attachments: SOLR-9193.patch
>
>
> The scoreNodes Streaming Expression is another *GraphExpression*. It will 
> decorate a gatherNodes expression and use a tf-idf scoring algorithm to score
> the nodes.
> The gatherNodes expression only gathers nodes and aggregations. This is 
> similar in nature to tf in search ranking, where the number of times a node 
> appears in the traversal represents the tf. But this skews recommendations 
> towards nodes that appear frequently in the index.
> Using the idf for each node we can score each node as a function of tf and 
> idf. This will provide a boost to nodes that appear less frequently in the 
> index. 
> The scoreNodes expression will gather the idfs from the shards for each node
> emitted by the underlying gatherNodes expression. It will then assign the 
> score to each node. 
> The computed score will be added to each node in the *nodeScore* field. The 
> docFreq of the node across the entire collection will be added to each node 
> in the *docFreq* field. Other streaming expressions can then perform a 
> ranking based on the nodeScore or compute their own score using the docFreq.
> proposed syntax:
> {code}
> top(n="10",
>   sort="nodeScore desc",
>   scoreNodes(gatherNodes(...))) 
> {code}






[jira] [Updated] (SOLR-9193) Add scoreNodes Streaming Expression

2016-06-30 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9193:
-
Attachment: SOLR-9193.patch

First patch with the scoreNodes expression working. A simple test case is
included.

This builds on the TermsComponent work in SOLR-9243.

> Add scoreNodes Streaming Expression
> ---
>
> Key: SOLR-9193
> URL: https://issues.apache.org/jira/browse/SOLR-9193
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Fix For: 6.2
>
> Attachments: SOLR-9193.patch
>
>
> The scoreNodes Streaming Expression is another *GraphExpression*. It will 
> decorate a gatherNodes expression and use a tf-idf scoring algorithm to score
> the nodes.
> The gatherNodes expression only gathers nodes and aggregations. This is 
> similar in nature to tf in search ranking, where the number of times a node 
> appears in the traversal represents the tf. But this skews recommendations 
> towards nodes that appear frequently in the index.
> Using the idf for each node we can score each node as a function of tf and 
> idf. This will provide a boost to nodes that appear less frequently in the 
> index. 
> The scoreNodes expression will gather the idfs from the shards for each node
> emitted by the underlying gatherNodes expression. It will then assign the 
> score to each node. 
> The computed score will be added to each node in the *nodeScore* field. The 
> docFreq of the node across the entire collection will be added to each node 
> in the *nodeFreq* field. Other streaming expressions can then perform a 
> ranking based on the nodeScore or compute their own score using the nodeFreq.
> proposed syntax:
> {code}
> top(n="10",
>   sort="nodeScore desc",
>   scoreNodes(gatherNodes(...))) 
> {code}






[jira] [Updated] (SOLR-9193) Add scoreNodes Streaming Expression

2016-06-23 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-9193:
-
Summary: Add scoreNodes Streaming Expression  (was: Add the scoreNodes 
Streaming Expression)

> Add scoreNodes Streaming Expression
> ---
>
> Key: SOLR-9193
> URL: https://issues.apache.org/jira/browse/SOLR-9193
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Fix For: 6.2
>
>
> The scoreNodes Streaming Expression is another *GraphExpression*. It will 
> decorate a gatherNodes expression and use a tf-idf scoring algorithm to score
> the nodes.
> The gatherNodes expression only gathers nodes and aggregations. This is 
> similar in nature to tf in search ranking, where the number of times a node 
> appears in the traversal represents the tf. But this skews recommendations 
> towards nodes that appear frequently in the index.
> Using the idf for each node we can score each node as a function of tf and 
> idf. This will provide a boost to nodes that appear less frequently in the 
> index. 
> The scoreNodes expression will gather the idfs from the shards for each node
> emitted by the underlying gatherNodes expression. It will then assign the 
> score to each node. 
> The computed score will be added to each node in the *nodeScore* field. The 
> docFreq of the node across the entire collection will be added to each node 
> in the *nodeFreq* field. Other streaming expressions can then perform a 
> ranking based on the nodeScore or compute their own score using the nodeFreq.
> proposed syntax:
> {code}
> top(n="10",
>   sort="nodeScore desc",
>   scoreNodes(gatherNodes(...))) 
> {code}


