[ 
https://issues.apache.org/jira/browse/SOLR-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-8925:
---------------------------------
    Comment: was deleted

(was: bq. When using the scatter parameter, will the nodes be marked with the 
group they fall into? What if a node falls into multiple groups (kinda related 
to #1 above)?

Nodes will be marked with the level of the traversal and the collection they 
came from.


bq. If gatherNodes is doing a 'join' between friends and articles I'd expect 
the tuple to be a join of the tuple found in articles and the tuple found in 
friends. But if "The inner gatherNodes() expression then emits the friend 
Tuples" I believe this is more of an intersect, i.e., give me tuples in friends 
which also appear in articles, using the author->user equalitor. Though I guess 
it would be returning tuples from both the left and right streams, whereas a 
standard intersect only returns tuples from the left stream. That said, it's 
not joining those tuples together.

It's a join, but not like the other join expressions, which are done with a 
single search for each of the left and right streams. This is a parallel, 
batched nested loop join, so I'm not sure it can be expressed quite like the 
other joins. You can see the implementation in the ShortestPathStream; looking 
at it might spark some ideas for how to express this. I'm open to ideas.
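A minimal Python sketch of the batched nested loop join pattern described above, not the ShortestPathStream code. Assumptions: a collection is an in-memory list of dicts, and the hypothetical helper `search_by_terms` stands in for a single Solr query on the join field. The left-hand tuples are read in fixed-size batches and each batch costs one lookup against the right-hand collection, instead of one search per left tuple; the parallel part (spreading batches across workers) is left out.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Set

Doc = Dict[str, str]

def search_by_terms(collection: List[Doc], field: str, values: Set[str],
                    query_log: List[Set[str]]) -> List[Doc]:
    """Hypothetical stand-in for one Solr query matching any of `values` on `field`."""
    query_log.append(set(values))  # record each lookup so the batching is visible
    return [doc for doc in collection if doc.get(field) in values]

def batches(it: Iterable[Doc], size: int) -> Iterator[List[Doc]]:
    """Yield successive lists of at most `size` items from `it`."""
    it = iter(it)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def batched_nested_loop_join(left: Iterable[Doc], right: List[Doc],
                             from_field: str, to_field: str,
                             batch_size: int, query_log: List[Set[str]]) -> Iterator[Doc]:
    """Emit right-hand docs whose `to_field` equals `from_field` of some left doc.

    The outer loop runs over batches of left tuples; the inner "loop" is a single
    lookup per batch, so N left tuples cost about N / batch_size queries.
    """
    for chunk in batches(left, batch_size):
        keys = {t[from_field] for t in chunk}
        yield from search_by_terms(right, to_field, keys, query_log)

friends = [
    {"user": "alice", "friend": "bob"},
    {"user": "alice", "friend": "carol"},
    {"user": "dave", "friend": "erin"},
    {"user": "zoe", "friend": "yan"},
]
authors = [{"author": "alice"}, {"author": "dave"}, {"author": "zoe"}]

log: List[Set[str]] = []
joined = list(batched_nested_loop_join(authors, friends, "author", "user", 2, log))
print([d["friend"] for d in joined])  # ['bob', 'carol', 'erin', 'yan']
print(len(log))                        # 2 lookups for 3 left tuples
```

Larger batches mean fewer round trips per hop at the cost of bigger queries; the batch size here is arbitrary.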


bq. What could one do if they wished to build a graph using a subset of data in 
the friends collection? Can they apply a filter on friends as part of the 
gatherNodes function? Perhaps they could be allowed to add fq filters.

The fq and fl params will be supported. These will allow filtering edges and 
listing/aggregating edge properties.)
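The exact fq/fl semantics are not spelled out in the reply, so the following is only a hypothetical Python sketch of one traversal hop with those two ideas. A predicate function stands in for an fq filter (edges that fail it are skipped), and a list of field names stands in for fl (their values are listed on each gathered node, together with an edge count). The function name `gather_with_filters` and the `since` field are invented for illustration.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Set

Doc = Dict[str, Any]

def gather_with_filters(edges: Iterable[Doc], frontier: Set[str], walk_to: str,
                        gather: str, fq: Optional[Callable[[Doc], bool]] = None,
                        fl: Iterable[str] = ()) -> Dict[str, Dict[str, Any]]:
    """One hypothetical traversal hop with fq-style filtering and fl-style edge fields.

    fq: predicate standing in for a Solr filter query; edges that fail it are skipped.
    fl: edge fields whose values are listed on each gathered node.
    Returns gathered node -> {"count": number of matching edges, <field>: [values]}.
    """
    fields: List[str] = list(fl)
    nodes: Dict[str, Dict[str, Any]] = {}
    for edge in edges:
        if edge.get(walk_to) not in frontier:
            continue  # edge is not reachable from the current frontier
        if fq is not None and not fq(edge):
            continue  # edge filtered out, like a non-matching fq
        key = edge[gather]
        if key not in nodes:
            nodes[key] = {"count": 0, **{f: [] for f in fields}}
        nodes[key]["count"] += 1
        for f in fields:
            nodes[key][f].append(edge.get(f))
    return nodes

friends = [
    {"user": "alice", "friend": "bob", "since": 2010},
    {"user": "dave", "friend": "bob", "since": 2015},
    {"user": "alice", "friend": "carol", "since": 2001},
    {"user": "dave", "friend": "erin", "since": 2014},
]
nodes = gather_with_filters(friends, {"alice", "dave"}, "user", "friend",
                            fq=lambda e: e["since"] >= 2010, fl=["since", "user"])
print(nodes["bob"])   # {'count': 2, 'since': [2010, 2015], 'user': ['alice', 'dave']}
print(sorted(nodes))  # ['bob', 'erin']  (carol's only edge fails the filter)
```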

> Add gatherNodes Streaming Expression to support breadth first traversals
> ------------------------------------------------------------------------
>
>                 Key: SOLR-8925
>                 URL: https://issues.apache.org/jira/browse/SOLR-8925
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>             Fix For: 6.1
>
>
> The gatherNodes Streaming Expression is a flexible, general-purpose 
> breadth-first graph traversal. It uses the same parallel join under the 
> covers as SOLR-8888, but is much more generalized and can be used for a wide 
> range of use cases.
> Sample syntax:
> {code}
> gatherNodes(friends,
>             gatherNodes(friends,
>                         search(articles, q="body:(query 1)", fl="author"),
>                         walk="author->user",
>                         gather="friend"),
>             walk="friend->user",
>             gather="friend",
>             scatter="roots, branches, leaves")
> {code}
> The expression above is evaluated as follows:
> 1) The inner search() expression is evaluated on the *articles* collection, 
> emitting a stream of Tuples with the author field populated.
> 2) The inner gatherNodes() expression reads the Tuples from the search() 
> stream and traverses to the *friends* collection by performing a distributed 
> join between articles.author and friends.user. It gathers the value of the 
> *friend* field during the join.
> 3) The inner gatherNodes() expression then emits the *friend* Tuples. By 
> default gatherNodes emits only the leaves, which in this case are the 
> *friend* Tuples.
> 4) The outer gatherNodes() expression reads the *friend* Tuples and traverses 
> the *friends* collection again, this time joining the *friend* values emitted 
> in step 3 to friends.user. This collects the friends of friends.
> 5) The outer gatherNodes() expression emits the entire graph that was 
> collected, as controlled by the "scatter" parameter. In the example the 
> *roots* are the authors, the *branches* are the authors' friends and the 
> *leaves* are the friends of friends.
> This traversal is fully distributed and works across collections.
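The five steps can be simulated in a few lines of Python to show how levels map onto the scatter groups. This is a single-process sketch under stated assumptions, not the Solr implementation: collections are in-memory lists of dicts, the roots are the author values the search() would return, each node is marked with its level and source collection (as the earlier reply describes), and, as a choice of this sketch only, a node already reached at a shallower level is not emitted again.

```python
from typing import Dict, List, Set

Doc = Dict[str, str]

def gather_hop(collection: List[Doc], frontier: Set[str], walk_to: str, gather: str) -> Set[str]:
    """One hop: docs whose `walk_to` value is in `frontier` contribute their `gather` value."""
    return {doc[gather] for doc in collection if doc.get(walk_to) in frontier}

def traverse(roots: Set[str], roots_collection: str,
             collection: List[Doc], collection_name: str,
             walk_to: str, gather: str, hops: int,
             scatter: Set[str]) -> List[Dict[str, object]]:
    """Breadth-first traversal marking each node with its level and source collection.

    Level 0 holds the roots, the last level holds the leaves and anything in
    between are branches. `scatter` selects which of those groups are emitted.
    """
    levels: List[Set[str]] = [set(roots)]
    seen: Set[str] = set(roots)
    for _ in range(hops):
        # Sketch assumption: nodes seen at a shallower level are not revisited.
        frontier = gather_hop(collection, levels[-1], walk_to, gather) - seen
        seen |= frontier
        levels.append(frontier)

    out: List[Dict[str, object]] = []
    for depth, nodes in enumerate(levels):
        group = "roots" if depth == 0 else "leaves" if depth == hops else "branches"
        if group not in scatter:
            continue
        source = roots_collection if depth == 0 else collection_name
        for node in sorted(nodes):
            out.append({"node": node, "level": depth, "collection": source, "group": group})
    return out

friends = [
    {"user": "alice", "friend": "bob"},
    {"user": "alice", "friend": "carol"},
    {"user": "bob", "friend": "dave"},
    {"user": "carol", "friend": "alice"},  # cycle back to the root
    {"user": "carol", "friend": "erin"},
]
# Roots: authors returned by the hypothetical search on the articles collection.
graph = traverse({"alice"}, "articles", friends, "friends",
                 walk_to="user", gather="friend", hops=2,
                 scatter={"roots", "branches", "leaves"})
for n in graph:
    print(n)
leaves_only = traverse({"alice"}, "articles", friends, "friends",
                       walk_to="user", gather="friend", hops=2, scatter={"leaves"})
```

Passing only `{"leaves"}` mirrors the default behaviour described in step 3, where only the last level is emitted.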
> Like all streaming expressions, gatherNodes can be combined with other 
> streaming expressions. For example, the following expression uses 
> hashInnerJoin to intersect the networks of friends rooted at the authors 
> found by two different queries:
> {code}
> hashInnerJoin(
>     gatherNodes(friends,
>                 gatherNodes(friends,
>                             search(articles, q="body:(queryA)", fl="author"),
>                             walk="author->user",
>                             gather="friend"),
>                 walk="friend->user",
>                 gather="friend",
>                 scatter="branches, leaves"),
>     gatherNodes(friends,
>                 gatherNodes(friends,
>                             search(articles, q="body:(queryB)", fl="author"),
>                             walk="author->user",
>                             gather="friend"),
>                 walk="friend->user",
>                 gather="friend",
>                 scatter="branches, leaves"),
>     on="friend"
> )
> {code}
>   
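In general, a hash inner join reads one stream fully into a hash table keyed on the `on` field, then streams the other side and emits a merged tuple for every key match; applied to two friend networks, the output is the set of friends present in both. A minimal Python sketch of that technique, with in-memory lists standing in for the two gatherNodes() streams (this is not Solr's hashInnerJoin code, and the merge rule for clashing field names is a choice of the sketch):

```python
from typing import Dict, Iterable, Iterator, List

Doc = Dict[str, object]

def hash_inner_join(left: Iterable[Doc], right: Iterable[Doc], on: str) -> Iterator[Doc]:
    """Hash the right side by `on`, then stream the left side and emit merged matches."""
    table: Dict[object, List[Doc]] = {}
    for doc in right:
        table.setdefault(doc[on], []).append(doc)
    for doc in left:
        for match in table.get(doc[on], []):
            merged = dict(match)
            merged.update(doc)  # left-hand values win on field-name clashes in this sketch
            yield merged

# Hypothetical outputs of the two gatherNodes() expressions.
network_a = [{"friend": "bob"}, {"friend": "carol"}, {"friend": "dave"}]
network_b = [{"friend": "carol"}, {"friend": "erin"}, {"friend": "dave"}]
common = [d["friend"] for d in hash_inner_join(network_a, network_b, on="friend")]
print(common)  # ['carol', 'dave']
```

Only the hashed side has to fit in memory; the other side is consumed as a stream.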



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
