[ https://issues.apache.org/jira/browse/SOLR-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232800#comment-15232800 ]
Dennis Gove commented on SOLR-8925:
-----------------------------------

The order in the walk parameter might be confusing.
{code}
walk="author->user",
{code}
In other expressions where we're checking equality between two streams, we use a standard of firstStreamField = secondStreamField. In gatherNodes, the field on the right appears to go with the first stream while the field on the left goes with the second stream.

I'm not saying I dislike the author->user structure, because I do like it, but using the collection as the first param might lead to confusion.

> Add gatherNodes Streaming Expression to support breadth first traversals
> ------------------------------------------------------------------------
>
>                 Key: SOLR-8925
>                 URL: https://issues.apache.org/jira/browse/SOLR-8925
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>             Fix For: 6.1
>
>
> The gatherNodes Streaming Expression is a flexible general purpose breadth
> first graph traversal. It uses the same parallel join under the covers as
> (SOLR-8888) but is much more generalized and can be used for a wide range of
> use cases.
> Sample syntax:
> {code}
> gatherNodes(
>     friends,
>     gatherNodes(
>         friends,
>         search(articles, q="body:(query 1)", fl="author"),
>         walk="author->user",
>         gather="friend"),
>     walk="friend->user",
>     gather="friend",
>     scatter="roots, branches, leaves"
> )
> {code}
> The expression above is evaluated as follows:
> 1) The inner search() expression is evaluated on the *articles* collection,
> emitting a Stream of Tuples with the author field populated.
> 2) The inner gatherNodes() expression reads the Tuples from the search()
> stream and traverses to the *friends* collection by performing a distributed
> join between the articles.author and friends.user fields. It gathers the
> value from the *friend* field during the join.
> 3) The inner gatherNodes() expression then emits the *friend* Tuples.
> By default the gatherNodes function emits only the leaves, which in this
> case are the *friend* Tuples.
> 4) The outer gatherNodes() expression reads the *friend* Tuples and traverses
> again in the *friends* collection, this time performing the join between the
> *friend* Tuples emitted in step 3 and the friends.user field. This collects
> the friends of friends.
> 5) The outer gatherNodes() expression emits the entire graph that was
> collected. This is controlled by the "scatter" parameter. In the example the
> *root* nodes are the authors, the *branches* are the authors' friends and the
> *leaves* are the friends of friends.
> This traversal is fully distributed and cross collection.
> Like all streaming expressions, the gatherNodes expression can be combined
> with other streaming expressions. For example, the following expression uses
> a hashInnerJoin to intersect the networks of friends rooted at authors found
> with different queries:
> {code}
> hashInnerJoin(
>     gatherNodes(friends,
>         gatherNodes(friends,
>             search(articles, q="body:(queryA)", fl="author"),
>             walk="author->user",
>             gather="friend"),
>         walk="friend->user",
>         gather="friend",
>         scatter="branches, leaves"),
>     gatherNodes(friends,
>         gatherNodes(friends,
>             search(articles, q="body:(queryB)", fl="author"),
>             walk="author->user",
>             gather="friend"),
>         walk="friend->user",
>         gather="friend",
>         scatter="branches, leaves"),
>     on="friend"
> )
> {code}
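
For readers who want to see the two-level traversal from the quoted description spelled out, here is a rough in-memory sketch in Python. It is not the Solr implementation: the helper gather_nodes(), the sample documents, and the de-duplication of gathered values are all assumptions made for illustration, and the join is a plain loop rather than a distributed one. It also shows the field order discussed in the comment: the left side of walk names a field on the incoming tuples (the second parameter), and the right side names a field in the collection (the first parameter).

```python
def gather_nodes(collection, tuples, walk, gather):
    """One breadth-first step (hypothetical helper, not Solr's API).

    Joins the incoming tuples to `collection` using the walk spec
    "tupleField->collectionField" and emits the gathered field values.
    """
    from_field, to_field = (part.strip() for part in walk.split("->"))
    # Values on the incoming stream that we traverse from.
    frontier = {t[from_field] for t in tuples if from_field in t}
    seen = set()
    emitted = []
    for doc in collection:
        if doc.get(to_field) in frontier:
            node = doc.get(gather)
            # Assumption: each gathered node is emitted once.
            if node is not None and node not in seen:
                seen.add(node)
                emitted.append({gather: node})
    return emitted


# Made-up sample data standing in for the two collections.
articles = [
    {"author": "alice", "body": "query 1"},
    {"author": "bob", "body": "something else"},
]
friends = [
    {"user": "alice", "friend": "carol"},
    {"user": "alice", "friend": "dave"},
    {"user": "carol", "friend": "erin"},
    {"user": "dave", "friend": "erin"},
    {"user": "bob", "friend": "frank"},
]

# Step 1: the search() stand-in emits tuples with the author field.
roots = [{"author": a["author"]} for a in articles if "query 1" in a["body"]]
# Steps 2-3: inner traversal, articles.author -> friends.user, gather friend.
branches = gather_nodes(friends, roots, walk="author->user", gather="friend")
# Step 4: outer traversal, friend -> friends.user, gather friend again.
leaves = gather_nodes(friends, branches, walk="friend->user", gather="friend")

print(roots, branches, leaves)
```

With this sample data the roots are the matching authors, the branches are their friends, and the leaves are the friends of friends, which mirrors what scatter="roots, branches, leaves" is described as emitting.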