[jira] [Comment Edited] (SOLR-7543) Create GraphQuery that allows graph traversal as a query operator.

2015-05-19 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550031#comment-14550031
 ] 

Joel Bernstein edited comment on SOLR-7543 at 5/19/15 8:29 AM:
---

Thanks for the nice contribution [~kwatters]!

A couple of thoughts on the discussion, particularly on the *general* vs.
*specific* use case. I think this ticket covers a useful specific use case,
particularly for access control, and I suspect people will find other
interesting uses for this type of graph query. It works for non-distributed
graph traversals, which is where we can do all kinds of low-level things to
improve performance.

But it's also an opportunity to open the discussion about the generic use case,
which will be distributed graph queries. [~steff1193] mentions a Titan
integration, which would be very useful. But also, as [~dgove1] mentions,
Streaming Expressions provide us with an elegant framework for all kinds of
parallel computing tasks. I think it's worth exploring how to model parallel
graph joins using the Streaming Expression language and the Streaming API.


was (Author: joel.bernstein):
Thanks for the nice contribution [~kwatters]!

A couple of thoughts on the discussion particularly on *general* VS *specific* 
use case. I think this ticket covers a useful specific usecase, particularly 
for access control. And I suspect people will find other interesting uses for 
this type of graph query. It works for non-distributed graph traversals which 
is where we can do all kinds of low level things to improve performance. 

But it's also an opportunity to open the discussion about the generic use case 
which will be distributed graph queries. [~steff1193] mentions a Titan 
integration which would be very useful. But also as [~dgove1] mentions 
Streaming Expression provides us with an elegant framework for all kinds of 
parallel computing tasks. I think it's worth exploring how to model parallel 
graph joins using the Streaming Expression language and the Streaming API.

> Create GraphQuery that allows graph traversal as a query operator.
> --
>
> Key: SOLR-7543
> URL: https://issues.apache.org/jira/browse/SOLR-7543
> Project: Solr
> Issue Type: New Feature
> Components: search
> Reporter: Kevin Watters
> Priority: Minor
>
> I have a GraphQuery that I implemented a long time back that allows a user to
> specify a "startQuery" to identify which documents to start graph traversal
> from. It then gathers up the edge ids for those documents and optionally
> applies an additional filter. The query is then re-executed repeatedly
> until no new edge ids are identified. I am currently hosting this code at
> https://github.com/kwatters/solrgraph and I would like to work with the
> community to get some feedback and ultimately get it committed back in as a
> Lucene query.
>
> Here's a bit more of a description of the parameters for the query / graph
> traversal:
> q - the initial start query that identifies the universe of documents to
> start traversal from.
> fromField - the name of the field that contains the node id.
> toField - the name of the field that contains the edge id(s).
> traversalFilter - an additional query that can be supplied to limit
> the scope of graph traversal to just the edges that satisfy the
> traversalFilter query.
> maxDepth - integer specifying how deep the breadth-first search should go.
> returnStartNodes - boolean to determine if the documents that matched the
> original "q" should be returned as part of the graph.
> onlyLeafNodes - boolean that filters the graph query to only return
> documents/nodes that have no edges.
>
> We identify a set of documents with "q" as any arbitrary Lucene query. It
> will collect the values in the fromField, create an OR query with those
> values, optionally apply an additional constraint from the "traversalFilter",
> and walk the result set until no new edges are detected. Traversal can also
> be stopped at N hops away as defined with the maxDepth. This is a BFS
> (Breadth First Search) algorithm. Cycle detection is done by not revisiting
> the same document for edge extraction.
> This query operator does not keep track of how you arrived at the document,
> but only that the traversal did arrive at the document.
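
The traversal described in the quoted issue can be sketched roughly as follows. This is only an illustration in plain Python over an in-memory list of documents, not the solrgraph code and not a Lucene query. The names `graph_query`, `node_field`, `edge_field` and the sample `DOCS` are hypothetical. Treating a negative `max_depth` as "unbounded", and dropping every start document when `return_start_nodes` is false, are assumptions made for the sketch.

```python
from typing import Any, Callable, Dict, List, Optional, Set

Doc = Dict[str, Any]


def graph_query(
    docs: List[Doc],
    start: Callable[[Doc], bool],          # stands in for the start query "q"
    node_field: str,                       # hypothetical: field holding the node id
    edge_field: str,                       # hypothetical: field holding the edge id(s)
    traversal_filter: Optional[Callable[[Doc], bool]] = None,
    max_depth: int = -1,                   # assumption: negative means no limit
    return_start_nodes: bool = True,
    only_leaf_nodes: bool = False,
) -> Set[str]:
    # Lookup by node id; a stand-in for a term lookup in a real index.
    by_node = {d[node_field]: d for d in docs}

    start_ids = {d[node_field] for d in docs if start(d)}
    visited = set(start_ids)   # documents already used for edge extraction
    frontier = set(start_ids)
    depth = 0

    while frontier and (max_depth < 0 or depth < max_depth):
        # Gather the edge ids of every document in the current frontier.
        edge_ids: Set[str] = set()
        for node_id in frontier:
            edge_ids.update(by_node[node_id].get(edge_field, []))

        # The "OR query" over those ids, with the optional traversal filter.
        next_frontier: Set[str] = set()
        for edge_id in edge_ids:
            doc = by_node.get(edge_id)
            if doc is None or edge_id in visited:
                continue  # unknown id, or already visited (cycle detection)
            if traversal_filter is not None and not traversal_filter(doc):
                continue
            next_frontier.add(edge_id)

        visited |= next_frontier
        frontier = next_frontier
        depth += 1

    result = set(visited)
    if not return_start_nodes:
        result -= start_ids
    if only_leaf_nodes:
        result = {n for n in result if not by_node[n].get(edge_field)}
    return result


# Hypothetical sample data: a -> b, c; b -> d; c -> a (a cycle); e is unreachable.
DOCS = [
    {"id": "a", "edges": ["b", "c"]},
    {"id": "b", "edges": ["d"]},
    {"id": "c", "edges": ["a"]},
    {"id": "d", "edges": []},
    {"id": "e", "edges": ["a"]},
]
```

Like the operator described above, the sketch only records which documents were reached, not the path taken to reach them.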



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-7543) Create GraphQuery that allows graph traversal as a query operator.

2015-05-15 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546062#comment-14546062
 ] 

Dennis Gove edited comment on SOLR-7543 at 5/15/15 7:39 PM:


I'm with you on wanting to keep the memory usage as low as possible; I thought
maybe you had that info hanging around already. Either way, I think this
syntax might lower the barrier to entry, especially if people are already
using streaming aggregation for other things.


was (Author: dpgove):
I'm with on the wanting to keep the memory usage as low as possible - I thought 
maybe you had that info hanging around already. In either case, I think this 
syntax might lower the bar to entry for usage, especially if people are already 
using streaming aggregation for other things. 




[jira] [Comment Edited] (SOLR-7543) Create GraphQuery that allows graph traversal as a query operator.

2015-05-15 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545858#comment-14545858
 ] 

Per Steffensen edited comment on SOLR-7543 at 5/15/15 5:52 PM:
---

Sounds interesting. I can't help thinking that this will help users do only one
particular graph'ish search, but there are millions of other graph'ish searches
one might want to do. The solution here might be too specific. A while back I
wrote a Solr indexing-backend for the Titan graph database. We can do a
storage-backend as well. Putting a full-blown graph database on top of Solr (by
supporting indexing- and potentially storage-backends for e.g. Titan) might be
the way to go instead, so that we will not end up with lots and lots of very
specific graph-search query-parsers/resolvers. And this way you will get all
the other cool stuff from a full-blown graph database, e.g. I liked playing
with Titan's REPL. Just a thought.


was (Author: steff1193):
Sounds interesting. I can't help thinking that this will help users doing one 
particular graph'ish search. But there are millions of other graph'ish searches 
one might want to do. The solution here might be too specific. A while back I 
wrote a indexing-backend for the Titan graph database. Putting a full blown 
graph database on top of Solr might be the way to go instead, so that we will 
not end up with lots and lots of very specific graph-search 
query-parsers/resolvers. Just a thought




[jira] [Comment Edited] (SOLR-7543) Create GraphQuery that allows graph traversal as a query operator.

2015-05-14 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544713#comment-14544713
 ] 

Dennis Gove edited comment on SOLR-7543 at 5/15/15 1:20 AM:


For interface/semantics, I think this might be able to benefit from the 
Expression stuff recently added for streams (SOLR-7377). With that, you could 
do something like

{code}
graph(
  root=search(collection1, q="", fl=""),
  traverse=search(collection1, q="", fl=""),
  on="parent.field=child.field",
  maxDepth=5,
  returnRoot=true,
  returnOnlyLeaf=false
)
{code}

This would also allow you to do other things, like make use of stream merging,
uniquing, etc.

It would even allow for tree traversal across multiple collections.


was (Author: dpgove):
For interface/semantics, I think this might be able to benefit from the 
Expression stuff recently added for streams (SOLR-7377). With that, you could 
do something like

{code}
graph(root=search(collection1, q="", fl=""), 
traverse=search(collection1, q="",fl=""), 
on="parent.field=child.field", maxDepth=5, returnRoot=true, 
returnOnlyLeaf=false)
{code}

This would also allow you to do other things like make use of stream merging, 
uniquing, etc

Would even allow for tree traversal across collections.




[jira] [Comment Edited] (SOLR-7543) Create GraphQuery that allows graph traversal as a query operator.

2015-05-14 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544713#comment-14544713
 ] 

Dennis Gove edited comment on SOLR-7543 at 5/15/15 1:19 AM:


For interface/semantics, I think this might be able to benefit from the 
Expression stuff recently added for streams (SOLR-7377). With that, you could 
do something like

{code}
graph(
  root=search(collection1, q="", fl=""),
  traverse=search(collection1, q="", fl=""),
  on="parent.field=child.field",
  maxDepth=5,
  returnRoot=true,
  returnOnlyLeaf=false
)
{code}

This would also allow you to do other things, like make use of stream merging,
uniquing, etc.

It would even allow for tree traversal across collections.


was (Author: dpgove):
For interface/semantics, I think this might be able to benefit from the 
Expression stuff recently added for streams (SOLR-7377). With that, you could 
do something like

{code}
graph(root=search(collection1, q="", fl=""), 
traverse=search(collection1, q="",fl=""), 
on="parent.field=child.field", maxDepth=5, returnRoot=true, 
returnOnlyLeaf=false)
{code}

This would also allow you to do other things like make use of stream merging, 
uniquing, etc
