Re: [Neo4j] Aggregate queries
Reporting back, I came up with the following query that does what I need:

    m = [:];
    g.idx('user').get('key', '%query%*')._()[0..5]
      .sideEffect { x = it.key }
      .out.in.uniqueObject.loop(3) { it.loops < 2 }
      .sideEffect { if (!m[x]) { m[x] = 0 }; m[x]++ } >> -1;
    m;

* I didn't use mapWithDefault because I'm sending via the REST API and want to be able to see the values of the map in the result: {1=1, 2=1, 3=8, 4=1}
* The out.in is a feature of our data (each user node is always connected to other user nodes via an intermediary node).
* The [0..5] allows me to page through the index to avoid one massive query.
* I'm thinking perhaps the aggregate step would be useful, but I haven't figured out how to use it yet.
* I will unroll the loop as Marko suggested when I get to optimizing it.

Thanks for all the help,
Xavier

On Thu, Sep 8, 2011 at 9:32 AM, Xavier Shay <xavier+ne...@squareup.com> wrote:
> Thanks everyone, this gives me plenty to work with. Will report back.
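For readers skimming the archive, the counting logic in that last sideEffect can be sketched in plain Python (illustrative only; the Gremlin above needs a live graph, and the sample data here is made up to reproduce the {1=1, 2=1, 3=8, 4=1} shape of the REST result):

```python
# Sketch of the manual map-initialization in the sideEffect step: a plain
# dict is built up by hand so the result serializes as a simple map over
# REST, rather than relying on a map type with a default value.
def count_reachable(keys):
    m = {}
    for key in keys:
        if key not in m:       # same role as `if (!m[x]) { m[x] = 0 }`
            m[key] = 0
        m[key] += 1            # same role as `m[x]++`
    return m

print(count_reachable([1, 2, 4] + [3] * 8))
```

This is the same tallying that groupCount performs; the explicit guard just keeps the result a plain, inspectable map.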
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Aggregate queries
Hey,

> m = [:]; g.idx('user').get('key', '%query%*')._()[0..5].sideEffect { x = it.key }.out.in.uniqueObject.loop(3) { it.loops < 2 }.sideEffect { if (!m[x]) { m[x] = 0 }; m[x]++ } >> -1; m;

Tips:

1. Do g.idx('user').get('key', '%query%*')[0..5] -- the _() is not needed.
2. You can use single quotes: g.idx('user').get('key', '%query%*').
3. Use groupCount(m) instead of that last sideEffect.

> The [0..5] allows me to page through the index to avoid one massive query.
> I'm thinking perhaps the aggregate step would be useful, but I haven't figured out how to use it yet.

aggregate is just a way to collect everything up at a single step before moving on to the next step. Typical use case:

    g.v(1).out('friend').aggregate(x).out('friend').except(x)

-- my friends' friends who are not my friends.

> I will unroll the loop as Marko suggested when I get to optimizing it.

Cool. Realize that Gremlin 1.3 (coming out in a couple of weeks) is nearly 2-3x faster on most traversals, as we have done many, many performance optimizations. However, unrolling loops will still be faster than not doing so.

Good luck with your project. Keep the questions/thoughts coming,
Marko.

http://markorodriguez.com
Re: [Neo4j] Aggregate queries
Hi,

Thanks James. Here is how I would do it -- groupCount is not needed:

    g.idx(index_name)[[key:value]].both.loop(1){ it.loops < depth }.count()

Note: Be wary of this query. Make sure the branching factor of your graph is sufficiently small, or the depth to which you are exploring is sufficiently small. With a large branching factor and depth, you can easily touch everything in your graph if your graph has natural statistics ( http://en.wikipedia.org/wiki/Scale-free_network ).

Also, if you want to get fancy, what I like to do is unroll my loops to increase performance. Given that the while construct of your loop step is simply a depth bound, you can append an appropriate number of .both steps:

    traversal = g.idx(index_name)[[key:value]];
    for (i in 1..depth) { traversal = traversal.both; }
    traversal.count();

Finally, 'both' is for undirected traversals. Use 'out' for outgoing traversals (follow the direction of the arrows) and 'in' for incoming traversals. https://github.com/tinkerpop/gremlin/wiki/Gremlin-Steps

HTH,
Marko.

http://markorodriguez.com

Xavier Shay wrote:
> For all nodes in a particular index, how many other nodes are they connected to at depth X?
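The loop-unrolling idea translates directly to any traversal framework: a bounded loop over one step is equivalent to repeating that step a fixed number of times in a flat pipeline. A minimal Python sketch over a made-up adjacency map:

```python
# Sketch of loop unrolling: the bounded-loop traversal and the flat,
# unrolled traversal visit the same nodes. Graph data is invented.
graph = {1: [2, 3], 2: [4], 3: [4, 5], 4: [], 5: [1]}

def traverse_looped(g, start, depth):
    frontier = [start]
    steps = 0
    while steps < depth:               # analogue of loop(1){ it.loops < depth }
        frontier = [n for v in frontier for n in g[v]]
        steps += 1
    return frontier

def traverse_unrolled(g, start):
    # depth == 2, written as two explicit hop steps (the unrolled form)
    step1 = [n for n in g[start]]
    step2 = [n for v in step1 for n in g[v]]
    return step2

print(traverse_looped(graph, 1, 2), traverse_unrolled(graph, 1))
```

Unrolling wins because it removes the per-object loop bookkeeping; it only works when the loop condition is a fixed depth known up front, as it is here.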
Re: [Neo4j] Aggregate queries
Marko Rodriguez-2 wrote:
> > For all nodes in a particular index, how many other nodes are they connected to at depth X?
>
> Here is how I would do it -- groupCount is not needed.
>
>     g.idx(index_name)[[key:value]].both.loop(1){ it.loops < depth }.count()

Thanks Marko. A couple of questions...

Won't this count dupes more than once? Xavier's requirement of "how many other nodes are they connected to" sounds like you should only count uniques, and that's why I am checking the size of the groupCount map instead of using count(). Instead of a map you could use a Set with aggregate(), but I wasn't sure if they'd have the aggregate-loop fix yet.

Also, Xavier said "For all nodes in a particular index." I took that to mean all nodes in an index, not all nodes for a particular value in an index, hence the wildcard query:

    index_nodes = g.idx(index_name).get(index_key, Neo4jTokens.QUERY_HEADER + '*')

However, I am not sure/can't remember if you can do a wildcard query without at least one leading character.

- James

--
View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Neo4j-Aggregate-queries-tp3317720p3319768.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
Re: [Neo4j] Aggregate queries
Hey,

> Won't this count dupes more than once? Xavier's requirement of "how many other nodes are they connected to" sounds like you should only count uniques, and that's why I am checking the size of the groupCount map instead of using count(). Instead of a map you could use a Set with aggregate(), but I wasn't sure if they'd have the aggregate-loop fix yet.

Then add a uniqueObject to the pipeline:

    g.idx(index_name)[[key:value]].both.loop(1){ it.loops < depth }.uniqueObject.count()

> Also Xavier said "For all nodes in a particular index." I took that to mean all nodes in an index, not all nodes for a particular value in an index, hence the wildcard query:
>
>     index_nodes = g.idx(index_name).get(index_key, Neo4jTokens.QUERY_HEADER + '*')
>
> However, I am not sure/can't remember if you can do a wildcard query without at least one leading character.

Oh then yea, the %query% header can be used.

Marko.

http://markorodriguez.com
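The duplicate-counting concern being discussed is easy to see in miniature: without deduplication, a count over traversal results tallies every path, so a node reachable along several paths is counted several times. A small Python sketch (sample data invented):

```python
# Sketch of why uniqueObject matters before count(): the raw traversal
# result may contain the same node several times, once per path to it.
def count_with_dupes(reached):
    return len(reached)            # count() over the raw pipeline

def count_unique(reached):
    return len(set(reached))       # uniqueObject (dedup) before count()

reached = [4, 4, 5]   # e.g. vertex 4 was reached via two different paths
print(count_with_dupes(reached), count_unique(reached))
```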
Re: [Neo4j] Aggregate queries
Thanks everyone, this gives me plenty to work with. Will report back.

On Thu, Sep 8, 2011 at 7:03 AM, Marko Rodriguez <okramma...@gmail.com> wrote:
> Then add a uniqueObject to the pipeline:
>
>     g.idx(index_name)[[key:value]].both.loop(1){ it.loops < depth }.uniqueObject.count()
[Neo4j] Aggregate queries
Hello,

Is there an effective way to run aggregate queries over a Neo4j database? I'm currently running a stock install of the REST server, and want to answer questions like: for all nodes in a particular index, how many other nodes are they connected to at depth X?

Currently I have a script that fires up a number of threads and just hammers the server with an HTTP request per index entry to fetch the relationships, then does some post-processing on the result to calculate the count. This doesn't seem like the most efficient way. What other options do I have?

Cheers,
Xavier
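The client-side pattern described above -- one HTTP request per index entry, fanned out over threads, with counting done afterwards -- looks roughly like this Python sketch. The HTTP call is stubbed out with canned data, since the actual REST endpoint and response format are not shown in the thread:

```python
# Sketch of the thread-per-request client pattern. fetch_relationships is
# a stand-in for a real REST call (endpoint and payloads hypothetical).
from concurrent.futures import ThreadPoolExecutor

CANNED = {'node1': ['a', 'b'], 'node2': ['a'], 'node3': []}  # made-up data

def fetch_relationships(node_id):
    # Stand-in for an HTTP GET of a node's relationships over REST.
    return CANNED[node_id]

def count_connections(node_ids, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fetch_relationships, node_ids))
    # Client-side post-processing: reduce each response to a count.
    return {n: len(r) for n, r in zip(node_ids, results)}

print(count_connections(['node1', 'node2', 'node3']))
```

The inefficiency the replies address is visible here: every relationship list crosses the wire just to be reduced to a number, which is why the suggestions below push the aggregation onto the server.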
Re: [Neo4j] Aggregate queries
Xavier,

I would put the load on the server side, either by scripting something with Gremlin:
http://docs.neo4j.org/chunked/snapshot/gremlin-plugin.html

or by writing a Server Plugin that takes your parameters and does the heavy lifting in Java:
http://docs.neo4j.org/chunked/snapshot/server-plugins.html

Would that be an option?

Cheers,

/peter neubauer

GTalk: neubauer.peter
Skype: peter.neubauer
Phone: +46 704 106975
LinkedIn: http://www.linkedin.com/in/neubauer
Twitter: http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Wed, Sep 7, 2011 at 9:24 PM, Xavier Shay <xavier+ne...@squareup.com> wrote:
> Is there an effective way to run aggregate queries over a Neo4j database?
Re: [Neo4j] Aggregate queries
Xavier Shay wrote:
> For all nodes in a particular index, how many other nodes are they connected to at depth X?

Marko will be able to improve upon this, but try something like this (this is untested)...

    m = [:]
    depth = 10
    index_name = 'vertices'
    index_key = 'name'
    index_nodes = g.idx(index_name).get(index_key, Neo4jTokens.QUERY_HEADER + '*')
    index_nodes._().both.groupCount(m).loop(2){ it.loops < depth }
    m.size()

- James

--
View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Neo4j-Aggregate-queries-tp3317720p3317876.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
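The underlying question -- for a starting node, how many distinct other nodes are reachable within depth X -- can be sketched as a bounded breadth-first search in Python (graph data invented for illustration):

```python
# Sketch of "how many other nodes are they connected to at depth X":
# a depth-bounded BFS that counts distinct reachable nodes, which is
# what checking the size of the groupCount map approximates.
from collections import deque

graph = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d', 'e'], 'd': [], 'e': []}

def reachable_within(g, start, depth):
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        v, d = frontier.popleft()
        if d == depth:
            continue                    # do not expand past the depth bound
        for n in g[v]:
            if n not in seen:           # each node counted once (uniques)
                seen.add(n)
                frontier.append((n, d + 1))
    return len(seen - {start})          # "other nodes": exclude the start

print({v: reachable_within(graph, v, 2) for v in graph})
```

Tracking a seen-set is what makes this count uniques rather than paths, which is the distinction debated later in the thread between m.size() / uniqueObject and a bare count().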