Re: [Neo4j] Aggregate queries

2011-09-10 Thread Xavier Shay
Reporting back, I came up with the following query that does what I need:
m = [:];
g.idx("user").get("key", "%query%*")._()[0..5].sideEffect { x = it.key
}.out.in.uniqueObject.loop(3) { it.loops < 2 }.sideEffect {
  if (!m[x]) { m[x] = 0; };
  m[x]++
} >> -1;
m;

* I didn't use mapWithDefault because I'm sending via the REST API and want
to be able to see the values of the map in the result: {1=1, 2=1, 3=8,
4=1}
* The out.in is a feature of our data (each user node is always connected to
other user nodes via an intermediary node)
* The [0..5] allows me to page through the index to avoid one massive query.
* I'm thinking perhaps the aggregate step would be useful, but I haven't
figured out how to use it yet.
* I will unroll the loop as Marko suggested when I get to optimizing it (a rough sketch of what that might look like is below)
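
For reference, a rough (untested) sketch of the unrolled form, using the same placeholder index/key names as above and assuming the loop amounts to two out.in hops (adjust if the loop(3) { it.loops < 2 } actually covers a different depth):

m = [:];
pipe = g.idx("user").get("key", "%query%*")._()[0..5].sideEffect { x = it.key };
// append one out.in hop per level instead of using loop(3)
for (i in 1..2) { pipe = pipe.out.in.uniqueObject }
pipe.sideEffect { if (!m[x]) { m[x] = 0 }; m[x]++ } >> -1;
m;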

Thanks for all the help,
Xavier

On Thu, Sep 8, 2011 at 9:32 AM, Xavier Shay <xavier+ne...@squareup.com> wrote:

 Thanks everyone, this gives me plenty to work with. Will report back.


 On Thu, Sep 8, 2011 at 7:03 AM, Marko Rodriguez <okramma...@gmail.com> wrote:

 Hey,

  Won't this count dupes more than once?
 
  Xavier's requirements of "how many other nodes are they connected" sounds
  like you should only count uniques, and that's why I am checking the size
  of the groupCount map instead of using count(). Instead of a map you could
  use a Set with aggregate(), but I wasn't sure if they'd have the
  aggregate-loop fix yet.

 Then add a uniqueObject to the pipeline.

 g.idx(index_name)[[key:value]].both.loop(1){it.loops < depth}.uniqueObject.count()

  Also Xavier said, "For all nodes in a particular index". I took that to
  mean all nodes in an index, not all nodes for a particular value in an
  index, hence the wildcard query:
 
  index_nodes = g.idx(index_name).get(index_key, Neo4jTokens.QUERY_HEADER + "*")
 
  However, I am not sure/can't remember if you can do a wildcard query
  without at least one leading character.


 Oh then yea, the %query% header can be used.

 Marko.

 http://markorodriguez.com





Re: [Neo4j] Aggregate queries

2011-09-10 Thread Marko Rodriguez
Hey,

 m = [:];
 g.idx("user").get("key", "%query%*")._()[0..5].sideEffect { x = it.key
 }.out.in.uniqueObject.loop(3) { it.loops < 2 }.sideEffect {
  if (!m[x]) { m[x] = 0; };
  m[x]++
 } >> -1;
 m;

Tips:

1. do g.idx("user").get("key", "%query%*")[0..5] -- the _() is not needed.
2. you can use single quotes: g.idx('user').get('key', '%query%*')
3. use groupCount(m) instead of that last sideEffect (see the sketch below).
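
Putting the three together, the whole thing might look something like this (an untested sketch; note that a bare groupCount(m) keys the counts by the nodes arriving at that step rather than by the x you captured, so tweak it if you need the per-index-entry breakdown):

m = [:];
g.idx('user').get('key', '%query%*')[0..5].out.in.uniqueObject.loop(3) { it.loops < 2 }.groupCount(m) >> -1;
m;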

 * The [0..5] allows me to page through the index to avoid one massive query.
 * I'm thinking perhaps the aggregate step would be useful, but I haven't
 figured out how to use it yet.

aggregate is just a way to collect everything up at a single step before moving 
to the next step. Typical use case:
g.v(1).out('friend').aggregate(x).out('friend').except(x) 
- my friends' friends who are not my friends.
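
A note in case it helps: in the Groovy shell the side-effect collection generally has to exist before it is referenced, so a slightly fuller sketch of that example (with hypothetical 'friend' edges) would be:

x = []
g.v(1).out('friend').aggregate(x).out('friend').except(x)

aggregate is greedy -- it drains everything into x before the pipeline moves on -- which is why except(x) then sees the complete set of direct friends.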

 * I will unroll the loop as Marko suggested when I get to optimizing it

Cool. Realize that Gremlin 1.3 (coming out in a couple of weeks) is nearly 2-3x 
faster on most traversals as we have done many many performance optimizations. 
However, unrolling loops will still be faster than not doing so.

Good luck with your project.

Keep the questions/thoughts coming,
Marko.

http://markorodriguez.com


Re: [Neo4j] Aggregate queries

2011-09-08 Thread Marko Rodriguez
Hi,

Thanks James. Here is how I would do it -- groupCount is not needed.

g.idx(index_name)[[key:value]].both.loop(1){it.loops < depth}.count()

Note: Be wary of this query. Make sure the branch factor of your graph is 
sufficiently small or the depth to which you are exploring is sufficiently 
small. With a large branch and depth, you can easily touch everything in your 
graph if your graph has natural statistics. ( 
http://en.wikipedia.org/wiki/Scale-free_network )

Also, if you want to get fancy, what I like to do is unroll my loops to
increase performance. Given that the while construct of your loop step is
simply it.loops < depth, you can append an appropriate number of .both steps.

traversal = g.idx(index_name)[[key:value]];
for (i in 0..depth) {
  traversal = traversal.both;
}
traversal.count();

Finally, 'both' is for undirected traversals. Use 'out' for outgoing traversals 
(follow the direction of the arrows) and 'in' for incoming traversals.
https://github.com/tinkerpop/gremlin/wiki/Gremlin-Steps
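
A quick way to see the difference is on the toy graph bundled with Gremlin (assuming the usual TinkerGraphFactory sample is available in your console):

tg = TinkerGraphFactory.createTinkerGraph()
tg.v(1).out    // vertices that v(1) points to
tg.v(1).in     // vertices that point to v(1)
tg.v(1).both   // both directions combined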

HTH,
Marko.

http://markorodriguez.com

 
 Xavier Shay wrote:
 
 For all nodes in a particular index, how many other nodes are they
 connected to at depth X?
 
 
 Marko will be able to improve upon this, but try something like this (this
 is untested)...
 
 m = [:]
 depth = 10
 index_name = "vertices"
 index_key = "name"
 index_nodes = g.idx(index_name).get(index_key, Neo4jTokens.QUERY_HEADER + "*")
 index_nodes._().both.groupCount(m).loop(2){it.loops < depth}
 m.size()
 
 - James
 
 



Re: [Neo4j] Aggregate queries

2011-09-08 Thread espeed

Marko Rodriguez wrote:
 

 For all nodes in a particular index, how many other nodes are they 
 connected to at depth X? 
 
 Here is how I would do it -- groupCount is not needed.
 
   g.idx(index_name)[[key:value]].both.loop(1){it.loops < depth}.count()
 
 

Thanks Marko. A couple questions...

Won't this count dupes more than once? 

Xavier's requirements of "how many other nodes are they connected" sounds
like you should only count uniques, and that's why I am checking the size of
the groupCount map instead of using count(). Instead of a map you could use a
Set with aggregate(), but I wasn't sure if they'd have the aggregate-loop
fix yet.

Also Xavier said, "For all nodes in a particular index". I took that to mean
all nodes in an index, not all nodes for a particular value in an index,
hence the wildcard query:

index_nodes = g.idx(index_name).get(index_key, Neo4jTokens.QUERY_HEADER + "*")
 
However, I am not sure/can't remember if you can do a wildcard query without
at least one leading character.

- James







Re: [Neo4j] Aggregate queries

2011-09-08 Thread Marko Rodriguez
Hey,

 Won't this count dupes more than once?
 
 Xavier's requirements of "how many other nodes are they connected" sounds
 like you should only count uniques, and that's why I am checking the size of
 the groupCount map instead of using count(). Instead of a map you could use a
 Set with aggregate(), but I wasn't sure if they'd have the aggregate-loop
 fix yet.

Then add a uniqueObject to the pipeline.

g.idx(index_name)[[key:value]].both.loop(1){it.loops < depth}.uniqueObject.count()

 Also Xavier said, "For all nodes in a particular index". I took that to mean
 all nodes in an index, not all nodes for a particular value in an index,
 hence the wildcard query:
 
 index_nodes = g.idx(index_name).get(index_key, Neo4jTokens.QUERY_HEADER + "*")
 
 However, I am not sure/can't remember if you can do a wildcard query without
 at least one leading character.


Oh then yea, the %query% header can be used. 

Marko.

http://markorodriguez.com


Re: [Neo4j] Aggregate queries

2011-09-08 Thread Xavier Shay
Thanks everyone, this gives me plenty to work with. Will report back.

On Thu, Sep 8, 2011 at 7:03 AM, Marko Rodriguez <okramma...@gmail.com> wrote:

 Hey,

  Won't this count dupes more than once?
 
  Xavier's requirements of "how many other nodes are they connected" sounds
  like you should only count uniques, and that's why I am checking the size
  of the groupCount map instead of using count(). Instead of a map you could
  use a Set with aggregate(), but I wasn't sure if they'd have the
  aggregate-loop fix yet.

 Then add a uniqueObject to the pipeline.

 g.idx(index_name)[[key:value]].both.loop(1){it.loops < depth}.uniqueObject.count()

  Also Xavier said, "For all nodes in a particular index". I took that to
  mean all nodes in an index, not all nodes for a particular value in an
  index, hence the wildcard query:
 
  index_nodes = g.idx(index_name).get(index_key, Neo4jTokens.QUERY_HEADER + "*")
 
  However, I am not sure/can't remember if you can do a wildcard query
  without at least one leading character.


 Oh then yea, the %query% header can be used.

 Marko.

 http://markorodriguez.com



[Neo4j] Aggregate queries

2011-09-07 Thread Xavier Shay
Hello,
Is there an effective way to run aggregate queries over a Neo4j database? I'm
currently running a stock install of the REST server, and want to answer
questions like:

For all nodes in a particular index, how many other nodes are they
connected to at depth X?

Currently I have a script that fires up a number of threads and just hammers
the server with an HTTP request per index entry to fetch the relationships,
then does some post-processing on the result to calculate the count. This
doesn't seem the most efficient way.

What other options do I have?

Cheers,
Xavier


Re: [Neo4j] Aggregate queries

2011-09-07 Thread Peter Neubauer
Xavier,
I would put the load on the server side by either scripting something
with Gremlin, http://docs.neo4j.org/chunked/snapshot/gremlin-plugin.html,
or writing a Server Plugin that takes your parameters and does the heavy
lifting in Java,
http://docs.neo4j.org/chunked/snapshot/server-plugins.html

Would that be an option?
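
For the Gremlin-plugin route, the client side is just one HTTP POST with the
script in the JSON body. A rough Groovy sketch (the localhost:7474 address and
the /ext/GremlinPlugin/graphdb/execute_script path are the defaults as I
recall them from the docs above, so double-check against your install):

// POST a Gremlin script to the Neo4j server and let it do the aggregation there
def endpoint = new URL('http://localhost:7474/db/data/ext/GremlinPlugin/graphdb/execute_script')
def conn = endpoint.openConnection()
conn.requestMethod = 'POST'
conn.doOutput = true
conn.setRequestProperty('Content-Type', 'application/json')
conn.setRequestProperty('Accept', 'application/json')

// the script runs server-side; single quotes inside keep the JSON escaping trivial
def script = "g.idx('user').get('key', '%query%*')._().out.in.uniqueObject.count()"
conn.outputStream.withWriter { it << '{"script": "' + script + '"}' }
println conn.inputStream.text   // JSON result computed on the server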

Cheers,

/peter neubauer

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org               - Your high performance graph database.
http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



On Wed, Sep 7, 2011 at 9:24 PM, Xavier Shay <xavier+ne...@squareup.com> wrote:
 Hello,
 Is there an effective way to run aggregate queries over a Neo4j database? I'm
 currently running a stock install of the REST server, and want to answer
 questions like:

 For all nodes in a particular index, how many other nodes are they
 connected to at depth X?

 Currently I have a script that fires up a number of threads and just hammers
 the server with an HTTP request per index entry to fetch the relationships,
 then does some post-processing on the result to calculate the count. This
 doesn't seem the most efficient way.

 What other options do I have?

 Cheers,
 Xavier



Re: [Neo4j] Aggregate queries

2011-09-07 Thread espeed

Xavier Shay wrote:
 
 For all nodes in a particular index, how many other nodes are they
 connected to at depth X?
 

Marko will be able to improve upon this, but try something like this (this
is untested)...

m = [:]
depth = 10
index_name = "vertices"
index_key = "name"
index_nodes = g.idx(index_name).get(index_key, Neo4jTokens.QUERY_HEADER + "*")
index_nodes._().both.groupCount(m).loop(2){it.loops < depth}
m.size()

- James

