Re: [Neo4j] Fwd: Sync databases

2011-09-14 Thread Xavier Shay
 From: *Eddy Respondek* eddy.respon...@gmail.com
 Date: Wed, Sep 14, 2011 at 11:30 AM
 To: gremlin-us...@googlegroups.com


 This may be a little off-topic, but maybe someone has done something similar
 before.

 Basically, I have a separate WordPress site (PHP/MySQL) which I've been
 extending significantly, and I've set up another server on the same network
 for graph DB testing (neo4j/tinkerpop/python-bulbs). I'm confident with my
 graph setup now and would like to get something small into development so I
 can monitor the results. I want to model a simple "like" relationship
 between users and articles.

 That means I need to keep an identical index of user IDs and article IDs in
 the graph DB. I know how to update the IDs when a new user or article is
 created, deleted, etc. What I don't know is the correct way to ensure data
 integrity in case something goes wrong, like the graph DB server crashing.

 Does anyone have any thoughts on the best way to do this?

I store the last loaded IDs as properties on the root node. Every time my
sync script runs, it loads everything from those IDs forward, checking as it
goes that it doesn't create duplicates. It's not the fastest approach, but
it's robust: if the graph DB becomes corrupt, you roll back to the latest
backup and rerun the sync.

(We have the advantage that our data is immutable. You'll need some extra
changes if that isn't the case for you, but you can use the same general
technique.)
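A minimal sketch of the checkpoint-and-resume idea, assuming immutable rows identified by increasing integer IDs. The `Graph` class is a hypothetical in-memory stand-in: `root` models the root node's properties and `nodes` models graph nodes keyed by (kind, id), so the duplicate check is a dictionary lookup rather than a real index query.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    # Hypothetical stand-in: "root" holds the checkpoint properties that the
    # real setup stores on the graph's root node.
    root: dict = field(default_factory=dict)
    # Nodes keyed by (kind, source_id), so a rerun cannot create duplicates.
    nodes: dict = field(default_factory=dict)


def sync(graph, kind, source_ids):
    """Load every source ID above the stored checkpoint, skipping existing nodes."""
    checkpoint_key = f"last_{kind}_id"
    last_id = graph.root.get(checkpoint_key, 0)
    for row_id in sorted(i for i in source_ids if i > last_id):
        # Duplicate check: if a crash left a node created but the checkpoint
        # not yet advanced, the rerun finds the node and leaves it alone.
        graph.nodes.setdefault((kind, row_id), {"kind": kind, "id": row_id})
        # Advance the checkpoint only after the node exists.
        graph.root[checkpoint_key] = row_id
    return graph
```

Because the checkpoint moves only after each node is written, rerunning after a crash or after restoring a backup converges on the same state. Mutable rows (updates, deletes) would need the extra handling mentioned above.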

Xavier
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


[Neo4j] Returning arbitrary data from gremlin

2011-09-11 Thread Xavier Shay
Hello,
I've been playing with Gremlin over the weekend. Thanks for all the help
already provided on other threads; it has been really useful.

I'm trying to optimize a query that currently takes three HTTP requests
(index fetch, traverse, batch get for node data) by writing a Gremlin script
and executing that over the REST API. I have it working, but it's somewhat
clunky in the way it gets the data out:

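  t = [];                  // edges seen so far, in traversal order
  start = g.v(3);
  for (i in 1..5) {        // five out/in rounds
    start = start.
      outE.except(t).sideEffect { t.add(it) }.inV.
      inE.except(t).sideEffect { t.add(it) }.outV;
  };
  start >> -1;             // exhaust the pipeline so the side effects run
  t = t.collect {          // one "type,key,type,key" string per edge
    outN = it.outV.toList().get(0);
    inN  = it.inV.toList().get(0);
    [outN.type, outN.key, inN.type, inN.key].join(",");
  };
  t.join(":")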

(The sideEffect bit ensures breadth-first ordering; I'm trying to copy the
behavior of traverse/relationships.)
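For comparison, here is a plain-Python sketch of the same traversal. It swaps Gremlin's lazy pipes for an explicit level-by-level breadth-first loop: unseen out-edges go to their in-vertex, then unseen in-edges go back to their out-vertex, for a fixed number of rounds. The `(edge_id, out_vertex, in_vertex)` tuples are a hypothetical stand-in for the graph, and the `seen` set plays the role of `except(t)`.

```python
from collections import defaultdict


def collect_edges_bfs(edges, start, rounds):
    """Collect edges breadth first: out-edges, then in-edges, `rounds` times.

    `edges` is a list of (edge_id, out_vertex, in_vertex) tuples.
    """
    out_index = defaultdict(list)  # vertex -> edges leaving it
    in_index = defaultdict(list)   # vertex -> edges entering it
    for edge in edges:
        out_index[edge[1]].append(edge)
        in_index[edge[2]].append(edge)

    seen = set()      # edge ids already collected (the except(t) filter)
    collected = []    # edges in discovery order (the list t)
    frontier = [start]
    for _ in range(rounds):
        # Primary -> secondary: follow unseen outgoing edges to their in-vertex.
        secondaries = []
        for v in frontier:
            for edge in out_index[v]:
                if edge[0] not in seen:
                    seen.add(edge[0])
                    collected.append(edge)
                    secondaries.append(edge[2])
        # Secondary -> primary: follow unseen incoming edges to their out-vertex.
        frontier = []
        for v in secondaries:
            for edge in in_index[v]:
                if edge[0] not in seen:
                    seen.add(edge[0])
                    collected.append(edge)
                    frontier.append(edge[1])
    return collected
```

The loop simply stops producing edges once the frontier is empty, so extra rounds are harmless.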

I feel I should be able to use the Table object, but couldn't figure it out.
(Is there explicit API documentation for it anywhere?)
Ideally, I'd like to return a custom JSON object, something like:
[{type: outN.type, key: outN.key}, {type: inN.type, key: inN.key}]

...but I couldn't see an easy way of constructing that.
What I have already is trivial to parse, so it's not a big deal, but it's a
bit messy.
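Since the joined string is easy to parse, the desired JSON shape can be rebuilt client-side. The sketch below assumes the format produced by the script above, with ":" between edges and "," between the four fields. It would break if a type or key itself contained ":" or ",". The sample values in any usage are hypothetical.

```python
import json


def parse_edge_string(raw):
    """Turn 'outType,outKey,inType,inKey:...' into [[{type, key}, {type, key}], ...]."""
    pairs = []
    if not raw:
        return pairs
    for record in raw.split(":"):
        out_type, out_key, in_type, in_key = record.split(",")
        pairs.append([
            {"type": out_type, "key": out_key},
            {"type": in_type, "key": in_key},
        ])
    return pairs
```

`json.dumps(parse_edge_string(result))` then yields the list-of-pairs object described above.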

Cheers,
Xavier


Re: [Neo4j] Returning arbitrary data from gremlin

2011-09-11 Thread Xavier Shay
On Sun, Sep 11, 2011 at 9:26 AM, espeed ja...@jamesthornton.com wrote:

 Hi Xavier -

 If you would, provide a more detailed description of what the query is
 trying to do.

We have a primary node type, which has outbound connections to secondary
node types.
A primary node will never have a direct connection to another primary node,
but two primary nodes can be related if they both have an outbound
connection to the same secondary node.

This query pulls out all related edges to a given depth and returns them so
that the graph can be visualized. Besides the edges, the visualization also
needs properties stored on the nodes (their types, for instance).



 You might look at the Gremlin aggregate, scatter, and gather methods. I
 thought there was also a collect() method, but I don't see it in the docs
 (Marko, isn't/wasn't there such a method?).

I couldn't figure out from the docs how to apply these methods to my
problem. I tried a few variations, but nothing behaved as I needed.


 Here are some docs that might be useful:

 Gremlin: Depth First vs. Breadth First
   https://github.com/tinkerpop/gremlin/wiki/Depth-First-vs.-Breadth-First

 [Gremlin] The Generalize AsPipe model - back(), loop(), and table()
   https://groups.google.com/d/topic/gremlin-users/5ujhy2bMKMI/discussion

loop() makes sense to me; the above code avoids it for performance reasons.
I played around with table quite a bit (reading both the wiki and the Neo4j
REST docs, including the one Peter linked) but couldn't make it fit what I
want to do.

If it were possible to create a table outside of a pipe (where I am doing
t.collect in my code), I think that would help, but I couldn't figure out
the API (I couldn't even find Table in the Gremlin source code ... where is
it?).

 And to make things cleaner, you can also create user-defined steps:
   https://github.com/tinkerpop/gremlin/wiki/User-Defined-Steps.

 For example, here's a user-defined step that builds a tree:
   https://gist.github.com/1197179

I didn't know about these; I will investigate.

Thanks everyone,
Xavier


Re: [Neo4j] Gremlin pipes broken in 1.5

2011-09-10 Thread Xavier Shay
On Fri, Sep 9, 2011 at 1:16 PM, Peter Neubauer 
peter.neuba...@neotechnology.com wrote:

 Xavier,
 this can well be a packaging problem. Could you try with 1.5.M01? The
 dependency problem is fixed, but we are fighting for the build
 pipeline to deliver a new snapshot of the final artifact.

Ah, OK. I thought I was on 1.5.M01 but was actually on a snapshot. 1.5.M01
works fine.
Sorry for the confusion.

Xavier



 Cheers,

 /peter neubauer

 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer

 http://www.neo4j.org   - Your high performance graph database.
 http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
 http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.





Re: [Neo4j] Aggregate queries

2011-09-10 Thread Xavier Shay
Reporting back, I came up with the following query that does what I need:
m = [:];
g.idx("user").get("key", "%query%*")._()[0..5].sideEffect { x = it.key
}.out.in.uniqueObject.loop(3) { it.loops < 2 }.sideEffect {
  if (!m[x]) { m[x] = 0; };
  m[x]++
} >> -1;
m;

* I didn't use mapWithDefault because I'm sending via the REST API and want
to be able to see the values of the map in the result: {1=1, 2=1, 3=8,
4=1}
* The out.in is a feature of our data (each user node is always connected to
other user nodes via an intermediary node)
* The [0..5] allows me to page through the index to avoid one massive query.
* I'm thinking perhaps the aggregate step would be useful, but I haven't
figured out how to use it yet.
* I will unroll the loop as Marko suggested when I get to optimizing it

Thanks for all the help,
Xavier



[Neo4j] Gremlin pipes broken in 1.5

2011-09-09 Thread Xavier Shay
Hello,
I have just upgraded to neo4j 1.5 (brew install neo4j --HEAD) and am getting
the following exception whenever I try to use a pipe:

  curl -H Accept:application/json -X POST -d '{"script":"g.v(1)._().both;"}' \
    -H Content-Type:application/json \
    http://localhost:7474/db/data/ext/GremlinPlugin/graphdb/execute_script;
{
  "message" : "com.tinkerpop.blueprints.pgm.impls.neo4j.Neo4jVertex.getInEdges([Ljava/lang/String;)Ljava/lang/Iterable;",
  "exception" : "java.lang.AbstractMethodError: com.tinkerpop.blueprints.pgm.impls.neo4j.Neo4jVertex.getInEdges([Ljava/lang/String;)Ljava/lang/Iterable;",
  "stacktrace" : [... trim, see https://gist.github.com/1d6df2b03fa4d402ade3... ]
}

The same command without .both works as expected. Other pipe methods such
as outE all cause the same exception. This was working with 1.4, and I
couldn't find any mention of backwards incompatibility in the Gremlin
changelog.
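When reproducing a request like this, the JSON body is easy to mangle with shell quoting. The same POST can be built with only the Python standard library. In this sketch the endpoint is copied from the curl command, `localhost:7474` assumes a local server, and `build_gremlin_request` / `run_gremlin` are hypothetical helper names.

```python
import json
import urllib.request

# Endpoint copied from the curl command; host and port assume a local server.
GREMLIN_URL = "http://localhost:7474/db/data/ext/GremlinPlugin/graphdb/execute_script"


def build_gremlin_request(script, url=GREMLIN_URL):
    """Build (but do not send) the same POST that the curl command makes."""
    body = json.dumps({"script": script}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_gremlin(script, url=GREMLIN_URL):
    """Send the script and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_gremlin_request(script, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

An error response such as the AbstractMethodError above surfaces as `urllib.error.HTTPError`. Its body can be read with `.read()` to get the same message/exception/stacktrace fields.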

I'm not sure how to debug this further. Any suggestions?

Cheers,
Xavier


Re: [Neo4j] Gremlin pipes broken in 1.5

2011-09-09 Thread Xavier Shay
On Fri, Sep 9, 2011 at 12:39 PM, Marko Rodriguez okramma...@gmail.com wrote:

 Hi,

 This error seems to be an issue with dependencies. Are you using any
 TinkerPop dependencies in your project? I believe Neo4j 1.5.M01 depends on
 Blueprints 0.9, which does have the getInEdges(String...) method (similarly
 for getOutEdges()). Perhaps you are depending on Blueprints 0.8 somewhere in
 your project and the class loader is getting confused?

I don't really understand this suggestion. I downloaded Neo4j from
http://dist.neo4j.org/neo4j-community-1.5-SNAPSHOT-unix.tar.gz, unpacked it,
then started up a server using `bin/neo4j server`. After loading in a node
via the REST interface, I ran the above command. I don't have a project that
could be relevant.
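Even without a project of one's own, the mismatch Marko describes could come from the server distribution itself carrying two Blueprints versions. Peter's reply in the follow-up thread does point to a packaging problem in the snapshot. A rough way to check an unpacked tarball is to list the jars and look for the same artifact in more than one version. This is a sketch: it assumes jar names follow an `artifact-version.jar` pattern, and no particular directory layout is assumed. The helper names are hypothetical, and finding two versions is only a hint, not proof of the cause.

```python
import os
import re

# Assumed naming scheme: "blueprints-core-0.9.jar" -> ("blueprints-core", "0.9").
JAR_PATTERN = re.compile(r"^(blueprints[\w.-]*?)-(\d[\w.-]*)\.jar$")


def list_jars(root):
    """Every .jar path under root (a missing root simply yields nothing)."""
    return [
        os.path.join(dirpath, name)
        for dirpath, _dirnames, filenames in os.walk(root)
        for name in filenames
        if name.endswith(".jar")
    ]


def classify_jars(paths):
    """Map each Blueprints artifact name to the set of versions found."""
    versions = {}
    for path in paths:
        match = JAR_PATTERN.match(os.path.basename(path))
        if match:
            versions.setdefault(match.group(1), set()).add(match.group(2))
    return versions


def conflicting(versions):
    """Artifacts present in more than one version (AbstractMethodError suspects)."""
    return {name: vs for name, vs in versions.items() if len(vs) > 1}
```

Usage: `conflicting(classify_jars(list_jars("neo4j-community-1.5-SNAPSHOT")))`, run against wherever the tarball was unpacked.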



 Also, you don't need an identity pipe. You can do:
 g.v(1).both

Ah, nice. (This still causes the same exception, though.)

Cheers,
Xavier


 Marko.

 http://markorodriguez.com



Re: [Neo4j] Aggregate queries

2011-09-08 Thread Xavier Shay
Thanks everyone, this gives me plenty to work with. Will report back.

On Thu, Sep 8, 2011 at 7:03 AM, Marko Rodriguez okramma...@gmail.com wrote:

 Hey,

  Won't this count dupes more than once?
 
  Xavier's requirement of "how many other nodes are they connected to"
  sounds like you should only count uniques, and that's why I am checking
  the size of the groupCount map instead of using count(). Instead of a map
  you could use a Set with aggregate(), but I wasn't sure if they'd have the
  aggregate-loop fix yet.

 Then add a uniqueObject to the pipeline.

 g.idx(index_name)[[key:value]].both.loop(1){it.loops < depth}.uniqueObject.count()
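The duplicate question comes down to walks versus nodes. A traversal over `both` emits one object per walk, so the same node can arrive many times. `count()` counts walks, while a uniqueness filter counts nodes. The sketch below uses a hypothetical adjacency dict and models only the objects that come out after exactly `depth` steps.

```python
def walk_endpoints(adj, start, depth):
    """Endpoints of every walk of exactly `depth` steps, duplicates included."""
    frontier = [start]
    for _ in range(depth):
        # Expand every element; the same node can appear many times.
        frontier = [n for v in frontier for n in adj.get(v, ())]
    return frontier
```

On a triangle `{1: [2, 3], 2: [1, 3], 3: [1, 2]}` with depth 2 from node 1, the stream is `[1, 3, 1, 2]`. A plain count gives 4, but only 3 distinct nodes are involved, and the start node is among them because `both` can walk straight back.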

  Also Xavier said, "For all nodes in a particular index." I took that to
  mean all nodes in an index, not all nodes for a particular value in an
  index, hence the wildcard query:

  index_nodes = g.idx(index_name).get(index_key, Neo4jTokens.QUERY_HEADER + "*")

  However, I am not sure/can't remember if you can do a wildcard query
  without at least one leading character.


 Oh then yea, the %query% header can be used.

 Marko.

 http://markorodriguez.com


[Neo4j] Aggregate queries

2011-09-07 Thread Xavier Shay
Hello,
Is there an effective way to run aggregate queries over a Neo4j database?
I'm currently running a stock install of the REST server and want to answer
questions like:

For all nodes in a particular index, how many other nodes are they
connected to at depth X?

Currently I have a script that fires up a number of threads and just hammers
the server with one HTTP request per index entry to fetch the relationships,
then does some post-processing on the results to calculate the count. This
doesn't seem like the most efficient way.
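The post-processing step can be expressed as a breadth-first search. This sketch assumes "connected at depth X" means "first reached at shortest-path distance X", so a node already reached at depth 1 is not counted again at depth 3. That definition is an assumption, and the adjacency dict is a hypothetical stand-in for the fetched relationships.

```python
from collections import deque


def nodes_by_depth(adj, start, max_depth):
    """BFS returning {depth: set of nodes first reached at that depth}."""
    depth_of = {start: 0}
    levels = {}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        d = depth_of[v]
        if d == max_depth:
            continue  # do not expand past the requested depth
        for n in adj.get(v, ()):
            if n not in depth_of:
                depth_of[n] = d + 1
                levels.setdefault(d + 1, set()).add(n)
                queue.append(n)
    return levels


def count_at_depth(adj, start, depth):
    """How many distinct nodes sit at exactly `depth` hops from start."""
    return len(nodes_by_depth(adj, start, depth).get(depth, ()))
```

Running `{s: count_at_depth(adj, s, x) for s in index_entries}` gives the per-entry answer in one pass over local data. Summing the level sizes instead gives "within depth X".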

What other options do I have?

Cheers,
Xavier