Re: [Neo4j] Fwd: Sync databases
> From: *Eddy Respondek*
> Date: Wed, Sep 14, 2011 at 11:30 AM
> To: gremlin-us...@googlegroups.com
>
> This may be a little off topic, but maybe someone has done something
> similar before.
>
> Basically I have a separate Wordpress site (php/mysql) which I've been
> extending significantly, and I've set up another server on the same network
> for graph db testing (neo4j/tinkerpop/python-bulbs). I'm confident with my
> graph setup now and would like to attempt to get something small into
> development so I can monitor the results. I want to do a simple "like"
> relationship between users and articles.
>
> That means I need to keep an identical index of user ids and article ids in
> the graph db. I know how to update the ids when a new user or article is
> created, deleted, etc. What I don't know is the correct way to ensure data
> integrity in case something goes wrong, like the graph db server crashing.
>
> Does anyone have any thoughts on the best way to do this?

I store the last loaded IDs as properties on the root node. Every time my
sync script runs, it loads everything from those IDs forward, checking as it
goes that it doesn't create duplicates. It's not the fastest way, but it's
robust: if the graph DB becomes corrupt, you roll back to the latest backup
and rerun the sync.

(We have the advantage that our data is immutable. You'll need some extra
changes if that isn't the case for you, but you can use the same general
technique.)

Xavier
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
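A minimal sketch of the checkpoint approach described above, in Python since the question's stack uses python-bulbs. Everything here is hypothetical and is not the bulbs API: `GraphStore` is an in-memory stand-in for the graph DB, its `checkpoint` dict plays the role of the "last loaded ID" properties on the root node, and source rows are assumed to be immutable with increasing integer IDs.

```python
from dataclasses import dataclass, field


@dataclass
class GraphStore:
    """Hypothetical in-memory stand-in for the graph DB."""
    nodes: dict = field(default_factory=dict)       # (kind, source id) -> properties
    checkpoint: dict = field(default_factory=dict)  # kind -> last loaded source id


def sync(store: GraphStore, kind: str, rows: list) -> int:
    """Load rows of `kind` newer than the checkpoint, skipping duplicates.

    Returns the number of nodes created on this run.
    """
    last_id = store.checkpoint.get(kind, 0)
    created = 0
    for row in sorted(rows, key=lambda r: r["id"]):
        if row["id"] <= last_id:
            continue  # loaded on an earlier run
        key = (kind, row["id"])
        if key not in store.nodes:  # duplicate check makes reruns safe
            store.nodes[key] = dict(row)
            created += 1
        # Advance the checkpoint row by row so a crash loses little work.
        store.checkpoint[kind] = row["id"]
    return created
```

Because each row is checked before it is created, rerunning the sync after restoring an older backup (where the checkpoint lags behind the nodes) does not create duplicates. With a real database you would probably want each node creation and its checkpoint update to commit in the same transaction; the sketch does not model that.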
Re: [Neo4j] Returning arbitrary data from gremlin
On Sun, Sep 11, 2011 at 9:26 AM, espeed wrote:
> Hi Xavier -
>
> If you would, provide a more detailed description of what the query is
> trying to do.

We have a primary node type, which has outbound connections to secondary
node types. A primary node will never have a direct connection to another
primary node, but two primary nodes are "related" if they both have an
outbound connection to the same secondary node. This query pulls out all
"related" edges to a given depth and returns them so that the graph can be
visualized. As well as the edges, for the visualization I also need
properties stored on the nodes (they have different types, for instance).

> You might look at the Gremlin aggregate, scatter, and gather methods, and I
> thought there was a collect() method, but I don't see it in the docs.
> (Marko, isn't/wasn't there such a method?)

I couldn't figure out from the docs how to apply these methods to my
problem. I tried a few variations, but nothing behaved as I needed.

> Here are some docs that might be useful:
>
> Gremlin: Depth First vs. Breadth First
> https://github.com/tinkerpop/gremlin/wiki/Depth-First-vs.-Breadth-First
>
> [Gremlin] The Generalize AsPipe model - back(), loop(), and table()
> https://groups.google.com/d/topic/gremlin-users/5ujhy2bMKMI/discussion

loop() makes sense to me; my code avoids it for performance reasons. I
played around with table() quite a bit (reading both the wiki and the Neo4j
REST docs, including the one Peter linked) but couldn't make it fit what I
want to do. If it were possible to create a table outside of a pipe (where I
am doing t.collect in my code), I think that would help, but I couldn't
figure out the API (I couldn't even find Table in the Gremlin source code ...
where is it?).

> And to make things cleaner, you can also create user-defined steps:
> https://github.com/tinkerpop/gremlin/wiki/User-Defined-Steps.
>
> For example, here's a user-defined step that builds a tree:
> https://gist.github.com/1197179

I didn't know about these; I will investigate.

Thanks everyone,
Xavier
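To make the intended result of that query concrete, here is a hypothetical Python model of the traversal (primary -> secondary <- primary, repeated to a given depth, each edge visited once, level by level). It is not Gremlin: `out_edges` is an assumed dict from each primary node to the secondary nodes it points at, and the `seen` set plays the role of the `t` list used with except().

```python
def related_edges(out_edges, start, depth):
    """Collect primary->secondary edges reachable from `start`, breadth first.

    out_edges: dict mapping primary node -> list of secondary nodes.
    Each level follows unseen outbound edges to secondaries, then unseen
    inbound edges back to other primaries (the outE..inV.inE..outV pattern).
    """
    # Reverse index for the inbound step: secondary -> primaries pointing at it.
    in_edges = {}
    for primary, secondaries in out_edges.items():
        for secondary in secondaries:
            in_edges.setdefault(secondary, []).append(primary)

    collected = []  # edges in discovery (breadth-first) order
    seen = set()    # edges already visited, like `t` with except()
    frontier = [start]
    for _ in range(depth):
        reached_secondaries = []
        for primary in frontier:
            for secondary in out_edges.get(primary, []):
                edge = (primary, secondary)
                if edge not in seen:
                    seen.add(edge)
                    collected.append(edge)
                    reached_secondaries.append(secondary)
        frontier = []
        for secondary in reached_secondaries:
            for primary in in_edges.get(secondary, []):
                edge = (primary, secondary)
                if edge not in seen:
                    seen.add(edge)
                    collected.append(edge)
                    frontier.append(primary)
        if not frontier:
            break  # nothing new to expand
    return collected
```

Running it on a small hand-built graph is a cheap way to check what the Gremlin script should return before comparing the two.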
[Neo4j] Returning arbitrary data from gremlin
Hello,

I've been playing with Gremlin over the weekend. Thanks for all the help
already provided on other threads; it has been really useful.

I'm trying to optimize a query that currently takes three HTTP requests
(index fetch, traverse, batch get for node data) by writing a Gremlin script
and executing that over the REST API. I have it working, but it's somewhat
clunky in the way it gets the data out:

    t = [];
    start = g.v(3);
    for (i in 1..5) {
      start = start.
        outE.except(t).sideEffect { t.add(it) }.inV.
        inE.except(t).sideEffect { t.add(it) }.outV;
    };
    start >> -1;
    t = t.collect {
      outN = it.outV.toList().get(0);
      inN = it.inV.toList().get(0);
      [outN.type, outN.key, inN.type, inN.key].join(",");
    };
    t.join(":")

(The sideEffect bit is to ensure breadth-first ordering, trying to copy the
behavior of traverse/relationships.)

I feel I should be able to use the Table object, but couldn't figure it out.
Is there explicit API documentation for it anywhere?

Ideally, I'd like to return a custom JSON object, something like:

    [{"type": outN.type, "key": outN.key}, {"type": inN.type, "key": inN.key}]

...but I couldn't see an easy way of constructing that. What I have already
is trivial to parse, so it's not a big deal, but it's a bit messy.

Cheers,
Xavier
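Since the joined string is easy to parse, one option is to build the desired JSON shape on the client. A minimal Python sketch, assuming the script result has already been decoded to the raw "type,key,type,key:..." string and that no type or key contains "," or ":" (the format cannot represent those). The sample types and keys are hypothetical, and every value comes back as a string.

```python
import json


def edges_from_joined(raw):
    """Turn "outType,outKey,inType,inKey:..." into [[out, in], ...] dicts."""
    edges = []
    if not raw:
        return edges
    for record in raw.split(":"):
        out_type, out_key, in_type, in_key = record.split(",")
        edges.append([
            {"type": out_type, "key": out_key},
            {"type": in_type, "key": in_key},
        ])
    return edges


# Hypothetical sample result from the script above.
raw = "user,alice,article,a1:user,bob,article,a1"
print(json.dumps(edges_from_joined(raw)))
```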
Re: [Neo4j] Aggregate queries
Reporting back, I came up with the following query that does what I need:

    m = [:];
    g.idx("user").get("key", "%query%*")._()[0..5].sideEffect { x = it.key }.
      out.in.uniqueObject.loop(3) { it.loops < 2 }.
      sideEffect { if (!m[x]) { m[x] = 0; }; m[x]++ } >> -1;
    m;

* I didn't use mapWithDefault because I'm sending this via the REST API and
  want to be able to see the values of the map in the result:
  "{1=1, 2=1, 3=8, 4=1}"
* The out.in is a feature of our data (each user node is always connected to
  other user nodes via an intermediary node).
* The [0..5] allows me to page through the index to avoid one massive query.
* I'm thinking perhaps the "aggregate" step would be useful, but I haven't
  figured out how to use it yet.
* I will unroll the loop as Marko suggested when I get to optimizing it.

Thanks for all the help,
Xavier

On Thu, Sep 8, 2011 at 9:32 AM, Xavier Shay wrote:
> Thanks everyone, this gives me plenty to work with. Will report back.
>
> [...]
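A hypothetical Python model of the per-start-node counting, useful for checking expected numbers on a small graph. Assumptions: `out_edges` maps each user node to its intermediary nodes, one level is one out.in hop, and the count is of distinct other users reached within `depth` levels. The sketch deduplicates per start node; a single uniqueObject step shared by the whole pipeline may instead drop nodes already emitted for an earlier start node, so it is worth comparing both on a small graph.

```python
from collections import defaultdict


def connection_counts(start_nodes, out_edges, depth):
    """Count distinct other users reachable from each start node.

    out_edges: dict mapping user node -> list of intermediary nodes
    (hypothetical model of user -> intermediary <- user).
    One level = one out.in hop; counts cover levels 1..depth.
    """
    # Reverse index for the `in` step: intermediary -> users pointing at it.
    in_edges = defaultdict(list)
    for user, mids in out_edges.items():
        for mid in mids:
            in_edges[mid].append(user)

    counts = {}
    for start in start_nodes:
        seen = {start}      # per-start deduplication
        frontier = {start}
        for _ in range(depth):
            reached = set()
            for user in frontier:
                for mid in out_edges.get(user, []):
                    reached.update(in_edges.get(mid, []))
            frontier = reached - seen  # only new users expand next level
            seen |= frontier
        counts[start] = len(seen) - 1  # exclude the start node itself
    return counts
```

Paging works the same way as in the query: Groovy's `[0..5]` is an inclusive range (six entries), which corresponds to a Python slice such as `sorted(keys)[0:6]`.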
Re: [Neo4j] Gremlin pipes broken in 1.5
On Fri, Sep 9, 2011 at 1:16 PM, Peter Neubauer
<peter.neuba...@neotechnology.com> wrote:
> Xavier,
> this can well be a packaging problem. Could you try with 1.5.M01? The
> dependency problem is fixed, but we are fighting for the build
> pipeline to deliver a new snapshot of the final artifact.

Ah, OK. I thought I was on 1.5.M01, but I was on a snapshot. 1.5.M01 works
fine. Sorry for the confusion.

Xavier

> Cheers,
>
> /peter neubauer
>
> On Fri, Sep 9, 2011 at 10:03 PM, Xavier Shay wrote:
> > [...]
Re: [Neo4j] Gremlin pipes broken in 1.5
On Fri, Sep 9, 2011 at 12:39 PM, Marko Rodriguez wrote:
> Hi,
>
> This error seems to be an issue with dependencies. Are you using an
> TinkerPop s in your project. I believe Neo4j 1.5.M01 is
> depending on Blueprints 0.9 which does have the getInEdges(String...) method
> (similarly for getOutEdges()). Perhaps you are depending on Blueprints 0.8
> somewhere in your project and the class loader is getting confused?

I don't really understand this suggestion. I downloaded neo4j from
http://dist.neo4j.org/neo4j-community-1.5-SNAPSHOT-unix.tar.gz, unpacked it,
then started up a server using `bin/neo4j server`. After loading in a node
via the REST interface, I ran the curl command from my first message. I
don't have a project that could be relevant.

> Also, you don't need an identity pipe. You can do:
>    g.v(1).both

Ah, nice. (This still causes the same exception, though.)

Cheers,
Xavier

> Marko.
>
> http://markorodriguez.com
>
> On Sep 9, 2011, at 1:29 PM, Xavier Shay wrote:
> > [...]
[Neo4j] Gremlin pipes broken in 1.5
Hello,

I have just upgraded to neo4j 1.5 (brew install neo4j --HEAD) and am getting
the following exception whenever I try to use a pipe:

    $ curl -H Accept:application/json -X POST \
        -d '{"script":"g.v(1)._().both;"}' \
        -H Content-Type:application/json \
        "http://localhost:7474/db/data/ext/GremlinPlugin/graphdb/execute_script"
    {
      "message" : "com.tinkerpop.blueprints.pgm.impls.neo4j.Neo4jVertex.getInEdges([Ljava/lang/String;)Ljava/lang/Iterable;",
      "exception" : "java.lang.AbstractMethodError: com.tinkerpop.blueprints.pgm.impls.neo4j.Neo4jVertex.getInEdges([Ljava/lang/String;)Ljava/lang/Iterable;",
      "stacktrace" : [... trim, see https://gist.github.com/1d6df2b03fa4d402ade3... ]
    }

The same command without ".both" works as expected. Other pipe methods such
as "outE" all cause the same exception. This was working with 1.4, and I
couldn't find any mention of backwards incompatibility in the Gremlin
changelog.

I'm not sure how to debug this further. Any suggestions?

Cheers,
Xavier
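For scripting the same request outside curl, here is a minimal Python sketch using only the standard library. The endpoint URL and headers are the ones from the curl command above; `build_gremlin_request` and `execute_script` are hypothetical helper names, and `execute_script` assumes the server answers with JSON. It only reproduces the request; it does nothing about the AbstractMethodError itself.

```python
import json
import urllib.request

# Endpoint from the curl command; adjust host/port for your install.
GREMLIN_URL = "http://localhost:7474/db/data/ext/GremlinPlugin/graphdb/execute_script"


def build_gremlin_request(script, url=GREMLIN_URL):
    """Build (but do not send) a POST carrying {"script": ...} as JSON."""
    body = json.dumps({"script": script}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Accept": "application/json",
                 "Content-Type": "application/json"},
        method="POST",
    )


def execute_script(script):
    """Send the script and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_gremlin_request(script)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect exactly what goes over the wire when comparing against curl.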
Re: [Neo4j] Aggregate queries
Thanks everyone, this gives me plenty to work with. Will report back.

On Thu, Sep 8, 2011 at 7:03 AM, Marko Rodriguez wrote:
> Hey,
>
> > Won't this count dupes more than once?
> >
> > Xavier's requirements of "how many other nodes are they connected" sounds
> > like you should only count uniques, and that's why I am checking the size
> > of groupCount map instead of using count(). Instead of a map you could use
> > a Set with aggregate(), but I wasn't sure if they'd have the
> > aggregate-loop fix yet.
>
> Then add a uniqueObject to the pipeline.
>
>    g.idx(index_name)[[key:value]].both.loop(1){it.loops < depth}.uniqueObject.count()
>
> > Also Xavier said, "For all nodes in a particular index". I took that to
> > mean all nodes in an index, not all nodes for a particular value in an
> > index, hence the wildcard query:
> >
> >    index_nodes = g.idx(index_name).get(index_key,Neo4jTokens.QUERY_HEADER + "*")
> >
> > However, I am not sure/can't remember if you can do a wildcard query
> > without at least one leading character.
>
> Oh then yea, the %query% header can be used.
>
> Marko.
>
> http://markorodriguez.com
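To see why a plain count() can over-count, here is a small Python model of repeating a `both`-like step from one start node. It assumes a hypothetical undirected triangle a-b-c. The list keeps duplicates the way a pipe does; `len(set(...))` corresponds to adding uniqueObject before count(), and the size of a `Counter` corresponds to the size of a groupCount map. Note that the start node itself comes back and is counted.

```python
from collections import Counter


def repeat_both(adjacency, starts, steps):
    """Apply a `both`-like step `steps` times, keeping duplicates like a pipe."""
    current = list(starts)
    for _ in range(steps):
        current = [nbr for node in current for nbr in adjacency.get(node, [])]
    return current


# Hypothetical undirected triangle a-b-c.
adjacency = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
reached = repeat_both(adjacency, ["a"], 2)

plain_count = len(reached)                # like ...count()
unique_count = len(set(reached))          # like ...uniqueObject.count()
group_count_size = len(Counter(reached))  # like the size of a groupCount map
```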
[Neo4j] Aggregate queries
Hello,

Is there an effective way to run aggregate queries over a neo4j database?

I'm currently running a stock install of the REST server, and want to answer
questions like: "For all nodes in a particular index, how many other nodes
are they connected to at depth X?"

Currently I have a script that fires up a number of threads and just hammers
the server with one HTTP request per index entry to fetch the relationships,
then does some post-processing on the results to calculate the count. This
doesn't seem like the most efficient way. What other options do I have?

Cheers,
Xavier