[Neo4j] Fwd: Sync databases
I recently posted the following topic on the Gremlin users list and I've been directed here. Apparently Michael Hunger is the man I want to talk to :) It seems there are a few other people interested in finding out my results as well.

Forwarded conversation
Subject: Sync databases

From: *Eddy Respondek* eddy.respon...@gmail.com
Date: Wed, Sep 14, 2011 at 11:30 AM
To: gremlin-us...@googlegroups.com

This may be a little off topic but maybe someone has done something similar before.

Basically I have a separate WordPress site (php/mysql) which I've been extending significantly, and I've set up another server on the same network for graph db testing (neo4j/tinkerpop/python-bulbs). I'm confident with my graph setup now and would like to attempt to get something small into development so I can monitor the results.

I want to do a simple "like" relationship between users and articles. That means I need to keep an identical index of user IDs and article IDs in the graph db. I know how to update the IDs when a new user or article is created, deleted, etc. What I don't know is the correct way to ensure data integrity in case something goes wrong, like the graph db server crashing.

Does anyone have any thoughts on the best way to do this?

--
From: *Marko Rodriguez* okramma...@gmail.com
Date: Wed, Sep 14, 2011 at 11:44 AM
To: gremlin-us...@googlegroups.com

Hey Eddy,

Someone might be able to help you here, but the guy who will give you the two page rattle on such matters is Michael Hunger on the Neo4j users list. I've read him talking about similar things --- cross db transactions-style stuff. You might want to post your thoughts to that list.

Marko.
http://markorodriguez.com

--
From: *James Thornton* james.thorn...@gmail.com
Date: Wed, Sep 14, 2011 at 12:34 PM
To: gremlin-us...@googlegroups.com

Hi Eddy -

You should definitely post this question to the Neo4j list as well because I would be interested in Michael's ideas on this.
One approach would be to use a message-passing library like ZeroMQ (http://www.zeromq.org/) to set up a communication channel between PHP and Python. This will allow you to write to both MySQL and Neo4j when you create a new user.

ZeroMQ is stupid fast. You can send millions of requests per second (http://www.zeromq.org/results:10gbe-tests-v031), and it supports pub/sub and multicast so you can write to multiple devices/programs at once (http://zguide.zeromq.org/page:all).

Here are the PHP and Python bindings:
* http://www.zeromq.org/bindings:php
* http://www.zeromq.org/bindings:python

JSON is probably the easiest way to serialize data, or you could use a binary serialization library like MessagePack.

I'm working on creating a batch loader for Bulbs that uses ZeroMQ to send requests to a Java/Jython server running the Neo4jBatchGraph implementation Marko just added (https://groups.google.com/d/topic/gremlin-users/muuylAEZKrQ/discussion) -- I'll post an example soon.

- James

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
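The dual-write idea James describes (one JSON-serialized change event delivered to both the relational side and the graph side) can be sketched roughly as follows. This is only an illustration under assumptions: the message layout (`action`, `kind`, `id`), the `FanOut` class, and the dict-based stores are all hypothetical, and `FanOut` is an in-process stand-in for a pub/sub channel so the sketch runs without ZeroMQ installed. It does not use the ZeroMQ API.

```python
import json


def encode_event(action, kind, entity_id):
    """Serialize one change event as JSON bytes (hypothetical message layout)."""
    event = {"action": action, "kind": kind, "id": entity_id}
    return json.dumps(event, sort_keys=True).encode("utf-8")


def decode_event(payload):
    """Inverse of encode_event: JSON bytes back to a dict."""
    return json.loads(payload.decode("utf-8"))


class FanOut:
    """In-process stand-in for a pub/sub channel (assumption: NOT ZeroMQ itself)."""

    def __init__(self):
        self.handlers = []

    def subscribe(self, handler):
        self.handlers.append(handler)

    def publish(self, payload):
        # Every subscriber receives its own decoded copy of the event.
        for handler in self.handlers:
            handler(decode_event(payload))


def make_store_writer(store):
    """Return a handler that applies create/delete events to a dict of ID sets."""
    def apply(event):
        ids = store.setdefault(event["kind"], set())
        if event["action"] == "create":
            ids.add(event["id"])
        elif event["action"] == "delete":
            ids.discard(event["id"])
    return apply


relational_ids = {}  # stand-in for the MySQL side
graph_ids = {}       # stand-in for the Neo4j side

channel = FanOut()
channel.subscribe(make_store_writer(relational_ids))
channel.subscribe(make_store_writer(graph_ids))

# One published event reaches both stores.
channel.publish(encode_event("create", "user", 42))
channel.publish(encode_event("create", "article", 7))
channel.publish(encode_event("delete", "article", 7))
```

Note that fan-out alone does not answer the crash question: if one subscriber is down when an event is published, that event still has to be replayed or reconciled later.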
Re: [Neo4j] Fwd: Sync databases
Hi Eddy,

the way I do it in Java with JPA (relational) and graph:
* store the row primary key + table in a property of the node/relationship
* index the table:key for the node

I didn't change the relational schema at all. Then I hooked into events:
* when a node is loaded, I get table + id from the field, load the relational data too, and connect the two
* when a relational row is loaded, I look up the table:id in the index to get the node and connect the two
* nodes are attached to relational rows, so whenever a row is loaded that has no node, one is created and the lookup information is stored as above

That of course depends on whether you have to use the data together.

Cheers
Michael

On 14.09.2011 at 08:59, Eddy Respondek wrote:

> I recently posted the following topic on the Gremlin users list and I've been directed here. Apparently Michael Hunger is the man I want to talk to :) It seems there are a few other people interested in finding out my results as well.
> [...]
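The lookup scheme Michael outlines (table and primary key stored on the node, plus an index keyed by table:key, with nodes created on demand) might look roughly like this in Python. It is a sketch under assumptions, not his JPA code: `GraphStandIn`, its method names, and the table names are hypothetical, and a plain dict stands in for the Neo4j index.

```python
class GraphStandIn:
    """Tiny in-memory substitute for a graph database with one exact-match index."""

    def __init__(self):
        self.nodes = {}      # node id -> property dict
        self.row_index = {}  # "table:key" -> node id
        self._next_id = 1

    def node_for_row(self, table, key):
        """Look up the node attached to a relational row, creating it if missing."""
        index_key = f"{table}:{key}"
        node_id = self.row_index.get(index_key)
        if node_id is None:
            node_id = self._next_id
            self._next_id += 1
            # Store table and primary key on the node, then index "table:key".
            self.nodes[node_id] = {"table": table, "key": key}
            self.row_index[index_key] = node_id
        return node_id

    def row_for_node(self, node_id):
        """Read back which relational row a node belongs to."""
        props = self.nodes[node_id]
        return props["table"], props["key"]


graph = GraphStandIn()
first = graph.node_for_row("users", 7)     # row seen for the first time: node created
again = graph.node_for_row("users", 7)     # same row again: existing node returned
other = graph.node_for_row("articles", 7)  # same key, different table: separate node
print(first == again, first != other, graph.row_for_node(other))
```

Because the node is created lazily on the first row load, a missing node is repaired the next time its row is touched, which is part of what makes this approach tolerant of a graph that lags behind the relational data.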
Re: [Neo4j] Fwd: Sync databases
> From: *Eddy Respondek* eddy.respon...@gmail.com
> Date: Wed, Sep 14, 2011 at 11:30 AM
> [...]
> What I don't know is the correct way to ensure data integrity in case something goes wrong like the graph db server crashes, etc. Does anyone have any thoughts on the best way to do this?

I store the last loaded IDs as properties on the root node. Every time my sync script runs, it loads everything from those IDs forward, checking as it goes that it doesn't create duplicates.

It's not the fastest way, but it's robust. If the graph DB becomes corrupt, you roll back to the latest backup and rerun the sync.

(We have the advantage that our data is immutable. You'll need some extra changes if that isn't the case for you, but you can use the same general technique.)

Xavier
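The incremental sync Xavier describes could be sketched like this. Everything concrete here is an assumption for illustration: an in-memory SQLite table stands in for the relational source, a dict stands in for the graph, and the names `sync_users`, `last_user_id`, and the `users` table are hypothetical. As in his setup, it assumes rows are only ever added, never changed or deleted.

```python
import sqlite3


def sync_users(conn, graph):
    """Load every user row newer than the high-water mark stored on the root node."""
    last_id = graph["root"].get("last_user_id", 0)
    cursor = conn.execute(
        "SELECT id FROM users WHERE id > ? ORDER BY id", (last_id,)
    )
    created = 0
    for (user_id,) in cursor:
        # Duplicate check: skip IDs the graph already knows about.
        if user_id not in graph["users"]:
            graph["users"].add(user_id)
            created += 1
        # Advance the mark after each row so a rerun resumes from here.
        graph["root"]["last_user_id"] = user_id
    return created


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO users (id) VALUES (?)", [(1,), (2,), (3,)])

graph = {"root": {}, "users": set()}
first_run = sync_users(conn, graph)   # picks up every existing row
second_run = sync_users(conn, graph)  # nothing new since the last run

# Simulate restoring an older backup of the graph, then rerunning the sync.
backup = {"root": {"last_user_id": 1}, "users": {1}}
conn.execute("INSERT INTO users (id) VALUES (4)")
after_restore = sync_users(conn, backup)
```

Since the high-water mark lives inside the graph, restoring a backup also restores the matching mark, so the rerun picks up exactly the rows the backup is missing.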