[Neo4j] Fwd: Sync databases

2011-09-14 Thread Eddy Respondek
I recently posted the following topic on the Gremlin users list and I've
been directed here. Apparently Michael Hunger is the man I want to talk to
:)

It seems there are a few other people interested in finding out my results
as well.


Forwarded conversation
Subject: Sync databases


From: *Eddy Respondek* eddy.respon...@gmail.com
Date: Wed, Sep 14, 2011 at 11:30 AM
To: gremlin-us...@googlegroups.com


This may be a little off topic, but maybe someone has done something similar
before.

Basically I have a separate WordPress site (PHP/MySQL) which I've been
extending significantly, and I've set up another server on the same network
for graph DB testing (Neo4j/TinkerPop/python-bulbs). I'm confident with my
graph setup now and would like to attempt to get something small into
development so I can monitor the results. I want to do a simple "like"
relationship between users and articles.

That means I need to keep an identical index of user IDs and article IDs in
the graph DB. I know how to update the IDs when a new user or article is
created, deleted, etc. What I don't know is the correct way to ensure data
integrity in case something goes wrong, e.g. the graph DB server crashes.

Does anyone have any thoughts on the best way to do this?

--
From: *Marko Rodriguez* okramma...@gmail.com
Date: Wed, Sep 14, 2011 at 11:44 AM
To: gremlin-us...@googlegroups.com


Hey Eddy,

Someone might be able to help you here, but the guy who will give you the
two-page rattle on such matters is Michael Hunger on the Neo4j users list.
I've read him talking about similar things: cross-DB, transaction-style
stuff.

You might want to post your thoughts to that list.

Marko.

http://markorodriguez.com

--
From: *James Thornton* james.thorn...@gmail.com
Date: Wed, Sep 14, 2011 at 12:34 PM
To: gremlin-us...@googlegroups.com


Hi Eddy -

You should definitely post this question to the Neo4j list as well because I
would be interested in Michael's ideas on this.

One approach would be to use a message-passing library like ZeroMQ (
http://www.zeromq.org/) to set up a communication channel between PHP and
Python. This will allow you to write to both MySQL and Neo4j when you create
a new user.

ZeroMQ is stupid fast. You can send millions of requests per second (
http://www.zeromq.org/results:10gbe-tests-v031), and it supports pub/sub and
multicast, so you can write to multiple devices/programs at once (
http://zguide.zeromq.org/page:all).

Here are the PHP and Python bindings:

* http://www.zeromq.org/bindings:php
* http://www.zeromq.org/bindings:python

JSON is probably the easiest way to serialize data, or you could use a
binary serialization library like MessagePack.
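A minimal sketch of the message side of this idea, assuming a hypothetical event schema (`type`, `table`, `id`) and using only Python's standard `json` module. The ZeroMQ transport is deliberately left out: the bytes from `encode_event` are what the PHP side would send over a socket, and `apply_event` is what the Python consumer would run on receipt. A plain set stands in for the graph.

```python
import json
from typing import Set, Tuple

# Hypothetical event schema: {"type": "created" | "deleted", "table": ..., "id": ...}
def encode_event(event_type: str, table: str, row_id: int) -> bytes:
    """Serialize a create/delete event as UTF-8 JSON bytes for sending over a socket."""
    return json.dumps({"type": event_type, "table": table, "id": row_id}).encode("utf-8")

def apply_event(graph_keys: Set[Tuple[str, int]], payload: bytes) -> None:
    """Apply one decoded event to a stand-in graph (a set of (table, id) keys)."""
    event = json.loads(payload.decode("utf-8"))
    key = (event["table"], event["id"])
    if event["type"] == "created":
        graph_keys.add(key)      # adding an existing key is a no-op
    elif event["type"] == "deleted":
        graph_keys.discard(key)  # removing a missing key is also a no-op
    else:
        raise ValueError(f"unknown event type: {event['type']}")

# Example: the same message delivered twice leaves the graph unchanged.
graph: Set[Tuple[str, int]] = set()
msg = encode_event("created", "wp_users", 42)
apply_event(graph, msg)
apply_event(graph, msg)
print(sorted(graph))  # [('wp_users', 42)]
```

Making the handler idempotent is a design choice, not something ZeroMQ provides: it means messages can be replayed after the graph server crashes without creating duplicates.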

I'm working on creating a batch loader for Bulbs that uses ZeroMQ to send
requests to a Java/Jython server running the Neo4jBatchGraph implementation
Marko just added (
https://groups.google.com/d/topic/gremlin-users/muuylAEZKrQ/discussionZeroMQ)
-- I'll post an example soon.

- James
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Fwd: Sync databases

2011-09-14 Thread Michael Hunger
Hi Eddy,

this is the way I do it in Java with JPA (relational) and the graph:

* store the row's primary key + table name in a property of the node/relationship
* index the node under table:key

I didn't change the relational schema at all.

Then I hooked into events:

* when a node is loaded, I get the table + id from the field, load the relational 
data too and connect the two
* when a relational row is loaded, I look up table:id in the index to get the 
node and connect the two
* nodes are attached to relational rows, so whenever a row is loaded that has no 
node, one is created and the lookup information is stored as above
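The load hooks above can be sketched in Python, assuming plain dicts in place of JPA and Neo4j. `Node`, `CrossStoreLink`, the `table`/`row_id` property names and the sample `wp_users` row are all hypothetical; the `"table:id"` index key follows the convention described above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

Row = Dict[str, Any]

@dataclass
class Node:
    """Stand-in for a graph node; only its properties matter here."""
    properties: Dict[str, Any] = field(default_factory=dict)

class CrossStoreLink:
    """Hypothetical glue between relational rows and graph nodes."""

    def __init__(self, db: Dict[str, Dict[int, Row]]) -> None:
        self.db = db                      # table name -> {primary key -> row}
        self.index: Dict[str, Node] = {}  # "table:id" -> node, like a graph index

    @staticmethod
    def index_key(table: str, row_id: int) -> str:
        return f"{table}:{row_id}"

    def on_row_loaded(self, table: str, row: Row) -> Node:
        """Look up the row's node; create and index one if none exists yet."""
        key = self.index_key(table, row["id"])
        node = self.index.get(key)
        if node is None:
            node = Node({"table": table, "row_id": row["id"]})
            self.index[key] = node
        return node

    def on_node_loaded(self, node: Node) -> Optional[Row]:
        """Use the table + id stored on the node to fetch the relational row."""
        table = node.properties["table"]
        row_id = node.properties["row_id"]
        return self.db.get(table, {}).get(row_id)

db = {"wp_users": {7: {"id": 7, "user_login": "eddy"}}}
link = CrossStoreLink(db)
node = link.on_row_loaded("wp_users", db["wp_users"][7])
assert link.on_row_loaded("wp_users", db["wp_users"][7]) is node  # no duplicate node
print(link.on_node_loaded(node)["user_login"])  # eddy
```

Because nodes are created lazily on first load, a lost or missing node is simply recreated the next time its row is read; the relational schema stays untouched, as described above.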

That, of course, depends on whether you need to use the data together.

Cheers

Michael

On 14.09.2011 at 08:59, Eddy Respondek wrote:




Re: [Neo4j] Fwd: Sync databases

2011-09-14 Thread Xavier Shay

I store the last loaded IDs as properties on the root node, then every time
my sync script is run it loads everything from those IDs forward, checking
as it goes that it doesn't create duplicates. It's not the fastest way, but
it's robust. If the graph DB becomes corrupt, you roll back to the latest
backup and rerun the sync.
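A minimal sketch of this high-water-mark sync, assuming append-only source rows given as hypothetical `(id, payload)` pairs, a dict standing in for the root node's properties, and a dict of ID sets standing in for the graph. The `last_<table>_id` property name is an illustrative assumption, not Xavier's actual schema.

```python
from typing import Dict, Iterable, Set, Tuple

Row = Tuple[int, str]  # hypothetical (id, payload) pair read from the source table

def sync_table(table: str, rows: Iterable[Row],
               root_props: Dict[str, int], graph: Dict[str, Set[int]]) -> int:
    """Load rows newer than the stored mark, skipping duplicates; return how many were added."""
    mark = f"last_{table}_id"                # hypothetical property on the root node
    last_id = root_props.get(mark, 0)
    loaded = graph.setdefault(table, set())
    added = 0
    for row_id, _payload in sorted(r for r in rows if r[0] > last_id):
        if row_id not in loaded:             # duplicate check makes reruns safe
            loaded.add(row_id)
            added += 1
        root_props[mark] = row_id            # advance the mark as we go
    return added

rows = [(1, "a"), (2, "b"), (3, "c")]
root: Dict[str, int] = {}
graph: Dict[str, Set[int]] = {}
print(sync_table("wp_posts", rows, root, graph))  # 3
print(sync_table("wp_posts", rows, root, graph))  # 0
```

In a real database the node creation and the mark update would ideally be committed in the same transaction; even if they are not, a stale mark only causes rows to be rechecked, and the duplicate check skips them.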

(We have the advantage that our data is immutable; you'll need some extra
changes if that isn't the case for you, but you can use the same general
technique.)

Xavier