Re: [Neo4j] reify links with other neo4j databases located on different distributed servers
Actually, the opposite. We use nodes in both databases, of course, but we use node references stored as properties in the *data* graph to point back to entities in the *model* graph.

- Reply message -
From: "Aliabbas Petiwala"
Date: Tue, Jul 5, 2011 1:10 am
Subject: [Neo4j] reify links with other neo4j databases located on different distributed servers
To: "Neo4j user discussions"

Thanks for sharing the details with us. Your solution sounds innovative and interesting, but you have stored the relationships only in the model graph. Does that mean you have a fixed schema, and how do you store new relationships between different nodes? It is really difficult to understand how you can store relationships only once in a model graph and only nodes in the data graph.

Regards,
Ali
Re: [Neo4j] reify links with other neo4j databases located on different distributed servers
Thanks for sharing the details with us. Your solution sounds innovative and interesting, but you have stored the relationships only in the model graph. Does that mean you have a fixed schema, and how do you store new relationships between different nodes? It is really difficult to understand how you can store relationships only once in a model graph and only nodes in the data graph.

Regards,
Ali

On 7/3/11, Rick Bullotta wrote:
> Our approach is very application-specific, but it can be summarized as follows:
>
> - We keep our model database on one server and our "run time" data (somewhat
>   like activity streams) on another server
> - A long-valued (node id) "source" property on each data node identifies a
>   model node in the other graph
> - A long-valued (node id) "server" property on each data node identifies a node
>   in the same graph, which holds "logical server" information stored as
>   properties (logical name + domain name/IP address + port + protocol)
> - Lucene indices on the data nodes index the data by tag(s), source,
>   and time
> - Relationships in the "model" graph describe inter-entity model
>   relationships (inheritance, reference), dependencies and usage references,
>   etc.
> - Lucene indices on the model nodes index the model entities by type and
>   tag(s)
> - Lucene indices on the "tagging" vocabularies on both the model and data
>   graph(s)
>
> We avoided using relationships in the data graph because we are
> constantly adding and deleting potentially thousands of items per second,
> and this could create concurrency and performance issues when there are
> potentially millions of relationships on a node.
>
> We didn't originally design it this way. The original approach was a single
> (embedded) database, using relationships for all node<->node connections.
> We're in the process of moving to our new design in phases, the first of
> which was a logical separation of model + data, though in the same graph,
> and switching from relationships to the "node id property" approach for some
> specific scenarios.
>
> I have to think there are substantial performance implications *if* you are
> trying to do complex cross-shard or cross-graph traversals, which we
> generally do not need to do. Rather, we can deal with this at the
> application layer.
Re: [Neo4j] reify links with other neo4j databases located on different distributed servers
Our approach is very application-specific, but it can be summarized as follows:

- We keep our model database on one server and our "run time" data (somewhat like activity streams) on another server
- A long-valued (node id) "source" property on each data node identifies a model node in the other graph
- A long-valued (node id) "server" property on each data node identifies a node in the same graph, which holds "logical server" information stored as properties (logical name + domain name/IP address + port + protocol)
- Lucene indices on the data nodes index the data by tag(s), source, and time
- Relationships in the "model" graph describe inter-entity model relationships (inheritance, reference), dependencies and usage references, etc.
- Lucene indices on the model nodes index the model entities by type and tag(s)
- Lucene indices on the "tagging" vocabularies on both the model and data graph(s)

We avoided using relationships in the data graph because we are constantly adding and deleting potentially thousands of items per second, and this could create concurrency and performance issues when there are potentially millions of relationships on a node.

We didn't originally design it this way. The original approach was a single (embedded) database, using relationships for all node<->node connections. We're in the process of moving to our new design in phases, the first of which was a logical separation of model + data, though in the same graph, and switching from relationships to the "node id property" approach for some specific scenarios.

I have to think there are substantial performance implications *if* you are trying to do complex cross-shard or cross-graph traversals, which we generally do not need to do. Rather, we can deal with this at the application layer.

-Original Message-
From: Aliabbas Petiwala [mailto:aliabba...@gmail.com]
Sent: Sunday, July 03, 2011 2:54 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] reify links with other neo4j databases located on different distributed servers

Thanks a lot, Rick.

Can you please provide more details on the issues you faced while using this approach, and share some code with us?
Did you decide on this at design time and design your graph db schema accordingly?
Is there much of a perceived performance penalty if there are a large number of such references spanning physical boundaries?

--
Aliabbas Petiwala
M.Tech CSE
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
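The "source" and "server" property scheme summarized in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's code: the `Graph` class, the `graphs_by_logical_name` registry, the host and port values, and the dict-based indices are hypothetical stand-ins for two Neo4j databases and their Lucene indices, and the time index is left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy stand-in for one database: node id -> property map (hypothetical)."""
    nodes: dict = field(default_factory=dict)
    next_id: int = 0

    def create_node(self, **props):
        node_id = self.next_id
        self.next_id += 1
        self.nodes[node_id] = dict(props)
        return node_id


# Two separate "databases". The registry keyed by logical name is an assumption.
model_graph = Graph()
data_graph = Graph()
graphs_by_logical_name = {"model-1": model_graph}

# Model graph: an entity node.
pump = model_graph.create_node(type="Pump", tags=["equipment"])

# Data graph: a "logical server" node describing where the model graph lives.
server = data_graph.create_node(
    logical_name="model-1", host="10.0.0.5", port=7474, protocol="http"
)

# Stand-ins for the indices on data nodes (by source and by tag).
by_source = defaultdict(set)
by_tag = defaultdict(set)


def add_data_node(source_id, server_id, timestamp, tags, value):
    """Create a data node that points at a model node by id, not by relationship."""
    node_id = data_graph.create_node(
        source=source_id, server=server_id, time=timestamp, tags=tags, value=value
    )
    by_source[(server_id, source_id)].add(node_id)
    for tag in tags:
        by_tag[tag].add(node_id)
    return node_id


def resolve_source(data_node_id):
    """Follow the "server" and "source" properties to the model node's properties."""
    props = data_graph.nodes[data_node_id]
    logical_name = data_graph.nodes[props["server"]]["logical_name"]
    return graphs_by_logical_name[logical_name].nodes[props["source"]]


reading = add_data_node(pump, server, 1309651200, ["temperature"], 71.5)
print(resolve_source(reading)["type"])
```

Because the index is keyed by (server, source), finding all data nodes for a given model entity is a single lookup, which is roughly the role an incoming relationship would otherwise play.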
Re: [Neo4j] reify links with other neo4j databases located on different distributed servers
Thanks a lot, Rick.

Can you please provide more details on the issues you faced while using this approach, and share some code with us?
Did you decide on this at design time and design your graph db schema accordingly?
Is there much of a perceived performance penalty if there are a large number of such references spanning physical boundaries?

On 7/2/11, Rick Bullotta wrote:
> We are using node-id property references (the node id as a property),
> qualified with a "logical server" reference, to provide this type of binding
> across graphs. If you combine these with an index, you can actually get a
> lot of the functionality of relationships "cross graph", spanning physical
> boundaries. Of course, as Craig points out, this all has to be done at the
> application level, including dealing with cascading deletes when a node is
> removed from one graph, ensuring that references to it in another graph are
> removed/redirected.

--
Aliabbas Petiwala
M.Tech CSE
Re: [Neo4j] reify links with other neo4j databases located on different distributed servers
We are using node-id property references (the node id as a property), qualified with a "logical server" reference, to provide this type of binding across graphs. If you combine these with an index, you can actually get a lot of the functionality of relationships "cross graph", spanning physical boundaries. Of course, as Craig points out, this all has to be done at the application level, including dealing with cascading deletes when a node is removed from one graph, ensuring that references to it in another graph are removed/redirected.

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Craig Taverner
Sent: Saturday, July 02, 2011 6:03 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] reify links with other neo4j databases located on different distributed servers

As far as I know there is no internal support for transparent traversals across shards. Generally people are doing that in the application layer. However, I think there might be a middle ground of sorts. If we modify the relationship expander, I could imagine that relationships that are between shards could be modified to return a node on the other shard. This would make the traversal return nodes across shards, but since I've not tried this myself, I am uncertain if there are other consequences.
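The application-level cascading delete mentioned above might look something like the sketch below. It is a toy illustration, not the poster's implementation: plain dicts stand in for the two graphs and for the index of references, and `delete_model_node` and its `redirect_to` parameter are hypothetical names. With two separate databases there is no shared transaction, so a real system would also have to cope with a failure between the two steps.

```python
from collections import defaultdict

# Toy stand-ins: each "graph" maps node id -> properties.
model_nodes = {1: {"name": "old-entity"}, 2: {"name": "replacement"}}
data_nodes = {
    10: {"source": 1, "value": "a"},
    11: {"source": 1, "value": "b"},
    12: {"source": 2, "value": "c"},
}

# Index of data nodes by the model node id they reference.
refs_by_source = defaultdict(set)
for data_id, props in data_nodes.items():
    refs_by_source[props["source"]].add(data_id)


def delete_model_node(model_id, redirect_to=None):
    """Delete a model node and clean up references to it in the data graph.

    With redirect_to=None the referencing data nodes are removed; otherwise
    their "source" property is repointed at redirect_to.
    """
    if redirect_to == model_id:
        raise ValueError("cannot redirect references to the node being deleted")
    if redirect_to is not None and redirect_to not in model_nodes:
        raise KeyError(f"redirect target {redirect_to} does not exist")
    for data_id in refs_by_source.pop(model_id, set()):
        if redirect_to is None:
            del data_nodes[data_id]
        else:
            data_nodes[data_id]["source"] = redirect_to
            refs_by_source[redirect_to].add(data_id)
    del model_nodes[model_id]


delete_model_node(1, redirect_to=2)
```

The index is what keeps this affordable: without it, finding every reference to the deleted node would mean scanning the whole data graph.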
Re: [Neo4j] reify links with other neo4j databases located on different distributed servers
As far as I know there is no internal support for transparent traversals across shards. Generally people are doing that in the application layer. However, I think there might be a middle ground of sorts. If we modify the relationship expander, I could imagine that relationships that are between shards could be modified to return a node on the other shard. This would make the traversal return nodes across shards, but since I've not tried this myself, I am uncertain if there are other consequences.

On Sat, Jul 2, 2011 at 4:03 AM, Aliabbas Petiwala wrote:
> Hi,
>
> I cannot figure out how my application logic can reify links with
> other neo4j databases located on different distributed servers.
> Hence, how can I make the traversals and graph algorithms transparent
> to the location of the different databases?
> --
> Aliabbas Petiwala
> M.Tech CSE
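The expander idea above is untested by its author, so the following is only a conceptual sketch of what "an expander that returns nodes on the other shard" could mean. It does not use the Neo4j traversal API: the `shards` dict, `expand`, and `traverse` are hypothetical, and a node is identified by a (shard, id) pair so that an edge can point into another shard.

```python
from collections import deque

# Each shard maps node id -> outgoing edges. An edge is (shard_name, node_id);
# an edge naming a different shard is a cross-shard link.
shards = {
    "A": {1: [("A", 2), ("B", 7)], 2: []},
    "B": {7: [("B", 8)], 8: [("A", 2)]},
}


def expand(shard_name, node_id):
    """Return the neighbours of a node as (shard, id) pairs.

    The lookup goes to whichever shard owns the node; in a real deployment
    that would be a remote call when the shard is not the local one.
    """
    return list(shards[shard_name][node_id])


def traverse(start):
    """Breadth-first traversal over (shard, id) pairs, crossing shards as needed."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        shard_name, node_id = queue.popleft()
        order.append((shard_name, node_id))
        for neighbour in expand(shard_name, node_id):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


print(traverse(("A", 1)))
```

The traversal code itself never checks which shard it is on; only `expand` does, which is the sense in which the location becomes transparent. Every cross-shard hop would still cost a network round trip, which matches the performance concern raised elsewhere in the thread.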