Re: [Neo4j] reify links with other neo4j databases located on different distributed servers

2011-07-05 Thread Rick Bullotta
Actually, the opposite. We use nodes in both databases, of course, but we use
node references via properties in the *data* graph to point back to entities in
the *model* graph.




Re: [Neo4j] reify links with other neo4j databases located on different distributed servers

2011-07-04 Thread Aliabbas Petiwala
Thanks for sharing the details with us.

Your solution sounds innovative and interesting, but you have stored
the relationships only in the model graph. Does that mean you have a
fixed schema, and how do you store new relationships between different
nodes? It is really difficult to understand how you can store
relationships only once in the model graph and only nodes in the data
graph.
Regards,
Ali


Re: [Neo4j] reify links with other neo4j databases located on different distributed servers

2011-07-03 Thread Rick Bullotta
Our approach is very application-specific, but it can be summarized by:

- We keep our model database on one server and our "run time" data (somewhat 
like activity streams) on another server
- A long-valued (node id) "source" property on each data node that identifies a
model node in the other graph
- A long-valued (node id) "server" property on each data node that identifies a
node in the same graph, which contains "logical server" information stored as
properties (logical name + domain name/IP address + port + protocol)
- Lucene indices on the data nodes that index the data by tag(s), source, and 
time
- Relationships in the "model" graph that describe inter-entity model 
relationships (inheritance, reference), dependencies and usage references, etc.
- Lucene indices on the model nodes that index the model entities by type and
tag(s)
- Lucene indices on the "tagging" vocabularies on both the model and data 
graph(s)
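
A minimal sketch of the layout listed above, in Python, with plain dictionaries standing in for the two Neo4j databases and their Lucene indices. Nothing here comes from the actual application: the `Graph` class, its `create_node` and `lookup` methods, and all sample values (names, host, port, timestamp) are hypothetical and only illustrate how "source" and "server" node-id properties plus an index can take the place of relationships.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    id: int
    props: dict = field(default_factory=dict)


class Graph:
    """Stand-in for one database: nodes plus exact-match property indices."""

    def __init__(self, indexed_keys):
        self._ids = count(1)
        self.nodes = {}
        self.indexed_keys = tuple(indexed_keys)
        # index[key][value] -> set of node ids (stand-in for a Lucene index)
        self.index = {key: defaultdict(set) for key in self.indexed_keys}

    def create_node(self, **props):
        node = Node(next(self._ids), dict(props))
        self.nodes[node.id] = node
        for key in self.indexed_keys:
            if key not in props:
                continue
            value = props[key]
            # A list-valued property (e.g. several tags) is indexed per element.
            values = value if isinstance(value, (list, tuple, set)) else [value]
            for v in values:
                self.index[key][v].add(node.id)
        return node

    def lookup(self, key, value):
        return [self.nodes[i] for i in sorted(self.index[key].get(value, ()))]


# Model graph: entities indexed by type and tag.
model = Graph(indexed_keys=("type", "tag"))
# Data graph: run-time items indexed by tag, source and time.
data = Graph(indexed_keys=("tag", "source", "time"))

pump = model.create_node(type="Asset", tag=["pump"], name="Pump-7")

# "Logical server" node lives in the data graph itself (sample values).
server = data.create_node(logical_name="model-1", host="model.example.org",
                          port=7474, protocol="http")

# Data node points at the model entity by id instead of by relationship.
reading = data.create_node(source=pump.id, server=server.id,
                           tag=["temperature"], time=1309651200, value=71.5)

# "All data items for this model entity" becomes an index lookup.
items = data.lookup("source", pump.id)
```

The design choice this illustrates is that the model node never accumulates per-item relationships: adding or deleting a data item touches only that item and the index.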

We avoided using relationships in the data graph because we are constantly
adding and deleting potentially thousands of items per second, and this could
create concurrency and performance issues when there are potentially millions
of relationships on a node.

We didn't originally design it this way.  The original approach was a single 
(embedded) database, using relationships for all node<->node connections. We're 
in the process of moving to our new design in phases, the first of which was a 
logical separation of model + data, though in the same graph, and switching 
from relationships to the "node id property" approach for some specific 
scenarios.

I have to think there are substantial performance implications *if* you are 
trying to do complex cross-shard or cross-graph traversals, which we generally 
do not need to do.  Rather, we can deal with this at the application layer.
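
As a rough illustration of handling a cross-graph hop at the application layer, the sketch below follows a data node's "source" reference into whichever graph its "server" node names. It is an assumption-laden toy: the dictionaries stand in for the two databases, and `CONNECTIONS` and `resolve_source` are hypothetical names, not part of Neo4j or of the application described here.

```python
# Two separate "databases" (sample content only).
MODEL_GRAPH = {10: {"name": "Pump-7", "type": "Asset"}}
DATA_GRAPH = {
    1: {"logical_name": "model-1", "host": "model.example.org",
        "port": 7474, "protocol": "http"},
    2: {"source": 10, "server": 1, "value": 71.5},
}

# Hypothetical registry: logical server name -> handle on that graph.
# In a real deployment this would be a connection built from host/port/protocol.
CONNECTIONS = {"model-1": MODEL_GRAPH}


def resolve_source(data_graph, data_node_id, connections):
    """Follow a data node's "source" id into the graph named by its "server" node."""
    item = data_graph[data_node_id]
    server = data_graph[item["server"]]            # local hop, same graph
    remote = connections[server["logical_name"]]   # application picks the graph
    return remote.get(item["source"])              # None if the reference dangles


entity = resolve_source(DATA_GRAPH, 2, CONNECTIONS)
```

Each such hop is an extra lookup (and, across servers, a network round trip), which is why deep cross-graph traversals would be expensive with this approach.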




___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] reify links with other neo4j databases located on different distributed servers

2011-07-02 Thread Aliabbas Petiwala
Thanks a lot, Rick.

Can you please provide more details on the issues you faced while
using this approach, and share some code with us?
Did you decide on this at design time and design your
graph db schema accordingly?
Is there much of a perceived performance penalty if there are a large
number of such references spanning physical boundaries?



-- 
Aliabbas Petiwala
M.Tech CSE


Re: [Neo4j] reify links with other neo4j databases located on different distributed servers

2011-07-02 Thread Rick Bullotta
We are using node-id property references (the node id as a property), qualified 
with a "logical server" reference, to provide this type of binding across 
graphs. If you combine these with an index, you can actually get a lot of the 
functionality of relationships "cross graph", spanning physical boundaries.  Of 
course, as Craig points out, this all has to be done at the application level, 
including dealing with cascading deletes when a node is removed from one graph, 
ensuring that references to it in another graph are removed/redirected.
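
The cascading-delete bookkeeping mentioned above might look roughly like the following sketch. It assumes an index on the "source" property of data nodes; the dictionaries stand in for the two graphs, and `delete_model_node` and its `redirect_to` parameter are hypothetical names chosen for illustration.

```python
from collections import defaultdict

model_nodes = {10: {"name": "Pump-7"}, 11: {"name": "Pump-8"}}
data_nodes = {
    1: {"source": 10, "value": 71.5},
    2: {"source": 10, "value": 72.0},
    3: {"source": 11, "value": 65.1},
}

# Stand-in for an index on the "source" property of data nodes.
by_source = defaultdict(set)
for node_id, props in data_nodes.items():
    by_source[props["source"]].add(node_id)


def delete_model_node(model_id, redirect_to=None):
    """Remove a model node and fix up every data node that referenced it."""
    del model_nodes[model_id]
    for node_id in by_source.pop(model_id, set()):
        if redirect_to is None:
            del data_nodes[node_id]               # cascade the delete
        else:
            data_nodes[node_id]["source"] = redirect_to
            by_source[redirect_to].add(node_id)   # keep the index in step


# Redirect references from node 10 to node 11 instead of dropping the data.
delete_model_node(10, redirect_to=11)
```

With two physically separate databases the model delete and the data-side cleanup would not normally share a transaction, so the application also has to cope with one step succeeding and the other failing.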



Re: [Neo4j] reify links with other neo4j databases located on different distributed servers

2011-07-02 Thread Craig Taverner
As far as I know there is no internal support for transparent traversals
across shards. Generally people are doing that in the application layer.
However, I think there might be a middle ground of sorts. If we modify the
relationship expander, I could imagine that relationships between shards
could be modified to return a node on the other shard. This would make
the traversal return nodes across shards, but since I've not tried this
myself, I am uncertain if there are other consequences.
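
To make the expander idea concrete, here is a toy sketch in Python. It does not use Neo4j's expander interface and is untested against any real sharded setup; `SHARDS`, `expand` and `traverse` are hypothetical names. The only point is that once the expansion step resolves cross-shard links, the traversal itself no longer needs to know about shard boundaries.

```python
from collections import deque

# Each shard maps node id -> list of targets. A target is either a local
# node id (int) or a (shard, node_id) tuple marking a cross-shard link.
SHARDS = {
    "a": {1: [2, ("b", 1)], 2: []},
    "b": {1: [2], 2: []},
}


def expand(shard, node_id):
    """Yield (shard, node_id) neighbours, resolving cross-shard links."""
    for target in SHARDS[shard][node_id]:
        if isinstance(target, tuple):
            yield target               # hop to the other shard
        else:
            yield (shard, target)      # ordinary local relationship


def traverse(start):
    """Breadth-first traversal that is unaware of shard boundaries."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        order.append(current)
        for neighbour in expand(*current):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


visited = traverse(("a", 1))
```

In a real deployment every cross-shard hop in `expand` would be a remote call, so latency and partial failures are among the "other consequences" that would need investigating.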

On Sat, Jul 2, 2011 at 4:03 AM, Aliabbas Petiwala wrote:

> Hi,
>
> I cannot figure out how my application logic can reify links with
> other neo4j databases located on different distributed servers.
> Hence, how can I make the traversals and graph algorithms transparent
> to the location of the different databases?
> --
> Aliabbas Petiwala
> M.Tech CSE