Re: [Neo4j] Traversal RelationshipExpander
Thanks again, Craig. I think we're on the same track. I agree about the 'join' node. I'm not quite sure what to call it yet in my model but the concept looks right. Thanks for the traversal tips, they make sense too.

Kalin

On Nov 25, 2010, at 4:50 AM, Craig Taverner wrote:
> [...]
Re: [Neo4j] Traversal RelationshipExpander
Thanks for your suggestions regarding my model, Craig. I agree about the ids. Using a generalization for Asset Types might be something to look at.

I'm still stuck on wanting to capture the general nature of an Asset as well as the specific triple of Producer-Asset-Consumer. I'll have to think about your statement that Asset is specific to a Producer. In my current model, an Asset may have multiple Producers and multiple Consumers. The concept of an Asset is more general than a single Producer-Consumer relationship. Maybe it's appropriate to duplicate the Asset node (not the actual node but the properties that identify an Asset). My relational DB mind doesn't want that kind of duplication.

Some of the questions I want to answer:

1. Given a Consumer, what Assets does it subscribe to?
2. Given a Consumer, what Producers is it dependent upon?
3. Given a Producer, what Assets does it contribute to?
4. Given a Producer, what Consumers are dependent on it?
5. Given an Asset, what Producers contribute to it and what Consumers subscribe to it?

There are other node types and properties within the graph that aren't important or that I can't discuss (this is a mock model anyway).

What is the consensus about duplicating node data within a network? I can see how using indexing or a hierarchy, as you've pointed out, might help with that. Part of my hangup is that I'm looking at using Neo4j as an adjunct to an RDBMS to store dependency relationships. Each node will have information that ties back to the RDBMS for lookups. But that doesn't require absolutely unique nodes. Keeping the two DBs in sync will be a challenge, but I don't have the option to push all of the data into Neo4j and I'd rather not manage the dependencies in the RDBMS.

Actually, now that I've talked around it, I think I see how your model would work for what I want to do. I'd have to see how a Traverser would return the results.
Thanks for your time,
Kalin

On Nov 24, 2010, at 4:50 PM, Craig Taverner wrote:
> [...]
Re: [Neo4j] Traversal RelationshipExpander
Hi Kalin,

I'm not sure I follow about duplicating. The suggestion I made did not involve any duplicating. The AssetType nodes would contain properties appropriate to the type of asset, and the Asset node would contain properties only appropriate to that instance (or none if none are appropriate).

Perhaps it is easier if I revert to your original names, use Asset for the type again, and come up with a new name for the specific instance? Then you get something like:

    Producer[1] --(contributes_to)-- [X] --(subscribes_to)-- Consumer1
                                      |
                                   (IS_A)
                                      |
                                      V
                                  Asset[P] --(subscribes_to)-- Consumer2

Now you can see that the X is really just like a join table in a relational database. In fact my suggestion is very similar to a common refactoring in a relational database: when you have a foreign key to another table and then need to add properties to that relationship, you create an intermediate 'join table' and add the properties there.

I definitely still think you will need to expand your producer-asset-consumer triple to be a producer-x-asset-consumer, where the 'x' is the 'join table' that allows a consumer to subscribe to assets by a particular producer. Which is what you want, right?

I will answer, based on my model suggestion, each of the queries you ask below:

> 1. Given a Consumer, what Assets does it subscribe to?

Traverse from consumer along both outgoing 'subscribes_to' and outgoing 'is_a' relationships, and you will get all assets regardless of whether the consumer subscribes to the asset in general, or the specific producer-asset.

> 2. Given a Consumer, what Producers is it dependent upon?

Traverse from consumer along outgoing 'subscribes_to', incoming 'is_a' and incoming 'contributes_to' relationships, and you will get all producers of all assets (specific or otherwise) that the consumer subscribes to.

> 3. Given a Producer, what Assets does it contribute to?

Traverse from producer along outgoing 'contributes_to' and outgoing 'is_a' relationships, and you get the Assets they contribute to.

> 4. Given a Producer, what Consumers are dependent on it?

Traverse from producer along outgoing 'contributes_to', outgoing 'is_a' and incoming 'subscribes_to' relationships, and you get all Consumers that depend on assets that producer contributes to. This one is more subtle, though, because it depends on what you mean by 'dependent on'. The traverser I described will not exclude consumers that subscribe to assets produced by both the given producer and others (giving the consumer a choice of producer). If you only want consumers that have no choice, leave the 'is_a' relationship out of the traverser.

> 5. Given an Asset, what Producers contribute to it and what Consumers subscribe to it?

For producers, traverse on incoming 'is_a' and incoming 'contributes_to'. For consumers, traverse on incoming 'is_a' and incoming 'subscribes_to'.

> There's other node types and properties within the graph that aren't important or that I can't discuss (this is a mock model anyway).

Hopefully the fact that I renamed my AssetType back to your original Asset, and have a new 'join table' (possibly called 'producer-asset') for the additional concept I added before, will make it easier to see how to fit this model into your existing structures.

> What is the consensus about duplicating node data within a network? I can see how using indexing or a hierarchy, as you've pointed out, might help with that. Part of my hangup is that I'm looking at using Neo4J as an adjunct to a RDBMS to store dependency relationships. Each node will have information that ties back to the RDBMS for lookups. But that doesn't require absolutely unique nodes. Keeping the two DBs in sync will be a challenge but I don't have the option to push all of the data into Neo4J and I'd rather not manage the dependencies in the RDBMS.

I dislike duplicating data in Neo4j as much as I dislike duplicating it in an RDBMS. I'm sorry if I gave the impression in my previous email that I was suggesting duplication. My suggestion was just the addition of this 'join-table' idea (an extra node in the graph), to make it possible to capture the concept of a consumer subscribing both to an asset in general and to an asset by a particular producer.

> Actually, now that I've talked around it, I think I see how your model would work for what I want to do. I'd have to see how a Traverser would return the results. Thanks for your time,

Super. Perhaps I should have read your complete mail before writing this new one, or hopefully the new one, especially all the traverser suggestions, has helped clarify further? :-)

Regards,
Craig

___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
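The five traversals described above can be tried out on a toy graph. What follows is only a sketch in plain Java, not Neo4j code: the class `JoinNodeQueries`, the `reach` helper, the node names `X1`/`X2` for the 'join' nodes, and the `"type>"`/`"type<"` notation for outgoing/incoming steps are all made up for illustration. Like a traverser given a list of relationship types and directions, `reach` follows any allowed step from any node it visits.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy in-memory graph; a stand-in for a real Neo4j traverser. */
class JoinNodeQueries {

    /** One directed, typed edge: from --type--> to. */
    static final class Edge {
        final String from;
        final String type;
        final String to;

        Edge(String from, String type, String to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    // Hypothetical sample data: X1 and X2 are the 'join' nodes.
    static final List<Edge> EDGES = Arrays.asList(
            new Edge("Producer1", "contributes_to", "X1"),
            new Edge("Producer2", "contributes_to", "X2"),
            new Edge("X1", "is_a", "AssetP"),
            new Edge("X2", "is_a", "AssetP"),
            new Edge("Consumer1", "subscribes_to", "X1"),
            new Edge("Consumer2", "subscribes_to", "AssetP"));

    /**
     * Breadth-first walk from start along the allowed steps
     * ("type>" = outgoing, "type<" = incoming). Returns every reached
     * node whose name starts with kind, sorted by name.
     */
    static Set<String> reach(String start, String kind, String... steps) {
        Set<String> allowed = new HashSet<>(Arrays.asList(steps));
        Set<String> seen = new HashSet<>();
        Set<String> hits = new TreeSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            if (!node.equals(start) && node.startsWith(kind)) {
                hits.add(node);
            }
            for (Edge e : EDGES) {
                String next = null;
                if (e.from.equals(node) && allowed.contains(e.type + ">")) {
                    next = e.to;
                } else if (e.to.equals(node) && allowed.contains(e.type + "<")) {
                    next = e.from;
                }
                if (next != null && seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // 1. Assets a consumer subscribes to: [AssetP]
        System.out.println(reach("Consumer1", "Asset", "subscribes_to>", "is_a>"));
        // 2. Producers a consumer depends on: [Producer1, Producer2]
        System.out.println(reach("Consumer2", "Producer",
                "subscribes_to>", "is_a<", "contributes_to<"));
        // 4. Consumers depending on a producer: [Consumer1, Consumer2]
        System.out.println(reach("Producer1", "Consumer",
                "contributes_to>", "is_a>", "subscribes_to<"));
        // 4, 'no choice' variant without is_a: [Consumer1]
        System.out.println(reach("Producer1", "Consumer",
                "contributes_to>", "subscribes_to<"));
    }
}
```

Leaving `is_a>` out of the last call is the 'no choice' variant of query 4: only the consumer subscribed directly to Producer1's join node is returned.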
[Neo4j] Traversal RelationshipExpander
I'm enjoying Neo4j so far. The new Traversal framework has a lot of potential. However, I'd like to propose an extension to the RelationshipExpander interface or have someone tell me of another way to accomplish a task.

Here is an outline of the basics of my network:

    Producer --contributes_to--> Asset <--subscribes_to-- Consumer

(various properties on each type of Node and Relationship)

A Consumer may subscribe to an Asset generically, or the subscribes_to relationship may have a property, producers, which is an array of ids (long) of Producer nodes that specifically produce assets for that Consumer.

Given a list of Producer nodes, I want to retrieve all paths from a Producer to all of its Consumers through Assets. When at an Asset node, the subscribes_to relationship should only be traversed if the relationship has no producers property (meaning the related Consumer consumes from all producers of that asset) or if the producers property contains the id of the Producer that we started with.

My first approach was to implement RelationshipExpander to determine which relationships from Asset to traverse. However, since all the expand() method has to work with is the current Node, I don't have enough information to make the decision above. I would need the current Path in order to know which Producer related to the current Asset the Traversal came from.

Right now I'm using this description, which will give me Producer-Asset-Consumer paths but won't exclude based on producers on subscribes_to:

    TraversalDescription producerToConsumer = Traversal.description()
        .depthFirst()
        .uniqueness(Uniqueness.RELATIONSHIP_PATH)
        .relationships(RelTypes.Contributes_to, Direction.OUTGOING)
        .relationships(RelTypes.Subscribes_to, Direction.BOTH)
        .prune(Traversal.pruneAfterDepth(2));

If there is a way to accomplish what I describe above with the current framework, I'm open to suggestions, including a different network design to track the Producer to Consumer relationship as it relates to Asset.

Alternatively, I suggest that there may be situations where the Path context is needed to decide which relationships to traverse from a Node, hence perhaps add expand(Path p) to the RelationshipExpander interface.

Thanks,
Kalin

--
Kalin Wilson
http://www.kalinwilson.com
Re: [Neo4j] Traversal RelationshipExpander
Hi Kalin,

To begin with I'm not fond of storing ids as properties... that's what relationships are for. So I'd perhaps insert a middle node between Asset and Consumer which then also can have relationships to Producers.

Anyway, to get that behaviour you can use a filter which will exclude unwanted paths.

    Traversal.description().uniqueness(RELATIONSHIP_PATH)
        .relationships(Contributes_to, OUTGOING)
        .relationships(Subscribes_to, INCOMING)
        .filter(new Predicate<Path>() {
            public boolean accept(Path position) {
                if ( position.length() != 2 ) return false;
                Relationship subscribesToRel = position.lastRelationship();
                if ( /* check properties on subscribesToRel is OK */ ) return true;
                return false;
            }
        });

Do you need the pruneAfterDepth(2) here? I don't think so, because I don't think your traverser will be able to go deeper anyway, but that's just a detail.

2010/11/24 Kalin Wilson d...@kalinwilson.com
> [...]

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
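The placeholder check in the filter above can be sketched in plain Java, following the rule from the original post: accept the subscribes_to relationship when it has no producers property, or when that property contains the id of the Producer the path started from. This is a minimal sketch with no Neo4j dependency; the class `SubscriptionFilter` and the method `accepts` are hypothetical names. Inside the real filter the two arguments would presumably come from something like `subscribesToRel.getProperty("producers", null)` and `position.startNode().getId()`. Treating an empty array as "matches no producer" is an assumption, since the thread does not say.

```java
import java.util.Arrays;

/** Stand-alone version of the property check; not part of the Neo4j API. */
class SubscriptionFilter {

    /**
     * @param producers       value of the "producers" property on a subscribes_to
     *                        relationship, or null when the property is absent
     * @param startProducerId id of the Producer node the path started from
     * @return true if the relationship should be traversed
     */
    static boolean accepts(long[] producers, long startProducerId) {
        if (producers == null) {
            // Generic subscription: the consumer takes the asset from any producer.
            return true;
        }
        // Specific subscription: only listed producers qualify.
        return Arrays.stream(producers).anyMatch(id -> id == startProducerId);
    }

    public static void main(String[] args) {
        System.out.println(accepts(null, 7L));                 // true
        System.out.println(accepts(new long[] {3L, 7L}, 7L));  // true
        System.out.println(accepts(new long[] {3L}, 7L));      // false
    }
}
```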
Re: [Neo4j] Traversal RelationshipExpander
Hi,

I also do not like having the producer ids in the relationship. This is like having a non-indexed foreign key. I think the right solution is to change the database structure to match the intention of the model. I'll go out on a limb here and make some assumptions about what you really mean by your model.

You want to ask 'who consumes assets by this producer'. Your problem is that what you are calling an Asset is actually a type of Asset, but some consumers are interested in a specific asset, not just all assets of that type, and so you are having to add extra data to resolve the type in the 'subscribes' relationship. A far better solution is to have a real asset node. By that I mean a 'Ford Mustang '68' node instead of a 'car' node. Then the case where the subscribes relationship has no producers property really means the Consumer subscribes to the AssetType, and the case where there are producers properties means the Consumer subscribes to those specific Assets (where Asset is now specific to a producer). Then the model will have no foreign keys, only relationships, and a plain old standard out-of-the-box traverser will give you your answer :-)

Some further clarifications of the relationships I see in the graph:

- Assets can have IS_A relationships to AssetTypes
- Producers CONTRIBUTE_TO Assets, never AssetTypes (just like Ford produces the 'Ford Mustang', not 'cars')
- Consumers can SUBSCRIBE_TO either Assets directly (i.e. specific models), or AssetTypes (this deals with your two cases before, but no longer needs ids in properties)

The traverser can traverse directly from the Producer to the Subscriber without any complications. Just follow the right relationships in the right order, and only return Subscribers. The paths could look like:

    Producer1 --(contributes_to)-- AssetX --(subscribes_to)-- Consumer1
                                     |
                                  (IS_A)
                                     |
                                     V
                                 AssetTypeP --(subscribes_to)-- Consumer2

Notice that traversing to consumers that subscribe to specific assets (assets by specific producers) is a shorter path than traversing to consumers that subscribe to assets by any producer (asset types). This should have no impact on the traverser. Just remember to include the IS_A relationship type (with the right direction) to get the results you want.

Cheers,
Craig

On Thu, Nov 25, 2010 at 12:00 AM, Mattias Persson matt...@neotechnology.com wrote:
> [...]
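To see the two path lengths from the diagram above side by side, here is a small plain-Java sketch of the Asset/AssetType model. It is not Neo4j code: the class `AssetTypePaths`, its helper `neighbours`, and the four sample edges are assumptions taken from the diagram, and the hand-written loops only imitate what a traverser following CONTRIBUTES_TO (outgoing), IS_A (outgoing) and SUBSCRIBES_TO (incoming) would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of Producer -> Asset [-> AssetType] <- Consumer paths. */
class AssetTypePaths {

    /** One directed, typed edge: from --type--> to. */
    static final class Edge {
        final String from;
        final String type;
        final String to;

        Edge(String from, String type, String to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    // Sample data mirroring the diagram in the message above.
    static final List<Edge> EDGES = Arrays.asList(
            new Edge("Producer1", "CONTRIBUTES_TO", "AssetX"),
            new Edge("AssetX", "IS_A", "AssetTypeP"),
            new Edge("Consumer1", "SUBSCRIBES_TO", "AssetX"),
            new Edge("Consumer2", "SUBSCRIBES_TO", "AssetTypeP"));

    /** Nodes one step away from node along edges of the given type and direction. */
    static List<String> neighbours(String node, String type, boolean outgoing) {
        List<String> result = new ArrayList<>();
        for (Edge e : EDGES) {
            if (!e.type.equals(type)) {
                continue;
            }
            if (outgoing && e.from.equals(node)) {
                result.add(e.to);
            }
            if (!outgoing && e.to.equals(node)) {
                result.add(e.from);
            }
        }
        return result;
    }

    /** All paths from a producer to its consumers, each as a list of node names. */
    static List<List<String>> pathsToConsumers(String producer) {
        List<List<String>> paths = new ArrayList<>();
        for (String asset : neighbours(producer, "CONTRIBUTES_TO", true)) {
            // Consumers subscribed to the specific asset: 2 relationships.
            for (String consumer : neighbours(asset, "SUBSCRIBES_TO", false)) {
                paths.add(Arrays.asList(producer, asset, consumer));
            }
            // Consumers subscribed to the asset type: 3 relationships.
            for (String assetType : neighbours(asset, "IS_A", true)) {
                for (String consumer : neighbours(assetType, "SUBSCRIBES_TO", false)) {
                    paths.add(Arrays.asList(producer, asset, assetType, consumer));
                }
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        for (List<String> path : pathsToConsumers("Producer1")) {
            System.out.println(path);
        }
        // [Producer1, AssetX, Consumer1]
        // [Producer1, AssetX, AssetTypeP, Consumer2]
    }
}
```

One consequence worth noting: a depth limit of 2, as in the original traversal description, would cut off the second path, so the limit would need to be raised (or dropped) once the IS_A hop is part of the model.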
Re: [Neo4j] Traversal RelationshipExpander
Thanks Mattias. I agree about storing the IDs; it doesn't feel right, but I'm still working out the best network model. Thanks for the example. I guess I need to take another look at when to use a filter versus controlling the traversal in other ways.

On Nov 24, 2010, at 4:00 PM, Mattias Persson wrote:
> [...]