Re: [Neo4j] Modelling with neo4j
Great Bryce,

Let us know if it could work out!

/peter

On Sep 28, 2011 5:38 AM, "Bryce" wrote:
> Following up on the part of this discussion about moving the enhanced api out of the graph collections module, was meaning to get to this earlier but got sidetracked.
Re: [Neo4j] Modelling with neo4j
Following up on the part of this discussion about moving the enhanced api out of the graph collections module. I was meaning to get to this earlier but got sidetracked.

The dependency that IndexedRelationship had on ComparablePropertyType, which I am assuming is what you are referring to there, no longer exists. PropertySortedTree still has a dependency on ComparablePropertyType, but this collection could be taken along with the enhanced api as a specialisation of the SortedTree collection specific to the enhanced api. As it still implements NodeCollection it can be used as the basis of an IndexedRelationship, but IndexedRelationship doesn't need to know about it at all.

I am wondering, however, whether there is a problem with this: one thing I just realised is no longer happening is the storage of the property type into the node collection. For that matter, PropertySortedTree currently has no node-only constructor, so it wouldn't currently work correctly. I will look into that (I hadn't done so yet since I haven't even had a good look at the enhanced api).

On Sun, Sep 25, 2011 at 3:28 AM, Niels Hoogeveen wrote:
> +1
> Enhanced API grew out of a couple of classes I added to make IndexedRelationship work more easily (not exposing comparators), but it is essentially a separate component. Giving it that status would help others improve it. Having laid some of the groundwork, I feel it needs other people's input too. As it stands now, it is very much one man's work, and while I am confident it contains plenty of good ideas, it can only grow with the input of other developers, just like IndexedRelationship has become much better thanks to the work Bryce put into it, and the work of others to include graph-collections with structures I would not even have thought about.
>
> There is however one thing we need to look at. Right now IndexedRelationship has a dependency on Enhanced API for the indexing of nodes based on a property.
> At the same time Enhanced API has a dependency on graph-collections, transparently supporting IndexedRelationships in the API. I think it would be best to remove the dependency of graph-collections on enhanced-api and only offer the slightly more complex option where the user needs to provide a comparator. The other dependency can remain and in fact can even be made stronger. Enhanced API could in principle be made to support any type of collection, now that Bryce has added a generic nodecollection interface.
>
> I agree "enhanced api" is not a great name; it says what it does, but certainly has little appeal. So I will be happy if someone can come up with something sexier.
>
> Niels
>
> > From: peter.neuba...@neotechnology.com
> > Date: Sat, 24 Sep 2011 15:42:13 +0200
> > To: user@lists.neo4j.org
> > Subject: Re: [Neo4j] Modelling with neo4j
> >
> > Great thoughts guys!
> > I think it would be interesting to break out the "Enhanced API" from graph-collections, rename it into something better (we can think of a name together) and provide a more fully fledged example that we can document and evolve.
> >
> > WDYT?
> >
> > Cheers,
> >
> > /peter neubauer
> >
> > GTalk: neubauer.peter
> > Skype peter.neubauer
> > Phone +46 704 106975
> > LinkedIn http://www.linkedin.com/in/neubauer
> > Twitter http://twitter.com/peterneubauer
> >
> > http://www.neo4j.org - Your high performance graph database.
> > http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
> > http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
> >
> > On Sat, Sep 24, 2011 at 3:37 PM, Rick Bullotta wrote:
> > > That's a great summary, Niels. Very similar to how we've applied Neo4J here at ThingWorx, though we've done most of the type system work (nodes and relationships are all typed/subtyped) in our application domain layer.
> > > A few other items that we leveraged in our implementation that you may wish to consider:
> > >
> > > - A common pattern we encountered was a "collection" of typed entities (e.g. a typed collection), and we implemented a specific model using supernodes for this. This also allowed us to rapidly and easily iterate/search collections and also to organize nodes in a "human comprehensible way" that can be readily viewed with something like Neoclipse for troubleshooting purposes. Also, if the type was "truck", we stamped the node with the type "truck" as a property (using enumerations with a custom i
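
As a rough illustration of the comparator-based option Niels describes earlier in this message (the user supplies the ordering instead of the collection knowing about property types), here is a minimal Python sketch. It does not use the graph-collections or Enhanced API classes: `SortedNodeCollection` and `by_property` are hypothetical names, and plain dicts stand in for nodes.

```python
from functools import cmp_to_key


def by_property(name):
    """Build a comparator that orders dict-based 'nodes' by one property."""
    def compare(a, b):
        # Three-way comparison: negative, zero, or positive.
        return (a[name] > b[name]) - (a[name] < b[name])
    return compare


class SortedNodeCollection:
    """Hypothetical stand-in for a node collection kept in comparator order."""

    def __init__(self, comparator):
        # The caller supplies the ordering; the collection itself knows
        # nothing about property types.
        self._key = cmp_to_key(comparator)
        self._nodes = []

    def add(self, node):
        self._nodes.append(node)
        # Re-sorting on every insert keeps the sketch short; a real
        # collection would use a tree or similar structure instead.
        self._nodes.sort(key=self._key)

    def __iter__(self):
        return iter(self._nodes)


# Assumed sample data, not from the thread.
people = SortedNodeCollection(by_property("age"))
for node in (
    {"name": "Joe", "age": 41},
    {"name": "Ann", "age": 29},
    {"name": "Bob", "age": 35},
):
    people.add(node)

names_by_age = [node["name"] for node in people]
print(names_by_age)
```

The point of the design is only that the ordering logic lives with the caller, which is what would let the collection drop its dependency on a property-type class.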
Re: [Neo4j] Modelling with neo4j
Beautiful. Thank you very much :)

Jon

On Sep 25, 2011 5:48 AM, "Peter Hunsberger [via Neo4j Community Discussions]" wrote:
> I'm going to take a slightly different tack here than the responses you've got so far...
Re: [Neo4j] Modelling with neo4j
I'm going to take a slightly different tack here than the responses you've got so far...

First, as others have pointed out, this is three entities and two relationships:

joe - is a - janitor - at the - school

This is important: the lighter weight the relationships, the less of a problem you are going to have with needing some form of meta type for them. You really want to follow the basic OO lead here and stick to "isa" and "hasa". The "at the" is essentially a "hasa" (a school has a janitor). If you can map your relationships to these two simple concepts then you've got a realistic model; otherwise you probably need to refactor / abstract or normalize (take your pick depending on your background, they're essentially the same thing at this level of modelling).

That leads to point number two. Modelling is modelling is modelling, and although a graph database might let you get up and running easily, it's not going to save you from modelling your domain properly in the long run. The worse the job you do up front, the more work it's going to be to fix it in the long run. One common pattern is that if you think you have a meta type, then use that for your entity and add the details that make it a specific sub type as instance data:

a janitor is a type of job

therefore the entity should be "job", not janitor; job type is a property of job

school is a type of building (or maybe even more abstractly a type of location); building type is a property of building

therefore the entity should be building. Though in this case it may be location, depending on exactly what the domain is going to be used for, though most likely a building "hasa" location...

As long as your number of subtypes is some reasonably small number, this pattern of abstraction works well. Here, "reasonably small" is in the range of things you don't mind coding up an enum for in your code. (How's that for a non-committal type guideline? I'll pick 10 as an arbitrary limit if pushed.)
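
To make the "job type is a property of job" pattern concrete, here is a small Python sketch. The names `JobType`, `BuildingType`, `Building` and `Job` are invented for illustration and are not part of any Neo4j API, the extra enum members are made up, and linking a job directly to a building is an assumption of the sketch. The enums play the role of the small, fixed set of subtypes described above.

```python
from dataclasses import dataclass
from enum import Enum


class JobType(Enum):
    JANITOR = "janitor"
    TEACHER = "teacher"   # made-up extra member for the example


class BuildingType(Enum):
    SCHOOL = "school"
    LIBRARY = "library"   # made-up extra member for the example


@dataclass
class Building:
    # The entity is "building"; "school" is only instance data.
    name: str
    building_type: BuildingType


@dataclass
class Job:
    # The entity is "job"; "janitor" is only instance data.
    job_type: JobType
    holder: str
    building: Building


school = Building(name="the school", building_type=BuildingType.SCHOOL)
joe_job = Job(job_type=JobType.JANITOR, holder="joe", building=school)

print(joe_job.job_type.value, "at", joe_job.building.name)
```

Adding a new subtype here means adding one enum member rather than a new entity, which is why the pattern only stays comfortable while the list of subtypes is short.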

Note that this need for abstraction is true for relational or graph databases. A relational database can behave polymorphically with respect to type just as much as a graph database; the difference is that with a relational database, as you make the model more abstract, the number of joins needed to fully resolve type goes up (assuming it's a fully normalized model). With a graph database you can always be an edge away. However, in this case the cost is the number of relationships that must be examined. There is no free lunch: the space / time trade-off will always be there, and that is what you have to worry about as you determine whether you want to abstract more and build more of a meta model.

This brings us to my third point. Your layers are perfectly realizable in a graph database (or a relational database for that matter). There is no reason why the entity "janitor" can't have an "isa" relationship with "person", and it in turn can have an "isa" relationship with the metatype "mammal" if need be. Same with school, building and even structure if you need to go that far. If you really need this, proceed with caution: the metatypes are going to have many relationships as your instances grow, and the cost of maintaining the metadata this way could get expensive, not so much in terms of space (since you've normalized out the metadata) but in terms of the time needed to traverse all the relationships. However, if you have millions of each janitor or school and all of their equivalent sub types then this may be the way to go. I.e., if you have M subtypes of job where M is a medium sized number (say 10 < M < 50) then you can reduce the size of the index on job by instead implementing the M different subtypes (you more-or-less divide the index size for jobs by M).
As I said, the cost is that you now have a whole bunch more relationships to the metatype, but if you don't normally have to touch them, or if finding specific instances of a subtype is a common and / or maybe expensive operation for some reason, you've got the right model.

Last point. What about the case where you have more than 50, or whatever you consider some large number of subtypes? In that case your model is likely wrong. There's hopefully a way to split the subtypes into categories. If some categories overlap then figure out what makes them overlap and split that off as a category (type or metatype) of its own. In other words, it's time to refactor, which takes us full circle back to point 1, and it's now time to go get some sleep.

Hope this helps.

Peter Hunsberger

On Sat, Sep 24, 2011 at 12:52 AM, loldrup wrote:
> I'm trying to figure out how to model the world most flexibly (okay, so I'm sticking to modelling organisations for now, but still). My main problem seems to occur when I want to allow the model to naturally expand in complexity. Say we have the following relationship:
>
> Joe is a janitor at the school.
>
> This can easily be modelled with two entities
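
The layered "isa" chains from the third point of Peter Hunsberger's post (janitor, person, mammal; school, building, structure) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain dict stands in for the graph, and `ISA`, `isa_chain` and `hops_to` are hypothetical names. The hop count is a crude proxy for the traversal cost the post warns about.

```python
# Each entry reads "key isa value"; the pairs come from the examples in the post.
ISA = {
    "janitor": "person",
    "person": "mammal",
    "school": "building",
    "building": "structure",
}


def isa_chain(entity, isa=ISA):
    """Follow 'isa' relationships upward and return every type visited."""
    chain = [entity]
    while chain[-1] in isa:
        chain.append(isa[chain[-1]])
    return chain


def hops_to(entity, metatype, isa=ISA):
    """Count relationships traversed to reach metatype, or None if unreachable."""
    chain = isa_chain(entity, isa)
    return chain.index(metatype) if metatype in chain else None


print(isa_chain("janitor"))
print(hops_to("janitor", "mammal"))
```

Every extra layer of metatype adds one more relationship to follow per lookup, which is the time side of the space / time trade-off described in the post.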
Re: [Neo4j] Modelling with neo4j
So... that type of modeling is more in line with NLP and Noun / Verb Property linkage. Which you can do.

Do you need to also then describe semantically the WORKS_AT relationship? You could give all relationships themselves describing properties, OR perhaps just link them to a SKOS_CONCEPT of _work_.

In Freebase, we have Janitor looking more like this:
http://www.freebase.com/inspect/en/janitor
where we have assigned multiple Types to that Entity (the "Janitor" Topic). You'll also notice that it is an Equivalent Topic to the SKOS_CONCEPT of a "Janitor":
http://www.freebase.com/inspect/authority/us/gov/loc/sh/sh85069345

Basically, Freebase uses a Triplestore (http://en.wikipedia.org/wiki/Triplestore) called "graphd" to maintain quad data: {, , , } more fully described on our wiki here:
http://wiki.freebase.com/wiki/Data_dump

Basically, is where a Namespace is held. And you can see the layout of a tuple when looking at any entity with the URI http://www.freebase.com/inspect

If you're more technically inclined about the underpinnings, Toby gives a brief technical breakdown of graphd here:
http://blog.freebase.com/2008/04/09/a-brief-tour-of-graphd/
He also wrote a book:
http://shop.oreilly.com/product/9780596153823.do

How Freebase has denoted that a Person has an Employment with a Job Title at an Employer is shown here:
http://www.freebase.com/inspect/en/patrick_simmons
<-- look at my example school "THAD SCHOOL" that is linked out from Patrick Simmons to this source node with various properties:
http://www.freebase.com/inspect/m/0h5mvw6

Making more sense now?

On Sat, Sep 24, 2011 at 1:57 PM, loldrup wrote:
> What if:
> Joe WORKS_AT the school
> Joe WORKS_AS a janitor
> The school HAS_A janitor
>
> How do I denote that Joe works as a janitor at that exact school?
> Do you see other problems in the notation above?
>
> Also, thank you very much for your thought inspiring reply!
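
Jon's question in this message (how to say that Joe works as a janitor at that exact school) and the "Person has an Employment with a Job Title at an Employer" shape Thad describes can be sketched with an intermediate employment node. This is a minimal Python illustration, not Freebase's actual schema: the relationship names `HAS_EMPLOYMENT`, `JOB_TITLE` and `EMPLOYER`, the node ids, and the second employment are all invented for the example.

```python
# Triples of (source, relationship, target). All names here are
# assumptions made for this sketch.
triples = [
    ("joe", "HAS_EMPLOYMENT", "employment_1"),
    ("employment_1", "JOB_TITLE", "janitor"),
    ("employment_1", "EMPLOYER", "the school"),
    # A second, made-up employment shows why the intermediate node matters.
    ("joe", "HAS_EMPLOYMENT", "employment_2"),
    ("employment_2", "JOB_TITLE", "coach"),
    ("employment_2", "EMPLOYER", "the sports club"),
]


def targets(source, relationship):
    """Return all targets reachable from source over one relationship type."""
    return [t for s, r, t in triples if s == source and r == relationship]


def jobs_at(person, employer):
    """Return the job titles a person holds at one specific employer."""
    titles = []
    for employment in targets(person, "HAS_EMPLOYMENT"):
        if employer in targets(employment, "EMPLOYER"):
            titles.extend(targets(employment, "JOB_TITLE"))
    return titles


print(jobs_at("joe", "the school"))
```

With three separate relationships (WORKS_AT, WORKS_AS, HAS_A) nothing ties the job title to the employer; routing both through one employment node keeps the pairing explicit.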
Re: [Neo4j] Modelling with neo4j
What if:
Joe WORKS_AT the school
Joe WORKS_AS a janitor
The school HAS_A janitor

How do I denote that Joe works as a janitor at that exact school?
Do you see other problems in the notation above?

Also, thank you very much for your thought inspiring reply!

Jon

On Sep 24, 2011 7:55 PM, "Thad Guidry [via Neo4j Community Discussions]" <ml-node+s438527n3364798...@n3.nabble.com> wrote:
> Quite wrong.
>
> IS_JANITOR_OF will stick you into a boxed node ordinal. What you really want when modeling the world is to only capture the "semantic relationships" themselves. IS_A being a core semantic relationship. I am a janitor. He IS_A janitor. What is a janitor? What properties does a janitor have? Does a janitor always have those properties, no matter its state? Does a janitor that LIVES_AT the Seychelles Islands always have a pail and mop?
>
> When trying to model "the world", you must break down to the lowest of lows. And then use Types to clearly designate Property Reasonings.
>
> For instance, SWRC ontology says that Bioinformatics IS_A subtopic of KnowledgeWeb Applications.
>
> https://wiki-sop.inria.fr/wiki/bin/view/Acacia/KnowledgeWeb#Bioinformatics ">
> https://wiki-sop.inria.fr/wiki/bin/view/Acacia/KnowledgeWeb#Applications "/>
> http://www.w3.org/2001/XMLSchema#string ">2.7.3
>
> Great for them. But WHAT is Bioinformatics to the rest of "the world", generally? Is it a FIELD_OF_STUDY as Freebase.com says? Is it a STUDY_SUBJECT as other Vocabularies describe? Is a FIELD_OF_STUDY the same as a STUDY_SUBJECT? Or is it more proper and correct to say that a FIELD_OF_STUDY can be PART_OF a STUDY_SUBJECT? Bioinformatics PART_OF Biology PART_OF Science? I would say both and all. And there you would need many "semantic relationships", depending again on the domains' usage.
>
> In Freebase, we decided early on that the lowest of lows would be TOPICS. Some TOPICS could be given Types. A Janitor is a Type of Person. Oh Really? No. Not always to some! But all domains typically agree that a Janitor is a Profession. A Job_Type (TypeOfJob) that someone professes or agrees to WORK_AS for payment. And some folks might be enslaved to WORK_AS :)
>
> Existing Ontologies and Vocabularies (which are domain based, some wider than others) can help anyone trying to model "the world". However, be aware that many longtail domains, like Food Service or Laser Etching, are simply not modeled. No one has touched those yet in building ontologies or vocabularies, and hence they require community domain experts (the folks in those businesses or scientific or government communities) to help you think correctly within their domains, rather than how "the rest of the world" would typically organize them. Organizing across *domains* with Types will require Namespaces for those domains, and in some cases you will find that only a FEW Properties really apply to a specific Namespace. They are just simply NOT used by the rest of "the world".
>
> The very last part for you in modeling "the world" should be at a CONCEPT level. Like SKOS_CONCEPT. Only once you have seen the overlap of a CONCEPT across domains can you then begin to give the answer, YES, when 2 or 3 domains ask, "Is this CONCEPT_OF "Janitor - a profession type where someone cleans" the SAME_AS ours and RELATED_TO the CONCEPT_OF "Maid"?"
>
> Proper "semantic relationships" have to allow flexibility across domains. Find some common overlapping Types and Topics across Domains, and then begin your experimentation there (and make sure you get a bit of History or Historical Types in there as well to account for Time Space associations - those always screw with my head personally, lol). You will soon begin to see that Domains are really like "Photoshop layers".
>
> --
> -Thad
> http://www.freebase.com/view/en/thad_guidry
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user

--
View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Modelling-with-neo4j-tp3363823p3364902.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
Re: [Neo4j] Modelling with neo4j
Unfortunately, most of what I describe as best practices in modeling "semantic relationships" has not been written up well ENOUGH in most books. However, the W3C community lists such as SKOS and RDFa have many experts with deep knowledge of those best practices. Asking your questions on their mailing lists would provide very good guidance for you.

The starting point of it all would be here:
http://www.w3.org/standards/semanticweb/

Specifically, get your questions answered on these mailing lists: public-...@w3.org and semantic-...@w3.org

On Sat, Sep 24, 2011 at 1:31 PM, loldrup wrote:
> Hmm.. Which book would you recommend me to read?
>
> Jon
Re: [Neo4j] Modelling with neo4j
Hmm.. Which book would you recommend me to read?

Jon

On Sep 24, 2011 7:55 PM, "Thad Guidry [via Neo4j Community Discussions]" <ml-node+s438527n3364798...@n3.nabble.com> wrote:
> Quite wrong.
>
> IS_JANITOR_OF will stick you into a boxed node ordinal.
> What you really want when modeling the world is to only capture the "semantic relationships" themselves.

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Modelling with neo4j
Quite wrong.

IS_JANITOR_OF will stick you into a boxed node ordinal. What you really want when modeling the world is to capture only the "semantic relationships" themselves, IS_A being a core semantic relationship. I am a janitor. He IS_A janitor. What is a janitor? What properties does a janitor have? Does a janitor always have those properties, no matter its state? Does a janitor that LIVES_AT the Seychelles Islands always have a pail and mop?

When trying to model "the world", you must break down to the lowest of lows, and then use Types to clearly designate Property Reasonings.

For instance, the SWRC ontology says that Bioinformatics IS_A subtopic of KnowledgeWeb Applications.

[RDF/XML snippet; its markup was lost in the archive. Surviving values: resource https://wiki-sop.inria.fr/wiki/bin/view/Acacia/KnowledgeWeb#Bioinformatics, referenced resource https://wiki-sop.inria.fr/wiki/bin/view/Acacia/KnowledgeWeb#Applications, and the literal 2.7.3 with datatype http://www.w3.org/2001/XMLSchema#string]

Great for them. But WHAT is Bioinformatics to the rest of "the world", generally? Is it a FIELD_OF_STUDY, as Freebase.com says? Is it a STUDY_SUBJECT, as other Vocabularies describe? Is a FIELD_OF_STUDY the same as a STUDY_SUBJECT? Or is it more proper and correct to say that a FIELD_OF_STUDY can be PART_OF a STUDY_SUBJECT? Bioinformatics PART_OF Biology PART_OF Science? I would say both and all. And there you would need many "semantic relationships", depending again on the domains' usage.

In Freebase, we decided early on that the lowest of lows would be TOPICS. Some TOPICS could be given Types. A Janitor is a Type of Person. Oh really? No. Not always, to some! But all domains typically agree that a Janitor is a Profession: a Job_Type (TypeOfJob) that someone professes or agrees to WORK_AS for payment. And some folks might be enslaved to WORK_AS :)

Existing Ontologies and Vocabularies (which are domain based, some wider than others) can help anyone trying to model "the world". However, be aware that many longtail domains, like Food Service or Laser Etching, are simply not modeled. No one has touched them yet in building ontologies or vocabularies, and they therefore require community domain experts (the folks in those businesses or scientific or government communities) to help you think correctly within their domains, rather than how "the rest of the world" would typically organize them. Organizing across *domains* with Types will require Namespaces for those domains, and in some cases you will find that only a FEW Properties really apply to a specific Namespace. They are simply NOT used by the rest of "the world".

The very last part for you in modeling "the world" should be at a CONCEPT level, like SKOS_CONCEPT. Only once you have seen the overlap of a CONCEPT across domains can you begin to answer YES when 2 or 3 domains ask, "Is this CONCEPT_OF 'Janitor - a profession type where someone cleans' the SAME_AS ours and RELATED_TO the CONCEPT_OF 'Maid'?"

Proper "semantic relationships" have to allow flexibility across domains. Find some common overlapping Types and Topics across Domains, and then begin your experimentation there (and make sure you get a bit of History or Historical Types in there as well to account for Time Space associations - those always screw with my head personally, lol). You will soon begin to see that Domains are really like "Photoshop layers".

--
-Thad
http://www.freebase.com/view/en/thad_guidry
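The PART_OF chain above (Bioinformatics PART_OF Biology PART_OF Science) can be illustrated with a minimal sketch. It assumes a tiny in-memory triple store; the TripleStore class and its closure method are hypothetical names made up for this example and are not part of Freebase, SKOS tooling, or Neo4j.

```python
from collections import defaultdict


class TripleStore:
    """Hypothetical in-memory store of (subject, predicate, object) facts."""

    def __init__(self):
        # (subject, predicate) -> set of objects
        self._out = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._out[(subject, predicate)].add(obj)

    def objects(self, subject, predicate):
        """Direct objects of one semantic relationship."""
        return set(self._out.get((subject, predicate), set()))

    def closure(self, subject, predicate):
        """Everything reachable by following `predicate` one or more times."""
        seen = set()
        stack = [subject]
        while stack:
            current = stack.pop()
            for nxt in self._out.get((current, predicate), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


store = TripleStore()
store.add("Joe", "WORKS_AS", "Janitor")
store.add("Janitor", "IS_A", "Profession")
store.add("Bioinformatics", "IS_A", "FIELD_OF_STUDY")
store.add("Bioinformatics", "PART_OF", "Biology")
store.add("Biology", "PART_OF", "Science")

# Transitive PART_OF: Bioinformatics is part of Biology and of Science.
print(sorted(store.closure("Bioinformatics", "PART_OF")))
```

Keeping the relationship names generic (IS_A, PART_OF, WORKS_AS) and putting the specifics in the nodes is what lets the same query work across domains.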
Re: [Neo4j] Modelling with neo4j
Subtyping works as follows in Enhanced API. When calling getRelationships(RelationshipType, Direction) or any of its alternatives, the API looks up all subtypes of that relationship type and then calls getRelationshipTypes(Direction, ). All you need to do is create a RelationshipType "IS_JANITOR_OF" and a RelationshipType "WORKS_FOR" and state that the former is a subtype of the latter.

Haskell type classes are a great mechanism for ad-hoc polymorphism and in some ways are preferable to subtyping, though not necessarily in the context of a database. They do allow you to say there is a commonality between "WORKS_AT" and "IS_JANITOR_OF", but they don't allow you to state that the relationships of type "IS_JANITOR_OF" are a subset of the relationships of type "WORKS_AT". In a database context the subsumption rule is actually quite important, and Haskell type classes don't offer that.

The combination of type classes and subtyping is, as far as I know, still an open research topic. It is not without reason that Scala (which has subtyping) doesn't have type classes, though it allows similar constructs through implicit conversions. Working in both disciplines at the same time (poor-man's type classes through implicit conversions in combination with subtyping) seems to be non-trivial.

Niels

> Date: Sat, 24 Sep 2011 08:09:48 -0700
> From: lold...@gmail.com
> To: user@lists.neo4j.org
> Subject: Re: [Neo4j] Modelling with neo4j
>
> Subtyping of relationship types sounds like the cure to my problems.
> When creating a relationship IS_A_JANITOR_OF, will a corresponding relationship type IS_A_JANITOR_OF-relationship-type automatically be created?
>
> If I have a simple relationship can I then ask which relationship types it's type is a subtype of?
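The subsumption rule discussed in this message (every IS_JANITOR_OF relationship is also a WORKS_FOR relationship) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Enhanced API implementation: RelationshipTypeRegistry, Graph, declare, supertypes and subtypes are hypothetical names, and relationships are plain tuples rather than Neo4j objects.

```python
class RelationshipTypeRegistry:
    """Hypothetical registry recording which relationship types are subtypes of which."""

    def __init__(self):
        self._parents = {}  # type name -> set of direct supertypes

    def declare(self, name, subtype_of=()):
        self._parents.setdefault(name, set()).update(subtype_of)
        for parent in subtype_of:
            self._parents.setdefault(parent, set())

    def supertypes(self, name):
        """Transitive supertypes of `name`, excluding `name` itself."""
        result = set()
        stack = list(self._parents.get(name, ()))
        while stack:
            current = stack.pop()
            if current not in result:
                result.add(current)
                stack.extend(self._parents.get(current, ()))
        return result

    def subtypes(self, name):
        """All declared types that have `name` among their supertypes."""
        return {t for t in self._parents if name in self.supertypes(t)}


class Graph:
    """Toy graph: relationships are (start, type, end) tuples."""

    def __init__(self, registry):
        self.registry = registry
        self.relationships = []

    def create_relationship(self, start, rel_type, end):
        self.relationships.append((start, rel_type, end))

    def get_relationships(self, node, rel_type):
        # Expand the requested type to itself plus all of its subtypes,
        # so a query for the supertype also returns subtype relationships.
        wanted = {rel_type} | self.registry.subtypes(rel_type)
        return [r for r in self.relationships if r[0] == node and r[1] in wanted]


registry = RelationshipTypeRegistry()
registry.declare("WORKS_FOR")
registry.declare("IS_JANITOR_OF", subtype_of=["WORKS_FOR"])

graph = Graph(registry)
graph.create_relationship("Joe", "IS_JANITOR_OF", "School")
graph.create_relationship("Joe", "LIVES_AT", "Town")

print(graph.get_relationships("Joe", "WORKS_FOR"))
print(registry.supertypes("IS_JANITOR_OF"))
```

The supertypes lookup also covers the quoted question of which relationship types a given type is a subtype of.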
Re: [Neo4j] Modelling with neo4j
I too think that much of the type system of the application actually belongs in the application domain layer. There are so many trade-offs that can be made that a one-size-fits-all typing of a graph is not the way to go. This is why I decided that enhanced-api should only provide the nuts and bolts to roll your own type layer.

Typing of nodes is certainly something that is important in many applications. There is minimal support for node types in enhanced-api (just a node with a name), and it uses the unidirectional relationship pattern you describe. Each node has a property containing the IDs of its node-types (stored as an array of long values). This makes the look-up of node-types fast, since no indexing is required.

I am still struggling a bit with the notion of what makes a node-type. On the one hand there are structural aspects to a node type. For example, a node represents a person and therefore has a "first name", "last name" and "birth date". On the other hand there are relational aspects to a node type. For example, a relationship can represent the fact that a person is "president" of some "country". Such a relationship in some ways enhances that node with an additional type; the "person" becomes a "president". I am still not sure exactly how to model that.

Some node-types seem to be intrinsic, i.e. the type is independent of other entities in the database, while other node-types seem to require other entities. After all, a person is a person, independent of what other information we have in the database, while a person can only be "president" if there is a relationship to a "country" (or an organization, if we want to denote the other, related meaning of "president").

Such relationship-related node-types can impose additional constraints on the node. For example, to be president of the USA one has to be at least 35 years old, and in other countries similar constraints exist. I am still not sure how to incorporate this in a database model. Maybe the notion of an intrinsic type is wrong after all and node-types only arise out of relationships with other nodes. It is certainly food for thought.

The unidirectional relationship you describe is indeed somewhat of a hack, but it is necessary as long as the densely-connected-node issue is not solved. I hope this will be the case in Neo4j 1.6. It would be nice to be able to always have bidirectional relationships, and I see no reason why this can't be the case. The solution to the problem is beneficial to the database in more than one way and has no additional cost, except for the implementation of it.

Niels

> From: rick.bullo...@thingworx.com
> To: user@lists.neo4j.org
> Date: Sat, 24 Sep 2011 06:37:43 -0700
> Subject: Re: [Neo4j] Modelling with neo4j
>
> That's a great summary, Niels. Very similar to how we've applied Neo4J here at ThingWorx, though we've done most of the type system work (nodes and relationships are all typed/subtyped) in our application domain layer.
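The unidirectional pattern discussed here (a node carries the IDs of its type nodes in a property instead of holding a relationship to each of them) can be sketched like this. It is only an illustration under stated assumptions: Store, the "type_ids" property name and the helper functions are hypothetical and do not reflect enhanced-api or the ThingWorx implementation.

```python
import itertools


class Store:
    """Hypothetical node store: node id -> property dict."""

    def __init__(self):
        self._ids = itertools.count()
        self.nodes = {}

    def create_node(self, **props):
        node_id = next(self._ids)
        self.nodes[node_id] = dict(props)
        return node_id


TYPE_IDS = "type_ids"  # assumed property name, chosen for this example


def add_type(store, node_id, type_node_id):
    """Record the type as a node id inside a property, not as a relationship."""
    ids = tuple(store.nodes[node_id].get(TYPE_IDS, ()))
    if type_node_id not in ids:
        store.nodes[node_id][TYPE_IDS] = ids + (type_node_id,)


def types_of(store, node_id):
    """Fast direction: read the id array and fetch the type nodes directly."""
    return [store.nodes[t]["name"] for t in store.nodes[node_id].get(TYPE_IDS, ())]


def instances_of(store, type_node_id):
    """Slow direction: without a relationship, the reverse lookup is a full scan."""
    return [n for n, p in store.nodes.items() if type_node_id in p.get(TYPE_IDS, ())]


store = Store()
person = store.create_node(name="Person")
joe = store.create_node(first_name="Joe", last_name="Doe")
add_type(store, joe, person)

print(types_of(store, joe))
print(instances_of(store, person))
```

The trade-off is visible in instances_of: the type node never becomes densely connected, but finding all instances of a type needs a scan or a separate index.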
Re: [Neo4j] Modelling with neo4j
Subtyping of relationship types sounds like the cure to my problems. When creating a relationship IS_A_JANITOR_OF, will a corresponding relationship type IS_A_JANITOR_OF-relationship-type automatically be created?

If I have a simple relationship, can I then ask which relationship types its type is a subtype of?

Regarding interfaces: I took the idea of interfaces from Haskell's type classes, which make great sense as interfaces. In Neo4j we could imagine that relationships with types WORKS_AT and REFERS_TO might have something in common (e.g. they both have to specify a boss who gives them orders).

For now I don't think my problem requires interfaces before it can be solved, but I only just started so who knows :)

Jon

On Sep 24, 2011 3:15 PM, "Niels Hoogeveen [via Neo4j Community Discussions]" wrote:
> You raise interesting questions, most of them very much related to the work I did on Enhanced API.
>
> Let me start with the distinction between Node and Relationship, which in my opinion too is a bit artificial. I understand that when creating a graph database it is helpful to have something like vertices and edges, but I see those more as modalities of the elements of the graph than as clearly separated types. This was one of the reasons to unify all elements of the graph with one underlying type.
>
> At the time, I saw two options:
>
> a) make the graph bipartite, so that all relationships and properties become nodes, and use relationships only as a hidden linking feature
> b) create shadow nodes for relationships and properties when needed and let the API handle that transparently
>
> I chose option b for performance reasons. There are likely many applications where most of the relationships are simple, i.e. they link two nodes while possibly having some properties. Using a bipartite layout for such relationships adds nothing, but it takes twice as many links to traverse.
>
> The shadow node solution only treats relationships and properties as special (having relationships to them) when that is needed.
>
> Now to the typing issues. Neo4j has chosen not to add typing features to the database and I actually like that. It allows for optional type systems that can be used but are not enforced.
>
> Type systems are nice beasts, especially when dealing with large and complex applications, but they impose a development overhead, mostly felt in small quick-and-dirty applications. This is true for programming languages, where many people prefer a dynamically typed language such as Javascript, Python, Ruby or PHP over a statically typed language such as Java, Scala, C# or Haskell, and I think it is also true for databases. I think one of the reasons NOSQL became so popular is that the type system of an RDBMS adds overhead to simple applications.
>
> An RDBMS needs a type system because the storage layout requires it. Tables have a fixed number of columns, where each column has a designated type. While this is a great feature when processing massive amounts of similar data, it can also make the application brittle. The tight coupling between type system and storage layout makes rapid schema evolution difficult.
>
> Neo4j doesn't impose a type system like an RDBMS does, because its storage layout doesn't require it. Something is either a node, a relationship or a property, but the combinations don't need explicit modelling for the sake of storage.
>
> Because of this untyped nature of the database, it now becomes possible to add a type system that not only is optional, but can in fact be made as strong or as weak as the application demands.
>
> Unfortunately Neo4j doesn't provide all the necessary hooks for a type system, another reason why I started Enhanced API. It was not my intention with that API to provide a full-fledged type system for Neo4j, but to provide the necessary hooks so a type system can be created.
>
> Of course there is some type-creep in Neo4j. Properties and relationships have names, which in almost every application are used as types. Say we have several nodes we like to use to store information about people, where each of those nodes has a property "last_name". This property name is effectively used as a type. For all nodes the property name will denote the same fact: the last name of a person.
>
> This is not necessarily required by the Neo4j database. Different nodes may use the same property name to denote different things, even with different datatypes. It is possible to have nodes with property name "last_name" that for some nodes is a String while it is an Integer for other nodes. While this is possible, I venture this is not all that common. The same property name will likely be used to denote the same fact and have the same datatype across the graph, and therefore in most common cases be used like a type.
>
> The same applies to relationships, where the name will in general be used to denote the same type of relationship. It is unlikely an application with use
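Option b from the quoted message (shadow nodes created only when a relationship needs relationships of its own) can be sketched roughly as below. This is an assumption-laden illustration, not the Enhanced API code: the Graph class, shadow_node method and "shadow_of" property are hypothetical names invented for the example.

```python
import itertools


class Graph:
    """Toy graph in which a relationship gets a shadow node only on demand."""

    def __init__(self):
        self._ids = itertools.count()
        self.nodes = {}          # node id -> properties
        self.relationships = {}  # relationship id -> {"start", "type", "end"}
        self._shadow = {}        # relationship id -> shadow node id (lazy)

    def create_node(self, **props):
        node_id = next(self._ids)
        self.nodes[node_id] = dict(props)
        return node_id

    def create_relationship(self, start, rel_type, end):
        rel_id = next(self._ids)
        self.relationships[rel_id] = {"start": start, "type": rel_type, "end": end}
        return rel_id

    def shadow_node(self, rel_id):
        """Return the node standing in for a relationship, creating it if needed."""
        if rel_id not in self._shadow:
            self._shadow[rel_id] = self.create_node(shadow_of=rel_id)
        return self._shadow[rel_id]


graph = Graph()
joe = graph.create_node(name="Joe")
school = graph.create_node(name="School")
janitor = graph.create_node(name="janitor")

# A simple relationship costs one link and no extra node.
works_at = graph.create_relationship(joe, "WORKS_AT", school)
shadows_before = len(graph._shadow)

# Only when the relationship itself needs a link does it get a shadow node.
has_type = graph.create_relationship(graph.shadow_node(works_at), "HAS_TYPE", janitor)
print(shadows_before, len(graph._shadow))
```

In a fully bipartite layout (option a) every relationship would pay for the extra node and the extra hop up front; here only the relationships that are linked to something pay for it.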
Re: [Neo4j] Modelling with neo4j
+1

Enhanced API grew out of a couple of classes I added to make IndexedRelationship work more easily (not exposing comparators), but it is essentially a separate component. Giving it that status would help others improve it. Having laid some of the groundwork, I feel it needs other people's input too. As it stands now, it is very much a one-man effort, and while I am confident it contains plenty of good ideas, it can only grow with the input of other developers, just like IndexedRelationship has become much better thanks to the work Bryce put into it, and the work of others to extend graph-collections with structures I would not even have thought about.

There is however one thing we need to look at. Right now IndexedRelationship has a dependency on Enhanced API for the indexing of nodes based on a property. At the same time Enhanced API has a dependency on graph-collections, transparently supporting IndexedRelationship in the API. I think it would be best to remove the dependency of graph-collections on enhanced-api and only offer the slightly more complex option where the user needs to provide a comparator. The other dependency can remain and in fact can even be made stronger. Enhanced API could in principle be made to support any type of collection, now that Bryce has added a generic NodeCollection interface.

I agree "enhanced api" is not a great name; it says what it does, but certainly has little appeal. So I will be happy if someone can come up with something sexier.

Niels

> From: peter.neuba...@neotechnology.com
> Date: Sat, 24 Sep 2011 15:42:13 +0200
> To: user@lists.neo4j.org
> Subject: Re: [Neo4j] Modelling with neo4j
>
> Great thoughts guys!
> I think it would be interesting to break out the "Enhanced API" from graph-collections, rename it into something better (we can think of a name together) and provide a more fully fledged example that we can document and evolve.
>
> WDYT?
>
> Cheers,
>
> /peter neubauer
Re: [Neo4j] Modelling with neo4j
Great thoughts guys!

I think it would be interesting to break out the "Enhanced API" from graph-collections, rename it into something better (we can think of a name together) and provide a more fully fledged example that we can document and evolve.

WDYT?

Cheers,

/peter neubauer

GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Sat, Sep 24, 2011 at 3:37 PM, Rick Bullotta wrote:
> That's a great summary, Niels. Very similar to how we've applied Neo4J here at ThingWorx, though we've done most of the type system work (nodes and relationships are all typed/subtyped) in our application domain layer. A few other items that we leveraged in our implementation that you may wish to consider:
>
> - A common pattern we encountered was a "collection" of typed entities (e.g. a typed collection), and we implemented a specific model using supernodes for this. This also allowed us to rapidly and easily iterate/search collections and also to organize nodes in a "human comprehensible way" that can be readily viewed with something like Neoclipse for troubleshooting purposes. Also, if the type was "truck", we stamped the node with the type "truck" as a property (using enumerations with a custom int member) and used that same enum as the relationship type between the node and the collection node. In our model, an entity has a single "type", but we implemented the concept of supertyping/subtyping in our domain model.
>
> - We found quite a few examples where a "one-way relationship" was more than adequate and, instead of incurring the overhead of a relationship (particularly when millions of these relationships were attached to a single supernode), we used a *property* on a node containing the node id of the node it references. Sounds like a hack, but it actually has substantial performance advantages, particularly if you are frequently adding/removing relationships to/from the supernode.
>
> - We overlaid our own REST API on our domain model, and wanted to come up with a simple way to resolve the URI for any given node/entity. For that, we used a pattern in which each node can have an optional "parent" node type. Example: a blog comment is always attached to a blog entry or other blog comment. A blog entry is always attached to a blog. A blog is always attached to the blogs collection, and so on. Each node has a name and/or an ID. Because those relationship "patterns" are well known, it is a trivial matter to create the URI to any entity given only its node, e.g.:
>
> /Blogs/MyBlog/Entries/103/Comments/204
>
> Of course, it works the other way as well - easy to parse and traverse.
>
> - We often found that there were data structures in our application domain for which it was OK to be "opaque" - e.g. although the structures were deep and complex, they did not require searchability or traversability (they were kind of like "object blobs"), so in our metamodel they are not stored as nodes, relationships, and properties, but rather as a JSON blob, serialized as a string to a node property. That has worked out really well. When we do need to filter/manipulate those, we do so at the domain level.
>
> Just wanted to share some more examples.
>
> Rick
>
> ____
> From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On Behalf Of Niels Hoogeveen [pd_aficion...@hotmail.com]
> Sent: Saturday, September 24, 2011 9:14 AM
> To: user@lists.neo4j.org
> Subject: Re: [Neo4j] Modelling with neo4j
>
> You raise interesting questions, most of them very much related to the work I did on Enhanced API.
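The parent-based URI pattern from Rick's quoted message (/Blogs/MyBlog/Entries/103/Comments/204) can be sketched as follows. This is a minimal illustration assuming each entity knows its collection name, its key (name or ID) and an optional parent; the Entity class and the uri_for and resolve functions are hypothetical and not the ThingWorx REST API.

```python
class Entity:
    """Hypothetical domain entity with a collection name, a key and an optional parent."""

    def __init__(self, collection, key, parent=None):
        self.collection = collection
        self.key = str(key)
        self.parent = parent
        self.children = {}  # (collection, key) -> child entity
        if parent is not None:
            parent.children[(collection, self.key)] = self


def uri_for(entity):
    """Build the URI by walking parent links up to the root."""
    segments = []
    while entity is not None:
        segments.append(f"{entity.collection}/{entity.key}")
        entity = entity.parent
    return "/" + "/".join(reversed(segments))


def resolve(roots, uri):
    """Parse a URI into (collection, key) pairs and traverse down from a root."""
    parts = uri.strip("/").split("/")
    pairs = list(zip(parts[0::2], parts[1::2]))
    current = roots[pairs[0]]
    for pair in pairs[1:]:
        current = current.children[pair]
    return current


blog = Entity("Blogs", "MyBlog")
entry = Entity("Entries", 103, parent=blog)
comment = Entity("Comments", 204, parent=entry)
roots = {("Blogs", "MyBlog"): blog}

print(uri_for(comment))
```

Because the parent pattern is fixed per entity type, the same path can be read in both directions: uri_for walks upward, resolve walks downward.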
Re: [Neo4j] Modelling with neo4j
We're using Neo4J to model the "real world" with "things" here at ThingWorx as well. See my responses to Niels for some specifics.

From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On Behalf Of loldrup [lold...@gmail.com]
Sent: Saturday, September 24, 2011 1:52 AM
To: user@lists.neo4j.org
Subject: [Neo4j] Modelling with neo4j

I'm trying to figure out how to model the world most flexibly (okay, so I'm sticking to modelling organisations for now, but still). My main problem seems to occur when I want to allow the model to naturally expand in complexity.

Say we have the following relationship: Joe is a janitor at the school. This can easily be modelled with two entities and a relationship. Now say I have some common properties for janitors. I would have to make a link from the janitor-relation to some node denoting the type 'janitor', which could then hold information on these common things. Unfortunately, relationships don't support that.

Long story short: the problem is that sometimes I want my things to act as things, sometimes as types, sometimes as interfaces, and I cannot know in advance which of these modalities I'm going to need. Therefore, I'm considering going with this model:

Imagine a graph in three layers. The lower layer represents things, the middle layer represents types and the upper layer represents interfaces. Initially I populate only the lowest layer, but as the need arises I go back and promote various things to also be types or interfaces. These then crop up in the second and third layer of the graph, respectively. When this happens, a vertical relationship is added between the element in the lower layer and its new type/interface in the higher layers.

Now the question is: how to model this scheme in neo4j? A number of challenges pop up:

* Neo4j relationships cannot be n-ary, so every relationship must be modelled with a hyperrelationship, thus allowing future relations to the second and third layers.
* In a paradigm where modalities keep changing, it doesn't really make sense to distinguish between relations and entities; at different points in time, one element may have to act in the roles of both. Neo4j, however, makes a fundamental distinction between the two things. I could choose to model all relationships as nodes, but will that not make graph traversals messy?
* Neo4j doesn't come with a strongly typed distinction between these three layers of modality.

--
View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Modelling-with-neo4j-tp3363823p3363823.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
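
For readers following the thread, here is a minimal sketch of the "model all relationships as nodes" option raised in the question above. It uses plain Python objects, not the Neo4j API. The Node and Graph classes, the SUBJECT_OF / OBJECT / HAS_TYPE labels and the needs_keys property are all hypothetical names chosen for illustration. It shows what is gained (the fact can point at a type node) and what it costs (every traversal takes two hops instead of one).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    props: dict = field(default_factory=dict)


class Graph:
    """Tiny in-memory stand-in for a graph store (illustration only)."""

    def __init__(self):
        self.edges = []  # list of (source, label, target) triples

    def link(self, source, label, target):
        self.edges.append((source, label, target))

    def out(self, source, label):
        # Follow outgoing edges with the given label from one node.
        return [t for s, l, t in self.edges if s is source and l == label]


g = Graph()
joe, school = Node("Joe"), Node("the school")
janitor_type = Node("janitor", {"needs_keys": True})  # hypothetical shared property

# The "is janitor at" fact becomes a node of its own ...
employment = Node("Joe is janitor at the school")
g.link(joe, "SUBJECT_OF", employment)
g.link(employment, "OBJECT", school)
# ... so it can point at a type node, which a plain relationship cannot do.
g.link(employment, "HAS_TYPE", janitor_type)


def employers(person):
    # Two hops instead of one: person -> fact node -> organisation.
    return [org
            for fact in g.out(person, "SUBJECT_OF")
            for org in g.out(fact, "OBJECT")]
```

The extra hop in employers() is the "messy traversal" cost the question worries about.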
Re: [Neo4j] Modelling with neo4j
That's a great summary, Niels. Very similar to how we've applied Neo4J here at ThingWorx, though we've done most of the type system work (nodes and relationships are all typed/subtyped) in our application domain layer.

A few other items that we leveraged in our implementation that you may wish to consider:

- A common pattern we encountered was a "collection" of typed entities (e.g. a typed collection), and we implemented a specific model using supernodes for this. This also allowed us to rapidly and easily iterate/search collections and also to organize nodes in a "human comprehensible way" that can be readily viewed with something like Neoclipse for troubleshooting purposes. Also, if the type was "truck", we stamped the node with the type "truck" as a property (using enumerations with a custom int member) and used that same enum as the relationship type between the node and the collection node. In our model, an entity has a single "type", but we implemented the concept of supertyping/subtyping in our domain model

- We found quite a few examples where a "one-way relationship" was more than adequate and, instead of incurring the overhead of a relationship (particularly when millions of these relationships were attached to a single supernode), we used a *property* on a node containing the node id of the node it references. Sounds like a hack, but it actually has substantial performance advantages, particularly if you are frequently adding/removing relationships to/from the supernode

- We overlaid our own REST API on our domain model, and wanted to come up with a simple way to resolve the URI for any given node/entity. For that, we used a pattern for which each node can have an optional "parent" node type. Example: a blog comment is always attached to a blog entry or other blog comment. A blog entry is always attached to a blog. A blog is always attached to the blogs collection, and so on. Each node has a name and/or an ID. Because those relationship "patterns" are well known, it is a trivial matter to create the URI to any entity given only its node, e.g.:

/Blogs/MyBlog/Entries/103/Comments/204

Of course, it works the other way as well - easy to parse and traverse.

- We often found that there were data structures in our application domain for which it was OK to be "opaque" - e.g. although the structures were deep and complex, they did not require searchability or traversability (e.g. they were kind of like "object blobs"), so in our metamodel, they are not stored as nodes, relationships, and properties, but rather, as a JSON blob, serialized as a string to a node property. That has worked out really well. When we do need to filter/manipulate those, we do that at the domain level

Just wanted to share some more examples.

Rick

From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On Behalf Of Niels Hoogeveen [pd_aficion...@hotmail.com]
Sent: Saturday, September 24, 2011 9:14 AM
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Modelling with neo4j

You raise interesting questions, most of them very much related to the work I did on Enhanced API.

Let me start with the distinction between Node and Relationship, which in my opinion too is a bit artificial. I understand when creating a graph database, it is helpful to have something like vertices and edges, but indeed see those more as modalities of the elements of the graph than as clearly separated types. This was one of the reasons to unify all elements of the graph with one underlying type.

At the time, I saw two options:
a) make the graph bipartite, so that all relationships and properties become nodes and use relationships only as a hidden linking feature
b) create shadow nodes for relationships and properties when needed and let the API handle that transparently

I chose option b for performance reasons. There are likely many applications where most of the relationships are simple, i.e.
link two nodes while possibly having some properties. Using a bipartite layout for such relationships adds nothing, but it takes twice as many links to traverse. The shadow node solution only treats relationships and properties as special (having relationships to them) when that is needed. Now to the typing issues. Neo4j has chosen not to add typing features to the database and I actually like that. It allows for optional type systems that can be used but are not enforced to be used. Type systems are nice beasts, especially when dealing with large and complex applications, but they impose a development overhead, mostly felt in small quick and dirty applications. This is true for programming languages, where many people prefer to use an untyped language such as Javascript, Python, Ruby and PHP over a typed language such as Java, Scal
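
Going back to the parent-node pattern from Rick's reply above (each node has an optional parent and a name and/or ID), the URI resolution could be sketched roughly as follows. This is plain Python, not ThingWorx's code and not the Neo4j API. The Entity class and uri_for function are hypothetical, and as a simplifying assumption the collection node ("Blogs", "Entries", ...) is folded into a field on each entity rather than stored as a node of its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    collection: str                    # URI segment for the kind of thing, e.g. "Entries"
    key: str                           # the entity's name or ID
    parent: Optional["Entity"] = None  # None for top-level entities


def uri_for(entity):
    """Walk the parent chain upwards and assemble the path from the root down."""
    segments = []
    node = entity
    while node is not None:
        # Collected leaf-first, so key comes before collection here.
        segments.append(node.key)
        segments.append(node.collection)
        node = node.parent
    return "/" + "/".join(reversed(segments))


blog = Entity("Blogs", "MyBlog")
entry = Entity("Entries", "103", parent=blog)
comment = Entity("Comments", "204", parent=entry)
```

With these three entities, uri_for(comment) yields the path given in the mail, /Blogs/MyBlog/Entries/103/Comments/204. Parsing works the other way round: split on "/" and descend two segments at a time.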
Re: [Neo4j] Modelling with neo4j
You raise interesting questions, most of them very much related to the work I did on Enhanced API.

Let me start with the distinction between Node and Relationship, which in my opinion too is a bit artificial. I understand when creating a graph database, it is helpful to have something like vertices and edges, but indeed see those more as modalities of the elements of the graph than as clearly separated types. This was one of the reasons to unify all elements of the graph with one underlying type.

At the time, I saw two options:
a) make the graph bipartite, so that all relationships and properties become nodes and use relationships only as a hidden linking feature
b) create shadow nodes for relationships and properties when needed and let the API handle that transparently

I chose option b for performance reasons. There are likely many applications where most of the relationships are simple, i.e. link two nodes while possibly having some properties. Using a bipartite layout for such relationships adds nothing, but it takes twice as many links to traverse. The shadow node solution only treats relationships and properties as special (having relationships to them) when that is needed.

Now to the typing issues. Neo4j has chosen not to add typing features to the database and I actually like that. It allows for optional type systems that can be used but are not enforced to be used. Type systems are nice beasts, especially when dealing with large and complex applications, but they impose a development overhead, mostly felt in small quick and dirty applications. This is true for programming languages, where many people prefer to use an untyped language such as JavaScript, Python, Ruby and PHP over a typed language such as Java, Scala, C# or Haskell and I think it is also true for databases. I think one of the reasons NoSQL became so popular is because the type system of an RDBMS adds overhead to simple applications.
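
To make option b concrete, here is a rough sketch of the shadow-node idea described earlier in this message. It is plain Python and is not how Enhanced API is actually implemented. The Relationship class, its relate() method and the HAS_TYPE label are hypothetical. The point it illustrates is that a simple relationship stays a single link, and the extra node only comes into existence the first time something needs to relate to the relationship itself.

```python
class Relationship:
    """A binary relationship that grows a shadow node only on demand."""

    def __init__(self, start, rel_type, end):
        self.start, self.rel_type, self.end = start, rel_type, end
        self.props = {}
        self._shadow = None  # no shadow node until one is needed

    @property
    def has_shadow(self):
        return self._shadow is not None

    def shadow(self):
        # Lazily create the node that stands in for this relationship.
        if self._shadow is None:
            self._shadow = {"shadow_of": self, "links": []}
        return self._shadow

    def relate(self, rel_type, target):
        # Relating *from* a relationship goes through its shadow node.
        self.shadow()["links"].append((rel_type, target))


simple = Relationship("Joe", "KNOWS", "Jane")         # stays a plain link
job = Relationship("Joe", "JANITOR_AT", "the school")
job.relate("HAS_TYPE", "janitor")                     # triggers the shadow node
```

Here simple never pays for a shadow node, while job gets one as soon as it is linked to the "janitor" type. In a bipartite layout (option a) both would pay for the extra node and the extra hop.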
An RDBMS needs a type system because the storage layout requires that. Tables have a fixed number of columns, where each column has a designated type. While this is a great feature when processing massive amounts of similar data, it can also make the application brittle. The tight coupling between type system and storage layout makes rapid schema evolution hard to do.

Neo4j doesn't impose a type system like an RDBMS does, because its storage layout doesn't require it. Something is either a node, a relationship or a property, but the combinations don't need explicit modelling for the sake of storage. Because of this untyped nature of the database, it now becomes possible to add a type system that not only is optional, but can in fact be made as strong or as weak as the application demands.

Unfortunately Neo4j doesn't provide all the necessary hooks for a type system, another reason why I started Enhanced API. It was not my intention with that API to provide a full-fledged type system to Neo4j, but to provide the necessary hooks so a type system can be created.

Of course there is some type-creep in Neo4j. Properties and relationships have names, which in almost every application are used as types. Say we have several nodes we like to use to store information about people, where each of those nodes has a property "last_name". This property name effectively is used as a type. For all nodes the property name will denote the same fact: the last name of a person. This is not necessarily required by the Neo4j database. Different nodes may use the same property name to denote different things even with different datatypes. It is possible to have nodes with property name "last_name" that for some nodes is a String while it is an Integer for other nodes. While this is possible, I venture this is not all that common.
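
As a small illustration of the "last_name" point, the check below scans a set of nodes and reports property names that are used with more than one datatype. It is a sketch in plain Python over dictionaries standing in for node properties, not anything Neo4j or Enhanced API provides. The function names and the sample data are made up.

```python
from collections import defaultdict


def property_types(nodes):
    """Map each property name to the set of type names seen for it."""
    seen = defaultdict(set)
    for props in nodes:
        for name, value in props.items():
            seen[name].add(type(value).__name__)
    return dict(seen)


def inconsistent(nodes):
    # Property names that do NOT behave like a type: more than one datatype.
    return {name: kinds
            for name, kinds in property_types(nodes).items()
            if len(kinds) > 1}


# Each dict stands in for the properties of one node (sample data).
people = [
    {"last_name": "Smith"},
    {"last_name": "Jones", "age": 40},
    {"last_name": 42},  # allowed by the database, but rarely intended
]
```

A store where inconsistent() comes back empty is one where property names are already being used as de facto types, which is the common case described above.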
The same property name will likely be used to denote the same fact and have the same datatype across the graph and therefore in most common cases be used like a type. The same applies to relationships, where the name will in general be used to denote the same type of relationship. It is unlikely an application will use the "FRIEND" relationship to sometimes denote a friendship between two people while at other times use that relationship name to denote the address of a building.

This is as far as typing goes in Neo4j, but it is there and means we have to incorporate it into the API somehow. This is the reason why I decided to add subtyping of relationship-types and property-types in the API, a feature that may be of interest to the model you describe in your email.

Joe is a janitor at the school. Here we see three elements: "Joe", "is janitor at", and "the school", which can indeed be modeled with two nodes and a relationship. There is however a more general statement here of the form: person works with organization. Suppose we want to store the fact: Jane is principal of the school. Again we can model this