Re: [Neo4j] Modelling with neo4j

2011-09-27 Thread Peter Neubauer
Great, Bryce,
Let us know if it could work out!

/peter

Sent from my phone.
On Sep 28, 2011 5:38 AM, "Bryce"  wrote:

Re: [Neo4j] Modelling with neo4j

2011-09-27 Thread Bryce
Following up on the part of this discussion about moving the enhanced api
out of the graph collections module: I meant to get to this earlier but
got sidetracked.

The dependency that IndexedRelationship had on ComparablePropertyType,
which I assume is what you were referring to, no longer exists.
PropertySortedTree still has a dependency on ComparablePropertyType, but
this collection could be moved along with the enhanced api as a
specialisation of the SortedTree collection specific to the enhanced api.
As it still implements NodeCollection, it can be used as the basis of an
IndexedRelationship, but IndexedRelationship doesn't need to know about it
at all.
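A minimal Java sketch of the dependency direction described above, assuming simplified in-memory stand-ins. `SketchNode`, `NodeCollectionSketch`, `SortedTreeSketch`, `ComparablePropertyTypeSketch`, `PropertySortedTreeSketch` and `IndexedRelationshipSketch` are hypothetical names, not the graph-collections or enhanced-api classes. The point is only that the indexed relationship compiles against the collection interface, while the property-aware specialisation is the one piece that knows about property types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a graph node: just a bag of properties.
final class SketchNode {
    final Map<String, Object> props;
    SketchNode(Map<String, Object> props) { this.props = props; }
}

// Hypothetical generic collection contract (the role NodeCollection plays in the thread).
interface NodeCollectionSketch extends Iterable<SketchNode> {
    void addNode(SketchNode node);
}

// Core-module sorted collection: the caller supplies the Comparator.
class SortedTreeSketch implements NodeCollectionSketch {
    private final List<SketchNode> nodes = new ArrayList<>();
    private final Comparator<SketchNode> order;

    SortedTreeSketch(Comparator<SketchNode> order) { this.order = order; }

    @Override
    public void addNode(SketchNode node) {
        // Binary search gives the insertion point that keeps the list sorted.
        int i = Collections.binarySearch(nodes, node, order);
        nodes.add(i < 0 ? -i - 1 : i, node);
    }

    @Override
    public Iterator<SketchNode> iterator() { return nodes.iterator(); }
}

// Hypothetical analogue of a comparable property type: a named, typed property.
final class ComparablePropertyTypeSketch<T extends Comparable<T>> {
    final String name;
    final Class<T> type;

    ComparablePropertyTypeSketch(String name, Class<T> type) { this.name = name; this.type = type; }

    Comparator<SketchNode> comparator() {
        return Comparator.comparing(n -> type.cast(n.props.get(name)));
    }
}

// Specialisation that could move with the enhanced api: it derives the Comparator
// from a property type, so only this class depends on ComparablePropertyTypeSketch.
class PropertySortedTreeSketch<T extends Comparable<T>> extends SortedTreeSketch {
    final ComparablePropertyTypeSketch<T> propertyType;

    PropertySortedTreeSketch(ComparablePropertyTypeSketch<T> propertyType) {
        super(propertyType.comparator());
        this.propertyType = propertyType;
    }
}

// Hypothetical indexed relationship: it only sees the collection interface.
final class IndexedRelationshipSketch {
    private final NodeCollectionSketch collection;

    IndexedRelationshipSketch(NodeCollectionSketch collection) { this.collection = collection; }

    void relateTo(SketchNode end) { collection.addNode(end); }

    List<SketchNode> ends() {
        List<SketchNode> out = new ArrayList<>();
        collection.forEach(out::add);
        return out;
    }
}

class IndexedRelationshipSketchDemo {
    public static void main(String[] args) {
        IndexedRelationshipSketch rel = new IndexedRelationshipSketch(
                new PropertySortedTreeSketch<>(new ComparablePropertyTypeSketch<>("name", String.class)));
        rel.relateTo(new SketchNode(Map.of("name", "school")));
        rel.relateTo(new SketchNode(Map.of("name", "janitor")));
        rel.ends().forEach(n -> System.out.println(n.props.get("name")));
    }
}
```

Passing a plain `SortedTreeSketch` built from a hand-written `Comparator` works the same way, which mirrors the comparator-only option Niels argues for below.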

I am wondering, however, whether there is a problem with this: one thing I
just realised no longer happens, although it previously did, is the storage
of the property type in the node collection. For that matter,
PropertySortedTree currently has no node-only constructor, so it wouldn't
work correctly yet. I will look into that (I hadn't done so because I
haven't even had a good look at the enhanced api yet).
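One way the missing piece could look, sketched in Java under loud assumptions: the collection writes its sort configuration onto its root node when it is created, and a node-only constructor reads it back. `RootNodeSketch`, `PersistedPropertyTreeSketch` and the property keys `sortProperty`/`sortType` are hypothetical, not the actual PropertySortedTree code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the collection's root node: only its properties matter here.
final class RootNodeSketch {
    final Map<String, Object> props = new HashMap<>();
}

// Hypothetical property-sorted collection that records how it is sorted on its
// root node, so the collection can be re-opened from that node alone.
final class PersistedPropertyTreeSketch {
    // Hypothetical property keys; a real module may use different names.
    static final String SORT_PROPERTY_KEY = "sortProperty";
    static final String SORT_TYPE_KEY = "sortType";

    final RootNodeSketch root;
    final String sortProperty;
    final Class<?> sortType;

    // Creating constructor: store the sort configuration on the root node.
    PersistedPropertyTreeSketch(RootNodeSketch root, String sortProperty, Class<?> sortType) {
        this.root = root;
        this.sortProperty = sortProperty;
        this.sortType = sortType;
        // Graph property values are primitives and strings (or arrays of them),
        // so the type is recorded by class name.
        root.props.put(SORT_PROPERTY_KEY, sortProperty);
        root.props.put(SORT_TYPE_KEY, sortType.getName());
    }

    // Node-only constructor: recover the configuration written above.
    PersistedPropertyTreeSketch(RootNodeSketch root) {
        this.root = root;
        Object property = root.props.get(SORT_PROPERTY_KEY);
        Object typeName = root.props.get(SORT_TYPE_KEY);
        if (!(property instanceof String) || !(typeName instanceof String)) {
            throw new IllegalStateException("root node carries no sort configuration");
        }
        Class<?> resolved;
        try {
            resolved = Class.forName((String) typeName);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("unknown sort type " + typeName, e);
        }
        this.sortProperty = (String) property;
        this.sortType = resolved;
    }

    public static void main(String[] args) {
        RootNodeSketch root = new RootNodeSketch();
        new PersistedPropertyTreeSketch(root, "name", String.class);
        PersistedPropertyTreeSketch reopened = new PersistedPropertyTreeSketch(root);
        System.out.println(reopened.sortProperty + " " + reopened.sortType.getName());
    }
}
```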

On Sun, Sep 25, 2011 at 3:28 AM, Niels Hoogeveen
wrote:

>
> +1
> Enhanced API grew out of a couple of classes I added to make
> IndexedRelationship work more easily (not exposing comparators), but it is
> essentially a separate component. Giving it that status would help others
> improve it. Having laid some of the groundwork, I feel it needs other
> people's input too. As it stands now, it is very much one man's work, and
> while I am confident it contains plenty of good ideas, it can only grow with
> the input of other developers, just as IndexedRelationships has become
> much better thanks to the work Bryce put into it, and to the work of others
> extending graph-collections with structures I would not even have thought
> of.
> There is, however, one thing we need to look at. Right now IndexedRelationships
> has a dependency on Enhanced API for the indexing of nodes based on a
> property. At the same time, Enhanced API has a dependency on
> graph-collections, transparently supporting IndexedRelationships in the API.
> I think it would be best to remove the dependency of graph-collections on
> enhanced-api and only offer the slightly more complex option where the user
> needs to provide a comparator. The other dependency can remain, and in fact
> can even be made stronger. Enhanced API could in principle be made to
> support any type of collection, now that Bryce has added a generic
> NodeCollection interface.
> I agree "enhanced api" is not a great name; it says what it does, but
> certainly has little appeal. So I will be happy if someone can come up with
> something sexier.
> Niels
> > From: peter.neuba...@neotechnology.com
> > Date: Sat, 24 Sep 2011 15:42:13 +0200
> > To: user@lists.neo4j.org
> > Subject: Re: [Neo4j] Modelling with neo4j
> >
> > Great thoughts guys!
> > I think it would be interesting to break out the "Enhanced API" from
> > graph-collections, rename it into something better (we can think of a
> > name together) and provide a more fully fledged example that we can
> > document and evolve.
> >
> > WDYT?
> >
> > Cheers,
> >
> > /peter neubauer
> >
> > GTalk:  neubauer.peter
> > Skype   peter.neubauer
> > Phone   +46 704 106975
> > LinkedIn   http://www.linkedin.com/in/neubauer
> > Twitter  http://twitter.com/peterneubauer
> >
> > http://www.neo4j.org   - Your high performance graph database.
> > http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
> > http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
> >
> >
> >
> > On Sat, Sep 24, 2011 at 3:37 PM, Rick Bullotta
> >  wrote:
> > > That's a great summary, Niels.  Very similar to how we've applied Neo4J
> > > here at ThingWorx, though we've done most of the type system work (nodes
> > > and relationships are all typed/subtyped) in our application domain layer.
> > > A few other items that we leveraged in our implementation that you may
> > > wish to consider:
> > >
> > > - A common pattern we encountered was a "collection" of typed entities
> > > (e.g. a typed collection), and we implemented a specific model using
> > > supernodes for this.  This also allowed us to rapidly and easily
> > > iterate/search collections and also to organize nodes in a "human
> > > comprehensible way" that can be readily viewed with something like
> > > Neoclipse for troubleshooting purposes.  Also, if the type was "truck",
> > > we stamped the node with the type "truck" as a property (using
> > > enumerations with a custom i

Re: [Neo4j] Modelling with neo4j

2011-09-24 Thread loldrup
Beautiful.

Thank you very much :)

Jon
On Sep 25, 2011 5:48 AM, "Peter Hunsberger [via Neo4j Community
Discussions]"  wrote:

Re: [Neo4j] Modelling with neo4j

2011-09-24 Thread Peter Hunsberger
I'm going to take a slightly different tack here than the responses you've
got so far...

First as others have pointed out, this is three entities and two
relationships

joe - is a - janitor - at the - school

This is important: the lighter-weight the relationships, the fewer problems
you are going to have with needing some form of meta type for them.  You
really want to follow the basic OO lead here and stick to "isa" and "hasa".
The "at the" is essentially a "hasa" (a school has a janitor).  If you can
map your relationships to these two simple concepts, then you've got a
realistic model; otherwise you probably need to refactor / abstract or
normalize (take your pick depending on your background, they're essentially
the same thing at this level of modelling).
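A toy Java sketch of the first point, assuming a plain in-memory edge list rather than any Neo4j API (`IsaHasaGraph`, `Rel`, `targets` and `sources` are hypothetical names): three entities joined by only the two relationship kinds "isa" and "hasa".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory graph restricted to the two relationship kinds suggested above.
final class IsaHasaGraph {
    enum Rel { IS_A, HAS_A }

    // One directed, typed edge between two named entities.
    static final class Edge {
        final String from;
        final Rel type;
        final String to;
        Edge(String from, Rel type, String to) { this.from = from; this.type = type; this.to = to; }
    }

    private final List<Edge> edges = new ArrayList<>();

    void add(String from, Rel type, String to) { edges.add(new Edge(from, type, to)); }

    // Entities reached from `from` over edges of the given type.
    List<String> targets(String from, Rel type) {
        List<String> out = new ArrayList<>();
        for (Edge e : edges) if (e.from.equals(from) && e.type == type) out.add(e.to);
        return out;
    }

    // Entities that point at `to` over edges of the given type.
    List<String> sources(Rel type, String to) {
        List<String> out = new ArrayList<>();
        for (Edge e : edges) if (e.to.equals(to) && e.type == type) out.add(e.from);
        return out;
    }

    public static void main(String[] args) {
        IsaHasaGraph g = new IsaHasaGraph();
        // Three entities, two relationships: joe - is a - janitor - at the - school.
        g.add("joe", Rel.IS_A, "janitor");
        g.add("school", Rel.HAS_A, "janitor"); // "at the" read as "a school has a janitor"
        System.out.println(g.targets("joe", Rel.IS_A));      // [janitor]
        System.out.println(g.sources(Rel.HAS_A, "janitor")); // [school]
    }
}
```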

That leads to point number two.  Modelling is modelling is modelling, and
although a graph database might let you get up and running easily, it's not
going to save you from modelling your domain properly in the long run.  The
worse a job you do up front, the more work it's going to be to fix it in the
long run.  One common pattern: if you think you have a meta type, then
use that for your entity and add the details that make it a specific
subtype as instance data:

a janitor is a type of job

therefore the entity should be "job", not "janitor"; job type is a property of
job

school is a type of building (or maybe even more abstractly a type of
location), building type is a property of building

therefore the entity should be building, though in this case it may be
location, depending on exactly what the domain is going to be used for;
most likely, though, a building "hasa" location...

As long as your number of subtypes is reasonably small, this
pattern of abstraction works well.  Here, "reasonably small" is in the range
of things you don't mind coding up an enum for in your code.  (How's that for
a non-committal guideline? I'll pick 10 as an arbitrary limit if
pushed.)
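The "job, not janitor" pattern, sketched in Java under stated assumptions: a job node is modelled as a plain property map, and `JobEntitySketch`, `JobType` (with hypothetical values JANITOR, TEACHER, PRINCIPAL) and the keys `kind`/`jobType` are illustrative names only. The subtype lives in an enum, matching the "reasonably small" rule of thumb.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the entity is the metatype "job"; the subtype is instance data.
final class JobEntitySketch {
    // A "reasonably small" closed set of subtypes, kept as an enum in code.
    enum JobType { JANITOR, TEACHER, PRINCIPAL }

    // Hypothetical property keys on a job node.
    static final String KIND = "kind";
    static final String JOB_TYPE = "jobType";

    // Property map for a generic job node of the given subtype.
    static Map<String, Object> jobNode(JobType type) {
        Map<String, Object> props = new HashMap<>();
        props.put(KIND, "job");
        // Graph property values are primitives and strings, so the enum is stored by name.
        props.put(JOB_TYPE, type.name());
        return props;
    }

    // Read the subtype back; valueOf throws IllegalArgumentException for unknown names.
    static JobType jobTypeOf(Map<String, Object> props) {
        return JobType.valueOf((String) props.get(JOB_TYPE));
    }

    public static void main(String[] args) {
        Map<String, Object> node = jobNode(JobType.JANITOR);
        System.out.println(node.get(KIND) + " / " + jobTypeOf(node)); // job / JANITOR
    }
}
```

Adding a subtype means adding an enum constant and recompiling, which is tolerable for a handful of values and is exactly why the pattern stops scaling once the list grows long.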

Note that this need for abstraction is true for relational or graph
databases. A relational database can behave polymorphically with respect to
type just as much as a graph database; the difference is that with a
relational database, as you make the model more abstract, the number of joins
needed to fully resolve type goes up (assuming it's a fully normalized
model).  With a graph database you can always be an edge away.  However, in
this case the cost is the number of relationships that must be examined.
There is no free lunch: the space / time trade-off will always be there, and
that is what you have to worry about as you determine whether you want to
abstract more and build more of a meta model.

This brings us to my third point.  Your layers are perfectly realizable in a
graph database (or a relational database for that matter).  There is no
reason why the entity "janitor" can't have an "isa" relationship with
"person" and it in turn can have an "isa" relationship with the metatype
"mammal" if need be. Same with school, building and even structure if you
need to go that far. If you really need this, proceed with caution: the
metatypes are going to have many relationships as your instances grow, and
the cost of maintaining the metadata this way could get expensive, not so
much in terms of space (since you've normalized out the metadata) but in
terms of the time needed to traverse all the relationships.  However, if you
have millions of janitors or schools and all of their equivalent
subtypes, then this may be the way to go.  I.e., if you have M subtypes of job
where M is a medium-sized number (say 10 < M < 50), then you can reduce
the size of the index on job by instead implementing the M different
subtypes (you more-or-less divide the index size for jobs by M).  As I said,
the cost is that you now have a whole bunch more relationships to the
metatype, but if you don't normally have to touch them, or if finding
specific instances of a subtype is a common and / or maybe expensive
operation for some reason, you've got the right model.
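A rough Java sketch of the index-size argument, assuming in-memory maps in place of real graph indexes; `SubtypeIndexSketch`, its four `JobType` values and the numeric job ids are hypothetical. With N jobs spread evenly over M subtypes, each per-subtype partition holds about N/M entries, while finding one subtype in a single flat index means filtering all N.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical comparison of one flat job index against one index per subtype.
// Both structures are kept here only so the two lookups can be compared.
final class SubtypeIndexSketch {
    // M = 4 hypothetical subtypes of job.
    enum JobType { JANITOR, TEACHER, NURSE, CLERK }

    // Flat: a single index over all N jobs (job id -> subtype).
    private final Map<Long, JobType> flat = new TreeMap<>();
    // Partitioned: one list per subtype, each holding roughly N / M job ids.
    private final Map<JobType, List<Long>> bySubtype = new EnumMap<>(JobType.class);

    void add(long jobId, JobType type) {
        flat.put(jobId, type);
        bySubtype.computeIfAbsent(type, t -> new ArrayList<>()).add(jobId);
    }

    // Flat lookup has to walk all N entries to find one subtype.
    List<Long> findFlat(JobType type) {
        List<Long> out = new ArrayList<>();
        for (Map.Entry<Long, JobType> e : flat.entrySet()) {
            if (e.getValue() == type) out.add(e.getKey());
        }
        return out;
    }

    // Partitioned lookup touches only that subtype's entries.
    List<Long> findPartitioned(JobType type) {
        return Collections.unmodifiableList(bySubtype.getOrDefault(type, new ArrayList<>()));
    }

    int flatSize() { return flat.size(); }

    public static void main(String[] args) {
        SubtypeIndexSketch idx = new SubtypeIndexSketch();
        JobType[] types = JobType.values();
        for (long id = 0; id < 1000; id++) idx.add(id, types[(int) (id % types.length)]);
        System.out.println(idx.flatSize());                               // 1000
        System.out.println(idx.findPartitioned(JobType.JANITOR).size()); // 250
    }
}
```

The price, as the message says, shows up elsewhere: every extra subtype is another structure (or another set of relationships to a metatype) that has to be maintained.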

Last point.  What about the case where you have more than 50, or whatever
you consider some large number of subtypes?  In that case your model is
likely wrong.  There's hopefully a way to split the subtypes into categories.
If some categories overlap, then figure out what makes them overlap and
split that off as a category (type or metatype) of its own.  In other
words, it's time to refactor, which takes us full circle back to point 1, and
it's now time to go get some sleep.

Hope this helps.


Peter Hunsberger


On Sat, Sep 24, 2011 at 12:52 AM, loldrup  wrote:

> I'm trying to figure out how to model the world most flexibly (okay, so I'm
> sticking to modelling organisations for now, but still). My main problem
> seems to occur when I want to allow the model to naturally expand in
> complexity. Say we have the following relationship:
>
> Joe is a janitor at the school.
>
> This can easily be modelled with two entities

Re: [Neo4j] Modelling with neo4j

2011-09-24 Thread Thad Guidry
So... that type of modeling is more in line with NLP and Noun / Verb Property
linkage, which you can do.  Do you then also need to describe the WORKS_AT
relationship semantically? You could give all relationships their own
describing properties, OR perhaps just link them to a SKOS_CONCEPT of _work_
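The first option, describing a relationship with its own properties, could look roughly like this in Java. This is a sketch under assumptions: `RelationshipPropertySketch`, `relate`, `propertyOf` and the `role` key are hypothetical, and edges are plain in-memory objects rather than Neo4j relationships (which do support properties). Putting the role on the WORKS_AT edge itself ties "janitor" to that exact school.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a typed relationship that carries its own properties.
final class RelationshipPropertySketch {
    static final class Relationship {
        final String start;
        final String type;
        final String end;
        final Map<String, Object> props = new HashMap<>();
        Relationship(String start, String type, String end) { this.start = start; this.type = type; this.end = end; }
    }

    private final List<Relationship> rels = new ArrayList<>();

    Relationship relate(String start, String type, String end) {
        Relationship r = new Relationship(start, type, end);
        rels.add(r);
        return r;
    }

    // Value of a property on the relationship of a given type between two nodes, or null.
    Object propertyOf(String start, String type, String end, String key) {
        for (Relationship r : rels) {
            if (r.start.equals(start) && r.type.equals(type) && r.end.equals(end)) return r.props.get(key);
        }
        return null;
    }

    public static void main(String[] args) {
        RelationshipPropertySketch g = new RelationshipPropertySketch();
        // One WORKS_AT edge that also says in which role the work happens.
        g.relate("joe", "WORKS_AT", "school").props.put("role", "janitor");
        System.out.println(g.propertyOf("joe", "WORKS_AT", "school", "role")); // janitor
    }
}
```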

In Freebase, we have Janitor looking more like this:
http://www.freebase.com/inspect/en/janitor  where we have assigned multiple
Types to that Entity (the "Janitor" Topic).  You'll also notice that it is
an Equivalent Topic to the SKOS_CONCEPT of a "Janitor":
http://www.freebase.com/inspect/authority/us/gov/loc/sh/sh85069345

Basically, Freebase uses a Triplestore
http://en.wikipedia.org/wiki/Triplestore called "graphd" to maintain quad
data: {, , , } more fully described on
our wiki here: http://wiki.freebase.com/wiki/Data_dump  Basically,
 is where a Namespace is held.  And you can see the layout of a
tuple when looking at any entity with the URI
http://www.freebase.com/inspect

If you're more technically inclined about the underpinnings, Toby gives a
brief technical breakdown of graphd here:
http://blog.freebase.com/2008/04/09/a-brief-tour-of-graphd/  He also wrote a
book, http://shop.oreilly.com/product/9780596153823.do

How Freebase denotes that a Person has an Employment with a Job Title at
an Employer is shown here:

http://www.freebase.com/inspect/en/patrick_simmons  <-- look at my example
school "THAD SCHOOL" that is linked out from Patrick Simmons to this source
node with various properties, http://www.freebase.com/inspect/m/0h5mvw6
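Jon's question (how to say that Joe works as a janitor at that exact school) is what the Employment node in the Freebase example answers: the fact gets a node of its own. A minimal Java sketch of that idea, assuming in-memory nodes; `EmploymentSketch` and the relationship names `HAS_EMPLOYMENT`, `EMPLOYER` and `JOB_TITLE` are hypothetical, not Freebase or Neo4j identifiers. With separate WORKS_AT and WORKS_AS edges, a second job would make it ambiguous which title goes with which employer; the intermediate node removes that ambiguity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: reify "person works as title at employer" as its own node.
final class EmploymentSketch {
    static final class Node {
        final String name;
        final List<Edge> out = new ArrayList<>();
        Node(String name) { this.name = name; }

        // All targets of outgoing edges with the given type.
        List<Node> followAll(String type) {
            List<Node> targets = new ArrayList<>();
            for (Edge e : out) if (e.type.equals(type)) targets.add(e.to);
            return targets;
        }

        // First target of an outgoing edge with the given type, or null if none.
        Node follow(String type) {
            List<Node> targets = followAll(type);
            return targets.isEmpty() ? null : targets.get(0);
        }
    }

    static final class Edge {
        final String type;
        final Node to;
        Edge(String type, Node to) { this.type = type; this.to = to; }
    }

    static void link(Node from, String type, Node to) { from.out.add(new Edge(type, to)); }

    // The employment node is the single place where person, title and employer meet.
    static Node employ(Node person, Node title, Node employer) {
        Node employment = new Node(person.name + " at " + employer.name);
        link(person, "HAS_EMPLOYMENT", employment);
        link(employment, "EMPLOYER", employer);
        link(employment, "JOB_TITLE", title);
        return employment;
    }

    // "What does this person work as at this employer?"
    static List<String> titlesAt(Node person, Node employer) {
        List<String> titles = new ArrayList<>();
        for (Node employment : person.followAll("HAS_EMPLOYMENT")) {
            if (employment.follow("EMPLOYER") == employer) titles.add(employment.follow("JOB_TITLE").name);
        }
        return titles;
    }

    public static void main(String[] args) {
        Node joe = new Node("joe"), school = new Node("school"), gym = new Node("gym");
        Node janitor = new Node("janitor"), coach = new Node("coach");
        employ(joe, janitor, school);
        employ(joe, coach, gym);
        System.out.println(titlesAt(joe, school)); // [janitor]
        System.out.println(titlesAt(joe, gym));    // [coach]
    }
}
```

Dates or other facts about that particular job could hang off the same employment node.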

Making more sense now?

On Sat, Sep 24, 2011 at 1:57 PM, loldrup  wrote:


Re: [Neo4j] Modelling with neo4j

2011-09-24 Thread loldrup
What if:
Joe WORKS_AT the school
Joe WORKS_AS a janitor
The school HAS_A janitor

How do I denote that Joe works as a janitor at that exact school?
Do you see other problems in the notation above?

Also, thank you very much for your thought-inspiring reply!

Jon
On Sep 24, 2011 7:55 PM, "Thad Guidry [via Neo4j Community Discussions]" <
ml-node+s438527n3364798...@n3.nabble.com> wrote:
>
>
> Quite wrong.
>
> IS_JANITOR_OF will stick you into a boxed node ordinal.
> What you really want when modeling the world is to only capture the
> "semantic relationships" themselves. IS_A being a core semantic
> relationship. I am a janitor. He IS_A janitor. What is a janitor? What
> properties does a janitor have? Does a janitor always have those
> properties, no matter its state? Does a janitor that LIVES_AT the
> Seychelles Islands always have a pail and mop?
>
> When trying to model "the world", you must break down to the lowest of
> lows, and then use Types to clearly designate Property Reasonings.
>
> For instance, SWRC ontology says that Bioinformatics IS_A subtopic of
> KnowledgeWeb Applications.
>
> https://wiki-sop.inria.fr/wiki/bin/view/Acacia/KnowledgeWeb#Bioinformatics">
> https://wiki-sop.inria.fr/wiki/bin/view/Acacia/KnowledgeWeb#Applications"/>
> http://www.w3.org/2001/XMLSchema#string">2.7.3
>
> Great for them. But WHAT is Bioinformatics to the rest of "the world",
> generally? Is it a FIELD_OF_STUDY as Freebase.com says? Is it a
> STUDY_SUBJECT as other Vocabularies describe? Is a FIELD_OF_STUDY the
> same as a STUDY_SUBJECT? Or is it more proper and correct to say that a
> FIELD_OF_STUDY can be PART_OF a STUDY_SUBJECT? Bioinformatics PART_OF
> Biology PART_OF Science? I would say both and all. And there you would
> need many "semantic relationships", depending again on the domains' usage.
>
> In Freebase, we decided early on that the lowest of lows would be TOPICS.
> Some TOPICS could be given Types. A Janitor is a Type of Person. Oh
> really? No. Not always, to some! But all domains typically agree that a
> Janitor is a Profession: a Job_Type (TypeOfJob) that someone professes or
> agrees to WORK_AS for payment. And some folks might be enslaved to WORK_AS
> :)
>
> Existing Ontologies and Vocabularies (which are domain based, some wider
> than others) can help anyone trying to model "the world". However, be aware
> that many longtail domains, like Food Service or Laser Etching, are simply
> not modeled; no one has touched those yet in building ontologies or
> vocabularies, and hence they require community domain experts (the folks in
> those businesses or scientific or government communities) to help you think
> correctly within their domains, rather than how "the rest of the world" would
> typically organize them. Organizing across *domains* with Types will
> require Namespaces for those domains, and in some cases you will find that
> only a FEW Properties really apply to a specific Namespace. They are just
> simply NOT used by the rest of "the world".
>
> The very last part for you in modeling "the world" should be at a CONCEPT
> level, like SKOS_CONCEPT. Only once you have seen the overlap of a CONCEPT
> across domains can you begin to answer YES when 2 or 3 domains ask,
> "Is this CONCEPT_OF 'Janitor - a profession type where someone cleans' the
> SAME_AS ours and RELATED_TO the CONCEPT_OF 'Maid'?"
>
> Proper "semantic relationships" have to allow flexibility across domains.
> Find some common overlapping Types and Topics across Domains, and then
> begin your experimentation there (and make sure you get a bit of History or
> Historical Types in there as well to account for Time Space associations -
> those always screw with my head personally, lol). You will soon begin to
> see that Domains are really like "Photoshop layers".
>
> --
> -Thad
> http://www.freebase.com/view/en/thad_guidry
> ___
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
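The "Bioinformatics PART_OF Biology PART_OF Science" chain in the quoted message implies following PART_OF transitively, and "both and all" implies a topic may sit under more than one parent. A small Java sketch of that traversal, assuming an in-memory adjacency map; `PartOfSketch`, `wholesOf` and the "Computer Science" parent are hypothetical illustrations, not Freebase or SKOS data.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: PART_OF edges between topics, followed transitively.
final class PartOfSketch {
    // part -> the wholes it is directly PART_OF (a topic may have several).
    private final Map<String, List<String>> partOf = new HashMap<>();

    void addPartOf(String part, String whole) {
        partOf.computeIfAbsent(part, k -> new ArrayList<>()).add(whole);
    }

    // Breadth-first walk over PART_OF edges; the seen set stops cycles and repeats.
    Set<String> wholesOf(String topic) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(topic);
        while (!queue.isEmpty()) {
            for (String whole : partOf.getOrDefault(queue.poll(), List.of())) {
                if (seen.add(whole)) queue.add(whole);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        PartOfSketch g = new PartOfSketch();
        g.addPartOf("Bioinformatics", "Biology");
        g.addPartOf("Biology", "Science");
        // Hypothetical second parent, to show a topic sitting in more than one hierarchy.
        g.addPartOf("Bioinformatics", "Computer Science");
        g.addPartOf("Computer Science", "Science");
        System.out.println(g.wholesOf("Bioinformatics")); // [Biology, Computer Science, Science]
    }
}
```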


Re: [Neo4j] Modelling with neo4j

2011-09-24 Thread Thad Guidry
Unfortunately, most of what I describe as best practices in modeling
"semantic relationships" has not been written up well ENOUGH in most books.
However, the W3C community lists such as SKOS and RDFa have many experts
with deep knowledge of those best practices.  Asking your questions
there on their mailing lists would provide very good guidance for you.

The starting point of it all would be here:
http://www.w3.org/standards/semanticweb/

Specifically, get your questions answered on these mailing lists:

public-...@w3.org
and
semantic-...@w3.org

On Sat, Sep 24, 2011 at 1:31 PM, loldrup  wrote:

> Hmm.. Which book would you recommend me to read?
>
> Jon
> On Sep 24, 2011 7:55 PM, "Thad Guidry [via Neo4j Community Discussions]" <
> ml-node+s438527n3364798...@n3.nabble.com> wrote:
> >
> >
> > Quite wrong.
> >
> > IS_JANITOR_OF will stick you into a boxed node ordinal.
> > What you really want when modeling the world is to only capture the
> > "semantic relationships" themselves. IS_A being a core semantic
> > relationship. I am a janitor. He IS_A janitor. What is a janitor ? What
> > properties does a janitor have ? Does a janitor always have those
> > properties, no matter it's state ? Does a janitor that LIVES_AT the
> > Seychelles Islands always have a pail and mop ?
> >
> > When trying to model "the world", you must break down to the lowest of
> lows.
> > And then use Types to clearly designate Property Reasonings.
> >
> > For instance, SWRC ontology says that Bioinformatics IS_A subtopic of
> > KnowledgeWeb Applications.
> >
> > 
> > https://wiki-sop.inria.fr/wiki/bin/view/Acacia/KnowledgeWeb#Bioinformatics
> ">
> > https://wiki-sop.inria.fr/wiki/bin/view/Acacia/KnowledgeWeb#Applications
> "/>
> > http://www.w3.org/2001/XMLSchema#string
> ">2.7.3
> > 
> > 
> > 
> >
> > Great for them. But WHAT is Bioinformatics to the rest of "the world",
> > generally ? Is it a FIELD_OF_STUDY as Freebase.com says ? Is it a
> > STUDY_SUBJECT as other Vocabularies describe ? Is a FIELD_OF_STUDY the
> same
> > as a STUDY_SUBJECT ? Or is it more proper and correct to say that a
> > FIELD_OF_STUDY can be PART_OF a STUDY_SUBJECT ? Bioinformatics PART_OF
> > Biology PART_OF Science ? I would say both and all. And there you would
> > need many "semantic relationships", depending again on the domains'
> usage.
> >
> > In Freebase, we decided early on that the lowest of lows would be TOPICS.
> > Some TOPICS could be given Types. A Janitor is a Type of Person. Oh
> > Really ? No. Not always to some ! But all domains typically agree that a
> > Janitor is a Profession. A Job_Type (TypeOfJob) that someone professes or
> > agrees to WORK_AS for payment. And some folks might be enslaved to
> WORK_AS
> > :)
> >
> > Existing Ontologies and Vocabularies (which are domain based, some wider
> > than others) can help anyone trying to model "the world". However, be aware
> > that many longtail domains, like Food Service or Laser Etching, are simply
> > not modeled; no one has touched them yet when building ontologies or
> > vocabularies, and hence they require community domain experts (the folks in
> > those businesses or scientific or government communities) to help you think
> > correctly within their domains, rather than how "the rest of the world" would
> > typically organize them. Organizing across *domains* with Types will
> > require Namespaces for those domains, and in some cases, you will find that
> > only a FEW Properties really apply to a specific Namespace. They are just
> > simply NOT used by the rest of "the world".
> >
> > The very last part for you in modeling "the world" should be at a CONCEPT
> > level. Like SKOS_CONCEPT. Only once you have seen the overlap of a CONCEPT
> > across domains, can you then begin to give the answer, YES, when 2 or 3
> > domains ask, "Is this CONCEPT_OF "Janitor - a profession type where someone
> > cleans" the SAME_AS ours and RELATED_TO the CONCEPT_OF "Maid" ?
> >
> > Proper "semantic relationships" have to allow flexibility across domains.
> > Find some common overlapping Types and Topics across Domains, and then
> > begin your experimentation there (and make sure you get a bit of History or
> > Historical Types in there as well to account for Time Space associations -
> > those always screw with my head personally, lol). You will soon begin to
> > see that Domains are really like "Photoshop layers".
> >
> > --
> > -Thad
> > http://www.freebase.com/view/en/thad_guidry
> > ___
> > Neo4j mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user

Re: [Neo4j] Modelling with neo4j

2011-09-24 Thread loldrup
Hmm... Which book would you recommend I read?

Jon
On Sep 24, 2011 7:55 PM, "Thad Guidry [via Neo4j Community Discussions]" <
ml-node+s438527n3364798...@n3.nabble.com> wrote:
>
>
> Quite wrong.
>
> IS_JANITOR_OF will stick you into a boxed node ordinal.
> What you really want when modeling the world is to only capture the
> "semantic relationships" themselves. IS_A being a core semantic
> relationship. I am a janitor. He IS_A janitor. What is a janitor ? What
> properties does a janitor have ? Does a janitor always have those
> properties, no matter its state ? Does a janitor that LIVES_AT the
> Seychelles Islands always have a pail and mop ?
>
> When trying to model "the world", you must break down to the lowest of lows.
> And then use Types to clearly designate Property Reasonings.
>
> For instance, SWRC ontology says that Bioinformatics IS_A subtopic of
> KnowledgeWeb Applications.
>
>   https://wiki-sop.inria.fr/wiki/bin/view/Acacia/KnowledgeWeb#Bioinformatics
>   https://wiki-sop.inria.fr/wiki/bin/view/Acacia/KnowledgeWeb#Applications
>   http://www.w3.org/2001/XMLSchema#string  2.7.3
>
> Great for them. But WHAT is Bioinformatics to the rest of "the world",
> generally ? Is it a FIELD_OF_STUDY as Freebase.com says ? Is it a
> STUDY_SUBJECT as other Vocabularies describe ? Is a FIELD_OF_STUDY the same
> as a STUDY_SUBJECT ? Or is it more proper and correct to say that a
> FIELD_OF_STUDY can be PART_OF a STUDY_SUBJECT ? Bioinformatics PART_OF
> Biology PART_OF Science ? I would say both and all. And there you would
> need many "semantic relationships", depending again on the domains' usage.
>
> In Freebase, we decided early on that the lowest of lows would be TOPICS.
> Some TOPICS could be given Types. A Janitor is a Type of Person. Oh
> Really ? No. Not always to some ! But all domains typically agree that a
> Janitor is a Profession. A Job_Type (TypeOfJob) that someone professes or
> agrees to WORK_AS for payment. And some folks might be enslaved to WORK_AS
> :)
>
> Existing Ontologies and Vocabularies (which are domain based, some wider
> than others) can help anyone trying to model "the world". However, be aware
> that many longtail domains, like Food Service or Laser Etching, are simply
> not modeled; no one has touched them yet when building ontologies or
> vocabularies, and hence they require community domain experts (the folks in
> those businesses or scientific or government communities) to help you think
> correctly within their domains, rather than how "the rest of the world" would
> typically organize them. Organizing across *domains* with Types will
> require Namespaces for those domains, and in some cases, you will find that
> only a FEW Properties really apply to a specific Namespace. They are just
> simply NOT used by the rest of "the world".
>
> The very last part for you in modeling "the world" should be at a CONCEPT
> level. Like SKOS_CONCEPT. Only once you have seen the overlap of a CONCEPT
> across domains, can you then begin to give the answer, YES, when 2 or 3
> domains ask, "Is this CONCEPT_OF "Janitor - a profession type where someone
> cleans" the SAME_AS ours and RELATED_TO the CONCEPT_OF "Maid" ?
>
> Proper "semantic relationships" have to allow flexibility across domains.
> Find some common overlapping Types and Topics across Domains, and then
> begin your experimentation there (and make sure you get a bit of History or
> Historical Types in there as well to account for Time Space associations -
> those always screw with my head personally, lol). You will soon begin to
> see that Domains are really like "Photoshop layers".
>
> --
> -Thad
> http://www.freebase.com/view/en/thad_guidry




Re: [Neo4j] Modelling with neo4j

2011-09-24 Thread Thad Guidry
Quite wrong.

IS_JANITOR_OF will stick you into a boxed node ordinal.
What you really want when modeling the world is to only capture the
"semantic relationships" themselves.  IS_A being a core semantic
relationship.  I am a janitor.  He IS_A janitor.  What is a janitor ?  What
properties does a janitor have ?  Does a janitor always have those
properties, no matter its state ?  Does a janitor that LIVES_AT the
Seychelles Islands always have a pail and mop ?

When trying to model "the world", you must break down to the lowest of lows.
And then use Types to clearly designate Property Reasonings.

For instance, SWRC ontology says that Bioinformatics IS_A subtopic of
KnowledgeWeb Applications.


  https://wiki-sop.inria.fr/wiki/bin/view/Acacia/KnowledgeWeb#Bioinformatics
  https://wiki-sop.inria.fr/wiki/bin/view/Acacia/KnowledgeWeb#Applications
  http://www.w3.org/2001/XMLSchema#string  2.7.3

Great for them.  But WHAT is Bioinformatics to the rest of "the world",
generally ?  Is it a FIELD_OF_STUDY as Freebase.com says ?  Is it a
STUDY_SUBJECT as other Vocabularies describe ?  Is a FIELD_OF_STUDY the same
as a STUDY_SUBJECT ?  Or is it more proper and correct to say that a
FIELD_OF_STUDY can be PART_OF a STUDY_SUBJECT ?  Bioinformatics PART_OF
Biology PART_OF Science ?  I would say both and all.  And there you would
need many "semantic relationships", depending again on the domains' usage.

In Freebase, we decided early on that the lowest of lows would be TOPICS.
Some TOPICS could be given Types.  A Janitor is a Type of Person.  Oh
Really ?  No. Not always to some !  But all domains typically agree that a
Janitor is a Profession.  A Job_Type (TypeOfJob) that someone professes or
agrees to WORK_AS for payment.  And some folks might be enslaved to WORK_AS
:)

Existing Ontologies and Vocabularies (which are domain based, some wider
than others) can help anyone trying to model "the world".  However, be aware
that many longtail domains, like Food Service or Laser Etching, are simply
not modeled; no one has touched them yet when building ontologies or
vocabularies, and hence they require community domain experts (the folks in
those businesses or scientific or government communities) to help you think
correctly within their domains, rather than how "the rest of the world" would
typically organize them.  Organizing across *domains* with Types will
require Namespaces for those domains, and in some cases, you will find that
only a FEW Properties really apply to a specific Namespace.  They are just
simply NOT used by the rest of "the world".

The very last part for you in modeling "the world" should be at a CONCEPT
level.  Like SKOS_CONCEPT.  Only once you have seen the overlap of a CONCEPT
across domains, can you then begin to give the answer, YES, when 2 or 3
domains ask, "Is this CONCEPT_OF "Janitor - a profession type where someone
cleans" the SAME_AS ours and RELATED_TO the CONCEPT_OF "Maid" ?

Proper "semantic relationships" have to allow flexibility across domains.
Find some common overlapping Types and Topics across Domains, and then
begin your experimentation there (and make sure you get a bit of History or
Historical Types in there as well to account for Time Space associations -
those always screw with my head personally, lol).  You will soon begin to
see that Domains are really like "Photoshop layers".

-- 
-Thad
http://www.freebase.com/view/en/thad_guidry


Re: [Neo4j] Modelling with neo4j

2011-09-24 Thread Niels Hoogeveen

Subtyping works as follows in Enhanced API.
When calling getRelationships(RelationshipType, Direction) or any of its 
alternatives, the API looks up all subtypes of that relationship type and then 
calls getRelationships(Direction, RelationshipType...) with all of those types. All 
you need to do is create a RelationshipType "IS_JANITOR_OF" and a 
RelationshipType "WORKS_FOR" and state that the former is a subtype of the 
latter. 
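A minimal Python sketch of the subtype expansion described above. It does not use Enhanced API; `RelTypeRegistry`, `declare_subtype`, `expand` and `get_relationships` are hypothetical names, relationships are plain (start, type, end) tuples, and Direction is left out for brevity. It only shows that a query for WORKS_FOR also returns IS_JANITOR_OF relationships once the latter is declared a subtype of the former.

```python
from collections import defaultdict


class RelTypeRegistry:
    """Hypothetical registry recording which relationship types subtype which."""

    def __init__(self):
        self._subtypes = defaultdict(set)  # supertype name -> direct subtype names

    def declare_subtype(self, sub, sup):
        """State that every `sub` relationship is also a `sup` relationship."""
        self._subtypes[sup].add(sub)

    def expand(self, rel_type):
        """Return rel_type together with all of its direct and indirect subtypes."""
        found, stack = set(), [rel_type]
        while stack:
            t = stack.pop()
            if t not in found:
                found.add(t)
                stack.extend(self._subtypes.get(t, ()))
        return found


def get_relationships(rels, registry, rel_type):
    """Return the (start, type, end) triples whose type is rel_type or a subtype of it."""
    wanted = registry.expand(rel_type)
    return [r for r in rels if r[1] in wanted]


registry = RelTypeRegistry()
registry.declare_subtype("IS_JANITOR_OF", "WORKS_FOR")

rels = [
    ("jon", "IS_JANITOR_OF", "school"),
    ("ann", "WORKS_FOR", "bank"),
    ("ann", "LIVES_AT", "seychelles"),
]
print(get_relationships(rels, registry, "WORKS_FOR"))
# [('jon', 'IS_JANITOR_OF', 'school'), ('ann', 'WORKS_FOR', 'bank')]
```

Asking for IS_JANITOR_OF returns only the janitor relationship, since the expansion runs from supertype to subtype and not the other way.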
Haskell type classes are a great mechanism for ad-hoc polymorphism and in some 
ways are preferable to subtyping, though not necessarily in the context of a 
database. It allows you indeed to say there is a commonality between "WORKS_AT" 
and "IS_JANITOR_OF", but it doesn't allow you to state that the relationships 
of type  "IS_JANITOR_OF" are a subset of the relationships of type "WORKS_AT". 
In a database context the subsumption rule is actually quite important and 
Haskell type classes don't offer that. The combination of type classes and 
subtyping is as far as I know still an open research topic. It is not without 
reason that Scala (which has subtyping) doesn't have type classes, though it 
allows similar constructs through implicit conversions. Working in both 
disciplines at the same time (poor-man type classes through implicit 
conversions in combination with subtyping) seems to be non-trivial. 
Niels

> Date: Sat, 24 Sep 2011 08:09:48 -0700
> From: lold...@gmail.com
> To: user@lists.neo4j.org
> Subject: Re: [Neo4j] Modelling with neo4j
> 
> Subtyping of relationship types sounds like the cure to my problems.
> When creating a relationship IS_A_JANITOR_OF, will a corresponding
> relationship type IS_A_JANITOR_OF-relationship-type automatically be
> created?
> 
> If I have a simple relationship can I then ask which relationship types its
> type is a subtype of?
> 
> Regarding interfaces:
> I took the idea of interfaces from Haskell's type classes, which make great
> sense as interfaces. In Neo4j we could imagine that relationships with types
> WORKS_AT and REFERS_TO might have something in common (e.g. they both have
> to specify a boss who gives them orders).
> 
> For now I don't think my problem requires interfaces before it can be
> solved, but I only just started so who knows :)
> 
> Jon
> On Sep 24, 2011 3:15 PM, "Niels Hoogeveen [via Neo4j Community Discussions]" wrote:
> >
> >
> >
> > You raise interesting questions, most of them very much related to the
> > work I did on Enhanced API.
> >
> > Let me start with the distinction between Node and Relationship, which in
> > my opinion too is a bit artificial. I understand when creating a graph
> > database, it is helpful to have something like vertices and edges, but
> > indeed see those more as modalities of the elements of the graph than as
> > clearly separated types. This was one of the reasons to unify all elements
> > of the graph with one underlying type.
> >
> > At the time, I saw two options:
> >
> > a) make the graph bipartite, so that all relationships and properties
> > become nodes and use relationships only as a hidden linking feature
> > b) create shadow nodes for relationships and properties when needed and
> > let the API handle that transparently
> >
> > I chose option b for performance reasons. There are likely many
> > applications where most of the relationships are simple, i.e. link two nodes
> > while possibly having some properties. Using a bipartite layout for such
> > relationships adds nothing, but it takes twice as many links to traverse.
> >
> > The shadow node solution only treats relationships and properties as
> > special (having relationships to them) when that is needed.
> >
> > Now to the typing issues. Neo4j has chosen not to add typing features to
> > the database and I actually like that. It allows for optional type systems
> > that can be used but are not enforced to be used.
> >
> > Type systems are nice beasts, especially when dealing with large and
> > complex applications, but they impose a development overhead, mostly felt in
> > small quick and dirty applications. This is true for programming languages,
> > where many people prefer to use an untyped language such as Javascript,
> > Python, Ruby and PHP over a typed language such as Java, Scala, C# or
> > Haskell and I think it is also true for databases. I think one of the
> > reasons NOSQL became so popular is that the type system of an RDBMS adds
> > overhead to simple applications.
> >
> > An RDBMS needs a type system because the storage layout requires that.
> > Tables have a fixed number of columns, where each column has a designated
> > type. While this is a great feature when processing massive amounts of
> > similar data, it c

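Option b in Niels's quoted message (shadow nodes created only when needed) can be sketched in a few lines of Python. The `Node` and `Relationship` classes and the `shadow()` method are hypothetical stand-ins, not Enhanced API classes: a plain relationship stays a simple link between two nodes, and only when something needs to point at the relationship itself is a shadow node created and then reused.

```python
import itertools

_ids = itertools.count()


class Node:
    """Hypothetical node: an id, a property map and its outgoing relationships."""

    def __init__(self, **props):
        self.id = next(_ids)
        self.props = dict(props)
        self.outgoing = []

    def relate_to(self, other, rel_type, **props):
        rel = Relationship(self, other, rel_type, props)
        self.outgoing.append(rel)
        return rel


class Relationship:
    """A plain link between two nodes that gets a shadow node only on demand."""

    def __init__(self, start, end, rel_type, props):
        self.start, self.end, self.type = start, end, rel_type
        self.props = props
        self._shadow = None  # simple relationships never pay for an extra node

    def shadow(self):
        """Create (once) and return the node that stands in for this relationship."""
        if self._shadow is None:
            self._shadow = Node(shadow_of=self.type)
        return self._shadow


jon, school, hr = Node(name="Jon"), Node(name="School"), Node(name="HR")
job = jon.relate_to(school, "IS_JANITOR_OF")
# Only now, because HR needs to point at the relationship itself, is a shadow node created.
hr.relate_to(job.shadow(), "APPROVED")
```

The trade-off Niels mentions shows up directly: traversing from `jon` to `school` follows one link, whereas a bipartite layout would always route through an intermediate node.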
Re: [Neo4j] Modelling with neo4j

2011-09-24 Thread Niels Hoogeveen

I too think that much of the type system of the application actually belongs in 
the application domain layer. There are so many trade-offs that can be made 
that a one-size-fits-all typing of a graph is not the way to go. This is why I 
decided that enhanced-api should only provide the nuts and bolts to roll your 
own type-layer.
Typing of nodes is certainly something that is important in many applications. 
There is minimal support for node types in enhanced-api (just a node with a 
name), and it uses the unidirectional relationship pattern you describe. Each 
node has a property containing the ids of the node-types (stored as an array 
of long values). This makes the look-up of node-types fast since no indexing is 
required.
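A rough Python sketch of the node-type lookup described above: each node carries a property holding the ids of its node-types as an array of 64-bit integers, so reading a node's types is a direct property access with no index involved. The property name `node_types`, the dictionary-based store and both helper functions are assumptions for illustration; the actual property name used by enhanced-api is not given in this thread.

```python
from array import array

TYPE_PROP = "node_types"  # assumed property name, not taken from enhanced-api

# Hypothetical in-memory store: node id -> property map.
# Nodes 1 and 2 are node-type nodes ("just a node with a name").
nodes = {
    1: {"name": "Person"},
    2: {"name": "Customer"},
    10: {"first_name": "Ada", TYPE_PROP: array("q", [1])},  # "q": signed 64-bit, like a Java long
}


def add_node_type(node_id, type_id):
    """Append a type id to the node's long-array property unless it is already present."""
    types = nodes[node_id].setdefault(TYPE_PROP, array("q"))
    if type_id not in types:
        types.append(type_id)


def node_types(node_id):
    """Read the type ids straight from the property; no index lookup is involved."""
    return [nodes[t]["name"] for t in nodes[node_id].get(TYPE_PROP, ())]


add_node_type(10, 2)
print(node_types(10))  # ['Person', 'Customer']
```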
I am still struggling a bit with the notion of what makes a node-type. On the 
one hand there are structural aspects to a node type. For example a node 
represents a person and therefore has a "first name", "last name", "birth 
date". On the other hand there are relational aspects to a node type. For 
example a relationship can represent the fact that a person is "president" of 
some "country". Such a relationship in some ways enhances that node with an 
additional type; the "person" becomes a "president". I am still not sure how to 
exactly model that. 
Some node-types seem to be intrinsic, i.e. a node's type is independent of other 
entities in the database, while other node-types seem to require other 
entities. After all a person is a person, independent of what other information 
we have in the database, while a person can only be "president" if there is a 
relationship to a "country" (or an organization, if we want to denote the other 
(related) meaning of "president").
Such relationship related node-types can impose additional constraints on the 
node. For example, to be president of the USA one has to be at least 35 years old, 
and in other countries similar constraints exist. 
I am still not sure how to incorporate this in a database model. Maybe the 
notion of an intrinsic type is wrong after all and node-types only arise out of 
relationships with other nodes. It is certainly food for thought.
The unidirectional relationship you describe is indeed somewhat of a hack, but 
it is necessary as long as the densely-connected-node issue is not solved. I 
hope this will be the case in Neo4j 1.6. It would be nice to be able to always 
have bidirectional relationships and I see no reason why this can't be the 
case. The solution to the problem is beneficial to the database in more than 
one way and has no additional cost, except for the implementation of it. 
Niels
> From: rick.bullo...@thingworx.com
> To: user@lists.neo4j.org
> Date: Sat, 24 Sep 2011 06:37:43 -0700
> Subject: Re: [Neo4j] Modelling with neo4j
> 
> That's a great summary, Niels.  Very similar to how we've applied Neo4J here 
> at ThingWorx, though we've done most of the type system work (nodes and 
> relationships are all typed/subtyped) in our application domain layer.  A few 
> other items that we leveraged in our implementation that you may wish to 
> consider:
> 
> - A common pattern we encountered was a "collection" of typed entities (e.g. 
> a typed collection), and we implemented a specific model using supernodes for 
> this.  This also allowed us to rapidly and easily iterate/search collections 
> and also to organize nodes in a "human comprehensible way" that can be 
> readily viewed with something like Neoclipse for troubleshooting purposes.  
> Also, if the type was "truck", we stamped the node with the type "truck" as a 
> property (using enumerations with a custom int member) and used that same 
> enum as the relationship type between the node and the collection node.  In 
> our model, an entity has a single "type", but we implemented the concept of 
> supertyping/subtyping in our domain model
> 
> - We found quite a few examples where a "one-way relationship" was more than 
> adequate and, instead of incurring the overhead of a relationship 
> (particularly when millions of these relationships were attached to a single 
> supernode), we used a *property* on a node containing the node id of the node 
> it references.  Sounds like a hack, but it actually has substantial 
> performance advantages, particularly if you are frequently adding/removing 
> relationships to/from the supernode
> 
> - We overlaid our own REST API on our domain model, and wanted to come up 
> with a simple way to resolve the URI for any given node/entity.  For that, we 
> used a pattern for which each node can have an optional "parent" node type.  
> Example:  a blog comment is always attached to a blog entry or other blog 
> comment.  A blo

Re: [Neo4j] Modelling with neo4j

2011-09-24 Thread loldrup
Subtyping of relationship types sounds like the cure to my problems.
When creating a relationship IS_A_JANITOR_OF, will a corresponding
relationship type IS_A_JANITOR_OF-relationship-type automatically be
created?

If I have a simple relationship can I then ask which relationship types its
type is a subtype of?

Regarding interfaces:
I took the idea of interfaces from Haskell's type classes, which make great
sense as interfaces. In Neo4j we could imagine that relationships with types
WORKS_AT and REFERS_TO might have something in common (e.g. they both have
to specify a boss who gives them orders).

For now I don't think my problem requires interfaces before it can be
solved, but I only just started so who knows :)

Jon
On Sep 24, 2011 3:15 PM, "Niels Hoogeveen [via Neo4j Community Discussions]" wrote:
>
>
>
> You raise interesting questions, most of them very much related to the
> work I did on Enhanced API.
>
> Let me start with the distinction between Node and Relationship, which in
> my opinion too is a bit artificial. I understand when creating a graph
> database, it is helpful to have something like vertices and edges, but
> indeed see those more as modalities of the elements of the graph than as
> clearly separated types. This was one of the reasons to unify all elements
> of the graph with one underlying type.
>
> At the time, I saw two options:
>
> a) make the graph bipartite, so that all relationships and properties
> become nodes and use relationships only as a hidden linking feature
> b) create shadow nodes for relationships and properties when needed and
> let the API handle that transparently
>
> I chose option b for performance reasons. There are likely many
> applications where most of the relationships are simple, i.e. link two nodes
> while possibly having some properties. Using a bipartite layout for such
> relationships adds nothing, but it takes twice as many links to traverse.
>
> The shadow node solution only treats relationships and properties as
> special (having relationships to them) when that is needed.
>
> Now to the typing issues. Neo4j has chosen not to add typing features to
> the database and I actually like that. It allows for optional type systems
> that can be used but are not enforced to be used.
>
> Type systems are nice beasts, especially when dealing with large and
> complex applications, but they impose a development overhead, mostly felt in
> small quick and dirty applications. This is true for programming languages,
> where many people prefer to use an untyped language such as Javascript,
> Python, Ruby and PHP over a typed language such as Java, Scala, C# or
> Haskell and I think it is also true for databases. I think one of the
> reasons NOSQL became so popular is that the type system of an RDBMS adds
> overhead to simple applications.
>
> An RDBMS needs a type system because the storage layout requires that.
> Tables have a fixed number of columns, where each column has a designated
> type. While this is a great feature when processing massive amounts of
> similar data, it can also make the application brittle. The tight coupling
> between type system and storage layout makes rapid schema evolution
> difficult.
>
> Neo4j doesn't impose a type system like an RDBMS does, because its storage
> layout doesn't require it. Something is either a node, a relationship or a
> property, but the combinations don't need explicit modelling for the sake
> of storage.
>
> Because of this untyped nature of the database, it now becomes possible to
> add a type system that not only is optional, but can in fact be made as
> strong or as weak as the application demands.
>
> Unfortunately Neo4j doesn't provide all the necessary hooks for a type
> system, another reason why I started Enhanced API. It was not my intention
> with that API to provide a full fledged type system to Neo4j, but to provide
> the necessary hooks so a type system can be created.
>
> Of course there is some type-creep in Neo4j. Properties and relationships
> have names, which in almost every application are used as types. Say we have
> several nodes we like to use to store information about people, where each
> of those nodes has a property "last_name". This property name effectively is
> used as a type. For all nodes the property name will denote the same fact:
> the last name of a person.
>
> This is not necessarily required by the Neo4j database. Different nodes
> may use the same property name to denote different things even with
> different datatypes. It is possible to have nodes with property name
> "last_name" that for some nodes is a String while it is an Integer for other
> nodes. While this is possible, I venture this is not all that common. The
> same property name will likely be used to denote the same fact and have the
> same datatype across the graph and therefore in most common cases be used
> like a type.
>
> The same applies to relationships, where the name will in general be used
> to denote the same type of relationship. It is unlikely an application with
> use

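Because Neo4j does not tie a datatype to a property name, an application that relies on names like "last_name" acting as types may want to check that assumption now and then. A minimal Python sketch, using plain dictionaries as hypothetical stand-ins for node property maps, that reports every property name seen with more than one value type:

```python
from collections import defaultdict


def inconsistent_properties(nodes):
    """Map each property name that appears with more than one Python type to those type names."""
    seen = defaultdict(set)
    for props in nodes:
        for name, value in props.items():
            seen[name].add(type(value).__name__)
    return {name: sorted(kinds) for name, kinds in seen.items() if len(kinds) > 1}


nodes = [
    {"first_name": "Ada", "last_name": "Lovelace"},
    {"first_name": "Alan", "last_name": "Turing"},
    {"first_name": "Bob", "last_name": 42},  # same name, different datatype
]
print(inconsistent_properties(nodes))  # {'last_name': ['int', 'str']}
```

An empty result means every property name is being used consistently, i.e. the de facto typing Niels describes actually holds for that data.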
Re: [Neo4j] Modelling with neo4j

2011-09-24 Thread Niels Hoogeveen

+1
Enhanced API grew out of a couple of classes I added to make 
IndexedRelationship work more easily (not exposing comparators), but it is 
essentially a separate component. Giving it that status would help others 
improve it. Having laid some of the ground work, I feel it needs other people's 
input too. As it stands now, it is very much a one-man's work and while I am 
confident it contains plenty of good ideas, it can only grow with the input of 
other developers, just like IndexedRelationships has become much better thanks 
to the work Bryce put into it, and the work of others to include 
graph-collections with structures I would not even have thought about.
There is however one thing we need to look at. Right now IndexedRelationships has 
a dependency on Enhanced API for the indexing of nodes based on a property. At 
the same time Enhanced API has a dependency on graph-collections, transparently 
supporting IndexedRelationships in the API.
I think it would be best to remove the dependency of graph-collections on 
enhanced-api and only offer the slightly more complex option where the user 
needs to provide a comparator. The other dependency can remain and in fact can 
even be made stronger. Enhanced API could in principle be made to support any 
type of collection, now that Bryce has added a generic NodeCollection interface.
I agree "enhanced api" is not a great name, it says what it does, but certainly 
has little appeal. So I will be happy if someone can come up with something 
sexier.
Niels
> From: peter.neuba...@neotechnology.com
> Date: Sat, 24 Sep 2011 15:42:13 +0200
> To: user@lists.neo4j.org
> Subject: Re: [Neo4j] Modelling with neo4j
> 
> Great thoughts guys!
> I think it would be interesting to break out the "Enhanced API" from
> graph-collections, rename it into something better (we can think of a
> name together) and provide a more fully fledged example that we can
> document and evolve.
> 
> WDYT?
> 
> Cheers,
> 
> /peter neubauer
> 
> GTalk:  neubauer.peter
> Skype   peter.neubauer
> Phone   +46 704 106975
> LinkedIn   http://www.linkedin.com/in/neubauer
> Twitter  http://twitter.com/peterneubauer
> 
> http://www.neo4j.org   - Your high performance graph database.
> http://startupbootcamp.org/- Öresund - Innovation happens HERE.
> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
> 
> 
> 
> On Sat, Sep 24, 2011 at 3:37 PM, Rick Bullotta wrote:
> > That's a great summary, Niels.  Very similar to how we've applied Neo4J 
> > here at ThingWorx, though we've done most of the type system work (nodes 
> > and relationships are all typed/subtyped) in our application domain layer.  
> > A few other items that we leveraged in our implementation that you may wish 
> > to consider:
> >
> > - A common pattern we encountered was a "collection" of typed entities 
> > (e.g. a typed collection), and we implemented a specific model using 
> > supernodes for this.  This also allowed us to rapidly and easily 
> > iterate/search collections and also to organize nodes in a "human 
> > comprehensible way" that can be readily viewed with something like 
> > Neoclipse for troubleshooting purposes.  Also, if the type was "truck", we 
> > stamped the node with the type "truck" as a property (using enumerations 
> > with a custom int member) and used that same enum as the relationship type 
> > between the node and the collection node.  In our model, an entity has a 
> > single "type", but we implemented the concept of supertyping/subtyping in 
> > our domain model
> >
> > - We found quite a few examples where a "one-way relationship" was more 
> > than adequate and, instead of incurring the overhead of a relationship 
> > (particularly when millions of these relationships were attached to a 
> > single supernode), we used a *property* on a node containing the node id of 
> > the node it references.  Sounds like a hack, but it actually has 
> > substantial performance advantages, particularly if you are frequently 
> > adding/removing relationships to/from the supernode
> >
> > - We overlaid our own REST API on our domain model, and wanted to come up 
> > with a simple way to resolve the URI for any given node/entity.  For that, 
> > we used a pattern for which each node can have an optional "parent" node 
> > type.  Example:  a blog comment is always attached to a blog entry or other 
> > blog comment.  A blog entry is always attached to a blog.  A blog is always 
> > attached to the blogs collection, and so on.  Each node has a name and/or 
> > an ID.

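Rick's "one-way relationship" pattern (a node property holding the id of the node it references, instead of a real relationship to a heavily connected supernode) might look roughly like this in Python. The dictionary store and the `parent_id` property name are hypothetical. Adding or removing a reference only writes to the member node, which is presumably where the advantage over touching the supernode's relationships comes from; the price is that listing all members needs a scan or an index.

```python
# Hypothetical store: node id -> property map. Node 0 is the supernode.
store = {0: {"name": "all-trucks"}}
REF_PROP = "parent_id"  # assumed property name holding the referenced node's id


def attach(node_id, props, supernode_id):
    """Create a node that points at the supernode via a property, not a relationship."""
    store[node_id] = dict(props, **{REF_PROP: supernode_id})


def detach(node_id):
    """Removing the reference only touches the member node; the supernode is unchanged."""
    store[node_id].pop(REF_PROP, None)


def members(supernode_id):
    """The cost of a one-way link: incoming lookups need a scan (or an index)."""
    return sorted(n for n, p in store.items() if p.get(REF_PROP) == supernode_id)


for i in range(1, 4):
    attach(i, {"plate": f"TRK-{i}"}, 0)
detach(2)
print(members(0))  # [1, 3]
```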
Re: [Neo4j] Modelling with neo4j

2011-09-24 Thread Peter Neubauer
Great thoughts guys!
I think it would be interesting to break out the "Enhanced API" from
graph-collections, rename it into something better (we can think of a
name together) and provide a more fully fledged example that we can
document and evolve.

WDYT?

Cheers,

/peter neubauer

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org               - Your high performance graph database.
http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



On Sat, Sep 24, 2011 at 3:37 PM, Rick Bullotta wrote:

Re: [Neo4j] Modelling with neo4j

2011-09-24 Thread Rick Bullotta
We're using Neo4J to model the "real world" with "things" here at ThingWorx as 
well.  See my responses to Niels for some specifics.


From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On Behalf Of 
loldrup [lold...@gmail.com]
Sent: Saturday, September 24, 2011 1:52 AM
To: user@lists.neo4j.org
Subject: [Neo4j] Modelling with neo4j

I'm trying to figure out how to model the world most flexibly (okay, so I'm
sticking to modelling organisations for now, but still). My main problem
seems to occur when I want to allow the model to naturally expand in
complexity. Say we have the following relationship:

Joe is a janitor at the school.

This can easily be modelled with two entities and a relationship. Now say I
have some common properties for janitors. I would have to make a link from
the janitor-relation to some node denoting the type 'janitor' which could
then hold information on these common things. Unfortunately, relationships
don't support that: a relationship can't itself be linked to another node.

Long story short: the problem is that sometimes I want my things to act as
things, sometimes as types, sometimes as interfaces, and I cannot know in
advance which of these modalities I'm going to need.

Therefore, I'm considering going with this model:

Imagine a graph in three layers. The lower layer represents things, the
middle layer represents types and the upper layer represents interfaces.
Initially I populate only the lowest layer, but as the need arises I go back and
promote various things to also be types or interfaces. These then crop up in
the second and third layer of the graph, respectively. When this happens, a
vertical relationship is added between the element in the lower layer and
its new type/interface in the higher layers.
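To make the proposed scheme concrete, here is a toy in-memory sketch of it, assuming each element is identified by a name plus a layer. `LayeredGraph`, `promote` and the layer labels are hypothetical and have nothing to do with the Neo4j API; the point is only that promotion adds a counterpart in a higher layer plus one vertical link, and leaves the original thing untouched.

```python
from dataclasses import dataclass, field

# Layer names, from lowest to highest (hypothetical labels).
LAYERS = ("thing", "type", "interface")


@dataclass
class Element:
    name: str
    layer: str
    # Vertical links from a thing up to its counterparts in the higher layers.
    promoted_to: dict = field(default_factory=dict)


class LayeredGraph:
    """Toy in-memory model of the three-layer scheme (not a Neo4j API)."""

    def __init__(self):
        self.elements = {}  # (name, layer) -> Element

    def add_thing(self, name):
        # Everything starts out in the lowest layer.
        element = Element(name, "thing")
        self.elements[(name, "thing")] = element
        return element

    def promote(self, name, layer):
        """Let an existing thing also act as a type or an interface."""
        if layer not in LAYERS[1:]:
            raise ValueError(f"cannot promote to layer {layer!r}")
        thing = self.elements[(name, "thing")]
        if layer not in thing.promoted_to:
            counterpart = Element(name, layer)
            self.elements[(name, layer)] = counterpart
            thing.promoted_to[layer] = counterpart  # the vertical relationship
        return thing.promoted_to[layer]

    def names_in(self, layer):
        return sorted(name for (name, lyr) in self.elements if lyr == layer)
```

For example, after `add_thing("Joe")` and `add_thing("janitor")`, calling `promote("janitor", "type")` makes "janitor" appear in the type layer while both names stay in the thing layer; promoting twice returns the same counterpart.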

Now the question is: how to model this scheme in neo4j? A number of
challenges pop up:

* Neo4j relationships cannot be n-ary, so every relationship must be
modelled as a hyperrelationship (a node standing in for the relationship),
thus allowing future relations to the second and third layers.

* In a modalities-are-a-changing paradigm it doesn't really make sense to
distinguish between relations and entities; at different points in time, one
element may have to act in both roles. Neo4j, however, makes a
fundamental distinction between the two. I could choose to model all
relationships as nodes, but won't that make graph traversals messy?

* Neo4j doesn't come with a strong type distinction between three such
layers of modality.

--
View this message in context: 
http://neo4j-community-discussions.438527.n3.nabble.com/Modelling-with-neo4j-tp3363823p3363823.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Modelling with neo4j

2011-09-24 Thread Rick Bullotta
That's a great summary, Niels.  Very similar to how we've applied Neo4J here at 
ThingWorx, though we've done most of the type system work (nodes and 
relationships are all typed/subtyped) in our application domain layer.  A few 
other items that we leveraged in our implementation that you may wish to 
consider:

- A common pattern we encountered was a "collection" of typed entities (e.g. a 
typed collection), and we implemented a specific model using supernodes for 
this.  This also allowed us to rapidly and easily iterate/search collections 
and also to organize nodes in a "human comprehensible way" that can be readily 
viewed with something like Neoclipse for troubleshooting purposes.  Also, if 
the type was "truck", we stamped the node with the type "truck" as a property 
(using enumerations with a custom int member) and used that same enum as the 
relationship type between the node and the collection node.  In our model, an 
entity has a single "type", but we implemented the concept of 
supertyping/subtyping in our domain model.

- We found quite a few examples where a "one-way relationship" was more than 
adequate and, instead of incurring the overhead of a relationship (particularly 
when millions of these relationships were attached to a single supernode), we 
used a *property* on a node containing the node id of the node it references.  
Sounds like a hack, but it actually has substantial performance advantages, 
particularly if you are frequently adding/removing relationships to/from the 
supernode.

- We overlaid our own REST API on our domain model, and wanted to come up with 
a simple way to resolve the URI for any given node/entity.  For that, we used a 
pattern for which each node can have an optional "parent" node type.  Example:  
a blog comment is always attached to a blog entry or other blog comment.  A 
blog entry is always attached to a blog.  A blog is always attached to the 
blogs collection, and so on.  Each node has a name and/or an ID.  Because those 
relationship "patterns" are well known, it is a trivial matter to create the URI 
to any entity given only its node, e.g.:

/Blogs/MyBlog/Entries/103/Comments/204

Of course, it works the other way as well - easy to parse and traverse.

- We often found that there were data structures in our application domain for 
which it was OK to be "opaque" - i.e. although the structures were deep and 
complex, they did not require searchability or traversability (they were 
kind of like "object blobs"), so in our metamodel, they are not stored as nodes, 
relationships, and properties, but rather as a JSON blob, serialized as a 
string to a node property.  That has worked out really well.  When we do need 
to filter/manipulate those, we do so at the domain level.
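The four patterns above fit together fairly naturally, so here is a toy, in-memory sketch of them. It is not ThingWorx's implementation and not the Neo4j API: `ToyGraph`, `EntityType`, `SEGMENT` and the property names `type`, `key`, `parent` and `blob` are all hypothetical. One deliberate simplification: the parent link is stored as a node-id property to show the one-way reference pattern, whereas a real model might well use relationships for it.

```python
import json
from enum import Enum


class EntityType(Enum):
    # Enum with a custom int member: the int is stamped on the node as its
    # "type" property, and the member itself is used as the relationship type
    # between the node and its collection (super)node.
    BLOG = 1
    ENTRY = 2
    COMMENT = 3


# URI segment for each collection (hypothetical names matching the example URI).
SEGMENT = {EntityType.BLOG: "Blogs", EntityType.ENTRY: "Entries",
           EntityType.COMMENT: "Comments"}


class ToyGraph:
    """In-memory stand-in for the database; not the Neo4j API."""

    def __init__(self):
        self.props = {}  # node id -> property dict
        self.rels = []   # (start id, relationship type, end id)
        self.collections = {t: self.create_node(name=SEGMENT[t]) for t in EntityType}

    def create_node(self, **props):
        node_id = len(self.props)
        self.props[node_id] = dict(props)
        return node_id

    def add_entity(self, etype, key, parent=None, blob=None):
        node = self.create_node(type=etype.value, key=key)
        # Supernode pattern: link the entity to the collection node for its type.
        self.rels.append((self.collections[etype], etype, node))
        if parent is not None:
            # One-way reference: the parent's node id as a plain property.
            self.props[node]["parent"] = parent
        if blob is not None:
            # Opaque structure serialized as a JSON string property.
            self.props[node]["blob"] = json.dumps(blob)
        return node

    def members(self, etype):
        coll = self.collections[etype]
        return [end for start, rel, end in self.rels if start == coll and rel is etype]

    def uri(self, node):
        # Walk up the parent chain, prepending "<collection>/<key>" at each level.
        parts = []
        while node is not None:
            p = self.props[node]
            parts[:0] = [SEGMENT[EntityType(p["type"])], str(p["key"])]
            node = p.get("parent")
        return "/" + "/".join(parts)

    def resolve(self, uri):
        # The reverse direction: match each "<collection>/<key>" pair under the
        # node found so far (None means "no parent yet").
        by_segment = {seg: t for t, seg in SEGMENT.items()}
        segs = uri.strip("/").split("/")
        node = None
        for seg, key in zip(segs[0::2], segs[1::2]):
            etype = by_segment.get(seg)
            if etype is None:
                return None
            matches = [n for n in self.members(etype)
                       if str(self.props[n]["key"]) == key
                       and self.props[n].get("parent") == node]
            if not matches:
                return None
            node = matches[0]
        return node
```

With a blog "MyBlog", an entry 103 under it and a comment 204 under the entry, `uri()` on the comment yields `/Blogs/MyBlog/Entries/103/Comments/204`, and `resolve()` on that string walks back down to the same node.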

Just wanted to share some more examples.

Rick


From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On Behalf Of 
Niels Hoogeveen [pd_aficion...@hotmail.com]
Sent: Saturday, September 24, 2011 9:14 AM
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Modelling with neo4j

Re: [Neo4j] Modelling with neo4j

2011-09-24 Thread Niels Hoogeveen

You raise interesting questions, most of them very much related to the work I 
did on Enhanced API.

Let me start with the distinction between Node and Relationship, which in my 
opinion too is a bit artificial. I understand when creating a graph database, 
it is helpful to have something like vertices and edges, but indeed see those 
more as modalities of the elements of the graph than as clearly separated 
types. This was one of the reasons to unify all elements of the graph with one 
underlying type.

At the time, I saw two options:

a) make the graph bipartite, so that all relationships and properties become 
nodes and use relationships only as a hidden linking feature
b) create shadow nodes for relationships and properties when needed and let the 
API handle that transparently

I chose option b for performance reasons. There are likely many 
applications where most of the relationships are simple, i.e. they link two 
nodes and possibly have some properties. Using a bipartite layout for such 
relationships adds nothing, but it takes twice as many links to traverse.

The shadow node solution only treats relationships and properties as special 
(having relationships to them) when that is needed. 
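A minimal sketch of option b, using the janitor case from the original question and assuming a toy in-memory store. `ShadowGraph`, `shadow_of` and `link_from_relationship` are hypothetical names, not the Enhanced API's actual interface. Plain relationships stay single hops; a shadow node is only created the first time something needs to hang off a relationship.

```python
class ShadowGraph:
    """Toy store: plain relationships, with shadow nodes created only on demand."""

    def __init__(self):
        self.nodes = set()
        self.rels = {}    # relationship id -> (start, type, end)
        self.shadow = {}  # relationship id -> its shadow node
        self.links = []   # (shadow node, type, end): relationships *from* relationships

    def node(self, name):
        self.nodes.add(name)
        return name

    def relate(self, start, rtype, end):
        rel_id = len(self.rels)
        self.rels[rel_id] = (start, rtype, end)
        return rel_id

    def shadow_of(self, rel_id):
        # Create the shadow node lazily, the first time something needs to
        # point at (or from) this relationship.
        if rel_id not in self.shadow:
            self.shadow[rel_id] = self.node(f"rel#{rel_id}")
        return self.shadow[rel_id]

    def link_from_relationship(self, rel_id, rtype, end):
        self.links.append((self.shadow_of(rel_id), rtype, end))

    def neighbours(self, start):
        # Simple relationships stay one hop: no intermediate node to traverse.
        return [end for s, _, end in self.rels.values() if s == start]
```

In a fully bipartite layout, `neighbours("Joe")` would have to step through an intermediate node for every relationship; here only relationships that actually acquire links pay that cost.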

Now to the typing issues. Neo4j has chosen not to add typing features to the 
database and I actually like that. It allows for optional type systems that can 
be used but whose use is not enforced.

Type systems are nice beasts, especially when dealing with large and complex 
applications, but they impose a development overhead, mostly felt in small 
quick and dirty applications. This is true for programming languages, where 
many people prefer to use a dynamically typed language such as JavaScript, Python, 
Ruby and PHP over a statically typed language such as Java, Scala, C# or Haskell, 
and I think it is also true for databases. I think one of the reasons NoSQL became 
so popular is that the type system of an RDBMS adds overhead to simple applications.

An RDBMS needs a type system because the storage layout requires that. Tables 
have a fixed number of columns, where each column has a designated type. While 
this is a great feature when processing massive amounts of similar data, it can 
also make the application brittle. The tight coupling between type system and 
storage layout makes rapid schema evolution difficult.

Neo4j doesn't impose a type system like an RDBMS does, because its storage 
layout doesn't require it. Something is either a node, a relationship or a 
property, but the combinations don't need explicit modelling for the sake of 
storage.

Because of this untyped nature of the database, it now becomes possible to add 
a type system that not only is optional, but can in fact be made as strong or 
as weak as the application demands.
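One way to picture a type system that is optional and adjustable in strength is a thin checking layer in front of an untyped property map. This is a sketch under that assumption only; `PropertyTypes` and `Strictness` are hypothetical and not part of Neo4j or the Enhanced API.

```python
from enum import Enum


class Strictness(Enum):
    OFF = 0     # untyped: anything goes, as in the bare database
    WARN = 1    # weak: record mismatches but still store the value
    STRICT = 2  # strong: reject mismatching values


class PropertyTypes:
    """Optional type layer over an untyped property map (hypothetical)."""

    def __init__(self, schema, strictness=Strictness.STRICT):
        self.schema = schema  # property name -> expected Python type
        self.strictness = strictness
        self.warnings = []

    def set(self, props, key, value):
        expected = self.schema.get(key)  # unknown names are never checked
        if (self.strictness is not Strictness.OFF and expected is not None
                and not isinstance(value, expected)):
            message = f"{key!r} expects {expected.__name__}, got {type(value).__name__}"
            if self.strictness is Strictness.STRICT:
                raise TypeError(message)
            self.warnings.append(message)
        props[key] = value
```

The same schema can thus be enforced, merely reported, or ignored, and the storage underneath never has to change.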

Unfortunately Neo4j doesn't provide all the necessary hooks for a type system, 
another reason why I started Enhanced API. It was not my intention with that 
API to provide a full-fledged type system to Neo4j, but to provide the 
necessary hooks so a type system can be created.

Of course there is some type-creep in Neo4j. Properties and relationships have 
names, which in almost every application are used as types. Say we have several 
nodes we'd like to use to store information about people, where each of those 
nodes has a property "last_name". This property name effectively is used as a 
type. For all nodes the property name will denote the same fact: the last name 
of a person. 

This is not necessarily required by the Neo4j database. Different nodes may use 
the same property name to denote different things even with different 
datatypes. It is possible to have nodes with property name "last_name" that for 
some nodes is a String while it is an Integer for other nodes. While this is 
possible, I venture this is not all that common. The same property name will 
likely be used to denote the same fact and have the same datatype across the 
graph and therefore in most common cases be used like a type. 
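Whether a property name really is "used like a type" can be checked after the fact by collecting the datatypes each name carries across the graph. A small sketch, assuming nodes are represented as plain property dicts; both function names are hypothetical.

```python
from collections import defaultdict


def property_datatypes(nodes):
    """Map each property name to the set of value type names it has across nodes."""
    seen = defaultdict(set)
    for props in nodes:  # each node represented as a plain property dict
        for name, value in props.items():
            seen[name].add(type(value).__name__)
    return dict(seen)


def non_type_like(nodes):
    """Property names carrying more than one datatype, i.e. not used like a type."""
    return sorted(name for name, kinds in property_datatypes(nodes).items()
                  if len(kinds) > 1)
```

For nodes `{"last_name": "Smith"}` and `{"last_name": 7}`, `non_type_like` reports `last_name`, which is exactly the possible-but-uncommon case described above.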

The same applies to relationships, where the name will in general be used to 
denote the same type of relationship. It is unlikely an application would use 
the "FRIEND" relationship to sometimes denote a friendship between two people 
while at other times use that relationship name to denote the address of a 
building.

This is as far as typing goes in Neo4j, but it is there and means we have to 
incorporate it into the API somehow. 

This is the reason why I decided to add subtyping of relationship-types and 
property-types in the API, a feature that may be of interest to the model you 
describe in your email.
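Subtyping of relationship types can be illustrated with a supertype chain: a query for the general type also matches its subtypes. This is a sketch under that assumption, not the Enhanced API's implementation; `RelType`, `matching`, the type names and the second person "Ann" are all hypothetical.

```python
class RelType:
    """Relationship type with an optional supertype (hypothetical, not Neo4j's RelationshipType)."""

    def __init__(self, name, supertype=None):
        self.name = name
        self.supertype = supertype

    def is_a(self, other):
        # A type matches itself and every type above it in the supertype chain.
        t = self
        while t is not None:
            if t is other:
                return True
            t = t.supertype
        return False


WORKS_WITH = RelType("WORKS_WITH")
IS_JANITOR_AT = RelType("IS_JANITOR_AT", supertype=WORKS_WITH)
IS_TEACHER_AT = RelType("IS_TEACHER_AT", supertype=WORKS_WITH)


def matching(rels, rtype):
    """(start, end) pairs of relationships whose type is rtype or a subtype of it."""
    return [(start, end) for start, t, end in rels if t.is_a(rtype)]
```

Asking for `WORKS_WITH` then returns both Joe and Ann, while asking for `IS_JANITOR_AT` returns only Joe.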

Joe is a janitor at the school.

Here we see three elements: "Joe", "is janitor at", and "the school", which can 
indeed be modeled with two nodes and a relationship.

There is, however, a more general statement here of the form: person works with 
organization. Suppose we want to store the fact:

Jane is principal of the school.

Again we can model this