Re: Performance Cost of Reification

Paul Houle Fri, 09 Oct 2015 12:15:13 -0700

These days I am a big fan of RDF* and SPARQL*, which unify RDF with the
property graph model.  On the other hand I used to hate blank nodes but I
learned to stop worrying and love them.  I am hoping anyway that Neo4J and
its ilk become a gateway drug to the RDF world.





On Fri, Oct 9, 2015 at 9:44 AM, Andy Seaborne <[email protected]> wrote:

> It is nice that the Titan guys see RDF as something to compare to.
> Coincidentally, I was giving a talk about Property Graph / Linked Data just
> recently at the European ApacheCon BigData conference.
>
>
> The Property Graph (PG) market is maybe twice the size of the RDF market,
> and both are small.  The challenge is growing the graph market, not one
> form taking market share away from the other.
>
> And the key difference between graph databases and other data systems is
> modelling.  The differences between graph systems are not the key here.
>
> About reification, they are somewhat off track.  Reification is a quite
> specialised feature for limited use. It is not RDF's equivalent to
> attributes on links in PG.
>
> Let me make that concrete with an example simplified from "Graph
> Databases", chapter 3 (page 52 in my copy).  The book is written by the
> Neo4J folks.
>
> Email provenance.
>
>     A sends_email_to B
>
> Now, you could reify that statement (the act by A of sending the email to
> B).
>
> Reification is way more powerful than just being able to add data to the
> triple.  It says "claim: A sends_email_to B"  - several different and
> competing claims can be made. But let's continue assuming reification and
> assertion of the triple ... [*]
>
> <<A sends_email_to B>>
>     cc C
>     cc D
>     sentOn Tuesday
>
> In the same modelling way you could add attributes to a PG graph edge for
> sends_email_to.
>
> Both are anti-patterns (as chapter 3 notes).
>
> The email sent is an important concept so model it explicitly:
>
> A   sends       MSG
> MSG receivedBy  B
> MSG cc_to       C
> MSG cc_to       D
> MSG sentOn      "Tuesday"
>
> By modelling the email message as a first class concept, not implicit in
> the activity via reification/link attributes, you can better add
> information, e.g. which servers it was transferred by and stored on and
> when it was received (this is email - that might be twice), and better
> query it (who else accessed it on receipt).  Modelling those on the act of
> sending is making life hard (how do you talk about a draft email?)
>
> MSG contents        URL_to_content
> MSG hasChecksum     0xABCDEF
> MSG receivedHeader  "from nm15-vm2.bullet.mail.ne1.yahoo.com ...."
>
> That last one is tricky - one sending of a message can result in different
> receivedHeaders depending on the receiver.
>
> This is event-based modelling, not reification.
>
>
> If you wanted a highly efficient reification-supporting RDF store, then
> build one.  No need to blindly store as multiple triples (it's called
> compression!).  You don't see such stores because reification is a minor
> feature of RDF.  Event-based modelling and named graphs are often better.
>
>         Andy
>
> [*]
> << >> is syntax that I proposed in early SPARQL drafts pre 1.0 for
> reification support, but it didn't gain much support. It is still in the
> ARQ parser source but not active.
>
>
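
[Illustration, not part of the original message.]  The event-based model
quoted above can be sketched with plain Python tuples standing in for
triples.  This is a minimal sketch, not how Jena or any other RDF store
represents data; the helper names match() and recipients() are hypothetical
and exist only for this example.

```python
# Triples as plain (subject, predicate, object) tuples.
# Names mirror the quoted example; nothing here is a real RDF API.
triples = {
    ("A", "sends", "MSG"),
    ("MSG", "receivedBy", "B"),
    ("MSG", "cc_to", "C"),
    ("MSG", "cc_to", "D"),
    ("MSG", "sentOn", "Tuesday"),
}


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def recipients(msg):
    """Everyone the message reached, whether directly or by cc."""
    return sorted(o for (_, p, o) in match(s=msg) if p in ("receivedBy", "cc_to"))


# New facts attach to the message node itself; the "A sends MSG" triple
# is left untouched.
triples.add(("MSG", "hasChecksum", "0xABCDEF"))

print(recipients("MSG"))
```

Because MSG is an ordinary node, the checksum is just one more triple about
it and the recipients query is a simple pattern match, with no statement
about the act of sending to unpack first.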


-- 
Paul Houle

*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*

(607) 539 6254    paul.houle on Skype   [email protected]

:BaseKB -- Query Freebase Data With SPARQL
http://basekb.com/gold/

Legal Entity Identifier Lookup
https://legalentityidentifier.info/lei/lookup/
<http://legalentityidentifier.info/lei/lookup/>

Join our Data Lakes group on LinkedIn
https://www.linkedin.com/grp/home?gid=8267275
