Re: Comments on Data 3.0 manifesto

Kingsley Idehen Sun, 18 Apr 2010 12:09:00 -0700

Jiří Procházka wrote:

So essentially, all this is a cover-up maneuver to sell RDF to the
people masked as something else, more familiar?

Certainly not!


This is about "Star Wars" the prequel, basically how the prequel came to be.

The Data Model aspect of Linked Data has been conflated with a Markup Language (inadvertently), and as a result reaction to the Markup is now obscuring routes to the essence of the matter: use of a common data model for heterogeneous data virtualization and across-the-wire reference.

If so, I understand why you feel this is necessary, after all the goal
is not to sell the customer what he asked for, but what he really wanted
but didn't realize or could fully express (this time customer being tech
folk).


In a sense yes, but via cleaner separation of the constituent parts.

Anyway I rather use and try to market RDF as it is, maybe it's a bit too
fast for some, but I guess I've left too little people in utter
confusion yet to try so different ways :)

RDF is generally perceived as Markup. You will have mixed results -- at best -- when you start with RDF, on any marketing quest.

Personally, I think you'll have more success with RDF when you have an audience that's bought into the fundamental EAV Data Model. Basically, RDF's strength lies in the fidelity that it brings to Linked Data at global scale re. HTTP networks like the World Wide Web.

But before proceeding with your plan to fix RDF + Linked Data marketing,
I ask you to consider also what in marketing RDF was done right beside
what wasn't.

This is not about marketing, this is more about clear education by separation of parts and re-connecting relevant parts to their appropriate place within a broader innovation continuum re. data access, data integration, and data management.

What we are really dealing with here is the ability to virtualize heterogeneously shaped data, across disparate data sources. The effects of virtualization at the hardware and operating system layers are clear (i.e., Cloud Computing), why not data? Do we need to pay an "RDF Tax" en route to data virtualization comprehension and appreciation?

For example RDF has clear name (Data 3.0? not very good name IMHO), the
core model is very simple and has been numerously very well explained.
On the other hand your manifesto sounds a bit too complex, more like a
spec than a manifesto. For the effect I think you are aiming you need
something very simple and striking...
Not to mention it is first time I am hearing about EAV model, we all are
from different backgrounds so this terminology won't have much of an
impact I fear, though it is still good to introduce yet distant
communities.... ;)


The fact you are hearing about EAV for the first time makes my point.

EAV is very, very old. It precedes both the World Wide Web and the Semantic Web Project, let alone Linked Data. I urge you to read [1][2].

For me greatest value of RDF and Linked Data lies in semantics - the
ontologies (RDFS/OWL), which, as far as I understand it, the EAV model
doesn't touch at all which in my eyes makes it only a bit better than
tabular data models ("rectangular" as someone nicely coined some time
ago somewhere).

The biggest problem we have is this: we don't do "mutual inclusion" anymore, everyone wants to be the "new inventor". Basically, we are in the era of boolean "OR" rather than "AND".

All the items you list re. "semantics" are expressed in 3-tuples (triples) re. RDF, then partitioned across TBoxes (definitions) and ABoxes (instances of the definitions). You don't have triples without an EAV model. Where do you think the "Triples" come from?
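
To make that concrete, here is a minimal sketch in Python. The example.org identifiers are made up, and the partition rule is a deliberate simplification for illustration; only the rdf:type, rdfs:Class and rdfs:subClassOf terms come from the RDF and RDFS vocabularies. Definitions and instances are the same kind of (entity, attribute, value) 3-tuple, and the TBox/ABox split is just a partition over them.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"  # hypothetical namespace, for illustration only

# Every statement is one (entity, attribute, value) 3-tuple.
triples = [
    (EX + "Person", RDF_TYPE, RDFS_CLASS),
    (EX + "Author", RDF_TYPE, RDFS_CLASS),
    (EX + "Author", RDFS_SUBCLASS_OF, EX + "Person"),
    (EX + "kingsley", RDF_TYPE, EX + "Author"),
    (EX + "kingsley", EX + "name", "Kingsley Idehen"),
]


def is_definition(triple):
    """Simplified rule: class declarations and subclass links are definitions."""
    _entity, attribute, value = triple
    if attribute == RDFS_SUBCLASS_OF:
        return True
    return attribute == RDF_TYPE and value == RDFS_CLASS


# The TBox/ABox split is a partition of the same list of triples.
tbox = [t for t in triples if is_definition(t)]
abox = [t for t in triples if not is_definition(t)]
```

Nothing about the tuple shape changes between the two partitions; the ontology layer is more triples, not a different model.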

RDF just adds HTTP scheme based Identifiers to the mix so that you have a single Identifier for:


1. Naming a Referent

2. Accessing a Representation of a Referent's Description (carried by a Descriptor Document or Resource).

How do you think programming languages have worked since the start of time? I know of none that don't cater for items 1&2 above.
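
As a rough illustration of items 1 and 2 with an HTTP scheme Identifier, the Python sketch below assumes the hash-style naming pattern and a made-up example.org name; the helper `describe_request` and the choice of `text/turtle` are illustrative assumptions too. The same string names the Referent, and stripping its fragment yields the address of the Descriptor Document. The request is only built here, never sent.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def describe_request(identifier, media_type="text/turtle"):
    """Build (but do not send) the HTTP request for the document describing `identifier`.

    HTTP clients drop the "#fragment" before going on the wire, so the full
    identifier names the referent while the fragment-less URL locates the
    document that carries its description.
    """
    document_url, _fragment = urldefrag(identifier)
    return Request(document_url, headers={"Accept": media_type})


# Hypothetical name for a person (item 1: naming a Referent).
name = "http://example.org/people/kingsley#this"

# Item 2: the same identifier tells us where to ask for a description.
request = describe_request(name)
descriptor_document_url = request.full_url
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup; that step is left out because the example name does not resolve to anything.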

Overall it seems to me like building a sand island in middle of a wide
river to ease construction of bridges across it... I guess you have
tried building a bridge without the island a few times and it collapsed
every time, so I understand why you are building the island. But maybe I
got better steel and mine bridges would last... maybe...
On one hand I am glad we try these various ways and on the other I keep
myself asking if the gain outweighs the price of fragmentation...

I hope you hold the same view once you've read up on the history of the EAV Data Model. I'll leave it at that for now :-)


Links:

1. http://en.wikipedia.org/wiki/Entity-attribute-value_model
2. http://ycmi.med.yale.edu/nadkarni/eav_cr_frame.htm


Kingsley

Best,
Jiri Prochazka

On 04/17/2010 10:51 PM, Kingsley Idehen wrote:

John Erickson wrote:

Hi Kingsley!

Reading between the lines, I think I grok where you are trying to go
with your "manifesto." For it to be an effective, stand-alone document
I think a few pieces are needed:

1. What is your GOAL? It should be clearly stated, something like, "to
promote best-practices for standards-compliant access to structured
data object (or entity) descriptors by getting data architects to do X
instead of Y," etc.

Okay, I'll see what I can do.

This document is really a continuation of a document that's actually
missing from the Web, sadly.

A long time ago (start of Web 2.0), there was a Data 2.0 manifesto by
Alex James (now at Microsoft), so in classic two-fer fashion I've opted
to kill two birds with a single stone:

1. Linked Data incomprehension (Technical and Political)

2. Data 2.0 manifesto upgrade and update.

2. What is your MOTIVATION? I think this is implicit in your current
text --- your argument seems to be that TBL's "Four Principles" are
not enough --- but you need to make your motivations explicit and
JUSTIFY them. If TBL's principles are too nebulous, explain concisely
why and what the implications are. Keep in mind that they seem to be
"good enough" for many practitioners today. ;)

My motivation is simply this: Get RDF out of the way!
The "RDF incomprehension cloud" is only second to what's heading across
Northern Europe from Iceland, re. obscuring a myriad of routes to Linked
Data comprehension.

How can we spend 12+ years on the basic issue of EAV + de-referencable
identifiers? Compounded by poor monikers such as: Information Resource
and Non-Information Resource. We have Data Objects (Entities, Data Items
etc.) and their associated Descriptor Documents (Representation Carriers
or Senses); it's always been so!

Note, RDF "the Data Model" doesn't exist in the minds of the broader
Web audience (I am not sending an inbound meme to the Semantic Web
Community, my meme is being beamed to a wider audience that's taking way
too long to grok the essence of the Linked Data matter).

I, like many others, am utterly fed up with trying to accentuate the
fact that RDF is based on a Graph Data Model. The initial "RDF/XML is
RDF" conflation has dealt a fatal blow to RDF re. broad audience
communications.

EAV has been with us forever, people already use applications that are
based on this model, across all major operating systems. Why not
triangulate from this position (top down) instead of bottom up (which
ultimately reeks of NIH rather than a Cool Tweak)?

3. Be SPECIFIC about what practitioners must do moving forward. I
think you've made a good start on this, to the extent that you have
lots of "SHOULDS." I would argue that more specificity of a different
kind is needed; if data architects SHOULD be following more abstract
EAV conceptualizations, what exactly should they do in practice?

Hmm.. will see what I can do.

This is a seed document (I hope). Anyone (including yourself) should be
able to add perspective to it etc..

Finally, on the deeper question of motivation, I suggest that while a
historical argument can be made that RDF is likely a subset or special
case of EAV, the community has developed convenient and familiar
languages for expressing RDF (such as N3 and Turtle); practitioners
are much less familiar with EAV. Does the community really lose
anything by using RDF as its shorthand?

RDF is a variant of EAV courtesy of Generic HTTP scheme Identifiers for
Names.

Nothing in what I am saying or seeking dislocates RDF from the big
picture here. It just isn't the item for starting conversations with
people outside the Semantic Web Community (still very small in the grand
scale of things).

I am simply seeking to extend the picture (coherently) without
unnecessary RDF specificity.

OData, GData, Core Data, are all EAV model based, in the very worst case
they make RDF based Linked Data easier to generate, thus a win-win.
Sadly, that isn't how these other EAV based approaches are perceived,
the gut instinct is to pick them apart as not conforming to the RDF
based Linked Data principles (btw -- when TimBL added RDF and SPARQL to
the meme, he basically put a crack in the pot IMHO).
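
A small Python sketch of the "easier to generate" point, under loud assumptions: the record below is a generic key/value dictionary, not the actual wire format of OData, GData or Core Data, and every example.org identifier, the helper names, and the "strings starting with http:// or https:// are Identifiers" rule are hypothetical. Once data is already entity/attribute/value shaped, emitting one N-Triples line per attribute is mostly mechanical.

```python
def escape_literal(text):
    """Escape the characters that must not appear raw in an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(entity_id, record, attribute_base):
    """Turn one key/value record into N-Triples lines, one per attribute.

    Heuristic (an assumption of this sketch): string values that start with
    http:// or https:// are treated as Identifiers, everything else becomes
    an untyped literal.
    """
    lines = []
    for key, value in record.items():
        attribute = attribute_base + key
        if isinstance(value, str) and value.startswith(("http://", "https://")):
            obj = f"<{value}>"
        else:
            obj = f'"{escape_literal(str(value))}"'
        lines.append(f"<{entity_id}> <{attribute}> {obj} .")
    return lines


# Hypothetical record, as it might come out of any EAV-style source.
record = {
    "title": "Data 3.0",
    "author": "http://example.org/people/kingsley#this",
}
lines = record_to_ntriples(
    "http://example.org/posts/1624", record, "http://example.org/schema/"
)
```

A real mapping would also need datatypes, language tags and vetted attribute identifiers; the sketch only shows that the shape of the data is already compatible.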

Perhaps you can suggest a pattern within current RDF practice that
more strongly enforces EAV principles?

RDF is all fine re. EAV.
It's about getting other communities (e.g. Web 2.0) to adopt and exploit
EAV via use of de-referencable Identifiers (Names).

What I am hoping is that we just tweak how we introduce Linked Data,
establish the fact that we have a common data model at the base, a model
that we already use (in our heads) and across almost every application
we've worked with to date. Then show how Linked Data is ultimately about
deconstructing application data silos so that we have a much richer
corpus of structured data, across a myriad of boundaries (application,
OS, network etc.), that is amenable to "data meshing" rather than "data
mashing".

IMHO. The Linked Data Value Proposition and Elevator Pitch is simply
this: Individual and/or Enterprise Agility via Data Silo Deconstruction.

Kingsley

John

On Sat, Apr 17, 2010 at 12:37 PM, Kingsley Idehen
<kide...@openlinksw.com> wrote:

Richard Cyganiak wrote:

Hi Kingsley,

Regarding your blog post at

http://www.openlinksw.com/dataspace/kide...@openlinksw.com/weblog/kide...@openlinksw.com%27s%20blog%20%5b127%5d/1624


Great job -- I like it a lot, it's not as fuzzy as Tim's four principles,
not as mired in detail as most of the concrete literature around linked
data, and on the right level of abstraction to explain why we need to do
certain things in linked data in a certain way. It's also great for
comparing the strengths and weaknesses of different data exchange stacks.

Thanks, happy it's resonating.

RDF has inadvertently caused mass distraction away from the fact that a
common Data Model is the key to meshing heterogeneous data sources. People
just don't "buy" or "grok" the data model aspect of RDF, so why continue
fighting this battle, when all we want is mass comprehension, however we
get there.

A few comments:

1. I'd like to see mentioned that identifiers should have global scope.

Yes, will add that emphasis for sure. I guess "Network" might not
necessarily emphasize that strongly enough.

2. I'd prefer a list of the parts of a 3-tuple that reads:

    - an Identifier that names an Entity
    - an Identifier that names an Attribute
    - an Attribute Value, which may be an Identifier or a Literal (typed
      or untyped).

  This avoids using the new terms “Entity Identifier” and “Attribute
Identifier”.

No problem.
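
The three-part list above can be sketched as plain Python types. The class and field names are illustrative choices for this sketch, not terms from the manifesto, and the example.org identifiers are made up; the XML Schema integer datatype IRI is the only real-world identifier used.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Identifier:
    """A globally scoped name, e.g. an HTTP scheme identifier."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A literal value; `datatype` is None for an untyped literal."""
    lexical: str
    datatype: Optional[Identifier] = None


@dataclass(frozen=True)
class Triple:
    entity: Identifier                  # an Identifier that names an Entity
    attribute: Identifier               # an Identifier that names an Attribute
    value: Union[Identifier, Literal]   # an Identifier or a Literal


XSD_INTEGER = Identifier("http://www.w3.org/2001/XMLSchema#integer")
post = Identifier("http://example.org/posts/1624")  # hypothetical entity

examples = [
    # Value is another Identifier.
    Triple(post, Identifier("http://example.org/schema/author"),
           Identifier("http://example.org/people/kingsley#this")),
    # Value is an untyped Literal.
    Triple(post, Identifier("http://example.org/schema/title"),
           Literal("Data 3.0")),
    # Value is a typed Literal.
    Triple(post, Identifier("http://example.org/schema/commentCount"),
           Literal("7", XSD_INTEGER)),
]
```

Only the value slot admits a Literal; the entity and attribute slots always hold Identifiers, which is what lets any part of a description be referenced from elsewhere.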

3. “Structured Descriptions SHOULD be borne by Descriptor Resources” -- I
think this one is incomprehensible, because “to bear” is such an unusual
verb and has no clear connotations in technical circles. I'd encourage a
different phrasing.

Will think about that, getting the right phrase here is challenging, so I
am naturally open to suggestions etc.

3b. Any chance of talking you into using “Descriptor Document” rather than
“Descriptor Resource”?

No problem, "Descriptor Document" it is :-)

4. One thing that's left unclear: Can a Descriptor Resource carry multiple
Structured Entity Descriptions or just a single one?

Descriptor Documents are compound in that they can describe a single Entity
or a Collection.

5. Putting each term in quotes when first introduced is a good idea and
helps -- you did it for the first few terms but then stopped.

Writer's exhaustion I guess, will fix.

6. I'm tempted to add somewhere, “Descriptor Resources are Entities
themselves.” But this might be a purposeful omission on your part?

Yes, this is deliberate because I am trying to say: "Referent" is the
"Thing" you describe by giving it a "Name", so anything can be a "Referent",
including a "Document" (which has always been problematic in general RDF
realm work, e.g. the failure to make links between a ".rdf" Descriptor
Document and the actual "Entity Descriptions" it contains via
"primarytopic", "describedby", and other relations).

7. The last point talks about a “Structured Representation” of the
Referent's Structured Description. The term hasn't been introduced.
Shouldn't this just read “Descriptor Resource carrying the Referent's
Structured Description”?

Yes, so basically this is: s/bear/carry/g  :-)

What's your preferred name for the entire thing? I'm tempted to call it
“Kingsley's networked EAV model” or something like that. Do you insist on
“Data 3.0”?

Well EAV is old, and one of my real inspirations for hammering home its
relevance to Linked Data is the fact that over the years I've spoken with
too many people that grok EAV but never connected it to the Semantic Web
Project, or the more recent Linked Data meme.

Imagine talking to founders of companies like Ingres, Informix, MySQL etc.,
and witnessing them not making the EAV model connection; especially when you
can't actually write a DBMS engine without comprehension of EAV,
Identifiers, and Data representation (simple or complex data structure). How
ironic!

Best,
Richard

Thanks for the great feedback, I think we're getting closer to the global
epiphany we all seek !!

--

Regards,

Kingsley Idehen       President & CEO OpenLink Software     Web:
http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen



