Re: How To Deal with the Subjective Issue of Data Quality?

Kingsley Idehen Thu, 07 Apr 2011 14:21:23 -0700

On 4/7/11 3:17 PM, Patrick Logan wrote:

Note: my response is only going to public-lod because I wanted to
choose just one, I subscribe to it, and this is where "data quality"
takes on new, well, qualities, from its more typical application
within a single enterprise.

Okay, but note: the DBpedia and Semantic Web mailing lists were initially in the cc. list due to relevance and the splintered nature of subscriptions. Anyway, let's see how it goes confining this to LOD :-)

On Thu, Apr 7, 2011 at 11:06 AM, Kingsley Idehen <kide...@openlinksw.com> wrote:

Personally, I subscribe to the doctrine that "data quality" is like "beauty":
it lies strictly in the eyes of the beholder, i.e., it is a function of said
beholder's "context lenses".

First and foremost, the "eyes of the beholder" have to set different
expectations for public LOD than they would for something like
"enterprise LOD".

I do see Linked Data and Linked Open Data as quite distinct. The former is about whole data representation using Links, where content creation is constrained by a conceptual schema grounded in first-order logic. Linked Open Data is about the application of the aforementioned to publicly available structured data, e.g., via the World Wide Web. That said, in either realm, quality remains constrained by the context fluidity of the beholder.

If I had the time to illustrate and animate "context fluidity", I would have an animation that showcased individual context halos (around people's heads, like the Biblical depictions) and the fluidity inherent in their time-variant states of mind, i.e., a real sense of "current status" or "current state of mind" beyond what you see today re. Twitter and OStatus-oriented apps.

Ideally each source of data would publish something about their DQ
goals and current status, so consumers have an idea what to expect and
where improvements may be heading.

Via the power of Linked Data graphs, each person can filter data in such a way that everyone sees what's individually important to them, and then in the course of information exchange differences are ironed out: basically a reconciliation process driven by specific pragmatic goals. In a sense, most enterprises sorta try to work this way, but the IT infrastructure fails, woefully. Therein lies one of many massive opportunities for Linked Data across Intranets and the InterWeb.

As a community, public LOD providers and consumers have to discuss the
quality of these various sources and the implications for things such
as "same as" and "counting".

Yes, discussion is good, but cognitive beings are wired (I believe) with the ability to observe aspects of the same Subject differently. Thus, keeping all the layers loose is paramount. I can never really dictate to you what constitutes quality data; all we can do is attempt to reconcile individual observations of common Subjects. In some cases we'll agree and sometimes we won't. The beauty of the Web (to me) is that its architecture ultimately allows everyone to "agree to disagree" without going to war. For example, without 404s the Web would have been yet another failed global network attempt :-)

A foundation for that is the
specification of the public data, how to specify aspects of
quality, and how to *publish* that DQ information in a consumable
way... make DQ statements part of the public LOD.
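A minimal sketch, assuming DQ statements are published as simple RDF-style triples about a dataset, of what "making DQ statements part of the public LOD" and judging them through different consumers' "context lenses" might look like. The `http://example.org/dq#` namespace, the property names, the dataset URI, the values, and the helper functions are all hypothetical, and the serializer is a simplified N-Triples-style writer, not a full RDF library.

```python
# Minimal sketch: a publisher attaches data-quality (DQ) statements to a
# dataset as RDF-style triples; consumers judge them through their own
# "context lens". All URIs, property names, and values are hypothetical.

DQ = "http://example.org/dq#"                  # hypothetical DQ vocabulary
DATASET = "http://example.org/dataset/people"  # hypothetical dataset URI
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"

# Hypothetical DQ statements as (subject, predicate, object) triples.
dq_statements = [
    (DATASET, DQ + "completeness", 0.85),
    (DATASET, DQ + "goal", "completeness >= 0.95"),
    (DATASET, DQ + "lastReviewed", "2011-04-01"),
]


def literal(value):
    """Render a Python value as a simplified N-Triples literal."""
    if isinstance(value, float):
        return f'"{value}"^^<{XSD_DOUBLE}>'
    # Escape backslashes first, then double quotes.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize triples one per line, N-Triples style."""
    return "\n".join(f"<{s}> <{p}> {literal(o)} ." for s, p, o in triples)


def meets_expectations(triples, subject, min_completeness):
    """One consumer's 'context lens': the same published statement can pass
    for one consumer and fail for another, depending on the threshold."""
    for s, p, o in triples:
        if s == subject and p == DQ + "completeness":
            return o >= min_completeness
    return False  # no completeness statement published: this lens rejects it


if __name__ == "__main__":
    print(to_ntriples(dq_statements))
    # Two beholders, one dataset, two different verdicts.
    print(meets_expectations(dq_statements, DATASET, 0.80))  # casual consumer
    print(meets_expectations(dq_statements, DATASET, 0.90))  # stricter consumer
```

Run as a script, it prints the three triples followed by `True` and `False`: the published statement is identical for both consumers; only the threshold each one brings differs.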

I think the Web will allow user agents to coalesce around data spaces that offer value. Others will simply wither away over time. No set of draconian rules will avert this reality, because said reality is wired into the fabric of scale-free networks such as the Web.

I believe Data Wikis will go a long way toward crowdsourcing data reconciliation. Of course, for that to happen you need access control lists (ACLs) and verifiable identity, which is why the WebID protocol (an application of Linked Data) is so important to this whole topic of subjective data quality.

If the logic is already making its way into the data, why not make conversations about data reconciliation part of the data too? Wikipedia sorta works, but Data Wikis will take this matter to much greater heights. We'll never be able to compute "Why" from "Who", "What", "Where", and "When" data with 100% precision. Adding reconciliatory conversations into the data via Data Wikis will get us much closer than we are today.


Kingsley

-Patrick



--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
