On 4/7/11 3:17 PM, Patrick Logan wrote:
Note: my response is only going to public-lod because I wanted to
choose just one, I subscribe to it, and this is where "data quality"
takes on new, well, qualities, compared with its more typical application
within a single enterprise.

Okay, but note that the DBpedia and Semantic Web mailing lists were initially on the cc list because of their relevance and the splintered nature of subscriptions. Anyway, let's see how it goes confining this to LOD :-)

On Thu, Apr 7, 2011 at 11:06 AM, Kingsley Idehen <kide...@openlinksw.com> wrote:
Personally, I subscribe to the doctrine that "data quality" is like "beauty":
it lies strictly in the eyes of the beholder, i.e., it is a function of said
beholder's "context lenses".
First and foremost, the "eyes of the beholder" have to set different
expectations for public LOD than they would for something like
"enterprise LOD".

I do see Linked Data and Linked Open Data as quite distinct. The former is about whole-data representation using links, where content creation is constrained by a conceptual schema grounded in first-order logic. Linked Open Data is the application of the former to publicly available structured data, e.g., via the World Wide Web. That said, in either realm, quality remains constrained by the fluidity of the beholder's context.

If I had the time to illustrate "context fluidity", I would create an animation showing individual context halos (around people's heads, like Biblical depictions) and the fluidity inherent in their time-variant states of mind, i.e., a real sense of "current status" or "current state of mind", beyond what you see today in Twitter- and OStatus-oriented apps.

Ideally, each source of data would publish something about its DQ
goals and current status, so consumers have an idea of what to expect and
where improvements may be heading.

Via the power of Linked Data graphs, each person can filter data so that they see what is individually important to them; then, in the course of information exchange, differences are ironed out. This is basically a reconciliation process driven by specific pragmatic goals. In a sense, most enterprises sorta try to work this way, but their IT infrastructure fails them woefully. Therein lies one of many massive opportunities for Linked Data across intranets and the InterWeb.

As a community, public LOD providers and consumers have to discuss the
quality of these various sources and the implications for things such
as "same as" and "counting".

Yes, discussion is good, but cognitive beings are wired (I believe) with the ability to observe aspects of the same Subject differently. Thus, keeping all the layers loose is paramount. I can never really dictate to you what constitutes quality data; all we can do is attempt to reconcile individual observations of common Subjects. In some cases we'll agree and in others we won't. The beauty of the Web (to me) is that its architecture ultimately allows everyone to "agree to disagree" without going to war. For example, without 404s the Web would have been yet another failed global network attempt :-)

A foundation for that is a specification of the public data, of
how to specify aspects of quality, and of how to *publish* that DQ
information in a consumable way... make DQ statements part of the
public LOD.

I think the Web will allow user agents to coalesce around data spaces that offer value. Others will simply wither away over time. No set of draconian rules will avert this reality, because said reality is wired into the fabric of scale-free networks such as the Web.
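To make the idea of publishing DQ statements as data a bit more concrete, here is a minimal sketch under stated assumptions. DQ metadata is modeled as plain (subject, predicate, object) triples, and the `ex:` vocabulary, dataset names, and scores are all hypothetical, not taken from any real LOD source or standard vocabulary. Each consumer (user agent) then applies its own "context lens", reduced here to a single completeness threshold, to the same published statements.

```python
# Hypothetical DQ statements published as (subject, predicate, object) triples.
# The ex: vocabulary and the values are invented for illustration only.
DQ_TRIPLES = [
    ("ex:DatasetA", "ex:completeness", 0.92),
    ("ex:DatasetA", "ex:lastReviewed", "2011-03-30"),
    ("ex:DatasetB", "ex:completeness", 0.61),
    ("ex:DatasetC", "ex:completeness", 0.80),
]


def dq_value(triples, dataset, predicate):
    """Return the object of the first triple matching (dataset, predicate), or None."""
    for s, p, o in triples:
        if s == dataset and p == predicate:
            return o
    return None


def acceptable_sources(triples, min_completeness):
    """Apply one consumer's 'context lens': keep datasets whose published
    ex:completeness value meets that consumer's own threshold."""
    datasets = sorted({s for s, _, _ in triples})
    result = []
    for ds in datasets:
        value = dq_value(triples, ds, "ex:completeness")
        if value is not None and value >= min_completeness:
            result.append(ds)
    return result


# Two consumers read the same published statements with different expectations.
print(acceptable_sources(DQ_TRIPLES, 0.9))   # ['ex:DatasetA']
print(acceptable_sources(DQ_TRIPLES, 0.75))  # ['ex:DatasetA', 'ex:DatasetC']
```

The point of the sketch is that the publisher states DQ facts once, while the judgment of "good enough" stays with each consumer; a real deployment would use an RDF toolkit and an agreed vocabulary rather than Python tuples.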


I believe Data Wikis will go a long way toward crowdsourcing data reconciliation. Of course, for that to happen you need access control lists (ACLs) and verifiable identity, which is why the WebID protocol (an application of Linked Data) is so important to this whole topic of subjective data quality.

If the logic is already making its way into the data, why not make conversations about data reconciliation part of the data too? Wikipedia sorta works, but Data Wikis will take this matter to much greater heights. We'll never be able to compute "Why" from "Who", "What", "Where", and "When" data with 100% precision, but adding reconciliatory conversations into the data via Data Wikis will get us much closer than we are today.

Kingsley
-Patrick




--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen





