On 4/7/11 3:17 PM, Patrick Logan wrote:
Note: my response is only going to public-lod because I wanted to
choose just one, I subscribe to it, and this is where "data quality"
takes on new, well, qualities, from its more typical application
within a single enterprise.
Okay, but note that the DBpedia and Semantic Web mailing lists were initially
on the cc list due to relevance and the splintered nature of subscriptions.
Anyway, let's see how it goes confining this to LOD :-)
On Thu, Apr 7, 2011 at 11:06 AM, Kingsley Idehen <kide...@openlinksw.com> wrote:
Personally, I subscribe to the doctrine that "data quality" is like "beauty":
it lies strictly in the eyes of the beholder, i.e., it is a function of said
beholder's "context lenses".
First and foremost, the "eyes of the beholder" have to set different
expectations for public LOD than they would for something like
"enterprise LOD".
I do see Linked Data and Linked Open Data as quite distinct. The former
is about whole data representation using Links, where content creation
is constrained by a conceptual schema grounded in first-order logic.
Linked Open Data is about the application of the aforementioned to
publicly available structured data e.g., via the World Wide Web. That
said, in either realm, quality remains constrained by the context fluidity
of the beholder.
If I had the time to illustrate and animate "context fluidity" I would
have an animation that showcased individual context halos (around
people's heads, as in Biblical depictions) and the fluidity inherent in
their time-variant states of mind, i.e., a real sense of "current status"
or "current state of mind" beyond what you see today re. Twitter and
OStatus oriented apps.
Ideally each source of data would publish something about their DQ
goals and current status, so consumers have an idea what to expect and
where improvements may be heading.
Via the power of Linked Data graphs, each person can filter data in such
a way that everyone sees what's individually important to them, and then
in the course of information exchange, differences are ironed out:
basically a reconciliation process driven by specific pragmatic goals.
In a sense, most enterprises sorta try to work this way but the IT
infrastructure fails, woefully. Therein lies one of many massive
opportunities for Linked Data across Intranets and the InterWeb.
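
To make the filter-then-reconcile idea above a little more concrete, here
is a minimal Python sketch. Everything in it is an assumption made for
illustration: the `ex:` names, the made-up values, and the helpers
`apply_lens` and `reconcile` are hypothetical and do not come from any real
dataset or library. A beholder's "context lens" is modeled as nothing more
than the set of predicates they care about, and reconciliation as a report
of where sources agree and where they differ.

```python
from collections import defaultdict

# Hypothetical statements: (subject, predicate, object, source).
# All names and values are invented for illustration only.
STATEMENTS = [
    ("ex:Widget", "ex:weight", "10", "source-a"),
    ("ex:Widget", "ex:weight", "12", "source-b"),
    ("ex:Widget", "ex:color", "red", "source-a"),
    ("ex:Widget", "ex:color", "red", "source-b"),
    ("ex:Widget", "ex:maker", "ex:Acme", "source-b"),
]


def apply_lens(statements, predicates):
    """Keep only the statements whose predicate matters to this beholder."""
    return [s for s in statements if s[1] in predicates]


def reconcile(statements):
    """Group by (subject, predicate) and split into agreed and disputed."""
    values = defaultdict(set)
    for subject, predicate, obj, _source in statements:
        values[(subject, predicate)].add(obj)
    agreed = {k: next(iter(v)) for k, v in values.items() if len(v) == 1}
    disputed = {k: sorted(v) for k, v in values.items() if len(v) > 1}
    return agreed, disputed


# One beholder's lens: they care about weight and color, not the maker.
view = apply_lens(STATEMENTS, {"ex:weight", "ex:color"})
agreed, disputed = reconcile(view)
```

In this toy case the two sources agree on color and differ on weight. The
sketch only surfaces the difference; what to do about it is left to the
beholders, which is the point.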
As a community, public LOD providers and consumers have to discuss the
quality of these various sources and the implications for things such
as "same as" and "counting".
Yes, discussion is good, but cognitive beings are wired (I believe) with
the ability to observe aspects of the same Subject differently. Thus,
keeping all the layers loose is paramount. I can never really dictate to
you what constitutes quality data; all we can do is attempt to reconcile
individual observations of common Subjects. In some cases we'll agree and
sometimes we won't. The beauty of the Web (to me) is that its
architecture ultimately allows everyone to "agree to disagree" without
going to war. For example, without 404s the Web would have been yet
another failed global network attempt :-)
The foundations for that are the
specifications of the public data and how to specify aspects of
quality, and how to *publish* that DQ information in a consumable
way... make DQ statements part of the public LOD.
I think the Web will allow user agents to coalesce around data spaces that
offer value. Others will simply wither away over time. No set of
draconian rules will avert this reality because said reality is wired
into the fabric of scale-free networks such as the Web.
I believe Data Wikis will go a long way toward crowdsourcing data
reconciliation. Of course, for that to happen you need access control
lists (ACLs) and verifiable identity, which is why the WebID protocol
(an application of Linked Data) is so important to this whole topic of
subjective data quality.
If the logic is already making its way into the data, why not make
conversations about data reconciliation part of the data too? Wikipedia
sorta works, but Data Wikis will take this matter to much greater
heights. We'll never be able to compute "Why" from "Who", "What",
"Where", and "When" data with 100% precision. Adding reconciliatory
conversations into the data via Data Wikis will get us much closer than
we are today.
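
As a rough sketch of what "conversations as part of the data" might look
like, the Python below turns each comment about a disputed statement into
triples that point back at that statement. The `Note` type, the
`conversation_triples` helper, the `ex:` predicates, and the identifiers
are all hypothetical, chosen only for illustration; a real Data Wiki would
presumably use an established vocabulary and verifiable identities (e.g.
WebIDs) for the "who".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    who: str   # identity of the commenter (hypothetical identifier)
    when: str  # ISO 8601 timestamp
    what: str  # the comment itself


def conversation_triples(statement_id, notes):
    """Express each note as triples that point back at the statement."""
    triples = []
    for index, note in enumerate(notes, start=1):
        note_id = f"{statement_id}/note/{index}"
        triples.append((note_id, "ex:about", statement_id))
        triples.append((note_id, "ex:who", note.who))
        triples.append((note_id, "ex:when", note.when))
        triples.append((note_id, "ex:what", note.what))
    return triples


# Invented example conversation about an invented statement.
notes = [
    Note("ex:alice", "2011-04-07T15:00:00Z", "Source A predates the change."),
    Note("ex:bob", "2011-04-07T15:10:00Z", "Agreed; keep both values, dated."),
]
triples = conversation_triples("ex:stmt/42", notes)
```

Because the notes are just more triples, they can be queried, filtered, and
linked to like any other data, rather than living on a separate talk page.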
Kingsley
-Patrick
--
Regards,
Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen