Re: Next version of the LOD cloud diagram. Please provide input, so that your dataset is included.

Kingsley Idehen Sun, 05 Sep 2010 09:21:29 -0700

 On 9/5/10 11:00 AM, Alan Ruttenberg wrote:

On Sun, Sep 5, 2010 at 5:08 AM, Chris Bizer <ch...@bizer.de> wrote:

Hi Alan,

I have just spent some time evaluating one source and reported to you
the result. Perhaps you might act on this investment in time and thank
me for doing so. You might find that the result was myself and more
people doing such quality control.

Sorry that my reply yesterday might have been a bit too harsh.

I have looked up the CAS license (http://www.cas.org/legal/infopolicy.html)
and added a reference to the description of the CAS dataset at

http://ckan.net/package/bio2rdf-cas

Please also note that CKAN provides a rating function for the datasets and
also provides for commenting and discussing the datasets.

Maybe people could use these features as a start to collect quality-related
meta-information about the datasets.

CKAN also provides a link to the http://www.isitopendata.org/ service, which
might be used for license inquiries.

Dear Chris,

As I said, the first line on the CKAN home page says: "CKAN is a
registry of open data and content packages.". Therefore I think there
is a reasonable expectation that the packages registered there are
open. I maintain that CKAN should either change how it explains itself
to make clear that it is a registry of packages that may or may not be
open, or it should remove the packages that are not known to be open.
I'm not taking a position one way or another which they should do
(that's their business), but they should say what they do, and do what
they say.

Thank you for your pointers to further information on how to find
licenses. I'm fairly familiar with this area given that I work for
Creative Commons.


Chris,

The critical point here is that CKAN should simply make the correction Alan is suggesting. As you know, we don't need misleading headlines in the LOD realm; they ultimately cause problems.

Anyway, this is maybe more of a CKAN issue, so I am hoping that Jonathan is reading this thread and takes this as a cue to fix the title, that's all. Basically, this is about publicly available structured data that may or may not be "Open". Basically, making something available to the public still doesn't imply that it's actually "Open" etc.



I think we can fix this little issue.

Kingsley

I agree with you that the quality of Linked Data published on the Web is
crucial, but we also have to take into account that much of the data in the
LOD cloud is currently still published by research projects in order to
demonstrate the technologies.

As the Web of Data is evolving and more and more actual owners of the
datasets start to provide them as Linked Data, I hope that the quality will
also increase and the datasets will be kept current. Encouraging
developments in this direction currently happen in the libraries,
eGovernment, and eCommerce domains.

I agree that these are good examples. I would suggest that you focus
on including the good examples in the LOD cloud, or at a minimum
remove those, like CAS, that fall below the minimal standard of
supplying *some* data and being *open*, so that "linked open data"
means something coherent.

On the other hand, the Web is an open system and we will thus always see
people publishing low-quality, wrong and misleading data. Google handles
this fact rather successfully using PageRank. As the Web of Data provides
more structure than the classic Web, I think we might even be able to apply
more sophisticated data-quality assessment heuristics to decide which data
we want to use in our applications and which to ignore. Some of these
methods are listed in [1].

Look, Chris, I just did a "manual page rank" on the CAS dataset. It is
meaningless.  This is a high quality assessment. If the movement can't
act on known good quality information I (and others) will doubt that
automatic algorithms will be credible.

Moreover, the LOD cloud diagram is an advertisement. There are enough
data sets now that inclusion in the diagram can become a reward for
good work. It's not good advertising for Google when junk sites come
up at the top of search results and they do their best to minimize
this occurrence. The LOD cloud is your front page, and to a certain
extent mine as well as I invest all my time in doing work towards
building the web of data in the Sciences.

Regards,
Alan

Best,

Chris

[1] Christian Bizer, Richard Cyganiak: Quality-driven information filtering
using the WIQA policy framework. Journal of Web Semantics: Science, Services
and Agents on the World Wide Web, Volume 7, Issue 1, January 2009, Pages
1-10.
http://dx.doi.org/10.1016/j.websem.2008.02.005


-----Original Message-----
From: Alan Ruttenberg [mailto:alanruttenb...@gmail.com]
Sent: Saturday, 4 September 2010 22:20
To: Chris Bizer
Cc: Anja Jentzsch; public-lod@w3.org; Leigh Dodds; Jonathan Gray
Subject: Re: Next version of the LOD cloud diagram. Please provide input, so
that your dataset is included.

On Sat, Sep 4, 2010 at 3:43 PM, Chris Bizer <ch...@bizer.de> wrote:

So rather than to criticize the work that other people do on collecting
meta-information about the datasets in the LOD cloud

Did you read what I wrote? I made no comment on the adequacy of
metainformation. In fact I *used* that metainformation to point out
that the data source in question did not satisfy the "open" provision
of linked *open* data. In addition I criticized the *inclusion* of the
data set in the *lod cloud diagram* because of this lack of openness
and because the actual content of that resource didn't resemble any
data in the resource that it was derived from (a registry of
information about chemical compounds), suggesting that it would hurt
the LOD effort as inclusion would be a kind of "false advertising".

-Alan



--

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
