Re: [agenda] eGov IG Call, 25 Nov 2009, item 6

Chris Beer Wed, 25 Nov 2009 06:03:41 -0800

I think Thomas makes some excellent points.

Is it possible as a group to agree on something akin to the following?

1) Open Data refers to how data is accessed and is primarily a political/policy consideration.

2) Linked (Open) Data refers to how data is structured and delivered and is primarily a technological/standards consideration.

3) The majority of datasets, LOD or not, that are of real value are developed, maintained and delivered by Government, like it or not. We know this without even looking at the LOD Project's work <http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData#head-277d7f68544ce1a9e252f5c0080b6402cd983a49> (which, interestingly, contains very little Government data, which is a worry as it possibly indicates that Governments just AREN'T getting on board with early take-up of LOD, despite the various legal requirements coming out worldwide).

4) We accept that Linked (Open) Data is the purview of the Linking Open Data W3C Project - there is probably little we can add to the discussion here apart from supporting them in their own work of IDing datasets that can be linked.

In support of this point, e-Government will be like any other entity in this regard, and the methodologies for delivering LOD are unlikely to differ from those of the rest of the world or society, much as there is little difference in Web Content Delivery between Government models and Commercial/Public models. In that sense I agree with Thomas 100% when it comes to a technology model. It will be Semantic, and RDF is likely to become the dominant paradigm, if not the only one.

5) Open Data therefore is what we SHOULD be focused on - not in the sense of forcing a standard on Gov in terms of Open Data Delivery policy, but in Education and Outreach.

The question of non-RDF data consumers is almost moot. Given the timescales we are operating on, it is akin to asking at the start of the first version of HTML "how does hyperlinked content support .txt-based users such as BBS systems". Non-semantic, non-RDF, pre-HTML 5 browsers and technologies will be legacy before we know it, probably while we are still discussing all this. I mean it.

This leaves us with two outcomes. The first is that the current user base that Thomas identifies as professional RDF consumers will inevitably drive the conversion of their suppliers' data into RDF/XML formats, essentially as a snowball effect. GIS Data is a good example of where this is already happening.

The second is that, as Thomas says, human-readable formats HAVE to be provided - ultimately the user is human, and the transition on the tech side between how the machine reads it and how it is displayed to the user in a usable, displayable form should be seamless. Ultimately the user should not even realise that they are doing anything but looking at a web page of results that they have asked a server for.

This is where I do disagree with Thomas. A Federation of providers is a nice concept, but it is too far off to think about, and will be inevitable in the end so probably doesn't need to be focused on. I believe that the key to overcoming the mistrust issue is three-fold:

a) Focusing on educating Governments on WOG methodologies in adopting inter-agency delivery on a National level - i.e. promote the creation of the data.gov.* model. The international model is far too scary a prospect for most Governments to contemplate.

b) Educating Government on the ROI in making Data open to the public

c) Educating Government in ways in which "clearly marked-off data spaces with a trusted provenance" can still mean open data delivery for all - essentially this already happens whenever data is published, even in an HTML/PDF format - having data in the public domain does not mean giving access to the original uncorrupted dataset.


Just some thoughts.

Cheers

Chris

Thomas Bandholtz wrote:

There has been much discussion about *Open* Data in the eGov list these
days, which is a rather political question.
I am currently not so much concerned about openness, more about *Linked*
Data, as we have tons of government data with a legal obligation to make
them available to the public (at least in Europe, and especially
environmental data), and we are looking for means to do so in the most
efficient way.

So, among the six items of today's agenda, I find number 6 the most
challenging:

6. Discussion: Government Linked Data, Techniques and Technologies
[35min]

some considerations:

+ how does linked data support (non-RDF) data consumers?

First of all: Linked Data supports RDF data consumers.

Human readable formats should also be provided based on content
negotiation. Some providers have dedicated HTML formats, others have
not. Those who haven't depend on some available, general purpose "linked
data browser".
The latest discussion about the state of such tools has been started by
http://lists.w3.org/Archives/Public/public-lod/2009Oct/0105.html, and I
am afraid the state-of-the-art of such browsers cannot compete with a
well-made dedicated HTML page (how could it).
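
As a rough illustration of the content negotiation mentioned above, the
sketch below picks a representation from an HTTP Accept header. It is a
simplified assumption of how a provider might do it, not any particular
server's implementation: the function names, the AVAILABLE list and the
HTML fallback are hypothetical, and partial wildcards such as text/* are
ignored.

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


# Hypothetical representations offered for one resource
AVAILABLE = ["application/rdf+xml", "text/html"]


def choose(header, available=AVAILABLE, default="text/html"):
    """Pick the representation to serve; fall back to HTML (an assumption)."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return default
    return default
```

An RDF client sending "application/rdf+xml" would get RDF/XML, while a
typical browser header that lists text/html first would get the HTML
page. A stricter server would answer 406 instead of falling back.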

So one might say linked data supports non-RDF data consumers rather
badly, but there are two objections:

    * even non-RDF data consumers benefit from the availability of some
      linked data which would not be available in the Web at all if not
      generated with D2R (or similar)
    * even non-RDF data consumers benefit from the extensive and
      systematic linkage provided by Linked Data which is rather unusual
      for common HTML pages.

I think the value of this question is somehow disputable, as - aside
from any content negotiation - linked data supports RDF consumers at
first. These consumers are mostly professionals who depend on government
data in order to do their work. So I would rather ask:

"How do professional RDF data consumers integrate linked data into their
working data bases today?"

+ strategies for modelling government data

Well, I would say, the basic model is RDF in this case ;-).
We are wasting too much time with efforts on "harmonising" models in a
waterfall manner (see http://inspire.jrc.ec.europa.eu/, for example)
instead of just publishing it somehow.

One of TBL's Do's and Don'ts reads:
"Do NOT wait until you have a complete schema or ontology to publish data. "
http://www.w3.org/DesignIssues/GovData

I do not see any problem about schema diversity. However, we should make
use of existing schemas which have proved to work well. For example, the
OGC Observation and Measurement XML schema:
http://www.opengeospatial.org/standards/om

OM is expressed as an XML schema, not in RDF so far. But it expresses
perfectly clarified semantics about any kind of measurement data of
whatever kind of sensor, including timelines. XSD and URN patterns are
some drawbacks of this formalisation, but this could easily be resolved
by an RDF reformulation of the same semantics.

The most important aspect again is linkage. When expressing what or
where has been measured, don't use a dumb character string, but link to
a reference vocabulary.
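
To make the difference concrete, here is a small sketch that writes the
same statement twice as N-Triples: once with a plain string, once with a
link to a vocabulary term. All URIs are hypothetical example.org
placeholders, not terms from OM or any real vocabulary, and the helper
names are made up for this illustration.

```python
def literal_triple(subject, predicate, text):
    """N-Triples line whose object is a plain string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{predicate}> "{escaped}" .'


def link_triple(subject, predicate, obj):
    """N-Triples line whose object is a URI that others can dereference."""
    return f"<{subject}> <{predicate}> <{obj}> ."


# Hypothetical identifiers, for illustration only
OBS = "http://example.org/observation/42"
PROP = "http://example.org/def/observedProperty"
VOCAB_TERM = "http://example.org/vocab/substance/ozone"

dumb = literal_triple(OBS, PROP, "Ozone")     # just a character string
linked = link_triple(OBS, PROP, VOCAB_TERM)   # points into a vocabulary
```

Only the second form lets a consumer follow the object to labels,
translations or broader terms published by the vocabulary's maintainer.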

+ essential metadata for Government Linked Open Data (eg VoiD)

VoiD is a good start. I wouldn't overestimate the need for metadata as
long as you can access the data itself. Metadata was a great thing in
former times when data access was a complex issue, so you would like to
know what you will get before starting the effort to get access to it.
If the data itself is linked to reference vocabularies extensively, the
data vs. metadata discussion ends in smoke.

+ expressing rights and licensing information

VoiD can do this.
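
A minimal sketch of what that could look like, assuming the dataset is
described as a void:Dataset and the licence is attached with
dcterms:license. The dataset URI and title are hypothetical, the licence
URI is only an example, and the helper does no escaping of the title.

```python
def void_description(dataset_uri, title, license_uri):
    """Build a small Turtle description of a dataset and its licence.

    Assumes the title contains no double quotes or backslashes.
    """
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f"    dcterms:license <{license_uri}> .",
    ])


# Hypothetical dataset; the licence is just an illustration
ttl = void_description(
    "http://example.org/dataset/air-quality",
    "Air quality measurements",
    "http://creativecommons.org/licenses/by/3.0/",
)
print(ttl)
```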

+ approaches to provenance, authority and trust

Government generally is not so amused about the open world assumption,
they prefer clearly marked-off data spaces with a trusted provenance.

I think mistrust can be overcome by federation of providers. Federated
agencies can easily state that they trust each provider in this
federation. Just set up a domain for such a federation, link to this
federation from the data, and to the data from the federation.

No problem if anybody is publishing her own possibly weird statements
about the same things as long as the federation does not link to this data.
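
The two-way linking described above can be sketched as a simple check:
a dataset counts as endorsed only if it links to the federation and the
federation links back to it. The URIs and the in-memory link tables are
hypothetical; in practice the links would be found by crawling the
published data.

```python
# Hypothetical federation domain and the datasets it links to
FEDERATION = "http://federation.example.org/"
federation_links = {
    "http://agency-a.example.org/data",
    "http://agency-b.example.org/data",
}

# Hypothetical outgoing links found in each dataset
dataset_links = {
    "http://agency-a.example.org/data": {FEDERATION},
    "http://agency-b.example.org/data": set(),
    "http://weird.example.org/data": {FEDERATION},
}


def is_endorsed(dataset):
    """True only if dataset and federation link to each other."""
    links_out = FEDERATION in dataset_links.get(dataset, set())
    linked_back = dataset in federation_links
    return links_out and linked_back
```

Here agency-a is endorsed, agency-b is not because it never links to the
federation, and the "weird" dataset is not because claiming membership
is not enough without a link back from the federation.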

One rather developed case of such a sub-cloud is Linking Open Drug Data
(LODD).
see http://esw.w3.org/topic/HCLSIG/LODD
We might learn from them.

+ using RDF for Statistical Data

Parts of EUROSTAT have been published in SCOVO
http://sw.joanneum.at/scovo/schema.html.
Even SDMX is apparently moving towards SCOVO.
Does anyone see an alternative approach?



Looking forward to discussion this afternoon (well, in my time).

Thomas

(consulting the Federal Environment Agency in Germany)
