Re: [agenda] eGov IG Call, 25 Nov 2009, item 6

2009-11-26 Thread Kingsley Idehen

Chris Beer wrote:

Thanks for the reply Kingsley.

/Kingsley Idehen wrote:/

/Chris Beer wrote:
/

/I think Thomas makes some excellent points.

Is it possible as a group to agree on something akin to the following?

1) Open Data refers to how data is accessed and is primarily a 
political/policy consideration

/
/Structured Data based on industry standard data representation 
formats. Just as UNIX came down to POSIX. Ditto Internet re. TCP/IP. 
Openness is about Standards, and has nothing to do with politics or 
philosophy.


You can institute policies that mandate the use of industry standard 
data formats re. data placed in the public domain or simply published 
for reuse by others.

/
/2) Linked (Open) Data refers to how data is structured and 
delivered and is primarily a technological/standards consideration

/
/To be precise: HTTP based Linked Open Data.  This is about the 
incorporation of HTTP scheme Identifiers into data that has been 
published using a standard data representation format.


Note: to get data into any standard data representation format there 
has to be a formal data model. At the most basic, said model takes 
the form: Entity-Attribute-Value. In the case of Linked Open Data, 
you have the intersection of the following:


1. EAV model
2. Standard Data Formats
3. HTTP scheme Identifiers (HTTP URIs)./
What I was in fact suggesting here is that we clearly define the 
difference between Data being Open as in access and policy surrounding 
it - the political/philosophical side of the coin, and Data being Open 
as in Standards and, as you so better put it - structured data - the 
technical side of the coin. The semantics surrounding the two are 
important - to date we have basically said in e-Gov IG "Let's make Open 
(Standard) Data Open (to the Public)" - anyone coming in with no 
background knowledge - potentially such as those working in policy 
from a non-IT background, who are covered by the initial Working 
Draft's "/To: Any government wishing to 
set-up data.gov.*" /(wiki version), is simply going to start to find 
it confusing. We have discussions on defining open data that are 
centered around the access/policy question, and we have discussions on 
Linked Open Gov Data that are veering into the technical. For the good 
of all involved, I feel we need to define, set, and stick to some 
basic terminology that doesn't confuse the two.

Amen!
/3) The majority of datasets, LOD or not, that are of real value, 
are developed, maintained and delivered by Government, like it or 
not. We know this without even looking at the LOD Project's work 
(which interestingly, contains very little Government data, which is 
a worry as it possibly indicates that Governments just AREN'T 
getting on board with early take up of LOD, despite the various 
legal requirements coming out world wide).

/
/How have you arrived at the above bearing in mind the pivotal role 
of DBpedia?  Basically, this is about a  Linked Open Data Space 
derived from Wikipedia snapshots which have little or no Govt. data. 
Of course, things get much better across depth, quality, and linked 
density dimensions when Govt. data is cross linked with LOD spaces 
like DBpedia etc./
Quite simply - It is Government that conducts the majority of hard 
statistical research and collates data. DBpedia, or indeed any other 
commercial enterprise, including Academia, does not equal the sum 
total of Linked Open Data. They do indeed provide a pivotal and 
valuable service - but only in the sense that Google does with 
searches. They are a "reseller" of Data in that sense - but Government 
is, and will remain for a long time, the primary producer of raw datasets.
What we really have to invest some time in, re. general communications, 
is genuine appreciation of the fact that HTTP based Linked Data, HTTP based 
Linked Open Data, or anything else associated with the EAV/CR with HTTP 
URIs model, comes down to embracing and extending time-tested best 
practices from the pre-Web DBMS realm. Basically, ignoring the DBMS 
realm (covertly or overtly) simply puts this whole endeavor in the: "Pot 
is Broken, so you now own it", bucket.


Master-Details associations between tables in a DBMS are one of the most 
basic data interaction patterns; they typically underlie the construction 
of user oriented VIEWs and data entry forms. This very pattern 
reemerges in the Linked Data realm, but the roles of Master and Details 
become relative due to the inherent contextual fluidity that pervades HTTP 
based Linked Data openness. Thus, in the eyes of the Government, 
publishing raw data in structured form is akin to publishing a Master 
lookup table; and from this perspective DBpedia is an associated 
Details oriented Table. Likewise, in the eyes of DBpedia, the inverse is 
the case i.e., it sees Govt Data (or any other 

Re: [agenda] eGov IG Call, 25 Nov 2009, item 6

2009-11-26 Thread Chris Beer

Thanks for the reply Kingsley.

/Kingsley Idehen wrote:/

/Chris Beer wrote:
/

/I think Thomas makes some excellent points.

Is it possible as a group to agree on something akin to the following?

1) Open Data refers to how data is accessed and is primarily a 
political/policy consideration

/
/Structured Data based on industry standard data representation 
formats. Just as UNIX came down to POSIX. Ditto Internet re. TCP/IP. 
Openness is about Standards, and has nothing to do with politics or 
philosophy.


You can institute policies that mandate the use of industry standard 
data formats re. data placed in the public domain or simply published 
for reuse by others.

/
/2) Linked (Open) Data refers to how data is structured and delivered 
and is primarily a technological/standards consideration

/
/To be precise: HTTP based Linked Open Data.  This is about the 
incorporation of HTTP scheme Identifiers into data that has been 
published using a standard data representation format.


Note: to get data into any standard data representation format there 
has to be a formal data model. At the most basic, said model takes the 
form: Entity-Attribute-Value. In the case of Linked Open Data, you 
have the intersection of the following:


1. EAV model
2. Standard Data Formats
3. HTTP scheme Identifiers (HTTP URIs)./
What I was in fact suggesting here is that we clearly define the 
difference between Data being Open as in access and policy surrounding 
it - the political/philosophical side of the coin, and Data being Open 
as in Standards and, as you so better put it - structured data - the 
technical side of the coin. The semantics surrounding the two are 
important - to date we have basically said in e-Gov IG "Let's make Open 
(Standard) Data Open (to the Public)" - anyone coming in with no 
background knowledge - potentially such as those working in policy 
from a non-IT background, who are covered by the initial Working 
Draft's "/To: Any government wishing to 
set-up data.gov.*" /(wiki version), is simply going to start to find it 
confusing. We have discussions on defining open data that are centered 
around the access/policy question, and we have discussions on Linked 
Open Gov Data that are veering into the technical. For the good of all 
involved, I feel we need to define, set, and stick to some basic 
terminology that doesn't confuse the two.
/3) The majority of datasets, LOD or not, that are of real value, are 
developed, maintained and delivered by Government, like it or not. We 
know this without even looking at the LOD Project's work 
(which interestingly, contains very little Government data, which is 
a worry as it possibly indicates that Governments just AREN'T getting 
on board with early take up of LOD, despite the various legal 
requirements coming out world wide).

/
/How have you arrived at the above bearing in mind the pivotal role of 
DBpedia?  Basically, this is about a  Linked Open Data Space derived 
from Wikipedia snapshots which have little or no Govt. data. Of 
course, things get much better across depth, quality, and linked 
density dimensions when Govt. data is cross linked with LOD spaces 
like DBpedia etc./
Quite simply - It is Government that conducts the majority of hard 
statistical research and collates data. DBpedia, or indeed any other 
commercial enterprise, including Academia, does not equal the sum total 
of Linked Open Data. They do indeed provide a pivotal and valuable 
service - but only in the sense that Google does with searches. They are 
a "reseller" of Data in that sense - but Government is, and will remain 
for a long time, the primary producer of raw datasets.


If there were huge chunks of Government Datasets floating around in the 
public domain waiting to be linked, it would have been done by someone 
already, and we wouldn't be having this discussion. As Thomas points 
out: /"we have tons of government data with a legal obligation to make 
them available to the public (at least in Europe, and especially 
environmental data), and we are looking for means to do so in the most 
efficient way." /While he is referring to the technical aspect here, the 
inference and reality to us all is clear. Government datasets are a 
small percentage of what is openly available and being linked. Primarily 
due to access, which has much to do with issues that Government 
considers important, such as politics, provenance, authority and trust. 
I am not counting "resellers" in this, as that in itself raises further 
issues about why some organisations have access to this first hand data, 
and why the man on the street often doesn't.



/
4) We accept that Linked (Open) Data is the purview of the Linking 
Open Data W3C Project - there is probably little we can add to the 
discussion here apart from supporting them in their own work o

Re: [agenda] eGov IG Call, 25 Nov 2009, item 6

2009-11-25 Thread Kingsley Idehen

Chris Beer wrote:

I think Thomas makes some excellent points.

Is it possible as a group to agree on something akin to the following?

1) Open Data refers to how data is accessed and is primarily a 
political/policy consideration
Structured Data based on industry standard data representation formats. 
Just as UNIX came down to POSIX. Ditto Internet re. TCP/IP. Openness is 
about Standards, and has nothing to do with politics or philosophy.


You can institute policies that mandate the use of industry standard 
data formats re. data placed in the public domain or simply published 
for reuse by others.
2) Linked (Open) Data refers to how data is structured and delivered 
and is primarily a technological/standards consideration
To be precise: HTTP based Linked Open Data.  This is about the 
incorporation of HTTP scheme Identifiers into data that has been published 
using a standard data representation format.


Note: to get data into any standard data representation format there has 
to be a formal data model. At the most basic, said model takes the form: 
Entity-Attribute-Value. In the case of Linked Open Data, you have the 
intersection of the following:


1. EAV model
2. Standard Data Formats
3. HTTP scheme Identifiers (HTTP URIs).
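
As a rough illustration of that intersection, here is a minimal Python sketch. It assumes made-up identifiers under example.org (the dataset, agency and vocabulary URIs are hypothetical, not taken from any real data space) and uses N-Triples as the standard format. It also assumes literal values contain no quotes or backslashes, so no escaping is done.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One Entity-Attribute-Value statement."""
    entity: str
    attribute: str
    value: str


# Hypothetical identifiers, for illustration only.
BASE = "http://example.org/"

triples = [
    Triple(BASE + "dataset/air-quality", BASE + "vocab/publisher",
           BASE + "agency/environment"),
    Triple(BASE + "dataset/air-quality", BASE + "vocab/title",
           "Air quality measurements"),
    Triple(BASE + "agency/environment", BASE + "vocab/name",
           "Environment Agency"),
]


def is_http_uri(term):
    """True when the term is an HTTP scheme identifier."""
    return term.startswith(("http://", "https://"))


def links(data):
    """Statements whose value is itself an HTTP URI, i.e. followable links."""
    return [t for t in data if is_http_uri(t.value)]


def to_ntriples(data):
    """Serialise the EAV statements as N-Triples (no literal escaping)."""
    lines = []
    for t in data:
        obj = f"<{t.value}>" if is_http_uri(t.value) else f'"{t.value}"'
        lines.append(f"<{t.entity}> <{t.attribute}> {obj} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(triples))
```

The point of the sketch is only that the same EAV rows become "linked" once entities, attributes and some values are HTTP URIs that a consumer can dereference.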
3) The majority of datasets, LOD or not, that are of real value, are 
developed, maintained and delivered by Government, like it or not. We 
know this without even looking at the LOD Project's work 
(which interestingly, contains very little Government data, which is a 
worry as it possibly indicates that Governments just AREN'T getting on 
board with early take up of LOD, despite the various legal 
requirements coming out world wide).
How have you arrived at the above bearing in mind the pivotal role of 
DBpedia?  Basically, this is about a  Linked Open Data Space derived 
from Wikipedia snapshots which have little or no Govt. data. Of course, 
things get much better across depth, quality, and linked density 
dimensions when Govt. data is cross linked with LOD spaces like DBpedia etc.


4) We accept that Linked (Open) Data is the purview of the Linking 
Open Data W3C Project - there is probably little we can add to the 
discussion here apart from supporting them in their own work of IDing 
datasets that can be linked.


In support of this point, e-Government will be as any other entity in 
this regard, and the methodologies in delivering LOD will not likely 
differ to the rest of the world or society, much as there is little 
difference in Web Content Delivery between Government models and 
Commercial/Public models. In that sense I agree with Thomas 100% when 
it comes to a technology model. It will be Semantic, and RDF is likely 
to become the dominant paradigm, if not the only one.


5) Open Data therefore is what we SHOULD be focused on - not in the 
sense of forcing a standard on Gov in terms of Open Data Delivery 
policy, but in Education and Outreach.


The question of non-RDF data consumers is almost moot. Given the time 
scales we are operating on, it is akin to asking at the start of the 
first version of HTML "how does hyperlinked content support .txt based 
users such as BBS systems". Non semantic, non-RDF, pre HTML 5 browsers 
and technologies will be legacy before we know it, probably while we 
are still discussing all this. I mean it.


This leaves us with two outcomes. The first is that the current user 
base that Thomas identifies as professional RDF consumers will 
inevitably drive the conversion of their suppliers' data into RDF/XML 
formats, essentially as a snowball effect. GIS Data is a good example 
of where this is already happening.


RDF/XML really has little to do with the matter; at best it's just a data 
representation option for an EAV model variant i.e., the RDF Data Model.


I guess RDF/XML will continue to distract us for as long as it has 
preeminence in the Semantic Web Layer Cake :-(


The second is that as Thomas says,  human-readable formats  HAVE to be 
provided - ultimately the user is human, and the transition on the 
tech side between how the machine reads it, and how it is displayed to 
the user in a usable, displayable form should be seamless. Ultimately 
the user should not even realise that they are doing anything but 
looking at a web page of results that they have asked a server for.
The essence of Linked Data is to serve the representation requested by the 
User Agent. If they want HTML, you send an HTML+RDFa representation of a 
resource description, etc. Basically, this is just about using HTTP's 
in-built prowess, the right way.
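
A minimal Python sketch of that idea follows. It is a simplification, not a full implementation of HTTP content negotiation: it picks the highest-q match from the Accept header and ignores the specificity rules of the HTTP specification. The list of available representations and the function names are assumptions made for illustration.

```python
# Representations a hypothetical server can produce for one resource.
AVAILABLE = ["text/html", "application/rdf+xml", "text/turtle"]


def parse_accept(header):
    """Split an Accept header into (media_type, q) pairs."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    return prefs


def matches(pattern, candidate):
    """Match a media range such as text/* or */* against a media type."""
    if pattern == "*/*":
        return True
    if pattern.endswith("/*"):
        return candidate.startswith(pattern[:-1])
    return pattern == candidate


def choose(header, available=AVAILABLE):
    """Return the available media type the User Agent prefers, or None."""
    best, best_q = None, 0.0
    for media_type, q in parse_accept(header):
        for candidate in available:
            if q > best_q and matches(media_type, candidate):
                best, best_q = candidate, q
    return best


if __name__ == "__main__":
    print(choose("text/html,application/xhtml+xml;q=0.9"))
    print(choose("text/turtle;q=0.8, text/html;q=0.5"))
```

A browser asking for text/html gets the HTML(+RDFa) page, while an RDF-aware agent asking for text/turtle or application/rdf+xml gets the same resource description in that form, from the same URI.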


The value of HTML representations of resource descriptions remains too 
under appreciated re. overall demonstration of the real virtues of: HTTP 
based Linked Data and/or HTTP based Linked Open Data.


This is where I do disagree with Thomas. A Federation of providers is 
a nice concept, but it

Re: [agenda] eGov IG Call, 25 Nov 2009, item 6

2009-11-25 Thread Chris Beer

I think Thomas makes some excellent points.

Is it possible as a group to agree on something akin to the following?

1) Open Data refers to how data is accessed and is primarily a 
political/policy consideration
2) Linked (Open) Data refers to how data is structured and delivered and 
is primarily a technological/standards consideration
3) The majority of datasets, LOD or not, that are of real value, are 
developed, maintained and delivered by Government, like it or not. We 
know this without even looking at the LOD Project's work 
(which interestingly, contains very little Government data, which is a 
worry as it possibly indicates that Governments just AREN'T getting on 
board with early take up of LOD, despite the various legal requirements 
coming out world wide).


4) We accept that Linked (Open) Data is the purview of the Linking Open 
Data W3C Project - there is probably little we can add to the discussion 
here apart from supporting them in their own work of IDing datasets that 
can be linked.


In support of this point, e-Government will be as any other entity in 
this regard, and the methodologies in delivering LOD will not likely 
differ to the rest of the world or society, much as there is little 
difference in Web Content Delivery between Government models and 
Commercial/Public models. In that sense I agree with Thomas 100% when it 
comes to a technology model. It will be Semantic, and RDF is likely to 
become the dominant paradigm, if not the only one.


5) Open Data therefore is what we SHOULD be focused on - not in the 
sense of forcing a standard on Gov in terms of Open Data Delivery 
policy, but in Education and Outreach.


The question of non-RDF data consumers is almost moot. Given the time 
scales we are operating on, it is akin to asking at the start of the 
first version of HTML "how does hyperlinked content support .txt based 
users such as BBS systems". Non semantic, non-RDF, pre HTML 5 browsers 
and technologies will be legacy before we know it, probably while we are 
still discussing all this. I mean it.


This leaves us with two outcomes. The first is that the current user 
base that Thomas identifies as professional RDF consumers will 
inevitably drive the conversion of their suppliers' data into RDF/XML 
formats, essentially as a snowball effect. GIS Data is a good example of 
where this is already happening.


The second is that as Thomas says,  human-readable formats  HAVE to be 
provided - ultimately the user is human, and the transition on the tech 
side between how the machine reads it, and how it is displayed to the 
user in a usable, displayable form should be seamless. Ultimately the 
user should not even realise that they are doing anything but looking at 
a web page of results that they have asked a server for.


This is where I do disagree with Thomas. A Federation of providers is a 
nice concept, but it is too far off to think about, and will be 
inevitable in the end so probably doesn't need to be focused on. I 
believe that the key to overcoming the mistrust issue is three-fold:


a) Focusing on educating Governments on WOG methodologies in adopting 
inter-agency delivery on a National level - ie: promote the creation of 
the data.gov.* model. The international model is far too scary a prospect 
for most Governments to contemplate.

b) Educating Government on the ROI in making Data open to the public
c) Educating Government in ways in which "clearly marked-off data spaces 
with a trusted provenance" can still mean open data delivery for all - 
essentially this already happens whenever data is published, even in a 
HTML/PDF format - having data in the public domain does not mean giving 
access to the original uncorrupted dataset.


Just some thoughts.

Cheers

Chris

Thomas Bandholtz wrote:

There has been much discussion about *Open* Data in the eGov list these
days, which is a rather political question.
I am currently not so much concerned about openness, more about *Linked*
Data, as we have tons of government data with a legal obligation to make
them available to the public (at least in Europe, and especially
environmental data), and we are looking for means to do so in the most
efficient way.

So, among the six items of today's agenda, I find number 6 the most
challenging:
  

6. Discussion: Government Linked Data, Techniques and Technologies
[35min]


some considerations:
  

+ how does linked data support (non-RDF) data consumers?


First of all: Linked Data supports RDF data consumers.

Human readable formats should also be provided based on content
negotiation. Some providers have dedicated HTML formats, others have
not. Those who haven't depend on some available, general purpose "linked
data browser".
The latest discussion about the state of such tools has been started by
http://lists.w3.org/Archives/Public/public-lod/2009Oct

Re: [agenda] eGov IG Call, 25 Nov 2009, item 6

2009-11-25 Thread Thomas Bandholtz
There has been much discussion about *Open* Data in the eGov list these
days, which is a rather political question.
I am currently not so much concerned about openness, more about *Linked*
Data, as we have tons of government data with a legal obligation to make
them available to the public (at least in Europe, and especially
environmental data), and we are looking for means to do so in the most
efficient way.

So, among the six items of today's agenda, I find number 6 the most
challenging:
> 6. Discussion: Government Linked Data, Techniques and Technologies
> [35min]
some considerations:
> + how does linked data support (non-RDF) data consumers?
First of all: Linked Data supports RDF data consumers.

Human readable formats should also be provided based on content
negotiation. Some providers have dedicated HTML formats, others have
not. Those who haven't depend on some available, general purpose "linked
data browser".
The latest discussion about the state of such tools has been started by
http://lists.w3.org/Archives/Public/public-lod/2009Oct/0105.html, and I
am afraid the state-of-the-art of such browsers cannot compete with a
well-made dedicated HTML page (how could it).

So one might say linked data supports non-RDF data consumers rather
badly, but there are two objections:

* even non-RDF data consumers benefit from the availability of some
  linked data which would not be available in the Web at all if not
  generated with D2R (or similar)
* even non-RDF data consumers benefit from the extensive and
  systematic linkage provided by Linked Data which is rather unusual
  for common HTML pages.

I think the value of this question is somehow disputable, as - aside
from any content negotiation - linked data supports RDF consumers at
first. These consumers are mostly professionals who depend on government
data in order to do their work. So I would rather ask:

"How do professional RDF data consumers integrate linked data into their
working data bases today?"
 
> + strategies for modelling government data
Well, I would say, the basic model is RDF in this case ;-).
We are wasting too much time with efforts on "harmonising" models in a
waterfall manner (see http://inspire.jrc.ec.europa.eu/, for example)
instead of just publishing the data somehow.

One of TBL's Do's and Don'ts reads:
"Do NOT wait until you have a complete schema or ontology to publish data. "
http://www.w3.org/DesignIssues/GovData

I do not see any problem about schema diversity. However, we should make
use of existing schemas which have proved to work well. For example, the
OGC Observation and Measurement XML schema:
http://www.opengeospatial.org/standards/om

OM is expressed as an XML schema, not in RDF so far. But it expresses
perfectly clarified semantics about any kind of measurement data of
whatever kind of sensor, including timelines. XSD and URN patterns are
some drawbacks of this formalisation, but this could easily be resolved
by an RDF reformulation of the same semantics.

The most important aspect again is linkage. When expressing what or
where has been measured, don't use a dumb character string, but link to
a reference vocabulary.
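
To make the difference concrete, here is a minimal Python sketch. The vocabulary, its URIs and the record field names are all hypothetical; a real publisher would link to an existing reference vocabulary rather than this toy lookup table.

```python
# Hypothetical reference vocabulary: label -> concept URI.
VOCAB = {
    "ozone": "http://example.org/vocab/substance/ozone",
    "nitrogen dioxide": "http://example.org/vocab/substance/nitrogen-dioxide",
}


def link_observed_property(record, vocab=VOCAB):
    """Return a copy of the record with the free-text property replaced
    by a vocabulary URI when one is known; otherwise leave it as it is."""
    label = record["observed_property"].strip().lower()
    linked = dict(record)
    uri = vocab.get(label)
    if uri is not None:
        linked["observed_property"] = uri
    return linked


if __name__ == "__main__":
    raw = {"observed_property": "Ozone", "value": 42.0, "unit": "ug/m3"}
    print(link_observed_property(raw))
```

With the string replaced by a URI, two agencies measuring the same substance point at the same concept, even if they label it differently.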

> + essential metadata for Government Linked Open Data (eg VoiD)
VoiD is a good start. I wouldn't overestimate the need for metadata as
long as you can access the data itself. Metadata was a great thing in
former times when data access was a complex issue, so you would like to
know what you will get before starting the effort to get access to it.
If the data itself is linked to reference vocabularies extensively, the
data vs. metadata discussion ends in smoke.
> + expressing rights and licensing information
VoiD can do this.
> + approaches to provenance, authority and trust
Government generally is not so amused about the open world assumption,
they prefer clearly marked-off data spaces with a trusted provenance.

I think mistrust can be overcome by federation of providers. Federated
agencies can easily state that they trust each provider in this
federation. Just set up a domain for such a federation, link to this
federation from the data, and to the data from the federation.

No problem if anybody is publishing her own possibly weird statements
about the same things as long as the federation does not link to this data.
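
A minimal Python sketch of this two-way linking check, assuming a hypothetical federation domain and hypothetical dataset URIs. In practice the link sets would be harvested from the published data, not hard-coded.

```python
# Hypothetical federation domain.
FEDERATION = "http://federation.example.org/"

# Datasets the federation links to (federation -> data).
federation_links = {
    "http://agency-a.example.org/data",
    "http://agency-b.example.org/data",
}

# What each dataset links to (data -> federation).
dataset_links = {
    "http://agency-a.example.org/data": {FEDERATION},
    "http://agency-b.example.org/data": set(),
    "http://someone.example.net/data": {FEDERATION},
}


def is_trusted(dataset):
    """Trusted only when the link exists in both directions."""
    return (dataset in federation_links
            and FEDERATION in dataset_links.get(dataset, set()))


if __name__ == "__main__":
    for uri in sorted(dataset_links):
        print(uri, is_trusted(uri))
```

Here the third dataset claims membership by linking to the federation, but since the federation does not link back, a consumer following this rule would not treat it as trusted.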

One rather developed case of such a sub-cloud is Linking Open Drug Data
(LODD).
see http://esw.w3.org/topic/HCLSIG/LODD
We might learn from them.

> + using RDF for Statistical Data
Parts of EUROSTAT have been published in SCOVO
http://sw.joanneum.at/scovo/schema.html.
Even SDMX is apparently moving towards SCOVO.
Does anyone see an alternative approach?



Looking forward to discussion this afternoon (well, in my time).

Thomas

(consulting the Federal Environment Agency in Germany)

-- 
Thomas Bandholtz, thomas.bandho...@innoq.com, http://www.innoq.com 
innoQ Deutschland GmbH, Halskestr. 17, D-40880 Ratingen, Germany
Phone: +49 228 9288490 Mobile: +49 178 4049387 Fax: +49