Re: Please allow JS access to Ontologies and LOD

2010-10-23 Thread Ian Davis
On Sat, Oct 23, 2010 at 2:28 AM, Nathan nat...@webr3.org wrote:
 Hi Ian,

 Thanks, I can confirm the change has been successful :)

 However, one small note: the conneg URIs, such as
 http://productdb.org/gtin/00319980033520, do not expose the header and
 thus can't be used.

Ta. These should be emitting the header now.
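
For anyone wanting to verify this from a script, here is a minimal
check in Python, assuming the header in question is the CORS
Access-Control-Allow-Origin header that browsers require before JS may
read a cross-origin response (the URL is the one from this thread):

    import requests

    # The conneg URI from the thread; requests follows the 303 redirect
    # to the data document automatically.
    url = "http://productdb.org/gtin/00319980033520"
    resp = requests.get(url, headers={"Accept": "application/rdf+xml"})

    # Browser JS can only read this response cross-origin when the
    # header below is present (assumed to be the header under discussion).
    print("Final URL:", resp.url)
    print("Access-Control-Allow-Origin:",
          resp.headers.get("Access-Control-Allow-Origin", "missing"))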

Ian



Re: Concordance, Reconciliation, and shared identifiers

2010-10-23 Thread Leigh Dodds
Hi,

On Friday, October 22, 2010, Kingsley Idehen kide...@openlinksw.com wrote:
 On 10/22/10 11:47 AM, Leigh Dodds wrote:

 A great project would be for someone to produce a Linked Data wrapper
 for the Guardian API, that allows linking *in* to their data, based on
 ISBNs and MusicBrainz IDs. It's on my TODO list, but then so is a lot
 of other stuff ;)

 We've had sponger meta cartridges [1] for the Guardian API since its
 early incarnations.

Do you have an actual example of that? I had a look at the docs and I
couldn't see how/where the Guardian API data was being surfaced. The
meta cartridge seems to pull a small amount of info from the Guardian
website, rather than from the OpenPlatform.

 Again, it would be interesting to build bridges between different
 communities by showing how one can achieve the same effects with
 Linked Data, as well as integrating Linked Data into those services by
 providing gateways services, e.g. implementing the same API but backed
 by RDF. This is what I did for Gridworks, but the same could be
 extended to other services.

 On our part, we've been doing so since Linked Data's inception, and
 will continue to do so.

Well, the more effort the better. Has anyone else explored the
boundaries between the Linked Data cloud and other APIs and services?

What do people think of more tailored lookup and access services onto
Linked Data, over and above simple "follow your nose" dereferencing and
SPARQL?
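
For context, the "follow your nose" baseline is simply: dereference a
URI and parse whatever comes back. A minimal sketch with Python and
rdflib; the DBpedia URI is an arbitrary illustration, not something
from this thread:

    from rdflib import Graph

    # Follow your nose: dereference a resource URI and parse the RDF
    # returned via content negotiation (rdflib sends suitable Accept
    # headers and follows the 303 redirect).
    uri = "http://dbpedia.org/resource/Tim_Berners-Lee"  # example URI
    g = Graph()
    g.parse(uri)
    print(len(g), "triples retrieved")

Anything richer than this per-URI lookup (search, reconciliation,
filtered views) would count as the kind of tailored service in
question.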

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Schema Mappings (was Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices)

2010-10-23 Thread Leigh Dodds
Hi Antoine,

On Friday, October 22, 2010, Antoine Zimmermann
antoine.zimmerm...@insa-lyon.fr wrote:
 On 22/10/2010 17:23, Leigh Dodds wrote:
 This also strikes me as an opportunity: someone could usefully build a
 service (perhaps built on facilities in Sindice) that aggregates
 schema information and provides tools for expressing simple mappings
 and equivalencies. It could fill a dual role: recommending more
 common/preferred terms, whilst simultaneously providing
 machine-readable equivalencies.

 This sounds very much like what an ontology alignment server does: it
 provides alignments [often used synonymously with mappings] on demand
 (given two ontology URIs), either by retrieving locally stored
 alignments, by asking another alignment server for an alignment that
 it may have, or by computing the alignment on the fly, using a given
 direct matching algorithm or the aggregation (e.g., composition) of
 existing alignments. The alignment server can also be used for various
 other things, such as comparing alignments, evaluating them, rating
 them, updating them, etc.

 A paper describing the Alignment server [1] has been submitted to the
 Semantic Web Journal and is under open review (you can read the paper
 and the reviews and submit your own reviews or comments). The server
 itself can be downloaded and installed anywhere [2].
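
To make the lookup cascade described above concrete, here is a toy
sketch in Python. Every name, store, and matcher below is a
hypothetical illustration; the actual Alignment server defines its own
API, which the referenced paper describes:

    from difflib import SequenceMatcher

    LOCAL_ALIGNMENTS = {}   # (onto1, onto2) -> stored correspondences
    PEER_SERVERS = []       # callables with the same signature as below

    def match_directly(terms1, terms2, threshold=0.8):
        # Stand-in "direct matching algorithm": naive label similarity.
        pairs = []
        for t1 in terms1:
            for t2 in terms2:
                score = SequenceMatcher(None, t1.lower(), t2.lower()).ratio()
                if score >= threshold:
                    pairs.append((t1, t2, score))
        return pairs

    def get_alignment(key, terms1, terms2):
        # 1. Locally stored alignments first.
        if key in LOCAL_ALIGNMENTS:
            return LOCAL_ALIGNMENTS[key]
        # 2. Then ask other alignment servers.
        for peer in PEER_SERVERS:
            found = peer(key, terms1, terms2)
            if found:
                return found
        # 3. Finally, compute an alignment on the fly.
        return match_directly(terms1, terms2)

    print(get_alignment(("ex:v1", "ex:v2"), ["homePage"], ["homepage", "title"]))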

Interesting, thanks for the reference. I was aware that there has been,
and continues to be, a lot of research in this area; I was just
wondering out loud whether anyone has explored opening up some kind of
matching service on a more production footing, either as an automated
service or using crowd-sourced mappings.

Running tools locally and exploring their effectiveness would be an
interesting exercise. But presumably there will soon be a need to start
surfacing some services in this area, as part of the general semweb
infrastructure.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com



Re: Schema Mappings (was Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices)

2010-10-23 Thread Chris Bizer
Hi Leigh and Enrico,

 Hi,

 On 22 October 2010 09:35, Chris Bizer ch...@bizer.de wrote:
 Anja has pointed to a wealth of openly available numbers (no pun
 intended) that have not been discussed at all. For example, only 7.5%
 of the data sources provide a mapping of proprietary vocabulary terms
 to other vocabulary terms. For anyone building applications to work
 with LOD, this is a real problem.

 Yes, this is also the figure that scared me most.

 This might be low for a good reason: people may be creating
 proprietary terms because they don't feel well served by existing
 vocabularies and hence defining mappings (or even just reusing terms)
 may be difficult or even impossible.

Yes, this is true in many cases, at a given point in time.

But altogether I think it is important to see web-scale data
integration in a more evolutionary fashion, in which different factors
play together over time.

In my opinion these factors are:

1. An increasing number of people are starting to use existing
vocabularies, which already solves the integration problem in some
areas simply by agreement on these vocabularies.
2. More and more instance data is becoming available on the Web, which
makes it easier to mine schema mappings using statistical methods.
3. Different groups in various areas want to contribute to solving the
integration problem and thus invest effort in manually aligning
vocabularies (for instance between different standards used in the
libraries community, or for people- and provenance-related vocabularies
within the W3C Social Web and Provenance XGs).
4. The Web allows you to share mappings by publishing them as RDF. Thus
many different people and groups may provide small contributions
(= hints) that help to solve the problem in the long run; a minimal
example of such a published mapping follows this list.
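
To make point 4 concrete: a mapping is just a handful of triples that
anyone can publish. A minimal sketch with Python and rdflib; the FOAF
namespace is real, while everything under example.org is an invented
stand-in for a proprietary vocabulary:

    from rdflib import Graph, Namespace
    from rdflib.namespace import OWL, RDFS

    EX = Namespace("http://example.org/shop/vocab#")  # invented vocabulary
    FOAF = Namespace("http://xmlns.com/foaf/0.1/")

    g = Graph()
    g.bind("ex", EX)
    g.bind("foaf", FOAF)
    g.bind("owl", OWL)

    # Two simple correspondences; publishing this document somewhere on
    # the Web is the whole "share mappings as RDF" step.
    g.add((EX.homePage, OWL.equivalentProperty, FOAF.homepage))
    g.add((EX.Shop, RDFS.subClassOf, FOAF.Organization))

    print(g.serialize(format="turtle"))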

My thinking on the topic was strongly influenced by the pay-as-you-go
data integration ideas developed by Alon Halevy and others in the
dataspaces community. In my opinion, a cool paper on the topic is:

Madhavan, J.; Cohen, S.; Dong, X.; Halevy, A.; Jeffery, S.; Ko, D.;
Yu, C.: Web-Scale Data Integration: You Can Afford to Pay as You Go.
CIDR (2007). http://research.yahoo.com/files/paygo.pdf

It describes a system that applies schema clustering to mine mappings
from Google Base and web-table data, and presents ideas on how to deal
with the uncertainty this introduces by using ranking algorithms.

Other interesting papers in the area are:

Das Sarma, A.; Dong, X.; Halevy, A.: Bootstrapping Pay-as-you-go Data
Integration Systems. Proceedings of the Conference on Management of
Data, SIGMOD (2008).

Vaz Salles, M.A.; Dittrich, J.; Karakashian, S.K.; Girard, O.R.;
Blunschi, L.: iTrails: Pay-as-you-go Information Integration in
Dataspaces. Conference on Very Large Data Bases (VLDB 2007),
pp. 663-674 (2007).

Franklin, M.J.; Halevy, A.Y.; Maier, D.: From Databases to Dataspaces:
A New Abstraction for Information Management. SIGMOD Record 34(4),
pp. 27–33 (2005).

Hedeler, C., et al.: Dimensions of Dataspaces. Proceedings of the 26th
British National Conference on Databases, pp. 55-66 (2009).

These papers generally assume that mappings are added to a dataspace by
administrators, or mined using a single, specific method.

What I think is interesting in the Web of Linked Data setting is that
mappings can be created and published by different parties into a
single global dataspace, meaning that the effort needed to create the
mappings can be divided between those parties. So pay-as-you-go might
evolve into somebody-pays-as-you-go :-)
But this also means that the quality of mappings becomes increasingly
uncertain, and that the information consumer needs to assess the
quality of the mappings and decide which ones to use.

We are currently exploring this problem space and will present a paper
about publishing and discovering mappings on the Web of Linked Data at
the COLD workshop at ISWC 2010:

http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/research/publications/BizerSchultz-COLD-R2R-Paper.pdf

Central ideas of the paper are:

1. Mappings are identified with URIs so that they can be interlinked
from vocabulary definitions or voiD dataset descriptions, and so that
client applications as well as Web of Data search engines can discover
them.
2. A client application that discovers data represented using terms
unknown to it may search the Web for mappings, apply a
quality-evaluation heuristic to decide which of the alternative
mappings to use, and then apply the chosen mappings to translate the
data to its local schema.
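
To illustrate idea 2, here is a toy version of the translation step:
given data using unknown terms plus a discovered mapping graph, rewrite
the data into the application's local schema. This is a deliberate
simplification; the paper's R2R mapping language is more expressive
than plain owl:equivalentProperty, and the quality-evaluation heuristic
is omitted:

    from rdflib import Graph
    from rdflib.namespace import OWL

    # Data using terms unknown to the application (inline here; in
    # practice retrieved from the Web). All example.org names invented.
    data = Graph().parse(data="""
        @prefix ex: <http://example.org/shop/vocab#> .
        <http://example.org/shop/42> ex:homePage <http://example.org/p42> .
    """, format="turtle")

    # A discovered mapping graph; in practice identified by URI and
    # fetched from the Web.
    mappings = Graph().parse(data="""
        @prefix ex:   <http://example.org/shop/vocab#> .
        @prefix foaf: <http://xmlns.com/foaf/0.1/> .
        @prefix owl:  <http://www.w3.org/2002/07/owl#> .
        ex:homePage owl:equivalentProperty foaf:homepage .
    """, format="turtle")

    # Build a predicate rewrite table, then translate to the local schema.
    rewrite = {s: o for s, _, o in
               mappings.triples((None, OWL.equivalentProperty, None))}
    local = Graph()
    for s, p, o in data:
        local.add((s, rewrite.get(p, p), o))

    print(local.serialize(format="turtle"))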

 This also strikes me as an opportunity: someone could usefully build a
 service (perhaps built on facilities in Sindice) that aggregates
 schema information and provides tools for expressing simple mappings
 and equivalencies. It could fill a dual role: recommending more
 common/preferred terms, whilst simultaneously providing
 machine-readable equivalencies.

Absolutely, there might even be opportunities