Re: automatic data interlinking

2010-05-31 Thread Hugh Glaser
Dear François,

This is a great initiative in a crucial area.
I am wondering if there is anything we (rkbexplorer.com and sameas.org) can
do to help.
Clearly we have a lot of datasets that our tools have been grinding over
and aligning for many years, and we would be happy to offer anything you
would find useful.
However, there may also be other things.
I looked at taking the outputs of last year's exercise into a sameas store,
but found that the URIs (at least the few I tried) were not LD, so I backed
off. So perhaps the first suggestion would be that, whatever datasets you
choose, they should use LD URIs.
Another suggestion would be that the outputs of the exercise should be
published in such a way that they will be useful to the LD world. Not least,
this would be more motivating to the participants.
We would be happy to bring up a sameas store for this, or indeed a separate
sameas store for each of the participants, where they can post their data,
and they and others can then access it.
And of course results with high precision can safely be put in sameas.org,
which would be very exciting for me.
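For instance, once links are in such a store, anyone could look up the
equivalence bundles programmatically. A minimal sketch, assuming a
sameas.org-style JSON lookup endpoint of the form
http://sameas.org/json?uri=... (the exact URL and response shape should be
checked against the service documentation):

import json
import urllib.parse
import urllib.request

def equivalent_uris(uri, endpoint="http://sameas.org/json"):
    """Return URIs that the store considers equivalent to `uri`."""
    url = endpoint + "?uri=" + urllib.parse.quote(uri, safe="")
    with urllib.request.urlopen(url) as response:
        bundles = json.load(response)
    # Assumed response shape: a list of "bundles", each carrying a
    # "duplicates" array of co-referent URIs. Verify against the
    # service documentation before relying on this.
    uris = []
    for bundle in bundles:
        uris.extend(bundle.get("duplicates", []))
    return uris

if __name__ == "__main__":
    for u in equivalent_uris("http://dbpedia.org/resource/Berlin"):
        print(u)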
(Of course, the level of help we can give will depend on our resources,
which are limited.)
In choosing datasets, perhaps an obvious place to start is something like
the geographical data in the data.gov.uk world?

Best
Hugh

PS
In fact, there are some datasets that would help me personally, although you
may feel they are too close to last year's topics: for example, we have LD
datasets of NSF (National Science Foundation) project data and of OAI (Open
Archives Initiative) bibliographic data, and aligning these would be
challenging but very interesting.


On 21/05/2010 14:51, "François Scharffe"  wrote:

> Hello,
> 
> As part of the Ontology Alignment Evaluation Initiative [1], we will run a
> data interlinking evaluation for the second year.
> 
> In this track we propose to evaluate systems able to *automatically* find
> interlinks between Web datasets, in contrast to semi-automatic tools. This
> year we will focus on large datasets. Two datasets are given as input, and
> a set of links between equivalent resources is expected as output.
> 
> We're looking for systems to participate in the evaluation. We're also
> looking for datasets that may be used for the evaluation, that is, datasets
> that come with a nicely curated linkset to serve as a reference.
> 
> By the way, I also invite you to look at the results of last year's
> evaluation [2].
> 
> Cheers
> 
> François
> 
> 
> 
> [1] http://oaei.ontologymatching.org
> [2] http://oaei.ontologymatching.org/2009/instances/
> 
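To make the quoted task concrete: two datasets go in, a set of links between
equivalent resources comes out. Below is a deliberately naive sketch of such
an automatic interlinker, matching resources by normalised label and emitting
owl:sameAs triples. All URIs and labels here are made up, and real systems
use far richer similarity measures than exact label matching:

def normalise(label):
    """Crude matching key: lowercase, collapse whitespace."""
    return " ".join(label.lower().split())

def interlink(dataset_a, dataset_b):
    """Each dataset maps resource URI -> label; returns sameAs triples."""
    index = {}
    for uri, label in dataset_a.items():
        index.setdefault(normalise(label), []).append(uri)
    links = []
    for uri_b, label in dataset_b.items():
        for uri_a in index.get(normalise(label), []):
            links.append((uri_a, "http://www.w3.org/2002/07/owl#sameAs", uri_b))
    return links

a = {"http://example.org/a/paris": "Paris"}
b = {"http://example.org/b/FR-75": "paris"}
for s, p, o in interlink(a, b):
    print("<%s> <%s> <%s> ." % (s, p, o))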




Cool URIs (was: Re: Java Framework for Content Negotiation)

2010-05-31 Thread Angelo Veltens

On 27.05.2010 15:51, Richard Cyganiak wrote:

> On 27 May 2010, at 10:47, Angelo Veltens wrote:
>> What I am going to implement is this:
>> http://www.w3.org/TR/cooluris/#r303uri
>> 
>> I think this is the way DBpedia works, and it seems a good solution
>> to me.
> 
> It's the way DBpedia works, but it's by far the worst solution of the
> three presented in the document.
> 
> DBpedia has copied the approach from D2R Server. The person who came up
> with it and designed and implemented it for D2R Server is me. This was
> back in 2006, before the term Linked Data was even coined, so I didn't
> exactly have a lot of experience to rely on. With what I know today, I
> would never, ever again choose that approach. Use 303s if you must; but
> please do me a favour and add that generic document, and please do me a
> favour and name the different variants  and  rather than  and .


Thanks a lot for sharing your experience with me. I will follow your
advice. So if I'm going to implement what is described in section 4.2, I
have to:


- serve HTML at http://www.example.org/doc/alice if text/html wins
content negotiation, and set the Content-Location header to
http://www.example.org/doc/alice.html
- serve RDF/XML at http://www.example.org/doc/alice if
application/rdf+xml wins content negotiation, and set the
Content-Location header to http://www.example.org/doc/alice.rdf

- always serve HTML at http://www.example.org/doc/alice.html
- always serve RDF/XML at http://www.example.org/doc/alice.rdf

Right?
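In code, I imagine something like this minimal sketch (Python standard
library only; the /id/alice thing URI with its 303 is assumed here
following the Cool URIs document's example, and the substring Accept check
ignores q-values, which a real implementation would have to handle
properly):

from http.server import BaseHTTPRequestHandler, HTTPServer

HTML = b"<html><body>Alice</body></html>"
RDF = b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'

class CoolURIHandler(BaseHTTPRequestHandler):

    def do_GET(self):
        if self.path == "/id/alice":
            # Thing URI: 303-redirect to the one generic document.
            self.send_response(303)
            self.send_header("Location", "http://www.example.org/doc/alice")
            self.end_headers()
        elif self.path == "/doc/alice":
            # Generic document: negotiate and point at the chosen variant.
            # NOTE: substring matching ignores q-values; a real server
            # must parse the Accept header properly.
            accept = self.headers.get("Accept", "")
            if "application/rdf+xml" in accept:
                self.reply(RDF, "application/rdf+xml",
                           "http://www.example.org/doc/alice.rdf")
            else:
                self.reply(HTML, "text/html",
                           "http://www.example.org/doc/alice.html")
        elif self.path == "/doc/alice.html":
            self.reply(HTML, "text/html")            # always HTML
        elif self.path == "/doc/alice.rdf":
            self.reply(RDF, "application/rdf+xml")   # always RDF/XML
        else:
            self.send_error(404)

    def reply(self, body, content_type, content_location=None):
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        if content_location:
            self.send_header("Content-Location", content_location)
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), CoolURIHandler).serve_forever()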

By the way: is there any defined client behaviour for the
Content-Location information? Do browsers take account of it?




> The DBpedia guys are probably stuck with my stupid design forever
> because changing it now would break all sorts of links. But the thing
> that really kills me is how lots of newbies copy that design just
> because they saw it on DBpedia and therefore think that it must be good.


I think the problem is not only that DBpedia uses that design, but that
it is described in many tutorials as a possible or even "cool" solution,
e.g. http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/ (one
of the first documents I stumbled upon).


If we want to prevent people from using that design, it should be stated
clearly that it is a bad choice, and why.


Kind regards and thanks for your patience,
Angelo



Semantic Web Challenge @ ISWC 2010 - Call for Participation

2010-05-31 Thread Chris Bizer
Dear all,

We are happy to announce the Semantic Web Challenge 2010!

The Semantic Web Challenge 2010 is co-located with the 9th International
Semantic Web Conference (ISWC2010) in Shanghai, China. As last year, the
challenge consists of two tracks: the Open Track and the Billion Triples
Track, which requires participants to make use of the data set that has
been crawled from the public Semantic Web. The data set consists of 3.2
billion triples this year and can be downloaded from the challenge's
website.

The Call for Participation can be found below. More information about the
Challenge is provided at

http://challenge.semanticweb.org/

We look forward to your submissions, which we hope will again make the
Semantic Web Challenge one of the most exciting events at ISWC.

Best regards,

Diana and Chris


--

Call for Participation for the 

8th Semantic Web Challenge 

at the 9th International Semantic Web Conference ISWC 2010 
Shanghai, China, November 7-11, 2010 

http://challenge.semanticweb.org/

--

Introduction

Submissions are now invited for the 8th annual Semantic Web Challenge, the
premier event for demonstrating practical progress towards achieving the
vision of the Semantic Web. The central idea of the Semantic Web is to
extend the current human-readable Web by encoding some of the semantics of
resources in a machine-processable form. Moving beyond syntax opens the door
to more advanced applications and functionality on the Web. Computers will
be better able to search, process, integrate and present the content of
these resources in a meaningful, intelligent manner. 

As the core technological building blocks are now in place, the next
challenge is to demonstrate the benefits of semantic technologies by
developing integrated, easy-to-use applications that can provide new levels
of Web functionality for end users on the Web or within enterprise settings.
Applications submitted should give evidence of clear practical value that
goes above and beyond what is possible with conventional web technologies
alone. 

As in previous years, the Semantic Web Challenge 2010 will consist of two
tracks: the Open Track and the Billion Triples Track. The key difference
between the two tracks is that the Billion Triples Track requires the
participants to make use of the data set (consisting of 3.2 billion triples
this year) that has been crawled from the Web and is provided by the
organizers. The Open Track has no such restrictions. As before, the
Challenge is open to everyone from industry and academia. The authors of the
best applications will be awarded prizes and featured prominently at special
sessions during the conference. 
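To give a sense of the scale, a collection of 3.2 billion statements can
only realistically be processed as a stream. A minimal sketch, assuming the
crawl is distributed as gzipped N-Quads chunks (as in earlier Billion
Triple Challenge data sets; the file name below is hypothetical), counting
statements per source graph:

import gzip
from collections import Counter

def graphs_by_size(path, max_lines=None):
    """Count statements per source graph in one gzipped N-Quads chunk."""
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for n, line in enumerate(f):
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if line.endswith("."):
                line = line[:-1].rstrip()
            # The graph URI is the last term of a quad and contains no
            # spaces, so naive tokenisation is safe for this purpose.
            counts[line.split()[-1]] += 1
            if max_lines is not None and n >= max_lines:
                break
    return counts

if __name__ == "__main__":
    top = graphs_by_size("btc-2010-chunk-000.gz", max_lines=1000000)
    for graph, size in top.most_common(10):
        print(size, graph)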

The overall goal of this event is to advance our understanding of how
Semantic Web technologies can be exploited to produce useful applications
for the Web. Semantic Web applications should integrate, combine, and deduce
information from various sources to assist users in performing specific
tasks. 

---
Challenge Criteria

The Challenge is defined in terms of minimum requirements and additional
desirable features that submissions should exhibit. The minimum requirements
and the additional desirable features are listed below per track. 

Open Track

Minimal requirements

1. The application has to be an end-user application, i.e. an application
that provides a practical value to general Web users or, if this is not the
case, at least to domain experts.
2. The information sources used
   - should be under diverse ownership or control,
   - should be heterogeneous (syntactically, structurally, and
     semantically), and
   - should contain substantial quantities of real-world data (i.e. not
     toy examples).

The meaning of data has to play a central role:

3. Meaning must be represented using Semantic Web technologies.
4. Data must be manipulated/processed in interesting ways to derive useful
information, and this semantic information processing has to play a central
role in achieving things that alternative technologies cannot do as well,
or at all.

Additional Desirable Features 

In addition to the above minimum requirements, we note other desirable
features that will be used as criteria to evaluate submissions. 

1. The application provides an attractive and functional Web interface (for
human users) 
2. The application should be scalable (in terms of the amount of data used
and in terms of distributed components working together). Ideally, the
application should use all data that is currently published on the Semantic
Web. 
3. Rigorous evaluations have taken place that demonstrate the benefits of
semantic technologies, or validate the results obtained. 
4. Novelty, in applying semantic technology to a domain or task that has
not been considered before
5. Functionality is different from or goes beyond pure information
retrieval
6. The application has clear commercial potential and/or a large existing
user base

CFP: Workshop on Knowledge Injection into and Extraction from Linked Data (KIELD2010)

2010-05-31 Thread François Scharffe

[Apologies for cross-posting]

CALL FOR PAPERS - Workshop on Knowledge Injection into and Extraction
from Linked Data (KIELD2010)
Co-located with EKAW 2010 - 11th to 15th October 2010
Website: http://ontologydesignpatterns.org/wiki/Odp:KIELD2010
Submissions:

KEY DATES
Submission deadline:   July 15th
Authors notified:      August 9th
Camera-ready version:  August 25th
Workshop:              October 11th or 15th

The KIELD workshop aims at gathering three prominent sub-communities of
Knowledge Engineering and Management: Knowledge Modelling, Knowledge
Discovery and Linked Data. The rapid growth of the Linked Data cloud, in
parallel with on-the-fly design of relevant vocabularies, presents new
opportunities for traditional research disciplines.

The workshop welcomes research, application, and position papers on the
following topics (non-exclusive):

* Ontology engineering for Linked Data
  - Methodologies
  - Ontology pattern extraction
  - Ontology pattern identification and discovery
  - Pattern-based triplification
  - Anti-patterns or worst practices
* Data mining from Linked Data
  - Entity recognition
  - Link prediction
  - Pattern mining
  - Sequential patterns
  - Rule mining
* Linked Data in use
  - Domain applications based on Linked Data
  - Linked Data exploitation
  - Interaction with Linked Data

There will be two categories of papers:
* Regular research and application papers of up to 12 pages
* Position papers of up to 6 pages
All submissions should indicate into which category they fall.
N.B.  The workshop is organised in association with EKAW
(http://ekaw2010.inesc-id.pt/).
In order to register for KIELD2010, it is also necessary to register for
EKAW 2010.