2nd CfP: Semantic Web Challenge @ ISWC 2010

2010-08-09 Thread Chris Bizer
Dear all,

this is a reminder that the submission deadline for the Semantic Web
Challenge 2010 is slowly approaching. The submission deadline is

October 1st, 2010, 12 a.m. (midnight) CET

The Semantic Web Challenge 2010 is collocated with the 9th International
Semantic Web Conference (ISWC2010) in Shanghai, China. As in previous years, the
challenge consists of two tracks: the Open Track and the Billion Triples
Track, which requires participants to make use of the data set that has been
crawled from the public Semantic Web. The data set consists of 3.2 billion
triples this year and can be downloaded from the challenge's website.   

The Call for Participation can be found below. More information about the
Challenge is provided at

http://challenge.semanticweb.org/

We are looking forward to your submissions, which we hope will once again make
the Semantic Web Challenge one of the most exciting events at ISWC.

Best regards,

Diana and Chris



--

Call for Participation for the 

8th Semantic Web Challenge 

at the 9th International Semantic Web Conference ISWC 2010 
Shanghai, China, November 7-11, 2010 

http://challenge.semanticweb.org/

--

Introduction

Submissions are now invited for the 8th annual Semantic Web Challenge, the
premier event for demonstrating practical progress towards achieving the
vision of the Semantic Web. The central idea of the Semantic Web is to
extend the current human-readable Web by encoding some of the semantics of
resources in a machine-processable form. Moving beyond syntax opens the door
to more advanced applications and functionality on the Web. Computers will
be better able to search, process, integrate and present the content of
these resources in a meaningful, intelligent manner. 

As the core technological building blocks are now in place, the next
challenge is to demonstrate the benefits of semantic technologies by
developing integrated, easy to use applications that can provide new levels
of Web functionality for end users on the Web or within enterprise settings.
Applications submitted should give evidence of clear practical value that
goes above and beyond what is possible with conventional web technologies
alone. 

As in previous years, the Semantic Web Challenge 2010 will consist of two
tracks: the Open Track and the Billion Triples Track. The key difference
between the two tracks is that the Billion Triples Track requires the
participants to make use of the data set (consisting of 3.2 billion triples
this year) that has been crawled from the Web and is provided by the
organizers. The Open Track has no such restrictions. As before, the
Challenge is open to everyone from industry and academia. The authors of the
best applications will be awarded prizes and featured prominently at special
sessions during the conference. 

The overall goal of this event is to advance our understanding of how
Semantic Web technologies can be exploited to produce useful applications
for the Web. Semantic Web applications should integrate, combine, and deduce
information from various sources to assist users in performing specific
tasks. 

---
Challenge Criteria

The Challenge is defined in terms of minimum requirements and additional
desirable features that submissions should exhibit. The minimum requirements
and the additional desirable features are listed below per track. 

Open Track

Minimum Requirements

1. The application has to be an end-user application, i.e. an application
that provides a practical value to general Web users or, if this is not the
case, at least to domain experts. 
2. The information sources used should be under diverse ownership or control,
should be heterogeneous (syntactically, structurally, and semantically), and
should contain substantial quantities of real-world data (i.e. not toy
examples). The meaning of data has to play a central role. 
3. Meaning must be represented using Semantic Web technologies. 
4. Data must be manipulated/processed in interesting ways to derive useful
information, and this semantic information processing has to play a central
role in achieving things that alternative technologies cannot do as well, or
at all. 

Additional Desirable Features 

In addition to the above minimum requirements, we note other desirable
features that will be used as criteria to evaluate submissions. 

1. The application provides an attractive and functional Web interface (for
human users). 
2. The application should be scalable (in terms of the amount of data used
and in terms of distributed components working together). Ideally, the
application should use all data that is currently published on the Semantic
Web. 
3. Rigorous evaluations have taken place that demonstrate the benefits of
semantic technologies, or validate the results obtained. 
4. Novelty, in applying semantic technology to a domain or task that has not
been considered before. 

Re: Best Practices for Converting CSV into LOD?

2010-08-09 Thread Axel Rauschmayer
I gave this a shot in a previous version of Hyena. By prepending one or more 
special rows, one could control how the columns were converted: what predicate 
to use, how to convert the content. If a column specification was missing, 
defaults were used. There were several options: If a cell value was similar to 
a tag, resources could be auto-created (the cell value became the resource 
label, existing resources were looked up via their labels). One could also 
split a cell value prior to processing it (to account for multiple values per 
column).
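
A rough sketch of that idea (not Hyena's actual format: the directive syntax,
the ex: namespace, and the column names are all invented for illustration):

# Sketch: a CSV whose second row carries per-column mapping directives.
import csv
import io

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

EX = Namespace("http://example.org/")  # hypothetical namespace

DATA = """name,tags,year
# predicate=ex:name,predicate=ex:tag split=;,predicate=ex:year type=gYear
Alice,semweb;rdf,2010
Bob,linkeddata,2009
"""

def parse_directive(cell):
    # Turn "predicate=ex:tag split=;" into {"predicate": "ex:tag", "split": ";"}.
    opts = {}
    for part in cell.lstrip("# ").split():
        key, _, value = part.partition("=")
        opts[key] = value
    return opts

def convert(csv_text):
    g = Graph()
    g.bind("ex", EX)
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, directive_row, data = rows[0], rows[1], rows[2:]
    directives = [parse_directive(cell) for cell in directive_row]
    for i, row in enumerate(data):
        subject = EX[f"row/{i}"]  # naive positional row URI; stable keys are the hard part
        for cell, opts, column in zip(row, directives, header):
            predicate = EX[opts.get("predicate", column).split(":")[-1]]
            values = cell.split(opts["split"]) if "split" in opts else [cell]
            for value in values:
                datatype = XSD.gYear if opts.get("type") == "gYear" else None
                g.add((subject, predicate, Literal(value, datatype=datatype)))
    return g

print(convert(DATA).serialize(format="turtle"))

Missing directives fall back to the column title, which corresponds to the
defaults described above; the tag lookup (reusing existing resources via their
labels) is omitted here.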

Creating meaningful URIs for predicates and rows (resources) is especially 
important, but tricky. Ideally, import would work bi-directionally (and 
idempotently): Changes you make in RDF can be written back to the spreadsheet, 
changes in the spreadsheet can be reimported without causing chaos.
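
A stable, deterministic scheme for the row URIs is one way to make re-import
idempotent; a minimal sketch, assuming hand-picked key columns and a made-up
base URI:

# Sketch: mint the same subject URI every time the same logical row is imported.
from urllib.parse import quote

BASE = "http://example.org/net-generation/"  # hypothetical base URI

def row_uri(row, key_columns=("state", "year", "month")):
    # The key columns have to be chosen by someone who knows the data.
    key = "/".join(quote(str(row[c]).strip().lower()) for c in key_columns)
    return BASE + key

print(row_uri({"state": "CO", "year": 2010, "month": 7, "megawatthours": 123}))
# http://example.org/net-generation/co/2010/7

Re-importing then overwrites the same subjects instead of creating new ones;
without a natural key in the data, though, one is back to positional or
generated identifiers.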

Even though my solution worked OK and I do not see how it could be done better, 
I was not completely happy with it, because writing this kind of CSV/RDF 
mapping is beyond the capabilities of normal end users. One could automatically 
create URIs for predicates from column titles, but as for reliable URIs 
(primary keys), I am at a loss. So it seems like one is stuck with letting an 
expert write an import specification and hiding it from end users. In that case, 
my solution of embedding such a spec in the spreadsheet should be re-thought, and 
it seems like a simple script might be a better solution than a complex 
specification language that can handle all the special cases. For example, I 
hadn’t even thought about two cells contributing to the same literal. Maybe a 
JVM-hosted scripting language (such as Jython) could be used, but even raw Java 
is not so bad and has the advantage of superior tool support.
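
As a small illustration of the "simple script" route, the "two cells
contributing to the same literal" case is only a few lines with rdflib; the
column names and the ex: properties below are guesses, not an established
vocabulary:

# Sketch: merge Year and Month cells into a single xsd:gYearMonth literal.
import csv
import io

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

EX = Namespace("http://example.org/eia/")  # hypothetical namespace

DATA = "Year,Month,State,Megawatthours\n2010,7,CO,123456\n"

g = Graph()
g.bind("ex", EX)
for i, row in enumerate(csv.DictReader(io.StringIO(DATA))):
    s = EX[f"observation/{i}"]
    period = f"{int(row['Year']):04d}-{int(row['Month']):02d}"
    g.add((s, EX.period, Literal(period, datatype=XSD.gYearMonth)))
    g.add((s, EX.state, Literal(row["State"])))
    g.add((s, EX.megawatthours, Literal(int(row["Megawatthours"]))))
print(g.serialize(format="turtle"))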

This is important stuff, as many people have all kinds of lists in Excel, which 
would make great LOD data. It also shows that spreadsheets are hard to beat when 
it comes to getting started quickly: you just enter your data. If someone came 
up with a simpler way of translating CSV data, that might translate into general 
usability improvements for entering LOD data.

On Aug 9, 2010, at 18:37 , Wood, Jamey wrote:

 Are there any established best practices for converting CSV data into 
 LOD-friendly RDF?  For example, I would like to produce an LOD-friendly RDF 
 version of the "2001 - Present Net Generation by State by Type of Producer by 
 Energy Source" CSV data at:
 
  http://www.eia.doe.gov/cneaf/electricity/epa/epa_sprdshts_monthly.html
 
 I'm attaching a sample of a first stab at this.  Questions I'm running into 
 include the following:
 
 
 1.  Should one try to convert primitive data types (particularly strings) 
 into URI references?  Or just leave them as primitives?  Or perhaps provide 
 both (with separate predicate names)?  For example, the  sample EIA data I 
 reference has two-letter state abbreviations in one column.  Should those be 
 left alone or converted into URIs?
 2.  Should one merge separate columns from the original data in order to 
 align to well-known RDF types?  For example, the sample EIA data has separate 
 "Year" and "Month" columns.  Should those be merged in the RDF version so 
 that an xs:gYearMonth type can be used?
 3.  Should one attempt to introduce some sort of hierarchical structure (to 
 make the LOD more browseable)?  The skos:related triples in the attached 
 sample are an initial attempt to do that.  Is this a good idea?  If so, is 
 that a reasonable predicate to use?  If it is a reasonable thing to do, we 
 would presumably craft these triples so that one could navigate through the 
 entire LOD (e.g. state -> state/year -> state/year/month -> 
 state/year/month/typeOfProducer -> state/year/month/typeOfProducer/energySource).
 4.  Any other considerations that I'm overlooking?
 
 Thanks,
 Jamey
 generation_state_mon.rdf
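
Regarding questions 1 and 3 above: below is a sketch of one possible output
shape, with the state kept both as a plain literal and as a URI, and a
skos:related link one level up the hierarchy. All URIs in it, including the
DBpedia one, are only illustrative candidates, not recommendations.

# Sketch: URI vs. literal for the state, and one hierarchy link per observation.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import SKOS

EX = Namespace("http://example.org/eia/")  # hypothetical namespace

g = Graph()
g.bind("ex", EX)
g.bind("skos", SKOS)

obs = EX["CO/2010/07/total-electric-power-industry/coal"]
month_node = EX["CO/2010/07"]

# Question 1: keep the original two-letter code and add a URI alongside it.
g.add((obs, EX.stateCode, Literal("CO")))
g.add((obs, EX.state, URIRef("http://dbpedia.org/resource/Colorado")))

# Question 3: link each node to its parent so one can browse
# state -> state/year -> state/year/month -> ...; skos:related works,
# but an explicitly hierarchical property may say more.
g.add((obs, SKOS.related, month_node))

print(g.serialize(format="turtle"))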

-- 
Dr. Axel Rauschmayer
axel.rauschma...@ifi.lmu.de
http://hypergraphs.de/
### Hyena: organize your ideas, free at hypergraphs.de/hyena/






Re: Best Practices for Converting CSV into LOD?

2010-08-09 Thread Mike Bergman
You may want to look at irON [1] and its commON [2] format.  The 
specs provide guidance on our approach to your questions.


We use it all the time (as do our clients) and it works great. 
Fred Giasson also just completed a dataset append Web service 
that integrates with it for incremental updates.


Thanks, Mike

[1] http://openstructs.org/iron
[2] http://techwiki.openstructs.org/index.php/CommON_Case_Study
