2nd CfP: Semantic Web Challenge @ ISWC 2010
Dear all,

This is a reminder that the submission deadline for the Semantic Web Challenge 2010 is approaching: October 1st, 2010, 12 a.m. (midnight) CET.

The Semantic Web Challenge 2010 is co-located with the 9th International Semantic Web Conference (ISWC 2010) in Shanghai, China. As last year, the challenge consists of two tracks: the Open Track and the Billion Triples Track, which requires participants to make use of the data set that has been crawled from the public Semantic Web. The data set consists of 3.2 billion triples this year and can be downloaded from the challenge's website.

The Call for Participation can be found below. More information about the Challenge is provided at http://challenge.semanticweb.org/

We are looking forward to your submissions, which we hope will again make the Semantic Web Challenge one of the most exciting events at ISWC.

Best regards,
Diana and Chris

--

Call for Participation for the 8th Semantic Web Challenge
at the 9th International Semantic Web Conference ISWC 2010
Shanghai, China, November 7-11, 2010
http://challenge.semanticweb.org/

--

Introduction

Submissions are now invited for the 8th annual Semantic Web Challenge, the premier event for demonstrating practical progress towards achieving the vision of the Semantic Web. The central idea of the Semantic Web is to extend the current human-readable Web by encoding some of the semantics of resources in a machine-processable form. Moving beyond syntax opens the door to more advanced applications and functionality on the Web. Computers will be better able to search, process, integrate, and present the content of these resources in a meaningful, intelligent manner.

As the core technological building blocks are now in place, the next challenge is to demonstrate the benefits of semantic technologies by developing integrated, easy-to-use applications that can provide new levels of Web functionality for end users on the Web or within enterprise settings. Applications submitted should give evidence of clear practical value that goes above and beyond what is possible with conventional web technologies alone.

As in previous years, the Semantic Web Challenge 2010 will consist of two tracks: the Open Track and the Billion Triples Track. The key difference between the two tracks is that the Billion Triples Track requires the participants to make use of the data set (consisting of 3.2 billion triples this year) that has been crawled from the Web and is provided by the organizers. The Open Track has no such restrictions. As before, the Challenge is open to everyone from industry and academia. The authors of the best applications will be awarded prizes and featured prominently at special sessions during the conference.

The overall goal of this event is to advance our understanding of how Semantic Web technologies can be exploited to produce useful applications for the Web. Semantic Web applications should integrate, combine, and deduce information from various sources to assist users in performing specific tasks.

--

Challenge Criteria

The Challenge is defined in terms of minimum requirements and additional desirable features that submissions should exhibit. The minimum requirements and the additional desirable features are listed below per track.

Open Track

Minimal requirements

1. The application has to be an end-user application, i.e. an application that provides a practical value to general Web users or, if this is not the case, at least to domain experts.
2. The information sources used should be under diverse ownership or control, should be heterogeneous (syntactically, structurally, and semantically), and should contain substantial quantities of real-world data (i.e. not toy examples). The meaning of data has to play a central role.

3. Meaning must be represented using Semantic Web technologies.

4. Data must be manipulated/processed in interesting ways to derive useful information, and this semantic information processing has to play a central role in achieving things that alternative technologies cannot do as well, or at all.

Additional Desirable Features

In addition to the above minimum requirements, we note other desirable features that will be used as criteria to evaluate submissions.

1. The application provides an attractive and functional Web interface (for human users).

2. The application should be scalable (in terms of the amount of data used and in terms of distributed components working together). Ideally, the application should use all data that is currently published on the Semantic Web.

3. Rigorous evaluations have taken place that demonstrate the benefits of semantic technologies, or validate the results obtained.

4. Novelty, in applying semantic technology to a domain or task that has not been considered before.
Re: Best Practices for Converting CSV into LOD?
I gave this a shot in a previous version of Hyena. By prepending one or more special rows, one could control how the columns were converted: what predicate to use and how to convert the content. If a column specification was missing, defaults were used. There were several options: if a cell value was similar to a tag, resources could be auto-created (the cell value became the resource label; existing resources were looked up via their labels). One could also split a cell value prior to processing it (to account for multiple values per column).

Creating meaningful URIs for predicates and rows (resources) is especially important, but tricky. Ideally, import would work bi-directionally (and idempotently): changes you make in RDF can be written back to the spreadsheet, and changes in the spreadsheet can be re-imported without causing chaos.

Even though my solution worked OK and I do not see how it could be done better, I was not completely happy with it, because writing this kind of CSV/RDF mapping is beyond the capabilities of normal end users. One could automatically create URIs for predicates from column titles, but as for reliable URIs (primary keys), I am at a loss. So it seems one is stuck with letting an expert write an import specification and hiding it from end users. In that case, my solution of embedding such a spec in the spreadsheet should be rethought, and a simple script might be a better solution than a complex specification language that handles all the special cases. For example, I hadn't even thought about two cells contributing to the same literal. Maybe a JVM-hosted scripting language (such as Jython) could be used, but even raw Java is not so bad and has the advantage of superior tool support.

This is important stuff, as many people have all kinds of lists in Excel, which would make great LOD data. It also shows that spreadsheets are hard to beat when it comes to getting started quickly: you just enter your data. Should someone come up with a simpler way of translating CSV data, that might translate into general usability improvements for entering LOD data.
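To make the spec-row idea concrete, here is a minimal sketch in Python with rdflib. It is not Hyena's actual code or format: the two-row header convention (row 1 = column titles, row 2 = predicate URIs), the example.org namespace, and the "|" separator for multi-valued cells are all illustrative assumptions.

import csv
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/")  # placeholder namespace, not Hyena's

def csv_to_rdf(path):
    g = Graph()
    g.bind("ex", EX)
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    titles, spec, data = rows[0], rows[1], rows[2:]
    # One predicate per column: an explicit URI from the spec row if given,
    # otherwise a default derived from the column title.
    predicates = [URIRef(p) if p else EX[title.replace(" ", "_")]
                  for title, p in zip(titles, spec)]
    for i, row in enumerate(data):
        # The weak point mentioned above: no reliable primary key,
        # so subjects are minted from the row index.
        subject = EX["row/%d" % i]
        for pred, cell in zip(predicates, row):
            for value in cell.split("|"):  # allow multiple values per cell
                if value.strip():
                    g.add((subject, pred, Literal(value.strip())))
    return g

print(csv_to_rdf("data.csv").serialize(format="turtle"))

Note how the row-index subjects make the import non-idempotent: re-importing after inserting a spreadsheet row shifts every URI, which is exactly the primary-key problem described above.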
On Aug 9, 2010, at 18:37, Wood, Jamey wrote:

Are there any established best practices for converting CSV data into LOD-friendly RDF? For example, I would like to produce an LOD-friendly RDF version of the "2001 - Present Net Generation by State by Type of Producer by Energy Source" CSV data at: http://www.eia.doe.gov/cneaf/electricity/epa/epa_sprdshts_monthly.html

I'm attaching a sample of a first stab at this. Questions I'm running into include the following:

1. Should one try to convert primitive data types (particularly strings) into URI references? Or just leave them as primitives? Or perhaps provide both (with separate predicate names)? For example, the sample EIA data I reference has two-letter state abbreviations in one column. Should those be left alone or converted into URIs?

2. Should one merge separate columns from the original data in order to align to well-known RDF types? For example, the sample EIA data has separate Year and Month columns. Should those be merged in the RDF version so that an xs:gYearMonth type can be used?

3. Should one attempt to introduce some sort of hierarchical structure (to make the LOD more browseable)? The skos:related triples in the attached sample are an initial attempt to do that. Is this a good idea? If so, is that a reasonable predicate to use? If it is a reasonable thing to do, we would presumably craft these triples so that one could navigate through the entire LOD (e.g. state - state/year - state/year/month - state/year/month/typeOfProducer - state/year/month/typeOfProducer/energySource).

4. Any other considerations that I'm overlooking?

Thanks,
Jamey

generation_state_mon.rdf

--
Dr. Axel Rauschmayer
axel.rauschma...@ifi.lmu.de
http://hypergraphs.de/

### Hyena: organize your ideas, free at hypergraphs.de/hyena/
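For questions 1-3 above, one possible shape for the resulting triples, sketched with rdflib. The http://example.org/eia/ namespace and the URI scheme are invented for illustration (they are not an EIA vocabulary), and whether skos:related is the best predicate for the hierarchy remains exactly the open question.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import SKOS, XSD

EX = Namespace("http://example.org/eia/")  # invented namespace

g = Graph()
g.bind("skos", SKOS)
g.bind("ex", EX)

state = EX["state/CO"]          # Q1: mint a URI for the state...
g.add((state, EX.abbreviation, Literal("CO")))  # ...and keep the literal too
month = EX["state/CO/2010-07"]  # one resource per state/year-month level
# Q2: Year and Month columns merged into a single xs:gYearMonth literal.
g.add((month, EX.period, Literal("2010-07", datatype=XSD.gYearMonth)))
# Q3: link the hierarchy levels so the data set can be browsed step by step.
g.add((state, SKOS.related, month))

print(g.serialize(format="turtle"))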
Re: Best Practices for Converting CSV into LOD?
You may want to look at irON [1] and its commON [2] format. The specs provide guidance on our approach to your questions. We use it all the time (as do our clients), and it works great. Fred Giasson also just completed a dataset append Web service that integrates with it for incremental updates.

Thanks, Mike

[1] http://openstructs.org/iron
[2] http://techwiki.openstructs.org/index.php/CommON_Case_Study

On 8/9/2010 2:12 PM, Axel Rauschmayer wrote:
[...]