[CFP] Third International Workshop on Semantic Statistics (SemStats 2015)

Sarven Capadisli Wed, 29 Apr 2015 03:45:20 -0700

SemStats 2015 Call for Papers
=============================


Third International Workshop on Semantic Statistics (SemStats 2015)

Workshop website: http://semstats.org/
Event hashtags: #SemStats #ISWC2015

in conjunction with

ISWC 2015
The 14th International Semantic Web Conference
Bethlehem - USA, October 11-15, 2015
http://iswc2015.semanticweb.org/

Workshop Summary
================

The goal of this workshop is to explore and strengthen the relationship between the Semantic Web and statistical communities, to provide better access to the data held by statistical offices. It will focus on ways in which statisticians can use Semantic Web technologies and standards in order to formalize, publish, document and link their data and metadata, and also on how statistical methods can be applied to linked data. It follows two very successful editions of the Semantic Statistics workshop held at ISWC 2013 (SemStats 2013) and at ISWC 2014 (SemStats 2014).

The statistical community has recently shown an interest in the Semantic Web. In particular, initiatives have been launched to develop semantic vocabularies representing statistical classifications and discovery metadata. Tools are also being created by statistical organizations to support the publication of dimensional data conforming to the Data Cube W3C Recommendation. But statisticians see challenges in the Semantic Web: how can data and concepts be linked in a statistically rigorous fashion? How can we avoid fuzzy semantics leading to wrong analyses? How can we preserve data confidentiality?

The workshop will also cover the question of how to apply statistical methods or treatments to linked data, and how to develop new methods and tools for this purpose. Except for visualisation techniques and tools, this question is relatively unexplored, but the subject will obviously grow in importance in the near future.


Motivation
==========

There is a growing interest in linked data and the semantic web in the statistical community. A large amount of statistical data from international and national agencies has already been published on the web of data, for example Census data from Ireland, Italy or France amongst others. In most cases, though, this publication is done by actors external to the statistical office (see in particular http://270a.info/, http://eurostat.linkedstatistics.org/ or http://linkedstatistics.gr/), which raises issues such as long-term URI persistence, institutional commitment and data maintenance.

Statistical organizations also possess an important corpus of structural metadata such as concept schemes, thesauri, code lists and classifications. Some of those are already available as linked data, generally in SKOS format (e.g. FAO's Agrovoc or UN's COFOG). Semantic web standards useful to statisticians have now reached maturity. The best examples are the W3C Data Cube, DCAT and ADMS vocabularies. The statistical community is also working on the definition of more specialized vocabularies, especially under the umbrella of the DDI Alliance. For example, XKOS extends SKOS for the representation of statistical classifications, and Disco defines a vocabulary for data documentation and discovery. The Visual Analytics Vocabulary is a first step towards semantic descriptions for user interface components developed to visualize Linked Statistical Data, which can lead to increased linked data consumption and accessibility. We are now at the tipping point where the statistical and the semantic web communities have to exchange formally in order to share experiences and tools and think ahead regarding the upcoming challenges.

The web of data will benefit from getting rich data published by professional and trustworthy data providers. It is also important that metadata maintained by statistical offices, like concept schemes of economic or societal terms, statistical classifications, well-known codes, etc., are available as linked data, because they are of good quality, well maintained, and they constitute a corpus to which a lot of other data can refer.

Statisticians have a long-standing culture of data integrity, quality and documentation. They have developed industrialized data production and publication processes, and they care about data confidentiality and more generally how data can be used. It seems that after a period where the aim was to publish as many triples as possible, the focus of the Semantic Web community is now shifting to having a better quality of data and metadata, more coherent vocabularies (see the LOV initiative), good and documented naming patterns, etc. This workshop aims to contribute to these longer-term problems in order to have a significant impact.

The statistics community sometimes faces challenges when trying to adopt Semantic Web technologies, in particular:

* difficulty in creating and publishing linked data: this can be alleviated by providing methods, tools, lessons learned and best practices, by publicizing successful examples and by providing support.
* difficulty in seeing the purpose of publishing linked data: we must develop end-user tools leveraging statistical linked data, provide convincing examples of real use in applications or mashups, so that the end-user value of statistical linked data and metadata appears more clearly.
* difficulty in using external linked data in their daily activity: it is important to develop statistical methods and tools especially tailored for linked data, so that statisticians can get accustomed to using them and become convinced of their specific utility.

To conclude, statisticians know how misleading it can be to exploit semantic connections without carefully considering and weighing information about the quality of these connections, the validity of inferences, etc. A challenge for them is to determine, to ensure and to inform consumers about the quality of semantic connections which may be used to support analysis in some circumstances but not others. The workshop will enable participants to discuss these very important issues.


Topics
======

The workshop will address topics related to statistics and linked data. This includes but is not limited to:


How to publish linked statistics?

* What are the relevant vocabularies for the publication of statistical data?
* What are the relevant vocabularies for the publication of statistical metadata (code lists and classifications, descriptive metadata, provenance and quality information, etc.)?
* What are the existing tools? Can the usual statistical software packages (e.g. R, SAS, Stata) do the job?
* How do we include linked data production and publication in the data lifecycle?
* How do we establish, document and share best practices?

How to use linked data for statistics?

* Where and how can we find statistical data: data catalogues, dataset descriptions, data discovery?
* How do we assess data quality (collection methodology, traceability, etc.)?
* How can we perform data reconciliation, ontology matching and instance matching with statistical data?
* How can we apply statistical processes on linked data: data analysis, descriptive statistics, estimation, correction?
* How can we intuitively represent statistical linked data: visual analytics, results of data mining?


Submissions
===========

This workshop is aimed at an interdisciplinary audience of researchers and practitioners involved or interested in Statistics and the Semantic Web. All papers must represent original and unpublished work that is not currently under review. Papers will be evaluated according to their significance, originality, technical content, style, clarity, and relevance to the workshop. At least one author of each accepted paper is expected to attend the workshop.

Workshop participation is available to ISWC 2015 attendees at an additional cost; see http://iswc2015.semanticweb.org/registration for details.

The workshop will also feature a challenge based on Census Data published on the web or provided by Statistical Institutes. It is expected that data from Australia, France and Italy will be available. The challenge will consist of producing mashups or visualizations, as well as comparisons, alignment and enrichment of the data and concepts involved.


We welcome the following types of contributions:

* Full research papers (up to 12 pages)
* Short papers (up to 6 pages)
* Challenge papers (up to 6 pages)

All submissions must be written in English and must be formatted according to the information for LNCS Authors (see http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0). Please note that (X)HTML(+RDFa) submissions are also welcome as long as the layout complies with the LNCS style. Authors can for example use the template provided at https://github.com/csarven/linked-research. Submissions are NOT anonymous. Please submit your contributions electronically in PDF format at http://www.easychair.org/conferences/?conf=semstats2015 before July 15, 2015, 23:59 Hawaii Time. All accepted papers will be archived in electronic proceedings published by CEUR-WS.org.

If you are interested in submitting a paper but would like more preliminary information, please contact semstats2...@easychair.org.


Chairs
======

* Sarven Capadisli, University of Bonn, Germany, and Bern University of Applied Sciences, Switzerland
* Franck Cotton, INSEE, France
* Armin Haller, CSIRO, Australia
* Evangelos Kalampokis, CERTH/ITI and University of Macedonia, Greece
* Monica Scannapieco, Istat, Italy
* Raphaël Troncy, EURECOM, France

Program Committee
=================
(To be confirmed)
* Stefano Abbruzzini
* Phil Archer
* Ghislain Atemezing
* Hadley Beeman
* Ric Clarke
* Oscar Corcho
* Richard Cyganiak
* Stefano De Francisci
* Jay Devlin
* Miguel Expósito
* Dan Gillman
* Alberto González Yanes
* Arofan Gregory
* Tudor Groza
* Christophe Guéret
* Andreas Harth
* Jane Hunter
* Haklae Kim
* Yves Jacques
* Laurent Lefort
* Domenico Lembo
* Giorgia Lodi
* Erik Mannens
* Peter Mika
* Marco Pellegrino
* Dave Reynolds
* Bill Roberts
* Hideaki Takeda
* Boris Villazon Terrazas
* Wendy Thomas
* Bernard Vatant
* Joachim Wackerow
* Stuart Williams
