Re: Schema Mappings (was Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices)
Leigh,

On 22/10/2010 17:23, Leigh Dodds wrote:
> Hi,
>
> On 22 October 2010 09:35, Chris Bizer wrote:
>>> Anja has pointed to a wealth of openly
>>> available numbers (no pun intended), that have not been discussed at all.
>>> For example, only 7.5% of the data source provide a mapping of "proprietary
>>> vocabulary terms" to "other vocabulary terms". For anyone building
>>> applications to work with LOD, this is a real problem.
>>
>> Yes, this is also the figure that scared me most.
>
> This might be low for a good reason: people may be creating
> proprietary terms because they don't feel well served by existing
> vocabularies and hence defining mappings (or even just reusing terms)
> may be difficult or even impossible.
>
> This also strikes me as an opportunity: someone could usefully build a
> service (perhaps built on facilities in Sindice) that aggregated
> schema information and provides tools for expressing simple mappings
> and equivalencies. It could fill a dual role: recommend more
> common/preferred terms, whilst simultaneously providing
> machine-readable equivalencies.

This sounds very much like what an ontology alignment server does: it provides alignments [often synonymous with mappings] on demand (given two ontology URIs), either by retrieving locally stored alignments, or by asking another alignment server for an alignment that it may have, or by computing the alignment on the fly, using a direct matching algorithm or the aggregation (e.g., composition) of existing alignments. The alignment server can also be used for various other things, such as comparing alignments, evaluating them, rating them, updating them, etc.

A paper describing the Alignment server [1] has been submitted to the Semantic Web Journal and is under open review (you can read the paper and the reviews and submit your own reviews or comments). The server itself can be downloaded and installed anywhere [2].
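As a rough illustration of the composition case mentioned above, the following Python sketch derives an A-to-C alignment from an A-to-B and a B-to-C alignment. It does not use the Alignment API or the Alignment server; the `Correspondence` type, the `compose` function, the example URIs, the restriction to the two relations "=" and "<", and the choice to multiply confidences are all assumptions made for this example.

```python
from typing import List, NamedTuple


class Correspondence(NamedTuple):
    """One mapping between two terms (hypothetical structure)."""
    source: str        # term URI in the first ontology
    target: str        # term URI in the second ontology
    relation: str      # "=" (equivalent) or "<" (source is subsumed by target)
    confidence: float  # value in [0, 1]


def compose(ab: List[Correspondence], bc: List[Correspondence]) -> List[Correspondence]:
    """Compose an A->B alignment with a B->C alignment into an A->C alignment."""
    # Index the second alignment by its source term so the join is cheap.
    by_source = {}
    for c in bc:
        by_source.setdefault(c.source, []).append(c)

    result = []
    for first in ab:
        for second in by_source.get(first.target, []):
            # Two equivalences compose to an equivalence; any chain that
            # contains a subsumption only guarantees a subsumption.
            relation = "=" if first.relation == second.relation == "=" else "<"
            # Assumption: confidences are combined by multiplication.
            confidence = first.confidence * second.confidence
            result.append(Correspondence(first.source, second.target, relation, confidence))
    return result


ab = [Correspondence("http://example.org/a#Writer", "http://example.org/b#Author", "=", 0.9)]
bc = [Correspondence("http://example.org/b#Author", "http://example.org/c#Person", "<", 0.8)]
for c in compose(ab, bc):
    print(c.source, c.relation, c.target, round(c.confidence, 2))
```

Taking the minimum of the two confidences instead of the product would be an equally defensible choice; the point is only that a composed alignment is usually less certain than its inputs.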
> I know that Uberblic provides some mapping tools in this area,
> allowing for the creation of a more normalized view across the web,
> but not sure how much of that is resurfaced.

There are literally dozens of systems for ontology matching or schema mapping, most of which can be used, to varying degrees, for Web ontologies. Every year, a competition is organised [3] to evaluate ontology matching tools; it features various tests, among them several OWL ontology matching tasks. The output is a ranked list of equivalence or subsumption relations between the terms of the input ontologies.

These tools are often unknown to LOD enthusiasts, although they could be obtained from their authors and tested on concrete cases. On the other side, the ontology matching crowd is always eager to find concrete applications to test their tools on real-life problems.

More information and some 500+ publications on the topic can be found on ontologymatching.org [4].

Recall that ontology matching has its roots in schema matching, which is, as Enrico Motta just said on this list, a 30-year-old topic.

[1] Jérôme Euzenat and Chan Le Duc. The Alignment server: storing and sharing alignments on the semantic web. http://www.semantic-web-journal.net/content/new-submission-alignment-server-storing-and-sharing-alignments-semantic-web
[2] Alignment API and Alignment Server. http://alignapi.gforge.inria.fr/
[3] The Ontology Alignment Evaluation Initiative (OAEI). http://oaei.ontologymatching.org/
[4] http://www.ontologymatching.org/

Regards,
-- 
Antoine Zimmermann
Researcher at:
Laboratoire d'InfoRmatique en Image et Systèmes d'information
Database Group
7 Avenue Jean Capelle
69621 Villeurbanne Cedex
France

Lecturer at:
Institut National des Sciences Appliquées de Lyon
20 Avenue Albert Einstein
69621 Villeurbanne Cedex
France
antoine.zimmerm...@insa-lyon.fr
http://zimmer.aprilfoolsreview.com/
Re: Schema Mappings (was Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices)
Hi Leigh and Enrico,

> Hi,
>
> On 22 October 2010 09:35, Chris Bizer wrote:
>>> Anja has pointed to a wealth of openly
>>> available numbers (no pun intended), that have not been discussed at all.
>>> For example, only 7.5% of the data source provide a mapping of "proprietary
>>> vocabulary terms" to "other vocabulary terms". For anyone building
>>> applications to work with LOD, this is a real problem.
>>
>> Yes, this is also the figure that scared me most.
>
> This might be low for a good reason: people may be creating
> proprietary terms because they don't feel well served by existing
> vocabularies and hence defining mappings (or even just reusing terms)
> may be difficult or even impossible.

Yes, this is true in many cases and at a given point in time. But altogether I think it is important to see web-scale data integration in a more evolutionary fashion, in which different factors play together over time.

In my opinion these factors are:

1. An increasing number of people are starting to use existing vocabularies, which already solves the integration problem in some areas simply by agreement on these vocabularies.
2. More and more instance data is becoming available on the Web, which makes it easier to mine schema mappings using statistical methods.
3. Different groups in various areas want to contribute to solving the integration problem and thus invest effort in manually aligning vocabularies (for instance between different standards used in the libraries community, or for people- and provenance-related vocabularies within the W3C Social Web and Provenance XGs).
4. The Web allows you to share mappings by publishing them as RDF. Thus many different people and groups may provide small contributions (= hints) that help to solve the problem in the long run.

My thinking on the topic was strongly influenced by the pay-as-you-go data integration ideas developed by Alon Halevy and other people in the dataspaces community.
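To make factor 2 above a little more concrete, here is a minimal Python sketch of one very simple statistical approach: two properties from different vocabularies are proposed as a candidate mapping when they attach largely the same values to the same resources. This is not the method of any system mentioned in this thread; the function name, the Jaccard measure, the threshold, the prefixed names, and the assumption that both datasets already use the same resource identifiers are all made up for the example.

```python
from collections import defaultdict
from itertools import product


def mine_property_mappings(triples_a, triples_b, threshold=0.5):
    """Propose property correspondences from overlapping (subject, value) pairs.

    Each input is an iterable of (subject, property, value) tuples.
    Assumption: subjects are already co-referenced across the two datasets.
    """
    pairs_a = defaultdict(set)
    pairs_b = defaultdict(set)
    for s, p, o in triples_a:
        pairs_a[p].add((s, o))
    for s, p, o in triples_b:
        pairs_b[p].add((s, o))

    candidates = []
    for (pa, set_a), (pb, set_b) in product(pairs_a.items(), pairs_b.items()):
        # Jaccard similarity of the two properties' (subject, value) sets.
        score = len(set_a & set_b) / len(set_a | set_b)
        if score >= threshold:
            candidates.append((pa, pb, score))
    # Best-supported candidates first; these are hints, not asserted equivalences.
    return sorted(candidates, key=lambda c: c[2], reverse=True)


triples_a = [
    ("ex:b1", "a:title", "Dune"),
    ("ex:b2", "a:title", "Emma"),
    ("ex:b1", "a:writer", "Herbert"),
]
triples_b = [
    ("ex:b1", "b:name", "Dune"),
    ("ex:b2", "b:name", "Emma"),
    ("ex:b3", "b:name", "Ulysses"),
    ("ex:b1", "b:creator", "Herbert"),
]
for pa, pb, score in mine_property_mappings(triples_a, triples_b):
    print(pa, pb, round(score, 2))
```

The scores are only evidence: a consumer would still have to decide how much overlap is enough before treating a candidate as a usable mapping.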
A cool paper on the topic is, in my opinion:

Madhavan, J.; Cohen, S.; Dong, X.; Halevy, A.; Jeffery, S.; Ko, D.; Yu, C.: Web-Scale Data Integration: You Can Afford to Pay as You Go. CIDR (2007)
http://research.yahoo.com/files/paygo.pdf

It describes a system that applies schema clustering in order to mine mappings from Google Base and web table data, and presents ideas on how ranking algorithms can deal with the uncertainty that mined mappings introduce.

Other interesting papers in the area are:

Das Sarma, A., Dong, X., Halevy, A.: Bootstrapping pay-as-you-go data integration systems. Proceedings of the Conference on Management of Data, SIGMOD (2008)

Vaz Salles, M.A., Dittrich, J., Karakashian, S.K., Girard, O.R., Blunschi, L.: iTrails: Pay-as-you-go Information Integration in Dataspaces. In: Conference on Very Large Data Bases (VLDB 2007), 663-674 (2007)

Franklin, M.J., Halevy, A.Y., Maier, D.: From databases to dataspaces: A new abstraction for information management. SIGMOD Record 34(4), pp. 27-33 (2005)

Hedeler, C., et al.: Dimensions of Dataspaces. In: Proceedings of the 26th British National Conference on Databases, pp. 55-66 (2009)

These papers generally assume that mappings are added to a dataspace by administrators or mined using a single, specific method.

What I think is interesting in the Web of Linked Data setting is that mappings can be created and published by different parties to a single global dataspace. This means that the effort needed to create the mappings can be divided between different parties. So pay-as-you-go might evolve into somebody-pay-as-you-go :-)

Of course, it also means that the quality of mappings becomes increasingly uncertain, and that the information consumer needs to assess the quality of mappings and decide which ones to use.

We are currently exploring this problem space and will present a paper about publishing and discovering mappings on the Web of Linked Data at the COLD workshop at ISWC 2010.
http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/research/publications/BizerSchultz-COLD-R2R-Paper.pdf

Central ideas of the paper are that:

1. You identify mappings with URIs so that they can be interlinked from vocabulary definitions or voiD dataset descriptions, and so that client applications as well as Web of Data search engines can discover them.
2. A client application which discovers data that is represented using terms that are unknown to the application may search the Web for mappings, apply a quality evaluation heuristic to decide which of the alternative mappings to use, and then apply the chosen mappings to translate the data to its local schema.

> This also strikes me as an opportunity: someone could usefully build a
> service (perhaps built on facilities in Sindice) that aggregated
> schema information and provides tools for expressing simple mappings
> and equivalencies. It could fill a dual role: recommend more
> common/preferred terms, whilst simultaneously providing
> machine-readable equivalencies.

Absolutely, there migh
Re: Schema Mappings (was Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices)
Hi Antoine,

On Friday, October 22, 2010, Antoine Zimmermann wrote:
> On 22/10/2010 17:23, Leigh Dodds wrote:
>> This also strikes me as an opportunity: someone could usefully build a
>> service (perhaps built on facilities in Sindice) that aggregated
>> schema information and provides tools for expressing simple mappings
>> and equivalencies. It could fill a dual role: recommend more
>> common/preferred terms, whilst simultaneously providing
>> machine-readable equivalencies.
>
> This sounds very much like what an ontology alignment server is doing:
> it provides alignments [often synonym with mappings] on demand (given
> two ontology URIs), either by retrieving locally stored alignments, or
> by asking another alignment server for an alignment that it may have, or
> by computing the alignment on the fly, given a certain direct matching
> algorithm or from the aggregation (e.g., composition) of existing
> alignments. The alignment server can also be used for various other
> things such as comparing alignments, evaluating them, rating them,
> updating them, etc.
>
> A paper describing the Alignment server [1] has been submitted to the
> Semantic Web Journal and is under open review (you can read the paper
> and the reviews and submit your own reviews or comments). The server
> itself can be downloaded and installed anywhere [2].

Interesting, thanks for the reference.

I was aware that there has been, and continues to be, a lot of research in this area, but was just wondering out loud whether anyone has explored opening up some kind of matching service on a more production footing, either as an automated service or using crowd-sourced mappings.

Running the tools locally and exploring their effectiveness would be an interesting exercise. But presumably there will be a need to start surfacing some services in this area soon, as part of the general semweb infrastructure.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com
Schema Mappings (was Re: AW: ANN: LOD Cloud - Statistics and compliance with best practices)
Hi,

On 22 October 2010 09:35, Chris Bizer wrote:
>> Anja has pointed to a wealth of openly
>> available numbers (no pun intended), that have not been discussed at all.
>> For example, only 7.5% of the data source provide a mapping of "proprietary
>> vocabulary terms" to "other vocabulary terms". For anyone building
>> applications to work with LOD, this is a real problem.
>
> Yes, this is also the figure that scared me most.

This might be low for a good reason: people may be creating proprietary terms because they don't feel well served by existing vocabularies and hence defining mappings (or even just reusing terms) may be difficult or even impossible.

This also strikes me as an opportunity: someone could usefully build a service (perhaps built on facilities in Sindice) that aggregated schema information and provides tools for expressing simple mappings and equivalencies. It could fill a dual role: recommend more common/preferred terms, whilst simultaneously providing machine-readable equivalencies.

I know that Uberblic provides some mapping tools in this area, allowing for the creation of a more normalized view across the web, but not sure how much of that is resurfaced.

Cheers,

L.

-- 
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh.do...@talis.com
http://www.talis.com