Pivotal data conversion/integration... (Was: Mapping Ontologies)

Paolo Castagna Thu, 05 Apr 2012 14:37:37 -0700

Hi Bernie

Bernie Greenberg wrote:
> [...] are you trying to "union" two or more web knowledge databases
> representing parts of the same knowledge? I found this a thankless task.

'thankless' is my new word today. :-)

To understand what you mean, I needed to go to a common place for the English
language (i.e. a dictionary) and read the definition (which fortunately, uses
words I already know).

I agree with you on the adjective, it is thankless.

RDF itself, in relation to information|data|knowledge integration, does not
IMHO offer particular advantages on a 'semantic' level, in particular if|when
people use different vocabularies|schemas|ontologies. RDF helps with merging
datasets at a sort of 'syntactical' level, where merging is trivial (and it
gives you time to think about the 'semantic' :-)). If the data you need to merge
uses the same vocabulary|schema|ontology you are almost done. Otherwise, you are
left on your own, practically. This is just my humble opinion.

By the way, people often disagree on how to model the same thing or how to map
between two ontologies (or translate between two languages)... or how to name
the same thing with the same name (or URI) or on the notion of "same thing".
Trying to automate these tasks is thankless^2.

In relation to data integration/conversion, one approach I think works very well
is what Wikipedia calls 'pivotal conversion' [1]. Data integration and data
conversion between N different formats (or N different languages) is an N^2
problem. But it can be reduced to a linear one simply by adopting a core/common
language. English for humans, TCP/IP for Internet, ? for data.
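
To put rough numbers on the N^2 vs. linear claim, here is a small sketch. The
function names are mine, and it assumes one converter per ordered pair of
formats in the direct case, and one reader plus one writer per format (with the
pivot being a separate, additional format) in the pivotal case:

```python
def direct_converters(n: int) -> int:
    """Converters needed when every format converts directly to every other."""
    return n * (n - 1)


def pivot_converters(n: int) -> int:
    """Converters needed when each format only converts to and from a pivot."""
    return 2 * n


# Print the two counts side by side for a few system sizes.
for n in (3, 5, 10, 20):
    print(n, direct_converters(n), pivot_converters(n))
```

The gap widens quickly: the direct count grows with the square of N, the
pivotal one only with N.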

With a pivotal data conversion/integration approach, it's very cheap to add a
new format to your system, in particular if it is possible to transform from one
format into another without losing information. You just need to convert
from/to the common format only. If you do that, you automatically gain the
conversion from/to all the other formats in the system.
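
A minimal sketch of that idea, using only the Python standard library. Nothing
here is taken from an existing tool: the pivot (a list of dicts, one per row),
the registry and all function names are assumptions made up for illustration.

```python
import csv
import io
import json
from typing import Callable, Dict, List

# Hypothetical pivot: a table as a list of rows, each row a dict of strings.
Rows = List[Dict[str, str]]

READERS: Dict[str, Callable[[str], Rows]] = {}
WRITERS: Dict[str, Callable[[Rows], str]] = {}


def register(name: str, reader: Callable[[str], Rows],
             writer: Callable[[Rows], str]) -> None:
    """Add a format by supplying only its conversion from/to the pivot."""
    READERS[name] = reader
    WRITERS[name] = writer


def convert(text: str, source: str, target: str) -> str:
    """Convert between any two registered formats by going through the pivot."""
    return WRITERS[target](READERS[source](text))


def read_csv(text: str) -> Rows:
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def write_csv(rows: Rows) -> str:
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def read_json(text: str) -> Rows:
    return json.loads(text)


def write_json(rows: Rows) -> str:
    return json.dumps(rows)


register("csv", read_csv, write_csv)
register("json", read_json, write_json)

print(convert("name,city\nAda,London\n", "csv", "json"))
```

Registering a third format (say, TSV) would take one more reader and one more
writer, and convert() would then handle TSV to/from both CSV and JSON without
any other change.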

Why do more and more people speak English? Because everybody else does, and this
is the easiest way to communicate with everybody else. Unfortunately, human
language is not as precise as other types of communication formats: when you go
back and forth you lose information, and translating from one language to
another is not a precise process.

RDF as well as OWL ontologies can be used in this way, as a core/common data
format. This is easier on a syntactic level and it can become harder and
imprecise as the expressive power of your language grows. However, you can still
map external OWL ontologies to your own view of the world, your own internal
core ontology. When you do that, your RDF toolbox has tools which allow you to
translate RDF data described with an external ontology into data you can easily
integrate and transform into other ontologies.
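
To give a flavour of the simplest possible case (a one-to-one renaming of
properties), here is a toy sketch in plain Python. It is not how an RDF toolkit
does it: triples are just tuples of strings, and the namespaces, property names
and sample data are all hypothetical. Real mappings would typically be expressed
with OWL or SPARQL and can be far less tidy than this.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespaces for an external ontology and our internal core one.
EXT = "http://example.org/external#"
CORE = "http://example.org/core#"

# Hypothetical mapping from external properties to core properties.
PROPERTY_MAP: Dict[str, str] = {
    EXT + "fullName": CORE + "name",
    EXT + "livesIn": CORE + "location",
}


def to_core(triples: Iterable[Triple], mapping: Dict[str, str]) -> List[Triple]:
    """Rewrite predicates found in the mapping; leave all other triples as is."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


# Made-up sample data described with the external vocabulary.
external = [
    ("http://example.org/people/ada", EXT + "fullName", "Ada Lovelace"),
    ("http://example.org/people/ada", EXT + "livesIn", "London"),
    ("http://example.org/people/ada", EXT + "nickname", "Ada"),
]

core = to_core(external, PROPERTY_MAP)
for triple in core:
    print(triple)
```

Properties without a mapping (the nickname here) pass through unchanged, so
nothing is lost; they are simply not yet integrated into the core view.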

To make things less abstract, here are three IMHO good examples of pivotal data
integration|conversion:

- Hojoki: Make All Your Cloud Apps Work As One
http://hojoki.com/

- Open Services for Lifecycle Collaboration
http://open-services.net/ and http://eclipse.org/lyo/

- SIMILE | Babel
http://service.simile-widgets.org/babel/

Hojoki is really cool and you can measure how fast they keep adding new
services, each time adding more and more value for their users. For them, adding
a new service is easy. A beautiful example of pivotal data/service integration.

The video at the bottom of the http://eclipse.org/lyo/ page could have been done
by Google (promoting RDF without ever mentioning it. ;-)). It made me remember:
http://www.youtube.com/watch?v=TJfrNo3Z-DU ... unfortunate IMHO that Google
bought them. I do not see the Freebase datadumps growing as massively as they
could (being Google). But, then... why share? Let's all give Google more data
via schema.org and maybe they'll give it back... in HTML :-/ Oops, ...

Babel is not 'active' anymore AFAICT. I did not want to let it die, so I've
stolen it and put it on GitHub [2] (also it is using Apache Jena now). It's much
more limited as I've spent only a few hours on it.

You just have two interfaces to implement to add a new tabular data format:
https://github.com/castagna/babel2/blob/master/apis/src/main/java/org/apache/jena/babel2/BabelReader.java
https://github.com/castagna/babel2/blob/master/apis/src/main/java/org/apache/jena/babel2/BabelWriter.java

The SemanticType.java interface is trying to capture the 'semantic' axis:
https://github.com/castagna/babel2/blob/master/apis/src/main/java/org/apache/jena/babel2/SemanticType.java
Currently, there is only GenericType.java which implements SemanticType and it
is a sort of 'tabular' data. But nothing stops you from adding more, or more
complex, SemanticTypes: for example, you could represent graph data instead of
tables, or go one level up and represent people, cars, etc., or one more level
up and represent knowledge domains such as "food" or "sport".

To conclude, the pivotal approach to data conversion/integration keeps the costs
of adding new serialization formats or new data formats low and manageable. Each
time you add a new data format the overall value of your integration software
grows (quadratically?).

This approach can be applied independently of RDF or OWL; there is nothing
magic about RDF or OWL. However, RDF gives you a powerful and flexible data model
which can be easily adopted at the core of such systems and OWL (as well as
SPARQL or other tools such as SPIN) gives you powerful ways to transform your
data.

Something that was thankless can become almost pleasant. ;-)

Paolo

[1] http://en.wikipedia.org/wiki/Data_conversion#Pivotal_conversion
[2] https://github.com/castagna/babel2/ (feel free to fork it, if you find it
useful and send pull request if you improve it)
