Hi Bernie,

Bernie Greenberg wrote:
> [...] are you trying to "union" two or more web knowledge databases
> representing parts of the same knowledge? I found this a thankless task.

'thankless' is my new word today. :-) To understand what you mean, I
needed to go to a common place for the English language (i.e. a
dictionary) and read the definition (which, fortunately, uses words I
already know). I agree with you on the adjective, it is thankless.

RDF itself, in relation to information|data|knowledge integration, does
not offer IMHO particular advantages on a 'semantic' level, in
particular if|when people use different vocabularies|schemas|ontologies.
RDF helps with merging datasets at a sort of 'syntactical' level, where
merging becomes trivial (and that gives you time to think about the
'semantic' :-)). If the data you need to merge uses the same
vocabulary|schema|ontology, you are almost done. Otherwise, you are
practically left on your own. This is just my humble opinion.

By the way, people often disagree on how to model the same thing, or how
to map between two ontologies (or translate between two languages)... or
how to name the same thing with the same name (or URI), or on the notion
of "same thing". Trying to automate these tasks is thankless^2.

In relation to data integration/conversion, one approach I think works
very well is what Wikipedia calls 'pivotal conversion' [1]. Data
integration and data conversion between N different formats (or N
different languages) is an N^2 problem. But it can be reduced to a
linear one simply by adopting a core/common language. English for
humans, TCP/IP for the Internet, ? for data.

With a pivotal data conversion/integration approach, it's very cheap to
add a new format to your system, in particular if it is possible to
transform from one format into another without losing information. You
only need to convert from/to the common format. Once you do that, you
automatically gain the conversion from/to all the other formats in the
system.

Why do more and more people speak English? Because everybody else does,
and this is the easiest way to communicate with everybody else.
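
To make the N^2 versus linear argument concrete, here is a minimal
Python sketch (this is not Babel's code). It assumes a hypothetical
pivot representation, a list of rows where each row is a dict, and three
toy formats (CSV, TSV, JSON) picked only for illustration. The names
READERS, WRITERS, convert, direct_converters and pivot_converters are
made up for this example. With N formats you would need N*(N-1) direct
one-way converters, but only 2*N when every format converts from/to the
pivot.

```python
import csv
import io
import json
from typing import Callable, Dict, List

# Hypothetical pivot format: a list of rows, each row a dict column -> value.
Rows = List[Dict[str, str]]


def _read_delimited(text: str, delimiter: str) -> Rows:
    """Parse delimited text with a header line into the pivot format."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


def _write_delimited(rows: Rows, delimiter: str) -> str:
    """Serialize the pivot format as delimited text with a header line."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(rows[0]), delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# One reader and one writer per format: 2*N pieces of code in total.
# The JSON reader assumes the input is an array of flat objects.
READERS: Dict[str, Callable[[str], Rows]] = {
    "csv": lambda text: _read_delimited(text, ","),
    "tsv": lambda text: _read_delimited(text, "\t"),
    "json": json.loads,
}
WRITERS: Dict[str, Callable[[Rows], str]] = {
    "csv": lambda rows: _write_delimited(rows, ","),
    "tsv": lambda rows: _write_delimited(rows, "\t"),
    "json": json.dumps,
}


def convert(text: str, source: str, target: str) -> str:
    """Convert between any two registered formats via the pivot."""
    return WRITERS[target](READERS[source](text))


def direct_converters(n: int) -> int:
    """One-way converters needed when every format talks to every other."""
    return n * (n - 1)


def pivot_converters(n: int) -> int:
    """Readers plus writers needed when every format talks to the pivot."""
    return 2 * n
```

Adding a fourth format here means writing one reader and one writer;
convert() then reaches every other registered format with no further
work.
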
Unfortunately, human language is not as precise as other types of
communication formats: when you go back and forth you lose information,
and translating from one language to another is not a precise process.

RDF, as well as OWL ontologies, can be used in this way as a core/common
data format. This is easier on a syntactic level, and it can become
harder and less precise as the expressive power of your language grows.
However, you can still map external OWL ontologies to your own view of
the world, your own internal core ontology. When you do that, your RDF
toolbox has tools which allow you to translate RDF data described with
an external ontology into data you can easily integrate and transform
into other ontologies.

To make things less abstract, here are three IMHO good examples of
pivotal data integration|conversion:

 - Hojoki: Make All Your Cloud Apps Work As One
   http://hojoki.com/

 - Open Services for Lifecycle Collaboration
   http://open-services.net/ and http://eclipse.org/lyo/

 - SIMILE | Babel
   http://service.simile-widgets.org/babel/

Hojoki is really cool, and you can measure how fast they keep adding new
services, each time adding more and more value for their users. For
them, adding a new service is easy. A beautiful example of pivotal
data/service integration.

The video at the bottom of the http://eclipse.org/lyo/ page could have
been done by Google (promoting RDF without ever mentioning it ;-)). It
made me remember: http://www.youtube.com/watch?v=TJfrNo3Z-DU ... it is
unfortunate IMHO that Google bought them. I do not see the Freebase data
dumps growing as massively as they could (being Google). But then... why
share? Let's all give Google more data via schema.org and maybe they'll
give it back... in HTML :-/

Oops, ... Babel is not 'active' anymore AFAICT. I did not want to let it
die, so I've stolen it and put it on GitHub [2] (it is also using Apache
Jena now). It's much more limited, as I've spent only a few hours on it.
You just have two interfaces to implement to add a new tabular data
format:

https://github.com/castagna/babel2/blob/master/apis/src/main/java/org/apache/jena/babel2/BabelReader.java
https://github.com/castagna/babel2/blob/master/apis/src/main/java/org/apache/jena/babel2/BabelWriter.java

The SemanticType.java interface is trying to capture the 'semantic'
axis:

https://github.com/castagna/babel2/blob/master/apis/src/main/java/org/apache/jena/babel2/SemanticType.java

Currently, there is only GenericType.java which implements SemanticType,
and it is a sort of 'tabular' data. But nothing stops you from adding
more, or more complex, SemanticType implementations: for example, you
could represent graph data instead of tables, or go one level up and
represent people, cars, etc., or one more level up and represent
knowledge domains such as "food" or "sport".

To conclude, the pivotal approach to data conversion/integration keeps
the cost of adding new serialization formats or new data formats low and
manageable. Each time you add a new data format, the overall value of
your integration software grows (quadratically?).

This approach can be applied independently of RDF or OWL; there is
nothing magic about RDF or OWL. However, RDF gives you a powerful and
flexible data model which can be easily adopted at the core of such
systems, and OWL (as well as SPARQL or other tools such as SPIN) gives
you powerful ways to transform your data.

Something that was thankless can become almost pleasant. ;-)

Paolo

 [1] http://en.wikipedia.org/wiki/Data_conversion#Pivotal_conversion
 [2] https://github.com/castagna/babel2/
     (feel free to fork it if you find it useful, and send a pull
     request if you improve it)
