Paolo, thank you so much for these insights and pointers. Perhaps "thankless" (which implies that ardent efforts were unrewarded by others) was not even the right word. What I was trying to convey was my discouragement at finding the ubiquity and variety of databases, each claiming authenticity, breadth, veracity, and ease-of-use, produced by independent entities, revealing no schema/ontology consistency (or reason to expect such) with the others. It reminded me of my days as a schoolboy, long before the internet, when students resorted to libraries to research papers, and had to read four or five different encyclopaedias and search shelves for books written perhaps decades and continents apart, none of which were expected to have any commonality with the others except an accurate reporting of, say, Napoleon's rout of the Prussians at Jena, the eye-structure of an owl, or the way to multiply two matrices. Each knowledge-source had its own schema, and part of learning to learn in the old way was learning to master and union them to acquire such knowledge. If you were a graduate student, you might even have to read works in natural languages not your native one.
The web is a very different thing. The combination of Wikipedia and Google is amassing authority daily, and it scares me: a one-source knowledge-mart for all except serious researchers/doctoral students. The crowd-sourced or aggregated RDF info hubs resemble human, learned know-it-alls in different towns and different countries who don't even speak the same language, and aren't concerned about each other. While Wikipedia seems the evolutionary product of a forerunner decade of personal sites about single facets of the world, DBpedia is nothing like it in scope, accomplishment, or usefulness, representing only the "info-boxes", and (I have found) even they are wildly inconsistent in schema, reflecting crowd sourcing. The dream of an RDF database with all the real "knowledge" in Wikipedia, which would put to rest all the other putative know-it-all RDF DBs as Wikipedia has "John's Count Leo Tolstoy Site", remains elusive; there is not enough free labor to schematize and RDFize that much information, and computer text understanding has not yet reached the level where *reliable* facts can be automatically harvested from prose and awarded the authority needed (yes, I am aware of projects and techniques that try to do that).

The "Utopian RDF vision" is clearly the total and automatic integration of all available RDF knowledge-sources to produce the SPARQL parallel of Google, complete with all the flaws of crowd-sourcing and hidden/mixed-reliability models and dangers that afflict the world's de facto search engine. Clearly, that is what the group Rodrigo named is working on, and every other explorer in their morions and caravels on the uncharted sea of crowd-sourced RDF.

Enough of my wasting the time of this working readership. Thanks, Paolo. My code continues to work well.

Bernie

On Thu, Apr 5, 2012 at 8:40 PM, Rodrigo Jardim <[email protected]> wrote:
> Paolo,
> your tips were very useful for me.
>
> Thanks very much
>
> --
> Rodrigo
>
> Hi Paolo,
>
> On 05/04/2012 18:37, Paolo Castagna wrote:
>> Hi Bernie,
>>
>> Bernie Greenberg wrote:
>>
>>> [...] are you trying to "union" two or more web knowledge databases
>>> representing parts of the same knowledge? I found this a thankless task.
>>
>> 'thankless' is my new word today. :-)
>>
>> To understand what you mean, I needed to go to a common place for the English
>> language (i.e. a dictionary) and read the definition (which, fortunately, uses
>> words I already know).
>>
>> I agree with you on the adjective, it is thankless.
>>
>> RDF itself, in relation to information|data|knowledge integration, does not offer
>> IMHO particular advantages on a 'semantic' level, in particular if|when people
>> use different vocabularies|schemas|ontologies. RDF provides help for merging
>> datasets at a sort of 'syntactical' level, which is trivial (and it gives you
>> time to think about the 'semantic' :-)). If the data you need to merge uses the
>> same vocabulary|schema|ontology, you are almost done. Otherwise, you are
>> practically left on your own. This is just my humble opinion.
>>
>> By the way, people often disagree on how to model the same thing, or how to map
>> between two ontologies (or translate between two languages)... or how to name
>> the same thing with the same name (or URI), or on the notion of "same thing".
>> Trying to automate these tasks is thankless^2.
>>
>> In relation to data integration/conversion, one approach I think works very well
>> is what Wikipedia calls 'pivotal conversion' [1]. Data integration and data
>> conversion between N different formats (or N different languages) is an N^2
>> problem. But it can be reduced to a linear one simply by adopting a core/common
>> language: English for humans, TCP/IP for the Internet, ? for data.
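Paolo's point that RDF makes the 'syntactical' merge trivial can be sketched in a few lines. An RDF graph is essentially a set of triples, so merging two graphs is just set union (real toolkits such as Jena also rename clashing blank nodes, which is ignored here). This toy code is mine, not Jena's API:

```java
import java.util.HashSet;
import java.util.Set;

// A minimal sketch (my own toy code, not Jena's API): an RDF graph is
// essentially a set of triples, so the 'syntactical' merge of two graphs
// is plain set union -- shared triples collapse, nothing else is needed.
// (Real toolkits also rename clashing blank nodes; ignored here.)
public class TripleMerge {

    // One triple kept as three plain strings, purely for illustration.
    record Triple(String s, String p, String o) {}

    static Set<Triple> merge(Set<Triple> a, Set<Triple> b) {
        Set<Triple> merged = new HashSet<>(a);
        merged.addAll(b); // duplicate triples disappear automatically
        return merged;
    }

    public static void main(String[] args) {
        Set<Triple> g1 = Set.of(
            new Triple("ex:Napoleon", "ex:wonBattleAt", "ex:Jena"));
        Set<Triple> g2 = Set.of(
            new Triple("ex:Napoleon", "ex:wonBattleAt", "ex:Jena"),
            new Triple("ex:Owl", "ex:hasFeature", "ex:LargeEyes"));
        // The shared triple appears only once in the merged graph.
        System.out.println(merge(g1, g2).size()); // prints 2
    }
}
```

The hard part Paolo describes — the 'semantic' level — starts exactly where this sketch stops: when the two graphs use different vocabularies for the same things.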
>>
>> With a pivotal data conversion/integration approach, it's very cheap to add a
>> new format to your system, in particular if it is possible to transform from one
>> format into another without losing information. You just need to convert
>> from/to a common format only. If you do that automatically, you gain the
>> conversion from/to all the other formats in the system.
>>
>> Why do more and more people speak English? Because everybody else does, and this
>> is the easiest way to communicate with everybody else. Unfortunately, human
>> language is not as precise as other types of communication formats: when you go
>> back and forth you lose information, and translating from one language to
>> another is not a precise process.
>>
>> RDF as well as OWL ontologies can be used in this way as a core/common data
>> format. This is easier on a syntactic level, and it can become harder and
>> imprecise as the expressive power of your language grows. However, you can still
>> map external OWL ontologies to your own view of the world, your own internal
>> core ontology. When you do that, your RDF toolbox has tools which allow you to
>> translate RDF data described with an external ontology into data you can easily
>> integrate and transform into other ontologies.
>>
>> To make things less abstract, here are three IMHO good examples of pivotal data
>> integration|conversion:
>>
>> - Hojoki: Make All Your Cloud Apps Work As One
>>   http://hojoki.com/
>>
>> - Open Services for Lifecycle Collaboration
>>   http://open-services.net/ and http://eclipse.org/lyo/
>>
>> - SIMILE | Babel
>>   http://service.simile-widgets.org/babel/
>>
>> Hojoki is really cool, and you can measure how fast they keep adding new
>> services, each time adding more and more value for their users. For them, adding
>> a new service is easy. A beautiful example of pivotal data/service integration.
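The N^2-to-linear argument above can be made concrete with a toy hub. In this sketch (all names are mine, not babel2's), each format registers one reader into a common pivot model and one writer out of it, so adding format N costs 2 converters instead of 2(N-1) direct pairwise ones:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// A toy sketch of 'pivotal conversion' (names are mine, not babel2's):
// every format registers one reader (format -> pivot) and one writer
// (pivot -> format), so N formats need 2N converters instead of the
// N*(N-1) directed pairwise ones, and any-to-any conversion is free.
public class PivotHub {

    // The pivot model: a table as rows of cells, deliberately simple.
    static final Map<String, Function<String, List<List<String>>>> readers = new HashMap<>();
    static final Map<String, Function<List<List<String>>, String>> writers = new HashMap<>();

    static void register(String name, String sep) {
        readers.put(name, text -> Arrays.stream(text.split("\n"))
            .map(line -> Arrays.asList(line.split(sep, -1)))
            .collect(Collectors.toList()));
        writers.put(name, rows -> rows.stream()
            .map(row -> String.join(sep, row))
            .collect(Collectors.joining("\n")));
    }

    // Any format to any other format, always via the pivot.
    static String convert(String data, String from, String to) {
        return writers.get(to).apply(readers.get(from).apply(data));
    }

    public static void main(String[] args) {
        register("csv", ",");  // adding a format costs exactly 2 converters
        register("tsv", "\t");
        System.out.println(convert("a,b\nc,d", "csv", "tsv"));
    }
}
```

This also shows where information loss sneaks in: the conversion is only exact when both formats round-trip losslessly through the pivot, which is Paolo's caveat about human language above.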
>>
>> The video at the bottom of the http://eclipse.org/lyo/ page could have been done
>> by Google (promoting RDF without ever mentioning it ;-)). It made me remember:
>> http://www.youtube.com/watch?v=TJfrNo3Z-DU ... unfortunately, IMHO, Google bought
>> them. I do not see the Freebase data dumps growing massively as they could (being
>> Google). But then... why share? Let's all give Google more data via
>> schema.org and maybe they'll give it back... in HTML :-/ Oops...
>>
>> Babel is not 'active' anymore, AFAICT. I did not want to let it die, so I've
>> 'stolen' it and put it on GitHub [2] (also, it is using Apache Jena now). It's much
>> more limited, as I've spent only a few hours on it.
>>
>> You just have two interfaces to implement to add a new tabular data format:
>> https://github.com/castagna/babel2/blob/master/apis/src/main/java/org/apache/jena/babel2/BabelReader.java
>> https://github.com/castagna/babel2/blob/master/apis/src/main/java/org/apache/jena/babel2/BabelWriter.java
>>
>> The SemanticType.java interface tries to capture the 'semantic' axis:
>> https://github.com/castagna/babel2/blob/master/apis/src/main/java/org/apache/jena/babel2/SemanticType.java
>> Currently, there is only GenericType.java, which implements SemanticType and is
>> a sort of 'tabular' data. But nothing stops you from adding more or more complex
>> SemanticTypes: for example, you could represent graph data instead of tables, or
>> go one level up and represent people, cars, etc., or one level up and represent
>> knowledge domains such as "food" or "sport".
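The idea of a separate 'semantic axis' can be sketched roughly as follows. This is a hypothetical illustration only — the real BabelReader/BabelWriter/SemanticType signatures in babel2 differ, and every name below is my invention: a writer declares which semantic type it can express, and the hub refuses to hand it data of a different type.

```java
// A hypothetical sketch of the 'semantic axis' idea -- NOT babel2's real
// API: a writer declares the semantic type it can express, and the hub
// refuses to hand it pivot data carrying different semantics.
public class SemanticAxis {

    enum SemanticType { GENERIC_TABLE, RDF_GRAPH }

    interface Writer {
        SemanticType accepts();
        String write(Object pivotData);
    }

    static String writeChecked(Writer w, SemanticType produced, Object pivotData) {
        if (w.accepts() != produced) {
            throw new IllegalArgumentException(
                "writer expects " + w.accepts() + " but data is " + produced);
        }
        return w.write(pivotData);
    }

    public static void main(String[] args) {
        Writer tableWriter = new Writer() {
            public SemanticType accepts() { return SemanticType.GENERIC_TABLE; }
            public String write(Object pivotData) { return pivotData.toString(); }
        };
        // Compatible semantics: the write goes through.
        System.out.println(writeChecked(tableWriter, SemanticType.GENERIC_TABLE, "a,b"));
    }
}
```

Going "one level up" to people, cars, or whole knowledge domains, as Paolo suggests, would just mean richer SemanticType values plus mappings between them.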
>>
>> To conclude, the pivotal approach to data conversion/integration keeps the costs
>> of adding new serialization formats or new data formats low and manageable. Each
>> time you add a new data format, the overall value of your integration software
>> grows (quadratically?).
>>
>> This approach can be applied independently of RDF or OWL; there is nothing
>> magic about RDF or OWL. However, RDF gives you a powerful and flexible data model
>> which can easily be adopted at the core of such systems, and OWL (as well as
>> SPARQL or other tools such as SPIN) gives you powerful ways to transform your data.
>>
>> Something that was thankless can become almost pleasant. ;-)
>>
>> Paolo
>>
>> [1] http://en.wikipedia.org/wiki/Data_conversion#Pivotal_conversion
>> [2] https://github.com/castagna/babel2/ (feel free to fork it, if you find it
>> useful, and send a pull request if you improve it)
