Hi all,

Since this week is the midterm evaluation of the GSoC projects, I want to give you an update on the status of this project.
I began the project by trying to index Freebase data with the Freebase indexer in Stanbol, but this process was too expensive to run on a normal computer (about 8 GB of RAM and a non-SSD hard disk). I was able to create a Referenced Site with Freebase data using Rupert's index (generated on a machine with an SSD hard disk). Rafa Haro is currently working on the Jena TDB part of the indexer in order to speed up the indexing of Freebase data.

The next task was to parse the Wikilinks extended dataset [1] and store it in a Jena TDB database, so that the information it contains can be used in tasks such as disambiguation. Along with the parser tool, I have also created a service to query the data and retrieve information about Wikilinks items. The code and more information about this library can be found at [2].

Ideally, once the new Freebase indexer is finished and tested, I would like to integrate the Freebase data and the Wikilinks data in the same Referenced Site: the Wikilinks extended dataset contains references to Freebase entities, so linking the two datasets is relatively easy. For now, though, the Wikilinks information can be used for other tasks.

To finish the work for the midterm, I have developed a tool that imports Freebase data from the BaseKB Lime data dump [3] into a graph database (currently Neo4j, accessed through the TinkerPop Blueprints interfaces [4]). In addition, a simple algorithm "weights" the graph during the import process. The code and more information about this tool can be found at [5]. With this, I now have a knowledge base that can be used to develop new graph-based disambiguation algorithms.

That is the work done so far for the midterm. The goal for the second half is to develop a disambiguation algorithm using the generated graph. To that end, I am studying two papers ([6] and [7]) for ideas on how to design a new algorithm.
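As an illustration only (this is not necessarily the scheme used by the import tool), here is one simple way to weight the graph while importing triples: count how many triples connect each pair of entities, then normalise by the endpoints' degrees so that links to highly connected "hub" entities count less. The function name `build_weighted_graph`, the `(subject, predicate, object)` input format, the toy identifiers and the normalisation are all assumptions, and a plain Python dictionary stands in for the Neo4j/Blueprints graph.

```python
from collections import defaultdict


def build_weighted_graph(triples):
    """Hypothetical sketch: weight an undirected entity graph built from
    (subject, predicate, object) triples.

    Returns a dict mapping each unordered entity pair (a, b), with a < b,
    to a weight in (0, 1].
    """
    pair_counts = defaultdict(int)  # number of triples linking each entity pair
    degree = defaultdict(int)       # number of triples each entity takes part in

    # Single pass over the triples, as an import loop would do.
    for subj, _pred, obj in triples:
        if subj == obj:
            continue  # ignore self-loops
        pair = (subj, obj) if subj < obj else (obj, subj)
        pair_counts[pair] += 1
        degree[subj] += 1
        degree[obj] += 1

    # Normalise by the geometric mean of the endpoint degrees so that
    # links to highly connected "hub" entities get lower weights.
    return {
        (a, b): count / (degree[a] * degree[b]) ** 0.5
        for (a, b), count in pair_counts.items()
    }


if __name__ == "__main__":
    # Toy triples with made-up Freebase-style identifiers (hypothetical).
    toy = [
        ("m.a", "p1", "m.b"),
        ("m.a", "p2", "m.b"),
        ("m.a", "p3", "m.c"),
    ]
    for pair, weight in sorted(build_weighted_graph(toy).items()):
        print(pair, round(weight, 3))
    # ('m.a', 'm.b') 0.816
    # ('m.a', 'm.c') 0.577
```

In a real import the resulting weights would be stored as an edge property in the graph database; normalising by the geometric mean of the degrees is just one possible choice among many.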
Comments are more than welcome.

Best regards