Hi all

Since this week is the midterm evaluation of the GSoC projects, I want to
tell you the status of this project.

I began my project by trying to index Freebase data using the Freebase
indexer in Stanbol, but this process was too expensive to run on an
ordinary computer (about 8 GB of RAM and a non-SSD hard disk).

I was able to create a Referenced Site with Freebase data using Rupert's
index (generated on a machine with an SSD hard disk).

Currently, Rafa Haro is working on the Jena TDB part of the indexer in
order to speed up the process of indexing Freebase data.

The next task was to parse the Wikilinks extended dataset [1] and store it
in a Jena TDB database, so that the information it contains can be used
in tasks such as disambiguation.
In addition, a service has been created (along with the parser tool) to
query the data and retrieve information about Wikilink items. The code
and more information about this library can be found at [2].

Ideally, when the new Freebase indexer is finished and tested, I would like
to integrate Freebase data and Wikilinks data in the same referenced site.
The Wikilinks extended dataset contains references to Freebase entities,
so it is relatively easy to link the two datasets. But for now, we can use
the Wikilinks information to perform other tasks.

In order to finish the work for the midterm, I have developed a tool to
import Freebase data from the BaseKBLime data dump [3] into a graph
database (Neo4j right now, using the TinkerPop Blueprints interfaces [4]).
In addition, a simple algorithm to "weight" the graph is applied during the
import process.
The code and more information about this tool can be found at [5].
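
To give an idea of what "weighting" a graph during import can look like,
here is a minimal sketch in plain Java. It is not the code of the tool at
[5] and it does not reproduce its algorithm: the class name, the entity
ids and the weighting rule (each edge weighs 1 / out-degree of its source
node) are all assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: NOT the actual import tool or its weighting scheme.
class WeightedGraphSketch {
    // Adjacency list: source entity id -> target entity ids, in import order.
    private final Map<String, List<String>> outgoing = new LinkedHashMap<>();

    // Called once per imported (source, target) link.
    void addEdge(String source, String target) {
        outgoing.computeIfAbsent(source, k -> new ArrayList<>()).add(target);
    }

    // Assumed rule: an edge weighs 1 / out-degree of its source, so links
    // from entities with few outgoing links count more. Duplicate links
    // are counted in the out-degree. Returns 0.0 if the edge is absent.
    double weight(String source, String target) {
        List<String> targets = outgoing.get(source);
        if (targets == null || !targets.contains(target)) {
            return 0.0;
        }
        return 1.0 / targets.size();
    }

    public static void main(String[] args) {
        WeightedGraphSketch graph = new WeightedGraphSketch();
        // Made-up entity ids, not real Freebase identifiers.
        graph.addEdge("entity.a", "entity.b");
        graph.addEdge("entity.a", "entity.c");
        graph.addEdge("entity.b", "entity.c");
        System.out.println(graph.weight("entity.a", "entity.b"));
        System.out.println(graph.weight("entity.b", "entity.c"));
    }
}
```

With a graph database the same value would be stored as a property on
each edge, so that a disambiguation algorithm can read it while
traversing the graph.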

With this information, I now have a knowledge base which can be used to
develop new graph-based disambiguation algorithms.

That is the work done so far for the midterm.

The expected work for the second part is to develop a disambiguation
algorithm using the generated graph. To do this, I am looking at two
papers ([6] and [7]) for ideas on which to base a new algorithm.


That's all, folks. Comments are more than welcome.

Best regards
