Hi Rupert,

Thanks for the advice.

Regarding the wiki-links dataset: the current dataset contains the context and the Freebase id (in guid format), but it doesn't contain the text of the document. The text can be added later, because the parser I have developed for the expanded wikilinks dataset (Thrift to RDF) already supports the document content (once it is added). You can download the current datasets from http://iesl.cs.umass.edu/downloads/wiki-link/context-only/ .

I am thinking about creating a ReferencedSite with the expanded wikilinks + Google concept dictionary, and also converting the Freebase id given by wikilinks (guid) to the new Freebase id (m), which I have already done as a test. This way, the Freebase entities and the wikilinks information will be linked. What do you think about it?

Moreover, I intend to make the wikilinks information usable with DBpedia as well. I'm going to use the Freebase id of each mention in wikilinks to link to Freebase, and I would like to do the same thing using the Wikipedia URL to link to DBpedia. How could I do that?

Regards

On Wed, Jul 3, 2013 at 8:20 AM, Rupert Westenthaler < [email protected]> wrote:
> Hi Antonio,
>
> Thank you for the nice overview.
>
> Let me mention that because of the following issue
>
> On Mon, Jul 1, 2013 at 11:36 AM, Antonio Perez <[email protected]> wrote:
> >
> > The indexing tool takes too much time in a standard computer, so in order
> > to execute this process, you'll need either a computer with SSD or
> > a computer with 200GB of RAM in order to deal with the whole Freebase data
> > dump in memory.
> >
>
> Rafa has started to work on an IndexingSource that can directly
> operate on the Freebase dump (any single file RDF dump that is sorted
> by SPO). With such a source one can index a dataset without first
> importing the data to an RDF triple store. As this is the most
> hardware demanding part of the chain it should greatly improve
> indexing performance.
>
> However this IndexingSource will not support LDPath and will therefore
> not support some of the available EntityProcessors.
>
> >
> > For the next milestone (midterm evaluation) the following tasks need to be
> > done:
> > 1. Convert wiki-links data dump to RDF
> > * Wiki-links contains a lot of disambiguation information which it is
> > wanted to incorporate to the Entityhub Freebase site.
> > * The wiki-link data dump will be converted to RDF to be easier to
> > process by the new Stanbol Freebase indexing tool (point 2)
> > * The wiki-link expanded dataset [1] will be used because it contains
> > information like extracted context for the mentions, alignment to Freebase
> > entities, etc.
> > 2. Develop a new stanbol indexer to join Freebase and wiki-links
> > information
>
> The expanded dataset [1] is really great that is allows to avoid a lot
> of very time-consuming tasks (crawling the resource and extracting the
> mention text and context, linking the dbpedia URIs to freebase).
> Without this those information the usage of this great dataset would
> not be feasible because of time constraints.
>
> > 3. Generate a graph with the links in Freebase
> > * To support Graph-based disambiguation algorithms in Stanbol, a graph
> > will be generated using Blueprints Neo4j and every node in the graph will
> > be associated to entries in the EntityHub to later be used to position
> > directly in a node on the graph.
> >
>
> IMO this is really interesting not only for Disambiguation. I am
> really looking forward to this. Do not forget to test the code also
> with backends that are compatible with the Apache License.
>
> best
> Rupert
>
> > Comments are more than welcome
> >
> > Regards
> >
> > [1] http://www.iesl.cs.umass.edu/data/wiki-links
> >
> > --
> >
> > ------------------------------
> > This message should be regarded as confidential. If you have received this
> > email in error please notify the sender and destroy it immediately.
> > Statements of intent shall only become binding when confirmed in hard copy
> > by an authorised signatory.
> >
> > Zaizi Ltd is registered in England and Wales with the registration number
> > 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> > London W6 7AN.
> >
>
> --
> | Rupert Westenthaler [email protected]
> | Bodenlehenstraße 11 ++43-699-11108907
> | A-5500 Bischofshofen
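
P.S. To make the DBpedia question above more concrete, here is a minimal sketch of the kind of mapping I have in mind. It relies on the convention that an English DBpedia resource URI is http://dbpedia.org/resource/ followed by the Wikipedia article title. The function name is hypothetical, and the sketch makes some simplifying assumptions: it only handles en.wikipedia.org URLs of the /wiki/Title form, it does not resolve redirects, and it does not try to normalise percent-encoding differences between Wikipedia and DBpedia. It is not part of the parser or of Stanbol.

```python
from urllib.parse import urlsplit

# Namespace of English DBpedia resources.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def wikipedia_url_to_dbpedia_uri(url):
    """Map an English Wikipedia article URL to a candidate DBpedia URI.

    Hypothetical helper for illustration only. Returns None for URLs
    that are not of the form http(s)://en.wikipedia.org/wiki/<Title>.
    """
    parts = urlsplit(url.strip())

    # Assumption: only the English edition is handled here.
    if parts.netloc.lower() != "en.wikipedia.org":
        return None

    prefix = "/wiki/"
    if not parts.path.startswith(prefix):
        return None

    # urlsplit already separates the "#section" fragment from the path,
    # so only the article title remains after the prefix.
    title = parts.path[len(prefix):].replace(" ", "_")
    if not title:
        return None

    return DBPEDIA_RESOURCE + title


if __name__ == "__main__":
    print(wikipedia_url_to_dbpedia_uri(
        "http://en.wikipedia.org/wiki/Apache_Stanbol#History"))
```

Whether the resulting URI really exists in the DBpedia index would still have to be checked, for example by looking it up in the DBpedia ReferencedSite.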
