On Tue, Apr 30, 2013 at 6:03 PM, Rafa Haro <rh...@zaizi.com> wrote:
> Hi Rupert, Antonio, all
>
> On 27/04/13 16:35, Rupert Westenthaler wrote:
>
>>> For this, I would like to discuss some topics about the proposal:
>>>
>>> - Knowledge Base: I have decided to stick first to Freebase, because it
>>>   has a REST API allowing 100k calls per day for read and 10k for write.
>>>   Besides the REST API, an alternative could be to integrate the whole
>>>   Freebase graph in Stanbol and use their Java API to manage it. Ideally,
>>>   the management framework should be valid for other knowledge bases
>>>   such as Wikipedia or DBpedia.
>>>
>>
>> I recently created my first Freebase index for Stanbol (see
>> STANBOL-1014 for the Indexing tool). First tests on an index with all
>> Freebase Topics and all languages have shown very nice results! IMO
>> Freebase is currently for sure the better choice over DBpedia. However
>> one needs to see/wait how Freebase compares to the Wikidata project
>> [4] that only recently entered phase 2.
>>
>> Designing disambiguation in a way that it can be applied to other
>> datasets would be for sure a great bonus. But given the good results
>> one can get with Freebase I would even be very interested if the
>> results would only work on Freebase ^^
>
> Following Rupert's idea, I agree that maybe the best option is to develop a
> Knowledge Base manager within Stanbol for disambiguation purposes. IMO, it
> would be a mistake to try to come up with a universal solution. I suppose
> that one wants to generate one's knowledge base differently according to
> custom data domains. For instance, a graph representation is more suitable
> in "real world" knowledge bases, while most domains are well covered with a
> taxonomy structure.
>
> It would be important to develop tools to allow Stanbol to move data
> between these knowledge bases and EntityHub sites. Of course, a good way to
> learn how to do that could be developing first a nice solution only for
> Freebase.
>
There is already the "2-layered storage infrastructure" [1] for the
Contenthub, developed mainly by Suat and Anil in its own branch. In this
trunk there is also a new commons.semanticindexing [2] package.

This architecture would allow for using a "knowledge base" as
"Indexing Source" and an Entityhub Site as "Indexing Destination".
So it goes exactly in the proposed direction.

I have long planned to adapt this for the Entityhub, but development
in this branch has not shown much progress recently and I also have
higher-priority tasks ATM.

To be clear: I would not recommend using this for a GSoC project.
Rather, use the available APIs of the Entityhub, as this is clearly off
topic for disambiguation. However, the results of the proposed GSoC
project could be a nice application/validation case for [1].

[1] https://issues.apache.org/jira/browse/STANBOL-471
[2] https://issues.apache.org/jira/browse/STANBOL-701

best
Rupert

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11             ++43-699-11108907
| A-5500 Bischofshofen