Hi all

On Thu, Oct 23, 2014 at 5:29 PM, Rafa Haro <rh...@apache.org> wrote:
> 1. Extend Aida-Light for supporting other datasets. We would need to check
> how much the disambiguation algorithms are coupled with the information
> provided by YAGO and try to convert them to a generic approach.
With a focus on the Aida-Light engine, IMO this is by far the most
interesting topic.

> 2. Current Aida-Light architecture. Currently, all the data is preloaded
> in memory, forcing the user to use high-profile machines. We have
> discussed this several times, but maybe it is the moment to finally decide
> on a proper backend strategy for supporting disambiguation. Probably the
> current Yards are not enough/valid.

I fear that each disambiguation approach will come with its own data
model, mainly because the way the data is kept is central to performance.
Also, based on what I have seen up to now, keeping everything in-memory
is the way to go. For Aida-Light I suggest keeping the current solution.
The requirement of about 50 GByte of RAM for YAGO is quite OK anyway. If
we can add support for more focused datasets, one will often end up with
far fewer entities. This is also a way to keep resource requirements down.

> 3. Stanbol Disambiguation API. Another (almost) eternal discussion. Can we
> design an extensible API for supporting different disambiguation
> approaches?

Something like the "enhancer.nlp" module, which allowed us to integrate
different NLP processing frameworks, but for disambiguation, would be
nice to have. But for Chalitha, IMO it would be much more efficient to
focus on improving the Aida-Light engine as opposed to starting to
implement a disambiguation framework.

best
Rupert

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/
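PS: To make the API discussion (point 3) a bit more concrete, here is a
purely hypothetical sketch of what a dataset-agnostic disambiguation SPI
could look like. All names (EntityDataset, Disambiguator, etc.) are
invented for illustration and are not actual Stanbol or Aida-Light APIs;
the scoring is deliberately trivial (entity priors only), whereas a real
engine would also use mention-entity and entity-entity coherence:

```java
import java.util.List;
import java.util.Map;

// Hypothetical SPI: dataset-specific knowledge (e.g. YAGO) is hidden
// behind this interface, so the disambiguation algorithm stays generic.
interface EntityDataset {
    /** Candidate entity IDs for a surface form (mention text). */
    List<String> candidates(String mention);

    /** Prior popularity of an entity, e.g. derived from link counts. */
    double prior(String entityId);
}

class Disambiguator {
    private final EntityDataset dataset;

    Disambiguator(EntityDataset dataset) {
        this.dataset = dataset;
    }

    /** Picks the candidate with the highest prior; a real engine would
     *  combine this with context-similarity and coherence scores. */
    String disambiguate(String mention) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String id : dataset.candidates(mention)) {
            double score = dataset.prior(id);
            if (score > bestScore) {
                bestScore = score;
                best = id;
            }
        }
        return best;
    }
}

public class Demo {
    /** Toy in-memory dataset standing in for a real backend like YAGO. */
    static EntityDataset toyDataset() {
        Map<String, List<String>> cands =
            Map.of("Paris", List.of("Paris_France", "Paris_Texas"));
        Map<String, Double> priors =
            Map.of("Paris_France", 0.9, "Paris_Texas", 0.1);
        return new EntityDataset() {
            public List<String> candidates(String m) {
                return cands.getOrDefault(m, List.of());
            }
            public double prior(String id) {
                return priors.getOrDefault(id, 0.0);
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(new Disambiguator(toyDataset())
                .disambiguate("Paris")); // prints Paris_France
    }
}
```

The point of the sketch is only that swapping YAGO for a more focused
dataset would then mean implementing one interface, not touching the
algorithm.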