Hi all

On Thu, Oct 23, 2014 at 5:29 PM, Rafa Haro <rh...@apache.org> wrote:
> 1. Extend Aida-Light to support other datasets. We would need to check
> how much the disambiguation algorithms are coupled with the information
> provided by YAGO and try to convert them to a generic approach.

With a focus on the Aida-Light engine, this is IMO by far the most
interesting topic.
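
Just to make this a bit more concrete: as I understand it, decoupling
the algorithms from YAGO would mean putting everything Aida-Light
currently reads from YAGO behind a dataset-neutral interface, so that a
more focused dataset can be plugged in instead. A rough sketch of what
such an abstraction could look like (all names are hypothetical; none
of this exists in Aida-Light or Stanbol today):

import java.util.List;

/*
 * Hypothetical, dataset-neutral view on the knowledge base used for
 * disambiguation. A YAGO-backed implementation would reproduce the
 * current behaviour; focused datasets would provide their own.
 */
public interface DisambiguationDataset {

    /** Candidate entities for a surface form detected in the text. */
    List<EntityCandidate> getCandidates(String mention);

    /** Prior probability that the mention refers to the given entity. */
    double getMentionEntityPrior(String mention, String entityId);

    /** Relatedness between two entities, used for coherence scoring. */
    double getEntityRelatedness(String entityId1, String entityId2);
}

/** Minimal candidate representation (again purely illustrative). */
class EntityCandidate {
    final String entityId;  // e.g. a URI in the target dataset
    final double prior;     // mention-entity prior taken from the dataset

    EntityCandidate(String entityId, double prior) {
        this.entityId = entityId;
        this.prior = prior;
    }
}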

>
> 2. Current Aida-Light architecture. Currently, all the data is preloaded in
> memory, forcing the user to use high-profile machines. We have discussed
> this several times, but maybe it is time to finally decide on a proper
> backend strategy for supporting disambiguation. Probably the current Yards
> are not sufficient/valid.

I fear that each disambiguation approach will come with its own data
model, mainly because the way the data is kept is central to
performance. Also, based on what I have seen so far, keeping everything
in-memory is the way to go. For Aida-Light I suggest keeping the current
solution. The requirement of about 50 GByte of RAM for YAGO is quite OK
anyway. If we can add support for more focused datasets, one will often
end up with far fewer entities. This is also a way to keep resource
requirements down.

>
> 3. Stanbol Disambiguation API. Another (almost) eternal discussion. Can we 
> design an extensible API for supporting different disambiguation approaches?

Something like the "enhancer.nlp" module, which allows integrating
different NLP processing frameworks, but for disambiguation, would be
nice to have. For Chalitha, however, it would IMO be much more efficient
to focus on improving the Aida-Light engine rather than to start
implementing a disambiguation framework.
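
Just to illustrate what I have in mind with such an API (all names
below are made up; this is not an existing Stanbol interface): the
Enhancer would keep producing entity suggestions as it does today, and
a disambiguation implementation would only re-rank them.

import java.util.List;

/*
 * Hypothetical SPI for pluggable disambiguation, in the spirit of how
 * enhancer.nlp decouples the different NLP processing frameworks.
 */
public interface DisambiguationEngine {

    /** Name of the implementation, e.g. "aida-light". */
    String getName();

    /**
     * Re-scores the candidates of the given mentions based on the
     * surrounding text. Implementations should only adjust confidence
     * values of existing candidates, not add new ones.
     */
    void disambiguate(String text, List<Mention> mentions);
}

/** A mention with the candidates suggested by the linking engines. */
class Mention {
    String surfaceForm;            // the text span, e.g. "Paris"
    int start;                     // character offsets of the span
    int end;
    List<Suggestion> suggestions;  // candidates to be re-ranked
}

/** One candidate entity with its (mutable) confidence. */
class Suggestion {
    String entityUri;
    double confidence;
}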

best
Rupert



-- 
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO 
..........................................................................
| http://redlink.co/
