Marcin, in reaction to your quote below, I can only agree that the
computational part is quite complex.

There was some research in Tilburg, resulting in a quite simple
computational algorithm:

- reduce all characters to their simple form (drop accents)
- take the ASCII value of each character and raise it to the 5th power
- add these numbers
The result is a single number to calculate with. All words with this number
are letter-equivalent (or almost), i.e. they consist of the same letters.
- convert common letter-group mistakes (hunspell REP) into number
differences beforehand
- apply a number of these differences to the main number
- look up the words stored under the resulting numbers
Then add the character-distance calculation to filter out suggestions that
are too different.
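
To make the steps above concrete, here is a minimal Python sketch. It is my
own illustration, not the Tilburg code: the function names, the lowercasing
step, the tiny lexicon, the two REP pairs and the distance limit of 2 are all
assumptions, and plain Levenshtein distance stands in for whatever
character-distance measure the original uses.

```python
import unicodedata
from collections import defaultdict

def simplify(word):
    # Drop accents by decomposing and removing combining marks.
    # Lowercasing is an assumption of this sketch, not part of the description.
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def anagram_key(word):
    # Sum of the 5th powers of the character codes of the simplified word.
    return sum(ord(c) ** 5 for c in simplify(word))

def build_index(lexicon):
    # Map each key to all lexicon words that share it.
    index = defaultdict(set)
    for w in lexicon:
        index[anagram_key(w)].add(w)
    return index

def rep_deltas(rep_pairs):
    # Precompute the key difference for each (wrong, right) letter group.
    return {anagram_key(right) - anagram_key(wrong) for wrong, right in rep_pairs}

def edit_distance(a, b):
    # Plain Levenshtein distance, computed row by row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def suggest(word, index, deltas, max_distance=2):
    # Collect words with the same key, plus words reachable via one delta.
    key = anagram_key(word)
    candidates = set(index.get(key, ()))
    for d in deltas:
        candidates |= index.get(key + d, set())
    candidates.discard(word)
    # Filter out suggestions that are too different from the input.
    base = simplify(word)
    scored = [(edit_distance(base, simplify(c)), c) for c in candidates]
    return [c for dist, c in sorted(scored) if dist <= max_distance]

# Hypothetical data, for illustration only.
lexicon = ["friend", "finder", "phone", "their", "there", "café"]
index = build_index(lexicon)
deltas = rep_deltas([("ie", "ei"), ("f", "ph")])

print(suggest("freind", index, deltas))  # ['friend']
print(suggest("fone", index, deltas))    # ['phone']
```

Note that "finder" has the same number as "freind" but is dropped by the
distance filter, which is exactly what that last step is for. The sketch
applies only one difference at a time; applying several, as described above,
would mean adding combinations of deltas.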

This works rather well, as simplistic as it is.

It does not take any compounding into account, which would make the
suggestions less flexible for compounding languages.

There should be a publication about this algorithm by Martin Reynaert,
University of Tilburg.

Just in case you are interested.

Ruud

>>I'm not a fan of hunspell; I think it has a wrong approach for creating
>>suggestions because the computational complexity of its algorithm is
>>simply too high.

