On 2012-06-22 07:39, Dominique Pellé wrote:
> Daniel Naber <list2...@danielnaber.de> wrote:
>
>> Hi,
>>
>> I know Marcin has warned about this several times but I only today noticed
>> how slow spell check suggestions really are. Checking a long German blog
>> entry takes 100 seconds. Without suggestions it takes 6 seconds. 100
>> seconds seems not acceptable, so I suggest we keep the spell checking but
>> disable the suggestions. In the next version we can then have an
>> alternative rule that does spell checking with suggestions (introducing
>> that now would mean new strings that need translation). Any thoughts?

I'm against the complete removal of suggestions. Hunspell without
suggestions is no better than using just our taggers for spell-checking,
and for some languages, like English, Hunspell is fast enough. German
simply has a huge dictionary, all because of compounding. The only thing
that seems practical to me is to extend HunspellRule for languages that
have really slow suggestions (HunspellNoSuggestRule) and use it for this
release. I do not think that removing spell-check suggestions for
English US is a good idea at all.

For languages without compounds and diacritics, we can use unmunch and
convert the dictionaries quickly to morfologik-speller format, if you're
worried about the speed. The morfologik-speller module is experimental and
does not support some of the Hunspell tricks (I didn't have time to
implement REP etc.), but it's good enough.

> I'm fine with this if that.
> Will this be made configurable in command line mode?
> 3 possible modes:
>
> (1) no hunspell (fastest) (we can already do that with -d HUNSPELL_RULE)
> (2) hunspell without spelling suggestions
> (3) hunspell with corrections (slowest)

I'm afraid this is not possible and will not be possible with 1.8. The
rules are not configurable at all from the command line, and we're already
in the freeze period, so no new features are being introduced. Rule
configuration is non-trivial to implement.

>
> On top of that, there is also the idea of using Hunspell
> only on words with UNKNOWN POS tag which may work
> fine for some languages.

This algorithm would be a waste of time: we can already use non-tagged
words for displaying an error. That will be faster than any Hunspell rule.
But most Hunspell dicts cover more words than our taggers, so it should not
make any difference in timing, especially because we would have to do some
string processing for every sentence to map string portions to tokens. In
some languages it won't help, if Hunspell tokenization is different. This
is why I didn't bother with this.
Moreover, checking time is negligible. The crucial thing is the time spent
creating suggestions.

Regards,
Marcin

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
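
To illustrate the HunspellNoSuggestRule idea from the message above, here is a
minimal, self-contained sketch. It assumes a rule can be split into a cheap
"is this word misspelled?" step and an expensive "generate suggestions" step.
All names in it (SpellBackend, SuggestingSpellRule, NoSuggestSpellRule,
ToyBackend) are hypothetical and are not LanguageTool's actual HunspellRule
API; the toy backend only stands in for a real Hunspell binding.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a spell-checking backend such as Hunspell.
interface SpellBackend {
    boolean isMisspelled(String word);   // cheap dictionary lookup
    List<String> suggest(String word);   // the expensive call
}

// Flags unknown words and attaches suggestions to each of them.
class SuggestingSpellRule {
    protected final SpellBackend backend;

    SuggestingSpellRule(SpellBackend backend) {
        this.backend = backend;
    }

    // Returns one "word -> suggestions" entry per misspelled word.
    List<String> check(String text) {
        List<String> matches = new ArrayList<>();
        for (String word : text.split("\\s+")) {
            if (!word.isEmpty() && backend.isMisspelled(word)) {
                matches.add(word + " -> " + suggestionsFor(word));
            }
        }
        return matches;
    }

    protected List<String> suggestionsFor(String word) {
        return backend.suggest(word);
    }
}

// Same detection logic, but it never asks the backend for suggestions.
class NoSuggestSpellRule extends SuggestingSpellRule {
    NoSuggestSpellRule(SpellBackend backend) {
        super(backend);
    }

    @Override
    protected List<String> suggestionsFor(String word) {
        return Collections.emptyList();
    }
}

// Toy backend with a tiny word list; counts how often suggest() is called.
class ToyBackend implements SpellBackend {
    private final Set<String> known =
            new HashSet<>(Arrays.asList("das", "ist", "ein", "haus"));
    int suggestCalls = 0;

    @Override
    public boolean isMisspelled(String word) {
        return !known.contains(word.toLowerCase());
    }

    @Override
    public List<String> suggest(String word) {
        suggestCalls++;
        return Arrays.asList("haus");
    }
}
```

With this split, a language with slow suggestions could be given the
no-suggest variant while others keep the full rule. Errors are still
reported either way; only the suggestion step is skipped.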