Some time ago I started a project that used Hunspell from Java (I think at
the time I used the library that is built on BridJ).
It worked well, but I also needed custom dictionaries, and another
interesting feature would have been merging dictionaries.
I never had time to continue investigating it, but I remember it wasn't
straightforward to add new words, and in some cases I'd have to use the
Hunspell syntax to tell it about the possible variations of the word.
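
To illustrate what "telling Hunspell about variations" involves: a .dic entry
such as "cat/S" carries a flag, and the .aff file holds suffix rules of the
form "SFX flag strip add condition" for that flag. Below is a minimal Java
sketch of expanding one entry with such rules. The class and method names
(AffixSketch, SuffixRule, expand) are hypothetical, the two rules are only
modeled on the style used in English affix files, and the sketch assumes
single-character flags and conditions simple enough to be read as Java
regular expressions. Prefixes, cross products, and compounding are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class AffixSketch {

    /** One suffix rule line from an .aff file: "SFX flag strip add condition". */
    public static final class SuffixRule {
        final char flag;
        final String strip;
        final String add;
        final Pattern condition;

        SuffixRule(char flag, String strip, String add, Pattern condition) {
            this.flag = flag;
            this.strip = strip;
            this.add = add;
            this.condition = condition;
        }

        public static SuffixRule parse(String line) {
            String[] p = line.trim().split("\\s+");
            // "0" means "nothing to strip" / "nothing to add".
            String strip = p[2].equals("0") ? "" : p[2];
            String add = p[3].equals("0") ? "" : p[3];
            // Assumption: the condition is treated as a regex on the end of the stem.
            Pattern condition = Pattern.compile(".*" + p[4]);
            return new SuffixRule(p[1].charAt(0), strip, add, condition);
        }
    }

    /** Returns the stem plus every form produced by the rules its flags select. */
    public static List<String> expand(String dicEntry, List<SuffixRule> rules) {
        int slash = dicEntry.indexOf('/');
        String stem = slash < 0 ? dicEntry : dicEntry.substring(0, slash);
        String flags = slash < 0 ? "" : dicEntry.substring(slash + 1);

        List<String> forms = new ArrayList<>();
        forms.add(stem);
        for (SuffixRule r : rules) {
            boolean selected = flags.indexOf(r.flag) >= 0;
            if (selected && stem.endsWith(r.strip) && r.condition.matcher(stem).matches()) {
                String base = stem.substring(0, stem.length() - r.strip.length());
                forms.add(base + r.add);
            }
        }
        return forms;
    }

    public static void main(String[] args) {
        List<SuffixRule> rules = Arrays.asList(
                SuffixRule.parse("SFX S y ies [^aeiou]y"),
                SuffixRule.parse("SFX S 0 s [^sxzhy]"));
        System.out.println(expand("cat/S", rules));   // [cat, cats]
        System.out.println(expand("pony/S", rules));  // [pony, ponies]
        System.out.println(expand("house", rules));   // [house]
    }
}
```

So adding a custom word properly means choosing the right flags for it, not
just appending a line to the word list.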
Having something in Java would definitely be helpful. So if someone starts a
GitHub repo and has the skills to either port the existing Hunspell code (not
sure about license issues) or write one from scratch based on papers, I'd be
keen to take a look, help with testing, and maybe even send some pull requests :)
That'd be great not just for LT, but also for several other OSS projects. At
the moment, when I need a simple dictionary and don't need all the features of
Hunspell, I prefer to use jazzy.
Cheers,
Bruno
From: Daniel Naber <daniel.na...@languagetool.org>
To: LanguageTool Developer List <languagetool-devel@lists.sourceforge.net>
Sent: Wednesday, 29 June 2016 9:28 PM
Subject: The spell checker issue
Hi,
yesterday I tried to update the English dictionary that LT includes. The
details are documented at
https://github.com/languagetool-org/languagetool/issues/329 but in a
nutshell: our spell checking is so complicated that the dictionary
update didn't work.
We could really use a process that allows us to use hunspell
dictionaries directly, without conversion to other formats. The original
reason we don't use hunspell (or only parts of it) is that it's slow,
especially when it comes to generating suggestions. Today I ran a test
with hunspell 1.4.1 and LT, and it turns out LT is about 4-5 times
faster.
What could be a solution:
A) Improve hunspell to be faster. We'd need someone who can do this and
then we'd still rely on native code, which isn't what we want in Java
(but we've lived with it for years now).
B) Finally write a Java-based spell checker that can read hunspell
dictionaries. The internet is full of spell checkers, but we need one
with support for advanced features like compound words (important for
German).
C) I don't know, do you have an idea?
If we cannot find a solution, the current situation will persist so that
some dictionaries probably won't be updated.
Regards
Daniel
This is the text for testing, full of typos (supposed to be German):
Fgen Siex hxier Ixhren Txext eiwen. Klcken ie nch dr Prüung aug diw
fatbig
unteelegten Textstellwn. oder notzen Sie desen Teyt alls Beeispiel füür
eein
Paat Fwhler , diw LanguageTool erkwnnen ksnn: Ih wirde Ankst und banke.
------------------------------------------------------------------------------
Attend Shape: An AT&T Tech Expo July 15-16. Meet us at AT&T Park in San
Francisco, CA to explore cutting-edge tech and listen to tech luminaries
present their vision of the future. This family event has something for
everyone, including kids. Get more information and register today.
http://sdm.link/attshape
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel