Re: LanguageTool for Chrome

Juan Martorell Sun, 25 Oct 2015 06:45:42 -0700

>
> > Latest dictionary in LO5 comes with version 0.8. However in LT source
> > code repository I see the version 0.2.
>
> So can't we just update the version in LT to improve support a lot, with
> little work for us?
>


I guess so. However, since I was not directly involved in the Hunspell
integration into LT, I want to make sure that the upgrade introduces no
regressions. Besides, the extension of the dictionary in LO and the one in
LT do not match -- I think that needs a little more preparation.


>
> > Extending the Spanish dictionary is an endeavor and it is one of the
> > main reasons Spanish LT is not evolving so far: Most of the rules
> > would be "brute force" rules since most of the compound words fall out
> > of the dictionaries.
>
> Can you explain this a bit more, I'm not sure I understand. Do you refer
> to the rules in grammar.xml?
>

Spanish, like other languages such as German, is quite rich in affixes,
especially in verbs. For instance, pronouns are attached to the verb, and
it is quite common that the same word contains the inflected verb, the
indirect object and the direct object. For LT it is easy to analyse the
form "me lo explicas" (you explain it to me), but the compound form
"explícamelo", which is the most common usage, is a challenge. So far no
dictionary includes all these forms, and we get by adding them to our
local dictionary as we come across them.

However, there is light at the end of the tunnel. Maybe we at LT can scan
these unrecognised words and even tag them with relevant POS tags. But for
that we need some algorithms that are not yet implemented in LT. That is
the very first item in the roadmap for Spanish, since I firmly believe in
the architecture sequence [[dictionary >> disambiguation >> rules]],
necessarily in that order. Otherwise you end up with thousands of rules
(brute-forced rules) that are hard to maintain and strongly incompatible
with some disambiguation and even dictionary improvements.
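
To make the idea of scanning unrecognised words a bit more concrete, here
is a minimal sketch in Python. It is not LT code and not an algorithm LT
implements: the tiny lexicon, the tag names and the function names are all
made up for illustration. It strips up to two enclitic pronouns from the
end of an unknown word and checks whether what remains, with or without
its written accent, is a known verb form.

```python
# Hypothetical toy lexicon: verb form -> POS tag (tag names are invented).
KNOWN_VERB_FORMS = {
    "explica": "VERB.imperative",
    "explicar": "VERB.infinitive",
    "explicando": "VERB.gerund",
}

# Pronouns that can be attached to the end of a Spanish verb form.
ENCLITICS = ("me", "te", "se", "lo", "la", "le", "nos", "os", "los", "las", "les")

# Attaching pronouns often adds a written accent to the verb, so the
# lookup also tries the stem with acute accents removed.
ACUTE = str.maketrans("áéíóú", "aeiou")


def split_enclitics(word, max_clitics=2):
    """Return (verb_form, [clitics]) for a verb+pronoun compound, else None."""

    def search(stem, clitics):
        # Only accept a match once at least one pronoun has been stripped.
        if clitics:
            for candidate in (stem, stem.translate(ACUTE)):
                if candidate in KNOWN_VERB_FORMS:
                    return candidate, clitics
        if len(clitics) == max_clitics:
            return None
        for clitic in ENCLITICS:
            if stem.endswith(clitic) and len(stem) > len(clitic):
                found = search(stem[: -len(clitic)], [clitic] + clitics)
                if found:
                    return found
        return None

    return search(word.lower(), [])


def tag_unknown(word):
    """Return a list of (token, tag) pairs, or None if no analysis is found."""
    result = split_enclitics(word)
    if result is None:
        return None
    verb, clitics = result
    return [(verb, KNOWN_VERB_FORMS[verb])] + [(c, "PRON.clitic") for c in clitics]


print(tag_unknown("explícamelo"))
```

A real implementation would need the full verb dictionary and some care
with ambiguous endings, which is where disambiguation comes in.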

I contacted the Freeling people (the source of the original LT dictionary)
and they welcome suggestions; however, they have their own roadmap, which
is moving further away from LT, so last time I checked, joining efforts
with them was quite a challenge.

I was contacted by a Cuban university which claimed to be interested in
improving the product. I offered them support (even for brute-forced
rules) but I haven't heard from them for a while -- and I think it is
because it is so hard to get started with it.

I think one of the biggest challenges we are facing as a product is that LT
is very demanding to develop. As in all products relying on powerful
regular expressions, it is all too easy to introduce regressions; it takes
a lot of thinking to create a good rule and a lot of testing afterwards.
Personally, I find it quite difficult to allocate enough continuous time to
do something worthwhile. In its current state I think it is not possible
to pick thirty minutes here and forty-five there and contribute something
meaningful, at least for Spanish.

So I really look forward to finding a group of students who wish to tackle
the task, create a Git branch and move it to a new state. Until then I
think the wise thing to do is to fix bugs, add rules with little dictionary
or disambiguation influence, and try to keep the false positives low.

Best regards,
Juan.

