> > > Latest dictionary in LO5 comes with version 0.8. However, in the LT source
> > > code repository I see version 0.2.
> >
> > So can't we just update the version in LT to improve support a lot, with
> > little work for us?
I guess so. However, since I was not directly involved in the Hunspell integration into LT, I want to make sure that the upgrade introduces no regressions. Besides, the dictionary extensions used in LO and in LT do not match, so I think that needs a little more preparation.

> > > Extending the Spanish dictionary is a big endeavor, and it is one of the
> > > main reasons Spanish LT has not evolved so far: most of the rules
> > > would be "brute force" rules, since most of the compound words fall outside
> > > the dictionaries.
> >
> > Can you explain this a bit more? I'm not sure I understand. Do you refer
> > to the rules in grammar.xml?

Spanish, like other languages such as German, is quite rich in affixes, especially in verbs. For instance, pronouns are attached to the verb, and it is quite common for a single word to contain the inflected verb, the indirect object and the direct object. For LT it is easy to analyse "me lo explicas" (you explain it to me), but the challenge comes with the compound form "explícamelo" (explain it to me), which is the most common usage. So far, no dictionary includes all these forms, and we survive by adding them to our local dictionary as we come across them.

However, there is light at the end of the tunnel. Maybe we at LT can scan these unrecognised words and even tag them with the relevant POS tags. But for that we need some algorithms that are not yet implemented in LT. That is the very first item on the roadmap for Spanish, since I firmly believe in the architecture sequence [[dictionary >> disambiguation >> rules]], necessarily in that order. Otherwise you end up with thousands of rules (brute-force rules) that are hard to maintain and largely incompatible with later disambiguation and even dictionary improvements.

I contacted the Freeling people (the source of the original LT dictionary) and they welcome suggestions. However, they have their own roadmap, which is drifting away from LT's, so, last time I checked, joining efforts with them was quite a challenge.
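To make the "scan unrecognised words" idea concrete, here is a minimal sketch of splitting enclitic pronouns off an unknown form like "explícamelo" and looking the remaining verb up in a dictionary. It assumes a tiny in-memory lexicon; the lexicon entries, the tag strings (EAGLES-style, illustrative only) and the function names are hypothetical, and this is not LT's or Freeling's actual tagger or API.

```python
import unicodedata
from typing import NamedTuple

# Hypothetical toy lexicon: form -> (lemma, tag). Tags are illustrative
# EAGLES-style strings, not taken from LT's or Freeling's real data.
LEXICON = {
    "explica": ("explicar", "VMM02S0"),  # imperative, 2nd person singular
    "da": ("dar", "VMM02S0"),
    "envía": ("enviar", "VMM02S0"),
}

# Spanish enclitic pronouns, longest first so "los" is tried before "lo".
CLITICS = sorted(
    ["me", "te", "se", "nos", "os", "le", "les", "lo", "la", "los", "las"],
    key=len,
    reverse=True,
)


class Analysis(NamedTuple):
    stem: str
    lemma: str
    tag: str
    clitics: tuple[str, ...]


def strip_acute(text: str) -> str:
    """Remove acute accents (í -> i) but keep ñ and ü intact."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace("\u0301", ""))


def split_clitics(word: str, max_clitics: int = 2) -> list[tuple[str, tuple[str, ...]]]:
    """Return every (stem, clitics) split peeling 0..max_clitics pronouns off the end."""
    results = [(word, ())]
    frontier = [(word, ())]
    for _ in range(max_clitics):
        next_frontier = []
        for stem, clitics in frontier:
            for clitic in CLITICS:
                if stem.endswith(clitic) and len(stem) > len(clitic):
                    next_frontier.append((stem[: -len(clitic)], (clitic,) + clitics))
        results.extend(next_frontier)
        frontier = next_frontier
    return results


def analyse(word: str) -> list[Analysis]:
    """Tag a word as verb + enclitics, if some split yields a known verb form."""
    word = unicodedata.normalize("NFC", word.lower())
    analyses = []
    for stem, clitics in split_clitics(word):
        candidates = [stem]
        if clitics:
            # Attaching pronouns often adds a written accent (explica -> explícamelo),
            # so also try the stem without it; keep the original for forms like envía.
            candidates.append(strip_acute(stem))
        for candidate in dict.fromkeys(candidates):  # dedupe, keep order
            if candidate in LEXICON:
                lemma, tag = LEXICON[candidate]
                analyses.append(Analysis(candidate, lemma, tag, clitics))
    return analyses


if __name__ == "__main__":
    for form in ["explícamelo", "dáselo", "envíamelo", "casa"]:
        print(form, analyse(form))
```

The lexicon lookup is what keeps the splitter from overgenerating (e.g. "explícanos" could also be split as "explícan" + "os", which is rejected because that stem is unknown). The sketch deliberately ignores spelling changes such as "vámonos" or "sentaos", where a letter is dropped before the pronoun.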
I was contacted by a Cuban university that claimed to be interested in improving the product. I offered them support (even for brute-force rules), but I haven't heard from them for a while, and I think that is because it is so hard to get started.

I think one of the biggest challenges we face as a product is that developing for LT is very demanding. As with all products relying on powerful regular expressions, it is too easy to introduce regressions; it takes a lot of thinking to create a good rule and a lot of testing afterwards. Personally, I find it quite difficult to allocate enough continuous time to do something worthwhile. In its current state, I don't think it is possible to pick thirty minutes here and forty-five there and contribute something meaningful, at least for Spanish. So I really look forward to finding a group of students who wish to tackle the task, create a Git branch and move it to a new state. Until then, I think the wise thing to do is to fix bugs, add rules with little dictionary or disambiguation influence, and try to keep false positives low.

Best regards,
Juan
_______________________________________________ Languagetool-devel mailing list Languagetool-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/languagetool-devel