Hi,

I have made the changes required in LT to update to Morfologik 2.1.0.
You can see them in the branch "updatemorfologik" (a code clean-up is
still pending).

Someone should test these changes before I push them.

The inputs for the dictionary builders are the same as before.

As for the outputs, there is a slight difference in the format of the tagger
dictionaries. They are now FSA5 by default; before, they were CFSA2. This
can be changed. According to Dawid Weiss:

"The difference in CFSA2 vs. FSA5 is in the way the automaton is written/
compressed. If you want to know the details I wrote a paper about it, but
it is safe to assume that CFSA2 will produce smaller dictionaries at a
slightly higher cost of traversing them."

The Catalan dictionary is 1.1M with CFSA2 and 1.4M with FSA5. Which should
we use? I don't know whether "the cost of traversing" the dictionary is
relevant for us.
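
To put the size difference in perspective, here is a quick back-of-the-envelope
calculation. It only uses the two sizes quoted above (rounded, in megabytes);
the function names are mine and purely illustrative, and it says nothing about
traversal cost, which would have to be measured separately.

```python
# Sizes reported above for the Catalan tagger dictionary, in megabytes.
# These are the rounded figures from this message, not exact byte counts.
CFSA2_MB = 1.1
FSA5_MB = 1.4


def relative_increase(smaller: float, larger: float) -> float:
    """How much bigger `larger` is than `smaller`, as a percentage of `smaller`."""
    return (larger - smaller) / smaller * 100.0


def relative_saving(smaller: float, larger: float) -> float:
    """How much is saved by using `smaller`, as a percentage of `larger`."""
    return (larger - smaller) / larger * 100.0


growth = relative_increase(CFSA2_MB, FSA5_MB)
saving = relative_saving(CFSA2_MB, FSA5_MB)

print(f"FSA5 is about {growth:.0f}% larger than CFSA2")
print(f"CFSA2 saves about {saving:.0f}% compared with FSA5")
```

So for Catalan the choice is roughly a quarter more disk space with FSA5
against the somewhat slower traversal that Dawid mentions for CFSA2.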

There is a minor change in tagger dictionaries with added "frequency data"
(this only affects the Catalan dictionary, I think). The separator
character between the POS tag and the frequency will be removed.

Regards,
Jaume Ortolà
