Good day, I've just tagged new releases of swe-nor and dan-nor.
The work on swe-nor is partially funded by the Norwegian News Agency, and dan-nor by Store norske leksikon.

For both pairs, all directions now use apertium-separable (lsx) and recursive transfer (rtx), with testing by apertium-regtest. Most of the work has focused on the nob→{swe,dan} direction, but all directions have of course got improved vocabulary, and quality seems to have improved as well. The directions into Nynorsk can also be used with style preferences (though style selection hasn't been added to the UI yet in this release).

Some stats:

dan-nor:
- Over 22,000 new non-name bidix entries
- Over 300 new lexical selection rules
- ~60 separable/MWE entries, including comma insertion rules for generating Danish

swe-nor:
- Over 20,000 new non-name bidix entries
- Over 300 new lexical selection rules added manually
- Nearly 7,000 new lexical selection rules based on corpus frequencies
- ~30 separable/MWE entries

In addition, the newer monolingual dependencies mean much better Bokmål disambiguation (with some improvements there for the other languages as well), much better choices of compound epenthesis, and tweaks all round.

Moving from chunking transfer to recursive transfer for these pairs was a joy. I have spent very little time on the rules, but they already cover more than the old rules did, in far fewer lines of code: including comments and everything, dan-nor has ~1011 lines of rtx, in one file per direction, where the old rules had 8347 lines of t?x, in three files per direction. Each direction has about 20 rtx rules (where a rule is NP→n|ncmp n|…), or 50 if you count alternatives. There's a lot less redundancy than before, and the recursion means we can handle e.g. compounds of arbitrary length.

-Kevin

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff