Goddag,

I've just tagged new releases of swe-nor and dan-nor.

The work on swe-nor is partially funded by the Norwegian News Agency,
and dan-nor by Store norske leksikon.

For both pairs, all directions now use apertium-separable (lsx) and
recursive transfer (rtx), with testing by apertium-regtest.

Most of the work has been focused on the nob→{swe,dan} direction, but
all directions have of course improved vocabulary and seem to have
improved quality. The directions into Nynorsk are also usable with style
preferences (though these haven't been added to the UI yet in this release).

Some stats:

dan-nor:
- Over 22,000 new non-name bidix entries
- Over 300 new lexical selection rules
- ~60 separable/mwe entries, including comma insertion rules for
  generating Danish

swe-nor:
- Over 20,000 new non-name bidix entries
- Over 300 new lexical selection rules manually added
- Nearly 7000 new lexical selection rules based on corpus frequencies
- ~30 separable/mwe entries

In addition, the newer monolingual dependencies mean much better Bokmål
disambiguation (with some improvements there for the other languages as
well), much better choices of epenthetic elements in compounds, and
tweaks all round.

Moving from chunking transfer to recursive transfer for these pairs was a
joy. I have spent very little time on the rules, but they already cover
more than the old rules did, in far fewer lines of code: including
comments and everything, dan-nor has ~1011 lines of rtx in one file per
direction, compared with 8347 lines of t?x in three files per direction.
Each direction has about 20 rtx rules (where a rule is NP→n|ncmp n|…),
or 50 if you count alternatives. There's a lot less redundancy than
before, and the recursion means we can handle e.g. compounds of
arbitrary length.
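
To illustrate the point about recursion, here is a toy sketch in Python.
It is not rtx syntax and not the actual rules from either pair; the
grammar N → n | ncmp N and the function names are made up for the
example. It only shows why one recursive alternative is enough to cover
compounds with any number of parts.

```python
# Toy illustration only: not rtx syntax, not the real dan-nor/swe-nor rules.
# Hypothetical grammar:  N -> n | ncmp N
# "ncmp" stands for a non-final compound part, "n" for a plain noun.

def parse_n(tags, pos=0):
    """Return the position just after an N starting at pos, or None."""
    if pos >= len(tags):
        return None
    if tags[pos] == "ncmp":
        # Recursive alternative: a compound part followed by another N.
        return parse_n(tags, pos + 1)
    if tags[pos] == "n":
        # Base alternative: a single noun closes the compound.
        return pos + 1
    return None


def is_compound(tags):
    """True if the whole tag sequence is covered by a single N."""
    return parse_n(tags) == len(tags)
```

With a non-recursive pattern one would need a separate rule for each
compound length; here `is_compound(["ncmp", "ncmp", "ncmp", "n"])` is
accepted by the same two alternatives that accept `["n"]`.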

-Kevin



_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
