On 2021-04-29 11:54, Kevin Brubeck Unhammer wrote:
Hi,

I've tagged some new releases of nno, nob and apertium-nno-nob.

Like before[0], the work has been funded by the Norwegian Ministry of
Culture via Nynorsk pressekontor (NPK) and the Norwegian News Agency,
now with direct commits from contributors Anja, Victoria and Hallvard of
NPK :-)

One major visible change is that we now let the user select a number of
spelling variants using a new preferences system. Instead of compiling
one FST per set of style choices, we just generate all choices on the
fly and disambiguate.[1] You can try it already on the Beta
site[2] (though currently it may fail if you use it with
Transfuse[3]). Before, the user could only choose whether infinitives
ended in -e or -a; now they can also pick whether the first person plural
pronoun should be "me" or "vi", whether words like "byggje" should keep
the optional j, etc. They can combine such options as they choose, and
we'll be adding more options in the future.
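
To illustrate the "generate all choices and disambiguate" idea, here is a rough Python sketch. It is not how the Apertium pipeline actually implements it: the Variant class, the CANDIDATES table, the tag names (inf_e, inf_a, j, no_j, vi, me) and the scoring rule are all hypothetical, chosen only to show how combinable preferences can pick one variant from a set generated up front.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    form: str        # surface form of one spelling variant
    tags: frozenset  # hypothetical style tags describing that variant


# Hypothetical table: every variant is generated, the first is the default.
CANDIDATES = {
    "bygge": [
        Variant("byggje", frozenset({"inf_e", "j"})),
        Variant("byggja", frozenset({"inf_a", "j"})),
        Variant("bygge", frozenset({"inf_e", "no_j"})),
        Variant("bygga", frozenset({"inf_a", "no_j"})),
    ],
    "vi": [
        Variant("vi", frozenset({"vi"})),
        Variant("me", frozenset({"me"})),
    ],
}


def choose(word, prefs):
    """Pick the variant whose tags overlap most with the user's preferences."""
    candidates = CANDIDATES.get(word, [Variant(word, frozenset())])
    # max() returns the first maximal item, so ties fall back to the default.
    return max(candidates, key=lambda v: len(v.tags & prefs)).form


def translate(words, prefs=()):
    """Apply the same set of preferences to every word."""
    prefs = frozenset(prefs)
    return [choose(w, prefs) for w in words]
```

With no preferences the sketch returns the first-listed (default) variants; passing, say, {"me", "inf_a", "no_j"} selects "me" and "bygga", and any subset of the preferences can be combined without building a separate table per combination.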

Since last time, we've also updated the monolingual dictionaries with
new entries from the updated Norsk ordbank[4], and gained many new
bidix entries from it as well.

Other changes:
- 41 new transfer rules
- 614 new lrx rules
- about 800 new names and 26,800 new non-names added to bidix
  (many added by script from new Norsk ordbank entries)
- many transfer tweaks, e.g. adverbs can move past noun phrases, new
  constructions recognised
- lots of work on nob disambiguation, especially on noun vs verb and
  participles (which gain a distinction in nno which they don't have in
  nob)
- much more consistent default nno spelling choices
- rules for name guessing using CG
- number compounding + more left-hand-only compound parts

WER on news text remains around 4%: on the one hand we're reaching deep
into the long tail of unknown words, and on the other we're spending
more time making the output more idiomatic with multi-word rules. The
next steps include better support for correcting capitalisation[5] and
starting to use apertium-separable for MWEs.[6]
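
For readers unfamiliar with the metric, WER (word error rate) is commonly computed as the word-level edit distance between the system output and a reference, divided by the number of reference words. The sketch below shows that standard definition in Python; the message does not say exactly how the 4% figure was measured, so the whitespace tokenisation and the function name wer are assumptions.

```python
def wer(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the words seen so far in ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # word missing from hypothesis
                cur[j - 1] + 1,          # extra word in hypothesis
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1] / len(ref)
```

Under this definition, a WER of about 4% means roughly one word in 25 has to be inserted, deleted or replaced to turn the output into the reference.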


-Kevin

[0] https://sourceforge.net/p/apertium/mailman/apertium-stuff/thread/CABnmVq5J5Acc7r4XwtMgVR2eyd5dF2ab4gUsUv2ZWPzMWE5J7A%40mail.gmail.com/
[1] https://wiki.apertium.org/wiki/Dialectal_or_standard_variation#Overlapping_variants
[2] https://beta.apertium.org/index.eng.html?dir=nob-nno&q=Vi%20liker%20enten%20%C3%A5%20fortsette%20%C3%A5%20bygge%20n%C3%A5r%20vi%20blant%20annet%20s%C3%B8ker%20forskjellen%20mens%20dere%20er%20uenige.#translation
[3] https://github.com/TinoDidriksen/cg3/pull/75
[4] https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-41/
[5] We'd like to be able to state in bidix that "Xyz" should turn into "xyz",
    which is currently not possible. See also
https://github.com/apertium/apertium/issues/75
[6] currently blocked by
https://github.com/apertium/apertium-separable/issues/36



Congratulations, this is great news! :)

Fran


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
