Hi all,

I made a thing:
https://apertium.trigram.no/?dir=nob-nno&q=Vi%20liker%20enten%20%C3%A5%20fortsette%20%C3%A5%20bygge%20n%C3%A5r%20vi%20blant%20annet%20s%C3%B8ker%20forskjellen%20mens%20dere%20er%20uenige.#translation
(try toggling the various "Style preferences")

Norway in general has a very positive view on dialects, and there are
quite a lot of accepted spelling/word alternatives in both the Bokmål
and Nynorsk variants of Norwegian. And people have style preferences
that don't map cleanly into disjoint sets – some people want
{me,a,kj}, others want {me,e,kj} or {vi,a,k} etc. The current system
of alt-attributes and multiple binaries doesn't scale here. With just
two options, you have to have four alt-values (and lots of duplication
in .dix) and four generators – we'd like dozens of options, but without
compiling 2^dozens of generators.

So instead, why not just generate everything and disambiguate? CG can
read variables inserted as blanks in the stream, and then use those to
decide which style preferences to keep or throw away. Apparently others
already do this.[1]

I've got some branches of apertium-nno-nob/apertium-nno where
it's implemented. We put a cg-proc command after both bidix and the
generator, and change the generator to use the bilingual format
(lt-proc -b).[2]


For bidix-specified preferences, when translating from right to left, we
remove the r="LR" on the entries in question, and match on lemmas in the
biprefs.rlx CG file:[3]

    SELECT ("skilnad"i) IF (0 ("forskjell"i) + (VAR:forskjell_skilnad));
    REMOVE ("skilnad"i) IF (0 ("forskjell"i));

The preference variable here is named "forskjell_skilnad" since the
default is "forskjell", but if that option is ticked / variable is set,
we choose "skilnad".
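
To spell out what that rule pair does, here is a rough Python sketch of its effect. It is not how CG works internally; the PREFS table and the pick() helper are made up for illustration, and only the forskjell/skilnad pair comes from the actual rules.

```python
# Hypothetical table: preference variable -> (default lemma, alternative lemma).
PREFS = {"forskjell_skilnad": ("forskjell", "skilnad")}

def pick(readings, variables):
    """Mimic the SELECT/REMOVE pair.

    readings:  target lemmas offered by bidix for one word
    variables: set of preference names that are switched on
    """
    kept = list(readings)
    for var, (default, alt) in PREFS.items():
        if default in kept and alt in kept:
            if var in variables:
                kept = [alt]                          # SELECT: keep only the alternative
            else:
                kept = [r for r in kept if r != alt]  # REMOVE: drop the alternative
    return kept

print(pick(["forskjell", "skilnad"], set()))
print(pick(["forskjell", "skilnad"], {"forskjell_skilnad"}))
```

Words with no competing alternative pass through untouched, which is why the rules only need to mention the lemmas that actually have a preference attached.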


For generator-specified preferences, we remove the LR's and instead add
a tag when generating, typically through a pardef, e.g. this one is used
for the set of words where 'kj' can be written as 'k':

    <pardef n="v:kj_k">
      <e r="LR">   <p><l></l>                  <r></r></p></e>
      <e r="RL">   <p><l><s n="v:kj_k"/></l> <r></r></p></e>
    </pardef>

I gave cg-proc a new switch -g/--generation that outputs lexical units
without tags (unless --trace is also given) and without
surrounding ^$. The generator (running with lt-proc -b) now gives

    $ echo spøkelse | apertium -d . nob-nno-dgen
    ^spøkelse<n><nt><sg><ind>/spø<v:kj_k>kelse/spøkjelse$^.<sent><clb>/.$

and yeah maybe the tag is at a weird spot but cg-proc doesn't mind:

    $ echo '^spøkelse<n><nt><sg><ind>/spø<v:kj_k>kelse/spøkjelse$' \
        | cg-proc -n -g -w nob-nno.genprefs.rlx.bin
    spøkjelse

    $ echo '[<STREAMCMD:SETVAR:kj_k>]^spøkelse<n><nt><sg><ind>/spø<v:kj_k>kelse/spøkjelse$' \
        | cg-proc -n -g -w nob-nno.genprefs.rlx.bin
    [<STREAMCMD:SETVAR:kj_k>]spøkelse
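
For anyone who wants to see the selection logic without reading the rlx file, here is a small Python approximation of what the two runs above do. The behaviour is inferred from those two examples only (a form carrying a v:NAME tag wins if NAME is set, otherwise the untagged form wins); the real genprefs.rlx may do more. The generate() helper is hypothetical, and it ignores escaped slashes.

```python
import re

TAG = re.compile(r"<([^<>]+)>")

def generate(unit, variables):
    """Pick one surface form from '^analysis/form1/form2$'.

    Assumes at least one generated form follows the analysis.
    variables is the set of preference names that are switched on.
    """
    forms = unit.strip("^$").split("/")[1:]   # drop the analysis before the first '/'
    tagged = []
    for form in forms:
        # Collect preference names from tags of the shape <v:NAME>.
        prefs = [t[2:] for t in TAG.findall(form) if t.startswith("v:")]
        tagged.append((form, prefs))
    # Prefer forms whose preference tags are all switched on ...
    chosen = [f for f, p in tagged if p and all(v in variables for v in p)]
    # ... else fall back to untagged forms, else to the first form.
    chosen = chosen or [f for f, p in tagged if not p] or [forms[0]]
    return TAG.sub("", chosen[0])             # strip tags, like cg-proc -g

unit = "^spøkelse<n><nt><sg><ind>/spø<v:kj_k>kelse/spøkjelse$"
print(generate(unit, set()))
print(generate(unit, {"kj_k"}))
```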

Those STREAMCMD's are hard to do manually, so /usr/bin/apertium can now
insert and strip them by reading AP_SETVAR:

    $ export AP_SETVAR
    $ for AP_SETVAR in "" "kj_k" "kj_k,infa_infe" "infa_infe"; do
          echo spøke | apertium -d . nob-nno
      done
    spøkja
    spøka
    spøke
    spøkje
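
The insert-and-strip step is simple enough to sketch in a few lines of Python. This is not the code in /usr/bin/apertium (which is a shell script); the function names are made up, and I'm assuming that several variables can go into one comma-separated SETVAR command.

```python
import re

def setvar_blank(ap_setvar):
    """Turn an AP_SETVAR value like 'kj_k,infa_infe' into a superblank.

    Returns the empty string when no variables are set.
    Assumption: one SETVAR command may carry a comma-separated list.
    """
    names = [n.strip() for n in ap_setvar.split(",") if n.strip()]
    if not names:
        return ""
    return "[<STREAMCMD:SETVAR:%s>]" % ",".join(names)

# Matches any [<STREAMCMD:...>] blank so it can be removed from the final output.
STREAMCMD = re.compile(r"\[<STREAMCMD:[^\]]*>\]")

def strip_streamcmds(text):
    return STREAMCMD.sub("", text)

print(setvar_blank("kj_k,infa_infe") + "spøke")
print(strip_streamcmds("[<STREAMCMD:SETVAR:kj_k>]spøkelse"))
```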

The changes to apertium and vislcg3 are merged, but the html-tools
changes[5] may need deuglifying and testing.

I also haven't merged the changes to apy[6] yet, since we need to
bikeshed how to get the list of possible options from the language pair
(which html-tools shows). Currently the apy branch just hardcodes the
list for nob→nno[7] :)

Some preferences are in the monolingual packages and some in the pair,
so the preferences for a pair need to include both. Perhaps each package
includes one or more preferences.xml files, and then modes.xml can do

    <mode name="nob-nno">
        <pipeline>…</pipeline>
        <preferences>
            <pref path="nob-nno.preferences.xml"/>
            <pref path="nno.preferences.xml"/>
        </preferences>
    </mode>

where nno.preferences.xml is copied by make from
$(LANG1)/preferences.xml and contains something like

        <preference id="kj_k">
            <description lang="nno">søkje → søke</description>
            <description lang="nob">søkje → søke</description>
        </preference>

(then apy would have to parse modes.xml and the files listed there)
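
To get a feel for how much work that would be on the apy side, here is a rough Python sketch of the parsing. Nothing like this exists in apy yet; the <preferences> root element inside the preference files, the list_preferences() helper and the forskjell_skilnad description text are all assumptions for the sake of the example.

```python
import xml.etree.ElementTree as ET

MODES_XML = """<modes>
  <mode name="nob-nno">
    <pipeline/>
    <preferences>
      <pref path="nob-nno.preferences.xml"/>
      <pref path="nno.preferences.xml"/>
    </preferences>
  </mode>
</modes>"""

# Stand-ins for the files on disk; the root element name is an assumption.
PREF_FILES = {
    "nob-nno.preferences.xml": """<preferences>
  <preference id="forskjell_skilnad">
    <description lang="nno">forskjell → skilnad</description>
  </preference>
</preferences>""",
    "nno.preferences.xml": """<preferences>
  <preference id="kj_k">
    <description lang="nno">søkje → søke</description>
    <description lang="nob">søkje → søke</description>
  </preference>
</preferences>""",
}

def list_preferences(modes_xml, mode_name, read_file):
    """Return {preference id: {lang: description}} for one mode.

    read_file maps a path from <pref path="..."/> to that file's XML text.
    """
    prefs = {}
    for mode in ET.fromstring(modes_xml).iter("mode"):
        if mode.get("name") != mode_name:
            continue
        for pref in mode.iterfind("preferences/pref"):
            doc = ET.fromstring(read_file(pref.get("path")))
            for p in doc.iter("preference"):
                prefs[p.get("id")] = {
                    d.get("lang"): (d.text or "").strip()
                    for d in p.iter("description")
                }
    return prefs

print(list_preferences(MODES_XML, "nob-nno", PREF_FILES.__getitem__))
```

Passing the file reader in as a function keeps the sketch independent of where the installed preference files end up.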

Or is there a better way?




[1] https://github.com/TinoDidriksen/cg3/issues/68#issuecomment-736571504
[2] https://github.com/apertium/apertium-nno-nob/compare/biprefs#diff-94c6b34f4d7517dc0915b07677ec9a8656b559e0bbdb683a66d4270790b88812R21-L39
[3] https://github.com/apertium/apertium-nno-nob/compare/biprefs#diff-4e8f9b0972e6a59cd53d18a476d8d01bbe772f29a5c481659b24185d6839cd1dR14-R15
[4] https://github.com/apertium/apertium-nno/compare/biprefs#diff-4e49c13e8aa1b44221621ad844edbd9be60169e8baa78ed7bbb80b4237865573R638-R641
[5] https://github.com/apertium/apertium-html-tools/tree/biprefs
[6] https://github.com/apertium/apertium-apy/tree/biprefs
[7] https://github.com/apertium/apertium-apy/blob/c16902972097d4e9dd9af1d14413562f96d32604/apertium_apy/handlers/translate.py#L219


