Francis Tyers <fty...@prompsit.com> writes:

> On Wed, 13 Nov 2013 at 11:10 +0100, Mikel L. Forcada wrote:
[...]
> The eng-kaz pair is actually using four level transfer, as
> apertium-sme-nob, which is at a similar level of complexity.
>
> The header comments in the sme-nob transfer files (just do cat
> apertium-sme-nob.sme-nob.t1x | less) give nice comments about which file
> does what. But hopefully Unhammer can give us some background/breakdown
> too.

http://www.molto-project.eu/sites/default/files/FreeRBMT-2012.pdf#25
gives a shorter intro:

1. Chunking, 63 rules: noun phrases turn into larger chunks,
   prepositions are output based on case information, verb auxiliaries
   and adverbs are output based on verb modality, voice and derivation
   tags.
2. Interchunk 1, 26 rules: simple anaphora resolution (based on most
   recent subject gender), merging coordinated noun phrase chunks,
   moving postpositions before noun phrases.
3. Interchunk 2, 39 rules: major word order changes, inserting dropped
   pronouns, inserting adverbs to indicate verb modality, correcting
   noun phrase definiteness using verb information (e.g. subjects of
   duals are definite).
4. Postchunk, 29 rules: inserting articles/determiners and the
   infinitive marker, tag cleanup in preparation of generation.

> My thoughts:
>
> .t1x should be used for local chunking (noun groups and verb groups) --
> and for doing local agreement. Examples: "the red bus", "was going to
> go" "very quickly".
>
> .t2x should be used to deal with preposition + noun groups and noun +
> relative clause groups. You could also deal with some light coordination
> here.
[...]

sme-nob pretty much corresponds to that, although there are some minor
differences:

Preposition choice is highly lexical and irregular, so we can't do it
just in t2x. In sme-nob we do a first-pass preposition insertion in
t1x. E.g.
the input

  ^diggi<N><Sg><Gen><@→N>$ ^miella<N><Sg><Loc><@X>$
  council.sg.gen           mind.sg.loc
  "in the council's opinion"

after t1x becomes

  ^caseprep<PR><loc>{^i<pr>$}$ ^pre_nom<SN><@X><ind><m><sg><loc>{^ting<n><nt><sg><def><gen>$ ^oppfatning<n><m><sg><3>$}$

We have a macro set_caseprep that turns locatives into 'på' ("on")
unless the noun lemma is in the list loc-av ("of") or loc-i ("in") or
loc-om ("about"). Oh, and toponyms get 'i' unless they're in the list
loc-på … and then there's verbs which might interact …

We tag verbs in t1x with "loc-om", "loc-for" etc. The verb is too far
away for t1x to see it, but once we're in t2x we're a bit more zoomed
out, so there a verb preposition tag might make us change the
preposition chunk.

(If I were to do it again, I would probably wait with inserting the
preposition chunk until t2x. That is, t1x would add e.g. <loc-i> to the
noun chunk and <loc-om> to the verb chunk, while t2x would decide which
one to use if there's a conflict.)

-- 
Kevin Brubeck Unhammer

Sent from my emacs
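For readers who find code easier to follow than prose, the preposition logic described in the message above can be sketched roughly in Python. This is only an illustration, not the actual set_caseprep macro: the function names and the contents of the lemma lists are made up (only "oppfatning" getting 'i' is taken from the example), and letting the verb tag win on conflict is an assumption, since the message leaves that policy open.

```python
# Hypothetical lemma lists; the real ones live in the sme-nob transfer files.
LOC_LISTS = {
    "av": {"del"},          # loc-av ("of"), made-up member
    "i": {"oppfatning"},    # loc-i ("in"), consistent with the example above
    "om": {"spørsmål"},     # loc-om ("about"), made-up member
}
LOC_PAA_TOPONYMS = {"Island"}  # loc-på: toponyms taking 'på', made-up member


def noun_preposition(lemma, is_toponym=False):
    """First pass (t1x-like): pick a preposition from the noun alone."""
    if is_toponym:
        # Toponyms get 'i' unless they are listed in loc-på.
        return "på" if lemma in LOC_PAA_TOPONYMS else "i"
    for prep, lemmas in LOC_LISTS.items():
        if lemma in lemmas:
            return prep
    return "på"  # default for locatives not found in any list


def resolve_preposition(noun_prep, verb_prep=None):
    """Second pass (t2x-like): a verb tag such as loc-om may override.

    Assumption: the verb's preposition wins whenever it is present.
    """
    return verb_prep if verb_prep is not None else noun_prep


if __name__ == "__main__":
    first = noun_preposition("oppfatning")
    print(first)
    print(resolve_preposition(first))          # no verb tag in view
    print(resolve_preposition(first, "om"))    # verb tagged loc-om
```

Splitting the decision into two functions mirrors the "if I were to do it again" idea: the first pass only records what the noun prefers, and the later pass, which can see the verb, makes the final call.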
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff