Francis Tyers <fty...@prompsit.com> writes:

> On Wed, 13 Nov 2013 at 11:10 +0100, Mikel L. Forcada wrote:

[...]

> The eng-kaz pair is actually using four-level transfer, as does
> apertium-sme-nob, which is at a similar level of complexity.
>
> The header comments in the sme-nob transfer files (just do cat
> apertium-sme-nob.sme-nob.t1x | less) give nice comments about which file
> does what. But hopefully Unhammer can give us some background/breakdown
> too.

http://www.molto-project.eu/sites/default/files/FreeRBMT-2012.pdf#25
gives a shorter intro:

1. Chunking, 63 rules: noun phrases turn into larger chunks,
   prepositions are output based on case information, verb auxiliaries
   and adverbs are output based on verb modality, voice and derivation
   tags.

2. Interchunk 1, 26 rules: simple anaphora resolution (based on most
   recent subject gender), merging coordinated noun phrase chunks,
   moving postpositions before noun phrases.

3. Interchunk 2, 39 rules: major word order changes, inserting dropped
   pronouns, inserting adverbs to indicate verb modality, correcting
   noun phrase definiteness using verb information (e.g. subjects of
   duals are definite).

4. Postchunk, 29 rules: inserting articles/determiners and the
   infinitive marker, tag cleanup in preparation of generation.

> My thoughts:
>
> .t1x should be used for local chunking (noun groups and verb groups) --
> and for doing local agreement. Examples: "the red bus", "was going to
> go" "very quickly".
>
> .t2x should be used to deal with preposition + noun groups and noun +
> relative clause groups. You could also deal with some light coordination
> here.

[...]

sme-nob pretty much corresponds to that, although there are some minor
differences: Preposition choice is highly lexical and irregular, so we
can't do it just in t2x. In sme-nob we do a first-pass preposition
insertion in t1x. E.g. the input

    ^diggi<N><Sg><Gen><@→N>$ ^miella<N><Sg><Loc><@X>$
    council.sg.gen mind.sg.loc
    "in the council's opinion"

after t1x becomes

    ^caseprep<PR><loc>{^i<pr>$}$
    ^pre_nom<SN><@X><ind><m><sg><loc>{^ting<n><nt><sg><def><gen>$
    ^oppfatning<n><m><sg><3>$}$
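
To make the shape of that t1x output easier to see, here is a rough
Python sketch that splits a chunk stream into chunk name, chunk tags and
the lexical units inside the braces. The function name parse_chunks and
the regexes are my own for illustration, not part of Apertium, and they
ignore escaping and other corner cases of the stream format.

```python
import re

# A chunk looks like ^name<tag>...{^lemma<tag>...$ ...}$
CHUNK_RE = re.compile(r"\^([^<{^$]+)((?:<[^>]*>)*)\{(.*?)\}\$", re.S)
# A lexical unit inside a chunk looks like ^lemma<tag>...$
LU_RE = re.compile(r"\^([^<$]+)((?:<[^>]*>)*)\$")
TAG_RE = re.compile(r"<([^>]*)>")


def parse_chunks(stream):
    """Return one dict per chunk: its name, its tags, and its inner units."""
    chunks = []
    for name, tags, body in CHUNK_RE.findall(stream):
        units = [(lemma, TAG_RE.findall(t)) for lemma, t in LU_RE.findall(body)]
        chunks.append({"name": name, "tags": TAG_RE.findall(tags), "units": units})
    return chunks


# The t1x output from the example, joined back into one line.
example = (
    "^caseprep<PR><loc>{^i<pr>$}$ "
    "^pre_nom<SN><@X><ind><m><sg><loc>"
    "{^ting<n><nt><sg><def><gen>$ ^oppfatning<n><m><sg><3>$}$"
)

for chunk in parse_chunks(example):
    print(chunk["name"], chunk["tags"], [lemma for lemma, _ in chunk["units"]])
```

So the first chunk is the inserted preposition and the second is the
noun phrase, which still carries the <loc> tag for later stages.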

We have a macro set_caseprep that turns locatives into 'på' ("on")
unless the noun lemma is in the list loc-av ("of") or loc-i ("in") or
loc-om ("about"). Oh, and toponyms get 'i' unless they're in the list
loc-på … and then there's verbs which might interact …
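
The decision order just described can be sketched in Python like this.
It is only a sketch of the logic, not the actual macro (which is XML in
the t1x file). The list contents are placeholders I made up, apart from
putting "miella" under loc-i, which is an assumption based only on the
example above coming out with 'i'.

```python
# Hypothetical list contents; the real lists are defined in the sme-nob t1x file.
LOC_LISTS = {
    "av": {"noun_a"},   # stands in for the loc-av list
    "i": {"miella"},    # stands in for the loc-i list (assumed membership)
    "om": {"noun_c"},   # stands in for the loc-om list
}
LOC_PAA_TOPONYMS = {"place_x"}  # stands in for the loc-på list


def set_caseprep(lemma, is_toponym=False):
    """Pick a first-pass preposition for a noun in the locative."""
    if is_toponym:
        # Toponyms get 'i' unless they are listed under loc-på.
        return "på" if lemma in LOC_PAA_TOPONYMS else "i"
    for prep, lemmas in LOC_LISTS.items():
        if lemma in lemmas:
            return prep
    # Default for locatives that are in none of the lists.
    return "på"


print(set_caseprep("miella"))
print(set_caseprep("some_other_noun"))
print(set_caseprep("place_y", is_toponym=True))
```

Verbs are deliberately left out here, since t1x cannot see them.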

We tag verbs in t1x with "loc-om", "loc-for" etc. The verb is too far
away for t1x to see it, but in t2x we're a bit more zoomed out, and
there a verb preposition tag might make us change the preposition
chunk.

(If I were to do it again, I would probably hold off on inserting the
preposition chunk until t2x. That is, t1x would add e.g. <loc-i> to the
noun chunk and <loc-om> to the verb chunk, while t2x would decide which
one to use if there's a conflict.)
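
A minimal Python sketch of that alternative design, where t2x sees both
tags and picks one. The function choose_prep is hypothetical, and the
policy that the verb's tag wins over the noun's is my assumption for the
sake of the example; nothing above fixes which side should win.

```python
def choose_prep(noun_tags, verb_tags, default="på"):
    """Resolve the preposition from loc-* tags on the noun and verb chunks."""

    def preference(tags):
        # Return the preposition from the first loc-* tag, or None.
        return next((t[len("loc-"):] for t in tags if t.startswith("loc-")), None)

    # Assumed policy: verb preference first, then noun, then the default.
    return preference(verb_tags) or preference(noun_tags) or default


print(choose_prep(["SN", "loc-i"], ["SV", "loc-om"]))  # conflict: verb wins here
print(choose_prep(["SN", "loc-i"], ["SV"]))            # only the noun has a tag
print(choose_prep(["SN"], ["SV"]))                     # neither: fall back
```

Swapping the order of the two preference() calls gives the opposite
policy, so the choice stays in one place.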



-- 
Kevin Brubeck Unhammer

Sent from my emacs

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
