Re: [Apertium-stuff] Compound words and dix format

Francis Tyers Tue, 21 Dec 2010 04:17:39 -0800

On Tue, 21 Dec 2010 at 13:04 +0100, Kevin Brubeck Unhammer wrote:
> Francis Tyers <fty...@prompsit.com> writes:
> 
> > Hi!
> >
> > The problem with this is that there are so many different metadix
> > formats that it will be impossible to come up with one that covers them
> > all. For example, if I remember correctly, how "alt" works is
> > different in es-pt and in oc-es. I think it was decided that it was
> > desirable to have them functioning differently, or at least that it
> > would require substantial changes in either language pair to get a
> > unified format -- changes that without some push (and let's face it,
> > cash) are not going to get made.
> >
> > On the other hand, implementing compound words gives us the chance to
> > strike while the iron is hot! We can make a fairly innocuous change
> > (any language pair that does not have compounding will be unaffected)
> > before getting a plethora of different options, thus avoiding the
> > metadix problem for another set of issues.
> >
> > Btw, thinking about metadix I have some probably unpopular ideas
> > that would preclude any standardisation. I think that maybe we should not
> > have one format, but rather many _codified_ formats depending on the
> > language (group). For example, how to include a verb would be different in
> > Tajik and Dutch, because different things are important. Unnecessary
> > examples:
> >
> > <e lm="aanzitten"><par n="z/itten__vblex" prefix="aan" pp="aangezeten"/></e>
> >
> > Giving:
> >
> >     <e lm="aanzitten"><i>aanz</i><par n="aanz/itten__vblex_sep"/></e>
> >     <e lm="aanzitten"><p><l>z</l><r>aanz</r></p><par n="z/itten#_aan__vblex_sep"/><p><l><b/>aan</l><r></r></p></e>
> >     <e lm="aanzitten"><p><l>aangezeten</l><r>aanzitten</r></p><par n="gesproken__vblex_sep"/></e>
> >
> > Or in Tajik:
> >
> > <e lm="харидан"><par n="кард/ан__vblex" stem1="харид" stem2="хар"/></e>
> 
> In the unification proposal from
> 
> http://wiki.apertium.org/wiki/Unification_of_metadix_and_parametrized_dictionaries#A_unifying_proposal
> 
> the calls would look like
> 
> <e lm="aanzitten"><par n="z/itten__vblex" prms="prefix='aan' pp='aangezeten'"/></e>
> 
> and
> 
> <e lm="харидан"><par n="кард/ан__vblex" prms="stem1='харид' stem2='хар'"/></e>
> 
> 
> Are there good reasons not to go with that kind of syntax?


The problem is that what happens after that would be different depending
on the language pair. I think one of the points of the unification
proposal was to have a single XSL file do the transformations(?),
whereas in this case there would be two.
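
To make the split concrete, here is a rough Python sketch (not XSL, and
not taken from the unification proposal). It assumes the prms value is a
whitespace-separated list of name='value' pairs, as in the quoted calls.
The names parse_prms, read_call, REQUIRED and missing_params are
hypothetical, as are the "nl" and "tg" keys. Reading the call can be
shared; what each language group expects from it cannot.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: prms holds pairs of the form name='value' separated by spaces.
PRM = re.compile(r"(\w+)='([^']*)'")


def parse_prms(prms):
    """Turn "prefix='aan' pp='aangezeten'" into a dict of parameters."""
    return dict(PRM.findall(prms))


def read_call(entry_xml):
    """Shared step: return (lemma, paradigm name, parameters) for one <e>."""
    entry = ET.fromstring(entry_xml)
    par = entry.find("par")
    return entry.get("lm"), par.get("n"), parse_prms(par.get("prms", ""))


# Hypothetical: the parameters each language group's expansion would need.
# This is the part that differs, so it cannot live in one shared transform.
REQUIRED = {
    "nl": {"prefix", "pp"},
    "tg": {"stem1", "stem2"},
}


def missing_params(lang, params):
    """Return the parameter names that lang's expansion would still need."""
    return REQUIRED[lang] - set(params)
```

Both example calls go through read_call unchanged, but the Dutch one
handed to the Tajik expansion would come back with stem1 and stem2
missing, which is the point at which a second transformation is needed.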

Fran

