Francis Tyers <fty...@prompsit.com> writes:

> Now we have the java compound word implementation ported to C++ we can
> probably consider this 'de facto' how we are going to do compounds in
> lttoolbox -- it is _in use_ and there have been _no alternatives_.
>
> So it is probably worth looking at how we are going to represent this
> nicely in the .dix format. At the moment we use two 'special' symbols:
>
> <sdef n="compound-only-L" c="for a form that can only appear on the L"/>
> <sdef n="compound-R" c="for a form that can only appear on the R, or as a word on its own"/>
>
> I propose making a new element <c> for compound, and having one
> attribute "r" for restriction.
>
> <s n="compound-only-L"/> would be replaced with <c r="L"/> and
> <s n="compound-R"/> would be replaced with <c r="R"/>
I think it would be better if elements with <c r="R"/> were, like
<c r="L"/>, "compound-only". As the examples below show, an entry
marked <s n="compound-R"/> currently allows use both inside and
outside compounds, while <s n="compound-only-L"/> marks a path that is
only reachable in compounds. I think new users would find it less
confusing if both restrictions worked the same way, even though that
requires a slightly more explicit dix file.

So instead of

> <e><p><l>plast</l><r>plast<s n="n"/><s n="m"/><s n="sg"/><s n="ind"/><c r="L"/></r></p></e>
> <e><p><l>plast</l><r>plast<s n="n"/><s n="m"/><s n="sg"/><s n="ind"/></r></p></e>
> <e><p><l>kortet</l><r>kort<s n="n"/><s n="nt"/><s n="sg"/><s n="def"/><c r="R"/></r></p></e>

you would have to have

> <e><p><l>plast</l><r>plast<s n="n"/><s n="m"/><s n="sg"/><s n="ind"/><c r="L"/></r></p></e>
> <e><p><l>plast</l><r>plast<s n="n"/><s n="m"/><s n="sg"/><s n="ind"/></r></p></e>
> <e><p><l>kortet</l><r>kort<s n="n"/><s n="nt"/><s n="sg"/><s n="def"/><c r="R"/></r></p></e>
> <e><p><l>kortet</l><r>kort<s n="n"/><s n="nt"/><s n="sg"/><s n="def"/></r></p></e>

(Note the beautiful symmetry.)

The original reason for having this difference was that we so far have
no examples of forms that can be compound-R but not words on their
own, so requiring those extra, nearly identical lines means longer dix
files. However, lttoolbox has this wonderful feature called pardefs :)
So what the line for "kortet" really looks like is this:

<e> <p><l>kortet</l> <r>kort<s n="n"/><s n="nt"/><s n="sg"/><s n="def"/></r></p><par n="cp-R"/></e>

where

<pardef n="cp-R">
  <!-- can appear in compounds: -->
  <e> <p><l></l> <r><c r="R"/></r></p></e>
  <!-- can appear as a word on its own: -->
  <e> <p><l></l> <r></r></p></e>
</pardef>

So, if we're deciding on specifications, that's the only thing I'd
like to see changed.
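
To make the pardef point concrete, here is a rough Python sketch of how
a single "kortet" entry with <par n="cp-R"/> would expand into the two
explicit paths. It is only an illustration and not how lttoolbox
compiles dictionaries: the miniature dictionary, the helper names
render() and expand(), and the "<c:R>" output notation are all made up
for this example, and <c r="R"/> is the proposed element, not something
lttoolbox is known to accept today.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature dictionary using the proposed <c r="R"/> element.
DIX = """
<dictionary>
  <pardefs>
    <pardef n="cp-R">
      <e><p><l></l><r><c r="R"/></r></p></e>
      <e><p><l></l><r></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e><p><l>kortet</l><r>kort<s n="n"/><s n="nt"/><s n="sg"/><s n="def"/></r></p><par n="cp-R"/></e>
  </section>
</dictionary>
"""


def render(elem):
    """Flatten an <l> or <r> element into text plus <tag> / <c:X> markers.

    The "<c:X>" notation is invented here purely for display.
    """
    out = elem.text or ""
    for child in elem:
        if child.tag == "s":
            out += "<" + child.get("n") + ">"
        elif child.tag == "c":
            out += "<c:" + child.get("r") + ">"
        out += child.tail or ""
    return out


def expand(dix_xml):
    """Return every (surface, analysis) pair once <par> references are expanded."""
    root = ET.fromstring(dix_xml)
    pardefs = {pd.get("n"): pd for pd in root.iter("pardef")}
    results = []
    for entry in root.find("section").findall("e"):
        # Start from one empty path and extend it with each <p> or <par> in turn.
        paths = [("", "")]
        for part in entry:
            if part.tag == "p":
                left, right = render(part.find("l")), render(part.find("r"))
                paths = [(s + left, a + right) for s, a in paths]
            elif part.tag == "par":
                # Each <e> in the pardef is one alternative continuation.
                alts = [(render(e.find("p/l")), render(e.find("p/r")))
                        for e in pardefs[part.get("n")].findall("e")]
                paths = [(s + l, a + r) for s, a in paths for l, r in alts]
        results.extend(paths)
    return results


if __name__ == "__main__":
    for surface, analysis in expand(DIX):
        print(surface + ":" + analysis)
```

Run on the sample above, this gives one path for "kortet" that ends in
the compound-R marker and one that does not, which is the same pair of
entries as in the fully written-out version, from a single line in the
dix file.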
-Kevin
--
Sent from my Emacs

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff