Francis Tyers <fty...@prompsit.com> writes:

> Now we have the java compound word implementation ported to C++ we can
> probably consider this 'de facto' how we are going to do compounds in
> lttoolbox -- it is _in use_ and there have been _no alternatives_. 
>
> So it is probably worth looking at how we are going to represent this
> nicely in the .dix format. At the moment we use two 'special' symbols:
>
> <sdef n="compound-only-L" c="for a form that can only appear on the L"/>
> <sdef n="compound-R"    c="for a form that can only appear on the R, or as a word on its own"/>
>
> I propose making a new element <c> for compound, and having one
> attribute "r" for restriction.
>
> <s n="compound-only-L"/> would be replaced with <c r="L"/> and 
> <s n="compound-R"/> would be replaced with <c r="R"/>
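
To make the proposed mapping concrete, here is a hypothetical one-off conversion script (not part of lttoolbox) that rewrites the two special symbols into the proposed <c r="..."/> element without changing their meaning. It assumes the dix file parses with Python's xml.etree.ElementTree; it leaves the <sdef> declarations alone, and the default parser drops XML comments, so a real conversion would need more care.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: old special symbol -> value of r on the proposed <c/>.
SYMBOL_TO_RESTRICTION = {"compound-only-L": "L", "compound-R": "R"}


def migrate(dix_text):
    """Rewrite <s n="compound-only-L"/> and <s n="compound-R"/> as <c r="..."/>."""
    root = ET.fromstring(dix_text)
    # Copy to a list first so renaming tags does not disturb the iteration.
    for elem in list(root.iter("s")):
        restriction = SYMBOL_TO_RESTRICTION.get(elem.get("n"))
        if restriction is not None:
            elem.tag = "c"
            elem.attrib.clear()
            elem.set("r", restriction)
    return ET.tostring(root, encoding="unicode")


entry = '<e><p><l>plast</l><r>plast<s n="n"/><s n="compound-only-L"/></r></p></e>'
print(migrate(entry))
```

Ordinary tags such as <s n="n"/> are left untouched; only the two compound symbols are renamed.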

I think it would be better if elements marked <c r="R"/> were, like
<c r="L"/>, "compound-only". As the examples below show, an element
marked <s n="compound-R"/> currently allows use both inside and
outside compounds, while <s n="compound-only-L"/> marks a path that is
only reachable inside compounds. I think new users would find it less
confusing if the two restrictions meant the same thing, even though
that requires a slightly more explicit dix file. So instead of

>   <e><p><l>plast</l><r>plast<s n="n"/><s n="m"/><s n="sg"/><s n="ind"/><c r="L"/></r></p></e>
>   <e><p><l>plast</l><r>plast<s n="n"/><s n="m"/><s n="sg"/><s n="ind"/></r></p></e>
>   <e><p><l>kortet</l><r>kort<s n="n"/><s n="nt"/><s n="sg"/><s n="def"/><c r="R"/></r></p></e>

you would have to have

>   <e><p><l>plast</l><r>plast<s n="n"/><s n="m"/><s n="sg"/><s n="ind"/><c r="L"/></r></p></e>
>   <e><p><l>plast</l><r>plast<s n="n"/><s n="m"/><s n="sg"/><s n="ind"/></r></p></e>
>   <e><p><l>kortet</l><r>kort<s n="n"/><s n="nt"/><s n="sg"/><s n="def"/><c r="R"/></r></p></e>
>   <e><p><l>kortet</l><r>kort<s n="n"/><s n="nt"/><s n="sg"/><s n="def"/></r></p></e>

(Note the beautiful symmetry.)
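
A minimal sketch of what the two readings mean, assuming (from the descriptions above) that an entry with no compound tag is usable only as a standalone word. The tag strings "c:L" and "c:R" and the function names are hypothetical stand-ins for <c r="L"/> and <c r="R"/>; this models the semantics only and is not lttoolbox's compounding code. With R read as "right-hand side or standalone", the three-line dix covers the same positions as the four-line dix does when R means "right-hand side only".

```python
# Hypothetical model of the compound restrictions; not lttoolbox code.
# "c:L" / "c:R" stand in for <c r="L"/> / <c r="R"/>.


def positions(tags, r_is_compound_only):
    """Return the positions ('alone', 'left', 'right') an analysis may fill."""
    if "c:L" in tags:
        return {"left"}
    if "c:R" in tags:
        # Current reading: compound-R also allows the word on its own.
        return {"right"} if r_is_compound_only else {"right", "alone"}
    # Assumption: untagged entries are standalone words only.
    return {"alone"}


def coverage(entries, r_is_compound_only):
    """Union of allowed positions per surface form over all entries."""
    result = {}
    for surface, tags in entries:
        result.setdefault(surface, set()).update(
            positions(tags, r_is_compound_only))
    return result


THREE_LINES = [
    ("plast", ("n", "m", "sg", "ind", "c:L")),
    ("plast", ("n", "m", "sg", "ind")),
    ("kortet", ("n", "nt", "sg", "def", "c:R")),
]
FOUR_LINES = THREE_LINES + [("kortet", ("n", "nt", "sg", "def"))]

print(coverage(THREE_LINES, r_is_compound_only=False))
print(coverage(FOUR_LINES, r_is_compound_only=True))
```

Under the proposed reading, the three lines alone would leave "kortet" usable only as the right-hand part of a compound, which is why the fourth line is needed.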


The original reason for this asymmetry was that so far we have no
examples of forms that can be compound-R but not words on their own,
so adding those extra, almost identical lines would only make dix
files longer.

However, lttoolbox has this wonderful feature called pardefs :) So the
line for "kortet" would really look like this:

  <e>       <p><l>kortet</l>    <r>kort<s n="n"/><s n="nt"/><s n="sg"/><s n="def"/></r></p><par n="cp-R"/></e>

where 

<pardef n="cp-R">
   <!-- can appear in compounds: -->
   <e>       <p><l></l>          <r><c r="R"/></r></p></e>
   <!-- can appear as a word on its own: -->
   <e>       <p><l></l>          <r></r></p></e>
</pardef>
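
Conceptually, referencing a pardef appends each of its entries to the main entry, so the single "kortet" line yields the same two paths as the explicit version with two "kortet" lines. A minimal sketch of that expansion, with hypothetical names and the same "c:R" stand-in for <c r="R"/> (lttoolbox itself compiles pardefs into a transducer rather than listing paths like this):

```python
# Hypothetical stand-in for the cp-R pardef: (text appended to <l>, tags appended to <r>).
# Simplification: any text on the <r> side of a pardef entry is ignored here.
PARDEFS = {
    "cp-R": [
        ("", ("c:R",)),  # <l></l><r><c r="R"/></r>: can appear in compounds
        ("", ()),        # <l></l><r></r>: can appear as a word on its own
    ],
}


def expand(surface, analysis, tags, par):
    """Yield one (surface, analysis, tags) path per entry of the named pardef."""
    for l_suffix, r_tags in PARDEFS[par]:
        yield (surface + l_suffix, analysis, tags + r_tags)


paths = list(expand("kortet", "kort", ("n", "nt", "sg", "def"), "cp-R"))
for path in paths:
    print(path)
```

Every entry that should be both compound-R and a standalone word just ends in <par n="cp-R"/>, so the extra line is written once in the pardef rather than once per word.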


So, if we're deciding on specifications, that's the only thing I'd like
to see changed. 


-Kevin


-- 

Sent from my Emacs


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
