On Tue, 28 Feb 2017 10:33:25 +0100,
Kevin Brubeck Unhammer <unham...@fsfe.org> wrote:

> If I have this in foo.dix:
> 
> <e><p><l>PC-ane</l><r>PC<s n="n"/><s n="m"/><s n="pl"/><s n="def"/></r></p></e>
> 
> then I get
> 
> $ echo '^PC<n><m><pl><def>$' | lt-proc -d foo.autogen.bin
> PC-ane
> 
> which is what I want. However, people sometimes write compounds in
> lowercase, so I tried including both uppercase and lowercase lemmas (I
> only care about generation for now):
> 
> <e><p><l>pc-ane</l><r>pc<s n="n"/><s n="m"/><s n="pl"/><s n="def"/></r></p></e>
> <e><p><l>PC-ane</l><r>PC<s n="n"/><s n="m"/><s n="pl"/><s n="def"/></r></p></e>
> 
> but that gives
> 
> $ echo '^PC<n><m><pl><def>$' | lt-proc -d foo.autogen.bin
> PC-ane/PC-ANE
> 
> Would it make sense for lt-proc to not output forced-uppercase
> analyses if there's an otherwise identical dictionary-uppercase
> analysis? (Would it be easily implementable?)

I have no idea how hard this would be to implement, but it affects most
of the fin-* pairs, so if there is any remotely reasonable approach, I
am happy to help as much as I can.
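To make the proposed rule concrete, here is a toy sketch in Python. It is
only a simplified model of the behaviour described above, assuming that a
generator first looks up the lexical unit exactly as given and then falls
back to a lowercased lemma, re-applying the input's case to that fallback
output. It is NOT how lt-proc works internally (lt-proc runs compiled
transducers, not a lookup table); the names `generate`, `DICTIONARY` and
`prefer_dictionary_case` are hypothetical.

```python
# Toy model of case handling in generation (hypothetical, not lt-proc code).
# Keys are lexical units "lemma<tags>", values are surface forms.
DICTIONARY = {
    "pc<n><m><pl><def>": "pc-ane",
    "PC<n><m><pl><def>": "PC-ane",
}


def generate(dictionary, lu, prefer_dictionary_case=True):
    """Return surface forms for lexical unit `lu`.

    With prefer_dictionary_case=True, a forced-case result (from the
    lowercased lemma) is dropped when an exact, dictionary-case match
    for the same analysis already exists: the rule proposed above.
    """
    lemma, sep, rest = lu.partition("<")
    tags = sep + rest
    results = []

    # 1. Exact match: the surface form keeps the case written in the dictionary.
    if lu in dictionary:
        results.append(dictionary[lu])

    # 2. Fallback on the lowercased lemma, then force the input's case onto it.
    lower_lu = lemma.lower() + tags
    if lower_lu != lu and lower_lu in dictionary:
        surface = dictionary[lower_lu]
        if lemma.isupper():
            surface = surface.upper()  # "PC" -> "PC-ANE"
        elif lemma[:1].isupper():
            surface = surface[:1].upper() + surface[1:]  # "Pc" -> "Pc-ane"
        # Keep the forced-case form only if there is no exact match.
        if not (prefer_dictionary_case and results):
            results.append(surface)

    return results


if __name__ == "__main__":
    lu = "PC<n><m><pl><def>"
    print("/".join(generate(DICTIONARY, lu, prefer_dictionary_case=False)))
    print("/".join(generate(DICTIONARY, lu)))
```

Run as a script, the first line reproduces the PC-ane/PC-ANE ambiguity
reported above and the second prints only PC-ane. With only the lowercase
entry in the dictionary, the forced-uppercase PC-ANE would still be
produced, so nothing is lost for lemmas that lack a dictionary-case entry.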


-- 
Dr. Tommi A Pirinen, Computational Linguist,
<https://flammie.github.io/purplemonkeydishwasher/>, Universität
Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D
developer. President of ACL SIGUR SIG for Uralic languages
<http://gtweb.uit.no/sigur/>.
I tend to follow inline-posting style in desktop e-mail messages.


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
