Flammie A Pirinen <flam...@iki.fi> wrote:

> Hi all,
>
> I've written a handful of apertium-fin-* prototypes and I usually end up
> spending way too much time on all the useless subclasses of proper
> nouns we have (cogs, ants, als, tops, orgs, and to top all that,
> sometimes ms and fs for some extra (mis)gendering). Could we just get
> rid of those, or does someone have a good use for them? Most of the time
> it's very random anyway, and we aren't really doing NER or anything.
> I think if these are used in e.g. cg or whatever we should probably have
> a different way of introducing them that doesn't interfere with
> analysis and generation, like we discussed in passing in the last
> apertium zoom meeting? Or is there some smart way to bypass them that I
> haven't thought of (probably)?

Genders are useful for anaphora resolution and in transfer, though only
on person names. There are some place/org names from swe that have
genders (originally from SALDO), which bled into other scandipairs – I'd
be happy to remove those since they seem quite useless for us.

The <ant>, <cog> and <top> tags are used quite a bit in the nob
disambiguator, but not in transfer.

I tend to underspecify np's in bidix:

<e> <p><l>Iran<s n="np"/></l><r>Iran<s n="np"/></r></p></e>
<e> <p><l>Thiel<s n="np"/></l><r>Thiel<s n="np"/></r></p></e>
<e> <p><l>Saruman<s n="np"/></l><r>Saruman<s n="np"/></r></p></e>
<e> <p><l>Contras<s n="np"/></l><r>Contras<s n="np"/></r></p></e>

so just the monodixen need to be synced. If there is an actual
bidix-relevant difference, e.g. some place name gets translated but not
if it's a person name, then one can specify the tags for just that
entry.
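
To illustrate why underspecifying works, here is a rough Python sketch
of the lookup as I understand it: the bidix entry matches a prefix of
the tag sequence and any leftover tags are carried over to the target
side. The toy dictionary, the Tyskland/Germany entry and the function
name are made up for the example; this is not how lt-proc is actually
implemented.

```python
import re

# Hypothetical toy bidix: underspecified source side -> target side.
# The Tyskland entry is invented for illustration only.
BIDIX = {
    "Iran<np>": "Iran<np>",
    "Saruman<np>": "Saruman<np>",
    "Tyskland<np>": "Germany<np>",
}


def bidix_lookup(lu: str) -> str:
    """Translate one lexical unit such as 'Tyskland<np><top>'.

    Tries the longest tag prefix first, then shorter ones, and copies
    the unmatched trailing tags onto the translation.
    """
    tags = re.findall(r"<[^>]+>", lu)
    lemma = lu[: lu.index("<")] if tags else lu
    for n in range(len(tags), 0, -1):
        key = lemma + "".join(tags[:n])
        if key in BIDIX:
            return BIDIX[key] + "".join(tags[n:])
    # '@' marks a unit the toy bidix has no entry for.
    return "@" + lu


print(bidix_lookup("Tyskland<np><top>"))
print(bidix_lookup("Saruman<np><al>"))
```

Since the subclass tag is just copied through, the bidix never needs to
know about it; whatever the source analyser emitted is what reaches the
target generator.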

The remaining problem is when the analyser gives ^Saruman<np><al>$ and
you try to send that into a generator that expects ^Saruman<np><ant>$.

We could perhaps use the Giellatekno solution for that, where dixen have
RL entries that contain just <np> (i.e., no cog/ant/al), and some
transfer step strips the subclass tags. Should be a fairly simple
change, and it's tried and tested in giella-pairs. Since lttoolbox is
used mostly for languages where np pardefs are small, adding the RL
entries is at most 10 extra lines; for languages requiring hfst it's
probably a fairly simple twol or xfregex rule?
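
As a minimal sketch of what the cleanup step would do (standing in for
a real transfer rule, and assuming the subclass tag directly follows
<np>), something like the following Python would strip the tags from
stream-format lexical units. The tag list and function name are my own
choices for the example, and it ignores escaped characters in the
stream.

```python
import re

# Assumed set of proper-noun subclass tags to drop.
SUBCLASS_TAGS = ("ant", "cog", "al", "top", "org")

# One lexical unit: everything between ^ and $.
_LU = re.compile(r"\^([^$]*)\$")
# One or more subclass tags immediately after <np>.
_SUB = re.compile(r"(?<=<np>)(?:<(?:%s)>)+" % "|".join(SUBCLASS_TAGS))


def strip_np_subclasses(stream: str) -> str:
    """Remove subclass tags that directly follow <np> in each unit."""
    return _LU.sub(
        lambda m: "^" + _SUB.sub("", m.group(1)) + "$", stream
    )


print(strip_np_subclasses("^Saruman<np><al>$ ^Iran<np><top>$"))
```

After this step the generator only ever sees ^Saruman<np>$, which the
<np>-only RL entries can generate regardless of which subclass the
source analyser picked.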


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff