On Mon, Mar 06, 2023 at 03:35:45PM -0500, Daniel Swanson wrote:

> This script could, without too much trouble, be expanded to cover the
> rest of our source files, at which point I would like to propose that
> we move towards greater standardization of our tagset:
> https://wiki.apertium.org/wiki/List_of_symbols
> 
> At minimum, I would like to deal with some of the duplicate tags, like
> impf/imperf, rec/res, v/vblex, pass/pasv, etc.

Yay. There are probably some ind / indic ~ indv / indef confusions too.
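
For the duplicate pairs, something like the following could do as a first pass. It is only a sketch: which member of each pair ends up canonical is an arbitrary assumption on my part (not a decision anyone has made), and the names ALIASES and normalise are made up for illustration.

```python
import re

# Hypothetical alias table: duplicate tag -> assumed canonical tag.
# The direction of each mapping is an arbitrary choice for this sketch.
ALIASES = {
    "imperf": "impf",
    "pasv": "pass",
    "v": "vblex",
}

TAG = re.compile(r"<([^<>]+)>")


def normalise(analysis: str) -> str:
    """Rewrite every <tag> in an analysis string to its canonical form."""
    return TAG.sub(lambda m: "<" + ALIASES.get(m.group(1), m.group(1)) + ">",
                   analysis)


if __name__ == "__main__":
    print(normalise("laulaa<v><pasv><imperf>"))
```

Tags that are not in the table pass through untouched, so it is safe to run over files that are already clean.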

> My preference would be that we also consider splitting compound tags,
> like the tense+mood (fti, fts, pii, pis) and maybe possessor and
> subject tags (px1sg, s_1sg)

That's already harder to implement; people surely have strong opinions
here. Personally, I'm used to having verbal person and number tagged
with only one tag too, {sg,du,pl}{1,2,3} rather than two, but I can see
that some languages can use separate tags for syntax etc. As long as
they are standard and have easy 1:n mappings, perhaps even scripts to
switch between them easily, they should be workable.
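
To make the 1:n idea concrete, here is a minimal sketch of such a switching script. The split targets (fut, ind, sbjv, p1, sg and so on) and the names SPLIT, split_tags and merge_tags are assumptions for illustration, not an agreed tagset.

```python
import re

# Hypothetical 1:n table: compound tag -> sequence of split tags.
SPLIT = {
    "fti": ["fut", "ind"],
    "fts": ["fut", "sbjv"],
    "sg1": ["p1", "sg"],
    "pl3": ["p3", "pl"],
}

TAG = re.compile(r"<([^<>]+)>")


def wrap(tags):
    """Render a list of tag names as <a><b>... ."""
    return "".join("<" + t + ">" for t in tags)


def split_tags(analysis: str) -> str:
    """Expand each compound tag into its split tags."""
    return TAG.sub(lambda m: wrap(SPLIT.get(m.group(1), [m.group(1)])),
                   analysis)


def merge_tags(analysis: str) -> str:
    """Collapse adjacent split tags back into the compound tag."""
    # Longest sequences first, so a longer match is never pre-empted.
    for compound, parts in sorted(SPLIT.items(),
                                  key=lambda kv: -len(wrap(kv[1]))):
        analysis = analysis.replace(wrap(parts), wrap([compound]))
    return analysis


if __name__ == "__main__":
    s = split_tags("cantar<vblex><fti><sg1>")
    print(s)
    print(merge_tags(s))
```

Merging only works when the split tags are adjacent and in the order given in the table, which is one more reason to keep the mapping standard.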

> And if we wanted to go really crazy we
> could consider a broader rewrite like changing our tags to UD-style
> feature-value pairs (so <sg> becomes <Number=Sing>), though I don't
> imagine we actually want to go nearly that far.

Yeah, it would be ideal imo, but it would probably also have some
opposition. As long as we have a standardised tagset and a fairly simple
remapping to UD (and UniMorph would be nice too), it's not too bad.
Simple being a screenful of code in your favourite scripting language
(and a mapping table).
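
A rough idea of what I mean by a screenful, as a sketch only: the table covers a handful of tags, the reading of fti and pii as future and imperfect indicative is my assumption, and UD_MAP and to_ud are made-up names.

```python
import re

# Hypothetical mapping table: Apertium tag -> UD feature-value pairs.
UD_MAP = {
    "sg": {"Number": "Sing"},
    "du": {"Number": "Dual"},
    "pl": {"Number": "Plur"},
    "p1": {"Person": "1"},
    "p2": {"Person": "2"},
    "p3": {"Person": "3"},
    "fti": {"Tense": "Fut", "Mood": "Ind"},
    "pii": {"Tense": "Imp", "Mood": "Ind"},
}

TAG = re.compile(r"<([^<>]+)>")


def to_ud(analysis: str):
    """Return (UD feature string, list of tags with no mapping)."""
    feats, unmapped = {}, []
    for tag in TAG.findall(analysis):
        if tag in UD_MAP:
            feats.update(UD_MAP[tag])
        else:
            unmapped.append(tag)
    # UD writes features sorted by name and joined with "|";
    # "_" stands for an empty feature set.
    feat_str = "|".join(k + "=" + v for k, v in sorted(feats.items())) or "_"
    return feat_str, unmapped


if __name__ == "__main__":
    print(to_ud("cantar<vblex><fti><p1><sg>"))
```

The unmapped list is there so that tags missing from the table get noticed rather than silently dropped; part-of-speech tags like vblex would go to the UPOS column in a fuller version.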


-- 
Regards, Flammie <https://flammie.github.io>
(Please note, that I will often include my replies inline instead of
top or bottom of the mail)

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
