Greetings Apertiumers! This morning I set out to change the Ancient Hebrew analyzer from Latin script to Hebrew script (a task I don't wish upon anyone) and in the process produced a search-and-replace tool that understands the structure of several of our source files: https://github.com/mr-martian/apertium-grep
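To illustrate why a plain text search-and-replace is not enough here, the following is a minimal Python sketch of a structure-aware tag rename. It is not how apertium-grep is implemented. The function name `rename_tag` is hypothetical, and it covers only two notations: `<s n="..."/>` in .dix files and `%<...%>` in lexc files. The direction of the rename (v to vblex) is an arbitrary choice for the example.

```python
import re

def rename_tag(line: str, old: str, new: str) -> str:
    """Rename a tag in one line of .dix or lexc source (illustrative only)."""
    old_re = re.escape(old)
    # .dix notation: <s n="v"/> (optional whitespace before the slash)
    line = re.sub(rf'<s n="{old_re}"\s*/>', lambda m: f'<s n="{new}"/>', line)
    # lexc notation: %<v%>
    line = re.sub(rf'%<{old_re}%>', lambda m: f'%<{new}%>', line)
    return line

# The letter "v" inside lemmas is left alone; only the tag changes.
print(rename_tag('<e><p><l>give</l><r>give<s n="v"/></r></p></e>', 'v', 'vblex'))
print(rename_tag('give%<v%>%<pres%>:give # ;', 'v', 'vblex'))
```

Matching on the full tag syntax rather than on the bare string is what keeps lemmas and longer tag names intact.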
This script could, without too much trouble, be expanded to cover the rest of our source files, at which point I would like to propose that we move towards greater standardization of our tagset: https://wiki.apertium.org/wiki/List_of_symbols

At minimum, I would like to deal with some of the duplicate tags, like impf/imperf, rec/res, v/vblex, pass/pasv, etc. My preference would be that we also consider splitting compound tags, like the tense+mood tags (fti, fts, pii, pis) and maybe the possessor and subject tags (px1sg, s_1sg). And if we wanted to go really crazy, we could consider a broader rewrite, like changing our tags to UD-style feature-value pairs (so <sg> becomes <Number=Sing>), though I don't imagine we actually want to go nearly that far.

So, given that the effort involved in actually making the change is no longer the limiting factor, what do we want our tagset to be?

Daniel

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff