Re: [Apertium-stuff] Tagset Standardization

2023-06-07 Thread Francis Tyers via Apertium-stuff

On 2023-06-07 16:19, Daniel Swanson wrote:

Greetings Apertiumers!

I've been reminded that derivational morphology exists, which throws a
wrench in my desire for full position-independent tags.

I've also been reminded that some repos have .udx files which specify
a conversion between Apertium tags and Universal Dependencies, but as
far as I know there isn't any documentation for this and I'm not even
sure where to find the script that processes them. Does anyone have
any further information on those files? I think it could be quite
useful to document and standardize them and adopt them more broadly.

Daniel

On Tue, Mar 7, 2023 at 2:22 PM Daniel Swanson wrote:


Yes, most of our tools assume that tags are position independent, but
I've come across a handful of languages that treat some tags as
position dependent, and I was more hoping to make it official to make
it less likely that we run into issues with that.

Also, I have an idea for how to make a version of lt-proc -g that
accepts the tags in any order, which might be helpful for reducing
generation errors, though it may turn out to be too much of a slowdown
for production.

Daniel

On Tue, Mar 7, 2023 at 1:58 PM Kevin Brubeck Unhammer wrote:

>
> Daniel Swanson wrote:
>
> > To be clear, I meant splitting  into .
>
> 👍
>
> > One of my ideals for the tagset is that every tag be
> > position-independent, so that the only reason I need to care about
> > order is because of FST topology (and maybe not even then).
>
> Aren't the tags themselves already position-independent? Both CG and to
> a certain extent transfer assume that.
> ___
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff

The UDX format was made by me for converting vislcg3 style treebanks
to UD-style ones.

It works mostly with longest-overlap set matching on the input. Some
challenges are e.g.  vs. 

 -> PRON PronType=Rel
 -> NOUN NounType=Relat
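
To illustrate the idea, here is a minimal sketch of longest-overlap set
matching. It is not the code in the script linked in this message, and it does
not parse the .udx format itself. The rule table, the Apertium tag names
("prn", "n", "rel", "pl") and the function name are placeholders assumed for
the example; only the two UD outputs shown above come from this message.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

# Each rule: (set of Apertium tags, UPOS or None, UD features).
# Hypothetical rule table; tag names are illustrative only.
Rule = Tuple[FrozenSet[str], Optional[str], Dict[str, str]]

RULES: List[Rule] = [
    (frozenset({"prn", "rel"}), "PRON", {"PronType": "Rel"}),
    (frozenset({"n", "rel"}), "NOUN", {"NounType": "Relat"}),
    (frozenset({"prn"}), "PRON", {}),
    (frozenset({"n"}), "NOUN", {}),
    (frozenset({"pl"}), None, {"Number": "Plur"}),
]


def convert(tags: Iterable[str]) -> Tuple[str, str]:
    """Map a bag of Apertium tags to (UPOS, FEATS), ignoring tag order."""
    remaining = set(tags)
    upos: Optional[str] = None
    feats: Dict[str, str] = {}
    while remaining:
        # Only rules whose tags are all still unconsumed can apply.
        candidates = [r for r in RULES if r[0] <= remaining]
        if not candidates:
            break  # leftover tags have no rule; they are dropped here
        # Prefer the rule that covers the most tags (longest overlap).
        rule_tags, rule_upos, rule_feats = max(candidates, key=lambda r: len(r[0]))
        if upos is None:
            upos = rule_upos
        feats.update(rule_feats)
        remaining -= rule_tags
    feat_str = "|".join(f"{k}={v}" for k, v in sorted(feats.items())) or "_"
    return upos or "X", feat_str


print(convert(["prn", "rel"]))
print(convert(["n", "rel", "pl"]))
```

Because the two-tag rules win over the one-tag rules, the same tag can end up
as a different feature depending on what it co-occurs with, which is the
kind of ambiguity described above.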

I have a tonne of scripts that do it, one of which is:

https://github.com/ftyers/ud-scripts/blob/master/conllu-feats.py

I'd be happy to work on this topic as I find it interesting and there
are some substantial improvements that could be made over my existing
code.

Fran



Re: [Apertium-stuff] Tagset Standardization

2023-06-07 Thread Daniel Swanson
Greetings Apertiumers!

I've been reminded that derivational morphology exists, which throws a
wrench in my desire for full position-independent tags.

I've also been reminded that some repos have .udx files which specify
a conversion between Apertium tags and Universal Dependencies, but as
far as I know there isn't any documentation for this and I'm not even
sure where to find the script that processes them. Does anyone have
any further information on those files? I think it could be quite
useful to document and standardize them and adopt them more broadly.

Daniel

On Tue, Mar 7, 2023 at 2:22 PM Daniel Swanson wrote:
>
> Yes, most of our tools assume that tags are position independent, but
> I've come across a handful of languages that treat some tags as
> position dependent, and I was more hoping to make it official to make
> it less likely that we run into issues with that.
>
> Also, I have an idea for how to make a version of lt-proc -g that
> accepts the tags in any order, which might be helpful for reducing
> generation errors, though it may turn out to be too much of a slowdown
> for production.
>
> Daniel
>
> On Tue, Mar 7, 2023 at 1:58 PM Kevin Brubeck Unhammer wrote:
> >
> > Daniel Swanson wrote:
> >
> > > To be clear, I meant splitting  into .
> >
> > 👍
> >
> > > One of my ideals for the tagset is that every tag be
> > > position-independent, so that the only reason I need to care about
> > > order is because of FST topology (and maybe not even then).
> >
> > Aren't the tags themselves already position-independent? Both CG and to
> > a certain extent transfer assume that.

