Re: [Apertium-stuff] Idea for GSOC: tools to train supervised taggers

Trosterud Trond Wed, 20 Mar 2013 16:00:52 -0700

My experience: as few as 100 handwritten .cg rules already give PoS tagging with
an accuracy of around 93-95%. What takes a bit more work is disambiguating the full tag string.


So .cg should be considered as part of the process.
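To illustrate the distinction Trond draws between PoS accuracy and full-tag-string accuracy, here is a minimal Python sketch of how one might score a tagger on both, given a hand-disambiguated gold corpus aligned token by token with the tagger output. The token format `lemma<tag1><tag2>`, the assumption that the first tag is the PoS, and the function names `tags_of`, `pos_of` and `accuracies` are all hypothetical simplifications for the example. They are not a specification of any Apertium file format or tool.

```python
import re

# Matches one Apertium-style tag such as <n> or <pl>.
TAG_RE = re.compile(r"<([^<>]+)>")


def tags_of(analysis: str) -> list[str]:
    """Return the tags of an analysis like 'house<n><pl>' as ['n', 'pl']."""
    return TAG_RE.findall(analysis)


def pos_of(analysis: str) -> str:
    """Assumption: the first tag of an analysis is its part of speech."""
    tags = tags_of(analysis)
    return tags[0] if tags else ""


def accuracies(gold: list[str], predicted: list[str]) -> tuple[float, float]:
    """Return (PoS accuracy, full-tag-string accuracy) for aligned token lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned token by token")
    if not gold:
        raise ValueError("cannot score an empty corpus")
    pos_hits = sum(pos_of(g) == pos_of(p) for g, p in zip(gold, predicted))
    full_hits = sum(tags_of(g) == tags_of(p) for g, p in zip(gold, predicted))
    return pos_hits / len(gold), full_hits / len(gold)


if __name__ == "__main__":
    gold = ["the<det><def><sp>", "house<n><pl>", "fly<vblex><pres>"]
    predicted = ["the<det><def><sp>", "house<n><sg>", "fly<n><pl>"]
    pos_acc, full_acc = accuracies(gold, predicted)
    print(f"PoS accuracy: {pos_acc:.2%}, full tag string accuracy: {full_acc:.2%}")
```

Under this definition full-tag accuracy can never exceed PoS accuracy, because identical tag lists imply identical first tags. The gap between the two numbers is one way to quantify the harder part Trond mentions.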

Trond
Sent from my Samsung tablet

Francis Tyers <fty...@prompsit.com> wrote:
I also like the idea! Especially if we can have an optional integration
of CG to allow people to write rules to tag the corpus -- if they so
wish. In the end we win both ways: Those who are looking for a tagged
corpus for training the tagger get it, and those who would also like
constraint rules get them too.

I'll try writing it up now. :)

Fran

On Tue, 19 Mar 2013 at 20:19 +0100, Mikel Forcada
wrote:
> +1
>
> Write it up, Gema! ;-)
>
> You'll mentor it with a co-mentor (!). I can easily think of a couple
> of names...
>
> Mikel
>
> On 03/19/2013 01:49 PM, Gema Ramírez-Sánchez wrote:
> > Hi there,
> >
> > as I see it, there is a need in Apertium, for most released pairs and
> > the ones to come, for better PoS taggers. In my experience, training
> > supervised taggers has never been a waste of time, quite the
> > opposite: at the same time we improve quality and we create
> > invaluable linguistic resources such as disambiguated tagged
> > corpora.
> >
> > So, how to turn this into a GSoC idea?
> >
> > Following the wiki pages on how to train a tagger (see below), and
> > taking into account that the supervised training page is still to be
> > written, this project would at least involve:
> >
> > 0) (must-have) making an interface where you can upload a raw text of,
> > say, 25,000 words, or (optional) create a corpus of X size for a given
> > language from Wikipedia
> >
> >   and, by choosing a language for which there is at least a
> > morphological dictionary in Apertium, you have:
> >
> > 1) (must-have) a non-disambiguated tagged corpus
> > 2) (must-have) a .dic file
> > 3) (must-have) a simple, fully functional, precalculated .tsx file in
> > which coarse tags are defined taking into account the information from
> > the .dic file
> >
> > then it will also include:
> >
> > 4) (must-have) a user-friendly interface to take your
> > non-disambiguated tagged corpus and be able to disambiguate it
> > manually
> > 5) (must-have) a user-friendly documentation on how to improve the tsx
> > (refine coarse tags, write rules)
> > 6) (must-have) a user-friendly interface to train a supervised tagger
> > 7) (must-have) some way to evaluate the performance of a .prob file
> >
> > I'm surely forgetting some must-haves and I have to think about it a
> > little bit more, but what do you think about the general idea of
> > having tools to train supervised taggers?
> >
> > Another important question: I won't be able to technically mentor this
> > project, so, if no one else is interested...
> >
> > Best,
> >
> > Gema.
> >
> > --------------------
> > How to train a tagger in Apertium:
> > http://wiki.apertium.org/wiki/Tagger_training
> > http://wiki.apertium.org/wiki/Target_language_tagger_training
> > http://wiki.apertium.org/wiki/Unsupervised_tagger_training
> >
>
>
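Gema's items 4, 6 and 7 (manual disambiguation, supervised training, and evaluating the resulting model) hinge on what "supervised training" means. Below is a minimal Python sketch of the general technique: a first-order HMM over coarse tags whose transition and emission probabilities are relative frequencies counted from a hand-disambiguated corpus, decoded with the Viterbi algorithm. The corpus format (sentences as lists of `(word, coarse_tag)` pairs), the add-k smoothing, the case folding and the class name `HMMTagger` are assumptions made for illustration. This sketch does not reproduce `apertium-tagger` or the `.prob` file format.

```python
import math
from collections import Counter, defaultdict


class HMMTagger:
    """First-order HMM over coarse tags, trained on a hand-disambiguated corpus."""

    START = "<s>"  # pseudo-tag for the sentence start

    def __init__(self, corpus, k=1.0):
        # corpus: list of sentences, each a list of (word, coarse_tag) pairs.
        self.k = k  # add-k smoothing constant
        self.trans = defaultdict(Counter)  # trans[prev_tag][tag] -> count
        self.emit = defaultdict(Counter)  # emit[tag][word] -> count
        self.vocab = set()
        for sentence in corpus:
            prev = self.START
            for word, tag in sentence:
                word = word.lower()  # simple case folding (an assumption)
                self.trans[prev][tag] += 1
                self.emit[tag][word] += 1
                self.vocab.add(word)
                prev = tag
        self.tags = sorted(self.emit)

    def log_p_trans(self, prev, tag):
        """Smoothed log P(tag | prev)."""
        counts = self.trans[prev]
        total = sum(counts.values())
        return math.log((counts[tag] + self.k) / (total + self.k * len(self.tags)))

    def log_p_emit(self, tag, word):
        """Smoothed log P(word | tag); the +1 reserves mass for unseen words."""
        counts = self.emit[tag]
        total = sum(counts.values())
        return math.log((counts[word] + self.k) / (total + self.k * (len(self.vocab) + 1)))

    def tag(self, words, candidates=None):
        """Viterbi decoding. `candidates` optionally gives, per word, the set of
        coarse tags to consider (e.g. those of its dictionary analyses); by
        default every tag seen in training is considered."""
        if not words:
            return []
        words = [w.lower() for w in words]
        allowed = candidates or [self.tags] * len(words)
        # best[tag] = (log probability of the best path ending in tag, that path)
        best = {
            t: (self.log_p_trans(self.START, t) + self.log_p_emit(t, words[0]), [t])
            for t in allowed[0]
        }
        for word, tags_here in zip(words[1:], allowed[1:]):
            new_best = {}
            for t in tags_here:
                score, path = max(
                    (s + self.log_p_trans(p, t), prev_path)
                    for p, (s, prev_path) in best.items()
                )
                new_best[t] = (score + self.log_p_emit(t, word), path + [t])
            best = new_best
        return max(best.values())[1]


if __name__ == "__main__":
    corpus = [
        [("the", "det"), ("fly", "n"), ("buzzes", "vblex")],
        [("birds", "n"), ("fly", "vblex")],
        [("the", "det"), ("dog", "n"), ("barks", "vblex")],
    ]
    tagger = HMMTagger(corpus)
    print(tagger.tag(["the", "fly", "buzzes"]))
    print(tagger.tag(["birds", "fly"], candidates=[{"n"}, {"n", "vblex"}]))
```

Restricting `candidates` to the coarse tags of a word's dictionary analyses means the tagger only chooses among what the morphological dictionary offers. Without it, every tag seen in training competes for every word. The tagger's output on held-out hand-disambiguated text can then be compared with the gold tags to evaluate the trained model.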




------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_mar
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
