On Tue, Jul 05, 2011 at 01:42:52PM +0100, Jimmy O'Regan wrote:
> 2011/7/5 Keld Jørn Simonsen <k...@keldix.com>:
> > On Tue, Jul 05, 2011 at 01:04:40PM +0100, Jimmy O'Regan wrote:
> >> 2011/7/5 Keld Jørn Simonsen <k...@keldix.com>:
> >> > On Sun, Jul 03, 2011 at 10:21:55PM +0100, Jimmy O'Regan wrote:
> >> >> 2011/7/3 Keld Jørn Simonsen <k...@keldix.com>:
> >> >> > So that person actually understood what I meant the first time - good
> >> >> > to know that there is at least one person (plus my mother) that
> >> >> > understands me - although the understanding may crumble over time.
> >> >>
> >> >> Context is wonderful. I did say it wouldn't be done in a hurry, and
> >> >> nobody else has expressed an interest in it since then. If you want to
> >> >> try yourself, take a look at TaggerWord::discardOnAmbiguity in
> >> >> tagger_word.cc, otherwise you'll have to continue waiting.
> >> >
> >> > Yes, you said:
> >> >
> >> >> Without retraining the tagger, there's no way to do that. There are
> >> >> preference rules, but those only filter on tags. I think it might be
> >> >> useful to extend the tagger to have a mechanism to make certain tag
> >> >> choices for specific lemmas, and not too difficult to implement, based
> >> >> on the existing preference rules, but it's not going to be done in a
> >> >> hurry.
> >> >
> >> > I put emphasis on "not to difficult to implement". What are your
> >> > thoughts? Then I could have a look. I was actually thinking of some more
> >> > complex things also, and if they would be almost as easy to implement,
> >> > then I would go for the full monty.
> >> >
> >> > My further ideas were:
> >> > - discardOnAmbiguity based on allowed grammatical rules
> >> > - discardOnAmbiguity based on number of appearances
> >> > - discardOnAmbiguity based on shortest distance for a wordnet like graph
> >> >   for the surrounding say 10 words.
> >>
> >> You've taken what was meant to be a simple idea and made it extremely
> >> complicated. There are a handful of people on this list who use CG,
> >> maybe you should talk to one of them. It might do what you want.
> >
> > OK, who are you thinking of?
>
> Francis usually relentlessly promotes it. I'm surprised he hasn't
> chimed in by now.

OK.

In my mind there is a difference between the simple idea of removing
some homonyms and the further ideas above. Removing homonyms would be
done statically, when the tables are generated from the monodix, and it
would mean removing the relevant surface forms. The further ideas would
be dynamic: the surface forms of the homonyms stay intact in the
monodix, and a decision between them is then made.

Is TaggerWord::discardOnAmbiguity in tagger_word.cc static or dynamic
functionality in this sense?

Best regards
keld

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff