On Tue, Oct 09, 2012 at 09:41:41AM +0200, Per Tunedal wrote:
> Hi Keld,
> I liked your algo but had to think it over. After I'd slept on it, a
> few things came to mind:
> 
> "My initial go on an algorithm is then: I found a homonym. 
> Each of the homonyms have a placement in the meaning tree via its father
> and mother relations."
> 
> Unfortunately, I have no idea what the father relation is. Maybe you
> should follow only the mother relations?

The father relation is meant to discriminate between senses that share the
same mother relation, so maybe it can be of help. I don't know. I take it
into account in order to generalize wordnet-like structures: there may be
more than one relation from a given homonym, and a general Apertium wordnet
module and algorithm should be able to handle more than one upwards
relation. In the monodix markup this could then be marked with a "rel" tag,
and more than one "rel" tag may be present. I need input from people more
in the know on whether this would be the recommended way to mark up such
meaning relations in the monodix.
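
As a rough illustration, here is a minimal sketch of what such "rel"-style
markup could correspond to internally. This is only a hypothetical in-memory
view; the sense identifiers and relation names are invented for the example
and are not taken from any existing monodix:

    # Each sense of a homonym points upward to one or more parent senses,
    # labelled by the kind of relation ("mother", "father", or any other
    # value a "rel" tag might carry).
    sense_parents = {
        "bank:finance": [("mother", "institution"), ("father", "money")],
        "bank:river":   [("mother", "landform"),    ("father", "water")],
        "institution":  [("mother", "organization")],
        "landform":     [("mother", "physical_object")],
    }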

> "Each of the above terms in the trees will then be recorded with the
> distance in links 
> to the specific homonym."
> 
> Seems alright, so far.
> 
> "If a term has been visited before then link count is modified
> if the new link count is less - and also modified for all links above
> this node."
> 
> I don't understand this.

There may be different paths from the homonym to a specific term, and one
path may be shorter than another. If we find a path that is shorter than a
previously found path, then the paths to the terms above the found term may
also become shorter, and thus the link counts to those terms need to be
adjusted too. A small sketch of this follows.
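
Here is a minimal sketch of that upward traversal, assuming a sense graph
in the form of the hypothetical sense_parents mapping shown earlier (each
sense maps to a list of (relation, parent) pairs). It is a plain worklist
relaxation over unit-length links, not anything taken from existing
Apertium code:

    from collections import deque

    def upward_link_counts(sense, parents):
        # Shortest link count from `sense` to every term reachable upward.
        # Whenever a term is reached by a shorter path, it is put back on
        # the worklist, so the counts of everything above it get adjusted
        # too - the behaviour described above.
        dist = {sense: 0}
        work = deque([sense])
        while work:
            term = work.popleft()
            for _relation, parent in parents.get(term, []):
                new_count = dist[term] + 1
                if new_count < dist.get(parent, float("inf")):
                    dist[parent] = new_count
                    work.append(parent)
        return dist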


> "I then take a number of preceding and following words - say 5 preceding
> and 5 following words." 
> 
> 1. You would only benefit from words loaded with sense, primarily nouns,
> secondly verbs. The rest should be ignored.

Yes, that could be an optimization. I would include adjectives and adverbs
too. At the very least, exclude some very common words. A sketch of such a
filter is given below.
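
As a rough sketch of that filtering, assuming the sentence is available as
a list of (lemma, part-of-speech) pairs - the tag names and the stop-word
list below are placeholders, not an actual Apertium tag set:

    # Keep up to `n` content words on each side of the ambiguous word,
    # skipping very common words.
    CONTENT_POS = {"noun", "verb", "adj", "adv"}
    STOP_LEMMAS = {"be", "have", "do", "thing"}   # placeholder stop list

    def context_words(tagged, index, n=5):
        def pick(tokens):
            chosen = []
            for lemma, pos in tokens:
                if pos in CONTENT_POS and lemma not in STOP_LEMMAS:
                    chosen.append(lemma)
                    if len(chosen) == n:
                        break
            return chosen
        before = pick(reversed(tagged[:index]))
        after = pick(tagged[index + 1:])
        return before + after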

> 2. It might be useful to search in adjacent sentences when looking for
> related words. Look for 3 nouns before and 3 nouns after the ambiguous
> word? If you find less in any direction, look for more in the other,
> until you have the desired total (or as many you can find).
> 
> "For each of these words I travel up in the hierachy both of the
> father and mother branches."
> 
> As I've said, maybe you could skip the father branch.

Yes, as noted above.

> "We need to follow all branches - there may be one branch
> that is shorter than a previous match, both on the homonym
> side and the surrounding word side."
> 
> Seems OK.
> 
> "The shortest distance between the specific homonym
> and the surrounding word is then the link count from the specific
> homonym to the common term plus the surrounding word distance in links
> to the common term."
> 
> OK
> 
> "It is not OK to stop at the first match, there may be
> shorter matches.... (Hmm, one could travel
> the two trees to always secure shortest match - I think)"
> 
> Maybe there is a way to skip non-fruitful links before they are
> completed, to speed things up? Compare to the shortest link found so
> far?

Yes, you could stop the search once you are sure that any further links
would be longer than the shortest one already found. A sketch of this
pruning is included below.
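
The following sketch puts the pieces together: the shortest connection
between two senses is the smallest sum of their link counts to a common
term, every branch is explored (the first common term found is not
necessarily the best one), and branches that can no longer beat the best
total so far are pruned. It reuses the hypothetical upward_link_counts
function from the earlier sketch:

    def connection_length(sense_a, sense_b, parents, best_so_far=float("inf")):
        # Distances from each sense to all of its upward terms.
        up_a = upward_link_counts(sense_a, parents)
        up_b = upward_link_counts(sense_b, parents)
        best = best_so_far
        for term, d_a in up_a.items():
            if d_a >= best:
                continue          # this branch cannot improve on the best total
            d_b = up_b.get(term)
            if d_b is not None and d_a + d_b < best:
                best = d_a + d_b
        return best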

> Maybe there are other ways too. If you test your algo you might find a
> pattern of links likely to be less fruitful than others. Or someone
> might come up with a theoretical way to figure it out?

Yes, that is a possibility. Maybe the linguists here know more about such algos?

> Maybe there are similar algos around for related applications, like
> spell checking, statistical translation (tree model or even factored
> translation, look at Moses), speech recognition or even artificial
> intelligence? And you mentioned finding the shortest way. Someone might
> have an idea of where to look for algos? There might be some open source
> code to copy or be inspired by.

Yes. Anyway, I think the algo we are making here is sufficiently simple and
effective to give us some experience of how well it could work.

> "Then add up all link counts for all surrounding words, and pick the
> homonym
> that has the smallest total counts."
> 
> OK. What if you instead skipped all links that are not shorter than the
> shortest link already found? There isn't any need to save a table of all
> counts, if not for research or debugging.

Yes, I think I described this above. A sketch of the final selection step
follows.
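
This is roughly what the final selection could look like, again building on
the hypothetical sketches above. It assumes that each surrounding word has
already been mapped to a sense in the same graph, which glosses over the
fact that the surrounding words may themselves be ambiguous:

    def pick_sense(candidate_senses, surrounding_senses, parents):
        # Sum the shortest connections to all surrounding words and pick
        # the candidate with the smallest total (ties: first one wins).
        def total(sense):
            return sum(connection_length(sense, other, parents)
                       for other in surrounding_senses)
        return min(candidate_senses, key=total)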

> Further: Maybe it isn't necessary to find the shortest link, it might do
> with a sufficiently short link? A link shorter than X. You might be able
> to trim your algo when testing to find out a suitable X-factor. Maybe
> the factor should be set differently for different languages or
> corpora? Your module might be shipped with a default factor that can be
> adjusted by the developer of a language pair for the best fit. (Rationale:
> If the probability of finding a shorter link is very low, don't try to
> find any.)

I think you really should find the shortest link, to find the best match.

But we could try out different strategies, such as stopping the search when
the link count exceeds X; see the small variant sketched below. It depends
on how time-consuming it is.
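
A minimal sketch of that cut-off strategy, on top of the hypothetical
connection_length above (the default of 6 links is just a placeholder; a
real value would have to be tuned per language pair, as suggested):

    def connection_length_capped(sense_a, sense_b, parents, max_links=6):
        # Treat any connection longer than `max_links` as "no connection",
        # which also lets the pruning in connection_length cut off earlier.
        length = connection_length(sense_a, sense_b, parents,
                                   best_so_far=max_links + 1)
        return length if length <= max_links else None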

> BTW Just like you, I'm into this just for the fun of it. I will only work
> with things that are of great interest to me. Primarily, I like to
> solve problems. Or help others to solve theirs.

Yes, I agree. I think that discussing on the list without making commits is
also contributing to the Apertium project.
And I would like to contribute more (I have done some commits already),
but I am stuck with commits because I am not getting the advice from more
seasoned people that, in my limited understanding, I think I need, so as
not to hurt the overall system or the specific language pair I am working
on, or violate Apertium design principles.

Let's see what happens.
keld
> 
> Yours,
> Per Tunedal
> 
> On Mon, Oct 8, 2012, at 20:51, k...@keldix.com wrote:
> > Hi Per
> > 
> > What do you think of my algorithm on the wordnet data?
> > 
> > Regards
> > keld
> > 
> --snip---
> > > 
> > > 
> > > 
> > > 2012/10/7 Per Tunedal <per.tune...@operamail.com>
> > > 
> > >   Hi again,
> > >   maybe this is feasible now, in the light of the possibility to trim
> > >   dictionaries?
> > >   Otherwise you might add the words to a copy of the Swedish dictionary
> > >   and give it an explanatory suffix. And I might somehow find a way to
> > >   adjust it.
> > >   Actually it might be better to add it to the increased Swedish
> > >   dictionary from the pair Icelandic (is) - Swedish (se) (or was it
> > >   se-is?), instead. I suppose it wouldn't do any harm if I tried that
> > >   dictionary for Swedish and Danish. The main problem right now is that
> > >   there are many more words in the Danish than in the Swedish
> > >   dictionary.
> > >   Yours,
> > >   Per Tunedal
> > >   On Tue, Sep 11, 2012, at 10:03, k...@keldix.com wrote:
> > >   > Hi Per
> > >   >
> > >   > I actually have about 49,000 Swedish nouns from the SALDO project
> > >   > to add to the Swedish dix. I would just like some way to suppress
> > >   > overwriting already existing working relations for homonyms.
> > >   >
> > >   --snip--
> > >   >
> > >   > Best regards
> > >   > keld
> > >   >
> > >   >
> > >   --snip--
> > > 
> > > 
> > > --
> > > Jacob Nordfalk
> > > javabog.dk
> > > Android developer and instructor at IHK and Lund&Bendsen
> > > 
> > 
> > 
> 

