Re: word similarity measure

Chas. Owens Sun, 01 Mar 2009 08:56:12 -0800

On Fri, Feb 27, 2009 at 11:24, Susan <yangyang0...@gmail.com> wrote:
> If my data looks like this:
>
> word 1: 100    101     101    102    102     102    106    106
> word 2: 101    104     106    110    113     129    131    148
> word 3: 101    153     175    180    381
> word 4: 106    110     113    122    131     137    142    148
> word 5: 120    165     169
>
> where word 1,2,3,4,5 represent different words, numbers represent
> different attributes of words.
>
> How can I calculate similarity between words?
snip


What do you mean by similarity?  I am going to assume that two words
are similar when they have attributes in common, so the script below
counts the shared attributes for every pair of words.

#!/usr/bin/perl

use strict;
use warnings;

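my %h;
while (<DATA>) {
        #skip lines that do not look like "word: attributes"
        next unless my ($word, $attr) = /(.*):(.*)/;
        #keep each attribute once, so a repeated value is not counted twice
        my %seen;
        $h{$word} = [grep { !$seen{$_}++ } split " ", $attr];
}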

#inelegant, but I am lazy now
for my $k1 (sort keys %h) { #sorted so the output order is stable
        my %comp;
        for my $k2 (sort keys %h) {
                next if $k1 eq $k2; #don't compare to yourself
                $comp{$k2} = 0;
                #count every attribute of $k1 that also appears in $k2
                for my $attr1 (@{$h{$k1}}) {
                        for my $attr2 (@{$h{$k2}}) {
                                $comp{$k2}++ if $attr1 == $attr2;
                        }
                }
        }
        print "$k1 has\n",
                map {
                        ("\t$comp{$_} attribute",
                        ($comp{$_} == 1 ? '' : 's'),
                        " in common with $_\n")
                } sort keys %comp;
}


__DATA__
word 1: 100    101     101    102    102     102    106    106
word 2: 101    104     106    110    113     129    131    148
word 3: 101    153     175    180    381
word 4: 106    110     113    122    131     137    142    148
word 5: 120    165     169
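
If a raw count is too crude (a word with many attributes will tend to
share more of them with everything), one possible refinement is to
divide the number of shared attributes by the number of distinct
attributes the two words have between them. This is the Jaccard index,
which is my assumption about what "similarity" could mean here, not
something the original question asked for. The sketch below uses a
hypothetical helper named jaccard() and hard-codes three of the words
from the sample data; it treats each attribute list as a set, so
repeated values count once.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper (not part of the original script): Jaccard index
# of two attribute lists, with each list treated as a set.
sub jaccard {
    my ($x, $y) = @_;
    my %in_x = map { $_ => 1 } @$x;    # distinct attributes of the first word
    my %in_y = map { $_ => 1 } @$y;    # distinct attributes of the second word

    # attributes present in both sets (grep in scalar context gives a count)
    my $shared = grep { $in_y{$_} } keys %in_x;

    # size of the union: |X| + |Y| - |X and Y|
    my $total = scalar(keys %in_x) + scalar(keys %in_y) - $shared;

    return $total ? $shared / $total : 0;    # two empty lists score 0
}

# Three of the words from the sample data, hard-coded for brevity.
my %h = (
    'word 1' => [100, 101, 101, 102, 102, 102, 106, 106],
    'word 2' => [101, 104, 106, 110, 113, 129, 131, 148],
    'word 4' => [106, 110, 113, 122, 131, 137, 142, 148],
);

# Score each unordered pair once.
my @words = sort keys %h;
for my $i (0 .. $#words) {
    for my $j ($i + 1 .. $#words) {
        printf "%s vs %s: %.3f\n", $words[$i], $words[$j],
            jaccard($h{$words[$i]}, $h{$words[$j]});
    }
}
```

A score of 1 means the two attribute sets are identical and 0 means
they share nothing. For example, word 2 and word 4 share 5 of their 11
distinct attributes, so they score about 0.455.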


-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.

