On Fri, Feb 27, 2009 at 11:24, Susan <yangyang0...@gmail.com> wrote:
> If my data looks like this:
>
> word 1: 100 101 101 102 102 102 106 106
> word 2: 101 104 106 110 113 129 131 148
> word 3: 101 153 175 180 381
> word 4: 106 110 113 122 131 137 142 148
> word 5: 120 165 169
>
> where word 1,2,3,4,5 represent different words, numbers represent
> different attributes of words.
>
> How can I calculate similarity between words?
snip

What do you mean by similarity? I am going to assume that two words are
similar when they have attributes in common.

#!/usr/bin/perl

use strict;
use warnings;

# Map each word to an array ref of its attributes.
my %h;
while (<DATA>) {
    next unless my ($word, $attr) = /(.*):(.*)/;
    $h{$word} = [split " ", $attr];
}

#inelegant, but I am lazy now
for my $k1 (keys %h) {
    my %comp;
    for my $k2 (keys %h) {
        next if $k1 eq $k2; #don't compare to yourself
        $comp{$k2} = 0;
        # Compare every attribute pair; note that a repeated
        # attribute is counted once per matching pair.
        for my $attr1 (@{$h{$k1}}) {
            for my $attr2 (@{$h{$k2}}) {
                $comp{$k2}++ if $attr1 == $attr2;
            }
        }
    }
    print "$k1 has\n", map {
        ("\t$comp{$_} attribute", ($comp{$_} == 1 ? '' : 's'),
            " in common with $_\n")
    } keys %comp;
}

__DATA__
word 1: 100 101 101 102 102 102 106 106
word 2: 101 104 106 110 113 129 131 148
word 3: 101 153 175 180 381
word 4: 106 110 113 122 131 137 142 148
word 5: 120 165 169

--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
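
One caveat about the nested loops in the script above: they compare every attribute in one list with every attribute in the other, so a repeated attribute is counted more than once. For example, word 1 and word 2 share only 101 and 106, yet the count comes out as 4. If duplicates should be ignored, one possible variant is to treat each list as a set and report the Jaccard index (distinct shared attributes divided by distinct attributes overall). This is only a sketch under that assumption: the jaccard helper is a name made up here, not something from the original post, and the data is hard-coded from the question instead of being read from __DATA__.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data copied from the original question.
my %h = (
    'word 1' => [qw(100 101 101 102 102 102 106 106)],
    'word 2' => [qw(101 104 106 110 113 129 131 148)],
    'word 3' => [qw(101 153 175 180 381)],
    'word 4' => [qw(106 110 113 122 131 137 142 148)],
    'word 5' => [qw(120 165 169)],
);

# Hypothetical helper: Jaccard index of two attribute lists,
# treating each list as a set (duplicates collapse).
sub jaccard {
    my ($list1, $list2) = @_;
    my %set1 = map { $_ => 1 } @$list1;
    my %set2 = map { $_ => 1 } @$list2;

    # grep in scalar context returns the number of matches.
    my $shared = grep { $set2{$_} } keys %set1;

    # Merging both hashes gives the union of the two sets.
    my %union = (%set1, %set2);
    my $total = keys %union;

    # Two empty lists are reported as 0 to avoid dividing by zero.
    return $total ? $shared / $total : 0;
}

# Print each unordered pair once, in a stable order.
my @words = sort keys %h;
for my $i (0 .. $#words) {
    for my $j ($i + 1 .. $#words) {
        printf "%s vs %s: %.2f\n", $words[$i], $words[$j],
            jaccard($h{$words[$i]}, $h{$words[$j]});
    }
}
```

A score of 0 means no shared attributes and 1 means identical attribute sets. Whether a set-based measure is the right notion of similarity depends on what the repeated attributes in the data are meant to convey.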