Yes, a typo -- I changed the variable name for the example code and lost the $ somehow...
Currently trying to get timing tests going as suggested by chm... thanks, jim On Mon, Feb 27, 2012 at 2:39 PM, Clifford Sobchuk < [email protected]> wrote: > I don't know if this is a typo or not, but in the code for inner loop you > have the following: > > $sim = inner($kernel{$w1},$kernel{w2}); > Where $kernel{w2} should be $kernel{$w2}. > > > > > CLIFF SOBCHUK > Core RF Engineering > Phone 613-667-1974 ecn: 8109-71974 > mobile 403-819-9233 > yahoo: sobchuk > www.ericsson.com > > "The author works for Telefonaktiebolaget L M Ericsson ("Ericsson"), who > is solely responsible for this email and its contents. All inquiries > regarding this email should be addressed to Ericsson. The web site for > Ericsson is www.ericsson.com." > > This Communication is Confidential. We only send and receive email on the > basis of the terms set out at www.ericsson.com/email_disclaimer > > > -----Original Message----- > From: chm [mailto:[email protected]] > Sent: Monday, February 27, 2012 5:46 AM > To: Jim Magnuson > Cc: perldl > Subject: Re: [Perldl] Mysterious slow down from repeated inner calls > > I don't know of any reason why inner() would slow down---have you tried > using NYTProf or some such tool to track time in inner and number of calls > to inner? One oddity is that the first loop appears to skip all calls to > inner which would be *very* fast. Maybe something is going on with the > loop structure? > > --Chris > > On 2/27/2012 2:50 AM, Jim Magnuson wrote: > > Hello, > > > > I have a set of about 30,000 words, and I am using string kernels as a > > metric of word similarity. The goal is to see whether different > > kernels are better at predicting how quickly human subjects are able > > to process words. I have calculated the string kernels for each word. > > So now I have a file with 30,000 lines. The first field in each line > > is a word, and this is followed by a 676-element vector representing the > kernel representation. > > > > Once I read this in, I need to step through and calculate the > > similarity of each word to every other word using vector cosine, as > > well as track the highest similarity value (excluding the word > > itself), and the set of X-most similar items (there are reasons to > > believe these are good predictors of human performance). > > > > Here's the problem: when I start running the code below, it is very fast. > > It takes 5 msecs to process the first word (that is, to do the > > necessary 30,000 cosines), but by the time it reaches the 100th it is > > taking 37 msecs, and by the 1,000th it is taking 398 msecs -- with > > 29,000 to go, and constant slowing... > > > > Memory use by perl stays constant, and I cannot figure out what would > > make the program slow down so much. I posted a query at Perl Monks and > > I got advice about how to speed up each step (the first word used to > > take 38 msecs), and they pointed out that it is indeed the call to > > inner that is the culprit (replace it with a non-pdl calculation, and > > the slowing goes away). They suggested I should look for advice from PDL > experts. > > > > So if anyone can give me pointers as to what is slowing things down > > and whether there is a way to avoid it, I would be most grateful. > > Apologies in advance for any offensively inefficient/awkward use of PDL! > > > > Thanks! > > > > jim > > #!/usr/bin/perl -s > > use PDL; > > use Time::HiRes qw ( time ) ; > > $|=1; > > $top = 20; > > > > while(<>){ > > chomp; > > ($wrd, @data) = split; > > $kernel{$wrd} = norm(pdl(@data)); > > # EXAMPLE LINE > > # word 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 > > 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > > > } > > $nrecs = keys %kernel; > > @kernelKeys = sort( keys %kernel ); > > > > $startAll = time(); > > > > $at1 = 0; > > foreach $w1 (@kernelKeys) { > > $totalsim = $maxsim = 0; > > $startWord = time(); > > @topX = (); > > $at2 = 0; > > foreach $w2 (@kernelKeys) { > > next if($at1 == $at2); # skip identical item, but not homophones > > $at2++; > > $sim = inner($kernel{$w1},$kernel{w2}); > > $totalsim+=$sim; > > if($sim> $maxsim){ $maxsim = $sim; } > > # keep the top 20 > > if($#topX< $top){ > > push @topX, $sim; > > } else { > > @topX = sort { $a<=> $b } @topX; > > if($sim> $topX[0]){ $topX[0] = $sim; } > > } > > } > > $at1++; > > $topXtotal = sum(pdl(@topX)); > > printf "$at1\t$w1\t$totalsim\t$maxsim\t$topXtotal\n"; > > unless($at1 % 10){ > > $now = time(); > > $elapsed = $now - $startAll; > > $thisWord = $now - $startWord; > > $perWord = $elapsed / $at1; > > $hoursRemaining = (($nrecs - $at1) * $perWord)/3600; > > printf STDERR "#$at1\t$w1\t$totalsim\t$maxsim\t$topXtotal\t"; > > printf STDERR "ELAPSED %.3f THISWORD %.3f PERWORD %.3f HOURStoGO > > %.3f\n", > > $elapsed, $thisWord, $perWord, $hoursRemaining; > > } > > } > > > > > > > > > > _______________________________________________ > > Perldl mailing list > > [email protected] > > http://mailman.jach.hawaii.edu/mailman/listinfo/perldl > > > _______________________________________________ > Perldl mailing list > [email protected] > http://mailman.jach.hawaii.edu/mailman/listinfo/perldl >
_______________________________________________ Perldl mailing list [email protected] http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
