I'm not sure what you mean here. The current release on CPAN is 2.4.10 and you can find SciPDL for both 2.4.9 and 2.4.10 on sf.net.
--Chris

On Mon, Feb 27, 2012 at 12:06 PM, Jim Magnuson <[email protected]> wrote:
> Ah. I'm on 2.4.6. There doesn't appear to be an easy way to upgrade. CPAN
> doesn't find a newer version, and SciPDL (I'm on a mac) says it installs
> successfully with 2.4.10, but I still seem to have 2.4.6.
>
> I guess that will be my project for tomorrow...
>
> Thanks again,
>
> jim
>
>
> On Mon, Feb 27, 2012 at 5:29 PM, Chris Marshall <[email protected]>
> wrote:
>>
>> Hmm, what version of PDL are you using? My
>> code was with PDL-2.4.10 and some of the new
>> rcols support has only been in since PDL-2.4.7
>> or so.
>>
>> You can just create a piddle of the needed
>> dimensions and then use your while loop to
>> put the data in by hand.
>>
>> $inword = [];
>> $count = 0;
>> while(<>) {
>>     my($wrd,@data) = split;
>>     push @$inword, $wrd;
>>     $kern(:,$count) .= norm(pdl(@data));
>>     $count++;   # move on to the next row
>> }
>>
>> --Chris
>>
>>
>> On Mon, Feb 27, 2012 at 11:18 AM, Jim Magnuson <[email protected]>
>> wrote:
>> > Hello, Chris
>> >
>> > May I impose on you a bit more?
>> >
>> > I am trying to understand the code you sent, but I get an error at the
>> > 2nd line ($kern = $grid->mv(1,0)->norm;):
>> >
>> > One of dims 1, 0 out of range: should be 0<=dim<1
>> >
>> > Also, the first line seems to make $inword into a piddle of zeros. Is
>> > that the intent?
>>
>> $inword is a perl array ref with the elements being the
>> words from each line in column 0. Again, this is for a
>> recent PDL.
>>
>> > Thank you very, very much,
>> >
>> > jim
>> >
>> > On Mon, Feb 27, 2012 at 4:04 PM, Chris Marshall <[email protected]>
>> > wrote:
>> >>
>> >> To get the best performance, you'll need to use what we
>> >> call vectorized PDL operations.
>> >> Here is an example of pdl-iomatic way to do some of your computation:
>> >>
>> >> # use rcols to read data directly into a pdl and perl array
>> >> ($inword, $grid) = rcols 'jmtest.data',0,[], { perlcols=>[0] };
>> >>
>> >> # rearrange dimensions since rcols puts columns in dim(0)
>> >> $kern = $grid->mv(1,0)->norm;
>> >>
>> >> # number of records is now length of dim(1)
>> >> $nrecs = $kern->dim(1);
>> >>
>> >> # calculate all inner products at the same time
>> >> $sim = inner($kern(,(0)),$kern);
>> >>
>> >> # calculate the top 20 values
>> >> $topXind = zeros(long,20);
>> >>
>> >> # don't forget to skip diagonal elements
>> >> $sim->(1:-1)->maximum_n_ind($topXind);
>> >>
>> >> # use slicing to get the max elements
>> >> $topX = $sim($topXind);
>> >>
>> >> # how much wt in top 20?
>> >> print $topX->sum . "\n";
>> >>
>> >>
>> >> Cheers,
>> >> Chris
>> >>
>> >> On Mon, Feb 27, 2012 at 8:46 AM, Jim Magnuson
>> >> <[email protected]>
>> >> wrote:
>> >> > Yes, a typo -- I changed the variable name for the example code and
>> >> > lost the $ somehow...
>> >> >
>> >> > Currently trying to get timing tests going as suggested by chm...
>> >> >
>> >> > thanks,
>> >> >
>> >> > jim
>> >> >
>> >> >
>> >> > On Mon, Feb 27, 2012 at 2:39 PM, Clifford Sobchuk
>> >> > <[email protected]> wrote:
>> >> >>
>> >> >> I don't know if this is a typo or not, but in the code for inner
>> >> >> loop you have the following:
>> >> >> > $sim = inner($kernel{$w1},$kernel{w2});
>> >> >> Where $kernel{w2} should be $kernel{$w2}.
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> CLIFF SOBCHUK
>> >> >> Core RF Engineering
>> >> >> Phone 613-667-1974 ecn: 8109-71974
>> >> >> mobile 403-819-9233
>> >> >> yahoo: sobchuk
>> >> >> www.ericsson.com
>> >> >>
>> >> >> "The author works for Telefonaktiebolaget L M Ericsson ("Ericsson"),
>> >> >> who is solely responsible for this email and its contents.
>> >> >> All inquiries regarding this email should be addressed to
>> >> >> Ericsson. The web site for Ericsson is www.ericsson.com."
>> >> >>
>> >> >> This Communication is Confidential. We only send and receive email
>> >> >> on the basis of the terms set out at
>> >> >> www.ericsson.com/email_disclaimer
>> >> >>
>> >> >>
>> >> >> -----Original Message-----
>> >> >> From: chm [mailto:[email protected]]
>> >> >> Sent: Monday, February 27, 2012 5:46 AM
>> >> >> To: Jim Magnuson
>> >> >> Cc: perldl
>> >> >> Subject: Re: [Perldl] Mysterious slow down from repeated inner calls
>> >> >>
>> >> >> I don't know of any reason why inner() would slow down---have you
>> >> >> tried using NYTProf or some such tool to track time in inner and
>> >> >> number of calls to inner? One oddity is that the first loop
>> >> >> appears to skip all calls to inner which would be *very* fast.
>> >> >> Maybe something is going on with the loop structure?
>> >> >>
>> >> >> --Chris
>> >> >>
>> >> >> On 2/27/2012 2:50 AM, Jim Magnuson wrote:
>> >> >> > Hello,
>> >> >> >
>> >> >> > I have a set of about 30,000 words, and I am using string kernels
>> >> >> > as a metric of word similarity. The goal is to see whether
>> >> >> > different kernels are better at predicting how quickly human
>> >> >> > subjects are able to process words. I have calculated the string
>> >> >> > kernels for each word. So now I have a file with 30,000 lines.
>> >> >> > The first field in each line is a word, and this is followed by
>> >> >> > a 676-element vector representing the kernel representation.
>> >> >> >
>> >> >> > Once I read this in, I need to step through and calculate the
>> >> >> > similarity of each word to every other word using vector cosine,
>> >> >> > as well as track the highest similarity value (excluding the word
>> >> >> > itself), and the set of X-most similar items (there are reasons to
>> >> >> > believe these are good predictors of human performance).
>> >> >> >
>> >> >> > Here's the problem: when I start running the code below, it is
>> >> >> > very fast. It takes 5 msecs to process the first word (that is,
>> >> >> > to do the necessary 30,000 cosines), but by the time it reaches
>> >> >> > the 100th it is taking 37 msecs, and by the 1,000th it is taking
>> >> >> > 398 msecs -- with 29,000 to go, and constant slowing...
>> >> >> >
>> >> >> > Memory use by perl stays constant, and I cannot figure out what
>> >> >> > would make the program slow down so much. I posted a query at
>> >> >> > Perl Monks and I got advice about how to speed up each step (the
>> >> >> > first word used to take 38 msecs), and they pointed out that it
>> >> >> > is indeed the call to inner that is the culprit (replace it with
>> >> >> > a non-pdl calculation, and the slowing goes away). They suggested
>> >> >> > I should look for advice from PDL experts.
>> >> >> >
>> >> >> > So if anyone can give me pointers as to what is slowing things
>> >> >> > down and whether there is a way to avoid it, I would be most
>> >> >> > grateful. Apologies in advance for any offensively
>> >> >> > inefficient/awkward use of PDL!
>> >> >> >
>> >> >> > Thanks!
>> >> >> >
>> >> >> > jim
>> >> >> > #!/usr/bin/perl -s
>> >> >> > use PDL;
>> >> >> > use Time::HiRes qw ( time ) ;
>> >> >> > $|=1;
>> >> >> > $top = 20;
>> >> >> >
>> >> >> > while(<>){
>> >> >> >   chomp;
>> >> >> >   ($wrd, @data) = split;
>> >> >> >   $kernel{$wrd} = norm(pdl(@data));
>> >> >> >   # EXAMPLE LINE
>> >> >> >   # word 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0
>> >> >> > 0 0
>> >> >> > 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> >
>> >> >> > }
>> >> >> > $nrecs = keys %kernel;
>> >> >> > @kernelKeys = sort( keys %kernel );
>> >> >> >
>> >> >> > $startAll = time();
>> >> >> >
>> >> >> > $at1 = 0;
>> >> >> > foreach $w1 (@kernelKeys) {
>> >> >> >   $totalsim = $maxsim = 0;
>> >> >> >   $startWord = time();
>> >> >> >   @topX = ();
>> >> >> >   $at2 = 0;
>> >> >> >   foreach $w2 (@kernelKeys) {
>> >> >> >     next if($at1 == $at2); # skip identical item, but not homophones
>> >> >> >     $at2++;
>> >> >> >     $sim = inner($kernel{$w1},$kernel{w2});
>> >> >> >     $totalsim+=$sim;
>> >> >> >     if($sim> $maxsim){ $maxsim = $sim; }
>> >> >> >     # keep the top 20
>> >> >> >     if($#topX< $top){
>> >> >> >       push @topX, $sim;
>> >> >> >     } else {
>> >> >> >       @topX = sort { $a<=> $b } @topX;
>> >> >> >       if($sim> $topX[0]){ $topX[0] = $sim; }
>> >> >> >     }
>> >> >> >   }
>> >> >> >   $at1++;
>> >> >> >   $topXtotal = sum(pdl(@topX));
>> >> >> >   printf "$at1\t$w1\t$totalsim\t$maxsim\t$topXtotal\n";
>> >> >> >   unless($at1 % 10){
>> >> >> >     $now = time();
>> >> >> >     $elapsed = $now - $startAll;
>> >> >> >     $thisWord = $now - $startWord;
>> >> >> >     $perWord = $elapsed / $at1;
>> >> >> >     $hoursRemaining = (($nrecs - $at1) * $perWord)/3600;
>> >> >> >     printf STDERR "#$at1\t$w1\t$totalsim\t$maxsim\t$topXtotal\t";
>> >> >> >     printf STDERR "ELAPSED %.3f THISWORD %.3f PERWORD %.3f HOURStoGO %.3f\n",
>> >> >> >       $elapsed, $thisWord, $perWord, $hoursRemaining;
>> >> >> >   }
>> >> >> > }
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > _______________________________________________
>> >> >> > Perldl mailing list
>> >> >> > [email protected]
>> >> >> > http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
>> >> >>
>> >> >>
>> >> >> _______________________________________________
>> >> >> Perldl mailing list
>> >> >> [email protected]
>> >> >> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
>> >> >
>> >> >
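A note on chm's remark that the first pass appears to skip every call to inner. In the script as posted, `next if($at1 == $at2)` is tested before `$at2++`, so once the two counters are equal `$at2` stops advancing and the rest of the inner loop is skipped. Traced by hand, the word at position k (counting from 0) makes only k calls to inner. That would be consistent with the roughly linear growth in per-word time reported in the original message, though it has not been checked against the real data. The sketch below is plain Perl with no PDL; `calls_as_posted`, `calls_with_fix`, and the list size of 1000 are hypothetical names and values that only count how often the loop body runs.

```perl
use strict;
use warnings;

# Count loop-body executions for the outer item at position $at1, with the
# counter placement from the posted script: the skip test comes first.
sub calls_as_posted {
    my ($at1, $n) = @_;
    my ($at2, $calls) = (0, 0);
    for my $i (0 .. $n - 1) {
        next if $at1 == $at2;   # once equal, $at2 never advances again
        $at2++;
        $calls++;               # stands in for the call to inner()
    }
    return $calls;
}

# Same loop with the counter advanced before the skip test, so only the
# item's own position is skipped.
sub calls_with_fix {
    my ($at1, $n) = @_;
    my ($at2, $calls) = (0, 0);
    for my $i (0 .. $n - 1) {
        my $pos = $at2++;
        next if $pos == $at1;
        $calls++;
    }
    return $calls;
}

my $n = 1000;   # hypothetical list size, for illustration only
for my $at1 (0, 10, 100, 999) {
    printf "at1=%4d  as posted: %4d  fixed: %4d\n",
        $at1, calls_as_posted($at1, $n), calls_with_fix($at1, $n);
}
```

With the counter moved, every word is compared against all of the others, so the per-word cost should be flat rather than growing. The vectorized rcols/inner approach quoted earlier in the thread avoids the counters altogether.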
