Re: [Perldl] Mysterious slow down from repeated inner calls

Chris Marshall Mon, 27 Feb 2012 10:28:12 -0800

I'm not sure what you mean here.  The current release on
CPAN is 2.4.10 and you can find SciPDL for both 2.4.9
and 2.4.10 on sf.net.


--Chris
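When an installer reports one PDL version (here 2.4.10) but scripts still see another (2.4.6), one way to check which copy a given `perl` actually loads is to print `$PDL::VERSION` together with the path Perl recorded in `%INC`. This is a minimal sketch. It uses only core Perl plus `require PDL`, and it falls back to listing `@INC` if PDL cannot be found:

```perl
use strict;
use warnings;

# Load PDL without importing anything, so a failure can be reported.
my $loaded = eval { require PDL; 1 };

if ($loaded) {
    print "PDL version: $PDL::VERSION\n";
    print "loaded from: $INC{'PDL.pm'}\n";
} else {
    print "PDL not found; perl searched:\n";
    print "  $_\n" for @INC;
}
```

Run it with the same `perl` binary that runs your scripts. If the reported path is not where the installer put 2.4.10, the two are separate installations.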

On Mon, Feb 27, 2012 at 12:06 PM, Jim Magnuson <[email protected]> wrote:
> Ah. I'm on 2.4.6. There doesn't appear to be an easy way to upgrade. CPAN
> doesn't find a newer version, and SciPDL (I'm on a mac) says it installs
> successfully with 2.4.10, but I still seem to have 2.4.6.
>
> I guess that will be my project for tomorrow...
>
> Thanks again,
>
> jim
>
>
> On Mon, Feb 27, 2012 at 5:29 PM, Chris Marshall <[email protected]>
> wrote:
>>
>> Hmm, what version of PDL are you using?  My
>> code was with PDL-2.4.10 and some of the new
>> rcols support has only been in since PDL-2.4.7
>> or so.
>>
>> You can just create a piddle of the needed
>> dimensions and then use your while loop to
>> put the data in by hand.
>>
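>> use PDL;
>> use PDL::NiceSlice;           # for the $kern(:,$count) syntax
>> $kern = zeros(676, $nrecs);   # $nrecs (number of lines) must be known first
>> $inword = [];
>> $count = 0;
>> while(<>) {
>>  my($wrd,@data) = split;
>>  push @$inword, $wrd;
>>  $kern(:,$count) .= norm(pdl(@data));
>>  $count++;
>> }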
>>
>> --Chris
>>
>>
>> On Mon, Feb 27, 2012 at 11:18 AM, Jim Magnuson <[email protected]>
>> wrote:
>> > Hello, Chris
>> >
>> > May I impose on you a bit more?
>> >
>> > I am trying to understand the code you sent, but I get an error at the
>> > 2nd line ($kern = $grid->mv(1,0)->norm;):
>> >
>> > One of dims 1, 0 out of range: should be 0<=dim<1
>> >
>> > Also, the first line seems to make $inword into a piddle of zeros. Is
>> > that the intent?
>>
>> $inword is a perl array ref with the elements being the
>> words from each line in column 0.  Again, this is for a
>> recent PDL.
>>
>> > Thank you very, very much,
>> >
>> > jim
>> >
>> > On Mon, Feb 27, 2012 at 4:04 PM, Chris Marshall <[email protected]>
>> > wrote:
>> >>
>> >> To get the best performance, you'll need to use what we
>> >> call vectorized PDL operations.  Here is an example of a
>> >> PDL-idiomatic way to do some of your computation:
>> >>
>> >> use PDL;
>> >> use PDL::NiceSlice;  # enables the $kern(,(0)) style of slicing below
>> >>
>> >> # use rcols to read data directly into a pdl and perl array
>> >> ($inword, $grid) = rcols 'jmtest.data',0,[], { perlcols=>[0] };
>> >>
>> >> # rearrange dimensions since rcols puts columns in dim(0)
>> >> $kern = $grid->mv(1,0)->norm;
>> >>
>> >> # number of records is now length of dim(1)
>> >> $nrecs = $kern->dim(1);
>> >>
>> >> # inner products of word 0 with every word, in a single call
>> >> $sim = inner($kern(,(0)),$kern);
>> >>
>> >> # calculate the top 20 values
>> >> $topXind = zeros(long,20);
>> >>
>> >> # don't forget to skip diagonal elements (element 0 for word 0)
>> >> $sim->(1:-1)->maximum_n_ind($topXind);
>> >>
>> >> # the indices are relative to the (1:-1) slice, so shift them by one
>> >> $topX = $sim->index($topXind + 1);
>> >>
>> >> # how much wt in top 20?
>> >> print $topX->sum . "\n";
>> >>
>> >>
>> >> Cheers,
>> >> Chris
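The example above handles word 0 only. This sketch extends it to every word, one row at a time. Memory then stays at about one 30,000-element row, instead of the roughly 7.2 GB that a full 30,000 × 30,000 double-precision matrix would need. It assumes `$kern` has dims (676, $nrecs) with unit-length vectors, as produced by `->norm` above. `similarity_summary` is a hypothetical helper name, and it ranks with `qsort` rather than `maximum_n_ind` just to keep the example short:

```perl
use strict;
use warnings;
use PDL;

# Hypothetical helper. $kern must have dims (676, $nrecs), with each
# 676-element vector (along dim 0) already normalised to unit length.
# Returns one [index, total sim, max sim, top-$top sum] per word.
sub similarity_summary {
    my ($kern, $top) = @_;
    my $nrecs = $kern->dim(1);
    my @rows;
    for my $i (0 .. $nrecs - 1) {
        # cosines of word $i with every word: inner() threads over dim 1
        my $sim = inner($kern->slice(":,($i)"), $kern);
        # leave out the word's similarity to itself
        my $others = $sim->where(sequence($nrecs) != $i);
        # largest $top values: sort ascending, take the last $top
        my $topX = $others->qsort->slice(sprintf '-%d:-1', $top);
        push @rows, [ $i, $others->sum, $others->max, $topX->sum ];
    }
    return \@rows;
}

# Tiny illustration: three 2-element vectors, normalised along dim 0.
my $rows = similarity_summary(pdl([[1, 0], [0, 1], [1, 1]])->norm, 1);
print join("\t", @$_), "\n" for @$rows;
```

Each row still computes all pairwise products. The saving over the original loop comes from doing a row in one threaded `inner` call instead of thousands of separate Perl-level calls.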
>> >>
>> >> On Mon, Feb 27, 2012 at 8:46 AM, Jim Magnuson
>> >> <[email protected]>
>> >> wrote:
>> >> > Yes, a typo: I changed the variable name for the example code and
>> >> > lost the $ somehow...
>> >> >
>> >> > Currently trying to get timing tests going as suggested by chm...
>> >> >
>> >> > thanks,
>> >> >
>> >> > jim
>> >> >
>> >> >
>> >> > On Mon, Feb 27, 2012 at 2:39 PM, Clifford Sobchuk
>> >> > <[email protected]> wrote:
>> >> >>
>> >> >> I don't know if this is a typo or not, but in the code for the inner
>> >> >> loop you have the following:
>> >> >> >      $sim = inner($kernel{$w1},$kernel{w2});
>> >> >> Where $kernel{w2} should be $kernel{$w2}.
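This slip is easy to miss because Perl accepts it silently. Inside hash braces, a bare identifier such as `w2` is treated as the string `'w2'`, and `use strict` does not object. Here is a small self-contained illustration (the hash contents are made up):

```perl
use strict;
use warnings;

my %kernel = (apple => 1, pear => 2);
my $w2 = 'pear';

# A bare identifier inside the braces is auto-quoted: this looks up
# the literal key "w2", which does not exist, so $wrong is undef.
my $wrong = $kernel{w2};

# With the sigil, the value of $w2 ('pear') is used as the key.
my $right = $kernel{$w2};

print defined $wrong ? "w2 found\n" : "w2 missing\n";   # w2 missing
print "$right\n";                                        # 2
```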
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> CLIFF SOBCHUK
>> >> >> Core RF Engineering
>> >> >> Phone 613-667-1974   ecn: 8109-71974
>> >> >> mobile 403-819-9233
>> >> >> yahoo: sobchuk
>> >> >> www.ericsson.com
>> >> >>
>> >> >> -----Original Message-----
>> >> >> From: chm [mailto:[email protected]]
>> >> >> Sent: Monday, February 27, 2012 5:46 AM
>> >> >> To: Jim Magnuson
>> >> >> Cc: perldl
>> >> >> Subject: Re: [Perldl] Mysterious slow down from repeated inner calls
>> >> >>
>> >> >> I don't know of any reason why inner() would slow down.  Have you
>> >> >> tried using NYTProf or some such tool to track the time spent in
>> >> >> inner and the number of calls to inner?  One oddity is that the
>> >> >> first loop appears to skip all calls to inner, which would be
>> >> >> *very* fast.  Maybe something is going on with the loop structure?
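You can check this oddity by tracing the counters in the posted loop. `$at2` is incremented only after the `next`, so once it equals `$at1`, every remaining comparison is skipped. Word k (counting from 0) therefore gets exactly k calls to inner, so the work per word grows linearly. That fits the reported 5, 37 and 398 msec timings for the 1st, 100th and 1,000th words. This sketch replaces inner() with a counter and compares the posted structure with one that loops over the index directly. The function names are hypothetical:

```perl
use strict;
use warnings;

# Number of inner() calls per outer word, using the loop as posted:
# $at2 is advanced only after the `next`, so it stalls at $at1.
sub calls_as_posted {
    my ($n) = @_;
    my @calls;
    for my $at1 (0 .. $n - 1) {
        my ($at2, $count) = (0, 0);
        for (1 .. $n) {
            next if $at1 == $at2;
            $at2++;
            $count++;            # stands in for inner()
        }
        push @calls, $count;
    }
    return @calls;
}

# Same comparison, but iterating over the index itself, so only the
# word's comparison with itself is skipped.
sub calls_fixed {
    my ($n) = @_;
    my @calls;
    for my $at1 (0 .. $n - 1) {
        my $count = 0;
        for my $at2 (0 .. $n - 1) {
            next if $at1 == $at2;
            $count++;            # stands in for inner()
        }
        push @calls, $count;
    }
    return @calls;
}

print join(' ', calls_as_posted(5)), "\n";   # 0 1 2 3 4
print join(' ', calls_fixed(5)), "\n";       # 4 4 4 4 4
```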
>> >> >>
>> >> >> --Chris
>> >> >>
>> >> >> On 2/27/2012 2:50 AM, Jim Magnuson wrote:
>> >> >> > Hello,
>> >> >> >
>> >> >> > I have a set of about 30,000 words, and I am using string kernels
>> >> >> > as a metric of word similarity. The goal is to see whether
>> >> >> > different kernels are better at predicting how quickly human
>> >> >> > subjects are able to process words. I have calculated the string
>> >> >> > kernels for each word, so now I have a file with 30,000 lines. The
>> >> >> > first field in each line is a word, and this is followed by a
>> >> >> > 676-element vector holding the kernel representation.
>> >> >> >
>> >> >> > Once I read this in, I need to step through and calculate the
>> >> >> > similarity of each word to every other word using vector cosine,
>> >> >> > as well as track the highest similarity value (excluding the word
>> >> >> > itself), and the set of X-most similar items (there are reasons to
>> >> >> > believe these are good predictors of human performance).
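Each kernel vector is normalised with `norm` as it is read. As a result, `inner` on two stored vectors already gives the vector cosine, and no further division is needed. Writing $\mathbf{k}_i$ for the 676-element kernel of word $i$:

```latex
\cos\theta_{ij}
  = \frac{\mathbf{k}_i \cdot \mathbf{k}_j}
         {\lVert \mathbf{k}_i \rVert \, \lVert \mathbf{k}_j \rVert}
  = \hat{\mathbf{k}}_i \cdot \hat{\mathbf{k}}_j,
\qquad
\hat{\mathbf{k}} = \frac{\mathbf{k}}{\lVert \mathbf{k} \rVert}
```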
>> >> >> >
>> >> >> > Here's the problem: when I start running the code below, it is
>> >> >> > very fast. It takes 5 msecs to process the first word (that is, to
>> >> >> > do the necessary 30,000 cosines), but by the time it reaches the
>> >> >> > 100th it is taking 37 msecs, and by the 1,000th it is taking 398
>> >> >> > msecs, with 29,000 to go and constant slowing...
>> >> >> >
>> >> >> > Memory use by perl stays constant, and I cannot figure out what
>> >> >> > would make the program slow down so much. I posted a query at Perl
>> >> >> > Monks and I got advice about how to speed up each step (the first
>> >> >> > word used to take 38 msecs), and they pointed out that it is indeed
>> >> >> > the call to inner that is the culprit (replace it with a non-PDL
>> >> >> > calculation, and the slowing goes away). They suggested I should
>> >> >> > look for advice from PDL experts.
>> >> >> >
>> >> >> > So if anyone can give me pointers as to what is slowing things
>> >> >> > down and whether there is a way to avoid it, I would be most
>> >> >> > grateful. Apologies in advance for any offensively
>> >> >> > inefficient/awkward use of PDL!
>> >> >> >
>> >> >> > Thanks!
>> >> >> >
>> >> >> > jim
>> >> >> > #!/usr/bin/perl -s
>> >> >> > use PDL;
>> >> >> > use Time::HiRes qw ( time ) ;
>> >> >> > $|=1;
>> >> >> > $top = 20;
>> >> >> >
>> >> >> > while(<>){
>> >> >> >      chomp;
>> >> >> >      ($wrd, @data) = split;
>> >> >> >      $kernel{$wrd} = norm(pdl(@data));
>> >> >> >      # EXAMPLE LINE
>> >> >> >      # word 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0
>> >> >> > 0 0
>> >> >> > 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> > 0 0
>> >> >> > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> >> >> >
>> >> >> > }
>> >> >> > $nrecs = keys %kernel;
>> >> >> > @kernelKeys = sort( keys %kernel );
>> >> >> >
>> >> >> > $startAll = time();
>> >> >> >
>> >> >> > $at1 = 0;
>> >> >> > foreach $w1 (@kernelKeys) {
>> >> >> >    $totalsim = $maxsim = 0;
>> >> >> >    $startWord = time();
>> >> >> >    @topX = ();
>> >> >> >    $at2 = 0;
>> >> >> >    foreach $w2 (@kernelKeys) {
>> >> >> >      next if($at1 == $at2); # skip identical item, but not homophones
>> >> >> >      $at2++;
>> >> >> >      $sim = inner($kernel{$w1},$kernel{w2});
>> >> >> >      $totalsim+=$sim;
>> >> >> >      if($sim>  $maxsim){      $maxsim = $sim;    }
>> >> >> >      # keep the top 20
>> >> >> >      if($#topX<  $top){
>> >> >> >        push @topX, $sim;
>> >> >> >      } else {
>> >> >> >        @topX = sort { $a<=>  $b } @topX;
>> >> >> >        if($sim>  $topX[0]){ $topX[0] = $sim;      }
>> >> >> >      }
>> >> >> >    }
>> >> >> >    $at1++;
>> >> >> >    $topXtotal = sum(pdl(@topX));
>> >> >> >    printf "$at1\t$w1\t$totalsim\t$maxsim\t$topXtotal\n";
>> >> >> >    unless($at1 % 10){
>> >> >> >      $now = time();
>> >> >> >      $elapsed = $now - $startAll;
>> >> >> >      $thisWord = $now - $startWord;
>> >> >> >      $perWord = $elapsed / $at1;
>> >> >> >      $hoursRemaining = (($nrecs - $at1) * $perWord)/3600;
>> >> >> >      printf STDERR "#$at1\t$w1\t$totalsim\t$maxsim\t$topXtotal\t";
>> >> >> >      printf STDERR "ELAPSED %.3f THISWORD %.3f PERWORD %.3f HOURStoGO %.3f\n",
>> >> >> >        $elapsed, $thisWord, $perWord, $hoursRemaining;
>> >> >> >    }
>> >> >> > }
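In the top-X bookkeeping above, `$#topX` is the last index of `@topX` (-1 when the array is empty). The test `$#topX < $top` therefore keeps pushing until the array holds `$top + 1` values, which means 21 rather than 20. The sketch below keeps the same sort-and-replace idea but compares the element count instead. `top_n_sum` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Keep the $top largest values seen so far; return their sum.
sub top_n_sum {
    my ($top, @values) = @_;
    my @topX;
    for my $sim (@values) {
        if (@topX < $top) {              # element count, not last index
            push @topX, $sim;
            @topX = sort { $a <=> $b } @topX if @topX == $top;
        } elsif ($sim > $topX[0]) {      # beats the current smallest
            $topX[0] = $sim;
            @topX = sort { $a <=> $b } @topX;
        }
    }
    my $sum = 0;
    $sum += $_ for @topX;
    return $sum;
}

print top_n_sum(3, 5, 1, 9, 7, 3, 8), "\n";   # 24 (9 + 8 + 7)
```

Keeping `@topX` sorted ascending makes `$topX[0]` the current smallest kept value. Re-sorting 20 numbers per replacement is cheap at this size.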
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > _______________________________________________
>> >> >> > Perldl mailing list
>> >> >> > [email protected]
>> >> >> > http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >
>> >
>
>
