On Fri, Jul 9, 2010 at 10:10 AM, Benjamin Schuster-Boeckler
<[email protected]> wrote:
> My experience with very large datasets in PDL comes down to this:
>
> USE THE SMALLEST SUITABLE DATATYPE
>
> I can't stress enough how important that is :-)
>


Yes, very true. But keep in mind: if even a single value does not fit
your smallest datatype, the *entire* piddle will get promoted to the
larger datatype.
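
For example (a quick sketch, assuming the usual PDL promotion rules --
combining a narrow piddle with a wider one widens the whole result):

```perl
use PDL;

my $mask = byte( zeroes(10) );          # 1 byte per element
print $mask->type, "\n";                # byte

# combining with a wider piddle promotes the entire result
my $sum = $mask + long( sequence(10) );
print $sum->type, "\n";                 # long -- 4 bytes per element
```

So one stray wide value in a calculation can quadruple (or worse) the
memory footprint of a 500M-element piddle.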

> I'm dealing with vectors of ~500M values (whole human chromosomes, if you're 
> interested :-). If I only need a bitmask, I use a byte() piddle, if I have 
> counts, I use byte/ushort, and I even sometimes convert rational numbers to 
> integers for performance reasons. I'm pretty sure that most of your problems 
> on a mac are due to over-allocating memory.
>
> In general, I find that PDL DOES eat Perl's lunch a million times if you do 
> things cleverly. I was able to do sliding window averaging on those 500M 
> vectors using PDL::PP in a second or two, compared to hours in pure Perl.
>


I am sure. I am sold on PDL. But it does require me to be "very
clever," while Perl is probably more forgiving. It will probably be a
long while before I reach the "very clever" stage and am capable of
using PDL::PP to its fullest.
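
That said, even without PDL::PP, a sliding-window average can be
sketched in plain, vectorized PDL (a rough sketch, not Ben's actual
code; note that `rotate` wraps around, so the two edge values need
separate handling):

```perl
use PDL;

my $x = sequence(float, 20);   # stand-in for a 500M-element vector

# 3-point centered moving average, fully vectorized
my $avg = ( $x->rotate(1) + $x + $x->rotate(-1) ) / 3;
```

No Perl-level loop at all, which is presumably where the hours-to-
seconds speedup comes from.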

We really need an extensive and updated "Cookbook" that documents all
the best practices, with examples.
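
For instance, a cookbook entry on Craig's in-place trick might note
that every binary op like `$a * $b` allocates a temporary piddle for
its result, so reusing one buffer with the mutating operators (`+=`,
`*=`, `.=`) avoids the churn. A sketch, with hypothetical columns `$a`
and `$b`:

```perl
use PDL;

my $a = random(1_000_000);
my $b = random(1_000_000);

# naive: a temporary for ($b * 2), another for the sum
my $c_slow = $a + $b * 2;

# in-place: one result buffer, updated by mutating ops
my $c = $b->copy;   # the only allocation
$c *= 2;            # in place
$c += $a;           # in place
```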





> Cheers,
> Ben
>
> On 9 Jul 2010, at 16:53, P Kishor wrote:
>
>> Craig, David, others,
>>
>> I find your explanation satisfying, but not the actual results that I
>> am getting. I am experiencing a more stable performance from Perl,
>> with the performance scaling predictably. PDL shows itself to be more
>> moody. From one run to another, the performance can really swing. This
>> is on my MacBook with no other user process running (meaning, I am not
>> ripping music or watching a movie on Hulu at the same time...).
>>
>> First, no doubt my simplistic PDL approach was wrong. I figured, I
>> have to calculate one "column" based on two other "columns" -- "Hey!
>> the PDL docs show how to get a column... use slice." So that is what
>> I went with. However, using Craig's better and more efficient
>> calculation approach, I did get much better results, though not
>> across the board.
>>
>> I used Craig's reworked script and ran it three times. The results are
>> below (use fixed width font to see the results), but here is some
>> discussion --
>>
>> Both David and Craig implied that building the data (the array for
>> Perl and the piddle for PDL) would be more efficient in Perl,
>> because Perl does some up-front memory allocation, so 'push'ing an
>> element onto the array would not be costly. That is not the case.
>> PDL is, in fact, faster at converting an array into a piddle than
>> Perl is at building the array in the first place.
>>
>> Another assertion was that PDL will eat Perl's lunch when it comes
>> to calculation. That is not *always* the case. PDL is much faster on
>> smaller data sets, but at a certain threshold (for me, 3 million
>> elements) PDL gets bogged down. At 3.5 million, PDL gets very slow,
>> and at 4 million, it basically locks up my computer.
>>
>> Another interesting issue -- Perl seems to be better at sharing the
>> resources. When the Perl calculation is running, my machine is
>> responsive. I can switch back to the browser, scroll a page, etc. When
>> the PDL calc is running, it is like my machine is frozen.
>>
>> This kinda worries me. If we write up the gotchas and the limits
>> within which PDL use is optimal, then it is "caveat emptor" and all
>> that. However, on a more realistic front, I was hoping to use PDL
>> with a piddle of 13 million elements. In my tests, a 2D piddle where
>> (first dim * second dim) = 13 million was smokingly fast. I am
>> wondering, though -- will its performance change if the piddle is 1D
>> and 13 million elements long? Does it matter to PDL if my dataset is
>> a "long rope" vs. a "carpet", both with the same "thread count" (to
>> use a fabric analogy)?
>>
>> Test results (reformatted) shown below
>>
>>
>> count: 10000
>> ============================
>>           Perl       PDL
>> ----------------------------
>> make data: 0.0097     0.0065
>> calculate: 0.0064     0.0014
>>
>> make data: 0.0106     0.0065
>> calculate: 0.0064     0.0014
>>
>> make data: 0.0104     0.0065
>> calculate: 0.0063     0.0014
>> ____________________________
>>
>>
>> count: 100000
>> ============================
>>           Perl       PDL
>> ----------------------------
>> make data: 0.0962     0.0791
>> calculate: 0.0624     0.0108
>>
>> make data: 0.0966     0.0811
>> calculate: 0.0621     0.0109
>>
>> make data: 0.0966     0.0789
>> calculate: 0.0626     0.0109
>> ____________________________
>>
>>
>> count: 1000000
>> ============================
>>           Perl       PDL
>> ----------------------------
>> make data: 0.9626     0.8014
>> calculate: 0.6269     0.1170
>>
>> make data: 0.9656     0.8064
>> calculate: 0.6275     0.1182
>>
>> make data: 0.9643     0.8203
>> calculate: 0.6275     0.1168
>> ____________________________
>>
>>
>> count: 2000000
>> ============================
>>           Perl       PDL
>> ----------------------------
>> make data: 1.7542     1.5168
>> calculate: 1.2462     0.2381
>>
>> make data: 1.7519     1.5221
>> calculate: 1.2500     0.2391
>>
>> make data: 1.7517     1.5226
>> calculate: 1.2699     0.2394
>> ____________________________
>>
>>
>> count: 3000000
>> ============================
>>           Perl       PDL
>> ----------------------------
>> make data: 2.5263     2.5722
>> calculate: 1.9163     3.2107
>>
>> make data: 2.5411     2.2062
>> calculate: 1.8897     6.9557
>>
>> make data: 2.5305     2.2822
>> calculate: 1.9204     7.2502
>> ____________________________
>> On Fri, Jul 9, 2010 at 2:32 AM, Craig DeForest
>> <[email protected]> wrote:
>>> Wow, Puneet really stirred us all up (again). Puneet, as David said, your
>>> PDL code is slow because you are using a complicated expression, which
>>> forces PDL to create and destroy intermediate piddles (every binary
>>> operation has to have a complete temporary piddle allocated, and then
>>> freed, to store its result!). I attach a variant of your test, with the
>>> operation carried out in-place as much as possible to eliminate the extra
>>> allocations. In this case, PDL runs almost exactly a factor of 10 faster
>>> than raw Perl on my computer.
>>> Note that the original ingestion of the Perl array to PDL is quite slow:  it
>>> generally takes slightly longer to create the PDL than to generate the
>>> random numbers and create the Perl array in the first place!  That is
>>> because PDL has to make several passes through the Perl array to determine
>>> its size, and then has to individually probe and convert each numeric value
>>> in the Perl array.
>>>
>>> On Jul 9, 2010, at 1:09 AM, David Mertens wrote:
>>>
>>> FYI, for really thorough timing results, check out Devel::NYTProf:
>>> http://search.cpan.org/~timb/Devel-NYTProf-4.03/lib/Devel/NYTProf.pm
>>>
>>> You have a lot of things going on to mix up the results - you have both a
>>> memory allocation and a calculation. As I understand it, Perl will likely
>>> outperform PDL in the memory allocation portion of this exercise, but PDL
>>> should have Perl's lunch for the calculation portion.
>>>
>>> Perl will outperform PDL in the memory allocation because, in all
>>> likelihood, it doesn't perform any allocation for the push. It has probably
>>> already allocated more than three elements for (all of) its arrays, so
>>> pushing the new value onto the array costs nothing beyond a higher
>>> up-front memory cost. I suspect this is where PDL is losing to Perl -
>>> Perl performs its allocation before you start the timer.
>>>
>>> In terms of the calculation itself, PDL should far outperform Perl,
>>> because the actual contents of the calculation loop are very slim, so the
>>> cost of all the Perl stack manipulation significantly inflates the Perl
>>> version's cost. The reason Perl for loops usually make sense is that the
>>> code inside them often involves IO or other such operations, in which
>>> case the Perl stack manipulations comprise only a small portion of the
>>> total compute time.
>>>
>>> Try a situation when Perl and PDL allocate their memory as part of the
>>> timing and see what that gives.
>>>
>>> David
>>>
>>> --
>>> Sent via my carrier pigeon.
>>
>>
>>
>> --
>> Puneet Kishor http://www.punkish.org
>> Carbon Model http://carbonmodel.org
>> Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
>> Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
>> Nelson Institute, UW-Madison http://www.nelson.wisc.edu
>> -----------------------------------------------------------------------
>> Assertions are politics; backing up assertions with evidence is science
>> =======================================================================
>>

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
