Craig, David, others,

I find your explanations convincing, but not the actual results I am
getting. Perl's performance is more stable, scaling predictably, while
PDL is moodier: from one run to another, the performance can swing
considerably. This is on my MacBook with no other user processes
running (meaning, I am not ripping music or watching a movie on Hulu
at the same time...).

First, no doubt my simplistic PDL approach was wrong. I figured, I
have to calculate one "column" based on two other "columns" -- "Hey!
the PDL docs show how to get a column... use slice." So that is what
I went with. Using Craig's better, more efficient calculation
approach, I did see much better results, though not across the board.
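For anyone curious, the difference looks roughly like this (a minimal
sketch -- the calculation here, col1 * 2 + col2, is made up for
illustration, not the actual one from my script):

```perl
use strict;
use warnings;
use PDL;

my $data = random(3, 1_000_000);    # 3 "columns" x 1e6 rows

# Slice-heavy style: each binary op allocates a temporary piddle
my $slow = $data->slice("(1),:") * 2 + $data->slice("(2),:");

# Craig's style: do the work in place on one output piddle
my $fast = $data->slice("(1),:")->copy;
$fast *= 2;                         # no temporary allocated
$fast += $data->slice("(2),:");     # ditto

print "results match\n" if all($slow == $fast);
```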

I used Craig's reworked script and ran it three times. The results are
below (view them in a fixed-width font), but first, some
discussion --

Both David and Craig implied that making the data (the array for Perl,
the piddle for PDL) would be more efficient in Perl because Perl does
some up-front memory allocation, so 'push'ing an element onto the
array is not costly. That is not the case. In fact, PDL converts the
array into a piddle faster than Perl makes the array in the first
place.
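To be concrete, here is the kind of thing I timed (a sketch only --
the element count varied per run):

```perl
use strict;
use warnings;
use PDL;
use Time::HiRes qw(time);

my $n = 1_000_000;

# Build the Perl array with push
my $t0 = time;
my @array;
push @array, rand() for 1 .. $n;
my $t1 = time;

# Convert the finished array into a piddle
my $pdl = pdl(\@array);
my $t2 = time;

printf "make Perl array:  %.4f s\n", $t1 - $t0;
printf "array -> piddle:  %.4f s\n", $t2 - $t1;
```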

Another assertion was that PDL will eat Perl's lunch when it comes to
calculation. That is not *always* the case. PDL is much faster on
smaller data sets, but at a certain threshold (for me, 3 million
elements), PDL gets bogged down. At 3.5 million, PDL gets very slow,
and at 4 million, it basically locks up my computer.

Another interesting issue -- Perl seems to be better at sharing system
resources. When the Perl calculation is running, my machine stays
responsive: I can switch back to the browser, scroll a page, etc. When
the PDL calc is running, it is as if my machine were frozen.

This kinda worries me. If we write up the gotchas and the limits
within which PDL use is optimal, then it is "caveat emptor" and all
that. On a more realistic front, though, I was hoping to use PDL with
a 13-million-element piddle. In some tests with a 2D piddle where
("first D" * "second D") = 13 million, PDL was smokingly fast. I am
wondering, though -- will its performance change if the piddle is 1D
and 13 million elements long? Does it matter to PDL whether my dataset
is a "long rope" or a "carpet", both with the same "thread count" (to
use a fabric analogy)?
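A quick way to test the rope-vs-carpet question would be something
like this (a sketch -- I am assuming a simple elementwise calculation
stands in for the real one):

```perl
use strict;
use warnings;
use PDL;
use Time::HiRes qw(time);

# Same element count (13 million), different shapes
my $rope   = random(13_000_000);       # 1D "long rope"
my $carpet = random(13_000, 1_000);    # 2D "carpet"

for my $p ($rope, $carpet) {
    my $t0 = time;
    my $r = $p->copy;
    $r *= 2;                           # in place, per Craig's advice
    $r += 1;
    printf "shape %-14s %.4f s\n", join('x', $p->dims), time - $t0;
}
```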

Test results (reformatted) are shown below:


count: 10000
============================
           Perl       PDL
----------------------------
make data: 0.0097     0.0065
calculate: 0.0064     0.0014

make data: 0.0106     0.0065
calculate: 0.0064     0.0014

make data: 0.0104     0.0065
calculate: 0.0063     0.0014
____________________________


count: 100000
============================
           Perl       PDL
----------------------------
make data: 0.0962     0.0791
calculate: 0.0624     0.0108

make data: 0.0966     0.0811
calculate: 0.0621     0.0109

make data: 0.0966     0.0789
calculate: 0.0626     0.0109
____________________________


count: 1000000
============================
           Perl       PDL
----------------------------
make data: 0.9626     0.8014
calculate: 0.6269     0.1170

make data: 0.9656     0.8064
calculate: 0.6275     0.1182

make data: 0.9643     0.8203
calculate: 0.6275     0.1168
____________________________


count: 2000000
============================
           Perl       PDL
----------------------------
make data: 1.7542     1.5168
calculate: 1.2462     0.2381

make data: 1.7519     1.5221
calculate: 1.2500     0.2391

make data: 1.7517     1.5226
calculate: 1.2699     0.2394
____________________________


count: 3000000
============================
           Perl       PDL
----------------------------
make data: 2.5263     2.5722
calculate: 1.9163     3.2107

make data: 2.5411     2.2062
calculate: 1.8897     6.9557

make data: 2.5305     2.2822
calculate: 1.9204     7.2502
____________________________
On Fri, Jul 9, 2010 at 2:32 AM, Craig DeForest
<[email protected]> wrote:
> Wow, Puneet really stirred us all up (again).  Puneet, as David said, your
> PDL code is slow because you are using a complicated expression, which
> forces PDL to create and destroy intermediate PDLs (every binary operation
> has to have a complete temporary PDL allocated and then freed to store its
> result!).  I attach a variant of your test, with the operation carried out
> as much in place as possible to eliminate the extra allocations.  PDL runs
> almost exactly a factor of 10 faster than raw Perl on my computer in
> this case.
> Note that the original ingestion of the Perl array to PDL is quite slow:  it
> generally takes slightly longer to create the PDL than to generate the
> random numbers and create the Perl array in the first place!  That is
> because PDL has to make several passes through the Perl array to determine
> its size, and then has to individually probe and convert each numeric value
> in the Perl array.
>
> On Jul 9, 2010, at 1:09 AM, David Mertens wrote:
>
> FYI, for really thorough timing results, check out Devel::NYTProf:
> http://search.cpan.org/~timb/Devel-NYTProf-4.03/lib/Devel/NYTProf.pm
>
> You have a lot of things going on that mix up the results - you have both a
> memory allocation and a calculation. As I understand it, Perl will likely
> outperform PDL in the memory allocation portion of this exercise, but PDL
> should eat Perl's lunch in the calculation portion.
>
> Perl will outperform PDL in the memory allocation because, in all
> likelihood, it doesn't perform any allocation with the push. It has likely
> already allocated more than three elements for (all of) its arrays, so
> pushing the new value onto the array does not cost anything, except for a
> higher up-front memory cost. I suspect this is where PDL is losing to Perl -
> Perl performs the allocation before you start the timer.
>
> In terms of the calculation itself, PDL should far outperform Perl. The
> reason is that the actual contents of the calculation loop are very slim, so
> the cost of all the Perl stack manipulation significantly increases the
> loop's cost. Perl for loops usually make sense because the code inside them
> often involves IO operations or other such things, in which case the Perl
> stack manipulations comprise only a small portion of the total compute time.
>
> Try a situation where Perl and PDL allocate their memory as part of the
> timing and see what that gives.
>
> David
>
> --
> Sent via my carrier pigeon.
> _______________________________________________
> Perldl mailing list
> [email protected]
> http://mailman.jach.hawaii.edu/mailman/listinfo/perldl



-- 
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
-----------------------------------------------------------------------
Assertions are politics; backing up assertions with evidence is science
=======================================================================

