Re: [Perldl] Matlab overview

David Mertens Tue, 03 Nov 2009 23:55:23 -0800

(Accidentally only sent to Gabor.  Resending to everybody.)

Hi Gabor -


I used to call myself a C++ programmer, and I've written a few analysis
scripts in Matlab before getting fed up with all of it and switching to
Perl/PDL, so I can probably give you a few ideas.  I see you're speaking to
a PerlMongers group, so I assume this group will be interested in seeing
what Perl can do.  I'll try to take this into account with my response.  As
written, it seems like I've got a whole lot of ideas and very few
specifics.  Feel free to ask more specific questions as they arise.

The single greatest advantage that Matlab/PDL offers over the standard
hammers - C, C++, FORTRAN, and Java - is that Matlab and PDL scripts require
much, much less overhead to get them to actually do something useful.  I
once wrote a 2000 line C++ simulation program, of which only about 150 lines
were actually interesting to me as a scientist.  Also, Perl scripts tend to
be less of a 'my program can do everything!' solution, because perl scripts
are so cheap to write.  (Most perl programmers put their effort into writing
modules that do everything for them, and then put it on CPAN, which makes
their code far more reusable.  But I digress.)

The single greatest advantage that PDL offers over Matlab is that PDL is an
extension of Perl, a vastly supperior language for systems programming, file
handling, data aggregation, etc.  This means you'll write even less
infrastructure code with PDL than you would with Matlab and it'll be easier
to comprehend.  In my opinion, Perl is also a more expressive than Matlab.


First a little bit of overview for numerical computing.  Matlab and PDL both
try to solve similar problems.  As we all know, you could do all your
numerical work with C, C++, FORTRAN, or Java code and - assuming you had
great patience and skill - it would run really fast.  However, you pay for
your fast code with increased developer hours.  Scripting languages like
Bash, Python, and Perl take the opposite approach, assuming that you would
do better to reduce your developer hours but at the expense of computational
speed.  We all know this is why we like Perl, because it allows us to get to
solving our problems quickly.  This is trade-off often favors scripting
languages, but numerical programming is an important exception.  Matlab and
IDL, as well as Numpy and PDL, are attempts to find a happy medium between
rapid development and rapid execution.

The happy medium for all of these languages lies in optimizing vectorized
operations.  In other words, if you have two data sets of 100 elements a
piece that you want to sum, in C you would write a for loop to iterate over
all the elements, but in a numerical language you would simply write $a + $b
and it would give a vector/array containing elements with the sum.  That
sum, and many other operations, is computed using compiled C-code that
iterates over the elements of $a and $b just as fast as hand-rolled C-code.
The reason that PDL, Numpy, Matlab, and IDL have had such great success is
because most numerical operations can actually be implemented in this
vectorized way of thinking.

If you are speaking to C programmers, you must emphasize this way of
thinking because they will always want to think in terms of for loops.  If
you ever write a for-loop in PDL or Matlab, you're probably not doing it
right, unless you're iterating through a list of files.

So, using a vectorized approach makes your PDL/Matlab code almost as fast as
C or FORTRAN, but it has a huge, rarely mentioned advantage: writing your
code using vectorized expressions is actually more expressive and easier to
understand.  With less for loops hogging up screen space, you've got less
extraneous code, less off-by-one errors, and can see more of your logic in
one screen.

At this point, those who do their numerics in Java and C++ will get restless
and note that they could achieve similar levels of expresiveness with
vector/array objects and operator overloading.  They are correct, and the
rest of their code would run faster than the Matlab or PDL code.  But then
they'd be doing all of their infrastructure code in C++ or Java, which is
just a headache for me (and probably most of the other people on this
list).  And, they probably can't do data slicing like PDL can.

At this point, not everybody may agree with you, but hopefully they
understand why you would opt to work with PDL or Matlab instead of the
standard hammers.

Now onto explaining the differences between PDL and Matlab.  Here's a list
of what Matlab does better than PDL:

The most jarring difference for a simple Matlab (or IDL) user will be the
lack of an IDE.  PDL offers a partial replacement, however, with the PDL
prompt. The PDL prompt is an interactive shell, and it's possible to set up
an auto-load directory in much the same way that you could set up
autoloading for Matlab or IDL.  Perhaps, someday PDL could have an extension
for Padre that would actually make it feel more like a full IDE, but until
that time comes, PDL developers will work with an open perldl or bash shell,
editing scripts in their preferred editors.

PDL has 2D and 3D graphics for visualizing data, but they're not nearly as
developed as in Matlab.  So the simple answer to this is, "It can be done,
but it may not be as easy."  From what I understand, PGPLOT is a standard
plotting tool for Astronomers, so it can't be all that bad.  Still, Matlab
makes visualization pretty darn easy and is (rightly) a major selling point.

>From the numeric standpoint, Matlab has many more features than PDL.  For
example, there is a standard wavelet analysis toolbox in Matlab, and as far
as I know, nobody has written a wavelet analysis tool for PDL.  The vast
wealth of toolboxes for Matlab is another major selling point for Matlab.
Again it could be done in PDL, but you may have to do it yourself.

Now, here's a list of what PDL does better than Matlab:

Basically put, PDL has an incredible base upon which to build a numerical
suite.  As far as I am aware, PDL's data-flow and "pdl-threading"
capabilities are unmatched.  They are also very poorly explained, which is
why they are so under-appreciated.  To perpetuate the problem, I will not
attempt to explain them here because I don't know them well enough myself to
explain them to a beginner.  But there are other advantages.

You've already said it and I'm going to reiterate it because it's why I left
Matlab: file handling.  Handling nontrivial filenames in Matlab is a huge
pain.  Yes, they do trivial, trivial file globbing and they offer regular
expressions, but I only thought to look for these after I switched to Perl.
It didn't even occurr to me as a possibility while I was a regular Matlab
user.  If you need to process a server log and extract lots of data from it,
you wouldn't even think to write the extraction portion in Matlab.  This is
Perl's bread and butter, and if you use PDL you can include the numerical
analysis in the same script.

You also mentioned the major differences in cost between Matlab and PDL.
This is a major sellng point for PDL, so to speak.  I'm told Matlab's price
can impede collaboration, though I've never actually seen it happen.  Still
it's nice to have free (beer) software.

The reason those cool features that I'll point at but won't get into -
data-flow and "pdl-threading" - the reason those are possible is PDL::PP.
This is a very advanced feature of PDL, but it is a major difference between
PDL and Matlab.  In Matlab, you can write C functions and compile them into
MEX files, which is nice.  You can do the same for plane-old Perl, too,
using a variation of C called XS.  What's more, you can do this inline with
your code using the Inline module.  It's all very nice.  But with PDL, it's
even better.

I mentioned that with PDL and Matlab, you try to replace for loops with
vector operations.  In PDL::PP, you specify what those vector operrations
should do, using loops and all, with C-like code (it eventually gets turned
into C code and gets compiled), but PDL::PP HANDLES HIGHER DIMENSIONS FOR
YOU, as well as multiple data types.  It is far easier to write
custom-rolled C-code for PDL than for Matlab.  The only problem is that the
documentation for PDL::PP is difficult to get through, assumes you're a PDL
expert, and you're willing to give the documentation a number of reads in
order to get it.  (By the way, the documentation that Toumas and company did
write has a nice style and covers a lot of material, but let's just say it's
not quite like reading the Camel book.)

Finally, PDL has a great group of developers.  They're just nice people.
I'm sure you could find the same thing with Matlab, but I never even thought
to look.  Perl is just so much more of a culture-oriented language.


So what does PDL have the Matlab doesn't?  It's based in Perl, it's got all
of CPAN at its disposal, it's got the amazing PDL::PP at its core, its free,
and it's got cool developers.  Where does Matlab win out?  Great
visualization, huge user base, clean IDE, and a large tool set.


Gabor, this may be the longest email I've ever written to anybody.  You
struck a major chord for me.  What I've written is not a specific 'instead
of command A use command B', but hopefully it gives you an idea for the
different flavors of the two languages.  It's more of a philosophical
how-to.

Anyway, I should be off to bed.  Please feel free to ask specific how-to
questions if you can think of them.  Good luck!

David

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl

Re: [Perldl] Matlab overview

Reply via email to