Re: Wow, Python much faster than MatLab

2007-01-01 Thread gblais
We're not so far apart.

I've used SAS for 25 years, and R/S-PLUS for 10.

I think you've said it better than I did, though: R requires more attention
(which is often needed).

I certainly didn't mean that R crashed - just an indictment of how much I
thought I was holding in my head.

Gerry
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Wow, Python much faster than MatLab

2007-01-01 Thread Wensui Liu
Gerry,

I have a similar background to yours, with many years of using SAS/R. Right
now I am trying to pick up Python.

From your point of view, is there anything that can be done easily with
Python but not with SAS/R?

thanks for your insight.

wensui

On 1/1/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
> We're not so far apart.
>
> I've used SAS for 25 years, and R/S-PLUS for 10.
>
> I think you've said it better than I did, though: R requires more attention
> (which is often needed).
>
> I certainly didn't mean that R crashed - just an indictment of how much I
> thought I was holding in my head.
>
> Gerry



-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


Re: Wow, Python much faster than MatLab

2006-12-31 Thread Klaas

sturlamolden wrote:

> as well as looping over the data only once. This is one of the main
> reasons why Fortran is better than C++ for scientific computing. I.e.
> instead of
>
> for (i=0; i<n; i++)
>    array1[i] = (array1[i] + array2[i]) * (array3[i] + array4[i]);
>
> one actually gets something like three intermediates and four loops:
>
> tmp1 = malloc(n*sizeof(whatever));
> for (i=0; i<n; i++)
>    tmp1[i] = array1[i] + array2[i];
> tmp2 = malloc(n*sizeof(whatever));
> for (i=0; i<n; i++)
>    tmp2[i] = array3[i] + array4[i];
> tmp3 = malloc(n*sizeof(whatever));
> for (i=0; i<n; i++)
>    tmp3[i] = tmp1[i] * tmp2[i];
> free(tmp1);
> free(tmp2);
> for (i=0; i<n; i++)
>    array1[i] = tmp3[i];
> free(tmp3);

C/C++ do not allocate extra arrays.  What you posted _might_ bear a
small resemblance to what numpy might produce (if using vectorized
code, not explicit loop code).  This is entirely unrelated to the
reasons why fortran can be faster than c.

-Mike



Re: Wow, Python much faster than MatLab

2006-12-31 Thread sturlamolden

Klaas wrote:
> C/C++ do not allocate extra arrays.  What you posted _might_ bear a
> small resemblance to what numpy might produce (if using vectorized
> code, not explicit loop code).  This is entirely unrelated to the
> reasons why fortran can be faster than c.

Array libraries in C++ that use operator overloading produce
intermediate arrays for the same reason as NumPy. There is a C++
library that is sometimes able to avoid intermediates (Blitz++), but
it can only do so for small arrays for which bounds are known at
compile time.

Operator overloading is sometimes portrayed as required for scientific
computing (e.g. in Java vs. C# flame wars), but the cure can be worse
than the disease.

C does not have operator overloading and is an entirely different case.
You can of course avoid intermediates in C++ if you use C++ as C. You
can do that in Python as well.



Re: Wow, Python much faster than MatLab

2006-12-30 Thread Steven D'Aprano
On Fri, 29 Dec 2006 19:35:22 -0800, Beliavsky wrote:

>> Especially I like:
>> - more relaxed behavior of exceeded the upper limit of a (1-dimensional)
>>   array
>
> Could you explain what this means? In general, I don't want a
> programming language to be relaxed about exceeding array bounds.

I'm not sure about SciPy, but lists in standard Python allow this:

>>> array = [1, 2, 3, 4]
>>> array[2:5]
[3, 4]

That's generally a good thing.
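
To make the distinction concrete, here is a minimal sketch using only built-in lists. It suggests that the "relaxed" behaviour applies to slices, whose bounds are clamped, while indexing a single element past the end still raises. The variable names are illustrative only.

```python
# Slicing clamps out-of-range bounds; indexing a single element does not.
array = [1, 2, 3, 4]

clamped = array[2:5]    # stop index 5 is past the end, so it is clamped to len(array)
empty = array[10:20]    # a slice that lies entirely out of range is simply empty

try:
    array[5]            # a single out-of-range index still raises
    index_error_raised = False
except IndexError:
    index_error_raised = True

print(clamped, empty, index_error_raised)
```

So a typo in a single index is still caught, and only range-style access is forgiving.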




-- 
Steven.



Re: Wow, Python much faster than MatLab

2006-12-30 Thread Stef Mientki

>> MatLab: 14 msec
>> Python:  2 msec
>
> For times this small, I wonder if timing comparisons are valid. I do
> NOT think SciPy is in general an order of magnitude faster than Matlab
> for the tasks typically performed with Matlab.

The algorithm is meant for real-time analysis,
where these kinds of differences count a lot.
I'm also a typical surface programmer
(don't need/want to know what's going on inside),
I just want to get my analysis done,
and the fact that Python has many more functions available
means I have to write far fewer explicit or implicit for loops,
and thus I expect it to always look faster to me.
 
>> After taking the first difficult steps into Python,
>> all kind of small problems as you already know,
>> it now seems a piece of cake to convert from MatLab to Python.
>> (the final programs of MatLab and Python can almost only be
>> distinguished by the comment character ;-)
>>
>> Especially I like:
>> - more relaxed behavior of exceeded the upper limit of a (1-dimensional)
>>   array
>
> Could you explain what this means? In general, I don't want a
> programming language to be relaxed about exceeding array bounds.
 
Well, I have to admit, that wasn't a very tactful remark; noise is still
an unwanted issue in software.
But in the meantime I've been reading further, and I should replace that with
some other great things:
- the very efficient way comments are turned into help information
- the (at first sight) very easy, yet quite powerful OOP implementation.
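
The first point presumably refers to docstrings. A minimal sketch, assuming a made-up function `moving_average` that is not from the original post: the string literal placed first in the function body is stored on the function object, and that same text is what `help()` displays interactively.

```python
import inspect


def moving_average(values, window):
    """Return the simple moving average of `values` over `window` samples."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# The docstring is kept on the function object itself ...
doc = moving_average.__doc__
# ... and inspect.getdoc() returns it with indentation cleaned up,
# which is the text help(moving_average) would show at the prompt.
cleaned = inspect.getdoc(moving_average)
print(cleaned)
```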

cheers,
Stef Mientki


Re: Wow, Python much faster than MatLab

2006-12-30 Thread Stef Mientki
 
> I'm not sure about SciPy,

Yes, SciPy allows it too!

> but lists in standard Python allow this:
>
> >>> array = [1, 2, 3, 4]
> >>> array[2:5]
> [3, 4]
>
> That's generally a good thing.
 

You're not perhaps by origin an analog engineer ;-)

cheers,
Stef Mientki


Re: Wow, Python much faster than MatLab

2006-12-30 Thread Mathias Panzenboeck
Another great thing: with rpy you have R bindings for Python.
So you have the power of R and the easy syntax and big standard lib of Python!
:)


Re: Wow, Python much faster than MatLab

2006-12-30 Thread Stef Mientki
Mathias Panzenboeck wrote:
> Another great thing: with rpy you have R bindings for Python.

Forgive my ignorance, what are R and rpy?
Or are they only relevant for Linux users?

cheers
Stef

> So you have the power of R and the easy syntax and big standard lib of
> Python! :)


RE: Wow, Python much faster than MatLab

2006-12-30 Thread Doran, Harold
R is the open-source implementation of the S language developed at Bell
laboratories. It is a statistical programming language that is becoming
the de facto standard among statisticians. Rpy is what allows an
interface between python and the R language.

Harold 

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On
> Behalf Of Stef Mientki
> Sent: Saturday, December 30, 2006 9:24 AM
> To: python-list@python.org
> Subject: Re: Wow, Python much faster than MatLab
>
> Mathias Panzenboeck wrote:
> > Another great thing: with rpy you have R bindings for Python.
>
> Forgive my ignorance, what are R and rpy?
> Or are they only relevant for Linux users?
>
> cheers
> Stef
>
> > So you have the power of R and the easy syntax and big standard lib of
> > Python! :)


Re: Wow, Python much faster than MatLab

2006-12-30 Thread Stef Mientki
Doran, Harold wrote:
> R is the open-source implementation of the S language developed at Bell
> laboratories. It is a statistical programming language that is becoming
> the de facto standard among statisticians.

Thanks for the information.
I always thought that SPSS or SAS were *the* standards.
Stef


Re: Wow, Python much faster than MatLab

2006-12-30 Thread John J. Lee
Stef Mientki [EMAIL PROTECTED] writes:

> Mathias Panzenboeck wrote:
> > Another great thing: with rpy you have R bindings for Python.
>
> Forgive my ignorance, what are R and rpy?
> Or are they only relevant for Linux users?
[...]

R is a language / environment for statistical programming.  RPy is a
Python interface to let you use R from Python.  I think they both run
on both Windows and Linux.

http://www.r-project.org/

http://rpy.sourceforge.net/


John


Re: Wow, Python much faster than MatLab

2006-12-30 Thread John J. Lee
Stef Mientki [EMAIL PROTECTED] writes:

> Doran, Harold wrote:
> > R is the open-source implementation of the S language developed at Bell
> > laboratories. It is a statistical programming language that is becoming
> > the de facto standard among statisticians.
> Thanks for the information.
> I always thought that SPSS or SAS were *the* standards.
> Stef

The 'SS' in SPSS stands for Social Science, IIRC.  Looking at the lack
of mention of that on their website, though, and the prominent use of
the E word there, they have obviously grown out of (or want to grow
out of) their original niche.

Googling, SAS's market seems to be mostly in the business / financial
worlds.

No doubt R's community differs from those, though I don't know exactly
how.  From the long list of free software available for it, it sure
seems popular with some people:

http://www.stats.bris.ac.uk/R/


John

Re: Wow, Python much faster than MatLab

2006-12-30 Thread gblais
R is the free version of the S language.  S-PLUS is a commercial version.
Both are targeted at statisticians per se.  Their strengths are in
exploratory data analysis (in my opinion).

SAS has many statistical features, and is phenomenally well-documented and
supported.  One of its great strengths is the robustness of its data model
-- very well suited to large sizes, repetitive inputs, industrial-strength
data processing with a statistics slant.  There are well over 200 SAS books,
for example.

I think of SAS and R as being like airliners and helicopters -- airliners get
the job done, and well, as long as it's well-defined and nearly the same job
all the time.  Helicopters can go anywhere, do anything, but a moment's
inattention leads to a crash.


Re: Wow, Python much faster than MatLab

2006-12-30 Thread Stef Mientki
> I think of SAS and R as being like airliners and helicopters --

I like that comparison...
.. airplanes are inherently stable,
.. helicopters are inherently unstable ;-)

cheers,
Stef


Re: Wow, Python much faster than MatLab

2006-12-30 Thread Ramon Diaz-Uriarte
On 12/31/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
> R is the free version of the S language.  S-PLUS is a commercial version.
> Both are targeted at statisticians per se.  Their strengths are in
> exploratory data analysis (in my opinion).
>
> SAS has many statistical features, and is phenomenally well-documented and
> supported.  One of its great strengths is the robustness of its data model
> -- very well suited to large sizes, repetitive inputs, industrial-strength
> data processing with a statistics slant.  There are well over 200 SAS books,
> for example.
>
> I think of SAS and R as being like airliners and helicopters -- airliners get
> the job done, and well, as long as it's well-defined and nearly the same job
> all the time.  Helicopters can go anywhere, do anything, but a moment's
> inattention leads to a crash.

inattention leading to a crash? I don't get it. I used SAS for about 3
or 4 years, and have used S-Plus and then R for 10 years (R for 8
years now). I've never noticed inattention leading to a crash. I've
noticed I cannot get away in R without a careful definition of what I
want (which is good), and the immediate interactivity of R is very
helpful with mistakes. And of course, programming in R is, well,
programming in a reasonable language. Programming in SAS is ... well,
programming in SAS (which is about as fun as programming in SPSS).

(Another email somehow suggested that the stability/instability
analogy of airplanes vs. helicopters does apply to SAS vs. R. Again, I
don't really get it. Sure, SAS is very stable. But so is R ---one
common complaint is getting seg faults because package whatever has
memory leaks, but that is not R's fault, but rather the package's
fault).

But then, this might start looking a lot like a flame war, which is
actually rather off-topic for this list.


Anyway, for a Python programmer, picking up R should be fairly easy.
And rpy is really a great way of getting R and Python to talk to each
other. We do this sort of thing quite a bit in our applications.

And yes, R is definitely available for both Linux and Windows (and
Mac), has excellent support from several editors on those platforms
(e.g., emacs + ess, tinn-R, etc), and seems to be becoming a de facto
standard at least in statistical research and is extremely popular in
bioinformatics and among statisticians who do bioinformatics (look at
bioconductor.org).


Ramon


-- 
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz


Re: Wow, Python much faster than MatLab

2006-12-30 Thread sturlamolden

Stef Mientki wrote:

> I always thought that SPSS or SAS were *the* standards.
> Stef

As far as SPSS is a standard, it is in the field of religious use of
statistical procedures I don't understand (as I'm a math retard), but
hey, p<0.05 is always significant (and any other value is proof of the
opposite ... I think).

SPSS is often used by scientists who don't understand maths at all,
often within the fields of social sciences, but regrettably also within
biology and medicine. I know of few programs that have done so much harm
as SPSS. It's like handing a loaded weapon to a child. Generally one
should stay away from the things that one doesn't understand,
particularly within medicine where a wrong result can have dramatic
consequences. SPSS encourages the opposite. Copy and paste from Excel
to SPSS is regrettably becoming the de facto standard in applied
statistics. The problem is not the quality of Excel or SPSS, but rather
the (in)competence of those conducting the data analysis. This can and
does regrettably lead to serious misinterpretation of the data, in
either direction. When a paper is submitted, these errors are usually
not caught in the peer review process, as peer review is, well, exactly
what it says: *peer* review.

Thus, SPSS makes it easy to shoot yourself in the foot. In my
experience students in social sciences and medicine are currently
taught to do exactly that, in universities and colleges all around the
world. And it is particularly dangerous within medical sciences, as
people's lives and health may be affected by it. I pray God something is
done to prohibit or limit the use of these statistical toys.


Sturla Molden
PhD



Re: Wow, Python much faster than MatLab

2006-12-30 Thread Wensui Liu
Sturla,

I am working in healthcare and see that people love to use Excel /
SPSS as a database or statistical tool without knowing what they are
doing. However, that is not the fault of Excel/SPSS itself but of
the people who are using it. Anything, even SAS/R, would look stupid
when it has been misused.

In the hospitals, people don't pray God. They pray MD. :-)

On 30 Dec 2006 19:09:59 -0800, sturlamolden [EMAIL PROTECTED] wrote:
>
> Stef Mientki wrote:
>
> > I always thought that SPSS or SAS were *the* standards.
> > Stef
>
> As far as SPSS is a standard, it is in the field of religious use of
> statistical procedures I don't understand (as I'm a math retard), but
> hey, p<0.05 is always significant (and any other value is proof of the
> opposite ... I think).
>
> SPSS is often used by scientists who don't understand maths at all,
> often within the fields of social sciences, but regrettably also within
> biology and medicine. I know of few programs that have done so much harm
> as SPSS. It's like handing a loaded weapon to a child. Generally one
> should stay away from the things that one doesn't understand,
> particularly within medicine where a wrong result can have dramatic
> consequences. SPSS encourages the opposite. Copy and paste from Excel
> to SPSS is regrettably becoming the de facto standard in applied
> statistics. The problem is not the quality of Excel or SPSS, but rather
> the (in)competence of those conducting the data analysis. This can and
> does regrettably lead to serious misinterpretation of the data, in
> either direction. When a paper is submitted, these errors are usually
> not caught in the peer review process, as peer review is, well, exactly
> what it says: *peer* review.
>
> Thus, SPSS makes it easy to shoot yourself in the foot. In my
> experience students in social sciences and medicine are currently
> taught to do exactly that, in universities and colleges all around the
> world. And it is particularly dangerous within medical sciences, as
> people's lives and health may be affected by it. I pray God something is
> done to prohibit or limit the use of these statistical toys.
>
>
> Sturla Molden
> PhD



-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


Re: Wow, Python much faster than MatLab

2006-12-30 Thread sturlamolden

Stef Mientki wrote:

> MatLab: 14 msec
> Python:  2 msec

I have the same experience. NumPy is usually faster than Matlab. But it
very much depends on how the code is structured.

I wonder if it is possible to improve the performance of NumPy by
having its fundamental types in the language, instead of depending on
operator overloading. For example, in NumPy, a statement like

array3[:] = array1[:] + array2[:]

allocates an intermediate array that is not needed. This is because the
operator overloading cannot know if it's evaluating a part of a larger
statement like

array1[:] = (array1[:] + array2[:]) * (array3[:] + array4[:])

If arrays had been a part of the language, as they are in Matlab and
Fortran 95, the compiler could see this and avoid intermediate storage,
as well as looping over the data only once. This is one of the main
reasons why Fortran is better than C++ for scientific computing. I.e.
instead of

for (i=0; i<n; i++)
   array1[i] = (array1[i] + array2[i]) * (array3[i] + array4[i]);

one actually gets something like three intermediates and four loops:

tmp1 = malloc(n*sizeof(whatever));
for (i=0; i<n; i++)
   tmp1[i] = array1[i] + array2[i];
tmp2 = malloc(n*sizeof(whatever));
for (i=0; i<n; i++)
   tmp2[i] = array3[i] + array4[i];
tmp3 = malloc(n*sizeof(whatever));
for (i=0; i<n; i++)
   tmp3[i] = tmp1[i] * tmp2[i];
free(tmp1);
free(tmp2);
for (i=0; i<n; i++)
   array1[i] = tmp3[i];
free(tmp3);

In C++ this is actually further bloated by constructor, destructor and
copy constructor calls.
Why one should use Fortran over C++ is obvious. But it also applies to
NumPy, and also to the issue of NumPy vs. Matlab, as Matlab knows about
arrays and has a compiler that can deal with this, whilst NumPy depends
on bloated operator overloading. On the other hand, Matlab is
fundamentally impaired on function calls and array slicing compared
with NumPy (basically copies are created instead of views). Thus, which
is faster - Matlab or NumPy - very much depends on how the code is
written.
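
For what it's worth, the temporaries can already be reduced by hand in NumPy. The following is only a sketch, assuming NumPy is installed and using the `out=` argument of its ufuncs; the `scratch` buffer and the array size are made up for illustration. It still makes three passes over the data, so it addresses the allocations but not the loop fusion discussed above.

```python
import numpy as np

n = 1000
array1, array2, array3, array4 = (np.random.rand(n) for _ in range(4))

# The plain expression allocates a new array for each intermediate result.
expected = (array1 + array2) * (array3 + array4)

# Same result with one reusable scratch buffer, written in place via out=.
scratch = np.empty(n)
np.add(array3, array4, out=scratch)        # scratch = array3 + array4
np.add(array1, array2, out=array1)         # array1 += array2, no new array
np.multiply(array1, scratch, out=array1)   # array1 *= scratch, no new array
```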

Now for my question: operator overloading is (as shown) not the
solution to efficient scientific computing. It creates serious bloat
where it is undesired. Can NumPy's performance be improved by adding
the array types to the Python language itself? Or is the dynamic
nature of Python preventing this?

Sturla Molden



Re: Wow, Python much faster than MatLab

2006-12-30 Thread sturlamolden

Wensui Liu wrote:

> doing. However, that is not the fault of Excel/SPSS itself but of
> the people who are using it.

Yes and no. I think SPSS makes it too tempting. Like children playing
with fire, they may not even know it's dangerous. You can do a GLM in
SPSS by just filling out a form - but how many social scientists or MDs
know anything about general linear models?

The command line interfaces of MySQL, SAS, Matlab and R make an
excellent deterrent. All statistical tools can be misused. But the
difference is between accidental and deliberate misuse. Anyone can navigate a
GUI, but you need to know you want to do an ANOVA before you can think
of typing anova on the command line.

You mentioned use of Excel as database. That is another example,
although it has more to do with data security and integrity, and
sometimes protection of privacy. Many companies have banned the use of
Microsoft Access, as employees were building their own mock-up
databases - with the result that these Access databases migrated to an
even worse solution (Excel).

Sturla Molden



Re: Wow, Python much faster than MatLab

2006-12-30 Thread Robert Kern
sturlamolden wrote:
> array3[:] = array1[:] + array2[:]

OT, but why are you slicing array1 and array2? All that does is create new array
objects pointing to the same data.
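
A quick way to check this, as a sketch assuming a reasonably recent NumPy that provides `np.shares_memory`: the slice is a distinct Python object, yet it refers to the same underlying buffer.

```python
import numpy as np

array1 = np.arange(5.0)
view = array1[:]                  # a new array object, but no data is copied

shares = np.shares_memory(array1, view)
view[0] = 42.0                    # writing through the view changes array1 too
print(view is array1, shares, array1[0])
```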

> Now for my question: operator overloading is (as shown) not the
> solution to efficient scientific computing. It creates serious bloat
> where it is undesired. Can NumPy's performance be improved by adding
> the array types to the Python language itself? Or is the dynamic
> nature of Python preventing this?

Pretty much. Making the array types builtin rather than from a third party
module doesn't really change anything. However, if type inferencing tools like
psyco are taught about numpy arrays like they are already taught about ints,
then one could make it avoid temporaries.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco



Re: Wow, Python much faster than MatLab

2006-12-29 Thread Beliavsky

Stef Mientki wrote:
> hi All,
>
> instead of questions,
> my first success story:
>
> I converted my first MatLab algorithm into Python (using SciPy),
> and it not only works perfectly,
> but also runs much faster:
>
> MatLab: 14 msec
> Python:  2 msec

For times this small, I wonder if timing comparisons are valid. I do
NOT think SciPy is in general an order of magnitude faster than Matlab
for the tasks typically performed with Matlab.
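
One way to make millisecond-scale timings more trustworthy on the Python side is to run the code many times and keep the best of several repeats. The following is only a sketch using the standard `timeit` module; `workload` is a hypothetical stand-in, not the algorithm from the original post.

```python
import timeit


def workload():
    # Hypothetical stand-in for the algorithm being timed.
    return sum(i * i for i in range(1000))


# Run the workload many times per measurement, and repeat the measurement,
# so that timer resolution and one-off system noise matter less.
number = 200
samples = timeit.repeat(workload, number=number, repeat=5)

# The minimum is usually the least noisy estimate of the per-call cost.
best_ms = min(samples) / number * 1000.0
print(f"best of 5: {best_ms:.4f} msec per call")
```

A fair comparison would need the Matlab side measured in a similarly repeated way.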


> After taking the first difficult steps into Python,
> all kind of small problems as you already know,
> it now seems a piece of cake to convert from MatLab to Python.
> (the final programs of MatLab and Python can almost only be
> distinguished by the comment character ;-)
>
> Especially I like:
> - more relaxed behavior of exceeded the upper limit of a (1-dimensional)
>   array

Could you explain what this means? In general, I don't want a
programming language to be relaxed about exceeding array bounds.
