Re: Wow, Python much faster than MatLab

2007-01-01 Thread Wensui Liu
Gerry,

I have a similar background to yours: many years of using SAS and R. Right
now I am trying to pick up Python.

From your point of view, is there anything that can be done easily with
Python but not with SAS/R?

Thanks for your insight.

wensui

On 1/1/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> We're not so far apart.
>
> I've used SAS for 25 years, and R/S-PLUS for 10.
>
> I think you've said it better than I did, though: R requires more attention
> (which is often needed).
>
> I certainly didn't mean that R crashed - just an indictment of how much I
> thought I was holding in my head.
>
> Gerry
> --
> http://mail.python.org/mailman/listinfo/python-list
>


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Wow, Python much faster than MatLab

2007-01-01 Thread gblais
We're not so far apart.

I've used SAS for 25 years, and R/S-PLUS for 10.

I think you've said it better than I did, though: R requires more attention
(which is often needed).

I certainly didn't mean that R crashed - just an indictment of how much I
thought I was holding in my head.

Gerry


Re: Wow, Python much faster than MatLab

2006-12-31 Thread sturlamolden

Klaas wrote:
> C/C++ do not allocate extra arrays.  What you posted _might_ bear a
> small resemblance to what numpy might produce (if using vectorized
> code, not explicit loop code).  This is entirely unrelated to the
> reasons why fortran can be faster than c.

Array libraries in C++ that use operator overloading produce
intermediate arrays for the same reason as NumPy. There is a C++
library that is sometimes able to avoid intermediates (Blitz++), but
it can only do so for small arrays whose bounds are known at
compile time.

Operator overloading is sometimes portrayed as required for scientific
computing (e.g. in Java vs. C# flame wars), but the cure can be worse
than the disease.

C does not have operator overloading and is an entirely different case.
You can of course avoid intermediates in C++ if you use C++ as C. You
can do that in Python as well.
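
As a rough illustration of the point about operator overloading, here is a
minimal sketch in plain Python. The toy `Vec` class, its `allocations`
counter and the `fused_update` helper are invented for this example (they are
not NumPy or Blitz++); the class simply allocates a new result for every
overloaded operator, while the helper writes the loop out by hand, "Python
used as C".

```python
class Vec:
    """Toy array type; every overloaded operator allocates a new Vec."""

    allocations = 0  # number of result arrays created by operators

    def __init__(self, data):
        self.data = list(data)

    def _binary(self, other, op):
        # Each operator call builds a full-size intermediate result.
        Vec.allocations += 1
        return Vec(op(x, y) for x, y in zip(self.data, other.data))

    def __add__(self, other):
        return self._binary(other, lambda x, y: x + y)

    def __mul__(self, other):
        return self._binary(other, lambda x, y: x * y)


def fused_update(a1, a2, a3, a4):
    """One pass, no temporaries: a1 = (a1 + a2) * (a3 + a4), in place."""
    for i in range(len(a1)):
        a1[i] = (a1[i] + a2[i]) * (a3[i] + a4[i])


a, b, c, d = Vec([1.0, 2.0]), Vec([3.0, 4.0]), Vec([5.0, 6.0]), Vec([7.0, 8.0])
result = (a + b) * (c + d)
print(Vec.allocations)  # 3
print(result.data)      # [48.0, 84.0]

raw = [1.0, 2.0]
fused_update(raw, [3.0, 4.0], [5.0, 6.0], [7.0, 8.0])
print(raw)              # [48.0, 84.0]
```

The overloaded expression reads better, but it pays for three result arrays;
the explicit loop gives the same numbers with none.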



Re: Wow, Python much faster than MatLab

2006-12-31 Thread Klaas

sturlamolden wrote:

> as well as looping over the data only once. This is one of the main
> reasons why Fortran is better than C++ for scientific computing. I.e.
> instead of
>
> for (i=0; i<n; i++)
>    array1[i] = (array1[i] + array2[i]) * (array3[i] + array4[i]);
>
> one actually gets something like three intermediates and four loops:
>
> tmp1 = malloc(n*sizeof(whatever));
> for (i=0; i<n; i++)
>    tmp1[i] = array1[i] + array2[i];
> tmp2 = malloc(n*sizeof(whatever));
> for (i=0; i<n; i++)
>    tmp2[i] = array3[i] + array4[i];
> tmp3 = malloc(n*sizeof(whatever));
> for (i=0; i<n; i++)
>    tmp3[i] = tmp1[i] * tmp2[i];
> free(tmp1);
> free(tmp2);
> for (i=0; i<n; i++)
>    array1[i] = tmp3[i];
> free(tmp3);

C/C++ do not allocate extra arrays.  What you posted _might_ bear a
small resemblance to what numpy might produce (if using vectorized
code, not explicit loop code).  This is entirely unrelated to the
reasons why fortran can be faster than c.

-Mike



Re: Wow, Python much faster than MatLab

2006-12-30 Thread Robert Kern
sturlamolden wrote:
> array3[:] = array1[:] + array2[:]

OT, but why are you slicing array1 and array2? All that does is create new array
objects pointing to the same data.
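
A small sketch of the view-versus-copy distinction, using only the standard
library so that it runs without NumPy. This is an analogy, not NumPy code: a
`memoryview` over an `array.array` stands in for a NumPy array, since basic
slicing of either gives a new object over the same data, whereas slicing a
plain list copies it. The variable names are made up for the example.

```python
from array import array

buf = array("d", [1.0, 2.0, 3.0])
view = memoryview(buf)

alias = view[:]   # a memoryview object over the same underlying buffer
alias[0] = 99.0   # writes through to buf
print(buf[0])     # 99.0

items = [1.0, 2.0, 3.0]
copied = items[:]  # list slicing makes a shallow copy instead
copied[0] = 99.0
print(items[0])    # 1.0
```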

> Now for my question: operator overloading is (as shown) not the
> solution to efficient scientific computing. It creates serious bloat
> where it is undesired. Can NumPy's performance be improved by adding
> the array types to the Python language itself? Or is the dynamic
> nature of Python preventing this?

Pretty much. Making the array types builtin rather than from a third party
module doesn't really change anything. However, if type inferencing tools like
psyco are taught about numpy arrays like they are already taught about ints,
then one could make it avoid temporaries.
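
To make "avoiding temporaries" concrete, here is one possible sketch. It is
not how psyco works and not a NumPy feature; it swaps in a different
technique, lazy expression objects (roughly the idea behind C++ expression
templates), and the names `Expr`, `Leaf`, `BinOp` and `assign` are all
hypothetical. The operators only build a small expression tree, and the
arithmetic happens element by element in a single loop at assignment time.

```python
class Expr:
    """Node in a lazily evaluated element-wise expression."""

    def __add__(self, other):
        return BinOp(self, other, lambda x, y: x + y)

    def __mul__(self, other):
        return BinOp(self, other, lambda x, y: x * y)


class Leaf(Expr):
    """Wraps an existing sequence without copying it."""

    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def at(self, i):
        return self.data[i]


class BinOp(Expr):
    """Deferred binary operation; nothing is computed until at() is called."""

    def __init__(self, left, right, op):
        self.left, self.right, self.op = left, right, op

    def __len__(self):
        return len(self.left)

    def at(self, i):
        return self.op(self.left.at(i), self.right.at(i))


def assign(target, expr):
    """Evaluate expr one element at a time, writing straight into target."""
    for i in range(len(expr)):
        target[i] = expr.at(i)


a1, a2, a3, a4 = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]
assign(a1, (Leaf(a1) + Leaf(a2)) * (Leaf(a3) + Leaf(a4)))
print(a1)  # [48.0, 84.0]
```

In pure Python the per-element call overhead makes this slow; the sketch only
shows that operator syntax and a single loop without full-size intermediates
are not mutually exclusive.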

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco



Re: Wow, Python much faster than MatLab

2006-12-30 Thread sturlamolden

Wensui Liu wrote:

> doing. However, that is not the fault of Excel/SPSS itself but of the
> people who are using it.

Yes and no. I think SPSS makes it too tempting. Like children playing
with fire, they may not even know it's dangerous. You can do a GLM in
SPSS by just filling out a form - but how many social scientists or MDs
know anything about general linear models?

The command line interface of MySQL, SAS, Matlab and R makes an
excellent deterrent. All statistical tools can be misused, but the
difference is between accidental and deliberate misuse. Anyone can
navigate a GUI, but you need to know you want to do an ANOVA before you
can think of typing "anova" on the command line.

You mentioned use of Excel as a database. That is another example,
although it has more to do with data security and integrity, and
sometimes protection of privacy. Many companies have banned the use of
Microsoft Access because employees were building their own mock-up
databases - with the result that these Access databases migrated to an
even worse solution (Excel).

Sturla Molden



Re: Wow, Python much faster than MatLab

2006-12-30 Thread sturlamolden

Stef Mientki wrote:

> MatLab: 14 msec
> Python:  2 msec

I have the same experience. NumPy is usually faster than Matlab. But it
very much depends on how the code is structured.

I wonder if it is possible to improve the performance of NumPy by
having its fundamental types in the language, instead of depending on
operator overloading. For example, in NumPy, a statement like

array3[:] = array1[:] + array2[:]

allocates an intermediate array that is not needed. This is because the
operator overloading cannot know if it's evaluating a part of a larger
statement like

array1[:] = (array1[:] + array2[:]) * (array3[:] + array4[:])

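If arrays had been a part of the language, as they are in Matlab and
Fortran 95, the compiler could see this and avoid intermediate storage,
as well as looping over the data only once. This is one of the main
reasons why Fortran is better than C++ for scientific computing. I.e.
instead of

for (i=0; i<n; i++)
   array1[i] = (array1[i] + array2[i]) * (array3[i] + array4[i]);

one actually gets something like three intermediates and four loops:

tmp1 = malloc(n*sizeof(whatever));
for (i=0; i<n; i++)
   tmp1[i] = array1[i] + array2[i];
tmp2 = malloc(n*sizeof(whatever));
for (i=0; i<n; i++)
   tmp2[i] = array3[i] + array4[i];
tmp3 = malloc(n*sizeof(whatever));
for (i=0; i<n; i++)
   tmp3[i] = tmp1[i] * tmp2[i];
free(tmp1);
free(tmp2);
for (i=0; i<n; i++)
   array1[i] = tmp3[i];
free(tmp3);

Now for my question: operator overloading is (as shown) not the
solution to efficient scientific computing. It creates serious bloat
where it is undesired. Can NumPy's performance be improved by adding
the array types to the Python language itself? Or is the dynamic
nature of Python preventing this?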


Re: Wow, Python much faster than MatLab

2006-12-30 Thread Wensui Liu
Sturla,

I work in healthcare and see people who love to use Excel / SPSS as a
database or statistical tool without knowing what they are doing.
However, that is not the fault of Excel/SPSS itself but of the people
who are using it. Anything, even SAS/R, would look stupid when it is
misused.

In hospitals, people don't pray to God. They pray to the MD. :-)

On 30 Dec 2006 19:09:59 -0800, sturlamolden <[EMAIL PROTECTED]> wrote:
>
> Stef Mientki wrote:
>
> > I always thought that SPSS or SAS were thé standards.
> > Stef
>
> As far as SPSS is a standard, it is in the field of "religious use of
> statistical procedures I don't understand (as I'm a math retard), but
> hey p<0.05 is always significant (and any other value is proof of the
> opposite ... I think)."
>
> SPSS is often used by scientists who don't understand maths at all,
> often within the fields of social sciences, but regrettably also within
> biology and medicine. I know of few programs that have done so much harm
> as SPSS. It's like handing a loaded weapon to a child. Generally one
> should stay away from the things that one doesn't understand,
> particularly within medicine where a wrong result can have dramatic
> consequences. SPSS encourages the opposite. Copy and paste from Excel
> to SPSS is regrettably becoming the de facto standard in applied
> statistics. The problem is not the quality of Excel or SPSS, but rather
> the (in)competence of those conducting the data analysis. This can and
> does regrettably lead to serious misinterpretation of the data, in
> either direction. When a paper is submitted, these errors are usually
> not caught in the peer review process, as peer review is, well, exactly
> what it says: *peer* review.
>
> Thus, SPSS makes it easy to shoot yourself in the foot. In my
> experience students in social sciences and medicine are currently
> taught to do exactly that, in universities and colleges all around the
> world. And it is particularly dangerous within medical sciences, as
> people's lives and health may be affected by it. I pray God something is
> done to prohibit or limit the use of these statistical toys.
>
>
> Sturla Molden
> PhD
>


-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)


Re: Wow, Python much faster than MatLab

2006-12-30 Thread sturlamolden

Stef Mientki wrote:

> I always thought that SPSS or SAS were thé standards.
> Stef

As far as SPSS is a standard, it is in the field of "religious use of
statistical procedures I don't understand (as I'm a math retard), but
hey p<0.05 is always significant (and any other value is proof of the
opposite ... I think)."

SPSS is often used by scientists who don't understand maths at all,
often within the fields of social sciences, but regrettably also within
biology and medicine. I know of few programs that have done so much harm
as SPSS. It's like handing a loaded weapon to a child. Generally one
should stay away from the things that one doesn't understand,
particularly within medicine where a wrong result can have dramatic
consequences. SPSS encourages the opposite. Copy and paste from Excel
to SPSS is regrettably becoming the de facto standard in applied
statistics. The problem is not the quality of Excel or SPSS, but rather
the (in)competence of those conducting the data analysis. This can and
does regrettably lead to serious misinterpretation of the data, in
either direction. When a paper is submitted, these errors are usually
not caught in the peer review process, as peer review is, well, exactly
what it says: *peer* review.

Thus, SPSS makes it easy to shoot yourself in the foot. In my
experience students in social sciences and medicine are currently
taught to do exactly that, in universities and colleges all around the
world. And it is particularly dangerous within medical sciences, as
people's lives and health may be affected by it. I pray God something is
done to prohibit or limit the use of these statistical toys.


Sturla Molden
PhD



Re: Wow, Python much faster than MatLab

2006-12-30 Thread Ramon Diaz-Uriarte
On 12/31/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> R is the free version of the S language.  S-PLUS is a commercial version.
> Both are targeted at statisticians per se.  Their strengths are in
> exploratory data analysis (in my opinion).
>
> SAS has many statistical features, and is phenomenally well-documented and
> supported.  One of its great strengths is the robustness of its data model
> -- very well suited to large sizes, repetitive inputs, industrial-strength
> data processing with a statistics slant.  Well over 200 SAS books, for
> example.
>
> I think of SAS and R as being like airliners and helicopters -- airliners get
> the job done, and well, as long as it's well-defined and nearly the same job
> all the time.  Helicopters can go anywhere, do anything, but a moment's
> inattention leads to a crash.

inattention leading to a crash? I don't get it. I used SAS for about 3
or 4 years, and have used S-Plus and then R for 10 years (R for 8
years now). I've never noticed inattention leading to a crash. I've
noticed I cannot get away in R without a careful definition of what I
want (which is good), and the immediate interactivity of R is very
helpful with mistakes. And of course, programming in R is, well,
programming in a reasonable language. Programming in SAS is ... well,
programming in SAS (which is about as fun as programming in SPSS).

(Another email somehow suggested that the stability/instability
analogy of airplanes vs. helicopters does apply to SAS vs. R. Again, I
don't really get it. Sure, SAS is very stable. But so is R ---one
common complaint is getting seg faults because package whatever has
memory leaks, but that is not R's fault, but rather the package's
fault).

But then, this might start looking a lot like a flame war, which is
actually rather off-topic for this list.


Anyway, for a Python programmer, picking up R should be fairly easy.
And rpy is really a great way of getting R and Python to talk to each
other. We do this sort of thing quite a bit in our applications.

And yes, R is definitely available for both Linux and Windows (and
Mac), has excellent support from several editors on those platforms
(e.g., emacs + ess, tinn-R, etc), and seems to be becoming a de facto
standard at least in statistical research and is extremely popular in
bioinformatics and among statisticians who do bioinformatics (look at
bioconductor.org).


Ramon


-- 
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz


Re: Wow, Python much faster than MatLab

2006-12-30 Thread Stef Mientki
> I think of SAS and R as being like airliners and helicopters -- 
I like that comparison...
.. Airplanes are inherently stable,
.. Helicopters are inherently unstable ;-)

cheers,
Stef


Re: Wow, Python much faster than MatLab

2006-12-30 Thread gblais
R is the free version of the S language.  S-PLUS is a commercial version. 
Both are targeted at statisticians per se.  Their strengths are in
exploratory data analysis (in my opinion).

SAS has many statistical features, and is phenomenally well-documented and
supported.  One of its great strengths is the robustness of its data model
-- very well suited to large sizes, repetitive inputs, industrial-strength
data processing with a statistics slant.  Well over 200 SAS books, for
example.

I think of SAS and R as being like airliners and helicopters -- airliners get
the job done, and well, as long as it's well-defined and nearly the same job
all the time.  Helicopters can go anywhere, do anything, but a moment's
inattention leads to a crash.


Re: Wow, Python much faster than MatLab

2006-12-30 Thread John J. Lee
Stef Mientki <[EMAIL PROTECTED]> writes:

> Doran, Harold wrote:
> > R is the open-source implementation of the S language developed at Bell
> > laboratories. It is a statistical programming language that is becoming
> > the de facto standard among statisticians.
> Thanks for the information
> I always thought that SPSS or SAS were thé standards.
> Stef

The 'SS' in SPSS stands for Social Science, IIRC.  Looking at the lack
of mention of that on their website, though, and the prominent use of
the "E word" there, they have obviously grown out of (or want to grow
out of) their original niche.

Googling, SAS's market seems to be mostly in the business / financial
worlds.

No doubt R's community differs from those, though I don't know exactly
how.  From the long list of free software available for it, it sure
seems popular with some people:

http://www.stats.bris.ac.uk/R/


John

Re: Wow, Python much faster than MatLab

2006-12-30 Thread John J. Lee
Stef Mientki <[EMAIL PROTECTED]> writes:

> Mathias Panzenboeck wrote:
> > Another great thing: with rpy you have R bindings for Python.
>
> Forgive my ignorance, what's R, rpy?
> Or is it only relevant for Linux users?
[...]

R is a language / environment for statistical programming.  RPy is a
Python interface to let you use R from Python.  I think they both run
on both Windows and Linux.

http://www.r-project.org/

http://rpy.sourceforge.net/


John


Re: Wow, Python much faster than MatLab

2006-12-30 Thread Stef Mientki
Doran, Harold wrote:
> R is the open-source implementation of the S language developed at Bell
> laboratories. It is a statistical programming language that is becoming
> the de facto standard among statisticians.
Thanks for the information
I always thought that SPSS or SAS were thé standards.
Stef


RE: Wow, Python much faster than MatLab

2006-12-30 Thread Doran, Harold
R is the open-source implementation of the S language developed at Bell
Laboratories. It is a statistical programming language that is becoming
the de facto standard among statisticians. Rpy is what provides an
interface between Python and the R language.

Harold 

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of Stef Mientki
> Sent: Saturday, December 30, 2006 9:24 AM
> To: python-list@python.org
> Subject: Re: Wow, Python much faster than MatLab
> 
> Mathias Panzenboeck wrote:
> > Another great thing: with rpy you have R bindings for Python.
>
> Forgive my ignorance, what's R, rpy?
> Or is it only relevant for Linux users?
>
> cheers
> Stef
>
> > So you have the power of R and the easy syntax and big standard lib of
> > Python! :)


Re: Wow, Python much faster than MatLab

2006-12-30 Thread Stef Mientki
Mathias Panzenboeck wrote:
> Another great thing: with rpy you have R bindings for Python.

Forgive my ignorance, what's R, rpy?
Or is it only relevant for Linux users?

cheers
Stef

> So you have the power of R and the easy syntax and big standard lib of
> Python! :)


Re: Wow, Python much faster than MatLab

2006-12-30 Thread Mathias Panzenboeck
Another great thing: with rpy you have R bindings for Python.
So you have the power of R and the easy syntax and big standard lib of Python! :)


Re: Wow, Python much faster than MatLab

2006-12-30 Thread Stef Mientki
> I'm not sure about SciPy,

Yes, SciPy allows it too!

> but lists in standard Python allow this:
>
> >>> array = [1, 2, 3, 4]
> >>> array[2:5]
> [3, 4]
>
> That's generally a good thing.

You're not perhaps by origin an analog engineer ;-)

cheers,
Stef Mientki


Re: Wow, Python much faster than MatLab

2006-12-30 Thread Stef Mientki

>> MatLab: 14 msec
>> Python:  2 msec
> 
> For times this small, I wonder if timing comparisons are valid. I do
> NOT think SciPy is in general an order of magnitude faster than Matlab
> for the task typically performed with Matlab.
The algorithm is meant for real-time analysis,
where these kinds of differences count a lot.
I'm also a typical "surface programmer"
(don't need/want to know what's going on inside),
just want to get my analysis done,
and the fact that Python has many more functions available
means I have to write far fewer explicit or implicit for loops,
and thus I expect it to "look" faster for me always.
> 
>> After taking the first difficult steps into Python,
>> all kinds of small problems as you already know,
>> it now seems a piece of cake to convert from MatLab to Python.
>> (the final programs of MatLab and Python can almost only be
>> distinguished by the comment character ;-)
>>
>> Especially I like:
>> - more relaxed behavior when exceeding the upper limit of a (1-dimensional)
>>   array
> 
> Could you explain what this means? In general, I don't want a
> programming language to be "relaxed" about exceeding array bounds.
> 
Well, I have to admit, that wasn't a very tactful remark; "noise" is still
an unwanted issue in software.
But in the meantime I've been reading further, and I should replace that with
some other great things:
- the very efficient way comments are turned into help information
- the (at first sight) very easy, yet quite powerful OOP implementation.
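
The first item presumably refers to docstrings. A minimal sketch, with a
made-up `moving_average` function as the example: the string literal placed
first in the function body becomes its documentation, with no separate help
file needed.

```python
def moving_average(samples, width):
    """Return the moving average of samples over windows of width items."""
    return [sum(samples[i:i + width]) / width
            for i in range(len(samples) - width + 1)]


# The docstring is stored on the function object; help(moving_average)
# displays the same text in an interactive session.
print(moving_average.__doc__)
print(moving_average([1, 2, 3, 4], 2))  # [1.5, 2.5, 3.5]
```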

cheers,
Stef Mientki


Re: Wow, Python much faster than MatLab

2006-12-30 Thread Steven D'Aprano
On Fri, 29 Dec 2006 19:35:22 -0800, Beliavsky wrote:

>> Especially I like:
>> - more relaxed behavior when exceeding the upper limit of a (1-dimensional)
>>   array
> 
> Could you explain what this means? In general, I don't want a
> programming language to be "relaxed" about exceeding array bounds.

I'm not sure about SciPy, but lists in standard Python allow this:

>>> array = [1, 2, 3, 4]
>>> array[2:5]
[3, 4]

That's generally a good thing.
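
To spell out where the "relaxed" behaviour stops, here is a short sketch with
plain Python lists (the `caught` flag is just for the example): slices are
clamped to the actual length, while a single out-of-range index still raises
an error.

```python
array = [1, 2, 3, 4]

print(array[2:5])  # [3, 4]  (the slice end is clamped to len(array))
print(array[10:])  # []      (a slice entirely past the end is empty)

caught = False
try:
    array[5]       # single-element indexing is not relaxed
except IndexError as exc:
    caught = True
    print("IndexError:", exc)  # IndexError: list index out of range
```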




-- 
Steven.



Re: Wow, Python much faster than MatLab

2006-12-29 Thread Beliavsky

Stef Mientki wrote:
> hi All,
>
> instead of questions,
> my first success story:
>
> I converted my first MatLab algorithm into Python (using SciPy),
> and it not only works perfectly,
> but also runs much faster:
>
> MatLab: 14 msec
> Python:  2 msec

For times this small, I wonder if timing comparisons are valid. I do
NOT think SciPy is in general an order of magnitude faster than Matlab
for the tasks typically performed with Matlab.
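
One way to make millisecond-scale timings more trustworthy on the Python side
is to repeat the measurement many times and take the best run. A minimal
sketch with the standard `timeit` module follows; `workload` is a made-up
stand-in for the real algorithm, and the repeat counts are arbitrary
assumptions.

```python
import timeit


def workload():
    # Hypothetical stand-in for the algorithm being timed.
    return sum(i * i for i in range(10_000))


# Five independent measurements of 100 calls each; the minimum is the
# run least disturbed by other activity on the machine.
timings = timeit.repeat(workload, repeat=5, number=100)
best_ms = min(timings) / 100 * 1000.0
print(f"best of 5: {best_ms:.3f} ms per call")
```

A single 2 ms reading can easily be dominated by timer resolution or
start-up effects, which is why the per-call figure is averaged over many
calls here.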

>
> After taking the first difficult steps into Python,
> all kinds of small problems as you already know,
> it now seems a piece of cake to convert from MatLab to Python.
> (the final programs of MatLab and Python can almost only be
> distinguished by the comment character ;-)
>
> Especially I like:
> - more relaxed behavior when exceeding the upper limit of a (1-dimensional)
>   array

Could you explain what this means? In general, I don't want a
programming language to be "relaxed" about exceeding array bounds.
