Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-18 Thread Robert Kern
On Mon, Jan 18, 2010 at 13:34, Vicente Sole  wrote:

> You are taking point 4.d)0 while I am taking 4.d)1:
>
> """
> 1) Use a suitable shared library mechanism for linking with the
> Library. A suitable mechanism is one that (a) uses at run time a copy
> of the Library already present on the user's computer system, and (b)
> will operate properly with a modified version of the Library that is
> interface-compatible with the Linked Version.
> """
>
> If you are using the library as a shared library (which is what you do
> most of the time in Python), you are quite free.

numpy would not be using Eigen2 as a shared library. It is true that
numpy would act as a shared library with respect to some downstream
application, but incorporating Eigen2 into numpy would effectively place
those numpy binaries under the LGPL license with respect to the
downstream application.

> In any case, it seems I am not the only one seeing it like that:
>
> http://qt.nokia.com/downloads
>
> The key point is whether you use the library "as is" or have modified it.

With respect to numpy and the way that Eigen2 was proposed as being
used, no, it is not the key point. We will not incorporate Eigen2 code
into numpy, particularly not as the default linear algebra
implementation, because we wish to keep numpy's source as being only
BSD. This is a policy decision of the numpy team, not a legal
incompatibility.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-18 Thread Vicente Sole
Quoting Bruce Southey :

> On 01/18/2010 12:47 PM, Vicente Sole wrote:
>> Quoting Bruce Southey :
>>
>>>
>>> If you obtain the code from any package then you are bound by the terms
>>> of that code. So while a user might not be 'inconvenienced' by the LGPL,
>>> they are still required to meet those terms. For some licenses (like
>>> the LGPL) these terms do not really apply until you distribute the code
>>> but that does not mean that the user is exempt from the licensing terms
>>> of that code because they have not distributed their code (yet).
>>>
>>> Furthermore there are a number of numpy users that download the numpy
>>> project for further distribution such as Enthought, packagers for Linux
>>> distributions and developers of projects like Python(x,y). Some of these
>>> users would be inconvenienced because binary-only distributions would
>>> not be permitted in any form.
>>>
>>
>> I think people are confusing LGPL and GPL...
> Not at all.
>
>>
>> I can distribute my code in binary form without any restriction   
>> when using an LGPL library UNLESS I have modified the library itself.
>
> I do not interpret the LGPL version 3 in this way:
> A "Combined Work" is a work produced by combining or linking an
> Application with the Library.
> So you must apply section 4, in particular, provide the "Minimal
> Corresponding Source":
> The "Minimal Corresponding Source" for a Combined Work means the
> Corresponding Source for the Combined Work, excluding any source code
> for portions of the Combined Work that, considered in isolation, are
> based on the Application, and not on the Linked Version.
>
> So a binary-only distribution is usually not appropriate.
>

You are taking point 4.d)0 while I am taking 4.d)1:

"""
1) Use a suitable shared library mechanism for linking with the  
Library. A suitable mechanism is one that (a) uses at run time a copy  
of the Library already present on the user's computer system, and (b)  
will operate properly with a modified version of the Library that is  
interface-compatible with the Linked Version.
"""

If you are using the library as a shared library (which is what you do
most of the time in Python), you are quite free.

In any case, it seems I am not the only one seeing it like that:

http://qt.nokia.com/downloads

The key point is whether you use the library "as is" or have modified it.

Armando


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-18 Thread Bruce Southey

On 01/18/2010 12:47 PM, Vicente Sole wrote:
> Quoting Bruce Southey :
>
>> If you obtain the code from any package then you are bound by the terms
>> of that code. So while a user might not be 'inconvenienced' by the LGPL,
>> they are still required to meet those terms. For some licenses (like
>> the LGPL) these terms do not really apply until you distribute the code
>> but that does not mean that the user is exempt from the licensing terms
>> of that code because they have not distributed their code (yet).
>>
>> Furthermore there are a number of numpy users that download the numpy
>> project for further distribution such as Enthought, packagers for Linux
>> distributions and developers of projects like Python(x,y). Some of these
>> users would be inconvenienced because binary-only distributions would
>> not be permitted in any form.
>
> I think people are confusing LGPL and GPL...

Not at all.

> I can distribute my code in binary form without any restriction when
> using an LGPL library UNLESS I have modified the library itself.

I do not interpret the LGPL version 3 in this way:
A "Combined Work" is a work produced by combining or linking an
Application with the Library.
So you must apply section 4, in particular, provide the "Minimal
Corresponding Source":
The "Minimal Corresponding Source" for a Combined Work means the
Corresponding Source for the Combined Work, excluding any source code
for portions of the Combined Work that, considered in isolation, are
based on the Application, and not on the Linked Version.

So a binary-only distribution is usually not appropriate.

Bruce



Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-18 Thread Bruce Southey
On 01/18/2010 10:46 AM, Benoit Jacob wrote:
> 2010/1/18 Robert Kern:
>
>> On Mon, Jan 18, 2010 at 10:26, Benoit Jacob  wrote:
>>  
>>> 2010/1/18 Robert Kern:
>>>
 On Mon, Jan 18, 2010 at 09:35, Benoit Jacob  
 wrote:

  
> Sorry for continuing the licensing noise on your list --- I thought
> that now that I've started, I should let you know that I think I
> understand things more clearly now ;)
>
 No worries.

  
> First, Section 5 of the LGPL is horrible indeed, so let's forget about 
> that.
>
 I don't think it's that horrible, honestly. It just applies to a
 different deployment use case and a different set of technologies.

  
> If you were using a LGPL-licensed binary library, Section 4 would
> rather be what you want. It would require you to:
>   4a) say somewhere ("prominently" is vague, the bottom of a README is
> OK) that you use the library
>   4b) distribute copies of the GPL and LGPL licenses text. Pointless,
> but not a big issue.
>
> the rest doesn't matter:
>   4c) not applicable to you
>   4d1) this is what you would be doing anyway
>
 Possibly, but shared libraries are not easy for a variety of boring,
 Python-specific, technical reasons.
  
>>> Ah, that I didn't know.
>>>
>>>
>   4e) not applicable to you
>
 Yes, it is. The exception where Installation Information is not
 required is only when installation is impossible, such as embedded
 devices where the code is in a ROM chip.
  
>>> OK, I didn't understand that.
>>>
>>>
  
> Finally and this is the important point: you would not be passing any
> requirement to your own users. Indeed, the LGPL license, contrary to
> the GPL license, does not propagate through dependency chains. So if
> NumPy used a LGPL-licensed lib Foo, the conditions of the LGPL must be
> met when distributing NumPy, but NumPy itself isn't LGPL at all and an
> application using NumPy does not have to care at all about the LGPL.
> So there should be no concern at all of "passing on LGPL requirements
> to users"
>
 No, not at all. The GPL "propagates" by requiring that the rest of the
 code be licensed compatibly with the GPL. This is an unusual and
 particular feature of the GPL. The LGPL does not require that the rest of
 the code be licensed in a particular way. However, that doesn't mean
 that the license of the "outer layer" insulates the downstream user
 from the LGPL license of the wrapped component. It just means that
 there is BSD code and LGPL code in the total product. The downstream
 user must accept and deal with the licenses of *all* of the components
 simultaneously. This is how most licenses work. I think that the fact
 that the GPL is particularly "viral" may be obscuring the normal way
 that licenses work when combined with other licenses.

 If I had a proprietary application that used an LGPL library, and I
 gave my customers some limited rights to modify and resell my
 application, they would still be bound by the LGPL with respect to the
 library. They could not modify the LGPLed library and sell it under a
 proprietary license even if I allow them to do that with the
 application as a whole. For us to use Eigen2 in numpy such that our
 users could use, modify and redistribute numpy+Eigen2, in its
 entirety, under the terms of the BSD license, we would have to get
 permission from you to distribute Eigen2 under the BSD license. It's
 only polite.
  
>>> OK, so the Eigen code inside of NumPy would still be protected by the
>>> LGPL. But what I meant when I said that the LGPL requirements don't
>>> propagate to your users, was that, for example, they don't have to
>>> distribute copies of the LGPL text, installation information for
>>> Eigen, or links to Eigen's website.
>>>
>> Yes, they do. They are redistributing Eigen; they must abide by its
>> license in all respects. It doesn't matter how much it is wrapped.
>>  
> Well this is where I'm not sure if I agree, I am asking the FSF right
> now as, if this were the case, I too would find such a clause very
> inconvenient for users.
>
>

If you obtain the code from any package then you are bound by the terms 
of that code. So while a user might not be 'inconvenienced' by the LGPL, 
they are still required to meet those terms. For some licenses (like 
the LGPL) these terms do not really apply until you distribute the code 
but that does not mean that the user is exempt from the licensing terms 
of that code because they have not distributed their code (yet).

Furthermore there are a number of numpy users that download the numpy 
project for further distribution such as Enthought, packagers for Linux 
distributions and developers of projects like Python(x,y). Some of these 
users would be inconvenienced because binary-only distributions would 
not be permitted in any form.

Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-18 Thread Benoit Jacob
2010/1/18 Robert Kern :
> On Mon, Jan 18, 2010 at 10:26, Benoit Jacob  wrote:
>> 2010/1/18 Robert Kern :
>>> On Mon, Jan 18, 2010 at 09:35, Benoit Jacob  
>>> wrote:
>>>
 Sorry for continuing the licensing noise on your list --- I thought
 that now that I've started, I should let you know that I think I
 understand things more clearly now ;)
>>>
>>> No worries.
>>>
 First, Section 5 of the LGPL is horrible indeed, so let's forget about 
 that.
>>>
>>> I don't think it's that horrible, honestly. It just applies to a
>>> different deployment use case and a different set of technologies.
>>>
 If you were using a LGPL-licensed binary library, Section 4 would
 rather be what you want. It would require you to:
  4a) say somewhere ("prominently" is vague, the bottom of a README is
 OK) that you use the library
  4b) distribute copies of the GPL and LGPL licenses text. Pointless,
 but not a big issue.

 the rest doesn't matter:
  4c) not applicable to you
  4d1) this is what you would be doing anyway
>>>
>>> Possibly, but shared libraries are not easy for a variety of boring,
>>> Python-specific, technical reasons.
>>
>> Ah, that I didn't know.
>>
  4e) not applicable to you
>>>
>>> Yes, it is. The exception where Installation Information is not
>>> required is only when installation is impossible, such as embedded
>>> devices where the code is in a ROM chip.
>>
>> OK, I didn't understand that.
>>
>>>
 Finally and this is the important point: you would not be passing any
 requirement to your own users. Indeed, the LGPL license, contrary to
 the GPL license, does not propagate through dependency chains. So if
 NumPy used a LGPL-licensed lib Foo, the conditions of the LGPL must be
 met when distributing NumPy, but NumPy itself isn't LGPL at all and an
 application using NumPy does not have to care at all about the LGPL.
 So there should be no concern at all of "passing on LGPL requirements
 to users"
>>>
>>> No, not at all. The GPL "propagates" by requiring that the rest of the
>>> code be licensed compatibly with the GPL. This is an unusual and
>>> particular feature of the GPL. The LGPL does not require that the rest of
>>> the code be licensed in a particular way. However, that doesn't mean
>>> that the license of the "outer layer" insulates the downstream user
>>> from the LGPL license of the wrapped component. It just means that
>>> there is BSD code and LGPL code in the total product. The downstream
>>> user must accept and deal with the licenses of *all* of the components
>>> simultaneously. This is how most licenses work. I think that the fact
>>> that the GPL is particularly "viral" may be obscuring the normal way
>>> that licenses work when combined with other licenses.
>>>
>>> If I had a proprietary application that used an LGPL library, and I
>>> gave my customers some limited rights to modify and resell my
>>> application, they would still be bound by the LGPL with respect to the
>>> library. They could not modify the LGPLed library and sell it under a
>>> proprietary license even if I allow them to do that with the
>>> application as a whole. For us to use Eigen2 in numpy such that our
>>> users could use, modify and redistribute numpy+Eigen2, in its
>>> entirety, under the terms of the BSD license, we would have to get
>>> permission from you to distribute Eigen2 under the BSD license. It's
>>> only polite.
>>
>> OK, so the Eigen code inside of NumPy would still be protected by the
>> LGPL. But what I meant when I said that the LGPL requirements don't
>> propagate to your users, was that, for example, they don't have to
>> distribute copies of the LGPL text, installation information for
>> Eigen, or links to Eigen's website.
>
> Yes, they do. They are redistributing Eigen; they must abide by its
> license in all respects. It doesn't matter how much it is wrapped.

Well, this is where I'm not sure I agree. I am asking the FSF right
now, as, if this were the case, I too would find such a clause very
inconvenient for users.

>
>> The only requirement, if I understand well, is that _if_ a NumPy user
>> wanted to make modifications to  Eigen itself, he would have to
>> conform to the LGPL requirements about sharing the modified source
>> code.
>>
>> But is it really a requirement of NumPy that all its dependencies must
>> be free to modify without redistributing the modified source code?
>
> For the default build and the official binaries, yes.

OK.

>
>> Don't you use MKL, for which the source code is not available at all?
>
> No, we don't. It is a build option. If you were to provide a BLAS
> interface to Eigen, Eigen would be another option.

OK, then I guess that this is what will happen once we release the BLAS library.

Thanks for your patience
Benoit

Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-18 Thread Robert Kern
On Mon, Jan 18, 2010 at 10:26, Benoit Jacob  wrote:
> 2010/1/18 Robert Kern :
>> On Mon, Jan 18, 2010 at 09:35, Benoit Jacob  wrote:
>>
>>> Sorry for continuing the licensing noise on your list --- I thought
>>> that now that I've started, I should let you know that I think I
>>> understand things more clearly now ;)
>>
>> No worries.
>>
>>> First, Section 5 of the LGPL is horrible indeed, so let's forget about that.
>>
>> I don't think it's that horrible, honestly. It just applies to a
>> different deployment use case and a different set of technologies.
>>
>>> If you were using a LGPL-licensed binary library, Section 4 would
>>> rather be what you want. It would require you to:
>>>  4a) say somewhere ("prominently" is vague, the bottom of a README is
>>> OK) that you use the library
>>>  4b) distribute copies of the GPL and LGPL licenses text. Pointless,
>>> but not a big issue.
>>>
>>> the rest doesn't matter:
>>>  4c) not applicable to you
>>>  4d1) this is what you would be doing anyway
>>
>> Possibly, but shared libraries are not easy for a variety of boring,
>> Python-specific, technical reasons.
>
> Ah, that I didn't know.
>
>>>  4e) not applicable to you
>>
>> Yes, it is. The exception where Installation Information is not
>> required is only when installation is impossible, such as embedded
>> devices where the code is in a ROM chip.
>
> OK, I didn't understand that.
>
>>
>>> Finally and this is the important point: you would not be passing any
>>> requirement to your own users. Indeed, the LGPL license, contrary to
>>> the GPL license, does not propagate through dependency chains. So if
>>> NumPy used a LGPL-licensed lib Foo, the conditions of the LGPL must be
>>> met when distributing NumPy, but NumPy itself isn't LGPL at all and an
>>> application using NumPy does not have to care at all about the LGPL.
>>> So there should be no concern at all of "passing on LGPL requirements
>>> to users"
>>
>> No, not at all. The GPL "propagates" by requiring that the rest of the
>> code be licensed compatibly with the GPL. This is an unusual and
>> particular feature of the GPL. The LGPL does not require that the rest of
>> the code be licensed in a particular way. However, that doesn't mean
>> that the license of the "outer layer" insulates the downstream user
>> from the LGPL license of the wrapped component. It just means that
>> there is BSD code and LGPL code in the total product. The downstream
>> user must accept and deal with the licenses of *all* of the components
>> simultaneously. This is how most licenses work. I think that the fact
>> that the GPL is particularly "viral" may be obscuring the normal way
>> that licenses work when combined with other licenses.
>>
>> If I had a proprietary application that used an LGPL library, and I
>> gave my customers some limited rights to modify and resell my
>> application, they would still be bound by the LGPL with respect to the
>> library. They could not modify the LGPLed library and sell it under a
>> proprietary license even if I allow them to do that with the
>> application as a whole. For us to use Eigen2 in numpy such that our
>> users could use, modify and redistribute numpy+Eigen2, in its
>> entirety, under the terms of the BSD license, we would have to get
>> permission from you to distribute Eigen2 under the BSD license. It's
>> only polite.
>
> OK, so the Eigen code inside of NumPy would still be protected by the
> LGPL. But what I meant when I said that the LGPL requirements don't
> propagate to your users, was that, for example, they don't have to
> distribute copies of the LGPL text, installation information for
> Eigen, or links to Eigen's website.

Yes, they do. They are redistributing Eigen; they must abide by its
license in all respects. It doesn't matter how much it is wrapped.

> The only requirement, if I understand well, is that _if_ a NumPy user
> wanted to make modifications to  Eigen itself, he would have to
> conform to the LGPL requirements about sharing the modified source
> code.
>
> But is it really a requirement of NumPy that all its dependencies must
> be free to modify without redistributing the modified source code?

For the default build and the official binaries, yes.

> Don't you use MKL, for which the source code is not available at all?

No, we don't. It is a build option. If you were to provide a BLAS
interface to Eigen, Eigen would be another option.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-18 Thread Benoit Jacob
2010/1/18 Robert Kern :
> On Mon, Jan 18, 2010 at 09:35, Benoit Jacob  wrote:
>
>> Sorry for continuing the licensing noise on your list --- I thought
>> that now that I've started, I should let you know that I think I
>> understand things more clearly now ;)
>
> No worries.
>
>> First, Section 5 of the LGPL is horrible indeed, so let's forget about that.
>
> I don't think it's that horrible, honestly. It just applies to a
> different deployment use case and a different set of technologies.
>
>> If you were using a LGPL-licensed binary library, Section 4 would
>> rather be what you want. It would require you to:
>>  4a) say somewhere ("prominently" is vague, the bottom of a README is
>> OK) that you use the library
>>  4b) distribute copies of the GPL and LGPL licenses text. Pointless,
>> but not a big issue.
>>
>> the rest doesn't matter:
>>  4c) not applicable to you
>>  4d1) this is what you would be doing anyway
>
> Possibly, but shared libraries are not easy for a variety of boring,
> Python-specific, technical reasons.

Ah, that I didn't know.

>>  4e) not applicable to you
>
> Yes, it is. The exception where Installation Information is not
> required is only when installation is impossible, such as embedded
> devices where the code is in a ROM chip.

OK, I didn't understand that.

>
>> Finally and this is the important point: you would not be passing any
>> requirement to your own users. Indeed, the LGPL license, contrary to
>> the GPL license, does not propagate through dependency chains. So if
>> NumPy used a LGPL-licensed lib Foo, the conditions of the LGPL must be
>> met when distributing NumPy, but NumPy itself isn't LGPL at all and an
>> application using NumPy does not have to care at all about the LGPL.
>> So there should be no concern at all of "passing on LGPL requirements
>> to users"
>
> No, not at all. The GPL "propagates" by requiring that the rest of the
> code be licensed compatibly with the GPL. This is an unusual and
> particular feature of the GPL. The LGPL does not require that the rest of
> the code be licensed in a particular way. However, that doesn't mean
> that the license of the "outer layer" insulates the downstream user
> from the LGPL license of the wrapped component. It just means that
> there is BSD code and LGPL code in the total product. The downstream
> user must accept and deal with the licenses of *all* of the components
> simultaneously. This is how most licenses work. I think that the fact
> that the GPL is particularly "viral" may be obscuring the normal way
> that licenses work when combined with other licenses.
>
> If I had a proprietary application that used an LGPL library, and I
> gave my customers some limited rights to modify and resell my
> application, they would still be bound by the LGPL with respect to the
> library. They could not modify the LGPLed library and sell it under a
> proprietary license even if I allow them to do that with the
> application as a whole. For us to use Eigen2 in numpy such that our
> users could use, modify and redistribute numpy+Eigen2, in its
> entirety, under the terms of the BSD license, we would have to get
> permission from you to distribute Eigen2 under the BSD license. It's
> only polite.

OK, so the Eigen code inside of NumPy would still be protected by the
LGPL. But what I meant when I said that the LGPL requirements don't
propagate to your users, was that, for example, they don't have to
distribute copies of the LGPL text, installation information for
Eigen, or links to Eigen's website.

The only requirement, if I understand correctly, is that _if_ a NumPy user
wanted to make modifications to Eigen itself, he would have to
conform to the LGPL requirements about sharing the modified source
code.

But is it really a requirement of NumPy that all its dependencies must
be free to modify without redistributing the modified source code?
Don't you use MKL, for which the source code is not available at all?
I am not sure that I understand how that is better than having source
code subject to LGPL requirements.

Benoit


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-18 Thread Robert Kern
On Mon, Jan 18, 2010 at 09:35, Benoit Jacob  wrote:

> Sorry for continuing the licensing noise on your list --- I thought
> that now that I've started, I should let you know that I think I
> understand things more clearly now ;)

No worries.

> First, Section 5 of the LGPL is horrible indeed, so let's forget about that.

I don't think it's that horrible, honestly. It just applies to a
different deployment use case and a different set of technologies.

> If you were using a LGPL-licensed binary library, Section 4 would
> rather be what you want. It would require you to:
>  4a) say somewhere ("prominently" is vague, the bottom of a README is
> OK) that you use the library
>  4b) distribute copies of the GPL and LGPL licenses text. Pointless,
> but not a big issue.
>
> the rest doesn't matter:
>  4c) not applicable to you
>  4d1) this is what you would be doing anyway

Possibly, but shared libraries are not easy for a variety of boring,
Python-specific, technical reasons. 4d0 would be easier for the
official binaries (because we provide official source). But that would
still force people building a proprietary application using numpy to
rebuild a binary without Eigen2 or else make sure that they allow
users to rebuild numpy. For a number of deployment options (py2app,
py2exe, bbfreeze, etc.), this is annoying, particularly when combined
with the 4e requirement, as I explain below.

>  4e) not applicable to you

Yes, it is. The exception where Installation Information is not
required is only when installation is impossible, such as embedded
devices where the code is in a ROM chip.

> Finally and this is the important point: you would not be passing any
> requirement to your own users. Indeed, the LGPL license, contrary to
> the GPL license, does not propagate through dependency chains. So if
> NumPy used a LGPL-licensed lib Foo, the conditions of the LGPL must be
> met when distributing NumPy, but NumPy itself isn't LGPL at all and an
> application using NumPy does not have to care at all about the LGPL.
> So there should be no concern at all of "passing on LGPL requirements
> to users"

No, not at all. The GPL "propagates" by requiring that the rest of the
code be licensed compatibly with the GPL. This is an unusual and
particular feature of the GPL. The LGPL does not require that the rest of
the code be licensed in a particular way. However, that doesn't mean
that the license of the "outer layer" insulates the downstream user
from the LGPL license of the wrapped component. It just means that
there is BSD code and LGPL code in the total product. The downstream
user must accept and deal with the licenses of *all* of the components
simultaneously. This is how most licenses work. I think that the fact
that the GPL is particularly "viral" may be obscuring the normal way
that licenses work when combined with other licenses.

If I had a proprietary application that used an LGPL library, and I
gave my customers some limited rights to modify and resell my
application, they would still be bound by the LGPL with respect to the
library. They could not modify the LGPLed library and sell it under a
proprietary license even if I allow them to do that with the
application as a whole. For us to use Eigen2 in numpy such that our
users could use, modify and redistribute numpy+Eigen2, in its
entirety, under the terms of the BSD license, we would have to get
permission from you to distribute Eigen2 under the BSD license. It's
only polite.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-18 Thread Benoit Jacob
2010/1/17 Benoit Jacob :
> 2010/1/17 Robert Kern :
>> On Sun, Jan 17, 2010 at 13:18, Benoit Jacob  wrote:
>>> 2010/1/17 Robert Kern :
 On Sun, Jan 17, 2010 at 12:11, Benoit Jacob  
 wrote:
> 2010/1/17 Robert Kern :
>> On Sun, Jan 17, 2010 at 08:52, Benoit Jacob  
>> wrote:
>>> 2010/1/17 David Cournapeau :
>>
 There are several issues with eigen2 for NumPy usage:
  - using it as a default implementation does not make much sense IMHO,
 as it would make distributed binaries non 100 % BSD.
>>>
>>> But the LGPL doesn't impose restrictions on the usage of binaries, so
>>> how does it matter? The LGPL and the BSD licenses are similar as far
>>> as the binaries are concerned (unless perhaps one starts disassembling
>>> them).
>>>
>>> The big difference between LGPL and BSD is at the level of source
> code, not binary code: if one modifies LGPL-based source code and
>>> distributes a binary form of it, then one has to release the modified
>>> source code as well.
>>
>> This is not true. Binaries that contain LGPLed code must be able to be
>> relinked with a modified version of the LGPLed component.
>
> This doesn't apply to Eigen which is a header-only pure template
> library, hence can't be 'linked' to.
>
> Actually you seem to be referring to Section 4 of the LGPL3, we have
> already asked the FSF about this and their reply was that it just
> doesn't apply in the case of Eigen:
>
> http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2009/01/msg00083.html
>
> In your case, what matters is Section 5.

 You mean Section 3. Good.
>>>
>>> Section 3 is for using Eigen directly in a C++ program, yes, but I got
>>> a bit ahead of myself there: see below
>>>
 I admit to being less up on the details of
 LGPLv3 than I was of LGPLv2 which had a problem with C++ header
 templates.
>>>
>>> Indeed, it did, that's why we don't use it.
>>>

 That said, we will not be using the C++ templates directly in numpy
 for technical reasons (not least that we do not want to require a C++
 compiler for the default build). At best, we would be using a BLAS
 interface which requires linking of objects, not just header
 templates. That *would* impose the Section 4 requirements.
>>>
>>> ... or rather Section 5: that is what I was having in mind:
>>>  " 5. Combined Libraries. "
>>>
>>> I have to admit that I don't understand what 5.a) means.
>>
>> I don't think it applies. Let's say I write some routines that use an
>> LGPLed Library (let's call them Routines A). I can include those
>> routines in a larger library with routines that do not use the LGPLed
>> library (Routines B). The Routines B can be under whatever license you
>> like. However, one must make a library containing only Routines A and
>> the LGPLed Library and release that under the LGPLv3, distribute it
>> along with the combined work, and give notice about how to obtain
>> Routines A+Library separate from Routines B. Basically, it's another
>> exception for needing to be able to relink object code in a particular
>> technical use case.
>>
>> This cannot apply to numpy because we cannot break out numpy.linalg
>> from the rest of numpy. Even if we could, we do not wish to make
>> numpy.linalg itself LGPLed.
>
> Indeed, that seems very cumbersome. I will ask the FSF about this, as
> this is definitely not something that we want to impose on Eigen
> users.
>

Sorry for continuing the licensing noise on your list --- I thought
that now that I've started, I should let you know that I think I
understand things more clearly now ;)

First, Section 5 of the LGPL is horrible indeed, so let's forget about that.

If you were using a LGPL-licensed binary library, Section 4 would
rather be what you want. It would require you to:
 4a) say somewhere ("prominently" is vague, the bottom of a README is
OK) that you use the library
 4b) distribute copies of the GPL and LGPL licenses text. Pointless,
but not a big issue.

the rest doesn't matter:
 4c) not applicable to you
 4d1) this is what you would be doing anyway
 4e) not applicable to you

Finally and this is the important point: you would not be passing any
requirement to your own users. Indeed, the LGPL license, contrary to
the GPL license, does not propagate through dependency chains. So if
NumPy used a LGPL-licensed lib Foo, the conditions of the LGPL must be
met when distributing NumPy, but NumPy itself isn't LGPL at all and an
application using NumPy does not have to care at all about the LGPL.
So there should be no concern at all of "passing on LGPL requirements
to users"

Again, IANAL.

Benoit


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-17 Thread Benoit Jacob
2010/1/17 Robert Kern :
> On Sun, Jan 17, 2010 at 13:18, Benoit Jacob  wrote:
>> 2010/1/17 Robert Kern :
>>> On Sun, Jan 17, 2010 at 12:11, Benoit Jacob  
>>> wrote:
 2010/1/17 Robert Kern :
> On Sun, Jan 17, 2010 at 08:52, Benoit Jacob  
> wrote:
>> 2010/1/17 David Cournapeau :
>
>>> There are several issues with eigen2 for NumPy usage:
>>>  - using it as a default implementation does not make much sense IMHO,
>>> as it would make distributed binaries non 100 % BSD.
>>
>> But the LGPL doesn't impose restrictions on the usage of binaries, so
>> how does it matter? The LGPL and the BSD licenses are similar as far
>> as the binaries are concerned (unless perhaps one starts disassembling
>> them).
>>
>> The big difference between LGPL and BSD is at the level of source
>> code, not binary code: if one modifies LGPL-based source code and
>> distributes a binary form of it, then one has to release the modified
>> source code as well.
>
> This is not true. Binaries that contain LGPLed code must be able to be
> relinked with a modified version of the LGPLed component.

 This doesn't apply to Eigen which is a header-only pure template
 library, hence can't be 'linked' to.

 Actually you seem to be referring to Section 4 of the LGPL3, we have
 already asked the FSF about this and their reply was that it just
 doesn't apply in the case of Eigen:

 http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2009/01/msg00083.html

 In your case, what matters is Section 5.
>>>
>>> You mean Section 3. Good.
>>
>> Section 3 is for using Eigen directly in a C++ program, yes, but I got
>> a bit ahead of myself there: see below
>>
>>> I admit to being less up on the details of
>>> LGPLv3 than I was of LGPLv2 which had a problem with C++ header
>>> templates.
>>
>> Indeed, it did, that's why we don't use it.
>>
>>>
>>> That said, we will not be using the C++ templates directly in numpy
>>> for technical reasons (not least that we do not want to require a C++
>>> compiler for the default build). At best, we would be using a BLAS
>>> interface which requires linking of objects, not just header
>>> templates. That *would* impose the Section 4 requirements.
>>
>> ... or rather Section 5: that is what I was having in mind:
>>  " 5. Combined Libraries. "
>>
>> I have to admit that I don't understand what 5.a) means.
>
> I don't think it applies. Let's say I write some routines that use an
> LGPLed Library (let's call them Routines A). I can include those
> routines in a larger library with routines that do not use the LGPLed
> library (Routines B). The Routines B can be under whatever license you
> like. However, one must make a library containing only Routines A and
> the LGPLed Library and release that under the LGPLv3, distribute it
> along with the combined work, and give notice about how to obtain
> Routines A+Library separate from Routines B. Basically, it's another
> exception for needing to be able to relink object code in a particular
> technical use case.
>
> This cannot apply to numpy because we cannot break out numpy.linalg
> from the rest of numpy. Even if we could, we do not wish to make
> numpy.linalg itself LGPLed.

Indeed, that seems very cumbersome. I will ask the FSF about this, as
this is definitely not something that we want to impose on Eigen
users.

Benoit

>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>  -- Umberto Eco


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-17 Thread Robert Kern
On Sun, Jan 17, 2010 at 13:18, Benoit Jacob  wrote:
> 2010/1/17 Robert Kern :
>> On Sun, Jan 17, 2010 at 12:11, Benoit Jacob  wrote:
>>> 2010/1/17 Robert Kern :
 On Sun, Jan 17, 2010 at 08:52, Benoit Jacob  
 wrote:
> 2010/1/17 David Cournapeau :

>> There are several issues with eigen2 for NumPy usage:
>>  - using it as a default implementation does not make much sense IMHO,
>> as it would make distributed binaries non 100 % BSD.
>
> But the LGPL doesn't impose restrictions on the usage of binaries, so
> how does it matter? The LGPL and the BSD licenses are similar as far
> as the binaries are concerned (unless perhaps one starts disassembling
> them).
>
> The big difference between LGPL and BSD is at the level of source
> code, not binary code: if one modifies LGPL-based source code and
> distributes a binary form of it, then one has to release the modified
> source code as well.

 This is not true. Binaries that contain LGPLed code must be able to be
 relinked with a modified version of the LGPLed component.
>>>
>>> This doesn't apply to Eigen which is a header-only pure template
>>> library, hence can't be 'linked' to.
>>>
>>> Actually you seem to be referring to Section 4 of the LGPL3, we have
>>> already asked the FSF about this and their reply was that it just
>>> doesn't apply in the case of Eigen:
>>>
>>> http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2009/01/msg00083.html
>>>
>>> In your case, what matters is Section 5.
>>
>> You mean Section 3. Good.
>
> Section 3 is for using Eigen directly in a C++ program, yes, but I got
> a bit ahead of myself there: see below
>
>> I admit to being less up on the details of
>> LGPLv3 than I was of LGPLv2 which had a problem with C++ header
>> templates.
>
> Indeed, it did, that's why we don't use it.
>
>>
>> That said, we will not be using the C++ templates directly in numpy
>> for technical reasons (not least that we do not want to require a C++
>> compiler for the default build). At best, we would be using a BLAS
>> interface which requires linking of objects, not just header
>> templates. That *would* impose the Section 4 requirements.
>
> ... or rather Section 5: that is what I was having in mind:
>  " 5. Combined Libraries. "
>
> I have to admit that I don't understand what 5.a) means.

I don't think it applies. Let's say I write some routines that use an
LGPLed Library (let's call them Routines A). I can include those
routines in a larger library with routines that do not use the LGPLed
library (Routines B). The Routines B can be under whatever license you
like. However, one must make a library containing only Routines A and
the LGPLed Library and release that under the LGPLv3, distribute it
along with the combined work, and give notice about how to obtain
Routines A+Library separate from Routines B. Basically, it's another
exception for needing to be able to relink object code in a particular
technical use case.

This cannot apply to numpy because we cannot break out numpy.linalg
from the rest of numpy. Even if we could, we do not wish to make
numpy.linalg itself LGPLed.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-17 Thread Benoit Jacob
2010/1/17 Robert Kern :
> On Sun, Jan 17, 2010 at 12:11, Benoit Jacob  wrote:
>> 2010/1/17 Robert Kern :
>>> On Sun, Jan 17, 2010 at 08:52, Benoit Jacob  
>>> wrote:
 2010/1/17 David Cournapeau :
>>>
> There are several issues with eigen2 for NumPy usage:
>  - using it as a default implementation does not make much sense IMHO,
> as it would make distributed binaries non 100 % BSD.

 But the LGPL doesn't impose restrictions on the usage of binaries, so
 how does it matter? The LGPL and the BSD licenses are similar as far
 as the binaries are concerned (unless perhaps one starts disassembling
 them).

 The big difference between LGPL and BSD is at the level of source
 code, not binary code: if one modifies LGPL-based source code and
 distributes a binary form of it, then one has to release the modified
 source code as well.
>>>
>>> This is not true. Binaries that contain LGPLed code must be able to be
>>> relinked with a modified version of the LGPLed component.
>>
>> This doesn't apply to Eigen which is a header-only pure template
>> library, hence can't be 'linked' to.
>>
>> Actually you seem to be referring to Section 4 of the LGPL3, we have
>> already asked the FSF about this and their reply was that it just
>> doesn't apply in the case of Eigen:
>>
>> http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2009/01/msg00083.html
>>
>> In your case, what matters is Section 5.
>
> You mean Section 3. Good.

Section 3 is for using Eigen directly in a C++ program, yes, but I got
a bit ahead of myself there: see below

> I admit to being less up on the details of
> LGPLv3 than I was of LGPLv2 which had a problem with C++ header
> templates.

Indeed, it did, that's why we don't use it.

>
> That said, we will not be using the C++ templates directly in numpy
> for technical reasons (not least that we do not want to require a C++
> compiler for the default build). At best, we would be using a BLAS
> interface which requires linking of objects, not just header
> templates. That *would* impose the Section 4 requirements.

... or rather Section 5: that is what I had in mind:
  " 5. Combined Libraries. "

I have to admit that I don't understand what 5.a) means.

> Furthermore, we would still prefer not to have any LGPL code in the
> official numpy sources or binaries, regardless of how minimal the real
> requirements are. Licensing is confusing enough that being able to say
> "numpy is BSD licensed" without qualification is quite important.

I hear you, in the same way we definitely care about being able to say
"Eigen is LGPL licensed". So it's a hard problem. I think that this is
the only real issue here, but I definitely agree that it is a real
one. Large projects (such as Qt) that have a third_party subdirectory
have to find a wording to explain that their license doesn't cover it.

Benoit


>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>  -- Umberto Eco


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-17 Thread Robert Kern
On Sun, Jan 17, 2010 at 12:11, Benoit Jacob  wrote:
> 2010/1/17 Robert Kern :
>> On Sun, Jan 17, 2010 at 08:52, Benoit Jacob  wrote:
>>> 2010/1/17 David Cournapeau :
>>
 There are several issues with eigen2 for NumPy usage:
  - using it as a default implementation does not make much sense IMHO,
 as it would make distributed binaries non 100 % BSD.
>>>
>>> But the LGPL doesn't impose restrictions on the usage of binaries, so
>>> how does it matter? The LGPL and the BSD licenses are similar as far
>>> as the binaries are concerned (unless perhaps one starts disassembling
>>> them).
>>>
>>> The big difference between LGPL and BSD is at the level of source
>>> code, not binary code: if one modifies LGPL-based source code and
>>> distributes a binary form of it, then one has to release the modified
>>> source code as well.
>>
>> This is not true. Binaries that contain LGPLed code must be able to be
>> relinked with a modified version of the LGPLed component.
>
> This doesn't apply to Eigen which is a header-only pure template
> library, hence can't be 'linked' to.
>
> Actually you seem to be referring to Section 4 of the LGPL3, we have
> already asked the FSF about this and their reply was that it just
> doesn't apply in the case of Eigen:
>
> http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2009/01/msg00083.html
>
> In your case, what matters is Section 5.

You mean Section 3. Good. I admit to being less up on the details of
LGPLv3 than I was of LGPLv2 which had a problem with C++ header
templates.

That said, we will not be using the C++ templates directly in numpy
for technical reasons (not least that we do not want to require a C++
compiler for the default build). At best, we would be using a BLAS
interface which requires linking of objects, not just header
templates. That *would* impose the Section 4 requirements.

Furthermore, we would still prefer not to have any LGPL code in the
official numpy sources or binaries, regardless of how minimal the real
requirements are. Licensing is confusing enough that being able to say
"numpy is BSD licensed" without qualification is quite important.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-17 Thread Benoit Jacob
2010/1/17 Robert Kern :
> On Sun, Jan 17, 2010 at 08:52, Benoit Jacob  wrote:
>> 2010/1/17 David Cournapeau :
>
>>> There are several issues with eigen2 for NumPy usage:
>>>  - using it as a default implementation does not make much sense IMHO,
>>> as it would make distributed binaries non 100 % BSD.
>>
>> But the LGPL doesn't impose restrictions on the usage of binaries, so
>> how does it matter? The LGPL and the BSD licenses are similar as far
>> as the binaries are concerned (unless perhaps one starts disassembling
>> them).
>>
>> The big difference between LGPL and BSD is at the level of source
>> code, not binary code: if one modifies LGPL-based source code and
>> distributes a binary form of it, then one has to release the modified
>> source code as well.
>
> This is not true. Binaries that contain LGPLed code must be able to be
> relinked with a modified version of the LGPLed component.

This doesn't apply to Eigen which is a header-only pure template
library, hence can't be 'linked' to.
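
A minimal sketch of what "header-only" means in practice (the file name and
compile command are only illustrative, and the program does nothing more than
form one small matrix product):

// demo.cpp: builds with nothing but the Eigen headers on the include path,
// e.g. "g++ -I/path/to/eigen demo.cpp -o demo". There is no library to link.
#include <Eigen/Core>
#include <iostream>

int main()
{
    Eigen::Matrix3d a = Eigen::Matrix3d::Random();
    Eigen::Matrix3d b = Eigen::Matrix3d::Random();
    Eigen::Matrix3d c = a * b;   // the product code is instantiated from templates
    std::cout << c << std::endl;
    return 0;
}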

Actually you seem to be referring to Section 4 of the LGPL3, we have
already asked the FSF about this and their reply was that it just
doesn't apply in the case of Eigen:

http://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2009/01/msg00083.html

In your case, what matters is Section 5.

> In addition, binaries containing an LGPLed
> component must still come with the source of the LGPLed component (or
> come with a written offer to distribute via the same mechanism ...
> yada yada yada).

Since you would presumably be using vanilla Eigen without changes of
your own, it would be enough to just give the link to the Eigen
website, that's all. Just one line, and it doesn't have to be in a
very prominent place, it just has to be reasonably easy to find for
someone looking for it.

> These are non-trivial restrictions above and beyond
> the BSD license that we, as a matter of policy, do not wish to impose
> on numpy users.

The only thing you'd be imposing on NumPy users would be that
somewhere at the bottom of, say, your README file, there would be a
link to Eigen's website. Then who am I to discuss your policies ;)

Finally, let me give an example of why this is moot. You are using
GCC, right? So you use the GNU libc (their standard C library)? It is
LGPL ;) It's just that nobody cares to put a link to the GNU libc
homepage, which is understandable ;)

Cheers,
Benoit


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-17 Thread Robert Kern
On Sun, Jan 17, 2010 at 08:52, Benoit Jacob  wrote:
> 2010/1/17 David Cournapeau :

>> There are several issues with eigen2 for NumPy usage:
>>  - using it as a default implementation does not make much sense IMHO,
>> as it would make distributed binaries non 100 % BSD.
>
> But the LGPL doesn't impose restrictions on the usage of binaries, so
> how does it matter? The LGPL and the BSD licenses are similar as far
> as the binaries are concerned (unless perhaps one starts disassembling
> them).
>
> The big difference between LGPL and BSD is at the level of source
> code, not binary code: if one modifies LGPL-based source code and
> distributes a binary form of it, then one has to release the modified
> source code as well.

This is not true. Binaries that contain LGPLed code must be able to be
relinked with a modified version of the LGPLed component. This is
technically non-trivial. In addition, binaries containing an LGPLed
component must still come with the source of the LGPLed component (or
come with a written offer to distribute via the same mechanism ...
yada yada yada). These are non-trivial restrictions above and beyond
the BSD license that we, as a matter of policy, do not wish to impose
on numpy users.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-17 Thread Benoit Jacob
2010/1/17 David Cournapeau :
> On Sun, Jan 17, 2010 at 2:20 PM, Benoit Jacob  
> wrote:
>
>> Couldn't you simply:
>>  - either add LGPL-licensed code to a third_party subdirectory not
>> subject to the NumPy license, and just use it? This is common
>> practice, see e.g. how Qt puts a copy of WebKit in a third_party
>> subdirectory.
>>  - or use LGPL-licensed code as an external dependency?
>

Thanks for the reply!
First of all, I should say that I was only addressing the licensing
issue that was raised; I'm not saying that you _should_ use Eigen from a
technical point of view.

> There are several issues with eigen2 for NumPy usage:
>  - using it as a default implementation does not make much sense IMHO,
> as it would make distributed binaries non 100 % BSD.

But the LGPL doesn't impose restrictions on the usage of binaries, so
how does it matter? The LGPL and the BSD licenses are similar as far
as the binaries are concerned (unless perhaps one starts disassembling
them).

The big difference between LGPL and BSD is at the level of source
code, not binary code: if one modifies LGPL-based source code and
distributes a binary form of it, then one has to release the modified
source code as well. Since NumPy's users are presumably not interested
in modifying _Eigen_ itself, I don't think that matters. I understand
that they may want to modify NumPy's source code without releasing
their modified source code, so the BSD license is important for NumPy,
but having Eigen in a third_party directory wouldn't affect that.

>  - to my knowledge, eigen2 does not have a BLAS API, so we would have
> to write specific wrappers for eigen2, which is undesirable.

That's true. FYI, a BLAS API is coming in Eigen 3,
https://bitbucket.org/eigen/eigen/src/tip/blas/
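
As a rough illustration of what a BLAS-style layer over Eigen involves (the
function name and simplified signature below are invented for the example;
a real BLAS module handles transposes, strides and the full GEMM signature),
such a wrapper essentially maps the raw pointers and lets Eigen do the product:

#include <Eigen/Core>

// Toy GEMM: C = alpha*A*B + beta*C for contiguous, column-major matrices.
// A is m-by-k, B is k-by-n, C is m-by-n. Compiled once into an ordinary
// object file (and given extern "C" linkage), it becomes callable from C.
void toy_dgemm(int m, int n, int k, double alpha,
               const double* A, const double* B,
               double beta, double* C)
{
    Eigen::Map<const Eigen::MatrixXd> a(A, m, k);
    Eigen::Map<const Eigen::MatrixXd> b(B, k, n);
    Eigen::Map<Eigen::MatrixXd> c(C, m, n);
    c = alpha * a * b + beta * c;   // the product runs through Eigen's own kernels
}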

>  - eigen2 is C++, and it is a stated goal to make numpy depend only on
> a C compiler (it may optionally use fortran to link against
> blas/lapack, though).

Ah OK. Well, once the Eigen BLAS is implemented, it will be usable with
just a C compiler.

> As I see it, people would be able to easily use eigen2 if there was a
> BLAS API for it. We still would not distribute binaries built with
> eigen2, but it means people who don't care about using GPL code could
> use it.

I see. I'd quite like to see this happen! Maybe take a look at where
Eigen is a year from now; the BLAS should be ready by then.

>
> Independently of NumPy, I think a BLAS API for eigen2 would be very
> beneficial for eigen2 if you care about the numerical scientific
> community.

So do we, that's why we're doing it ;) see above.

Benoit


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-16 Thread David Cournapeau
On Sun, Jan 17, 2010 at 2:20 PM, Benoit Jacob  wrote:

> Couldn't you simply:
>  - either add LGPL-licensed code to a third_party subdirectory not
> subject to the NumPy license, and just use it? This is common
> practice, see e.g. how Qt puts a copy of WebKit in a third_party
> subdirectory.
>  - or use LGPL-licensed code as an external dependency?

There are several issues with eigen2 for NumPy usage:
 - using it as a default implementation does not make much sense IMHO,
as it would make distributed binaries non 100 % BSD.
 - to my knowledge, eigen2 does not have a BLAS API, so we would have
to write specific wrappers for eigen2, which is undesirable.
 - eigen2 is C++, and it is a stated goal to make numpy depend only on
a C compiler (it may optionally use fortran to link against
blas/lapack, though).

As I see it, people would be able to easily use eigen2 if there was a
BLAS API for it. We still would not distribute binaries built with
eigen2, but it means people who don't care about using GPL code could
use it.

Independently of NumPy, I think a BLAS API for eigen2 would be very
beneficial for eigen2 if you care about the numerical scientific
community.

David


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-16 Thread Benoit Jacob
>> Hi,
>>
>> A while back, someone talked about eigen2 (http://eigen.tuxfamily.org/). In
>> their benchmark they report that they are competitive against mkl and goto
>> on matrix-matrix product. They are not better, but that could make a good
>> default implementation for numpy when there is no blas installed. I think
>> the license would allow including it in numpy directly.
>
> It is licensed under the LGPLv3, so it is not compatible with the numpy
> license.

Hi,

I'm one of Eigen's authors. Eigen is indeed LGPL3 licensed. Our intent
and understanding is that this makes Eigen usable by virtually any
software, whence my disappointment to learn that LGPL3 software can't
be used by NumPy.

Just for my information, could you tell me why NumPy can't use
LGPL3-licensed libraries?

I found this page:
http://www.scipy.org/License_Compatibility

It does say that LGPL-licensed code can't be added to NumPy, but
there's a big difference between adding LGPL code directly into NumPy,
and just letting NumPy _use_ LGPL code. Couldn't you simply:
 - either add LGPL-licensed code to a third_party subdirectory not
subject to the NumPy license, and just use it? This is common
practice, see e.g. how Qt puts a copy of WebKit in a third_party
subdirectory.
 - or use LGPL-licensed code as an external dependency?

FYI, several BSD-licensed projects are using Eigen ;)

Thanks for your consideration
Benoit
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-08 Thread Robert Kern
2010/1/8 Frédéric Bastien :
> Hi,
>
> A while back, someone talked about eigen2 (http://eigen.tuxfamily.org/). In
> their benchmark they show that it is competitive against MKL and GotoBLAS
> on matrix-matrix product. It is not better, but it could make a good
> default implementation for numpy when there is no BLAS installed. I think
> the license would allow including it in numpy directly.

It is licensed under the LGPLv3, so it is not compatible with the numpy license.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-08 Thread Frédéric Bastien
Hi,

A while back, someone talked about eigen2 (http://eigen.tuxfamily.org/). In
their benchmark they show that it is competitive against MKL and GotoBLAS
on matrix-matrix product. It is not better, but it could make a good
default implementation for numpy when there is no BLAS installed. I think
the license would allow including it in numpy directly.

I don't have time to do it, and my numpy is linked with GotoBLAS, so it
would be useless for me. But if someone wants to make the default version
better against other tools, that could be a good approach.

Frédéric Bastien

On Thu, Jan 7, 2010 at 12:47 PM, Sturla Molden  wrote:

> > Sturla Molden wrote:
> >> I would suggest using GotoBLAS instead of ATLAS.
> >
> >> http://www.tacc.utexas.edu/tacc-projects/
> >
> > That does look promising -- any idea what the license is? They don't
> > make it clear on the site
>
>
>
> UT TACC Research License (Source Code)
>
>
>
> The Texas Advanced Computing Center of The University of Texas at Austin
> has developed certain software and documentation that it desires to make
> available without charge to anyone for academic, research, experimental or
> personal use. This license is designed to guarantee freedom to use the
> software for these purposes. If you wish to distribute or make other use
> of the software, you may purchase a license to do so from the University
> of Texas.
>
> The accompanying source code is made available to you under the terms of
> this UT TACC Research License (this "UTTRL"). By clicking the "ACCEPT"
> button, or by installing or using the code, you are consenting to be bound
> by this UTTRL. If you do not agree to the terms and conditions of this
> license, do not click the "ACCEPT" button, and do not install or use any
> part of the code.
>
> The terms and conditions in this UTTRL not only apply to the source code
> made available by UT TACC, but also to any improvements to, or derivative
> works of, that source code made by you and to any object code compiled
> from such source code, improvements or derivative works.
>
> 1. DEFINITIONS.
>
> 1.1 "Commercial Use" shall mean use of Software or Documentation by
> Licensee for direct or indirect financial, commercial or strategic gain or
> advantage, including without limitation: (a) bundling or integrating the
> Software with any hardware product or another software product for
> transfer, sale or license to a third party (even if distributing the
> Software on separate media and not charging for the Software); (b)
> providing customers with a link to the Software or a copy of the Software
> for use with hardware or another software product purchased by that
> customer; or (c) use in connection with the performance of services for
> which Licensee is compensated.
>
> 1.2 "Derivative Products" means any improvements to, or other derivative
> works of, the Software made by Licensee.
>
> 1.3 "Documentation" shall mean all manuals, user documentation, and other
> related materials pertaining to the Software that are made available to
> Licensee in connection with the Software.
>
> 1.4 "Licensor" shall mean The University of Texas.
>
> 1.5 "Licensee" shall mean the person or entity that has agreed to the
> terms hereof and is exercising rights granted hereunder.
>
> 1.6 "Software" shall mean the computer program(s) referred to as GotoBLAS2
> made available under this UTTRL in source code form, including any error
> corrections, bug fixes, patches, updates or other modifications that
> Licensor may in its sole discretion make available to Licensee from time
> to time, and any object code compiled from such source code.
>
> 2. GRANT OF RIGHTS.
>
> Subject to the terms and conditions hereunder, Licensor hereby grants to
> Licensee a worldwide, non-transferable, non-exclusive license to (a)
> install, use and reproduce the Software for academic, research,
> experimental and personal use (but specifically excluding Commercial Use);
> (b) use and modify the Software to create Derivative Products, subject to
> Section 3.2; and (c) use the Documentation, if any, solely in connection
> with Licensee's authorized use of the Software.
>
> 3. RESTRICTIONS; COVENANTS.
>
> 3.1 Licensee may not: (a) distribute, sub-license or otherwise transfer
> copies or rights to the Software (or any portion thereof) or the
> Documentation; (b) use the Software (or any portion thereof) or
> Documentation for Commercial Use, or for any other use except as described
> in Section 2; (c) copy the Software or Documentation other than for
> archival and backup purposes; or (d) remove any product identification,
> copyright, proprietary notices or labels from the Software and
> Documentation. This UTTRL confers no rights upon Licensee except those
> expressly granted herein.
>
> 3.2 Licensee hereby agrees that it will provide a copy of all Derivative
> Products to Licensor and that its use of the Derivative Products will be
> subject to all of the same terms, conditions, restrictions an

Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-07 Thread Sturla Molden
> Sturla Molden wrote:
>> I would suggest using GotoBLAS instead of ATLAS.
>
>> http://www.tacc.utexas.edu/tacc-projects/
>
> That does look promising -- any idea what the license is? They don't
> make it clear on the site



UT TACC Research License (Source Code)



The Texas Advanced Computing Center of The University of Texas at Austin
has developed certain software and documentation that it desires to make
available without charge to anyone for academic, research, experimental or
personal use. This license is designed to guarantee freedom to use the
software for these purposes. If you wish to distribute or make other use
of the software, you may purchase a license to do so from the University
of Texas.

The accompanying source code is made available to you under the terms of
this UT TACC Research License (this "UTTRL"). By clicking the "ACCEPT"
button, or by installing or using the code, you are consenting to be bound
by this UTTRL. If you do not agree to the terms and conditions of this
license, do not click the "ACCEPT" button, and do not install or use any
part of the code.

The terms and conditions in this UTTRL not only apply to the source code
made available by UT TACC, but also to any improvements to, or derivative
works of, that source code made by you and to any object code compiled
from such source code, improvements or derivative works.

1. DEFINITIONS.

1.1 "Commercial Use" shall mean use of Software or Documentation by
Licensee for direct or indirect financial, commercial or strategic gain or
advantage, including without limitation: (a) bundling or integrating the
Software with any hardware product or another software product for
transfer, sale or license to a third party (even if distributing the
Software on separate media and not charging for the Software); (b)
providing customers with a link to the Software or a copy of the Software
for use with hardware or another software product purchased by that
customer; or (c) use in connection with the performance of services for
which Licensee is compensated.

1.2 "Derivative Products" means any improvements to, or other derivative
works of, the Software made by Licensee.

1.3 "Documentation" shall mean all manuals, user documentation, and other
related materials pertaining to the Software that are made available to
Licensee in connection with the Software.

1.4 "Licensor" shall mean The University of Texas.

1.5 "Licensee" shall mean the person or entity that has agreed to the
terms hereof and is exercising rights granted hereunder.

1.6 "Software" shall mean the computer program(s) referred to as GotoBLAS2
made available under this UTTRL in source code form, including any error
corrections, bug fixes, patches, updates or other modifications that
Licensor may in its sole discretion make available to Licensee from time
to time, and any object code compiled from such source code.

2. GRANT OF RIGHTS.

Subject to the terms and conditions hereunder, Licensor hereby grants to
Licensee a worldwide, non-transferable, non-exclusive license to (a)
install, use and reproduce the Software for academic, research,
experimental and personal use (but specifically excluding Commercial Use);
(b) use and modify the Software to create Derivative Products, subject to
Section 3.2; and (c) use the Documentation, if any, solely in connection
with Licensee's authorized use of the Software.

3. RESTRICTIONS; COVENANTS.

3.1 Licensee may not: (a) distribute, sub-license or otherwise transfer
copies or rights to the Software (or any portion thereof) or the
Documentation; (b) use the Software (or any portion thereof) or
Documentation for Commercial Use, or for any other use except as described
in Section 2; (c) copy the Software or Documentation other than for
archival and backup purposes; or (d) remove any product identification,
copyright, proprietary notices or labels from the Software and
Documentation. This UTTRL confers no rights upon Licensee except those
expressly granted herein.

3.2 Licensee hereby agrees that it will provide a copy of all Derivative
Products to Licensor and that its use of the Derivative Products will be
subject to all of the same terms, conditions, restrictions and limitations
on use imposed on the Software under this UTTRL. Licensee hereby grants
Licensor a worldwide, non-exclusive, royalty-free license to reproduce,
prepare derivative works of, publicly display, publicly perform,
sublicense and distribute Derivative Products. Licensee also hereby grants
Licensor a worldwide, non-exclusive, royalty-free patent license to make,
have made, use, offer to sell, sell, import and otherwise transfer the
Derivative Products under those patent claims licensable by Licensee that
are necessarily infringed by the Derivative Products.

4. PROTECTION OF SOFTWARE.

4.1 Confidentiality. The Software and Documentation are the confidential
and proprietary information of Licensor. Licensee agrees to take adequate
steps to protect the Software and Documentation 

Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-07 Thread Christopher Barker
Sturla Molden wrote:
> I would suggest using GotoBLAS instead of ATLAS.

> http://www.tacc.utexas.edu/tacc-projects/

That does look promising -- any idea what the license is? They don't
make it clear on the site (maybe it is shown if you set up a user account
and download, but I'd rather know up front). The only reference I could find
is from 2006:

http://www.utexas.edu/news/2006/04/12/tacc/

and in that, they refer to one of those annoying "free for academic and 
scientific use" clauses.

-Chris




-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-07 Thread Sturla Molden

> I also tried to install numpy with Intel MKL 9.1.
> I still used gfortran for the numpy installation as Intel MKL 9.1 supports
> the GNU compiler.

I would suggest using GotoBLAS instead of ATLAS. It is easier to build
than ATLAS (basically no configuration), and has even better performance
than MKL.

http://www.tacc.utexas.edu/tacc-projects/

S.M.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-06 Thread David Cournapeau
On Thu, Jan 7, 2010 at 11:20 AM, Xue (Sue) Yang
 wrote:

> This time, only one cpu was used.  Does it mean that our installed intel mkl
> 9.1 is not threaded?

You would have to consult the MKL documentation - I believe you can
control how many threads are used from an environment variable. Also,
the exact build commands depend on the version of the MKL, as its
libraries often change between versions.
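
As a sketch of that, the usual environment variables can be set from Python
before numpy is imported (the exact variable names depend on the MKL version;
MKL_NUM_THREADS / OMP_NUM_THREADS are the common ones):

import os
# Must be set before numpy (and hence MKL) is loaded.
os.environ["MKL_NUM_THREADS"] = "4"
os.environ["OMP_NUM_THREADS"] = "4"

import time
import numpy

a = numpy.random.randn(2000, 2000)
t0 = time.time()
numpy.dot(a, a)
print("dot: %.2f s" % (time.time() - t0))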

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-06 Thread Xue (Sue) Yang
Hi David,

Thank you for the reply which is useful.

I also tried to install numpy with Intel MKL 9.1.
I still used gfortran for the numpy installation as Intel MKL 9.1 supports
the GNU compiler.

I only uncommented these lines in site.cfg (copied from site.cfg.example):

[mkl]
library_dirs = /usr/physics/intel/mkl/lib/32
include_dirs = /usr/physics/intel/mkl/include
lapack_libs = mkl_lapack

then I tested numpy with

> python
>>import numpy
>>a = numpy.random.randn(6000, 6000)
>>numpy.dot(a, a)

This time, only one cpu was used.  Does it mean that our installed intel mkl
9.1 is not threaded?
I don't think so.  We have used it for openMP parallelization for quite a
while.
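
One way to check from Python whether the MKL settings above were actually
picked up at build time (a small sketch; the output format differs between
numpy versions):

import numpy

# Summarize the BLAS/LAPACK configuration numpy was built against
# (reflects what was read from site.cfg when numpy was compiled).
numpy.show_config()

# The mkl section should list mkl_lapack and the library_dirs given above;
# if it is absent, the build fell back to the bundled lapack_lite.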

Thanks!

Sue


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-05 Thread David Cournapeau
Xue (Sue) Yang wrote:
> Hi,
> 
> I followed what I collected about installation of numpy with lapack and
> atlas and installed numpy on our desktop with RHEL4 and 4 cores.
> 
>> uname -a
> 
> Linux curie.physics.usyd.edu.au 2.6.9-89.0.15.ELsmp #1 SMP Sat Oct 10
> 05:59:16 EDT 2009 i686 i686 i386 GNU/Linux
> 
> I successfully installed lapack-3.1.1, atlas3.8.0 with fortran compiler:
> gfortran, and numpy-1.3.0 with enthought-python distribution (python2.5).
> 
>> python
>>> import numpy
>>> a = numpy.random.randn(6000, 6000)
>>> numpy.dot(a, a)
> 
> Surprisingly, it only uses 2 cores instead of 4 cores.  Where and how should
> I set up the number of threads for numpy?

Atlas (at least your version, I don't know about 3.9.* series) does not 
support setting the number of threads dynamically - it is a compile time 
option. If the compile time option is indeed 4 threads, it may be that 
ATLAS decided that using 2 threads instead of 4 was more efficient.

You can find this info in atlas_buildinfo.h file (the ATL_NCPU CPP 
define). Note that you should not use atlas 3.8.0, as it has a number of 
serious bugs - you should use 3.8.3.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-05 Thread Xue (Sue) Yang
Hi,

I followed what I collected about installation of numpy with lapack and
atlas and installed numpy on our desktop with RHEL4 and 4 cores.

>uname -a

Linux curie.physics.usyd.edu.au 2.6.9-89.0.15.ELsmp #1 SMP Sat Oct 10
05:59:16 EDT 2009 i686 i686 i386 GNU/Linux

I successfully installed lapack-3.1.1, atlas3.8.0 with fortran compiler:
gfortran, and numpy-1.3.0 with enthought-python distribution (python2.5).

> python
>>import numpy
>>a = numpy.random.randn(6000, 6000)
>>numpy.dot(a, a)

Surprisingly, it only uses 2 cores instead of 4 cores.  Where and how should
I set up the number of threads for numpy?

Thanks!

Dr. Xue (Sue) Yang
School of Physics, University of Sydney
Ph: 02 9351 6081
Email: x.y...@physics.usyd.edu.au



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-07-22 Thread Jonathan Taylor
Sorry.  I meant to update this thread after I had resolved my issue.
This was indeed one problem.  I had to set LD_LIBRARY_PATH.

I also had another odd problem that I will spell out here in hopes
that I save someone some trouble.  Specifically, one should be very
sure that the path to the blas that was compiled is correct when you
configure ATLAS because it does not indicate any problems if it is
not.   Specifically, I tried compiling blas with make -j3 to get all
my cores compiling at the same time but this actually caused a failure
that I did not notice.  It did create a temp_LINUX.a file in the right
place so I configured ATLAS against that.  Alas, many of the symbols
needed were not contained in this file as BLAS had failed to compile.
This was fairly hard to debug but once I got blas recompiled properly
without the -j 3 switch I was able to follow the rest of the steps and
everything works well.

Thanks,
Jonathan.

On Sun, Jul 19, 2009 at 11:35 PM, Nicolas Pinto wrote:
> Jonathan,
>
> What does "ldd 
> /home/jtaylor/lib/python2.5/site-packages/numpy/linalg/lapack_lite.so"
> say ?
>
> You need to make sure that it's using the libraries in /usr/local/lib.
> You can remove the ones in /usr/lib or "export
> LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH".
>
> Hope it helps.
>
> Best,
>
> N
>
> On Fri, Jul 17, 2009 at 3:57 PM, Jonathan
> Taylor wrote:
>> Following these instructions I have the following problem when I
>> import numpy.  Does anyone know why this might be?
>>
>> Thanks,
>> Jonathan.
>>
> import numpy
>> Traceback (most recent call last):
>>  File "", line 1, in 
>>  File "/home/jtaylor/lib/python2.5/site-packages/numpy/__init__.py",
>> line 130, in 
>>    import add_newdocs
>>  File "/home/jtaylor/lib/python2.5/site-packages/numpy/add_newdocs.py",
>> line 9, in 
>>    from lib import add_newdoc
>>  File "/home/jtaylor/lib/python2.5/site-packages/numpy/lib/__init__.py",
>> line 13, in 
>>    from polynomial import *
>>  File "/home/jtaylor/lib/python2.5/site-packages/numpy/lib/polynomial.py",
>> line 18, in 
>>    from numpy.linalg import eigvals, lstsq
>>  File "/home/jtaylor/lib/python2.5/site-packages/numpy/linalg/__init__.py",
>> line 47, in 
>>    from linalg import *
>>  File "/home/jtaylor/lib/python2.5/site-packages/numpy/linalg/linalg.py",
>> line 22, in 
>>    from numpy.linalg import lapack_lite
>> ImportError: /usr/local/lib/libptcblas.so: undefined symbol: ATL_cpttrsm
>>
>>
>> On Sat, Jun 6, 2009 at 12:59 PM, Chris Colbert wrote:
>>> since there is demand, and someone already emailed me, I'll put what I
>>> did in this post. It pretty much follows whats on the scipy website,
>>> with a couple other things I gleaned from reading the ATLAS install
>>> guide:
>>>
>>> and here it goes, this is valid for Ubuntu 9.04 64-bit  (# starts a
>>> comment when working in the terminal)
>>>
>>>
>>> download lapack 3.2.1 http://www.netlib.org/lapack/lapack.tgz
>>> download atlas 3.8.3
>>> http://sourceforge.net/project/downloading.php?group_id=23725&filename=atlas3.8.3.tar.bz2&a=65663372
>>>
>>> create folder  /home/your-user-name/build/atlas   #this is where we build
>>> create folder /home/your-user-name/build/lapack #atlas and lapack
>>>
>>> extract the folder lapack-3.2.1 to /home/your-user-name/build/lapack
>>> extract the contents of atlas to /home/your-user-name/build/atlas
>>>
>>>
>>>
>>> now in the terminal:
>>>
>>> # remove g77 and get stuff we need
>>> sudo apt-get remove g77
>>> sudo apt-get install gfortran
>>> sudo apt-get install build-essential
>>> sudo apt-get install python-dev
>>> sudo apt-get install python-setuptools
>>> sudo easy_install nose
>>>
>>>
>>> # build lapack
>>> cd /home/your-user-name/build/lapack/lapack-3.2.1
>>> cp INSTALL/make.inc.gfortran make.inc
>>>
>>> gedit make.inc
>>> #
>>> #in the make.inc file make sure the line   OPTS = -O2 -fPIC -m64
>>> #and    NOOPTS = -O0 -fPIC -m64
>>> #the -m64 flags build 64-bit code, if you want 32-bit, simply leave
>>> #the -m64 flags out
>>> #
>>>
>>> cd SRC
>>>
>>> #this should build lapack without error
>>> make
>>>
>>>
>>>
>>> # build atlas
>>>
>>> cd /home/your-user-name/build/atlas
>>>
>>> #this is simply where we will build the atlas
>>> #libs, you can name it what you want
>>> mkdir Linux_X64SSE2
>>>
>>> cd Linux_X64SSE2
>>>
>>> #need to turn off cpu-throttling
>>> sudo cpufreq-selector -g performance
>>>
>>> #if you don't want 64bit code remove the -b 64 flag. replace the
>>> #number 2400 with your CPU frequency in MHZ
>>> #i.e. my cpu is 2.53 GHZ so i put 2530
>>> ../configure -b 64 -D c -DPentiumCPS=2400 -Fa alg -fPIC
>>> --with-netlib-lapack=/home/your-user-name/build/lapack/lapack-3.2.1/Lapack_LINUX.a
>>>
>>> #the configure step takes a bit, and should end without errors
>>>
>>>  #this takes a long time, go get some coffee, it should end without error
>>> make build
>>>
>>> #this will verify the build, also long running
>>> make check
>>>
>>> #this will test the performa

Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-07-19 Thread Nicolas Pinto
Jonathan,

What does "ldd 
/home/jtaylor/lib/python2.5/site-packages/numpy/linalg/lapack_lite.so"
say ?

You need to make sure that it's using the libraries in /usr/local/lib.
You can remove the ones in /usr/lib or "export
LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH".
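
The same check can be scripted from Python, which also shows exactly which
file gets imported (a sketch; ldd is Linux-specific and the paths are just
examples):

import subprocess
# If this import itself fails, take the .so path from the traceback instead.
from numpy.linalg import lapack_lite

# The extension module whose missing symbol triggers the ImportError.
print(lapack_lite.__file__)

# List the shared libraries it actually resolves to at run time.
subprocess.call(["ldd", lapack_lite.__file__])

# If the wrong libraries are picked up, point the loader at /usr/local/lib
# before starting Python, e.g.:
#   export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH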

Hope it helps.

Best,

N

On Fri, Jul 17, 2009 at 3:57 PM, Jonathan
Taylor wrote:
> Following these instructions I have the following problem when I
> import numpy.  Does anyone know why this might be?
>
> Thanks,
> Jonathan.
>
 import numpy
> Traceback (most recent call last):
>  File "", line 1, in 
>  File "/home/jtaylor/lib/python2.5/site-packages/numpy/__init__.py",
> line 130, in 
>    import add_newdocs
>  File "/home/jtaylor/lib/python2.5/site-packages/numpy/add_newdocs.py",
> line 9, in 
>    from lib import add_newdoc
>  File "/home/jtaylor/lib/python2.5/site-packages/numpy/lib/__init__.py",
> line 13, in 
>    from polynomial import *
>  File "/home/jtaylor/lib/python2.5/site-packages/numpy/lib/polynomial.py",
> line 18, in 
>    from numpy.linalg import eigvals, lstsq
>  File "/home/jtaylor/lib/python2.5/site-packages/numpy/linalg/__init__.py",
> line 47, in 
>    from linalg import *
>  File "/home/jtaylor/lib/python2.5/site-packages/numpy/linalg/linalg.py",
> line 22, in 
>    from numpy.linalg import lapack_lite
> ImportError: /usr/local/lib/libptcblas.so: undefined symbol: ATL_cpttrsm
>
>
> On Sat, Jun 6, 2009 at 12:59 PM, Chris Colbert wrote:
>> since there is demand, and someone already emailed me, I'll put what I
>> did in this post. It pretty much follows whats on the scipy website,
>> with a couple other things I gleaned from reading the ATLAS install
>> guide:
>>
>> and here it goes, this is valid for Ubuntu 9.04 64-bit  (# starts a
>> comment when working in the terminal)
>>
>>
>> download lapack 3.2.1 http://www.netlib.org/lapack/lapack.tgz
>> download atlas 3.8.3
>> http://sourceforge.net/project/downloading.php?group_id=23725&filename=atlas3.8.3.tar.bz2&a=65663372
>>
>> create folder  /home/your-user-name/build/atlas   #this is where we build
>> create folder /home/your-user-name/build/lapack #atlas and lapack
>>
>> extract the folder lapack-3.2.1 to /home/your-user-name/build/lapack
>> extract the contents of atlas to /home/your-user-name/build/atlas
>>
>>
>>
>> now in the terminal:
>>
>> # remove g77 and get stuff we need
>> sudo apt-get remove g77
>> sudo apt-get install gfortran
>> sudo apt-get install build-essential
>> sudo apt-get install python-dev
>> sudo apt-get install python-setuptools
>> sudo easy_install nose
>>
>>
>> # build lapack
>> cd /home/your-user-name/build/lapack/lapack-3.2.1
>> cp INSTALL/make.inc.gfortran make.inc
>>
>> gedit make.inc
>> #
>> #in the make.inc file make sure the line   OPTS = -O2 -fPIC -m64
>> #and    NOOPTS = -O0 -fPIC -m64
>> #the -m64 flags build 64-bit code, if you want 32-bit, simply leave
>> #the -m64 flags out
>> #
>>
>> cd SRC
>>
>> #this should build lapack without error
>> make
>>
>>
>>
>> # build atlas
>>
>> cd /home/your-user-name/build/atlas
>>
>> #this is simply where we will build the atlas
>> #libs, you can name it what you want
>> mkdir Linux_X64SSE2
>>
>> cd Linux_X64SSE2
>>
>> #need to turn off cpu-throttling
>> sudo cpufreq-selector -g performance
>>
>> #if you don't want 64bit code remove the -b 64 flag. replace the
>> #number 2400 with your CPU frequency in MHZ
>> #i.e. my cpu is 2.53 GHZ so i put 2530
>> ../configure -b 64 -D c -DPentiumCPS=2400 -Fa alg -fPIC
>> --with-netlib-lapack=/home/your-user-name/build/lapack/lapack-3.2.1/Lapack_LINUX.a
>>
>> #the configure step takes a bit, and should end without errors
>>
>>  #this takes a long time, go get some coffee, it should end without error
>> make build
>>
>> #this will verify the build, also long running
>> make check
>>
>> #this will test the performance of your build and give you feedback on
>> #it. your numbers should be close to the test numbers at the end
>> make time
>>
>> cd lib
>>
>> #builds single threaded .so's
>> make shared
>>
>> #builds multithreaded .so's
>> make ptshared
>>
>> #copies all of the atlas libs (and the lapack lib built with atlas)
>> #to our lib dir
>> sudo  cp  *.so  /usr/local/lib/
>>
>>
>>
>> #now we need to get and build numpy
>>
>> download numpy 1.3.0
>> http://sourceforge.net/project/downloading.php?group_id=1369&filename=numpy-1.3.0.tar.gz&a=93506515
>>
>> extract the folder numpy-1.3.0 to /home/your-user-name/build
>>
>> #in the terminal
>>
>> cd /home/your-user-name/build/numpy-1.3.0
>> cp site.cfg.example site.cfg
>>
>> gedit site.cfg
>> ###
>> # in site.cfg uncomment the following lines and make them look like these
>> [DEFAULT]
>> library_dirs = /usr/local/lib
>> include_dirs = /usr/local/include
>>
>> [blas_opt]
>> libraries = ptf77blas, ptcblas, atlas
>>
>> [lapack_opt]
>> libraries = lapack, ptf77blas, ptcblas, atlas
>> ###
>>

Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-07-17 Thread David Warde-Farley
On 17-Jul-09, at 4:20 PM, David Warde-Farley wrote:

> It doesn't look like your ATLAS is linked together properly,
> specifically fblas. What fortran compiler are you using?


> > ImportError: /usr/local/lib/libptcblas.so: undefined symbol: ATL_cpttrsm

Errr, nevermind. I seem to have very selective vision and saw that as  
'ptf77blas.so'.

Suffice it to say it's an ATLAS build problem and you seem to be doing  
everything right given the commands. You remembered ldconfig?

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-07-17 Thread David Warde-Farley
On 17-Jul-09, at 3:57 PM, Jonathan Taylor wrote:

>  File "/home/jtaylor/lib/python2.5/site-packages/numpy/linalg/ 
> __init__.py",
> line 47, in 
>from linalg import *
>  File "/home/jtaylor/lib/python2.5/site-packages/numpy/linalg/ 
> linalg.py",
> line 22, in 
>from numpy.linalg import lapack_lite
> ImportError: /usr/local/lib/libptcblas.so: undefined symbol:  
> ATL_cpttrsm

It doesn't look like your ATLAS is linked together properly,
specifically fblas. What fortran compiler are you using?

What does ldd /usr/local/lib/libptcblas.so say?

I seem to recall this sort of thing happening when g77 and gfortran  
get mixed up together...

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-07-17 Thread Jonathan Taylor
Following these instructions I have the following problem when I
import numpy.  Does anyone know why this might be?

Thanks,
Jonathan.

>>> import numpy
Traceback (most recent call last):
  File "", line 1, in 
  File "/home/jtaylor/lib/python2.5/site-packages/numpy/__init__.py",
line 130, in 
import add_newdocs
  File "/home/jtaylor/lib/python2.5/site-packages/numpy/add_newdocs.py",
line 9, in 
from lib import add_newdoc
  File "/home/jtaylor/lib/python2.5/site-packages/numpy/lib/__init__.py",
line 13, in 
from polynomial import *
  File "/home/jtaylor/lib/python2.5/site-packages/numpy/lib/polynomial.py",
line 18, in 
from numpy.linalg import eigvals, lstsq
  File "/home/jtaylor/lib/python2.5/site-packages/numpy/linalg/__init__.py",
line 47, in 
from linalg import *
  File "/home/jtaylor/lib/python2.5/site-packages/numpy/linalg/linalg.py",
line 22, in 
from numpy.linalg import lapack_lite
ImportError: /usr/local/lib/libptcblas.so: undefined symbol: ATL_cpttrsm


On Sat, Jun 6, 2009 at 12:59 PM, Chris Colbert wrote:
> since there is demand, and someone already emailed me, I'll put what I
> did in this post. It pretty much follows whats on the scipy website,
> with a couple other things I gleaned from reading the ATLAS install
> guide:
>
> and here it goes, this is valid for Ubuntu 9.04 64-bit  (# starts a
> comment when working in the terminal)
>
>
> download lapack 3.2.1 http://www.netlib.org/lapack/lapack.tgz
> download atlas 3.8.3
> http://sourceforge.net/project/downloading.php?group_id=23725&filename=atlas3.8.3.tar.bz2&a=65663372
>
> create folder  /home/your-user-name/build/atlas   #this is where we build
> create folder /home/your-user-name/build/lapack #atlas and lapack
>
> extract the folder lapack-3.2.1 to /home/your-user-name/build/lapack
> extract the contents of atlas to /home/your-user-name/build/atlas
>
>
>
> now in the terminal:
>
> # remove g77 and get stuff we need
> sudo apt-get remove g77
> sudo apt-get install gfortran
> sudo apt-get install build-essential
> sudo apt-get install python-dev
> sudo apt-get install python-setuptools
> sudo easy_install nose
>
>
> # build lapack
> cd /home/your-user-name/build/lapack/lapack-3.2.1
> cp INSTALL/make.inc.gfortran make.inc
>
> gedit make.inc
> #
> #in the make.inc file make sure the line   OPTS = -O2 -fPIC -m64
> #and    NOOPTS = -O0 -fPIC -m64
> #the -m64 flags build 64-bit code, if you want 32-bit, simply leave
> #the -m64 flags out
> #
>
> cd SRC
>
> #this should build lapack without error
> make
>
>
>
> # build atlas
>
> cd /home/your-user-name/build/atlas
>
> #this is simply where we will build the atlas
> #libs, you can name it what you want
> mkdir Linux_X64SSE2
>
> cd Linux_X64SSE2
>
> #need to turn off cpu-throttling
> sudo cpufreq-selector -g performance
>
> #if you don't want 64bit code remove the -b 64 flag. replace the
> #number 2400 with your CPU frequency in MHZ
> #i.e. my cpu is 2.53 GHZ so i put 2530
> ../configure -b 64 -D c -DPentiumCPS=2400 -Fa alg -fPIC
> --with-netlib-lapack=/home/your-user-name/build/lapack/lapack-3.2.1/Lapack_LINUX.a
>
> #the configure step takes a bit, and should end without errors
>
>  #this takes a long time, go get some coffee, it should end without error
> make build
>
> #this will verify the build, also long running
> make check
>
> #this will test the performance of your build and give you feedback on
> #it. your numbers should be close to the test numbers at the end
> make time
>
> cd lib
>
> #builds single threaded .so's
> make shared
>
> #builds multithreaded .so's
> make ptshared
>
> #copies all of the atlas libs (and the lapack lib built with atlas)
> #to our lib dir
> sudo  cp  *.so  /usr/local/lib/
>
>
>
> #now we need to get and build numpy
>
> download numpy 1.3.0
> http://sourceforge.net/project/downloading.php?group_id=1369&filename=numpy-1.3.0.tar.gz&a=93506515
>
> extract the folder numpy-1.3.0 to /home/your-user-name/build
>
> #in the terminal
>
> cd /home/your-user-name/build/numpy-1.3.0
> cp site.cfg.example site.cfg
>
> gedit site.cfg
> ###
> # in site.cfg uncomment the following lines and make them look like these
> [DEFAULT]
> library_dirs = /usr/local/lib
> include_dirs = /usr/local/include
>
> [blas_opt]
> libraries = ptf77blas, ptcblas, atlas
>
> [lapack_opt]
> libraries = lapack, ptf77blas, ptcblas, atlas
> ###
> #if you want single threaded libs, uncomment those lines instead
>
>
> #build numpy- should end without error
> python setup.py build
>
> #install numpy
> python setup.py install
>
> cd /home
>
> sudo ldconfig
>
> python
>>>import numpy
>>>numpy.test()   # this should run with no errors (skipped tests and known-fails are ok)
>>>a = numpy.random.randn(6000, 6000)
>>>numpy.dot(a, a)   # look at your cpu monitor and verify all cpu cores are at 100% if you built with threads
>
>
> Celebrate with a beer!
>
>
> Ch

Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-07-14 Thread Keith Goodman
On Sun, Jun 7, 2009 at 2:52 AM, Gabriel Beckers wrote:
> OK, perhaps I drank that beer too soon...
>
> Now, numpy.test() hangs at:
>
> test_pinv (test_defmatrix.TestProperties) ...
>
> So perhaps something is wrong with ATLAS, even though the building went
> fine, and "make check" and "make ptcheck" reported no errors.

I ran into the same problem on 32-bit debian squeeze.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-11 Thread Jason Rennie
On Mon, Jun 8, 2009 at 11:02 AM, David Cournapeau <
da...@ar.media.kyoto-u.ac.jp> wrote:

> Isn't it true for any general framework that enjoys some popularity :)


Yup :)

I think there are cases where gradient methods are not applicable
> (latent models where the complete data Y cannot be split into
> observations-hidden (O, H) variables), although I am not sure that's a
> very common case in machine learning,
>

I won't argue with that.  My bias has certainly been strongly influenced by
the type of problems I've been exposed to.  It'd be interesting to hear of a
problem where one can't separate observed/hidden variables :)

Cheers,

Jason

-- 
Jason Rennie
Research Scientist, ITA Software
617-714-2645
http://www.itasoftware.com/
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-10 Thread Benoit Jacob
Hi David,

2009/6/9 David Cournapeau :
> Hi Benoit,
>
> Benoit Jacob wrote:
>> No, because _we_ are serious about compilation times, unlike other c++
>> template libraries. But granted, compilation times are not as short as
>> a plain C library either.
>>
>
> I concede it is not as bad as the heavily templated libraries in boost.
> But C++ is just horribly slow to compile, at least with g++ - in scipy,
> half of the compilation time is spent for a couple of C++ files which
> uses simple templates. And the compiler takes a lot of memory during
> compilation (~ 300 Mb per file - that's a problem because I rely a lot
> on VM to build numpy/scipy binaries).

Well, I can't comment on other libraries that I don't know. It is true
that compilation time and memory usage in C++ templated code will
never be as low as in C compilation, and can easily go awry if the c++
programmer isn't careful. Templates are really a scripting language
for the compiler, and as in any (Turing complete) language you can
always write a program that takes a long time to "execute".

>
>> Eigen doesn't _require_ any SIMD instruction set although it can use
>> SSE / AltiVec if enabled.
>>
>
> If SSE is not enabled, my (very limited) tests show that eigen does not
> perform as well as a stock debian ATLAS on the benchmarks given by
> eigen. For example:

Of course! The whole point is that ATLAS is a binary library with its
own SSE code, so it is still able to use SSE even if your program was
compiled without SSE enabled: ATLAS will run its own platform check at
runtime.

So it's not a surprise that ATLAS with SSE is faster than Eigen without SSE.

By the way this was shown in our benchmark already:
http://eigen.tuxfamily.org/index.php?title=Benchmark
Scroll down to matrix matrix product. The gray curve "eigen2_novec" is
eigen without SSE.

>
>  g++ benchBlasGemm.cpp -I .. -lblas -O2 -DNDEBUG && ./a.out 300
> cblas: 0.034222 (0.788 GFlops/s)
> eigen : 0.0863581 (0.312 GFlops/s)
> eigen : 0.121259 (0.222 GFlops/s)

and just out of curiosity, what are the 2 eigen lines ?

>
> g++ benchBlasGemm.cpp -I .. -lblas -O2 -DNDEBUG -msse2 && ./a.out 300
> cblas: 0.035438 (0.761 GFlops/s)
> eigen : 0.0182271 (1.481 GFlops/s)
> eigen : 0.0860961 (0.313 GFlops/s)
>
> (on a PIV, which may not be very representative of current architectures)
>
>> It is true that with Eigen this is set up at build time, but this is
>> only because it is squarely _not_ Eigen's job to do runtime platform
>> checks. Eigen isn't a binary library. If you want a runtime platform
>> switch, just compile your critical Eigen code twice, one with SSE one
>> without, and do the platform check in your own app. The good thing
>> here is that Eigen makes sure that the ABI is independent of whether
>> vectorization is enabled.
>>
>
> I understand that it is not a goal of eigen, and that should be the
> application's job. It is just that MKL does it automatically, and doing
> it in a cross platform way in the context of python extensions is quite
> hard because of various linking strategies on different OS.

Yes, I understand that. MKL is not only a math library, it comes with
embedded threading library and hardware detection routines.

Cheers,
Benoit
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-10 Thread Benoit Jacob
2009/6/9 Charles R Harris :
>> >   - heavily expression-template-based C++, meaning compilation takes
>> > ages
>>
>> No, because _we_ are serious about compilation times, unlike other c++
>> template libraries. But granted, compilation times are not as short as
>> a plain C library either.
>
> I wonder if it is possible to have a compiler/parser that does nothing but
> translate templates into c? Say, something written in python ;) Name
> mangling would be a problem but could perhaps be simplified for the somewhat
> limited space needed for numpy/scipy.

That's not possible: templates are (mostly) the only thing in C++ that
can't be translated into C.

In a C++ template library, templated types are types that are not
determined by the library itself, but will be determined by the
application that uses the library. So the template library itself
can't be translated into C because there's no way in C to allow "not
yet determined" types. In some trivial cases that can be done with
macros, but C++ templates go farther than that, they are Turing
complete. A C++ type is a tree and using template expressions you can
perform any operation on these trees at compilation time. In Eigen, we
use these trees to represent arithmetic expressions like "a+b+c".
That's the paradigm known as "expression templates".
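
As a rough run-time analogy in Python (the real thing happens at compile time
through the C++ type system; the class names here are purely illustrative),
"a + b + c" builds a tree of expression objects, and nothing is computed until
the whole tree is evaluated:

import numpy as np

class Expr(object):
    def __add__(self, other):
        return Add(self, other)          # build a node instead of computing

class Leaf(Expr):
    def __init__(self, array):
        self.array = np.asarray(array, dtype=float)
    def evaluate(self):
        return self.array

class Add(Expr):
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs
    def evaluate(self):
        return self.lhs.evaluate() + self.rhs.evaluate()

a, b, c = Leaf([1, 2]), Leaf([3, 4]), Leaf([5, 6])
tree = a + b + c            # Add(Add(a, b), c): just a tree, no arithmetic yet
print(tree.evaluate())      # [  9.  12.]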

Benoit
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-09 Thread David Cournapeau
David Warde-Farley wrote:
> On 9-Jun-09, at 3:54 AM, David Cournapeau wrote:
>
>   
>> For example, what ML people call PCA is called Karhunen Loéve in  
>> signal
>> processing, and the concepts are quite similar.
>> 
>
>
> Yup. This seems to be a nice set of review notes:
>
>   http://www.ece.rutgers.edu/~orfanidi/ece525/svd.pdf
>   

This looks indeed like a very nice review from a signal processing
approach. I never took the time to understand the
similarities/differences/connections between traditional SP approaches
and the machine learning approach. I wonder if the subspaces methods ala
PENCIL/MUSIC and co have a (useful) interpretation in a more ML
approach, I never really thought about it. I guess other people had :)

> And going further than just PCA/KLT, tying it together with maximum  
> likelihood factor analysis/linear dynamical systems/hidden Markov  
> models,
>
>   http://www.cs.toronto.edu/~roweis/papers/NC110201.pdf
>   

As much as I like this paper, I always felt that you miss a lot of
insights when considering PCA only from a purely statistical POV. I
really like the consideration of PCA within a function approximation POV
(the chapter 9 of the Mallat book on wavelet is cristal clear, for
example, and it is based on all those cool functional spaces theory
likes Besov space).

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-09 Thread David Cournapeau
Hi Benoit,

Benoit Jacob wrote:
> No, because _we_ are serious about compilation times, unlike other c++
> template libraries. But granted, compilation times are not as short as
> a plain C library either.
>   

I concede it is not as bad as the heavily templated libraries in boost.
But C++ is just horribly slow to compile, at least with g++ - in scipy,
half of the compilation time is spent for a couple of C++ files which
uses simple templates. And the compiler takes a lot of memory during
compilation (~ 300 Mb per file - that's a problem because I rely a lot
on VM to build numpy/scipy binaries).

> Eigen doesn't _require_ any SIMD instruction set although it can use
> SSE / AltiVec if enabled.
>   

If SSE is not enabled, my (very limited) tests show that eigen does not
perform as well as a stock debian ATLAS on the benchmarks given by
eigen. For example:

 g++ benchBlasGemm.cpp -I .. -lblas -O2 -DNDEBUG && ./a.out 300
cblas: 0.034222 (0.788 GFlops/s)
eigen : 0.0863581 (0.312 GFlops/s)
eigen : 0.121259 (0.222 GFlops/s)

g++ benchBlasGemm.cpp -I .. -lblas -O2 -DNDEBUG -msse2 && ./a.out 300
cblas: 0.035438 (0.761 GFlops/s)
eigen : 0.0182271 (1.481 GFlops/s)
eigen : 0.0860961 (0.313 GFlops/s)

(on a PIV, which may not be very representative of current architectures)
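
For reference, a rough sketch of producing the same kind of GFlops/s figure
for numpy.dot (counting 2*n**3 floating point operations for an n x n product;
the eigen benchmark may use a different flop count, so the absolute numbers
are not directly comparable):

import time
import numpy as np

n = 1000
a = np.random.randn(n, n)
b = np.random.randn(n, n)

t0 = time.time()
np.dot(a, b)
elapsed = time.time() - t0
print("%.4f s  (%.3f GFlops/s)" % (elapsed, 2.0 * n ** 3 / elapsed / 1e9))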

> It is true that with Eigen this is set up at build time, but this is
> only because it is squarely _not_ Eigen's job to do runtime platform
> checks. Eigen isn't a binary library. If you want a runtime platform
> switch, just compile your critical Eigen code twice, one with SSE one
> without, and do the platform check in your own app. The good thing
> here is that Eigen makes sure that the ABI is independent of whether
> vectorization is enabled.
>   

I understand that it is not a goal of eigen, and that should be the
application's job. It is just that MKL does it automatically, and doing
it in a cross platform way in the context of python extensions is quite
hard because of various linking strategies on different OS.

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-09 Thread Charles R Harris
On Tue, Jun 9, 2009 at 7:46 PM, Benoit Jacob wrote:

> Hi,
>
> I'm one of the Eigen developers and was pointed to your discussion. I
> just want to clarify a few things for future reference (not trying to
> get you to use Eigen):
>
> > No, eigen does not provide a (complete) BLAS/LAPACK interface.
>
> True,
>
> > I don't know if that's even a goal of eigen
>
> Not a goal indeed, though there's agreement that such a bridge would
> be a nice add-on. (Would be a one-directional bridge though. You can't
> express with BLAS/LAPACK all what you can express with the Eigen API).
>
> > (it started as a project for KDE, to support high performance core
> > computations for things like spreadsheet and co).
>
> Yes, that's how it started 3 years ago. A lot changed since, though. See
> http://eigen.tuxfamily.org/index.php?title=Main_Page#Projects_using_Eigen
>
> > Eigen is:
> >   - not mature.
>
> Fair enough
>
> >   - heavily expression-template-based C++, meaning compilation takes ages
>
> No, because _we_ are serious about compilation times, unlike other c++
> template libraries. But granted, compilation times are not as short as
> a plain C library either.
>

I wonder if it is possible to have a compiler/parser that does nothing but
translate templates into c? Say, something written in python ;) Name
mangling would be a problem but could perhaps be simplified for the somewhat
limited space needed for numpy/scipy.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-09 Thread Benoit Jacob
Hi,

I'm one of the Eigen developers and was pointed to your discussion. I
just want to clarify a few things for future reference (not trying to
get you to use Eigen):

> No, eigen does not provide a (complete) BLAS/LAPACK interface.

True,

> I don't know if that's even a goal of eigen

Not a goal indeed, though there's agreement that such a bridge would
be a nice add-on. (Would be a one-directional bridge though. You can't
express with BLAS/LAPACK all what you can express with the Eigen API).

> (it started as a project for KDE, to support high performance core
> computations for things like spreadsheet and co).

Yes, that's how it started 3 years ago. A lot changed since, though. See
http://eigen.tuxfamily.org/index.php?title=Main_Page#Projects_using_Eigen

> Eigen is:
>   - not mature.

Fair enough

>   - heavily expression-template-based C++, meaning compilation takes ages

No, because _we_ are serious about compilation times, unlike other c++
template libraries. But granted, compilation times are not as short as
a plain C library either.

> + esoteric, impossible to decypher compilation errors.

Try it ;) See e.g. this comment:
http://www.macresearch.org/interview-eigen-matrix-library#comment-14667

>  - SSE dependency harcoded, since it is setup at build time. That's
> going backward IMHO - I would rather see a numpy/scipy which can load
> the optimized code at runtime.

Eigen doesn't _require_ any SIMD instruction set although it can use
SSE / AltiVec if enabled.

It is true that with Eigen this is set up at build time, but this is
only because it is squarely _not_ Eigen's job to do runtime platform
checks. Eigen isn't a binary library. If you want a runtime platform
switch, just compile your critical Eigen code twice, one with SSE one
without, and do the platform check in your own app. The good thing
here is that Eigen makes sure that the ABI is independent of whether
vectorization is enabled.

And to reply to Matthieu's mail:

> I would add that it relies on C++ compiler extensions (the restrict
> keyword) as does blitz. You unfortunately can't expect every compiler
> to support it unless the C++ committee finally adds it to the
> standard.

currently we have:

#define EIGEN_RESTRICT __restrict

This is ready to be replaced by an empty symbol if some compiler
doesn't support restrict. The only reason why we didn't do this is
that all compilers we've encountered so far support restrict.

Cheers,
Benoit
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-09 Thread David Warde-Farley
On 9-Jun-09, at 3:54 AM, David Cournapeau wrote:

> For example, what ML people call PCA is called Karhunen Loéve in  
> signal
> processing, and the concepts are quite similar.


Yup. This seems to be a nice set of review notes:

http://www.ece.rutgers.edu/~orfanidi/ece525/svd.pdf

And going further than just PCA/KLT, tying it together with maximum  
likelihood factor analysis/linear dynamical systems/hidden Markov  
models,

http://www.cs.toronto.edu/~roweis/papers/NC110201.pdf

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-09 Thread Stéfan van der Walt
2009/6/9 David Cournapeau :
> Anyway, the book from Bishop is a pretty good reference by one of the
> leading researcher:
>
> http://research.microsoft.com/en-us/um/people/cmbishop/prml/
>
> It can be read without much background besides basic 1st year
> calculus/linear algebra.

Bishop's book could be confusing at times, so I would also recommend
going back to the original papers.  It is sometimes easier to learn
*with* researchers than from them!

Cheers
Stéfan
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-09 Thread David Cournapeau
David Cournapeau wrote:
>
> I think the biggest problem is the 'babel tower' aspect of machine
> learning (the expression is from David H. Wolpert I believe), and
> practitioners in different subfields often use totally different words
> for more or less the same concepts (and many keep being rediscovered).
> For example, what ML people call PCA is called Karhunen Loéve in signal
> processing, and the concepts are quite similar.
>
> Anyway, the book from Bishop is a pretty good reference by one of the
> leading researcher:
>
> http://research.microsoft.com/en-us/um/people/cmbishop/prml/
>   

Should have mentioned that it is the same Bishop as mentioned by
Matthieu, and that chapter 12 deals with latent models with continuous
latent variable, which is one way to consider PCA in a probabilistic
framework.

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-09 Thread David Cournapeau
Robin wrote:
> On Mon, Jun 8, 2009 at 7:14 PM, David Warde-Farley wrote:
>   
>> On 8-Jun-09, at 8:33 AM, Jason Rennie wrote:
>>
>> Note that EM can be very slow to converge:
>>
>> That's absolutely true, but EM for PCA can be a life saver in cases where
>> diagonalizing (or even computing) the full covariance matrix is not a
>> realistic option. Diagonalization can be a lot of wasted effort if all you
>> care about are a few leading eigenvectors. EM also lets you deal with
>> missing values in a principled way, which I don't think you can do with
>> standard SVD.
>>
>> EM certainly isn't a magic bullet but there are circumstances where it's
>> appropriate. I'm a big fan of the ECG paper too. :)
>> 
>
> Hi,
>
> I've been following this with interest... although I'm not really
> familiar with the area. At the risk of drifting further off topic I
> wondered if anyone could recommend an accessible review of these kind
> of dimensionality reduction techniques... I am familiar with PCA and
> know of diffusion maps and ICA and others, but I'd never heard of EM
> and I don't really have any idea how they relate to each other and
> which might be better for one job or the other... so some sort of
> primer would be really handy.
>   

I think the biggest problem is the 'babel tower' aspect of machine
learning (the expression is from David H. Wolpert I believe), and
practitioners in different subfields often use totally different words
for more or less the same concepts (and many keep being rediscovered).
For example, what ML people call PCA is called Karhunen Loéve in signal
processing, and the concepts are quite similar.

Anyway, the book from Bishop is a pretty good reference by one of the
leading researcher:

http://research.microsoft.com/en-us/um/people/cmbishop/prml/

It can be read without much background besides basic 1st year
calculus/linear algebra.

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-09 Thread Matthieu Brucher
2009/6/9 Robin :
> On Mon, Jun 8, 2009 at 7:14 PM, David Warde-Farley wrote:
>>
>> On 8-Jun-09, at 8:33 AM, Jason Rennie wrote:
>>
>> Note that EM can be very slow to converge:
>>
>> That's absolutely true, but EM for PCA can be a life saver in cases where
>> diagonalizing (or even computing) the full covariance matrix is not a
>> realistic option. Diagonalization can be a lot of wasted effort if all you
>> care about are a few leading eigenvectors. EM also lets you deal with
>> missing values in a principled way, which I don't think you can do with
>> standard SVD.
>>
>> EM certainly isn't a magic bullet but there are circumstances where it's
>> appropriate. I'm a big fan of the ECG paper too. :)
>
> Hi,
>
> I've been following this with interest... although I'm not really
> familiar with the area. At the risk of drifting further off topic I
> wondered if anyone could recommend an accessible review of these kind
> of dimensionality reduction techniques... I am familiar with PCA and
> know of diffusion maps and ICA and others, but I'd never heard of EM
> and I don't really have any idea how they relate to each other and
> which might be better for one job or the other... so some sort of
> primer would be really handy.

Hi,

Check Ch. Bishop publication on Probabilistic Principal Components
Analysis, you have there the parallel between the two (EM is in fact
just a way of computing PPCA, and with some Gaussian assumptions, you
get PCA).

Matthieu
-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-09 Thread Robin
On Mon, Jun 8, 2009 at 7:14 PM, David Warde-Farley wrote:
>
> On 8-Jun-09, at 8:33 AM, Jason Rennie wrote:
>
> Note that EM can be very slow to converge:
>
> That's absolutely true, but EM for PCA can be a life saver in cases where
> diagonalizing (or even computing) the full covariance matrix is not a
> realistic option. Diagonalization can be a lot of wasted effort if all you
> care about are a few leading eigenvectors. EM also lets you deal with
> missing values in a principled way, which I don't think you can do with
> standard SVD.
>
> EM certainly isn't a magic bullet but there are circumstances where it's
> appropriate. I'm a big fan of the ECG paper too. :)

Hi,

I've been following this with interest... although I'm not really
familiar with the area. At the risk of drifting further off topic I
wondered if anyone could recommend an accessible review of these kind
of dimensionality reduction techniques... I am familiar with PCA and
know of diffusion maps and ICA and others, but I'd never heard of EM
and I don't really have any idea how they relate to each other and
which might be better for one job or the other... so some sort of
primer would be really handy.

Cheers

Robin
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-08 Thread David Warde-Farley
On 8-Jun-09, at 8:33 AM, Jason Rennie wrote:

> Note that EM can be very slow to converge:

That's absolutely true, but EM for PCA can be a life saver in cases where
diagonalizing (or even computing) the full covariance matrix is not a
realistic option. Diagonalization can be a lot of wasted effort if all you
care about are a few leading eigenvectors. EM also lets you deal with
missing values in a principled way, which I don't think you can do with
standard SVD.

EM certainly isn't a magic bullet but there are circumstances where it's
appropriate. I'm a big fan of the ECG paper too. :)

David
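
A minimal sketch of the EM iteration for PCA described above (following
Roweis' "EM algorithms for PCA and SPCA"; this version assumes centered data
and no missing values, and only recovers the span of the leading
eigenvectors):

import numpy as np

def em_pca(Y, k, n_iter=100):
    # Y: (d, n) centered data matrix; returns a (d, k) orthonormal basis of
    # the top-k principal subspace, without forming the d x d covariance.
    d, n = Y.shape
    W = np.random.randn(d, k)
    for _ in range(n_iter):
        # E-step: coordinates of the data in the current subspace
        X = np.linalg.solve(W.T.dot(W), W.T.dot(Y))       # (k, n)
        # M-step: re-estimate the subspace from those coordinates
        W = Y.dot(X.T).dot(np.linalg.inv(X.dot(X.T)))     # (d, k)
    Q, _ = np.linalg.qr(W)        # orthonormalize for convenience
    return Q

Y = np.random.randn(10, 50)
Y -= Y.mean(axis=1)[:, np.newaxis]   # center each dimension
basis = em_pca(Y, 3)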
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-08 Thread David Cournapeau
Jason Rennie wrote:
>
> I hung-out in the machine learning community appx. 1999-2007 and
> thought the Salakhutdinov work was extremely refreshing to see after
> listening to no end of papers applying EM to whatever was the hot
> topic at the time. :)

Isn't that true of any general framework that enjoys some popularity :)

>   I've certainly seen/heard about various fixes to EM, but I haven't
> seen convincing reason(s) to prefer it over proper gradient
> descent/hill climbing algorithms (besides its present-ability and ease
> of implementation).

I think there are cases where gradient methods are not applicable
(latent models where the complete data Y cannot be split into observed
and hidden (O, H) variables), although I am not sure that's a very
common case in machine learning.

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-08 Thread Jason Rennie
On Mon, Jun 8, 2009 at 8:55 AM, David Cournapeau <
da...@ar.media.kyoto-u.ac.jp> wrote:

> I think it depends on what you are doing - EM is used for 'real' work
> too, after all :)


Certainly, but EM is really just a mediocre gradient descent/hill climbing
algorithm that is relatively easy to implement.

Thanks for the link, I was not aware of this work. What is the
> difference between the ECG method and the method proposed by Lange in
> [1] ? To avoid 'local trapping' of the parameter in EM methods,
> recursive EM [2] may also be a promising method, also it seems to me
> that it has not been used so much, but I may well be wrong (I have seen
> several people using a simplified version of it without much theoretical
> consideration in speech processing).


I hung-out in the machine learning community appx. 1999-2007 and thought the
Salakhutdinov work was extremely refreshing to see after listening to no end
of papers applying EM to whatever was the hot topic at the time. :)  I've
certainly seen/heard about various fixes to EM, but I haven't seen
convincing reason(s) to prefer it over proper gradient descent/hill climbing
algorithms (besides its present-ability and ease of implementation).

Cheers,

Jason

-- 
Jason Rennie
Research Scientist, ITA Software
617-714-2645
http://www.itasoftware.com/
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-08 Thread Gael Varoquaux
On Mon, Jun 08, 2009 at 06:28:06AM -0700, Keith Goodman wrote:
> On Mon, Jun 8, 2009 at 6:17 AM, Gael Varoquaux
>  wrote:
> > On Mon, Jun 08, 2009 at 09:02:12AM -0400, josef.p...@gmail.com wrote:
> >> whats the actual shape of the array/data you run your PCA on.

> > 50 000 dimensions, 820 datapoints.

> Have you tried shuffling each time series, performing PCA, looking at
> the magnitude of the largest eigenvalue, then repeating many times?
> That will give you an idea of how large the noise can be. Then you can
> see how many eigenvectors of the unshuffled data have eigenvalues
> greater than the noise. It would be kind of the empirical approach to
> random matrix theory.

Yes, that's the kind of thing that is done in the paper I pointed out,
and what I use to infer the number of PCs I retain.

Gaël
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-08 Thread Keith Goodman
On Mon, Jun 8, 2009 at 6:17 AM, Gael Varoquaux
 wrote:
> On Mon, Jun 08, 2009 at 09:02:12AM -0400, josef.p...@gmail.com wrote:
>> whats the actual shape of the array/data you run your PCA on.
>
> 50 000 dimensions, 820 datapoints.

Have you tried shuffling each time series, performing PCA, looking at
the magnitude of the largest eigenvalue, then repeating many times?
That will give you an idea of how large the noise can be. Then you can
see how many eigenvectors of the unshuffled data have eigenvalues
greater than the noise. It would be kind of the empirical approach to
random matrix theory.
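
For concreteness, a rough sketch of that shuffle test (assuming the data
sit in an (n_timepoints, n_dims) array; the function and variable names
here are illustrative only):

import numpy as np

def largest_sv(data):
    # Largest singular value of the mean-centered data; its square is,
    # up to a constant, the largest eigenvalue of the covariance matrix.
    centered = data - data.mean(axis=0)
    return np.linalg.svd(centered, compute_uv=False)[0]

def noise_threshold(data, n_shuffles=20):
    # Shuffle each time series (each column) independently in time to
    # destroy cross-correlations, recording the largest singular value
    # of the shuffled data each time.
    null = np.empty(n_shuffles)
    shuffled = data.copy()
    n = data.shape[0]
    for i in range(n_shuffles):
        for j in range(shuffled.shape[1]):
            shuffled[:, j] = data[np.random.permutation(n), j]
        null[i] = largest_sv(shuffled)
    return null.max()

# Keep the components whose singular values beat the shuffled maximum:
# sv = np.linalg.svd(data - data.mean(axis=0), compute_uv=False)
# n_keep = (sv > noise_threshold(data)).sum()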
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-08 Thread Matthieu Brucher
2009/6/8 Gael Varoquaux :
> On Mon, Jun 08, 2009 at 09:02:12AM -0400, josef.p...@gmail.com wrote:
>> whats the actual shape of the array/data you run your PCA on.
>
> 50 000 dimensions, 820 datapoints.

You definitely can't expect to find 50 meaningful PCs. It's
impossible to robustly get them with fewer than a thousand points!

>> Number of time periods, size of cross section at point in time?
>
> I am not sure what the question means. The data is sampled at 0.5Hz.
>
> Gaël
> ___
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-08 Thread Gael Varoquaux
On Mon, Jun 08, 2009 at 09:02:12AM -0400, josef.p...@gmail.com wrote:
> whats the actual shape of the array/data you run your PCA on.

50 000 dimensions, 820 datapoints.

> Number of time periods, size of cross section at point in time?

I am not sure what the question means. The data is sampled at 0.5Hz.

Gaël
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-08 Thread David Cournapeau
Jason Rennie wrote:
> Note that EM can be very slow to converge:
>
> http://www.cs.toronto.edu/~roweis/papers/emecgicml03.pdf
> 
>
> EM is great for churning-out papers, not so great for getting real
> work done.

I think it depends on what you are doing - EM is used for 'real' work
too, after all :)

> Conjugate gradient is a much better tool, at least in my (and
> Salakhutdinov's) experience.

Thanks for the link, I was not aware of this work. What is the
difference between the ECG method and the method proposed by Lange in
[1] ? To avoid 'local trapping' of the parameter in EM methods,
recursive EM [2] may also be a promising method, although it seems to me
that it has not been used much, but I may well be wrong (I have seen
several people using a simplified version of it without much theoretical
consideration in speech processing).

cheers,

David

[1] "A gradient algorithm locally equivalent to the EM algorithm", in
Journal of the Royal Statistical Society. Series B. Methodological,
1995, vol. 57, n^o 2, pp. 425-437
[2] "Online EM Algorithm for Latent Data Models", by: Olivier Cappe;,
Eric Moulines, in the Journal of the Royal Statistical Society Series B
(February 2009).
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-08 Thread josef . pktd
On Mon, Jun 8, 2009 at 3:29 AM, Gael Varoquaux
 wrote:
> On Mon, Jun 08, 2009 at 08:58:29AM +0200, Matthieu Brucher wrote:
>> Given the number of PCs, I think you may just be measuring noise.
>> As said in several manifold reduction publications (as the ones by
>> Torbjorn Vik who published on robust PCA for medical imaging), you
>> cannot expect to have more than 4 or 5 meaningful PCs, due to the
>> dimensionality curse. If you want 50 PCs, you have to have at least...
>> 10^50 samples, which is quite a lot, let's say it this way.
>> According to the literature, a usual manifold can be described by 4
>> or 5 variables. If you have more, it is that you may be infringing
>> your hypothesis, here the linearity of your data (and as it is medical
>> imaging, you know from the beginning that this hypothesis is wrong).
>> So if you really want to find something meaningful and/or physical,
>> you should use a real dimensionality reduction, preferably a
>> non-linear one.
>
> I am not sure I am following you: I have time-varying signals. I am not
> taking a shot of the same process over and over again. My intuition tells
> me that I have more than 5 meaningful patterns.
>
> Anyhow, I do some more analysis behind that (ICA actually), and I do find
> more than 5 patterns of interest that are not noise.

Just curious:
what's the actual shape of the array/data you run your PCA on?
Number of time periods, size of cross section at point in time?

Josef
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-08 Thread Gael Varoquaux
On Mon, Jun 08, 2009 at 08:33:11AM -0400, Jason Rennie wrote:
>EM is great for churning-out papers, not so great for getting real work
>done.

That's just what I thought.

>Btw, have you considered how much the Gaussianity assumption is
>hurting you?

I have. And the answer is: not much. But then, my order-selection method
is just about selecting the non-Gaussian components. And the
non-orthogonality of the interesting 'independent' signals is small, in
that subspace.

Gaël
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-08 Thread Jason Rennie
Note that EM can be very slow to converge:

http://www.cs.toronto.edu/~roweis/papers/emecgicml03.pdf

EM is great for churning-out papers, not so great for getting real work
done.  Conjugate gradient is a much better tool, at least in my (and
Salakhutdinov's) experience.  Btw, have you considered how much the
Gaussianity assumption is hurting you?

Jason

On Mon, Jun 8, 2009 at 1:17 AM, David Cournapeau <
da...@ar.media.kyoto-u.ac.jp> wrote:

> Gael Varoquaux wrote:
> > I am using the heuristic exposed in
> > http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4562996
> >
> > We have very noisy and long time series. My experience is that most
> > model-based heuristics for choosing the number of PCs retained give us
> > way too much on this problem (they simply keep diverging if I add noise
> > at the end of the time series). The algorithm we use gives us ~50
> > interesting PCs (each composed of 50 000 dimensions). That happens to be
> > quite right based on our experience with the signal. However, being
> > fairly new to statistics, I am not aware of the EM algorithm that you
> > mention. I'd be interested in a reference, to see if I can use that
> > algorithm.
>
> I would not be surprised if David had this paper in mind :)
>
> http://www.cs.toronto.edu/~roweis/papers/empca.pdf
>
> cheers,
>
> David
> ___
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
Jason Rennie
Research Scientist, ITA Software
617-714-2645
http://www.itasoftware.com/
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-08 Thread Matthieu Brucher
2009/6/8 David Warde-Farley :
>
> On 8-Jun-09, at 1:17 AM, David Cournapeau wrote:
>
>> I would not be surprised if David had this paper in mind :)
>>
>> http://www.cs.toronto.edu/~roweis/papers/empca.pdf
>
> Right you are :)
>
> There is a slight trick to it, though, in that it won't produce an
> orthogonal basis on its own, just something that spans that principal
> subspace. So you typically have to at least extract the first PC
> independently to uniquely orient your basis. You can then either
> subtract off the projection of the data on the 1st PC and find the
> next one, one at a time, or extract a spanning set all at once and
> orthogonalize with respect to the first PC.
>
> David

Also Ch. Bishop has an article on using EM for PCA, Probabilistic
Principal Components Analysis where I think he proves the equivalence
as well.

Matthieu
-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-08 Thread Matthieu Brucher
2009/6/8 Gael Varoquaux :
> On Mon, Jun 08, 2009 at 08:58:29AM +0200, Matthieu Brucher wrote:
>> Given the number of PCs, I think you may just be measuring noise.
>> As said in several manifold reduction publications (as the ones by
>> Torbjorn Vik who published on robust PCA for medical imaging), you
>> cannot expect to have more than 4 or 5 meaningful PCs, due to the
>> dimensionality curse. If you want 50 PCs, you have to have at least...
>> 10^50 samples, which is quite a lot, let's say it this way.
>> According to the literature, a usual manifold can be described by 4
>> or 5 variables. If you have more, it is that you may be infringing
>> your hypothesis, here the linearity of your data (and as it is medical
>> imaging, you know from the beginning that this hypothesis is wrong).
>> So if you really want to find something meaningful and/or physical,
>> you should use a real dimensionality reduction, preferably a
>> non-linear one.
>
> I am not sure I am following you: I have time-varying signals. I am not
> taking a shot of the same process over and over again. My intuition tells
> me that I have more than 5 meaningful patterns.

How many samples do you have? 1? a million? a billion? The problem
with 50 PCs is that your search space is mostly empty, "thanks" to the
curse of dimensionality. This means that you *should* not try to get a
meaning for the 10th and following PCs.

> Anyhow, I do some more analysis behind that (ICA actually), and I do find
> more than 5 patterns of interest that are not noise.

ICA suffers from the same problems as PCA. And I'm not even talking
about the linearity hypothesis, which is never respected.

> So maybe I should be using some non-linear dimensionality reduction, but
> what I am doing works, and I can write a generative model of it. Most
> importantly, it is actually quite computationaly simple.

Thanks to linearity ;)
The problem is that you will have a lot of confounds this way (your 50
PCs can in fact be the effect of 5 variables that are nonlinear).

> However, if you can point me to methods that you believe are better (and
> tell me why you believe so), I am all ears.

My thesis was on nonlinear dimensionality reduction (this is why I
believe so, especially in the medical imaging field), but it always
needs some adaptation. It depends on what you want to do, the time you
can spend processing data, ... Suffice it to say we started with PCA some
years ago and we were switching to nonlinear reduction because of the
emptiness of the search space and because of the nonlinearity of the
brain space (no idea what my former lab is doing now, but it is used
for DTI at least).
You should check some books on it, and you surely have to read
something about the curse of dimensionality (at least if you want to
get published, as people know about this issue in the medical field),
even if you do not use nonlinear techniques.

Matthieu
-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-08 Thread Gael Varoquaux
On Mon, Jun 08, 2009 at 08:58:29AM +0200, Matthieu Brucher wrote:
> Given the number of PCs, I think you may just be measuring noise.
> As said in several manifold reduction publications (as the ones by
> Torbjorn Vik who published on robust PCA for medical imaging), you
> cannot expect to have more than 4 or 5 meaningful PCs, due to the
> dimensionality curse. If you want 50 PCs, you have to have at least...
> 10^50 samples, which is quite a lot, let's say it this way.
> According to the literature, a usual manifold can be described by 4
> or 5 variables. If you have more, it is that you may be infringing
> your hypothesis, here the linearity of your data (and as it is medical
> imaging, you know from the beginning that this hypothesis is wrong).
> So if you really want to find something meaningful and/or physical,
> you should use a real dimensionality reduction, preferably a
> non-linear one.

I am not sure I am following you: I have time-varying signals. I am not
taking a shot of the same process over and over again. My intuition tells
me that I have more than 5 meaningful patterns.

Anyhow, I do some more analysis behind that (ICA actually), and I do find
more than 5 patterns of interest that are not noise.

So maybe I should be using some non-linear dimensionality reduction, but
what I am doing works, and I can write a generative model of it. Most
importantly, it is actually quite computationaly simple.

However, if you can point me to methods that you believe are better (and
tell me why you believe so), I am all ears.

Gaël
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-08 Thread David Warde-Farley

On 8-Jun-09, at 1:17 AM, David Cournapeau wrote:

> I would not be surprised if David had this paper in mind :)
>
> http://www.cs.toronto.edu/~roweis/papers/empca.pdf

Right you are :)

There is a slight trick to it, though, in that it won't produce an  
orthogonal basis on its own, just something that spans that principal  
subspace. So you typically have to at least extract the first PC  
independently to uniquely orient your basis. You can then either  
subtract off the projection of the data on the 1st PC and find the  
next one, one at a time, or extract a spanning set all at once and
orthogonalize with respect to the first PC.
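
For what it's worth, a minimal sketch of that EM iteration as described in
the Roweis paper (centered data, no missing values; the variable names are
illustrative, not the paper's notation):

import numpy as np

def em_pca(Y, k, n_iter=50):
    # Y: (d, n) array of mean-centered data; returns an orthonormal
    # (d, k) basis spanning (approximately) the leading principal subspace.
    d, n = Y.shape
    W = np.random.randn(d, k)
    for _ in range(n_iter):
        # E-step: coordinates of the data in the current subspace.
        X = np.linalg.solve(np.dot(W.T, W), np.dot(W.T, Y))         # (k, n)
        # M-step: re-fit the subspace to those coordinates.
        W = np.dot(np.dot(Y, X.T), np.linalg.inv(np.dot(X, X.T)))   # (d, k)
    # W only spans the subspace; orthonormalize it (QR), then orient and
    # order the basis as described above (e.g. by projected variance).
    Q, r = np.linalg.qr(W)
    return Q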

David


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-07 Thread Matthieu Brucher
2009/6/8 Gael Varoquaux :
> On Mon, Jun 08, 2009 at 12:29:08AM -0400, David Warde-Farley wrote:
>> On 7-Jun-09, at 6:12 AM, Gael Varoquaux wrote:
>
>> > Well, I do bootstrapping of PCAs, that is SVDs. I can tell you, it
>> > makes
>> > a big difference, especially since I have 8 cores.
>
>> Just curious Gael: how many PC's are you retaining? Have you tried
>> iterative methods (i.e. the EM algorithm for PCA)?
>
> I am using the heuristic exposed in
> http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4562996
>
> We have very noisy and long time series. My experience is that most
> model-based heuristics for choosing the number of PCs retained give us
> way too much on this problem (they simply keep diverging if I add noise
> at the end of the time series). The algorithm we use gives us ~50
> interesting PCs (each composed of 50 000 dimensions). That happens to be
> quite right based on our experience with the signal. However, being
> fairly new to statistics, I am not aware of the EM algorithm that you
> mention. I'd be interested in a reference, to see if I can use that
> algorithm. The PCA bootstrap is time-consuming.

Hi,

Given the number of PCs, I think you may just be measuring noise.
As said in several manifold reduction publications (as the ones by
Torbjorn Vik who published on robust PCA for medical imaging), you
cannot expect to have more than 4 or 5 meaningful PCs, due to the
dimensionality curse. If you want 50 PCs, you have to have at least...
10^50 samples, which is quite a lot, let's say it this way.
According to the literature, a usual manifold can be described by 4
or 5 variables. If you have more, it is that you may be infringing
your hypothesis, here the linearity of your data (and as it is medical
imaging, you know from the beginning that this hypothesis is wrong).
So if you really want to find something meaningful and/or physical,
you should use a real dimensionality reduction, preferably a
non-linear one.

Just my 2 cents ;)

Matthieu
-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-07 Thread Gael Varoquaux
On Mon, Jun 08, 2009 at 02:17:45PM +0900, David Cournapeau wrote:
> > However, being fairly new to statistics, I am not aware of the EM
> > algorithm that you mention. I'd be interested in a reference, to see
> > if I can use that algorithm. 

> I would not be surprised if David had this paper in mind :)

> http://www.cs.toronto.edu/~roweis/papers/empca.pdf

Excellent. Thanks to the Davids. I'll read that through.

Gaël
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-07 Thread David Cournapeau
Gael Varoquaux wrote:
> I am using the heuristic exposed in
> http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4562996
>
> We have very noisy and long time series. My experience is that most
> model-based heuristics for choosing the number of PCs retained give us
> way too much on this problem (they simply keep diverging if I add noise
> at the end of the time series). The algorithm we use gives us ~50
> interesting PCs (each composed of 50 000 dimensions). That happens to be
> quite right based on our experience with the signal. However, being
> fairly new to statistics, I am not aware of the EM algorithm that you
> mention. I'd be interested in a reference, to see if I can use that
> algorithm. 

I would not be surprised if David had this paper in mind :)

http://www.cs.toronto.edu/~roweis/papers/empca.pdf

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-07 Thread Gael Varoquaux
On Mon, Jun 08, 2009 at 12:29:08AM -0400, David Warde-Farley wrote:
> On 7-Jun-09, at 6:12 AM, Gael Varoquaux wrote:

> > Well, I do bootstrapping of PCAs, that is SVDs. I can tell you, it  
> > makes
> > a big difference, especially since I have 8 cores.

> Just curious Gael: how many PC's are you retaining? Have you tried  
> iterative methods (i.e. the EM algorithm for PCA)?

I am using the heuristic exposed in
http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4562996

We have very noisy and long time series. My experience is that most
model-based heuristics for choosing the number of PCs retained give us
way too much on this problem (they simply keep diverging if I add noise
at the end of the time series). The algorithm we use gives us ~50
interesting PCs (each composed of 50 000 dimensions). That happens to be
quite right based on our experience with the signal. However, being
fairly new to statistics, I am not aware of the EM algorithm that you
mention. I'd be interested in a reference, to see if I can use that
algorithm. The PCA bootstrap is time-consuming.

Thanks,

Gaël 
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-07 Thread David Warde-Farley
On 7-Jun-09, at 6:12 AM, Gael Varoquaux wrote:

> Well, I do bootstrapping of PCAs, that is SVDs. I can tell you, it  
> makes
> a big difference, especially since I have 8 cores.

Just curious Gael: how many PC's are you retaining? Have you tried  
iterative methods (i.e. the EM algorithm for PCA)?

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-07 Thread David Cournapeau
Gabriel Beckers wrote:
> On Sun, 2009-06-07 at 18:37 +0900, David Cournapeau wrote:
>   
>> That's why compiling atlas by yourself is hard, and I generally advise
>> against it: there is nothing intrinsically hard about it, but you need
>> to know a lot of small details and platform oddities to get it right
>> every time. That's just a waste of time in most cases IMHO, unless all
>> you do with numpy is inverting big matrices,
>> 
>
> I have been trying intel mkl and icc compiler instead, with no luck. I
> run into the same problem during setup as reported here:
>
> http://www.mail-archive.com/numpy-discussion@scipy.org/msg16595.html
>   

See #1131 on numpy tracker - it has nothing to do with icc/mkl per-se.

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-07 Thread Chris Colbert
when i had problems building atlas in the past (i.e. numpy.test()
failed) it was a problem with my lapack build, not atlas. The netlib
website gives instructions for building the lapack test suite. I
suggest you do that and run the tests on lapack and make sure
everything is kosher.

Chris

On Sun, Jun 7, 2009 at 5:52 AM, Gabriel Beckers wrote:
> OK, perhaps I drank that beer too soon...
>
> Now, numpy.test() hangs at:
>
> test_pinv (test_defmatrix.TestProperties) ...
>
> So perhaps something is wrong with ATLAS, even though the building went
> fine, and "make check" and "make ptcheck" reported no errors.
>
> Gabriel
>
> On Sun, 2009-06-07 at 10:20 +0200, Gabriel Beckers wrote:
>> On Sat, 2009-06-06 at 12:59 -0400, Chris Colbert wrote:
>> > ../configure -b 64 -D c -DPentiumCPS=2400 -Fa  -alg -fPIC
>> > --with-netlib-lapack=/home/your-user-name/build/lapack/lapack-3.2.1/Lapack_LINUX.a
>>
>> Many thanks Chris, I succeeded in building it.
>>
>> The configure command above contained two problems that I had to correct
>> to get it to work though.
>>
>> In case other people are trying this, I used:
>>
>> ../configure -b 32 -D c -DPentiumCPS=1800 -Fa alg -fPIC
>> --with-netlib-lapack=/home/your-user-name/build/lapack/lapack-3.2.1/lapack_LINUX.a
>>
>> That is (in addition to the different -b switch for my 32-bit machine
>> and the different processor speed): the dash before "alg" should be
>> removed, and "Lapack_LINUX.a" should be "lapack_LINUX.a".
>>
>> Gabriel
>>
>> ___
>> Numpy-discussion mailing list
>> Numpy-discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> ___
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-07 Thread Chris Colbert
thanks for catching the typos!

Chris

On Sun, Jun 7, 2009 at 4:20 AM, Gabriel Beckers wrote:
> On Sat, 2009-06-06 at 12:59 -0400, Chris Colbert wrote:
>> ../configure -b 64 -D c -DPentiumCPS=2400 -Fa  -alg -fPIC
>> --with-netlib-lapack=/home/your-user-name/build/lapack/lapack-3.2.1/Lapack_LINUX.a
>
> Many thanks Chris, I succeeded in building it.
>
> The configure command above contained two problems that I had to correct
> to get it to work though.
>
> In case other people are trying this, I used:
>
> ../configure -b 32 -D c -DPentiumCPS=1800 -Fa alg -fPIC
> --with-netlib-lapack=/home/your-user-name/build/lapack/lapack-3.2.1/lapack_LINUX.a
>
> That is (in addition to the different -b switch for my 32-bit machine
> and the different processor speed): the dash before "alg" should be
> removed, and "Lapack_LINUX.a" should be "lapack_LINUX.a".
>
> Gabriel
>
> ___
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-07 Thread Gabriel Beckers
On Sun, 2009-06-07 at 18:37 +0900, David Cournapeau wrote:
> That's why compiling atlas by yourself is hard, and I generally advise
> against it: there is nothing intrinsically hard about it, but you need
> to know a lot of small details and platform oddities to get it right
> every time. That's just a waste of time in most cases IMHO, unless all
> you do with numpy is inverting big matrices,

I have been trying intel mkl and icc compiler instead, with no luck. I
run into the same problem during setup as reported here:

http://www.mail-archive.com/numpy-discussion@scipy.org/msg16595.html


Sigh. I guess I should not get into these matters anyway; I am just a
simple and humble user...

As far as I understand the Ubuntu atlas problems have been found for
complex types, which I don't use except for fft. I guess I'll continue
to use the ubuntu libraries then and hope for better days in the future.

Best, Gabriel

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-07 Thread Gabriel Beckers
On Sun, 2009-06-07 at 19:00 +0900, David Cournapeau wrote:
> hence *most* :) I doubt most numpy users need to do PCA on
> high-dimensional data.

OK, a quick look at the MDP website shows that I am one of the
exceptions (as Gaël's email already suggested).

Gabriel




___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-07 Thread Gabriel Beckers
On Sun, 2009-06-07 at 18:37 +0900, David Cournapeau wrote:
> Maybe you did not use the same fortran compiler with atlas and numpy,
> or
> maybe something else. make check/make ptcheck do not test anything
> useful
> to avoid problems with numpy, in my experience.
> 
> That's why compiling atlas by yourself is hard, and I generally advise
> against it: there is nothing intrinsically hard about it, but you need
> to know a lot of small details and platform oddities to get it right
> every time. That's just a waste of time in most cases IMHO, unless all
> you do with numpy is inverting big matrices,
> 
> cheers,
> 
> David

Hi David,

I did:

sudo apt-get remove g77
sudo apt-get install gfortran

before starting the whole thing, so I assume that should take care of
it.

I am not sure how much I actually depend on Atlas for what I do, so your
advice is well taken. One thing I can think of is PCA and ICA (of *big*
matrices of float32 data), using the MDP toolbox mostly. I should find
out to what extent Atlas is crucial specifically for that.

All the best, Gabriel


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-07 Thread David Cournapeau
Gael Varoquaux wrote:
> On Sun, Jun 07, 2009 at 06:37:21PM +0900, David Cournapeau wrote:
>   
>> That's why compiling atlas by yourself is hard, and I generally advise
>> against it: there is nothing intrinsically hard about it, but you need
>> to know a lot of small details and platform oddities to get it right
>> every time. That's just a waste of time in most cases IMHO, unless all
>> you do with numpy is inverting big matrices,
>> 
>
> Well, I do bootstrapping of PCAs, that is SVDs. I can tell you, it makes
> a big difference, especially since I have 8 cores.
>   

hence *most* :) I doubt most numpy users need to do PCA on
high-dimensional data.

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-07 Thread Gael Varoquaux
On Sun, Jun 07, 2009 at 06:37:21PM +0900, David Cournapeau wrote:
> That's why compiling atlas by yourself is hard, and I generally advise
> against it: there is nothing intrinsically hard about it, but you need
> to know a lot of small details and platform oddities to get it right
> every time. That's just a waste of time in most cases IMHO, unless all
> you do with numpy is inverting big matrices,

Well, I do bootstrapping of PCAs, that is SVDs. I can tell you, it makes
a big difference, especially since I have 8 cores.
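
For readers following along, a minimal sketch of what such a bootstrap
might look like (resampling observations with replacement and recomputing
the leading components each time; this is just an illustration, not the
actual code being discussed):

import numpy as np

def bootstrap_pca(data, n_components=5, n_boot=100):
    # data: (n_samples, n_dims); returns an (n_boot, n_components, n_dims)
    # stack of leading principal directions, one set per bootstrap draw.
    n = data.shape[0]
    comps = np.empty((n_boot, n_components, data.shape[1]))
    for b in range(n_boot):
        idx = np.random.randint(0, n, n)   # resample rows with replacement
        sample = data[idx] - data[idx].mean(axis=0)
        u, s, vt = np.linalg.svd(sample, full_matrices=False)
        comps[b] = vt[:n_components]
    return comps

# Note: principal directions are only defined up to sign, so align signs
# (e.g. against the full-data PCs) before summarizing across draws.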

Gaël
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-07 Thread David Cournapeau
Gabriel Beckers wrote:
> OK, perhaps I drank that beer too soon... 
>
> Now, numpy.test() hangs at:
>
> test_pinv (test_defmatrix.TestProperties) ... 
>
> So perhaps something is wrong with ATLAS, even though the building went
> fine, and "make check" and "make ptcheck" reported no errors.
>   

Maybe you did not use the same fortran compiler with atlas and numpy, or
maybe something else. make check/make ptcheck do not test anything useful
to avoid problems with numpy, in my experience.

That's why compiling atlas by yourself is hard, and I generally advise
against it: there is nothing intrinsically hard about it, but you need
to know a lot of small details and platform oddities to get it right
every time. That's just a waste of time in most cases IMHO, unless all
you do with numpy is inverting big matrices,

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-07 Thread Gabriel Beckers
OK, perhaps I drank that beer too soon... 

Now, numpy.test() hangs at:

test_pinv (test_defmatrix.TestProperties) ... 

So perhaps something is wrong with ATLAS, even though the building went
fine, and "make check" and "make ptcheck" reported no errors.

Gabriel

On Sun, 2009-06-07 at 10:20 +0200, Gabriel Beckers wrote: 
> On Sat, 2009-06-06 at 12:59 -0400, Chris Colbert wrote:
> > ../configure -b 64 -D c -DPentiumCPS=2400 -Fa  -alg -fPIC
> > --with-netlib-lapack=/home/your-user-name/build/lapack/lapack-3.2.1/Lapack_LINUX.a
> 
> Many thanks Chris, I succeeded in building it.
> 
> The configure command above contained two problems that I had to correct
> to get it to work though. 
> 
> In case other people are trying this, I used:
> 
> ../configure -b 32 -D c -DPentiumCPS=1800 -Fa alg -fPIC
> --with-netlib-lapack=/home/your-user-name/build/lapack/lapack-3.2.1/lapack_LINUX.a
> 
> That is (in addition to the different -b switch for my 32-bit machine
> and the different processor speed): the dash before "alg" should be
> removed, and "Lapack_LINUX.a" should be "lapack_LINUX.a".
> 
> Gabriel
> 
> ___
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-07 Thread Gabriel Beckers
On Sat, 2009-06-06 at 12:59 -0400, Chris Colbert wrote:
> ../configure -b 64 -D c -DPentiumCPS=2400 -Fa  -alg -fPIC
> --with-netlib-lapack=/home/your-user-name/build/lapack/lapack-3.2.1/Lapack_LINUX.a

Many thanks Chris, I succeeded in building it.

The configure command above contained two problems that I had to correct
to get it to work though. 

In case other people are trying this, I used:

../configure -b 32 -D c -DPentiumCPS=1800 -Fa alg -fPIC
--with-netlib-lapack=/home/your-user-name/build/lapack/lapack-3.2.1/lapack_LINUX.a

That is (in addition to the different -b switch for my 32-bit machine
and the different processor speed): the dash before "alg" should be
removed, and "Lapack_LINUX.a" should be "lapack_LINUX.a".

Gabriel

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-06 Thread Minjae Kim
Thanks for this excellent recipe.

I have not tried it out myself yet, but I will follow the instruction on
clean Ubuntu 9.04 64-bit.

Best,
Minjae

On Sat, Jun 6, 2009 at 11:59 AM, Chris Colbert  wrote:

> since there is demand, and someone already emailed me, I'll put what I
> did in this post. It pretty much follows whats on the scipy website,
> with a couple other things I gleaned from reading the ATLAS install
> guide:
>
> and here it goes, this is valid for Ubuntu 9.04 64-bit  (# starts a
> comment when working in the terminal)
>
>
> download lapack 3.2.1 http://www.netlib.org/lapack/lapack.tgz
> download atlas 3.8.3
>
> http://sourceforge.net/project/downloading.php?group_id=23725&filename=atlas3.8.3.tar.bz2&a=65663372
>
> create folder  /home/your-user-name/build/atlas   #this is where we build
> create folder /home/your-user-name/build/lapack #atlas and lapack
>
> extract the folder lapack-3.2.1 to /home/your-user-name/build/lapack
> extract the contents of atlas to /home/your-user-name/build/atlas
>
>
>
> now in the terminal:
>
> # remove g77 and get stuff we need
> sudo apt-get remove g77
> sudo apt-get install gfortran
> sudo apt-get install build-essential
> sudo apt-get install python-dev
> sudo apt-get install python-setuptools
> sudo easy_install nose
>
>
> # build lapack
> cd /home/your-user-name/build/lapack/lapack-3.2.1
> cp INSTALL/make.inc.gfortran make.inc
>
> gedit make.inc
> #
> #in the make.inc file make sure the line   OPTS = -O2 -fPIC -m64
> #andNOOPTS = -O0 -fPIC -m64
> #the -m64 flags build 64-bit code, if you want 32-bit, simply leave
> #the -m64 flags out
> #
>
> cd SRC
>
> #this should build lapack without error
> make
>
>
>
> # build atlas
>
> cd /home/your-user-name/build/atlas
>
> #this is simply where we will build the atlas
> #libs, you can name it what you want
> mkdir Linux_X64SSE2
>
> cd Linux_X64SSE2
>
> #need to turn off cpu-throttling
> sudo cpufreq-selector -g performance
>
> #if you don't want 64bit code remove the -b 64 flag. replace the
> #number 2400 with your CPU frequency in MHZ
> #i.e. my cpu is 2.53 GHZ so i put 2530
> ../configure -b 64 -D c -DPentiumCPS=2400 -Fa  -alg -fPIC
>
> --with-netlib-lapack=/home/your-user-name/build/lapack/lapack-3.2.1/Lapack_LINUX.a
>
> #the configure step takes a bit, and should end without errors
>
>  #this takes a long time, go get some coffee, it should end without error
> make build
>
> #this will verify the build, also long running
> make check
>
> #this will test the performance of your build and give you feedback on
> #it. your numbers should be close to the test numbers at the end
> make time
>
> cd lib
>
> #builds single threaded .so's
> make shared
>
> #builds multithreaded .so's
> make ptshared
>
> #copies all of the atlas libs (and the lapack lib built with atlas)
> #to our lib dir
> sudo  cp  *.so  /usr/local/lib/
>
>
>
> #now we need to get and build numpy
>
> download numpy 1.3.0
>
> http://sourceforge.net/project/downloading.php?group_id=1369&filename=numpy-1.3.0.tar.gz&a=93506515
>
> extract the folder numpy-1.3.0 to /home/your-user-name/build
>
> #in the terminal
>
> cd /home/your-user-name/build/numpy-1.3.0
> cp site.cfg.example site.cfg
>
> gedit site.cfg
> ###
> # in site.cfg uncomment the following lines and make them look like these
> [DEFAULT]
> library_dirs = /usr/local/lib
> include_dirs = /usr/local/include
>
> [blas_opt]
> libraries = ptf77blas, ptcblas, atlas
>
> [lapack_opt]
> libraries = lapack, ptf77blas, ptcblas, atlas
> ###
> #if you want single threaded libs, uncomment those lines instead
>
>
> #build numpy- should end without error
> python setup.py build
>
> #install numpy
> python setup.py install
>
> cd /home
>
> sudo ldconfig
>
> python
> >>import numpy
> >>numpy.test()   #this should run with no errors (skipped tests and
> known-fails are ok)
> >>a = numpy.random.randn(6000, 6000)
> >>numpy.dot(a, a) # look at your cpu monitor and verify all cpu cores
> are at 100% if you built with threads
>
>
> Celebrate with a beer!
>
>
> Cheers!
>
> Chris
>
>
>
>
>
> On Sat, Jun 6, 2009 at 10:42 AM, Keith Goodman wrote:
> > On Fri, Jun 5, 2009 at 2:37 PM, Chris Colbert 
> wrote:
> >> I'll caution anyone from using Atlas from the repos in Ubuntu 9.04  as
> the
> >> package is broken:
> >>
> >> https://bugs.launchpad.net/ubuntu/+source/atlas/+bug/363510
> >>
> >>
> >> just build Atlas yourself, you get better performance AND threading.
> >> Building it is not the nightmare it sounds like. I think i've done it a
> >> total of four times now, both 32-bit and 64-bit builds.
> >>
> >> If you need help with it,  just email me off list.
> >
> > That's a nice offer. I tried building ATLAS on Debian a year or two
> > ago and got stuck.
> >
> > Clear out your inbox!
> > ___
> > Numpy-discussion mailing list
> > Numpy-discussion@scipy.org

Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-06 Thread Chris Colbert
since there is demand, and someone already emailed me, I'll put what I
did in this post. It pretty much follows whats on the scipy website,
with a couple other things I gleaned from reading the ATLAS install
guide:

and here it goes, this is valid for Ubuntu 9.04 64-bit  (# starts a
comment when working in the terminal)


download lapack 3.2.1 http://www.netlib.org/lapack/lapack.tgz
download atlas 3.8.3
http://sourceforge.net/project/downloading.php?group_id=23725&filename=atlas3.8.3.tar.bz2&a=65663372

create folder  /home/your-user-name/build/atlas   #this is where we build
create folder /home/your-user-name/build/lapack #atlas and lapack

extract the folder lapack-3.2.1 to /home/your-user-name/build/lapack
extract the contents of atlas to /home/your-user-name/build/atlas



now in the terminal:

# remove g77 and get stuff we need
sudo apt-get remove g77
sudo apt-get install gfortran
sudo apt-get install build-essential
sudo apt-get install python-dev
sudo apt-get install python-setuptools
sudo easy_install nose


# build lapack
cd /home/your-user-name/build/lapack/lapack-3.2.1
cp INSTALL/make.inc.gfortran make.inc

gedit make.inc
#
#in the make.inc file make sure the line   OPTS = -O2 -fPIC -m64
#andNOOPTS = -O0 -fPIC -m64
#the -m64 flags build 64-bit code, if you want 32-bit, simply leave
#the -m64 flags out
#

cd SRC

#this should build lapack without error
make



# build atlas

cd /home/your-user-name/build/atlas

#this is simply where we will build the atlas
#libs, you can name it what you want
mkdir Linux_X64SSE2

cd Linux_X64SSE2

#need to turn off cpu-throttling
sudo cpufreq-selector -g performance

#if you don't want 64bit code remove the -b 64 flag. replace the
#number 2400 with your CPU frequency in MHZ
#i.e. my cpu is 2.53 GHZ so i put 2530
../configure -b 64 -D c -DPentiumCPS=2400 -Fa  -alg -fPIC
--with-netlib-lapack=/home/your-user-name/build/lapack/lapack-3.2.1/Lapack_LINUX.a

#the configure step takes a bit, and should end without errors

 #this takes a long time, go get some coffee, it should end without error
make build

#this will verify the build, also long running
make check

#this will test the performance of your build and give you feedback on
#it. your numbers should be close to the test numbers at the end
make time

cd lib

#builds single threaded .so's
make shared

#builds multithreaded .so's
make ptshared

#copies all of the atlas libs (and the lapack lib built with atlas)
#to our lib dir
sudo  cp  *.so  /usr/local/lib/



#now we need to get and build numpy

download numpy 1.3.0
http://sourceforge.net/project/downloading.php?group_id=1369&filename=numpy-1.3.0.tar.gz&a=93506515

extract the folder numpy-1.3.0 to /home/your-user-name/build

#in the terminal

cd /home/your-user-name/build/numpy-1.3.0
cp site.cfg.example site.cfg

gedit site.cfg
###
# in site.cfg uncomment the following lines and make them look like these
[DEFAULT]
library_dirs = /usr/local/lib
include_dirs = /usr/local/include

[blas_opt]
libraries = ptf77blas, ptcblas, atlas

[lapack_opt]
libraries = lapack, ptf77blas, ptcblas, atlas
###
#if you want single threaded libs, uncomment those lines instead


#build numpy- should end without error
python setup.py build

#install numpy
python setup.py install

cd /home

sudo ldconfig

python
>>import numpy
>>numpy.test()   #this should run with no errors (skipped tests and known-fails 
>>are ok)
>>a = numpy.random.randn(6000, 6000)
>>numpy.dot(a, a) # look at your cpu monitor and verify all cpu cores are 
>>at 100% if you built with threads


Celebrate with a beer!


Cheers!

Chris





On Sat, Jun 6, 2009 at 10:42 AM, Keith Goodman wrote:
> On Fri, Jun 5, 2009 at 2:37 PM, Chris Colbert  wrote:
>> I'll caution anyone from using Atlas from the repos in Ubuntu 9.04  as the
>> package is broken:
>>
>> https://bugs.launchpad.net/ubuntu/+source/atlas/+bug/363510
>>
>>
>> just build Atlas yourself, you get better performance AND threading.
>> Building it is not the nightmare it sounds like. I think i've done it a
>> total of four times now, both 32-bit and 64-bit builds.
>>
>> If you need help with it,  just email me off list.
>
> That's a nice offer. I tried building ATLAS on Debian a year or two
> ago and got stuck.
>
> Clear out your inbox!
> ___
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-06 Thread Keith Goodman
On Fri, Jun 5, 2009 at 2:37 PM, Chris Colbert  wrote:
> I'll caution anyone from using Atlas from the repos in Ubuntu 9.04  as the
> package is broken:
>
> https://bugs.launchpad.net/ubuntu/+source/atlas/+bug/363510
>
>
> just build Atlas yourself, you get better performance AND threading.
> Building it is not the nightmare it sounds like. I think i've done it a
> total of four times now, both 32-bit and 64-bit builds.
>
> If you need help with it,  just email me off list.

That's a nice offer. I tried building ATLAS on Debian a year or two
ago and got stuck.

Clear out your inbox!
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread Chris Colbert
I'll caution anyone against using Atlas from the repos in Ubuntu 9.04, as the
package is broken:

https://bugs.launchpad.net/ubuntu/+source/atlas/+bug/363510


just build Atlas yourself, you get better performance AND threading.
Building it is not the nightmare it sounds like. I think i've done it a
total of four times now, both 32-bit and 64-bit builds.

If you need help with it,  just email me off list.

Cheers,

Chris

On Fri, Jun 5, 2009 at 2:46 PM, Matthieu Brucher  wrote:

> 2009/6/5 David Cournapeau :
> > Eric Firing wrote:
> >>
> >> David,
> >>
> >> The eigen web site indicates that eigen achieves high performance
> >> without all the compilation difficulty of atlas.  Does eigen have enough
> >> functionality to replace atlas in numpy?
> >
> > No, eigen does not provide a (complete) BLAS/LAPACK interface. I don't
> > know if that's even a goal of eigen (it started as a project for KDE, to
> > support high performance core computations for things like spreadsheet
> > and co).
> >
> > But even then, it would be a huge undertaking. For all its flaws, LAPACK
> > is old, tested code, with a very stable language (F77). Eigen is:
> >- not mature.
> >- heavily expression-template-based C++, meaning compilation takes
> > ages + esoteric, impossible to decipher compilation errors. We have
> > enough build problems already :)
> >- SSE dependency hardcoded, since it is set up at build time. That's
> > going backward IMHO - I would rather see a numpy/scipy which can load
> > the optimized code at runtime.
>
> I would add that it relies on C++ compiler extensions (the restrict
> keyword) as does blitz. You unfortunately can't expect every compiler
> to support it unless the C++ committee finally adds it to the
> standard.
>
> Matthieu
> --
> Information System Engineer, Ph.D.
> Website: http://matthieu-brucher.developpez.com/
> Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
> LinkedIn: http://www.linkedin.com/in/matthieubrucher
> ___
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread Matthieu Brucher
2009/6/5 David Cournapeau :
> Eric Firing wrote:
>>
>> David,
>>
>> The eigen web site indicates that eigen achieves high performance
>> without all the compilation difficulty of atlas.  Does eigen have enough
>> functionality to replace atlas in numpy?
>
> No, eigen does not provide a (complete) BLAS/LAPACK interface. I don't
> know if that's even a goal of eigen (it started as a project for KDE, to
> support high performance core computations for things like spreadsheet
> and co).
>
> But even then, it would be a huge undertaking. For all its flaws, LAPACK
> is old, tested code, with a very stable language (F77). Eigen is:
>    - not mature.
>    - heavily expression-template-based C++, meaning compilation takes
> ages + esoteric, impossible to decipher compilation errors. We have
> enough build problems already :)
>    - SSE dependency hardcoded, since it is set up at build time. That's
> going backward IMHO - I would rather see a numpy/scipy which can load
> the optimized code at runtime.

I would add that it relies on C++ compiler extensions (the restrict
keyword) as does blitz. You unfortunately can't expect every compiler
to support it unless the C++ committee finally adds it to the
standard.

Matthieu
-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread David Cournapeau
Eric Firing wrote:
>
> David,
>
> The eigen web site indicates that eigen achieves high performance 
> without all the compilation difficulty of atlas.  Does eigen have enough 
> functionality to replace atlas in numpy?

No, eigen does not provide a (complete) BLAS/LAPACK interface. I don't
know if that's even a goal of eigen (it started as a project for KDE, to
support high performance core computations for things like spreadsheet
and co).

But even then, it would be a huge undertaking. For all its flaws, LAPACK
is old, tested code, with a very stable language (F77). Eigen is:
- not mature.
- heavily expression-template-based C++, meaning compilation takes
ages + esoteric, impossible to decipher compilation errors. We have
enough build problems already :)
- SSE dependency hardcoded, since it is set up at build time. That's
going backward IMHO - I would rather see a numpy/scipy which can load
the optimized code at runtime.

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread Eric Firing
David Cournapeau wrote:

> 
> It really depends on the CPU, compiler, how atlas was compiled, etc...
> it can be slightly faster to 10 times faster (if you use a very poorly
> optimized ATLAS).
> 
> For some recent benchmarks:
> 
> http://eigen.tuxfamily.org/index.php?title=Benchmark
> 

David,

The eigen web site indicates that eigen achieves high performance 
without all the compilation difficulty of atlas.  Does eigen have enough 
functionality to replace atlas in numpy?  Presumably it would need C 
compatibility wrappers to emulate the blas functions.  Would that kill 
its performance?  Or be very difficult?

(I'm asking from curiosity combined with complete ignorance.  Until 
yesterday I had never even heard of eigen.)

Eric

> cheers,
> 
> David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread David Paul Reichert
Hi,

Thanks for the suggestion.

Unfortunately I'm using university managed machines here, so
I have no control over the distribution, not even root access.

However, I just downloaded the latest Enthought distribution,
which uses numpy 1.3, and now numpy is only 30% to 60% slower
than matlab, instead of 5 times slower. I can live with that.
(whether it uses atlas now or not, I don't know).
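
For what it's worth, a quick way to see what numpy was built against, and
to get a rough timing for dot() on a given machine (a sanity check rather
than a rigorous benchmark):

import time
import numpy as np

np.show_config()   # lists the BLAS/LAPACK/ATLAS libraries found at build time

a = np.random.randn(1000, 1000)
t0 = time.time()
np.dot(a, a)
print("1000x1000 dot took %.3f s" % (time.time() - t0))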

Cheers

David


Quoting Jason Rennie :

> Hi David,
>
> Let me suggest that you try the latest version of Ubuntu (9.04/Jaunty),
> which was released two months ago.  It sounds like you are effectively using
> release 5 of RedHat Linux which was originally released May 2007.  There
> have been updates (5.1, 5.2, 5.3), but, if my memory serves me correctly,
> RedHat updates are more focused on fixing bugs and security issues rather
> than improving functionality.  Ubuntu does a full, new release every 6
> months so you don't have to wait as long to see improvements.  Ubuntu also
> has a tremendously better package management system.  You generally
> shouldn't be installing packages by hand as it sounds like you are doing.
>
> This post suggests that the latest version of Ubuntu is up-to-date wrt
> ATLAS:
>
> http://www.mail-archive.com/numpy-discussion@scipy.org/msg13102.html
>
> Jason
>
> On Fri, Jun 5, 2009 at 5:44 AM, David Paul Reichert <
> d.p.reich...@sms.ed.ac.uk> wrote:
>
>> Thanks for the replies so far.
>>
>> I had already tested using an already transposed matrix in the loop,
>> it didn't make any difference. Oh and btw, I'm on (Scientific) Linux.
>>
>> I used the Enthought distribution, but I guess I'll have to get
>> my hands dirty and try to get that Atlas thing working (I'm not
>> a Linux expert though). My simulations pretty much consist of
>> matrix multiplications, so if I don't get rid of that factor 5,
>> I pretty much have to get back to Matlab.
>>
>> When you said Atlas is going to be optimized for my system, does
>> that mean I should compile everything on each machine separately?
>> I.e. I have a not-so-great desktop machine and one of those bigger
>> multicore things available...
>>
>> Cheers
>>
>> David
>>
>
> --
> Jason Rennie
> Research Scientist, ITA Software
> 617-714-2645
> http://www.itasoftware.com/
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread Jason Rennie
Hi David,

Let me suggest that you try the latest version of Ubuntu (9.04/Jaunty),
which was released two months ago.  It sounds like you are effectively using
release 5 of RedHat Linux which was originally released May 2007.  There
have been updates (5.1, 5.2, 5.3), but, if my memory serves me correctly,
RedHat updates are more focused on fixing bugs and security issues rather
than improving functionality.  Ubuntu does a full, new release every 6
months so you don't have to wait as long to see improvements.  Ubuntu also
has a tremendously better package management system.  You generally
shouldn't be installing packages by hand as it sounds like you are doing.

This post suggests that the latest version of Ubuntu is up-to-date wrt
ATLAS:

http://www.mail-archive.com/numpy-discussion@scipy.org/msg13102.html

Jason

On Fri, Jun 5, 2009 at 5:44 AM, David Paul Reichert <
d.p.reich...@sms.ed.ac.uk> wrote:

> Thanks for the replies so far.
>
> I had already tested using an already transposed matrix in the loop,
> it didn't make any difference. Oh and btw, I'm on (Scientific) Linux.
>
> I used the Enthought distribution, but I guess I'll have to get
> my hands dirty and try to get that Atlas thing working (I'm not
> a Linux expert though). My simulations pretty much consist of
> matrix multiplications, so if I don't get rid of that factor 5,
> I pretty much have to get back to Matlab.
>
> When you said Atlas is going to be optimized for my system, does
> that mean I should compile everything on each machine separately?
> I.e. I have a not-so-great desktop machine and one of those bigger
> multicore things available...
>
> Cheers
>
> David
>

-- 
Jason Rennie
Research Scientist, ITA Software
617-714-2645
http://www.itasoftware.com/
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread David Cournapeau
Sebastian Walter wrote:
> On Fri, Jun 5, 2009 at 11:58 AM, David
> Cournapeau wrote:
>   
>> Sebastian Walter wrote:
>> 
>>> On Thu, Jun 4, 2009 at 10:56 PM, Chris Colbert wrote:
>>>
>>>   
 I should update after reading the thread Sebastian linked:

 The current 1.3 version of numpy (don't know about previous versions) uses
 the optimized Atlas BLAS routines for numpy.dot() if numpy was compiled 
 with
 these libraries. I've verified this on linux only, thought it shouldnt be
 any different on windows AFAIK.

 
>>> in the  best of all possible worlds this would be done by a package
>>> maintainer
>>>
>>>   
>> Numpy packages on windows do use ATLAS, so I am not sure what you are
>> referring to ?
>> 
> I'm on debian unstable and my numpy (version 1.2.1) uses an unoptimized blas.
>   

Yes, it is because the packages on Linux are not well done in that
respect (in their defense, numpy's build is far from packaging
friendly, and is both fragile and obscure).

> I had the impression that most ppl that use numpy are on linux.

Sourceforge numbers tell a different story, at least. I think most users
on the ML use Linux, and certainly almost every developer uses Linux or
Mac OS X. But the ML already filters out most Windows users - only geeks
read mailing lists :) I am pretty sure the vast majority of numpy users
never even bother to look for the ML.

>> On a side note,  correctly packaging ATLAS is almost
>> inherently impossible, since the build method of ATLAS can never produce
>> the same binary (even on the same machine), and the binary is optimized
>> for the machine it was built on. So if you want the best speed, you
>> should build atlas by yourself - which is painful on windows (you need
>> cygwin).
>> 
> in the debian repositories there are different builds of atlas so
> there could be different builds for numpy, too.
> But there aren't
>   

There are several problems:
- packagers (rightfully) hate to have many versions of the same software
    - as of now, if ATLAS is detected, numpy is built differently than
when it is linked against conventional blas/lapack
- numpy on debian is not built with atlas support

But there is certainly no need to build one numpy version for every
atlas: the Linux loader can pick the most appropriate library for your
architecture through the so-called hwcap mechanism. If your CPU supports
SSE2, and you have an SSE2-tuned ATLAS installed, then the loader will
automatically load those libraries instead of the ones in /usr/lib by
default.
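To see which copy the loader actually resolved on a given box, a rough check (this assumes a Linux system with ldd available, and a numpy built against a CBLAS so that the optional _dotblas extension exists) is:

import os
import numpy.core._dotblas as dotblas  # only built when a CBLAS such as ATLAS was found

# ldd lists the shared libraries the dynamic loader resolved for this extension;
# with hwcap subdirectories in place you may see e.g. /usr/lib/sse2/libatlas.so.3
# rather than the plain /usr/lib copy.
os.system("ldd %s" % dotblas.__file__)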

But because ATLAS is such a pain to support in binary form, only
ancient versions of ATLAS (3.6.*) are packaged anyway. So if you care
that much about speed, you should build your own.

>> On windows, if you really care about speed, you should try linking
>> against the Intel MKL. That's what Matlab uses internally on recent
>> versions, so you would get the same speed. But that's rather involved.
>> 
>
>   

It really depends on the CPU, the compiler, how ATLAS was compiled, etc.
It can range from slightly faster to 10 times faster (if you compare
against a very poorly optimized ATLAS).

For some recent benchmarks:

http://eigen.tuxfamily.org/index.php?title=Benchmark

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread Sebastian Walter
On Fri, Jun 5, 2009 at 11:58 AM, David
Cournapeau wrote:
> Sebastian Walter wrote:
>> On Thu, Jun 4, 2009 at 10:56 PM, Chris Colbert wrote:
>>
>>> I should update after reading the thread Sebastian linked:
>>>
>>> The current 1.3 version of numpy (don't know about previous versions) uses
>>> the optimized Atlas BLAS routines for numpy.dot() if numpy was compiled with
>>> these libraries. I've verified this on linux only, thought it shouldnt be
>>> any different on windows AFAIK.
>>>
>>
>> in the  best of all possible worlds this would be done by a package
>> maintainer
>>
>
> Numpy packages on windows do use ATLAS, so I am not sure what you are
> referring to ?
I'm on Debian unstable and my numpy (version 1.2.1) uses an unoptimized blas.
I had the impression that most people who use numpy are on Linux. But
apparently this is a misconception.

>On a side note,  correctly packaging ATLAS is almost
> inherently impossible, since the build method of ATLAS can never produce
> the same binary (even on the same machine), and the binary is optimized
> for the machine it was built on. So if you want the best speed, you
> should build atlas by yourself - which is painful on windows (you need
> cygwin).
in the debian repositories there are different builds of atlas so
there could be different builds for numpy, too.
But there aren't

>
> On windows, if you really care about speed, you should try linking
> against the Intel MKL. That's what Matlab uses internally on recent
> versions, so you would get the same speed. But that's rather involved.

How much faster is MKL than ATLAS?


>
> cheers,
>
> David
> ___
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread David Cournapeau
Sebastian Walter wrote:
> On Thu, Jun 4, 2009 at 10:56 PM, Chris Colbert wrote:
>   
>> I should update after reading the thread Sebastian linked:
>>
>> The current 1.3 version of numpy (don't know about previous versions) uses
>> the optimized Atlas BLAS routines for numpy.dot() if numpy was compiled with
>> these libraries. I've verified this on linux only, thought it shouldnt be
>> any different on windows AFAIK.
>> 
>
> in the  best of all possible worlds this would be done by a package
> maintainer
>   

Numpy packages on windows do use ATLAS, so I am not sure what you are
referring to? On a side note, correctly packaging ATLAS is almost
inherently impossible, since the build method of ATLAS can never produce
the same binary (even on the same machine), and the binary is optimized
for the machine it was built on. So if you want the best speed, you
should build atlas by yourself - which is painful on windows (you need
cygwin).

On windows, if you really care about speed, you should try linking
against the Intel MKL. That's what Matlab uses internally on recent
versions, so you would get the same speed. But that's rather involved.

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread Sebastian Walter
On Thu, Jun 4, 2009 at 10:56 PM, Chris Colbert wrote:
> I should update after reading the thread Sebastian linked:
>
> The current 1.3 version of numpy (don't know about previous versions) uses
> the optimized Atlas BLAS routines for numpy.dot() if numpy was compiled with
> these libraries. I've verified this on linux only, thought it shouldnt be
> any different on windows AFAIK.

In the best of all possible worlds this would be done by a package
maintainer.


>
> chris
>
> On Thu, Jun 4, 2009 at 4:54 PM, Chris Colbert  wrote:
>>
>> Sebastian is right.
>>
>> Since Matlab r2007 (i think that's the version) it has included support
>> for multi-core architecture. On my core2 Quad here at the office, r2008b has
>> no problem utilizing 100% cpu for large matrix multiplications.
>>
>>
>> If you download and build atlas and lapack from source and enable
>> parrallel threads in atlas, then compile numpy against these libraries, you
>> should achieve similar if not better performance (since the atlas routines
>> will be tuned to your system).
>>
>> If you're on Windows, you need to do some trickery to get threading to
>> work (the instructions are on the atlas website).
>>
>> Chris
>>
>>
>
>
> ___
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-05 Thread David Paul Reichert
Thanks for the replies so far.

I had already tested using an already transposed matrix in the loop,
it didn't make any difference. Oh and btw, I'm on (Scientific) Linux.

I used the Enthought distribution, but I guess I'll have to get
my hands dirty and try to get that Atlas thing working (I'm not
a Linux expert though). My simulations pretty much consist of
matrix multiplications, so if I don't get rid of that factor 5,
I pretty much have to get back to Matlab.

When you said Atlas is going to be optimized for my system, does
that mean I should compile everything on each machine separately?
I.e. I have a not-so-great desktop machine and one of those bigger
multicore things available...

Cheers

David



Quoting David Cournapeau :

> David Warde-Farley wrote:
>> On 4-Jun-09, at 5:03 PM, Anne Archibald wrote:
>>
>>
>>> Apart from the implementation issues people have chimed in about
>>> already, it's worth noting that the speed of matrix multiplication
>>> depends on the memory layout of the matrices. So generating B instead
>>> directly as a 100 by 500 matrix might affect the speed substantially
>>> (I'm not sure in which direction). If MATLAB's matrices have a
>>> different memory order, that might be a factor as well.
>>>
>>
>> AFAIK Matlab matrices are always Fortran ordered.
>>
>> Does anyone know if the defaults on Mac OS X (vecLib/Accelerate)
>> support multicore? Is there any sense in compiling ATLAS on OS X (I
>> know it can be done)?
>>
>
> It may be worthwhile if you use a recent gcc and recent ATLAS.
> Multithread support is supposed to be much better in 3.9.* compared to
> 3.6.* (which is likely the version used on vecLib/Accelerate). The main
> issue I could foresee is clashes between vecLib/Accelerate and Atlas if
> you mix softwares which use one or the other together.
>
> For the OP question: recent matlab versions use the MKL, which is likely
> to give higher performances than ATLAS, specially on windows (compilers
> on that platform are ancient, as building atlas with native compilers on
> windows requires super-human patience).
>
> David
> ___
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-04 Thread David Cournapeau
David Warde-Farley wrote:
> On 4-Jun-09, at 5:03 PM, Anne Archibald wrote:
>
>   
>> Apart from the implementation issues people have chimed in about
>> already, it's worth noting that the speed of matrix multiplication
>> depends on the memory layout of the matrices. So generating B instead
>> directly as a 100 by 500 matrix might affect the speed substantially
>> (I'm not sure in which direction). If MATLAB's matrices have a
>> different memory order, that might be a factor as well.
>> 
>
> AFAIK Matlab matrices are always Fortran ordered.
>
> Does anyone know if the defaults on Mac OS X (vecLib/Accelerate)  
> support multicore? Is there any sense in compiling ATLAS on OS X (I  
> know it can be done)?
>   

It may be worthwhile if you use a recent gcc and recent ATLAS.
Multithread support is supposed to be much better in 3.9.* compared to
3.6.* (which is likely the version used by vecLib/Accelerate). The main
issue I could foresee is clashes between vecLib/Accelerate and ATLAS if
you mix software that uses one with software that uses the other.

For the OP's question: recent matlab versions use the MKL, which is likely
to give higher performance than ATLAS, especially on windows (compilers
on that platform are ancient, and building atlas with native compilers on
windows requires super-human patience).

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-04 Thread David Warde-Farley
On 4-Jun-09, at 5:03 PM, Anne Archibald wrote:

> Apart from the implementation issues people have chimed in about
> already, it's worth noting that the speed of matrix multiplication
> depends on the memory layout of the matrices. So generating B instead
> directly as a 100 by 500 matrix might affect the speed substantially
> (I'm not sure in which direction). If MATLAB's matrices have a
> different memory order, that might be a factor as well.

AFAIK Matlab matrices are always Fortran ordered.

Does anyone know if the defaults on Mac OS X (vecLib/Accelerate)  
support multicore? Is there any sense in compiling ATLAS on OS X (I  
know it can be done)?

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-04 Thread Anne Archibald
2009/6/4 David Paul Reichert :
> Hi all,
>
> I would be glad if someone could help me with
> the following issue:
>
>  From what I've read on the web it appears to me
> that numpy should be about as fast as matlab. However,
> when I do simple matrix multiplication, it consistently
> appears to be about 5 times slower. I tested this using
>
> A = 0.9 * numpy.matlib.ones((500,100))
> B = 0.8 * numpy.matlib.ones((500,100))
>
> def test():
>     for i in range(1000):
>         A*B.T
>
> I also used ten times larger matrices with ten times less
> iterations, used xrange instead of range, arrays instead
> of matrices, and tested it on two different machines,
> and the result always seems to be the same.
>
> Any idea what could go wrong? I'm using ipython and
> matlab R2008b.

Apart from the implementation issues people have chimed in about
already, it's worth noting that the speed of matrix multiplication
depends on the memory layout of the matrices. So generating B instead
directly as a 100 by 500 matrix might affect the speed substantially
(I'm not sure in which direction). If MATLAB's matrices have a
different memory order, that might be a factor as well.
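A minimal way to test this, assuming the standard timeit module (array sizes borrowed from the original post; the size and direction of the effect will depend on the BLAS in use):

import numpy as np
import timeit

A = 0.9 * np.ones((500, 100))
B = 0.8 * np.ones((500, 100))      # as in the original post: B.T is then Fortran-ordered
B2 = 0.8 * np.ones((100, 500))     # generated directly with the shape used in the product

t1 = timeit.timeit(lambda: np.dot(A, B.T), number=1000)
t2 = timeit.timeit(lambda: np.dot(A, B2), number=1000)
print("dot(A, B.T): %.2f s   dot(A, B2): %.2f s" % (t1, t2))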

Anne

> Thanks,
>
> David
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
> ___
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-04 Thread Chris Colbert
I should update after reading the thread Sebastian linked:

The current 1.3 version of numpy (I don't know about previous versions) uses
the optimized ATLAS BLAS routines for numpy.dot() if numpy was compiled with
these libraries. I've verified this on Linux only, though it shouldn't be
any different on Windows, AFAIK.
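For anyone who wants to check their own install, one possible check (a sketch; in the 1.x series the optional _dotblas module is only built when a CBLAS such as ATLAS was detected at compile time) is:

try:
    import numpy.core._dotblas     # present only if dot() goes through a CBLAS
    print("numpy.dot() is using an optimized BLAS")
except ImportError:
    print("numpy.dot() is using the unoptimized fallback")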

chris

On Thu, Jun 4, 2009 at 4:54 PM, Chris Colbert  wrote:

> Sebastian is right.
>
> Since Matlab r2007 (i think that's the version) it has included support for
> multi-core architecture. On my core2 Quad here at the office, r2008b has no
> problem utilizing 100% cpu for large matrix multiplications.
>
>
> If you download and build atlas and lapack from source and enable parrallel
> threads in atlas, then compile numpy against these libraries, you should
> achieve similar if not better performance (since the atlas routines will be
> tuned to your system).
>
> If you're on Windows, you need to do some trickery to get threading to work
> (the instructions are on the atlas website).
>
> Chris
>
>
>
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-04 Thread Chris Colbert
Sebastian is right.

Since Matlab r2007 (I think that's the version) it has included support for
multi-core architectures. On my Core 2 Quad here at the office, r2008b has no
problem utilizing 100% cpu for large matrix multiplications.


If you download and build atlas and lapack from source and enable parallel
threads in atlas, then compile numpy against these libraries, you should
achieve similar if not better performance (since the atlas routines will be
tuned to your system).

If you're on Windows, you need to do some trickery to get threading to work
(the instructions are on the atlas website).

Chris
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-04 Thread Sebastian Walter
Have a look at this thread:
http://www.mail-archive.com/numpy-discussion@scipy.org/msg13085.html

The speed difference is probably due to the fact that the matrix
multiplication does not call an optimized BLAS routine, e.g.
the ATLAS BLAS.

Sebastian


On Thu, Jun 4, 2009 at 3:36 PM, David Paul Reichert
 wrote:
> Hi all,
>
> I would be glad if someone could help me with
> the following issue:
>
>  From what I've read on the web it appears to me
> that numpy should be about as fast as matlab. However,
> when I do simple matrix multiplication, it consistently
> appears to be about 5 times slower. I tested this using
>
> A = 0.9 * numpy.matlib.ones((500,100))
> B = 0.8 * numpy.matlib.ones((500,100))
>
> def test():
>     for i in range(1000):
>         A*B.T
>
> I also used ten times larger matrices with ten times less
> iterations, used xrange instead of range, arrays instead
> of matrices, and tested it on two different machines,
> and the result always seems to be the same.
>
> Any idea what could go wrong? I'm using ipython and
> matlab R2008b.
>
> Thanks,
>
> David
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
> ___
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] performance matrix multiplication vs. matlab

2009-06-04 Thread David Paul Reichert
Hi all,

I would be glad if someone could help me with
the following issue:

From what I've read on the web it appears to me
that numpy should be about as fast as matlab. However,
when I do simple matrix multiplication, it consistently
appears to be about 5 times slower. I tested this using

import numpy.matlib

A = 0.9 * numpy.matlib.ones((500,100))
B = 0.8 * numpy.matlib.ones((500,100))

def test():
    for i in range(1000):
        A*B.T
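For reference, a minimal way to time that loop on the numpy side, assuming the standard timeit module and the definitions above, would be something like:

import timeit
t = timeit.timeit(test, number=1)   # one call to test() = 1000 matrix products
print("%.2f seconds for 1000 products" % t)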

I also used ten times larger matrices with ten times fewer
iterations, used xrange instead of range and arrays instead
of matrices, and tested it on two different machines,
and the result always seems to be the same.

Any idea what could go wrong? I'm using ipython and
matlab R2008b.

Thanks,

David


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion