Re: [Numpy-discussion] NumPy re-factoring project

2010-06-10 Thread Sturla Molden
On 11.06.2010 03:02, Charles R Harris wrote:

 But for an initial refactoring it probably falls in the category of 
 premature optimization. Another thing to avoid on the first go around 
 is micro-optimization, as it tends to complicate the code and often 
 doesn't do much for performance.




First, to refactor NumPy we must get rid of array iterators as Python 
objects. Retaining them as Python objects defies the whole purpose. 
Second, creating an array of pointers has the advantage of being more 
friendly to OpenMP and multi-core CPUs: we don't need to serialize 
access to an iterator, and we let OpenMP do the load balancing. For 
things like FFTs on vectors along an axis of a multidimensional array 
(e.g. in multi-dimensional FFTs), this is not a micro-optimization. Not 
to mention that it is easier to iterate over an array of pointers than 
to use a complicated iterator object in C.

Sturla
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-07 Thread Sturla Molden

 I also tried to Install numpy with intel mkl 9.1
 I still used gfortran for numpy installation as intel mkl 9.1 supports gnu
 compiler.

I would suggest using GotoBLAS instead of ATLAS. It is easier to build
than ATLAS (basically no configuration), and has even better performance
than MKL.

http://www.tacc.utexas.edu/tacc-projects/

S.M.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance matrix multiplication vs. matlab

2010-01-07 Thread Sturla Molden
 Sturla Molden wrote:
 I would suggest using GotoBLAS instead of ATLAS.

 http://www.tacc.utexas.edu/tacc-projects/

 That does look promising -- any idea what the license is? They don't
 make it clear on the site



UT TACC Research License (Source Code)



The Texas Advanced Computing Center of The University of Texas at Austin
has developed certain software and documentation that it desires to make
available without charge to anyone for academic, research, experimental or
personal use. This license is designed to guarantee freedom to use the
software for these purposes. If you wish to distribute or make other use
of the software, you may purchase a license to do so from the University
of Texas.

The accompanying source code is made available to you under the terms of
this UT TACC Research License (this "UTTRL"). By clicking the ACCEPT
button, or by installing or using the code, you are consenting to be bound
by this UTTRL. If you do not agree to the terms and conditions of this
license, do not click the ACCEPT button, and do not install or use any
part of the code.

The terms and conditions in this UTTRL not only apply to the source code
made available by UT TACC, but also to any improvements to, or derivative
works of, that source code made by you and to any object code compiled
from such source code, improvements or derivative works.

1. DEFINITIONS.

1.1 "Commercial Use" shall mean use of Software or Documentation by
Licensee for direct or indirect financial, commercial or strategic gain or
advantage, including without limitation: (a) bundling or integrating the
Software with any hardware product or another software product for
transfer, sale or license to a third party (even if distributing the
Software on separate media and not charging for the Software); (b)
providing customers with a link to the Software or a copy of the Software
for use with hardware or another software product purchased by that
customer; or (c) use in connection with the performance of services for
which Licensee is compensated.

1.2 "Derivative Products" means any improvements to, or other derivative
works of, the Software made by Licensee.

1.3 "Documentation" shall mean all manuals, user documentation, and other
related materials pertaining to the Software that are made available to
Licensee in connection with the Software.

1.4 "Licensor" shall mean The University of Texas.

1.5 "Licensee" shall mean the person or entity that has agreed to the
terms hereof and is exercising rights granted hereunder.

1.6 "Software" shall mean the computer program(s) referred to as GotoBLAS2
made available under this UTTRL in source code form, including any error
corrections, bug fixes, patches, updates or other modifications that
Licensor may in its sole discretion make available to Licensee from time
to time, and any object code compiled from such source code.

2. GRANT OF RIGHTS.

Subject to the terms and conditions hereunder, Licensor hereby grants to
Licensee a worldwide, non-transferable, non-exclusive license to (a)
install, use and reproduce the Software for academic, research,
experimental and personal use (but specifically excluding Commercial Use);
(b) use and modify the Software to create Derivative Products, subject to
Section 3.2; and (c) use the Documentation, if any, solely in connection
with Licensee's authorized use of the Software.

3. RESTRICTIONS; COVENANTS.

3.1 Licensee may not: (a) distribute, sub-license or otherwise transfer
copies or rights to the Software (or any portion thereof) or the
Documentation; (b) use the Software (or any portion thereof) or
Documentation for Commercial Use, or for any other use except as described
in Section 2; (c) copy the Software or Documentation other than for
archival and backup purposes; or (d) remove any product identification,
copyright, proprietary notices or labels from the Software and
Documentation. This UTTRL confers no rights upon Licensee except those
expressly granted herein.

3.2 Licensee hereby agrees that it will provide a copy of all Derivative
Products to Licensor and that its use of the Derivative Products will be
subject to all of the same terms, conditions, restrictions and limitations
on use imposed on the Software under this UTTRL. Licensee hereby grants
Licensor a worldwide, non-exclusive, royalty-free license to reproduce,
prepare derivative works of, publicly display, publicly perform,
sublicense and distribute Derivative Products. Licensee also hereby grants
Licensor a worldwide, non-exclusive, royalty-free patent license to make,
have made, use, offer to sell, sell, import and otherwise transfer the
Derivative Products under those patent claims licensable by Licensee that
are necessarily infringed by the Derivative Products.

4. PROTECTION OF SOFTWARE.

4.1 Confidentiality. The Software and Documentation are the confidential
and proprietary information of Licensor. Licensee agrees to take adequate
steps to protect the Software and Documentation from unauthorized

Re: [Numpy-discussion] non-standard standard deviation

2009-12-06 Thread Sturla Molden
Colin J. Williams wrote:
 When one has a smallish sample size, what gives the best estimate of the
 variance?

What do you mean by "best estimate"?

Unbiased? Smallest standard error?


 In the widely used Analysis of Variance (ANOVA), the degrees of freedom
 are reduced for each mean estimated,

That is for statistical tests, not for computing estimators.





___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] non-standard standard deviation

2009-12-05 Thread Sturla Molden
Colin J. Williams wrote:
 suggested that 1 (one) would be a better default but Robert Kern told
 us that it won't happen.

I don't even see the need for this keyword argument, as you can always
multiply the variance by n/(n-1) to get what you want.
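
For instance, with NumPy's ddof argument:

import numpy as np

x = np.random.rand(100)
n = x.size
# Rescale the 1/n estimate to the 1/(n-1) estimate by hand,
# or request it directly with ddof=1.
assert np.allclose(np.var(x) * n / (n - 1.0), np.var(x, ddof=1))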

Also, normalization by n gives the ML estimate (yes, it has a bias, but
it is better anyway). It is a common novice mistake to use 1/(n-1) as
normalization, probably due to poor advice in introductory statistics
textbooks. It also seems that frequentists are more scared of this
bias boogey monster than Bayesians. It may actually help beginners to
avoid this mistake if numpy's implementation prompts them to ask why the
normalization is 1/n.

If numpy is to change the implementation of std, var, and cov, I suggest 
using the two-pass algorithm to reduce rounding error. (I can provide C 
code.) This is much more important than changing the normalization to a 
bias-free but otherwise inferior value.
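
For illustration, a sketch of the two-pass algorithm in Python (the offer above is for C code):

import numpy as np

def var_two_pass(x):
    # Pass 1: compute the mean.
    # Pass 2: average the squared deviations from it.
    # This avoids the cancellation error of the one-pass
    # E[x**2] - E[x]**2 formula.
    m = x.mean()
    return np.mean((x - m)**2)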

Sturla





___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] non-standard standard deviation

2009-11-30 Thread Sturla Molden
Colin J. Williams wrote:
 Where the distribution of a variate is not known a priori, then I
 believe that it can be shown
 that the n-1 divisor provides the best estimate of the variance.

Have you ever been shooting with a rifle?

What would you rather do:

- Hit 9 or 10, with a bias to the right.

- Hit 7 or better, with no bias.

Do you think it can be shown that the latter option is better?

No?


Sturla Molden
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Is anyone knowledgeable about dll deployment on windows ?

2009-11-30 Thread Sturla Molden
David Cournapeau wrote:
 If every python package starts to put its extensions (*.pyd) into a
 directory, what happens when two different packages have an extension
 with the same name (e.g. package foo has a package multiarray.pyd) ? I
 would also be really annoyed if a 3rd party extension starts polluting
 my C:\Python26.

   
In distutils, data_files can install DLLs where Python wants them. Just
pass 'DLLs' as the path, and the rest is up to distutils. If anyone pollutes
your C:\Python26 it is distutils, not a third party extension. This is
not any different from installing other data files.

Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Is anyone knowledgeable about dll deployment on windows ?

2009-11-30 Thread Sturla Molden
David Cournapeau wrote:
 We are talking about the numpy extensions here, which are not
 installed through the install_data command. The problem is about how
 windows looks for dll with the manifest mechanism, and how to
 build/install extensions when the C runtime (or any other system
 dll) is not installed in the SxS cache.
   
Do we need to support Windows 2000?

S.M.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Is anyone knowledgeable about dll deployment on windows ?

2009-11-30 Thread Sturla Molden
David Cournapeau wrote:
 We are talking about the numpy extensions here, which are not
 installed through the install_data command. The problem is about how
 windows looks for dll with the manifest mechanism, and how to
 build/install extensions when the C runtime (or any other system
 dll) is not installed in the SxS cache.
   
Is COM (aka ActiveX) out of the question?


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Numpy on Python3

2009-11-22 Thread Sturla Molden
Pauli Virtanen wrote:
 XXX: 3K: numpy.random is disabled for now, uses PyString_*
 XXX: 3K: numpy.ma is disabled for now -- some issues
   
I thought numpy.random uses Cython? Is it just a matter of recompiling 
the pyx-file?


   I remember Dag was working on this a bit: how far did it go?

   
Cython's include file numpy.pxd has an ndarray class that extends
PyArrayObject with PEP 3118 buffer compatibility.


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bus error in embedded numpy

2009-11-13 Thread Sturla Molden
Robin wrote:
 I had assumed when matlab unloads the mex function it would also
 unload python - but it looks like other dynamic libs pulled in from
 the mex function (in this case python and in turn numpy) aren't
 unloaded...
   
Matlab MEX functions are DLLs, the Python interpreter is a DLL, NumPy .pyd
files are DLLs... If you cannot unload Python or NumPy, you cannot
unload MEX functions either. The same OS constraints apply.

If you are using Windows, I would recommend that you expose your NumPy 
code as a COM object using pywin32, and make a Matlab wrapper (ordinary 
.m file) that calls the COM object. If you are not using Windows, you 
could create an XMLRPC server in Python and call that using Matlab's 
built-in Java VM.

Or you can spawn a separate Python process from Matlab, and communicate 
using pipes or a tcp/ip socket (Matlab's Java or MEX).

There are many solutions that are easier than embedding Python in Matlab.

Sturla
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bus error in embedded numpy

2009-11-13 Thread Sturla Molden
Robin wrote:
 So far the only remotely tricky thing I did was redirect sys.stdout
 and sys.stderr to a wrapper that uses mexPrintf so output goes to the
 matlab console.
   
Be careful when you are using file handles. You have to be sure that 
Matlab, Python and NumPy are all linked against the same CRT.

Sturla
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] bus error in embedded numpy

2009-11-13 Thread Sturla Molden
Robin wrote:
 Ah, I hadn't realised it was an OS constraint - I thought it was
 possible to unload dlls - and that was why matlab provides the clear
 function. mex automatically clears a function when you rebuild it - I
 thought that was how you can rebuild and reload mex functions without
 restarting matlab.
It is OS dependent whether DLLs can be unloaded or not. In Windows you can
call FreeLibrary on the DLL handle and it's gone. If you can unload a
MEX function you can also unload Python or NumPy.

But when you unload the MEX function, the DLLs loaded by the MEX are not 
automatically cleared. There is no garbage collector.

In Windows, a load and an unload will result in calls to a function
called DllMain in the DLL. And if the DLL has other DLLs to load or
clear, that is where you need to place that logic. You can put a custom
DllMain function in a MEX file.

But be careful: a DLL is only loaded once, i.e. DLLs are imported as
singletons in the process. If two MEX functions load Python26.dll, they
get handles to the same instance. If you unload Python26.dll in one
MEX, it is unloaded globally from the process, so the other MEX will
crash. There is no reference counting by the OS kernel.

The solution to this type of DLL Hell in Windows is COM. A COM object is
a DLL but not a singleton. You can have multiple instances of the same
COM object in one process.

On Mac I guess your options are to either statically link everything to
the MEX file, or find a way for multiple MEX files to share the Python
interpreter, e.g. implement some sort of reference counting scheme,
which by the way is what COM does on Windows.

 I really want a cross platform solution so that rules out COM.

CORBA or XMLRPC seem to be the standards. I'm not sure I would use either.


 Do you think I'm likely to run into more problems? I got the feeling
 from asking questions on IRC that embedding Python is kind of
 discouraged but I thought in this case it seemed the neatest way.

   
It is discouraged because you get a singleton. You can create
subinterpreters (cf. Apache's mod_python), but that will bring problems
of its own. For example you cannot use extensions that require the
simplified GIL API (e.g. ctypes). I think this is a major design flaw of
CPython. For example with Java's JNI, you get a context pointer to the
VM, so you don't have any of these problems. But with CPython, both the
interpreter and extensions are basically implemented to be loaded as
singletons.



 I was aware of this - I thought I would be OK on the mac - at least
 Python and Numpy and my mex function are built with apple gcc although
 I'm not sure about Matlab. I guess Windows will be more difficult...
 But in any case I don't plan to pass any file handles around.

   
It applies to any CRT resource, not just files. The compiler is not
important, but which CRT is loaded. And if you statically link the same
CRT twice, that becomes two CRTs that cannot share resources. In
Windows, Microsoft has made sure there are multiple versions of their
CRT (msvcrt.dll, msvcr71.dll, msvcr80.dll, msvcr90.dll, ...) that cannot
share resources. And anyone not careful with this can experience crashes
at random locations.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Multiple Regression

2009-11-12 Thread Sturla Molden
Alexey Tigarev wrote:
 I have implemented multiple regression in the following way:

   
You should be using QR or SVD for this.
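
For example, with NumPy -- np.linalg.lstsq is SVD-based, and a QR solve is shown for comparison:

import numpy as np

X = np.random.rand(100, 3)
y = np.random.rand(100)

# Via QR: solve R*beta = Q^T * y.
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, np.dot(Q.T, y))

# Via SVD: lstsq uses an SVD-based LAPACK driver.
beta_svd = np.linalg.lstsq(X, y)[0]

assert np.allclose(beta_qr, beta_svd)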

Sturla


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-07 Thread Sturla Molden
David Cournapeau wrote:
 On Fri, Nov 6, 2009 at 6:54 AM, David Goldsmith d.l.goldsm...@gmail.com 
 wrote:
   
 Interesting thread, which leaves me wondering two things: is it documented
 somewhere (e.g., at the IEEE site) precisely how many *decimal* mantissae
 are representable using the 64-bit IEEE standard for float representation
 (if that makes sense); and are such decimal mantissae uniformly distributed
 

 They are definitely not uniformly distributed: that's why two numbers
 are close around 1 when they have only a few EPS difference, but
 around 1e100, you have to add quite a few EPS to even get a different
 number at all.

 That may be my audio processing background, but I like to think about
 float as numbers which have the same relative precision at any level
 - a kind of dB scale. If you want numbers with a fixed number of
 decimals, you need a fixed point representation.
   

David Goldsmith was asking about the mantissa. For a double, that is a
53 bit signed integer. I.e. you have a 52 bit fractional part (bits 0-51),
an 11 bit exponent (bits 52-62), and one sign bit (bit 63). The mantissa is
uniformly distributed like any signed integer. The mantissa of a double
can take 2**53 different integer values: -2**52 to 2**52-1.

But the value of a floating point number is

   value = (-1)**signbit * 2**(exponent - bias) * (1 + fraction)

with bias = 1023 for a double. Thus, floating point numbers are not
uniformly distributed, but the mantissa is.

For numerical illiterates this might come as a surprise. But in 
numerical mathematics, the resolution is in the number of significant 
digits, not in the number of decimals. 101 and .00201 have the same 
numerical precision.

A decimal, on the other hand, can be thought of as a floating point
number using base-10 instead of base-2 for the exponent:

   value = (-1)**signbit * 10**(exponent - bias) * (1 + fraction)

Decimals and floats are not fundamentally different. There are numbers
exactly representable with a decimal that cannot be exactly represented
with a float. But numerical computations do not become more precise with
a decimal than with a float.
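
For illustration, np.spacing (available in newer NumPy versions) returns the gap to the next representable double:

import numpy as np

# The absolute gap between neighbouring doubles grows with magnitude,
# but the relative gap stays roughly constant (~2.2e-16).
print(np.spacing(1.0))      # about 2.2e-16
print(np.spacing(1e100))    # about 1.9e84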

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Accessing LAPACK and BLAS from the numpy C API

2009-11-07 Thread Sturla Molden
Jake VanderPlas wrote:
 Does anybody know a
 way to directly access the numpy.linalg routines from a C extension,
 without the overhead of a python callback?  Thanks for the help.
   

You find a C function pointer wrapped in a CObject in the ._cpointer 
attribute.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Single view on multiple arrays

2009-11-01 Thread Sturla Molden
Anne Archibald wrote:
 The short answer is, you can't. 
Not really true. It is possible to create an array (sub)class that stores
memory addresses (pointers) instead of values. It is doable, but I am
not wasting my time implementing it.

Sturla


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Single view on multiple arrays

2009-11-01 Thread Sturla Molden
Bill Blinn wrote:
 v = multiview((3, 4))
 #the idea of the following lines is that the 0th row of v is
 #a view on the first row of a. the same would hold true for
 #the 1st and 2nd row of v and the 0th rows of b and c, respectively
 v[0] = a[0]
This would not even work, because a[0] does not return a view array but
a scalar array, which is a different type of numpy object. To get a
view, you will need to:

v = a[0:1] # view of element 0 in a

Also, you cannot assign to v[0], as that would trigger a copy as well.

  v[1] = b[0]
  v[2] = c[0]


As I mentioned in the answer to Anne, it would take a completely
different array object. It would need to internally store an array of
memory addresses. I have not made up my mind whether ndarray can be
subclassed for this, or if it takes a completely different object (e.g.
similar to numpy.memmap). What it would require is __setitem__ to store
pointers and __getitem__ to dereference (return an ndarray with values).
Good luck hacking, it is not even difficult, just tedious.
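
A minimal pure-Python sketch of the idea (class name and details are hypothetical; a real implementation would store raw pointers in C):

import numpy as np

class MultiView:
    def __init__(self, nrows):
        self._refs = [None] * nrows
    def __setitem__(self, i, row):
        self._refs[i] = row          # store the view, no copy
    def __getitem__(self, i):
        return self._refs[i]         # dereference: returns live data

a = np.arange(4.)
b = np.arange(4., 8.)
v = MultiView(2)
v[0] = a[:]                          # a view of a
v[1] = b[:]
a[0] = 99.
assert v[0][0] == 99.                # v sees the change in a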

Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread Sturla Molden
Thomas Robitaille wrote:
 np.random.random_integers(np.iinfo(np.int32).min,high=np.iinfo 
 (np.int32).max,size=10)

 which gives

 array([-1506183689,   662982379, -1616890435, -1519456789,  1489753527,
  -604311122,  2034533014,   449680073,  -444302414,  
 -1924170329])

   

This fails on my computer (Python 2.6.4, NumPy 1.3.0 on Win32).

>>> np.random.random_integers(np.iinfo(np.int32).min,high=np.iinfo
(np.int32).max,size=10)

Traceback (most recent call last):
  File "<pyshell#33>", line 2, in <module>
    (np.int32).max,size=10)
  File "mtrand.pyx", line 950, in mtrand.RandomState.random_integers
  File "mtrand.pyx", line 746, in mtrand.RandomState.randint
OverflowError: long int too large to convert to int

It might have something to do with this:

>>> 2**31-1
2147483647L
>>> -2**31
-2147483648L

In light of this annoying behaviour:

import numpy as np

def random_int64(size):
    # Draw four 16-bit chunks and combine them into one 64-bit integer.
    a0 = np.random.random_integers(0, 0xffff, size=size).astype(np.uint64)
    a1 = np.random.random_integers(0, 0xffff, size=size).astype(np.uint64)
    a2 = np.random.random_integers(0, 0xffff, size=size).astype(np.uint64)
    a3 = np.random.random_integers(0, 0xffff, size=size).astype(np.uint64)
    a = a0 + (a1 << 16) + (a2 << 32) + (a3 << 48)
    return a.view(dtype=np.int64)
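
Usage would then be, for example:

x = random_int64(10)
print(x.dtype)   # int64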


Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread Sturla Molden
Robert Kern wrote:
 64-bit and larger integers could be done, but it requires
 modification. The integer distributions were written to support C
 longs, not anything larger. You could also use .bytes() and
 np.fromstring().
   
But as of Python 2.6.4, even 32-bit integers fail, at least on Windows.


Sturla
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread Sturla Molden
Robert Kern wrote:
 Then let me clarify: it was written to support integer ranges up to
 sys.maxint. Absolutely, it would be desirable to extend it.

   
I know, but look at this:

>>> import sys
>>> sys.maxint
2147483647
>>> 2**31-1
2147483647L

sys.maxint is a plain int, but 2**31-1 becomes a long, which is what
confuses mtrand.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread Sturla Molden
Robert Kern wrote:
 Then let me clarify: it was written to support integer ranges up to
 sys.maxint. Absolutely, it would be desirable to extend it.

   
Actually it only supports integers up to sys.maxint-1, as
random_integers calls randint. random_integers includes the upper range,
but randint excludes the upper range. Thus, this happens on line 1153 in
mtrand.pyx:

return self.randint(low, high+1, size)

The main source of the problem is that numbers smaller than sys.maxint
can become a long. (I have asked why on python-dev; it does not make any
sense.) So when random_integers passes high+1 to randint, it is
unnecessarily converted to a long. Then, there is an exception on line 847:

hi = high

With hi previously declared as a C long, Cython refuses the conversion. Now,
we could try a downcast to int like this:

hi = int(high)

which would make Cython only raise an exception in case of an integer 
overflow.

>>> int(2**31)
2147483648L
>>> int(2**31-1)
2147483647

If there is no overflow, high becomes an int and conversion to C long is
allowed. Still, this will only support integer ranges up to sys.maxint -
1. We thus have to swap the order of randint and random_integers. The one
with the inclusive upper interval should call rk_interval.


Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Random int64 and float64 numbers

2009-11-01 Thread Sturla Molden
Sturla Molden wrote:
 Robert Kern wrote:
 Then let me clarify: it was written to support integer ranges up to
 sys.maxint. Absolutely, it would be desirable to extend it.

 Actually it only supports integers up to sys.maxint-1, as
 random_integers calls randint. random_integers includes the upper range,
 but randint excludes the upper range. Thus, this happens on line 1153 in
 mtrand.pyx:

 return self.randint(low, high+1, size)

 inclusive upper interval should call rk_interval

I love this one:

cdef long lo, hi, diff
[...]
diff = hi - lo - 1

which silently overflows, and is the reason for this strange exception:

>>> np.random.random_integers(-2147483648,high=2147483646,size=10)

Traceback (most recent call last):
  File "<pyshell#68>", line 1, in <module>
    np.random.random_integers(-2147483648,high=2147483646,size=10)
  File "mtrand.pyx", line 950, in mtrand.RandomState.random_integers
  File "mtrand.pyx", line 750, in mtrand.RandomState.randint
ValueError: low >= high

I'll call this a bug.


Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Syntax highlighting for Cython and NumPy

2009-10-27 Thread Sturla Molden
Here is an XML file for Cython syntax highlighting in katepart (e.g. KATE and
KDevelop). I made this because KATE is my favourite text editor (feel
free to call me a heretic for not using emacs). Unfortunately, the
Python highlighting for KDE contains several bugs. And the Pyrex/Cython
version that circulates on the web builds on this and introduces a
couple more. I have tried to clean it up. Note that this will also
highlight numpy.* or np.*, if * is a type or function you get from
cimport numpy or import numpy. This works on Windows as well, if you
have installed KDE for Windows. Just copy the XML to:

  ~/.kde/share/apps/katepart/syntax/
  C:\kde\share\apps\katepart\syntax  (or wherever you have KDE installed)

and "Cython with NumPy" shows up under Sources. Anyway, this is the
syntax highlighter I use to write Cython. Feel free to use it as you wish.

P.S. I am also cleaning up the Python highlighting for KDE. Not done yet,
but I will post a "Python with NumPy" highlighter later on if this is
interesting.

P.P.S. This also covers Pyrex, but adds in some Cython stuff.

Sturla Molden
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE language>
<!-- Python syntax highlighting v0.9 by Per Wigren -->
<!-- Python syntax highlighting v1.9 by Michael Bueker (improved keyword differentiation) -->
<!-- Python syntax highlighting v1.97 by Paul Giannaros -->
<!-- Pyrex syntax highlighting v0.1 by Matthew Marshall -->
<!-- Cython syntax highlighting v1.0 by Martin Gysel -->
<!-- Cython syntax highlighting v1.1 by Sturla Molden -->
<language name="Cython with NumPy" version="1.1" kateversion="2.4" section="Sources" extensions="*.pyx;*.pxi;*.pxd" mimetype="application/x-cython;text/x-cython" casesensitive="1" author="Sturla Molden" license="">
  <highlighting>
    <list name="imports">
      <item> import </item>
      <item> from </item>
      <item> as </item>
    </list>
    <list name="cimport">
      <!-- might be used to trigger special highlighting after "cimport cython" and "cimport numpy" -->
      <item> cimport </item>
    </list>
    <list name="prep">
      <item> DEF </item>
      <item> IF </item>
      <item> ELIF </item>
      <item> ELSE </item>
      <item> include </item>
    </list>
    <list name="defs">
      <item> class </item>
      <item> cdef </item>
      <item> cpdef </item>
      <item> extern </item>
      <item> ctypedef </item>
      <item> def </item>
      <item> del </item>
      <item> global </item>
      <item> lambda </item>
      <item> struct </item>
      <item> enum </item>
      <item> property </item>
      <item> readonly </item>
      <item> public </item>
      <item> nogil </item>
      <item> gil </item>
    </list>
    <list name="operators">
      <item> and </item>
      <item> assert </item>
      <item> in </item>
      <item> is </item>
      <item> not </item>
      <item> or </item>
      <item> sizeof </item>
    </list>
    <list name="commands">
      <!-- <item> exec </item> -->
      <item> print </item>
    </list>
    <list name="flow">
      <item> break </item>
      <item> continue </item>
      <item> elif </item>
      <item> else </item>
      <item> except </item>
      <item> finally </item>
      <item> for </item>
      <item> if </item>
      <item> pass </item>
      <item> raise </item>
      <item> return </item>
      <item> try </item>
      <item> while </item>
      <item> with </item>
      <!-- <item> yield </item> -->
    </list>
    <list name="builtinfuncs">
      <item> __import__ </item>
      <item> abs </item>
      <item> all </item>
      <item> any </item>
      <item> apply </item>
      <item> basestring </item>
      <item> buffer </item>
      <item> callable </item>
      <item> chr </item>
      <item> classmethod </item>
      <item> cmp </item>
      <item> coerce </item>
      <item> compile </item>
      <item> delattr </item>
      <item> dir </item>
      <item> divmod </item>
      <item> enumerate </item>
      <item> eval </item>
      <item> execfile </item>
      <item> filter </item>
      <item> getattr </item>
      <!-- <item> globals </item> -->
      <item> hasattr </item>
      <item> hash </item>
      <item> hex </item>
      <item> id </item>
      <item> input </item>
      <item> intern </item>
      <item> isinstance </item>
      <item> issubclass </item>
      <item> iter </item>
      <item> len </item>
      <!-- <item> locals </item> -->
      <item> map </item>
      <item> max </item>
      <item> min </item>
      <item> oct </item>
      <item> open </item>
      <item> ord </item>
      <item> pow </item>
      <!-- <item> property </item> -->
      <item> range </item>
      <item> raw_input </item>
      <item> reduce </item>
      <item> reload </item>
      <item> repr </item>
      <item> reversed </item>
      <item> round </item>
      <item> setattr </item>
      <item> sorted </item>
      <item> staticmethod </item>
      <item> sum </item>
      <item> super </item>
      <item> type </item>
      <item> unichr </item>
      <item> unicode </item>
      <!-- <item> vars </item> -->
      <item> xrange </item>
      <item> zip </item>
    </list>
    <list name="ctypes">
      <!-- native types -->
      <item> unsigned </item>
      <item> void </item>
      <item> enum </item>
      <item> double </item>
      <item> long </item>
      <item> short </item>
      <item> char </item>
      <item> Py_ssize_t </item>
      <item> Py_intptr_t </item>
      <item> Py_buffer </item>
      <item> bint </item>
      <!-- python types -->
    </list>
    <list name="pytypes">
      <item> int </item>
      <item> float </item>
      <item> object </item>
      <item> list </item>
      <item> tuple </item>
      <item> str </item>
      <item> dict </item>

Re: [Numpy-discussion] [Cython] Syntax highlighting for Cython and NumPy

2009-10-27 Thread Sturla Molden

Sturla Molden wrote:

 and "Cython with NumPy" shows up under Sources. Anyway, this is the
 syntax highlighter I use to write Cython.

It seems I posted the wrong file. :-(

S.M.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE language>
<!-- Python syntax highlighting v0.9 by Per Wigren -->
<!-- Python syntax highlighting v1.9 by Michael Bueker (improved keyword differentiation) -->
<!-- Python syntax highlighting v1.97 by Paul Giannaros -->
<!-- Pyrex syntax highlighting v0.1 by Matthew Marshall -->
<!-- Cython syntax highlighting v1.0 by Martin Gysel -->
<!-- Cython syntax highlighting v1.1 by Sturla Molden -->
<language name="Cython with NumPy" version="1.1" kateversion="2.4" section="Sources" extensions="*.pyx;*.pxi;*.pxd" mimetype="application/x-cython;text/x-cython" casesensitive="1" author="Sturla Molden" license="">
  <highlighting>
    <list name="as">
      <item> as </item>
    </list>
    <list name="imports">
      <item> cimport </item>
      <item> import </item>
      <item> from </item>
      <item> as </item>
    </list>
    <list name="prep">
      <item> DEF </item>
      <item> IF </item>
      <item> ELIF </item>
      <item> ELSE </item>
      <item> include </item>
    </list>
    <list name="defs">
      <item> class </item>
      <item> cdef </item>
      <item> cpdef </item>
      <item> extern </item>
      <item> ctypedef </item>
      <item> def </item>
      <item> del </item>
      <item> global </item>
      <item> lambda </item>
      <item> struct </item>
      <item> enum </item>
      <item> property </item>
      <item> readonly </item>
      <item> public </item>
      <item> nogil </item>
      <item> gil </item>
      <item> inline </item>
    </list>
    <list name="operators">
      <item> and </item>
      <item> assert </item>
      <item> in </item>
      <item> is </item>
      <item> not </item>
      <item> or </item>
      <item> sizeof </item>
    </list>
    <list name="commands">
      <!-- <item> exec </item> -->
      <item> print </item>
    </list>
    <list name="flow">
      <item> break </item>
      <item> continue </item>
      <item> elif </item>
      <item> else </item>
      <item> except </item>
      <item> finally </item>
      <item> for </item>
      <item> if </item>
      <item> pass </item>
      <item> raise </item>
      <item> return </item>
      <item> try </item>
      <item> while </item>
      <item> with </item>
      <!-- <item> yield </item> -->
    </list>
    <list name="builtinfuncs">
      <item> __import__ </item>
      <item> abs </item>
      <item> all </item>
      <item> any </item>
      <item> apply </item>
      <item> basestring </item>
      <item> buffer </item>
      <item> callable </item>
      <item> chr </item>
      <item> classmethod </item>
      <item> cmp </item>
      <item> coerce </item>
      <item> compile </item>
      <item> delattr </item>
      <item> dir </item>
      <item> divmod </item>
      <item> enumerate </item>
      <item> eval </item>
      <item> execfile </item>
      <item> filter </item>
      <item> getattr </item>
      <!-- <item> globals </item> -->
      <item> hasattr </item>
      <item> hash </item>
      <item> hex </item>
      <item> id </item>
      <item> input </item>
      <item> intern </item>
      <item> isinstance </item>
      <item> issubclass </item>
      <item> iter </item>
      <item> len </item>
      <!-- <item> locals </item> -->
      <item> map </item>
      <item> max </item>
      <item> min </item>
      <item> oct </item>
      <item> open </item>
      <item> ord </item>
      <item> pow </item>
      <!-- <item> property </item> -->
      <item> range </item>
      <item> raw_input </item>
      <item> reduce </item>
      <item> reload </item>
      <item> repr </item>
      <item> reversed </item>
      <item> round </item>
      <item> setattr </item>
      <item> sorted </item>
      <item> staticmethod </item>
      <item> sum </item>
      <item> super </item>
      <item> type </item>
      <item> unichr </item>
      <item> unicode </item>
      <!-- <item> vars </item> -->
      <item> xrange </item>
      <item> zip </item>
    </list>
    <list name="ctypes">
      <!-- native types -->
      <item> unsigned </item>
      <item> void </item>
      <item> enum </item>
      <item> double </item>
      <item> long </item>
      <item> short </item>
      <item> char </item>
      <item> Py_ssize_t </item>
      <item> Py_intptr_t </item>
      <item> Py_buffer </item>
      <item> bint </item>
      <!-- python types -->
    </list>
    <list name="pytypes">
      <item> int </item>
      <item> float </item>
      <item> object </item>
      <item> list </item>
      <item> tuple </item>
      <item> str </item>
      <item> dict </item>
      <item> set </item>
      <item> frozenset </item>
      <item> slice </item>
      <item> bool </item>
      <item> complex </item>
      <item> file </item>
      <!-- numpy types -->
    </list>
    <list name="numpyselector">
      <item> np </item>
      <item> numpy </item>
    </list>
    <list name="numpymodule">
      <item> numpy </item>
    </list>
    <list name="cythonmodule">
      <item> cython </item>
    </list>
    <list name="numpytypes">
      <item> dtype </item>
      <item> flatiter </item>
      <item> broadcast </item>
      <item> ndarray </item>
      <item> int8_t </item>
      <item> int16_t </item>
      <item> int32_t </item>
      <item> int64_t </item>
      <item> uint8_t </item>
      <item> uint16_t </item>
      <item> uint32_t </item>
      <item> uint64_t </item>
      <item> float32_t </item>
      <item>

Re: [Numpy-discussion] [Cython] Syntax highlighting for Cython and NumPy

2009-10-27 Thread Sturla Molden
Lisandro Dalcin wrote:
 Is there any specific naming convention for these XML files to work
 with KATE? Would it be fine to call it 'cython-mode-kate.xml' to push
 it to the repo? Will it still work (I mean, with that name) when
 placed appropriately in KATE config dirs or whatever? ... Just
 concerned that 'cython.xml' is a bit too generic filename...

   
You can name it anything you want. The file has an entry like this:

<language name="Cython with NumPy" version="1.1" kateversion="2.4"
section="Sources" extensions="*.pyx;*.pxi;*.pxd">

that tells KDE what this is and where to put it.

I was thinking of changing section to "Scientific", because that is
where I want to put "Python with NumPy" as well. It makes sense to have
NumPy highlighting alongside Matlab and Octave.

S.M.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fftpack_lite question

2009-10-26 Thread Sturla Molden
Ralf Gommers wrote:

 If anyone with knowledge of the differences between the C and Fortran 
 versions could add a few notes at the above link, that would be great.

The most notable difference (from a user perspective) is that the 
Fortran version has more transforms, such as discrete sine and cosine 
transforms. It also supports single and double precision. The older 
Fortran version is used in SciPy.

FFTs from FFTW and MKL tend to be faster than FFTPACK, at least on Intel 
hardware. FFTPACK was originally written for running fast on vector 
machines like the Cray and NEC.

FFTPACK-lite:
http://projects.scipy.org/numpy/browser/trunk/scipy/basic/fftpack_lite?rev=1676

Older Fortran version:
http://www.netlib.org/fftpack/

Fortran 90 version (no license):
http://orion.math.iastate.edu/burkardt/f_src/fftpack/fftpack.html

Another C version:
http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=fftpack/fft.c

S.M.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy and C99

2009-10-23 Thread Sturla Molden
Dag Sverre Seljebotn wrote:

 Microsoft's compilers don't support C99 (or, at least, the versions that
 still have to be used don't).

Except for automatic arrays, they do support some of the more important
parts of C99 as extensions to C89:

inline functions
restrict qualifier
for (int i=0; i<n; i++)


Personally I think all of NumPy's C base should be moved to Cython. With
your excellent syntax for PEP 3118 buffers, I see no reason to keep
NumPy in C. This would make porting to Py3k, as well as maintenance,
easier. When Cython can build Sage, it can be used for a smaller project
like NumPy as well.

The question of using C89, C99 or C++ would be deferred to the Cython 
compiler. We could use C++ on one platform (MSVC) and C99 on another 
(GCC). We would also get direct support for C99 _Complex and C++ 
std::complex types.

I'd also suggest that ndarray subclasses memoryview in Py3k.


S.M.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Sturla Molden
Robert Kern wrote:
 No, I think you're right. Using SIMD to refer to numpy-like
 operations is an abuse of the term not supported by any outside
 community that I am aware of. Everyone else uses SIMD to describe
 hardware instructions, not the application of a single syntactical
 element of a high level language to a non-trivial data structure
 containing lots of atomic data elements.
   
Then you should pick up a book on parallel computing.

It is common to differentiate between four classes of computers: SISD, 
MISD, SIMD, and MIMD machines.

A SISD system is the classical von Neumann machine. A MISD system is a
pipelined von Neumann machine, for example the x86 processor.

A SIMD system is one that has one CPU dedicated to control, and a large 
collection of subordinate ALUs for computation. Each ALU has a small 
amount of private memory. The IBM Cell processor is the typical SIMD 
machine.

A special class of SIMD machines are the so-called vector machines, of
which the most famous is the Cray C90. The MMX and SSE instructions in
Intel Pentium processors are an example of vector instructions. Some
computer scientists regard vector machines as a subtype of MISD systems,
orthogonal to pipelines, because there are no subordinate ALUs with
private memory.

MIMD systems have multiple independent CPUs. MIMD systems come in two
categories: shared-memory processors (SMP) and distributed-memory
machines (also called cluster computers). The dual- and quad-core x86
processors are shared-memory MIMD machines.

Many people associate the word SIMD with SSE due to Intel marketing. But
to the extent that vector machines are MISD orthogonal to pipelined von
Neumann machines, SSE cannot be called SIMD.

NumPy is a software simulated vector machine, usually executed on MISD 
hardware. To the extent that vector machines (such as SSE and C90) are 
SIMD, we must call NumPy an object-oriented SIMD library.


S.M.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Sturla Molden
Matthieu Brucher wrote:
 I agree with Sturla, for instance nVidia GPUs do SIMD computations
 with blocks of 16 values at a time, but the hardware behind can't
 compute on so much data at a time. It's SIMD from our point of view,
 just like Numpy does ;)

   
A computer with a CPU and a GPU is a SIMD machine by definition, due to
the single CPU and the multiple ALUs in the GPU, which are subordinate
to the CPU. But with modern computers, these classifications become a
bit unclear.

S.M.




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Sturla Molden
Mathieu Blondel wrote:
 Peter Norvig suggested to merge Numpy into Cython but he didn't
 mention SIMD as the reason (this one is from me). 

I don't know what Norvig said or meant.

However:

There is NumPy support in Cython. Cython has a general syntax applicable 
to any PEP 3118 buffer. (As NumPy is not yet PEP 3118 compliant, NumPy 
arrays are converted to Py_buffer structs behind the scenes.)

Support for optimized vector expressions might be added later. 
Currently, slicing works as with NumPy in Python, producing slice 
objects and invoking NumPy's own code, instead of being converted to 
fast inlined C.

The PEP 3118 buffer syntax in Cython can be used to port NumPy to Py3k, 
replacing the current C source. That might be what Norvig meant if he 
suggested merging NumPy into Cython.


S.M.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Sturla Molden
Mathieu Blondel wrote:
 As I wrote earlier in this thread, I confused Cython and CPython. PN
 was suggesting to include Numpy in the CPython  distribution (not
 Cython). The reason why was also given earlier.

   
First, that would currently not be possible, as NumPy does not support
Py3k. Second, the easiest way to port NumPy to Py3k is Cython, which
would prevent adoption into the Python standard library; at least they
would have to change their current policy. Also, with NumPy in the
standard library, any modification to NumPy would require a PEP.

But Python should have a PEP 3118 compliant buffer object in the 
standard library, which NumPy could subclass.

S.M.





___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-22 Thread Sturla Molden
Robert Kern wrote:
 I would be delighted to see a reference to one that refers to a high
 level language's API as SIMD. Please point one out to me. It's
 certainly not any of the ones I have available to me.

   
Numerical Recipes in Fortran 90, pages 964 and 985-986, describes the
syntax of Fortran 90 and 95 as SIMD.

Peter Pacheco's book on MPI describes the difference between von Neumann
machines and vector machines as analogous to the difference between
Fortran 77 and Fortran 90 (with an example from Fortran 90 array slicing).
He is ambiguous as to whether vector machines really are SIMD, or more
related to pipelined von Neumann machines.

Grama et al., Introduction to Parallel Computing, describe SIMD as an
architecture, though it is more or less clear that they mean hardware.
They do say the Fortran 90 "where" statement is a primitive used to
support selective execution on SIMD processors, as conditional execution
(if statements) is detrimental to performance.

So at least we here have three books claiming that Fortran is a language
with special primitives for SIMD processors.


 That's a fair point, but unrelated to whether or not numpy can be
 labeled SIMD. These all refer to hardware.
   
Actually I don't think the distinction is that important, as we are
talking about Turing machines. Also, a lot of what we call hardware is
actually implemented as software on the chip: the most extreme example
would be Transmeta, which emulated x86 processors entirely in software.
The vague distinction between hardware and software is why we get
patents on software in Europe, although pure software patents are
prohibited. One can always argue that the program and the computer
together constitute a physical device; and circumventing patents by
moving hardware into software should not be allowed. The distinction
between hardware and software is not as clear as programmers tend to
believe.

Another thing is that performance issues for vector machines and vector
languages (Fortran 90, Matlab, NumPy) are similar. Precisely the same
situations that make NumPy and Matlab code slow are detrimental on
SIMD/vector hardware. That would for example be long for-loops with
conditional if statements. On the other hand, vectorized operations over
arrays, possibly using where/find masks, are fast. So although NumPy is
not executed on a vector machine like the Cray C90, it certainly behaves
like one performance-wise.
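
For example, the branchy loop versus the vectorized mask form in NumPy:

import numpy as np

x = np.random.rand(100000)

# Slow on NumPy and on vector hardware alike: a long loop with a branch.
y = np.empty_like(x)
for i in range(x.size):
    y[i] = np.sqrt(x[i]) if x[i] > 0.5 else x[i]**2

# Fast in both settings: one vectorized expression with a mask.
y2 = np.where(x > 0.5, np.sqrt(x), x**2)

assert np.allclose(y, y2)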

I'd say that a MIMD machine running NumPy is a Turing machine emulating 
a SIMD/vector machine.

And now I am done with this stupid discussion...


Sturla Molden
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy

2009-10-21 Thread Sturla Molden
Mathieu Blondel wrote:
 Hello,

 About one year ago, a high-level, objected-oriented SIMD API was added
 to Mono. For example, there is a class Vector4f for vectors of 4
 floats and this class implements methods such as basic operators,
 bitwise operators, comparison operators, min, max, sqrt, shuffle
 directly using SIMD operations.
I think you are confusing SIMD with Intel's MMX/SSE instruction set.

SIMD means single instruction - multiple data. NumPy is inherently
an object-oriented SIMD API:

  array1[:] = array2 + array3

is a SIMD instruction by definition.

SIMD instructions in hardware for length-4 vectors are mostly useful for
3D graphics. But they are not used a lot for that purpose, because GPUs
are getting common. SSE is mostly for rendering 3D graphics without a
GPU. There is nothing that prevents NumPy from having a Vector4f dtype
that internally stores four float32 values and is aligned on 16-byte
boundaries. But it would not be faster than the current float32 dtype.
Do you know why?

The reason is that memory access is slow, and computation is fast.
Modern CPUs are data-starved. The speed of NumPy is not limited by not
using MMX/SSE whenever possible. It is limited by having to create and
delete temporary arrays all the time. You are suggesting to optimize in
the wrong place. There is a lot that can be done to speed up
computation: there are optimized BLAS libraries like ATLAS and MKL, and
NumPy uses BLAS for things like matrix multiplication. There is OpenMP
for better performance on multicores. There are OpenCL and CUDA for
moving computation from the CPU to the GPU. But the main boost you get from
going from NumPy to hand-written C or Fortran comes from reduced memory use.

 existing discussion here. Memory-alignment is an important related issue
 since non-aligned movs can tank the performance.

You can align an ndarray on a 16-byte boundary like this:

import numpy

def aligned_array(N, dtype):
    # Over-allocate by 16 bytes so the start can be shifted to a
    # 16-byte boundary.
    d = numpy.dtype(dtype)
    tmp = numpy.zeros(N * d.itemsize + 16, dtype=numpy.uint8)
    address = tmp.__array_interface__['data'][0]
    offset = (16 - address % 16) % 16
    # Reinterpret the aligned byte slice as N elements of the dtype.
    return tmp[offset:offset + N * d.itemsize].view(dtype=d)
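
A quick check that the returned array really is aligned:

a = aligned_array(1000, numpy.float64)
assert a.__array_interface__['data'][0] % 16 == 0
assert a.size == 1000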


Sturla Molden

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Optimized sum of squares

2009-10-18 Thread Sturla Molden
Skipper Seabold wrote:
 I'm curious about this as I use ss, which is just np.sum(a*a, axis),
 in statsmodels and didn't much think about it.

 Do the number of loops matter in the timings and is dot always faster
 even without the blas dot?
   
The thing is that a*a returns a temporary array with the same shape as
a, and then that is passed to np.sum. The BLAS dot product doesn't need to
allocate and deallocate temporary arrays.
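
A minimal comparison (absolute timings will vary with the BLAS in use):

import numpy as np

a = np.random.rand(1000000)
# np.sum(a*a) materializes the temporary a*a first;
# np.dot(a, a) streams through a once, with no temporary.
assert np.allclose(np.sum(a*a), np.dot(a, a))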

S.M.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object array alignment issues

2009-10-16 Thread Sturla Molden
Francesc Alted wrote:
 The response is clear: avoid memcpy() if you can.  It is true that memcpy()
 performance has improved quite a lot in the latest gcc (it has been quite
 good in Win versions since many years ago), but working with data in-place
 (i.e. avoiding a memory copy) is always faster (and most especially for
 large arrays that don't fit in the processor's cache).

 My own experiments say that, with an Intel Core2 processor, the typical
 speed-ups for avoiding memcpy() are 2x.
If the underlying array is strided, I have seen the opposite as well.
Copy-in copy-out is a common optimization used by Fortran compilers
when working with strided arrays. The catch is that the work array has
to fit in cache for this to make any sense. Anyhow, you cannot use
memcpy for this kind of optimization - it assumes both buffers are
contiguous. But working with arrays directly instead of copies is not
always the faster option.
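
A sketch of copy-in copy-out at the NumPy level (Fortran compilers do this behind the scenes):

import numpy as np

a = np.zeros((1000, 1000))
col = a[:, 0]                      # strided view: one element per row
work = np.ascontiguousarray(col)   # copy-in: contiguous work array
work += 1.0                        # operate on the cache-friendly copy
a[:, 0] = work                     # copy-out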

S.M.

 And I've read somewhere that both AMD and
 Intel are trying to make unaligned operations go even faster in upcoming
 architectures (the goal is that there should be no speed difference in
 accessing aligned or unaligned data).

   
 I believe the memcpy approach is used for other unaligned parts of void
 types. There is an inherent performance penalty there, but I don't see how
 it can be avoided when using what are essentially packed structures. As to
 memcpy, its performance seems to depend on the compiler/compiler version,
 old versions of gcc had *horrible* implementations of memcpy. I believe the
 situation has since improved. However, I'm not sure we should be coding to
 compiler issues unless it is unavoidable or the gain is huge.
 

 IMO, NumPy can be improved for unaligned data handling.  For example, Numexpr 
 is using this small snippet:

 from cpuinfo import cpu
 if cpu.is_AMD() or cpu.is_Intel():
     is_cpu_amd_intel = True
 else:
     is_cpu_amd_intel = False

 for detecting AMD/Intel architectures and allowing the code to avoid memcpy() 
 calls for the unaligned arrays.

 The above code uses the excellent ``cpuinfo.py`` module from Pearu Peterson, 
 which is distributed under NumPy, so it should not be too difficult to take 
 advantage of this for avoiding unnecessary copies in this scenario.

   

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Resize Method for Numpy Array

2009-09-24 Thread Sturla Molden
Robert Kern wrote:
 While this description is basically true of numpy arrays, I would
 caution you that every language has a different lexicon, and the same
 word can mean very different things in each. For example, Python lists
 are *not* linked lists; they are like C++'s std::vectors with a
 preallocation strategy to make appending cheap on average.
   
In Java and .NET jargon, Python lists are array lists, not linked lists.

It is sad there is no cons or llist built-in type, something like:

   mycons = cons(car, cdr)
   mylist = llist(iterable)

Of course we can write [car, cdr] or (car, cdr) for making linked lists
in pure Python (without having to define class types), but both have
issues. The first is storage inefficient, the latter is not mutable.

Yes, I know Guido left out linked lists on purpose, so there is
probably no use complaining on the python-ideas or python-dev lists...

S.M.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Resize Method for Numpy Array

2009-09-24 Thread Sturla Molden
Robert Kern wrote:
 collections.deque() is a linked list of 64-item chunks.
   
Thanks for that useful information. :-) But it would not help much for a 
binary tree...

Since we are on the NumPy list... One could imagine making linked lists
using NumPy arrays with dtype=object. They are storage efficient like
tuples, and mutable like lists.

import numpy as np

def cons(a, b):
    return np.array((a, b), dtype=object)

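For instance:

lst = cons(1, cons(2, cons(3, None)))   # a small linked list
head, tail = lst[0], lst[1]             # car and cdr
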
But I guess the best way is to implement a real linked extension type in 
Cython.

S.M.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy and cython in pure python mode

2009-09-22 Thread Sturla Molden
Sebastian Haase wrote:
 I know that cython's numpy is still getting better and better over
 time, but is it already today possible to have numpy support when
 using Cython in pure python mode?
   
I'm not sure. There is this odd memoryview syntax:

import cython
view = cython.int[:,:](my2darray)

print view[3,4] # fast when compiled, according to Sverre



S.M.



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Best way to insert C code in numpy code

2009-09-22 Thread Sturla Molden
René Dudfield wrote:
 Another way is to make your C function then load it with ctypes (or
 wrap it with something else) and pass it pointers with
 array.ctypes.data.

numpy.ctypeslib.ndpointer is preferred when using ndarrays with ctypes.

 You can find the shape of the array in python, and
 pass it to your C function.  The benefit is it's just C code, and you
 can avoid the GIL too if you want.  Then if you keep your C code
 separate from python stuff other people can use your C code in other
 languages more easily.
You can do this with Cython as well, just use Cython for the glue code. 
The important difference is this: Cython is a language for writing C 
extensions, ctypes is a module for calling DLLs.
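
A sketch of the ctypes route (the library name and the scale() function are hypothetical):

import ctypes
import numpy as np
from numpy.ctypeslib import ndpointer

# Hypothetical C function in mylib.so: void scale(double *x, int n, double f);
lib = ctypes.CDLL("./mylib.so")
lib.scale.argtypes = [ndpointer(dtype=np.float64, flags="C_CONTIGUOUS"),
                      ctypes.c_int, ctypes.c_double]
lib.scale.restype = None

x = np.arange(5.0)
lib.scale(x, int(x.size), 2.0)   # modifies x in place; ctypes drops the GIL during the call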

One important advantage of Cython is deterministic clean-up code. If you 
put a __dealloc__ method in a cdef class, it will be called on garbage 
collection.

Another nice way of interfacing C with numpy is f2py. It also works with 
C, not just Fortran.

Yet another way (Windows specific) is to use win32com.client and pass
array.ctypes.data. That is nice if you have an ActiveX control; for
Windows you often get commercial libraries like that. Also if you have
.NET or Java objects, you can easily expose them to COM.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Best way to insert C code in numpy code

2009-09-22 Thread Sturla Molden
René Dudfield wrote:
 Another way is to make your C function then load it with ctypes
Also one should beware that ctypes is a stable part of the Python 
standard library.

Cython is still unstable and in rapid development.

Pyrex is more stable than Cython, but interfacing with ndarrays is harder.

If you have a requirement not to use experimental code, then Cython is
not an option.









___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Best way to insert C code in numpy code

2009-09-22 Thread Sturla Molden
Xavier Gnata skrev:
 I have a large 2D numpy array as input and a 1D array as output.
 In between, I would like to use C code.
 C is a requirement because it has to be fast and because the algorithm 
 cannot be written in a numpy oriented way :( (no way...really).
   
There are certain algorithms that cannot be vectorized, particularly 
those that are recursive/iterative. One example is MCMC methods such as 
the Gibbs sampler. You can get around it by running multiple Markov 
chains in parallel, and vectorizing this parallelism with NumPy. But you 
cannot vectorize one long chain. Vectorizing with NumPy only applies to 
data parallel problems.

But then there is a nice tool you should know about: Cython in pure 
Python mode. You just add some annotations to the Python code, and the 
.py file can be compiled to efficient C.

http://wiki.cython.org/pure

This is quite similar in spirit to the optional static typing that makes 
certain implementations of Common Lisp (CMUCL, SBCL, Franz) so insanely 
fast.
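
As a minimal sketch of the pure mode (assuming Cython is installed; the 
function and its body are made up for illustration), the same file runs 
as plain Python and compiles to typed C loops:

import cython

@cython.locals(x=cython.double, s=cython.double, i=cython.int)
def recursive_update(n):
    # stand-in for an iterative algorithm that cannot be vectorized
    s = 0.0
    x = 0.5
    for i in range(n):
        x = 0.5 * x + 0.1
        s += x
    return s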



 










___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] matlab for numpy users

2009-09-17 Thread Sturla Molden
Christian K. skrev:
 Is there something like broadcasting in matlab? 
Matlab does not do automatic broadcasting like NumPy and Fortran 95. You 
have to broadcast manually, mostly using repmat (but there are other 
ways as well). This should help:

http://home.online.no/~pjacklam/matlab/doc/mtt/doc/mtt.pdf
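
For instance, a small sketch of what NumPy does implicitly and what 
Matlab's repmat spells out:

import numpy as np

a = np.arange(3.0).reshape(3, 1)   # shape (3, 1)
b = np.arange(4.0)                 # shape (4,)

c = a + b                          # broadcast to shape (3, 4)

# Rough Matlab equivalent, replication made explicit:
#   c = repmat(a, 1, 4) + repmat(b, 3, 1)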


Sturla Molden


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Dot product performance on python 2.6 (windows)

2009-09-11 Thread Sturla Molden
V. Armando Solé skrev:
 import numpy
 import time
 a=numpy.arange(1000000.)
 a.shape=1000,1000
 t0=time.time()
 b=numpy.dot(a.T,a)
 print "Elapsed time = ", time.time()-t0

 reports an Elapsed time of 1.4 seconds under python 2.5 and 15 seconds 
 under python 2.6
   
My computer reports 0.34 seconds (Python 2.6.2, Win32).


S.M.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Dot product performance on python 2.6 (windows)

2009-09-11 Thread Sturla Molden
V. Armando Solé skrev:
 In python 2.6:

  import numpy.core._dotblas as dotblas
 ...
 ImportError: No module named _dotblas
   

>>> import numpy.core._dotblas as dotblas
>>> dotblas.__file__
'C:\\Python26\\lib\\site-packages\\numpy\\core\\_dotblas.pyd'



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Sturla Molden
Citi, Luca skrev:
 That is exactly why numexpr is faster in these cases.
 I hope one day numpy will be able to perform such
 optimizations.
   
I think it is going to require lazy evaluation. Whenever possible, an 
operator would just return a symbolic representation of the operation. 
This would gradually build up a tree of operators and buffers. When 
someone tries to read the data from an array, the buffer is created 
on-demand by flushing procrastinated expressions. One must be sure that 
the buffers referenced in an incomplete expression never change. This 
would be easiest to ensure with immutable buffers. Numexpr is the kind 
of back-end a system like this would require. But a lot of the code in 
numexpr can be omitted, because Python creates the parse tree; we would 
not need the expression parser in numexpr as frontend. Well... this plan 
is gradually getting closer to a specialized SciPy JIT-compiler. It would 
be fun to make if I could find time for it.
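
A toy sketch of the idea (plain Python, no optimization; a real backend 
such as numexpr would compile the tree instead of recursing over it):

import numpy as np

class Expr(object):
    def __init__(self, op, args):
        self.op, self.args = op, args
    def __add__(self, other):
        return Expr(np.add, (self, other))
    def __mul__(self, other):
        return Expr(np.multiply, (self, other))
    def eval(self):
        # flush the procrastinated expression tree
        args = [a.eval() if isinstance(a, Expr) else a for a in self.args]
        return self.op(*args)

class Lazy(Expr):
    def __init__(self, data):
        self.data = np.asarray(data)
    def eval(self):
        return self.data

a, x, b = Lazy([1., 2.]), Lazy([3., 4.]), Lazy([5., 6.])
y = a*x + b         # builds add(multiply(a, x), b); no work done yet
print(y.eval())     # [  8.  14.]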

Sturla Molden


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Sturla Molden
Rohit Garg skrev:
 gtx280--141GBps--has 1GB
 ati4870--115GBps--has 1GB
 ati5870--153GBps (launches sept 22, 2009)--2GB models will be there too
   
That is going to help if buffers are kept in graphics memory. But the 
problem is that graphics memory is a scarce resource.

S.M.






___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread Sturla Molden
George Dahl skrev:
  I know that for my work, I can get around an order of a 50-fold 
speedup over
  numpy using a python wrapper for a simple GPU matrix class.  So I 
might be
  dealing with a lot of matrix products where I multiply a fixed 512 by 
784 matrix
  by a 784 by 256 matrix that changes between each matrix product, 
although to
  really see the largest gains I use a 4096 by 2048 matrix times a 
bunch of 2048
  by 256 matrices.



Matrix multiplication is at the core of 3D graphics, and the raison 
d'être for GPUs. That is specifically what they are designed to do. 
Matrix multiplication scales O(n**3) in floating point operations and 
O(n**2) in memory access. That is how GPUs give fast 3D graphics (matrix 
multiplications): by speeding up floating point operations.

GPUs make sense for certain level-3 BLAS calls, but that really belongs 
in BLAS, not in NumPy's core. One could e.g. consider linking with a 
BLAS wrapper that directs these special cases to the GPU and the rest to 
ATLAS / MKL / netlib BLAS.

Sturla Molden
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread Sturla Molden
James Bergstra skrev:
 Suppose you want to evaluate dot(a*b+c*sqrt(d), e).  The GPU is
 great for doing dot(), 
The CPU is equally great (or better?) for doing dot(). In both cases:

- memory access scales O(n) for dot products.
- computation scales O(n) for dot products.
- memory is slow
- computation is fast (faster for GPU)

In both cases, the floating point unit is starved. That means it could 
do a lot more work if memory were faster.

For the GPU to be faster than the CPU, you have to have a situation where 
computation dominates over memory access. Matrix-matrix multiplication 
is one such example. This is what GPUs are designed to do, as it is the 
major bottleneck in 3D graphics.

The proper way to speed up dot(a*b+c*sqrt(d), e) is to get rid of 
temporary intermediates. That is, in Python pseudo-code:

result = 0
for i in range(n):
result += (a[i]*b[i] + c[i]*sqrt(d[i])) * e[i]

instead of:

tmp0 = empty(n)
for i in range(n):
    tmp0[i] = a[i] * b[i]

tmp1 = empty(n)
for i in range(n):
    tmp1[i] = sqrt(d[i])

tmp2 = empty(n)
for i in range(n):
    tmp2[i] = c[i] * tmp1[i]

tmp3 = empty(n)
for i in range(n):
    tmp3[i] = tmp0[i] + tmp2[i]

result = 0
for i in range(n):
    result += tmp3[i] * e[i]


It is this complication that makes NumPy an order of magnitude slower 
than hand-crafted C (but still much faster than pure Python!) Adding in 
GPUs will not change this. The amount of computation (flop count) is the 
same, so it cannot be the source of the slowness.


Sturla Molden



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Huge arrays

2009-09-08 Thread Sturla Molden
Daniel Platz skrev:
 data1 = numpy.zeros((256,200),dtype=int16)
 data2 = numpy.zeros((256,200),dtype=int16)

 This works for the first array data1. However, it returns with a
 memory error for array data2. I have read somewhere that there is a
 2GB limit for numpy arrays on a 32 bit machine but shouldn't I still
 be below that? I use Windows XP Pro 32 bit with 3GB of RAM.

There is a 2 GB limit for user space on Win32; in practice that is about 
1.9 GB. You have other programs running as well, so this is still too 
much. Also Windows reserves 50% of RAM for itself, so you have less than 
1.5 GB to play with.

S.M.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object dtype questions

2009-09-06 Thread Sturla Molden
Mark Wendell skrev:
 for i in range(5):
  for j in range(5):
   a[i,j].myMethod(var3,var4)
   print a[i,j].attribute1

 Again, is there a quicker way than above to call myMethod or access attribute1
One option is to look up the name of the method unbound, and then use 
built-in function map.

map( cls.myMethod, a )

is similar to:

[aa.myMethod() for aa in a]

Using map avoids looking up the name 'myMethod' for each instance, it is 
done only once. map can be a lot faster or hardly faster at all, 
depending on the amount of work done by the function. The more work 
being performed, the less you benefit from map.

You can pass in variables (var3,var4) by giving additional sequences to 
map.
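
As a minimal self-contained sketch (the class and the arguments are made 
up here to mirror the quoted example):

from itertools import repeat
import numpy as np

class Cell(object):
    def myMethod(self, var3, var4):
        return var3 + var4

a = np.empty((5, 5), dtype=object)
for i in range(5):
    for j in range(5):
        a[i, j] = Cell()

# one lookup of the unbound method, then iteration over all 25 cells
results = list(map(Cell.myMethod, a.flat,
                   repeat(1, a.size), repeat(2, a.size)))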

S.M.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] object dtype questions

2009-09-06 Thread Sturla Molden
Alan G Isaac skrev:
 http://article.gmane.org/gmane.comp.python.general/630847
   

Yes, but here you still have to look up the name 'f' from locals in each 
iteration. map is written in C; once it has a PyObject* to the callable 
it does not need to look up the name anymore. The dictionary lookup(s) 
to get the callable is done just once. map is also more thread-safe. 
During the iteration another thread could rebind the name of the 
callable, but map is impervious to that.

S.M.



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fwrite() failure in PyArray_ToFile

2009-09-04 Thread Sturla Molden
David Warde-Farley skrev:
 The odd values might be from the format code in the error message:

PyErr_Format(PyExc_ValueError,
             "%ld requested and %ld written",
             (long) size, (long) n);
 

 Yes, I saw that. My C is rusty, but wouldn't the cast take care of it?  
 n is of type size_t, which is pretty big, and a cast to long shouldn't  
 be an issue. And if (hopefully) PyErr_Format is correct...
   
A long is (usually) 32 bit, even on 64 bit systems. This means that size 
is casted to an integer between -2**31 and 2**31 - 1. As 2**31 bytes are 
2 GB, the expression (long) size will overflow if a write of 2 GB or 
more failed. Thus you get some bogus numbers in the formatted message. 
There is thus a bug in the call to PyErr_Format, as it only works 
reliably on 32 bit systems.

S.M.




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fwrite() failure in PyArray_ToFile

2009-09-04 Thread Sturla Molden
Charles R Harris skrev:
 The size of long depends on the compiler as well as the operating 
 system. On linux x86_64, IIRC, it is 64 bits, on Windows64 I believe 
 it is 32. Ints always seem to be 32 bits.
If I remember the C standard correctly, a long is guaranteed to be at 
least 32 bits, whereas an int is guaranteed to be at least 16 bits. A 
short is at least 16 bits and a long long is at least 64 bits. Then there 
is intptr_t which is wide enough to store a pointer, but could be wider. A 
size_t usually has the size of a pointer but not always (on segmented 
architectures they might differ). Considering PEP 353, should we 
be indexing with Py_ssize_t instead of npy_intp? I believe (correct me 
if I'm wrong) npy_intp is typedef'ed to Py_intptr_t.

S.M.






___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-03 Thread Sturla Molden
Chad Netzer skrev:
 That's right, Robert. Basically, I meant doing a median on a square
 (or rectangle) view of an array, without first having to ravel(),
 thus generally saving a copy.  But actually, since my selection based
 median overwrites the source array, it may not save a copy anyway.
   
Avoiding copies of tiny buffers is a futile optimization.

QuickSelect is overkill for tiny buffers like common filter kernels. 
Insertion sort is fine.

Getting rid of loops and multiple function calls in Python helps a lot.

If memory is not an issue, with np.median you can actually create quite 
an efficient median filter using a 3D ndarray. For example, if you 
use an image of 640 x 480 pixels and want a 9 pixel median filter, you 
can put shifted images in a 640 x 480 x 9 ndarray, and call median 
with axis=2.
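
A minimal sketch of that trick, for a 3x3 neighborhood with the edges 
simply cropped:

import numpy as np

def median3x3(img):
    h, w = img.shape
    stack = np.empty((h - 2, w - 2, 9), dtype=img.dtype)
    k = 0
    for di in range(3):
        for dj in range(3):
            # each slice is the image shifted by (di, dj)
            stack[:, :, k] = img[di:h - 2 + di, dj:w - 2 + dj]
            k += 1
    return np.median(stack, axis=2)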

 
Sturla Molden


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-02 Thread Sturla Molden
Dag Sverre Seljebotn skrev:

  a) Is the cast to numpy.npy_intp really needed? I'm pretty sure shape is
  defined as numpy.npy_intp*.

I don't know Cython internals in detail but you do, so I take your word 
for it. I thought shape was a tuple of Python ints.


  b) If you want higher performance with contiguous arrays (which occur a
  lot as inplace=False is default I guess) you can do
 
  np.ndarray[T, ndim=1, mode="c"]
 
  to tell the compiler the array is contiguous. That doubles the number of
  function instances though...

Thanks. I could either double the number of specialized select 
functions, or I could make a local copy using numpy.ascontiguousarray in 
the select function.

Quickselect touches the discontiguous array on average 2*n times, whereas 
numpy.ascontiguousarray touches the discontiguous array n times (but in 
order). Then there is the question of cache use: Contiguous arrays are 
the more friendly case, and numpy.ascontiguousarray is more friendly 
than quickselect. Also, if quickselect is not done inplace (the common 
case for medians), we always have contiguous arrays, so mode="c" is 
almost always wanted. And when quickselect is done inplace, we usually 
have a contiguous input. This is also why I used a C pointer instead of 
your buffer syntax in the first version. Then I changed my mind, not 
sure why. So I'll try with a local copy first then. I don't think we 
want close to a megabyte of Cython generated gibberish C just for the 
median.

Sturla Molden
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-02 Thread Sturla Molden
Citi, Luca skrev:
 Hello Sturla,
 In _median how can you, if n==2, use s[] if s is not defined?
 What if n==1?
   
That was a typo.



 Also, I think when returning an empty array, it should be of
 the same type you would get in the other cases.
Currently median returns numpy.nan for empty input arrays. I'll do that 
instead. I want it to behave exactly as the current implementation, 
except for the sorting.

Sturla
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to concatenate two arrays without duplicating memory?

2009-09-02 Thread Sturla Molden
V. Armando Solé skrev:
 I am looking for a way to have a non contiguous array C in which the 
 left (1, 2000) elements point to A and the right (1, 4000) 
 elements point to B. 

 Any hint will be appreciated.

If you know in advance that A and B are going to be duplicated, you can 
use views:

C = np.zeros((1, 6000))
A = C[:,:2000]
B = C[:,2000:]

Now C is A and B concatenated horizontally.
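
A quick check of the trick (writes through the views show up in C, so no 
copy is ever made):

import numpy as np

C = np.zeros((1, 6000))
A = C[:, :2000]
B = C[:, 2000:]

A[:] = 1.0
B[:] = 2.0
print(C[0, 1999], C[0, 2000])   # 1.0 2.0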

If you can't do this, you could write the data to a temporary file and 
read it back, but it would be slow.

Sturla
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to concatenate two arrayswithout duplicating memory?

2009-09-02 Thread Sturla Molden
Sebastian Haase skrev:
 A mockarray is initialized with a list of nd-arrays. The result is a
 mock array having one additional dimension in front.
This is important, because often in the case of 'concatenation' a real 
concatenation is not needed. But then there is a common tool called 
Matlab, which unlike Python has no concept of lists, yet makes numerical 
programmers think it does: C = [A, B] is a horizontal concatenation in 
Matlab. Too much exposure to Matlab cripples the mind easily.

Sturla

  
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fastest way to parsing a specific binay file

2009-09-02 Thread Sturla Molden
Gökhan Sever skrev:
 What would be wisest and fastest way to tackle this issue? 
Get the format, read the binary data directly, skip the ascii/regex part.

I sometimes use recarrays with formatted binary data; just construct 
a dtype and use numpy.fromfile to read. That works when the binary file 
stores C structs written successively.
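
A minimal sketch (the record layout and the file name are hypothetical, 
and a real C struct may contain alignment padding that the dtype must 
mirror):

import numpy as np

# hypothetical C struct written repeatedly to the file:
#     struct { int id; double t; float v[3]; };
rec = np.dtype([('id', '<i4'), ('t', '<f8'), ('v', '<f4', 3)])

data = np.fromfile('records.bin', dtype=rec)   # hypothetical file
print(data['t'].mean(), data['v'][:, 0])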

Sturla Molden


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-02 Thread Sturla Molden
Chad Netzer skrev:
 By the way, as far as I can tell, the above algorithm is exactly the
 same idea as a non-recursive Hoare (ie. quicksort) selection:  Do the
 partition, then only proceed to the sub-partition that must contain
 the nth element. My version is a bit more general, allowing
 partitioning on a range of elements rather than just one, but the
 concept is the same.  The numpy quicksort already does non recursive
 sorting.

 I'd also like to, if possible, have a specialized 2D version, since
 image media filtering is one of my interests, and the C version works
 on 1D (raveled) arrays only.
I agree. NumPy (or SciPy) could have a select module similar to the sort 
module. If the select function takes an axis argument similar to the 
sort functions, only a small change to the current np.median would be needed.

Take a look at this:

http://projects.scipy.org/numpy/attachment/ticket/1213/_selectmodule.pyx

Here is a select function that takes an axis argument. There are 
specialized versions for 1D, 2D, and 3D. Input can be contiguous or not. 
For 4D and above, axes are found by recursion on the shape array. Thus 
it should be fast regardless of dimensions.

I haven't tested the Cython code /thoroughly/, but at least it does compile.


Sturla Molden


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-02 Thread Sturla Molden
Robert Kern skrev:
 When he is talking about 2D, I believe he is referring to median
 filtering rather than computing the median along an axis. I.e.,
 replacing each pixel with the median of a specified neighborhood
 around the pixel.

   
That's not something numpy's median function should be specialized to 
do. IMHO, median filtering belongs to scipy.

Sturla
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ticket for O(n) median function?

2009-09-01 Thread Sturla Molden
I could not find any, so I'll ask if it's ok to create one. I have a 
patch for /numpy/lib/function_base.py that uses any 'select' function to 
obtain the median. I'll also submit the Cython code for quickselect.


Attachment (median.py.gz) contains the suggested implementation of 
median. I disabled overwrite_input because the median function calls 
numpy.apply_along_axis.



Regards,
Sturla Molden


median.py.gz
Description: application/gzip
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-01 Thread Sturla Molden
Dag Sverre Seljebotn skrev:
 Nitpick: This will fail on large arrays. I guess numpy.npy_intp is the
 right type to use in this case?

   
Yup.  You are right. Thanks.

Sturla


 

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-01 Thread Sturla Molden
Dag Sverre Seljebotn skrev:

 Nitpick: This will fail on large arrays. I guess numpy.npy_intp is the 
 right type to use in this case?
   
By the way, here is a more polished version, does it look ok?

http://projects.scipy.org/numpy/attachment/ticket/1213/generate_qselect.py
http://projects.scipy.org/numpy/attachment/ticket/1213/quickselect.pyx

Cython needs something like Java's generics by the way :-)

Regards,
Sturla Molden
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-01 Thread Sturla Molden
Sturla Molden skrev:

 By the way, here is a more polished version, does it look ok?
No it doesn't... Got to keep the GIL for the general case (sorting 
object arrays). Fixing that.

SM


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-09-01 Thread Sturla Molden
Sturla Molden skrev:

 http://projects.scipy.org/numpy/attachment/ticket/1213/generate_qselect.py
 http://projects.scipy.org/numpy/attachment/ticket/1213/quickselect.pyx
My suggestion for a new median function is here:
http://projects.scipy.org/numpy/attachment/ticket/1213/median.py

The quickselect extension module can be compiled from C source:
http://projects.scipy.org/numpy/attachment/ticket/1213/quickselect.c.gz

Feel free to look at it when you have time. :-)


Regards,
Sturla Molden
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] A faster median (Wirth's method)

2009-08-31 Thread Sturla Molden

We recently had a discussion regarding an optimization of NumPy's median 
to average O(n) complexity. After some searching, I found out there is a 
selection algorithm competitive in speed with Hoare's quick select. It 
has the advantage of being a lot simpler to implement. In plain Python:

import numpy as np

def wirthselect(array, k):

    """ Niklaus Wirth's selection algorithm """

    a = np.ascontiguousarray(array)
    if (a is array): a = a.copy()

    l = 0
    m = a.shape[0] - 1
    while l < m:
        x = a[k]
        i = l
        j = m
        while 1:
            while a[i] < x: i += 1
            while x < a[j]: j -= 1
            if i <= j:
                tmp = a[i]
                a[i] = a[j]
                a[j] = tmp
                i += 1
                j -= 1
            if i > j: break
        if j < k: l = i
        if k < i: m = j

    return a


Now, the median can be obtained in average O(n) time as:


def median(x):

    """ median in average O(n) time """

    n = x.shape[0]
    k = n >> 1
    s = wirthselect(x, k)
    if n & 1:
        return s[k]
    else:
        return 0.5*(s[k] + s[:k].max())


The beauty of this is that Wirth select is extremely easy to migrate to 
Cython:


cimport numpy
import numpy as np

ctypedef numpy.double_t T # or whatever

def wirthselect(numpy.ndarray[T, ndim=1] array, int k):

    cdef int i, j, l, m
    cdef T x, tmp
    cdef T *a
    cdef numpy.ndarray[T, ndim=1] _array

    _array = np.ascontiguousarray(array)
    if (_array is array): _array = _array.copy()
    a = <T *> _array.data

    l = 0
    m = <int> _array.shape[0] - 1
    with nogil:
        while l < m:
            x = a[k]
            i = l
            j = m
            while 1:
                while a[i] < x: i += 1
                while x < a[j]: j -= 1
                if i <= j:
                    tmp = a[i]
                    a[i] = a[j]
                    a[j] = tmp
                    i += 1
                    j -= 1
                if i > j: break
            if j < k: l = i
            if k < i: m = j

    return _array


For example, we could have a small script that generates wirthselect for 
all NumPy dtypes (T as template), and use a dict as a jump table.
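
A minimal sketch of the dispatch, with plain-Python stand-ins where the 
generated, compiled select functions would go:

import numpy as np

def _select_float64(a, k):   # stand-in for the compiled float64 version
    return np.sort(a)[k]

def _select_int32(a, k):     # stand-in for the compiled int32 version
    return np.sort(a)[k]

_jump = {np.dtype(np.float64): _select_float64,
         np.dtype(np.int32): _select_int32}

def select(a, k):
    try:
        return _jump[a.dtype](a, k)
    except KeyError:
        raise TypeError("select: unsupported dtype %r" % a.dtype)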

Chad, you can continue to write quick select using NumPy's C quick sort 
in numpy/core/src/_sortmodule.c.src.  When you are done, it might be 
about 10% faster than this. :-)


Reference:
http://ndevilla.free.fr/median/median.pdf



Best regards,
Sturla Molden







___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A faster median (Wirth's method)

2009-08-31 Thread Sturla Molden
Sturla Molden skrev:
 We recently has a discussion regarding an optimization of NumPy's median 
 to average O(n) complexity. After some searching, I found out there is a 
 selection algorithm competitive in speed with Hoare's quick select.

 Reference:
 http://ndevilla.free.fr/median/median.pdf

After some more googling, I found this text by Wirth himself:

http://www.oberon2005.ru/book/ads2004.pdf

The method is described on page 61 (57 in the PDF) as Hoare's quick 
select. So it seems it's just a less optimized version of that in 
Numerical Recipes, and the first reference (Devillard) was confused. 
Anyhow, it still has the advantage of looking nice in Cython and being 
very different from the Numerical Recipes code. We can rename 
wirthselect to quickselect then. Sorry for the confusion. I should have 
checked the source better.

Sturla Molden

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] A better median function?

2009-08-25 Thread Sturla Molden
Chad Netzer skrev:
 My current plan of attack is to deliver a partition() function that
 basically returns an array such that elements less than the pivot(s)
 come first, then the pivot(s), then the elements greater than the
 pivot(s).  

I'm actually trying to write a fast median replacement myself. I was 
thinking along the same lines, except I don't store those two arrays. I 
just keep track of counts in them. For the even case, I also keep track 
of the elements closest to the pivot (smaller and bigger). It's incredibly 
simple actually. So let's see who gets there first :-)

Sturla Molden





___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] PRNGs and multi-threading

2009-08-21 Thread Sturla Molden

I am not sure if this is the right place to discuss this issue. However, 
a problem I keep running into in Monte Carlo simulations is generating 
pseudorandom numbers with multiple threads. PRNGs such as the Mersenne 
Twister keep an internal state, which prevents the PRNG from being 
re-entrant and thread-safe.


Possible solutions:


1. Use multiple instances of PRNG states (numpy.random.RandomState), one 
for each thread. This should give no contention, but is this 
mathematically acceptable? I don't know. At least I have not seen any 
proof that it is.
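
A minimal sketch of option 1 (the seeds are arbitrary; as said above, 
independence of the resulting streams is exactly what is not proven):

import threading
import numpy as np

def worker(seed, out, i):
    rng = np.random.RandomState(seed)   # private state, no sharing
    out[i] = rng.rand(1000000).mean()

out = [None] * 4
threads = [threading.Thread(target=worker, args=(seed, out, i))
           for i, seed in enumerate([1234, 5678, 9012, 3456])]
for t in threads: t.start()
for t in threads: t.join()
print(out)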



2. Protect the PRNG internally with a spinlock. In Windows lingo, that is:

#include <windows.h>
static volatile long spinlock = 0;
#define ACQUIRE_SPINLOCK while(InterlockedExchangeAcquire(&spinlock, 1));
#define RELEASE_SPINLOCK InterlockedExchangeAcquire(&spinlock, 0);

Problem: possible contention between threads, idle work if threads spin 
a lot.



3. Use a conventional mutex object to protect the PRNG (e.g. 
threading.Lock in Python or CRITICAL_SECTION in Windows).  Problem: 
contention, context shifting, and mutexes tend to be slow. Possibly the 
worst solution.



4. Put the PRNG in a dedicated thread, fill up rather big arrays with 
pseudo-random numbers, and write them to a queue. Problem: Context 
shifting unless a CPU is dedicated to this task. Unless producing random 
numbers constitutes a major portion of the simulation, this should not 
lead to much contention.

import threading
import Queue
import numpy
from numpy.random import rand

class PRNG(threading.Thread):

    ''' a thread that generates arrays with random numbers
    and dumps them to a queue '''

    def __init__(self, nthreads, shape):
        threading.Thread.__init__(self)
        self.shape = shape
        self.count = numpy.prod(shape)
        self.queue = Queue.Queue(4 * nthreads) # magic number
        self.stop_evt = threading.Event()

    def generate(self, block=True, timeout=None):
        return self.queue.get(block, timeout)

    def join(self, timeout=None):
        self.stop_evt.set()
        super(PRNG, self).join(timeout)

    def run(self):
        tmp = rand(self.count).reshape(self.shape)
        while 1:
            try:
                self.queue.put(tmp, block=True, timeout=2.0)
            except Queue.Full:
                if self.stop_evt.isSet(): break
            else:
                tmp = rand(self.count).reshape(self.shape)





Do you have any view on this? Is there any way of creating multiple 
independent random states that will work correctly? I know of SPRNG 
(Scalable PRNG), but it is made to work with MPI (which I don't use).


Regards,
Sturla Molden












___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] PRNGs and multi-threading

2009-08-21 Thread Sturla Molden
Robert Kern skrev:
 Although you don't really have re-entrancy issues, you will usually
 want one PRNG per thread for determinism.
   
I see... numpy.random.rand does not have re-entrancy issues because of 
the GIL, but I get indeterminism from the OS scheduling the threads. 
RandomState might not release the GIL either, but preserves determinism 
in presence of multiple threads. Thanks.  :-)
 

Regards,
Sturla Molden







___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] PRNGs and multi-threading

2009-08-21 Thread Sturla Molden
Sturla Molden skrev:
 It seems there is a special version of the Mersenne Twister for this. 
 The code is LGPL (annoying for SciPy but ok for me).

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dgene.pdf

Basically it encodes the thread-ids in the characteristic polynomial of 
the MT, producing multiple small-period, independent MTs. That solves it 
then. Too bad this is LGPL. It would be a very useful enhancement to 
RandomState.

Sturla Molden



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] PRNGs and multi-threading

2009-08-21 Thread Sturla Molden
Xavier Saint-Mleux skrev:
 Of course, the mathematically correct way would be to use a correct
 jumpahead function, but all the implementations that I know of are GPL. 
 A recent article about this is:

 www.iro.umontreal.ca/~lecuyer/myftp/papers/jumpmt.pdf

   
I know of no efficient jumpahead function for MT. Several seconds for 
1000 jumps ahead is not impressive -- just generating the deviates is 
faster!

With DCMT it is easy to create independent MTs with smaller periods. 
Independence here means that the characteristic polynomials are 
relatively prime to each other. A small period of e.g. 2**521 - 1 
means that if we produce 1 billion deviates per minute, it would still 
take the MT about 10**143 years to cycle. Chances are we will not be 
around to see that happen. It also seems that nvidia has endorsed this 
method:

http://developer.download.nvidia.com/compute/cuda/sdk/website/projects/MersenneTwister/doc/MersenneTwister.pdf



S.M.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-21 Thread Sturla Molden
Erik Tollerud skrev:
 NumPy arrays on the GPU memory is an easy task. But then I would have to
 write the computation in OpenCL's dialect of C99? 
 This is true to some extent, but also probably difficult to do given
 the fact that parallelizable algorithms are generally more difficult
 to formulate in straightforward ways. 
Then you have misunderstood me completely. Creating an ndarray that has 
a buffer in graphics memory is not too difficult, given that graphics 
memory can be memory mapped. This has nothing to do with parallelizable 
algorithms or not. It is just memory management. We could make an 
ndarray subclass that quickly puts its content in a buffer accessible to 
the GPU. That is not difficult. But then comes the question of what you 
do with it.

I think many here misunderstand the issue:

Teraflops peak performance of modern GPUs is impressive. But NumPy 
cannot easily benefit from that. In fact, there is little or nothing to 
gain from optimising in that end. In order for a GPU to help, 
computation must be the time-limiting factor. It is not. There is not 
more to say about using GPUs in NumPy right now.

Take a look at the timings here: http://www.scipy.org/PerformancePython 
It shows that computing with NumPy is more than ten times slower than 
using plain C. This is despite NumPy being written in C. The NumPy code 
does not incur 10 times more floating point operations than the C code. 
The floating point unit does not run in turtle mode when using NumPy. 
NumPy's relative slowness compared to C has nothing to do with floating 
point computation. It is due to inferior memory use (temporary buffers, 
multiple buffer traversals) and memory access being slow. Moving 
computation to the GPU can only make this worse.

Improved memory usage - e.g. through lazy evaluation and JIT compilation 
of expressions - can give up to a tenfold increase in performance. That 
is where we must start optimising to get a faster NumPy. Incidentally, 
this will also make it easier to leverage modern GPUs.

Sturla Molden
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] GPU Numpy

2009-08-06 Thread Sturla Molden
Olivier Grisel wrote:
 As usual, MS reinvents the wheel with DirectX Compute but vendors such
 as AMD and nvidia propose both the OpenCL API +runtime binaries for
 windows and their DirectX Compute counterpart, based on mostly the
 same underlying implementation, e.g. CUDA in nvidia's case.

   
Here is a DirectX Compute tutorial I found:

http://www.gamedev.net/community/forums/topic.asp?topic_id=516043

It pretty much says all we need to know. I am not investing any of my 
time learning that shitty API. Period.

Lets just hope OpenCL makes it to Windows without Microsoft breaking it 
for security reasons (as they did with OpenGL).



Sturla



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Sturla Molden

 Now linear algebra or FFTs on a GPU would probably be a huge boon, 
 I'll admit - especially if it's in the form of a drop-in replacement 
 for the numpy or scipy versions.


NumPy generates temporary arrays for expressions involving ndarrays. This 
extra allocation and copying often takes more time than the computation. 
With GPGPUs, we have to bus the data to and from VRAM as well. D. Knuth 
quoted Hoare saying that "premature optimization is the root of all 
evil". Optimizing computation when the bottleneck is memory is premature.

In order to improve on this, I think we have to add lazy evaluation to 
NumPy. That is, an operator should not return a temporary array but a 
symbolic expression. So if we have an expression like

y = a*x + b

it should not evaluate a*x into a temporary array. Rather, the operators 
would build up a parse tree like

y = add(multiply(a,x),b)

and evaluate the whole expression later on.

This would require two things: First we need dynamic code generation, 
which incidentally is what OpenCL is all about. I.e. OpenCL is a 
dynamically invoked compiler; there is a function 
clCreateProgramWithSource, which does just what it says. Second, we 
need arrays to be immutable. This is very important. If arrays are not 
immutable, code like this could fail:

y = a*x + b
x[0] = 1235512371235

With lazy evaluation, the memory overhead would be much smaller. The 
GPGPU would also get more complex expressions to use as kernels.

There should be an option of running this on the CPU, possibly using 
OpenMP for multi-threading. We could either depend on a compiler (C or 
Fortran) being installed, or use opcodes for a dedicated virtual machine 
(cf. what numexpr does).

In order to reduce the effect of immutable arrays, we could introduce a 
context manager. Inside the with statement, all arrays would be 
immutable, and the __exit__ method could trigger the code generator 
and do all the evaluation. So we would get something like this:

# normal numpy here

with numpy.accelerator():

# arrays become immutable
# lazy evaluation
   
# code generation and evaluation on exit

# normal numpy continues here


Thus, here is my plan:

1. a special context-manager class
2. immutable arrays inside with statement
3. lazy evaluation: expressions build up a parse tree
4. dynamic code generation
5. evaluation on exit

I guess it is possible to find ways to speed this up as well. If a 
context manager would always generate the same OpenCL code, the with 
statement would only need to execute once (we could raise an exception 
on enter to jump directly to exit).
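
A skeleton of where the hooks in such a context manager would live (a 
sketch only; none of this exists in NumPy):

class accelerator(object):

    def __enter__(self):
        # here: switch ndarray operators to record expression trees
        self.recorded = []
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # here: generate code for the recorded trees and evaluate them
        for tree in self.recorded:
            pass  # code generation + evaluation would go here
        return False  # propagate exceptions normally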

It is possible to create a superfast NumPy. But just plugging GPGPUs 
into the current design would be premature. In NumPy's current state, 
with mutable ndarrays and operators generating temporary arrays, there 
is not much to gain from introducing GPGPUs. It would only be beneficial 
in computationally demanding parts like FFTs and solvers for linear 
algebra and differential equations. Ufuncs with transcendental functions 
might also benefit. SciPy would certainly benefit more from GPGPUs than 
NumPy.

Just my five cents :-)

Regards,
Sturla Molden
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Sturla Molden
Robert Kern wrote:
 I believe that is exactly the point that Erik is making. :-)
   
I wasn't arguing against him, just suggesting a solution. :-)

I have big hopes for lazy evaluation, if we can find a way to do it right.

Sturla
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Sturla Molden
Charles R Harris wrote:
 Whether the code that gets compiled is written using lazy evaluation 
 (ala Sturla), or is expressed some other way seems like an independent 
 issue. It sounds like one important thing would be having arrays that 
 reside on the GPU.
Memory management is slow compared to computation. Operations like 
malloc, free and memcpy are not faster for VRAM than for RAM. There will 
be no benefit from the GPU if the bottleneck is memory. That is why we 
need to get rid of the creation of temporary arrays, hence lazy evaluation.

Having arrays reside in VRAM would reduce the communication between RAM 
and VRAM, but the problem with temporary arrays is still there.

Also VRAM tends to be a limited resource.

Sturla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Sturla Molden
Sturla Molden wrote:
 Memory management is slow compared to computation. Operations like 
 malloc, free and memcpy are not faster for VRAM than for RAM. 

Actually it's not VRAM anymore, but whatever you call the memory 
dedicated to the GPU.

It is cheap to put 8 GB of RAM into a computer, but graphics cards with 
more than 1 GB memory are expensive and uncommon on e.g. laptops. And 
this memory will be needed for other things as well, e.g. graphics.

Sturla



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Sturla Molden
Charles R Harris wrote:

 I mean, once the computations are moved elsewhere numpy is basically a 
 convenient way to address memory.

That is how I mostly use NumPy, though. Computations I often do in 
Fortran 95 or C.

NumPy arrays on the GPU memory is an easy task. But then I would have to 
write the computation in OpenCL's dialect of C99? But I'd rather program 
everything in Python if I could. Details like GPU and OpenCL should be 
hidden away. Nice looking Python with NumPy is much easier to read and 
write. That is why I'd like to see a code generator (i.e. JIT compiler) 
for NumPy.


Sturla


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Sturla Molden
James Bergstra wrote:
 The plan you describe is a good one, and Theano
 (www.pylearn.org/theano) almost exactly implements it.  You should
 check it out.  It does not use 'with' syntax at the moment, but it
 could provide the backend machinery for your mechanism if you want to
 go forward with that.  Theano provides
 - symbolic expression building for a big subset of what numpy can do
 (and a few things that it doesn't)
 - expression optimization (for faster and more accurate computations)
 - dynamic code generation
 - cacheing of compiled functions to disk.
Thank you James, theano looks great. :-D

Sturla


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] improved NumPy support for boolean arrays?

2009-08-05 Thread Sturla Molden

  If x and y are numpy
 arrays of bools, I'd like to be able to create expressions like the
 following:

 not x (to invert each element of x)
 x and y
 x or y
 x xor y
 (not x) or y

 The usual array broadcasting rules should apply.  Is there any chance of
 getting something like this into NumPy?
There is a reason for this related to Python. In Python an object will 
often have a boolean truth value. How would you cast an ndarray to bool? 
If you write something like (x and y), the Python interpreter expects 
this to evaluate to True or False. Thus it cannot evaluate to an ndarray 
of booleans. NumPy cannot change the syntax of Python.
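
What NumPy does offer are the bitwise operators, which broadcast and 
work elementwise on boolean arrays:

import numpy as np

x = np.array([True, False, True])
y = np.array([False, False, True])

print(~x)      # elementwise not
print(x & y)   # elementwise and
print(x | y)   # elementwise or
print(x ^ y)   # elementwise xor
print(~x | y)  # (not x) or y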

Another thing: An empty list evaluates to False in a boolean context, 
whereas a non-empty list evaluates to True. ndarrays behave differently. 
Why?

Sturla Molden

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Compiling for free on Windows32

2009-04-17 Thread Sturla Molden
On 4/15/2009 6:44 PM, Matthieu Brucher wrote:

 There is a Python limitation for the compiler. 

There is a common misunderstanding that only VS2003 can be used to 
compile extension objects for Python 2.5. Don't believe it. There is no 
Python limitation for the compiler.

There is a Python limitation for the C runtime (crt), but only if you 
share crt objects with Python. That is, if you e.g. open a FILE* pointer 
in C, create a Python file object from the pointer, and then read that 
file in Python. In this case Python would do a fread on the pointer, and 
it is your responsibility to make sure the same crt DLL does the fopen, 
fread and fclose.

If you don't share any crt resources with Python, there is no limitation 
on the crt either. If you do a malloc and a corresponding free in C, it 
does not matter what crt you use. It may even be used by Python objects 
in the meanwhile.

mingw (gcc, gfortran) can be used freely, and afaik is used to compile 
the official NumPy and SciPy releases for Windows. The gfortran team has 
made a binary installer for version 4.4 available in their Wiki.

When using mingw you have to link with the the same crt as Python if crt 
resources are shared. That is -lmsvcr71 for Python 2.5 and -lmsvcr90 for 
Python 2.6. If crt resources are unshared, link with whatever crt you want.

Sturla Molden
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Compiling for free on Windows32

2009-04-17 Thread Sturla Molden
On 4/17/2009 10:50 AM, David Cournapeau wrote:

 I think Matthieu meant you have to use VS2003 as a MS compiler. Mingw is
 obviously fine, since that's how numpy binaries are built for quite a
 long time 

That is what I thought he meant as well, and it seems to be a widespread 
belief. The fact that numpy is built with mingw proves it wrong, 
obviously. I just wanted to point out what the real limitation is: don't 
mix and blend CRTs and CRT resources. This is obviously not a Windows or 
Python specific problem.


 gfortran 4.4 is pre-released software, though, so take care. I know for
 sure that gfortran + MS compiler can mix on 32 bits 

http://gcc.gnu.org/wiki/GFortranBinaries

I use this 32 bit mingw binary to build my Cython and f2py extensions. It 
works like a charm. I have licenses for Intel compilers at work, but I 
prefer gfortran 4.4.


Sturla Molden

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Compiling for free on Windows32

2009-04-17 Thread Sturla Molden
On 4/17/2009 11:50 AM, Gael Varoquaux wrote:

 Could you elaborate on your reason.

Probably silly reasons though...

I have a distutils.cfg file and a build system set up that works. I 
don't want to bother setting up a different build system when I already 
have one that works. I can use the Intel compilers at home, but only if 
I have an open VPN connection to work. I think just using a free gcc 
compiler is easier. I don't want to remember switches for two compilers. 
Syntax for inline assembly is different (on Windows), requiring 
preprocessor magic and code duplication. Object files and libraries are 
named differently (on Windows). There are numerous reasons, but they all 
boil down to me being lazy.

S.M.

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] dpss windows

2009-04-03 Thread Sturla Molden
NumPy has support for various window functions (von Hann, hamming, 
blackman, bartlet, kaiser). DPSS windows (e.g. used by multitaper 
spectral estimators) are missing. Is there any particular reason for this?

(No this is not a request, I already have my own dpss code. I just 
notice while reading the NumPy and SciPy docs today that dpss windows 
are missing.)


Sturla Molden


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.ctypeslib.ndpointer and the restype attribute

2009-03-24 Thread Sturla Molden
Jens Rantil wrote:

 Thanks Sturla. However numpy.uint8 seem to be lacking attributes 'str'
 and 'descr'. I'm using installed Ubuntu package 1:1.1.1-1. Is it too old
 or is the code broken?
Oops, my fault :)


import numpy

def fromaddress(address, nbytes, dtype=float):
    class Dummy(object): pass
    d = Dummy()
    bytetype = numpy.dtype(numpy.uint8)
    d.__array_interface__ = {
        'data' : (address, False),
        'typestr' : bytetype.str,
        'descr' : bytetype.descr,
        'shape' : (nbytes,),
        'strides' : None,
        'version' : 3
        }
    return numpy.asarray(d).view(dtype=dtype)


You will have to make sure the address is an integer.
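
For instance, wrapping a ctypes buffer (the array does not own the 
memory, so the buffer must be kept alive as long as the array is used):

import ctypes

buf = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)
addr = ctypes.addressof(buf)                # a plain integer
arr = fromaddress(addr, ctypes.sizeof(buf), dtype=float)
print(arr)   # [ 1.  2.  3.  4.]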


 Also, could you elaborate why dtype=float would work better?
Because there is no such thing as a double type in Python?


Sturla Molden






___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] String arrays from Python to Fortran

2009-03-23 Thread Sturla Molden
How did you import the function? f2py? What did you put in your .pyf file?




 *My Fortran code:*

 subroutine print_string (a, c)
 implicit none
 character(len=255), dimension(c), intent(inout)::  a
 integer, intent(in) :: c
 integer :: i
 do i = 1, size(a)
 print*, a(i)
 end do

 end subroutine print_string

 *My Python code:*

 from test import *
 from numpy import *

 a = "this is the test string."
 a = a.split()

 b = a

 a = char.array(a, itemsize=1, order = 'Fortran')

 print_string(a, len(a)) #this is imported from the compiled Fortran code

 

 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.ctypeslib.ndpointer and the restype attribute

2009-03-23 Thread Sturla Molden
Sturla Molden wrote:
 def fromaddress(address, nbytes, dtype=double):
I guess dtype=float works better...


S.M.
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Concatenating string arrays

2009-03-18 Thread Sturla Molden
On 3/18/2009 7:30 PM, Thomas Robitaille wrote:

 import numpy as np
 arr1 = np.array(['a','b','c'])
 arr2 = np.array(['d','e','f'])
 
 I would like to produce a third array that would contain  
 ['ad','be','cf']. Is there an efficient way to do this? I could do  
 this element by element, but I need a faster method, as I need to do  
 this on arrays with several million elements.

>>> arr1 = np.array(['a','b','c'])
>>> arr2 = np.array(['d','e','f'])
>>> arr3 = np.zeros(6, dtype='|S1')
>>> arr3[::2] = arr1
>>> arr3[1::2] = arr2
>>> arr3.view(dtype='|S2')
array(['ad', 'be', 'cf'],
      dtype='|S2')

Does this help?


Sturla Molden

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Superfluous array transpose (cf. ticket #1054)

2009-03-16 Thread Sturla Molden
On 3/16/2009 9:27 AM, Pearu Peterson wrote:

 If a operation produces new array then the new array should have the
 storage properties of the lhs operand.

That would not be enough, as 1+a would behave differently from a+1. The 
former would change storage order and the latter would not.

Broadcasting arrays adds further to the complexity of the problem.

It seems necessary to do something like this to avoid the trap when using f2py:

def some_fortran_function(x):
    if x.flags['C_CONTIGUOUS']:
        shape = x.shape[::-1]
        _x = x.reshape(shape, order='F')
        _y = _f2py_wrapper(_x)
        shape = _y.shape[::-1]
        return _y.reshape(shape, order='C')
    else:
        return _f2py_wrapper(x)

And then preferably never use Fortran ordered arrays directly.


Sturla Molden







___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Enhancements for NumPy's FFTs

2009-03-15 Thread Sturla Molden

 1) I have noticed that fftpack_litemodule.c does not release the GIL
 around calls to functions in fftpack.c. I cannot se any obvious reason
 for
 this. As far as I can tell, the functions in fftpack.c are re-entrant.

 2) If fftpack_lite did release the GIL, it would allow functions in
 numpy.fft to use multithreading for multiple FFTs in parallel
 (threading.Thread are ok, not special compilation needed).

 Both are fines to modify for 1.3.

There is a version of fftpack_litemodule.c, fftpack.c and fftpack.h that
does this attached to ticket #1055. The two important changes are
releasing the GIL and using npy_intp for 64 bit support.

Minor changes:

There is a restrict qualifier in fftpack.c. If it is not compiled with
C99, it tries to use similar GNU or MS extensions.

There are some OpenMP pragmas in fftpack_litemodule.c. If you don't compile
with OpenMP support, they do nothing. If you do compile with OpenMP, they
will make certain FFTs run in parallel. I can comment them out if you
prefer.

Sturla Molden




___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Enhancements for NumPy's FFTs

2009-03-15 Thread Sturla Molden

 Would it be possible to make the changes as a patch (svn diff) - this
 makes things easier to review.

I've added diff files to ticket #1055.


 Yes, I would be more comfortable without them (for 1.3). This is
 typically the kind of small changes which can be a PITA to deal with
 just before a release because it breaks some platforms in non obvious
 ways.

Ok, they are commented out.


 For the restrict keyword support, I will add a distutils check to avoid
 the compiler-specifics (again after 1.3).

I've added a header file npy_restrict.h that defines a NPY_RESTRICT symbol.

Best regards,

Sturla







___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Enhancements for NumPy's FFTs

2009-03-15 Thread Sturla Molden
 Mon, 16 Mar 2009 00:33:28 +0900, David Cournapeau wrote:

 Also, you could post the patch on the http://codereview.appspot.com site.
 Then it would be easier to both review and to keep track of its
 revisions

I have posted the files here:

http://projects.scipy.org/numpy/ticket/1055


Sturla

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Enhancements for NumPy's FFTs

2009-03-15 Thread Sturla Molden

 Well, that's nearly as good. (Though submitting a single svn diff
 containing all changes could have been a bit more easy to handle than
 separate patches for each file. But a small nitpick only.)

The problem is I am really bad at using these tools. I have TortoiseSVN
installed, but no idea how to use it. :(

I cannot delete any file attachment in trac, but I can overwrite the files
I've posted.

S.M.



___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Superfluous array transpose (cf. ticket #1054)

2009-03-15 Thread Sturla Molden

Regarding ticket #1054. What is the reason for this strange behaviour?

>>> a = np.zeros((10,10),order='F')
>>> a.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> (a+1).flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False


Sturla Molden

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Reminder: code freeze for bet at the end of the WE

2009-03-14 Thread Sturla Molden

Will memmap be fixed to use offsets correctly before 1.3?


 hi,

 Just a friendly reminder that I will close the trunk for 1.3.0 at the
 end of 15th March (I will more likely do it at the end of Monday Japan
 time which roughly corresponds to 15th March midnight Pacific time),

 cheers,

 David
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Reminder: code freeze for bet at the end of the WE

2009-03-14 Thread Sturla Molden

 Will memmap be fixed to use offsets correctly before 1.3?

I posted this to scipy-dev (possibly the wrong list) on March 9, so I'll
repeat it here: In Python 2.6, mmap has an offset keyword. NumPy's memmap
should use this to allow big files to be memory mapped on 32 bit systems.
Only a minor change is required:

if float(sys.version[:3]) > 2.5:

    bytes = bytes - offset

    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=offset)

    self = ndarray.__new__(subtype, shape, dtype=descr, buffer=mm,
                           offset=0, order=order)

else:

    mm = mmap.mmap(fid.fileno(), bytes, access=acc)

    self = ndarray.__new__(subtype, shape, dtype=descr, buffer=mm,
                           offset=offset, order=order)


Instead of just:

mm = mmap.mmap(fid.fileno(), bytes, access=acc)

self = ndarray.__new__(subtype, shape, dtype=descr, buffer=mm,
                       offset=offset, order=order)



Reagards,
Sturla Molden

__all__ = ['memmap']

import warnings
from numeric import uint8, ndarray, dtype
import sys

dtypedescr = dtype
valid_filemodes = ["r", "c", "r+", "w+"]
writeable_filemodes = ["r+", "w+"]

mode_equivalents = {
    "readonly":"r",
    "copyonwrite":"c",
    "readwrite":"r+",
    "write":"w+"
    }

class memmap(ndarray):

Create a memory-map to an array stored in a file on disk.

Memory-mapped files are used for accessing small segments of large files
on disk, without reading the entire file into memory.  Numpy's
memmap's are array-like objects.  This differs from Python's ``mmap``
module, which uses file-like objects.

Parameters
--
filename : string or file-like object
The file name or file object to be used as the array data
buffer.
dtype : data-type, optional
The data-type used to interpret the file contents.
Default is `uint8`
mode : {'r+', 'r', 'w+', 'c'}, optional
The file is opened in this mode:

+------+--------------------------------------------------------------+
| 'r'  | Open existing file for reading only.                         |
+------+--------------------------------------------------------------+
| 'r+' | Open existing file for reading and writing.                  |
+------+--------------------------------------------------------------+
| 'w+' | Create or overwrite existing file for reading and writing.   |
+------+--------------------------------------------------------------+
| 'c'  | Copy-on-write: assignments affect data in memory, but        |
|      | changes are not saved to disk.  The file on disk is          |
|      | read-only.                                                   |
+------+--------------------------------------------------------------+

Default is 'r+'.
offset : integer, optional
In the file, array data starts at this offset.  `offset` should be
a multiple of the byte-size of `dtype`.  Requires `shape=None`.
The default is 0.
shape : tuple, optional
The desired shape of the array. By default, the returned array will be
1-D with the number of elements determined by file size and data-type.
order : {'C', 'F'}, optional
Specify the order of the ndarray memory layout: C (row-major) or
Fortran (column-major).  This only has an effect if the shape is
greater than 1-D.  The default order is 'C'.

Methods
---
close
Close the memmap file.
flush
Flush any changes in memory to file on disk.
When you delete a memmap object, flush is called first to write
changes to disk before removing the object.

Notes
-
The memmap object can be used anywhere an ndarray is accepted.
Given a memmap ``fp``, ``isinstance(fp, numpy.ndarray)`` returns
``True``.

Notes
-

Memory-mapped arrays use the Python memory-map object which
(prior to Python 2.5) does not allow files to be larger than a
certain size depending on the platform. This size is always < 2GB
even on 64-bit systems.

Examples

>>> data = np.arange(12, dtype='float32')
>>> data.resize((3,4))

This example uses a temporary file so that doctest doesn't write
files to your directory. You would use a 'normal' filename.

>>> from tempfile import mkdtemp
>>> import os.path as path
>>> filename = path.join(mkdtemp(), 'newfile.dat')

Create a memmap with dtype and shape that matches our data:

>>> fp = np.memmap(filename, dtype='float32', mode='w+', shape=(3,4))
>>> fp
memmap([[ 0.,  0.,  0.,  0.],
[ 0.,  0.,  0.,  0.],
[ 0.,  0.,  0.,  0.]], dtype=float32)

Write data to memmap array:

>>> fp[:] = data[:]
>>> fp
memmap([[  0.,   1.,   2.,   3.],
[  4.,   5.,   6.,   7.],
[  8.,   9.,  10.,  11.]], dtype=float32)

Deletion flushes memory changes to disk before removing the object:

 del

Re: [Numpy-discussion] Reminder: code freeze for bet at the end of the WE

2009-03-14 Thread Sturla Molden

 Can you open a ticket for this?

Done. Ticket #1053


Sturla

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

