Re: numpy performance and random numbers

2009-12-23 Thread Peter Pearson
On Sun, 20 Dec 2009 21:02:02 -0800, Rami Chowdhury wrote:

 On Dec 20, 2009, at 17:41 , Peter Pearson wrote:

 Why not use a good cipher, such as AES, to generate a pseudorandom
 bit stream by encrypting successive integers?  

 Isn't the Fortuna PRNG based around that approximate concept?

Yes, although the interesting part of Fortuna is its strategy for
entropy accumulation, which is (I think) not of interest to the
Original Poster.  In cryptology, one finds many places where a
statistically significant departure from random behavior would
be a publishable discovery.

-- 
To email me, substitute nowhere-spamcop, invalid-net.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-21 Thread sturlamolden
On 20 Des, 01:46, Lie Ryan lie.1...@gmail.com wrote:

 Not necessarily, you only need to be certain that the two streams don't
 overlap in any reasonable amount of time. For that purpose, you can use
 a PRNG that has an extremely high period, like the Mersenne Twister, and put
 the generators in very distant states.

Except there is no way to find two very distant states and prove they
are distant enough.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-21 Thread Robert Kern

On 2009-12-19 09:14 AM, Carl Johan Rehn wrote:

On Dec 19, 2:49 pm, sturlamoldensturlamol...@yahoo.no  wrote:

On 19 Des, 11:05, Carl Johan Rehncar...@gmail.com  wrote:


I plan to port a Monte Carlo engine from Matlab to Python. However,
when I timed randn(N1, N2) in Python and compared it with Matlab's
randn, Matlab came out as a clear winner with a speedup of 3-4 times.
This was truly disappointing. I ran this test on a Win32 machine and
without the Atlas library.


This is due to the algorithm. Matlab is using Marsaglia's ziggurat
method. It is the fastest there is for normal and gamma random
variates. NumPy uses the Mersenne Twister to produce uniform random
deviates, and then applies transcendental functions to transform to the
normal distribution. Marsaglia's C code for ziggurat is freely
available, so you can compile it yourself and call from ctypes, Cython
or f2py.

The PRNG does not use BLAS/ATLAS.


Thank you, this was very informative. I know about the Mersenne
Twister, but had no idea about Marsaglia's ziggurat method. I will
definitely try f2py or cython.


It's worth noting that the ziggurat method can be implemented incorrectly, and 
requires some testing before I will accept such in numpy. That's why we don't 
use the ziggurat method currently. C.f.


  http://www.doornik.com/research/ziggurat.pdf

Requests for the ziggurat method come up occasionally on the numpy list, but so 
far no one has shown code or test results. Charles Harris and Bruce Carneal seem 
to have gotten closest. You can search numpy-discussion's archive for their 
email addresses and previous threads. Additionally, you should ask future numpy 
questions on the numpy-discussion mailing list.


  http://www.scipy.org/Mailing_Lists

--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-21 Thread r0g
sturlamolden wrote:
 On 19 Des, 16:20, Carl Johan Rehn car...@gmail.com wrote:
 
 How about multi-core or (perhaps more exciting) GPU and CUDA? I must
 admit that I am extremely interested in trying the CUDA-alternative.

 Obviously, cuBLAS is not an option here, so what is the safest route
 for a novice parallel-programmer?
 
 The problem with PRNG is that they are iterative in nature, and
 maintain global states. They are therefore very hard to vectorize. A
 GPU will not help. The GPU has hundreds of computational cores that
 can run kernels, but you only get to utilize one.
 
 Parallel PRNGs are an unsolved problem in computer science.

Surely you could have as many totally independent cores as you like
independently spitting out random bits whenever they feel like it to
make an equally random bitstream?  I would have thought the only issue
would be ensuring a high-quality bitstream was used to seed each thread, no?

Roger.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-21 Thread Gib Bogle

sturlamolden wrote:

On 19 Des, 14:06, Carl Johan Rehn car...@gmail.com wrote:


Matlab and numpy have (by chance?) the exact names for the same
functionality,


Common ancestry, NumPy and Matlab borrowed the name from IDL.


LabView, Octave and SciLab use the name randn as well.



So the basic question is, how can I speed up random number
generation?


The obvious thing would be to compile ziggurat yourself, and turn on
optimization flags for your hardware.
http://www.jstatsoft.org/v05/i08/


P.S. Be careful if you consider using more than one processor.
Multithreading is a very difficult issue with PRNGs, because it is
difficult to guarantee they are truly independent. But you can use a
producer-consumer pattern, though: one thread constantly producing
random numbers (writing into a buffer or pipe) and another thread(s)
consuming them.


In case you're interested, I've made a multi-thread version of ziggurat 
(actually in Fortran for use with OpenMP).  It is a simple enough mod: there is 
an additional argument (thread number) in each call to the ziggurat functions, 
and ziggurat maintains separate variables for each thread (not just the seed). 
There was one non-trivial issue.  To avoid cache collisions, the seed values (an 
array corresponding to jsr in the original code) need to be spaced sufficiently 
far apart.  Without this measure the performance was disappointingly slow.


I agree with your recommendation - ziggurat is really fast.
--
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-21 Thread Gib Bogle

sturlamolden wrote:

On 19 Des, 16:20, Carl Johan Rehn car...@gmail.com wrote:


How about multi-core or (perhaps more exciting) GPU and CUDA? I must
admit that I am extremely interested in trying the CUDA-alternative.

Obviously, cuBLAS is not an option here, so what is the safest route
for a novice parallel-programmer?


The problem with PRNG is that they are iterative in nature, and
maintain global states. They are therefore very hard to vectorize. A
GPU will not help. The GPU has hundreds of computational cores that
can run kernels, but you only get to utilize one.

Parallel PRNGs are an unsolved problem in computer science.


My parallel version of ziggurat works fine, with Fortran/OpenMP.
--
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-21 Thread Gib Bogle

sturlamolden wrote:

On 19 Des, 22:58, sturlamolden sturlamol...@yahoo.no wrote:


If you pick two random states (using any PRNG), you need error-
checking that states are always unique, i.e. that each PRNG never
reaches the starting state of the other(s).


Another note on this:

Ideally, we would e.g. know how to find (analytically) MT states that
are very far apart. But to my knowledge no such equation has been
derived.

But often in Monte Carlo simulations, the PRNG is not the dominant
computational bottleneck. So we can simply start N PRNGs from N
consecutive states, and for each PRNG only use every N-th pseudorandom
deviate.



OK, now I understand.  In my application I don't care about a thread's PRNG 
reaching the start of another thread's.  I do understand that this is not the 
case in all applications.

--
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-21 Thread Robert Kern

On 2009-12-21 16:57 PM, r0g wrote:

sturlamolden wrote:

On 19 Des, 16:20, Carl Johan Rehncar...@gmail.com  wrote:


How about multi-core or (perhaps more exciting) GPU and CUDA? I must
admit that I am extremely interested in trying the CUDA-alternative.

Obviously, cuBLAS is not an option here, so what is the safest route
for a novice parallel-programmer?


The problem with PRNG is that they are iterative in nature, and
maintain global states. They are therefore very hard to vectorize. A
GPU will not help. The GPU has hundreds of computational cores that
can run kernels, but you only get to utilize one.

Parallel PRNGs are an unsolved problem in computer science.


Surely you could have as many totally independent cores as you like
independently spitting out random bits whenever they feel like it to
make an equally random bitstream?  I would have thought the only issue
would be ensuring a high-quality bitstream was used to seed each thread, no?


No. For most quality PRNGs, all seeds are equal (with a few minor exceptions. 
E.g. for the Mersenne Twister, a state vector of all zeros will yield only 
zeros, but any nontrivial state vector puts the PRNG into the same orbit, just 
at different places). There is no notion of a high quality seed. The problem 
is that during the run, the separate PRNGs may overlap, which will reduce the 
independence of the samples.


That said, the enormous length of the Mersenne Twister's period helps a lot. You 
can use an ungodly number of streams and run length without having a physically 
realizable chance of overlap. The chance of having a bug in your simulation code 
is overwhelmingly greater.


There are also algorithms that can initialize a given number of PRNGs with 
different parameters (*not* seeds) that are guaranteed not to overlap. No one 
has implemented this for numpy, yet.


--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-21 Thread Gib Bogle

David Cournapeau wrote:

On Sun, Dec 20, 2009 at 6:47 PM, Lie Ryan lie.1...@gmail.com wrote:

On 12/20/2009 2:53 PM, sturlamolden wrote:

On 20 Des, 01:46, Lie Ryanlie.1...@gmail.com  wrote:


Not necessarily, you only need to be certain that the two streams don't
overlap in any reasonable amount of time. For that purpose, you can use
a PRNG that has an extremely high period, like the Mersenne Twister, and put
the generators in very distant states.

Except there is no way to find two very distant states and prove they
are distant enough.


Except that only theoretical scientists feel the need to prove it, and
perhaps for cryptographic-level security. Random numbers for games, random
numbers for tmp files, and 99.99% of random number users don't really need
such proofs.


But the OP case most likely falls in your estimated 0.01% case. PRNG
quality is essential for reliable Monte Carlo procedures. I don't
think long period is enough to guarantee those good properties for //
random generators - at least it is not obvious to me.

David


My simulation program is Monte Carlo, but the complexity and variety of all the 
interactions ensure that PRNG sequence overlap will have no discernible effect.

--
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-20 Thread Lie Ryan

On 12/20/2009 2:53 PM, sturlamolden wrote:

On 20 Des, 01:46, Lie Ryanlie.1...@gmail.com  wrote:


Not necessarily, you only need to be certain that the two streams don't
overlap in any reasonable amount of time. For that purpose, you can use
a PRNG that has an extremely high period, like the Mersenne Twister, and put
the generators in very distant states.


Except there is no way to find two very distant states and prove they
are distant enough.

Except that only theoretical scientists feel the need to prove it, and 
perhaps for cryptographic-level security. Random numbers for games, 
random numbers for tmp files, and 99.99% of random number users don't 
really need such proofs.


And don't forget that the very long period of a PRNG like the 
Mersenne Twister (2**19937 − 1, according to Wikipedia) makes it very 
unlikely to choose two different seeds and end up at nearby entry 
points. Let's assume we're using 64-bit integers as seeds, and that 
the entry points defined by these seeds are uniformly distributed: you 
would have to generate (on average) about 2.34E+5982 numbers before 
clashing with the nearest entry point. Such an amount of numbers would 
require decades to generate. Your seed generator guard would only need 
to prevent seeding parallel generators with the same seed (which is 
disastrous), and that's it; the big period covers everything else.

--
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-20 Thread David Cournapeau
On Sun, Dec 20, 2009 at 6:47 PM, Lie Ryan lie.1...@gmail.com wrote:
 On 12/20/2009 2:53 PM, sturlamolden wrote:

 On 20 Des, 01:46, Lie Ryanlie.1...@gmail.com  wrote:

 Not necessarily, you only need to be certain that the two streams don't
 overlap in any reasonable amount of time. For that purpose, you can use
 a PRNG that has an extremely high period, like the Mersenne Twister, and put
 the generators in very distant states.

 Except there is no way to find two very distant states and prove they
 are distant enough.

 Except that only theoretical scientists feel the need to prove it, and
 perhaps for cryptographic-level security. Random numbers for games, random
 numbers for tmp files, and 99.99% of random number users don't really need
 such proofs.

But the OP case most likely falls in your estimated 0.01% case. PRNG
quality is essential for reliable Monte Carlo procedures. I don't
think long period is enough to guarantee those good properties for //
random generators - at least it is not obvious to me.

David
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-20 Thread Lie Ryan

On 12/21/2009 1:13 AM, David Cournapeau wrote:

But the OP case most likely falls in your estimated 0.01% case. PRNG
quality is essential for reliable Monte Carlo procedures. I don't
think long period is enough to guarantee those good properties for //
random generators - at least it is not obvious to me.


No, it's not; a long period is not an indicator of quality. I was 
responding to the chance of unexpected repetition of a sequence because of 
a collision of entry points. A long period is an indicator that the chance 
of entry-point collision should be low enough. A long period (alone) 
doesn't mean anything for the quality of the randomness itself.

--
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-20 Thread Peter Pearson
On Sun, 20 Dec 2009 07:26:03 +1100, Lie Ryan lie.1...@gmail.com wrote:
 On 12/20/2009 4:02 AM, Carl Johan Rehn wrote:

 Parallel PRNGs are an unsolved problem in computer science.

 Thanks again for sharing your knowledge. I had no idea. This means
 that if I want to speed up my application I have to go for the fastest
 random generator and focus on other parts of my code that can be
 vectorized.

 If you don't care about repeatability (which is already extremely 
 difficult in parallel processing even without random number generators), 
 you can just start two PRNGs at two distinct states (and probably from 
 two different algorithms) and they will spew out two independent 
 streams of random numbers. What was unsolved was the pseudo- part of 
 the random number generation, which guarantees perfect replayability in 
 all conditions.

Why not use a good cipher, such as AES, to generate a pseudorandom
bit stream by encrypting successive integers?  If you use a different
key for each instance, you'll have exactly the independence you want.
And if you can detect any statistical anomaly in the output, you
automatically have the most exciting paper to be published in the next
issue of the Journal of Cryptology.

Minor caveats:

 1. This might be slower than another approach, but maybe not:
AES is much faster than the ciphers of the old days.

 2. Since AES(key,i) != AES(key,j) if i != j, there will be a
dearth of duplicates, which will become statistically
detectable around the time i has been incremented
2**64 times.  There are many reasons why this might not
bother you (one: you don't plan to use so many values;
two: you might use the 128-bit AES output in pieces, rather
than as a single value, in which case duplicates will
appear among the pieces at the right rate), but if it *does*
bother you, it can be fixed by using i^AES(key,i) instead
of just AES(key,i).
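
For concreteness, here is a minimal Python sketch of the counter-mode idea,
assuming the PyCrypto/PyCryptodome AES module (Crypto.Cipher.AES) is
available; the key, the block layout and the 53-bit float conversion are my
own illustrative choices, not part of the suggestion above:

import struct
from Crypto.Cipher import AES   # PyCrypto / PyCryptodome

class AESCounterPRNG(object):
    def __init__(self, key):
        # One key per independent stream; we do the counting ourselves.
        self.cipher = AES.new(key, AES.MODE_ECB)
        self.i = 0

    def next_block(self):
        # The pseudorandom stream is AES(key, i) for successive integers i.
        block = struct.pack('>QQ', 0, self.i)
        self.i += 1
        return self.cipher.encrypt(block)

    def random(self):
        # Use 53 bits of one 128-bit block to make a float in [0, 1).
        (n,) = struct.unpack('>Q', self.next_block()[:8])
        return (n >> 11) / float(1 << 53)

prng = AESCounterPRNG(b'0123456789abcdef')   # 16-byte key
print([prng.random() for _ in range(3)])

The i ^ AES(key, i) variant from caveat 2 would only change next_block(),
XORing the counter back into the ciphertext.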

-- 
To email me, substitute nowhere-spamcop, invalid-net.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-20 Thread Rami Chowdhury

On Dec 20, 2009, at 17:41 , Peter Pearson wrote:

 On Sun, 20 Dec 2009 07:26:03 +1100, Lie Ryan lie.1...@gmail.com wrote:
 On 12/20/2009 4:02 AM, Carl Johan Rehn wrote:
 
 Parallel PRNGs are an unsolved problem in computer science.
 
 Thanks again for sharing your knowledge. I had no idea. This means
 that if I want to speed up my application I have to go for the fastest
 random generator and focus on other parts of my code that can be
 vectorized.
 
 If you don't care about repeatability (which is already extremely 
 difficult in parallel processing even without random number generators), 
 you can just start two PRNGs at two distinct states (and probably from 
 two different algorithms) and they will spew out two independent 
 streams of random numbers. What was unsolved was the pseudo- part of 
 the random number generation, which guarantees perfect replayability in 
 all conditions.
 
 Why not use a good cipher, such as AES, to generate a pseudorandom
 bit stream by encrypting successive integers?  

Isn't the Fortuna PRNG based around that approximate concept?

-
Rami Chowdhury
Never assume malice when stupidity will suffice. -- Hanlon's Razor
408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)

-- 
http://mail.python.org/mailman/listinfo/python-list


numpy performance and random numbers

2009-12-19 Thread Carl Johan Rehn
Dear friends,

I plan to port a Monte Carlo engine from Matlab to Python. However,
when I timed randn(N1, N2) in Python and compared it with Matlab's
randn, Matlab came out as a clear winner with a speedup of 3-4 times.
This was truly disappointing. I ran this test on a Win32 machine and
without the Atlas library.

Why is there such a large difference in speed and how can I improve
the speed of randn in Python! Any help with this matter is truly
appreciated since I really would like to get away from Matlab and move
over to Python instead.

Yours Carl
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-19 Thread Steven D'Aprano
On Sat, 19 Dec 2009 02:05:17 -0800, Carl Johan Rehn wrote:

 Dear friends,
 
 I plan to port a Monte Carlo engine from Matlab to Python. However, when
 I timed randn(N1, N2) in Python and compared it with Matlab's randn,

What's randn? I don't know that function. I know the randint, random, and 
randrange functions, but not randn. Where does it come from?


 Matlab came out as a clear winner with a speedup of 3-4 times. This was
 truly disappointing. I ran this test on a Win32 machine and without the
 Atlas library.
 
 Why is there such a large difference in speed and how can I improve the
 speed of randn in Python! Any help with this matter is truly appreciated
 since I really would like to get away from Matlab and move over to
 Python instead.

Could be many reasons. Python could be generally slower than Matlab. Your 
timing code might have been faulty and you weren't comparing equal 
amounts of work (if your Python code was doing four times as much work as 
the Matlab code, then naturally it will be four times slower). Perhaps 
the Matlab random number generator is a low-quality generator which is 
fast but not very random. Python uses a very high quality RNG which is 
not cheap.

But does it really matter if the RNG is slower? Your Monte Carlo engine 
is a lot more than just a RNG. What matters is whether the engine as a 
whole is faster or slower, not whether one small component is slower.


-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-19 Thread Carl Johan Rehn
On Dec 19, 12:29 pm, Steven D'Aprano st...@remove-this-
cybersource.com.au wrote:
 On Sat, 19 Dec 2009 02:05:17 -0800, Carl Johan Rehn wrote:
  Dear friends,

  I plan to port a Monte Carlo engine from Matlab to Python. However, when
  I timed randn(N1, N2) in Python and compared it with Matlab's randn,

 What's randn? I don't know that function. I know the randint, random, and
 randrange functions, but not randn. Where does it come from?

  Matlab came out as a clear winner with a speedup of 3-4 times. This was
  truly disappointing. I ran this test on a Win32 machine and without the
  Atlas library.

  Why is there such a large difference in speed and how can I improve the
  speed of randn in Python! Any help with this matter is truly appreciated
  since I really would like to get away from Matlab and move over to
  Python instead.

 Could be many reasons. Python could be generally slower than Matlab. Your
 timing code might have been faulty and you weren't comparing equal
 amounts of work (if your Python code was doing four times as much work as
 the Matlab code, then naturally it will be four times slower). Perhaps
 the Matlab random number generator is a low-quality generator which is
 fast but not very random. Python uses a very high quality RNG which is
 not cheap.

 But does it really matter if the RNG is slower? Your Monte Carlo engine
 is a lot more than just a RNG. What matters is whether the engine as a
 whole is faster or slower, not whether one small component is slower.

 --
 Steven

randn is given by

>>> import numpy
>>> numpy.random.randn(2,3)
array([[-2.66435181, -0.32486419,  0.12742156],
       [-0.2387061 , -0.55894044,  1.20750493]])

Generally, at least in my MC application, I need a large number of
random numbers. Usually I execute, for example, r = randn(100, 1)
sequentially a relatively large number of times until sufficient
accuracy has been reached. Thus, randn is in my case a mission
critical component for obtaining an acceptable overall run time.
Matlab and numpy have (by chance?) the exact names for the same
functionality, so I was very happy with numpy's implementation until I
timed it. So the basic question is, how can I speed up random number
generation?

Carl



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-19 Thread sturlamolden
On 19 Des, 11:05, Carl Johan Rehn car...@gmail.com wrote:

 I plan to port a Monte Carlo engine from Matlab to Python. However,
 when I timed randn(N1, N2) in Python and compared it with Matlab's
 randn, Matlab came out as a clear winner with a speedup of 3-4 times.
 This was truly disappointing. I ran this test on a Win32 machine and
 without the Atlas library.

This is due to the algorithm. Matlab is using Marsaglia's ziggurat
method. It is the fastest there is for normal and gamma random
variates. NumPy uses the Mersenne Twister to produce uniform random
deviates, and then applies transcendental functions to transform to the
normal distribution. Marsaglia's C code for ziggurat is freely
available, so you can compile it yourself and call it from ctypes, Cython
or f2py.

The PRNG does not use BLAS/ATLAS.
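
For anyone curious what the ctypes route looks like, here is a rough sketch.
It assumes you have compiled the ziggurat C code into a shared library
yourself; the library name and the exported symbols (zigset, rnor) are
placeholders, since Marsaglia's published code defines RNOR as a macro and a
thin C wrapper function would probably be needed:

import ctypes
import numpy as np

# Placeholder names: adjust to whatever your compiled library actually exports.
zig = ctypes.CDLL('./libziggurat.so')     # e.g. gcc -O3 -shared -fPIC ziggurat.c
zig.zigset(ctypes.c_ulong(123456789))     # seed the generator
zig.rnor.restype = ctypes.c_float         # wrapper around the RNOR macro

def randn_zig(n):
    # One ctypes call per deviate; a C wrapper that fills a whole array
    # per call would reduce the call overhead further.
    out = np.empty(n, dtype=np.float32)
    for i in xrange(n):
        out[i] = zig.rnor()
    return out
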
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-19 Thread sturlamolden
On 19 Des, 12:29, Steven D'Aprano st...@remove-this-
cybersource.com.au wrote:

 Perhaps
 the Matlab random number generator is a low-quality generator which is
 fast but not very random. Python uses a very high quality RNG which is
 not cheap.

Marsaglia and Matlab's implementation of ziggurat uses a slightly
lower quality RNG for uniform deviates than NumPy's Mersenne Twister.
But the real speed advantage comes from avoiding transcendental
functions. I have for some time thought of contributing a ziggurat
generator to NumPy, while retaining the Mersenne Twister internally,
but I have not gotten around to doing it.


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-19 Thread sturlamolden
On 19 Des, 14:06, Carl Johan Rehn car...@gmail.com wrote:

 Matlab and numpy have (by chance?) the exact names for the same
 functionality,

Common ancestry, NumPy and Matlab borrowed the name from IDL.


LabView, Octave and SciLab use the name randn as well.


 So the basic question is, how can I speed up random number
 generation?

The obvious thing would be to compile ziggurat yourself, and turn on
optimization flags for your hardware.
http://www.jstatsoft.org/v05/i08/


P.S. Be careful if you consider using more than one processor.
Multithreading is a very difficult issue with PRNGs, because it is
difficult to guarantee they are truly independent. But you can use a
producer-consumer pattern, though: one thread constantly producing
random numbers (writing into a buffer or pipe) and another thread(s)
consuming them.
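
A minimal sketch of that producer-consumer pattern with the standard
library, where one thread keeps a bounded queue filled with blocks of
deviates (the block size and queue depth are arbitrary choices):

import threading
import Queue                  # 'queue' in Python 3
import numpy as np

buf = Queue.Queue(maxsize=8)          # bounded buffer of pre-generated blocks

def producer(block_size=10000):
    while True:
        buf.put(np.random.randn(block_size))   # blocks while the buffer is full

t = threading.Thread(target=producer)
t.daemon = True                        # don't keep the process alive for it
t.start()

# Consumers just take the next block whenever they need more deviates.
block = buf.get()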
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-19 Thread Carl Johan Rehn
On Dec 19, 2:49 pm, sturlamolden sturlamol...@yahoo.no wrote:
 On 19 Des, 11:05, Carl Johan Rehn car...@gmail.com wrote:

  I plan to port a Monte Carlo engine from Matlab to Python. However,
  when I timed randn(N1, N2) in Python and compared it with Matlab's
  randn, Matlab came out as a clear winner with a speedup of 3-4 times.
  This was truly disappointing. I ran this test on a Win32 machine and
  without the Atlas library.

 This is due to the algorithm. Matlab is using Marsaglia's ziggurat
 method. It is the fastest there is for normal and gamma random
 variates. NumPy uses the Mersenne Twister to produce uniform random
 deviates, and then applies transcendental functions to transform to the
 normal distribution. Marsaglia's C code for ziggurat is freely
 available, so you can compile it yourself and call from ctypes, Cython
 or f2py.

 The PRNG does not use BLAS/ATLAS.

Thank you, this was very informative. I know about the Mersenne
Twister, but had no idea about Marsaglia's ziggurat method. I will
definitely try f2py or cython.

Well, I guess I knew that random numbers were not handled by BLAS/
ATLAS, but wasn't 100% sure.

Carl
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-19 Thread Carl Johan Rehn
On Dec 19, 3:16 pm, sturlamolden sturlamol...@yahoo.no wrote:
 On 19 Des, 14:06, Carl Johan Rehn car...@gmail.com wrote:

  Matlab and numpy have (by chance?) the exact names for the same
  functionality,

 Common ancestry, NumPy and Matlab borrowed the name from IDL.

 LabView, Octave and SciLab use the name randn as well.

  So the basic question is, how can I speed up random number
  generation?

 The obvious thing would be to compile ziggurat yourself, and turn on
 optimization flags for your hardware. http://www.jstatsoft.org/v05/i08/

 P.S. Be careful if you consider using more than one processor.
 Multithreading is a very difficult issue with PRNGs, because it is
 difficult to guarantee they are truly independent. But you can use a
 producer-consumer pattern, though: one thread constantly producing
 random numbers (writing into a buffer or pipe) and another thread(s)
 consuming them.

How about multi-core or (perhaps more exciting) GPU and CUDA? I must
admit that I am extremely interested in trying the CUDA-alternative.

Obviously, cuBLAS is not an option here, so what is the safest route
for a novice parallel-programmer?

Carl
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-19 Thread Steven D'Aprano
On Sat, 19 Dec 2009 05:06:53 -0800, Carl Johan Rehn wrote:

 so I was very happy with numpy's implementation until I timed it.

How did you time it?


-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-19 Thread sturlamolden
On 19 Des, 16:20, Carl Johan Rehn car...@gmail.com wrote:

 How about multi-core or (perhaps more exciting) GPU and CUDA? I must
 admit that I am extremely interested in trying the CUDA-alternative.

 Obviously, cuBLAS is not an option here, so what is the safest route
 for a novice parallel-programmer?

The problem with PRNG is that they are iterative in nature, and
maintain global states. They are therefore very hard to vectorize. A
GPU will not help. The GPU has hundreds of computational cores that
can run kernels, but you only get to utilize one.

Parallel PRNGs are an unsolved problem in computer science.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-19 Thread Carl Johan Rehn
On Dec 19, 4:47 pm, sturlamolden sturlamol...@yahoo.no wrote:
 On 19 Des, 16:20, Carl Johan Rehn car...@gmail.com wrote:

  How about multi-core or (perhaps more exciting) GPU and CUDA? I must
  admit that I am extremely interested in trying the CUDA-alternative.

  Obviously, cuBLAS is not an option here, so what is the safest route
  for a novice parallel-programmer?

 The problem with PRNG is that they are iterative in nature, and
 maintain global states. They are therefore very hard to vectorize. A
 GPU will not help. The GPU has hundreds of computational cores that
 can run kernels, but you only get to utilize one.

 Parallel PRNGs are an unsolved problem in computer science.


How did you time it?

Well, in Matlab I used tic; for i = 1:1000, randn(100, 1), end;
toc and in IPython I used a similar construct but with time instead
of tic/toc.


 Parallel PRNGs are an unsolved problem in computer science.

Thanks again for sharing your knowledge. I had no idea. This means
that if I want to speed up my application I have to go for the fastest
random generator and focus on other parts of my code that can be
vectorized.

Carl
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-19 Thread Lie Ryan

On 12/20/2009 4:02 AM, Carl Johan Rehn wrote:

How did you time it?


Well, in Matlab I used tic; for i = 1:1000, randn(100, 1), end;
toc and in IPython I used a similar construct but with time instead
of tic/toc.


Code?


Parallel PRNGs are an unsolved problem in computer science.


Thanks again for sharing your knowledge. I had no idea. This means
that if I want to speed up my application I have to go for the fastest
random generator and focus on other parts of my code that can be
vectorized.


If you don't care about repeatability (which is already extremely 
difficult in parallel processing even without random number generators), 
you can just start two PRNGs at two distinct states (and probably from 
two different algorithms) and they will spew out two independent 
streams of random numbers. What was unsolved was the pseudo- part of 
the random number generation, which guarantees perfect replayability in 
all conditions.

--
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-19 Thread sturlamolden
On 19 Des, 21:26, Lie Ryan lie.1...@gmail.com wrote:

 you can just start two PRNGs at two distinct states

No. You have to know for certain that the outputs don't overlap.

If you pick two random states (using any PRNG), you need error-
checking that states are always unique, i.e. that each PRNG never
reaches the starting state of the other(s). If I had to do this, I
would e.g. hash the state to a 32 bit integer. You can deal with
errors by jumping to a different state or simply aborting the
simulation. The problem is that the book-keeping will be costly
compared to the work involved in generating a single pseudorandom
deviate. So while we can construct a parallel PRNG this way, it will
likely run slower than one unique PRNG.

It has been suggested to use short-period MTs with different
characteristic polynomials. This is also available for the nVidia GPU
using CUDA. This is probably the generator I would pick if I wanted to
produce random numbers in parallel. However, here we should be aware
that the math has not been thoroughly proven.

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dgene.pdf
http://developer.download.nvidia.com/compute/cuda/2_2/sdk/website/projects/MersenneTwister/doc/MersenneTwister.pdf

There is also a version of MT that uses SIMD instructions internally,
which gives a factor-of-two speedup.

I would never use different algorithms. It will introduce a systematic
difference in the simulation.



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-19 Thread sturlamolden
On 19 Des, 22:58, sturlamolden sturlamol...@yahoo.no wrote:

 If you pick two random states (using any PRNG), you need error-
 checking that states are always unique, i.e. that each PRNG never
 reaches the starting state of the other(s).

Another note on this:

Ideally, we would e.g. know how to find (analytically) MT states that
are very far apart. But to my knowledge no such equation has been
derived.

But often in Monte Carlo simulations, the PRNG is not the dominant
computational bottleneck. So we can simply start N PRNGs from N
consecutive states, and for each PRNG only use every N-th pseudorandom
deviate.
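
A toy sketch of that leapfrog scheme using the stdlib Mersenne Twister (the
same idea applies to numpy's generator): N generators are started from N
consecutive states of a common seed, and each then uses only every N-th
deviate, so their outputs interleave into one sequence and can never
overlap. The cost, as noted, is that each stream throws away N-1 of every N
deviates.

import random

def make_leapfrog_streams(seed, N):
    streams = []
    for k in range(N):
        rng = random.Random(seed)
        for _ in range(k):            # advance to the k-th consecutive state
            rng.random()
        streams.append(rng)
    return streams

def leapfrog_next(rng, N):
    x = rng.random()                  # use this deviate...
    for _ in range(N - 1):            # ...then skip the N-1 owned by the others
        rng.random()
    return x

streams = make_leapfrog_streams(12345, 4)
print([leapfrog_next(s, 4) for s in streams])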

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-19 Thread Carl Johan Rehn
On 19 Dec, 23:09, sturlamolden sturlamol...@yahoo.no wrote:
 On 19 Des, 22:58, sturlamolden sturlamol...@yahoo.no wrote:

  If you pick two random states (using any PRNG), you need error-
  checking that states are always unique, i.e. that each PRNG never
  reaches the starting state of the other(s).

 Another note on this:

 Ideally, we would e.g. know how to find (analytically) MT states that
 are very far apart. But to my knowledge no such equation has been
 derived.

 But often in Monte Carlo simulations, the PRNG is not the dominant
 computational bottleneck. So we can simply start N PRNGs from N
  consecutive states, and for each PRNG only use every N-th pseudorandom
 deviate.

Thank you for pointing me to the short-period MT reference and
especially the reference on the CUDA-version of parallel MT (even
though I would have wished the author had included a benchmark
comparison in the report). This is a very interesting topic. I agree
that it may work to start PRNGs at distinct and different states, but
that bookkeeping may slow down the algorithm so that it is not worth
the effort. However, the CUDA-version sounds interesting and should be
easy enough to use in a practical application.

Carl
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-19 Thread Gregory Ewing

Lie Ryan wrote:

If you don't care about repeatability (which is already extremely 
difficult in parallel processing even without random number generators), 
you can just start two PRNGs at two distinct states (and probably from 
two different algorithms)


There's no need for different algorithms, and no problem
with repeatability. There exist algorithms with a very
long period (on the order of 2**100 or more) for which it's
easy to calculate seeds for a set of guaranteed non-
overlapping subsequences.

http://or.journal.informs.org/cgi/content/abstract/44/5/816
http://or.journal.informs.org/cgi/content/abstract/47/1/159

In these kinds of generator, moving from one state to
the next involves multiplying a state vector by a matrix
using modulo arithmetic. So you can jump ahead N steps
by raising the matrix to the power of N.
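
A toy illustration of that jump-ahead idea, using a scalar multiplicative
congruential generator rather than one of the combined generators from the
papers above (so the "matrix" is just the multiplier a, and jumping N steps
is a single modular exponentiation):

# Minimal-standard MCG: x[n+1] = a*x[n] mod m
a, m = 16807, 2**31 - 1

def step(x):
    return (a * x) % m

def jump(x, N):
    return (pow(a, N, m) * x) % m     # a**N mod m, computed in O(log N)

x = 42
y = x
for _ in range(1000):                 # advance step by step...
    y = step(y)
print(y == jump(x, 1000))             # ...or jump directly: True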

--
Greg
--
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-19 Thread Steven D'Aprano
On Sat, 19 Dec 2009 09:02:38 -0800, Carl Johan Rehn wrote:

 Well, in Matlab I used tic; for i = 1:1000, randn(100, 1), end;
 toc and in IPython I used a similar construct but with time instead
 of tic/toc.


I don't know if this will make any significant difference, but for the 
record that is not the optimal way to time a function in Python due to 
the overhead of creating the loop sequence.

In Python, the typical for-loop construct is something like this:

for i in range(1, 1000):

but range isn't just a funny way of writing "loop over these values"; it 
actually creates a list of integers, which has a measurable cost. In 
Python 2.x you can reduce that cost by using xrange instead of range, but 
it's even better to pre-compute the list outside of the timing code. Even 
better still is to use a loop sequence like [None]*1000 (precomputed 
outside of the timing code naturally!) to minimize the cost of memory 
accesses.

The best way to perform timings of small code snippets in Python is using 
the timeit module, which does all these things for you. Something like 
this:

from timeit import Timer
t = Timer('randn(100, 1)', 'from numpy.random import randn')
print min(t.repeat())

This will return the time taken by one million calls to randn. Because of 
the nature of modern operating systems, any one individual timing can be 
seriously impacted by other processes, so it does three independent 
timings and returns the lowest. And it will automatically pick the best 
timer to use according to your operating system (either clock, which is 
best on Windows, or time, which is better on Posix systems).



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-19 Thread Lie Ryan

On 12/20/2009 8:58 AM, sturlamolden wrote:

On 19 Des, 21:26, Lie Ryanlie.1...@gmail.com  wrote:


you can just start two PRNGs at two distinct states


No. You have to know for certain that the outputs don't overlap.


Not necessarily, you only need to be certain that the two streams don't 
overlap in any reasonable amount of time. For that purpose, you can use 
a PRNG that has an extremely high period, like the Mersenne Twister, and put 
the generators in very distant states.



If you pick two random states (using any PRNG), you need error-
checking that states are always unique, i.e. that each PRNG never
reaches the starting state of the other(s). If I had to do this, I
would e.g. hash the state to a 32 bit integer. You can deal with
errors by jumping to a different state or simply aborting the
simulation. The problem is that the book-keeping will be costly
compared to the work involved in generating a single pseudorandom
deviate. So while we can construct a parallel PRNG this way, it will
likely run slower than one unique PRNG.


You will need some logic to ensure that your generator's states are 
distant enough, but that won't incur any generation overhead. Of course 
this relies on the very large period of the PRNG algorithm.
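
In numpy terms the no-overhead version of this is simply to give each worker
its own RandomState with its own seed. A quick sketch (note that, as
discussed in this thread, nothing here proves the streams don't overlap; the
argument is only that overlap is astronomically unlikely given the Mersenne
Twister's period):

import numpy as np

seeds = [12345, 67890, 24680]                   # one distinct seed per worker
streams = [np.random.RandomState(s) for s in seeds]

samples = [rng.randn(1000) for rng in streams]  # each worker uses its own stream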

--
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and random numbers

2009-12-19 Thread Steven D'Aprano
On Sat, 19 Dec 2009 13:58:37 -0800, sturlamolden wrote:

 On 19 Des, 21:26, Lie Ryan lie.1...@gmail.com wrote:
 
  you can just start two PRNGs at two distinct states
 
 No. You have to know for certain that the outputs don't overlap.

For certain? Why?

Presumably you never do a Monte Carlo simulation once, you always repeat 
the simulation. Does it matter if (say) one trial in fifty is lower-
quality than the rest because the outputs happened to overlap? What if 
it's one trial in fifty thousand? So long as the probability of overlap 
is sufficiently small, you might not even care about doing repeated 
simulations.


 If you pick two random states (using any PRNG), you need error- checking
 that states are always unique, i.e. that each PRNG never reaches the
 starting state of the other(s). If I had to do this, I would e.g. hash
 the state to a 32 bit integer. You can deal with errors by jumping to a
 different state or simply aborting the simulation. The problem is that
 the book-keeping will be costly compared to the work involved in
 generating a single pseudorandom deviate. So while we can construct a
 parallel PRNG this way, it will likely run slower than one unique PRNG.

Since truly random sequences sometimes produce repeating sequences, you 
should be able to tolerate short periods of overlap. I'd consider the 
following strategy: for each of your parallel PRNGs other than the first, 
periodically jump to another state unconditionally. The periods should be 
relatively prime to each other, e.g. the second PRNG jumps-ahead after 
(say) 51 calls, the third after 37 calls, etc. (the exact periods 
shouldn't be important). Use a second, cheap, PRNG to specify how far to 
jump ahead. The overhead should be quite small: a simple counter and test 
for each PRNG, plus a small number of calls to a cheap PRNG, and 
statistically you should expect no significant overlap.

Is this workable?



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Numpy Performance

2009-04-24 Thread timlash
Thanks for your replies.

@Peter - My arrays are not sparse at all, but I'll take a quick look
at scipy.  I also should have mentioned that my numpy arrays are of
Object type as each data point (row) has one or more text labels for
categorization.

@Robert - Thanks for the comments about how numpy was optimized for
bulk transactions.  Most of the processing I'm doing is with
individual elements.

Essentially, I'm testing tens of thousands of scenarios on a
relatively small number of test cases.  Each scenario requires all
elements of each test case to be scored, then summarized, then sorted
and grouped with some top scores captured for reporting.

It seems like I can either work toward a procedure that features
indexed categorization so that my arrays are of integer type and a
design that will allow each scenario to be handled in bulk numpy
fashion, or expand RectangularArray with custom data handling methods.

Any other recommended approaches to working with tabular data in
Python?

Cheers,

Tim
--
http://mail.python.org/mailman/listinfo/python-list


Re: Numpy Performance

2009-04-24 Thread Robert Kern

On 2009-04-24 08:05, timlash wrote:

Essentially, I'm testing tens of thousands of scenarios on a
relatively small number of test cases.  Each scenario requires all
elements of each test case to be scored, then summarized, then sorted
and grouped with some top scores captured for reporting.

It seems like I can either work toward a procedure that features
indexed categorization so that my arrays are of integer type and a
design that will allow each scenario to be handled in bulk numpy
fashion, or expand RectangularArray with custom data handling methods.


If you posted a small, self-contained example of what you are doing to 
numpy-discussion, the denizens there will probably be able to help you formulate 
the right way to do this in numpy, if such a way exists.


  http://www.scipy.org/Mailing_Lists

--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list


Numpy Performance

2009-04-23 Thread timlash
Still fairly new to Python.  I wrote a program that used a class
called RectangularArray as described here:

class RectangularArray:
    def __init__(self, rows, cols, value=0):
        self.arr = [None]*rows
        self.row = [value]*cols
    def __getitem__(self, (i, j)):
        return (self.arr[i] or self.row)[j]
    def __setitem__(self, (i, j), value):
        if self.arr[i] == None: self.arr[i] = self.row[:]
        self.arr[i][j] = value

This class was found in a 14 year old post:
http://www.python.org/search/hypermail/python-recent/0106.html

This worked great and let me process a few hundred thousand data
points with relative ease.  However, I soon wanted to start sorting
arbitrary portions of my arrays and to transpose others.  I turned to
Numpy rather than reinventing the wheel with custom methods within the
serviceable RectangularArray class.  However, once I refactored with
Numpy I was surprised to find that the execution time for my program
doubled!  I expected a purpose built array module to be more efficient
rather than less.

I'm not doing any linear algebra with my data.  I'm working with
rectangular datasets, evaluating individual rows, grouping, sorting
and summarizing various subsets of rows.

Is a Numpy implementation overkill for my data handling uses?  Should
I evaluate prior array modules such as Numeric or Numarray?  Are there
any other modules suited to handling tabular data?  Would I be best
off expanding the RectangularArray class for the few data
transformation methods I need?

Any guidance or suggestions would be greatly appreciated!

Cheers,

Tim
--
http://mail.python.org/mailman/listinfo/python-list


Re: Numpy Performance

2009-04-23 Thread Peter Otten
timlash wrote:

 Still fairly new to Python.  I wrote a program that used a class
 called RectangularArray as described here:
 
 class RectangularArray:
     def __init__(self, rows, cols, value=0):
         self.arr = [None]*rows
         self.row = [value]*cols
     def __getitem__(self, (i, j)):
         return (self.arr[i] or self.row)[j]
     def __setitem__(self, (i, j), value):
         if self.arr[i] == None: self.arr[i] = self.row[:]
         self.arr[i][j] = value
 
 This class was found in a 14 year old post:
 http://www.python.org/search/hypermail/python-recent/0106.html
 
 This worked great and let me process a few hundred thousand data
 points with relative ease.  However, I soon wanted to start sorting
 arbitrary portions of my arrays and to transpose others.  I turned to
 Numpy rather than reinventing the wheel with custom methods within the
 serviceable RectangularArray class.  However, once I refactored with
 Numpy I was surprised to find that the execution time for my program
 doubled!  I expected a purpose built array module to be more efficient
 rather than less.
 
 I'm not doing any linear algebra with my data.  I'm working with
 rectangular datasets, evaluating individual rows, grouping, sorting
 and summarizing various subsets of rows.
 
 Is a Numpy implementation overkill for my data handling uses?  Should
 I evaluate prior array modules such as Numeric or Numarray?  Are there
 any other modules suited to handling tabular data?  Would I be best
 off expanding the RectangularArray class for the few data
 transformation methods I need?
 
 Any guidance or suggestions would be greatly appreciated!

Do you have many rows with zeros? That might be the reason why your
self-made approach shows better performance.

Googling for numpy sparse finds:

http://www.scipy.org/SciPy_Tutorial

Maybe one of the sparse matrix implementations in scipy works for you.
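
For example, a minimal sketch with scipy's lil_matrix, which is convenient
for incremental element assignment (whether it actually helps depends on how
sparse the data really is):

from scipy import sparse

arr = sparse.lil_matrix((1000, 1000))     # all-zero 1000x1000, stored sparsely
arr[3, 7] = 2.5                           # only assigned entries take memory
arr[10, 2] = 1.0
row3 = arr.getrow(3).toarray()            # densify pieces as needed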

Peter
--
http://mail.python.org/mailman/listinfo/python-list


Re: Numpy Performance

2009-04-23 Thread Robert Kern

On 2009-04-23 10:32, timlash wrote:

Still fairly new to Python.  I wrote a program that used a class
called RectangularArray as described here:

class RectangularArray:
    def __init__(self, rows, cols, value=0):
        self.arr = [None]*rows
        self.row = [value]*cols
    def __getitem__(self, (i, j)):
        return (self.arr[i] or self.row)[j]
    def __setitem__(self, (i, j), value):
        if self.arr[i] == None: self.arr[i] = self.row[:]
        self.arr[i][j] = value

This class was found in a 14 year old post:
http://www.python.org/search/hypermail/python-recent/0106.html

This worked great and let me process a few hundred thousand data
points with relative ease.  However, I soon wanted to start sorting
arbitrary portions of my arrays and to transpose others.  I turned to
Numpy rather than reinventing the wheel with custom methods within the
serviceable RectangularArray class.  However, once I refactored with
Numpy I was surprised to find that the execution time for my program
doubled!  I expected a purpose built array module to be more efficient
rather than less.


It depends on how much you refactored you code. numpy tries to optimize bulk 
operations. If you are doing a lot of __getitem__s and __setitem__s with 
individual elements as you would with RectangularArray, numpy is going to do a 
lot of extra work creating and deleting the scalar objects.
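
As a rough illustration of the difference (the exact ratio will depend on
the machine and numpy version):

import numpy as np

a = np.arange(1e6)

def slow_double(a):
    # Every a[i] read creates a numpy scalar object, and every
    # assignment has to unbox one again.
    for i in xrange(len(a)):
        a[i] = a[i] * 2.0
    return a

def fast_double(a):
    return a * 2.0        # one C-level loop, no per-element Python objects

timeit on the two functions typically shows the bulk version winning by a
large factor.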



I'm not doing any linear algebra with my data.  I'm working with
rectangular datasets, evaluating individual rows, grouping, sorting
and summarizing various subsets of rows.

Is a Numpy implementation overkill for my data handling uses?  Should
I evaluate prior array modules such as Numeric or Numarray?


No.

--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list


numpy performance and list comprehension

2007-04-03 Thread TG
Hi there.

Reading the page on python performance ( http://scipy.org/PerformancePython
) made me realize that I can achieve tremendous code acceleration with
numpy just by using u[:,:] kind of syntax the clever way.

Here is a little problem (Oja's rule of synaptic plasticity)

* W is a matrix containing the weights of connections between elements i and j
* V is an array containing the values of elements

I want to make W evolve with this rule :

dW[i,j] / dt = alpha * (V[i] * V[j] - W[i,j] * V[i]^2)

(don't pay attention to the derivate and stuff)

So, how would you write it in this nifty clever way ?

As a begining I wrote this :

  W += V.flatten().reshape((V.size,1)) * V.flatten().reshape((1,V.size))

But it is not complete and, I guess, not efficient.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and list comprehension

2007-04-03 Thread irstas
On Apr 3, 5:42 pm, TG [EMAIL PROTECTED] wrote:
 Hi there.

 Reading the page on python performance (http://scipy.org/PerformancePython
 ) made me realize that I can achieve tremendous code acceleration with
 numpy just by using u[:,:] kind of syntax the clever way.

 Here is a little problem (Oja's rule of synaptic plasticity)

 * W is a matrix containing the weights of connections between elements i and j
 * V is an array containing the values of elements

 I want to make W evolve with this rule :

 dW[i,j] / dt = alpha * (V[i] * V[j] - W[i,j] * V[i]^2)

 (don't pay attention to the derivate and stuff)

 So, how would you write it in this nifty clever way ?

 As a begining I wrote this :

   W += V.flatten().reshape((V.size,1)) * V.flatten().reshape((1,V.size))

 But it is not complete and, I guess, not efficient.

alpha * (V[i] * V[j] - W[i,j] * V[i]^2) =
alpha * V[i] * (V[j] - W[i,j] * V[i])

Assuming that V is a column vector, you could do it like this:

from numpy import array, transpose
alpha = 0.1                            # learning rate; any small constant
V = array([[5],[3],[7]])
W = array([[1.,5,3],[2,2,7],[3,9,8]])  # float array so the in-place += works
W += alpha * V * (transpose(V) - W * V)

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: numpy performance and list comprehension

2007-04-03 Thread Robert Kern
TG wrote:
 Hi there.
 
 Reading the page on python performance ( http://scipy.org/PerformancePython
 ) made me realize that I can achieve tremendous code acceleration with
 numpy just by using u[:,:] kind of syntax the clever way.
 
 Here is a little problem (Oja's rule of synaptic plasticity)
 
  * W is a matrix containing the weights of connections between elements i and j
 * V is an array containing the values of elements
 
 I want to make W evolve with this rule :
 
 dW[i,j] / dt = alpha * (V[i] * V[j] - W[i,j] * V[i]^2)
 
 (don't pay attention to the derivate and stuff)
 
 So, how would you write it in this nifty clever way ?

irstas is correct. I'm just going to show off another feature of numpy,
numpy.newaxis.

import numpy as np

V = np.array([1, 2, 3])
VT = V[:, np.newaxis]  # VT.shape == (3, 1)

W = np.array([[1,2,3], [4,5,6], [7,8,9]])
alpha = 0.1  # example learning rate

dWdt = alpha * VT*(V - W*VT)

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco

-- 
http://mail.python.org/mailman/listinfo/python-list