Re: [Numpy-discussion] combinations anomaly

2007-09-22 Thread Charles R Harris
On 9/22/07, Charles R Harris <[EMAIL PROTECTED]> wrote:
>
>
>
> On 9/22/07, Alan G Isaac <[EMAIL PROTECTED]> wrote:
> >
> > Charles Harris posted a generator function for generating
> > combinations (based on Knuth's fascicle).  I get the
> > expected results by iterating over the resulting generator,
> > but not if I let ``list`` do that for me.  What is more,
> > changing ``arange`` to ``range`` in the code eliminates
> > this anomaly.
> >
> > What am I failing to understand here?
> >
> > Thank you,
> > Alan Isaac
>
>
> There are a couple of potential problems if you want to make a list.
> Because an array view is returned, and the data is updated in the loop, all
> the views will end up with the same content. I used arrays and views for
> speed. To fix that, you need to return a copy, i.e., yield c[:t].copy().
> That way you will end up with a list of arrays. If you do yield list(c[:t]),
> you will get a list of lists. Or, you can do as you did and just use range
> instead of arange because a slice of a list returns a copy. I admit I'm
> curious about the speed of the two approaches; lists may be faster than
> arrays. Let's see: combination returns array copies, combination1 uses
> range.
>
> In [7]: time for i in combination(100,3) : pass
> CPU times: user 0.89 s, sys: 0.07 s, total: 0.96 s
> Wall time: 0.96
>
> In [8]: time for i in combination1(100,3) : pass
> CPU times: user 0.17 s, sys: 0.01 s, total: 0.18 s
> Wall time: 0.18
>
> Wow, massive speed up using lists, almost as fast as nested loops. I
> expect lists benefit from faster indexing and faster creation. I think your
> range fix is the way to go. Things slow down a bit for the full list
> treatment, but not too much:
>
> In [13]: time a = list(combination1(100,3))
> CPU times: user 0.26 s, sys: 0.00 s, total: 0.27 s
> Wall time: 0.27
>
> In [14]: time a = [i for i in combination1(100,3)]
> CPU times: user 0.35 s, sys: 0.01 s, total: 0.36 s
> Wall time: 0.36
>

It's even faster in C++:

In [1]: from _combination import combination

In [2]: time a = combination(100,3)
CPU times: user 0.09 s, sys: 0.01 s, total: 0.09 s
Wall time: 0.09

That's for the whole nested list. Arrays would probably be faster in this
case because I am calling the Python functions to add objects to the lists.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] combinations anomaly

2007-09-22 Thread Alan G Isaac
On Sat, 22 Sep 2007, Charles R Harris apparently wrote:
> an array view is returned, and the data is updated in the 
> loop ... I think your range fix is the way to go.


Got it. Thanks!
Alan 




Re: [Numpy-discussion] combinations anomaly

2007-09-22 Thread Charles R Harris
On 9/22/07, Alan G Isaac <[EMAIL PROTECTED]> wrote:
>
> Charles Harris posted a generator function for generating
> combinations (based on Knuth's fascicle).  I get the
> expected results by iterating over the resulting generator,
> but not if I let ``list`` do that for me.  What is more,
> changing ``arange`` to ``range`` in the code eliminates
> this anomaly.
>
> What am I failing to understand here?
>
> Thank you,
> Alan Isaac


There are a couple of potential problems if you want to make a list. Because
an array view is returned, and the data is updated in the loop, all the
views will end up with the same content. I used arrays and views for speed.
To fix that, you need to return a copy, i.e., yield c[:t].copy(). That way
you will end up with a list of arrays. If you do yield list(c[:t]), you will
get a list of lists. Or, you can do as you did and just use range instead of
arange because a slice of a list returns a copy. I admit I'm curious about
the speed of the two approaches; lists may be faster than arrays. Let's
see: combination returns array copies, combination1 uses range.
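A minimal standalone sketch of the view-vs-copy behaviour described above (not code from the thread):

```python
import numpy as np

def gen_views():
    c = np.arange(3)
    for _ in range(3):
        c += 1          # the buffer is updated in place each pass
        yield c[:2]     # a view into c, not a snapshot

views = list(gen_views())
# every entry aliases the same buffer, so all show the final state [3, 4]

def gen_copies():
    c = np.arange(3)
    for _ in range(3):
        c += 1
        yield c[:2].copy()  # an independent snapshot

copies = list(gen_copies())
# distinct arrays: [1, 2], [2, 3], [3, 4]
```

Iterating the generator directly hides the problem because each view is consumed before the next update; only collecting the views into a list exposes the aliasing.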

In [7]: time for i in combination(100,3) : pass
CPU times: user 0.89 s, sys: 0.07 s, total: 0.96 s
Wall time: 0.96

In [8]: time for i in combination1(100,3) : pass
CPU times: user 0.17 s, sys: 0.01 s, total: 0.18 s
Wall time: 0.18

Wow, massive speed up using lists, almost as fast as nested loops. I expect
lists benefit from faster indexing and faster creation. I think your range
fix is the way to go. Things slow down a bit for the full list treatment,
but not too much:

In [13]: time a = list(combination1(100,3))
CPU times: user 0.26 s, sys: 0.00 s, total: 0.27 s
Wall time: 0.27

In [14]: time a = [i for i in combination1(100,3)]
CPU times: user 0.35 s, sys: 0.01 s, total: 0.36 s
Wall time: 0.36


Chuck


[Numpy-discussion] combinations anomaly

2007-09-22 Thread Alan G Isaac
Charles Harris posted a generator function for generating 
combinations (based on Knuth's fascicle).  I get the 
expected results by iterating over the resulting generator,
but not if I let ``list`` do that for me.  What is more, 
changing ``arange`` to ``range`` in the code eliminates
this anomaly.

What am I failing to understand here?

Thank you,
Alan Isaac


### example ###
from numpy import arange

def combinations(n,t):
    c = arange(t + 2)
    c[-2] = n
    c[-1] = 0
    while 1:
        yield c[:t]
        j = 0
        while c[j] + 1 == c[j+1] :
            c[j] = j
            j += 1
        if j >= t :
            return
        c[j] += 1

print "I get the expected results by iterating:"

for tpl in combinations(5,3):
    print tpl

print "However list conversion fails:"

print list(combinations(5,3))
### end example ###
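For comparison, the ``range`` fix that Charles suggests in his reply, sketched here in Python 3 syntax (the name ``combinations1`` is illustrative): a list slice is a copy, so ``list`` collects distinct results.

```python
def combinations1(n, t):
    # Same algorithm, but c is a plain list; c[:t] returns a copy,
    # so consumers like list() see distinct snapshots.
    c = list(range(t + 2))
    c[-2] = n
    c[-1] = 0
    while True:
        yield c[:t]
        j = 0
        while c[j] + 1 == c[j + 1]:
            c[j] = j
            j += 1
        if j >= t:
            return
        c[j] += 1

print(list(combinations1(5, 3)))  # 10 distinct 3-combinations of range(5)
```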




Re: [Numpy-discussion] Extracting all the possible combinations of a grid

2007-09-22 Thread Gael Varoquaux
On Sat, Sep 22, 2007 at 09:58:33AM -0600, Charles R Harris wrote:
>Umm... that doesn't look quite right. Shouldn't it be something like

Puzzling. My implementation works, as far as I could test it, and I tested
it as much as I could. Maybe the two are equivalent.

>Algorithm L can be chunked pretty easily also:

Yes, I think this would scale even better for really big sets, as the
size of the chunk can be more easily controlled. The good thing is that
with chunks like this the calculation is fully "local", so if the result
of the calculation does not depend on the number of points (e.g. it's a
sum of the calculation for each point) then the calculation can go on
forever; there is no limit to the number of points other than the patience
of the operator. So if the algorithm is right, they can leave it running
for two days, and get great results for a paper.

Thanks for your input,

Gaël


Re: [Numpy-discussion] Extracting all the possible combinations of a grid

2007-09-22 Thread Charles R Harris
On 9/22/07, Gael Varoquaux <[EMAIL PROTECTED]> wrote:
>
> On Sat, Sep 22, 2007 at 10:35:16AM +0200, Gael Varoquaux wrote:
> > I would go for the "generate_fourplets" solution if I had a way to
> > calculate the binomial coefficient without overflows.
>
> Sorry, premature optimisation is the root of all evil, but turning one's
> brain on early is good.
>
> """
>
> ##
> # Some routines for calculation of binomial coefficients
> def gcd(m,n):
> while n:
> m,n=n,m%n
> return m
>
> def binom_(n,k):
> if k==0:
> return 1
> else:
> g = gcd(n,k)
> return binomial(n-1, k-1)/(k/g)*(n/g)
>
> def binomial(n,k):
> if k > n/2: # Limit recursion depth
> k=n-k
> return binom_(n,k)
> """
>
> This is surprisingly fast (surprising for me, at least).
>
> Using this and the C code I have, I can generate the quadruplets of 200
> integers quite quickly:
>
> In [5]: %timeit b = [1 for i in  generate_quadruplets(200)]
> 10 loops, best of 3: 1.61 s per loop
>
> With generate_quadruplets given by:
>
> """
>
> ##
> def generate_quadruplets(size):
> """ Returns an iterator on tables listing all the possible unique
> combinations of four integers below size. """
>
> C_code = """
> int index = 0;
> for (int j=0; j<i+1; j++) {
>     for (int k=0; k<j+1; k++) {
>         for (int l=0; l<k+1; l++) {
>             quadruplets(index, 0) = i;
>             quadruplets(index, 1) = j;
>             quadruplets(index, 2) = k;
>             quadruplets(index, 3) = l;
>             index++ ;
>         }
>     }
> }
> """
>
> for i in xrange(size):
>     multiset_coef = binomial(i+3, 3)
>     quadruplets = empty((multiset_coef, 4), dtype=uint32)
>     inline(C_code, ['quadruplets', 'i'],
>            type_converters=converters.blitz)
>
>     yield quadruplets
> """


Umm... that doesn't look quite right. Shouldn't it be something like

def generate_quadruplets(size):
    """ Returns an iterator on tables listing all the possible unique
    combinations of four integers below size. """

    C_code = """
    int index = 0;
    for (int j=2; j<i; j++) {
        for (int k=1; k<j; k++) {
            for (int l=0; l<k; l++) {
                quadruplets(index, 0) = i;
                quadruplets(index, 1) = j;
                quadruplets(index, 2) = k;
                quadruplets(index, 3) = l;
                index++ ;
            }
        }
    }
    """

    for i in xrange(3, size):
        quadruplets = empty((binomial(i, 3), 4), dtype=uint32)
        inline(C_code, ['quadruplets', 'i'],
               type_converters=converters.blitz)

        yield quadruplets

Algorithm L can be chunked pretty easily also:

def combination(n,t,chunk) :
    c = arange(t + 2)
    c[-2] = n
    c[-1] = 0
    out = empty((chunk, t), dtype=int)
    while 1 :
        for i in xrange(chunk) :
            out[i] = c[:t]
            j = 0
            while c[j] + 1 == c[j+1] :
                c[j] = j
                j += 1
            if j >= t :
                break
            c[j] += 1
        yield out[:i+1]
        if j >= t :
            return

I think this would go well as a C++ function object. The Python wrapper
would look something like

def combination(n,t,chunk) :
    next = cpp_combination(n,t,chunk)
    while 1 :
        out = next()
        if len(out) > 0 :
            yield out
        else :
            return

Chuck


Re: [Numpy-discussion] Bitting the bullet: using scons to build extensions inside distutils ?

2007-09-22 Thread David Cournapeau
On 9/17/07, Matthew Brett <[EMAIL PROTECTED]> wrote:
> Hi,
>
> >Starting thinking over the whole distutils thing, I was thinking
> > what people would think about using scons inside distutils to build
> > extension.
>
> In general this seems like an excellent idea.  If we can contribute
> what we need to scons, that would greatly ease the burden of
> maintenance, and benefit both projects.  The key problem will be
> support.  At the moment Pearu maintains and owns numpy.distutils.
> Will we have the same level of commitment and support for this
> alternative do you think?
>
> How easy would it be to throw up a prototype for the rest of us to
> look at and get a feel for what the benefits would be?
Here we are for those interested: all the work is in the branch numpy.scons.

Basically, I added a scons command to numpy.distutils, and added an
add_sconscript hook to the config class. An example can be found in
numpy/scons_fake. You are expected to do

python setup.py scons

to build them. (I have not yet found a way to tell the install command
to call the scons command, but the current implementation puts the built
code where distutils expects it, so if you call scons and then
install, it should be ok.)

This is only a proof of concept, but as an example, I started to
implement a small support library for sconscript to be used inside
numpy, and got ctypes extension building working on both windows and
linux.

The SConscript file:

# vim:syntax=python
from numpy.distutils.scons import GetNumpyEnvironment

env = GetNumpyEnvironment(ARGUMENTS)

config.CheckHeader('stdio.h')
config.CheckLib('c', 'printf')
config.Finish()

source = ['foo.c']
import sys
if sys.platform == 'win32':
    env.AppendUnique(CPPDEFINES = 'WIN32')
env.NumpyCTypes('foo', source)

And in the setup.py, you just do:

config.add_sconscript('SConstruct')

Once numpy with scons support is installed, partial build is possible
(by partial I mean only using python setup.py scons in a subpackage).
This should give a feel for how it would behave for users. If people
think this is the right direction, then I can add more complete
support (mostly making scons behave as distutils with respect to how
to find extensions using setup.cfg, implementing the checks in
numpy/distutils/system_info for scons, passing the compilers/linkers
to scons, etc.).

cheers,

David


Re: [Numpy-discussion] Extracting all the possible combinations of a grid

2007-09-22 Thread Gael Varoquaux
On Sat, Sep 22, 2007 at 10:35:16AM +0200, Gael Varoquaux wrote:
> I would go for the "generate_fourplets" solution if I had a way to
> calculate the binomial coefficient without overflows.

Sorry, premature optimisation is the root of all evil, but turning one's
brain on early is good.

"""
##
# Some routines for calculation of binomial coefficients
def gcd(m,n):
    while n:
        m,n = n,m%n
    return m

def binom_(n,k):
    if k==0:
        return 1
    else:
        g = gcd(n,k)
        return binomial(n-1, k-1)/(k/g)*(n/g)

def binomial(n,k):
    if k > n/2: # Limit recursion depth
        k = n-k
    return binom_(n,k)
"""

This is surprisingly fast (surprising for me, at least).
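A standalone sanity check of the routine above (restated with `//`, Python 3's floor division, so the snippet runs as-is; the gcd keeps every intermediate division exact):

```python
def gcd(m, n):
    # Euclid's algorithm
    while n:
        m, n = n, m % n
    return m

def binom_(n, k):
    # C(n, k) = C(n-1, k-1) * n / k; dividing n and k by their gcd
    # keeps intermediates small while every division stays exact
    if k == 0:
        return 1
    g = gcd(n, k)
    return binomial(n - 1, k - 1) // (k // g) * (n // g)

def binomial(n, k):
    if k > n // 2:  # limit recursion depth
        k = n - k
    return binom_(n, k)

print(binomial(200, 4))  # C(200, 4) = 64684950, the quadruplet count for 200
```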

Using this and the C code I have, I can generate the quadruplets of 200
integers quite quickly:

In [5]: %timeit b = [1 for i in  generate_quadruplets(200)]
10 loops, best of 3: 1.61 s per loop

With generate_quadruplets given by:

"""
##
def generate_quadruplets(size):
    """ Returns an iterator on tables listing all the possible unique
    combinations of four integers below size. """

    C_code = """
    int index = 0;
    for (int j=0; j<i+1; j++) {
        for (int k=0; k<j+1; k++) {
            for (int l=0; l<k+1; l++) {
                quadruplets(index, 0) = i;
                quadruplets(index, 1) = j;
                quadruplets(index, 2) = k;
                quadruplets(index, 3) = l;
                index++ ;
            }
        }
    }
    """

    for i in xrange(size):
        multiset_coef = binomial(i+3, 3)
        quadruplets = empty((multiset_coef, 4), dtype=uint32)
        inline(C_code, ['quadruplets', 'i'],
               type_converters=converters.blitz)

        yield quadruplets
"""


Re: [Numpy-discussion] Extracting all the possible combinations of a grid

2007-09-22 Thread Gael Varoquaux
On Fri, Sep 21, 2007 at 06:40:52PM -0600, Charles R Harris wrote:
>    def triplet(n) :
>        out = []
>        for i in xrange(2,n) :
>            for j in xrange(1,i) :
>                for k in xrange(0,j) :
>                    out.append((i,j,k))
>        return out

I need quadruplets, numbers going up first to 120 then to 200. I tried
this, but I can't even go to 120, I simply flood my system (1GB of RAM,
AMD64) by using all my RAM.

>Which isn't too bad for 161700 combinations. However, I still prefer the
>generator form if you want to save memory for large n.

>    In [2]: def triplet(n) :
>       ...:     for i in xrange(2,n) :
>       ...:         for j in xrange(1,i) :
>       ...:             for k in xrange(0,j) :
>       ...:                 yield i,j,k

That's the way we were doing it, but you really want to build arrays
when doing this, because the operations afterwards take ages.

Maybe I could build arrays by chunks of, say, 10^6, and keep iterating
until I run out.
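That chunking idea can be sketched with itertools.islice (the helper names here are illustrative, not from the thread):

```python
import itertools
import numpy as np

def triplet(n):
    # the generator form from earlier in the thread
    for i in range(2, n):
        for j in range(1, i):
            for k in range(0, j):
                yield (i, j, k)

def chunked_arrays(gen, chunk):
    # Drain a tuple generator into 2-D arrays of at most `chunk` rows,
    # so downstream array operations work on big blocks at a time.
    while True:
        block = list(itertools.islice(gen, chunk))
        if not block:
            return
        yield np.array(block, dtype=np.intp)

shapes = [a.shape for a in chunked_arrays(triplet(100), chunk=50000)]
# C(100, 3) = 161700 rows total: three full chunks plus an 11700-row tail
```

Memory stays bounded by the chunk size rather than by the total number of combinations.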


>    In [29]: def combination(n,t) :
>       ....:     c = arange(t + 2)
>       ....:     c[-1] = 0
>       ....:     c[-2] = n
>       ....:     while 1 :
>       ....:         yield c[:t]
>       ....:         j = 0
>       ....:         while c[j] + 1 == c[j+1] :
>       ....:             c[j] = j
>       ....:             j += 1
>       ....:         if j >= t :
>       ....:             return
>       ....:         c[j] += 1

I didn't try that one... Wow, that one is pretty good! 35s for
quadruplets going up to 120, 270s going up to 200. I can use something
like itertools.groupby to build arrays by grouping the results.


I have implementations in C with weave inline that work pretty well: 

* For numbers up to 120, this brute force approach is just fine:

##
def unique_triplets(size):
    """ Returns the arrays of all the possible unique combinations of
    three integers below size."""

    C_code = """
    int index = 0;
    for (int j=0; j<size; j++) {
    ...


[Numpy-discussion] New scipy 0.6.0 uploads

2007-09-22 Thread Jarrod Millman
There is a new scipy-0.6.0.tar.gz on the sourceforge page, which
contains the missing scipy/linalg/src/fblaswrap_veclib_c.c.  There is
also now a scipy-0.6.0-py2.4-win32.egg and scipy-0.6.0-py2.5-win32.egg.

Enjoy,

-- 
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/


Re: [Numpy-discussion] Compilation failure on python2.4 win32

2007-09-22 Thread Pearu Peterson
On Fri, September 21, 2007 10:23 pm, Jörgen Stenarson wrote:
> Travis E. Oliphant skrev:
>> Jörgen Stenarson wrote:
>>> Hi,
>>>
>>> I cannot compile numpy (rev 2042) for python2.4 on win32, it works on
>>> python2.5. It looks like the call to function get_build_architecture in
>>> distutils.misc_util.py is python2.5 specific.
>>>
>> Yes.  This needs to be fixed.   I'll do it.
>>
>> Can you try the current trunk?
>>
>
> I still see problems
>

>   File "numpy\core\setup.py", line 109, in generate_config_h
>     from distutils.msvccompiler import get_build_architecture
> ImportError: cannot import name get_build_architecture

Try again with the latest trunk.

Pearu

