Re: [Numpy-discussion] Faster

2008-05-05 Thread Damian Eads

Hi,

Looks like a fun discussion: it's too bad for me I did not join it 
earlier. My first try at scipy-cluster was completely in Python. Like 
you, I also tried to find the most efficient way to transform the 
distance matrix when joining two clusters. Eventually my data sets 
became big enough that I decided to write these parts in C. I don't 
think my Python joining code was as efficient as yours.


I tried out your first test and I am a little confused at the output.

In [107]: goodman.test()
1
Clusters
[['BA'], ['FI'], ['MI', 'TO'], ['NA'], ['RM']]
Distance
[[997 662 255 412 996]
 [662 997 468 268 400]
 [255 468 997 219 869]
 [412 268 219 997 669]
 [996 400 869 669 997]]

2
Clusters
[['BA'], ['FI'], ['MI', 'TO', 'NA'], ['RM']]
Distance
[[998 662 412 996]
 [662 998 268 400]
 [412 268 998 669]
 [996 400 669 998]]

3
Clusters
[['BA'], ['FI', 'MI', 'TO', 'NA'], ['RM']]
Distance
[[999 412 996]
 [412 999 669]
 [996 669 999]]

4
Clusters
[['BA', 'FI', 'MI', 'TO', 'NA'], ['RM']]
Distance
[[1000  669]
 [ 669 1000]]

5
Clusters
[['BA', 'FI', 'MI', 'TO', 'NA', 'RM']]
Distance
[[1001]]

The first step is right, singletons 2 and 5 (starting at 0) should be 
joined since they have a minimum distance of 138. Let's look at their 
corresponding rows in the distance matrix.


In [101]: DM[[2,5],:]
Out[101]:
array([[ 877.,  295.,    1.,  754.,  564.,  138.],
       [ 996.,  400.,  138.,  869.,  669.,    1.]])

These two rows, rows 2 and 5, are all that we need to form the row for 
the newly joined cluster in the distance matrix. If we just take the 
minimum for each column we obtain,


In [102]: q = DM[[2,5],:].min(axis=0); q
Out[102]: array([ 877.,  295.,  138.,  754.,  564.,  138.])

so the row for the newly formed cluster should be the row above with 
the 2nd and 5th entries removed. Roughly, there should be a row in the 
distance matrix with the following values, but I don't see one in your 
output.


In [103]: q[q != 138]
Out[103]: array([ 877.,  295.,  754.,  564.])

Since 295 is the minimum distance between this newly joined cluster and 
any other singleton, it should not be chosen for the second iteration: 
singletons 3 and 4 are closer to one another, with a distance of 219. 
So after iteration 2, you should get [['BA'], ['FI'], ['MI', 'TO'], 
['NA', 'RM']].


Recall that the distance matrix transformation forms a new distance 
matrix using only values from the previous distance matrix. So, at any 
iteration, the values in the distance matrix should be a subset of the 
values in the original distance matrix, eliminating the distance entries 
of the clusters formed.
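
Concretely, that update can be sketched in a few lines of numpy (an
illustration of the idea only; hcluster does this in C, and the diagonal
here uses 0 where your test output uses a large sentinel value):

import numpy as np

def join_clusters(DM, i, j):
    # Single-linkage update: the merged cluster's row is the
    # columnwise minimum of rows i and j.
    merged = np.minimum(DM[i], DM[j])
    # Drop rows/columns i and j, then append the merged row/column.
    keep = [k for k in range(DM.shape[0]) if k not in (i, j)]
    out = np.empty((len(keep) + 1, len(keep) + 1))
    out[:-1, :-1] = DM[np.ix_(keep, keep)]
    out[-1, :-1] = out[:-1, -1] = merged[keep]
    out[-1, -1] = 0.0
    return out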


If we look at the minimum distances in the original distance matrix in 
rank order, we have 138, 219, 255, 268, 295. Thus, we might expect the 
minimum distances found at each iteration to be these values, and they 
are in this case, but I don't have a mathematical proof that it works in 
general. If I run your distance matrix through hcluster.single, I get 
the following linkage matrix. The third column is the distance between 
the clusters joined, and the first two columns are the indices of the 
clusters joined (non-singletons have an index >= n).


array([[   2.,    5.,  138.,    2.],
       [   3.,    4.,  219.,    2.],
       [   0.,    7.,  255.,    3.],
       [   1.,    8.,  268.,    4.],
       [   6.,    9.,  295.,    6.]])

I've attached the dendrogram, since it is easier to interpret.

In [105]: lbls
Out[105]: ['BA', 'FI', 'MI', 'NA', 'RM', 'TO']
In [106]: hcluster.dendrogram(Z, labels=lbls)

I tried running your second test, and you'll see that C can give you a 
better speed-up (not surprising). Roughly, what I'm doing in C is 
storing only the upper triangle of the distance matrix. An array of 
double*'s (double **) refers to each row of this triangle. To eliminate 
a row, I simply remove the entry in the double ** array. To remove a 
column, I shift the values over in each non-removed row. I'm not sure 
this is the best approach, but it is certainly more efficient than what 
can be achieved in Python.


In [107]: hcluster.goodman.test2(1000)
n = 1000 took 22.10 seconds
In [108]: n=1000
In [109]: uu=numpy.random.rand(n*(n-1)/2)
In [110]: tic = time.time(); hcluster.single(uu); toc = time.time(); print toc-tic

4.57607889175

Damian

Keith Goodman wrote:

On Fri, May 2, 2008 at 7:25 PM, Robert Kern <[EMAIL PROTECTED]> wrote:

 Assuming x is contiguous and you can modify x in-place:


 In [1]: from numpy import *

 In [2]: def dist(x):
   ...:     x = x + 1e10 * eye(x.shape[0])
   ...:     i, j = where(x == x.min())
   ...:     return i[0], j[0]
   ...:

 In [3]: def symmdist(N):
   ...: x = random.rand(N, N)
   ...: x = x + x.T
   ...: x.flat[::N+1] = 0
   ...: return x
   ...:

 In [4]: symmdist(5)
 Out[4]:
  array([[ 0.        ,  0.87508654,  1.11691704,  0.80366071,  0.57966808],
         [ 0.87508654,  0.        ,  1.5521685 ,  1.74010886,  0.52156877],
         [ 1.11691704,  1.5521685 ,  0.

Re: [Numpy-discussion] MoinMoin <-> docstrings gateway

2008-05-05 Thread dieter h
On Mon, May 5, 2008 at 1:48 AM, Gael Varoquaux
<[EMAIL PROTECTED]> wrote:
> On Mon, May 05, 2008 at 03:17:20AM +0300, Pauli Virtanen wrote:
>  > Some time ago there was discussion about MoinMoin <-> docstrings
>  > gateway. Did it produce some results?
>
>  My girlfriend, Emmanuelle, (Cced, I am not sure she follows this mailing
>  list) has been working on this, with some progress.
>
>
>  > Anyway, I threw a bit of code together. There's something between a
>  > proof-of-concept and final product running now on my desktop machine.
>  > You can play with it here:
>
>  >   http://pvx.homeip.net/pTSc0V/TestWiki
>
>  Sweet. Some comments:
>
>  * A lot of the docstrings are not valid rst. This is not your fault, but
>   we will have to fix this in the long run.


I humbly suggest Sphinx[1] as the generating tool and markup rather
than just regular ReST. I'm in the process of converting all of my
local documentation to this engine's format.

As it matures, I find Sphinx to be on the path to becoming an ideal
referencing tool (given my desire for dialog and effort toward
standardizing documentation in the FLOSS community; improving pinpoint
referencing is one of my future focuses).

For example, NumpyBook is much more accessible to me now than it was
in .pdf. I'll be sending Travis a copy when I'm finished finalizing
the formatting after the just-completed automated translation. When he's
ready to release numpybook to the public domain, he may consider it
useful.

Btw, I think an excellent model of documentation topology/formatting
is that for the Qt toolkit (online). Take a look if you're not familiar
with it. I might add, their docs offer cross-referenced API code
as well as internals for code junkies.

Are there others out there as focussed as I am on the 'science' of
useful documentation?

[1] http://sphinx.pocoo.org/index.html


>
>  * I would prefer if the main page was broken up into one per function. I
>   know this is the way it is in the actual wiki layout, but I think it
>   would be better if it was presented this way to the user. Anyway, this
>   is debatable.
>
>  * Emmanuelle has functions to read from the wiki and write to it from a
>   remote client. I am not sure how well they work, but it would be nice
>   not to require a login and rights on the server to generate patches.
>
>
>  > Is there interest to move forward with this?
>
>  There is. With Emmanuelle and Stefan van der Walt, who has also been
>  following the project, we were considering using a webapp running with
>  turbogears to move forward. They would know better what the status is.
>
>  Congratulations for that. Let us hope you can join forces with the other
>  team working on that to bring this project to its success.
>
>  Cheers,
>
>  Gaël
>
>
> ___
>  Numpy-discussion mailing list
>  Numpy-discussion@scipy.org
>  http://projects.scipy.org/mailman/listinfo/numpy-discussion
>
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Compilation problems - bizarre

2008-05-05 Thread Charles R Harris
On Mon, May 5, 2008 at 5:33 PM, Thomas Hrabe <[EMAIL PROTECTED]> wrote:

>  Hi all,
>
> currently, I am writing a box of modular functions for exchanging python &
> matlab objects (nd arrays in particular).
> I am facing an odd problem which I cannot explain to myself:
>
> I use
> PyArg_ParseTuple(args, "O!s",&PyArray_Type, &array,&na)
> for parsing the array and a string.
> This function call works perfectly well when called from a static function
> used for extending python.
> However, using the call above in another function encapsulating the call
> above yields a segmentation fault -> python crash -> anger , irritation ...
> : )
> Same with
> PyArray_FromDimsAndData(dimensions,size,(const char)p.first,(char*)value)
>
> Did somebody ever encounter this?
>
> By the way, I get a compilation warning
> /home/global/python32/lib/python2.4/site-packages/numpy/core/include/numpy/__multiarray_api.h:944:
> warning: 'int _import_array()' defined but not used
>
> Thank you in advance for your help,
> Thomas
>

Could you attach a code snippet that reproduces the problem? What version of
numpy are you using?

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Compilation problems - bizarre

2008-05-05 Thread Robert Kern
On Mon, May 5, 2008 at 6:33 PM, Thomas Hrabe <[EMAIL PROTECTED]> wrote:
>
> Hi all,
>
>  currently, I am writing a box of modular functions for exchanging python &
> matlab objects (nd arrays in particular).
>  I am facing an odd problem which I cannot explain to myself:
>
>  I use
>  PyArg_ParseTuple(args, "O!s",&PyArray_Type, &array,&na)
>  for parsing the array and a string.
>  This function call works perfectly well when called from a static function
> used for extending python.
>  However, using the call above in another function encapsulating the call
> above yields a segmentation fault -> python crash -> anger , irritation ...
> : )
>  Same with
>  PyArray_FromDimsAndData(dimensions,size,(const char)p.first,(char*)value)
>
>  Did somebody ever encounter this?

I haven't. Can you run this under a debugger to give us a backtrace at
the site of the crash?

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Compilation problems - bizarre

2008-05-05 Thread Thomas Hrabe
Hi all,

currently, I am writing a box of modular functions for exchanging python & 
matlab objects (nd arrays in particular).
I am facing an odd problem which I cannot explain to myself:

I use 
PyArg_ParseTuple(args, "O!s",&PyArray_Type, &array,&na)
for parsing the array and a string.
This function call works perfectly well when called from a static function used 
for extending python. 
However, using the call above in another function encapsulating the call above 
yields a segmentation fault -> python crash -> anger , irritation ... : )
Same with 
PyArray_FromDimsAndData(dimensions,size,(const char)p.first,(char*)value)

Did somebody ever encounter this?

By the way, I get a compilation warning 
/home/global/python32/lib/python2.4/site-packages/numpy/core/include/numpy/__multiarray_api.h:944:
 warning: 'int _import_array()' defined but not used

Thank you in advance for your help,
Thomas

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Debian Lenny switches to Python 2.5 default

2008-05-05 Thread Robert Kern
On Mon, May 5, 2008 at 2:56 PM, Keith Goodman <[EMAIL PROTECTED]> wrote:
> I'm the click of a button away from changing the python default on my
>  Debian Lenny system from 2.4 to 2.5. Has anyone experienced any numpy
>  issues after the switch?

2.4 -> 2.5 in general shouldn't be a problem. If you are on a 64-bit
system, you will finally be able to mmap very large files. I don't
(and wouldn't) know of any Lenny-specific issues, though.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy masked array oddity

2008-05-05 Thread Pierre GM
On Monday 05 May 2008 15:35:35 Eric Firing wrote:
> What I meant was that I don't see that such a ravelled version of a
> matrix would be likely to make sense in a linear algebra context, so
> leaving it as a matrix is likely to cause confusion rather than
> convenience.  Still, it would be consistent, so I am not objecting to it.

I understand and concur: ravelling a matrix always brings surprises. On a side 
note, .compressed() isn't the method recommended to get rid of missing values 
in a 2D array: there are the compress_rows and compress_cols functions for 
that. In any case, I doubt that regular matrix users combine their matrices 
with missing data...
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Debian Lenny switches to Python 2.5 default

2008-05-05 Thread Angus McMorland
2008/5/5 Keith Goodman <[EMAIL PROTECTED]>:
> I'm the click of a button away from changing the python default on my
>  Debian Lenny system from 2.4 to 2.5. Has anyone experienced any numpy
>  issues after the switch?

All normal here so far, with most of a day's use.  All numpy tests
pass, and I get four failures in scipy (which also upgraded to
0.6.0-11 this morning): 2 in lapack/tests/esv_tests, and 2
umfpack-related.

A.
-- 
AJC McMorland, PhD candidate
Physiology, University of Auckland

(Nearly) post-doctoral research fellow
Neurobiology, University of Pittsburgh
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Debian Lenny switches to Python 2.5 default

2008-05-05 Thread Keith Goodman
I'm the click of a button away from changing the python default on my
Debian Lenny system from 2.4 to 2.5. Has anyone experienced any numpy
issues after the switch?
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy masked array oddity

2008-05-05 Thread Eric Firing
Pierre GM wrote:
> On Monday 05 May 2008 15:10:56 Eric Firing wrote:
>> Pierre GM wrote:
>>> * An alternative would be to force the output of MaskedArray.compressed()
>>> to type(MaskedArray._baseclass), where the _baseclass attribute is the
>>> class of the underlying array: usually it's only ndarray, but it can be
>>> any subclass. Changing this behavior would not break anything in
>>> TimeSeries.
>> This alternative makes sense to me--I expect most use cases would be
>> most efficient with compressed yielding a plain ndarray.  I don't see
>> any gain in keeping it as a masked array, and having to manually convert
>> it.  (I don't see how the _baseclass conversion would work with the
>> baseclass as a matrix, though.)
> 
> In fact, it's straightforward:
> - ravel the _data part to get a type(_baseclass) object
> - use .compress on the _data part, using logical_not(mask) as the condition.
> When you have a matrix as _baseclass, the result will be a ravelled version of 
> the initial matrix.

What I meant was that I don't see that such a ravelled version of a 
matrix would be likely to make sense in a linear algebra context, so 
leaving it as a matrix is likely to cause confusion rather than 
convenience.  Still, it would be consistent, so I am not objecting to it.

Eric
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy masked array oddity

2008-05-05 Thread Pierre GM
On Monday 05 May 2008 15:10:56 Eric Firing wrote:
> Pierre GM wrote:
> > * An alternative would be to force the output of MaskedArray.compressed()
> > to type(MaskedArray._baseclass), where the _baseclass attribute is the
> > class of the underlying array: usually it's only ndarray, but it can be
> > any subclass. Changing this behavior would not break anything in
> > TimeSeries.
>
> This alternative makes sense to me--I expect most use cases would be
> most efficient with compressed yielding a plain ndarray.  I don't see
> any gain in keeping it as a masked array, and having to manually convert
> it.  (I don't see how the _baseclass conversion would work with the
> baseclass as a matrix, though.)

In fact, it's straightforward:
- ravel the _data part to get a type(_baseclass) object
- use .compress on the _data part, using logical_not(mask) as the condition.
When you have a matrix as _baseclass, the result will be a ravelled version of 
the initial matrix.

But yes, it indeed makes more sense not to have a MaskedArray as output. 
SVN r5126 should now work as discussed.
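
For reference, a minimal sketch of the agreed behavior (a hypothetical
standalone helper, not the actual numpy.ma code):

import numpy as np

def compressed_to_base(marr):
    # Ravel the data part and keep the entries whose mask is False;
    # the result has the underlying class (here a plain ndarray),
    # not MaskedArray.
    mask = np.ma.getmaskarray(marr).ravel()
    data = np.asarray(marr).ravel()
    return data.compress(np.logical_not(mask))

a = np.ma.array([1., 2., 3., 4.], mask=[0, 1, 0, 1])
print compressed_to_base(a)        # [ 1.  3.]
print type(compressed_to_base(a))  # <type 'numpy.ndarray'>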



> Eric
>
> > * I need to fix a bug in compressed when the underlying array is a
> > matrix: I can take care of the alternative at the same time. What are the
> > opinions on that matter ?
> > ___
> > Numpy-discussion mailing list
> > Numpy-discussion@scipy.org
> > http://projects.scipy.org/mailman/listinfo/numpy-discussion
>
> ___
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy masked array oddity

2008-05-05 Thread Eric Firing
Pierre GM wrote:
> On Monday 05 May 2008 13:19:40 Russell E. Owen wrote:
>> The object returned by maskedArray.compressed() appears to be a normal
>> numpy array (based on repr output), but in reality it has some
>> surprising differences:
> 
> Russell:
> 
> * I assume you're not using the latest version of numpy, are you ? If you 
> were, the .sort() method would work.

He is clearly using the older version; it is accessed via numpy.core.ma.

> 
> * Currently, the output of MaskedArray.compressed() is indeed a MaskedArray, 
> where the missing values are skipped. If you need a regular ndarray, just take 
> a view as Robert suggested. Christopher's suggestion is equivalent.
> 
> * An alternative would be to force the output of MaskedArray.compressed() to
> type(MaskedArray._baseclass), where the _baseclass attribute is the class of 
> the underlying array: usually it's only ndarray, but it can be any subclass. 
> Changing this behavior would not break anything in TimeSeries. 

This alternative makes sense to me--I expect most use cases would be 
most efficient with compressed yielding a plain ndarray.  I don't see 
any gain in keeping it as a masked array, and having to manually convert 
it.  (I don't see how the _baseclass conversion would work with the 
baseclass as a matrix, though.)

Eric

> 
> * I need to fix a bug in compressed when the underlying array is a matrix: I 
> can take care of the alternative at the same time. What are the opinions on 
> that matter ?
> ___
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Deprecating PyDataMem_RENEW ?

2008-05-05 Thread Robert Kern
On Mon, May 5, 2008 at 1:30 PM, David Cournapeau <[EMAIL PROTECTED]> wrote:
> On Tue, May 6, 2008 at 2:11 AM, Robert Kern <[EMAIL PROTECTED]> wrote:
>  >
>  >  I am in favor of at least trying this out. We will have to have a set
>  >  of benchmarks to make sure we haven't hurt the current uses of
>  >  PyDataMem_RENEW which Tim points out.
>
>  What would be a good stress test for PyArray_FromIter ? I think using
>  array iterator was the main hotspot for reading large arff files in
>  scipy.io.arff, would that be enough ?

Since there are only 6 places where PyDataMem_RENEW is used, all 6
uses should be benchmarked. I would prefer a more targeted benchmark
so we know exactly what we are measuring.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Deprecating PyDataMem_RENEW ?

2008-05-05 Thread Anne Archibald
2008/5/5 Robert Kern <[EMAIL PROTECTED]>:
> On Mon, May 5, 2008 at 7:44 AM, David Cournapeau
>  <[EMAIL PROTECTED]> wrote:
>
> >  In numpy, we can always replace realloc by malloc/free, because we know
>  >  the size of the old block: would deprecating PyDataMem_RENEW and
>  >  replacing them by PyDataMem_NEW/PyDataMem_FREE be possible, such as to
>  >  make all numpy arrays follow a default alignment ? There are only a few
>  >  of them in numpy (6 of them), 0 in scipy, and I guess extensions never
>  >  really used them ?
>
>  I am in favor of at least trying this out. We will have to have a set
>  of benchmarks to make sure we haven't hurt the current uses of
>  PyDataMem_RENEW which Tim points out.

realloc() may not be a performance win anyway. Allocating new memory
is quite fast, copying data is quite fast, and in-place realloc()
tends to cause memory fragmentation - take an arena full of 1024-byte
blocks and suddenly make one of them 256 bytes long and you haven't
gained anything at all in most malloc() implementations. So yes,
benchmarks.

Anne
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Deprecating PyDataMem_RENEW ?

2008-05-05 Thread David Cournapeau
On Tue, May 6, 2008 at 2:11 AM, Robert Kern <[EMAIL PROTECTED]> wrote:
>
>  I am in favor of at least trying this out. We will have to have a set
>  of benchmarks to make sure we haven't hurt the current uses of
>  PyDataMem_RENEW which Tim points out.

What would be a good stress test for PyArray_FromIter ? I think using
array iterator was the main hotspot for reading large arff files in
scipy.io.arff, would that be enough ?
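
Something along these lines might do as a first, targeted measurement
(just a sketch; a generator has no known length, so fromiter presumably
has to grow its buffer repeatedly through PyDataMem_RENEW):

import time
import numpy

def bench_fromiter(n=10**7):
    # Build a large array from an iterator of unknown length.
    it = (0.5 * i for i in xrange(n))
    tic = time.time()
    numpy.fromiter(it, dtype=numpy.float64)
    return time.time() - tic

print "fromiter: %.2f s" % bench_fromiter()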

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Deprecating PyDataMem_RENEW ?

2008-05-05 Thread David Cournapeau
On Tue, May 6, 2008 at 1:59 AM, Timothy Hochberg <[EMAIL PROTECTED]> wrote:

> I don't think you would want to do this in the core of PyArray_FromIter;
> presumably realloc can sometimes reuse the existing pointer and save on
> allocating a new chunk of memory. Since there are lots of allocations in
> fromiter, this could potentially be a big performance hit.

Looking at PyArray_FromIter, there is already a log(n) behavior for
allocation, so I am not sure it would hurt so much: I would guess that
realloc often needs to allocate a new block if the new size is double
that of the former block.

That's how realloc seems to work on FreeBSD, at least:

http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/stdlib/malloc.c?rev=1.171;content-type=text%2Fplain

> (At least I think
> so, realloc has always been kinda voodoo to me).

Once malloc is implemented, I would guess realloc to be simple, no ?
For each block of memory in the malloc chain, just check whether there
is enough room to reuse the same block (and the adjacent block, if any);
otherwise, free + malloc + copy, no ?

> One could use
> PyDataMem_NEW/PyDataMem_FREE in the final allocation to make sure that the
> data is aligned; we already do a realloc there to dump any extra space.

I guess that would be the easiest: directly use realloc in the loop,
and use PyDataMem_NEW/FREE at the end if necessary. Using realloc is
no problem if the buffer is always freed in the same function.

> Or, possibly better, one could choose which allocation strategy to use here
> depending on whether the data was alligned or not.

I mentioned step 1 because there would be a step 2 (maybe only for
numpy 2): extending the C API functions that create arrays so that
asking for an explicit alignment is possible.

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy masked array oddity

2008-05-05 Thread Pierre GM
On Monday 05 May 2008 13:19:40 Russell E. Owen wrote:
> The object returned by maskedArray.compressed() appears to be a normal
> numpy array (based on repr output), but in reality it has some
> surprising differences:

Russell:

* I assume you're not using the latest version of numpy, are you ? If you 
were, the .sort() method would work.

* Currently, the output of MaskedArray.compressed() is indeed a MaskedArray, 
where the missing values are skipped. If you need a regular ndarray, just take 
a view as Robert suggested. Christopher's suggestion is equivalent.

* An alternative would be to force the output of MaskedArray.compressed() to
type(MaskedArray._baseclass), where the _baseclass attribute is the class of 
the underlying array: usually it's only ndarray, but it can be any subclass. 
Changing this behavior would not break anything in TimeSeries. 

* I need to fix a bug in compressed when the underlying array is a matrix: I 
can take care of the alternative at the same time. What are the opinions on 
that matter ?
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy masked array oddity

2008-05-05 Thread Christopher Barker
Robert Kern wrote:
> I don't know the reason why it's not an ndarray, but you don't have to
> copy the data again to get one:
> 
>   c = ma.compressed().view(numpy.ndarray)

would:

c = numpy.asarray(ma.compressed())

work too?

-CHB



-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[EMAIL PROTECTED]
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy masked array oddity

2008-05-05 Thread Robert Kern
On Mon, May 5, 2008 at 12:19 PM, Russell E. Owen <[EMAIL PROTECTED]> wrote:
> The object returned by maskedArray.compressed() appears to be a normal
>  numpy array (based on repr output), but in reality it has some
>  surprising differences:
>
>  import numpy
>  a = numpy.arange(10, dtype=int)
>  b = numpy.zeros(10)
>  b[1] = 1
>  b[3] = 1
>  ma = numpy.core.ma.array(a, mask=b, dtype=float)
>  print ma
>  # [0.0 -- 2.0 -- 4.0 5.0 6.0 7.0 8.0 9.0]
>  c = ma.compressed()
>  print repr(c)
>  # array([ 0.,  2.,  4.,  5.,  6.,  7.,  8.,  9.])
>  c.sort()
>  #Traceback (most recent call last):
>  #  File "<stdin>", line 1, in <module>
>  #  File
>  "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-pac
>  kages/numpy/core/ma.py", line 2132, in not_implemented
>  #raise NotImplementedError, "not yet implemented for numpy.ma arrays"
>  #NotImplementedError: not yet implemented for numpy.ma arrays
>  d = numpy.array(c)
>  d.sort()
>  # this works fine, as expected
>
>  Why is "c" in the example above not just a regular numpy array? It is
>  not a "live" view (based on a quick test), which seems sensible to me.
>  I've worked around the problem by making a copy (d in the example
>  above), but it seems most unfortunate to have to copy the data twice.

I don't know the reason why it's not an ndarray, but you don't have to
copy the data again to get one:

  c = ma.compressed().view(numpy.ndarray)

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Learn about numpy

2008-05-05 Thread Nadav Horesh


I think you have a problem of overflow in r5: you would be better off 
using uint64 instead of uint32.
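
A quick arithmetic check shows the wraparound; the figure matches the
r5 output quoted below:

# The 0/1 cross kernel sums 5 pixels; 5 * DC overflows uint32, so the
# convolution result wraps around modulo 2**32.
DC = 4278190280
print 5 * DC            # 21390951400, well past 2**32 - 1 = 4294967295
print (5 * DC) % 2**32  # 4211082216 -- the background value in r5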

  Nadav.

-Original Message-
From: [EMAIL PROTECTED] on behalf of Folkert Boonstra
Sent: Mon 05-May-08 19:17
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Learn about numpy
 
Folkert Boonstra schreef:
> Nadav Horesh schreef:
>   
>> What you do here is a convolution with 
>>
>> 0 1 0
>> 1 1 1
>> 0 1 0
>>
>> kernel, followed by thresholding; you can use the numpy.numarray.nd_image package:
>>
>> import numpy.numarray.nd_image as NI
>> .
>> .
>> .
>>ker = array([[0,1,0], [1,1,1],[0,1,0]])
>>result = (NI.convolve(self.bufbw, ker) == 1).astype(uint8)
>>
>> for more general cases you can use the function generic_filter in the same 
>> package.
>>
>>Nadav.
>>
>>   
>> 
> Thanks, that works ok! 
>
> However, if uint32 values are used instead of uint8, the result only
> contains zeros.
> Or am I doing something wrong?
> Folkert
>
> import numpy
> import numpy.numarray.nd_image as NI
>
> B = numpy.zeros((5,5), dtype=numpy.uint8)
> C = numpy.zeros((5,5), dtype=numpy.uint32)
>
> DC = 4278190280
> LC = 4278241280
>
> B[:]   = 0
> B[1,1] = 1
> B[2,2] = 1
> C[:]   = DC
> C[1,1] = LC
> C[2,2] = LC
>
> ker01 = numpy.array([[0,1,0], \
>  [1,1,1], \
>  [0,1,0]])
> kerCC = numpy.array([[C[0,0],C[1,1],C[0,0]], \
>  [C[1,1],C[1,1],C[1,1]], \
>  [C[0,0],C[1,1],C[0,0]]]).astype(numpy.uint32)
>
> r1 = NI.convolve(B, ker01).astype(numpy.uint8)
> r2 = (NI.convolve(B, ker01) == 1).astype(numpy.uint8)
> r3 = NI.convolve(C, kerCC).astype(numpy.uint32)
> r4 = (NI.convolve(C, kerCC) == C[0,0]).astype(numpy.uint32)
>   
It should be:

r5 = NI.convolve(C, ker01).astype(numpy.uint32)

which results in:

[[4211082216 4211133216 4211082216 4211082216 4211082216]
 [4211133216 4211133216 4211184216 4211082216 4211082216]
 [4211082216 4211184216 4211133216 4211133216 4211082216]
 [4211082216 4211082216 4211133216 4211082216 4211082216]
 [4211082216 4211082216 4211082216 4211082216 4211082216]]

Now I have to find out how convolve works in order to understand why
these values are generated. Do you know of any good examples or
documentation? I found an EECE253_07_Convolution.pdf with lecture notes
on convolution for image processing.

Thanks,
Folkert

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numpy masked array oddity

2008-05-05 Thread Russell E. Owen
The object returned by maskedArray.compressed() appears to be a normal 
numpy array (based on repr output), but in reality it has some 
surprising differences:

import numpy
a = numpy.arange(10, dtype=int)
b = numpy.zeros(10)
b[1] = 1
b[3] = 1
ma = numpy.core.ma.array(a, mask=b, dtype=float)
print ma
# [0.0 -- 2.0 -- 4.0 5.0 6.0 7.0 8.0 9.0]
c = ma.compressed()
print repr(c)
# array([ 0.,  2.,  4.,  5.,  6.,  7.,  8.,  9.])
c.sort()
#Traceback (most recent call last):
#  File "<stdin>", line 1, in <module>
#  File 
"/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-pac
kages/numpy/core/ma.py", line 2132, in not_implemented
#raise NotImplementedError, "not yet implemented for numpy.ma arrays"
#NotImplementedError: not yet implemented for numpy.ma arrays
d = numpy.array(c)
d.sort()
# this works fine, as expected

Why is "c" in the example above not just a regular numpy array? It is 
not a "live" view (based on a quick test), which seems sensible to me. 
I've worked around the problem by making a copy (d in the example 
above), but it seems most unfortunate to have to copy the data twice.

-- Russsell

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy in RHEL4

2008-05-05 Thread Robert Kern
On Mon, May 5, 2008 at 6:14 AM, Bala subramanian
<[EMAIL PROTECTED]> wrote:
> Dear friends,
>
> I am trying to install numpy version numpy-1.0.4 in RHEL 4. My python
> version is 2.3.4.  During installation, it throws the following error and
> stops. Kindly tell me how to get rid of this.

The files /usr/lib/liblapack.a and /usr/lib/liblapack.so are probably
32-bit instead of 64-bit. You can (probably) check this using the
command

  $ file /usr/lib/liblapack.so

Locate the 64-bit versions of liblapack and libblas and make sure that
you have the correct directory in your site.cfg file. The numpy source
tree contains a commented site.cfg.example file for you to start with.
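
The relevant entry would look something like this (a sketch with
hypothetical paths; point library_dirs at wherever the 64-bit liblapack
and libblas actually live):

[DEFAULT]
# Hypothetical locations -- adjust to your system's 64-bit library dirs.
library_dirs = /usr/lib64
include_dirs = /usr/include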

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Deprecating PyDataMem_RENEW ?

2008-05-05 Thread Robert Kern
On Mon, May 5, 2008 at 7:44 AM, David Cournapeau
<[EMAIL PROTECTED]> wrote:
>  In numpy, we can always replace realloc by malloc/free, because we know
>  the size of the old block: would deprecating PyDataMem_RENEW and
>  replacing them by PyDataMem_NEW/PyDataMem_FREE be possible, such as to
>  make all numpy arrays follow a default alignment ? There are only a few
>  of them in numpy (6 of them), 0 in scipy, and I guess extensions never
>  really used them ?

I am in favor of at least trying this out. We will have to have a set
of benchmarks to make sure we haven't hurt the current uses of
PyDataMem_RENEW which Tim points out.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Deprecating PyDataMem_RENEW ?

2008-05-05 Thread Timothy Hochberg
On Mon, May 5, 2008 at 5:44 AM, David Cournapeau <
[EMAIL PROTECTED]> wrote:

> Hi,
>
>While working again on the fftpack module, to clean things up and
> speed up some backends (in particular fftw3, which is really sub-optimal
> right now), I remembered how much unaligned data pointers in numpy arrays
> hurt performance. So I would like to relaunch the discussion on aligned
> allocators and default alignment for numpy arrays :
>
> http://www.mail-archive.com/numpy-discussion@scipy.org/msg04005.html
>
> Basically, what I have in mind is, in a first step (for numpy 1.2):
>- define functions to allocate on a given alignment
>- make PyDataMem_NEW 16 byte aligned by default (to be compatible
> with SSE and co).
>
> The problem was, and still is, realloc. It is not possible to implement
> realloc with malloc/free (in a portable way), and as such, it is not
> possible to have an aligned realloc.
>
> In numpy, we can always replace realloc by malloc/free, because we know
> the size of the old block: would deprecating PyDataMem_RENEW and
> replacing them by PyDataMem_NEW/PyDataMem_FREE be possible, such as to
> make all numpy arrays follow a default alignment ? There are only a few
> of them in numpy (6 of them), 0 in scipy, and I guess extensions never
> really used them ?


I don't think you would want to do this in the core of PyArray_FromIter;
presumably realloc can sometimes reuse the existing pointer and save on
allocating a new chunk of memory. Since there are lots of allocations in
fromiter, this could potentially be a big performance hit. (At least I think
so, realloc has always been kinda voodoo to me). One could use
PyDataMem_NEW/PyDataMem_FREE in the final allocation to make sure that the
data is aligned; we already do a realloc there to dump any extra space.
Or, possibly better, one could choose which allocation strategy to use here
depending on whether the data was aligned or not.



>
>
> cheers,
>
> David
>
> ___
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
[EMAIL PROTECTED]
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Learn about numpy

2008-05-05 Thread Folkert Boonstra
Folkert Boonstra schreef:
> Nadav Horesh schreef:
>   
>> What you do here is a convolution with 
>>
>> 0 1 0
>> 1 1 1
>> 0 1 0
>>
>> kernel, followed by thresholding; you can use the numpy.numarray.nd_image package:
>>
>> import numpy.numarray.nd_image as NI
>> .
>> .
>> .
>>ker = array([[0,1,0], [1,1,1],[0,1,0]])
>>result = (NI.convolve(self.bufbw, ker) == 1).astype(uint8)
>>
>> for more general cases you can use the function generic_filter in the same 
>> package.
>>
>>Nadav.
>>
>>   
>> 
> Thanks, that works ok! 
>
> However, if uint32 values are used instead of uint8, the result only
> contains zeros.
> Or am I doing something wrong?
> Folkert
>
> import numpy
> import numpy.numarray.nd_image as NI
>
> B = numpy.zeros((5,5), dtype=numpy.uint8)
> C = numpy.zeros((5,5), dtype=numpy.uint32)
>
> DC = 4278190280
> LC = 4278241280
>
> B[:]   = 0
> B[1,1] = 1
> B[2,2] = 1
> C[:]   = DC
> C[1,1] = LC
> C[2,2] = LC
>
> ker01 = numpy.array([[0,1,0], \
>  [1,1,1], \
>  [0,1,0]])
> kerCC = numpy.array([[C[0,0],C[1,1],C[0,0]], \
>  [C[1,1],C[1,1],C[1,1]], \
>  [C[0,0],C[1,1],C[0,0]]]).astype(numpy.uint32)
>
> r1 = NI.convolve(B, ker01).astype(numpy.uint8)
> r2 = (NI.convolve(B, ker01) == 1).astype(numpy.uint8)
> r3 = NI.convolve(C, kerCC).astype(numpy.uint32)
> r4 = (NI.convolve(C, kerCC) == C[0,0]).astype(numpy.uint32)
>   
It should be:

r5 = NI.convolve(C, ker01).astype(numpy.uint32)

which results in:

[[4211082216 4211133216 4211082216 4211082216 4211082216]
 [4211133216 4211133216 4211184216 4211082216 4211082216]
 [4211082216 4211184216 4211133216 4211133216 4211082216]
 [4211082216 4211082216 4211133216 4211082216 4211082216]
 [4211082216 4211082216 4211082216 4211082216 4211082216]]

Now I have to find out how convolve works in order to understand why
these values are generated. Do you know of any good examples or
documentation? I found an EECE253_07_Convolution.pdf with lecture notes
on convolution for image processing.

Thanks,
Folkert

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Deprecating PyDataMem_RENEW ?

2008-05-05 Thread David Cournapeau
Hi,

While working again on the fftpack module, to clean things up and 
speed up some backends (in particular fftw3, which is really sub-optimal 
right now), I remembered how much unaligned data pointers in numpy arrays 
hurt performance. So I would like to relaunch the discussion on aligned 
allocators and default alignment for numpy arrays :

http://www.mail-archive.com/numpy-discussion@scipy.org/msg04005.html

Basically, what I have in mind is, in a first step (for numpy 1.2):
- define functions to allocate on a given alignment
- make PyDataMem_NEW 16 byte aligned by default (to be compatible 
with SSE and co).

The problem was, and still is, realloc. It is not possible to implement 
realloc with malloc/free (in a portable way), and as such, it is not 
possible to have an aligned realloc.

In numpy, we can always replace realloc by malloc/free, because we know 
the size of the old block: would deprecating PyDataMem_RENEW and 
replacing them by PyDataMem_NEW/PyDataMem_FREE be possible, such as to 
make all numpy arrays follow a default alignment ? There are only a few 
of them in numpy (6 of them), 0 in scipy, and I guess extensions never 
really used them ?
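
As a quick way to inspect the status quo from Python, one can check an
array's data pointer directly (a sketch; numpy currently makes no such
guarantee, so the result can go either way):

import numpy

def is_aligned(a, boundary=16):
    # The first element of the 'data' pair is the buffer's raw address.
    return a.__array_interface__['data'][0] % boundary == 0

a = numpy.empty(1000, dtype=numpy.float64)
print is_aligned(a)   # True or False -- no guarantee either way today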

cheers,

David
   
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numpy in RHEL4

2008-05-05 Thread Bala subramanian
Dear friends,

I am trying to install numpy version numpy-1.0.4 in RHEL 4. My python
version is 2.3.4.  During installation, it throws the following error and
stops. Kindly tell me how to get rid of this.

Thanks,
Bala

/usr/bin/ld: skipping incompatible /usr/lib/liblapack.so when searching for
-llapack
/usr/bin/ld: skipping incompatible /usr/lib/liblapack.a when searching for
-llapack
/usr/bin/ld: skipping incompatible
/usr/lib/gcc/x86_64-redhat-linux/3.4.6/../../../liblapack.so when searching
for -llapack
/usr/bin/ld: skipping incompatible
/usr/lib/gcc/x86_64-redhat-linux/3.4.6/../../../liblapack.a when searching
for -llapack
/usr/bin/ld: skipping incompatible /usr/lib/liblapack.so when searching for
-llapack
/usr/bin/ld: skipping incompatible /usr/lib/liblapack.a when searching for
-llapack
/usr/bin/ld: cannot find -llapack
collect2: ld returned 1 exit status
/usr/bin/ld: skipping incompatible /usr/lib/liblapack.so when searching for
-llapack
/usr/bin/ld: skipping incompatible /usr/lib/liblapack.a when searching for
-llapack
/usr/bin/ld: skipping incompatible
/usr/lib/gcc/x86_64-redhat-linux/3.4.6/../../../liblapack.so when searching
for -llapack
/usr/bin/ld: skipping incompatible
/usr/lib/gcc/x86_64-redhat-linux/3.4.6/../../../liblapack.a when searching
for -llapack
/usr/bin/ld: skipping incompatible /usr/lib/liblapack.so when searching for
-llapack
/usr/bin/ld: skipping incompatible /usr/lib/liblapack.a when searching for
-llapack
/usr/bin/ld: cannot find -llapack
collect2: ld returned 1 exit status
error: Command "/usr/bin/g77 -g -Wall -g -Wall -shared
build/temp.linux-x86_64-2.3/numpy/linalg/lapack_litemodule.o -L/usr/lib
-llapack -lblas -lg2c -o
build/lib.linux-x86_64-2.3/numpy/linalg/lapack_lite.so" failed with exit
status 1
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Learn about numpy

2008-05-05 Thread Folkert Boonstra
Nadav Horesh schreef:
> What you do here is a convolution with 
>
> 0 1 0
> 1 1 1
> 0 1 0
>
> kernel, followed by thresholding; you can use the numpy.numarray.nd_image package:
>
> import numpy.numarray.nd_image as NI
> .
> .
> .
>ker = array([[0,1,0], [1,1,1],[0,1,0]])
>result = (NI.convolve(self.bufbw, ker) == 1).astype(uint8)
>
> for more general cases you can use the function generic_filter in the same 
> package.
>
>Nadav.
>
>   
Thanks, that works ok! 

However, if uint32 values are used instead of uint8, the result only
contains zeros.
Or am I doing something wrong?
Folkert

import numpy
import numpy.numarray.nd_image as NI

B = numpy.zeros((5,5), dtype=numpy.uint8)
C = numpy.zeros((5,5), dtype=numpy.uint32)

DC = 4278190280
LC = 4278241280

B[:]   = 0
B[1,1] = 1
B[2,2] = 1
C[:]   = DC
C[1,1] = LC
C[2,2] = LC

ker01 = numpy.array([[0,1,0], \
 [1,1,1], \
 [0,1,0]])
kerCC = numpy.array([[C[0,0],C[1,1],C[0,0]], \
 [C[1,1],C[1,1],C[1,1]], \
 [C[0,0],C[1,1],C[0,0]]]).astype(numpy.uint32)

r1 = NI.convolve(B, ker01).astype(numpy.uint8)
r2 = (NI.convolve(B, ker01) == 1).astype(numpy.uint8)
r3 = NI.convolve(C, kerCC).astype(numpy.uint32)
r4 = (NI.convolve(C, kerCC) == C[0,0]).astype(numpy.uint32)

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] [IT] Weekend outage complete

2008-05-05 Thread Peter Wang
Hi everyone,

The downtime took a little longer than expected (perhaps that is to be  
expected?), but everything should be back up and running now.  Mail,  
web, SVN, and Trac for scipy.org and enthought.com are all  
functional.  The mail server is working through some backlogged mail  
but that should clear up in a few hours.

Thanks again for your patience during this upgrade process.  We're in  
much better shape now to continue improving our network without  
causing such intrusive outages in the future.

Please let me know if you have any questions or comments.


On behalf of the IT crew,
Thanks,
Peter

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] strict aliasing?

2008-05-05 Thread David Cournapeau
Charles R Harris wrote:
>
> Interesting article. I note that it is OK to alias pointers to the 
> signed and unsigned versions of integer types, which is where I must 
> have picked up my notions about length. I don't recall seeing any 
> major bit of software that didn't use the -fno-strict-aliasing flag, 
> as casting pointers around is one of the things C is(was) all about. 
> So I was a bit surprised that Mike was recommending not doing so, 
> although making the choice on a file by file basis might be useful for 
> the optimizer.

Well, as for the case of Linus' comments on LKML, you have to take 
things in context, I think: this is an article about programming the 
Cell CPU on the PS3. And if there is one domain where performance 
matters a lot, it is the video game domain, and the complexity of modern 
video games is several orders of magnitude higher than most HPC software 
(the 80/20 rule is mostly a fallacy in video games). So everything counts.

C99 introduces several rules related to aliasing, as mentioned in the 
article, so I am not sure you can say that the point of C is aliasing 
pointers. Seeing what happens with strict aliasing on a file-by-file 
basis would be interesting (for example, in the ufunc code: at this 
point, there is no aliasing possible anymore, I think).

It is also mentioned that a lot of code can be made "strict aliasing" 
safe: I don't know how far we could go in numpy. It would be quite 
difficult, I guess, because you would have to use special code paths 
depending on whether there is potential aliasing or not.

cheers,

David
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion