Re: [Numpy-discussion] creation of ndarray with dtype=np.object : bug?

2014-12-03 Thread Emanuele Olivetti
On 12/03/2014 04:32 AM, Ryan Nelson wrote:
 Emanuele,

 This doesn't address your question directly. However, I wonder if you
 could approach this problem in a different way to get what you want.

 First of all, create an index array and then just vstack all of your
 arrays at once.



Ryan,

Thank you for your solution. Indeed it works. But it seems to me
that manually creating an index and re-implementing slicing
should be the last resort. NumPy is *great* and provides excellent
slicing and assembling tools. For some reason that I don't fully
understand, when dtype=np.object the ndarray constructor
tries to be smart and produces unexpected results that cannot
be controlled.

Another simple example:
---
import numpy as np
from numpy.random import rand, randint
n_arrays = 4
shape0_min = 2
shape0_max = 4
for a in range(30):
    list_of_arrays = [rand(randint(shape0_min, shape0_max), 3)
                      for i in range(n_arrays)]
    array_of_arrays = np.array(list_of_arrays, dtype=np.object)
    print("shape: %s" % (array_of_arrays.shape,))
---
the usual output is:
shape: (4,)
but from time to time, when the randomly generated arrays have - by chance - the
same shape, you get:
shape: (4, 2, 3)
which may crash your code at runtime.

To NumPy developers: is there a specific reason for np.array(..., dtype=np.object)
to be smart instead of just assembling an array with the provided objects?

Best,

Emanuele

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] creation of ndarray with dtype=np.object : bug?

2014-12-03 Thread Emanuele Olivetti
On 12/03/2014 12:17 PM, Jaime Fernández del Río wrote:


 The safe way to create 1D object arrays from a list is by preallocating them, 
 something like this:

 >>> a = [np.random.rand(2, 3), np.random.rand(2, 3)]
 >>> b = np.empty(len(a), dtype=object)
 >>> b[:] = a
 >>> b
 array([ array([[ 0.124382  ,  0.04489531,  0.93864908],
        [ 0.77204758,  0.63094413,  0.55823578]]),
        array([[ 0.80151723,  0.33147467,  0.40491018],
        [ 0.09905844,  0.90254708,  0.69911945]])], dtype=object)



Thank you for the compact way to create 1D object arrays. Definitely
useful!
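
For future reference, a minimal sketch of a small helper built on this idea (the
name object_array is purely illustrative); it assigns the elements one by one,
so the result is always a 1D object array, regardless of whether the elements
happen to share a shape:
---
import numpy as np

def object_array(items):
    """Return a 1D object array whose i-th element is items[i]."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item
    return out

# shape is (2,) even though both elements have shape (2, 3)
arrs = object_array([np.random.rand(2, 3), np.random.rand(2, 3)])
print(arrs.shape)
---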



 As to why np.array tries to be smart, keep in mind that there are other 
 applications of object arrays than having stacked sequences. The following 
 code computes the 100-th Fibonacci number using the matrix form of the 
 recursion (http://en.wikipedia.org/wiki/Fibonacci_number#Matrix_form), numpy's
 linear algebra capabilities, and Python's arbitrary precision ints:

 >>> a = np.array([[0, 1], [1, 1]], dtype=object)
 >>> np.linalg.matrix_power(a, 99)[0, 0]
 135301852344706746049L

 Trying to do this with any other type would result in either wrong results due
 to overflow:

 [...]

I guess that the problem I am referring to is not only about stacked
sequences and is more general.

Moreover, I do agree on the example you present: the array creation
explores the list of lists and creates a 2D array of Python ints instead
of np.int64. Exploring iterable containers is certainly correct in general. I
am wondering whether it should be prevented in some cases, where the
semantics are clear from the syntax, e.g. when the nature of the container
changes (see below).

To me this is intuitive and correct:
>>> a = np.array([[0, 1], [1, 1]], dtype=object)
>>> a.shape
(2, 2)
while this is counterintuitive and potentially error-prone:
>>> b = np.array([np.array([0, 1]), np.array([0, 1])], dtype=object)
>>> b.shape
(2, 2)
because it is clear that I meant a list of two vectors, i.e. an array of
shape (2,), and not a 2D array of shape (2, 2).

Best,

Emanuele
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] creation of ndarray with dtype=np.object : bug?

2014-12-02 Thread Emanuele Olivetti
Hi,

I am using 2D arrays where only one dimension remains constant, e.g.:
---
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]]) # 2 x 3
b = np.array([[9, 8, 7]]) # 1 x 3
c = np.array([[1, 3, 5], [7, 9, 8], [6, 4, 2]]) # 3 x 3
d = np.array([[5, 5, 4], [4, 3, 3]]) # 2 x 3
---
I have a large number of them and need to extract subsets of them
through fancy indexing and then stack them together. For this reason
I put them into an array of dtype=np.object, given their non-constant
nature. Indexing works well :) but stacking does not :( , as you can
see in the following example:
---
# fancy indexing :)
data = np.array([a, b, c, d], dtype=np.object)
idx = [0, 1, 3]
print(data[idx])
In [1]:
[[[1 2 3]
  [4 5 6]] [[9 8 7]] [[5 5 4]
  [4 3 3]]]

# stacking :(
data2 = np.array([a, b, c], dtype=np.object)
data3 = np.array([a, d], dtype=np.object)
together = np.vstack([data2, data3])
In [2]:
---
ValueError                                Traceback (most recent call last)
<ipython-input-14-7ebee5709e29> in <module>()
----> 1 execfile(r'/tmp/python-3276515J.py') # PYTHON-MODE

/tmp/python-3276515J.py in <module>()
      1 data2 = np.array([a, b, c], dtype=np.object)
      2 data3 = np.array([a, d], dtype=np.object)
----> 3 together = np.vstack([data2, data3])

/usr/lib/python2.7/dist-packages/numpy/core/shape_base.pyc in vstack(tup)
    224
    225
--> 226     return _nx.concatenate(map(atleast_2d,tup),0)
    227
    228 def hstack(tup):

ValueError: arrays must have same number of dimensions

The reason for the error is that data2.shape is (2,), while data3.shape is (2, 2, 3).
This happens because the creation of ndarrays with dtype=np.object tries to be
smart and infer the common dimensions of the objects you put in the array,
instead of just creating an array of the objects you give. This leads to unexpected
results when you use it, like the one in the example, because you cannot control
the resulting shape, which is data dependent. Or at least I cannot find a way to
create data3 with shape (2,)...

How should I address this issue? To me, it looks like a bug in the excellent NumPy.
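
For what it's worth, here is a hedged sketch of a workaround based on
preallocating the object array, so that its shape is fixed by construction and
no longer data dependent:
---
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # 2 x 3
d = np.array([[5, 5, 4], [4, 3, 3]])  # 2 x 3

# Preallocate and fill element by element, instead of letting np.array
# guess a common shape for the elements.
data3 = np.empty(2, dtype=object)
data3[0] = a
data3[1] = d
print(data3.shape)  # (2,), independent of the element shapes
---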

Best,

Emanuele




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] np.array creation: unexpected behaviour

2014-01-24 Thread Emanuele Olivetti
Hi,

I just came across this unexpected behaviour when creating
a np.array() from two other np.arrays of different shape.
Have a look at this example:

import numpy as np
a = np.zeros(3)
b = np.zeros((2,3))
c = np.zeros((3,2))
ab = np.array([a, b])
print ab.shape, ab.dtype
ac = np.array([a, c], dtype=np.object)
print ac.shape, ac.dtype
ac_no_dtype = np.array([a, c])
print ac_no_dtype.shape, ac_no_dtype.dtype

The output, with NumPy v1.6.1 (Ubuntu 12.04) is:

(2,) object
(2, 3) object
Traceback (most recent call last):
  File "/tmp/numpy_bug.py", line 9, in <module>
    ac_no_dtype = np.array([a, c])
ValueError: setting an array element with a sequence.


The result for 'ab' is what I expect. The one for 'ac' is
a bit surprising. The one for ac_no_dtype is even
more surprising.

Is this an expected behaviour?

Best,

Emanuele

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] multivariate_normal issue with 'size' argument

2013-05-24 Thread Emanuele Olivetti
Hi,

I'm using NumPy v1.6.1 shipped with Ubuntu 12.04 (Python 2.7.3). I observed an
odd behavior of the multivariate_normal function, which does not like int64 for
the 'size' argument.
Short example:

import numpy as np
print np.random.multivariate_normal(mean=np.zeros(2), cov=np.eye(2), size=1)
print np.random.multivariate_normal(mean=np.zeros(2), cov=np.eye(2), size=np.int64(1))


Which outputs:


$ python2.7 mvn_bug.py
[[ 0.28880655  0.43289446]]
Traceback (most recent call last):
  File "mvn_bug.py", line 3, in <module>
    print np.random.multivariate_normal(mean=np.zeros(2), cov=np.eye(2), size=np.int64(1))
  File "mtrand.pyx", line 3990, in mtrand.RandomState.multivariate_normal (numpy/random/mtrand/mtrand.c:16663)
IndexError: invalid index to scalar variable.


I had a brief look at the tracker but haven't found any mention of this issue.
It might already be solved in the current NumPy (v1.7.0)... or not.

I'd like to have your feedback before submitting this issue to the bug tracking system.
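
In the meantime, a hedged sketch of a workaround, assuming the problem is only
that 'size' arrives as a NumPy integer scalar rather than a built-in int:
---
import numpy as np

n = np.int64(1)
# Cast to a plain Python int before passing it as 'size'.
samples = np.random.multivariate_normal(mean=np.zeros(2), cov=np.eye(2),
                                        size=int(n))
print(samples.shape)  # (1, 2)
---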

Best,

Emanuele

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] multivariate_normal issue with 'size' argument

2013-05-24 Thread Emanuele Olivetti
Interesting. Anyone able to reproduce what I observe?

Emanuele

On 05/24/2013 02:09 PM, Nicolas Rougier wrote:


 Works for me (numpy 1.7.1, osx 10.8.3):

 import numpy as np
 print np.random.multivariate_normal(mean=np.zeros(2), cov=np.eye(2), size=1)
 [[-0.55854737 -1.82631485]]
 print np.random.multivariate_normal(mean=np.zeros(2), cov=np.eye(2), size=np.int64(1))
 [[ 0.40274243 -0.33922682]]



 Nicolas

 On May 24, 2013, at 2:02 PM, Emanuele Olivetti emanu...@relativita.com 
 wrote:

 import numpy as np
 print np.random.multivariate_normal(mean=np.zeros(2), cov=np.eye(2), size=1)
 print np.random.multivariate_normal(mean=np.zeros(2), cov=np.eye(2),
 size=np.int64(1))
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] [Fwd: Fwd: [ML-news] Call for Submissions: Workshop on Machine Learning Open Source Software (MLOSS), NIPS*08]

2008-09-09 Thread Emanuele Olivetti
Maybe of interest.

E.

 Original Message 

-- Forwarded message --
From: mikiobraun [EMAIL PROTECTED]
Date: 2008/9/8
Subject: [ML-news] Call for Submissions: Workshop on Machine Learning
Open Source  Software (MLOSS), NIPS*08
To: Machine Learning News [EMAIL PROTECTED]



**

   Call for Submissions

   Workshop on Machine Learning Open Source Software 2008
  http://mloss.org/workshop/nips08

  held at NIPS*08, Whistler, Canada,
December 12th, 2008

**

The NIPS Workshop on Machine Learning Open Source Software
(MLOSS) will be held in Whistler (B.C.) on the 12th of December,
2008.

Important Dates
===

   * Submission Date: October 1st, 2008
   * Notification of Acceptance: October 14th, 2008
   * Workshop date: December 12 or 13th, 2008


Call for Contributions
==

The organizing committee is currently seeking abstracts for talks at
MLOSS 2008. MLOSS is a great opportunity for you to tell the community
about your use, development, or philosophy of open source software in
machine learning. This includes (but is not limited to) numeric
packages (e.g. R, octave, numpy), machine learning toolboxes and
implementations of ML-algorithms. The committee will select several
submitted abstracts for 20-minute talks.  The submission process is
very simple:

   * Tag your mloss.org project with the tag nips2008

   * Ensure that you have a good description (limited to 500 words)

   * Any bells and whistles can be put on your own project page, and
 of course provide this link on mloss.org

On 1 October 2008, we will collect all projects tagged with nips2008
for review.

Note: Projects must adhere to a recognized Open Source License
(cf. http://www.opensource.org/licenses/ ) and the source code must
have been released at the time of submission. Submissions will be
reviewed based on the status of the project at the time of the
submission deadline.


Description
===

We believe that the wide-spread adoption of open source software
policies will have a tremendous impact on the field of machine
learning. The goal of this workshop is to further support the current
developments in this area and give new impulses to it. Following the
success of the inaugural NIPS-MLOSS workshop held at NIPS 2006, the
Journal of Machine Learning Research (JMLR) has started a new track
for machine learning open source software initiated by the workshop's
organizers. Many prominent machine learning researchers have
co-authored a position paper advocating the need for open source
software in machine learning. Furthermore, the workshop's organizers
have set up a community website mloss.org where people can register
their software projects, rate existing projects and initiate
discussions about projects and related topics. This website currently
lists 123 such projects including many prominent projects in the area
of machine learning.

The main goal of this workshop is to bring the main practitioners in
the area of machine learning open source software together in order to
initiate processes which will help to further improve the development
of this area. In particular, we have to move beyond a mere collection
of more or less unrelated software projects and provide a common
foundation to stimulate cooperation and interoperability between
different projects. An important step in this direction will be a
common data exchange format such that different methods can exchange
their results more easily.

This year's workshop sessions will consist of three parts.

   * We have two invited speakers: John Eaton, the lead developer of
 Octave and John Hunter, the lead developer of matplotlib.

   * Researchers are invited to submit their open source project to
 present it at the workshop.

   * In discussion sessions, important questions regarding the future
 development of this area will be discussed. In particular, we
 will discuss what makes a good machine learning software project
 and how to improve interoperability between programs. In
 addition, the question of how to deal with data sets and
 reproducibility will also be addressed.

Taking advantage of the large number of key research groups which
attend NIPS, decisions and agreements taken at the workshop will have
the potential to significantly impact the future of machine learning
software.


Invited Speakers


   * John D. Hunter - Main author of matplotlib.

   * John W. Eaton - Main author of Octave.


Tentative Program
=

The 1 day workshop will be a mixture of talks (including a mandatory
demo of the software) and panel/open/hands-on discussions.

Morning session: 7:30am - 10:30am

   * Introduction and overview
   * Octave (John 

Re: [Numpy-discussion] distance matrix and (weighted) p-norm

2008-09-08 Thread Emanuele Olivetti
Damian Eads wrote:
 Emanuele Olivetti wrote:
 ...
 [*] : ||x - x'||_w = (\sum_{i=1...N} (w_i*|x_i - x'_i|)**p)**(1/p)

 This feature could be implemented easily. However, I must admit I'm not 
 very familiar with weighted p-norms.  What is the reason for raising w 
 to the p instead of w_i*(|x_i-x'_i|)**p?


I believe that it is just a choice, which should be clearly stated
since the two formulations lead to different results. I think the
expression I wrote is more convenient, since it gives what is
expected even in limit cases. Two examples:
1) if |x-x'| = N.ones(n), then ||x-x'||_w,p = ||w||_p
   in your case:
   ||x-x'||_w,p = (\sum(w_i))**(1/p)
   which breaks this symmetry
2) when p goes to infinity, in my case:
   ||x-x'||_w,inf = max_{i=1,...,n}(w_i*|x_i-x'_i|)
   in your case:
   ||x-x'||_w,inf = max_{i=1,...,n}(|x_i-x'_i|) = ||x-x'||_1,inf

But I welcome any comment on this topic!
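
To make the limit-case argument concrete, here is a small numeric sketch
contrasting the two formulations (weights applied inside vs. outside the p-th
power); the function names are only illustrative:
---
import numpy as np

def wpnorm_weight_outside(x, y, w, p):
    # ||x - y||_w = (sum_i (w_i * |x_i - y_i|)**p)**(1/p)
    return ((w * np.abs(x - y))**p).sum()**(1.0/p)

def wpnorm_weight_inside(x, y, w, p):
    # alternative: (sum_i w_i * |x_i - y_i|**p)**(1/p)
    return (w * np.abs(x - y)**p).sum()**(1.0/p)

x = np.zeros(4)
y = np.ones(4)                      # |x - y| = ones(4), as in example 1)
w = np.array([0.1, 0.2, 0.3, 0.4])
p = 3.0
print(wpnorm_weight_outside(x, y, w, p))  # equals ||w||_p
print((w**p).sum()**(1.0/p))
print(wpnorm_weight_inside(x, y, w, p))   # equals (sum_i w_i)**(1/p) instead
print(w.sum()**(1.0/p))
---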


Emanuele

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] distance matrix and (weighted) p-norm

2008-09-07 Thread Emanuele Olivetti
Excellent.

David said that distance computation will be moved to a
separate package soon. I guess that your implementation
will be the suitable one for this package. Am I wrong?

Thanks again,

Emanuele

Damian Eads wrote:
 Hi there,

 The pdist function computes pairwise distances between vectors in a 
 single collection, storing the distances in a condensed distance matrix. 
   This is not exactly what you want--you want to compute distance 
 between two collections of vectors.

 Suppose XA is an m_A by n array and XB is an m_B by n array,

M=scipy.cluster.distance.cdist(XA, XB, metric='mahalanobis')

 computes a m_A by m_B distance matrix M. The ij'th entry is the distance 
 between XA[i,:] and XB[j,:]. The core computation is implemented in C 
 for efficiency. I've committed the new function along with documentation 
 and about two dozen tests.

 Cheers,

 Damian

 Emanuele Olivetti wrote:
   
 David Cournapeau wrote:
 
 FWIW, distance is deemed to move to a separate package, because distance
 computation is useful in other contexts than clustering.

   
   
 Excellent. I was thinking about something similar. I'll have a look
 to the separate package. Please drop an email to this list when
 distance will be moved.

 Thanks,

 Emanuele
 

 -
 Damian Eads Ph.D. Student
 Jack Baskin School of Engineering, UCSCE2-479
 1156 High Street
 Santa Cruz, CA 95064http://www.soe.ucsc.edu/~eads
 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

   

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] distance matrix and (weighted) p-norm

2008-09-03 Thread Emanuele Olivetti
David Cournapeau wrote:
 Emanuele Olivetti wrote:
 Hi,

 I'm trying to compute the distance matrix (weighted p-norm [*])
 between two sets of vectors (data1 and data2). Example:
   

 You may want to look at scipy.cluster.distance, which has a bunch of
 distance matrix implementation. I believe most of them have optional
 compiled version, for fast execution.

Thanks for the pointer but the distance subpackage in cluster is about
the distance matrix of vectors in one set of vectors. So the resulting
matrix is symmetric. I need to compute distances between two
different sets of vectors (i.e. a non-symmetric distance matrix).
It is not clear to me how to use it in my case.

Then cluster.distance offers:
1) slow python double for loop for computing each entry of the matrix
2) or fast C implementation (numpy/cluster/distance/src/distance.c).

I guess I need to extend distance.c, then work on the wrapper and then
on distance.py. But after that it would be meaningless to have those
distances under 'cluster', since clustering doesn't need distances between
two sets of vectors.

In my original post I was looking for a fast python/numpy implementation
for my code. In special cases (like p==2, i.e. standard weighted euclidean
distance) there is a superfast implementation (e.g., see the "Fastest distance
matrix calc" 2007 thread). But I'm not able to find something similar
for the general case.

Any other suggestions on how to speed up my example?

Thanks,

Emanuele

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] distance matrix and (weighted) p-norm

2008-09-03 Thread Emanuele Olivetti
David Cournapeau wrote:
 FWIW, distance is deemed to move to a separate package, because distance
 computation is useful in other contexts than clustering.

   

Excellent. I was thinking about something similar. I'll have a look
to the separate package. Please drop an email to this list when
distance will be moved.

Thanks,

Emanuele

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] distance matrix and (weighted) p-norm

2008-09-02 Thread Emanuele Olivetti
Hi,

I'm trying to compute the distance matrix (weighted p-norm [*])
between two sets of vectors (data1 and data2). Example:

import numpy as N
p = 3.0
data1 = N.random.randn(100,20)
data2 = N.random.randn(80,20)
weight = N.random.rand(20)
distance_matrix = N.zeros((data1.shape[0],data2.shape[0]))
for d in range(data1.shape[1]):
    distance_matrix += (N.abs(N.subtract.outer(data1[:,d],data2[:,d]))*weight[d])**p
    pass
distance_matrix = distance_matrix**(1.0/p)


Is there a way to speed up the for loop? When the dimension
of the vectors becomes big (e.g. 1000) the for loop
becomes really annoying.
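
For reference, a hedged sketch of a middle ground: broadcasting over a block of
rows of data1 at a time removes the per-dimension Python loop while keeping the
temporary array small (block * data2.shape[0] * n_dimensions elements). The
function name and block size are only illustrative:
---
import numpy as N

def pnorm_distance_blocked(data1, data2, weight, p, block=64):
    dm = N.empty((data1.shape[0], data2.shape[0]))
    for start in range(0, data1.shape[0], block):
        chunk = data1[start:start + block]                  # (block, dim)
        # (block, 1, dim) - (1, m2, dim) -> (block, m2, dim)
        diff = N.abs(chunk[:, None, :] - data2[None, :, :]) * weight
        dm[start:start + block] = (diff**p).sum(-1)**(1.0/p)
    return dm

p = 3.0
data1 = N.random.randn(100, 20)
data2 = N.random.randn(80, 20)
weight = N.random.rand(20)
dm = pnorm_distance_blocked(data1, data2, weight, p)
---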

Thanks,

Emanuele

[*] : ||x - x'||_w = (\sum_{i=1...N} (w_i*|x_i - x'_i|)**p)**(1/p)

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] distance_matrix: how to speed up?

2008-05-21 Thread Emanuele Olivetti
Dear all,

I need to speed up this function (a little example follows):
--
import numpy as N
def distance_matrix(data1,data2,weights):
    rows = data1.shape[0]
    columns = data2.shape[0]
    dm = N.zeros((rows,columns))
    for i in range(rows):
        for j in range(columns):
            dm[i,j] = ((data1[i,:]-data2[j,:])**2*weights).sum()
            pass
        pass
    return dm

size1 = 4
size2 = 3
dimensions = 2
data1 = N.random.rand(size1,dimensions)
data2 = N.random.rand(size2,dimensions)
weights = N.random.rand(dimensions)
dm = distance_matrix(data1,data2,weights)
print dm
--
The distance_matrix function computes the weighted (squared) euclidean
distances between each pair of vectors from two sets (data1, data2).
The previous naive algorithm is extremely slow for my standard use,
i.e., when size1 and size2 are in the order of 1000 or more. It can be
improved using N.subtract.outer:

def distance_matrix_faster(data1,data2,weights):
    rows = data1.shape[0]
    columns = data2.shape[0]
    dm = N.zeros((rows,columns))
    for i in range(data1.shape[1]):
        dm += N.subtract.outer(data1[:,i],data2[:,i])**2*weights[i]
        pass
    return dm

This algorithm becomes slow when dimensions (i.e., data1.shape[1]) is
big (i.e., 1000), due to the Python loop. In order to speed it up, I guess
that N.subtract.outer could be used on the full matrices instead of one
column at a time. But then there is a memory issue: 'outer' allocates
too much memory since it stores all possible combinations along all
dimensions. This is clearly unnecessary.

Is there a NumPy way to avoid all Python loops and without wasting
too much memory? As a comparison I coded the same algorithm in
C through weave (inline): it is _much_ faster and requires just
the memory to store the result. But I'd prefer not using C or weave
if possible.

Thanks in advance for any help,


Emanuele

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] distance_matrix: how to speed up?

2008-05-21 Thread Emanuele Olivetti
Matthieu Brucher wrote:
 Hi,

 Bill Baxter proposed a version of this problem some months ago on this
 ML. I use it regularly and it is fast enough for me.


Excellent. Exactly what I was looking for.

Thanks,

Emanuele

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] distance_matrix: how to speed up?

2008-05-21 Thread Emanuele Olivetti
Rob Hetland wrote:
 I think you want something like this:

 x1 = x1 * weights[np.newaxis,:]
 x2 = x2 * weights[np.newaxis,:]

 x1 = x1[np.newaxis, :, :]
 x2 = x2[:, np.newaxis, :]
 distance = np.sqrt( ((x1 - x2)**2).sum(axis=-1) )

 x1 and x2 are arrays with size of (npoints, ndimensions), and npoints  
 can be different for each array.  I'm not sure I did your weights  
 right, but that part shouldn't be so difficult.

   

Weights seem not right, but anyway here is the solution adapted from
Bill Baxter's:

def distance_matrix_final(data1,data2,weights):
    data1w = data1*weights
    dm = (data1w*data1).sum(1)[:,None] - 2*N.dot(data1w,data2.T) + (data2*data2*weights).sum(1)
    dm[dm < 0] = 0
    return dm

This solution is super-fast, stable, and uses little memory.
It is based on the fact that:
(x-y)^2*w = x*x*w - 2*x*y*w + y*y*w
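
A quick sketch to check the fast expression against a naive double loop on a
small random example (assuming distance_matrix_final as defined above):
---
import numpy as N

def distance_matrix_naive(data1, data2, weights):
    dm = N.zeros((data1.shape[0], data2.shape[0]))
    for i in range(data1.shape[0]):
        for j in range(data2.shape[0]):
            dm[i, j] = ((data1[i, :] - data2[j, :])**2 * weights).sum()
    return dm

data1 = N.random.rand(40, 5)
data2 = N.random.rand(30, 5)
weights = N.random.rand(5)
print(N.allclose(distance_matrix_final(data1, data2, weights),
                 distance_matrix_naive(data1, data2, weights)))  # True
---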

For size1=size2=dimensions=1000 it requires ~0.6 sec to compute
on my dual core duo. It is 2 orders of magnitude faster than my
previous solution, but 1-2 orders of magnitude slower than using
C with weave.inline.

Definitely good enough for me.


Emanuele

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

2008-03-23 Thread Emanuele Olivetti
James Philbin wrote:
 OK, i've written a simple benchmark which implements an elementwise
 multiply (A=B*C) in three different ways (standard C, intrinsics, hand
 coded assembly). On the face of things the results seem to indicate
 that the vectorization works best on medium sized inputs. If people
 could post the results of running the benchmark on their machines
 (takes ~1min) along with the output of gcc --version and their chip
 model, that wd be v useful.

 It should be compiled with: gcc -msse -O2 vec_bench.c -o vec_bench


CPU: Intel(R) Core(TM)2 CPU  T7400  @ 2.16GHz
(macbook, intel core 2 duo)

gcc (GCC) 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)
(ubuntu gutsy gibbon 7.10)

$ ./vec_bench
Testing methods...
All OK

Problem size            Simple              Intrin              Inline
         100   0.0003ms (100.0%)   0.0002ms ( 68.3%)   0.0002ms ( 75.6%)
        1000   0.0023ms (100.0%)   0.0018ms ( 76.7%)   0.0020ms ( 87.1%)
       10000   0.0361ms (100.0%)   0.0193ms ( 53.4%)   0.0338ms ( 93.7%)
      100000   0.2839ms (100.0%)   0.1351ms ( 47.6%)   0.0937ms ( 33.0%)
     1000000   4.2108ms (100.0%)   4.1234ms ( 97.9%)   4.0886ms ( 97.1%)
    10000000  45.3192ms (100.0%)  45.5359ms (100.5%)  45.3466ms (100.1%)


Note that there is some variance in the results. Here is a second run to
give an idea (look at Inline, size=10000):

$ ./vec_bench
Testing methods...
All OK

Problem size            Simple              Intrin              Inline
         100   0.0003ms (100.0%)   0.0002ms ( 69.5%)   0.0002ms ( 74.1%)
        1000   0.0024ms (100.0%)   0.0018ms ( 75.9%)   0.0020ms ( 86.4%)
       10000   0.0324ms (100.0%)   0.0186ms ( 57.3%)   0.0226ms ( 69.6%)
      100000   0.2840ms (100.0%)   0.1171ms ( 41.2%)   0.0939ms ( 33.1%)
     1000000   4.4034ms (100.0%)   4.3657ms ( 99.1%)   4.0465ms ( 91.9%)
    10000000  44.4854ms (100.0%)  43.9502ms ( 98.8%)  43.6824ms ( 98.2%)


HTH

Emanuele

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numpy.ndarray constructor from python list: bug?

2008-03-06 Thread Emanuele Olivetti
Dear all,

Look at this little example:

import numpy
a = numpy.array([1])
b = numpy.array([1,2,a])
c = numpy.array([a,1,2])

Which has the following output:

Traceback (most recent call last):
  File "b.py", line 4, in <module>
    c = numpy.array([a,1,2])
ValueError: setting an array element with a sequence.


It seems that a list starting with an ndarray ('a', of
a single number) is not a legal input to build an ndarray.
Instead, if 'a' is in other places of the list, the ndarray
builds up flawlessly.

Is there a meaning for this behavior or is it a bug?

Details: numpy 1.04 on ubuntu linux x86_64
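
In case it is useful, a hedged sketch of a workaround that builds the object
array explicitly instead of relying on the constructor's guessing:
---
import numpy

a = numpy.array([1])
# Preallocate a 1D object array and fill it element by element.
c = numpy.empty(3, dtype=object)
for i, item in enumerate([a, 1, 2]):
    c[i] = item
print(c.shape)  # (3,)
---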


Emanuele

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy, H, and struct: numpy bug?

2008-03-04 Thread Emanuele Olivetti
Just tried on a 32bit workstation (both CPU and OS): I get
an error, as before, using python2.5:
---
a.py:5: DeprecationWarning: struct integer overflow masking is deprecated
  b=struct.pack("<10H",*a)
Traceback (most recent call last):
  File "a.py", line 5, in <module>
    b=struct.pack("<10H",*a)
  File "/usr/lib/python2.5/struct.py", line 63, in pack
    return o.pack(*args)
SystemError: ../Objects/longobject.c:322: bad argument to internal function

No error with python2.4 so I believe it is a 32bit issue.
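
For completeness, a hedged sketch of a workaround: hand struct plain Python
ints (via tolist()) instead of NumPy scalars. The "<" byte-order prefix in the
format string is assumed from the snippet above:
---
import struct
import numpy

a = numpy.arange(10).astype('H')
# Convert to built-in Python ints before packing.
b = struct.pack("<10H", *a.tolist())
---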

HTH,

Emanuele



Emanuele Olivetti wrote:
 Hi,

 this snippet is causing troubles:
 ---
 import struct
 import numpy

 a=numpy.arange(10).astype('H')
 b=struct.pack("<10H",*a)
 ---
 (The module struct simply packs and unpacks data in byte-blobs).

 It works OK with python2.4, but gives problems with python2.5.
 On my laptop (linux x86_64 on intel core 2 duo) I got this warning:
 ---
 a.py:5: DeprecationWarning: struct integer overflow masking is deprecated
   b=struct.pack("<10H",*a)
 ---

 On another workstation (linux i686 on intel core 2, so a 32 bit OS on 64 bit
 architecture) I got warning plus an _error_, when using python2.5 (python2.4
 works flawlessly):
 ---
 a.py:5: DeprecationWarning: struct integer overflow masking is deprecated
   b=struct.pack("<10H",*a)
 Traceback (most recent call last):
   File "a.py", line 5, in <module>
     b=struct.pack("<10H",*a)
   File "/usr/lib/python2.5/struct.py", line 63, in pack
     return o.pack(*args)
 SystemError: ../Objects/longobject.c:322: bad argument to internal function
 ---

 Both computers are ubuntu gutsy 7.10, updated.
 Details:
 python,  2.5.1-1ubuntu2
 numpy, 1:1.0.3-1ubuntu2
 Same versions on both machines.

 I did some little tests _without_ numpy and the struct module seems not to have
 problems. Is this a numpy bug?
 Note: If you remove "<" from the struct format string then it seems to work
 ok.

 Regards,

 Emanuele


 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

   

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ERROR in installation of NumPy

2007-10-05 Thread Emanuele Olivetti
Simone Marras wrote:
 Hello everyone,

 I am trying to install numpy on my Suse 10.2 using Python 2.5
 Python is correctly installed and when I launch "python setup.py
 install", I get the following error:

 numpy/core/src/multiarraymodule.c:7604: fatal error: error writing
 to /tmp/ccNImg9Q.s: No space left on device
 tee: /tmp/tmpLmEe5Y: No space left on device

 compilation terminated.
 _exec_command_posix failed (status=256)




 Can anyone help me, pelase?

 thanks in advance,

   

As the error message says, you have to free some space in
the partition where /tmp is. Your disk (or partition) is full
and intermediate/temporary files - needed by the installation
step - cannot be created. That's why installation fails.

Cheers,

Emanuele

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] why std() eats much memory in multidimensional case?

2007-04-20 Thread Emanuele Olivetti
Hi,
I'm working with 4D integer matrices and need to compute std() on a
given axis but I experience problems with excessive memory consumption.
Example:
---
import numpy
a = numpy.random.randint(100,size=(50,50,50,200)) # 4D randint matrix
b = a.std(3)
---
It seems that this code requires 100-200 Mb to allocate 'a'
as a matrix of integers, but requires 500Mb more just to
compute std(3). Is it possible to compute std(3) on integer
matrices without spending so much memory?

I manage 4D matrices that are not much bigger than the one in the example
and they require 1.2Gb of ram to compute std(3) only.
Note that almost all this memory is immediately released after
computing std(), so it seems it's used just internally and not to
represent/store the result. Unfortunately I don't have all that RAM...

Could someone explain/correct this problem?
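
One hedged way to keep the peak memory down, assuming the temporaries come from
computing std over the whole array at once, is to process one slice of the
first axis at a time:
---
import numpy

a = numpy.random.randint(100, size=(50, 50, 50, 200))
# Compute std over the last axis slice by slice, so the float64
# temporaries cover only one 50 x 50 x 200 block at a time.
b = numpy.empty(a.shape[:3])
for i in range(a.shape[0]):
    b[i] = a[i].std(2)
---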

Thanks in advance,

Emanuele

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] histogram2d bug?

2007-04-19 Thread Emanuele Olivetti
An even simpler example generating the same error:

import numpy
x = numpy.array([0,0])
numpy.histogram2d(x,x)


HTH,

Emanuele

Emanuele Olivetti wrote:
 While using histogram2d on simple examples I got these errors:

 import numpy
 x = numpy.array([0,0])
 y = numpy.array([0,1])
 numpy.histogram2d(x,y,bins=[2,2])
 -
 Warning: divide by zero encountered in log10
 ---
 exceptions.OverflowError                 Traceback (most recent call last)

 /home/ele/<ipython console>

 /usr/lib/python2.4/site-packages/numpy/lib/twodim_base.py in histogram2d(x, y, bins, range, normed, weights)
     180     if N != 1 and N != 2:
     181         xedges = yedges = asarray(bins, float)
     182         bins = [xedges, yedges]
 --> 183     hist, edges = histogramdd([x,y], bins, range, normed, weights)
     184     return hist, edges[0], edges[1]

 /usr/lib/python2.4/site-packages/numpy/lib/function_base.py in histogramdd(sample, bins, range, normed, weights)
     206         decimal = int(-log10(dedges[i].min())) +6
     207         # Find which points are on the rightmost edge.
 --> 208         on_edge = where(around(sample[:,i], decimal) == around(edges[i][-1], decimal))[0]
     209         # Shift these points one bin to the left.
     210         Ncount[i][on_edge] -= 1

 /usr/lib/python2.4/site-packages/numpy/core/fromnumeric.py in round_(a, decimals, out)
     687     except AttributeError:
     688         return _wrapit(a, 'round', decimals, out)
 --> 689     return round(decimals, out)
     690
     691 around = round_

 OverflowError: long int too large to convert to int
 -

 numpy.__version__
 '1.0.3.dev3719'

 Hope this report helps,

 Emanuele

 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion

   

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] histogram2d bug?

2007-04-19 Thread Emanuele Olivetti
David Huard wrote:
 Hi Emanuele,

 The bug is due to a part of the code that shifts the last bin's
 position to make sure the array's maximum value is counted in the last
 bin, and not as an outlier. To do so, the code computes an approximate
 precision used to shift the bin edge by an amount small compared to the
 array's values. In your example, since all values in x are identical,
 the precision is ``infinite''. So my question is, what kind of
 behaviour would you be expecting in this case for the automatic
 placement of bin edges ?

 That is, given
 x : array of identical values, eg. [0, 0, 0, 0, 0, ..., 0]
 smin, smax = x.min(), x.max()
 How do you select the bin edges ?

 One solution is to use the same scheme used by histogram:
 if smin == smax:
     edges[i] = linspace(smin-.5, smax+.5, nbin[i]+1)

 Would that be ok ?

 David


  I'll submit a patch.

The histogram solution seems ok. I can't see drawbacks.
My concern is about not having exceptions in degenerate
cases, like the example I sent. I need to estimate many probability
distributions by counting samples efficiently, so the histogram*
functions are really nice.

Please submit the patch. By the way the same issue affects
histogramdd.
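
Until then, a hedged sketch of a workaround is to pass an explicit range, so
the automatic (and here degenerate) bin-edge placement is never triggered:
---
import numpy

x = numpy.array([0, 0])
# Explicit bin edges avoid the failing automatic placement when
# all samples are identical (min == max).
H, xedges, yedges = numpy.histogram2d(x, x, bins=2,
                                      range=[[-0.5, 0.5], [-0.5, 0.5]])
print(H)
---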

Thanks a lot,

Emanuele

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] numpy.random.permutation bug?

2007-01-18 Thread Emanuele Olivetti
Look at this:

--bug.py---
import numpy
a=numpy.array([1,2])
b=a.sum()
print type(b)
c=numpy.random.permutation(b)
---

If I run it (Python 2.5, numpy 1.0.1 on a Linux box) I get:
---
# python /tmp/bug.py
<type 'numpy.int32'>
Traceback (most recent call last):
  File "/tmp/bug.py", line 5, in <module>
    c=numpy.random.permutation(b)
  File "mtrand.pyx", line 1227, in mtrand.RandomState.permutation
  File "mtrand.pyx", line 1211, in mtrand.RandomState.shuffle
TypeError: len() of unsized object
---

permutation() likes 'int' and dislikes 'numpy.int32' integers :(
Seems a bug.
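
A hedged workaround sketch, assuming the only problem is the NumPy integer
scalar type:
---
import numpy
a = numpy.array([1, 2])
b = a.sum()
# Casting to a built-in int makes permutation happy.
c = numpy.random.permutation(int(b))
print(c)
---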

HTH,

Emanuele


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion