Re: [Numpy-discussion] creation of ndarray with dtype=np.object : bug?
On 12/03/2014 04:32 AM, Ryan Nelson wrote:
> Emanuele,
> This doesn't address your question directly. However, I wonder if you
> could approach this problem from a different way to get what you want.
> First of all, create an index array and then just vstack all of your
> arrays at once.

Ryan,

Thank you for your solution. Indeed it works. But it seems to me that manually creating an index and re-implementing slicing should be the last resort. NumPy is *great* and provides excellent slicing and assembling tools. For some reason that I don't fully understand, when dtype=np.object the ndarray constructor tries to be smart and creates unexpected results that cannot be controlled. Another simple example:
---
import numpy as np
from numpy.random import rand, randint

n_arrays = 4
shape0_min = 2
shape0_max = 4
for a in range(30):
    list_of_arrays = [rand(randint(shape0_min, shape0_max), 3)
                      for i in range(n_arrays)]
    array_of_arrays = np.array(list_of_arrays, dtype=np.object)
    print("shape: %s" % (array_of_arrays.shape,))
---
The usual output is:
  shape: (4,)
but from time to time, when the randomly generated arrays happen - by chance - to have the same shape, you get:
  shape: (4, 2, 3)
which may crash your code at runtime.

To NumPy developers: is there a specific reason for np.array(..., dtype=np.object) to be smart instead of just assembling an array with the provided objects?

Best,
Emanuele
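For readers following along, here is a minimal sketch of the index-and-vstack approach Ryan suggests (my reading of his suggestion; the bookkeeping names are illustrative, not from his post):
---
import numpy as np
from numpy.random import rand, randint

list_of_arrays = [rand(randint(2, 4), 3) for _ in range(4)]

# Stack everything once and record the row offsets, so that the i-th
# original array lives at stacked[starts[i]:starts[i+1]].
stacked = np.vstack(list_of_arrays)
lengths = [len(a) for a in list_of_arrays]
starts = np.concatenate([[0], np.cumsum(lengths)])

# "Slicing" array i back out of the stack:
i = 2
piece = stacked[starts[i]:starts[i + 1]]
assert np.array_equal(piece, list_of_arrays[i])
---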
Re: [Numpy-discussion] creation of ndarray with dtype=np.object : bug?
On 12/03/2014 12:17 PM, Jaime Fernández del Río wrote:
> The safe way to create 1D object arrays from a list is by preallocating
> them, something like this:
>
> >>> a = [np.random.rand(2, 3), np.random.rand(2, 3)]
> >>> b = np.empty(len(a), dtype=object)
> >>> b[:] = a
> >>> b
> array([ array([[ 0.124382  ,  0.04489531,  0.93864908],
>        [ 0.77204758,  0.63094413,  0.55823578]]),
>        array([[ 0.80151723,  0.33147467,  0.40491018],
>        [ 0.09905844,  0.90254708,  0.69911945]])], dtype=object)

Thank you for the compact way to create 1D object arrays. Definitely useful!

> As to why np.array tries to be smart, keep in mind that there are other
> applications of object arrays than having stacked sequences. The
> following code computes the 100-th Fibonacci number using the matrix
> form of the recursion
> (http://en.wikipedia.org/wiki/Fibonacci_number#Matrix_form), numpy's
> linear algebra capabilities, and Python's arbitrary precision ints:
>
> >>> a = np.array([[0, 1], [1, 1]], dtype=object)
> >>> np.linalg.matrix_power(a, 99)[0, 0]
> 135301852344706746049L
>
> Trying to do this with any other type would result in either wrong
> results due to overflow: [...]

I guess that the problem I am referring to is not limited to stacked sequences; it is more general. Moreover, I do agree with the example you present: there, the array creation explores the list of lists and creates a 2D array of Python ints instead of np.int64. Exploring iterable containers is certainly correct in general. I am wondering whether it should be prevented in some cases, where the semantics are clear from the syntax, e.g. when the nature of the container changes (see below).

To me this is intuitive and correct:
>>> a = np.array([[0, 1], [1, 1]], dtype=object)
>>> a.shape
(2, 2)
while this is counterintuitive and potentially error-prone:
>>> b = np.array([np.array([0, 1]), np.array([0, 1])], dtype=object)
>>> b.shape
(2, 2)
because it is clear that I meant a list of two vectors, i.e. an array of shape (2,), and not a 2D array of shape (2, 2).

Best,
Emanuele
[Numpy-discussion] creation of ndarray with dtype=np.object : bug?
Hi,

I am using 2D arrays where only one dimension remains constant, e.g.:
---
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])  # 2 x 3
b = np.array([[9, 8, 7]])  # 1 x 3
c = np.array([[1, 3, 5], [7, 9, 8], [6, 4, 2]])  # 3 x 3
d = np.array([[5, 5, 4], [4, 3, 3]])  # 2 x 3
---
I have a large number of them and need to extract subsets of them through fancy indexing and then stack them together. For this reason I put them into an array of dtype=np.object, given their non-constant nature. Indexing works well :) but stacking does not :( , as you can see in the following example:
---
# fancy indexing :)
data = np.array([a, b, c, d], dtype=np.object)
idx = [0, 1, 3]
print(data[idx])

[[[1 2 3]
  [4 5 6]]
 [[9 8 7]]
 [[5 5 4]
  [4 3 3]]]

# stacking :(
data2 = np.array([a, b, c], dtype=np.object)
data3 = np.array([a, d], dtype=np.object)
together = np.vstack([data2, data3])
---
ValueError                                Traceback (most recent call last)
<ipython-input-14-7ebee5709e29> in <module>()
----> 1 execfile(r'/tmp/python-3276515J.py') # PYTHON-MODE

/tmp/python-3276515J.py in <module>()
      1 data2 = np.array([a, b, c], dtype=np.object)
      2 data3 = np.array([a, d], dtype=np.object)
----> 3 together = np.vstack([data2, data3])

/usr/lib/python2.7/dist-packages/numpy/core/shape_base.pyc in vstack(tup)
    224
    225
--> 226     return _nx.concatenate(map(atleast_2d,tup),0)
    227
    228 def hstack(tup):

ValueError: arrays must have same number of dimensions

The reason for the error is that data2.shape is (3,), while data3.shape is (2, 2, 3). This happens because the creation of ndarrays with dtype=np.object tries to be smart: it infers the common dimensions between the objects you put in the array, instead of just creating an array of the objects you give. This leads to unexpected results when you use it, like the one in the example, because you cannot control the resulting shape, which is data dependent. At least, I cannot find a way to create data3 with shape (2,)...

How should I address this issue? To me, it looks like a bug in the excellent NumPy.

Best,
Emanuele
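A minimal sketch of a workaround, using the preallocation trick that also comes up in the replies above: build the object array first, then fill it, so its shape stays (2,) whether or not a and d happen to share a shape. The setup lines just recreate the arrays from the post so the snippet runs standalone:
---
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # 2 x 3, as above
d = np.array([[5, 5, 4], [4, 3, 3]])   # 2 x 3, as above
data2 = np.array([a, np.array([[9, 8, 7]]),
                  np.array([[1, 3, 5], [7, 9, 8], [6, 4, 2]])],
                 dtype=object)          # shape (3,)

data3 = np.empty(2, dtype=object)       # preallocate...
data3[:] = [a, d]                       # ...then fill
print(data3.shape)                      # (2,), whatever the shapes of a and d

# 1-D object arrays of any length can then be joined with concatenate:
together = np.concatenate([data2, data3])
print(together.shape)                   # (5,)
---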
[Numpy-discussion] np.array creation: unexpected behaviour
Hi,

I just came across this unexpected behaviour when creating a np.array() from two other np.arrays of different shape. Have a look at this example:
---
import numpy as np
a = np.zeros(3)
b = np.zeros((2,3))
c = np.zeros((3,2))
ab = np.array([a, b])
print ab.shape, ab.dtype
ac = np.array([a, c], dtype=np.object)
print ac.shape, ac.dtype
ac_no_dtype = np.array([a, c])
print ac_no_dtype.shape, ac_no_dtype.dtype
---
The output, with NumPy v1.6.1 (Ubuntu 12.04), is:

(2,) object
(2, 3) object
Traceback (most recent call last):
  File "/tmp/numpy_bug.py", line 9, in <module>
    ac_no_dtype = np.array([a, c])
ValueError: setting an array element with a sequence.

The result for 'ab' is what I expect. The one for 'ac' is a bit surprising. The one for 'ac_no_dtype' is even more surprising.

Is this expected behaviour?

Best,
Emanuele
[Numpy-discussion] multivariate_normal issue with 'size' argument
Hi,

I'm using NumPy v1.6.1 shipped with Ubuntu 12.04 (Python 2.7.3). I observed an odd behavior of the multivariate_normal function, which does not like int64 for the 'size' argument. Short example:

import numpy as np
print np.random.multivariate_normal(mean=np.zeros(2), cov=np.eye(2), size=1)
print np.random.multivariate_normal(mean=np.zeros(2), cov=np.eye(2), size=np.int64(1))

which outputs:

$ python2.7 mvn_bug.py
[[ 0.28880655  0.43289446]]
Traceback (most recent call last):
  File "mvn_bug.py", line 3, in <module>
    print np.random.multivariate_normal(mean=np.zeros(2), cov=np.eye(2), size=np.int64(1))
  File "mtrand.pyx", line 3990, in mtrand.RandomState.multivariate_normal (numpy/random/mtrand/mtrand.c:16663)
IndexError: invalid index to scalar variable.

I had a brief look at the tracker but haven't found any mention of this issue. It might already be solved in the current NumPy (v1.7.0)... or not. I'd like to have your feedback before submitting this issue to the bug tracking system.

Best,
Emanuele
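A hedged workaround sketch, in case it is useful to others hitting this: cast the size argument to a plain Python int before the call.
---
import numpy as np

n = np.int64(1)
sample = np.random.multivariate_normal(mean=np.zeros(2), cov=np.eye(2),
                                       size=int(n))  # plain int works
print(sample.shape)  # (1, 2)
---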
Re: [Numpy-discussion] multivariate_normal issue with 'size' argument
Interesting. Anyone able to reproduce what I observe?

Emanuele

On 05/24/2013 02:09 PM, Nicolas Rougier wrote:
> Works for me (numpy 1.7.1, osx 10.8.3):
>
> >>> import numpy as np
> >>> print np.random.multivariate_normal(mean=np.zeros(2), cov=np.eye(2), size=1)
> [[-0.55854737 -1.82631485]]
> >>> print np.random.multivariate_normal(mean=np.zeros(2), cov=np.eye(2), size=np.int64(1))
> [[ 0.40274243 -0.33922682]]
>
> Nicolas
>
> On May 24, 2013, at 2:02 PM, Emanuele Olivetti emanu...@relativita.com wrote:
>> import numpy as np
>> print np.random.multivariate_normal(mean=np.zeros(2), cov=np.eye(2), size=1)
>> print np.random.multivariate_normal(mean=np.zeros(2), cov=np.eye(2), size=np.int64(1))
[Numpy-discussion] [Fwd: Fwd: [ML-news] Call for Submissions: Workshop on Machine Learning Open Source Software (MLOSS), NIPS*08]
Maybe of interest.

E.

-------- Original Message --------
---------- Forwarded message ----------
From: mikiobraun [EMAIL PROTECTED]
Date: 2008/9/8
Subject: [ML-news] Call for Submissions: Workshop on Machine Learning Open Source Software (MLOSS), NIPS*08
To: Machine Learning News [EMAIL PROTECTED]

**
Call for Submissions

Workshop on Machine Learning Open Source Software 2008
http://mloss.org/workshop/nips08
held at NIPS*08, Whistler, Canada, December 12th, 2008
**

The NIPS workshop on Machine Learning Open Source Software (MLOSS) will be held in Whistler (B.C.) on the 12th of December, 2008.

Important Dates
===
* Submission Date: October 1st, 2008
* Notification of Acceptance: October 14th, 2008
* Workshop date: December 12th or 13th, 2008

Call for Contributions
==
The organizing committee is currently seeking abstracts for talks at MLOSS 2008. MLOSS is a great opportunity for you to tell the community about your use, development, or philosophy of open source software in machine learning. This includes (but is not limited to) numeric packages (e.g. R, Octave, NumPy), machine learning toolboxes, and implementations of ML algorithms. The committee will select several submitted abstracts for 20-minute talks.

The submission process is very simple:
* Tag your mloss.org project with the tag nips2008
* Ensure that you have a good description (limited to 500 words)
* Any bells and whistles can be put on your own project page, and of course provide this link on mloss.org

On 1 October 2008, we will collect all projects tagged with nips2008 for review.

Note: Projects must adhere to a recognized Open Source License (cf. http://www.opensource.org/licenses/ ) and the source code must have been released at the time of submission. Submissions will be reviewed based on the status of the project at the time of the submission deadline.

Description
===
We believe that the wide-spread adoption of open source software policies will have a tremendous impact on the field of machine learning. The goal of this workshop is to further support the current developments in this area and give new impulses to it. Following the success of the inaugural NIPS-MLOSS workshop held at NIPS 2006, the Journal of Machine Learning Research (JMLR) has started a new track for machine learning open source software initiated by the workshop's organizers. Many prominent machine learning researchers have co-authored a position paper advocating the need for open source software in machine learning. Furthermore, the workshop's organizers have set up a community website, mloss.org, where people can register their software projects, rate existing projects, and initiate discussions about projects and related topics. This website currently lists 123 such projects, including many prominent projects in the area of machine learning.

The main goal of this workshop is to bring the main practitioners in the area of machine learning open source software together in order to initiate processes which will help to further improve the development of this area. In particular, we have to move beyond a mere collection of more or less unrelated software projects and provide a common foundation to stimulate cooperation and interoperability between different projects. An important step in this direction will be a common data exchange format such that different methods can exchange their results more easily.

This year's workshop sessions will consist of three parts.
* We have two invited speakers: John Eaton, the lead developer of Octave, and John Hunter, the lead developer of matplotlib.
* Researchers are invited to submit their open source project to present it at the workshop.
* In discussion sessions, important questions regarding the future development of this area will be discussed. In particular, we will discuss what makes a good machine learning software project and how to improve interoperability between programs. In addition, the question of how to deal with data sets and reproducibility will also be addressed.

Taking advantage of the large number of key research groups which attend NIPS, decisions and agreements taken at the workshop will have the potential to significantly impact the future of machine learning software.

Invited Speakers
===
* John D. Hunter - Main author of matplotlib.
* John W. Eaton - Main author of Octave.

Tentative Program
===
The 1-day workshop will be a mixture of talks (including a mandatory demo of the software) and panel/open/hands-on discussions.

Morning session: 7:30am - 10:30am
* Introduction and overview
* Octave (John
Re: [Numpy-discussion] distance matrix and (weighted) p-norm
Damian Eads wrote:
> Emanuele Olivetti wrote:
>> ...
>> [*] : ||x - x'||_w = (\sum_{i=1...N} (w_i*|x_i - x'_i|)**p)**(1/p)
>
> This feature could be implemented easily. However, I must admit I'm not
> very familiar with weighted p-norms. What is the reason for raising w to
> the p instead of w_i*(|x_i-x'_i|)**p?

I believe it is just a choice, but one that should be clearly stated, since the two formulations lead to different results. I think the expression I wrote is more convenient, since it gives what is expected even in limit cases. Two examples:

1) If |x - x'| = N.ones(n), then
   in my case:   ||x - x'||_w,p = (\sum_i w_i**p)**(1/p) = ||w||_p
   in your case: ||x - x'||_w,p = (\sum_i w_i)**(1/p)
   which breaks this symmetry.

2) When p goes to infinity,
   in my case:   ||x - x'||_w,inf = max(w_i*|x_i - x'_i|, i=1,...,n)
   in your case: ||x - x'||_w,inf = max(|x_i - x'_i|, i=1,...,n) = ||x - x'||_inf
   i.e. the weights disappear from the limit.

But I welcome any comment on this topic!

Emanuele
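The difference between the two formulations can be checked numerically. Here is an illustrative sketch (not from the original thread): with all coordinate differences equal to 1 and weights summing to 1, the "weights inside the power" form tends to max(w_i) as p grows, while the "weights outside" form stays at 1.
---
import numpy as np

def wnorm_inside(x, y, w, p):
    # Emanuele's form: (sum_i (w_i * |x_i - y_i|)**p)**(1/p)
    return (np.sum((w * np.abs(x - y)) ** p)) ** (1.0 / p)

def wnorm_outside(x, y, w, p):
    # Damian's form: (sum_i w_i * |x_i - y_i|**p)**(1/p)
    return (np.sum(w * np.abs(x - y) ** p)) ** (1.0 / p)

x = np.zeros(4)
y = np.ones(4)                      # |x - y| = ones(4), as in example 1
w = np.array([0.1, 0.2, 0.3, 0.4])  # weights summing to 1

for p in [1.0, 2.0, 10.0, 100.0]:
    print(p, wnorm_inside(x, y, w, p), wnorm_outside(x, y, w, p))
# wnorm_inside tends to max(w) = 0.4; wnorm_outside stays at 1.0.
---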
Re: [Numpy-discussion] distance matrix and (weighted) p-norm
Excellent. David said that distance computation will be moved to a separate package soon. I guess that your implementation will be the suitable one for this package. Am I wrong?

Thanks again,

Emanuele

Damian Eads wrote:
> Hi there,
>
> The pdist function computes pairwise distances between vectors in a
> single collection, storing the distances in a condensed distance matrix.
> This is not exactly what you want--you want to compute distances between
> two collections of vectors. Suppose XA is a m_A by n array and XB is a
> m_B by n array, then
>
>   M = scipy.cluster.distance.cdist(XA, XB, metric='mahalanobis')
>
> computes a m_A by m_B distance matrix M. The ij'th entry is the distance
> between XA[i,:] and XB[j,:]. The core computation is implemented in C
> for efficiency. I've committed the new function along with documentation
> and about two dozen tests.
>
> Cheers,
>
> Damian
>
> Emanuele Olivetti wrote:
>> David Cournapeau wrote:
>>> FWIW, distance is deemed to move to a separate package, because
>>> distance computation is useful in other contexts than clustering.
>> Excellent. I was thinking about something similar. I'll have a look at
>> the separate package. Please drop an email to this list when distance
>> is moved.
>
> -
> Damian Eads                             Ph.D. Student
> Jack Baskin School of Engineering, UCSC        E2-479
> 1156 High Street                 Santa Cruz, CA 95064
> http://www.soe.ucsc.edu/~eads
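For readers finding this thread later: in modern SciPy the pairwise functions live in scipy.spatial.distance, and, as I recall, cdist can take a weight vector for the Minkowski metric directly (applied as w_i*|u_i - v_i|**p, i.e. the second formulation discussed in this thread). A sketch, assuming a reasonably recent SciPy:
---
import numpy as np
from scipy.spatial.distance import cdist

XA = np.random.randn(100, 20)
XB = np.random.randn(80, 20)
w = np.random.rand(20)

# Weighted p-norm distance matrix between two different sets of vectors.
M = cdist(XA, XB, metric='minkowski', p=3.0, w=w)
print(M.shape)  # (100, 80)
---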
Re: [Numpy-discussion] distance matrix and (weighted) p-norm
David Cournapeau wrote:
> Emanuele Olivetti wrote:
>> Hi, I'm trying to compute the distance matrix (weighted p-norm [*])
>> between two sets of vectors (data1 and data2). Example:
>
> You may want to look at scipy.cluster.distance, which has a bunch of
> distance matrix implementations. I believe most of them have an optional
> compiled version, for fast execution.

Thanks for the pointer, but the distance subpackage in cluster is about the distance matrix of vectors within one set of vectors, so the resulting matrix is symmetric. I need to compute distances between two different sets of vectors (i.e. a non-symmetric distance matrix). It is not clear to me how to use it in my case.

Then cluster.distance offers:
1) a slow Python double for loop computing each entry of the matrix,
2) or a fast C implementation (numpy/cluster/distance/src/distance.c).

I guess I would need to extend distance.c, then work on the wrapper and then on distance.py. But after that it would be meaningless to have those distances under 'cluster', since clustering doesn't need distances between two sets of vectors.

In my original post I was looking for a fast Python/NumPy implementation for my code. In special cases (like p==2, i.e. standard weighted Euclidean distance) there is a superfast implementation (e.g., see the "Fastest distance matrix calc" 2007 thread). But I'm not able to find something similar for the general case.

Any other suggestions on how to speed up my example?

Thanks,

Emanuele
Re: [Numpy-discussion] distance matrix and (weighted) p-norm
David Cournapeau wrote:
> FWIW, distance is deemed to move to a separate package, because distance
> computation is useful in other contexts than clustering.

Excellent. I was thinking about something similar. I'll have a look at the separate package. Please drop an email to this list when distance is moved.

Thanks,

Emanuele
[Numpy-discussion] distance matrix and (weighted) p-norm
Hi,

I'm trying to compute the distance matrix (weighted p-norm [*]) between two sets of vectors (data1 and data2). Example:

import numpy as N

p = 3.0
data1 = N.random.randn(100, 20)
data2 = N.random.randn(80, 20)
weight = N.random.rand(20)
distance_matrix = N.zeros((data1.shape[0], data2.shape[0]))
for d in range(data1.shape[1]):
    distance_matrix += (N.abs(N.subtract.outer(data1[:, d], data2[:, d])) * weight[d]) ** p
    pass
distance_matrix = distance_matrix ** (1.0 / p)

Is there a way to speed up the for loop? When the dimension of the vectors becomes big (e.g. 1000) the for loop becomes really annoying.

Thanks,

Emanuele

[*] : ||x - x'||_w = (\sum_{i=1...N} (w_i*|x_i - x'_i|)**p)**(1/p)
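For comparison, a fully vectorized sketch of the same computation via broadcasting (illustrative, not from the thread): it removes the Python loop at the price of materializing an (n1, n2, dim) intermediate, so it only pays off when memory allows.
---
import numpy as N

p = 3.0
data1 = N.random.randn(100, 20)
data2 = N.random.randn(80, 20)
weight = N.random.rand(20)

# diff has shape (100, 80, 20): all pairwise coordinate differences.
diff = N.abs(data1[:, None, :] - data2[None, :, :])
distance_matrix = ((diff * weight) ** p).sum(-1) ** (1.0 / p)
---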
[Numpy-discussion] distance_matrix: how to speed up?
Dear all,

I need to speed up this function (a little example follows):
--
import numpy as N

def distance_matrix(data1, data2, weights):
    rows = data1.shape[0]
    columns = data2.shape[0]
    dm = N.zeros((rows, columns))
    for i in range(rows):
        for j in range(columns):
            dm[i, j] = ((data1[i, :] - data2[j, :]) ** 2 * weights).sum()
            pass
        pass
    return dm

size1 = 4
size2 = 3
dimensions = 2
data1 = N.random.rand(size1, dimensions)
data2 = N.random.rand(size2, dimensions)
weights = N.random.rand(dimensions)
dm = distance_matrix(data1, data2, weights)
print dm
--
The distance_matrix function computes the weighted (squared) Euclidean distances between each pair of vectors from two sets (data1, data2). The previous naive algorithm is extremely slow for my standard use, i.e., when size1 and size2 are on the order of 1000 or more. It can be improved using N.subtract.outer:

def distance_matrix_faster(data1, data2, weights):
    rows = data1.shape[0]
    columns = data2.shape[0]
    dm = N.zeros((rows, columns))
    for i in range(data1.shape[1]):
        dm += N.subtract.outer(data1[:, i], data2[:, i]) ** 2 * weights[i]
        pass
    return dm

This algorithm becomes slow when the number of dimensions (i.e., data1.shape[1]) is big (i.e., 1000), due to the Python loop. In order to speed it up, I guess that N.subtract.outer could be used on the full matrices instead of one column at a time. But then there is a memory issue: 'outer' allocates too much memory since it stores all possible combinations along all dimensions. This is clearly unnecessary.

Is there a NumPy way to avoid all Python loops without wasting too much memory? As a comparison I coded the same algorithm in C through weave (inline): it is _much_ faster and requires just the memory to store the result. But I'd prefer not to use C or weave if possible.

Thanks in advance for any help,

Emanuele
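One possible middle ground, sketched under the same setup as above (illustrative, not a tested drop-in): vectorize over all dimensions at once, but walk over data1 in blocks, so the temporary stays bounded at block*size2*dimensions elements instead of size1*size2*dimensions.
---
import numpy as N

def distance_matrix_blocked(data1, data2, weights, block=256):
    # Weighted squared Euclidean distances, computed block by block.
    dm = N.empty((data1.shape[0], data2.shape[0]))
    for start in range(0, data1.shape[0], block):
        # chunk has shape (block, size2, dimensions) at most.
        chunk = data1[start:start + block, None, :] - data2[None, :, :]
        dm[start:start + block] = (chunk ** 2 * weights).sum(-1)
    return dm
---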
Re: [Numpy-discussion] distance_matrix: how to speed up?
Matthieu Brucher wrote:
> Hi,
>
> Bill Baxter proposed a version of this problem some months ago on this
> ML. I use it regularly and it is fast enough for me.

Excellent. Exactly what I was looking for.

Thanks,

Emanuele
Re: [Numpy-discussion] distance_matrix: how to speed up?
Rob Hetland wrote:
> I think you want something like this:
>
> x1 = x1 * weights[np.newaxis, :]
> x2 = x2 * weights[np.newaxis, :]
>
> x1 = x1[np.newaxis, :, :]
> x2 = x2[:, np.newaxis, :]
> distance = np.sqrt(((x1 - x2)**2).sum(axis=-1))
>
> x1 and x2 are arrays with size of (npoints, ndimensions), and npoints
> can be different for each array. I'm not sure I did your weights right,
> but that part shouldn't be so difficult.

The weights seem not quite right there, but anyway here is the solution, adapted from Bill Baxter's:

def distance_matrix_final(data1, data2, weights):
    data1w = data1 * weights
    dm = ((data1w * data1).sum(1)[:, None]
          - 2 * N.dot(data1w, data2.T)
          + (data2 * data2 * weights).sum(1))
    dm[dm < 0] = 0
    return dm

This solution is super-fast, stable, and uses little memory. It is based on the fact that:

  (x-y)^2*w = x*x*w - 2*x*y*w + y*y*w

For size1 = size2 = dimensions = 1000 it requires ~0.6 sec. to compute on my dual core duo. It is 2 orders of magnitude faster than my previous solution, but 1-2 orders of magnitude slower than using C with weave.inline. Definitely good enough for me.

Emanuele
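A quick sanity check of the fast version against the naive double loop, as a sketch (it assumes distance_matrix from the first post of this thread and distance_matrix_final above are both in scope; both return squared weighted distances, so no sqrt is needed):
---
size1, size2, dimensions = 200, 150, 50
data1 = N.random.rand(size1, dimensions)
data2 = N.random.rand(size2, dimensions)
weights = N.random.rand(dimensions)

dm_fast = distance_matrix_final(data1, data2, weights)
dm_naive = distance_matrix(data1, data2, weights)  # naive double loop
assert N.allclose(dm_fast, dm_naive)
---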
Re: [Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)
James Philbin wrote:
> OK, I've written a simple benchmark which implements an elementwise
> multiply (A=B*C) in three different ways (standard C, intrinsics, hand
> coded assembly). On the face of things the results seem to indicate that
> the vectorization works best on medium sized inputs. If people could
> post the results of running the benchmark on their machines (takes
> ~1min) along with the output of gcc --version and their chip model, that
> wd be v useful. It should be compiled with:
> gcc -msse -O2 vec_bench.c -o vec_bench

CPU: Intel(R) Core(TM)2 CPU T7400 @ 2.16GHz (MacBook, Intel Core 2 Duo)
gcc (GCC) 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)
(Ubuntu Gutsy Gibbon 7.10)

$ ./vec_bench
Testing methods...
All OK

Problem size        Simple              Intrin              Inline
         100     0.0003ms (100.0%)   0.0002ms ( 68.3%)   0.0002ms ( 75.6%)
        1000     0.0023ms (100.0%)   0.0018ms ( 76.7%)   0.0020ms ( 87.1%)
       10000     0.0361ms (100.0%)   0.0193ms ( 53.4%)   0.0338ms ( 93.7%)
      100000     0.2839ms (100.0%)   0.1351ms ( 47.6%)   0.0937ms ( 33.0%)
     1000000     4.2108ms (100.0%)   4.1234ms ( 97.9%)   4.0886ms ( 97.1%)
    10000000    45.3192ms (100.0%)  45.5359ms (100.5%)  45.3466ms (100.1%)

Note that there is some variance in the results. Here is a second run to give an idea (look at Inline, size=10000):

$ ./vec_bench
Testing methods...
All OK

Problem size        Simple              Intrin              Inline
         100     0.0003ms (100.0%)   0.0002ms ( 69.5%)   0.0002ms ( 74.1%)
        1000     0.0024ms (100.0%)   0.0018ms ( 75.9%)   0.0020ms ( 86.4%)
       10000     0.0324ms (100.0%)   0.0186ms ( 57.3%)   0.0226ms ( 69.6%)
      100000     0.2840ms (100.0%)   0.1171ms ( 41.2%)   0.0939ms ( 33.1%)
     1000000     4.4034ms (100.0%)   4.3657ms ( 99.1%)   4.0465ms ( 91.9%)
    10000000    44.4854ms (100.0%)  43.9502ms ( 98.8%)  43.6824ms ( 98.2%)

HTH,

Emanuele
[Numpy-discussion] numpy.ndarray constructor from python list: bug?
Dear all,

Look at this little example:

import numpy
a = numpy.array([1])
b = numpy.array([1, 2, a])
c = numpy.array([a, 1, 2])

which has the following output:

Traceback (most recent call last):
  File "b.py", line 4, in <module>
    c = numpy.array([a, 1, 2])
ValueError: setting an array element with a sequence.

It seems that a list starting with an ndarray ('a', a single-number array) is not legal input to build an ndarray, while if 'a' is in other places of the list the ndarray builds up flawlessly. Is there a meaning for this behavior or is it a bug?

Details: numpy 1.0.4 on Ubuntu Linux x86_64.

Emanuele
Re: [Numpy-discussion] numpy, H, and struct: numpy bug?
Just tried on a 32-bit workstation (both CPU and OS): I get an error, as before, using Python 2.5:
---
a.py:5: DeprecationWarning: struct integer overflow masking is deprecated
  b = struct.pack(">10H", *a)
Traceback (most recent call last):
  File "a.py", line 5, in <module>
    b = struct.pack(">10H", *a)
  File "/usr/lib/python2.5/struct.py", line 63, in pack
    return o.pack(*args)
SystemError: ../Objects/longobject.c:322: bad argument to internal function
---
No error with Python 2.4. So I believe it is a 32-bit issue.

HTH,

Emanuele

Emanuele Olivetti wrote:
> Hi,
>
> this snippet is causing troubles:
> ---
> import struct
> import numpy
> a = numpy.arange(10).astype('H')
> b = struct.pack(">10H", *a)
> ---
> (The module struct simply packs and unpacks data in byte-blobs). It
> works OK with python2.4, but gives problems with python2.5. On my laptop
> (linux x86_64 on intel core 2 duo) I got this warning:
> ---
> a.py:5: DeprecationWarning: struct integer overflow masking is deprecated
>   b = struct.pack(">10H", *a)
> ---
> On another workstation (linux i686 on intel core 2, so a 32-bit OS on a
> 64-bit architecture) I got the warning plus an _error_, when using
> python2.5 (python2.4 works flawlessly):
> ---
> a.py:5: DeprecationWarning: struct integer overflow masking is deprecated
>   b = struct.pack(">10H", *a)
> Traceback (most recent call last):
>   File "a.py", line 5, in <module>
>     b = struct.pack(">10H", *a)
>   File "/usr/lib/python2.5/struct.py", line 63, in pack
>     return o.pack(*args)
> SystemError: ../Objects/longobject.c:322: bad argument to internal function
> ---
> Both computers are ubuntu gutsy 7.10, updated. Details:
> python 2.5.1-1ubuntu2
> numpy 1:1.0.3-1ubuntu2
> Same versions on both machines. I did some little tests _without_ numpy,
> and the struct module seems not to have problems. Is this a numpy bug?
>
> Note: if you remove '>' from the struct format string then it seems to
> work OK.
>
> Regards,
>
> Emanuele
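A hedged workaround sketch for anyone bitten by this: hand struct.pack plain Python ints instead of numpy scalars, which sidesteps the numpy-scalar conversion path that triggers the SystemError.
---
import struct
import numpy

a = numpy.arange(10).astype('H')
# Convert each numpy scalar to a plain Python int before packing.
b = struct.pack(">10H", *[int(x) for x in a])
---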
Re: [Numpy-discussion] ERROR in installation of NumPy
Simone Marras wrote:
> Hello everyone,
>
> I am trying to install numpy on my Suse 10.2 using Python 2.5. Python is
> correctly installed, and when I launch "python setup.py install", I get
> the following error:
>
> numpy/core/src/multiarraymodule.c:7604: fatal error: error writing to
> /tmp/ccNImg9Q.s: No space left on device
> tee: /tmp/tmpLmEe5Y: No space left on device
> compilation terminated.
> _exec_command_posix failed (status=256)
>
> Can anyone help me, please? Thanks in advance,

As the error message says, you have to free some space in the partition where /tmp is. Your disk (or partition) is full, so the intermediate/temporary files needed by the installation step cannot be created. That's why installation fails.

Cheers,

Emanuele
[Numpy-discussion] why std() eats much memory in multidimensional case?
Hi,

I'm working with 4D integer matrices and need to compute std() on a given axis, but I experience problems with excessive memory consumption. Example:
---
import numpy
a = numpy.random.randint(100, size=(50, 50, 50, 200))  # 4D randint matrix
b = a.std(3)
---
It seems that this code requires 100-200 MB to allocate 'a' as a matrix of integers, but requires 500 MB more just to compute std(3). Is it possible to compute std(3) on integer matrices without spending so much memory? I manage 4D matrices that are not much bigger than the one in the example, and they require 1.2 GB of RAM to compute std(3) alone. Note that almost all of this memory is released immediately after computing std(), so it seems to be used only internally and not to represent/store the result. Unfortunately I haven't all that RAM...

Could someone explain/correct this problem?

Thanks in advance,

Emanuele
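Two ways to trade a little precision or code for memory, as a sketch (not from the thread; the exact savings depend on the NumPy version): use a float32 accumulator via the dtype argument of std(), or process the array one slab at a time so only a fraction of it is upcast to float at once.
---
import numpy

a = numpy.random.randint(100, size=(50, 50, 50, 200))

# Option 1: float32 accumulator, which can roughly halve the float
# temporaries (at some cost in precision).
b32 = a.std(3, dtype=numpy.float32)

# Option 2: compute std over the last axis one slab (along axis 0) at a
# time, so only a (50, 50, 200) block is in floating point at once.
b = numpy.empty(a.shape[:3])
for i in range(a.shape[0]):
    b[i] = a[i].std(2)  # axis 3 of a is axis 2 of a[i]
---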
Re: [Numpy-discussion] histogram2d bug?
An even simpler example generating the same error:

import numpy
x = numpy.array([0, 0])
numpy.histogram2d(x, x)

HTH,

Emanuele

Emanuele Olivetti wrote:
> While using histogram2d on simple examples I got these errors:
>
> >>> import numpy
> >>> x = numpy.array([0, 0])
> >>> y = numpy.array([0, 1])
> >>> numpy.histogram2d(x, y, bins=[2, 2])
> Warning: divide by zero encountered in log10
> ---------------------------------------------------------------------------
> exceptions.OverflowError                  Traceback (most recent call last)
>
> /home/ele/<ipython console>
>
> /usr/lib/python2.4/site-packages/numpy/lib/twodim_base.py in
> histogram2d(x, y, bins, range, normed, weights)
>     180     if N != 1 and N != 2:
>     181         xedges = yedges = asarray(bins, float)
>     182         bins = [xedges, yedges]
> --> 183     hist, edges = histogramdd([x,y], bins, range, normed, weights)
>     184     return hist, edges[0], edges[1]
>
> /usr/lib/python2.4/site-packages/numpy/lib/function_base.py in
> histogramdd(sample, bins, range, normed, weights)
>     206             decimal = int(-log10(dedges[i].min())) + 6
>     207             # Find which points are on the rightmost edge.
> --> 208             on_edge = where(around(sample[:,i], decimal) == around(edges[i][-1], decimal))[0]
>     209             # Shift these points one bin to the left.
>     210             Ncount[i][on_edge] -= 1
>
> /usr/lib/python2.4/site-packages/numpy/core/fromnumeric.py in round_(a, decimals, out)
>     687     except AttributeError:
>     688         return _wrapit(a, 'round', decimals, out)
> --> 689     return round(decimals, out)
>     690
>     691 around = round_
>
> OverflowError: long int too large to convert to int
>
> >>> numpy.__version__
> '1.0.3.dev3719'
>
> Hope this report helps,
>
> Emanuele
Re: [Numpy-discussion] histogram2d bug?
David Huard wrote:
> Hi Emanuele,
>
> The bug is due to a part of the code that shifts the last bin's position
> to make sure the array's maximum value is counted in the last bin, and
> not as an outlier. To do so, the code computes an approximate precision
> used to shift the bin edge by an amount small compared to the array's
> values. In your example, since all values in x are identical, the
> precision is ``infinite''.
>
> So my question is, what kind of behaviour would you be expecting in this
> case for the automatic placement of bin edges? That is, given
>
> x : array of identical values, e.g. [0, 0, 0, 0, 0, ..., 0]
> smin, smax = x.min(), x.max()
>
> how do you select the bin edges?
>
> One solution is to use the same scheme used by histogram:
>
> if smin == smax:
>     edges[i] = linspace(smin - .5, smax + .5, nbin[i] + 1)
>
> Would that be ok? I'll submit a patch.
>
> David

The histogram solution seems OK to me; I can't see any drawbacks. My concern is about not getting exceptions in degenerate cases, like the example I sent. I need to estimate many probability distributions by counting samples efficiently, so the histogram* functions are really handy. Please submit the patch. By the way, the same issue affects histogramdd.

Thanks a lot,

Emanuele
[Numpy-discussion] numpy.random.permutation bug?
Look at this:

--bug.py---
import numpy
a = numpy.array([1, 2])
b = a.sum()
print type(b)
c = numpy.random.permutation(b)
---

If I run it (Python 2.5, numpy 1.0.1 on a Linux box) I get:
---
# python /tmp/bug.py
<type 'numpy.int32'>
Traceback (most recent call last):
  File "/tmp/bug.py", line 5, in <module>
    c = numpy.random.permutation(b)
  File "mtrand.pyx", line 1227, in mtrand.RandomState.permutation
  File "mtrand.pyx", line 1211, in mtrand.RandomState.shuffle
TypeError: len() of unsized object
---
permutation() likes 'int' and dislikes 'numpy.int32' integers :( This seems like a bug.

HTH,

Emanuele
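A hedged workaround sketch until this is resolved: convert the numpy scalar to a plain Python int before calling permutation.
---
import numpy

a = numpy.array([1, 2])
b = a.sum()
c = numpy.random.permutation(int(b))  # int() makes permutation happy
---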