Re: [Numpy-discussion] performance of numpy.array()
I have seen a big improvement in performance with numpy 1.9.2 with python 2.7.8: numpy.array takes 5 s instead of 300 s. On the other hand, I have also tried numpy 1.9.2 and 1.9.0 with python 3.4 and the results are terrible: numpy.array takes 20 s, but the other routines are slowed down, for example concatenate, astype, copy and uniform. Worst of all, the sort function of numpy.ndarray is slowed down by a factor of at least 10. On the other cluster I am using python 3.3 with numpy 1.9.0 and it is working very well (though I think that is partly due to the hardware). I was trying to install python 3.3 on this cluster, but because of other issues (a compile-time error in the h5py library and a runtime bug in the dill library) I cannot test it right now.

2015-04-29 17:47 GMT+02:00 Sebastian Berg sebast...@sipsolutions.net:
> There was a major improvement to np.array in some cases. You can probably work around this by using np.concatenate instead of np.array in your case (it depends on the use case, but I will guess you have code doing np.array([arr1, arr2, arr3]) or similar). If your use case is different, you may be out of luck and only an upgrade would help.

On Mi, 2015-04-29 at 17:41 +0200, Nick Papior Andersen wrote:
> You could try to install your own numpy to check whether that resolves the problem.

2015-04-29 17:40 GMT+02:00 simona bellavista afy...@gmail.com:
> on cluster A 1.9.0 and on cluster B 1.8.2

2015-04-29 17:18 GMT+02:00 Nick Papior Andersen nickpap...@gmail.com:
> Compile it yourself to know the limitations/benefits of the dependency libraries. Otherwise, have you checked which versions of numpy they are, i.e. are they the same version?

2015-04-29 17:05 GMT+02:00 simona bellavista afy...@gmail.com:
> I work on two distinct scientific clusters. I have run the same python code on the two clusters and I have noticed that one is faster than the other by an order of magnitude (1 min vs 10 min; this is important because I run this function many times).
> I have investigated with a profiler and I have found that the cause of this is the function numpy.array, which is called 10^5 times (same code and same data). On cluster A it takes 2 s in total, whereas on cluster B it takes ~6 min. As for the other functions, they are generally faster on cluster A. I understand that the clusters are quite different, both in hardware and in installed libraries; it strikes me that the performance differs so much on this particular function. I would have thought that this was due to a difference in the available memory, but looking with `top`, memory usage is only about 0.1% on cluster B. In theory numpy is compiled with ATLAS on cluster B; on cluster A it is not clear, because numpy.__config__.show() returns NOT AVAILABLE for everything. Does anybody have any insight on that, and on whether I can improve the performance on cluster B?

-- Kind regards Nick
___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
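Sebastian's suggested workaround can be sketched roughly as follows. The array names and shapes are made up for illustration (the original use case is not shown in the thread); on older numpy versions, np.array on a list of arrays had to inspect every element, while np.concatenate takes a faster path:

```python
import numpy as np

# Hypothetical stand-ins for the arrays built in the real code
arr1, arr2, arr3 = (np.arange(3, dtype=float) for _ in range(3))

# The reportedly slow pattern: np.array on a list of equal-length arrays
stacked = np.array([arr1, arr2, arr3])  # shape (3, 3)

# Possible workaround: concatenate, then restore the stacked shape
joined = np.concatenate([arr1, arr2, arr3]).reshape(3, 3)

assert np.array_equal(stacked, joined)
```

Whether this helps depends on the actual shapes involved; if the inputs are not all arrays of the same length, np.array and np.concatenate are not interchangeable this way.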
Re: [Numpy-discussion] performance of numpy.array()
I have had good luck with Continuum's Miniconda Python distributions on Linux: http://conda.pydata.org/miniconda.html. The `conda` command makes it very easy to create specific testing environments for Python 2 and 3 with many different packages. Everything is precompiled, so you won't have to worry about system library differences between the two clusters. Hope that helps. Ryan

On Thu, Apr 30, 2015 at 10:03 AM, simona bellavista afy...@gmail.com wrote:
> I have seen a big improvement in performance with numpy 1.9.2 with python 2.7.8 [...]
Re: [Numpy-discussion] code snippet: assert all close or large
Sorry, hit the wrong key. Just an example that I think is not covered by numpy.testing: an absolute tolerance for `inf`, i.e. assert that x and y are allclose, or that x is large where y is inf.

On Thu, Apr 30, 2015 at 2:24 PM, josef.p...@gmail.com wrote:

import numpy as np
from numpy.testing import assert_allclose, assert_array_less

def assert_allclose_large(x, y, rtol=1e-6, atol=0, ltol=1e30):
    """assert x and y are allclose, or x is large where y is inf"""
    mask_inf = np.isinf(y) & ~np.isinf(x)
    assert_allclose(x[~mask_inf], y[~mask_inf], rtol=rtol, atol=atol)
    assert_array_less(ltol, x[mask_inf])

Josef
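A quick usage sketch of Josef's helper (with the missing `&` restored in the mask; the example arrays are made up): positions where y is inf and x is finite are only required to exceed ltol, while all other positions go through the usual assert_allclose check.

```python
import numpy as np
from numpy.testing import assert_allclose, assert_array_less

def assert_allclose_large(x, y, rtol=1e-6, atol=0, ltol=1e30):
    """assert x and y are allclose, or x is large where y is inf"""
    mask_inf = np.isinf(y) & ~np.isinf(x)
    assert_allclose(x[~mask_inf], y[~mask_inf], rtol=rtol, atol=atol)
    assert_array_less(ltol, x[mask_inf])

# y has an inf where x is merely very large; the helper accepts this
x = np.array([1.0, 2.0, 5e30])
y = np.array([1.0, 2.0, np.inf])
assert_allclose_large(x, y)  # passes; plain assert_allclose would fail here
```

Note that positions where both x and y are inf fall through to assert_allclose, which treats matching infs as equal.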
[Numpy-discussion] ANN: SciPy 2015 Tutorial Schedule Posted - Register Today - Already 30% Sold Out
**The #SciPy2015 Conference (Scientific Computing with #Python) Tutorial Schedule is up! It is 1st come, 1st served and already 30% sold out. Register today!** http://www.scipy2015.scipy.org/ehome/115969/289057/

This year you can choose from 16 different SciPy tutorials, OR select the 2-day Software Carpentry course on scientific Python that assumes some programming experience but no Python knowledge. Please share!

Tutorials include:
* Introduction to NumPy (Beginner)
* Machine Learning with Scikit-Learn (Intermediate)
* Cython: Blend of the Best of Python and C/C++ (Intermediate)
* Image Analysis in Python with SciPy and Scikit-Image (Intermediate)
* Analyzing and Manipulating Data with Pandas (Beginner)
* Machine Learning with Scikit-Learn (Advanced)
* Building Python Data Applications with Blaze and Bokeh (Intermediate)
* Multibody Dynamics and Control with Python (Intermediate)
* Anatomy of Matplotlib (Beginner)
* Computational Statistics I (Intermediate)
* Efficient Python for High-Performance Parallel Computing (Intermediate)
* Geospatial Data with Open Source Tools in Python (Intermediate)
* Decorating Drones: Using Drones to Delve Deeper into Intermediate Python (Intermediate)
* Computational Statistics II (Intermediate)
* Modern Optimization Methods in Python (Advanced)
* Jupyter Advanced Topics Tutorial (Advanced)
Re: [Numpy-discussion] numpy vendor repo
On Mon, Apr 27, 2015 at 5:20 PM, Ralf Gommers ralf.gomm...@gmail.com wrote:
> On Mon, Apr 27, 2015 at 5:04 PM, Peter Cock p.j.a.c...@googlemail.com wrote:
>> On Mon, Apr 27, 2015 at 1:04 PM, Ralf Gommers ralf.gomm...@gmail.com wrote:
>>> Done in the master branch of https://github.com/rgommers/vendor. I think that numpy-vendor is a better repo name than vendor (which is pretty much meaningless outside of the numpy github org), so I propose to push my master branch to https://github.com/numpy/numpy-vendor and remove the current https://github.com/numpy/vendor repo. I'll do this in a couple of days, unless there are objections by then. Ralf
>> Can you not just rename the repository on GitHub?
> Yes, that is possible. The difference is small in this case (retaining the 1 closed PR fixing a typo; there are no issues), but after looking it up I think renaming is a bit less work than creating a new repo. So I'll rename.

This is done now. If anyone wants to give it a try, the instructions in README.txt should work for producing working Windows installers. The only thing that's not covered is the install and use of Vagrant itself, because the former is platform-dependent and the latter is basically only the very first terminal line of http://docs.vagrantup.com/v2/getting-started/ (but read on for a bit, it's useful). Feedback on how to document that repo better is very welcome. I'll be looking at improving the documentation of how to release and what a release manager does as well soon. Cheers, Ralf
Re: [Numpy-discussion] numpy vendor repo
On Thu, Apr 30, 2015 at 9:32 PM, Ralf Gommers ralf.gomm...@gmail.com wrote:
> This is done now. [...]

One other thing: it would be good to agree on how we deal with updates to that repo. The users of numpy-vendor can be counted on one hand at the moment, so we probably should be less formal about it than for our other repos. How about: everyone can push simple doc and maintenance updates directly, and more interesting changes go through a PR that the author can merge himself after a couple of days? That at least ensures that everyone who follows the repo gets notified of nontrivial changes. Ralf