Re: [Numpy-discussion] performance of numpy.array()

2015-04-30 Thread simona bellavista
I have seen a big improvement in performance with numpy 1.9.2 with Python
2.7.8: numpy.array takes 5 s instead of 300 s.

On the other hand, I have also tried numpy 1.9.2 and 1.9.0 with Python 3.4
and the results are terrible: numpy.array takes 20 s, but the other routines
are slowed down, for example concatenate, astype, copy and uniform. Most of
all, the sort function of numpy.ndarray is slowed down by a factor of at
least 10.
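
(A minimal way to time repeated numpy.array calls, for anyone who wants to
reproduce this; the array size and repeat counts are made up for illustration:)

import timeit

setup = ("import numpy as np; "
         "arrs = [np.random.uniform(size=1000) for _ in range(3)]")
# total seconds for 10^5 np.array calls
print(timeit.timeit("np.array(arrs)", setup=setup, number=10**5))
# and for the ndarray.sort slowdown mentioned above
print(timeit.timeit("np.array(arrs).sort()", setup=setup, number=10**4))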

On the other cluster I am using Python 3.3 with numpy 1.9.0 and it is
working very well (but I think that is partly because of the hardware). I
was trying to install Python 3.3 on this cluster, but because of other
issues (a compile-time error in the h5py library and a runtime bug in the
dill library) I cannot test it right now.

2015-04-29 17:47 GMT+02:00 Sebastian Berg sebast...@sipsolutions.net:

 There was a major improvement to np.array in some cases.

 You can probably work around this by using np.concatenate instead of
 np.array in your case (this depends on the use case, but I will guess you
 have code doing:

 np.array([arr1, arr2, arr3])

 or similar). If your use case is different, you may be out of luck and
 only an upgrade would help.
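
(For reference, a minimal sketch of that workaround, assuming three
equal-length 1-D arrays; the sizes are placeholders:)

import numpy as np

arr1 = arr2 = arr3 = np.random.uniform(size=1000)      # stand-ins for illustration
a = np.array([arr1, arr2, arr3])                       # stacks into shape (3, 1000)
b = np.concatenate([arr1, arr2, arr3]).reshape(3, -1)  # same result, avoids np.array
assert (a == b).all()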


 On Mi, 2015-04-29 at 17:41 +0200, Nick Papior Andersen wrote:
  You could try and install your own numpy to check whether that
  resolves the problem.
 
  2015-04-29 17:40 GMT+02:00 simona bellavista afy...@gmail.com:
  on cluster A 1.9.0 and on cluster B 1.8.2
 
  2015-04-29 17:18 GMT+02:00 Nick Papior Andersen
  nickpap...@gmail.com:
  Compile it yourself to know the limitations/benefits
  of the dependency libraries.
 
 
  Otherwise, have you checked which versions of numpy
  they are, i.e. are they the same version?
 
 
  2015-04-29 17:05 GMT+02:00 simona bellavista
  afy...@gmail.com:
 
                                   I work on two distinct scientific clusters. I
                                   have run the same python code on the two
                                   clusters and I have noticed that one is faster
                                   by an order of magnitude than the other (1 min
                                   vs 10 min; this is important because I run this
                                   function many times).


                                   I have investigated with a profiler and I have
                                   found that the cause of this (same code and
                                   same data) is the function numpy.array, which
                                   is being called 10^5 times. On cluster A it
                                   takes 2 s in total, whereas on cluster B it
                                   takes ~6 min. As for the other functions, they
                                   are generally faster on cluster A. I understand
                                   that the clusters are quite different, both in
                                   hardware and in installed libraries. It strikes
                                   me that on this particular function the
                                   performance is so different. I would have
                                   thought that this is due to a difference in the
                                   available memory, but actually, looking with
                                   `top`, memory usage seems to be only 0.1% on
                                   cluster B. In theory numpy is compiled with
                                   ATLAS on cluster B; on cluster A it is not
                                   clear, because numpy.__config__.show() returns
                                   NOT AVAILABLE for everything.
 
 
                                   Does anybody have any insight on that, and
                                   whether I can improve the performance on
                                   cluster B?
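
(One way to check what a numpy build actually links against; the ldd line
below assumes a typical Linux layout for the compiled extension:)

python -c "import numpy; numpy.__config__.show()"
# if everything prints NOT AVAILABLE, inspect the extension module directly:
ldd $(python -c "import numpy.core.multiarray as m; print(m.__file__)")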
 
 
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] performance of numpy.array()

2015-04-30 Thread Ryan Nelson
I have had good luck with Continuum's Miniconda Python distributions on
Linux.
http://conda.pydata.org/miniconda.html
The `conda` command makes it very easy to create specific testing
environments for Python 2 and 3 with many different packages. Everything is
precompiled, so you won't have to worry about system library differences
between the two clusters.
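
For instance, a sketch of what that setup could look like (the environment
names and version pins are just for illustration):

conda create -n np-py27 python=2.7 numpy=1.9.2
conda create -n np-py34 python=3.4 numpy=1.9.2 h5py
source activate np-py34   # then run the benchmark in that environment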

Hope that helps.

Ryan

On Thu, Apr 30, 2015 at 10:03 AM, simona bellavista afy...@gmail.com
wrote:

 I have seen a big improvement in performance with numpy 1.9.2 with Python
 2.7.8: numpy.array takes 5 s instead of 300 s.

 On the other hand, I have also tried numpy 1.9.2 and 1.9.0 with Python 3.4
 and the results are terrible: numpy.array takes 20 s, but the other routines
 are slowed down, for example concatenate, astype, copy and uniform. Most of
 all, the sort function of numpy.ndarray is slowed down by a factor of at
 least 10.

 On the other cluster I am using Python 3.3 with numpy 1.9.0 and it is
 working very well (but I think that is partly because of the hardware). I
 was trying to install Python 3.3 on this cluster, but because of other
 issues (a compile-time error in the h5py library and a runtime bug in the
 dill library) I cannot test it right now.

 [...]

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] code snippet: assert all close or large

2015-04-30 Thread josef.pktd
Sorry, hit the wrong key.

Just an example that I think is not covered by the numpy.testing asserts:

an absolute tolerance for `inf`: assert that x and y are allclose, or that x
is large where y is inf.

On Thu, Apr 30, 2015 at 2:24 PM, josef.p...@gmail.com wrote:




import numpy as np
from numpy.testing import assert_allclose, assert_array_less


def assert_allclose_large(x, y, rtol=1e-6, atol=0, ltol=1e30):
    """Assert x and y are allclose, or that x is large where y is inf."""
    # positions where y is inf but x is still finite
    mask_inf = np.isinf(y) & ~np.isinf(x)
    # the remaining entries must agree to the usual tolerances
    assert_allclose(x[~mask_inf], y[~mask_inf], rtol=rtol, atol=atol)
    # where y is inf, x must at least exceed the "large" threshold
    assert_array_less(ltol, x[mask_inf])
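
A quick usage sketch (the values are made up for illustration):

x = np.array([1.0, 2.0, 1e31])
y = np.array([1.0, 2.0, np.inf])
assert_allclose_large(x, y)  # passes: finite parts match and 1e31 > ltol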


Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: SciPy 2015 Tutorial Schedule Posted - Register Today - Already 30% Sold Out

2015-04-30 Thread Courtenay Godshall (Enthought)
**The #SciPy2015 Conference (Scientific Computing with #Python) Tutorial
Schedule is up! It is first come, first served and already 30% sold out.
Register today!** http://www.scipy2015.scipy.org/ehome/115969/289057/
This year you can choose from 16 different SciPy tutorials OR select the
two-day Software Carpentry course on scientific Python that assumes some
programming experience but no Python knowledge. Please share! Tutorials
include:

 

* Introduction to NumPy (Beginner)
* Machine Learning with Scikit-Learn (Intermediate)
* Cython: Blend of the Best of Python and C/C++ (Intermediate)
* Image Analysis in Python with SciPy and Scikit-Image (Intermediate)
* Analyzing and Manipulating Data with Pandas (Beginner)
* Machine Learning with Scikit-Learn (Advanced)
* Building Python Data Applications with Blaze and Bokeh (Intermediate)
* Multibody Dynamics and Control with Python (Intermediate)
* Anatomy of Matplotlib (Beginner)
* Computational Statistics I (Intermediate)
* Efficient Python for High-Performance Parallel Computing (Intermediate)
* Geospatial Data with Open Source Tools in Python (Intermediate)
* Decorating Drones: Using Drones to Delve Deeper into Intermediate Python (Intermediate)
* Computational Statistics II (Intermediate)
* Modern Optimization Methods in Python (Advanced)
* Jupyter Advanced Topics Tutorial (Advanced)

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion




Re: [Numpy-discussion] numpy vendor repo

2015-04-30 Thread Ralf Gommers
On Mon, Apr 27, 2015 at 5:20 PM, Ralf Gommers ralf.gomm...@gmail.com
wrote:




 On Mon, Apr 27, 2015 at 5:04 PM, Peter Cock p.j.a.c...@googlemail.com
 wrote:

 On Mon, Apr 27, 2015 at 1:04 PM, Ralf Gommers ralf.gomm...@gmail.com
 wrote:
 
  Done in the master branch of https://github.com/rgommers/vendor. I
 think
  that "numpy-vendor" is a better repo name than "vendor" (which is pretty
  much meaningless outside of the numpy github org), so I propose to push
 my
  master branch to https://github.com/numpy/numpy-vendor and remove the
  current https://github.com/numpy/vendor repo.
 
  I'll do this in a couple of days, unless there are objections by then.
 
  Ralf

 Can you not just rename the repository on GitHub?


 Yes, that is possible. The difference is small in this case (retaining the
 1 closed PR fixing a typo; there are no issues), but after looking it up I
 think renaming is a bit less work than creating a new repo. So I'll rename.


This is done now.

If anyone wants to give it a try, the instructions in README.txt should
work for producing working Windows installers. The only thing that's not
covered is the installation and use of Vagrant itself, because the former is
platform-dependent and the latter is basically only the very first terminal
line of http://docs.vagrantup.com/v2/getting-started/ (but read on for a
bit, it's useful).
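
(In practice that amounts to something like the following, assuming the repo
ships a Vagrantfile; the exact box setup is whatever README.txt specifies:)

vagrant up    # build and boot the VM described by the Vagrantfile
vagrant ssh   # open a shell inside it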

Feedback on how to document that repo better is very welcome. I'll also be
looking soon at improving the documentation of how to release and what a
release manager does.

Cheers,
Ralf
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy vendor repo

2015-04-30 Thread Ralf Gommers
On Thu, Apr 30, 2015 at 9:32 PM, Ralf Gommers ralf.gomm...@gmail.com
wrote:



 [...]

 This is done now.


One other thing: it would be good to agree on how we deal with updates to
that repo. The users of numpy-vendor can be counted on one hand at the
moment, so we should probably be less formal about it than for our other
repos. How about: everyone can push simple doc and maintenance updates
directly, and more interesting changes go through a PR that the author can
merge himself after a couple of days? That at least ensures that everyone
who follows the repo gets notified of nontrivial changes.

Ralf





___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion