Re: [Numpy-discussion] "import numpy" performance
On 10 July 2012 09:05, Andrew Dalke wrote:
> On Jul 8, 2012, at 9:22 AM, Scott Sinclair wrote:
>> On 6 July 2012 15:48, Andrew Dalke wrote:
>>> I followed the instructions at
>>> http://docs.scipy.org/doc/numpy/dev/gitwash/patching.html
>>> and added Ticket #2181 (with patch) ...
>>
>> Those instructions need to be updated to reflect the current preferred
>> practice. You'll make code review easier and increase the chances of
>> getting your patch accepted by submitting the patch as a Github pull
>> request instead (see
>> http://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html
>> for a how-to). It's not very much extra work.
>
> Both of those URLs point to related documentation under the same
> root, so I assumed that both are equally valid.

That's a valid assumption.

> I did look at the development_workflow documentation, and am already
> bewildered by the terms 'rebase', 'fast-forward', etc. It seems that
> last week I made a mistake because I did a "git pull" on my local copy
> (which is what I do with Mercurial to get the current trunk code)
> instead of:
>
>     git fetch followed by git rebase, git merge --ff-only or
>     git merge --no-ff, depending on what you intend.
>
> I don't know if I made a "common mistake", and I don't know "what [I]
> intend."

Fair enough, new terminology is seldom fun. Using git pull wasn't
necessary in your case, and neither was git rebase.

> I realize that for someone who plans to be a long-term contributor,
> understanding git, github, and the NumPy development model is
> "not very much extra work", but in terms of extra work for me,
> or at least minimizing my level of confusion, I would rather do
> what the documentation suggests and continue with the submitted
> patch.

By "not very much extra work" I assumed that you'd already done most of
the legwork towards submitting a pull request (Github account, forking
the numpy repo, etc.). My mistake, I now retract that statement :) and
have submitted your patch as https://github.com/numpy/numpy/pull/334
as a peace offering.

Cheers,
Scott
Re: [Numpy-discussion] "import numpy" performance
Andrew,

Thank you for your comments. I agree it's confusing coming to github at
first. I still have to refer to the jargon file to understand what
everything means. There are a lot of unfamiliar terms.

Thank you for your patches. Plain patches do imply more work for the
NumPy developers, which is why we prefer the github pull-request
mechanism, but having patches is better than not having them. Having an
easy way to upload a patch somewhere is something to think about with
the intended move to the github issue tracker.

Best,

-Travis

On Jul 10, 2012, at 2:05 AM, Andrew Dalke wrote:
> On Jul 8, 2012, at 9:22 AM, Scott Sinclair wrote:
>> On 6 July 2012 15:48, Andrew Dalke wrote:
>>> I followed the instructions at
>>> http://docs.scipy.org/doc/numpy/dev/gitwash/patching.html
>>> and added Ticket #2181 (with patch) ...
>>
>> Those instructions need to be updated to reflect the current preferred
>> practice. You'll make code review easier and increase the chances of
>> getting your patch accepted by submitting the patch as a Github pull
>> request instead (see
>> http://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html
>> for a how-to). It's not very much extra work.
>
> Both of those URLs point to related documentation under the same
> root, so I assumed that both are equally valid. The 'patching' one I
> linked to says:
>
>     Making a patch is the simplest and quickest, but if you're going to be
>     doing anything more than simple quick things, please consider following
>     the Git for development model instead.
>
> That really fits me the best, because I don't know git or github, and
> I don't plan to get involved in numpy development other than two patches
> (one already posted, and the other, after my holiday, to get rid of
> the required numpy.testing import).
>
> I did look at the development_workflow documentation, and am already
> bewildered by the terms 'rebase', 'fast-forward', etc. It seems that
> last week I made a mistake because I did a "git pull" on my local copy
> (which is what I do with Mercurial to get the current trunk code)
> instead of:
>
>     git fetch followed by git rebase, git merge --ff-only or
>     git merge --no-ff, depending on what you intend.
>
> I don't know if I made a "common mistake", and I don't know "what [I]
> intend."
>
> I realize that for someone who plans to be a long-term contributor,
> understanding git, github, and the NumPy development model is
> "not very much extra work", but in terms of extra work for me,
> or at least minimizing my level of confusion, I would rather do
> what the documentation suggests and continue with the submitted
> patch.
>
> Andrew
> da...@dalkescientific.com
Re: [Numpy-discussion] "import numpy" performance
On Jul 8, 2012, at 9:22 AM, Scott Sinclair wrote:
> On 6 July 2012 15:48, Andrew Dalke wrote:
>> I followed the instructions at
>> http://docs.scipy.org/doc/numpy/dev/gitwash/patching.html
>> and added Ticket #2181 (with patch) ...
>
> Those instructions need to be updated to reflect the current preferred
> practice. You'll make code review easier and increase the chances of
> getting your patch accepted by submitting the patch as a Github pull
> request instead (see
> http://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html
> for a how-to). It's not very much extra work.

Both of those URLs point to related documentation under the same
root, so I assumed that both are equally valid. The 'patching' one I
linked to says:

    Making a patch is the simplest and quickest, but if you're going to be
    doing anything more than simple quick things, please consider following
    the Git for development model instead.

That really fits me the best, because I don't know git or github, and
I don't plan to get involved in numpy development other than two patches
(one already posted, and the other, after my holiday, to get rid of
the required numpy.testing import).

I did look at the development_workflow documentation, and am already
bewildered by the terms 'rebase', 'fast-forward', etc. It seems that
last week I made a mistake because I did a "git pull" on my local copy
(which is what I do with Mercurial to get the current trunk code)
instead of:

    git fetch followed by git rebase, git merge --ff-only or
    git merge --no-ff, depending on what you intend.

I don't know if I made a "common mistake", and I don't know "what [I]
intend."

I realize that for someone who plans to be a long-term contributor,
understanding git, github, and the NumPy development model is
"not very much extra work", but in terms of extra work for me,
or at least minimizing my level of confusion, I would rather do
what the documentation suggests and continue with the submitted
patch.

Andrew
da...@dalkescientific.com
Re: [Numpy-discussion] "import numpy" performance
On 6 July 2012 15:48, Andrew Dalke wrote:
> I followed the instructions at
> http://docs.scipy.org/doc/numpy/dev/gitwash/patching.html
> and added Ticket #2181 (with patch) at
> http://projects.scipy.org/numpy/ticket/2181

Those instructions need to be updated to reflect the current preferred
practice. You'll make code review easier and increase the chances of
getting your patch accepted by submitting the patch as a Github pull
request instead (see
http://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html
for a how-to). It's not very much extra work.

Cheers,
Scott
Re: [Numpy-discussion] "import numpy" performance
I followed the instructions at
http://docs.scipy.org/doc/numpy/dev/gitwash/patching.html
and added Ticket #2181 (with patch) at
http://projects.scipy.org/numpy/ticket/2181

This removes the five 'exec' calls from polynomial/*.py and improves
the 'import numpy' time by about 25-30%. That is, on my laptop

    python -c 'import time; t1=time.time(); import numpy; print time.time()-t1'

goes from 0.079 seconds to 0.057 (best of 10 for both cases).

The patch does mean that if someone edits the template then they will
need to run the template expansion script manually. I think it's well
worth the effort.

Cheers,
Andrew
da...@dalkescientific.com
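(For anyone repeating this kind of measurement: the one-liner above
times a single hot-cache import. A small harness in the same spirit --
a sketch, not necessarily Andrew's exact methodology -- spawns a fresh
interpreter per run, since a second "import numpy" in the same process
is nearly free, and reports the best of N runs:)

    import subprocess, sys

    def best_import_time(module, repeats=10):
        # Each run gets a fresh interpreter so that sys.modules
        # caching doesn't hide the real import cost.
        code = ("import time; t1 = time.time(); import %s; "
                "print time.time() - t1" % module)
        # subprocess.check_output requires Python 2.7+.
        times = [float(subprocess.check_output([sys.executable, "-c", code]))
                 for _ in range(repeats)]
        return min(times)

    print best_import_time("numpy")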
Re: [Numpy-discussion] "import numpy" performance
On Tue, Jul 3, 2012 at 1:16 AM, Andrew Dalke wrote:
> On Jul 3, 2012, at 12:46 AM, David Cournapeau wrote:
>> It is indeed irrelevant to your end goal, but it does affect the
>> interpretation of what import_array does, and thus of your benchmark
>
> Indeed.
>
>> Focusing on polynomial seems the only sensible action. Except for
>> test, all the other stuff seems difficult to change without breaking
>> anything.
>
> I confirm that when I comment out numpy/__init__.py's "import polynomial"
> then the import time for numpy.core.multiarray goes from
>
>     0.084u 0.031s 0:00.11 100.0%  0+0k 0+0io 0pf+0w
>
> to
>
>     0.058u 0.028s 0:00.08 87.5%  0+0k 0+0io 0pf+0w
>
> numpy/polynomial imports:
>
>     from polynomial import Polynomial
>     from chebyshev import Chebyshev
>     from legendre import Legendre
>     from hermite import Hermite
>     from hermite_e import HermiteE
>     from laguerre import Laguerre
>
> and there's no easy way to make these be lazy imports.
>
> Strange! The bottom of hermite.py has:
>
>     exec polytemplate.substitute(name='Hermite', nick='herm', domain='[-1,1]')
>
> as well as similar code in laguerre.py, chebyshev.py, hermite_e.py,
> and polynomial.py.
>
> I bet there's a lot of overhead generating and exec'ing
> those for each import!

Looks like it. That could easily be done at build time though. Making
that change and your proposed change to the test functions, which seems
fine, will likely be enough to reach your 40% target. No need for new
imports or lazy loading then, I hope.

Ralf
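(To make "done at build time" concrete: a hedged sketch of a one-off
generation step -- script and output file names here are hypothetical,
and it assumes only the polytemplate.substitute() interface shown in
the exec lines quoted above; the domain values are illustrative:)

    # expand_polytemplate.py -- hypothetical build-time expansion step.
    from numpy.polynomial.polytemplate import polytemplate

    # (class name, nickname, domain) triples, one per generated module.
    CLASSES = [
        ('Polynomial', 'poly',  '[-1,1]'),
        ('Chebyshev',  'cheb',  '[-1,1]'),
        ('Legendre',   'leg',   '[-1,1]'),
        ('Hermite',    'herm',  '[-1,1]'),
        ('HermiteE',   'herme', '[-1,1]'),
        ('Laguerre',   'lag',   '[0,1]'),
    ]

    for name, nick, domain in CLASSES:
        code = polytemplate.substitute(name=name, nick=nick, domain=domain)
        with open('_%s_generated.py' % nick, 'w') as out:
            out.write(code)

Each class module then becomes a thin "from _herm_generated import
Hermite"-style import instead of generating and exec'ing the template
on every startup.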
Re: [Numpy-discussion] "import numpy" performance
On Mon, Jul 2, 2012 at 2:59 PM, Nathaniel Smith wrote:
> On Mon, Jul 2, 2012 at 10:06 PM, Robert Kern wrote:
>> On Mon, Jul 2, 2012 at 9:43 PM, Benjamin Root wrote:
>>> On Mon, Jul 2, 2012 at 4:34 PM, Nathaniel Smith wrote:
>>>> I think this ship has sailed, but it'd be worth looking into lazy
>>>> importing, where 'numpy.fft' isn't actually imported until someone
>>>> starts using it. There are a bunch of libraries that do this, and one
>>>> would have to fiddle to get compatibility with all the different
>>>> python versions and make sure you're not killing performance (might
>>>> have to be in C), but something along the lines of
>>>>
>>>>     class _FFTModule(object):
>>>>         def __getattribute__(self, name):
>>>>             mod = importlib.import_module("numpy.fft")
>>>>             _FFTModule.__getattribute__ = mod.__getattribute__
>>>>             return getattr(mod, name)
>>>>     fft = _FFTModule()
>>>
>>> Not sure how this would impact projects like ipython that do
>>> tab-completion support, but I know that that would drive me nuts in
>>> my basic tab-completion setup I have for my regular python terminal.
>>> Of course, in the grand scheme of things, that really isn't all that
>>> important, I don't think.
>>
>> We used to do it for scipy. It did interfere with tab completion. It
>> did drive many people nuts.
>
> Sounds like a bug in your old code, or else the REPLs have gotten
> better? I just pasted the above code into both ipython and python
> prompts, and typing 'fft.<TAB>' worked fine in both cases. dir(fft)
> works first try as well.
>
> (If you try this, don't forget to 'import importlib' first, and note
> importlib is 2.7+ only. Obviously importlib is not necessary, but it
> makes the minimal example less tedious.)

For anyone interested, I worked out a small lazy-loading class that we
use in nitime [1], which does not need importlib and thus works on
python versions before 2.7, and which also has a bit of repr pretty
printing.

I wrote about this to Scipy-Dev [2], and in the original nitime PR [3]
tested that it works in python 2.5, 2.6, 2.7, 3.0, 3.1 and 3.2. Since
that time, we've only changed how we deal with the one known
limitation: reloading a lazily-loaded module was a no-op in that PR,
but now throws an error (there's one line commented out if the no-op
behavior is preferred).

Here's a link to the rendered docs [4], but if you just grab the
LazyImport class from [1], you can do

    fft = LazyImport('numpy.fft')

1. https://github.com/nipy/nitime/blob/master/nitime/lazyimports.py
2. http://mail.scipy.org/pipermail/scipy-dev/2011-September/016606.html
3. https://github.com/nipy/nitime/pull/88
4. http://nipy.sourceforge.net/nitime/api/generated/nitime.lazyimports.html#module-nitime.lazyimports

best,
--
Paul Ivanov
314 address only used for lists, off-list direct email at:
http://pirsquared.org | GPG/PGP key id: 0x0F3E28F7
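(The actual nitime class is linked above; purely for illustration, a
stripped-down sketch of the same idea -- not the nitime code, and
without its repr pretty-printing or reload handling -- looks like
this:)

    import types

    class LazyImport(types.ModuleType):
        """Defer the real import until the first attribute access."""
        def __init__(self, modname):
            super(LazyImport, self).__init__(modname)
            self._modname = modname

        def __getattr__(self, name):
            # Only reached for attributes not on this stub, i.e. for
            # everything the real module defines. After the first call,
            # __import__ is just a cheap sys.modules lookup.
            module = __import__(self._modname, fromlist=[''])
            return getattr(module, name)

    fft = LazyImport('numpy.fft')
    fft.ifft  # first access triggers the actual import of numpy.fft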
Re: [Numpy-discussion] "import numpy" performance
On Mon, Jul 2, 2012 at 12:17 PM, Andrew Dalke wrote:
> In this email I propose a few changes which I think are minor
> and which don't really affect the external NumPy API but which
> I think could improve the "import numpy" performance by at
> least 40%.

+1 -- I think I remember that thread -- at the time, I was experiencing
some really, really slow import times myself -- it turned out to be
something really weird with my system (though I don't remember exactly
what), but numpy still is too big an import.

Another note -- I ship stuff with py2exe and friends a fair bit --
numpy's "import a whole bunch of stuff you may well not be using"
approach means I have to include all that stuff, or hack the heck out
of numpy -- not ideal.

> 1) remove "add_newdocs" and put the docstrings in the C code
>    'add_newdocs' still needs to be there,
>
> The code says:
>
>     # This is only meant to add docs to objects defined in C-extension modules.
>     # The purpose is to allow easier editing of the docstrings without
>     # requiring a re-compile.

+1 -- isn't it better for the docs to be with the code, anyway?

> 2) Don't optimistically assume that all submodules are
>    needed. For example, some current code uses
>
>     import numpy
>     numpy.fft.ifft

+1 -- see above. Really, what fraction of code uses fft and polynomial,
and so on? "Namespaces are one honking great idea." I appreciate the
legacy, and the ease of use at the interpreter, but it sure would be
nice to clean this up -- maybe keep the legacy behavior by having a new
import (see the sketch after this message):

    import just_numpy as np

that would import the core stuff, and offer the "extra" packages as
specific imports -- ideally, we'd deprecate the old way, recommend the
extra importing for the future, and some day have "numpy" and
"numpy_plus". (Kind of like pylab, I suppose.)

Lazy importing may work OK, too, though it's more awkward for py2exe
and friends, and perhaps a bit "magic" for my taste.

> 3) Especially: don't always import 'numpy.testing'

+1

> I have not worried about numpy import performance for
> 4 years. While I have been developing scientific software
> for 20 years, and in Python for 15 years, it has been
> in areas of biology and chemistry which don't use arrays.

Remarkable -- I use arrays for everything! Most of which are not
classic big arrays you process with lapack-type stuff ;-)

> yeah, it's just using the homogeneous array most of the time.

Exactly -- I know Travis says "if you're going to use numpy arrays,
use numpy", but they really are pretty darn handy even if you just use
them as containers.

Ben Root wrote:
> Not sure how this would impact projects like ipython that do
> tab-completion support, but I know that that would drive me nuts in my
> basic tab-completion setup I have for my regular python terminal. Of
> course, in the grand scheme of things, that really isn't all that
> important, I don't think.

I do think it's important to support easy interactive use, IPython,
etc. -- with nice tab completion, easy access to docstrings, etc. But
it should also be possible to not have all that where it isn't required
-- hence my "import numpy_plus" type proposal.

I never did get why the polynomial stuff was added to core numpy.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959  voice
7600 Sand Point Way NE   (206) 526-6329  fax
Seattle, WA 98115        (206) 526-6317  main reception

chris.bar...@noaa.gov
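(Purely to illustrate the shape of the just_numpy proposal above --
the module is hypothetical and does not exist; note the caveat in the
comment:)

    # just_numpy.py -- hypothetical trimmed namespace, for illustration
    # only. Caveat: with numpy's current layout, importing *any* numpy
    # submodule first runs numpy/__init__.py (which pulls in fft,
    # polynomial, testing, ...), so a real just_numpy would require
    # moving those imports out of numpy/__init__.py rather than simply
    # wrapping it like this.
    from numpy.core import (array, asarray, zeros, ones, empty,
                            arange, dot, ndarray, dtype)

The caveat is the real work: the import-time win only materializes once
numpy/__init__.py itself stops importing the extras eagerly.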
Re: [Numpy-discussion] "import numpy" performance
On Jul 3, 2012, at 12:46 AM, David Cournapeau wrote:
> It is indeed irrelevant to your end goal, but it does affect the
> interpretation of what import_array does, and thus of your benchmark

Indeed.

> Focusing on polynomial seems the only sensible action. Except for
> test, all the other stuff seems difficult to change without breaking
> anything.

I confirm that when I comment out numpy/__init__.py's "import polynomial"
then the import time for numpy.core.multiarray goes from

    0.084u 0.031s 0:00.11 100.0%  0+0k 0+0io 0pf+0w

to

    0.058u 0.028s 0:00.08 87.5%  0+0k 0+0io 0pf+0w

numpy/polynomial imports:

    from polynomial import Polynomial
    from chebyshev import Chebyshev
    from legendre import Legendre
    from hermite import Hermite
    from hermite_e import HermiteE
    from laguerre import Laguerre

and there's no easy way to make these be lazy imports.

Strange! The bottom of hermite.py has:

    exec polytemplate.substitute(name='Hermite', nick='herm', domain='[-1,1]')

as well as similar code in laguerre.py, chebyshev.py, hermite_e.py,
and polynomial.py.

I bet there's a lot of overhead generating and exec'ing
those for each import!

Andrew
da...@dalkescientific.com
Re: [Numpy-discussion] "import numpy" performance
On Jul 3, 2012, at 12:21 AM, Nathaniel Smith wrote:
> Yes, but for a proper benchmark we need to compare this to the number
> that we would get with some other implementation... I'm assuming you
> aren't proposing we just delete the docstrings :-).

I suspect that we have a different meaning of the term 'benchmark'. A
benchmark first establishes the baseline by which future
implementations are measured. Which is what I did. Once there are
changes, the benchmark, rerun, helps judge the usefulness of those
changes. This I did not do. I do not believe that a benchmark requires
the changed code as well before it can be considered a "proper
benchmark".

>> This says that 'add_newdocs', which is imported from
>> numpy.core.multiarray (though there may be other importers)
>> takes 0.038 seconds to go through __import__, including
>> all of its children module imports.
>
> There are no "children modules", all these modules refer to each
> other, and you're assuming that whichever module you happen to load
> first is responsible for all the other modules it happens to
> reference.

While I believe there is an "import tree" analogous to a "call tree",
and Python's import scheme helps ensure that it's a DAG (so that
'children modules' has a real meaning), you are correct in identifying
that I was only pointing out the first parent, and not all of the
parents.

add_newdocs is the first module to import 'numpy.lib', but after
further testing (I stubbed out the import and made a fake function), I
see that other modules import numpy.lib and there's no measurable
performance increase. I therefore retract my proposal to move the
documentation which is currently in add_newdocs into the C code.

>> With instrumentation I found that 0.083s of the 0.119s
>> is spent loading numpy.core.multiarray.
>
> The number 0.083 doesn't appear anywhere in that profile you pasted,
> so I don't know where this comes from...

I did not save the output of the run which I used for my original
email. It's easy to generate, so I just ran it again.

Cheers,
Andrew
da...@dalkescientific.com
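(A note on the "stubbed out the import" test mentioned above: one
common way to do this kind of experiment -- shown here as a
hypothetical sketch, not Andrew's actual code -- is to plant a
stand-in object in sys.modules before the real import can happen:)

    import sys
    import time

    class _Stub(object):
        # Empty __all__ keeps "from numpy.lib import *" harmless;
        # any other attribute lookup returns a do-nothing function,
        # so callers of stubbed-out functions keep running.
        __all__ = []
        def __getattr__(self, name):
            return lambda *args, **kwargs: None

    # Python consults sys.modules first, so the real numpy.lib
    # is never loaded and its import cost disappears.
    sys.modules['numpy.lib'] = _Stub()

    t0 = time.time()
    import numpy
    print time.time() - t0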
Re: [Numpy-discussion] "import numpy" performance
On Mon, Jul 2, 2012 at 11:15 PM, Andrew Dalke wrote:
> On Jul 2, 2012, at 11:38 PM, Fernando Perez wrote:
>> No, that's the wrong thing to test, because it effectively amounts to
>> 'import numpy', since the numpy __init__ file is still executed. As
>> David indicated, you must import multiarray.so by itself.
>
> I understand that clarification. However, it does not affect me.

It is indeed irrelevant to your end goal, but it does affect the
interpretation of what import_array does, and thus of your benchmark.

polynomial is definitely the big new overhead (I don't remember it
being significant last time I optimized numpy import times); it is
roughly 30% of the total cost of importing numpy (95 -> 70 ms total
time, of which numpy went from 70 to 50 ms). Then ctypeslib and test
are the two other significant ones.

I use profile_imports.py from bzr as follows:

    import sys
    import profile_imports
    profile_imports.install()
    import numpy
    profile_imports.log_stack_info(sys.stdout)

Focusing on polynomial seems the only sensible action. Except for
test, all the other stuff seems difficult to change without breaking
anything.

David
Re: [Numpy-discussion] "import numpy" performance
On Mon, Jul 2, 2012 at 11:15 PM, Andrew Dalke wrote:
> On Jul 2, 2012, at 11:38 PM, Fernando Perez wrote:
>> No, that's the wrong thing to test, because it effectively amounts to
>> 'import numpy', since the numpy __init__ file is still executed. As
>> David indicated, you must import multiarray.so by itself.
>
> I understand that clarification. However, it does not affect me.
>
> I do "import rdkit.Chem". This is all I really care about.
>
> That imports "rdkit.Chem.rdchem" which is a shared library.
>
> That shared library calls the C function/macro "import_array", which
> appears to be:
>
>     #define import_array() { if (_import_array() < 0) {PyErr_Print(); PyErr_SetString(PyExc_ImportError, "numpy.core.multiarray failed to import"); } }
>
> The _import_array looks to be defined via
> numpy/core/code_generators/generate_numpy_api.py
> which contains
>
>     static int
>     _import_array(void)
>     {
>       int st;
>       PyObject *numpy = PyImport_ImportModule("numpy.core.multiarray");
>       PyObject *c_api = NULL;
>       ...
>
> Thus, I don't see any way that I can import 'multiarray' directly,
> because the underlying C code is the one which imports
> 'numpy.core.multiarray' and by design it is inaccessible to change
> from Python code.
>
> Thus, the correct reference benchmark is "import numpy.core.multiarray"

Oh, I see. I withdraw my comment about how you shouldn't import
numpy.core.multiarray directly; I forgot import_array() does that.

-n
Re: [Numpy-discussion] "import numpy" performance
On Mon, Jul 2, 2012 at 3:15 PM, Andrew Dalke wrote:
> Thus, I don't see any way that I can import 'multiarray' directly,
> because the underlying C code is the one which imports
> 'numpy.core.multiarray' and by design it is inaccessible to change
> from Python code.

I was just referring to how David was benchmarking the cost of
multiarray in isolation, which can indeed be done, and is useful for
understanding the cumulative effect. Indeed, for your case it's the sum
total of what import_array does that ultimately matters, but it's still
useful to be able to understand these pieces in isolation.

Cheers,
f
Re: [Numpy-discussion] "import numpy" performance
On Mon, Jul 2, 2012 at 10:44 PM, Andrew Dalke wrote:
> On Jul 2, 2012, at 10:34 PM, Nathaniel Smith wrote:
>> I don't have any opinion on how acceptable this would be, but I also
>> don't see a benchmark showing how much this would help?
>
> The profile output was lower in that email. The relevant line is
>
>     0.038 add_newdocs (numpy.core.multiarray)

Yes, but for a proper benchmark we need to compare this to the number
that we would get with some other implementation... I'm assuming you
aren't proposing we just delete the docstrings :-).

> This says that 'add_newdocs', which is imported from
> numpy.core.multiarray (though there may be other importers)
> takes 0.038 seconds to go through __import__, including
> all of its children module imports.

There are no "children modules"; all these modules refer to each
other, and you're assuming that whichever module you happen to load
first is responsible for all the other modules it happens to
reference.

>     add_newdocs: 0.067 (numpy.core.multiarray)
>     numpy.lib: 0.061 (add_newdocs)

I'm pretty sure that what these two lines say is that the actual
add_newdocs code only takes 0.006 seconds?

>     numpy.testing: 0.041 (numpy.core.numeric)

However, it does look like numpy.testing is responsible for something
like 35% of our startup overhead and for pulling in a ton of extra
modules (with associated disk seeks), which is pretty dumb.

>>> With instrumentation I found that 0.083s of the 0.119s
>>> is spent loading numpy.core.multiarray.

The number 0.083 doesn't appear anywhere in that profile you pasted,
so I don't know where this comes from... Anyway, it sounds like the
answer is that importing numpy.core.multiarray doesn't take that long;
you're measuring the total time to do 'import numpy', and it just
happens that numpy.core.multiarray is the first module you load.

(BTW, you probably shouldn't be importing numpy.core.multiarray
directly at all, just do 'import numpy'.)

-N
Re: [Numpy-discussion] "import numpy" performance
On Jul 2, 2012, at 11:38 PM, Fernando Perez wrote:
> No, that's the wrong thing to test, because it effectively amounts to
> 'import numpy', since the numpy __init__ file is still executed. As
> David indicated, you must import multiarray.so by itself.

I understand that clarification. However, it does not affect me.

I do "import rdkit.Chem". This is all I really care about.

That imports "rdkit.Chem.rdchem" which is a shared library.

That shared library calls the C function/macro "import_array", which
appears to be:

    #define import_array() { if (_import_array() < 0) {PyErr_Print(); PyErr_SetString(PyExc_ImportError, "numpy.core.multiarray failed to import"); } }

The _import_array looks to be defined via
numpy/core/code_generators/generate_numpy_api.py
which contains

    static int
    _import_array(void)
    {
      int st;
      PyObject *numpy = PyImport_ImportModule("numpy.core.multiarray");
      PyObject *c_api = NULL;
      ...

Thus, I don't see any way that I can import 'multiarray' directly,
because the underlying C code is the one which imports
'numpy.core.multiarray' and by design it is inaccessible to change
from Python code.

Thus, the correct reference benchmark is "import numpy.core.multiarray".

Unless I'm lost in a set of header files?

Cheers,
Andrew
da...@dalkescientific.com
Re: [Numpy-discussion] "import numpy" performance
On Mon, Jul 2, 2012 at 10:06 PM, Robert Kern wrote:
> On Mon, Jul 2, 2012 at 9:43 PM, Benjamin Root wrote:
>> On Mon, Jul 2, 2012 at 4:34 PM, Nathaniel Smith wrote:
>>> I think this ship has sailed, but it'd be worth looking into lazy
>>> importing, where 'numpy.fft' isn't actually imported until someone
>>> starts using it. There are a bunch of libraries that do this, and one
>>> would have to fiddle to get compatibility with all the different
>>> python versions and make sure you're not killing performance (might
>>> have to be in C), but something along the lines of
>>>
>>>     class _FFTModule(object):
>>>         def __getattribute__(self, name):
>>>             mod = importlib.import_module("numpy.fft")
>>>             _FFTModule.__getattribute__ = mod.__getattribute__
>>>             return getattr(mod, name)
>>>     fft = _FFTModule()
>>
>> Not sure how this would impact projects like ipython that do
>> tab-completion support, but I know that that would drive me nuts in my
>> basic tab-completion setup I have for my regular python terminal. Of
>> course, in the grand scheme of things, that really isn't all that
>> important, I don't think.
>
> We used to do it for scipy. It did interfere with tab completion. It
> did drive many people nuts.

Sounds like a bug in your old code, or else the REPLs have gotten
better? I just pasted the above code into both ipython and python
prompts, and typing 'fft.<TAB>' worked fine in both cases. dir(fft)
works first try as well.

(If you try this, don't forget to 'import importlib' first, and note
importlib is 2.7+ only. Obviously importlib is not necessary, but it
makes the minimal example less tedious.)

-n
Re: [Numpy-discussion] "import numpy" performance
On Jul 2, 2012, at 10:34 PM, Nathaniel Smith wrote:
> I don't have any opinion on how acceptable this would be, but I also
> don't see a benchmark showing how much this would help?

The profile output was lower in that email. The relevant line is

    0.038 add_newdocs (numpy.core.multiarray)

This says that 'add_newdocs', which is imported from
numpy.core.multiarray (though there may be other importers),
takes 0.038 seconds to go through __import__, including
all of its children module imports.

I have attached my import profile script. It has only minor changes
since the one I posted on this list 4 years ago. Its output is here,
showing the import dependency tree first, and then the list of the
slowest modules to import; each line is "module: cumulative import
time (first importer)".

== Tree ==

    rdkit: 0.150 (None)
    os: 0.000 (rdkit)
    sys: 0.000 (rdkit)
    exceptions: 0.000 (rdkit)
    sqlite3: 0.003 (pyPgSQL)
    dbapi2: 0.002 (sqlite3)
    datetime: 0.001 (dbapi2)
    time: 0.000 (dbapi2)
    _sqlite3: 0.001 (dbapi2)
    cDataStructs: 0.008 (pyPgSQL)
    rdkit.Geometry: 0.003 (pyPgSQL)
    rdGeometry: 0.003 (rdkit.Geometry)
    PeriodicTable: 0.002 (pyPgSQL)
    re: 0.000 (PeriodicTable)
    rdchem: 0.116 (pyPgSQL)
    numpy.core.multiarray: 0.109 (rdchem)
    numpy.__config__: 0.000 (numpy.core.multiarray)
    version: 0.000 (numpy.core.multiarray)
    _import_tools: 0.000 (numpy.core.multiarray)
    add_newdocs: 0.067 (numpy.core.multiarray)
    numpy.lib: 0.061 (add_newdocs)
    info: 0.000 (numpy.lib)
    numpy.version: 0.000 (numpy.lib)
    type_check: 0.053 (numpy.lib)
    numpy.core.numeric: 0.053 (type_check)
    multiarray: 0.001 (numpy.core.numeric)
    umath: 0.000 (numpy.core.numeric)
    _internal: 0.004 (numpy.core.numeric)
    warnings: 0.000 (_internal)
    numpy.compat: 0.000 (_internal)
    _inspect: 0.000 (numpy.compat)
    types: 0.000 (_inspect)
    py3k: 0.000 (numpy.compat)
    numerictypes: 0.001 (numpy.core.numeric)
    __builtin__: 0.000 (numerictypes)
    _sort: 0.000 (numpy.core.numeric)
    numeric: 0.003 (numpy.core.numeric)
    _dotblas: 0.000 (numeric)
    arrayprint: 0.001 (numeric)
    fromnumeric: 0.000 (arrayprint)
    cPickle: 0.001 (numeric)
    copy_reg: 0.000 (cPickle)
    cStringIO: 0.000 (cPickle)
    defchararray: 0.001 (numpy.core.numeric)
    numpy: 0.000 (defchararray)
    records: 0.000 (numpy.core.numeric)
    memmap: 0.000 (numpy.core.numeric)
    scalarmath: 0.000 (numpy.core.numeric)
    numpy.core.umath: 0.000 (scalarmath)
    function_base: 0.000 (numpy.core.numeric)
    machar: 0.000 (numpy.core.numeric)
    numpy.core.fromnumeric: 0.000 (machar)
    getlimits: 0.000 (numpy.core.numeric)
    shape_base: 0.000 (numpy.core.numeric)
    numpy.testing: 0.041 (numpy.core.numeric)
    unittest: 0.039 (numpy.testing)
    result: 0.002 (unittest)
    traceback: 0.000 (result)
    linecache: 0.000 (traceback)
    StringIO: 0.000 (result)
    errno: 0.000 (StringIO)
    : 0.000 (result)
    functools: 0.001 (result)
    _functools: 0.000 (functools)
    case: 0.035 (unittest)
    difflib: 0.034 (case)
    heapq: 0.031 (difflib)
    itertools: 0.029 (heapq)
    operator: 0.001 (heapq)
    bisect: 0.001 (heapq)
    _bisect: 0.000 (bisect)
    _heapq: 0.000 (heapq)
    collections: 0.001 (difflib)
    _abcoll: 0.000 (collections)
    _collections: 0.000 (collections)
    keyword: 0.000 (collections)
    thread: 0.000 (collections)
    pprint: 0.000 (case)
    util: 0.000 (case)
    suite: 0.000 (unittest)
    loader: 0.001 (unittest)
    fnmatch: 0.000 (loader)
    main: 0.001 (unittest)
    signals: 0.001 (main)
    signal: 0.000 (signals)
    weakref: 0.001 (signals)
    UserDict: 0.000 (weakref)
    _weakref: 0.000 (weakref)
    _weakrefset: 0.000 (weakref)
    runner: 0.000 (unittest)
    decorators: 0.001 (numpy.testing)
    numpy.testing.utils: 0.001 (decorators)
    nosetester: 0.000 (numpy.testing.utils)
    utils: 0.000 (numpy.testing)
    numpytest: 0.000 (numpy.testing)
    ufunclike: 0.000 (type_check)
    index_tricks: 0.003 (numpy.lib)
    numpy.core.numerictypes: 0.000 (index_tricks)
    math: 0.000 (index_tricks)
    numpy.core: 0.000 (index_tricks)
    numpy.lib.twodim_base: 0.000 (index_tricks)
    _compiled_base: 0.000 (index_tricks)
    arraysetops: 0.001 (index_tricks)
    numpy.lib.utils: 0.001 (arraysetops)
    numpy.matrixlib: 0.001 (index_tricks)
    defmatrix: 0.001 (numpy.matrixlib)
    numpy.lib._compiled_base: 0.000 (index_tricks)
    stride_tricks: 0.000 (numpy.lib)
    twodim_base: 0.000 (numpy.lib)
    scimath: 0.000 (numpy.lib)
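(Andrew's attached script itself isn't reproduced in the archive. As a
hedged illustration of the general technique such import profilers use
-- not his actual code -- one can wrap the builtin __import__ and time
each module load, indenting by nesting depth; the real script
additionally aggregates the raw output into a tree like the one above:)

    import __builtin__
    import time

    _real_import = __builtin__.__import__
    _depth = [0]

    def _timed_import(name, *args, **kwargs):
        # Time every call to __import__. Modules already cached in
        # sys.modules come back in ~0.000s, so first loads stand out.
        _depth[0] += 1
        t0 = time.time()
        try:
            return _real_import(name, *args, **kwargs)
        finally:
            _depth[0] -= 1
            print '%s%s: %.3f' % ('  ' * _depth[0], name,
                                  time.time() - t0)

    __builtin__.__import__ = _timed_import

    import numpy  # prints a (noisy) per-import timing tree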
Re: [Numpy-discussion] "import numpy" performance
On Mon, Jul 2, 2012 at 2:26 PM, Andrew Dalke wrote:
> so the relevant timing test is more likely:
>
>     % time python -c 'import numpy.core.multiarray'
>     0.086u 0.031s 0:00.12 91.6%  0+0k 0+0io 0pf+0w

No, that's the wrong thing to test, because it effectively amounts to
'import numpy', since the numpy __init__ file is still executed. As
David indicated, you must import multiarray.so by itself.

> I do not know how to run the timing test you did, as I get:
>
>     % python -c "import multiarray"
>     Traceback (most recent call last):
>       File "<string>", line 1, in <module>
>     ImportError: No module named multiarray

You just have to cd to the directory where multiarray.so lives. I get
the same numbers as David:

    longs[core]> time python -c ''
    real    0m0.038s
    user    0m0.032s
    sys     0m0.000s

    longs[core]> time python -c 'import multiarray'
    real    0m0.035s
    user    0m0.020s
    sys     0m0.012s

    longs[core]> pwd
    /usr/lib/python2.7/dist-packages/numpy/core

Cheers,
f
Re: [Numpy-discussion] "import numpy" performance
On Jul 2, 2012, at 10:33 PM, David Cournapeau wrote:
> On Mon, Jul 2, 2012 at 8:17 PM, Andrew Dalke wrote:
>> In July of 2008 I started a thread about how "import numpy"
>> was noticeably slow for one of my customers. ...
>> I managed to get the import time down from 0.21 seconds to
>> 0.08 seconds.
>
> I will answer your other remarks later, but 0.21 sec to import
> numpy is very slow, especially on a recent computer. It is 0.095 sec
> on my mac, and 0.075 sec on a linux VM on the same computer (both hot
> cache of course).

That quote was a historical review from 4 years ago. I described the
problems I had then, the work-around solution I implemented, and my
additional work to see if I could identify ways which would have kept
me from needing to find a work-around solution.

I then described why I have not worked on this problem for the last
four years, and what has changed to make me interested in it again.
That included current details, such as how "import numpy" with a warm
cache takes 0.083 seconds on my Mac.

> importing multiarray.so only is negligible for me (i.e. difference
> between python -c "import multiarray" and python -c "" is
> statistically insignificant).

The NumPy initialization is being done in C++ code through
"import_array()". That C function does (among other things)

    PyObject *numpy = PyImport_ImportModule("numpy.core.multiarray");

so the relevant timing test is more likely:

    % time python -c 'import numpy.core.multiarray'
    0.086u 0.031s 0:00.12 91.6%  0+0k 0+0io 0pf+0w
    % time python -c 'import numpy.core.multiarray'
    0.083u 0.031s 0:00.11 100.0%  0+0k 0+0io 0pf+0w
    % time python -c 'import numpy.core.multiarray'
    0.083u 0.030s 0:00.12 91.6%  0+0k 0+0io 0pf+0w

I do not know how to run the timing test you did, as I get:

    % python -c "import multiarray"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ImportError: No module named multiarray

> I would check external factors, like the size of your sys.path as well.

I have checked that, and inspected the output of python -v -v.

Andrew
da...@dalkescientific.com
Re: [Numpy-discussion] "import numpy" performance
On 02.07.2012 21:17, Andrew Dalke wrote:
[clip]
> 1) remove "add_newdocs" and put the docstrings in the C code
>    'add_newdocs' still needs to be there,

The docstrings need to be in an easily parseable format, because of the
online documentation editor. Keeping the current format may be the
easiest, as that already works. Moving them into the middle of other C
code won't do, but a header file generated at build time, for example,
should work. This is how it's currently done with the ufunc docstrings,
and it should work for everything else as well.

The commit statistics for add_newdocs.py are somewhat misleading ---
since 2008, many of the documentation edits went in via the online
editor, and these only show up in a single large commit, usually before
releases.

--
Pauli Virtanen
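(As a hedged sketch of what such a build-time step might look like --
the script and file names here are hypothetical, and the real ufunc
version lives under numpy/core/code_generators -- the docstrings can
stay in an easily parsed Python file while being compiled into C:)

    # gen_docstrings.py -- hypothetical build step: turn (name, docstring)
    # pairs, kept in an editable Python file, into a C header of macros
    # that the C sources can use for tp_doc / method docstrings.
    DOCSTRINGS = [
        ('ndarray_all',
         'a.all(axis=None, out=None)\n\n'
         'Returns True if all elements evaluate to True.'),
        # ... one entry per current add_newdoc() call ...
    ]

    def c_escape(text):
        # Escape backslashes, quotes and newlines for a C string literal.
        return (text.replace('\\', '\\\\')
                    .replace('"', '\\"')
                    .replace('\n', '\\n'))

    with open('generated_docstrings.h', 'w') as header:
        for name, doc in DOCSTRINGS:
            header.write('#define DOC_%s "%s"\n'
                         % (name.upper(), c_escape(doc)))

This keeps the "easier editing without a re-compile" property for the
online editor's output while removing the runtime add_newdocs import.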
Re: [Numpy-discussion] "import numpy" performance
On Mon, Jul 2, 2012 at 9:43 PM, Benjamin Root wrote:
> On Mon, Jul 2, 2012 at 4:34 PM, Nathaniel Smith wrote:
>> I think this ship has sailed, but it'd be worth looking into lazy
>> importing, where 'numpy.fft' isn't actually imported until someone
>> starts using it. There are a bunch of libraries that do this, and one
>> would have to fiddle to get compatibility with all the different
>> python versions and make sure you're not killing performance (might
>> have to be in C), but something along the lines of
>>
>>     class _FFTModule(object):
>>         def __getattribute__(self, name):
>>             mod = importlib.import_module("numpy.fft")
>>             _FFTModule.__getattribute__ = mod.__getattribute__
>>             return getattr(mod, name)
>>     fft = _FFTModule()
>
> Not sure how this would impact projects like ipython that do
> tab-completion support, but I know that that would drive me nuts in my
> basic tab-completion setup I have for my regular python terminal. Of
> course, in the grand scheme of things, that really isn't all that
> important, I don't think.

We used to do it for scipy. It did interfere with tab completion. It
did drive many people nuts.

--
Robert Kern
Re: [Numpy-discussion] "import numpy" performance
On Mon, Jul 2, 2012 at 4:34 PM, Nathaniel Smith wrote:
> On Mon, Jul 2, 2012 at 8:17 PM, Andrew Dalke wrote:
>> In this email I propose a few changes which I think are minor
>> and which don't really affect the external NumPy API but which
>> I think could improve the "import numpy" performance by at
>> least 40%. This affects me because I and my clients use a
>> chemistry toolkit which uses only NumPy arrays, and where
>> we run short programs often on the command-line.
>>
>> In July of 2008 I started a thread about how "import numpy"
>> was noticeably slow for one of my customers. They had
>> chemical analysis software, often even run on a single
>> molecular structure using command-line tools, and the
>> several invocations with 0.1 seconds overhead was one of
>> the dominant costs even when numpy wasn't needed.
>>
>> I fixed most of their problems by deferring numpy imports
>> until needed. I remember well the Steve Jobs anecdote at
>> http://folklore.org/StoryView.py?project=Macintosh&story=Saving_Lives.txt
>> and spent another day of my time in 2008 to identify the
>> parts of the numpy import sequence which seemed excessive.
>> I managed to get the import time down from 0.21 seconds to
>> 0.08 seconds.
>>
>> Very little of that made it into NumPy.
>>
>> The three biggest changes I would like are:
>>
>> 1) remove "add_newdocs" and put the docstrings in the C code
>>    'add_newdocs' still needs to be there,
>>
>> The code says:
>>
>>     # This is only meant to add docs to objects defined in C-extension modules.
>>     # The purpose is to allow easier editing of the docstrings without
>>     # requiring a re-compile.
>>
>> However, the change log shows that there are relatively few commits
>> to this module
>>
>>     Year    Number of commits
>>     ====    =================
>>     2012     8
>>     2011    62
>>     2010     9
>>     2009    18
>>     2008    17
>>
>> so I propose moving the docstrings to the C code, and perhaps
>> leaving 'add_newdocs' there, but only used when testing new
>> docstrings.
>
> I don't have any opinion on how acceptable this would be, but I also
> don't see a benchmark showing how much this would help?
>
>> 2) Don't optimistically assume that all submodules are
>>    needed. For example, some current code uses
>>
>>     import numpy
>>     numpy.fft.ifft
>>
>> (See a real-world example at
>> http://stackoverflow.com/questions/10222812/python-numpy-fft-and-inverse-fft
>> )
>>
>> IMO, this optimizes the needs of the interactive-shell
>> NumPy author over the needs of the many-fold more people
>> who don't spend their time in the REPL and/or don't need
>> those extra features added to every NumPy startup. Please
>> bear in mind that NumPy users of the first category will
>> be active on the mailing list, go to SciPy conferences,
>> etc., while members of the second category are less visible.
>>
>> I recognize that this is backwards incompatible, and will
>> not change. However, I understand that "NumPy 2.0" is a
>> glimmer in the future, which might be a natural place for
>> a transition to the more standard Python style of
>>
>>     from numpy import fft
>>
>> Personally, I think the documentation now (if it doesn't
>> already) should transition to use this form.
>
> I think this ship has sailed, but it'd be worth looking into lazy
> importing, where 'numpy.fft' isn't actually imported until someone
> starts using it. There are a bunch of libraries that do this, and one
> would have to fiddle to get compatibility with all the different
> python versions and make sure you're not killing performance (might
> have to be in C), but something along the lines of
>
>     class _FFTModule(object):
>         def __getattribute__(self, name):
>             mod = importlib.import_module("numpy.fft")
>             _FFTModule.__getattribute__ = mod.__getattribute__
>             return getattr(mod, name)
>     fft = _FFTModule()

Not sure how this would impact projects like ipython that do
tab-completion support, but I know that that would drive me nuts in my
basic tab-completion setup I have for my regular python terminal. Of
course, in the grand scheme of things, that really isn't all that
important, I don't think.

Ben Root
Re: [Numpy-discussion] "import numpy" performance
On Mon, Jul 2, 2012 at 8:17 PM, Andrew Dalke wrote:
> In this email I propose a few changes which I think are minor
> and which don't really affect the external NumPy API but which
> I think could improve the "import numpy" performance by at
> least 40%. This affects me because I and my clients use a
> chemistry toolkit which uses only NumPy arrays, and where
> we run short programs often on the command-line.
>
> In July of 2008 I started a thread about how "import numpy"
> was noticeably slow for one of my customers. They had
> chemical analysis software, often even run on a single
> molecular structure using command-line tools, and the
> several invocations with 0.1 seconds overhead was one of
> the dominant costs even when numpy wasn't needed.
>
> I fixed most of their problems by deferring numpy imports
> until needed. I remember well the Steve Jobs anecdote at
> http://folklore.org/StoryView.py?project=Macintosh&story=Saving_Lives.txt
> and spent another day of my time in 2008 to identify the
> parts of the numpy import sequence which seemed excessive.
> I managed to get the import time down from 0.21 seconds to
> 0.08 seconds.
>
> Very little of that made it into NumPy.
>
> The three biggest changes I would like are:
>
> 1) remove "add_newdocs" and put the docstrings in the C code
>    'add_newdocs' still needs to be there,
>
> The code says:
>
>     # This is only meant to add docs to objects defined in C-extension modules.
>     # The purpose is to allow easier editing of the docstrings without
>     # requiring a re-compile.
>
> However, the change log shows that there are relatively few commits
> to this module
>
>     Year    Number of commits
>     ====    =================
>     2012     8
>     2011    62
>     2010     9
>     2009    18
>     2008    17
>
> so I propose moving the docstrings to the C code, and perhaps
> leaving 'add_newdocs' there, but only used when testing new
> docstrings.

I don't have any opinion on how acceptable this would be, but I also
don't see a benchmark showing how much this would help?

> 2) Don't optimistically assume that all submodules are
>    needed. For example, some current code uses
>
>     import numpy
>     numpy.fft.ifft
>
> (See a real-world example at
> http://stackoverflow.com/questions/10222812/python-numpy-fft-and-inverse-fft
> )
>
> IMO, this optimizes the needs of the interactive-shell
> NumPy author over the needs of the many-fold more people
> who don't spend their time in the REPL and/or don't need
> those extra features added to every NumPy startup. Please
> bear in mind that NumPy users of the first category will
> be active on the mailing list, go to SciPy conferences,
> etc., while members of the second category are less visible.
>
> I recognize that this is backwards incompatible, and will
> not change. However, I understand that "NumPy 2.0" is a
> glimmer in the future, which might be a natural place for
> a transition to the more standard Python style of
>
>     from numpy import fft
>
> Personally, I think the documentation now (if it doesn't
> already) should transition to use this form.

I think this ship has sailed, but it'd be worth looking into lazy
importing, where 'numpy.fft' isn't actually imported until someone
starts using it. There are a bunch of libraries that do this, and one
would have to fiddle to get compatibility with all the different
python versions and make sure you're not killing performance (might
have to be in C), but something along the lines of

    class _FFTModule(object):
        def __getattribute__(self, name):
            mod = importlib.import_module("numpy.fft")
            _FFTModule.__getattribute__ = mod.__getattribute__
            return getattr(mod, name)
    fft = _FFTModule()

> 3) Especially: don't always import 'numpy.testing'
>
> As far as I can tell, automatic import of this module
> is not needed, so it is pure overhead for the vast majority
> of NumPy users. Unfortunately, there's a large number
> of user-facing 'test' and 'bench' bound methods acting
> as functions:
>
>     from numpy.testing import Tester
>     test = Tester().test
>     bench = Tester().bench
>
> They seem rather pointless to me but could be replaced
> with per-module functions like
>
>     def test(...):
>         from numpy.testing import Tester
>         Tester().test(...)
>
> I have not worried about numpy import performance for
> 4 years. While I have been developing scientific software
> for 20 years, and in Python for 15 years, it has been
> in areas of biology and chemistry which don't use arrays.
> I use numpy for a day about once every two years, and
> so far I have had no reason to use scipy.
>
> This has changed.
>
> I talked with one of my clients last week. They (and I)
> use a chemistry toolkit called "RDKit". RDKit uses
> numpy as a way to store coordinate data for molecules.
> I checked with the package author and he confirms:
>
>     yeah, it's just using the homogeneous array most of the time.
>
> My client complained about RDKit's high startup cost,
> due to the NumPy dependency ...
Re: [Numpy-discussion] "import numpy" performance
On Mon, Jul 2, 2012 at 8:17 PM, Andrew Dalke wrote:
> In this email I propose a few changes which I think are minor
> and which don't really affect the external NumPy API but which
> I think could improve the "import numpy" performance by at
> least 40%. This affects me because I and my clients use a
> chemistry toolkit which uses only NumPy arrays, and where
> we run short programs often on the command-line.
>
> In July of 2008 I started a thread about how "import numpy"
> was noticeably slow for one of my customers. They had
> chemical analysis software, often even run on a single
> molecular structure using command-line tools, and the
> several invocations with 0.1 seconds overhead was one of
> the dominant costs even when numpy wasn't needed.
>
> I fixed most of their problems by deferring numpy imports
> until needed. I remember well the Steve Jobs anecdote at
> http://folklore.org/StoryView.py?project=Macintosh&story=Saving_Lives.txt
> and spent another day of my time in 2008 to identify the
> parts of the numpy import sequence which seemed excessive.
> I managed to get the import time down from 0.21 seconds to
> 0.08 seconds.

I will answer your other remarks later, but 0.21 sec to import numpy
is very slow, especially on a recent computer. It is 0.095 sec on my
mac, and 0.075 sec on a linux VM on the same computer (both hot cache,
of course).

Importing multiarray.so only is negligible for me (i.e. the difference
between python -c "import multiarray" and python -c "" is
statistically insignificant).

I would check external factors, like the size of your sys.path, as
well.

David