Re: [Distutils] Cache PYTHONPATH? (Re: make unzipped eggs be the default)

2009-08-02 Thread Tarek Ziadé
On Wed, Jul 29, 2009 at 6:44 AM, P.J. Ebyp...@telecommunity.com wrote:
 At 10:35 PM 7/28/2009 -0500, Ian Bicking wrote:

 On Tue, Jul 28, 2009 at 9:40 PM, P.J. Ebyp...@telecommunity.com wrote:
  At 09:22 PM 7/28/2009 -0500, Ian Bicking wrote:
 
  I can see how this could go quite wrong, but maybe if installers touch
  some file in the library directory anytime a package is
  installed/reinstalled/removed/etc,
 
  You mean, like, the mtime of the directory itself?  ;-)

 Do directory mtimes get recursively updated?  I don't think they do.

 That's not necessary; if imports use a cached listdir, then the children
 will get handled recursively.

 So if you have a layout:

 site-packages/
  zope/
    interface/
      __init__.py

 And you update the package and update __init__.py, the mtime of
 site-packages doesn't change, does it?

 Nope, but at the top level, the fact that 'zope' is present is unchanged, as
 is the presence of an 'interface' subdirectory.


 I'm saying if there was a file in site-packages/last_updated that gets
 touched every time an installer does anything in site-packages, then
 you could cache (between processes) the lookups.

 Since each invocation of the interpreter can have a different PYTHONPATH,
 the cache has to be per-directory, not global.  If it's per-directory, then
 there's no real benefit over runtime caching, since you now have to open and
 read a file (instead of just reading the directory).  And as I said, it's
 not realistic to think that opening and reading a file is going to beat
 opening and reading a directory for speed.

But opening and reading one file should beat opening hundreds of directories:

For instance, a Plone 3 application will have 100+ sys.path entries, because
zc.buildout (the Plone standard) adds one entry per egg to sys.path.

So being able to cache them should speed things up.

In the PEP 376 prototype, after thinking about a per-directory cache like
the one you are describing, I was thinking about having a global index file
to replace the global dictionary that keeps track of the distributions per
directory (currently the directory path is the key in the dictionary and
the distribution objects are the value).

That can even be a simple shelve of the dictionary, which becomes a
global index of directories that [are/were once] in the path. This works
as long as the index file is per-user. Or even better: per-application.
I don't know how this could be managed/done, but a simple cache file
created alongside the script the application is launched with could
speed up the lookups at the second launch.
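
A minimal sketch of what such an index file could look like, assuming
Python 3's shelve module. The names scan() and cached_entries(), the index
path, and the choice to store plain entry names instead of distribution
objects are all illustrative assumptions, not the PEP 376 prototype's API.
Each record also keeps the directory mtime, so a directory that changed is
rescanned rather than trusted.

```python
import os
import shelve


def scan(directory):
    """Return the sorted entry names of one sys.path directory."""
    try:
        return sorted(os.listdir(directory))
    except OSError:
        return []  # missing or unreadable path entries count as empty


def cached_entries(index_path, directories):
    """Map each directory to its entries, reusing a shelve-backed index.

    Every record stores the directory mtime next to the listing, so a
    directory whose mtime differs from the stored one is rescanned.
    """
    result = {}
    with shelve.open(index_path) as index:
        for directory in directories:
            key = os.path.abspath(directory)
            try:
                mtime = os.stat(key).st_mtime_ns
            except OSError:
                mtime = None
            record = index.get(key)
            if record is None or record["mtime"] != mtime:
                record = {"mtime": mtime, "entries": scan(key)}
                index[key] = record
            result[key] = record["entries"]
    return result
```

A launcher script could call cached_entries("app-index", sys.path) at
startup. Note that one stat() per directory is still needed to validate
each record.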

Cheers
Tarek


-- 
Tarek Ziadé | http://ziade.org
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] Cache PYTHONPATH? (Re: make unzipped eggs be the default)

2009-08-02 Thread P.J. Eby

At 06:52 PM 8/2/2009 +0200, Tarek Ziadé wrote:

On Wed, Jul 29, 2009 at 6:44 AM, P.J. Ebyp...@telecommunity.com wrote:
 At 10:35 PM 7/28/2009 -0500, Ian Bicking wrote:

 On Tue, Jul 28, 2009 at 9:40 PM, P.J. Ebyp...@telecommunity.com wrote:
  At 09:22 PM 7/28/2009 -0500, Ian Bicking wrote:
 
  I can see how this could go quite wrong, but maybe if installers touch
  some file in the library directory anytime a package is
  installed/reinstalled/removed/etc,
 
  You mean, like, the mtime of the directory itself?  ;-)

 Do directory mtimes get recursively updated?  I don't think they do.

 That's not necessary; if imports use a cached listdir, then the children
 will get handled recursively.

 So if you have a layout:

 site-packages/
  zope/
    interface/
      __init__.py

 And you update the package and update __init__.py, the mtime of
 site-packages doesn't change, does it?

 Nope, but at the top level, the fact that 'zope' is present is unchanged, as
 is the presence of an 'interface' subdirectory.


 I'm saying if there was a file in site-packages/last_updated that gets
 touched every time an installer does anything in site-packages, then
 you could cache (between processes) the lookups.

 Since each invocation of the interpreter can have a different PYTHONPATH,
 the cache has to be per-directory, not global.  If it's per-directory, then
 there's no real benefit over runtime caching, since you now have to open and
 read a file (instead of just reading the directory).  And as I said, it's
 not realistic to think that opening and reading a file is going to beat
 opening and reading a directory for speed.

But opening and reading one file should beat opening hundreds of directories:
In the PEP 376 prototype, after thinking about a per-directory cache like
the one you are describing, I was thinking about having a global index file
to replace the global dictionary that keeps track of the distributions per
directory (currently the directory path is the key in the dictionary and
the distribution objects are the value).

That can even be a simple shelve of the dictionary, which becomes a
global index of directories that [are/were once] in the path. This works
as long as the index file is per-user. Or even better: per-application.
I don't know how this could be managed/done, but a simple cache file
created alongside the script the application is launched with could
speed up the lookups at the second launch.


You'd still have to stat the directories to know if they changed - in 
which case the logic I've already laid out still applies.


I think, however, we are discussing different nominal scenarios.  I'm
assuming a post-PEP 376 world where the only use for .egg files or
directories is for *non-default* versions of packages, which only get
added to sys.path for apps or libraries that need them, rather than
being in a default .pth file.


However, if you're discussing speeding up an environment where we use 
.egg directories and they're on sys.path, then a per-user global 
cache might speed things up.  For security reasons, however, that 
cache would need to be ignored by Python when running secure 
scripts.  (e.g. -s and -E options, and definitely anything setuid.)
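
A hedged sketch of the kind of guard this implies, assuming Python 3.
sys.flags.ignore_environment and sys.flags.no_user_site are the real
attributes set by -E and -s; the function name persistent_cache_allowed()
and the policy itself are illustrative assumptions, not an existing
interpreter feature.

```python
import os
import sys


def persistent_cache_allowed():
    """Return False when the interpreter was asked to distrust user state."""
    flags = sys.flags
    # -E ignores PYTHON* environment variables; -s skips the user site dir.
    if flags.ignore_environment or flags.no_user_site:
        return False
    # setuid/setgid programs: effective and real ids differ (POSIX only).
    if hasattr(os, "geteuid"):
        if os.geteuid() != os.getuid() or os.getegid() != os.getgid():
            return False
    return True
```

A cache loader would consult this check first and fall back to scanning
the directories directly when it returns False.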


In contrast, directory stat caching with a modest number of (non-egg) 
PYTHONPATH entries would speed things nicely in the 
hopefully-future-default case.




Re: [Distutils] Cache PYTHONPATH? (Re: make unzipped eggs be the default)

2009-07-29 Thread Jeff Rush
David Lyon wrote:
 
 Third party libraries are rarely so big that they need to
 be compressed to save disk space.. on any of the systems
 that i know about anyway..

Hi David.  Not just your post but others here are making assumptions based
on your own working environments.  Yes, there are systems you need to save
disk space on; yes, there are systems where you care about I/O
performance.  These are embedded systems.

Python has a strong and growing following on small devices such as
phones (OpenMoko), music players, settops, netbooks, OLPC/XO and such.
If you haven't been following it, the Python-on-a-Chip initiative formed
from several projects took place at PyCon 2009.  The language is in a
position to become the standard control language for devices, if we
don't hobble it by assuming Python is always run on a full-blown desktop.

This attitude of allowing Python to always grow larger is prevalent on
the core developers list as well, where they are removing the ability to
compile Python selectively to drop out those portions not needed on a
platform.  The attitude there was if the embedded folks want a stripped
down version they can create and maintain it themselves, redoing work
already done years ago.  But they won't -- they'll take the path of
least resistance and choose a more lightweight language.

Pardon the rant.  I just get frustrated when people believe that the
path forward is faster and bigger systems on our desktops when actually
desktops are dying and will be rare in ten years.  Let's keep Python
lean and flexible so it takes up residence in the infrastructure instead.

And the benefit of defaulting to zipped eggs is that it enforces on the
developer the discipline of writing his packages to use pkg_resources
instead of file I/O, to retain the future option of alternate packaging
formats.  Developers know, especially those using
test-driven-development, that if you don't regularly test against an
environment, your code will gradually rot and no longer run in that
environment.
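
To make the difference concrete, here is a small sketch, assuming
Python 3. It uses the stdlib pkgutil.get_data() as a stand-in, because
pkg_resources may not be installed everywhere; the analogous pkg_resources
call is resource_string(). The package name "zipdemo" and its data file
are hypothetical and are built on the fly.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile


def build_zipped_package(directory):
    """Create demo.zip containing a hypothetical package 'zipdemo'."""
    archive = os.path.join(directory, "demo.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("zipdemo/__init__.py", "")
        zf.writestr("zipdemo/greeting.txt", "hello from the archive\n")
    return archive


archive = build_zipped_package(tempfile.mkdtemp())
sys.path.insert(0, archive)

# Fragile: opening os.path.dirname(zipdemo.__file__) + "/greeting.txt" with
# open() fails here, because that path points inside the zip file.
# Portable: ask the package's loader for the bytes instead.
data = pkgutil.get_data("zipdemo", "greeting.txt")
print(data.decode("utf-8"), end="")
```

The same call works unchanged when the package is an ordinary directory,
which is the point of going through the loader.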

-Jeff


Re: [Distutils] Cache PYTHONPATH? (Re: make unzipped eggs be the default)

2009-07-29 Thread Lennart Regebro
2009/7/29 Jeff Rush j...@taupro.com:
 Hi David.  Not just your post but others here are making assumptions on
 your own working environment.  Yes there are systems you need to save
 disk space on, yes there are systems where you care about I/O
 performance.  These are embedded systems.

Exactly. But the fact still is that these systems are the specialized
case today, so let's stop optimizing the *default* settings for them.

 And the benefit of defaulting to zipped eggs is that it enforces on the
 developer the discipline of writing his packages to use pkg_resources
 instead of file I/O

No, it just forces the developer to set zip_safe to False.

-- 
Lennart Regebro: Python, Zope, Plone, Grok
http://regebro.wordpress.com/
+33 661 58 14 64


Re: [Distutils] Cache PYTHONPATH? (Re: make unzipped eggs be the default)

2009-07-29 Thread David Lyon
On Wed, 29 Jul 2009 01:34:11 -0500, Jeff Rush j...@taupro.com wrote:
 Hi David.  Not just your post but others here are making assumptions on
 your own working environment.  Yes there are systems you need to save
 disk space on, yes there are systems where you care about I/O
 performance.  These are embedded systems.

Maybe you too are making the assumption that I've never worked on
such devices.. :-)

I have..

 This attitude of allowing Python to always grow larger is prevalent on
 the core developers list as well, where they are removing the ability to
 compile Python selectively to drop out those portions not needed on a
 platform.

ok. But people want to add their own code.. rarely do they want to
take away.. people resist if their code is taken away..

 Pardon the rant.  I just get frustrated when people believe that the
 path forward is faster and bigger systems on our desktops when actually
 desktops are dying and will be rare in ten years.

Only because motherboards will be embedded into the monitors more and
more often..

Anyway, you're kind of biting the hairs on the tail here.. because 3rd
party packages don't impact the size of the whole python installation 
that much.

Still, it's an interesting point...

David




Re: [Distutils] Cache PYTHONPATH? (Re: make unzipped eggs be the default)

2009-07-29 Thread Robert Kern

On 2009-07-29 16:47, David Lyon wrote:


Anyway, you're kind of biting the hairs on the tail here.. because 3rd
party packages don't impact the size of the whole python installation
that much.


My site-packages directory would like a word with you:

[~]$ cd /Library/Frameworks/Python.framework/Versions/Current
[Current]$ du -hsc .
1.5G    .
1.5G    total
[Current]$ du -hsc lib/python2.5/site-packages
1.4G    lib/python2.5/site-packages
1.4G    total

--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco



Re: [Distutils] Cache PYTHONPATH? (Re: make unzipped eggs be the default)

2009-07-29 Thread Jean-Paul Calderone

On Wed, 29 Jul 2009 09:37:17 +0200, Lennart Regebro rege...@gmail.com wrote:

2009/7/29 Jeff Rush j...@taupro.com:

Hi David.  Not just your post but others here are making assumptions on
your own working environment.  Yes there are systems you need to save
disk space on, yes there are systems where you care about I/O
performance.  These are embedded systems.


Exactly. But the fact still is that these systems are the specialized
case today, so let's stop optimizing the *default* settings for them.


And the benefit of defaulting to zipped eggs is that it enforces on the
developer the discipline of writing his packages to use pkg_resources
instead of file I/O


No, it just forces the developer to set zip_safe to False.



+1.  Python offers too many convenient ways to do it wrong.  Zipped
eggs break deployments.  They don't make developers write code that
works in that environment.  Such code only gets written when developers
choose to care about such cases.  If you want Python to excel in these
areas, you need to convince developers to care.

Jean-Paul


[Distutils] Cache PYTHONPATH? (Re: make unzipped eggs be the default)

2009-07-28 Thread Greg Ewing

P.J. Eby wrote:

So the optimum performance tradeoff depends on how many imports you have 
*and* how many eggs you have on sys.path.  If you have lots of eggs and 
few imports, unzipped ones will probably be faster.  If you have lots of 
eggs and *lots* of imports, zipped ones will probably be faster.


I'm wondering whether something could be gained by
caching the results of sys.path lookups somehow
between interpreter invocations.

Most of the time the contents of the directories
on one's PYTHONPATH don't change, so doing all this
statting and directory reading every time an
interpreter starts up seems rather suboptimal.

--
Greg


Re: [Distutils] Cache PYTHONPATH? (Re: make unzipped eggs be the default)

2009-07-28 Thread P.J. Eby

At 01:02 PM 7/29/2009 +1200, Greg Ewing wrote:

P.J. Eby wrote:

So the optimum performance tradeoff depends on how many imports you 
have *and* how many eggs you have on sys.path.  If you have lots of 
eggs and few imports, unzipped ones will probably be faster.  If 
you have lots of eggs and *lots* of imports, zipped ones will 
probably be faster.


I'm wondering whether something could be gained by
caching the results of sys.path lookups somehow
between interpreter invocations.

Most of the time the contents of the directories
on one's PYTHONPATH don't change, so doing all this
statting and directory reading every time an
interpreter starts up seems rather suboptimal.


The catch is that then you need some way to know whether your cache 
information is wrong/out-of-date.  I suppose, though, that you could 
do something like make a file that contains stat times, such that 
modifying the contained directory would automatically invalidate the 
cache info.
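
One way the stat-times file could look, as a sketch assuming Python 3 and
a JSON stamp file. The names write_stamp() and cache_is_fresh() are
hypothetical. A directory mtime only reflects entries added to or removed
from that directory itself, so this does not detect edits deeper in the
tree.

```python
import json
import os


def dir_mtime(path):
    """Return the directory mtime in nanoseconds, or None if unavailable."""
    try:
        return os.stat(path).st_mtime_ns
    except OSError:
        return None


def write_stamp(stamp_path, directories):
    """Record the current mtime of every directory the cache depends on."""
    stamp = {os.path.abspath(d): dir_mtime(d) for d in directories}
    with open(stamp_path, "w", encoding="utf-8") as fh:
        json.dump(stamp, fh)


def cache_is_fresh(stamp_path):
    """True only if every recorded directory still has its recorded mtime."""
    try:
        with open(stamp_path, encoding="utf-8") as fh:
            stamp = json.load(fh)
    except (OSError, ValueError):
        return False  # no stamp, or an unreadable one: treat cache as stale
    return all(dir_mtime(path) == mtime for path, mtime in stamp.items())
```

Checking freshness still costs one stat() per recorded directory, so the
saving comes only from skipping the directory reads.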


However, you'd probably gain more by making the core import logic 
simply use the dircache module (or a C equivalent thereof) in place 
of stat() calls.  This would drop the per-import stat() count for 
each directory to 1 (in place of several for .py, .pyc, .pyd/.so, 
/__init__.py, etc.), at the cost of an initial listdir() call the 
first time a directory is used.  This would give normal imports most 
of the speedup benefit that e.g. putting the stdlib in a zipfile does.
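
The contrast can be sketched as follows, assuming Python 3. The suffix
list is an illustrative subset, and find_by_stat() and ListingFinder are
hypothetical names, not the core import machinery. This sketch never
revalidates its listing; a real version would recheck the directory mtime.

```python
import os

# Illustrative subset of the suffixes an importer might probe for.
SUFFIXES = (".py", ".pyc", ".so", ".pyd")


def find_by_stat(directory, name):
    """Probe each candidate with its own filesystem check (several per import)."""
    candidates = [name + suffix for suffix in SUFFIXES]
    candidates.append(os.path.join(name, "__init__.py"))
    return [c for c in candidates if os.path.exists(os.path.join(directory, c))]


class ListingFinder:
    """Read the directory once, then answer lookups from memory."""

    def __init__(self, directory):
        self.directory = directory
        self.names = set(os.listdir(directory))  # the one-time listdir() cost

    def find(self, name):
        found = [name + s for s in SUFFIXES if name + s in self.names]
        # A package still needs one probe inside its own subdirectory.
        if name in self.names and os.path.isfile(
            os.path.join(self.directory, name, "__init__.py")
        ):
            found.append(os.path.join(name, "__init__.py"))
        return found
```

For a name that is absent from the directory, ListingFinder.find() touches
the filesystem zero times, while find_by_stat() pays for every candidate.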




Re: [Distutils] Cache PYTHONPATH? (Re: make unzipped eggs be the default)

2009-07-28 Thread Ian Bicking
On Tue, Jul 28, 2009 at 8:02 PM, Greg Ewinggreg.ew...@canterbury.ac.nz wrote:
 P.J. Eby wrote:

 So the optimum performance tradeoff depends on how many imports you have
 *and* how many eggs you have on sys.path.  If you have lots of eggs and few
 imports, unzipped ones will probably be faster.  If you have lots of eggs
 and *lots* of imports, zipped ones will probably be faster.

 I'm wondering whether something could be gained by
 caching the results of sys.path lookups somehow
 between interpreter invocations.

 Most of the time the contents of the directories
 on one's PYTHONPATH don't change, so doing all this
 statting and directory reading every time an
 interpreter starts up seems rather suboptimal.

I can see how this could go quite wrong, but maybe if installers touch
some file in the library directory anytime a package is
installed/reinstalled/removed/etc, then it would be fast to check if
the cache was correct.  Though the optimization seems like it's working
around something that maybe shouldn't be a problem.

-- 
Ian Bicking  |  http://blog.ianbicking.org  |  http://topplabs.org/civichacker


Re: [Distutils] Cache PYTHONPATH? (Re: make unzipped eggs be the default)

2009-07-28 Thread P.J. Eby

At 09:22 PM 7/28/2009 -0500, Ian Bicking wrote:
I can see how this could go quite wrong, but maybe if installers 
touch some file in the library directory anytime a package is 
installed/reinstalled/removed/etc,


You mean, like, the mtime of the directory itself?  ;-)

Really, there's no need for a file.  It seems really, really unlikely 
that there's any common filesystem where reading a file containing 
the (maybe out-of-date) contents of a directory is faster than just 
reading the directory itself.  And, courtesy of the time machine, 
there's even a 'dircache' module already in the stdlib.


i.e. if you use dircache.listdir() in place of regular listdir, 
you'll only have to read the directory once.
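
The dircache module was removed in Python 3, so here is a minimal sketch
of the same idea for readers on a current interpreter. cached_listdir()
is a hypothetical name, and the cache lives only for the lifetime of the
process.

```python
import os

_cache = {}  # path -> (mtime_ns, sorted names)


def cached_listdir(path):
    """Return the directory listing, rereading only when its mtime changes."""
    mtime = os.stat(path).st_mtime_ns
    hit = _cache.get(path)
    if hit is not None and hit[0] == mtime:
        return hit[1]  # one stat() and no directory read
    names = sorted(os.listdir(path))
    _cache[path] = (mtime, names)
    return names
```

Callers should treat the returned list as read-only, since the same list
object is handed out on every cache hit.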


(Another way to do this, of course, would be to have importlib 
importer objects use the same logic to keep a cache of their target directory.)




Re: [Distutils] Cache PYTHONPATH? (Re: make unzipped eggs be the default)

2009-07-28 Thread Ian Bicking
On Tue, Jul 28, 2009 at 9:40 PM, P.J. Ebyp...@telecommunity.com wrote:
 At 09:22 PM 7/28/2009 -0500, Ian Bicking wrote:

 I can see how this could go quite wrong, but maybe if installers touch
 some file in the library directory anytime a package is
 installed/reinstalled/removed/etc,

 You mean, like, the mtime of the directory itself?  ;-)

Do directory mtimes get recursively updated?  I don't think they do.
So if you have a layout:

site-packages/
  zope/
    interface/
      __init__.py

And you update the package and update __init__.py, the mtime of
site-packages doesn't change, does it?

I'm saying if there was a file in site-packages/last_updated that gets
touched every time an installer does anything in site-packages, then
you could cache (between processes) the lookups.

-- 
Ian Bicking  |  http://blog.ianbicking.org  |  http://topplabs.org/civichacker


Re: [Distutils] Cache PYTHONPATH? (Re: make unzipped eggs be the default)

2009-07-28 Thread P.J. Eby

At 10:35 PM 7/28/2009 -0500, Ian Bicking wrote:

On Tue, Jul 28, 2009 at 9:40 PM, P.J. Ebyp...@telecommunity.com wrote:
 At 09:22 PM 7/28/2009 -0500, Ian Bicking wrote:

 I can see how this could go quite wrong, but maybe if installers touch
 some file in the library directory anytime a package is
 installed/reinstalled/removed/etc,

 You mean, like, the mtime of the directory itself?  ;-)

Do directory mtimes get recursively updated?  I don't think they do.


That's not necessary; if imports use a cached listdir, then the 
children will get handled recursively.



So if you have a layout:

site-packages/
  zope/
    interface/
      __init__.py

And you update the package and update __init__.py, the mtime of
site-packages doesn't change, does it?


Nope, but at the top level, the fact that 'zope' is present is 
unchanged, as is the presence of an 'interface' subdirectory.
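
The mtime behaviour under discussion can be checked with a short script.
This is a sketch assuming Python 3 and a typical POSIX filesystem; the
file name declarations.py is just an example. Installers that replace
files by writing a temporary file and renaming it would touch the
containing directory, unlike the in-place rewrite shown here.

```python
import os
import tempfile

OLD = 1_000_000_000  # an arbitrary fixed timestamp (September 2001)


def mtime(path):
    """Return the mtime of path in whole seconds."""
    return int(os.stat(path).st_mtime)


root = tempfile.mkdtemp()
site = os.path.join(root, "site-packages")
interface = os.path.join(site, "zope", "interface")
os.makedirs(interface)
init_py = os.path.join(interface, "__init__.py")
with open(init_py, "w") as fh:
    fh.write("# v1\n")

# Backdate both directories so any later change is unmistakable.
for directory in (site, interface):
    os.utime(directory, (OLD, OLD))

# Rewrite an existing file deep in the tree: no directory entry changes.
with open(init_py, "w") as fh:
    fh.write("# v2\n")
site_unchanged = mtime(site) == OLD
interface_unchanged = mtime(interface) == OLD

# Add a new entry: only the directory that gained it is touched.
open(os.path.join(interface, "declarations.py"), "w").close()
interface_changed = mtime(interface) != OLD
site_still_unchanged = mtime(site) == OLD

print(site_unchanged, interface_unchanged, interface_changed, site_still_unchanged)
```

So a cached listing of site-packages stays valid across the update, and
the listing of zope/interface is invalidated only when its own entries
change.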




I'm saying if there was a file in site-packages/last_updated that gets
touched every time an installer does anything in site-packages, then
you could cache (between processes) the lookups.


Since each invocation of the interpreter can have a different 
PYTHONPATH, the cache has to be per-directory, not global.  If it's 
per-directory, then there's no real benefit over runtime caching, 
since you now have to open and read a file (instead of just reading 
the directory).  And as I said, it's not realistic to think that 
opening and reading a file is going to beat opening and reading a 
directory for speed.




Re: [Distutils] Cache PYTHONPATH? (Re: make unzipped eggs be the default)

2009-07-28 Thread David Lyon
P.J. Eby wrote:
So the optimum performance tradeoff depends on how many imports you 
have *and* how many eggs you have on sys.path.  

Spoken like a true master...

and it's imho a real design blunder..

sys.path is meant to contain directories that the interpreter
can check for packages.

Adding eggs to sys.path just prioritizes eggs (higher) and means
that anytime a package is imported, virtually every egg must be
opened to check if it has the appropriate package.

imho it's an abuse of the sys.path to do things this way.

Eggs should sit in site-packages directories like any other
package and wait their turn.

.zip/.egg should just be a transport format. The site-package
directory should just hold packages of a like format.

Third party libraries are rarely so big that they need to
be compressed to save disk space.. on any of the systems
that i know about anyway..

/ranting

David


