At 06:52 PM 8/2/2009 +0200, Tarek Ziadé wrote:
On Wed, Jul 29, 2009 at 6:44 AM, P.J. Eby<p...@telecommunity.com> wrote:
> At 10:35 PM 7/28/2009 -0500, Ian Bicking wrote:
>>
>> On Tue, Jul 28, 2009 at 9:40 PM, P.J. Eby<p...@telecommunity.com> wrote:
>> > At 09:22 PM 7/28/2009 -0500, Ian Bicking wrote:
>> >>
>> >> I can see how this could go quite wrong, but maybe if installers touch
>> >> some file in the library directory anytime a package is
>> >> installed/reinstalled/removed/etc,
>> >
>> > You mean, like, the mtime of the directory itself?  ;-)
>>
>> Do directory mtimes get recursively updated?  I don't think they do.
>
> That's not necessary; if imports use a cached listdir, then the children
> will get handled recursively.
>
>> So if you have a layout:
>>
>> site-packages/
>>  zope/
>>    interface/
>>      __init__.py
>>
>> And you update the package and update __init__.py, the mtime of
>> site-packages doesn't change, does it?
>
> Nope, but at the top level, the fact that 'zope' is present is unchanged, as
> is the presence of an 'interface' subdirectory.
>
>
>> I'm saying if there was a file in site-packages/last_updated that gets
>> touched every time an installer does anything in site-packages, then
>> you could cache (between processes) the lookups.
>
> Since each invocation of the interpreter can have a different PYTHONPATH,
> the cache has to be per-directory, not global.  If it's per-directory, then
> there's no real benefit over runtime caching, since you now have to open and
> read a file (instead of just reading the directory).  And as I said, it's
> not realistic to think that opening and reading a file is going to beat
> opening and reading a directory for speed.

But opening and reading one file should beat opening hundreds of directories:
In the PEP 376 prototype, after thinking about a per-directory cache like the
one you are describing, I was thinking about having a global index file to
replace the global dictionary that keeps track of the distributions per
directory (currently the directory path is the key in the dictionary and the
value is the distribution objects).

That can even be a simple shelve of the dictionary, which becomes a global
index of directories that [are/were once] in the path. This works as long as
the index file is per-user. Or even better: per-application. I don't know how
this could be managed/done, but a simple cache file created alongside the
script the application is launched with could speed up the lookups at the
second launch.
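
A minimal sketch of the shelve idea described above, under stated assumptions: the function name load_index and the index file location are hypothetical, and plain directory listings stand in for the distribution objects that the PEP 376 prototype would store. It is not the prototype's actual code.

```python
import os
import shelve


def load_index(index_path, directories):
    """Return {directory: [entry names]}, backed by a persistent shelve file.

    Hypothetical helper: a real index would hold distribution objects,
    not bare file names.
    """
    result = {}
    with shelve.open(index_path) as index:
        for directory in directories:
            if directory not in index:
                # First time this directory is seen: scan it and persist.
                index[directory] = sorted(os.listdir(directory))
            result[directory] = index[directory]
    return result
```

Note that this version never revalidates an entry: once a directory is in the index, later changes on disk go unnoticed until something invalidates the file.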

You'd still have to stat the directories to know if they changed - in which case the logic I've already laid out still applies.

I think, however, we are discussing different nominal scenarios. I'm assuming a post-PEP 376 world where the only use for .egg files or directories is for *non-default* versions of packages, which only get added to sys.path for apps or libraries that need them, rather than being in a default .pth file.

However, if you're discussing speeding up an environment where we use .egg directories and they're on sys.path, then a per-user global cache might speed things up. For security reasons, however, that cache would need to be ignored by Python when running secure scripts. (e.g. -s and -E options, and definitely anything setuid.)
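
As a rough illustration of that security check (a sketch only: may_use_user_cache is a hypothetical name, and the exact policy is an assumption rather than anything Python implements), a cache loader could consult sys.flags and compare the real and effective user IDs before trusting a per-user file:

```python
import os
import sys


def may_use_user_cache():
    """Decide whether a per-user cache file should be trusted at all."""
    # -E sets ignore_environment, -s sets no_user_site.
    if sys.flags.ignore_environment or sys.flags.no_user_site:
        return False
    # On POSIX, differing real and effective UIDs indicate a setuid context.
    if hasattr(os, "geteuid") and os.geteuid() != os.getuid():
        return False
    return True
```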

In contrast, directory stat caching with a modest number of (non-egg) PYTHONPATH entries would speed things nicely in the hopefully-future-default case.
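
To make the directory stat caching concrete, here is a small in-process sketch. The names cached_listdir and _listdir_cache are hypothetical, and it assumes the filesystem updates a directory's mtime whenever an entry is added or removed, at a granularity fine enough to notice the change.

```python
import os

# directory path -> (mtime in nanoseconds, frozenset of entry names)
_listdir_cache = {}


def cached_listdir(directory):
    """Return the directory's entries, rescanning only when its mtime changes."""
    mtime = os.stat(directory).st_mtime_ns
    cached = _listdir_cache.get(directory)
    if cached is not None and cached[0] == mtime:
        return cached[1]
    names = frozenset(os.listdir(directory))
    _listdir_cache[directory] = (mtime, names)
    return names
```

Each lookup costs one stat per sys.path entry, and an import attempt for a name that is not in the returned set can skip that directory without probing for individual files.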

_______________________________________________
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig
