On Sat, Apr 05, 2008 at 10:49:24PM -0400, Phillip J. Eby wrote:
> At 02:18 AM 4/6/2008 +0100, Floris Bruynooghe wrote:
>> On Sat, Apr 05, 2008 at 07:50:19PM -0400, Phillip J. Eby wrote:
>> > At 10:07 PM 4/5/2008 +0100, Floris Bruynooghe wrote:
>> > (One comment, though: I really don't like the idea of extending PKG-INFO
>> > to include installation data; it's only incidentally related and there
>> > are other contexts in which we use PKG-INFO where having that data
>> > included would make no sense.  Plus, it's really not an ideal file format
>> > for including data about a potentially rather large number of files.)
>>
>> That's fair.  Blowing up the files with the PKG-INFO information in
>> could have bad performance effects.  rfc822 in the stdlib reads
>> everything in memory AFAIK.
>>
>> >> Secondly I'm not sure how
>> >> useful it is for the version number to be encoded in the filename.
>> >
>> > It's very useful for setuptools, as it avoids the need to open and parse
>> > the file when searching for a suitable version of a desired package.
>>
>> Hmm, it's not that much work to read the contents of a .egg-info.
>> Just seems odd to me to have this info in two places so close to each
>> other.
>
> It allows pkg_resources to grok the entire contents of a directory using 
> only a single listdir operation -- not an unbounded number of  
> open-and-read operations.

I'm still not thrilled.  To quote the "Rejected Suggestions" section
of PEP 262: "First, performance is probably not an extremely pressing
concern as the database is only used when installing or removing
software, a relatively infrequent task."

Still, it's a done deal, so there's no point in my complaining about
it; I'll live with it.
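
For reference, the listdir-only lookup described above amounts to
something like the following rough sketch.  It assumes the
"name-version[-pyX.Y].egg-info" naming; the function name and the
regular expression are illustrative assumptions, not the actual
pkg_resources code.

```python
import os
import re

# Assumed naming convention: <name>-<version>[-py<X.Y>].egg-info
EGG_INFO_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)(?:-py(?P<pyver>[^-]+))?\.egg-info$"
)


def scan_versions(directory):
    """Map project name -> version using only the directory listing."""
    found = {}
    for entry in os.listdir(directory):  # one listdir, no file is opened
        match = EGG_INFO_RE.match(entry)
        if match:
            found[match.group("name")] = match.group("version")
    return found
```

The alternative is to open and parse every metadata file in the
directory, which is the cost the filename encoding avoids.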


>> The second part was introducing a "virtual project" for pure namespace
>> packages, where the project name would have to be the same as the
>> package name in order to find it.
>
> I think there would also need to be some prefix to the name, to prevent 
> confusion in the event that there exists a normal project name that 
> happens to use that package name.  (Again: the two namespaces are 
> unrelated, so a new/reserved namespace would be required for these 
> virtual projects.)

Sounds sensible.


>> >> AFAIK this should cover namespace packages.
>> >
>> > Unfortunately, this doesn't fix the problem, since either *some* package
>> > has to own the __init__.py, or there has to be a way for Python to treat
>> > the directory as a package without one.  And for system package managers
>> > (esp. on Linux), some *one* system package must own the file - it can't
>> > be owned by multiple system packages.
>>
>> With the format I suggested a package tool could detect on install if
>> a required pure namespace package was already installed or still
>> needed to be installed/created.  Similar on removal it is possible to
>> detect if the pure namespace package is still required (by checking if
>> its directory contains any other files than those provided by the
>> namespace package) on removal of a sub-package.
>
> Again...  some system packaging folks need to speak up on this, because 
> my understanding is that some tools simply can't do something like this.  
> They need to make explicit what a given package depends on, and install 
> that, not dynamically decide what dependencies something has.  (And then 
> there is the possibility of a problem if a non-system packager installs 
> the namespace, and then you install a system package for something that 
> includes packages in that namespace.)

As for dpkg, it will just overwrite an existing __init__.py in the
namespace package if it doesn't own it.  It won't even tell you it did
so (I was surprised at this).

However --and I know you don't like this-- this still is no problem.
What we are concerned with here is that a user- or sysadmin-owned
directory on sys.path can be managed sanely.  dpkg and co will keep
out of those; they have /usr/lib to play in, and sysadmins or users
should stay out of /usr/lib in their turn.

What is needed to cooperate with system packagers is:

1. Detect existing packages in other directories of sys.path and
   accept them to satisfy dependencies of the distribution being
   installed.

2. Find a solution for a namespace package spread out over two
   directories of sys.path.
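
One possible approach to point 2, sketched with the stdlib's
pkgutil.extend_path: every copy of the namespace package's __init__.py
extends __path__ to cover all same-named directories on sys.path.  The
helper name and the layout it writes are assumptions made for this
illustration, not something any existing tool does.

```python
import os

# Body assumed for every copy of the namespace package's __init__.py.
NAMESPACE_INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def write_namespace_portion(root, namespace, module, source):
    """Create <root>/<namespace>/__init__.py and one module next to it."""
    pkg_dir = os.path.join(root, namespace)
    os.makedirs(pkg_dir, exist_ok=True)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(NAMESPACE_INIT)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(source)
```

If this is called for two different roots that are both on sys.path,
modules from both portions should be importable under the one
namespace.  It does not settle who owns each __init__.py, which is the
coordination problem discussed above.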


>> Maybe we're making it too hard by wanting to cover *every* file
>> installed by python projects?  The main reason for this installdb, as
>> I understand it, is so that a package tool can install a sub-project
>> in a namespace package installed by someone else.  And similarly that
>> someone else doesn't wipe away the sub-package when it thinks it can
>> remove the namespace package.
>
> It's not just about namespace packages, it's about any package or  
> module.  We also want to know about installed scripts, data, etc., so  
> that they can be cleaned up by a tool that does uninstalls.

No, it's only about namespace packages.  Everything else is easy: each
tool can keep its own database of installed packages in a suitable
location if it wants to do that.  If you didn't install a file, you
don't remove it.
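
A minimal sketch of that rule, assuming each tool keeps its own
manifest of the files it installed.  The function name and arguments
are hypothetical, and the sketch ignores sub-directories and compiled
bytecode for brevity.

```python
import os


def uninstall(manifest, namespace_dir, namespace_files=("__init__.py",)):
    """Remove only the files listed in this tool's own manifest.

    Afterwards, drop the namespace package itself only if nothing but
    its own files (by default just __init__.py) is left in it.
    """
    for path in manifest:
        if os.path.exists(path):
            os.remove(path)
    leftovers = set(os.listdir(namespace_dir)) - set(namespace_files)
    if leftovers:
        return False  # someone else's sub-package still lives here
    for name in namespace_files:
        target = os.path.join(namespace_dir, name)
        if os.path.exists(target):
            os.remove(target)
    os.rmdir(namespace_dir)
    return True
```

The only shared state is the namespace directory itself, which is why
the namespace package is the one case that needs coordination between
tools.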

>>   Ah, this makes me think of the people
>> that complain on comp.lang.python that Python namespaces are too
>> tightly bound to files and directories...  It all makes sense now, we
>> wouldn't even be having this discussion if a package could declare
>> its namespace in the code!  ;-)
>
> Or if you could import from directories without needing there to be an 
> __init__.py, and Python supported namespace packages by default.

Also a good point.  I'm sure people can come up with negative
side effects of this, but I can't think of any myself right now.  So
any takers?  Is this a possible option to solve the problem?  What is
the reason for requiring __init__.py?


The longer this discussion goes on the less I like the idea of a full
PEP 262 style database (I do admit that at first it seemed like a
reasonable idea to me).  One issue I've always had with it is that it
suddenly stores management data in library directories (it should live
in /var).  The .egg-info files already do this, but they only really
provide, for Python files, the sort of information that shared
libraries carry in their .so files.

To summarise what I think are the issues:

* Python packaging tools (distutils, setuptools) need to be able to
  detect packages on all sys.path directories and use them to satisfy
  dependencies.  AIUI this is already done in Python 2.5 with the
  .egg-info files.

* Python packaging tools need to be able to share namespace packages
  in a user-owned sys.path/site-packages directory.  Installation and
  removal of the __init__.py needs coordination between the different
  tools.  This is what PEP 262 could solve, but it's not necessarily
  the best or most loved solution.

* Namespace packages need to be able to be spread over multiple
  sys.path directories so that the system can provide part of it, the
  sysadmin some more and the user yet another sub-package.



-- 
Debian GNU/Linux -- The Power of Freedom
www.debian.org | www.gnu.org | www.kernel.org
_______________________________________________
Distutils-SIG maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/distutils-sig