Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Donald Stufft

> On May 16, 2015, at 1:24 PM, Nick Coghlan  wrote:
> 
> On 17 May 2015 at 00:36, Justin Cappos  wrote:
>> This only considers computation cost though.  Other factors can become more
>> expensive than computation.  For example, SAT solvers need all the rules to
>> consider.  So a SAT solution needs to effectively download the full
>> dependency graph before starting.  A backtracking dependency resolver can
>> just download packages or dependency information as it considers them.
> 
> This is the defining consideration for pip at this point: a SAT solver
> requires publication of static dependency metadata on PyPI, which is
> dependent on both the Warehouse migration *and* the completion and
> acceptance of PEP 426. Propagation out to PyPI caching proxies and
> mirrors like devpi and the pulp-python plugin will then take even
> longer.
> 
> A backtracking resolver doesn't have those gating dependencies, as it
> can tolerate the current dynamic metadata model.
> 


Even when we have Warehouse and PEP 426, that only gives us that data going
forward; the 400k files that currently exist on PyPI still won't have static
metadata. We could parse it out for wheels, but not for anything else. For the
foreseeable future, any solution will need to be able to handle discovering
constraints iteratively, though I think a SAT solver can cope with that if it
supports incremental solving, or simply by re-running the SAT problem each
time we discover a new constraint.
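
A sketch of the iterative shape that implies (solve() and fetch_metadata()
are hypothetical stand-ins, not pip's or any real SAT library's API):

    def resolve_iteratively(root_requirements, fetch_metadata, solve):
        # Constraints discovered so far; we start with only the user's
        # requirements and learn the rest as metadata is downloaded.
        clauses = list(root_requirements)
        seen = set()
        while True:
            # Re-run (or incrementally extend) the SAT problem.
            result = solve(clauses)
            if result is None:
                raise RuntimeError("no consistent set of versions exists")
            # Which chosen candidates lack dependency data so far?
            missing = {c.name for c in result} - seen
            if not missing:
                return result  # all constraints known and satisfied
            for name in missing:
                clauses.extend(fetch_metadata(name))  # may add new conflicts
                seen.add(name)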


---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA





Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Nick Coghlan
On 17 May 2015 at 00:36, Justin Cappos  wrote:
> This only considers computation cost though.  Other factors can become more
> expensive than computation.  For example, SAT solvers need all the rules to
> consider.  So a SAT solution needs to effectively download the full
> dependency graph before starting.  A backtracking dependency resolver can
> just download packages or dependency information as it considers them.

This is the defining consideration for pip at this point: a SAT solver
requires publication of static dependency metadata on PyPI, which is
dependent on both the Warehouse migration *and* the completion and
acceptance of PEP 426. Propagation out to PyPI caching proxies and
mirrors like devpi and the pulp-python plugin will then take even
longer.

A backtracking resolver doesn't have those gating dependencies, as it
can tolerate the current dynamic metadata model.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread David Cournapeau
On Sun, May 17, 2015 at 12:40 AM, Daniel Holth  wrote:

>
> On May 16, 2015 11:22 AM, "David Cournapeau"  wrote:
> >
> >
> >
> > On Sat, May 16, 2015 at 11:36 PM, Justin Cappos  wrote:
> >>>
> >>> I am no expert, but I don't understand why backtracking algorithms
> would be faster than SAT, since they both potentially need to walk over
> the full set of possible solutions. It is hard to reason about the cost
> because the worst case is in theory growing exponentially in both cases.
> >>
> >>
> >> This is talked about a bit in this thread:
> https://github.com/pypa/pip/issues/988
> >>
> >> Either algorithm could be the more computationally efficient.  Basically, *if
> there are no conflicts* backtracking will certainly win.  If there are a
> huge number of conflicts a SAT solver will certainly win.  It's not clear
> where the tipping point is between the two schemes.
> >>
> >> However, a better question is: does the computational difference
> matter?  If one is a microsecond faster than the other, I don't think
> anyone cares.  However, from the OPIUM paper (listed off of that thread),
> it is clear that SAT solver resolution can be slow without optimizations to
> make them work more like backtracking resolvers.  In my experience
> backtracking resolvers are also slow when the conflict rate is high.
> >
> >
> > Pure SAT is fast enough in practice in my experience (concretely:
> solving thousands of rules takes < 1 sec). It becomes more complicated as
> you need to optimize the solution, especially when you have already
> installed packages. This is unfortunately not as well discussed in the
> literature. Pseudo-boolean SAT for optimization was argued to be too slow
> by the 0 install people, but OTOH, this seems to be what's used in conda,
> so who knows :)
>
> Where optimizing means something like "find a solution with the newest
> possible releases of the required packages", not execution speed.
>

Indeed, it was not obvious in this context :) Though in theory
optimization is more general: it could be optimizing w.r.t. a cost function
taking into account the number of packages, download size, minimal number
of changes, etc. This is where you want pseudo-boolean SAT, which I think
is what conda uses.
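
To make "optimize" concrete, the kind of cost function meant here might
look like the following sketch (the weights and candidate attributes are
invented for illustration):

    def solution_cost(solution, installed):
        # How many already-installed packages would this solution change?
        changes = sum(1 for pkg in solution
                      if installed.get(pkg.name) not in (None, pkg.version))
        return (1.0 * len(solution)    # prefer fewer packages
                + 1e-6 * sum(p.download_size for p in solution)  # smaller downloads
                + 5.0 * changes)       # prefer minimal changes

A pseudo-boolean solver can minimize such a linear objective directly over
the install/don't-install boolean variables, rather than scoring whole
solutions after the fact.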

0install, composer and I believe libsolv took a different route, and use
heuristics to find a reasonably good solution through the way they pick
the next candidate. This requires access to the internals of the SAT
solver, though (not a problem if you have a Python implementation).

David

> > If your SAT solver is in pure Python, you can choose a "direction" of the
> search which is more meaningful. I believe this is what 0install does from
> reading http://0install.net/solver.html, and what we have in our own SAT
> solver code. I unfortunately cannot look at the 0install code myself as it
> is under the GPL and I am working on a BSD solver implementation. I also do
> not know how they handle updates and already installed packages.
> >
> >>
> >> This only considers computation cost though.  Other factors can become
> more expensive than computation.  For example, SAT solvers need all the
> rules to consider.  So a SAT solution needs to effectively download the
> full dependency graph before starting.  A backtracking dependency resolver
> can just download packages or dependency information as it considers them.
> The bandwidth cost for SAT solvers should be higher.
> >
> >
> > With a reasonable representation, I think you can make it small enough.
> To give an idea, our index @ Enthought containing around 20k packages takes
> ~340 kb compressed w/ bz2 if you only keep the data required for dependency
> handling (name, version and runtime dependencies), and that's using json,
> an inefficient encoding, so I suspect encoding all of PyPI may be only a few
> MB to fetch, which is generally faster than doing tens of HTTP requests.
> >
> > The libsolv people worked on a binary representation that may also be
> worth looking at.
> >
> >>
> >> P.S.  If you'd like to talk off list, possibly over Skype, I'd be happy
> to talk more with you and/or Robert about minutiae that others may not care
> about.
> >
> >
> > Sure, I would be happy to. As I mentioned before, we have some code
> around a SAT-based solver, but it is not ready yet, which is why we kept it
> private (https://github.com/enthought/sat-solver). It handles well (==
> both speed and quality-wise) the case where nothing is installed, but
> behaves poorly when packages are already installed, and does not handle the
> update case yet. The code is also very prototype-ish, but is not too
> complicated to experiment with.
> >
> > David
> >
> >


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Daniel Holth
On May 16, 2015 11:22 AM, "David Cournapeau"  wrote:
>
>
>
> On Sat, May 16, 2015 at 11:36 PM, Justin Cappos  wrote:
>>>
>>> I am no expert, but I don't understand why backtracking algorithms
would be faster than SAT, since they both potentially need to walk over
the full set of possible solutions. It is hard to reason about the cost
because the worst case is in theory growing exponentially in both cases.
>>
>>
>> This is talked about a bit in this thread:
https://github.com/pypa/pip/issues/988
>>
>> Either algorithm could be the more computationally efficient.  Basically, *if
there are no conflicts* backtracking will certainly win.  If there are a
huge number of conflicts a SAT solver will certainly win.  It's not clear
where the tipping point is between the two schemes.
>>
>> However, a better question is: does the computational difference matter?
If one is a microsecond faster than the other, I don't think anyone cares.
However, from the OPIUM paper (listed off of that thread), it is clear that
SAT solver resolution can be slow without optimizations to make them work
more like backtracking resolvers.  In my experience backtracking
resolvers are also slow when the conflict rate is high.
>
>
> Pure SAT is fast enough in practice in my experience (concretely: solving
thousands of rules takes < 1 sec). It becomes more complicated as you need
to optimize the solution, especially when you have already installed
packages. This is unfortunately not as well discussed in the literature.
Pseudo-boolean SAT for optimization was argued to be too slow by the 0
install people, but OTOH, this seems to be what's used in conda, so who
knows :)

Where optimizing means something like "find a solution with the newest
possible releases of the required packages", not execution speed.

> If your SAT solver is in pure Python, you can choose a "direction" of the
search which is more meaningful. I believe this is what 0install does from
reading http://0install.net/solver.html, and what we have in our own SAT
solver code. I unfortunately cannot look at the 0install code myself as it
is under the GPL and I am working on a BSD solver implementation. I also do
not know how they handle updates and already installed packages.
>
>>
>> This only considers computation cost though.  Other factors can become
more expensive than computation.  For example, SAT solvers need all the
rules to consider.  So a SAT solution needs to effectively download the
full dependency graph before starting.  A backtracking dependency resolver
can just download packages or dependency information as it considers them.
The bandwidth cost for SAT solvers should be higher.
>
>
> With a reasonable representation, I think you can make it small enough.
To give an idea, our index @ Enthought containing around 20k packages takes
~340 kb compressed w/ bz2 if you only keep the data required for dependency
handling (name, version and runtime dependencies), and that's using json,
an inefficient encoding, so I suspect encoding all of PyPI may be only a few
MB to fetch, which is generally faster than doing tens of HTTP requests.
>
> The libsolv people worked on a binary representation that may also be
worth looking at.
>
>>
>> P.S.  If you'd like to talk off list, possibly over Skype, I'd be happy
to talk more with you and/or Robert about minutiae that others may not care
about.
>
>
> Sure, I would be happy to. As I mentioned before, we have some code
around a SAT-based solver, but it is not ready yet, which is why we kept it
private (https://github.com/enthought/sat-solver). It handles well (== both
speed and quality-wise) the case where nothing is installed, but behaves
poorly when packages are already installed, and does not handle the update
case yet. The code is also very prototype-ish, but is not too complicated
to experiment with.
>
> David
>
>


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread David Cournapeau
On Sat, May 16, 2015 at 11:36 PM, Justin Cappos  wrote:

> I am no expert, but I don't understand why backtracking algorithms would
>> be faster than SAT, since they both potentially need to walk over the
>> full set of possible solutions. It is hard to reason about the cost because
>> the worst case is in theory growing exponentially in both cases.
>>
>
> This is talked about a bit in this thread:
> https://github.com/pypa/pip/issues/988
>
> Either algorithm could be the more computationally efficient.  Basically, *if
> there are no conflicts* backtracking will certainly win.  If there are a
> huge number of conflicts a SAT solver will certainly win.  It's not clear
> where the tipping point is between the two schemes.
>
> However, a better question is: does the computational difference matter?
> If one is a microsecond faster than the other, I don't think anyone cares.
> However, from the OPIUM paper (listed off of that thread), it is clear that
> SAT solver resolution can be slow without optimizations to make them work
> more like backtracking resolvers.  In my experience backtracking
> resolvers are also slow when the conflict rate is high.
>

Pure SAT is fast enough in practice in my experience (concretely: solving
thousands of rules takes < 1 sec). It becomes more complicated as you need
to optimize the solution, especially when you have already installed
packages. This is unfortunately not as well discussed in the literature.
Pseudo-boolean SAT for optimization was argued to be too slow by the 0
install people, but OTOH, this seems to be what's used in conda, so who
knows :)

If your SAT solver is in pure Python, you can choose a "direction" of the
search which is more meaningful. I believe this is what 0install does from
reading http://0install.net/solver.html, and what we have in our own SAT
solver code. I unfortunately cannot look at the 0install code myself as it
is under the GPL and I am working on a BSD solver implementation. I also do
not know how they handle updates and already installed packages.


> This only considers computation cost though.  Other factors can become
> more expensive than computation.  For example, SAT solvers need all the
> rules to consider.  So a SAT solution needs to effectively download the
> full dependency graph before starting.  A backtracking dependency resolver
> can just download packages or dependency information as it considers them.
> The bandwidth cost for SAT solvers should be higher.
>

With a reasonable representation, I think you can make it small enough. To
give an idea, our index @ Enthought containing around 20k packages takes
~340 kb compressed w/ bz2 if you only keep the data required for dependency
handling (name, version and runtime dependencies), and that's using json,
an inefficient encoding, so I suspect encoding all of PyPI may be only a few
MB to fetch, which is generally faster than doing tens of HTTP requests.
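
That experiment is easy to reproduce for any index. A sketch (the metadata
layout is just an example of keeping only what resolution needs):

    import bz2
    import json

    # One entry per release, with only resolution-relevant fields.
    index = [
        {"name": "foo", "version": "1.2", "depends": ["bar>=1.0"]},
        {"name": "bar", "version": "1.0", "depends": []},
        # ... and so on for every release in the index ...
    ]
    raw = json.dumps(index).encode("utf-8")
    print(len(raw), "bytes raw ->", len(bz2.compress(raw)), "bytes bz2")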

The libsolv people worked on a binary representation that may also be
worth looking at.


> P.S.  If you'd like to talk off list, possibly over Skype, I'd be happy to
> talk more with you and/or Robert about minutiae that others may not care
> about.
>

Sure, I would be happy to. As I mentioned before, we have some code around
a SAT-based solver, but it is not ready yet, which is why we kept it
private (https://github.com/enthought/sat-solver). It handles well (== both
speed and quality-wise) the case where nothing is installed, but behaves
poorly when packages are already installed, and does not handle the update
case yet. The code is also very prototype-ish, but is not too complicated
to experiment with.

David


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Justin Cappos
>
> I am no expert, but I don't understand why backtracking algorithms would
> be faster than SAT, since they both potentially need to walk over the
> full set of possible solutions. It is hard to reason about the cost because
> the worst case is in theory growing exponentially in both cases.
>

This is talked about a bit in this thread:
https://github.com/pypa/pip/issues/988

Either algorithm could be the more computationally efficient.  Basically, *if
there are no conflicts* backtracking will certainly win.  If there are a
huge number of conflicts a SAT solver will certainly win.  It's not clear
where the tipping point is between the two schemes.

However, a better question is: does the computational difference matter?  If
one is a microsecond faster than the other, I don't think anyone cares.
However, from the OPIUM paper (listed off of that thread), it is clear that
SAT solver resolution can be slow without optimizations to make them work
more like backtracking resolvers.  In my experience backtracking
resolvers are also slow when the conflict rate is high.

This only considers computation cost though.  Other factors can become more
expensive than computation.  For example, SAT solvers need all the rules to
consider.  So a SAT solution needs to effectively download the full
dependency graph before starting.  A backtracking dependency resolver can
just download packages or dependency information as it considers them.  The
bandwidth cost for SAT solvers should be higher.

Thanks,
Justin
P.S.  If you'd like to talk off list, possibly over Skype, I'd be happy to
talk more with you and/or Robert about minutiae that others may not care
about.


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Nick Coghlan
On 16 May 2015 at 11:52, Robert Collins  wrote:
> On 16 May 2015 at 13:45, Donald Stufft  wrote:
>>
>>> On May 15, 2015, at 9:22 PM, Robert Collins  
>>> wrote:
>>>
>>> On 16 May 2015 at 11:08, Marcus Smith  wrote:
 Why not start with pip at least being a "simple" fail-on-conflict resolver
 (vs the "1st found wins" resolver it is now)...

 You'd "backtrack" for the sake of re-walking when new constraints are 
 found,
 but not for the purpose of solving conflicts.

 I know you're motivated to solve Openstack build issues, but many of the
 issues I've seen in the pip tracker, I think would be solved without the
 backtracking resolver you're trying to build.
>>>
>>> Well, I'm scratching the itch I have. If it's too hard to get something
>>> decent, sure I might back off in my goals, but I see no point aiming
>>> for something less than all the other language specific packaging
>>> systems out there have.
>>
>>
>> So what makes the other language specific packaging systems different? As
>> far as I know all of them have complete archives (e.g. they are like PyPI
>> where they have a lot of versions, not like Linux Distros). What can we
>> learn from how they solved this?
>
> NB: I have by no means finished the low-hanging heuristics and space
> trimming stuff :). I have some simple things in mind and am sure I'll
> end up with something 'good enough' for day to day use. The thing I'm
> worried about is the long term health of the approach.

Longer term, I think it makes sense to have the notion of "active" and
"obsolete" versions baked into PyPI's API and the web UI. This
wouldn't be baked into the package metadata itself (unlike the
proposed "Obsoleted-By" field for project renaming), but rather be a
dynamic reflection of whether or not *new* users should be looking at
the affected version, and whether or not it should be considered as a
candidate for dependency resolution when not specifically requested.
(This could also replace the current "hidden versions" feature, which
only hides things from the web UI, without having any impact on the
information published to automated tools through the programmatic API)

Tools that list outdated packages could also be simplified a bit, as
their first pass could just be to check the obsolescence markers on
installed packages, with the second pass being to check for newer
versions of those packages.

While the bare minimum would be to let project maintainers set the
obsolescence flag directly, we could also potentially offer projects
some automated obsolescence schemes, such as:

* single active released version, anything older is marked as obsolete
whenever a new (non pre-release) version is uploaded
* semantic versioning, with a given maximum number of active released
X versions (e.g. 2), but only the most recent (according to PEP 440)
released version with a given X.* is active, everything else is
obsolete (see the sketch below)
* CPython-style and date-based versioning, with a given maximum number
of active released X.Y versions (e.g. 2), but only the most recent
(according to PEP 440) released version with a given X.Y.* is active,
everything else is obsolete

Pre-release versions could also be automatically flagged as obsolete
by PyPI as soon as a newer version for the same release (including the
final release itself) was uploaded for the given package.
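
The semantic versioning scheme, for instance, could be as simple as the
following sketch (using packaging.version for PEP 440 ordering; the policy
details are illustrative, not an existing PyPI feature):

    from collections import defaultdict
    from packaging.version import Version  # PEP 440 ordering

    def active_versions(released, max_major=2):
        # Group non-prerelease versions by their major (X) component.
        by_major = defaultdict(list)
        for v in map(Version, released):
            if not v.is_prerelease:
                by_major[v.release[0]].append(v)
        # Keep the newest release in each of the most recent max_major
        # X series active; everything else is flagged obsolete.
        recent = sorted(by_major)[-max_major:]
        return {str(max(by_major[x])) for x in recent}

    # active_versions(["0.9", "1.0", "1.1", "2.0", "2.1b1"]) -> {"1.1", "2.0"}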

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread David Cournapeau
On Sat, May 16, 2015 at 10:52 AM, Robert Collins 
wrote:

> On 16 May 2015 at 13:45, Donald Stufft  wrote:
> >
> >> On May 15, 2015, at 9:22 PM, Robert Collins 
> wrote:
> >>
> >> On 16 May 2015 at 11:08, Marcus Smith  wrote:
> >>> Why not start with pip at least being a "simple" fail-on-conflict
> resolver
> >>> (vs the "1st found wins" resolver it is now)...
> >>>
> >>> You'd "backtrack" for the sake of re-walking when new constraints are
> found,
> >>> but not for the purpose of solving conflicts.
> >>>
> >>> I know you're motivated to solve Openstack build issues, but many of
> the
> >>> issues I've seen in the pip tracker, I think would be solved without
> the
> >>> backtracking resolver you're trying to build.
> >>
> >> Well, I'm scratching the itch I have. If it's too hard to get something
> >> decent, sure I might back off in my goals, but I see no point aiming
> >> for something less than all the other language specific packaging
> >> systems out there have.
> >
> >
> > So what makes the other language specific packaging systems different?
> > As far as I know all of them have complete archives (e.g. they are like
> > PyPI where they have a lot of versions, not like Linux Distros). What
> > can we learn from how they solved this?
>
> NB: I have by no means finished the low-hanging heuristics and space
> trimming stuff :). I have some simple things in mind and am sure I'll
> end up with something 'good enough' for day to day use. The thing I'm
> worried about is the long term health of the approach.
>
> Good questions. Some of it is structural I suspect. A quick rundown.
> cabal (haskell) has a backtracking solver that accepts various
> parameters to tell it to try harder.
> javascript effectively vendors every dep ever, so you end up with many
> copies of the same library at different versions in the same process.
> rust's cargo system currently solves everything in a single project
> only - it has no binary packaging, only vendor-into-a-binary-build
> packaging.
> The gem behaviour I'm not yet familiar with.
> perl I used to know but time has eroded it :/.
>

FWIW, PHP uses a SAT-based solver in composer, which started as a port of
libsolv (the SAT solver used by openSUSE and soon Fedora).

I am no expert, but I don't understand why backtracking algorithms would
be faster than SAT, since they both potentially need to walk over the full
set of possible solutions. It is hard to reason about the cost because the
worst case is in theory growing exponentially in both cases.

With a SAT-based algorithm for dependency resolution, it is relatively
simple to apply heuristics which massively prune the search space. For
example, when considering package A with, say, 10 potential versions A_1
... A_10, in theory you need to generate the rules:

# "-" means do not install, "+" means install
-A_1 | -A_2
-A_1 | -A_3
...

and those constitute most of the rules in common cases. But it is possible
to tweak the SAT implementation to replace those rules with a single "at
most one of" rule per *package*, which means the number of rules does not
grow much with the number of versions.
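
In code, the difference between the two encodings is roughly the following
sketch (the clause representation is illustrative, not any particular
solver's format):

    from itertools import combinations

    versions = ["A_%d" % i for i in range(1, 11)]  # A_1 .. A_10

    # Naive pairwise encoding: one clause (-x | -y) per pair of versions,
    # i.e. O(n^2) clauses just to say "install at most one version of A".
    pairwise = [("-" + x, "-" + y) for x, y in combinations(versions, 2)]
    print(len(pairwise))  # 45 clauses for 10 versions

    # A tweaked solver with a native constraint replaces all of those
    # with a single rule per package:
    at_most_one = ("AtMostOne", tuple(versions))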

The real difficulty of SAT-based solvers is the optimization part: many
actually valid solutions are not acceptable, and that's where the
heuristics get more complicated.

David


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 13:45, Donald Stufft  wrote:
>
>> On May 15, 2015, at 9:22 PM, Robert Collins  
>> wrote:
>>
>> On 16 May 2015 at 11:08, Marcus Smith  wrote:
>>> Why not start with pip at least being a "simple" fail-on-conflict resolver
>>> (vs the "1st found wins" resolver it is now)...
>>>
>>> You'd "backtrack" for the sake of re-walking when new constraints are found,
>>> but not for the purpose of solving conflicts.
>>>
>>> I know you're motivated to solve Openstack build issues, but many of the
>>> issues I've seen in the pip tracker, I think would be solved without the
>>> backtracking resolver you're trying to build.
>>
>> Well, I'm scratching the itch I have. If it's too hard to get something
>> decent, sure I might back off in my goals, but I see no point aiming
>> for something less than all the other language specific packaging
>> systems out there have.
>
>
> So what makes the other language specific packaging systems different? As
> far as I know all of them have complete archives (e.g. they are like PyPI
> where they have a lot of versions, not like Linux Distros). What can we
> learn from how they solved this?

NB: I have by no means finished the low-hanging heuristics and space
trimming stuff :). I have some simple things in mind and am sure I'll
end up with something 'good enough' for day to day use. The thing I'm
worried about is the long term health of the approach.

Good questions. Some of it is structural I suspect. A quick rundown.
cabal (haskell) has a backtracking solver that accepts various
parameters to tell it to try harder.
javascript effectively vendors every dep ever, so you end up with many
copies of the same library at different versions in the same process.
rust's cargo system currently solves everything in a single project
only - it has no binary packaging, only vendor-into-a-binary-build
packaging.
The gem behaviour I'm not yet familiar with.
perl I used to know but time has eroded it :/.

-Rob

-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Donald Stufft

> On May 15, 2015, at 9:22 PM, Robert Collins  wrote:
> 
> On 16 May 2015 at 11:08, Marcus Smith  wrote:
>> Why not start with pip at least being a "simple" fail-on-conflict resolver
>> (vs the "1st found wins" resolver it is now)...
>> 
>> You'd "backtrack" for the sake of re-walking when new constraints are found,
>> but not for the purpose of solving conflicts.
>> 
>> I know you're motivated to solve Openstack build issues, but many of the
>> issues I've seen in the pip tracker, I think would be solved without the
>> backtracking resolver you're trying to build.
> 
> Well, I'm scratching the itch I have. If it's too hard to get something
> decent, sure I might back off in my goals, but I see no point aiming
> for something less than all the other language specific packaging
> systems out there have.


So what makes the other language specific packaging systems different? As far
as I know all of them have complete archives (e.g. they are like PyPI where they
have a lot of versions, not like Linux Distros). What can we learn from how they
solved this?

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA





Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 11:08, Marcus Smith  wrote:
> Why not start with pip at least being a "simple" fail-on-conflict resolver
> (vs the "1st found wins" resolver it is now)...
>
> You'd "backtrack" for the sake of re-walking when new constraints are found,
> but not for the purpose of solving conflicts.
>
> I know you're motivated to solve Openstack build issues, but many of the
> issues I've seen in the pip tracker, I think would be solved without the
> backtracking resolver you're trying to build.

Well, I'm scratching the itch I have. If it's too hard to get something
decent, sure I might back off in my goals, but I see no point aiming
for something less than all the other language specific packaging
systems out there have.

-Rob


-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Marcus Smith
Why not start with pip at least being a "simple" fail-on-conflict resolver
(vs the "1st found wins" resolver it is now)...

You'd "backtrack" for the sake of re-walking when new constraints are
found, but not for the purpose of solving conflicts.

I know you're motivated to solve OpenStack build issues, but many of the
issues I've seen in the pip tracker would, I think, be solved without the
backtracking resolver you're trying to build.
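
For concreteness, a sketch of that middle ground - pin the first version
chosen for each name, re-check pins as new constraints are walked, and
error rather than backtrack (find_best() and deps_of() stand in for index
lookups; none of this is pip's actual internals):

    def resolve_fail_on_conflict(roots, find_best, deps_of):
        pinned = {}                # name -> chosen candidate
        queue = list(roots)        # (name, specifier) pairs left to walk
        while queue:
            name, spec = queue.pop()
            if name in pinned:
                # A re-walked constraint: check it, but never backtrack.
                if not spec.contains(pinned[name].version):
                    raise RuntimeError("conflict on %s: %s vs pinned %s"
                                       % (name, spec, pinned[name].version))
                continue
            candidate = find_best(name, spec)  # e.g. newest matching version
            pinned[name] = candidate
            queue.extend(deps_of(candidate))   # may revisit pinned names
        return pinned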

On Fri, May 15, 2015 at 11:57 AM, Robert Collins 
wrote:

> So, I am working on pip issue 988: pip doesn't resolve packages at all.
>
> This is O(alternatives_per_package^packages): if you are resolving 10
> packages with 10 versions each, there are approximately 10^10 or 10G
> combinations; 10 packages with 100 versions each gives 100^10, or about
> 10^20.
>
> So - it's going to depend pretty heavily on some good heuristics in
> whatever final algorithm makes its way in, but the problem is
> exacerbated by PyPI's nature.
>
> Most Linux (all that I'm aware of) distributions have at most 5
> versions of a package to consider at any time - installed (might be
> None), current release, current release security updates, new release
> being upgraded to, new release being upgraded to's security updates.
> And their common worst case is actually 2 versions: installed==current
> release and one new release present. They map alternatives out into
> separate packages (e.g. when an older soname is deliberately kept
> across an ABI incompatibility, you end up with 2 packages, not 2
> versions of one package). So when comparing pip's challenge to apt's:
> apt has ~20-30K packages, with alternatives ~= 2, or
> pip has ~60K packages, with alternatives ~= 5.7 (I asked dstufft)
>
> Scaling the number of packages is relatively easy; scaling the number
> of alternatives is harder. Even 300 packages (the dependency tree for
> openstack) is ~2.4T combinations to probe.
>
> I wonder if it makes sense to give some back-pressure to people, or at
> the very least encourage them to remove distributions that:
>  - they don't support anymore
>  - have security holes
>
> If folk consider PyPI a sort of historical archive then perhaps we
> could have a feature to select 'supported' versions by the author, and
> allow a query parameter to ask for all the versions.
>
> -Rob
>
> --
> Robert Collins 
> Distinguished Technologist
> HP Converged Cloud


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 08:46, Justin Cappos  wrote:
>> Example: say I have an ecosystem of 10 packages, A-J. And they do a
>> release every 6 months that is guaranteed to work together, but every
>> time some issue occurs which ends up clamping the group together - e.g.
>> an external release breaks API and so A1's deps are disjoint with A2's,
>> and then the same between A2 and A3. Even though A1's API is
>> compatible with B2's: it's not internally bad code, it's just *one*
>> external dep breaking its API that locks each release set together.
>>
>> After 2 releases you have 2^10 (~1000) combinations, but only 2 are
>> valid at all. That's about 0.2%. 8 releases gets you 8^10 (~10^9)
>> combinations with 8 valid, or about 0.000001%.
>
>
> Yes, so this would not be a situation where "conflicts do not exist (or are
> very rare)" as my post mentioned.  Is this rate of conflicts something you
> measured or is it a value you made up?

It's drawn from the concrete example of OpenStack, which has a single
group of co-installable releases that cluster together every 6 months.
I don't have the actual valid/invalid ratio there because I don't have
enough machines to calculate it :).

-Rob


-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Justin Cappos
>
> Example: say I have an ecosystem of 10 packages, A-J. And they do a
> release every 6 months that is guaranteed to work together, but every
> time some issue occurs which ends up clamping the group together - e.g.
> an external release breaks API and so A1's deps are disjoint with A2's,
> and then the same between A2 and A3. Even though A1's API is
> compatible with B2's: it's not internally bad code, it's just *one*
> external dep breaking its API that locks each release set together.
>
> After 2 releases you have 2^10 (~1000) combinations, but only 2 are
> valid at all. That's about 0.2%. 8 releases gets you 8^10 (~10^9)
> combinations with 8 valid, or about 0.000001%.


Yes, so this would not be a situation where "conflicts do not exist (or are
very rare)" as my post mentioned.  Is this rate of conflicts something you
measured or is it a value you made up?


I don't hear anyone arguing that the status quo makes sense.  I think we're
mostly just chatting about the right thing to optimize the solution for and
what sorts of shortcuts may be useful (or even necessary).  Since we can
measure the actual conflict rates and other values in practice, data seems
like it may be a good path toward grounding the discussion...

Thanks,
Justin


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 08:27, Jim Fulton  wrote:

>> If folk consider PyPI a sort of historical archive then perhaps we
>> could have a feature to select 'supported' versions by the author, and
>> allow a query parameter to ask for all the versions.
>
> You could simply limit the number of versions from PyPI
> you consider.

Yes - it would be nice IMO to give package authors some influence over
that though.

-Rob

-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Jim Fulton
On Fri, May 15, 2015 at 2:57 PM, Robert Collins
 wrote:
> So, I am working on pip issue 988: pip doesn't resolve packages at all.
>
> This is O(alternatives_per_package^packages): if you are resolving 10
> packages with 10 versions each, there are approximately 10^10 or 10G
> combinations; 10 packages with 100 versions each gives 100^10, or about
> 10^20.
>
> So - it's going to depend pretty heavily on some good heuristics in
> whatever final algorithm makes its way in, but the problem is
> exacerbated by PyPI's nature.
>
> Most Linux (all that I'm aware of) distributions have at most 5
> versions of a package to consider at any time - installed (might be
> None), current release, current release security updates, new release
> being upgraded to, new release being upgraded to's security updates.
> And their common worst case is actually 2 versions: installed==current
> release and one new release present. They map alternatives out into
> separate packages (e.g. when an older soname is deliberately kept
> across an ABI incompatibility, you end up with 2 packages, not 2
> versions of one package). So when comparing pip's challenge to apt's:
> apt has ~20-30K packages, with alternatives ~= 2, or
> pip has ~60K packages, with alternatives ~= 5.7 (I asked dstufft)
>
> Scaling the number of packages is relatively easy; scaling the number
> of alternatives is harder. Even 300 packages (the dependency tree for
> openstack) is ~2.4T combinations to probe.
>
> I wonder if it makes sense to give some back-pressure to people, or at
> the very least encourage them to remove distributions that:
>  - they don't support anymore
>  - have security holes
>
> If folk consider PyPI a sort of historical archive then perhaps we
> could have a feature to select 'supported' versions by the author, and
> allow a query parameter to ask for all the versions.

You could simply limit the number of versions from PyPI
you consider.
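
Which, in resolver terms, is little more than a filter on the candidate
list. A sketch, using packaging.version for the PEP 440 ordering:

    from packaging.version import Version

    def limited_candidates(version_strings, n=5):
        # Only consider the n newest releases of a project.
        return sorted(version_strings, key=Version, reverse=True)[:n]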

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 07:18, Justin Cappos  wrote:
> One thing to consider is that if conflicts do not exist (or are very rare),
> the number of possible combinations is a moot point.  A greedy algorithm for
> installation (which just chooses the most favored package to resolve each
> dependency) will run in linear time with the number of packages it would
> install, if no conflicts exist.
>
> So, what you are saying about state exploration may be true for a resolver
> that uses something like a SAT solver, but doesn't apply to backtracking
> dependency resolution (unless a huge number of conflicts occur) or simple
> dependency resolution (at all).  SAT solvers do have heuristics to avoid
> this blow up, except in pathological cases.  However, simple / backtracking
> dependency resolution systems have the further advantage of not needing to
> request unneeded metadata in the first place...

Your intuition here is misleading, sorry :(.

You're right about 'hey, if everything fits it's linear'. But conflicts
do happen, which is why we have this bug open, with people adding a new
example of it every week or so (as someone who didn't realise that pip
doesn't resolve finds out, adds it to pip's issue tracker, and has it
made into a dupe).

Backtracking recursive resolvers have exactly the same worst-case
complexity as SAT solvers.

Example: say I have an ecosystem of 10 packages, A-J. And they do a
release every 6 months that is guaranteed to work together, but every
time some issue occurs which ends up clamping the group together - e.g.
an external release breaks API and so A1's deps are disjoint with A2's,
and then the same between A2 and A3. Even though A1's API is
compatible with B2's: it's not internally bad code, it's just *one*
external dep breaking its API that locks each release set together.

After 2 releases you have 2^10 (~1000) combinations, but only 2 are
valid at all. That's about 0.2%. 8 releases gets you 8^10 (~10^9)
combinations with 8 valid, or about 0.000001%.

Now there are two things to examine here: how likely is this to happen
to PyPI users, and can a backtracker (which btw is what my code is)
handle this better than a SAT solver?

In terms of likelihood - OpenStack hits this every release. It's not
that our libraries are incompatible with each other, it's that given
250 packages (the 200 in error I quoted just shows that the resolver
hadn't obtained version data for everything), *something* breaks API
compat in each 6 month release cycle, and so you end up with the whole
set effectively locking together. In fact, it has happened so
consistently to OpenStack that we now release our libraries with
closed specifiers: >=min_version, <max_version. Back to the example:
say the user asks for J>3.0 (and we have releases 1, 2, 3 and 4 of A
through J).

Now, 'for a_version in A:' is going to yield the installed version of A
as its first step; B likewise. So we'll end up with a trace like:
A==3
B==3
C==3
D==3
E==3
F==3
G==3
H==3
I==3
J==3 error, backtrack (user specifier)
J==4 error, backtracks once the external deps are considered and the
conflict is found
J==2 error, backtrack (user specifier)
J==1 error, backtrack (user specifier)
I==4
J==3 error, backtrack (user specifier)
J==4 error, backtracks somewhere in the external deps
J==2, error, backtrack (user specifier)

and so on, until we finally tick over to A==4.
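
The search driving a trace like that is conceptually tiny. A sketch,
ignoring all heuristics (versions_of() yields candidates with the
installed version first, and compatible() checks the accumulated
constraints; both are invented for illustration):

    def backtrack(packages, chosen, versions_of, compatible):
        if not packages:
            return chosen                  # every package has a version
        name, rest = packages[0], packages[1:]
        for version in versions_of(name):  # installed/preferred first
            if compatible(name, version, chosen):
                result = backtrack(rest, {**chosen, name: version},
                                   versions_of, compatible)
                if result is not None:
                    return result          # consistent assignment found
        return None                        # exhausted: caller backtracks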

More generally, any already installed version (without -U) can cause a
backtracking resolver to try *all* possible other combinations before
realising that that installed version is the problem and bumping it. A
heuristic to look for those and bump them first then hits molasses as
soon as one of the installed versions needs to be kept as-is.

Anyhow, my goal here is to start the conversation; pip will need some
knobs, because no matter how good the heuristics are, users will need
escape hatches (one of which is to fully specify their needs).

-Rob


-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Noah Kantrowitz

On May 15, 2015, at 9:19 PM, Donald Stufft  wrote:

>> 
>> On May 15, 2015, at 2:57 PM, Robert Collins  
>> wrote:
>> 
>> So, I am working on pip issue 988: pip doesn't resolve packages at all.
>> 
>> This is O(alternatives_per_package^packages): if you are resolving 10
>> packages with 10 versions each, there are approximately 10^10 or 10G
>> combinations; 10 packages with 100 versions each gives 100^10, or about
>> 10^20.
>> 
>> So - it's going to depend pretty heavily on some good heuristics in
>> whatever final algorithm makes its way in, but the problem is
>> exacerbated by PyPI's nature.
>> 
>> Most Linux (all that I'm aware of) distributions have at most 5
>> versions of a package to consider at any time - installed (might be
>> None), current release, current release security updates, new release
>> being upgraded to, new release being upgraded to's security updates.
>> And their common worst case is actually 2 versions: installed==current
>> release and one new release present. They map alternatives out into
>> separate packages (e.g. when an older soname is deliberately kept
>> across an ABI incompatibility, you end up with 2 packages, not 2
>> versions of one package). So when comparing pip's challenge to apt's:
>> apt has ~20-30K packages, with alternatives ~= 2, or
>> pip has ~60K packages, with alternatives ~= 5.7 (I asked dstufft)
>> 
>> Scaling the number of packages is relatively easy; scaling the number
>> of alternatives is harder. Even 300 packages (the dependency tree for
>> openstack) is ~2.4T combinations to probe.
>> 
>> I wonder if it makes sense to give some back-pressure to people, or at
>> the very least encourage them to remove distributions that:
>> - they don't support anymore
>> - have security holes
>> 
>> If folk consider PyPI a sort of historical archive then perhaps we
>> could have a feature to select 'supported' versions by the author, and
>> allow a query parameter to ask for all the versions.
>> 
> 
> There have been a handful of projects which would only keep the latest N
> versions uploaded to PyPI. I know this primarily because it has caused
> people a decent amount of pain over time. It's common for people's
> deployments to use a requirements.txt file like ``foo==1.0`` and to just continue
> to pull from PyPI. Deleting the old files breaks anyone doing that, so it 
> would
> require either having people bundle their deps in their repositories or
> some way to get at those old versions. Personally I think that we shouldn’t
> go deleting the old versions or encouraging people to do that.

+1 for this. While I appreciate why Linux distros purge old versions, it is
absolutely hellish for reproducibility. If you are looking for prior art,
check out the Molinillo project (https://github.com/CocoaPods/Molinillo)
used by Bundler and CocoaPods. It is not as complex as the Solve gem used
in Chef, but offers a good balance between performance in satisfying
constraints and false negatives on solution failures.

--Noah





Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 07:19, Donald Stufft  wrote:

> There have been a handful of projects which would only keep the latest N
> versions uploaded to PyPI. I know this primarily because it has caused
> people a decent amount of pain over time. It's common for people's
> deployments to use a requirements.txt file like ``foo==1.0`` and to just continue
> to pull from PyPI. Deleting the old files breaks anyone doing that, so it 
> would
> require either having people bundle their deps in their repositories or
> some way to get at those old versions. Personally I think that we shouldn’t
> go deleting the old versions or encouraging people to do that.

I think 'most recent only' is too much. Most upstreams will support
more than one release. Like - I don't care what testtools release you
use.

OTOH, every version with distinct dependencies becomes a very
expensive liability to the ecosystem here. It's beyond human scale,
and well in the territory of argh wtf the universe is burning around
me and my tardis has run out of power.

I'm sure we can provide an escape hatch in pip (and I'm going to do
that in my branch soon - offering simple 'error on conflict' and 'use
first seen specifier only' strategies) while folk work on different
heuristics. The actual resolver is only ~100 LOC in my branch today;
the rest is refactoring (which can be made better, and I plan to do so
before suggesting we merge it).

But a significant contributing factor is the O of the problem, and we
can do something about that. I don't know what exactly, and I think
we're going to need to have our creative caps firmly on to come up
with something meeting the broad needs of the ecosystem: which
includes pip Just Working.

-Rob

-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Ben Finney
Donald Stufft  writes:

> > On May 15, 2015, at 2:57 PM, Robert Collins  
> > wrote:
> > 
> > If folk consider PyPI a sort of historical archive then perhaps we
> > could have a feature to select 'supported' versions by the author,
> > and allow a query parameter to ask for all the versions.
> > 
>
> It’s common for deployments people have to use a requirements.txt file
> like ``foo==1.0`` and to just continue to pull from PyPI. Deleting the
> old files breaks anyone doing that, so it would require either having
> people bundle their deps in their repositories or some way to get at
> those old versions. Personally I think that we shouldn’t go deleting
> the old versions or encouraging people to do that.

Yes, it's common to consider PyPI as a repository of all versions ever
released, and to treat it as an archive whose URLs will continue to make
available the historical versions.

-- 
 \ “If history and science have taught us anything, it is that |
  `\ passion and desire are not the same as truth.” —E. O. Wilson, |
_o__)  _Consilience_, 1998 |
Ben Finney



Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 06:57, Robert Collins  wrote:
> So, I am working on pip issue 988: pip doesn't resolve packages at all.
>
> This is O(alternatives_per_package^packages): if you are resolving 10
...
> Scaling the number of packages is relatively easy; scaling the number
> of alternatives is harder. Even 300 packages (the dependency tree for
> openstack) is ~2.4T combinations to probe.

I added a check for the exact number (when the current step limit is hit):
Hit step limit during resolving,
22493640689038530013767184665222125808455708963348534886974974630893524036813561125576881299950281714638872640331745747555743820280235291929928862660035516365300612827387994788286647556890876840654454905860390366740480.00
from 4038 versions in 205 packages after 10 steps

Which indicates an alternatives factor of ~20 (4038/205). And AIUI PyPI
has a long tail itself, so it's more common that folk will see factors
above 5.7 rather than below it.

-Rob

-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Donald Stufft

> On May 15, 2015, at 2:57 PM, Robert Collins  wrote:
> 
> So, I am working on pip issue 988: pip doesn't resolve packages at all.
> 
> This is O(alternatives_per_package^packages): if you are resolving 10
> packages with 10 versions each, there are approximately 10^10 or 10G
> combinations; 10 packages with 100 versions each gives 100^10, or about
> 10^20.
> 
> So - it's going to depend pretty heavily on some good heuristics in
> whatever final algorithm makes its way in, but the problem is
> exacerbated by PyPI's nature.
> 
> Most Linux (all that I'm aware of) distributions have at most 5
> versions of a package to consider at any time - installed (might be
> None), current release, current release security updates, new release
> being upgraded to, new release being upgraded to's security updates.
> And their common worst case is actually 2 versions: installed==current
> release and one new release present. They map alternatives out into
> separate packages (e.g. when an older soname is deliberately kept
> across an ABI incompatibility, you end up with 2 packages, not 2
> versions of one package). So when comparing pip's challenge to apt's:
> apt has ~20-30K packages, with alternatives ~= 2, or
> pip has ~60K packages, with alternatives ~= 5.7 (I asked dstufft)
> 
> Scaling the number of packages is relatively easy; scaling the number
> of alternatives is harder. Even 300 packages (the dependency tree for
> openstack) is ~2.4T combinations to probe.
> 
> I wonder if it makes sense to give some back-pressure to people, or at
> the very least encourage them to remove distributions that:
> - they don't support anymore
> - have security holes
> 
> If folk consider PyPI a sort of historical archive then perhaps we
> could have a feature to select 'supported' versions by the author, and
> allow a query parameter to ask for all the versions.
> 

There have been a handful of projects which would only keep the latest N
versions uploaded to PyPI. I know this primarily because it has caused
people a decent amount of pain over time. It's common for people's
deployments to use a requirements.txt file like ``foo==1.0`` and to just continue
to pull from PyPI. Deleting the old files breaks anyone doing that, so it would
require either having people bundle their deps in their repositories or
some way to get at those old versions. Personally I think that we shouldn’t
go deleting the old versions or encouraging people to do that.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA





Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Justin Cappos
One thing to consider is that if conflicts do not exist (or are very rare),
the number of possible combinations is a moot point.  A greedy algorithm
for installation (which just chooses the most favored package to resolve
each dependency) will run in linear time with the number of packages it
would install, if no conflicts exist.

So, what you are saying about state exploration may be true for a resolver
that uses something like a SAT solver, but doesn't apply to backtracking
dependency resolution (unless a huge number of conflicts occur) or simple
dependency resolution (at all).  SAT solvers do have heuristics to avoid
this blow up, except in pathological cases.  However, simple / backtracking
dependency resolution systems have the further advantage of not needing to
request unneeded metadata in the first place...
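
A sketch of that greedy, conflict-free case (most_favored() and deps_of()
are illustrative stand-ins for index queries):

    def greedy_install(roots, most_favored, deps_of):
        chosen = {}
        stack = list(roots)
        while stack:
            req = stack.pop()
            if req.name in chosen:
                continue               # no conflicts assumed: first pick stands
            pkg = most_favored(req)    # e.g. the newest satisfying version
            chosen[req.name] = pkg
            stack.extend(deps_of(pkg)) # each package is processed exactly once
        return chosen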

Thanks,
Justin

On Fri, May 15, 2015 at 2:57 PM, Robert Collins 
wrote:

> So, I am working on pip issue 988: pip doesn't resolve packages at all.
>
> This is O(alternatives_per_package^packages): if you are resolving 10
> packages with 10 versions each, there are approximately 10^10 or 10G
> combinations; 10 packages with 100 versions each gives 100^10, or about
> 10^20.
> 
> So - it's going to depend pretty heavily on some good heuristics in
> whatever final algorithm makes its way in, but the problem is
> exacerbated by PyPI's nature.
>
> Most Linux (all that I'm aware of) distributions have at most 5
> versions of a package to consider at any time - installed (might be
> None), current release, current release security updates, new release
> being upgraded to, new release being upgraded to's security updates.
> And their common worst case is actually 2 versions: installed==current
> release and one new release present. They map alternatives out into
> separate packages (e.g. when an older soname is deliberately kept
> across an ABI incompatibility, you end up with 2 packages, not 2
> versions of one package). So when comparing pip's challenge to apt's:
> apt has ~20-30K packages, with alternatives ~= 2, or
> pip has ~60K packages, with alternatives ~= 5.7 (I asked dstufft)
>
> Scaling the number of packages is relatively easy; scaling the number
> of alternatives is harder. Even 300 packages (the dependency tree for
> openstack) is ~2.4T combinations to probe.
>
> I wonder if it makes sense to give some back-pressure to people, or at
> the very least encourage them to remove distributions that:
>  - they don't support anymore
>  - have security holes
>
> If folk consider PyPI a sort of historical archive then perhaps we
> could have a feature to select 'supported' versions by the author, and
> allow a query parameter to ask for all the versions.
>
> -Rob
>
> --
> Robert Collins 
> Distinguished Technologist
> HP Converged Cloud


[Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
So, I am working on pip issue 988: pip doesn't resolve packages at all.

This is O(alternatives_per_package^packages): if you are resolving 10
packages with 10 versions each, there are approximately 10^10 or 10G
combinations; 10 packages with 100 versions each gives 100^10, or about
10^20.

So - it's going to depend pretty heavily on some good heuristics in
whatever final algorithm makes its way in, but the problem is
exacerbated by PyPI's nature.

Most Linux (all that I'm aware of) distributions have at most 5
versions of a package to consider at any time - installed (might be
None), current release, current release security updates, new release
being upgraded to, new release being upgraded to's security updates.
And their common worst case is actually 2 versions: installed==current
release and one new release present. They map alternatives out into
separate packages (e.g. when an older soname is deliberately kept
across an ABI incompatibility, you end up with 2 packages, not 2
versions of one package). So when comparing pip's challenge to apt's:
apt has ~20-30K packages, with alternatives ~= 2, or
pip has ~60K packages, with alternatives ~= 5.7 (I asked dstufft)

Scaling the number of packages is relatively easy; scaling the number
of alternatives is harder. Even 300 packages (the dependency tree for
openstack) is ~2.4T combinations to probe.
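
The combinatorics are easy to check: the search space is the product of
the per-package candidate counts.

    import math

    # Search space = product of the number of candidate versions per package.
    def combinations_to_probe(versions_per_package):
        return math.prod(versions_per_package)

    print(combinations_to_probe([10] * 10))   # 10 pkgs x 10 versions: 10**10
    print(combinations_to_probe([100] * 10))  # 10 pkgs x 100 versions: 10**20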

I wonder if it makes sense to give some back-pressure to people, or at
the very least encourage them to remove distributions that:
 - they don't support anymore
 - have security holes

If folk consider PyPI a sort of historical archive then perhaps we
could have a feature to select 'supported' versions by the author, and
allow a query parameter to ask for all the versions.

-Rob

-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud