Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread David Cournapeau
On Sat, May 16, 2015 at 11:36 PM, Justin Cappos jcap...@nyu.edu wrote:

 I am no expert, but I don't understand why backtracking algorithms would
 be faster than SAT, since they both potentially need to walk over the
 full set of possible solutions. It is hard to reason about the cost because
 the worst case is in theory growing exponentially in both cases.


 This is talked about a bit in this thread:
 https://github.com/pypa/pip/issues/988

 Each algorithm could be computationally more efficient.  Basically, *if
 there are no conflicts* backtracking will certainly win.  If there are a
 huge number of conflicts a SAT solver will certainly win.  It's not clear
 where the tipping point is between the two schemes.

 However, a better question is does the computational difference matter?
 If one is a microsecond faster than the other, I don't think anyone cares.
 However, from the OPIUM paper (listed off of that thread), it is clear that
 SAT solver resolution can be slow without optimizations to make them work
 more like backtracking resolvers.  From my experience backtracking
 resolvers are also slow when the conflict rate is high.


Pure SAT is fast enough in practice in my experience (concretely: solving
thousands of rules takes < 1 sec). It becomes more complicated as you need
to optimize the solution, especially when you have already installed
packages. This is unfortunately not as well discussed in the literature.
Pseudo-boolean SAT for optimization was argued to be too slow by the
0install people, but OTOH, this seems to be what's used in conda, so who
knows :)

If your SAT solver is in pure python, you can choose a direction of the
search which is more meaningful. I believe this is what 0install does, from
reading http://0install.net/solver.html, and it is what we have in our own
SAT solver code. I unfortunately cannot look at the 0install code myself, as
it is under the GPL and I am working on a BSD-licensed solver implementation.
I also do not know how they handle updates and already installed packages.


 This only considers computation cost though.  Other factors can become
 more expensive than computation.  For example, SAT solvers need all the
 rules to consider.  So a SAT solution needs to effectively download the
 full dependency graph before starting.  A backtracking dependency resolver
 can just download packages or dependency information as it considers them.
 The bandwidth cost for SAT solvers should be higher.


With a reasonable representation, I think you can make it small enough. To
give an idea, our index @ Enthought containing around 20k packages takes
~340 kb compressed w/ bz2 if you only keep the data required for dependency
handling (name, version and runtime dependencies), and that's using json,
an inefficient encoding. So I suspect encoding all of pypi may be only a few
MB to fetch, which is generally faster than doing tens of http requests.
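
To make that concrete, a rough sketch of the measurement (the toy index below
stands in for the real Enthought/PyPI metadata; only name, version and runtime
dependencies are kept):

import bz2
import json

# Toy index with only the fields needed for dependency resolution
# (name, version, runtime dependencies); a real index would have ~20k
# packages, this just illustrates the encode-and-compress step.
index = {
    "requests": {"2.7.0": ["urllib3", "chardet"], "2.6.2": ["urllib3"]},
    "urllib3": {"1.10.4": [], "1.10.3": []},
    "chardet": {"2.3.0": []},
}

raw = json.dumps(index, separators=(",", ":")).encode("utf-8")
compressed = bz2.compress(raw)

print("raw json size:       %d bytes" % len(raw))
print("bz2 compressed size: %d bytes" % len(compressed))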

The libsolv people worked on a binary representation that may also be
worth looking at.


 P.S.  If you'd like to talk off list, possibly over Skype, I'd be happy to
 talk more with you and/or Robert about minutiae that others may not care
 about.


Sure, I would be happy to. As I mentioned before, we have some code around
a SAT-based solver, but it is not ready yet, which is why we kept it
private (https://github.com/enthought/sat-solver). It handles the case where
nothing is installed well (both speed- and quality-wise), but behaves
poorly when packages are already installed, and does not handle the update
case yet. The code is also very prototype-ish, but it is not too complicated
to experiment with.

David
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Daniel Holth
On May 16, 2015 11:22 AM, David Cournapeau courn...@gmail.com wrote:



 On Sat, May 16, 2015 at 11:36 PM, Justin Cappos jcap...@nyu.edu wrote:

 I am no expert, but I don't understand why backtracking algorithms
would be faster than SAT, since they both potentially need to walk over
the full set of possible solutions. It is hard to reason about the cost
because the worst case is in theory growing exponentially in both cases.


 This is talked about a bit in this thread:
https://github.com/pypa/pip/issues/988

 Each algorithm could be computationally more efficient.  Basically, *if
there are no conflicts* backtracking will certainly win.  If there are a
huge number of conflicts a SAT solver will certainly win.  It's not clear
where the tipping point is between the two schemes.

 However, a better question is does the computational difference matter?
If one is a microsecond faster than the other, I don't think anyone cares.
However, from the OPIUM paper (listed off of that thread), it is clear that
SAT solver resolution can be slow without optimizations to make them work
more like backtracking resolvers.  From my experience backtracking
resolvers are also slow when the conflict rate is high.


 Pure SAT is fast enough in practice in my experience (concretely: solving
thousands of rules takes < 1 sec). It becomes more complicated as you need
to optimize the solution, especially when you have already installed
packages. This is unfortunately not as well discussed in the literature.
Pseudo-boolean SAT for optimization was argued to be too slow by the 0
install people, but OTOH, this seems to be what's used in conda, so who
knows :)

Where optimizing means something like "find a solution with the newest
possible releases of the required packages", not execution speed.

 If you SAT solver is in pure python, you can choose a direction of the
search which is more meaningful. I believe this is what 0install does from
reading http://0install.net/solver.html, and what we have in our own SAT
solver code. I unfortunately cannot look at the 0install code myself as it
is under the GPL and am working on a BSD solver implementation. I also do
not know how they handle updates and already installed packages.


 This only considers computation cost though.  Other factors can become
more expensive than computation.  For example, SAT solvers need all the
rules to consider.  So a SAT solution needs to effectively download the
full dependency graph before starting.  A backtracking dependency resolver
can just download packages or dependency information as it considers them.
The bandwidth cost for SAT solvers should be higher.


 With a reasonable representation, I think you can make it small enough.
To give an idea, our index @ Enthought containing around 20k packages takes
~340 kb compressed w/ bz2 if you only keep the data required for dependency
handling (name, version and runtime dependencies), and that's using json,
an inefficient encoding, so I suspect encoding all of pypi may be a few MB
only fetch, which is generally faster that doing tens of http requests.

 The libsvolv people worked on a binary representation that may also be
worth looking at.


 P.S.  If you'd like to talk off list, possibly over Skype, I'd be happy
to talk more with you and/or Robert about minutiae that others may not care
about.


 Sure, I would be happy too. As I mentioned before, we have some code
around a SAT-based solver, but it is not ready yet, which is why we kept it
private (https://github.com/enthought/sat-solver). It handles well (== both
speed and quality-wise) the case where nothing is installed, but behaves
poorly when packages are already installed, and does not handle the update
case yet. The code is also very prototype-ish, but is not too complicated
to experimente with it.

 David


 ___
 Distutils-SIG maillist  -  Distutils-SIG@python.org
 https://mail.python.org/mailman/listinfo/distutils-sig

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Nick Coghlan
On 17 May 2015 at 00:36, Justin Cappos jcap...@nyu.edu wrote:
 This only considers computation cost though.  Other factors can become more
 expensive than computation.  For example, SAT solvers need all the rules to
 consider.  So a SAT solution needs to effectively download the full
 dependency graph before starting.  A backtracking dependency resolver can
 just download packages or dependency information as it considers them.

This is the defining consideration for pip at this point: a SAT solver
requires publication of static dependency metadata on PyPI, which is
dependent on both the Warehouse migration *and* the completion and
acceptance of PEP 426. Propagation out to PyPI caching proxies and
mirrors like devpi and the pulp-python plugin will then take even
longer.

A backtracking resolver doesn't have those gating dependencies, as it
can tolerate the current dynamic metadata model.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Justin Cappos

 I am no expert, but I don't understand why backtracking algorithms would
 be faster than SAT, since they both potentially need to walk over the
 full set of possible solutions. It is hard to reason about the cost because
 the worst case is in theory growing exponentially in both cases.


This is talked about a bit in this thread:
https://github.com/pypa/pip/issues/988

Either algorithm could be the more computationally efficient one.  Basically, *if
there are no conflicts* backtracking will certainly win.  If there are a
huge number of conflicts a SAT solver will certainly win.  It's not clear
where the tipping point is between the two schemes.

However, a better question is: does the computational difference matter?  If
one is a microsecond faster than the other, I don't think anyone cares.
However, from the OPIUM paper (linked from that thread), it is clear that
SAT solver resolution can be slow without optimizations that make it work
more like a backtracking resolver.  From my experience, backtracking
resolvers are also slow when the conflict rate is high.

This only considers computation cost, though.  Other factors can become more
expensive than computation.  For example, a SAT solver needs all the rules up
front, so a SAT solution effectively needs to download the full
dependency graph before starting.  A backtracking dependency resolver can
just download packages or dependency information as it considers them.  The
bandwidth cost for SAT solvers should therefore be higher.

Thanks,
Justin
P.S.  If you'd like to talk off list, possibly over Skype, I'd be happy to
talk more with you and/or Robert about minutiae that others may not care
about.
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread David Cournapeau
On Sun, May 17, 2015 at 12:40 AM, Daniel Holth dho...@gmail.com wrote:


 On May 16, 2015 11:22 AM, David Cournapeau courn...@gmail.com wrote:
 
 
 
  On Sat, May 16, 2015 at 11:36 PM, Justin Cappos jcap...@nyu.edu wrote:
 
  I am no expert, but I don't understand why backtracking algorithms
 would be faster than SAT, since they both potentially need to walk over
 the full set of possible solutions. It is hard to reason about the cost
 because the worst case is in theory growing exponentially in both cases.
 
 
  This is talked about a bit in this thread:
 https://github.com/pypa/pip/issues/988
 
  Each algorithm could be computationally more efficient.  Basically, *if
 there are no conflicts* backtracking will certainly win.  If there are a
 huge number of conflicts a SAT solver will certainly win.  It's not clear
 where the tipping point is between the two schemes.
 
  However, a better question is does the computational difference
 matter?  If one is a microsecond faster than the other, I don't think
 anyone cares.  However, from the OPIUM paper (listed off of that thread),
 it is clear that SAT solver resolution can be slow without optimizations to
 make them work more like backtracking resolvers.  From my experience
 backtracking resolvers are also slow when the conflict rate is high.
 
 
  Pure SAT is fast enough in practice in my experience (concretely:
 solving thousands of rules takes < 1 sec). It becomes more complicated as
 you need to optimize the solution, especially when you have already
 installed packages. This is unfortunately not as well discussed in the
 literature. Pseudo-boolean SAT for optimization was argued to be too slow
 by the 0 install people, but OTOH, this seems to be what's used in conda,
 so who knows :)

 Where optimizing means something like find a solution with the newest
 possible releases of the required packages, not execution speed.


Indeed, it was not obvious in this context :) Though in theory,
optimization is more general: it could be optimizing w.r.t. a cost function
taking into account the number of packages, download size, minimal number of
changes, etc. This is where you want pseudo-boolean SAT, which is what conda
uses, I think.
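
As a sketch of what optimizing w.r.t. such a cost function means (the weights,
sizes and brute-force enumeration over candidates are made up for
illustration; a real pseudo-boolean solver would express this as an objective
over the boolean variables rather than enumerating solutions):

# Score candidate solutions (package -> chosen version) by a weighted cost:
# number of packages, total download size, and number of changes relative to
# what is already installed. Weights and candidates are illustrative only.
def cost(solution, installed, download_size,
         w_packages=1.0, w_size=0.001, w_changes=10.0):
    n_packages = len(solution)
    size = sum(download_size[(name, ver)] for name, ver in solution.items())
    changes = sum(1 for name, ver in solution.items()
                  if installed.get(name) != ver)
    return w_packages * n_packages + w_size * size + w_changes * changes

installed = {"A": "1.0"}
download_size = {("A", "1.0"): 0, ("A", "2.0"): 1500, ("B", "1.0"): 800}
candidates = [
    {"A": "1.0", "B": "1.0"},   # keeps the installed A
    {"A": "2.0", "B": "1.0"},   # upgrades A
]
print(min(candidates, key=lambda s: cost(s, installed, download_size)))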

0install, composer and I believe libsolv took a different route, and use
heuristics to find a reasonably good solution by choosing which candidate to
try next. This requires access to the internals of the SAT solver, though
(not a problem if you have a python implementation).

David

  If you SAT solver is in pure python, you can choose a direction of the
 search which is more meaningful. I believe this is what 0install does from
 reading http://0install.net/solver.html, and what we have in our own SAT
 solver code. I unfortunately cannot look at the 0install code myself as it
 is under the GPL and am working on a BSD solver implementation. I also do
 not know how they handle updates and already installed packages.
 
 
  This only considers computation cost though.  Other factors can become
 more expensive than computation.  For example, SAT solvers need all the
 rules to consider.  So a SAT solution needs to effectively download the
 full dependency graph before starting.  A backtracking dependency resolver
 can just download packages or dependency information as it considers them.
 The bandwidth cost for SAT solvers should be higher.
 
 
  With a reasonable representation, I think you can make it small enough.
 To give an idea, our index @ Enthought containing around 20k packages takes
 ~340 kb compressed w/ bz2 if you only keep the data required for dependency
 handling (name, version and runtime dependencies), and that's using json,
 an inefficient encoding, so I suspect encoding all of pypi may be a few MB
 only fetch, which is generally faster that doing tens of http requests.
 
  The libsvolv people worked on a binary representation that may also be
 worth looking at.
 
 
  P.S.  If you'd like to talk off list, possibly over Skype, I'd be happy
 to talk more with you and/or Robert about minutiae that others may not care
 about.
 
 
  Sure, I would be happy too. As I mentioned before, we have some code
 around a SAT-based solver, but it is not ready yet, which is why we kept it
 private (https://github.com/enthought/sat-solver). It handles well (==
 both speed and quality-wise) the case where nothing is installed, but
 behaves poorly when packages are already installed, and does not handle the
 update case yet. The code is also very prototype-ish, but is not too
 complicated to experimente with it.
 
  David
 
 
  ___
  Distutils-SIG maillist  -  Distutils-SIG@python.org
  https://mail.python.org/mailman/listinfo/distutils-sig
 

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Donald Stufft

 On May 16, 2015, at 1:24 PM, Nick Coghlan ncogh...@gmail.com wrote:
 
 On 17 May 2015 at 00:36, Justin Cappos jcap...@nyu.edu wrote:
 This only considers computation cost though.  Other factors can become more
 expensive than computation.  For example, SAT solvers need all the rules to
 consider.  So a SAT solution needs to effectively download the full
 dependency graph before starting.  A backtracking dependency resolver can
 just download packages or dependency information as it considers them.
 
 This is the defining consideration for pip at this point: a SAT solver
 requires publication of static dependency metadata on PyPI, which is
 dependent on both the Warehouse migration *and* the completion and
 acceptance of PEP 426. Propagation out to PyPI caching proxies and
 mirrors like devpi and the pulp-python plugin will then take even
 longer.
 
 A backtracking resolver doesn't have those gating dependencies, as it
 can tolerate the current dynamic metadata model.
 


Even when we have Warehouse and PEP 426, that only gives us that data going
forward; the 400k files that currently exist on PyPI still won't have static
metadata. We could parse it out for Wheels but not for anything else. For the
foreseeable future any solution will need to be able to handle iteratively
discovering constraints, though I think a SAT solver can do it if it can
handle incremental solving, or just by re-doing the SAT problem each time we
discover a new constraint.
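
A sketch of the "re-do the SAT problem each time we discover a new constraint"
shape (solve() below is a deliberately trivial stand-in for a real solver, and
the package names, versions and dependencies are made up):

# Each pass solves over the metadata gathered so far, then fetches metadata
# for any dependency of a chosen version that we have not seen yet, and
# re-solves. A real implementation would hand all accumulated rules to a SAT
# solver on every pass instead of using the trivial solve() below.
METADATA = {
    "app":  {"1.0": ["lib"]},
    "lib":  {"1.0": [], "2.0": ["util"]},
    "util": {"1.0": []},
}

def fetch(name):
    # Stand-in for downloading one package's dependency metadata.
    return METADATA[name]

def solve(known):
    # Stand-in solver: newest known version of everything.
    return dict((name, max(versions)) for name, versions in known.items())

def resolve(root):
    known = {root: fetch(root)}
    while True:
        solution = solve(known)
        # Dependencies of chosen versions that we have no metadata for yet are
        # the newly discovered constraints: fetch them and solve again.
        missing = set(dep for name, version in solution.items()
                      for dep in known[name][version] if dep not in known)
        if not missing:
            return solution
        for dep in missing:
            known[dep] = fetch(dep)

print(resolve("app"))   # {'app': '1.0', 'lib': '2.0', 'util': '1.0'}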


---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-16 Thread Nick Coghlan
On 16 May 2015 at 11:52, Robert Collins robe...@robertcollins.net wrote:
 On 16 May 2015 at 13:45, Donald Stufft don...@stufft.io wrote:

 On May 15, 2015, at 9:22 PM, Robert Collins robe...@robertcollins.net 
 wrote:

 On 16 May 2015 at 11:08, Marcus Smith qwc...@gmail.com wrote:
 Why not start with pip at least being a simple fail-on-conflict resolver
 (vs the 1st found wins resolver it is now)...

 You'd backtrack for the sake of re-walking when new constraints are
 found, but not for the purpose of solving conflicts.

 I know you're motivated to solve Openstack build issues, but many of the
 issues I've seen in the pip tracker, I think would be solved without the
 backtracking resolver you're trying to build.

 Well, I'm scratching the itch I have. If its too hard to get something
 decent, sure I might back off in my goals, but I see no point aiming
 for something less than all the other language specific packaging
 systems out there have.


 So what makes the other language specific packaging systems different? As far
 as I know all of them have complete archives (e.g. they are like PyPI where
 they have a lot of versions, not like Linux Distros). What can we learn from
 how they solved this?

 NB; I have by no means finished low hanging heuristics and space
 trimming stuff :). I have some simple things in mind and am sure I'll
 end up with something 'good enough' for day to day use. The thing I'm
 worried about is the long term health of the approach.

Longer term, I think it makes sense to have the notion of active and
obsolete versions baked into PyPI's API and the web UI. This
wouldn't be baked into the package metadata itself (unlike the
proposed Obsoleted-By field for project renaming), but rather be a
dynamic reflection of whether or not *new* users should be looking at
the affected version, and whether or not it should be considered as a
candidate for dependency resolution when not specifically requested.
(This could also replace the current "hidden versions" feature, which
only hides things from the web UI, without having any impact on the
information published to automated tools through the programmatic API.)

Tools that list outdated packages could also be simplified a bit, as
their first pass could just be to check the obsolescence markers on
installed packages, with the second pass being to check for newer
versions of those packages.

While the bare minimum would be to let project maintainers set the
obsolescence flag directly, we could also potentially offer projects
some automated obsolescence schemes, such as:

* single active released version, anything older is marked as obsolete
whenever a new (non pre-release) version is uploaded
* semantic versioning, with a given maximum number of active released
X versions (e.g. 2), but only the most recent (according to PEP 440)
released version with a given X.* is active, everything else is
obsolete
* CPython-style and date-based versioning, with a given maximum number
of active released X.Y versions (e.g. 2), but only the most recent
(according to PEP 440) released version with a given X.Y.* is active,
everything else is obsolete

Pre-release versions could also be automatically flagged as obsolete
by PyPI as soon as a newer version for the same release (including the
final release itself) was uploaded for the given package.
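
To make the second scheme above concrete, a rough sketch of computing the
active versions under it (version parsing is simplified to dotted integers
rather than full PEP 440 ordering, and the release list is made up):

# Semantic-versioning style scheme: keep at most `max_major` major (X) series
# active, and within each active series only its newest release. A real
# implementation would use PEP 440 ordering.
def active_versions(versions, max_major=2):
    parsed = sorted((tuple(int(p) for p in v.split(".")), v) for v in versions)
    majors = sorted(set(key[0] for key, _ in parsed), reverse=True)[:max_major]
    newest_per_major = {}
    for key, original in parsed:          # ascending order, so later wins
        if key[0] in majors:
            newest_per_major[key[0]] = original
    return sorted(newest_per_major.values())

releases = ["1.0", "1.1", "1.2", "2.0", "2.1", "3.0", "3.0.1"]
print(active_versions(releases))   # ['2.1', '3.0.1']; everything else obsolete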

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Marcus Smith
Why not start with pip at least being a simple fail-on-conflict resolver
(vs the "1st found wins" resolver it is now)?

You'd backtrack for the sake of re-walking when new constraints are
found, but not for the purpose of solving conflicts.

I know you're motivated to solve OpenStack build issues, but many of the
issues I've seen in the pip tracker would, I think, be solved without the
backtracking resolver you're trying to build.
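
A sketch of what such a fail-on-conflict resolver could look like (toy index
and data format assumed; the point is that a later, incompatible constraint
raises an error instead of being silently ignored):

# First-found-wins walk that errors on a conflict instead of ignoring it.
# INDEX maps package -> version -> {dependency: set of acceptable versions}.
INDEX = {
    "app": {1: {"lib": {1}, "plugin": {1}}},
    "plugin": {1: {"lib": {2}}},
    "lib": {1: {}, 2: {}},
}

def resolve_or_fail(root):
    chosen = {}
    queue = [(root, set(INDEX[root]))]
    while queue:
        package, acceptable = queue.pop(0)
        if package in chosen:
            if chosen[package] not in acceptable:
                raise RuntimeError("conflict: %s==%s already chosen, but %s is required"
                                   % (package, chosen[package], sorted(acceptable)))
            continue
        version = max(acceptable & set(INDEX[package]))   # newest acceptable, never revisited
        chosen[package] = version
        queue.extend(INDEX[package][version].items())
    return chosen

resolve_or_fail("app")   # raises: conflict on lib (1 already chosen, [2] required)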

On Fri, May 15, 2015 at 11:57 AM, Robert Collins robe...@robertcollins.net
wrote:

 So, I am working on pip issue 988: pip doesn't resolve packages at all.

 This is O(packages^alternatives_per_package): if you are resolving 10
 packages with 10 versions each, there are approximately 10^10 or 10G
 combinations. 10 packages with 100 versions each - 10^100.

 So - its going to depend pretty heavily on some good heuristics in
 whatever final algorithm makes its way in, but the problem is
 exacerbated by PyPI's nature.

 Most Linux (all that i'm aware of) distributions have at most 5
 versions of a package to consider at any time - installed(might be
 None), current release, current release security updates, new release
 being upgraded to, new release being upgraded to's security updates.
 And their common worst case is actually 2 versions: installed==current
 release and one new release present. They map alternatives out into
 separate packages (e.g. when an older soname is deliberately kept
 across an ABI incompatibility, you end up with 2 packages, not 2
 versions of one package). To when comparing pip's challenge to apt's:
 apt has ~20-30K packages, with altnernatives ~= 2, or
 pip has ~60K packages, with alternatives ~= 5.7 (I asked dstufft)

 Scaling the number of packages is relatively easy; scaling the number
 of alternatives is harder. Even 300 packages (the dependency tree for
 openstack) is ~2.4T combinations to probe.

 I wonder if it makes sense to give some back-pressure to people, or at
 the very least encourage them to remove distributions that:
  - they don't support anymore
  - have security holes

 If folk consider PyPI a sort of historical archive then perhaps we
 could have a feature to select 'supported' versions by the author, and
 allow a query parameter to ask for all the versions.

 -Rob

 --
 Robert Collins rbtcoll...@hp.com
 Distinguished Technologist
 HP Converged Cloud
 ___
 Distutils-SIG maillist  -  Distutils-SIG@python.org
 https://mail.python.org/mailman/listinfo/distutils-sig

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 11:08, Marcus Smith qwc...@gmail.com wrote:
 Why not start with pip at least being a simple fail-on-conflict resolver
 (vs the 1st found wins resolver it is now)...

 You'd backtrack for the sake of re-walking when new constraints are found,
 but not for the purpose of solving conflicts.

 I know you're motivated to solve Openstack build issues, but many of the
 issues I've seen in the pip tracker, I think would be solved without the
 backtracking resolver you're trying to build.

Well, I'm scratching the itch I have. If it's too hard to get something
decent, sure, I might back off in my goals, but I see no point aiming
for something less than what all the other language-specific packaging
systems out there have.

-Rob


-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Donald Stufft

 On May 15, 2015, at 9:22 PM, Robert Collins robe...@robertcollins.net wrote:
 
 On 16 May 2015 at 11:08, Marcus Smith qwc...@gmail.com wrote:
 Why not start with pip at least being a simple fail-on-conflict resolver
 (vs the 1st found wins resolver it is now)...
 
 You'd backtrack for the sake of re-walking when new constraints are found,
 but not for the purpose of solving conflicts.
 
 I know you're motivated to solve Openstack build issues, but many of the
 issues I've seen in the pip tracker, I think would be solved without the
 backtracking resolver you're trying to build.
 
 Well, I'm scratching the itch I have. If its too hard to get something
 decent, sure I might back off in my goals, but I see no point aiming
 for something less than all the other language specific packaging
 systems out there have.


So what makes the other language specific packaging systems different? As far
as I know all of them have complete archives (e.g. they are like PyPI where they
have a lot of versions, not like Linux Distros). What can we learn from how they
solved this?

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 13:45, Donald Stufft don...@stufft.io wrote:

 On May 15, 2015, at 9:22 PM, Robert Collins robe...@robertcollins.net 
 wrote:

 On 16 May 2015 at 11:08, Marcus Smith qwc...@gmail.com wrote:
 Why not start with pip at least being a simple fail-on-conflict resolver
 (vs the 1st found wins resolver it is now)...

 You'd backtrack for the sake of re-walking when new constraints are found,
 but not for the purpose of solving conflicts.

 I know you're motivated to solve Openstack build issues, but many of the
 issues I've seen in the pip tracker, I think would be solved without the
 backtracking resolver you're trying to build.

 Well, I'm scratching the itch I have. If its too hard to get something
 decent, sure I might back off in my goals, but I see no point aiming
 for something less than all the other language specific packaging
 systems out there have.


 So what makes the other language specific packaging systems different? As far
 as I know all of them have complete archives (e.g. they are like PyPI where
 they have a lot of versions, not like Linux Distros). What can we learn from
 how they solved this?

NB: I have by no means finished the low-hanging heuristics and space
trimming stuff :). I have some simple things in mind and am sure I'll
end up with something 'good enough' for day to day use. The thing I'm
worried about is the long term health of the approach.

Good questions. Some of it is structural, I suspect. A quick rundown:
- cabal (haskell) has a backtracking solver that accepts various
  parameters to tell it to try harder.
- javascript effectively vendors every dep ever, so you end up with many
  copies of the same library at different versions in the same process.
- rust's cargo system currently solves everything in a single project
  only - it has no binary packaging, only vendor-into-a-binary-build
  packaging.
- The gem behaviour I'm not yet familiar with.
- perl I used to know, but time has eroded it :/.

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread David Cournapeau
On Sat, May 16, 2015 at 10:52 AM, Robert Collins robe...@robertcollins.net
wrote:

 On 16 May 2015 at 13:45, Donald Stufft don...@stufft.io wrote:
 
  On May 15, 2015, at 9:22 PM, Robert Collins robe...@robertcollins.net
 wrote:
 
  On 16 May 2015 at 11:08, Marcus Smith qwc...@gmail.com wrote:
  Why not start with pip at least being a simple fail-on-conflict resolver
  (vs the 1st found wins resolver it is now)...

  You'd backtrack for the sake of re-walking when new constraints are found,
  but not for the purpose of solving conflicts.

  I know you're motivated to solve Openstack build issues, but many of the
  issues I've seen in the pip tracker, I think would be solved without the
  backtracking resolver you're trying to build.
 
  Well, I'm scratching the itch I have. If its too hard to get something
  decent, sure I might back off in my goals, but I see no point aiming
  for something less than all the other language specific packaging
  systems out there have.
 
 
  So what makes the other language specific packaging systems different? As far
  as I know all of them have complete archives (e.g. they are like PyPI where
  they have a lot of versions, not like Linux Distros). What can we learn from
  how they solved this?

 NB; I have by no means finished low hanging heuristics and space
 trimming stuff :). I have some simple things in mind and am sure I'll
 end up with something 'good enough' for day to day use. The thing I'm
 worried about is the long term health of the approach.

 Good questions. Some of it is structural I suspect. A quick rundown.
 cabal (haskell) has a backtracking solver that accepts various
 parameters to tell it to try harder.
 javascript effectively vendors every dep ever, so you end up with many
 copies of the same library at different versions in the same process.
 rust's cargo system currently solves everything in a single project
 only - it has no binary packaging, only vendor-into-a-binary-build
 packaging.
 The gem behaviour I'm not yet familiar with.
 perl I used to know but time has eroded it :/.


FWIW, php uses a SAT-based solver in composer, which started as a port of
libsolv (the SAT solver used by openSUSE and soon Fedora).

I am no expert, but I don't understand why backtracking algorithms would
be faster than SAT, since they both potentially need to walk over the full
set of possible solutions. It is hard to reason about the cost because the
worst case in theory grows exponentially in both cases.

With a SAT-based algorithm for dependency resolution, it is relatively
simple to apply heuristics which massively prune the search space. For
example, when considering package A with, say, 10 potential versions A_1,
A_2, ..., in theory you need to generate the rules:

# "-" means do not install, "+" means install
-A_1 | -A_2
-A_1 | -A_3
...

and those constitute most of the rules in common cases. But it is possible
to tweak the SAT implementation to replace those rules with a single
"at most one of" rule per *package*, which means the number of rules does
not grow much with the number of versions.
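
A sketch of the difference in rule count (plain python, no solver attached;
the clause representation is ad hoc for the example): the pairwise encoding
needs one "not both" clause per pair of versions, while the at-most-one
encoding is a single rule per package.

from itertools import combinations

# Pairwise encoding: one "not both" clause for every pair of versions of A,
# i.e. (-A_i | -A_j); the number of clauses grows quadratically with versions.
def pairwise_clauses(package, versions):
    return [("-%s_%s" % (package, i), "-%s_%s" % (package, j))
            for i, j in combinations(versions, 2)]

# AtMost-one encoding: a single rule per package, however many versions it
# has (how a solver represents this internally is up to the implementation).
def at_most_one_rule(package, versions):
    return ("ATMOST-1", ["%s_%s" % (package, v) for v in versions])

versions = range(1, 11)                        # A_1 .. A_10
print(len(pairwise_clauses("A", versions)))    # 45 clauses
print(at_most_one_rule("A", versions))         # 1 rule covering all 10 versions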

The real difficulty of SAT-based solvers is the optimization part: many
solutions that are technically valid are not acceptable, and that's where the
heuristics get more complicated.

David
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Noah Kantrowitz

On May 15, 2015, at 9:19 PM, Donald Stufft don...@stufft.io wrote:

 
 On May 15, 2015, at 2:57 PM, Robert Collins robe...@robertcollins.net 
 wrote:
 
 So, I am working on pip issue 988: pip doesn't resolve packages at all.
 
 This is O(packages^alternatives_per_package): if you are resolving 10
 packages with 10 versions each, there are approximately 10^10 or 10G
 combinations. 10 packages with 100 versions each - 10^100.
 
 So - its going to depend pretty heavily on some good heuristics in
 whatever final algorithm makes its way in, but the problem is
 exacerbated by PyPI's nature.
 
 Most Linux (all that i'm aware of) distributions have at most 5
 versions of a package to consider at any time - installed(might be
 None), current release, current release security updates, new release
 being upgraded to, new release being upgraded to's security updates.
 And their common worst case is actually 2 versions: installed==current
 release and one new release present. They map alternatives out into
 separate packages (e.g. when an older soname is deliberately kept
 across an ABI incompatibility, you end up with 2 packages, not 2
 versions of one package). To when comparing pip's challenge to apt's:
 apt has ~20-30K packages, with altnernatives ~= 2, or
 pip has ~60K packages, with alternatives ~= 5.7 (I asked dstufft)
 
 Scaling the number of packages is relatively easy; scaling the number
 of alternatives is harder. Even 300 packages (the dependency tree for
 openstack) is ~2.4T combinations to probe.
 
 I wonder if it makes sense to give some back-pressure to people, or at
 the very least encourage them to remove distributions that:
 - they don't support anymore
 - have security holes
 
 If folk consider PyPI a sort of historical archive then perhaps we
 could have a feature to select 'supported' versions by the author, and
 allow a query parameter to ask for all the versions.
 
 
 There have been a handful of projects which would only keep the latest N
 versions uploaded to PyPI. I know this primarily because it has caused
 people a decent amount of pain over time. It’s common for deployments people
 have to use a requirements.txt file like ``foo==1.0`` and to just continue
 to pull from PyPI. Deleting the old files breaks anyone doing that, so it 
 would
 require either having people bundle their deps in their repositories or
 some way to get at those old versions. Personally I think that we shouldn’t
 go deleting the old versions or encouraging people to do that.

+1 for this. While I appreciate why Linux distros purge old versions, it is
absolutely hellish for reproducibility. If you are looking for prior art, check
out the Molinillo project (https://github.com/CocoaPods/Molinillo) used by
Bundler and CocoaPods. It is not as complex as the Solve gem used in Chef, but
offers a good balance of performance in satisfying constraints and avoiding
false negatives on solution failures.

--Noah



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 07:18, Justin Cappos jcap...@nyu.edu wrote:
 One thing to consider is that if conflicts do not exist (or are very rare),
 the number of possible combinations is a moot point.  A greedy algorithm for
 installation (which just chooses the most favored package to resolve each
 dependency) will run in linear time with the number of packages it would
 install, if no conflicts exist.

 So, what you are saying about state exploration may be true for a resolver
 that uses something like a SAT solver, but doesn't apply to backtracking
 dependency resolution (unless a huge number of conflicts occur) or simple
 dependency resolution (at all).  SAT solvers do have heuristics to avoid
 this blow up, except in pathological cases.  However, simple / backtracking
 dependency resolution systems have the further advantage of not needing to
 request unneeded metadata in the first place...

Your intuition here is misleading, sorry :(.

You're right that 'if everything fits, it's linear', but the reason we
have this bug open is that conflicts do happen: people add a new example
of it every week or so (as someone who didn't realise that pip doesn't
resolve finds out, adds it to pip's issue tracker, and has it marked as
a dupe).

Backtracking recursive resolvers have exactly the same O as SAT.

Example: say I have an ecosystem of 10 packages, A-J, and they do a
release every 6 months that is guaranteed to work together, but every
time some issue occurs which ends up clamping the group together - e.g.
an external release breaks API and so A1's deps are disjoint with A2's,
and then the same between A2 and A3. Even though A1's API is
compatible with B2's: it's not internal bad code, it's just taking *one*
external dep breaking its API.

After 2 releases you have 10^2 combinations, but only 4 are valid at
all. That's 4%. 8 releases gets you 10^8 combinations with 8 valid, or
0.000008%.

Now there are two things to examine here. How likely is this to happen
to PyPI users, and can a backtracker (which btw is what my code is)
handle this better than a SAT solver.

In terms of likelihood - OpenStack hits this every release. It's not
that our libraries are incompatible with each other, it's that given
250 packages (the 200 in the error I quoted just shows that the resolver
hadn't obtained version data for everything), *something* breaks API
compat in each 6 month release cycle, and so you end up with the whole
set effectively locking together. In fact, it has happened so
consistently to OpenStack that we now release our libraries with
closed specifiers: >=min_version, <next_version.

Secondly, backtrackers. Assume nothing is installed, and you want the
latest release. Then sure,
for a_version in A:
    for b_version in B:
        for c_version in C:

etc will hit the most recent release first time, and you're golden.

Assume you have the prior release installed, and you ran pip without
-U, but with something that pulls in the latest release of one lib (which
then nabs everything), e.g. pip install 'J>3.0' (and we have releases
1, 2, 3, 4 of A through J).

Now, for a_version in A: is going to have the installed version of A
in its first step. B likewise. So we'll end up with a trace like:
A==3
B==3
C==3
D==3
E==3
F==3
G==3
H==3
I==3
J==3 error, backtrack (user specifier)
J==4 error, backtracks once the external deps are considered and the
conflict is found
J==2 error, backtrack (user specifier)
J==1 error, backtrack (user specifier)
I==4
J==3 error, backtrack (user specifier)
J==4 error, backtracks somewhere in the external deps
J==2, error, backtrack (user specifier)

and so on, until we finally tick over to A==4.

More generally, any already installed version (without -U) can cause a
backtracking resolver to try *all* possible other combinations before
realising that that installed version is the problem and bumping it. A
heuristic to look for those and bump them first then hits molasses as
soon as one of the installed versions needs to be kept as-is.
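
For concreteness, a minimal sketch of the kind of backtracking resolver being
discussed (the index, dependency format and newest-first ordering are made up
for the example; real resolvers add the heuristics and escape hatches
discussed below):

# Minimal recursive backtracking resolver over a toy index.
# INDEX maps package -> version -> {dependency: set of acceptable versions}.
INDEX = {
    "A": {1: {"B": {1}}, 2: {"B": {2}}},
    "B": {1: {}, 2: {}},
    "C": {1: {"A": {1}, "B": {1}}, 2: {"A": {2}, "B": {2}}},
}

def resolve(requirements, chosen=None):
    # requirements: dict of package -> set of acceptable versions
    chosen = dict(chosen or {})
    pending = [p for p in requirements if p not in chosen]
    if not pending:
        return chosen
    package = pending[0]
    for version in sorted(INDEX[package], reverse=True):    # try newest first
        if version not in requirements[package]:
            continue                                         # violates a specifier
        merged = dict(requirements)
        conflict = False
        for dep, allowed in INDEX[package][version].items():
            merged[dep] = merged.get(dep, allowed) & allowed
            if not merged[dep] or (dep in chosen and chosen[dep] not in merged[dep]):
                conflict = True
        if conflict:
            continue                                         # backtrack to the next version
        solution = resolve(merged, dict(chosen, **{package: version}))
        if solution is not None:
            return solution
    return None                                              # exhausted: backtrack further up

print(resolve({"C": {1, 2}, "A": {1, 2}}))                   # {'C': 2, 'A': 2, 'B': 2}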

Anyhow, my goal here is to start the conversation; pip will need some
knobs, because no matter how good the heuristics are, users will need
escape hatches (one of which is to fully specify their needs).

-Rob


-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Jim Fulton
On Fri, May 15, 2015 at 2:57 PM, Robert Collins
robe...@robertcollins.net wrote:
 So, I am working on pip issue 988: pip doesn't resolve packages at all.

 This is O(packages^alternatives_per_package): if you are resolving 10
 packages with 10 versions each, there are approximately 10^10 or 10G
 combinations. 10 packages with 100 versions each - 10^100.

 So - its going to depend pretty heavily on some good heuristics in
 whatever final algorithm makes its way in, but the problem is
 exacerbated by PyPI's nature.

 Most Linux (all that i'm aware of) distributions have at most 5
 versions of a package to consider at any time - installed(might be
 None), current release, current release security updates, new release
 being upgraded to, new release being upgraded to's security updates.
 And their common worst case is actually 2 versions: installed==current
 release and one new release present. They map alternatives out into
 separate packages (e.g. when an older soname is deliberately kept
 across an ABI incompatibility, you end up with 2 packages, not 2
 versions of one package). To when comparing pip's challenge to apt's:
 apt has ~20-30K packages, with altnernatives ~= 2, or
 pip has ~60K packages, with alternatives ~= 5.7 (I asked dstufft)

 Scaling the number of packages is relatively easy; scaling the number
 of alternatives is harder. Even 300 packages (the dependency tree for
 openstack) is ~2.4T combinations to probe.

 I wonder if it makes sense to give some back-pressure to people, or at
 the very least encourage them to remove distributions that:
  - they don't support anymore
  - have security holes

 If folk consider PyPI a sort of historical archive then perhaps we
 could have a feature to select 'supported' versions by the author, and
 allow a query parameter to ask for all the versions.

You could simply limit the number of versions from PyPI
you consider.

Jim

-- 
Jim Fulton
http://www.linkedin.com/in/jimfulton
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 08:27, Jim Fulton j...@zope.com wrote:

 If folk consider PyPI a sort of historical archive then perhaps we
 could have a feature to select 'supported' versions by the author, and
 allow a query parameter to ask for all the versions.

 You could simply limit the number of versions from PyPI
 you consider.

Yes - it would be nice IMO to give package authors some influence over
that though.

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Justin Cappos

 Example: say I have an ecosystem of 10 packages. A-J. And they do a
 release every 6 months that is guaranteed to work together, but every
 time some issue occurs which ends up clamping the group together- e.g.
 an external release breaks API and so A1s deps are disjoint with A2s,
 and then the same between A2 and A3. Even though A1's API is
 compatible with B2's: its not internal bad code, its just taking *one*
 external dep breaking its API.

 After 2 releases you have 10^2 combinations, but only 4 are valid at
 all. Thats 4%. 8 releases gets you 10^8, 8 valid combinations, or
 0.008%.


Yes, so this would not be a situation where "conflicts do not exist (or are
very rare)", as my post mentioned.  Is this rate of conflicts something you
measured, or is it a value you made up?


I don't hear anyone arguing that the status quo makes sense.  I think we're
mostly just chatting about the right thing to optimize the solution for and
what sorts of short cuts may be useful (or even necessary).  Since we can
measure the actual conflict rate and other values in practice, data seems
like it may be a good path toward grounding the discussion...

Thanks,
Justin
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


[Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
So, I am working on pip issue 988: pip doesn't resolve packages at all.

This is O(packages^alternatives_per_package): if you are resolving 10
packages with 10 versions each, there are approximately 10^10 or 10G
combinations. 10 packages with 100 versions each - 10^100.

So - it's going to depend pretty heavily on some good heuristics in
whatever final algorithm makes its way in, but the problem is
exacerbated by PyPI's nature.

Most Linux distributions (all that I'm aware of) have at most 5
versions of a package to consider at any time - installed (might be
None), current release, current release security updates, new release
being upgraded to, new release being upgraded to's security updates.
And their common worst case is actually 2 versions: installed==current
release and one new release present. They map alternatives out into
separate packages (e.g. when an older soname is deliberately kept
across an ABI incompatibility, you end up with 2 packages, not 2
versions of one package). So when comparing pip's challenge to apt's:
apt has ~20-30K packages with alternatives ~= 2, whereas
pip has ~60K packages with alternatives ~= 5.7 (I asked dstufft).

Scaling the number of packages is relatively easy; scaling the number
of alternatives is harder. Even 300 packages (the dependency tree for
openstack) is ~2.4T combinations to probe.
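
To make the arithmetic concrete, the combination count is just the product of
the candidate-version counts across the packages in the tree, e.g. (with
made-up counts):

from functools import reduce
from operator import mul

# Candidate versions per package in a (made-up) dependency tree; the real
# counts come from whatever index the resolver is querying.
versions_per_package = [10, 10, 10, 5, 5, 2]

combinations = reduce(mul, versions_per_package, 1)
print(combinations)   # 50000 candidate combinations for just 6 packages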

I wonder if it makes sense to give some back-pressure to people, or at
the very least encourage them to remove distributions that:
 - they don't support anymore
 - have security holes

If folk consider PyPI a sort of historical archive then perhaps we
could have a feature to select 'supported' versions by the author, and
allow a query parameter to ask for all the versions.

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Donald Stufft

 On May 15, 2015, at 2:57 PM, Robert Collins robe...@robertcollins.net wrote:
 
 So, I am working on pip issue 988: pip doesn't resolve packages at all.
 
 This is O(packages^alternatives_per_package): if you are resolving 10
 packages with 10 versions each, there are approximately 10^10 or 10G
 combinations. 10 packages with 100 versions each - 10^100.
 
 So - its going to depend pretty heavily on some good heuristics in
 whatever final algorithm makes its way in, but the problem is
 exacerbated by PyPI's nature.
 
 Most Linux (all that i'm aware of) distributions have at most 5
 versions of a package to consider at any time - installed(might be
 None), current release, current release security updates, new release
 being upgraded to, new release being upgraded to's security updates.
 And their common worst case is actually 2 versions: installed==current
 release and one new release present. They map alternatives out into
 separate packages (e.g. when an older soname is deliberately kept
 across an ABI incompatibility, you end up with 2 packages, not 2
 versions of one package). To when comparing pip's challenge to apt's:
 apt has ~20-30K packages, with altnernatives ~= 2, or
 pip has ~60K packages, with alternatives ~= 5.7 (I asked dstufft)
 
 Scaling the number of packages is relatively easy; scaling the number
 of alternatives is harder. Even 300 packages (the dependency tree for
 openstack) is ~2.4T combinations to probe.
 
 I wonder if it makes sense to give some back-pressure to people, or at
 the very least encourage them to remove distributions that:
 - they don't support anymore
 - have security holes
 
 If folk consider PyPI a sort of historical archive then perhaps we
 could have a feature to select 'supported' versions by the author, and
 allow a query parameter to ask for all the versions.
 

There have been a handful of projects which would only keep the latest N
versions uploaded to PyPI. I know this primarily because it has caused
people a decent amount of pain over time. It's common for people's deployments
to use a requirements.txt file like ``foo==1.0`` and to just continue
to pull from PyPI. Deleting the old files breaks anyone doing that, so it would
require either having people bundle their deps in their repositories or
some way to get at those old versions. Personally I think that we shouldn’t
go deleting the old versions or encouraging people to do that.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Justin Cappos
One thing to consider is that if conflicts do not exist (or are very rare),
the number of possible combinations is a moot point.  A greedy algorithm
for installation (which just chooses the most favored package to resolve
each dependency) will run in linear time with the number of packages it
would install, if no conflicts exist.
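
A sketch of such a greedy installer over a toy index (the index format and the
"most favored = newest" rule are assumptions for the example): it resolves
each requirement once, never revisits a choice, and so does work linear in the
number of packages installed - but it simply fails if the greedy choices
conflict.

# Greedy resolution: take the newest acceptable version of each requirement
# and never revisit a choice. INDEX maps package -> version -> dependencies,
# where dependencies map a package name to the set of versions it accepts.
INDEX = {
    "app": {1: {"lib": {1, 2}}},
    "lib": {1: {}, 2: {"util": {1}}},
    "util": {1: {}},
}

def greedy_install(root):
    chosen = {}
    queue = [(root, set(INDEX[root]))]          # (package, acceptable versions)
    while queue:
        package, acceptable = queue.pop()
        if package in chosen:
            if chosen[package] not in acceptable:
                raise RuntimeError("conflict on %s: greedy cannot recover" % package)
            continue
        version = max(v for v in INDEX[package] if v in acceptable)   # most favored
        chosen[package] = version
        queue.extend(INDEX[package][version].items())
    return chosen

print(greedy_install("app"))                    # {'app': 1, 'lib': 2, 'util': 1}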

So, what you are saying about state exploration may be true for a resolver
that uses something like a SAT solver, but doesn't apply to backtracking
dependency resolution (unless a huge number of conflicts occur) or simple
dependency resolution (at all).  SAT solvers do have heuristics to avoid
this blow up, except in pathological cases.  However, simple / backtracking
dependency resolution systems have the further advantage of not needing to
request unneeded metadata in the first place...

Thanks,
Justin

On Fri, May 15, 2015 at 2:57 PM, Robert Collins robe...@robertcollins.net
wrote:

 So, I am working on pip issue 988: pip doesn't resolve packages at all.

 This is O(packages^alternatives_per_package): if you are resolving 10
 packages with 10 versions each, there are approximately 10^10 or 10G
 combinations. 10 packages with 100 versions each - 10^100.

 So - its going to depend pretty heavily on some good heuristics in
 whatever final algorithm makes its way in, but the problem is
 exacerbated by PyPI's nature.

 Most Linux (all that i'm aware of) distributions have at most 5
 versions of a package to consider at any time - installed(might be
 None), current release, current release security updates, new release
 being upgraded to, new release being upgraded to's security updates.
 And their common worst case is actually 2 versions: installed==current
 release and one new release present. They map alternatives out into
 separate packages (e.g. when an older soname is deliberately kept
 across an ABI incompatibility, you end up with 2 packages, not 2
 versions of one package). To when comparing pip's challenge to apt's:
 apt has ~20-30K packages, with altnernatives ~= 2, or
 pip has ~60K packages, with alternatives ~= 5.7 (I asked dstufft)

 Scaling the number of packages is relatively easy; scaling the number
 of alternatives is harder. Even 300 packages (the dependency tree for
 openstack) is ~2.4T combinations to probe.

 I wonder if it makes sense to give some back-pressure to people, or at
 the very least encourage them to remove distributions that:
  - they don't support anymore
  - have security holes

 If folk consider PyPI a sort of historical archive then perhaps we
 could have a feature to select 'supported' versions by the author, and
 allow a query parameter to ask for all the versions.

 -Rob

 --
 Robert Collins rbtcoll...@hp.com
 Distinguished Technologist
 HP Converged Cloud
 ___
 Distutils-SIG maillist  -  Distutils-SIG@python.org
 https://mail.python.org/mailman/listinfo/distutils-sig

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 06:57, Robert Collins robe...@robertcollins.net wrote:
 So, I am working on pip issue 988: pip doesn't resolve packages at all.

 This is O(packages^alternatives_per_package): if you are resolving 10
...
 Scaling the number of packages is relatively easy; scaling the number
 of alternatives is harder. Even 300 packages (the dependency tree for
 openstack) is ~2.4T combinations to probe.

I added a check for the exact number (when the current step limit is hit):
Hit step limit during resolving,
22493640689038530013767184665222125808455708963348534886974974630893524036813561125576881299950281714638872640331745747555743820280235291929928862660035516365300612827387994788286647556890876840654454905860390366740480.00
from 4038 versions in 205 packages after 10 steps

Which indicates an alternatives factor of ~20. And AIUI PyPI has a long
tail itself, so it's more common for folk to see factors > 5.7 than
factors below that.

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Ben Finney
Donald Stufft don...@stufft.io writes:

  On May 15, 2015, at 2:57 PM, Robert Collins robe...@robertcollins.net 
  wrote:
  
  If folk consider PyPI a sort of historical archive then perhaps we
  could have a feature to select 'supported' versions by the author,
  and allow a query parameter to ask for all the versions.
  

 It’s common for deployments people have to use a requirements.txt file
 like ``foo==1.0`` and to just continue to pull from PyPI. Deleting the
 old files breaks anyone doing that, so it would require either having
 people bundle their deps in their repositories or some way to get at
 those old versions. Personally I think that we shouldn’t go deleting
 the old versions or encouraging people to do that.

Yes, it's common to consider PyPI as a repository of all versions ever
released, and to treat it as an archive whose URLs will continue to make
available the historical versions.

-- 
 \ “If history and science have taught us anything, it is that |
  `\ passion and desire are not the same as truth.” —E. O. Wilson, |
_o__)  _Consilience_, 1998 |
Ben Finney

___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 07:19, Donald Stufft don...@stufft.io wrote:

 There have been a handful of projects which would only keep the latest N
 versions uploaded to PyPI. I know this primarily because it has caused
 people a decent amount of pain over time. It’s common for deployments people
 have to use a requirements.txt file like ``foo==1.0`` and to just continue
 to pull from PyPI. Deleting the old files breaks anyone doing that, so it 
 would
 require either having people bundle their deps in their repositories or
 some way to get at those old versions. Personally I think that we shouldn’t
 go deleting the old versions or encouraging people to do that.

I think 'most recent only' is too much. Most upstreams will support
more than one release. Like - I don't care what testtools release you
use.

OTOH, every version with distinct dependencies becomes a very
expensive liability to the ecosystem here. It's beyond human scale,
and well into "argh, wtf, the universe is burning around me and my
tardis has run out of power" territory.

I'm sure we can provide an escape hatch in pip (and I'm going to do
that in my branch soon - offering simple 'error on conflict' and 'use
first seen specifier only' strategies) while folk work on different
heuristics - the actual resolver is only ~100 LOC in my branch today -
the rest is refactoring (that can be made better and I plan to do so
before suggesting we merge it).

But a significant contributing factor is the O of the problem, and we
can do something about that. I don't know what exactly, and I think
we're going to need to have our creative caps firmly on to come up
with something meeting the broad needs of the ecosystem: which
includes pip Just Working.

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] PyPI is a sick sick hoarder

2015-05-15 Thread Robert Collins
On 16 May 2015 at 08:46, Justin Cappos jcap...@nyu.edu wrote:
 Example: say I have an ecosystem of 10 packages. A-J. And they do a
 release every 6 months that is guaranteed to work together, but every
 time some issue occurs which ends up clamping the group together- e.g.
 an external release breaks API and so A1s deps are disjoint with A2s,
 and then the same between A2 and A3. Even though A1's API is
 compatible with B2's: its not internal bad code, its just taking *one*
 external dep breaking its API.

 After 2 releases you have 10^2 combinations, but only 4 are valid at
 all. Thats 4%. 8 releases gets you 10^8, 8 valid combinations, or
 0.008%.


 Yes, so this would not be a situation where conflicts do not exist (or are
 very rare) as my post mentioned.  Is this rate of conflicts something you
 measured or is it a value you made up?

It's drawn from the concrete example of OpenStack, which has a single
group of co-installable releases that cluster together every 6 months.
I don't have the actual valid/invalid ratio there because I don't have
enough machines to calculate it :).

-Rob


-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig