On Sat, May 16, 2015 at 11:36 PM, Justin Cappos <jcap...@nyu.edu> wrote:

> I am no expert, but I don't understand why backtracking algorithms would
>> be faster than SAT, since they both potentially need to walk over the
>> full set of possible solutions. It is hard to reason about the cost
>> because the worst case in theory grows exponentially in both cases.
>>
>
> This is talked about a bit in this thread:
> https://github.com/pypa/pip/issues/988
>
> Either algorithm can be the more computationally efficient one.  Basically,
> *if there are no conflicts* backtracking will certainly win.  If there are a
> huge number of conflicts a SAT solver will certainly win.  It's not clear
> where the tipping point is between the two schemes.
>
> However, a better question is does the computational difference matter?
> If one is a microsecond faster than the other, I don't think anyone cares.
> However, from the OPIUM paper (linked from that thread), it is clear that
> SAT solver resolution can be slow without optimizations that make it work
> more like a backtracking resolver.  From my experience, backtracking
> resolvers are also slow when the conflict rate is high.
>
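To make the backtracking side concrete, here is a toy resolver over a made-up
index (the package names, versions and constraints are hypothetical, and this
is of course nothing like pip's real code): depth-first search that picks the
newest allowed version and backtracks on conflict.

```python
# Hypothetical index: package -> {version: [(dependency, allowed versions)]},
# versions listed newest first. "app" pins base 1.0, but lib 2.0 needs
# base 2.0, so the resolver must backtrack from lib 2.0 down to lib 1.0.
INDEX = {
    "app": {"1.0": [("lib", {"1.0", "2.0"}), ("base", {"1.0"})]},
    "lib": {"2.0": [("base", {"2.0"})], "1.0": [("base", {"1.0"})]},
    "base": {"2.0": [], "1.0": []},
}

def resolve(requirements, chosen=None):
    """Depth-first search: pick a version for the first unmet requirement,
    recurse into its dependencies, and backtrack on conflict."""
    chosen = dict(chosen or {})
    if not requirements:
        return chosen
    (name, allowed), rest = requirements[0], requirements[1:]
    if name in chosen:
        # Already decided: succeed only if the earlier pick is compatible.
        return resolve(rest, chosen) if chosen[name] in allowed else None
    for version in INDEX[name]:            # iterate newest first
        if version not in allowed:
            continue
        chosen[name] = version
        result = resolve(INDEX[name][version] + rest, chosen)
        if result is not None:
            return result                  # no conflict on this branch
        del chosen[name]                   # conflict: backtrack
    return None

print(resolve([("app", {"1.0"})]))
# → {'app': '1.0', 'lib': '1.0', 'base': '1.0'}
```

Note that with no conflicts the first branch tried always succeeds, which is
why backtracking wins in the conflict-free case.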

Pure SAT is fast enough in practice in my experience (concretely: solving
thousands of rules takes < 1 sec). It becomes more complicated when you need
to optimize the solution, especially when you have already-installed
packages. This is unfortunately not as well discussed in the literature.
Pseudo-boolean SAT for optimization was argued to be too slow by the
0install people, but OTOH, it seems to be what conda uses, so who knows :)

If your SAT solver is in pure python, you can choose a "direction" of the
search which is more meaningful. I believe this is what 0install does, from
reading http://0install.net/solver.html, and it is what we have in our own
SAT solver code. I unfortunately cannot look at the 0install code myself, as
it is under the GPL and I am working on a BSD-licensed solver
implementation. I also do not know how they handle updates and already
installed packages.
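To illustrate what I mean by a "direction": in a hand-rolled DPLL-style
search you control both which variable is decided next and which value is
tried first, so you can bias the search toward, e.g., newest versions. A toy
sketch (not 0install's actual code; the encoding below is made up):

```python
def satisfiable(clauses, order, assignment=None):
    """Toy DPLL-style SAT search. Clauses are lists of ints (positive
    literal = variable True, negative = False). `order` is a list of
    (variable, preferred_value) pairs: it fixes which variable is decided
    next and which value is tried first -- the "direction" of the search."""
    assignment = dict(assignment or {})
    pending = []
    for clause in clauses:
        if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause
               if abs(lit) in assignment):
            continue                       # clause already satisfied
        if all(abs(lit) in assignment for lit in clause):
            return None                    # clause falsified: conflict
        pending.append(clause)
    if not pending:
        return assignment                  # every clause satisfied
    for var, preferred in order:
        if var not in assignment:
            for value in (preferred, not preferred):
                model = satisfiable(pending, order, {**assignment, var: value})
                if model is not None:
                    return model
            return None                    # both values fail: backtrack
    return None

# Hypothetical encoding: 1 = "lib 2.0", 2 = "lib 1.0", 3 = "base 2.0",
# 4 = "base 1.0". Exactly one version of each package; lib 2.0 needs
# base 2.0, lib 1.0 needs base 1.0, and base 1.0 is pinned (installed).
rules = [[1, 2], [-1, -2], [3, 4], [-3, -4], [-1, 3], [-2, 4], [4]]
newest_first = [(1, True), (3, True), (2, True), (4, True)]
model = satisfiable(rules, newest_first)
print(model)  # the pin on base 1.0 forces lib 1.0 despite the preference
```

A real solver would add unit propagation and conflict-driven learning, but
the decision order is where the "meaningful direction" lives.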


> This only considers computation cost though.  Other factors can become
> more expensive than computation.  For example, a SAT solver needs all the
> rules up front, so a SAT solution effectively needs to download the
> full dependency graph before starting.  A backtracking dependency resolver
> can just download packages or dependency information as it considers them.
> The bandwidth cost for SAT solvers should therefore be higher.
>

With a reasonable representation, I think you can make it small enough. To
give an idea, our index @ Enthought, containing around 20k packages, takes
~340 kb compressed w/ bz2 if you only keep the data required for dependency
handling (name, version and runtime dependencies), and that's using json, an
inefficient encoding. So I suspect encoding all of pypi may only be a few MB
to fetch, which is generally faster than doing tens of http requests.
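The scheme is just this (the package entries below are made up; the real
index obviously has many more fields per package before stripping):

```python
import bz2
import json

# Keep only what dependency resolution needs: name, version and runtime
# dependencies. Everything else (description, author, files...) is dropped.
index = [
    {"name": "numpy", "version": "1.9.2", "depends": []},
    {"name": "scipy", "version": "0.15.1", "depends": ["numpy >= 1.6"]},
    {"name": "pandas", "version": "0.16.0",
     "depends": ["numpy >= 1.7", "python-dateutil", "pytz"]},
]

# Compact JSON (no whitespace), then bz2 -- the encoding we use today.
raw = json.dumps(index, separators=(",", ":")).encode("utf-8")
compressed = bz2.compress(raw)
print(len(raw), "bytes raw,", len(compressed), "bytes compressed")
```

On a real index the repetition across thousands of entries is what makes
bz2 effective; on three entries the overhead dominates, of course.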

The libsolv people worked on a binary representation that may also be
worth looking at.


> P.S.  If you'd like to talk off list, possibly over Skype, I'd be happy to
> talk more with you and/or Robert about minutiae that others may not care
> about.
>

Sure, I would be happy to. As I mentioned before, we have some code around
a SAT-based solver, but it is not ready yet, which is why we kept it
private (https://github.com/enthought/sat-solver). It handles the case
where nothing is installed well (both speed- and quality-wise), but behaves
poorly when packages are already installed, and does not handle the update
case yet. The code is also very prototype-ish, but it is not too
complicated to experiment with.

David
_______________________________________________
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig
