On Sun, May 17, 2015 at 12:40 AM, Daniel Holth <dho...@gmail.com> wrote:
> > On May 16, 2015 11:22 AM, "David Cournapeau" <courn...@gmail.com> wrote: > > > > > > > > On Sat, May 16, 2015 at 11:36 PM, Justin Cappos <jcap...@nyu.edu> wrote: > >>> > >>> I am no expert, but I don't understand why backtracking algorithms > would to be faster than SAT, since they both potentially need to walk over > the full set of possible solutions. It is hard to reason about the cost > because the worst case is in theory growing exponentially in both cases. > >> > >> > >> This is talked about a bit in this thread: > https://github.com/pypa/pip/issues/988 > >> > >> Each algorithm could be computationally more efficient. Basically, *if > there are no conflicts* backtracking will certainly win. If there are a > huge number of conflicts a SAT solver will certainly win. It's not clear > where the tipping point is between the two schemes. > >> > >> However, a better question is does the computational difference > matter? If one is a microsecond faster than the other, I don't think > anyone cares. However, from the OPIUM paper (listed off of that thread), > it is clear that SAT solver resolution can be slow without optimizations to > make them work more like backtracking resolvers. From my experience > backtracking resolvers are also slow when the conflict rate is high. > > > > > > Pure SAT is fast enough in practice in my experience (concretely: > solving thousand of rules takes < 1 sec). It becomes more complicated as > you need to optimize the solution, especially when you have already > installed packages. This is unfortunately not as well discussed in the > literature. Pseudo-boolean SAT for optimization was argued to be too slow > by the 0 install people, but OTOH, this seems to be what's used in conda, > so who knows :) > > Where optimizing means something like "find a solution with the newest > possible releases of the required packages", not execution speed. > Indeed, it was not obvious in this context :) Though in theory, optimization is more general. It could be optimizing w.r.t. a cost function taking into account #packages, download size, minimal number of changes, etc... This is where you want a pseudo-boolean SAT, which is what conda uses I think. 0install, composer and I believe libsolv took a different route, and use heuristics to find a reasonably good solution by picking the next candidate. This requires access to the internals of the SAT solver though (not a problem if you have a python implementation). David > > If you SAT solver is in pure python, you can choose a "direction" of the > search which is more meaningful. I believe this is what 0install does from > reading http://0install.net/solver.html, and what we have in our own SAT > solver code. I unfortunately cannot look at the 0install code myself as it > is under the GPL and am working on a BSD solver implementation. I also do > not know how they handle updates and already installed packages. > > > >> > >> This only considers computation cost though. Other factors can become > more expensive than computation. For example, SAT solvers need all the > rules to consider. So a SAT solution needs to effectively download the > full dependency graph before starting. A backtracking dependency resolver > can just download packages or dependency information as it considers them. > The bandwidth cost for SAT solvers should be higher. > > > > > > With a reasonable representation, I think you can make it small enough. > To give an idea, our index @ Enthought containing around 20k packages takes > ~340 kb compressed w/ bz2 if you only keep the data required for dependency > handling (name, version and runtime dependencies), and that's using json, > an inefficient encoding, so I suspect encoding all of pypi may be a few MB > only fetch, which is generally faster that doing tens of http requests. > > > > The libsvolv people worked on a binary representation that may also be > worth looking at. > > > >> > >> P.S. If you'd like to talk off list, possibly over Skype, I'd be happy > to talk more with you and/or Robert about minutiae that others may not care > about. > > > > > > Sure, I would be happy too. As I mentioned before, we have some code > around a SAT-based solver, but it is not ready yet, which is why we kept it > private (https://github.com/enthought/sat-solver). It handles well (== > both speed and quality-wise) the case where nothing is installed, but > behaves poorly when packages are already installed, and does not handle the > update case yet. The code is also very prototype-ish, but is not too > complicated to experimente with it. > > > > David > > > > > > _______________________________________________ > > Distutils-SIG maillist - Distutils-SIG@python.org > > https://mail.python.org/mailman/listinfo/distutils-sig > > >
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig