On Sat, May 16, 2015 at 11:36 PM, Justin Cappos <jcap...@nyu.edu> wrote:
> I am no expert, but I don't understand why backtracking algorithms would >> to be faster than SAT, since they both potentially need to walk over the >> full set of possible solutions. It is hard to reason about the cost because >> the worst case is in theory growing exponentially in both cases. >> > > This is talked about a bit in this thread: > https://github.com/pypa/pip/issues/988 > > Each algorithm could be computationally more efficient. Basically, *if > there are no conflicts* backtracking will certainly win. If there are a > huge number of conflicts a SAT solver will certainly win. It's not clear > where the tipping point is between the two schemes. > > However, a better question is does the computational difference matter? > If one is a microsecond faster than the other, I don't think anyone cares. > However, from the OPIUM paper (listed off of that thread), it is clear that > SAT solver resolution can be slow without optimizations to make them work > more like backtracking resolvers. From my experience backtracking > resolvers are also slow when the conflict rate is high. > Pure SAT is fast enough in practice in my experience (concretely: solving thousand of rules takes < 1 sec). It becomes more complicated as you need to optimize the solution, especially when you have already installed packages. This is unfortunately not as well discussed in the literature. Pseudo-boolean SAT for optimization was argued to be too slow by the 0 install people, but OTOH, this seems to be what's used in conda, so who knows :) If you SAT solver is in pure python, you can choose a "direction" of the search which is more meaningful. I believe this is what 0install does from reading http://0install.net/solver.html, and what we have in our own SAT solver code. I unfortunately cannot look at the 0install code myself as it is under the GPL and am working on a BSD solver implementation. I also do not know how they handle updates and already installed packages. > This only considers computation cost though. Other factors can become > more expensive than computation. For example, SAT solvers need all the > rules to consider. So a SAT solution needs to effectively download the > full dependency graph before starting. A backtracking dependency resolver > can just download packages or dependency information as it considers them. > The bandwidth cost for SAT solvers should be higher. > With a reasonable representation, I think you can make it small enough. To give an idea, our index @ Enthought containing around 20k packages takes ~340 kb compressed w/ bz2 if you only keep the data required for dependency handling (name, version and runtime dependencies), and that's using json, an inefficient encoding, so I suspect encoding all of pypi may be a few MB only fetch, which is generally faster that doing tens of http requests. The libsvolv people worked on a binary representation that may also be worth looking at. > P.S. If you'd like to talk off list, possibly over Skype, I'd be happy to > talk more with you and/or Robert about minutiae that others may not care > about. > Sure, I would be happy too. As I mentioned before, we have some code around a SAT-based solver, but it is not ready yet, which is why we kept it private (https://github.com/enthought/sat-solver). It handles well (== both speed and quality-wise) the case where nothing is installed, but behaves poorly when packages are already installed, and does not handle the update case yet. The code is also very prototype-ish, but is not too complicated to experimente with it. David
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig