A set was requested with preserved insertion order. Mathematically, such a structure relates more to an Oset (having order) than a Set. See relationships in the image below:
[image: libo.png] Each of the mentioned structure types has a dedicated role. Python's sets work well and correlate with existing math principles. I would be more convinced to introduce a new collection to Python rather than alter existing sets. -1: preserving insertion-order in sets 0: adding `OrderedSet` (where `dict.fromkeys()` does not suffice, i.e. needing set operations) +1: implementing an analogous abstract collection, e.g. "`HashList`" -> Oset, just as `Counter` -> Mset On Sun, Dec 29, 2019 at 9:00 PM Wes Turner <[email protected]> wrote: > Due to version and platform constraints, a SAT solver is necessary for > conda and (now) pip. Libsolv is one of the fastest around. > > https://github.com/QuantStack/mamba is conda implemented with libsolv > (and parallel downloading of *declarative* dependency metadata). > > For general purpose graphs with implicit node instantation from edge > declaration, NetworkX has implementations of many graph algorithms. > > https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.dag.topological_sort.html > > > CuGraph (a product of the RAPIDS.ai project) does graphs with CUDA (from > Python) with a "NetworkX-like API" but doesn't yet have a topological sort > (though it does have BFS) > https://github.com/rapidsai/cugraph > > "pip needs a dependency resolver" + End (Fn+Right) links to the latest > work on the new pip code (that must require declarative dependency metadata) > https://github.com/pypa/pip/issues/988 > > That said, this implementation of topo sort could have a deterministic > output given an OrderedSet: > https://rosettacode.org/wiki/Topological_sort#Python > > A table including Big-O and memory usage for the necessary data structure > and methods would be useful. > > On Sun, Dec 29, 2019, 5:12 PM Tim Peters <[email protected]> wrote: > >> [Larry] >> > It's a lightweight abstract dependency graph. Its nodes are opaque, >> > only required to be hashable. And it doesn't require that you give it >> > all the nodes in strict dependency order. >> > >> > When you add a node, you can also optionally specify >> > dependencies, and those dependencies aren't required >> > to be nodes that the graph has previously seen. So you can add >> > a node A that depends on B and C, without showing it B or C >> > first. When that happens it creates placeholder nodes for B >> > and C, and naturally these nodes have no dependencies. You >> > can then later inform the graph that node B has dependencies >> > on X Y and Z. >> > >> > (My specific use case is a package manager. Packages are identified >> > with unique strings. So the nodes I give the give the graph are simply >> > those package names. It's pretty common for me to get a package >> > with dependencies on packages I haven't seen yet. Designing the >> > graph to create placeholders let me skip making two passes over >> > the package list, pre-creating the package objects, etc etc. This >> > made the code simpler and presumably faster.) >> > >> > The graph API maintains an externally-visible "ready" iterable of >> > nodes. And since B can go from ready to not-ready, it can get added >> > to "ready" and subsequently removed. >> > >> > Also, when a node is satisfied, you simply remove it from the graph. >> > If A depends on B and C, and they all represent jobs, and B and C >> > have no dependencies, they'll be in the "ready" iterable. You iterate >> > over "ready", and execute those jobs, and assuming they are >> > successful you then remove them from the graph. When A's >> > dependencies are all satisfied, it'll get added to the "ready" iterable. >> > And naturally when B and C were removed from the graph, they were >> > removed from the "ready" iterable too. >> >> OK! You're doing a topological sort. There are natural & simple ways >> to do that right now that scale efficiently to very large graphs (time >> & space linear in the sum of the number of nodes and the number of >> dependencies). Curiously, the difficulties are mostly driven by the >> quality of _error_ messages you want (in case of, e.g., cycles in the >> dependency graph). >> >> Lots of those implementations became deterministic "by magic" when >> ordered dicts were introduced. This is why: a bare bones >> implementation needs to associate two pieces of info with each node: >> a list of its immediate successors, and an integer count of the number >> of its immediate predecessors. A dict is the natural way to implement >> that association. >> >> When all the dependency info has been entered, then: >> >> - First all nodes are emitted whose predecessor count is 0. Iterating >> over the association dict once is the natural way to find them, and >> that order is defined now. >> >> - Then, as each emitted node N is marked done: >> for child in N.successors: >> assert child.predcount > 0 >> child.predcount -= 1 >> if child.predcount == 0: >> emit(child) >> >> That's about all there is to it. But there's no end to the cruft that >> _can_ be added to, e.g., verify that a node is marked done no more >> than once, or to compute & display strongly connected components if >> not all nodes are emitted, etc. >> >> Ordered sets could improve this "natural" implementation if the >> successor lists were successor sets instead, simply because then - >> like lists - their iteration order would be defined, which can in turn >> affect the order in which nodes are emitted in the loop just above. >> But lists are adequate to the task, are cheaper in both time and >> space, and can't _not_ be ordered ;-) >> >> >> > Thus it's this "ready" collection that I want to a) be iterable, b) be >> stable, and >> > c) support fast removal. I add and remove nodes from that set a lot, >> which I >> > realized would perturb the iteration order from run to run, etc etc >> etc, and here >> > we are. >> >> The sketch above (which is a bog-common way to implement topsorts) >> doesn't build a data structure recording the final order, and, in >> fact, never deletes anything. You might protest that the initial >> iteration step (scanning the association dict to find nodes with no >> predecessors) is an expense you skip because nodes with predecessors >> are deleted from your "ready" structure all along. But nodes with >> predecessors are _touched_ either way, and merely skipping over them >> is bound to be cheaper than deleting them. >> >> After that initial step, no search of any kind is needed again. >> >> > I grant you maybe it's a weird approach. But after two false starts, >> where I >> > was starting to boil the oceans with ABCs and "node is satisfied" APis >> and >> > separate iterator objects, I had a re-think and hit on this super >> lightweight >> > design. I'm actually quite pleased with it--it's short, it has a >> minimal API, >> > it's readable, it was easy to get right, and it solves my problem. >> >> Whereas I would have started with a topsort and finished it while I >> was sleeping ;-) Seriously, I've written such a thing many times, but >> never reuse the code because it's so easy to redo from scratch. Every >> application has _some_ unique twist, which makes a general-purpose API >> so bloated it's harder to figure out how to use than to write custom >> code. >> >> In your application, I'm guessing (but don't know) that when a package >> name is emitted, it's possible you're not able to find the package, >> and so the process has to stop. That's why I inserted a "as each >> emitted node N is marked done" step to my description. That is, I'm >> picturing an API that adds a (say) >> >> topsorter.record_succeeded(node) >> >> method to say that - yes - the node was processed successfully. Else, >> by default, its successors (if any) must not be emitted. >> >> But if that isn't needed, the whole topsort order can be materialized >> into (say) a list in one gulp, or delivered by a generator. >> >> >> > Happy new year, >> >> And to you too! :-) >> _______________________________________________ >> Python-Dev mailing list -- [email protected] >> To unsubscribe send an email to [email protected] >> https://mail.python.org/mailman3/lists/python-dev.python.org/ >> Message archived at >> https://mail.python.org/archives/list/[email protected]/message/HRKQPTPQQNIU3DYMYUYRC7USVRJGMRFC/ >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > _______________________________________________ > Python-Dev mailing list -- [email protected] > To unsubscribe send an email to [email protected] > https://mail.python.org/mailman3/lists/python-dev.python.org/ > Message archived at > https://mail.python.org/archives/list/[email protected]/message/6UKXR63M6NYI3EQRXL7AOH3AWMTSNXUW/ > Code of Conduct: http://python.org/psf/codeofconduct/ >
_______________________________________________ Python-Dev mailing list -- [email protected] To unsubscribe send an email to [email protected] https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/[email protected]/message/5CSNT37JKYAY52CWRI4PRRCNC5UL7DSK/ Code of Conduct: http://python.org/psf/codeofconduct/
