A set was requested with preserved insertion order. Mathematically, such a
structure relates more to an Oset (having order) than a Set. See
relationships in the image below:
[image: libo.png]
Each of the mentioned structure types has a dedicated role. Python's sets
work well and correlate with existing math principles. I would be more
convinced to introduce a new collection to Python rather than alter
existing sets.
-1: preserving insertion-order in sets
0: adding `OrderedSet` (where `dict.fromkeys()` does not suffice, i.e.
needing set operations)
+1: implementing an analogous abstract collection, e.g. "`HashList`" ->
Oset, just as `Counter` -> Mset
On Sun, Dec 29, 2019 at 9:00 PM Wes Turner wrote:
> Due to version and platform constraints, a SAT solver is necessary for
> conda and (now) pip. Libsolv is one of the fastest around.
>
> https://github.com/QuantStack/mamba is conda implemented with libsolv
> (and parallel downloading of *declarative* dependency metadata).
>
> For general purpose graphs with implicit node instantation from edge
> declaration, NetworkX has implementations of many graph algorithms.
>
> https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.dag.topological_sort.html
>
>
> CuGraph (a product of the RAPIDS.ai project) does graphs with CUDA (from
> Python) with a "NetworkX-like API" but doesn't yet have a topological sort
> (though it does have BFS)
> https://github.com/rapidsai/cugraph
>
> "pip needs a dependency resolver" + End (Fn+Right) links to the latest
> work on the new pip code (that must require declarative dependency metadata)
> https://github.com/pypa/pip/issues/988
>
> That said, this implementation of topo sort could have a deterministic
> output given an OrderedSet:
> https://rosettacode.org/wiki/Topological_sort#Python
>
> A table including Big-O and memory usage for the necessary data structure
> and methods would be useful.
>
> On Sun, Dec 29, 2019, 5:12 PM Tim Peters wrote:
>
>> [Larry]
>> > It's a lightweight abstract dependency graph. Its nodes are opaque,
>> > only required to be hashable. And it doesn't require that you give it
>> > all the nodes in strict dependency order.
>> >
>> > When you add a node, you can also optionally specify
>> > dependencies, and those dependencies aren't required
>> > to be nodes that the graph has previously seen. So you can add
>> > a node A that depends on B and C, without showing it B or C
>> > first. When that happens it creates placeholder nodes for B
>> > and C, and naturally these nodes have no dependencies. You
>> > can then later inform the graph that node B has dependencies
>> > on X Y and Z.
>> >
>> > (My specific use case is a package manager. Packages are identified
>> > with unique strings. So the nodes I give the give the graph are simply
>> > those package names. It's pretty common for me to get a package
>> > with dependencies on packages I haven't seen yet. Designing the
>> > graph to create placeholders let me skip making two passes over
>> > the package list, pre-creating the package objects, etc etc. This
>> > made the code simpler and presumably faster.)
>> >
>> > The graph API maintains an externally-visible "ready" iterable of
>> > nodes. And since B can go from ready to not-ready, it can get added
>> > to "ready" and subsequently removed.
>> >
>> > Also, when a node is satisfied, you simply remove it from the graph.
>> > If A depends on B and C, and they all represent jobs, and B and C
>> > have no dependencies, they'll be in the "ready" iterable. You iterate
>> > over "ready", and execute those jobs, and assuming they are
>> > successful you then remove them from the graph. When A's
>> > dependencies are all satisfied, it'll get added to the "ready" iterable.
>> > And naturally when B and C were removed from the graph, they were
>> > removed from the "ready" iterable too.
>>
>> OK! You're doing a topological sort. There are natural & simple ways
>> to do that right now that scale efficiently to very large graphs (time
>> & space linear in the sum of the number of nodes and the number of
>> dependencies). Curiously, the difficulties are mostly driven by the
>> quality of _error_ messages you want (in case of, e.g., cycles in the
>> dependency graph).
>>
>> Lots of those implementations became deterministic "by magic" when
>> ordered dicts were introduced. This is why: a bare bones
>> implementation needs to associate two pieces of info with each node:
>> a list of its immediate successors, and an integer count of the number
>> of its immediate predecessors. A dict is the natural way to implement
>> that association.
>>
>> When all the dependency info has been entered, then:
>>
>> - First all nodes are emitted whose predecessor count is 0. Iterating
>> over the association dict once is the natural way to find them, and
>> that order is defined now.
>>
>> - Then, as each emitted node N is marked done:
>> for child in N.succe