[issue47145] Improve graphlib.TopologicalSort by removing the prepare step

Larry Hastings Tue, 29 Mar 2022 23:29:29 -0700


Larry Hastings <la...@hastings.org> added the comment:


> Assuming we do want to be able to add() after a get_ready(), is there
> a reason that "forgetting" already-produced nodes is the correct
> behavior, as opposed to remembering all nodes ever added, and
> raising iff the addition creates a cycle among all nodes ever
> added or depends on an already-yielded node?

I'm not sure "correct" applies here, because I don't have a sense that one 
behavior is conceptually more correct than the other.  But in implementing my 
API change, forgetting about the done nodes seemed more practical.
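For reference, the `graphlib.TopologicalSorter` that ships with Python 3.9+ supports neither behavior today. Once `prepare()` has been called, `add()` raises `ValueError`, so new work can't be added mid-run at all. Here is a minimal illustration using only the stdlib API (the node names are made up for the example):

```python
from graphlib import TopologicalSorter

ts = TopologicalSorter()
ts.add("task", "master")   # "task" depends on "master"
ts.prepare()               # freezes the graph and checks it for cycles
ready = ts.get_ready()     # ('master',)
ts.done(*ready)

try:
    ts.add("extra", "master")  # new work discovered while the graph is running
    error = None
except ValueError as exc:
    error = exc                # the stdlib rejects add() after prepare()
print(error)
```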

The "benefit" of remembering done nodes: the library can ignore dependencies on
them in the future, forever.  Adding a dependency on a node that's already been
marked as "done" doesn't make much conceptual sense to me, but maybe it's useful
in practice?  I'm not sure, but it doesn't seem all that useful.

I can only come up with a marginal reason why remembering done nodes is useful.
Let's say all your tasks fan out from some fundamental task, like "download
master list of work" or something.  One of your tasks might discover additional
tasks that need to run, and conceptually those tasks might depend on your
"download master list of work" task.  If the graph remembers the done list
forever, then adding that dependency is harmless.  If the graph forgets about
done nodes, then adding that dependency could re-introduce that task to the
graph as a brand-new node, so it would get scheduled and run a second time.
So maybe it's a little bit of a footgun?  But then again: you already know
you're running, and you're a task that was dependent on the master list of
work, which means you implicitly know that dependency has been met.  So just
skip adding the redundant dependency and you're fine.

On the other hand, forgetting about the nodes has a definite practical benefit: 
the graph consumes less memory.  If you use a graph object for a long time, the 
list of done nodes it's holding references to would continue to grow and grow 
and grow.  If we forget about done nodes, we free up all that memory, and
membership testing for done nodes may get faster too.
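To make the trade-off concrete, here is a minimal sketch of a hypothetical `IncrementalSorter`. It is not the graphlib API and not the patch attached to this issue. It permits `add()` at any time and takes a made-up `remember_done` flag that selects between the two policies. It replays the "master list of work" scenario, then drains a 1000-node chain to compare how much done-node bookkeeping each policy keeps. Cycle detection is left out for brevity.

```python
from collections import defaultdict


class IncrementalSorter:
    """Hypothetical sketch, not graphlib.TopologicalSorter or the proposed patch.

    Allows add() at any time. With remember_done=True, finished nodes are kept
    in a set forever; with remember_done=False they are forgotten. Cycle
    detection is omitted to keep the sketch short.
    """

    def __init__(self, remember_done):
        self.remember_done = remember_done
        self._pending = {}                    # node -> number of unfinished predecessors
        self._successors = defaultdict(list)  # node -> nodes waiting on it
        self._yielded = set()                 # returned by get_ready(), not yet done()
        self._done = set()                    # grows only when remember_done is True

    def add(self, node, *predecessors):
        if node in self._done:
            raise ValueError(f"{node!r} is already done")
        if predecessors and node in self._yielded:
            raise ValueError(f"{node!r} is already running")
        self._pending.setdefault(node, 0)
        for pred in predecessors:
            if pred in self._done:
                continue  # remembered as done: the dependency is already satisfied
            # An unknown (or forgotten) predecessor becomes a brand-new node.
            self._pending.setdefault(pred, 0)
            self._successors[pred].append(node)
            self._pending[node] += 1

    def get_ready(self):
        ready = tuple(n for n, count in self._pending.items()
                      if count == 0 and n not in self._yielded)
        self._yielded.update(ready)
        return ready

    def done(self, *nodes):
        for node in nodes:
            if node not in self._yielded:
                raise ValueError(f"{node!r} was not returned by get_ready()")
            self._yielded.remove(node)
            del self._pending[node]
            for succ in self._successors.pop(node, []):
                self._pending[succ] -= 1
            if self.remember_done:
                self._done.add(node)


def discover_extra_work(remember_done):
    ts = IncrementalSorter(remember_done)
    ts.add("task", "master")
    ts.done(*ts.get_ready())          # "master" runs and finishes
    running = ts.get_ready()          # ("task",)
    ts.add("extra", "master")         # "task" discovers more work mid-run
    return running, ts.get_ready()


def done_set_size(remember_done, n=1000):
    ts = IncrementalSorter(remember_done)
    for i in range(1, n):
        ts.add(i, i - 1)              # a chain: 0 -> 1 -> ... -> n-1
    while ready := ts.get_ready():
        ts.done(*ready)
    return len(ts._done)


print(discover_extra_work(True))    # (('task',), ('extra',))
print(discover_extra_work(False))   # (('task',), ('master',))
print(done_set_size(True), done_set_size(False))  # 1000 0
```

With `remember_done=True` the late dependency on "master" is simply dropped, at the cost of a done set that grows with every finished node. With `remember_done=False` that set stays empty, but "master" comes back as a fresh ready node, which is exactly the footgun described above.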

I guess I'm not married to the behavior.  If someone had a great conceptual or 
practical reason why remembering the done nodes forever was better, I'd be 
willing to listen to reason.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue47145>
_______________________________________