New submission from Larry Hastings <la...@hastings.org>:

I've maintained my own topological sort class for a while now.  The API I came 
up with (or was it Tim and I?) was adopted for graphlib.TopologicalSort().  
Some of my method names are slightly different, but the APIs are so similar, I 
can do a little renaming and run Lib/test/test_graphlib using my 
implementation.  (For clarity I'm going to write the rest of this issue using 
the names from graphlib.)

Recently I made an improvement to my version: it no longer has the modal 
behavior where there's a "before prepare" phase where you add nodes and an 
"after prepare" phase where you can call get_ready and done.

Instead, in my new and improved version the graph is always maintained in a 
"ready" state.  You can call add, get_ready, and done in any order, as much as 
you like.  prepare doesn't do anything besides the cycle check--but read on!

This approach required tweaking some semantics behind the scenes.  My graph 
object now maintains an internal "dirty" flag.  It's set to True after any 
edges are added, and only gets cleared after checking for cycles.  prepare runs 
this cycle check, but get_ready *also* runs this cycle check.  (Though in both 
cases, only when the dirty flag is set, of course.)  In theory this means you 
no longer need the prepare call.  However, it's still useful, because you can 
call it to check for a CycleError.

So, on the one hand, this means that get_ready can throw a CycleError, which it 
never did before.  On the other hand, it won't throw for existing users of the 
library, because they never add edges after their prepare call.  So this 
semantic change shouldn't break existing users, assuming they're not misusing 
the existing API.

Another wrinkle: adding a dependency on a node that's already been handed out 
by get_ready (but hasn't been returned by done) is ignored.  Again, not 
something existing users of graphlib can do, so this shouldn't break code.

There's one final semantic change I made worth knowing about: when you return a 
node via done, the graph simply forgets about it.  This means you could add it 
a second time (!) and it could go for a complete ride through the graph again, 
wheeee!  This also means that when all nodes have been returned by done, the 
graph is essentially returned to its initial pristine state.  This too seems 
completely harmless, because with the existing library it's illegal to add 
nodes to a graph after calling prepare, so nobody's doing it, so it won't break 
any code.

I'm happy to contribute my version.  But I was lazy and didn't worry about 
speed or memory implementation; it's strewn with sets and things.  I figured I 
could always optimize it later.  But "later" could become "now" if we were 
interested in trying to merge this behavior into Python's graphlib.

Interested?

----------
assignee: larry
components: Library (Lib)
messages: 416182
nosy: eric.smith, larry, pablogsal, tim.peters
priority: normal
severity: normal
stage: needs patch
status: open
title: Improve graphlib.TopologicalSort by removing the prepare step
type: enhancement
versions: Python 3.11

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue47145>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to