Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> writes: > Possibly because they aren't needed? Under what circumstances would > you use a tree instead of a list or a dict or combination of both?
I've sometimes wanted a functional tree in the sense of functional programming. That means the tree structure is immutable and you insert or delete nodes in O(log n) operations, by creating new trees that share almost all their data with the old tree. Once you release the old root, garbage collection frees any nodes that were in the old tree but not the new one. Use case inspired by Haskell's Happstack-store (formerly called MACID): you want a Prevayler-style in-memory key-value database. First idea: all read queries are served by pure in-memory lookups in a Python dict. Write queries update the dict and also append the command to a serial log on disk (one disk write, no seeks). If the system crashes, restart it empty, and play back the log from the beginning to restore the in-memory data. Problem with first idea: the database might be a few thousand entries but take millions of updates, if entries are modified frequently. You don't want to reload the million updates from the beginning. Next idea: dump a snapshot of the dictionary now and then, and then just reload updates starting from the last snapshot. Trouble here is that the dictionary is big enough that writing out the snapshot takes time, and you don't want to lock the whole dict against more updates while the snapshot is writing. If you do this the Java way, welcome to concurrency hell as you make finer and finer grained locks and deal with resulting deadlocks, race conditions, etc. And the dictionary still has to be synchronized to avoid reads simultaneous with updates. MACID solution: use a functional red-black tree instead of a dict. To add or update an entry, make a new tree that replaces the old tree by updating a single pointer. That allows unlimited read concurrency (reading means getting the current tree pointer and navigating the tree that you get: since that tree is immutable you can keep using it while other threads make new trees). Updates are managed by a single thread that can take its time making the new tree and writing the log, and then atomically updating the in-memory root pointer that's shared with other threads. Finally, to take a snapshot, you can just get the current root and write out its contents: since it's immutable you can do that concurrently with anything else (including updates) that might be happening. You just have a small interaction with the update thread to remember where in the log to start reading from in case of a crash and reload. Finding out about this was one of the "wow" moments that made me realize I had to learn Haskell. -- https://mail.python.org/mailman/listinfo/python-list