In other languages, there are built-in data structures with binary search tree semantics, as an example, std::map in C++.
By binary search tree semantics we mean the following: * We can insert elements into the structure in time faster than O(n), i.e. O(logn) for std::map * We can iterate over all of the elements of the structure in sorted order in linear time, i.e. O(n) for std::map Is there currently any data structure in Python with binary search tree semantics? To illustrate the need, can the following program be written elegantly and efficiently with built-in Python data structures? Imagine that you are a fund manager at a bank. Bank accounts are added to your fund very frequently. A bank account has an ID and a value. A smaller ID means that the bank account was created earlier than one with a larger ID. Each time that an account is added to your fund, you would like to know the ID number of the 1000th oldest account, for your records. Here are implementations in Python 3 and C++, for comparison: ```python3 from random import randint dist = lambda: randint(0, 2147483647) def main(): my_map = {} # fills a map with 1000000 random mappings of type (int, int) for i in range(0, 1000000): my_map[dist()] = dist() # prints out the 1000th smallest key and its value target_key = nth_smallest_key(1000, my_map) print("({}: {})".format(target_key, my_map[target_key])) # writes a new random mapping to the map # then prints out the 1000th smallest key and its value if that key # has changed # 100000 times for i in range(100000): my_map[dist()] = dist() test_key = nth_smallest_key(1000, my_map) if target_key != test_key: target_key = test_key print("({}: {})".format(target_key, my_map[target_key])) # print an indicator every 100 iterations for comparison if i % 100 == 0: print("iteration: {}".format(i)) def nth_smallest_key(n, m): return sorted(m.keys())[n] if __name__ == "__main__": main() ``` ```c++ #include <cstdio> #include <map> #include <random> using namespace std; int nth_smallest_key(int n, map<int, int>& m); int main() { random_device rd; mt19937 psrng(rd()); uniform_int_distribution<> dist(0, 2147483647); map<int, int> my_map; // fills a map with 1000000 random mappings of type (<int>: <int>) for (int i = 0; i < 1000000; i++) { my_map[dist(psrng)] = dist(psrng); } // prints out the 1000th smallest key and its value int target_key = nth_smallest_key(1000, my_map); printf("(%d: %d)\n", target_key, my_map[target_key]); // writes a new random mapping to the map // then prints out the 1000th smallest key and its value if that key // has changed // 100000 times for (int i = 0; i <= 100000; i++) { my_map[dist(psrng)] = dist(psrng); int test_key = nth_smallest_key(1000, my_map); if (target_key != test_key) { target_key = test_key; printf("(%d: %d)\n", target_key, my_map[target_key]); } } return 0; } int nth_smallest_key(int n, map<int, int>& m) { map<int, int>::iterator curr = m.begin(); for (int i = 0; i < n; i++) { curr = next(curr); } return curr->first; } ``` Makefile: ```make runcpp: buildcpp ./main buildcpp: g++ -o main bst_semantics_cpp.cpp runpython: python3 bst_semantics_python.py ``` The C++ program runs on my machine in approximately 5 seconds ```bash $ time ./main (2211193: 2021141747) (2208771: 1079444337) (2208700: 1187119077) (2208378: 1447743503) ... (1996019: 1378401665) (1989217: 1457497754) (1989042: 1336229409) real 0m4.915s user 0m4.750s sys 0m0.094s $ ``` The Python program does not reach 1% of completion after 120 seconds ```bash $ time make runpython python3 bst_semantics_python.py (2158070: 1498305903) iteration: 0 iteration: 100 iteration: 200 ^CTraceback (most recent call last): File "bst_semantics_python.py", line 36, in <module> main() File "bst_semantics_python.py", line 23, in main test_key = nth_smallest_key(1000, my_map) File "bst_semantics_python.py", line 33, in nth_smallest_key return sorted(m.keys())[n] KeyboardInterrupt Makefile:8: recipe for target 'runpython' failed make: *** [runpython] Error 1 real 2m2.040s user 1m59.063s sys 0m0.375s $ ``` I would like to propose adding such a data structure to the Python standard library. It could be called a SortedDict. _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/VFBM6UFVDUEPVCUTSTKGLB66MRIMOTCO/ Code of Conduct: http://python.org/psf/codeofconduct/