In other languages, there are built-in data structures with binary search tree 
semantics, as an example, std::map in C++.

By binary search tree semantics we mean the following:

* We can insert elements into the structure in time faster than O(n), i.e. 
O(logn) for std::map
* We can iterate over all of the elements of the structure in sorted order in 
linear time, i.e. O(n) for std::map

Is there currently any data structure in Python with binary search tree 
semantics?

To illustrate the need, can the following program be written elegantly and 
efficiently with built-in Python data structures?

Imagine that you are a fund manager at a bank. Bank accounts are added to your 
fund very frequently. A bank account has an ID and a value. A smaller ID means 
that the bank account was created earlier than one with a larger ID. Each time 
that an account is added to your fund, you would like to know the ID number of 
the 1000th oldest account, for your records.

Here are implementations in Python 3 and C++, for comparison:

```python3
from random import randint

dist = lambda: randint(0, 2147483647)

def main():
    my_map = {}

    # fills a map with 1000000 random mappings of type (int, int)
    for i in range(0, 1000000):
        my_map[dist()] = dist()

    # prints out the 1000th smallest key and its value
    target_key = nth_smallest_key(1000, my_map)
    print("({}: {})".format(target_key, my_map[target_key]))

    # writes a new random mapping to the map
    # then prints out the 1000th smallest key and its value if that key
    # has changed
    # 100000 times
    for i in range(100000):
        my_map[dist()] = dist()

        test_key = nth_smallest_key(1000, my_map)
        if target_key != test_key:
            target_key = test_key
            print("({}: {})".format(target_key, my_map[target_key]))

        # print an indicator every 100 iterations for comparison
        if i % 100 == 0:
            print("iteration: {}".format(i))

def nth_smallest_key(n, m):
    return sorted(m.keys())[n]

if __name__ == "__main__":
    main()
```

```c++
#include <cstdio>
#include <map>
#include <random>
using namespace std;

int nth_smallest_key(int n, map<int, int>& m);

int main() {
    random_device rd;
    mt19937 psrng(rd());
    uniform_int_distribution<> dist(0, 2147483647);

    map<int, int> my_map;

    // fills a map with 1000000 random mappings of type (<int>: <int>)
    for (int i = 0; i < 1000000; i++) {
        my_map[dist(psrng)] = dist(psrng);
    }

    // prints out the 1000th smallest key and its value
    int target_key = nth_smallest_key(1000, my_map);
    printf("(%d: %d)\n", target_key, my_map[target_key]);

    // writes a new random mapping to the map
    // then prints out the 1000th smallest key and its value if that key
    // has changed
    // 100000 times
    for (int i = 0; i <= 100000; i++) {
        my_map[dist(psrng)] = dist(psrng);

        int test_key = nth_smallest_key(1000, my_map);
        if (target_key != test_key) {
            target_key = test_key;
            printf("(%d: %d)\n", target_key, my_map[target_key]);
        }
    }

    return 0;
}

int nth_smallest_key(int n, map<int, int>& m) {
    map<int, int>::iterator curr = m.begin();
    for (int i = 0; i < n; i++) {
        curr = next(curr);
    }
    return curr->first;
}
```

Makefile:
```make
runcpp: buildcpp
        ./main

buildcpp:
        g++ -o main bst_semantics_cpp.cpp

runpython:
        python3 bst_semantics_python.py
```

The C++ program runs on my machine in approximately 5 seconds
```bash
$ time ./main
(2211193: 2021141747)
(2208771: 1079444337)
(2208700: 1187119077)
(2208378: 1447743503)
...
(1996019: 1378401665)
(1989217: 1457497754)
(1989042: 1336229409)

real    0m4.915s
user    0m4.750s
sys     0m0.094s
$ 
```

The Python program does not reach 1% of completion after 120 seconds
```bash
$ time make runpython
python3 bst_semantics_python.py
(2158070: 1498305903)
iteration: 0
iteration: 100
iteration: 200
^CTraceback (most recent call last):
  File "bst_semantics_python.py", line 36, in <module>
    main()
  File "bst_semantics_python.py", line 23, in main
    test_key = nth_smallest_key(1000, my_map)
  File "bst_semantics_python.py", line 33, in nth_smallest_key
    return sorted(m.keys())[n]
KeyboardInterrupt
Makefile:8: recipe for target 'runpython' failed
make: *** [runpython] Error 1


real    2m2.040s
user    1m59.063s
sys     0m0.375s
$ 
```

I would like to propose adding such a data structure to the Python standard 
library. It could be called a SortedDict.
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/VFBM6UFVDUEPVCUTSTKGLB66MRIMOTCO/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to