Dennis Sweeney <sweeney.dennis...@gmail.com> added the comment:

If we were to do this, I think a better API might be to accept an arbitrary 
iterable, then produce a sorted iterable:

def sorted_on_disk(iterable, key=None, reverse=False) -> Iterable:
    ...

It would sort chunks of the input and store them in files as sequences of 
pickles, merging them as they got bigger, and then return an iterator over the 
resulting single sorted file.

This would be more composable with other standard Python functions and would be 
a good way of separating concerns. sorted(...) and heapq.merge(...) already 
have the correct APIs to do it this way.

Potential implementation detail: For some small fixed n, always doing a 2^n-way 
heapq.merge instead of a bunch of 2-way merges would do fewer passes over the 
data and would allow the keys to be computed 1/n as many times, assuming we 
wouldn't decorate-sort-undecorate.

----------
nosy: +Dennis Sweeney

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41678>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to